Data Engineering Resources

The Latest Data Engineering Topics

SelectMany: Probably The Most Powerful LINQ Operator

Hi there, back again. Hope everyone is already exploiting the power of LINQ on a fairly regular basis. Okay, everyone knows by now how simple LINQ queries with a where and select (and orderby, and Take and Skip and Sum, etc.) are translated from a query comprehension into an equivalent expression for further translation:

    from p in products where p.Price > 100 select p.Name

becomes

    products.Where(p => p.Price > 100).Select(p => p.Name)

All blue syntax highlighting has gone; the compiler is happy with what remains and takes it from there in a left-to-right fashion (so it depends on the signature of the Where method that is found whether we take the route of anonymous methods or, in case of an Expression<…> signature, the route of expression trees).

But let’s make things slightly more complicated and abstract:

    from i in a
    from j in b
    where i > j
    select i + j

It’s more complicated because we have two from clauses; it’s more abstract because we’re using names with no intrinsic meaning. Let’s assume a and b are IEnumerable<int> sequences in what follows. What the above query means in abstract terms is:

    (a X b).Where((i, j) => i > j).Select((i, j) => i + j)

where X is a hypothetical Cartesian product operator, i.e. given a = { 1, 4, 7 } and b = { 2, 5, 8 }, it produces { (1,2), (1,5), (1,8), (4,2), (4,5), (4,8), (7,2), (7,5), (7,8) }, or all the possible pairs of an element from the first sequence combined with an element from the second sequence. For the record, the generalized form of such a pair, having any number of elements, would be a tuple. If we had this capability, Where would get a sequence of such tuples, and it could identify a tuple in its lambda expression as a set of parameters (i, j). Similarly, Select would do the same and everyone would be happy. You can verify the result would be { 6, 9, 12 }.

Back to reality now: we don’t have the direct equivalent of a Cartesian product in a form that produces tuples.
In addition to this, the Where operator in LINQ has a signature like this:

    IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)

where the predicate parameter is a function of one, and only one, argument. The lambda (i, j) => i > j isn’t compatible with this since it has two arguments. A similar remark holds for Select. So, how can we get around this restriction? SelectMany is the answer.

Demystifying SelectMany

What’s the magic of SelectMany all about? Where could we better start our investigation than by looking at one of its signatures?

    IEnumerable<TResult> SelectMany<TSource, TCollection, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TCollection>> collectionSelector,
        Func<TSource, TCollection, TResult> resultSelector)

Wow, that might be a little overwhelming at first. What does it do? Given a sequence of elements (called source) of type TSource, it asks every such element (using collectionSelector) for a sequence of, in some way related, elements of type TCollection. Next, it pairs the currently selected TSource element with each of the TCollection elements in the returned sequence and feeds every pair into resultSelector to produce a TResult that’s returned. Still not clear? The implementation says it all and is barely three lines:

    foreach (TSource item in source)
        foreach (TCollection subItem in collectionSelector(item))
            yield return resultSelector(item, subItem);

This already gives us a tremendous amount of power. Here’s a sample:

    products.SelectMany(p => p.Categories, (p, c) => p.Name + " has category " + c.Name)

How can we use this construct to translate multiple from clauses, you might wonder? Well, there’s no reason the function passed in as the first argument (really the second after rewriting the extension method, i.e. the collectionSelector) has to use the TSource argument to determine the IEnumerable<TCollection> result.
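To make the three-line implementation above concrete, here is a minimal sketch that wraps it in a complete program. The method name FlatMap, the class name and the product data are assumptions made up for this illustration; the real operator is Enumerable.SelectMany, and products are modeled as plain tuples rather than the article's Product type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManySketch
{
    // Hypothetical stand-in for Enumerable.SelectMany, written out for illustration only.
    public static IEnumerable<TResult> FlatMap<TSource, TCollection, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TCollection>> collectionSelector,
        Func<TSource, TCollection, TResult> resultSelector)
    {
        foreach (TSource item in source)
            foreach (TCollection subItem in collectionSelector(item))
                yield return resultSelector(item, subItem);
    }

    // Made-up sample data: product names with their category names.
    public static readonly (string Name, string[] Categories)[] Products =
    {
        ("Chai", new[] { "Beverages", "Tea" }),
        ("Tofu", new[] { "Produce" }),
    };

    // One output string per (product, category) pair.
    public static List<string> Describe()
    {
        return Products
            .FlatMap(p => p.Categories, (p, c) => p.Name + " has category " + c)
            .ToList();
    }

    public static void Main()
    {
        foreach (string line in Describe())
            Console.WriteLine(line);
    }
}
```

Swapping FlatMap for the built-in SelectMany in Describe should give the same sequence, which is a quick way to convince yourself that the nested loops are all there is to it.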
For example:

    products.SelectMany(p => new int[] { 1, 2, 3 }, (p, i) => p.Name + " with irrelevant number " + i)

will produce a sequence of strings like “Chai with irrelevant number 1”, “Chai with irrelevant number 2”, “Chai with irrelevant number 3”, and similar for all subsequent products. This sample doesn’t make sense but it illustrates that SelectMany can be used to form a Cartesian product-like sequence. Let’s focus on our initial sample:

    var a = new [] { 1, 4, 7 };
    var b = new [] { 2, 5, 8 };

    from i in a
    from j in b
    select i + j;

I’ve dropped the where clause for now to simplify things a bit. With our knowledge of SelectMany above we can now translate the LINQ query into:

    a.SelectMany(i => b, …)

This means: for every i in a, “extract” the sequence b and feed it into …. What’s the …’s signature? Something from a (i.e. an int) and something from the result of the collectionSelector (i.e. an int from b) is mapped onto some result. Well, in this case we can combine those two values by summing them, therefore translating the select clause in one go:

    a.SelectMany(i => b, (i, j) => i + j)

What happens when we introduce a seemingly innocent where clause in between?

    from i in a
    from j in b
    where i > j
    select i + j;

The first two lines again look like:

    a.SelectMany(i => b, …)

However, going forward from there we’ll need to be able to reference i (from a) and j (from b) in both the where and select clauses that follow, but the corresponding Where and Select methods only take in “single values”:

    IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate);
    IEnumerable<R> Select<T, R>(this IEnumerable<T> source, Func<T, R> projection);

So what can we do to combine the values i and j into one single object?
Right, use an anonymous type:

    a.SelectMany(i => b, (i, j) => new { i = i, j = j })

This produces a sequence of objects that have two public properties “i” and “j” (since the type is anonymous we don’t care much about casing), and indeed the type never bubbles up to the surface in the query above, because of what follows:

    a.SelectMany(i => b, (i, j) => new { i = i, j = j })
     .Where(anon => anon.i > anon.j)
     .Select(anon => anon.i + anon.j)

In other words, all references to i and j in the where and select clauses in the original query expression have been replaced by references to the corresponding properties in the anonymous type spawned by SelectMany.

Lost in translation

The translation of this little query puts quite some work on the shoulders of the compiler (assuming a and b are IEnumerable<int> and nothing more, i.e. no IQueryable):

- The lambda expression i => b captures variable b, hence a closure is needed.
- That same lambda expression acts as a parameter to SelectMany, so an anonymous method will be created inside the closure class.
- For new { i = i, j = j } an anonymous type needs to be generated.
- SelectMany’s second argument, Where’s first argument and Select’s first argument are all lambda expressions that generate anonymous methods as well.

As a little hot summer evening exercise, I wrote all of this plumbing manually to show how much code would be needed in C# 2.0 minus closures and anonymous methods (more or less C# 1.0 plus generics).
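Before looking at the hand-written plumbing, the equivalence between the query expression and its SelectMany/Where/Select translation can be checked directly. This is a small sketch using the article's a and b; the class and method names are made up for the example, and the compiler's real intermediate type has a generated name rather than the one implied here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TranslationSketch
{
    static readonly int[] a = { 1, 4, 7 };
    static readonly int[] b = { 2, 5, 8 };

    // The query expression as written in the article.
    public static List<int> QuerySyntax()
    {
        return (from i in a
                from j in b
                where i > j
                select i + j).ToList();
    }

    // The same query spelled out with an anonymous type carrying i and j.
    public static List<int> MethodSyntax()
    {
        return a.SelectMany(i => b, (i, j) => new { i, j })
                .Where(pair => pair.i > pair.j)
                .Select(pair => pair.i + pair.j)
                .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", QuerySyntax()));   // 6, 9, 12
        Console.WriteLine(string.Join(", ", MethodSyntax()));  // 6, 9, 12
    }
}
```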
Here’s where we start from:

    class Q
    {
        IEnumerable<int> GetData(IEnumerable<int> a, IEnumerable<int> b)
        {
            return from i in a
                   from j in b
                   where i > j
                   select i + j;
        }
    }

This translates into:

    class Q
    {
        IEnumerable<int> GetData(IEnumerable<int> a, IEnumerable<int> b)
        {
            // the closure object carries the captured variable b
            Closure0 __closure = new Closure0();
            __closure.b = b;

            return Enumerable.Select(
                Enumerable.Where(
                    Enumerable.SelectMany(
                        a,
                        new Func<int, IEnumerable<int>>(__closure.__selectMany1),
                        new Func<int, int, Anon0<int, int>>(__selectMany2)
                    ),
                    new Func<Anon0<int, int>, bool>(__where1)
                ),
                new Func<Anon0<int, int>, int>(__select1)
            );
        }

        private class Closure0
        {
            public IEnumerable<int> b;
            public IEnumerable<int> __selectMany1(int i) { return b; }
        }

        private static Anon0<int, int> __selectMany2(int i, int j) { return new Anon0<int, int>(i, j); }
        private static bool __where1(Anon0<int, int> anon) { return anon.i > anon.j; }
        private static int __select1(Anon0<int, int> anon) { return anon.i + anon.j; }

        // generics allow reuse of the type for all anonymous types with 2 properties,
        // hence the use of EqualityComparers in the implementation
        private class Anon0<TI, TJ>
        {
            private readonly TI _i;
            private readonly TJ _j;

            public Anon0(TI i, TJ j) { _i = i; _j = j; }

            public TI i { get { return _i; } }
            public TJ j { get { return _j; } }

            public override bool Equals(object o)
            {
                Anon0<TI, TJ> anonO = o as Anon0<TI, TJ>;
                return anonO != null
                    && EqualityComparer<TI>.Default.Equals(_i, anonO._i)
                    && EqualityComparer<TJ>.Default.Equals(_j, anonO._j);
            }

            public override int GetHashCode()
            {
                // lame quick-and-dirty hash code
                return EqualityComparer<TI>.Default.GetHashCode(_i) ^ EqualityComparer<TJ>.Default.GetHashCode(_j);
            }

            public override string ToString()
            {
                return "{ i = " + i + ", j = " + j + " }"; // lame without StringBuilder
            }
        }
    }

Just a little thought… Would you like to go through this burden to write a query? “Syntactical sugar” might have some bad connotation to some, but it can be oh so sweet baby!

Bind in disguise

Fans of “monads”, a term from category theory that has yielded great results in the domain of functional programming as a way to make side-effects explicit through the type system (e.g. the IO monad in Haskell), will recognize SelectMany’s (limited) signature to match the one of bind:

    IEnumerable<TResult> SelectMany<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TResult>> collectionSelector)

corresponds to:

    (>>=) :: M x -> (x -> M y) -> M y

which is Haskell’s bind operator. For those familiar with Haskell, the “do” notation, which allows the visual illusion of embedding the semi-colon and curly brace style of “imperative programming” in Haskell code, is syntactical sugar on top of this operator, defined (recursively) as follows:

    do { e }            = e
    do { e; s }         = e >>= \_ -> do { s }
    do { x <- e; s }    = e >>= (\x -> do { s })
    do { let x = e; s } = let x = e in do { s }

Rename to SelectMany, replace M x by IEnumerable<x> and assume a non-curried form and you end up with:

    SelectMany :: (IEnumerable<x>, x -> IEnumerable<y>) -> IEnumerable<y>

Identifying x with TSource, y with TResult and turning a -> b into Func<a, b> yields:

    SelectMany :: Func<IEnumerable<TSource>, Func<TSource, IEnumerable<TResult>>, IEnumerable<TResult>>

and you get exactly the same signature as the SelectMany we started from. For the curious, M in the original form acts as a type constructor, something the CLR doesn’t support since it lacks higher-kinded polymorphism; it’s yet another abstraction one level higher than generics that math freaks love to use in category theory. The idea is that if you can prove laws to be true in some “structure” and you can map that structure onto another “target structure” by means of some mapping function, corresponding laws will hold true in the “target structure” as well. For instance, ({ even, odd }, +) and ({ pos, neg }, *) can be mapped onto each other pairwise and recursively, making it possible to map laws from the first one to the second one, e.g.

    even + odd -> odd
    pos * neg -> neg

This is a largely simplified sample of course; I’d recommend everyone who’s interested to get a decent book on category theory to get into the gory details.
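One way to make the bind correspondence tangible is to check the three monad laws on a few sample values. The sketch below is an illustration under assumptions, not a proof: Unit, Bind, Twice and PlusOneAndTen are hypothetical names introduced here, with Unit playing the role of Haskell's return and Bind simply forwarding to the single-selector SelectMany.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BindSketch
{
    // "return"/"unit": wraps a value in a one-element sequence (hypothetical helper).
    public static IEnumerable<T> Unit<T>(T value)
    {
        yield return value;
    }

    // "bind" is SelectMany with a single selector.
    public static IEnumerable<TResult> Bind<T, TResult>(
        this IEnumerable<T> m, Func<T, IEnumerable<TResult>> f)
    {
        return m.SelectMany(f);
    }

    // Two arbitrary sample functions of shape x -> M y.
    static IEnumerable<int> Twice(int x) { return new[] { x, x }; }
    static IEnumerable<int> PlusOneAndTen(int x) { return new[] { x + 1, x + 10 }; }

    // return x >>= f  ==  f x
    public static bool LeftIdentity()
    {
        return Unit(3).Bind(x => Twice(x)).SequenceEqual(Twice(3));
    }

    // m >>= return  ==  m
    public static bool RightIdentity()
    {
        var m = new[] { 1, 2, 3 };
        return m.Bind(x => Unit(x)).SequenceEqual(m);
    }

    // (m >>= f) >>= g  ==  m >>= (\x -> f x >>= g)
    public static bool Associativity()
    {
        var m = new[] { 1, 2, 3 };
        var left = m.Bind(x => Twice(x)).Bind(y => PlusOneAndTen(y));
        var right = m.Bind(x => Twice(x).Bind(y => PlusOneAndTen(y)));
        return left.SequenceEqual(right);
    }

    public static void Main()
    {
        Console.WriteLine(LeftIdentity());
        Console.WriteLine(RightIdentity());
        Console.WriteLine(Associativity());
    }
}
```

The nested form on the right-hand side of the associativity check has the same shape as the translation of multiple from clauses: the inner selector can see the outer variable.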
A word of caution

Now that you know how SelectMany works, can you think of a possible implication when selecting from multiple sources? Let me give you a tip: nested foreach loops. This is an uninteresting sentence that acts as a placeholder in the time space while you’re thinking about the question. Got it? Indeed, order matters. Writing the following two lines of code produces two different queries with radically different execution patterns:

    from i in a from j in b …
    from j in b from i in a …

Those roughly correspond to:

    foreach (var i in a)
        foreach (var j in b)
            …

versus

    foreach (var j in b)
        foreach (var i in a)
            …

But isn’t this much ado about nothing? No, not really. What if iterating over b is much more costly than iterating over a? For example:

    from p in localCollectionOfProducts
    from c in sqlTableOfCategories
    …

This means that for every product iterated locally, we’ll reach out to the database to iterate over the (retrieved) categories. If both were local, there wouldn’t be a problem of course; if both were remote, the (e.g.) SQL translation would take care of it to keep the heavy work on the remote machine. If you want to see the difference yourself, you can use the following simulation:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading;

    class Q
    {
        static void Main()
        {
            Stopwatch sw = new Stopwatch();

            Console.WriteLine("Slow first");
            sw.Start();
            foreach (var s in Perf(Slow(), Fast()))
                Console.WriteLine(s);
            sw.Stop();
            Console.WriteLine(sw.Elapsed);

            sw.Reset();

            Console.WriteLine("Fast first");
            sw.Start();
            foreach (var s in Perf(Fast(), Slow()))
                Console.WriteLine(s);
            sw.Stop();
            Console.WriteLine(sw.Elapsed);
        }

        static IEnumerable<string> Perf<T1, T2>(IEnumerable<T1> a, IEnumerable<T2> b)
        {
            return from i in a
                   from j in b
                   select i + "," + j;
        }

        static IEnumerable<int> Slow()
        {
            Console.Write("Connecting... ");
            Thread.Sleep(2000); // mimic query overhead (e.g. remote server)
            Console.WriteLine("Done!");

            yield return 1;
            yield return 2;
            yield return 3;
        }

        static IEnumerable<char> Fast()
        {
            return new [] { 'a', 'b', 'c' };
        }
    }

This produces:

    [Screenshot: console output of the simulation]

Obviously, it might be the case that you’re constructing a query that can only execute by reaching out to the server multiple times, e.g. because the order of the result matters (see the screenshot above for an illustration of the ordering influence, although some local sorting operation might help to satisfy such a requirement) or because the second query source depends on the first one (from i in a from j in b(i) …). There’s no silver bullet for a solution, but knowing what happens underneath the covers certainly provides the necessary insights to come up with scenario-specific solutions.

Happy binding!
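The effect of source order can also be measured without a stopwatch by counting how often the slow source is enumerated. The following is a sketch under assumptions: a counter replaces the two-second sleep, and the ToList() variant is one possible scenario-specific mitigation added for illustration (it only applies when the slow data may be read once and held in memory), not something the simulation above does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderSketch
{
    public static int Connections;   // how often the "remote" source was opened

    static IEnumerable<int> Slow()
    {
        Connections++;               // stands in for the expensive connection setup
        yield return 1;
        yield return 2;
        yield return 3;
    }

    static IEnumerable<char> Fast()
    {
        return new[] { 'a', 'b', 'c' };
    }

    static IEnumerable<string> Pairs<T1, T2>(IEnumerable<T1> a, IEnumerable<T2> b)
    {
        return from i in a
               from j in b
               select i + "," + j;
    }

    // Runs the query to completion and reports how many times Slow() was enumerated.
    public static int CountConnections(Func<IEnumerable<string>> query)
    {
        Connections = 0;
        query().ToList();
        return Connections;
    }

    public static int SlowOuter()
    {
        return CountConnections(() => Pairs(Slow(), Fast()));
    }

    public static int SlowInner()
    {
        return CountConnections(() => Pairs(Fast(), Slow()));
    }

    // Assumption: the slow data fits in memory, so it is materialized once up front.
    public static int SlowInnerCached()
    {
        return CountConnections(() => Pairs(Fast(), Slow().ToList()));
    }

    public static void Main()
    {
        Console.WriteLine("Slow outer:        " + SlowOuter());        // 1
        Console.WriteLine("Slow inner:        " + SlowInner());        // 3
        Console.WriteLine("Slow inner cached: " + SlowInnerCached());  // 1
    }
}
```

With the slow source in the inner position it is re-enumerated once per outer element, which is exactly the nested foreach pattern described above.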

August 20, 2008

by Bart De Smet


LINQ: Folding Left, Right And The LINQ Aggregation Operator

Discussion of functional programming concepts, focusing on fold operations and their implementation in C# using LINQ and expression trees.

August 18, 2008

by Bart De Smet


Service-Orientation vs. Object-Orientation: Understanding the Impedance Mismatch

Object-oriented programming languages and techniques provide a powerful means for designing and building applications. These techniques do not always translate well into a service-oriented paradigm. Service orientation demands a different set of design guidelines and requirements than an object-oriented application. Understanding how an object-oriented design can negatively impact a service-oriented design is key to building services that support an agile enterprise. This article examines where the two designs impact each other as well as methods for addressing the incompatibilities between the two while still leveraging the power of both.

by Larry Guger

Introduction

Object orientation is a good thing. I would like to believe that I write code in a well-defined object-oriented manner, taking advantage of all the goodness that is provided, such as encapsulation, polymorphism and inheritance. These are important concepts that make modern software applications easier to develop, enhance and maintain. I’m sold.

Service-orientation is a good thing too. As the industry moves toward service-orientation we naturally take along a lot of what we have learned over the years and apply this to the new way of doing things. Visual Studio, the .NET framework, and especially Windows Communication Foundation (WCF) support the development of service-oriented applications. This was one of the core design goals behind WCF, but moving from object-oriented design techniques to service-oriented design techniques is not without its challenges. Part of the reason can be attributed to the tooling. We have very mature tools to support object-oriented design and development but fewer tools that emphasize service-oriented design. This is partly because we, as an industry, are still figuring out what service-orientation really means. This article explores what I am referring to as the “impedance mismatch” between the two design paradigms.
Note: Most of the concepts and ideas presented are not specific to .NET or Visual Studio until I refer to the namespace generation problem later in the discussion (as it manifests itself in Visual Studio). However, it is worth noting that this problem can present itself on any platform.

Designing an OO System

Here is an example of one model that would make sense in the object-oriented world:

Figure 1

Figure 1 is a simplified model designed to support some form of purchasing functionality provided by the application. A customer contains a mailing address and a shipping address, and the customer can also hold many contracts with the company that is supplying products. The customer is able to place orders against these contracts, with any given order containing one or more line items, each of which is an order for a product. In addition, the model permits the developer to navigate from a PurchaseOrder object to the Contract under which the order was placed by using an object reference. Likewise, a Contract will contain a collection of all of the PurchaseOrders placed under it. This pattern is repeated between the Customer and its Contracts as well as between PurchaseOrders and OrderLineItems. We now have a nice object model that permits navigation between related objects in any direction.

Service Enablement

In a distributed computing environment, such as is found in almost every enterprise today, there is a business logic/service tier that resides on some central server farm and exposes services for working with the contained business functionality. This service tier is accessed by a client tier, whether the client is a Web application, a rich client application or a B2B service implementation. To support the object model above, one can imagine a collection of services that are targeted at a handful of business needs: customer services, contract services and order services.
Each of these services has its own endpoint with a few methods to support working with each of the primary business types identified. To be specific, a service could be developed to support working with customer data that would contain methods such as CreateCustomer, SearchCustomers, and GetCustomer. Each of these methods would either accept or return customer objects, and if you retrieved a customer object using the GetCustomer service you could inspect the contracts that the customer has as well as the orders placed under each contract. Chances are that if you retrieved a customer using the GetCustomer method you would not be getting the populated contract objects along with it at that time. You would need to make additional calls to retrieve individual contracts and associated orders if that was the information you were after. The same concept should hold for working with contracts or orders. As you can see, the object hierarchy is maintained.

Assume that this is the approach taken and the GetCustomer method returns a serialized object of type Customer. Once that customer object is deserialized in the client application, it is easy enough to create contract objects and attach them to the contracts collection on the customer, then create order objects and attach them to the contract objects, and we have the same object-oriented goodness on the client as we have on the server. This is a standard approach when first building service-enabled applications. Unfortunately, this does not work well for service-oriented applications that are intended to be reused throughout the enterprise for other purposes beyond the initial application. Here’s why.

Service Referencing

Let’s think about developing the client-side portion of our application; whether it is a web, WinForms or WPF application does not matter. To begin, we create a user interface for dealing with all things related to customers.
We can create new customers, modify existing ones and search for customers based on various criteria. Once we have the user interface defined we add a service reference, using Visual Studio, to our previously created service, which in turn generates our classes and proxies. We instantiate objects, add data supplied by the user and submit these objects to our services. Pretty standard stuff.

Next we move on to developing the portion of the user interface that deals with contracts. We follow the same pattern and things are working well until we get to a namespace collision. When we added a reference to the customer-related service, Visual Studio generated code for the classes that make up the return types and the request types our service supports. This will include all of the serializable types in the customer object graph. When we add a reference to the contract service, Visual Studio will generate the code for the classes that make up the return and request types that this service supports. This will also include all of the serializable types in the contract object graph. In other words, we end up with generated code for all of the classes described above twice! This is because each reference generates the classes in a distinct namespace. Even if you use the same reference name, Visual Studio will alter the names slightly to make them distinct. For example, if both the first and second service references are given the name “localhost”, Visual Studio will append a “1” to the end of the second reference, making its namespace begin with “localhost1”. You now have two Customer classes, localhost.Customer and localhost1.Customer, as well as two of every class in the respective object graphs. Now your code is duplicated and stored in different types. You cannot create an instance of a localhost.Customer object and assign it to a variable of type localhost1.Customer.
Not only do you not get object compatibility but you also end up with a whole bunch of equivalent classes under different namespaces. There are a few commonly used ways to deal with this problem:

1. Add a reference to the assembly that contains your objects to your UI projects, and alter the service references to use that assembly (or those assemblies) rather than generating the code.
2. Alter the generated code to remove the duplicates.
3. Alter the code generation process to reference the assemblies containing your data objects.
4. Develop mapping code to translate from one type to another.

There are challenges with all of these. The first and third options may not be possible if you don’t have access to the assemblies; perhaps you are referencing services that you did not develop, or perhaps the objects contain code that shouldn’t reside on the UI side of things, such as database access logic. The second option is simply fraught with peril, as the code will be regenerated and need re-altering after every reference update, and if this is in mid-development there may be many updates. The fourth option adds extra work, and who needs extra work?

The Service-oriented Approach

The correct approach is to avoid these problems completely when developing your services. Here’s how.

A Customer service should know about customers and data directly related to a customer only. Your Customer service methods should return a slightly different object graph than the object graph defined above. You will still have the customer object and you will still have the addresses for that customer, but that’s it. Break the graph at the collection of contracts. When the client requires the contracts for a customer, it needs to submit a request to the Contract service asking for the contracts for a particular customer. This should actually be the same approach regardless of the object graph in use.
Whether the customer explicitly asks for the contracts or the system hides the implementation details and makes the call to retrieve contracts is simply an implementation detail. The fact that the object graph does not contain direct links to the contracts would not impact the user experience. Let me explain.

Regardless of the size of the object graph in use, it should be a rare case in which the entire graph is populated and returned by a service call. Generally you will find that only a portion of the objects are in use for any given user action. To minimize the amount of data that is sent over a network, only the relevant objects should be populated and returned. The code that is developed on the client side takes the responsibility for calling the appropriate services to populate further objects in the graph as the user requests them; this is often termed “lazy loading”. The goal of “lazy loading” is to retrieve only the data that is required so as to improve the performance of the application.

In fact, the simplified object graph supports this design approach better than the more complex graph does. With the complex graph, if you have a Contract object and need to perform some work with the related Customer for that contract, you still need a sparsely populated Customer object that contains, at a minimum, the customer Id that can be used to retrieve the full customer object from the service. With the simplified graph the Contract class contains a customer Id directly. The relations between the objects are still maintained; with the simplified version, however, the relationships are explicit rather than implicit, as they are with the complex version.

To bring this back to the user experience, the user should not know whether the underlying code has been developed using the simplified or complex graph. They should be able to navigate from a contract to the related customer just as easily. It is up to you, the developer, to make this experience seamless.
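The simplified graph and the on-demand navigation described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the type shapes, the ICustomerService and IContractService signatures, the FakeServices stand-in for the remote tier, and the ContractNavigator helper are not taken from the article or from WCF.

```csharp
using System;
using System.Collections.Generic;

// Simplified service-facing types: relationships are carried as identifiers.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Contract
{
    public int Id { get; set; }
    public int CustomerId { get; set; }   // explicit link instead of an object reference
}

// Hypothetical service contracts; the article does not prescribe these signatures.
public interface ICustomerService
{
    Customer GetCustomer(int customerId);
}

public interface IContractService
{
    IList<Contract> GetContractsForCustomer(int customerId);
}

// In-memory fake standing in for the remote services.
public class FakeServices : ICustomerService, IContractService
{
    public int CustomerCalls;   // counts round trips to the customer service

    public Customer GetCustomer(int customerId)
    {
        CustomerCalls++;
        return new Customer { Id = customerId, Name = "Customer " + customerId };
    }

    public IList<Contract> GetContractsForCustomer(int customerId)
    {
        return new List<Contract>
        {
            new Contract { Id = 100, CustomerId = customerId },
            new Contract { Id = 101, CustomerId = customerId },
        };
    }
}

// Client-side helper that hides the extra call from the rest of the UI code.
public class ContractNavigator
{
    private readonly ICustomerService _customers;
    private readonly Dictionary<int, Customer> _loaded = new Dictionary<int, Customer>();

    public ContractNavigator(ICustomerService customers)
    {
        _customers = customers;
    }

    // Loads the customer on first use and reuses it afterwards.
    public Customer CustomerOf(Contract contract)
    {
        Customer customer;
        if (!_loaded.TryGetValue(contract.CustomerId, out customer))
        {
            customer = _customers.GetCustomer(contract.CustomerId);
            _loaded[contract.CustomerId] = customer;
        }
        return customer;
    }
}

public static class Demo
{
    public static void Main()
    {
        var services = new FakeServices();
        var navigator = new ContractNavigator(services);
        foreach (Contract contract in services.GetContractsForCustomer(7))
            Console.WriteLine(contract.Id + " belongs to " + navigator.CustomerOf(contract).Name);
        Console.WriteLine("Customer service calls: " + services.CustomerCalls);  // 1
    }
}
```

The calling code navigates from a contract to its customer in one step, while the service-facing types stay flat; whether to cache loaded customers at all is a design choice that depends on how fresh the data needs to be.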
Figure 2

Again the object graph is kept simple, as shown in Figure 2. As with the customer, contracts would not maintain a full customer object reference as part of the contract definition on the client side of the equation. A Contract object passed from the service would consist of only itself: no customer and no PurchaseOrders. In place of the customer reference, each contract object would maintain the customer identifier so as to be able to uniquely identify which customer the contract belongs to and provide the means to “navigate” to the customer object when required.

As a side note, when requesting the collection of contracts for any given customer, the collection of data returned should only consist of enough data in each object to uniquely identify the contract being sought; whether these objects are sparsely populated contract objects or a “light” version of the contract class does not really matter. You can then query for the full contract object based on the unique identifier that would be part of the collection of query results from the first request. Again, these are implementation details that are hidden from the user experience and need to be determined based on application needs.

The End Result

The end results of using this approach are:

• Returned result data is kept to the relevant bits. Even though this goal can still be achieved with a more complex object graph, this approach enforces it.
• When adding a service reference using tools like Visual Studio, the generated classes have a smaller object graph, which avoids having multiple identical classes in different namespaces.
• Cleaner separation of duties.

Once you have adjusted to using this approach you will find that you need to create a mapping layer just behind the services, to map to and from the interfaces exposed by the services and your more complex object graphs. Either that or you can use simpler object graphs on the server side.
However, these simple object graphs may not provide the functionality needed by the consumer of the service, especially if the consumer is a rich client application. In that case, a more complex object model can be designed and mapped to the simpler objects returned by the service. This keeps the service architecture simple while allowing the use of all of the OO principles that we’ve learned over the years. When it’s not needed, the simpler objects can simply be used as is with no additional work required.

By keeping the server-side object graphs at the same simplicity as the service interfaces you will find that your code becomes cleaner and more modular, your classes become more cohesive and there is less coupling between your objects. In addition, the flexibility to reuse the customer service with other services which rely on customer information, but are sourced from a different repository, is easier to achieve. Merging multiple customer data systems into a single customer service also becomes much easier if the customer data is de-coupled from any other data. This leads to a potential increase in service reuse, and a happier enterprise.

Conclusion

Both object-oriented and service-oriented design and development techniques have their place in modern systems development. Object-oriented systems fit well in a stateful environment while a service-oriented approach requires a stateless environment. There is nothing wrong with the strong object-oriented approach as described at the start of this paper; however, it will not serve you well if you try to expose those object graphs through a service. With years of OO experience it’s easy to fall into OO design by default, but when designing systems we need to shift our mindset and think about what we are designing for. If it’s an SOA system, a traditional OO approach may not be the best.
The tight coupling will get you in trouble as you expand the reach and reuse of your services throughout your enterprise. Keep the interfaces into your services simple and focused and you will find that your services become much easier to manage and become much more scalable.

To summarize, OO is, by its nature, stateful while SOA is, by its nature, stateless. This is where the impedance mismatch shows itself.
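The mapping layer mentioned under The End Result, sitting just behind the services and translating between a rich object graph and the simple service-facing objects, could be sketched as follows. All type and member names here are hypothetical and chosen for the example only; the article does not define them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rich server-side model: object references in both directions.
public class CustomerEntity
{
    public int Id;
    public string Name;
    public List<ContractEntity> Contracts = new List<ContractEntity>();
}

public class ContractEntity
{
    public int Id;
    public CustomerEntity Customer;
}

// Simple objects exposed through the service interface.
public class CustomerDto
{
    public int Id;
    public string Name;
}

public class ContractDto
{
    public int Id;
    public int CustomerId;   // the reference is flattened to an identifier
}

// Hypothetical mapping layer between the two models.
public static class ServiceMapper
{
    public static CustomerDto ToDto(CustomerEntity customer)
    {
        // The contracts collection is deliberately left out: the graph is broken here.
        return new CustomerDto { Id = customer.Id, Name = customer.Name };
    }

    public static ContractDto ToDto(ContractEntity contract)
    {
        return new ContractDto { Id = contract.Id, CustomerId = contract.Customer.Id };
    }

    public static List<ContractDto> ContractsOf(CustomerEntity customer)
    {
        return customer.Contracts.Select(c => ToDto(c)).ToList();
    }
}

public static class MappingDemo
{
    // Builds a small made-up object graph for the demonstration.
    public static CustomerEntity SampleCustomer()
    {
        var customer = new CustomerEntity { Id = 7, Name = "Contoso" };
        customer.Contracts.Add(new ContractEntity { Id = 100, Customer = customer });
        customer.Contracts.Add(new ContractEntity { Id = 101, Customer = customer });
        return customer;
    }

    public static void Main()
    {
        CustomerEntity customer = SampleCustomer();
        Console.WriteLine(ServiceMapper.ToDto(customer).Name);
        foreach (ContractDto dto in ServiceMapper.ContractsOf(customer))
            Console.WriteLine(dto.Id + " -> customer " + dto.CustomerId);
    }
}
```

Because the circular Customer/Contract references never leave the server, the generated client classes for the customer service and the contract service no longer overlap.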

July 31, 2008

by Masoud Kalali


Compute Grids vs. Data Grids

In a nutshell, grid computing is a way to distribute your computations across multiple computers (nodes). However, even JMS does that, but JMS is not a grid computing product - it's a messaging API. To correctly classify grid computing products we have to split them into 2 categories: compute grids and data grids.

Compute Grid

Compute grids allow you to take a computation, optionally split it into multiple parts, and execute them on different grid nodes in parallel. The obvious benefit here is that your computation will perform faster as it now can use resources from all grid nodes in parallel. One of the most common design patterns for parallel execution is MapReduce. However, compute grids are useful even if you don't need to split your computation - they help you improve the overall scalability and fault-tolerance of your system by offloading your computations onto the most available nodes. Some of the "must have" compute grid features are:

Automatic deployment - allows for automatic deployment of classes and resources onto the grid without any extra steps from the user. This feature alone provides one of the largest productivity boosts in distributed systems. Users usually are able to simply execute a task from one grid node and, as task execution penetrates the grid, all classes and resources are also automatically deployed.

Topology resolution - allows you to provision nodes based on any node characteristic or user-specific configuration. For example, you can decide to only include Linux nodes for execution, or to only include a certain group of nodes within a certain time window. You should also be able to choose all nodes with CPU load under, say, 50% that have more than 2GB of available heap memory.

Collision resolution - allows users to control which jobs get executed, which jobs get rejected, how many jobs can be executed in parallel, the order of overall execution, etc.

Load balancing - allows you to properly balance your system load within the grid. The range of load balancing policies usually varies between products. Some of the most common ones are round robin, random, or adaptive. More advanced vendors also provide affinity load balancing, where grid jobs always end up on the same node based on the job's affinity key. This policy works well with the data grids described below.

Fail-over - grid jobs should automatically fail over onto other nodes in case of a node crash or some other job failure.

Checkpoints - long-running jobs should be able to periodically store their intermediate state. This is useful for fail-overs, when a failed job should be able to pick up its execution from the latest checkpoint rather than start from scratch.

Grid events - a querying mechanism for all grid events is essential. Any grid node should be able to query all events that happened on remote grid nodes during grid task execution.

Node metrics - a good compute grid solution should be able to provide dynamic grid metrics for all grid nodes. Metrics should include vital node statistics, from CPU load to average job execution time. This is especially useful for load balancing, when the system or user needs to pick the least loaded node for execution.

Pluggability - in order to blend into any environment, a good compute grid should have well thought out pluggability points. For example, if running on top of JBoss, a compute grid should totally reuse JBoss communication and discovery protocols.

Data grid integration - it is important that compute grids are able to natively integrate with data grids, as quite often businesses will need both computational and data features working within the same application.

Some compute grid vendors:

- GridGain - professional open source
- JPPF - open source

Data Grid

Data grids allow you to distribute your data across the grid. Most of us are used to the term distributed cache rather than data grid (data grid does sound more savvy though).
the main goal of data grid is to provide as much data as possible from memory on every grid node and to ensure data coherency. some of the important data grid features include: data replication - all data is fully replicated to all nodes in the grid. this strategy consumes the most resources, however it is the most effective solution for read-mostly scenarios, as data is available everywhere for immediate access. data invalidation - in this scenario, nodes load data on demand. whenever data changes on one of the nodes, then the same data on all other nodes is purged (invalidated). then this data will be loaded on-demand the next time it is accessed. distributed transactions - transactions are required to ensure data coherency. cache updates must work just like database updates - whenever an update failed, then the whole transaction must be rolled back. most data grid support various transaction policies, such as read committed, write committed, serializable, etc... data backups - useful for fail-over. some data grid products provide ability to assign backup nodes for the data. this way whenever a node crashes, the data is immediately available from another node. data affinity/partitioning - data affinity allows you to split/partition your whole data set into multiple subsets and assign every subset to a grid node. in the purest form, data is not replicated between nodes at all, every node is only responsible for it's own subset of data. however, various data grid products may provide different flavors of data affinity, such as replication only to back up nodes for example. data affinity is one of the more advanced features, and is not provided by every vendor. to my knowledge, according to product websites, out of commercial vendors oracle coherence and gemstone have it (there may be others). in professional open source space you can take a look at combination of gridgain with affinity load balancing and jbosscache . 
some data grid/cache vendors: - oracle coherence - commercial - gemstone - commercial - gigaspaces - commercial - jbosscache - professional open source - ehcache - open source
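The split, execute and combine idea described for compute grids can be tried out on a single machine. The following is only a sketch under stated assumptions: a local thread pool stands in for the grid nodes, and the class and method names are hypothetical. It does not use or imitate the API of any of the vendors listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: worker threads play the role of grid nodes.
final class SplitExecuteReduce {

    /** Sums 1..n by splitting the range into chunks, one job per chunk. Requires parts >= 1. */
    static long parallelSum(int n, int parts) throws InterruptedException, ExecutionException {
        ExecutorService nodes = Executors.newFixedThreadPool(parts);
        try {
            // Split ("map") step: build one independent job per chunk of the range.
            List<Callable<Long>> jobs = new ArrayList<>();
            int chunk = (n + parts - 1) / parts; // chunk size, rounded up
            for (int start = 1; start <= n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk - 1);
                jobs.add(() -> {
                    long sum = 0;
                    for (int i = from; i <= to; i++) {
                        sum += i;
                    }
                    return sum;
                });
            }
            // Combine ("reduce") step: wait for every job and add up the partial sums.
            long total = 0;
            for (Future<Long> partial : nodes.invokeAll(jobs)) {
                total += partial.get();
            }
            return total;
        } finally {
            nodes.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(1000, 4));
    }
}
```

A real compute grid adds what a thread pool cannot: deployment of the job classes to remote nodes, fail-over when a node dies, and load balancing across machines.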

July 31, 2008

by Dmitriy Setrakyan


Adding SWT Input Validation the Easy Way

Any input provided by a user in a GUI application must typically be validated in one way or another. There are a number of ways this gets done, while some applications have just ignored the matter altogether. When crafting an Eclipse RCP application, there is some help provided by SWT and JFace. We can add ModifyListeners and VerifyListeners to certain SWT widgets. JFace also provides ControlDecorations to help us indicate to the user where a problem with a specific input value exists.

The problem is that these are at a low level, and we need to do a lot of "monkey" coding just to add basic validation and error indication to a widget, and then we're not even touching the world of input masks. If you're like me, you want to concentrate on solving your business problem, and don't want to write lots of basic UI code over and over. This is where the RCP Toolbox is very useful. It provides a lightweight validation framework (among other features) that makes it much easier to add validation and input masks to SWT Text, Combo and CCombo widgets.

The goal

Let us have a look at how to define a basic wizard for creating a new Booking. This wizard must capture the following fields from the user:

- Name: the name of the booking, typically the name of the person making the booking. May not be empty, and preferably not shorter than three characters.
- Date: the date and time of the booking. Must be any time from the current time to the end of the year.
- Number of Persons: the number of persons to book for. Must be a number from 1 to 10.
- Telephone Number: the telephone number of the person making the booking. Must be in the form +(country code) (area code) number, e.g. +44 (33) 555-1111.

And of course we want indicators next to each field when an error or warning condition exists in the field, as well as a message being written to the WizardDialog's message area. For fun, we want the user to be able to get a quick-fix option on the date field for setting it to the current time.
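Before any UI code is involved, the rules from the goal list can be pinned down as plain Java predicates. This is only a sketch: the class and method names are hypothetical, the digit counts in the telephone pattern are an assumption (the requirement only gives one example), and none of this is part of SWT, JFace or the RCP Toolbox.

```java
import java.util.regex.Pattern;

// Hypothetical helper that restates the booking rules; not a toolkit class.
final class BookingRules {

    // Assumed shape for "+(country code) (area code) number", e.g. +44 (33) 555-1111.
    // The allowed digit counts are a guess made for this illustration.
    private static final Pattern PHONE =
            Pattern.compile("\\+\\d{1,3} \\(\\d{1,4}\\) \\d{3}-\\d{4}");

    /** Error rule: the name may not be empty. */
    static boolean nameIsValid(String name) {
        return name != null && !name.isEmpty();
    }

    /** Warning rule: names shorter than three characters are suspicious. */
    static boolean nameIsVeryShort(String name) {
        return name != null && name.length() < 3;
    }

    /** Error rule: bookings are for 1 to 10 persons. */
    static boolean personsInRange(int persons) {
        return persons >= 1 && persons <= 10;
    }

    /** Error rule: the telephone number must match the international form. */
    static boolean phoneIsValid(String phone) {
        return phone != null && PHONE.matcher(phone).matches();
    }
}
```

Keeping the rules separate from the widgets like this makes them easy to unit test; the validators shown later in the article are where such checks get wired to the UI.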
The Validation framework

The RCP Toolbox provides a number of custom widgets and an easy-to-use validation framework. Adding validation starts with the ValidationToolkit class. This class gets instantiated to work with a specific type of contents, and is then used to create ValidatingField instances that can handle that type of contents. The rest of the framework deals with interfaces and default implementations to facilitate the validation of contents, definition of input masks, provision of quick-fixes, error handling and conversion of the input text to specific class types.

The WizardPage and the ValidationToolkits

We start by defining our BookingWizardPage class and instantiating the necessary ValidationToolkit instances.

//Not all imports are shown
import com.richclientgui.toolbox.validation.IFieldErrorMessageHandler;
import com.richclientgui.toolbox.validation.ValidationToolkit;
import com.richclientgui.toolbox.validation.converter.DateStringConverter;
import com.richclientgui.toolbox.validation.converter.IntegerStringConverter;
import com.richclientgui.toolbox.validation.string.StringValidationToolkit;

public class BookingWizardPage extends WizardPage {
    private static final int DECORATOR_POSITION = SWT.TOP | SWT.LEFT;
    private static final int DECORATOR_MARGIN_WIDTH = 1;
    private static final int DEFAULT_WIDTH_HINT = 150;

    private StringValidationToolkit strValToolkit = null;
    private ValidationToolkit dateValToolkit = null;
    private ValidationToolkit intValToolkit = null;
    private final IFieldErrorMessageHandler errorMessageHandler;

    public BookingWizardPage() {
        super("booking.pageone", "New Booking Entry", null);
        errorMessageHandler = new WizardPageErrorHandler();
    }

    public void createControl(Composite parent) {
        final Composite composite = new Composite(parent, SWT.NONE);
        composite.setLayout(new GridLayout(2, false));

        strValToolkit = new StringValidationToolkit(DECORATOR_POSITION,
                DECORATOR_MARGIN_WIDTH, true);
        strValToolkit.setDefaultErrorMessageHandler(errorMessageHandler);

        intValToolkit = new ValidationToolkit(new IntegerStringConverter(),
                DECORATOR_POSITION, DECORATOR_MARGIN_WIDTH, true);
        intValToolkit.setDefaultErrorMessageHandler(errorMessageHandler);

        dateValToolkit = new ValidationToolkit(new DateStringConverter(),
                DECORATOR_POSITION, DECORATOR_MARGIN_WIDTH, true);
        dateValToolkit.setDefaultErrorMessageHandler(errorMessageHandler);

        //TODO: create ValidatingFields

        setControl(composite);
    }
}

The StringValidationToolkit that we instantiate first in createControl is a ValidationToolkit that deals specifically with ValidatingFields that have normal String contents. For intValToolkit we instantiate a typed instance of ValidationToolkit that will create ValidatingFields that only take Integers as input. We must provide a way for the contents of the fields to be converted from a String to the correct content type. This is done with a set of converter classes provided by the framework. When constructing intValToolkit and dateValToolkit we specify an IntegerStringConverter and a DateStringConverter to convert Integer and java.util.Date values respectively.

The framework makes use of the JFace org.eclipse.jface.fieldassist.ControlDecoration and related classes to indicate whether a field has an error or warning condition, whether it is a required field, and whether there is a quick-fix available (by right-clicking on the decorator icon) for the current error or warning condition. The position of these decorator icons relative to the input widgets, as well as the margin width between the decorator icon and the widget, can be specified when constructing a ValidationToolkit. All the fields created by this ValidationToolkit instance will use the same settings for their decorator icons. The constants DECORATOR_POSITION and DECORATOR_MARGIN_WIDTH define the decorator position and margin, and we use them for constructing all the ValidationToolkit instances.
Handling the error messages We also make use of an IFieldErrorMessageHandler to get feedback from the validation process. The validation framework will call these error handlers when error or warning conditions occur, and allow us to do something with those messages. By default these messages are only displayed in the tooltips of the decorator icons. A default error handler can be specified for each toolkit instance, or a separate handler can be set for each ValidatingField if so required. The inner class WizardPageErrorHandler implements the IFieldErrorMessageHandler interface and basically just set the messages on the WizardPage's message area. //inner class of BookingWizardPage class WizardPageErrorHandler implements IFieldErrorMessageHandler { public void handleErrorMessage(String message, String input) { setMessage(null, DialogPage.WARNING); setErrorMessage(message); } public void handleWarningMessage(String message, String input) { setErrorMessage(null); setMessage(message, DialogPage.WARNING); } public void clearMessage() { setErrorMessage(null); setMessage(null, DialogPage.WARNING); } } The actual error or warning messages are generated by the various IFieldValidator implementations (we'll get to those), and can easily be customized by implementing custom validators. Creating a simple ValidatingField Of course just having some toolkit instances does not help us much. We need actual input widgets that are being validated. The first step is to update the createControl method. 
public void createControl(Composite parent) { final Composite composite = new Composite(parent, SWT.NONE); composite.setLayout(new GridLayout(2, false)); strValToolkit = new StringValidationToolkit(DECORATOR_POSITION, DECORATOR_MARGIN_WIDTH, true); strValToolkit.setDefaultErrorMessageHandler(errorMessageHandler); intValToolkit = new ValidationToolkit(new IntegerStringConverter(), DECORATOR_POSITION, DECORATOR_MARGIN_WIDTH, true); intValToolkit.setDefaultErrorMessageHandler(errorMessageHandler); dateValToolkit = new ValidationToolkit(new DateStringConverter(), DECORATOR_POSITION, DECORATOR_MARGIN_WIDTH, true); dateValToolkit.setDefaultErrorMessageHandler(errorMessageHandler); createNameField(composite); createDateField(composite); createNumberPersonsField(composite); createTelephoneNumberField(composite); setControl(composite); } Then we can look at creating our first validated input field that makes use of a SWT Text widget to capture the name of the person doing the booking. private void createNameField(Composite composite) { new Label(composite, SWT.NONE).setText("Booking Name:"); final ValidatingField nameField = strValToolkit.createTextField( composite, new IFieldValidator(){ public String getErrorMessage() { return "Name may not be empty."; } public String getWarningMessage() { return "That's a very short name..."; } public boolean isValid(String contents) { return !(contents.length()==0); } public boolean warningExist(String contents) { return contents.length() < 3; } }, true, ""); GridData gd = new GridData(SWT.LEFT, SWT.CENTER, false, false); gd.widthHint = DEFAULT_WIDTH_HINT; nameField.getControl().setLayoutData(gd); } Since this field works with String contents, we make use of the strValToolkit instance to create the field in line 4 above. 
We specify the parent composite that the input widget must be added to, the IFieldValidator that will be used to validate the field contents, whether this is a required field (and thus whether the required decorator icon must be shown), and an initial empty string value for the field. Note that this call also creates a Text widget to be used for the field, but the API also allows you to create your own Text, Combo or CCombo instance and pass that to the toolkit to use when creating a new ValidatingField.

The validator is specified as an anonymous inner class implementation of IFieldValidator. We're doing some very basic validation checks in this example, but it is easy to implement validators that make use of other heavyweight business validation frameworks. Our validator will indicate an error condition if the contents of the field are empty (the isValid method), in which case the error message "Name may not be empty." will be displayed (returned by getErrorMessage). This validator will also indicate a warning condition if the name field contains fewer than 3 characters (the warningExist method), with the message "That's a very short name..." (returned by getWarningMessage). In the last three statements of the method we set the layout of the input widget on the composite.

Dating an input mask

Dates have always been a difficult input type to deal with. The easiest way for us developers to deal with them is to use widgets that pop up a calendar from which the user can choose a day, and possibly a time as well. However, this way of dealing with dates is not always a favourite with touch-typing end users. They prefer a masked field where they only need to fill in the bits of the date that matter. Using the mouse should be restricted as far as possible. Luckily the RCP Toolbox provides a way of specifying input masks, as well as specific implementations of validators and converters for dealing with Dates.
We want to create a field that only takes dates as input, where the date entered must be of the form yyyy-MM-dd HH:mm (e.g 2008-07-19 21:00), and fall in the date range starting at the current time and ending at the end of the year 2008. private void createDateField(Composite composite) { new Label(composite, SWT.NONE).setText("Booking Time:"); final Date endYear = getEndYearDate(); //we create a Date field that takes input of form yyyy-MM-dd HH:mm //and only allows values from now till the end of the year final ValidatingField rangedDateField = dateValToolkit.createTextField(composite, new RangedDateFieldValidator( DateFieldValidator.DATE_TIME_HHMM_DASH, dateValToolkit.getStringConverter(), new Date(), endYear), true, new Date()); GridData gd = new GridData(SWT.LEFT, SWT.CENTER, false, false); gd.widthHint = DEFAULT_WIDTH_HINT; rangedDateField.getControl().setLayoutData(gd); } Here we used the dateValToolkit instance to create the field as the contents of the field must be a java.util.Date. We specify an instance of RangedDateFieldValidator (provided by the framework) that makes use of the specified Date input mask pattern DateFieldValidator.DATE_TIME_HHMM_DASH (line 9) to validate the contents of the field. Other patterns are available, or custom ones can also be defined. The DateStringConverter specified when dateValToolkit was constructed will be used to convert from Dates to Strings and vice versa. In line 11 we specify the valid date range for the field, from the current date and time to the end of the year, and in line 13 we set the initial value of the field to the current time. Providing quick-fixes I found that developers using Eclipse RCP to develop their applications like to make the experience for the end-user as good as possible. So we should not stop at just validating input; we must also try and help them quickly fix mistakes, where possible. In this example we can do that by specifying a IQuickFixProvider. //add at end of createDateField(..) 
method //we add a quickfix that will set it to the current date rangedDateField.setQuickFixProvider(new IQuickFixProvider(){ public boolean doQuickFix(ValidatingField field) { field.setContents(new Date()); return true; } public String getQuickFixMenuText() { return "Set to current time"; } public boolean hasQuickFix(Date contents) { //would typically first check contents to determine if quickfix //is possible return true; } }); The above is a very simple quick-fixer. It always says it has a quick-fix available (line 17), where a more complex provider will first check the contents of the field to determine if there is a quick-fix. When the user performs the quick-fix, it just sets the contents of the field to the current date and time (line 6). When the validation framework detects there is an error condition on a field, it will see if there is a IQuickFixProvider available with a quick-fix. If this is the case, it will add the quick-fix option to a context-menu on the decorator icon with the text specified in line 11. All the user then needs to do is right-click on the decorator icon and select the quick-fix to perform. Validating Combos A Combo widget would be just the thing to use for capturing the number of persons for the booking. Our requirements say we must limit the number to a maximum of 10 people (and a minimum of 1 goes without saying). Once again we do not want to force the user to use numerous mouse-clicks or keystrokes to select the number, so we do not make the Combo read-only, and rather decide to add validation to it. 
private void createNumberPersonsField(Composite composite) { new Label(composite, SWT.NONE).setText("Number of persons:"); final ValidatingField numberPersonsField = intValToolkit.createComboField( composite, new StrictRangedNumberFieldValidator(1, 10){ public String getErrorMessage() { return "Bookings for groups bigger than 10 not allowed"; } public String getWarningMessage() { return null; } public boolean warningExist(Integer contents) { return false; } }, true, 2, new Integer[]{1,2,3,4,5,6,7,8,9,10}); GridData gd = new GridData(SWT.LEFT, SWT.CENTER, false, false); gd.widthHint = DEFAULT_WIDTH_HINT; numberPersonsField.getControl().setLayoutData(gd); } Here we make use of the intValToolkit.createComboField method to create a field containing a Combo widget with contents of type Integer. We specify a StrictRangedNumberFieldValidator (line 6) to ensure that the entered value only consists of digits and falls in the range 1 to 10. No warning conditions are checked. In line 22 we populate the Combo with a list of Integers from 1 to 10, and in line 21 we select a default value of 2. As easy as counting to 5. And don't forget the telephone number Freeform telephone number fields are used a lot, but unfortunately end-users can easily make mistakes in such fields. We want to force our user to input the telephone number in a specific form, thus at least preventing the cases where digits are missed. To do this, we make use of the framework's TelephoneNumberValidator. This validator allows telephone number to be entered in either international format (e.g. +44 (55) 555-5555) or in domestic format (e.g. (055) 555-5555). 
private void createTelephoneNumberField(Composite composite) {
    new Label(composite, SWT.NONE).setText("Contact Telephone Nr:");
    final ValidatingField telephoneField = strValToolkit.createTextField(
            composite,
            new TelephoneNumberValidator(true),
            true,
            "+44 (55) 555-4321");
    GridData gd = new GridData(SWT.LEFT, SWT.CENTER, false, false);
    gd.widthHint = DEFAULT_WIDTH_HINT;
    telephoneField.getControl().setLayoutData(gd);
}

We are using strValToolkit to create the field, since the contents will be managed as a String. We specify a TelephoneNumberValidator as the validator, with the true parameter indicating that we want to use the international format. The last argument to createTextField provides an initial value.

Conclusion

This article describes a very simple example of how to add validation to SWT widgets using the RCP Toolbox. In a real-world application the actual business validations might be more complex, but if this validation framework is used, the UI code related to validation remains as simple as the code in these examples. This validation framework really made our development much easier, allowing us to concentrate on the business code. It is very easy to extend the framework with custom validators, converters, quick-fix providers and error handlers that tie into existing business code or other validation code, rules engines, etc.

Note that the examples in this article need Eclipse RCP 3.3 or 3.4 as well as RCP Toolbox v1.0.1, created and distributed by www.richclientgui.com.

July 21, 2008

by Herman Lintvelt


ASP.NET - Query Strings - Client Side State Management

Continuing the tour of ASP.NET client side state management, our current stop is the query string technique. You can read my previous posts on the state management subject at the following links:

- Client side state management introduction
- ViewState technique
- Hidden fields technique

What are Query Strings?

Query strings are data that is appended to the end of a page URL. They are commonly used to hold data like page numbers, search terms or other data that isn't confidential. Unlike ViewState and hidden fields, the user can see the values that the query string holds without using special operations like View Source. An example of a query string can look like http://www.srl.co.il?a=1&b=2. Query strings are included in bookmarks and in URLs that you pass in an e-mail. They are the only way to save a page state when copying and pasting a URL.

The Query String Structure

As written earlier, query strings are appended to the end of a URL. First a question mark is appended to the end of the URL, and then every parameter that we want to hold in the query string. Each parameter consists of the parameter name, followed by the = symbol, followed by the data to hold. Parameters are separated from each other with the ampersand symbol. You should always use the HttpUtility.UrlEncode method on the data itself before appending it.

Query String Limitations

You can use the query string technique when passing from one page to another, but that is all. If the first page needs to pass non-secure data to the other page, it can build a URL with a query string and then redirect. You should always keep in mind that a query string isn't secure, and therefore always validate the data you receive. There are a few browser limitations when using query strings. For example, some browsers impose a length limitation on the query string. Another limitation is that query strings are passed only with the HTTP GET command.
How To Use Query Strings

When you need to use query string data, you do it in the following way:

string queryStringData = Request.QueryString["data"];

In the example I extract a query string parameter named data. The structure of the URL can look like url?data=something. After getting the data parameter value you should validate it in order not to open security breaches. The next example is code that helps inject a query string into a URL:

// requires: using System.Collections.Specialized; using System.Text; using System.Web;
public string BuildQueryString(string url, NameValueCollection parameters)
{
    // nothing to append: return the URL unchanged
    if (parameters == null || parameters.Count == 0)
    {
        return url;
    }

    StringBuilder sb = new StringBuilder(url);
    sb.Append("?");
    foreach (string key in parameters.AllKeys)
    {
        // insert the encoded parameter name and value into the url
        sb.AppendFormat("{0}={1}&",
            HttpUtility.UrlEncode(key),
            HttpUtility.UrlEncode(parameters[key]));
    }
    // remove the last ampersand
    sb.Remove(sb.Length - 1, 1);
    return sb.ToString();
}

Summary

To sum up the post, the query string is another ASP.NET client side state management technique. It is most helpful for page number state or search terms. The technique isn't secure, so avoid using it with confidential data. In the next post in this series I'll explain how to use cookies.

July 20, 2008

by Gil Fink


Introducing Caching for Java Applications (Part 1)

Caching may address new challenges when developing high-performing applications.

July 17, 2008

by Slava Imeshev


GWT Basic Project Structure And Components

The core of every GWT project is the project layout and the basic components required: host pages, entry points, and modules. To begin a GWT project, you need to create the default layout and generate the initial files. The easiest way to do this is to use the provided ApplicationCreator tool.

Generating a project

ApplicationCreator is provided by GWT to create the default starting points and layout for a GWT project. ApplicationCreator, like the GWT shell, supports several command-line parameters, which are listed in table 1.

ApplicationCreator [-eclipse projectName] [-out dir] [-overwrite] [-ignore] className

Table 1 ApplicationCreator command-line parameters

Parameter | Description
-eclipse | Creates a debug launch configuration for the named eclipse project
-out | The directory to which output files will be written (defaults to the current directory)
-overwrite | Overwrites any existing files
-ignore | Ignores any existing files; does not overwrite
className | The fully qualified name of the application class to be created

To stub out an example calculator project, we’ll use ApplicationCreator based on a relative GWT_HOME path, and a className of com.manning.gwtip.calculator.client.Calculator, as follows:

mkdir [PROJECT_HOME]
cd [PROJECT_HOME]
[GWT_HOME]/applicationCreator com.manning.gwtip.calculator.client.Calculator

GWT_HOME

It is recommended that you establish GWT_HOME as an environment variable referring to the filesystem location where you have unpacked GWT. Additionally, you may want to add GWT_HOME to your PATH for further convenience. We use GWT_HOME when referencing the location where GWT is installed and PROJECT_HOME to refer to the location of the current project.

PATH SEPARATORS

For convenience, when referring to filesystem paths, we'll use forward slashes, which work for two-thirds of supported GWT platforms.
If you are using Windows, please adjust the path separators to use backward slashes. Running ApplicationCreator as described creates the default src directory structure and the starting-point GWT file resources. The standard directory structure Even though it's quite simple, the GWT layout is very important because the toolkit can operate in keeping with a Convention over Configuration design approach. As we’ll see, several parts of the GWT compilation process make assumptions about the default layout. Because of this, not everything has to be explicitly defined in every instance (which cuts down on the amount of configuration required). Taking a look at the output of the ApplicationCreator script execution, you will see a specific structure and related contents, as shown in listing 1. This represents the default configuration for a GWT project. Listing 1 ApplicationCreator output, showing the default GWT project structure: src src/com src/com/manning src/com/manning/gwtip src/com/manning/gwtip/calculator src/com/manning/gwtip/calculator/Calculator.gwt.xml src/com/manning/gwtip/calculator/client src/com/manning/gwtip/calculator/client/Calculator.java src/com/manning/gwtip/calculator/public src/com/manning/gwtip/calculator/public/Calculator.html Calculator-shell.sh Calculator-compile.sh The package name, com.manning.gwtip.calculator, is represented in the structure as a series of subdirectories in the src tree. This is the standard Java convention, and there are notably separate client and public subdirectories within. The client directory is intended for resources that will be compiled into JavaScript . Client items are translatable, or serializable, and will ultimately be downloaded to a client browser—these are Java resources in the source. The client package is known in GWT terminology as the source path. The public directory denotes files that will also be distributed to the client, but that do not require compilation and translation to JavaScript . 
This typically includes CSS, images, static HTML, and any other such assets that should not be translated, including existing JavaScript. The public package is known as the public path. Note that our client-side example does not use any server resources, but GWT does include the concept of a server path/package for server-side resources. Figure 1 illustrates this default GWT project layout.

[Figure 1: the default GWT project layout]

ApplicationCreator generates the structure and a required set of minimal files for a GWT project. The generated files include the XML configuration module definition, the entry point Java class, and the HTML host page. These are some of the basic GWT project concepts. Along with the module definition, entry point, and host page, some shortcut scripts have also been created for use with the GWTShell and GWTCompiler tools. These scripts run the shell and compiler for the project. Table 2 lists all of the files created by ApplicationCreator: the basic resources and shortcut scripts needed for a GWT project.

Table 2 ApplicationCreator-generated initial project files that serve as a starting point for GWT applications

File | Name | Purpose
GWT module file | ProjectName.gwt.xml | Defines the project configuration
Entry point class | ProjectName.java | Starting class invoked by the module
Host page | ProjectName.html | Initial HTML page that loads the module
GWTShell shortcut invoker script | ProjectName-shell.sh | Invokes GWTShell for the project
GWTCompiler shortcut invoker script | ProjectName-compile.sh | Invokes GWTCompiler for the project

The starting points ApplicationCreator provides essentially wire up all the moving parts for you and stub out your project. You take it from there and modify these generated files to begin building a GWT application. If the toolkit did not provide these files via ApplicationCreator, getting a project started, at least initially, would be much more time consuming and confusing.
Once you are experienced in the GWT ways, you may wind up using other tools to kick off a project: an IDE plugin, a Maven “archetype,” or your own scripts. ApplicationCreator, though, is the helpful default. The contents and structure that ApplicationCreator provides are themselves a working GWT “hello world” example. You get “hello world” for free, out of the box. "Hello world", however, is not that interesting. The connection of all the moving parts is what is really important: how a host page includes a module, how a module describes project resources, and how an entry point invokes project code. These concepts are applicable to all levels of GWT projects, the basic ones and beyond. Understanding these parts is key to gaining an overall understanding of GWT. Next, we’ll take a closer look at each of these concepts, beginning with the host page.

Host pages

A host page is the initial HTML page that invokes a GWT application. A host page contains a script tag that references a special GWT JavaScript file, Module.nocache.js. This JavaScript file, which the toolkit provides when you compile your project, kicks off the GWT application loading process. Along with the script reference that loads the project resources, you can also specify several GWT-related tags in the host page. These tag options are not present in the default host page created by ApplicationCreator, but it’s still important to be aware of them. The GWT tags that are supported in a host page are listed in table 3, as a reference.

Table 3 GWT tags supported in host pages

Meta tag | Syntax | Purpose
gwt:module | <meta name="gwt:module" content="module-name"> | (Legacy, pre GWT 1.4.) Specifies the module to be loaded
gwt:property | <meta name="gwt:property" content="name=value"> | Statically defines a deferred binding client property
gwt:onPropertyErrorFn | <meta name="gwt:onPropertyErrorFn" content="fnName"> | Specifies the name of a function to call if a client property is set to an invalid value (meaning that no matching compilation will be found)
gwt:onLoadErrorFn | <meta name="gwt:onLoadErrorFn" content="fnName"> | Specifies the name of a function to call if an exception happens during bootstrapping or if a module throws an exception out of onModuleLoad(); the function should take a message parameter

Thus, a host page includes a script reference that gets the GWT process started and refers to all the required project resources. The required resources for a project are assembled by the GWT compilation process, and are based on the module configuration.

Modules

GWT applications inhabit a challenging environment. This is partly because of the scope of responsibility GWT has elected to take on and partly because of the Internet landscape. Being a rich Internet-based platform and using only the basic built-in browser support for HTML, CSS, and JavaScript makes GWT quite elegant and impressive, but this combination is tough to achieve. Browsers that are “guided” by standards, but that don’t always stick to them, add to the pressure. Couple that environment with an approach that aims to bring static types, code standards, profiling and debugging, inheritance, and reuse to the web tier, and you have a tall order.

To help with this large task, GWT uses modules as configuration and execution units that handle discrete areas of responsibility. Modules enable the GWT compiler to optimize the Java code it gets fed, create variants for all possible situations from a single code base, and make inheritance and property support possible. One of the most important resources generated by ApplicationCreator is the Module.gwt.xml module descriptor for your project. This file exists in the top-level directory of your project’s package and provides a means to define resource locations and structure.
In a default generated module file, there are only two elements: <inherits> and <entry-point>. An <inherits> element simply includes the configuration for another named GWT module in the current definition, and <entry-point> defines a class that kicks things off and moves from configuration to code. Table 4 provides an overview of the most common GWT module descriptor elements.

Table 4 A summary of the most common elements supported by the GWT module descriptor

Module element: Description
<inherits>: Identifies additional GWT modules that should be inherited into the current module
<entry-point>: Specifies which EntryPoint class should be invoked when starting a GWT project
<source>: Identifies where the source code that should be translated into JavaScript by the GWT compiler is located
<public>: Identifies where assets that are not translatable source code, such as images and CSS files, are located

July 14, 2008

by Schalk Neethling

· 32,402 Views

Concurrency and HashMap

In theory everyone knows that HashMap is not thread safe and shouldn't be used in multithreaded applications. But people still come up with their own theories about why they can use HashMap in their context. Some say they are just reading the data and the map is not written to a lot. Unfortunately, none of these explanations holds up once you land in a synchronization issue. Most people do not understand the fundamentals of the Java Memory Model and concurrency. One cannot blame them for that: it is hard to visualize concurrent execution when, since college days, we have been used to the sequential execution of programs.

Enough of the blame game; let's look at some code. Have a look at the code below and come up with all your theories of what can go wrong with it, or theories that say it is all correct.

    public class MapTestTask implements Runnable {
        private Map hashMap;
        private Object value = new Object();

        public MapTestTask(Map map) {
            this.hashMap = map;
        }

        public void run() {
            hashMap.put(Thread.currentThread(), value);
            Object retrieved = hashMap.get(Thread.currentThread());
            if (retrieved == null) {
                // Can it ever happen?
            }
        }
    }

Now the question is: when we run multiple such threads against the same map, can we ever see retrieved as null? From a sequential point of view it can never happen. But when concurrency comes into the picture, it can. The source code at the end of this post can reproduce the scenario.

This is my sincere advice: do not over-engineer when concurrency is involved, because none of the theories will stand the test of time in a concurrent environment. As a rule of thumb, use thread-safe collections wherever concurrency is involved.
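As a minimal sketch of that rule of thumb, the same put-then-get task can be run against a java.util.concurrent.ConcurrentHashMap instead of a plain HashMap. The class name SafeMapDemo and the helper countMisses are illustrative names that do not appear in the original code, and the thread count is arbitrary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: SafeMapDemo and countMisses are hypothetical names.
class SafeMapDemo {

    // Starts numThreads threads that each put and then get their own key,
    // and returns how many of them read back null.
    static int countMisses(int numThreads) {
        final Map<Thread, Object> map = new ConcurrentHashMap<>();
        final Object value = new Object();
        final AtomicInteger misses = new AtomicInteger();

        Thread[] threads = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            threads[i] = new Thread(() -> {
                map.put(Thread.currentThread(), value);
                if (map.get(Thread.currentThread()) == null) {
                    misses.incrementAndGet();
                }
            });
        }
        for (Thread t : threads) {
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return misses.get();
    }

    public static void main(String[] args) {
        System.out.println("Null reads: " + countMisses(1000));
    }
}
```

Since no thread ever removes an entry, a thread that has completed its own put on a ConcurrentHashMap should always read its value back. Wrapping a HashMap with Collections.synchronizedMap is another option when a single lock is acceptable.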
Finally, I will leave you with this interesting bug in Java, which says that HashMap can get into an infinite loop when used in a multithreaded environment. It reinforces my point that not all possible scenarios can be visualized in a multithreaded environment, and therefore one should rely on the basics rather than on clever tricks that have a high probability of landing you in trouble at some point in the future.

Source Code for Reproducing

    package test;

    import java.util.Map;

    public class MapTestTask implements Runnable {
        private Map hashMap;
        private Object value = new Object();

        public MapTestTask(Map map) {
            this.hashMap = map;
        }

        public void run() {
            hashMap.put(Thread.currentThread(), value);
            Object retrieved = hashMap.get(Thread.currentThread());
            if (retrieved == null) {
                // Can it ever happen?
                System.out.println("Oh My God it can happen.");
            }
        }
    }

    package test;

    import java.util.Map;
    import java.util.HashMap;

    public class TestMap {
        public static void main(String[] args) throws InterruptedException {
            // One unsynchronized HashMap shared by all threads
            Map map = new HashMap();
            int NUM_THREADS = 1000;
            Thread[] threads = new Thread[NUM_THREADS];
            for (int i = 0; i < NUM_THREADS; i++) {
                threads[i] = new Thread(new MapTestTask(map));
            }
            for (int i = 0; i < NUM_THREADS; i++) {
                threads[i].start();
            }
            for (int i = 0; i < NUM_THREADS; i++) {
                threads[i].join();
            }
        }
    }

From http://pitfalls.wordpress.com/2008/06/29/concurrencyandhashmap/

June 30, 2008

by Pavitar Singh

· 37,805 Views

Glimmer - Using Ruby to Build SWT User Interfaces

Glimmer is a JRuby DSL that enables easy and efficient authoring of user-interfaces using the robust platform-independent Eclipse SWT library. Glimmer comes with built-in data-binding support to greatly facilitate synchronizing UI with domain models. The goal of the Glimmer project is to create a JRuby framework on top of Eclipse technologies to enable easy and efficient authoring of desktop applications by taking advantage of the Ruby language. With Glimmer having just become an Eclipse project, it's a good time to find out more. Philosophy Glimmer's design philosophy can be summarized as follows: Concise and DRY Asks for minimum info needed to accomplish task Convention over configuration As predictable as possible for existing SWT developers Conventions Since Glimmer relies on Ruby, it is different in its syntax and conventions from what typical Java SWT developers would expect: Method parentheses are optional Java-vs-Ruby example: show() => show Method names follow underscored syntax Java-vs-Ruby example: addListener => add_listener Classes are constructed using the new(...) method (as opposed to new keyword): Java-vs-Ruby example: new GridLayout() => GridLayout.new Download Please download Glimmer from RubyForge: https://rubyforge.org/projects/glimmer/ NOTE: Glimmer is moving to Eclipse.org. Please visit http://andymaleh.blogspot.com for up-to-date news on the move and the upcoming download location on the Eclipse website. Installation Extract the Glimmer zip file and follow the installation instructions in the README file. NOTE: While Glimmer is platform-independent, its functionality has only been verified on Windows. Feedback from Mac and Linux users would be greatly appreciated. 
Tutorial

Let's start with a very simple Glimmer Hello World example:

    shell {
      label {
        text "Hello World!"
      }
    }

This will render the following: [Image: Hello World window]

In the SWT library a shell represents an application's window. It acts as a frame around the application widgets, which are visual components that display information and/or enable interaction with the user. One widget that was used in the Hello World example is the label widget, which simply displays text on the screen. Shell is also considered a widget, except it is a special kind of widget called a composite.

The shell keyword, which declared the application's shell, was followed by a block of code encased in curly braces. This block contains the shell content declarations, such as the Hello World label. The label keyword was also followed by a block of code. However, this block contained a property declaration for the label, stating that the text value is "Hello World!"

So, to declare a widget, simply state its name followed by a block of code. The block may specify property values or nest other widget declarations for composite widgets.
Now, let's move on to a more advanced example: shell { text "User Profile" composite { layout GridLayout.new(2, false) group { text "Name" layout GridLayout.new(2, false) layout_data GridData.new(fill, fill, true, true) label {text "First"}; text {text "Bullet"} label {text "Last"}; text {text "Tooth"} } group { layout_data GridData.new(fill, fill, true, true) text "Gender" button(radio) {text "Male"; selection true} button(radio) {text "Female"} } group { layout_data GridData.new(fill, fill, true, true) text "Role" button(check) {text "Student"; selection true} button(check) {text "Employee"; selection true} } group { text "Experience" layout RowLayout.new layout_data GridData.new(fill, fill, true, true) spinner {selection 5}; label {text "years"} } button { text "save" layout_data GridData.new(right, center, true, true) } button { text "close" layout_data GridData.new(left, center, true, true) } } }.open This will render the following: [img_assist|nid=3587|title=|desc=|link=none|align=undefined|width=195|height=209] The example contains a variation of widgets from SWT: Composite: a widget that can simply contain other widgets and manage their layout Group: Similar to Composite except that it usually has a border and a title. 
Text Field: Enables the user to type in text information
Checkbox Button: Allows the user to make a selection from different options
Radio Button: Allows the user to make a selection between options that are mutually exclusive
Spinner: Enables the user to type in numeric information or spin a number selection with the mouse
Push Button: Enables the user to initiate actions

Given that Glimmer relies on the Eclipse SWT library, developers may consult the SWT API as a reference on all the widgets, including their properties and layout options: http://help.eclipse.org/stable/nftopic/org.eclipse.platform.doc.isv/reference/api/index.html

Keep in mind the following rules when reading the SWT API:

Any widget available in SWT, including custom widgets written by developers, can be accessed from Glimmer by downcasing/underscoring the widget's name (e.g. Composite -> composite, LabeledText -> labeled_text)
Properties available on SWT widgets are specified by listing them followed by their values, each on a line or separated by semicolons within the widget's block (e.g. label {text "Username:"; font some_font})
Property names are also downcased/underscored in Glimmer.

SWT widgets must have a style value specified, which is a constant available on the "SWT" class. Glimmer generally hides that by relying on smart defaults. Here is a listing of the defaults configured in Glimmer:

text: SWT::BORDER
table: SWT::BORDER
spinner: SWT::BORDER
button: SWT::PUSH

Nonetheless, to customize a widget, a style value may optionally be specified within parentheses after the widget name. For example, "button(SWT::RADIO)" renders a radio button and "button(SWT::CHECK)" renders a checkbox button. Glimmer also has syntactic sugar for specifying the style: simply state the name of the style in the standard Ruby downcased/underscored format without the "SWT::" prefix. For example, button(SWT::RADIO) becomes button(radio).
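For readers coming from Java, the downcasing/underscoring rule for widget, property, and method names can be sketched as a small string transformation. This is only an illustration of the naming convention, not Glimmer's own code: the class GlimmerNames and the method underscore are hypothetical, and names containing runs of capitals (acronyms) are not addressed here.

```java
import java.util.Locale;

// Illustrative only: GlimmerNames is a hypothetical helper, not part of Glimmer.
class GlimmerNames {

    // Inserts an underscore at each lower-to-upper case boundary,
    // then lowercases the whole name.
    static String underscore(String javaName) {
        return javaName
                .replaceAll("([a-z0-9])([A-Z])", "$1_$2")
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(underscore("Composite"));    // composite
        System.out.println(underscore("LabeledText"));  // labeled_text
        System.out.println(underscore("addListener"));  // add_listener
    }
}
```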
SWT composite widgets, such as shell, composite, and group can have a layout manager that lays out child widgets according to a certain pattern without the need to specify the (x, y) position of each child widget explicitly. Layout managers come in many flavors, such as GridLayout, offering a grid-like layout; FillLayout, allowing child widgets to fill the whole available area; and RowLayout, rendering child widgets one after the other in a row by default. Glimmer is configured with smart defaults for layout managers too: shell: FillLayout composite: GridLayout with one column group: GridLayout with one column GridLayout is a particularly useful SWT layout, so I will go over it in a little more detail here. GridLayout allows you to lay widgets out in a grid similar to HTML tables. To instantiate a custom GridLayout, you must specify the number of columns and whether they are of equal width or not. Here is a block of code demonstrating a group box having a GridLayout with 2 columns of unequal width: group { layout GridLayout.new(2, false) } Now, suppose we add four elements to that group box: group { layout GridLayout.new(2, false) label {text "First"}; text {text "Bullet"} label {text "Last"}; text {text "Tooth"} } The specified GridLayout will lay out the child widgets in the grid from left to right and top to bottom: The label with the text “First” will go into the 1st column of the 1st row. The text box with the text “Bullet” will go into the 2nd column of the 1st row. The label with the text “Last” will go into the 1st column of the 2nd row. The text box with the text “Tooth” will go into the 2nd column of the 2nd row. The group was actually a part of the advanced example illustrated earlier. It was given a title (by specifying the text attribute,) and the widget declarations were written in a way that maps visually to how they appear on the screen. 
Notice how text box declarations are on the same line as the label declarations since both the label and text box go under the same row, which helps improve code readability and maintainability: group { text "Name" layout GridLayout.new(2, false) layout_data GridData.new(fill, fill, true, true) label {text "First"}; text {text "Bullet"} label {text "Last"}; text {text "Tooth"} } That renders the following: [img_assist|nid=3588|title=|desc=|link=none|align=undefined|width=89|height=73] Layout of specific widgets may be further customized by specifying layout data. For GridLayout, layout data is specified through GridData objects. For example, we may decide to have the text boxes in the previous example have a greater width: group { layout GridLayout.new(2, false) label {text "First"}; text { text "Bullet" layout_data GridData.new(100, default) } label {text "Last"}; text { text "Tooth" layout_data GridData.new(100, default) } } This renders the following: [img_assist|nid=3589|title=|desc=|link=none|align=undefined|width=125|height=73] The used GridData constructor takes two parameters: width hint and height hint. The width was set to 100 pixels for both text boxes. The height was kept at the default value (SWT::DEFAULT) For more details about GridLayout, GridData, and other layout managers, please refer to the SWT API documentation. So far we have covered how to construct user-interfaces that can display data and gather input from the user. Next, we will demonstrate how to perform work based on actions taken by the user. SWT widgets can be monitored for certain user-interface events, such as mouse clicks, focus gain and loss, and key presses. With the original SWT API, events can be monitored by adding listeners to widgets. For example, to monitor the push of a button, you would add a SelectionListener that does some work in its widgetSelected event method. 
With Glimmer, events can be monitored by declaring their name (following Ruby conventions) prefixed by "on". Here is an example of how to monitor button selection:

    import org.eclipse.swt.widgets.MessageBox

    @shell1 = shell {
      composite {
        button {
          text 'Save'
          on_widget_selected {
            message_box = MessageBox.new(@shell1.widget, SWT::NULL)
            message_box.text = 'Information'
            message_box.message = 'Saved!'
            message_box.open
          }
        }
      }
    }
    @shell1.open

This renders the following: [Image: window with a Save button and the "Saved!" message box]

On click of the button, a message box is opened to let the user know that the information entered is saved. MessageBox is a class from SWT that represents message dialogs. It was imported using the JRuby import method. Its constructor takes a parent and a style. To obtain the parent, we assigned the shell object to a Ruby instance variable, @shell1. Since Glimmer wraps all SWT constructed objects with Glimmer decorators (e.g. Shell is wrapped with RShell), the widget method was called (e.g. @shell1.widget) to obtain the underlying SWT Shell object and pass it as the parent to the MessageBox constructor.

In the original SWT API, MessageBox has setter methods to set its text and message attributes. However, in JRuby the developer has the option to set them following the Ruby attribute conventions (e.g. message_box.text = 'value') because JRuby automatically enhances all Java objects with methods that follow the Ruby convention.

Another example that benefits from event monitoring is field validation on loss of focus. For example, let's say we are validating the ZIP code on an address form, and we would like to display an error message if its value does not have a valid ZIP code format (e.g.
12345 or 12345-1234), here is how we would do it with Glimmer (please add the following code before the button in the previous example):

    import org.eclipse.swt.widgets.MessageBox

    @shell1 = shell {
      composite {
        label { text "ZIP Code" }
        text {
          on_focus_lost { |focus_event|
            zip_code = focus_event.widget.text
            unless zip_code =~ /^\d{5}([-]\d{4})?$/
              message_box = MessageBox.new(@shell1.widget, SWT::NULL)
              message_box.text = 'Validation Error'
              message_box.message = 'Format must match ##### or #####-####'
              message_box.open
              focus_event.widget.set_focus
            end
          }
        }
        button {
          text 'Save'
          on_widget_selected {
            message_box = MessageBox.new(@shell1.widget, SWT::NULL)
            message_box.message = 'Saved!'
            message_box.open
          }
        }
      }
    }
    @shell1.open

Here is what it produces: [Image: ZIP code form showing the validation error message box]

Notice how the on_focus_lost block has a FocusEvent object as a parameter. This parameter may be specified optionally whenever some information is needed from the event object. Again, this maps to the focusLost method on the FocusListener interface in the original SWT API, which also takes a FocusEvent object as a parameter. While widgets in the original SWT API have a setFocus method to grab the user interface focus, in JRuby set_focus may be used instead, following the Ruby naming conventions.

Now, in order to cleanly separate event-driven behavior from user-interface code, we can rely on Glimmer's data-binding support. Stay tuned for the next tutorial, which will cover data-binding and how to achieve clean code separation with the Model-View-Presenter pattern.

References:
Glimmer Eclipse Technology Project Proposal: http://www.eclipse.org/proposals/glimmer/
Glimmer Newsgroup: http://www.eclipse.org/newsportal/thread.php?group=eclipse.technology.glimmer
Glimmer at RubyForge: http://rubyforge.org/projects/glimmer/
Author Blog: http://andymaleh.blogspot.com

Andy Maleh (andy at obtiva.com), Senior Consultant, Obtiva Corp.

June 19, 2008

by Andy Maleh

· 60,469 Views

Running the Table With JMesa

Shhhh. I’ll tell you a secret. I don’t like tables. I know. Shocking, isn't it? Don't get me wrong: I don't dislike tables per se. They're great for displaying tabular material. (For page organization, not so much.) But I so dislike the code needed to build a table within a JSP. It usually comes down to something like this: User IDNameEmail${row.userID}${row.name}${row.email} All that iterative logic simply looks incomprehensible to me. It's still better than scriptlets or custom tag libraries (both of which were, to be sure, phenomenal in their time), but it's an undigestible mass, and even if I do step through it line by line and understand what it does, I'm still left with just a table. Users accustomed to active, Javascript-assisted widgets don't respond to tables that just lie there. Many more lines of code will be needed to enable them to do useful things like paginating through long lists of items, sorting by column values, and the like. It'll be an unholy mix of HTML, JSP directives, JSP tags, EL, Javascript, Java, XML, properties files, and so forth. The whole thing seems so error-prone (note to self: more code + more languages = more "opportunities" for bugs). But recently I discovered an open-source Java library called JMesa that provides another way. I'm going to share with you some of the things I've found in JMesa, building up an HTML page containing a table from nothing to, well, considerably more than nothing. There's a good deal of code here, to give you a sense of the JMesa API; hopefully. you'll come away with some ideas about how you can use JMesa in your own projects. I won't bother with package declarations, imports, or code not relevant to the point at hand; the complete code is available for download in the form of an Eclipse project. Installation instructions will be found at the end of this article. Join me in exploring JMesa! 
Preparation A Page to Show Before we can get to JMesa, though, we'll need a few things: a page within which to display our table, for instance. In fact, we'll learn even more if we put this page in a context. I have recently fallen in like with Spring MVC and so will use that to build a simple site with a few pages. Just to be clear, while Spring dependency injection and utilities are woven into the code below, JMesa does not depend upon Spring. The pages are not fancy, and I am going to skip most of the setup. Everything is included in the download, of course. One thing I shouldn't skip is the controller for the search results page, the page within which we will build our table. We'll start with pretty much the simplest functionality we can: public class SimpleSearchController extends AbstractController { @Override protected ModelAndView handleRequestInternal(HttpServletRequest request, HttpServletResponse response) throws Exception { return new ModelAndView("simple-results", "results", "Here we will display search results"); } } For those not familiar with Spring MVC, the ModelAndView return value contains a string that will be resolved to a view (in this project, it is resolved to "/WEB-INF/jsp/simple-results.jsp"), and a key-value pair (the second and third constructor arguments) that can be accessed using EL on the JSP page: ${results} Finally, we use the Spring jmesa-servlet.xml configuration file to create and associate a URL with our controller: welcomeController simpleSearchController Clicking on the "Search" link in the menu now produces: [img_assist|nid=3678|title=Figure 1.|desc=A simple page for our table|link=none|align=left|width=757|height=240] All right, not much. But it's the page we need. Something to Display Another thing we need before we can build a table is something to show in it. 
This "domain" object should be pretty easy to display: public class HelloWorld implements Comparable { private int pk; private String hello = "Hello"; private String world = "world"; private String from = "from"; private String firstName; private String lastName; private String format = "{0}, {1}! {2} {3} {4}"; // ... accessors and mutators public String toString() { return MessageFormat.format(getFormat(), hello, world, from, getFirstName(), getLastName()); } // ... implementations of equals, hashCode, and compareTo } Persistence Service Of course, we need instances of this domain object. Normally, we'd get them from a persistence service; for now, we'll just create them in memory: public class HelloWorldService { private int nextId; private Set helloWorlds = new TreeSet(); public HelloWorldService() { nextId = 1; helloWorlds.add(newInstance("Albert", "Einstein")); helloWorlds.add(newInstance("Grazia", "Deledda")); helloWorlds.add(newInstance("Francis", "Crick")); helloWorlds.add(newInstance("Linus", "Pauling")); helloWorlds.add(newInstance("Theodore", "Roosevelt")); helloWorlds.add(newInstance("Hideki", "Yukawa")); helloWorlds.add(newInstance("Harold", "Urey")); helloWorlds.add(newInstance("Barbara", "McClintock")); helloWorlds.add(newInstance("Hermann", "Hesse")); helloWorlds.add(newInstance("Mikhail", "Gorbachev")); helloWorlds.add(newInstance("Amartya", "Sen")); helloWorlds.add(newInstance("Albert", "Gore")); helloWorlds.add(newInstance("Amnesty", "International")); helloWorlds.add(newInstance("Daniel", "Bovet")); helloWorlds.add(newInstance("William", "Faulkner")); helloWorlds.add(newInstance("Otto", "Diels")); helloWorlds.add(newInstance("Marie", "Curie")); } public Set findAll() { return helloWorlds; } private HelloWorld newInstance(String firstName, String lastName) { HelloWorld hw = new HelloWorld(); hw.setPk(nextId++); hw.setFirstName(firstName); hw.setLastName(lastName); return hw; } } That's that. Now we're ready to focus on JMesa. 
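To make the elided parts concrete, here is a trimmed-down sketch of such a domain object. The class name HelloWorldSketch and, in particular, the compareTo ordering (last name, then first name, then pk) are assumptions made for illustration; the article does not show the real implementations. A TreeSet relies on compareTo both for ordering and for deciding whether two items are duplicates, so whatever ordering is chosen should be consistent with equals.

```java
import java.text.MessageFormat;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: the compareTo ordering below is an assumption, not the article's code.
class HelloWorldSketch implements Comparable<HelloWorldSketch> {
    private final int pk;
    private final String firstName;
    private final String lastName;

    HelloWorldSketch(int pk, String firstName, String lastName) {
        this.pk = pk;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public String toString() {
        // Same default pattern as the article's format property
        return MessageFormat.format("{0}, {1}! {2} {3} {4}",
                "Hello", "world", "from", firstName, lastName);
    }

    @Override
    public int compareTo(HelloWorldSketch other) {
        int c = lastName.compareTo(other.lastName);
        if (c == 0) c = firstName.compareTo(other.firstName);
        if (c == 0) c = Integer.compare(pk, other.pk);
        return c;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof HelloWorldSketch && compareTo((HelloWorldSketch) o) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * pk + firstName.hashCode()) + lastName.hashCode();
    }

    public static void main(String[] args) {
        Set<HelloWorldSketch> items = new TreeSet<>();
        items.add(new HelloWorldSketch(1, "Albert", "Einstein"));
        items.add(new HelloWorldSketch(2, "Marie", "Curie"));
        for (HelloWorldSketch item : items) {
            System.out.println(item); // iterates in compareTo order
        }
    }
}
```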
JMesa Let's start with something extremely simple. On the very first page of the JMesa web site we find four lines of code that we can appropriate and refashion for a Spring controller: public class BasicJMesaSearchController extends AbstractController { private HelloWorldService helloWorldService; public void setHelloWorldService(HelloWorldService helloWorldService) { this.helloWorldService = helloWorldService; } @Override protected ModelAndView handleRequestInternal(HttpServletRequest request, HttpServletResponse response) throws Exception { Set results = helloWorldService.findAll(); TableFacade tableFacade = new TableFacadeImpl("results",request); tableFacade.setItems(results); tableFacade.setColumnProperties("pk", "firstName", "lastName", "format"); return new ModelAndView("results", "results", tableFacade.render()); } } We let Spring inject the HelloWorldService, which we use to retrieve a set of items to display. Then we create and configure the JMesa TableFacade class. This class takes an HTTP request in its constructor: TableFacade is going to send itself messages passed as parameters in the request (more on this in a moment). We supply it with the set of items and with which JavaBean property of those items we want displayed in each column. We'll also need a bit of new code in the search results page (in the project, this is actually a different search results page, as you, oh sharp-eyed reader, have already noticed): ${results} And we'll need to create and point to the new controller in jmesa-servlet.xml: ... basicSearchController Redeploy, and the results look like magic. How did we get them? [img_assist|nid=3679|title=Figure 2.|desc=Using JMesa "out of the box"|link=none|align=left|width=757|height=405] The key is in the variable results, which now holds the entire text of the table generated by the JMesa TableFacade when we called its render method. 
We also put a self-submitting HTML form around the JMesa table that it will use to send itself messages about how to alter itself. This makes possible many amazing features. The table automagically paginates itself. It allows the user to change the number of rows displayed. It allows sorting on any column or combination of columns. It provides color striping of table rows and onMouseOver row highlighting. And every bit of this came for free: we did nothing to enable it but what you have already seen. (OK, we played around with some of JMesa's images and CSS style sheets to make it fit in with our color scheme, but that really shouldn't count.) To demonstrate, we'll use the select at the top of the form to change the number of rows displayed to 16, sort by first name ascending and last name descending (by clicking on the first column header once and the second twice), and mouse over the third row to see the highlighting: [img_assist|nid=3681|title=Figure 3.|desc=JMesa search results sorted and highlighted|link=none|align=left|width=757|height=565] Now Al Gore and Einstein appear in the order we asked for. You will have noticed the images in the table toolbar. Those on the left are standard first, previous, next, and last navigation icons. The select we've already mentioned. But there are two other images as well: these turn filtering, another amazing feature of JMesa that is active by default, on and off. Filtering allows the user to apply expressions to a column in order to display only rows having matching values in that column. While filtering can take setup beyond the scope of this article, even by default it's astonishing. Try typing "Einstein" in the text field that appears above the last-name column header and clicking on the filter icon (the magnifying glass). The results show only the row containing Einstein's name in the last name column. And we didn't have to do a thing! 
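The two-column sort described above (first name ascending, then last name descending) can be illustrated with a plain Java comparator. This is not how JMesa implements sorting internally; it is only a sketch of the resulting order, and the class name TwoColumnSort and the use of String arrays for rows are choices made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch; TwoColumnSort is a hypothetical name.
class TwoColumnSort {

    // Each row is {firstName, lastName}. Returns a sorted copy.
    static List<String[]> sort(List<String[]> rows) {
        Comparator<String[]> order = (a, b) -> {
            int c = a[0].compareTo(b[0]);              // first name ascending
            return c != 0 ? c : b[1].compareTo(a[1]);  // last name descending
        };
        List<String[]> copy = new ArrayList<>(rows);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Amartya", "Sen"},
                new String[] {"Albert", "Einstein"},
                new String[] {"Albert", "Gore"});
        for (String[] row : sort(rows)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

With these three rows, the two Alberts come first and Gore precedes Einstein, because the last-name comparison is reversed.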
[Figure 4. JMesa search results filtered]

See the JMesa web site for details about filtering, editable tables that keep track of your changes for you, and much, much more: it's impressive stuff.

Customizing

And now, to business. The JMesa default is astounding, but no default is ever exactly what you want. The ability to customize is critical. Also, defaults rarely exercise every feature, and this one is no exception. Let's start with some requirements:

- We will display the value of each HelloWorld item's toString method in an additional column
- We will display more user-friendly values in the format column
- We will ensure that columns that cannot be reasonably sorted are made unsortable
- We will add columns containing links to edit and delete pages for the HelloWorld items
- We will display images in the edit and delete columns
- We will not display the Pk property of each item, but will pass its value to edit and delete pages as needed
- We will enable the user to retrieve a comma-separated-values (CSV) copy of the table contents
- We will enable the user to retrieve an Excel spreadsheet copy of the table contents
- We will disable filtering and highlighting
- We will reorganize the toolbar items in a different order

Believe it or not, implementing each of these features will be quite easy, and you'll begin to get a sense of the possibilities of JMesa.

ToString Column

Each HelloWorld item produces a formatted string within its toString method. This is not a JavaBean property method, so we cannot directly point the TableFacade at it. We want this value to be rendered (to use JMesa terminology) as the contents of a <td> (a cell) in each HTML row. Cell contents are produced by implementations of the CellEditor interface. Its getValue method is passed the item to be displayed, the property to be called, and the current row count.
Since only the item itself is actually needed for our purpose, the implementation is simple: public class ToStringCellEditor implements CellEditor { @Override public Object getValue(Object item, String property, int rowcount) { if (item == null) { return ""; } return item.toString(); } } Of course, we'll need a column into which to put the results. All we need do is add an arbitrary value to the column properties list: tableFacade.setColumnProperties("firstName", "lastName", "format", "toString"); This value is used to retrieve the column: Row row = tableFacade.getTable().getRow(); Column column = row.getColumn("toString"); column.getCellRenderer().setCellEditor(new ToStringCellEditor()); Of course, this means that the getValue method of the ToStringCellEditor will always be passed a bogus property value, but since the editor doesn't use it, that's no problem. (Note that we've also left off the pk column as per requirements.) User-Friendly Format Column We continue by introducing a more user-friendly value into the format column. The format string "{0}, {1}! {2} {3} {4}" looks ugly and most likely won't be understood by an end user. The only real information it conveys is that it is the default value. We'll use a Spring MessageSource to supply something a little easier on the eyes at runtime. First, we'll add a property to the messages.properties file loaded by Spring at application startup: format.{0},\ {1}!\ {2}\ {3}\ {4}=Default format (The backslashes are needed to escape the white space in the key.) As we have already seen, a CellEditor is needed to change a cell's displayed value. 
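The effect of the backslashes in the property key shown above can be checked with plain java.util.Properties, under the assumption that the messages file is parsed with the standard properties-file rules. The class name EscapedKeyDemo and the helper load are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative sketch; EscapedKeyDemo is a hypothetical name.
class EscapedKeyDemo {

    // Parses properties-file text using the standard java.util.Properties rules.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // In Java source, "\\ " produces the backslash-space found in the file.
        Properties escaped = load("format.{0},\\ {1}!\\ {2}\\ {3}\\ {4}=Default format\n");
        System.out.println(escaped.getProperty("format.{0}, {1}! {2} {3} {4}"));

        // Without the backslashes, the key ends at the first space.
        Properties unescaped = load("format.{0}, {1}! {2} {3} {4}=Default format\n");
        System.out.println(unescaped.getProperty("format.{0},"));
    }
}
```

With the escapes, the whole format string becomes the key. Without them, the key is cut off at the first space and the rest of the line, including the equals sign, ends up in the value.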
Using MessageSource to produce the display value at runtime requires a few more lines than the ToStringCellEditor:

    public class SpringMessageCellEditor implements CellEditor {
        MessageSource source;
        String prefix;
        Locale locale;

        public SpringMessageCellEditor(MessageSource source, String prefix, Locale locale) {
            this.source = source;
            this.prefix = prefix;
            this.locale = locale;
        }

        public Object getValue(Object item, String property, int rowcount) {
            if (item != null) {
                try {
                    // Look up "<prefix>.<property value>" in the message source
                    return source.getMessage(prefix + "." + PropertyUtils.getProperty(item, property), null, locale);
                } catch (IllegalAccessException ignore) {
                } catch (InvocationTargetException ignore) {
                } catch (NoSuchMethodException ignore) {
                }
            }
            return null;
        }
    }

We still have to add this editor to the column displaying the format property:

    Column column = row.getColumn("format");
    column.getCellRenderer().setCellEditor(new SpringMessageCellEditor(messageSource, "format", locale));

Unsortable Columns

Next, we want the table to know that some columns are unsortable. Columns are typically sorted by property value, but we just added a column that corresponds to no property and that displays the output of the toString method. If the user clicked on the header of that column, he or she would wind up with a very ugly NullPointerException message. Making a column unsortable is very simple (actually, we need to have an HtmlColumn, but most columns qualify):

    htmlColumn.setSortable(false);

With this, no onClick handler will be generated for the column header, preventing users from accidentally causing a mess.

Edit and Delete Columns

Now we'll add columns containing links to edit and delete pages for HelloWorld items. I prefer using icons to buttons saying "Edit" and "Delete", as it reduces the amount of textual information the user must process. Tables typically present a lot of information in a compact space, making user overload a problem worthy of attention.
To do this, we'll need a CellEditor (by now, you knew that was coming!). Since this is functionality I use a lot, let's design it for reuse, refactoring out reusable code into one class, and code tailored to this project into another. ImageCellEditor encapsulates the general process of setting up an image with a link, and includes a method that will let subclasses override the default processing of the link: public class ImageCellEditor extends AbstractContextSupport implements CellEditor { private String image; private String alt; private String link; public ImageCellEditor(String image, String alt, String link) { this.image = image; this.alt = alt; this.link = link; } public Object getValue(Object item, String property, int rowcount) { CoreContext context = getCoreContext(); String imagePath = context.getPreference("html.imagesPath"); StringBuilder img = new StringBuilder(); if (link != null && link.trim().length() != 0) { img.append(""); } img.append(""); if (link != null && link.trim().length() != 0) { img.append(""); } return img.toString(); } /** * This method can be overridden by subclasses to handle specific * HTML link needs. */ public String processLink(Object item, String property, int rowcount, String link) { return link; } } This is our opportunity to introduce CoreContext and WebContext, two important classes that plug our code into the JMesa infrastructure. Extending AbstractContextSupport gets us JavaBean property methods for these objects (just a convenience; I could have implemented the interface ContextSupport, but then I would have had to write the property methods myself). The CoreContext has many uses; our immediate purpose for it is to retrieve a value configured in the jmesa.properties file. 
This was pointed to in web.xml: jmesaPreferencesLocation WEB-INF/jmesa.properties It contains a preference called "html.imagesPath" that replaces the default path from which JMesa retrieves images: html.imagesPath=/images/ This means we won't have to hard-code a part of the image URL. (There are a lot more configurable preferences: for details, see the JMesa web site.) The WebContext provides us with the servlet context path, again letting us avoid hard-coding the image URL: getWebContext().getContextPath() Getting back to the two image columns, we have a requirement to pass the Pk property of the appropriate HelloWorld to the edit or delete pages when the images are clicked. Adding this property to the link is easy, using the MessageFormat class to process the link argument of the application-specific subclass: public class HelloWorldImageCellEditor extends ImageCellEditor { public String processLink(Object item, String property, int rowcount, String link) { return MessageFormat.format(link, ((HelloWorld) item).getPk()); } } After creating the editor, we can retrieve the context objects for it from the TableFacade: ImageCellEditor editor = new HelloWorldImageCellEditor("edit.gif", messageSource.getMessage("image.edit.alt", null, locale), "edit.html?pk={0,number,integer}"); editor.setWebContext(tableFacade.getWebContext()); editor.setCoreContext(tableFacade.getCoreContext()); Now we have the images and the links. But it would be awfully nice if the images could be centered within the column, something notoriously difficult to achieve with CSS style sheets. What would work would be to use the align and valign attributes of the cell. How can we do that? The cell itself, as opposed to its contents, is rendered by the interface CellRenderer. Unfortunately, the HtmlCellRenderer sub-interface that comes with JMesa has no method for adding attributes. The Decorator and Template patterns, however, come to the rescue. 
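Before turning to the renderer, one detail of the link pattern used above deserves a quick check. The sketch below is not from the article's project: LinkPatternDemo and its link method are hypothetical names, and the locale is pinned to Locale.US so the output does not depend on the machine. It suggests that the integer number style applies grouping separators, so keys of 1000 or more may not come out as plain digits; the `#` pattern is one possible alternative if that matters for your data.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo class; the patterns mirror the article's link argument.
class LinkPatternDemo {

    // Formats a link pattern with a single primary-key argument in a fixed locale.
    static String link(String pattern, long pk) {
        return new MessageFormat(pattern, Locale.US).format(new Object[] { pk });
    }

    public static void main(String[] args) {
        // Small keys look the same either way.
        System.out.println(link("edit.html?pk={0,number,integer}", 42L));
        // The integer style groups digits for larger keys.
        System.out.println(link("edit.html?pk={0,number,integer}", 1234L));
        // An explicit "#" pattern keeps the digits ungrouped.
        System.out.println(link("edit.html?pk={0,number,#}", 1234L));
    }
}
```

Note that the static MessageFormat.format call in the article uses the JVM's default locale rather than a fixed one, so the exact separator would vary by environment.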
Again, we implement the functionality for reuse as two classes, the first a generic decorator with an additional template method: public abstract class AttributedHtmlCellRendererDecorator implements HtmlCellRenderer { // all other methods will be delegated to this renderer protected HtmlCellRenderer renderer; public AttributedHtmlCellRendererDecorator(HtmlCellRenderer renderer) { this.renderer = renderer; } public Object render(Object item, int rowcount) { HtmlBuilder html = new HtmlBuilder(); html.td(2); html.width(getColumn().getWidth()); addAttributes(html); html.style(getStyle()); html.styleClass(getStyleClass()); html.close(); String property = getColumn().getProperty(); Object value = getCellEditor().getValue(item, property, rowcount); if (value != null) { html.append(value.toString()); } html.tdEnd(); return html.toString(); } /** * Subclasses will add attributes. */ public abstract void addAttributes(HtmlBuilder html); } The second will be a subclass that adds the specific attributes we need: public class AlignedHtmlCellRendererDecorator extends AttributedHtmlCellRendererDecorator { private String align; private String valign; public AlignedHtmlCellRendererDecorator(HtmlCellRenderer renderer, String align, String valign) { super(renderer); this.align = align; this.valign = valign; } @Override public void addAttributes(HtmlBuilder html) { html.align(align); html.valign(valign); } } Whew, that was a mouthful! However, our images will come out nicely centered in the column, and we've learned a good deal more about how the JMesa API works. There will be edit and delete pages to link to, of course, but these are not of interest here and are completely trivial in the Eclipse project. CSV and Excel Output In JMesa terminology, output other than HTML is called exporting the table. As complex as it might seem, it's actually the easiest part of the process. 
Again, a single line of code will do all we need: tableFacade.setExportTypes(response, org.jmesa.limit.ExportType.CSV, org.jmesa.limit.ExportType.EXCEL); That's really all there is to it! (OK, you have to include some JAR files in the library, but what did you expect, magic?) Filtering and Highlighting Making a row (we need an HtmlRow) unfilterable and unhighlighted is just as simple as making a column unsortable: htmlRow.setFilterable(false); htmlRow.setHighlighter(false); With this, no filtering row or icons will be generated above the column header and the highlighting feature will be turned off. Toolbar The code to reorganize the toolbar is quite straightforward; while we're at it, we need to include icons for the various output formats: public class ReorderedToolbar extends AbstractToolbar { @Override public String render() { if (ViewUtils.isExportable(getExportTypes())) { addExportToolbarItems(getExportTypes()); addToolbarItem(ToolbarItemType.SEPARATOR); } MaxRowsItem maxRowsItem = (MaxRowsItem) addToolbarItem(ToolbarItemType.MAX_ROWS_ITEM); if (getMaxRowsIncrements() != null) { maxRowsItem.setIncrements(getMaxRowsIncrements()); } addToolbarItem(ToolbarItemType.SEPARATOR); addToolbarItem(ToolbarItemType.FIRST_PAGE_ITEM); addToolbarItem(ToolbarItemType.PREV_PAGE_ITEM); addToolbarItem(ToolbarItemType.NEXT_PAGE_ITEM); addToolbarItem(ToolbarItemType.LAST_PAGE_ITEM); return super.render(); } } I arranged the icons by simply specifying the order in which they are added to the toolbar. They look more natural to me this way; your mileage may vary. Note that we delegate the messy work of actually rendering the toolbar to the JMesa superclass. 
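The ordering idea can be seen in miniature without any JMesa classes. The following is only a sketch under stated assumptions: SketchToolbar and ExportFirstToolbar are hypothetical stand-ins, not JMesa's AbstractToolbar, and a fresh toolbar instance is assumed for each render.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a toolbar base class; not JMesa's AbstractToolbar.
abstract class SketchToolbar {
    private final List<String> items = new ArrayList<>();

    // Subclasses call this in the order they want the items to appear.
    protected void addItem(String name) {
        items.add(name);
    }

    // Base rendering: joins whatever has been added, left to right.
    public String render() {
        return String.join(" | ", items);
    }
}

// Decides only the order, then hands the rendering back to the superclass.
class ExportFirstToolbar extends SketchToolbar {
    @Override
    public String render() {
        addItem("export");
        addItem("max-rows");
        addItem("first");
        addItem("prev");
        addItem("next");
        addItem("last");
        return super.render();
    }
}

class ToolbarDemo {
    public static void main(String[] args) {
        System.out.println(new ExportFirstToolbar().render());
    }
}
```

The subclass carries no rendering logic of its own, which is what keeps a reordering like the one above down to a handful of lines.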
Putting It All Together We'll refactor out reusable code once more in writing a Factory to encapsulate building our customized table, starting with an abstract class: public abstract class AbstractTableFactory { protected abstract String getTableName(); protected abstract void configureColumns(TableFacade tableFacade, Locale locale); protected abstract void configureUnexportedTable(TableFacade tableFacade, Locale locale); protected abstract ImageCellEditor getEditImageCellEditor(Locale locale); protected abstract ImageCellEditor getDeleteImageCellEditor( Locale locale); public TableFacade createTable(HttpServletRequest request, HttpServletResponse response, Collection items) { TableFacade tableFacade = new TableFacadeImpl(getTableName(), request); tableFacade.setItems(items); tableFacade.setStateAttr("return"); configureTableFacade(response, tableFacade); Locale locale = request.getLocale(); configureColumns(tableFacade, locale); if (! tableFacade.getLimit().isExported()) { configureUnexportedTable(tableFacade, locale); } return tableFacade; } public void configureTableFacade(HttpServletResponse response, TableFacade tableFacade) { tableFacade.setExportTypes(response, getExportTypes()); tableFacade.setToolbar(new ReorderedToolbar()); Row row = tableFacade.getTable().getRow(); if (row instanceof HtmlRow) { HtmlRow htmlRow = (HtmlRow) row; htmlRow.setFilterable(false); htmlRow.setHighlighter(false); } } protected ExportType[] getExportTypes() { return null; } protected void configureColumn(Column column, String title, CellEditor editor) { configureColumn(column, title, editor, false, true); } protected void configureColumn(Column column, String title, CellEditor editor, boolean filterable, boolean sortable) { column.setTitle(title); if (editor != null) { column.getCellRenderer().setCellEditor(editor); } if (column instanceof HtmlColumn) { HtmlColumn htmlColumn = (HtmlColumn) column; htmlColumn.setFilterable(filterable); htmlColumn.setSortable(sortable); } } protected 
void configureEditAndDelete(Row row, WebContext webContext, CoreContext coreContext, Locale locale) { HtmlComponentFactory factory = new HtmlComponentFactory(webContext, coreContext); HtmlColumn col = factory.createColumn((String) null); col.setFilterable(false); col.setSortable(false); CellRenderer renderer = col.getCellRenderer(); ImageCellEditor editor = getEditImageCellEditor(locale); editor.setWebContext(webContext); editor.setCoreContext(coreContext); renderer.setCellEditor(editor); col.setCellRenderer(new AlignedHtmlCellRendererDecorator((HtmlCellRenderer) renderer, "center", "middle")); row.addColumn(col); col = factory.createColumn((String) null); col.setFilterable(false); col.setSortable(false); renderer = col.getCellRenderer(); editor = getDeleteImageCellEditor(locale); editor.setWebContext(webContext); editor.setCoreContext(coreContext); renderer.setCellEditor(editor); col.setCellRenderer(new AlignedHtmlCellRendererDecorator((HtmlCellRenderer) renderer, "center", "middle")); row.addColumn(col); } } This has a lot of code (note the abstract methods ), in part because I know I usually want edit and delete columns. One line that might pass by unnoticed in all this, however, is really quite something: tableFacade.setStateAttr("return"); When this attribute is set, JMesa uses the Memento design pattern to save the state of its tables. When you return to a table page and include the attribute you specify here in the URL, you return to the exact place you left: the page number to which you had moved before leaving the table, the number of values displayed per page, and so forth. 
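To make the Memento idea concrete, here is a minimal sketch of the pattern itself. It is not JMesa's implementation: TableStateDemo, TableState, save, and restore are all hypothetical names, and a plain map stands in for wherever the real state is kept.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the Memento idea; none of these types are JMesa classes.
class TableStateDemo {

    // The memento: an immutable snapshot of what the user was looking at.
    static final class TableState {
        final int page;
        final int maxRows;

        TableState(int page, int maxRows) {
            this.page = page;
            this.maxRows = maxRows;
        }
    }

    // The caretaker: a map standing in for per-user storage, keyed by table name.
    private final Map<String, TableState> saved = new HashMap<>();

    // Remembers the latest position for a table.
    void save(String tableName, TableState state) {
        saved.put(tableName, state);
    }

    // Returns the remembered position, or the fallback when nothing was saved.
    TableState restore(String tableName, TableState fallback) {
        return saved.getOrDefault(tableName, fallback);
    }

    public static void main(String[] args) {
        TableStateDemo store = new TableStateDemo();
        store.save("results", new TableState(3, 50));
        TableState back = store.restore("results", new TableState(1, 15));
        System.out.println(back.page + " / " + back.maxRows);
    }
}
```

The point of the pattern is that the caller asking for the state back needs to know only the table name, not how the snapshot was built.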
The application-specific concrete class, after all this, can be pretty simple: public class HelloWorldTableFactory extends AbstractTableFactory { protected MessageSource messageSource; public void setMessageSource(MessageSource messageSource) { this.messageSource = messageSource; } @Override protected String getTableName() { return "results"; } @Override protected ExportType[] getExportTypes() { return new ExportType[] { CSV, EXCEL }; } @Override protected void configureColumns(TableFacade tableFacade, Locale locale) { tableFacade.setColumnProperties("firstName", "lastName", "format", "toString"); Row row = tableFacade.getTable().getRow(); configureColumn(row.getColumn("firstName"), messageSource.getMessage("column.firstName", null, locale), null); configureColumn(row.getColumn("lastName"), messageSource.getMessage("column.lastName", null, locale), null); configureColumn(row.getColumn("format"), messageSource.getMessage("column.format", null, locale), new SpringMessageCellEditor(messageSource, "format", locale), false, false); configureColumn(row.getColumn("toString"), messageSource.getMessage("column.toString", null, locale), new ToStringCellEditor(), false, false); } @Override protected void configureUnexportedTable(TableFacade tableFacade, Locale locale) { HtmlTable table = (HtmlTable) tableFacade.getTable(); table.setCaption(messageSource.getMessage("table.caption", null, locale)); configureEditAndDelete(table.getRow(), tableFacade.getWebContext(), tableFacade.getCoreContext(), locale); } @Override protected ImageCellEditor getEditImageCellEditor(Locale locale) { return new HelloWorldImageCellEditor("edit.gif", messageSource.getMessage("image.edit.alt", null, locale), "edit.html?pk={0,number,integer}"); } @Override protected ImageCellEditor getDeleteImageCellEditor(Locale locale) { return new HelloWorldImageCellEditor("delete.gif", messageSource.getMessage("image.delete.alt", null, locale), "delete.html?pk={0,number,integer}"); } } Controller We end as we 
began, with a Spring MVC Controller to launch all this infrastructure. Since the details of table creation are encapsulated in a factory, this is uncluttered: the only decision to be made is whether or not the table is to be exported. If it is exported, the results will be written directly to the output stream of the response; if not, they'll be rendered as a string containing our HTML table:

    public class CustomJMesaSearchController extends AbstractController {
        private HelloWorldService helloWorldService;
        private HelloWorldTableFactory tableFactory;

        public void setHelloWorldService(HelloWorldService helloWorldService) {
            this.helloWorldService = helloWorldService;
        }

        public void setTableFactory(HelloWorldTableFactory tableFactory) {
            this.tableFactory = tableFactory;
        }

        @Override
        protected ModelAndView handleRequestInternal(HttpServletRequest request, HttpServletResponse response) throws Exception {
            Set results = helloWorldService.findAll();
            TableFacade tableFacade = tableFactory.createTable(request, response, results);
            if (tableFacade.getLimit().isExported()) {
                // export: JMesa writes straight to the response
                tableFacade.render();
                return null;
            }
            return new ModelAndView("results", "results", tableFacade.render());
        }
    }

We are actually reusing the same JSP page as in the basic JMesa setup: the only difference is in the Java code that generates the table. One more change in jmesa-servlet.xml to create everything and tie it all together: ... customSearchController

And how different the display looks!

Figure 5. A customized search result

Ajax

Finally, the table looks the way we want it to, but it's irritating having to resubmit the form each time we want to make a change. Isn't that the sort of thing Ajax is supposed to help us avoid? The answer is, of course, yes! So how do we leverage Ajax to help us? Fortunately, the JMesa folks have already worked that out. There are two parts to the solution: changes to the controller and changes to the JSP page.
${results}

In our previous solution, the onInvokeAction JavaScript method called createHiddenInputFieldsForLimitAndSubmit, which submitted the form. In the Ajax solution, it assembles parameters for the TableFacade class and sends a request for the HTML for table display, adding a parameter to indicate that it's an Ajax request. Then a callback JavaScript function substitutes the returned HTML for the contents of the element that now holds the table. The simplicity and unusual syntax of the latter code come courtesy of the jQuery Ajax library, which is thoughtfully used by JMesa:

The controller, of course, needs to interpret this new request correctly. This is just one more branch on the decision tree we saw in the previous controller:

    public class AjaxJMesaSearchController extends AbstractController {
        @Override
        protected ModelAndView handleRequestInternal(HttpServletRequest request, HttpServletResponse response) throws Exception {
            Set results = helloWorldService.findAll();
            TableFacade tableFacade = tableFactory.createTable(request, response, results);
            if (tableFacade.getLimit().isExported()) {
                tableFacade.render();
                return null;
            } else if ("true".equals(request.getParameter("ajax"))) {
                // Ajax request: write only the table HTML to the response
                String encoding = response.getCharacterEncoding();
                byte[] contents = tableFacade.render().getBytes(encoding);
                response.getOutputStream().write(contents);
                return null;
            }
            return new ModelAndView("ajax-results", "results", tableFacade.render());
        }
    }

Of course, we have to make Spring aware of the controller change in jmesa-servlet.xml: ... ajaxSearchController

That's all there is to it! The table looks and acts just as it did, except now it refreshes without resubmitting the form each time.

Conclusion

Now I don't have to like tables: I can program them in Java and not worry about them on a display JSP. This makes the page cleaner, gives me more functionality out-of-box, and enables me to nix at least some of the languages I'd otherwise have to fuss with. What's not to like?
I hope you'll take a good look at JMesa and see if it can make your life easier, and that this article helps you decide. Good luck! Installation of the Eclipse Project Installing the Eclipse project is not difficult; the included Ant build file and these instructions assume Tomcat as the deployment target (I'm using version 6.0.14 with JDK 6.0_03). If you want to use another servlet container, though, feel free to modify the instructions and the Ant file as needed: download the ZIP archive unzip the archive to any directory; it will create its own top-level subdirectory open the project as a Java project in Eclipse the project must use the Java 6 compiler (available from "http://java.sun.com/javase/6/") the Tomcat installation must be version 6 (available from "http://tomcat.apache.org/download-60.cgi") open the build file and modify the path to the Tomcat root add an external JAR file to the Eclipse project build path from the Tomcat installation: lib/servlet-api.jar run the Ant "deploy" target, which will build automatically open a browser and point it to "http://localhost:8080/running-jmesa-examples/" or to an equivalent URL for your setup (N.B. Some code in the project has been refactored from the way it appears in the article.)

June 18, 2008

by David Sills

· 56,467 Views

ASP.NET - Preventing SQL Injection Attacks

Consider a simple web application that requires user input in some fields, let's say a search box. Suppose a user types the following string in that textbox:

    '; DROP DATABASE pubs --

On submit, our application executes the following dynamic SQL statement:

    SqlDataAdapter myCommand = new SqlDataAdapter("SELECT OrderId, OrderNumber FROM Orders WHERE OrderNumber = '" + OrderNumberTextBox.Text + "'", myConnection);

Or a stored procedure:

    SqlDataAdapter myCommand = new SqlDataAdapter("uspGetOrderList '" + OrderNumberTextBox.Text + "'", myConnection);

The intention is that the user input would be run as:

    SELECT OrderId, OrderNumber FROM Orders WHERE OrderNumber = 'PO123'

However, the code inserts the user's malicious input and generates the following query:

    SELECT OrderId, OrderNumber FROM Orders WHERE OrderNumber = ''; DROP DATABASE pubs --'

In this case, the ' (single quotation mark) character that starts the rogue input terminates the current string literal in the SQL statement. As a result, the opening single quotation mark character of the rogue input results in the following statement.

    SELECT OrderId, OrderNumber FROM Orders WHERE OrderNumber = ''

The ; (semicolon) character tells SQL that this is the end of the current statement, which is then followed by the following malicious SQL code.

    ; DROP DATABASE pubs

Finally, the -- (double dash) sequence of characters is a SQL comment that tells SQL to ignore the rest of the text. In this case, SQL ignores the closing ' (single quotation mark) character, which would otherwise cause a SQL parser error.
    --'

Using stored procedures doesn’t solve the problem either, because the generated query would be:

    uspGetOrderList ''; DROP DATABASE pubs--'

Or perhaps this was your login page, with the query being:

    SELECT UserId FROM Users WHERE LoginId = '...' AND Password = '...' AND IsActive = 1

Someone could easily log in by typing the following into your login textbox:

    ' OR 1 = 1; --

Which makes our query:

    SELECT UserId FROM Users WHERE LoginId = '' OR 1 = 1; --' AND Password = '' AND IsActive = 1

Voilà, the attacker has now successfully logged in to your site using a SQL injection attack. SQL injection can occur, as demonstrated above, when an application uses input to construct dynamic SQL statements or when it uses stored procedures to connect to the database. Conventional security measures, such as the use of SSL and IPSec, do not protect your application from SQL injection attacks. Successful SQL injection attacks enable malicious users to execute commands in an application's database. Common vulnerabilities that make your data access code susceptible to SQL injection attacks include:

    Weak input validation.
    Dynamic construction of SQL statements without the use of type-safe parameters.
    Use of over-privileged database logins.

So what can we do to help protect our application from such attacks? To counter SQL injection attacks, we need to:

Constrain and sanitize input data

Check for known good data by validating for type, length, format, and range and using a list of acceptable characters to constrain input. Create a list of acceptable characters and use regular expressions to reject any characters that are not on the list. Using a list of unacceptable characters is impractical because it is very difficult to anticipate all possible variations of bad input. Start by constraining input in the server-side code for your ASP.NET Web pages. Do not rely on client-side validation because it can be easily bypassed.
Use client-side validation only to reduce round trips and to improve the user experience. See my other post on the Validation Application Block for server-side validation. If, in the previous code example, the Order Number value is captured by an ASP.NET TextBox control, you can constrain its input by using a RegularExpressionValidator control as shown in the following. If the Order Number input is from another source, such as an HTML control, a query string parameter, or a cookie, you can constrain it by using the Regex class from the System.Text.RegularExpressions namespace. The following example assumes that the input is obtained from a cookie.

    using System.Text.RegularExpressions;
    using System.Web;

    // Read the cookie first; it is null when the client did not send it.
    HttpCookie orderCookie = Request.Cookies["OrderNumber"];
    // The verbatim string (@"...") keeps the \d escapes intact for the regex engine.
    if (orderCookie != null && Regex.IsMatch(orderCookie.Value, @"^PO\d{3}-\d{2}$"))
    {
        // access the database
    }
    else
    {
        // handle the bad input
    }

Performing input validation is essential because almost all application-level attacks contain malicious input. You should validate all input, including form fields, query string parameters, and cookies to protect your application against malicious command injection. Assume all input to your Web application is malicious, and make sure that you use server validation for all sources of input. Use client-side validation to reduce round trips to the server and to improve the user experience, but do not rely on it because it is easily bypassed.

Apply ASP.NET request validation during development to identify injection attacks

ASP.NET request validation detects any HTML elements and reserved characters in data posted to the server. This helps prevent users from inserting script into your application. Request validation checks all input data against a hard-coded list of potentially dangerous values. If a match occurs, it throws an exception of type HttpRequestValidationException. Request validation is enabled by ASP.NET by default. You can see the following default setting in the Machine.config.comments file.
Confirm that you have not disabled request validation by overriding the default settings in your server's Machine.config file or your application's Web.config file. You can disable request validation in your Web.config application configuration file by adding a <pages> element with validateRequest="false" or on an individual page by setting ValidateRequest="false" on the @ Page directive. NOTE: You should disable Request Validation only on the page with a free-format text field that accepts HTML-formatted input. You can test the effects of request validation. To do this, create an ASP.NET page that disables request validation by setting ValidateRequest="false", as follows: When you run the page, "Hello" is displayed in a message box because the script in txtString is passed through and rendered as client-side script in your browser. If you set ValidateRequest="true" or remove the ValidateRequest page attribute, ASP.NET request validation rejects the script input and produces an error similar to the following.

    A potentially dangerous Request.Form value was detected from the client (txtString="

Use type-safe SQL parameters for data access

Parameter collections such as SqlParameterCollection provide type checking and length validation. If you use a parameters collection, input is treated as a literal value, and SQL Server does not treat it as executable code. An additional benefit of using a parameters collection is that you can enforce type and length checks. Values outside of the range trigger an exception. You can use these parameters with stored procedures or dynamically constructed SQL command strings. Using stored procedures does not necessarily prevent SQL injection. The important thing to do is use parameters with stored procedures. If you do not use parameters, your stored procedures can be susceptible to SQL injection if they use unfiltered input.
The following code shows how to use SqlParameterCollection when calling a stored procedure:

    using System.Data;
    using System.Data.SqlClient;

    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        DataSet userDataset = new DataSet();
        SqlDataAdapter myCommand = new SqlDataAdapter("uspGetOrderList", connection);
        myCommand.SelectCommand.CommandType = CommandType.StoredProcedure;
        // the user input travels as a typed parameter, never as SQL text
        myCommand.SelectCommand.Parameters.Add("@OrderNumber", SqlDbType.VarChar, 11);
        myCommand.SelectCommand.Parameters["@OrderNumber"].Value = OrderNumberTextBox.Text;
        myCommand.Fill(userDataset);
    }

The @OrderNumber parameter is treated as a literal value and not as executable code. Also, the parameter is checked for type and length. In the preceding code example, the input value cannot be longer than 11 characters. If the data does not conform to the type or length defined by the parameter, the SqlParameter class throws an exception. You should review your application's use of stored procedures because simply using stored procedures with parameters does not necessarily prevent SQL injection. For example, the following parameterized stored procedure has several security vulnerabilities.

    CREATE PROCEDURE dbo.uspRunQuery
    @var ntext
    AS
    exec sp_executesql @var
    GO

The stored procedure executes whatever statement is passed to it. Consider the @var variable being set to:

    DROP TABLE ORDERS;

If you cannot use stored procedures, you should still use parameters when constructing dynamic SQL statements. The following code shows how to use SqlParameterCollection with dynamic SQL.
    using System.Data;
    using System.Data.SqlClient;

    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        DataSet userDataset = new DataSet();
        SqlDataAdapter myDataAdapter = new SqlDataAdapter("SELECT OrderId, OrderNumber FROM Orders WHERE OrderNumber = @OrderNumber", connection);
        // parameters are added to the same adapter that runs the query
        myDataAdapter.SelectCommand.Parameters.Add("@OrderNumber", SqlDbType.VarChar, 11);
        myDataAdapter.SelectCommand.Parameters["@OrderNumber"].Value = OrderNumberTextBox.Text;
        myDataAdapter.Fill(userDataset);
    }

If you concatenate several SQL statements to send a batch of statements to the server in a single round trip, you can still use parameters if you make sure that parameter names are not repeated, i.e. use unique parameter names during SQL text concatenation.

    using System.Data;
    using System.Data.SqlClient;

    using (SqlConnection oConn = new SqlConnection(connectionString))
    {
        SqlDataAdapter oAdapter = new SqlDataAdapter(
            "SELECT CustomerID INTO #Temp1 FROM Customers " +
            "WHERE CustomerID > @custIDParm; " +
            "SELECT CompanyName FROM Customers " +
            "WHERE Country = @countryParm and CustomerID IN " +
            "(SELECT CustomerID FROM #Temp1);", oConn);
        SqlParameter custIDParm = oAdapter.SelectCommand.Parameters.Add("@custIDParm", SqlDbType.NChar, 5);
        custIDParm.Value = customerID.Text;
        SqlParameter countryParm = oAdapter.SelectCommand.Parameters.Add("@countryParm", SqlDbType.NVarChar, 15);
        countryParm.Value = country.Text;
        oConn.Open();
        DataSet dataSet = new DataSet();
        oAdapter.Fill(dataSet);
    }

Use a least privileged account that has restricted permissions in the database

Ideally, you should only grant execute permissions to selected stored procedures in the database and provide no direct table access. The problem is more severe if your application uses an over-privileged account to connect to the database.
For example, if your application's login has privileges to eliminate a database, then without adequate safeguards, an attacker might be able to perform this operation. If you use Windows authentication to connect, the Windows account should be least-privileged from an operating system perspective and should have limited privileges and limited ability to access Windows resources. Additionally, whether or not you use Windows authentication or SQL authentication, the corresponding SQL Server login should be restricted by permissions in the database. Consider the example of an ASP.NET application running on Microsoft Windows Server 2003 that accesses a database on a different server in the same domain. By default, the ASP.NET application runs in an application pool that runs under the Network Service account. This account is a least privileged account. Create a SQL Server login for the Web server's Network Service account. The Network Service account has network credentials that are presented at the database server as the identity DOMAIN\WEBSERVERNAME$. For example, if your domain is called XYZ and the Web server is called 123, you create a database login for XYZ\123$. Grant the new login access to the required database by creating a database user and adding the user to a database role. Establish permissions to let this database role call the required stored procedures or access the required tables in the database. Only grant access to stored procedures the application needs to use, and only grant sufficient access to tables based on the application's minimum requirements. If the ASP.NET application only performs database lookups and does not update any data, you only need to grant read access to the tables. This limits the damage that an attacker can cause if the attacker succeeds in a SQL injection attack. Use Character Escaping Techniques In situations where parameterized SQL cannot be used, consider using character escaping techniques. 
If you are forced to use dynamic SQL and parameterized SQL cannot be used, you need to safeguard against input characters that have special meaning to SQL Server (such as the single quote character). If not handled, special characters such as the single quote character in the input can be utilized to cause SQL injection. Escape routines add an escape character to characters that have special meaning to SQL Server, thereby making them harmless. private static string GetStringForSQL(string inputSQL) { return inputSQL.Replace("'", "''"); } Special input characters pose a threat only with dynamic SQL and not when using parameterized SQL. Your first line of defense should always be to use parameterized SQL. Avoid disclosing database error information In the event of database errors, make sure you do not disclose detailed error messages to the user. Use structured exception handling to catch errors and prevent them from propagating back to the client. Log detailed error information locally, but return limited error details to the client. If errors occur while the user is connecting to the database, be sure that you provide only limited information about the nature of the error to the user. If you disclose information related to data access and database errors, you could provide a malicious user with useful information that he or she can use to compromise your database security. Attackers use the information in detailed error messages to help deconstruct a SQL query that they are trying to inject with malicious code. A detailed error message may reveal valuable information such as the connection string, SQL server name, or table and database naming conventions. See my other post on Exception Handling - Do's and Dont's. You can use the element to configure custom, generic error messages that should be returned to the client in the event of an application exception condition. 
Make sure that the mode attribute is set to "RemoteOnly" in the web.config file, for example <customErrors mode="RemoteOnly" />. After installing an ASP.NET application, you can configure the setting to point to your custom error page through the defaultRedirect attribute of the same element.

Conclusion

The above list is just some points found on MSDN on how you can make your site more secure by effectively preventing SQL injection attacks. You should always be reviewing your code to find these or other security vulnerabilities. Remember that all types of attacks start with some input, and your first line of defense should be input validation using both client-side and server-side validation.

Original Author

Original article written by Misbah Arefin

June 18, 2008

by Schalk Neethling

· 90,721 Views

Hibernate - Dynamic Table Routing

I have been searching for a method to dynamically route objects to databases at runtime using Hibernate, and recently I found a solution that fits the bill.

June 13, 2008

by alvin sd

· 60,074 Views · 3 Likes

Hibernate - Tuning Queries Using Paging, Batch Size, and Fetch Joins

This article covers queries, in particular a tuning test case and the relationship between simple queries, join fetch queries, paging of query results, and batch size.

Paging the Query Results

I will start with a short introduction to paging in EJB3. To support paging, the EJB3 Query interface defines the following two methods:

- setMaxResults - sets the maximum number of rows to retrieve from the database
- setFirstResult - sets the first row to retrieve

For example, if our GUI displays a list of customers and we have 500,000 customers (database rows) in our database, we wouldn't want to display all 500,000 records in one view (even if we put performance considerations aside, nobody can do anything with a list of 500,000 rows). The GUI design would usually include paging: we break the list of records into logical pages (for example, 100 records per page) and the user can navigate between pages (like Google's results navigator at the bottom of the search page).

When using the paging support it is important to remember that the query has to be sorted, otherwise we can't be sure that fetching the "next page" really returns the next page (in the absence of an 'order by' clause in a SQL query, the order in which rows are fetched is unpredictable). Here is a sample use, fetching the first two pages of 100 rows each:

    Query q = entityManager.createQuery("select c from Customer c order by c.id");
    q.setFirstResult(0).setMaxResults(100);

    // ... next page ...
    q = entityManager.createQuery("select c from Customer c order by c.id");
    q.setFirstResult(100).setMaxResults(100);

This is a simple API, and it's important (for performance) to remember to use it when we need to fetch only part of the results.

Test Case Description

This test case is based on a real tuning I did for an application; I just changed the class names to Customer and Order.
Let's assume that I have a Customer entity with a set of orders (lazily fetched, although the same happens with eager fetching) and we need to:

- Fetch customers and their orders
- Do it in a "paging mode": 100 customers per page

Tuning Requirement #1 - Fetch Customers and Their Orders

There are two ways to perform this kind of fetch:

- Simple select: select c from Customer c order by c.id
- Join fetch: select distinct c from Customer c left outer join fetch c.orders order by c.id

The simple select is as simple as it can be: we load a list of customers with a proxy collection in their orders field. The orders collection will be filled with data once I access it (for example, c.getOrders().size()).

The 'join fetch' means that we want to fetch an association as an integral part of the query execution. The join-fetched entities (in the example above: c.orders) must be part of an association that is referenced by an entity returned from the query (in the example above: c). The 'join fetch' is one of the tools used for improving query performance. The Hibernate core documentation explains that "a 'fetch' join allows associations or collections of values to be initialized along with their parent objects, using a single select".

I have 18,998 customer records in my database, each with a few orders. Let's compare the execution time of the two queries. My code looks the same for both queries (except for the query itself): I execute the query, then I iterate over the results checking the size of each customer's orders collection, and print the execution time and the number of records fetched (as a sanity check for the query syntax):

    Query q = entityManager.createQuery(queryStr);
    long a = System.currentTimeMillis();
    List<Customer> l = q.getResultList();
    for (Customer c : l) {
        c.getOrders().size(); // touching the collection forces it to load
    }
    long b = System.currentTimeMillis();
    System.out.println("Execution time: " + (b - a) + "; Number of records fetch: " + l.size());

And now to the numbers (average of 3 executions):

    Simple select: 24,984 millis
    Join fetch: 1,219 millis

The join fetch query was about 20 times faster(!) than the simple query. The reason is obvious: using the join fetch select I had only one round trip to the database, while using the simple select I had to fetch the customers (1 round trip to the database) and each time I accessed a collection I had another round trip (that's 18,998 additional round trips!). The winner is 'join fetch'. But is it really? Wait for the next requirement: paging.

Tuning Requirement #2 - Use Paging

The second requirement was to do it with paging: each page will have 100 customers (so, using integer division, we will have 18,998/100 + 1 = 190 pages, and the last page has 98 customers). So let's change the code above a little bit:

    Query q = entityManager.createQuery(queryStr);
    q.setFirstResult(pageNum * 100).setMaxResults(100);
    long a = System.currentTimeMillis();
    List<Customer> l = q.getResultList();
    for (Customer c : l) {
        c.getOrders().size();
    }
    long b = System.currentTimeMillis();
    System.out.println("Execution time: " + (b - a) + "; Number of records fetch: " + l.size());

I added the second line, which limits the query result to a specific page with up to 100 records per page. And the numbers are (average of 3 executions):

    Simple select: 328 millis
    Join fetch: 1,660 millis

The tables have turned. Why? First, a quote from the EJB3 Persistence specification: "The effect of applying setMaxResults or setFirstResult to a query involving fetch joins over collections is undefined" (section 3.6.1 - Query Interface).

We could have stopped here, but it is interesting to understand the issue and to see what Hibernate does. To implement the paging features Hibernate delegates the work to the database, using its syntax to limit the number of records fetched by the query.
Each database has its own proprietary syntax for limiting the number of fetched records. Some examples:

- Postgres uses LIMIT and OFFSET
- Oracle has rownum
- MySQL uses its version of LIMIT and OFFSET
- MSSQL has the TOP keyword in the select
- and so on

The important thing to remember here is the meaning of such a limit: the database returns a subset of the query result. So if we asked for the first 100 customers whose names contain 'Eyal', the outcome is logically the same as building a table in memory out of all customers that match the criteria and taking the first 100 rows from it.

And here is the catch: if the query with the limit includes a join clause for a collection, then the first 100 rows in the "logical table" will not necessarily be the first 100 customers. The outcome of the join might duplicate customers in the "logical table", but the database isn't aware of that and doesn't care: it performs operations on tables, not on objects! For example, think of the extreme case in which the customer 'Eyal' has 100 orders. The query will return 100 rows, Hibernate will identify that they all belong to the same customer and return only one Customer as the query result, which is not what we asked for. This also works, of course, the other way around: if a customer had more than 100 orders and the result set size was limited to 100 rows, the orders collection would not contain all of the customer's orders.

To deal with that limitation Hibernate doesn't actually issue a SQL statement with a LIMIT clause. Instead it fetches all of the records and performs the paging in memory. This explains why using the 'join fetch' statement with paging took longer than the one without paging: the delta is the in-memory paging done by Hibernate. If you look at the Hibernate logs you will find the following warning:

    WARNING: firstResult/maxResults specified with collection fetch; applying in memory!
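The catch described above can be reproduced without a database. The following is only a sketch: it simulates the "logical table" of a join in memory, and the class and method names (JoinPagingDemo, join, customersInFirstRows) are hypothetical, not part of Hibernate or JPA. It also uses inner-join semantics for simplicity, so customers without orders produce no rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: simulates a joined result set in memory.
class JoinPagingDemo {

    // One row of the "logical table" produced by joining customers to orders.
    static final class Row {
        final int customerId;
        final int orderId;

        Row(int customerId, int orderId) {
            this.customerId = customerId;
            this.orderId = orderId;
        }
    }

    // Builds the join output: each customer appears once per order
    // (inner-join semantics, an assumption made to keep the sketch short).
    static List<Row> join(int[] ordersPerCustomer) {
        List<Row> rows = new ArrayList<Row>();
        int nextOrderId = 1;
        for (int c = 0; c < ordersPerCustomer.length; c++) {
            for (int o = 0; o < ordersPerCustomer[c]; o++) {
                rows.add(new Row(c + 1, nextOrderId++));
            }
        }
        return rows;
    }

    // What a database-side limit does: it cuts joined rows, not customers.
    static Set<Integer> customersInFirstRows(List<Row> rows, int limit) {
        Set<Integer> ids = new LinkedHashSet<Integer>();
        for (Row r : rows.subList(0, Math.min(limit, rows.size()))) {
            ids.add(r.customerId);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Customer 1 has 100 orders; customers 2 and 3 have 2 orders each.
        List<Row> rows = join(new int[] {100, 2, 2});
        System.out.println("Joined rows: " + rows.size());
        System.out.println("Customers in first 100 rows: "
                + customersInFirstRows(rows, 100));
    }
}
```

With these made-up numbers, a limit of 100 rows covers only the first customer, which mirrors the 'Eyal' example: limiting rows and limiting customers are different operations once a collection is joined in.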
Final Tuning - BatchSize

Does it mean that in the case of paging we shouldn't use a join fetch? Usually it does (unless your page size is very close to the actual number of records). But even if you use a simple select, this is a classic case for using the @BatchSize annotation. If my session/entity manager has 100 customers attached to it then, by default, on the first access to each customer's orders collection Hibernate will issue a SQL statement to fill that collection. In the end I will execute 100 statements to fetch 100 collections. You can see it in the log:

    Hibernate: /* select c from Customer c order by c.id */ select customer0_.id as id0_, customer0_.ccNumber as ccNumber0_, customer0_.name as name0_, customer0_.fixedDiscount as fixedDis5_0_, customer0_.DTYPE as DTYPE0_ from CUSTOMERS customer0_ order by customer0_.id limit ? offset ?
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
    ............
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id=?

The @BatchSize annotation can be used to define how many identical associations to populate in a single database query. If the session has 100 customers attached to it and the mapping of the 'orders' collection is annotated with @BatchSize of size n, then whenever Hibernate needs to populate a lazy orders collection it checks the session, and if there are more customers whose orders collections still need to be populated it fetches up to n collections at once.

Example: if we had 100 customers and the batch size was set to 16, then when iterating over the customers to get their number of orders Hibernate will go to the database only 7 times (6 times to fetch 16 collections and one more time to fetch the 4 remaining collections; see the mapping and log that follow). If our batch size was set to 50 it would go only twice.
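The round-trip counts quoted above are plain ceiling division, and the same arithmetic gives the page count from the paging requirement. A minimal sketch of that arithmetic follows; BatchMath and its methods are hypothetical helper names, and this is not how Hibernate's loader is implemented, only a way to check the numbers.

```java
// Hypothetical helper: only checks the arithmetic used in the text.
class BatchMath {

    // Ceiling division: how many groups of at most groupSize cover total items.
    static int groups(int total, int groupSize) {
        if (total < 0 || groupSize <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and groupSize > 0");
        }
        return (total + groupSize - 1) / groupSize;
    }

    // Number of items in the final (possibly partial) group.
    static int lastGroupSize(int total, int groupSize) {
        if (total < 0 || groupSize <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and groupSize > 0");
        }
        if (total == 0) {
            return 0;
        }
        int remainder = total % groupSize;
        return remainder == 0 ? groupSize : remainder;
    }

    public static void main(String[] args) {
        // 100 lazy collections, batch size 16.
        System.out.println(groups(100, 16) + " round trips, last one loads "
                + lastGroupSize(100, 16) + " collections");
        // 100 lazy collections, batch size 50.
        System.out.println(groups(100, 50) + " round trips");
        // 18,998 customers, 100 per page.
        System.out.println(groups(18998, 100) + " pages, last page has "
                + lastGroupSize(18998, 100) + " customers");
    }
}
```

The same helper works for any page or batch size, which makes it easy to estimate the number of statements before changing the @BatchSize value.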
    @OneToMany(mappedBy="customer", cascade=CascadeType.ALL, fetch=FetchType.LAZY)
    @BatchSize(size=16)
    private Set<Order> orders = new HashSet<Order>();

And in the log:

    Hibernate: /* select c from Customer c order by c.id */ select customer0_.id as id0_, customer0_.ccNumber as ccNumber0_, customer0_.name as name0_, customer0_.fixedDiscount as fixedDis5_0_, customer0_.DTYPE as DTYPE0_ from CUSTOMERS customer0_ order by customer0_.id limit ? offset ?
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    Hibernate: /* load one-to-many par2.Customer.orders */ select orders0_.customer_id as customer4_1_, orders0_.id as id1_, orders0_.id as id1_0_, orders0_.customer_id as customer4_1_0_, orders0_.description as descript2_1_0_, orders0_.orderId as orderId1_0_ from ORDERS orders0_ where orders0_.customer_id in (?, ?, ?, ?)

Back to our test case. In my example, setting the batch size to 100 looks like a nice tuning opportunity. And indeed, when setting it to 100 the total execution time dropped to 188 millis (that's about 132 (!!!) times faster than the worst result we had).

The batch size can also be set globally by setting the hibernate.default_batch_fetch_size property for the session factory.

From http://www.jroller.com/eyallupu/

June 9, 2008

by Eyal Lupu

· 256,236 Views · 7 Likes

Taking the New Swing Tree Table for a Spin

Announcing the new Swing Tree Table yesterday, Tim Boudreau writes: "Usage is incredibly easy - you just provide a standard Swing TreeModel of whatever sort you like, and an additional RowModel that can be queried for the other columns contents, editability and so forth." I found an example from some time ago, by Tim, and have been playing with it to get used to this new development. The result is as follows:

To get started, I simply downloaded the latest NetBeans IDE development build from netbeans.org and then attached platform8/org-netbeans-swing-outline.jar to my Java SE project. For the rest, I wasn't required to do anything with NetBeans, necessarily. I could have attached the JAR to a project in Eclipse or anywhere else.

Then I created a JFrame. To work with this Swing tree table, you need to provide the new "org.netbeans.swing.outline.Outline" class with the new "org.netbeans.swing.outline.OutlineModel" which, in turn, is built from a plain old javax.swing.tree.TreeModel, together with the new "org.netbeans.swing.outline.RowModel". Optionally, to change the default rendering, you can use the new "org.netbeans.swing.outline.RenderDataProvider".

Let's first create a TreeModel for accessing files on disk.
We will receive the root of the file system as a starting point:

    private static class FileTreeModel implements TreeModel {

        private File root;

        public FileTreeModel(File root) {
            this.root = root;
        }

        @Override
        public void addTreeModelListener(javax.swing.event.TreeModelListener l) {
            //do nothing
        }

        @Override
        public Object getChild(Object parent, int index) {
            File f = (File) parent;
            return f.listFiles()[index];
        }

        @Override
        public int getChildCount(Object parent) {
            File f = (File) parent;
            if (!f.isDirectory()) {
                return 0;
            } else {
                return f.list().length;
            }
        }

        @Override
        public int getIndexOfChild(Object parent, Object child) {
            File par = (File) parent;
            File ch = (File) child;
            return Arrays.asList(par.listFiles()).indexOf(ch);
        }

        @Override
        public Object getRoot() {
            return root;
        }

        @Override
        public boolean isLeaf(Object node) {
            File f = (File) node;
            return !f.isDirectory();
        }

        @Override
        public void removeTreeModelListener(javax.swing.event.TreeModelListener l) {
            //do nothing
        }

        @Override
        public void valueForPathChanged(javax.swing.tree.TreePath path, Object newValue) {
            //do nothing
        }
    }

The above could simply be set as a JTree's model and then you'd have a plain old standard JTree. It would work, no problems, it would be a normal JTree. However, it wouldn't be a tree table since you'd only have a tree, without a table. Therefore, let's now add two extra columns, via the new "org.netbeans.swing.outline.RowModel" class, which will enable the creation of a tree table instead of a tree:

    private class FileRowModel implements RowModel {

        @Override
        public Class getColumnClass(int column) {
            switch (column) {
                case 0:
                    return Date.class;
                case 1:
                    return Long.class;
                default:
                    assert false;
            }
            return null;
        }

        @Override
        public int getColumnCount() {
            return 2;
        }

        @Override
        public String getColumnName(int column) {
            return column == 0 ? "Date" : "Size";
        }

        @Override
        public Object getValueFor(Object node, int column) {
            File f = (File) node;
            switch (column) {
                case 0:
                    return new Date(f.lastModified());
                case 1:
                    return new Long(f.length());
                default:
                    assert false;
            }
            return null;
        }

        @Override
        public boolean isCellEditable(Object node, int column) {
            return false;
        }

        @Override
        public void setValueFor(Object node, int column, Object value) {
            //do nothing for now
        }
    }

Now, after dragging-and-dropping an Outline object onto your JFrame (which is possible after adding the beans from the JAR to the NetBeans IDE Palette Manager) which, in turn, automatically creates a JScrollPane as well, this is how you could code the JFrame's constructor:

    public NewJFrame() {

        //Initialize the ui generated by the Matisse GUI Builder, which,
        //for example, adds the JScrollPane to the JFrame ContentPane:
        initComponents();

        //Here I am assuming we are not on Windows,
        //otherwise use Utilities.isWindows() ? 1 : 0
        //from the NetBeans Utilities API:
        TreeModel treeMdl = new FileTreeModel(File.listRoots()[0]);

        //Create the Outline's model, consisting of the TreeModel and the RowModel,
        //together with two optional values: a boolean for something or other,
        //and the display name for the first column:
        OutlineModel mdl = DefaultOutlineModel.createOutlineModel(
                treeMdl, new FileRowModel(), true, "File System");

        //Initialize the Outline object:
        outline1 = new Outline();

        //By default, the root is shown, while here that isn't necessary:
        outline1.setRootVisible(false);

        //Assign the model to the Outline object:
        outline1.setModel(mdl);

        //Add the Outline object to the JScrollPane:
        jScrollPane1.setViewportView(outline1);
    }

Alternatively, without the NetBeans Matisse GUI Builder and NetBeans Palette Manager, i.e., simply using a standard Java class, you could do something like this:

    private Outline outline;

    public NewJFrame() {
        setDefaultCloseOperation(EXIT_ON_CLOSE);
        getContentPane().setLayout(new BorderLayout());
        TreeModel treeMdl = new FileTreeModel(File.listRoots()[0]);
        OutlineModel mdl = DefaultOutlineModel.createOutlineModel(
                treeMdl, new FileRowModel(), true);
        outline = new Outline();
        outline.setRootVisible(false);
        outline.setModel(mdl);
        getContentPane().add(new JScrollPane(outline), BorderLayout.CENTER);
        setBounds(20, 20, 700, 400);
    }

At this point, you can run the JFrame, with this result:

So, we see a lot of superfluous info that doesn't look very nice. Let's implement "org.netbeans.swing.outline.RenderDataProvider", as follows:

    private class RenderData implements RenderDataProvider {

        @Override
        public java.awt.Color getBackground(Object o) {
            return null;
        }

        @Override
        public String getDisplayName(Object o) {
            return ((File) o).getName();
        }

        @Override
        public java.awt.Color getForeground(Object o) {
            File f = (File) o;
            if (!f.isDirectory() && !f.canWrite()) {
                return UIManager.getColor("controlShadow");
            }
            return null;
        }

        @Override
        public javax.swing.Icon getIcon(Object o) {
            return null;
        }

        @Override
        public String getTooltipText(Object o) {
            File f = (File) o;
            return f.getAbsolutePath();
        }

        @Override
        public boolean isHtmlDisplayName(Object o) {
            return false;
        }
    }

Now, back in the constructor, add the renderer to the outline:

    outline1.setRenderDataProvider(new RenderData());

Run the JFrame again and the result should be the same as in the first screenshot above. Look again at the rendering code and note that, for example, you have tooltips:

June 4, 2008

by Geertjan Wielenga

· 83,973 Views

HashMap is not a Thread-Safe Structure

In the last few months I have seen too much code where a HashMap (without any extra synchronization) is used instead of a thread-safe alternative like the ConcurrentHashMap or the less concurrent but still thread-safe Hashtable. This is an example of a HashMap used in a home-grown cache (used in a multi-threaded environment):

    interface ValueProvider<K, V> {
        V retrieve(K key);
    }

    public class SomeCache<K, V> {
        private Map<K, V> map = new HashMap<K, V>();
        private ValueProvider<K, V> valueProvider;

        public SomeCache(ValueProvider<K, V> valueProvider) {
            this.valueProvider = valueProvider;
        }

        public V getValue(K key) {
            V value = map.get(key);
            if (value == null) {
                value = valueProvider.retrieve(key);
                if (value != null)
                    map.put(key, value);
            }
            return value;
        }
    }

There is much wrong with this innocent-looking piece of code. There is no happens-before relation between the put of the value in the map and the get of the value. This means that a thread that receives the value from the cache is not guaranteed to see all of its fields if the value has publication problems (most non-thread-safe structures have publication problems). The same goes for the internals (the buckets, for example) of the HashMap: updates made to the internals of the HashMap while putting are not guaranteed to be visible to a thread that does the get. So the state of the cache in main memory may not be an allowed state (some of the changes may be stuck in the CPU cache), and the cache could start behaving erroneously and, if you are lucky, start throwing exceptions.

And last, but certainly not least, there is also a classic race problem: if 2 threads do an interleaved map.put, the internals of the HashMap can get into an inconsistent state. In most cases an application reboot/redeploy would be the only way to fix this problem.

There are other problems with the cache behavior of this code as well. The items don't have a timeout, so once a value gets in the cache, it stays in the cache.
In practice this could lead to a web page that keeps displaying some value, even though the value has been updated in the main repository. An application reboot is also the only way to solve this problem. Using a Commercial Off-The-Shelf (COTS) cache would be a much safer solution, even though a new library needs to be added.

It is important to realize that a HashMap can be used perfectly well in a multi-threaded environment if extra synchronization is added. But without extra synchronization, it is a time bomb waiting to go off.
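As a rough illustration of the thread-safe alternative mentioned at the start of this post, here is a minimal sketch of the same cache on top of ConcurrentHashMap. It assumes Java 8 or later (for computeIfAbsent and lambdas), the class name SafeCache is made up for this example, and it deliberately does nothing about the missing timeout, so it is not a replacement for a real cache library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

interface ValueProvider<K, V> {
    V retrieve(K key);
}

// Hypothetical name; a sketch, not a production cache (entries never expire).
class SafeCache<K, V> {

    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<K, V>();
    private final ValueProvider<K, V> valueProvider;

    SafeCache(ValueProvider<K, V> valueProvider) {
        this.valueProvider = valueProvider;
    }

    V getValue(K key) {
        // computeIfAbsent runs the lookup at most once per absent key and
        // publishes the result safely to other threads. If the provider
        // returns null, nothing is stored and null is returned.
        // Note: ConcurrentHashMap rejects null keys with a NullPointerException.
        return map.computeIfAbsent(key, k -> valueProvider.retrieve(k));
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        SafeCache<String, Integer> cache = new SafeCache<String, Integer>(k -> {
            calls.incrementAndGet();
            return k.length();
        });
        System.out.println(cache.getValue("abc")); // computed by the provider
        System.out.println(cache.getValue("abc")); // served from the map
        System.out.println("provider calls: " + calls.get());
    }
}
```

One design note: the provider runs while the map holds a lock on part of its internal table, so a slow or re-entrant provider can delay other writers. That trade-off is usually acceptable for short lookups, but it is worth knowing about.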

May 29, 2008

by Peter Veentjer

· 64,854 Views

Understanding HBase and BigTable

The hardest part about learning HBase (the open source implementation of Google's BigTable) is just wrapping your mind around the concept of what it actually is. I find it rather unfortunate that these two great systems contain the words table and base in their names, which tends to cause confusion among RDBMS-indoctrinated individuals (like myself). This article aims to describe these distributed data storage systems from a conceptual standpoint. After reading it, you should be better able to make an educated decision regarding when you might want to use HBase vs when you'd be better off with a "traditional" database.

It's all in the terminology

Fortunately, Google's BigTable Paper clearly explains what BigTable actually is. Here is the first sentence of the "Data Model" section: "A Bigtable is a sparse, distributed, persistent multidimensional sorted map."

Note: At this juncture I like to give readers the opportunity to collect any brain matter which may have left their skulls upon reading that last line.

The BigTable paper continues, explaining that: "The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes."

Along those lines, the HbaseArchitecture page of the Hadoop wiki posits that: "HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have crazily-varying columns, if the user likes."

Although all of that may seem rather cryptic, it makes sense once you break it down a word at a time. I like to discuss them in this sequence: map, persistent, distributed, sorted, multidimensional, and sparse. Rather than trying to picture a complete system all at once, I find it easier to build up a mental framework piecemeal, to ease into it...

map

At its core, HBase/BigTable is a map.
Depending on your programming language background, you may be more familiar with the terms associative array (PHP), dictionary (Python), Hash (Ruby), or Object (JavaScript). From the Wikipedia article, a map is "an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value." Using JavaScript Object Notation, here's an example of a simple map where all the values are just strings:

    {
      "zzzzz" : "woot",
      "xyz" : "hello",
      "aaaab" : "world",
      "1" : "x",
      "aaaaa" : "y"
    }

persistent

Persistence merely means that the data you put in this special map "persists" after the program that created or accessed it is finished. This is no different in concept than any other kind of persistent storage such as a file on a filesystem. Moving along...

distributed

HBase and BigTable are built upon distributed filesystems so that the underlying file storage can be spread out among an array of independent machines. HBase sits atop either Hadoop's Distributed File System (HDFS) or Amazon's Simple Storage Service (S3), while a BigTable makes use of the Google File System (GFS). Data is replicated across a number of participating nodes in an analogous manner to how data is striped across discs in a RAID system. For the purpose of this article, we don't really care which distributed filesystem implementation is being used. The important thing to understand is that it is distributed, which provides a layer of protection against, say, a node within the cluster failing.

sorted

Unlike most map implementations, in HBase/BigTable the key/value pairs are kept in strict alphabetical order. That is to say that the row for the key "aaaaa" should be right next to the row with key "aaaab" and very far from the row with key "zzzzz".
Continuing our JSON example, the sorted version looks like this:

    {
      "1" : "x",
      "aaaaa" : "y",
      "aaaab" : "world",
      "xyz" : "hello",
      "zzzzz" : "woot"
    }

Because these systems tend to be so huge and distributed, this sorting feature is actually very important. The spatial propinquity of rows with like keys ensures that when you must scan the table, the items of greatest interest to you are near each other. This is important when choosing a row key convention. For example, consider a table whose keys are domain names. It makes the most sense to list them in reverse notation (so "com.jimbojw.www" rather than "www.jimbojw.com") so that rows about a subdomain will be near the parent domain row. Continuing the domain example, the row for the domain "mail.jimbojw.com" would be right next to the row for "www.jimbojw.com" rather than, say, "mail.xyz.com", which is what would happen if the keys were in regular domain notation.

It's important to note that the term "sorted" when applied to HBase/BigTable does not mean that "values" are sorted. There is no automatic indexing of anything other than the keys, just as it would be in a plain-old map implementation.

multidimensional

Up to this point, we haven't mentioned any concept of "columns", treating the "table" instead as a regular-old hash/map in concept. This is entirely intentional. The word "column" is another loaded word like "table" and "base" which carries the emotional baggage of years of RDBMS experience. Instead, I find it easier to think about this like a multidimensional map - a map of maps if you will. Adding one dimension to our running JSON example gives us this:

    {
      "1" : { "A" : "x", "B" : "z" },
      "aaaaa" : { "A" : "y", "B" : "w" },
      "aaaab" : { "A" : "world", "B" : "ocean" },
      "xyz" : { "A" : "hello", "B" : "there" },
      "zzzzz" : { "A" : "woot", "B" : "1337" }
    }

In the above example, you'll notice now that each key points to a map with exactly two keys: "A" and "B".
From here forward, we'll refer to the top-level key/map pair as a "row". Also, in BigTable/HBase nomenclature, the "A" and "B" mappings would be called "Column Families".

A table's column families are specified when the table is created, and are difficult or impossible to modify later. It can also be expensive to add new column families, so it's a good idea to specify all the ones you'll need up front.

Fortunately, a column family may have any number of columns, denoted by a column "qualifier" or "label". Here's a subset of our JSON example again, this time with the column qualifier dimension built in:

{
  // ...
  "aaaaa" : {
    "A" : {
      "foo" : "y",
      "bar" : "d"
    },
    "B" : {
      "" : "w"
    }
  },
  "aaaab" : {
    "A" : {
      "foo" : "world",
      "bar" : "domination"
    },
    "B" : {
      "" : "ocean"
    }
  },
  // ...
}

Notice that in the two rows shown, the "A" column family has two columns: "foo" and "bar", and the "B" column family has just one column whose qualifier is the empty string ("").

When asking HBase/BigTable for data, you must provide the full column name in the form "<family>:<qualifier>". So for example, both rows in the above example have three columns: "A:foo", "A:bar" and "B:".

Note that although the column families are static, the columns themselves are not. Consider this expanded row:

{
  // ...
  "zzzzz" : {
    "A" : {
      "catch_phrase" : "woot"
    }
  }
}

In this case, the "zzzzz" row has exactly one column, "A:catch_phrase". Because each row may have any number of different columns, there's no built-in way to query for a list of all columns in all rows. To get that information, you'd have to do a full table scan. You can however query for a list of all column families since these are immutable (more-or-less).

The final dimension represented in HBase/BigTable is time. All data is versioned either using an integer timestamp (in HBase, by default the time in milliseconds since the epoch), or another integer of your choice. The client may specify the timestamp when inserting data.

Consider this updated example utilizing arbitrary integral timestamps:

{
  // ...
  "aaaaa" : {
    "A" : {
      "foo" : {
        15 : "y",
        4 : "m"
      },
      "bar" : {
        15 : "d"
      }
    },
    "B" : {
      "" : {
        6 : "w",
        3 : "o",
        1 : "w"
      }
    }
  },
  // ...
}

Each column family may have its own rules regarding how many versions of a given cell to keep (a cell is identified by its rowkey/column pair). In most cases, applications will simply ask for a given cell's data, without specifying a timestamp. In that common case, HBase/BigTable will return the most recent version (the one with the highest timestamp) since it stores these in reverse chronological order.

If an application asks for a given row at a given timestamp, HBase will return the most recent cell data whose timestamp is less than or equal to the one provided.

Using our imaginary HBase table, querying for the row/column of "aaaaa"/"A:foo" will return "y", while querying for the row/column/timestamp of "aaaaa"/"A:foo"/10 will return "m" (the only version with a timestamp of 10 or less is the one at 4). Querying for a row/column/timestamp of "aaaaa"/"A:foo"/2 will return a null result, because no version is that old.

sparse

The last keyword is sparse. As already mentioned, a given row can have any number of columns in each column family, or none at all. The other type of sparseness is row-based gaps, which merely means that there may be gaps between keys.

This, of course, makes perfect sense if you've been thinking about HBase/BigTable in the map-based terms of this article rather than in terms of similar-seeming concepts in RDBMSs.

And that's about it

Well, I hope that helps you understand conceptually what the HBase data model feels like. As always, I look forward to your thoughts, comments and suggestions.
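The lookup rules described in the article can be tried out with a small Python sketch. This is only a toy model built from nested dicts, not the HBase client API; the function name get_cell and its signature are assumptions made for illustration. It resolves a "family:qualifier" column name and returns the newest version at or before an optional timestamp.

```python
# Toy in-memory model of the article's example row (not an HBase client).
table = {
    "aaaaa": {
        "A": {
            "foo": {15: "y", 4: "m"},
            "bar": {15: "d"},
        },
        "B": {
            "": {6: "w", 3: "o", 1: "w"},
        },
    },
}


def get_cell(table, row_key, column, timestamp=None):
    """Return the newest value of a cell, optionally as of a timestamp.

    column is the full name "family:qualifier"; the qualifier may be empty.
    Returns None when the row, column, or a old-enough version is missing.
    """
    family, _, qualifier = column.partition(":")
    versions = table.get(row_key, {}).get(family, {}).get(qualifier, {})
    # Keep only versions written at or before the requested timestamp.
    candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
    if not candidates:
        return None
    return versions[max(candidates)]


print(get_cell(table, "aaaaa", "A:foo"))      # latest version
print(get_cell(table, "aaaaa", "A:foo", 10))  # newest version with timestamp <= 10
print(get_cell(table, "aaaaa", "A:foo", 2))   # nothing that old
```

The sparse property falls out of the same structure: a missing row, family, or qualifier is simply an absent key, so the lookup returns None rather than an empty placeholder.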

May 22, 2008

by Jim Wilson

· 84,601 Views · 5 Likes

Getting to Know Immutable Data Structures - Immutability and Concurrency – Part I

When people ask “how does functional programming help me with concurrent programming?”, the standard response tends to be: functional programming uses immutable data structures, read-only data structures can be shared between threads without issues, end of problem. Except it isn’t. Immutable data structures have a different set of problems associated with them when working on concurrent problems. This post will examine what these problems are, and then show that this is just a special case of a more general set of problems when working with immutable data structures. Finally, we’ll start taking a look at how we solve some of these problems, but in a single-threaded environment first of all.

First let’s frame the problem by looking at how imperative programs work with threads. In classic imperative/OO languages programmers tend to use either instance or static member variables to send messages between threads. Let’s look at a fragment of C# that does something classically multi-threaded:

object workQueueLock = new object();
Queue<WorkItem> workQueue = new Queue<WorkItem>();

// this method runs on its own thread
private void Worker()
{
    while (true)
    {
        // define a work item and attempt to retrieve work from the queue
        WorkItem item = null;
        lock (workQueueLock)
        {
            if (workQueue.Count > 0)
            {
                item = workQueue.Dequeue();
            }
        }
        // check if we have work to do, otherwise sleep
        if (item != null)
        {
            // do some work
        }
        else
        {
            Thread.Sleep(QueuePollInterval);
        }
    }
}

// an event handler that enqueues work for the worker
private void Some_Event(object sender, WorkEventArgs ea)
{
    lock (workQueueLock)
    {
        workQueue.Enqueue(ea.WorkItem);
    }
}

I wouldn’t recommend you use this naive version of a work queue, but the above code is straightforward enough to understand easily and it illustrates the typical way imperative programs communicate between threads.
We have a member variable “workQueue” that stores the work to be done. The method “Worker” is designed to read from this queue and, if there’s some work to do, do the work; otherwise it sleeps until it’s time to poll the queue again. We use “workQueue” again in “Some_Event” to send a message to “Worker”, enqueuing some work for it to do. It’s easy to see that mutation of the variable “workQueue” is essential to get this to work; if we couldn’t change the content of “workQueue” then we couldn’t send the message. It’s also easy to see that we now have a huge number of implementation choices: Do we lock on the queue, or have a separate lock? What’s the shortest possible time we can hold the lock for (to avoid other threads being blocked when they want to write to the queue)? How long do we sleep for before polling the queue? Too short a time and we risk wasting too much processor time polling the queue; too long a time and we risk the queue becoming unresponsive because the worker wastes too much time sleeping when there’s work to be done.

In pure functional programming there are no variables or mutation, so the above scenario simply isn’t possible. Sure, F# isn’t a pure functional language (actually most functional languages aren’t), so you can indeed use mutable data structures to implement something similar to the C# fragment we showed earlier, but that’s not the point: we want to learn how to use immutable data structures.

To fully understand the limitations of immutable data structures, let’s look at another C# example that does something simpler. Imagine that we want to compute a key for a value and then store it in a member variable, a dictionary in this case, for later use:

Dictionary<string, string> myDict = new Dictionary<string, string>();

public void ReceiveValue(string val)
{
    myDict.Add(ComputeKey(val), val);
}

Now let’s think about how we can translate this into F#.
Firstly, if we don’t mind being dirty and mutable we can translate this fragment verbatim:

type Store() =
    let myDict = new Dictionary<string,string>()
    member x.ReceiveValue (value:string) =
        myDict.Add(x.ComputeKey value, value)

However, if we don’t want to be mutable it’s not quite so straightforward. F# contains a type called “Map”, which is very similar to a Dictionary except that it is immutable. When you add a new item to a map you don’t change the map; you create a new version of the map with the new key added. So here is how our store class would look if we used an immutable “Map” data structure:

type ComputeKeys(myDict:Map<string,string>) =
    member x.ReceiveValue (value:string) =
        new ComputeKeys(myDict.Add(x.ComputeKey value, value))

The important thing to notice is that we now have no “let” definition where we store our dictionary; instead the dictionary is passed to the class constructor. So our constructor receives a “Map”, and when we use our “ReceiveValue” method we create a new instance of “ComputeKeys” which contains the newly created map. I think the type signature really helps us understand what’s going on:

type ComputeKeys =
  class
    new : myDict:Map<string,string> -> ComputeKeys
    member ReceiveValue : value:string -> ComputeKeys
  end

This is pretty much the revelation of immutable data structures: “let” definitions become merely convenient short names for values, not memory locations that can be updated at a later date if we want to. These new values are all held on the thread’s stack; if we’re being pure and fully immutable then we have no memory locations that we can write them to.
Okay, let’s have a look at how we might use these two classes:

open System.Collections.Generic
open System.IO

/// wraps a Dictionary to provide
/// some hashing and printing functions
type Store() =
    // the dictionary that stores the values
    let myDict = new Dictionary<string,string>()
    /// receive a value, hash it, store it
    member x.ReceiveValue (value:string) =
        myDict.Add(x.ComputeKey value, value)
    /// computes the hash (a bit naff for now)
    member x.ComputeKey (value:string) =
        value.GetHashCode().ToString()
    /// prints the stored values
    override x.ToString() =
        let stringWriter = new StringWriter()
        for key in myDict.Keys do
            stringWriter.WriteLine("{0}: {1}", key, myDict.[key])
        stringWriter.ToString()

let useStore() =
    let store = new Store()
    store.ReceiveValue("One")
    store.ReceiveValue("Two")
    store.ReceiveValue("Three")
    printfn "%s" (store.ToString())

The mutable version needs little explanation; it is classical imperative programming. We create an instance of the store, then add values to it, and finally we print them out. Now compare this with the immutable version:

/// wraps a Map to provide
/// some hashing and printing functions
type ComputeKeys(myDict:Map<string,string>) =
    /// receive a value, hash it, return the new value
    member x.ReceiveValue (value:string) =
        new ComputeKeys(myDict.Add(x.ComputeKey value, value))
    /// computes the hash (a bit naff for now)
    member x.ComputeKey (value:string) =
        value.GetHashCode().ToString()
    /// prints values in the map
    /// (uses Map.fold, the current spelling of the older myDict.Fold)
    override x.ToString() =
        Map.fold
            (fun acc key value -> sprintf "%s \r\n%s: %s" acc key value)
            "" myDict

let useComputeKeys() =
    let keysEmpty = new ComputeKeys(Map.empty)
    let keysOne = keysEmpty.ReceiveValue("One")
    let keysTwo = keysOne.ReceiveValue("Two")
    let keysThree = keysTwo.ReceiveValue("Three")
    printfn "%s" (keysThree.ToString())

The thing to notice here is how similar using the immutable ComputeKeys class is to using the Store class. We create an instance of the class, we add values to it, and then finally we print it.
The only difference is that we need to catch the value returned from ReceiveValue and use this value in the next step. Here we’ve used different names for each instance, to illustrate that each let binding refers to a different instance, but we don’t need to do that; we can reuse the same name to save inventing new names:

let useComputeKeysAlt() =
    let keys = new ComputeKeys(Map.empty)
    let keys = keys.ReceiveValue("One")
    let keys = keys.ReceiveValue("Two")
    let keys = keys.ReceiveValue("Three")
    printfn "%s" (keys.ToString())

The takeaway from this is that programming with immutable data structures when we have one thread of execution is not that different from programming with imperative mutable structures; we just have to remember that every time we want to make a change we copy and add rather than update.

Wrapping It Up

In this introductory post we’ve looked at why mutation is important to classical concurrent programming, and indeed classical imperative programming. Then we looked at immutable data structures and compared the way that they work to mutable data structures. In the next post we’ll dig deeper into immutable data structures, to really get a feel for the programming possibilities they offer. Then in the post after that we’ll look at concurrent programming with immutable data structures and finally get to grips with the problem we posed ourselves in the first couple of paragraphs of this post. Patience is a virtue and good things come to those who wait :)
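For readers who do not use F#, the same "copy and add rather than update" pattern can be sketched in Python. This is an illustration under stated assumptions, not a port of the author's code: Python has no built-in persistent map, so the sketch copies a dict on every change and exposes it through a read-only MappingProxyType, and it swaps GetHashCode for zlib.crc32 so that keys are stable between runs. The names compute_key, receive_value and values are hypothetical.

```python
import zlib
from types import MappingProxyType


class ComputeKeys:
    """Read-only store: receive_value returns a new instance, never mutates."""

    def __init__(self, entries=None):
        # Copy the input so later changes by the caller cannot leak in,
        # then wrap it in a read-only view.
        self._entries = MappingProxyType(dict(entries or {}))

    @staticmethod
    def compute_key(value):
        # Assumption: CRC32 stands in for the article's GetHashCode.
        return str(zlib.crc32(value.encode("utf-8")))

    def receive_value(self, value):
        # Copy, add the new entry, and hand back a fresh instance.
        updated = dict(self._entries)
        updated[self.compute_key(value)] = value
        return ComputeKeys(updated)

    def values(self):
        return sorted(self._entries.values())

    def __len__(self):
        return len(self._entries)


keys_empty = ComputeKeys()
keys_one = keys_empty.receive_value("One")
keys_two = keys_one.receive_value("Two")

print(len(keys_empty), len(keys_one), len(keys_two))
print(keys_two.values())
```

Each earlier instance is left untouched by later calls, which is the property the article relies on. Unlike F#'s Map, this sketch copies the whole dictionary on every change, so it shows the programming model rather than an efficient implementation.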

May 20, 2008

by Robert Pickering

· 8,125 Views

Python and the Star Schema

The star schema represents data as a table of facts (measurable values) that are associated with the various dimensions of the fact. Common dimensions include time, geography, organization, product and the like. I'm working with some folks whose facts are a bunch of medical test results, and the dimensions are patient, date, and the facility in which the tests were performed.

I got an email with the following situation: "a client who is processing gigs of incoming fact data each day and they use a host of C/C++, Perl, mainframe and other tools for their incoming fact processing and I've seriously considered pushing Python in their organization."

Here are my thoughts on using Python for data warehousing when you've got gigabytes of data daily.

Small Dimensions

The pure Python approach only works when your dimension will comfortably fit into memory -- not a terribly big problem with most dimensions. Specifically, it doesn't work well for those dimensions which are so huge that the dimensional model becomes a snowflake instead of a simple star. When dealing with a large number of individuals (public utilities, banks, medical management, etc.) the "customer" (or "patient") dimension gets too big to fit into memory. Special bridge-table techniques must be used. I don't think Python would be perfect for this, since this involves slogging through a lot of data one record at a time.

However, Python is considerably faster than PL/SQL. I don't know how it compares with Perl. In my experience a program running outside the database will be faster than a SQL procedure, because there's no RDBMS overhead.

For all small dimensions:

1. Load the dimension values from the RDBMS into a dict with a single query.
2. Read all source data records (ideally from a flat file); conform the dimension, tracking changes; write a result record with the dimension FK information to a flat file.
3. Iterate through the dimension dictionary and persist the dimension changes.
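The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's loader: the dimension is assumed to be already loaded into a dict keyed by its identifying fields, each dimension row carries a surrogate key under "id", only brand-new members are tracked (no attribute change detection), and the facility data and the names conform_small_dimension and facility_fk are hypothetical.

```python
import csv
import io


def conform_small_dimension(dimension, source_rows, key_fields, fk_name):
    """Attach a dimension foreign key to every source row.

    dimension maps an identifying tuple to a row dict that carries a
    surrogate key under "id". Unknown members are added in memory and
    also returned so the caller can persist them afterwards.
    """
    next_id = max((row["id"] for row in dimension.values()), default=0) + 1
    added = []
    conformed = []
    for src in source_rows:
        ident = tuple(src[field] for field in key_fields)
        dim_row = dimension.get(ident)
        if dim_row is None:
            # New dimension member: assign the next surrogate key.
            dim_row = {"id": next_id}
            dim_row.update((field, src[field]) for field in key_fields)
            dimension[ident] = dim_row
            added.append(dim_row)
            next_id += 1
        out = dict(src)
        out[fk_name] = dim_row["id"]
        conformed.append(out)
    return conformed, added


# Hypothetical data: one known facility, and a flat file that mentions a new one.
dimension = {("North Clinic",): {"id": 1, "facility": "North Clinic"}}
source = io.StringIO("facility,result\nNorth Clinic,4.2\nEast Lab,5.1\nEast Lab,4.8\n")
conformed, added = conform_small_dimension(
    dimension, csv.DictReader(source), ["facility"], "facility_fk"
)

print([row["facility_fk"] for row in conformed])
print(added)
```

In a real load the conformed rows would be written back out to a flat file and the added rows persisted to the database in one batch at the end.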
The details vary with the Slowly Changing Dimension (SCD) rules you're using. The conformance algorithm is essentially the following:

row = Dimension(...)
ident = ( row.field, row.field, row.field, ... )
dimension.setdefault( ident, row )

In some cases (like the Django ORM) this is called the get-or-create query.

The Dimension Bus

For BIG dimensions, I think you still have to implement the "dimension bus" outlined in The Data Warehouse Toolkit. To do this in Python, you should probably design things to look something like the following.

For any big dimensions, use an external sort-merge utility. Seriously. They're way fast for data sets too large to fit into memory. Use CSV format files and the resulting program is very tidy. The outline is as follows:

First, sort the source data file into order by the identifying fields of the big dimension (customer number, patient number, whatever).

Second, query the big dimension into a data file and sort it into the same order as the source file. (Using the SQL ORDER BY may be slower than an external sort; only measurements can tell which is faster.)

Third, do a "match merge" to locate the differences between the dimension and the source. Don't use a utility like diff, it's too slow. This is a simple key matching between two files. The match-merge loop looks something like this.

src = next(sourceFile)
dim = next(dimensionFile)
try:
    while True:
        src_key = ( src['field'], src['field'], ... )
        dim_key = ( dim['field'], dim['field'], ... )
        if src_key < dim_key:
            # missing some dimension values
            update_dimension( src )
            src = next(sourceFile)
        elif dim_key < src_key:
            # extra dimension values
            try:
                dim = next(dimensionFile)
            except StopIteration:
                # the src row in hand is past the last dimension row
                update_dimension( src )
                raise
        else:
            # src and dim keys match
            # check non-key attributes for dimension change.
            src = next(sourceFile)
except StopIteration:
    # if source is at end-of-file, that's good, we're done.
    # if dim is at end of file, all remaining src rows are dimension updates.
    for src in sourceFile:
        update_dimension( src )

At the end of this pass, you'll accumulate a file of customer dimension adds and changes, which is then persisted into the actual customer dimension in the database. This pass will also write new source records with the customer FK. You can also handle demographic or bridge tables at this time, too.

Fact Loading

The first step in DW loading is dimensional conformance. With a little cleverness the above processing can all be done in parallel, hogging a lot of CPU time. To do this in parallel, each conformance algorithm forms part of a large OS-level pipeline. The source file must be reformatted to leave empty columns for each dimension's FK reference. Each conformance process reads in the source file and writes out the same format file with one dimension FK filled in. If all of these conformance algorithms form a simple OS pipe, they all run in parallel. It looks something like this.

src2csv source | conform1 | conform2 | conform3 | load

At the end, you use the RDBMS's bulk loader (or write your own in Python, it's easy) to pick the actual fact values and the dimension FK's out of the source records that are fully populated with all dimension FK's and load these into the fact table.

I've written conformance processing in Java (which is faster than Python) and had to give up on SQL-based conformance for large dimensions. Instead, we did the above flat-file algorithm to merge large dimensions. The killer isn't the language speed, it's the RDBMS overheads. Once you're out of the database, things blaze. Indeed, products like the syncsort data sort can do portions of the dimension conformance at amazing speeds for large datasets.

Hand Wringing

"But," the hand-wringers say, "aren't you defeating the value of the RDBMS by working outside it?" The answer is NO. We're not doing incremental, transactional processing here. There aren't multiple update transactions in a warehouse. There are queries and there are bulk loads.
Doing the prep-work for a bulk load outside the database is simply more efficient. We don't need locks, rollback segments, memory management, threading, concurrency, ACID rules or anything. We just need to match-merge the large dimension and the incoming facts.
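As a companion to the match-merge outline in this article, here is a self-contained sketch that runs on small in-memory lists. It is one possible arrangement, not the author's program: the function name match_merge, the key callable, and the patient data are assumptions made for illustration, and real inputs would be the two pre-sorted files read row by row (for example with csv.DictReader). It uses next() with a None sentinel instead of catching StopIteration, so the row in hand is never lost when the dimension side runs out first.

```python
def match_merge(source_rows, dimension_rows, key):
    """Match two inputs that are already sorted by the same key.

    Returns (matched, missing): matched pairs each source row with its
    dimension row; missing lists source rows with no dimension row, i.e.
    the rows that would be passed to update_dimension.
    """
    source = iter(source_rows)
    dimension = iter(dimension_rows)
    matched, missing = [], []
    src = next(source, None)
    dim = next(dimension, None)
    while src is not None:
        if dim is None or key(src) < key(dim):
            # No dimension row for this source key.
            missing.append(src)
            src = next(source, None)
        elif key(dim) < key(src):
            # Dimension row with no source data in this batch; skip it.
            dim = next(dimension, None)
        else:
            # Keys match; several source rows may share one dimension row.
            matched.append((src, dim))
            src = next(source, None)
    return matched, missing


# Hypothetical, pre-sorted inputs.
source_rows = [
    {"patient": "P001"}, {"patient": "P002"}, {"patient": "P004"},
    {"patient": "P004"}, {"patient": "P009"},
]
dimension_rows = [
    {"patient": "P001", "id": 1},
    {"patient": "P003", "id": 2},
    {"patient": "P004", "id": 3},
]

matched, missing = match_merge(source_rows, dimension_rows, key=lambda r: r["patient"])
print([dim["id"] for _, dim in matched])
print([src["patient"] for src in missing])
```

Both inputs are consumed strictly in order and only one row from each is held at a time, which is why the approach keeps working when the dimension is too large to fit in memory.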

May 20, 2008

by Steven Lott

· 11,292 Views · 1 Like

5 Techniques for Creating Java Web Services From WSDL

WSDL (Web Services Description Language) is an XML-based language for describing web services. In this post, we'll learn how to use it alongside the Java language.

April 29, 2008

by Milan Kuchtiak

· 604,524 Views