Data Engineering Resources

The Latest Data Engineering Topics

How to create/restore a slave using GTID replication in MySQL 5.6

This post comes from Miguel Angel Nieto at the MySQL Performance Blog. MySQL 5.6 is GA! Now we have new things to play with and in my personal opinion the most interesting one is the new Global Transaction ID (GTID) support in replication. This post is not an explanation of what is GTID and how it works internally because there are many documents about that: http://dev.mysql.com/doc/refman/5.6/en/replication-gtids-concepts.html One thing that worths to mention is that if you want GTID support log_slave_updates will need to be enabled in slave server and the performance impact should be taken in account. Anyway, this post tends to be more practical, we will see how to create/restore new slaves from a master using GTID. How to set up a new slave The first thing that we need to know is that now Binary Logs and Position are not needed anymore with GTID enabled. Instead we need to know in which GTID is the master and set it on the slave. MySQL keeps two global variables with GTID numbers on it: gtid_executed: it contains a representation of the set of all transaction logged in the binary log gtid_purged: it contains a representation of the set of all transactions deleted from the binary log So now, the process is the following: take a backup from the master and store the value of gtid_executed restore the backup on the slave and set gtid_purged with the value of gtid_executed from the master The new mysqldump can do those tasks for us. Let’s see an example of how to take a backup from the master and restore it on the slave to set up a new replication server. master > show global variables like 'gtid_executed'; +---------------+-------------------------------------------+ | Variable_name | Value | +---------------+-------------------------------------------+ | gtid_executed | 9a511b7b-7059-11e2-9a24-08002762b8af:1-13 | +---------------+-------------------------------------------+ master > show global variables like 'gtid_purged'; +---------------+------------------------------------------+ | Variable_name | Value | +---------------+------------------------------------------+ | gtid_purged | 9a511b7b-7059-11e2-9a24-08002762b8af:1-2 | +---------------+------------------------------------------+ Now we take a backup with mysqldump from the master: # mysqldump --all-databases --single-transaction --triggers --routines --host=127.0.0.1 --port=18675 --user=msandbox --password=msandbox > dump.sql It will contain the following line: # grep PURGED dump.sql SET @@GLOBAL.GTID_PURGED='9a511b7b-7059-11e2-9a24-08002762b8af:1-13'; Therefore during the dump recover process on the slave it will set GTID_PURGED to the GTID_EXECUTED value from the master. So now, we just need to recover the dump and start the replication: slave1 > show global variables like 'gtid_executed'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | gtid_executed | | +---------------+-------+ slave1 > show global variables like 'gtid_purged'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | gtid_purged | | +---------------+-------+ slave1 > slave1> source test.sql; [...] slave1 > show global variables like 'gtid_executed'; +---------------+-------------------------------------------+ | Variable_name | Value | +---------------+-------------------------------------------+ | gtid_executed | 9a511b7b-7059-11e2-9a24-08002762b8af:1-13 | +---------------+-------------------------------------------+ slave1 > show global variables like 'gtid_purged'; +---------------+-------------------------------------------+ | Variable_name | Value | +---------------+-------------------------------------------+ | gtid_purged | 9a511b7b-7059-11e2-9a24-08002762b8af:1-13 | +---------------+-------------------------------------------+ The last step is to configure the slave using the auto-configuration method of GTID: slave1 > CHANGE MASTER TO MASTER_HOST="127.0.0.1", MASTER_USER="msandbox", MASTER_PASSWORD="msandbox", MASTER_PORT=18675, MASTER_AUTO_POSITION = 1; How to restore a slave in a bad and fast way Let’s imagine that our slave has been down for several days and the binary logs from the master have been purged. This is the error we are going to get: Slave_IO_Running: No Slave_SQL_Running: Yes Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.' So, let’s try to solve it. First we have the bad and fast way, that is, point to another GTID that the master has in the binary logs. First, we get the GTID_EXECUTED from the master: master > show global variables like 'GTID_EXECUTED'; +---------------+-------------------------------------------+ | Variable_name | Value | +---------------+-------------------------------------------+ | gtid_executed | 9a511b7b-7059-11e2-9a24-08002762b8af:1-14 | +---------------+-------------------------------------------+ And we set it on the slave: slave> set global GTID_EXECUTED="9a511b7b-7059-11e2-9a24-08002762b8af:1-14" ERROR 1238 (HY000): Variable 'gtid_executed' is a read only variable Error! Remember, we get the GTID_EXECUTED from the master and set is as GTID_PURGED on the slave. slave1 > set global GTID_PURGED="9a511b7b-7059-11e2-9a24-08002762b8af:1-14"; ERROR 1840 (HY000): GTID_PURGED can only be set when GTID_EXECUTED is empty. Error again, GTID_EXECUTED should be empty before changing GTID_PURGED manually but we can’t change it with SET because is a read only variable. The only way to change it is with reset master (yes, on a slave server): slave1> reset master; slave1 > show global variables like 'GTID_EXECUTED'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | gtid_executed | | +---------------+-------+ slave1 > set global GTID_PURGED="9a511b7b-7059-11e2-9a24-08002762b8af:1-14"; slave1> start slave io_thread; slave1> show slave status\G [...] Slave_IO_Running: Yes Slave_SQL_Running: Yes [...] Now, if you don’t get any error like primary/unique key duplication then you can run the pt-table-checksum and pt-table-sync. How to restore a slave in a good and slow way The good way is mysqldump again. We take a dump from the master like we saw before and try to restore it on the slave: slave1 [localhost] {msandbox} ((none)) > source test.sql; [...] ERROR 1840 (HY000): GTID_PURGED can only be set when GTID_EXECUTED is empty. [...] Wop! It is important to mention that these kind of error messages can dissapear on the shell buffer because the restore of the dump will continue. Be cautious. Same problem again so same solution too: slave1> reset master; slave1> source test.sql; slave1> start slave; slave1> show slave status\G [...] Slave_IO_Running: Yes Slave_SQL_Running: Yes [...] Conclusion With the new GTID we need to change our minds. Now binary log and position is not something we need to take in account, gtid_executed and gtid_purged are our new friends. Xtrabackup still doesn’t support it but we are working on it. I will update this post and create a one when we publish a xtrabackup version with full support of GTID.

February 11, 2013

by Peter Zaitsev

· 18,747 Views

Building SOLID Databases: Liskov Substitution Weirdness

if the open/closed principle starts looking a little strange as applied to object-relational design, the liskov substitution principle applies in almost the exactly opposite way in the database as in the applications. the reason become only clear once exploring this principle, seeing the limits on it, and investigating the realities of database development. the canonical examples of lsp violations in application programs do not violate the principle at all in the database. as always the reason is that databases model different things than applications and so their constraints are different. the formal definition of the liskov substitution principle let be a property provable about objects of type . then should be provable for objects of type where is a subtype of . the liskov substitution principle in application programming the liskov substitution principle is important because it ensures provability of state and behavior for subtypes. this means you have to look at the full constraints that operate on a class method (excluding constructors i think, which operate under a different set of constraints). in general for the lsp to be satisfied, preconstraints cannot be weakened, post-constraints cannot be strengthened, and invariants must be maintained not only in inherited methods but also non-inherited ones (this is broad enough to include the history constraint). the typical problem this addresses is the square-rectangle problem, which is a well-understood problem in object-oriented programming, and one which calls into question certain promises of oop as a whole. the square-rectangle problem establishes a problem of irreducible complexity in object-oriented programming. the typical examples are either "a square is-a rectangle" or "a circle is-a elipse." mathematically both of these statements are true but implementing them in an object-oriented framework adds a great deal of complexity. the reason is that the mathematical hierarchy has no concept of a change of state which preserves invariances, and so the object hierarchy ends up being complicated by the need to shim in what is notably lacking in the geometric hierarchy. this leads to either an explosion of abstract classes, an explosion of object metadata (can this rectangle be stretched -- always 'no' for a square)? can it be scaled?), or a loss of the sort of functionality we typically want from inheritance. on the database level, these problems mostly don't exist in relational designs but they do exist in object relational designs, only differently (more on that below). the liskov substitution principle in application programming primarily operates primarily to preserve assumptions regarding state changes in an object when an interface is called. if a square (defined as a rectangle where x and y measurements are the same) can be stretched, it is no longer a square. on the other hand, if we can't stretch it, it may not fill the contract of the rectangle. when a square is-a rectangle subtype the square-rectangle problem can be turned on its head to some extent with the question of when this is-a relationship is valid. in fact the is-a relationship is valid for immutable objects. it is perfectly valid to have a class immutablesquare be a subclass of immutablerectangle. secondly the subtyping can be valid if sufficient information is attached to the class to specify the invariants. for example we might have a class with attributes "can_stretch" and "can_scale" where setting can_stretch off (as it would always be in a square) ensures that length to width proportions are always preserved. the problem here is that the subclass invariances require support in the superclass and this leads to a lot of complexity in implementing the superclass, as well as fundamental limits of what can then be subclassed. a look at database limitations in a normal relational database, the lsp is always met. domain-level constraints only apply to storage and not to calculation outputs, for example. in database systems where tables instantiate types and substitutability is not anticipated (oracle, db2), then the lsp applies in the same way it does in application programming. in the cases of table inheritance (informix, postgresql), things start to get a little weird. for the purpose of stored data, the lsp is generally satisfied because constraints are inherited and strengthened. a subtype is simply a sub-domain of the parent type possibly with some additional attributes. the square-rectangle problem in object-relational databases the square-rectangle problem is only even a potential problem in object-relational databases. purely relational designs can never run into this issue. in object-relational designs a few very specific issues can occur depending entirely on how object-relational functionality is implemented at the database level. in databases like oracle and db2, the lsp applies pretty much as is. in informix and postgresql, the problems faced are actually somewhat different and lead to new classes of anomilies that need to be considered. these include most prominently update anomilies when parent tables are updated. these do not necessarily have easy workarounds. consider the following table structure: create table my_rectangle ( id serial primary key, height numeric, width numeric ); create table my_square ( check (height = width) ) inherits (my_rectangle); now, we may insert a bunch of squares and rectangles, and we could have checks. moreover we could have custom triggers to verify referential integrity for rows referencing my_rectangle and my_square. so suppose we use update on my_rectangle to double the height of every rectangle in the system: update my_rectangle set height = height * 2; oops, we can't do that. while rows in my_rectangle will happily be adjusted, rows in the child table my_square will not, and you will get an error. this error can be avoided through a few issues, each of which has problems: disallow updates, only allow inserts and deletes. this makes referential integrity harder to manage when the row must be deleted and re-inserted. use a trigger to rewrite an update to be a delete plus insert into either the current or parent table depending on constraints. this makes ri harder to manage in that the procedure must disable custom ri triggers before doing this. this would require that the procedure be aware of all custom ri triggers in order to do this. conclusions the liskov substitution principle depends quite a bit on specifics of how an object-relational database system intersects tables and objects for how it applies. in general in a purely relational design you will never need to worry about it, but in an object-relational design there are some nasty corner cases that can come up, particularly where a query may operate on heterogenous subtypes as a set. this is, perhaps, the only one of the solid principles which is o-r specific and it hits the db in different ways than the application because the db operates on different principles.

February 11, 2013

by Chris Travers

· 4,675 Views

Java 8: From PermGen to Metaspace

As you may be aware, the JDK 8 Early Access is now available for download. This allows Java developers to experiment with some of the new language and runtime features of Java 8. One of these features is the complete removal of the Permanent Generation (PermGen) space which has been announced by Oracle since the release of JDK 7. Interned strings, for example, have already been removed from the PermGen space since JDK 7. The JDK 8 release finalizes its decommissioning. This article will share the information that we found so far on the PermGen successor: Metaspace. We will also compare the runtime behavior of the HotSpot 1.7 vs. HotSpot 1.8 (b75) when executing a Java program “leaking” class metadata objects. The final specifications, tuning flags and documentation around Metaspace should be available once Java 8 is officially released. Metaspace: A new memory space is born The JDK 8 HotSpot JVM is now using native memory for the representation of class metadata and is called Metaspace; similar to the Oracle JRockit and IBM JVM's. The good news is that it means no more java.lang.OutOfMemoryError: PermGen space problems and no need for you to tune and monitor this memory space anymore…not so fast. While this change is invisible by default, we will show you next that you will still need to worry about the class metadata memory footprint. Please also keep in mind that this new feature does not magically eliminate class and classloader memory leaks. You will need to track down these problems using a different approach and by learning the new naming convention. I recommend that you read the PermGen removal summary and comments from Jon on this subject. In summary: PermGen space situation This memory space is completely removed. The PermSize and MaxPermSize JVM arguments are ignored and a warning is issued if present at start-up. Metaspace memory allocation model Most allocations for the class metadata are now allocated out of native memory. The klasses that were used to describe class metadata have been removed. Metaspace capacity By default class metadata allocation is limited by the amount of available native memory (capacity will of course depend if you use a 32-bit JVM vs. 64-bit along with OS virtual memory availability). A new flag is available (MaxMetaspaceSize), allowing you to limit the amount of native memory used for class metadata. If you don’t specify this flag, the Metaspace will dynamically re-size depending of the application demand at runtime. Metaspace garbage collection Garbage collection of the dead classes and classloaders is triggered once the class metadata usage reaches the “MaxMetaspaceSize”. Proper monitoring & tuning of the Metaspace will obviously be required in order to limit the frequency or delay of such garbage collections. Excessive Metaspace garbage collections may be a symptom of classes, classloaders memory leak or inadequate sizing for your application. Java heap space impact Some miscellaneous data has been moved to the Java heap space. This means you may observe an increase of the Java heap space following a future JDK 8 upgrade. Metaspace monitoring Metaspace usage is available from the HotSpot 1.8 verbose GC log output. Jstat & JVisualVM have not been updated at this point based on our testing with b75 and the old PermGen space references are still present. Enough theory now, let’s see this new memory space in action via our leaking Java program… PermGen vs. Metaspace runtime comparison In order to better understand the runtime behavior of the new Metaspace memory space, we created a class metadata leaking Java program. You can download the source here. The following scenarios will be tested: Run the Java program using JDK 1.7 in order to monitor & deplete the PermGen memory space set at 128 MB. Run the Java program using JDK 1.8 (b75) in order to monitor the dynamic increase and garbage collection of the new Metaspace memory space. Run the Java program using JDK 1.8 (b75) in order to simulate the depletion of the Metaspace by setting the MaxMetaspaceSize value at 128 MB. JDK 1.7 @64-bit – PermGen depletion Java program with 50K configured iterations Java heap space of 1024 MB Java PermGen space of 128 MB (-XX:MaxPermSize=128m) As you can see form JVisualVM, the PermGen depletion was reached after loading about 30K+ classes. We can also see this depletion from the program and GC output. Class metadata leak simulator Author: Pierre-Hugues Charbonneau http://javaeesupportpatterns.blogspot.com ERROR: java.lang.OutOfMemoryError: PermGen space Now let’s execute the program using the HotSpot JDK 1.8 JRE. JDK 1.8 @64-bit – Metaspace dynamic re-size Java program with 50K configured iterations Java heap space of 1024 MB Java Metaspace space: unbounded (default) As you can see from the verbose GC output, the JVM Metaspace did expand dynamically from 20 MB up to 328 MB of reserved native memory in order to honor the increased class metadata memory footprint from our Java program. We could also observe garbage collection events in the attempt by the JVM to destroy any dead class or classloader object. Since our Java program is leaking, the JVM had no choice but to dynamically expand the Metaspace memory space. The program was able to run its 50K of iterations with no OOM event and loaded 50K+ Classes. Let's move to our last testing scenario. JDK 1.8 @64-bit – Metaspace depletion Java program with 50K configured iterations Java heap space of 1024 MB Java Metaspace space: 128 MB (-XX:MaxMetaspaceSize=128m) As you can see form JVisualVM, the Metaspace depletion was reached after loading about 30K+ classes; very similar to the run with the JDK 1.7. We can also see this from the program and GC output. Another interesting observation is that the native memory footprint reserved was twice as much as the maximum size specified. This may indicate some opportunities to fine tune the Metaspace re-size policy, if possible, in order to avoid native memory waste. Now find below the Exception we got from the Java program output. Class metadata leak simulator Author: Pierre-Hugues Charbonneau http://javaeesupportpatterns.blogspot.com ERROR: java.lang.OutOfMemoryError: Metadata space Done! As expected, capping the Metaspace at 128 MB like we did for the baseline run with JDK 1.7 did not allow us to complete the 50K iterations of our program. A new OOM error was thrown by the JVM. The above OOM event was thrown by the JVM from the Metaspace following a memory allocation failure. #metaspace.cpp Final words I hope you appreciated this early analysis and experiment with the new Java 8 Metaspace. The current observations definitely indicate that proper monitoring & tuning will be required in order to stay away from problems such as excessive Metaspace GC or OOM conditions triggered from our last testing scenario. Future articles may include performance comparisons in order to identify potential performance improvements associated with this new feature. Please feel free to provide any comment.

February 11, 2013

by Pierre - Hugues Charbonneau

· 598,591 Views · 33 Likes

How Do You Organise Maven Sub-Modules?

Being an itinerant programmer one of the things I've noticed over the years is that every project you come across seems to have a slightly different way of organising its Maven modules. There seems to be no conventional way of characterising the contents of a project's sub-modules and not that much discussion on it either. This is strange, as defining the responsibilities of your Maven modules seems to me to be as critical as good class design and coding technique to a project's success. So, in light of this dearth of wisdom, here's my two penneth worth... When you first come across a new project, you'll generally find a layout convention that vaguely matches that defined by the Better Builds With Maven manual. The 'clean' project directory usually contains a POM file, a src folder and several sub-modules, each in their own subdirectory, as shown in the diagram below: If we all agree that this is the standard way of approaching the top level of project layout (and I have seen it done slightly differently) then there seems to be three different approaches taken when organising the responsibilities of each of a project's sub-modules. These are: Totally haphazardly. By class type. By functional area. I'm not going to linger on those projects that are organised seemingly without any structure or order except to say that they probably started off well organised but were not designed well enough to endure the changes forced upon them. In saying that a project's sub-modules are organised 'by class type', I mean that modules are used to group together all classes that comprise, but are not limited to, a layer in the program's architecture. For example a module could contain all classes that make up the program's service or persistence layers or a module could contain all model class (i.e. beans). Conversely, in saying that a project's sub-modules are organised by functional area I'm talking about a situation where each module contains, as close as possible, a vertical slice of the application, including model beans, service layer, controllers etc. If the truth be told then there are any number of ways to organise your project's sub-modules. Most project set-ups are fairly flat in structure, which is what I've demonstrated above; however, if you take a look at Erik Putrycz's 2009 talk Maven – or how to automate java builds, tests and version management with open source tools, he demonstrates that you can have modules within modules within modules. In order to explore this a little further, I'm going to invent my usual preposterously contrived scenario and in this scenario, you've got to write a program for a Welsh dental practice owned by a man called Jones also known locally as 'Jones The Driller'. The requirements would be pretty standard, I suspect, for a dental practice and would include handling: Patients details: name, address, DOB, phone number etc. Medical records, including treatments and outcomes. Appointments. Accounting, e.g. sales, purchase, wages etc. Auditing: as in who did what to whom... As a solution to Jones The Driller's problem, you propose that you write a multi-module web application based upon Spring, MVC and tomcat that, when assembled, has a standard 'n' tier design of a mySQL database, a database layer, service layer, a set of controllers and some JSPs that comprise the view. In creating your project your idea is to organise your sub-modules 'by class type' and you come up with the following module organisation, shown below roughly in build dependency order dentists-model dentists-utils dentist-repository dentists-services dentists-controllers dentists-web ...which on your screen looks something like this: Your dentists-model module contains the project's beans that model object used from the persistence layer right up to the JSPs. dentists-repository, dentists-services and dentists-controllers reflect the various layers of your application, with dentists-web module containing all the JSPs, CSS and other view paraphernalia. As for dentists-utils, well every project has a utils module where all the really useful, but disparate classes end up. Meanwhile, in a different universe, a different version of you decides to organise your project's sub-modules by functional area and you come up with the following breakdown: dentists-utils dentists-audit dentists-user-details dentists-medical-records dentists-appointments dentists-accounts dentists-repository dentists-integration dentists-web In this scenario, the build order is somewhat different; virtually all modules will depend upon dentists-utils and, depending upon your exact audit requirements, most modules will rely upon dentists-audit. You can also see in the following images that the sub-module package structure has been arranged on layer and type boundaries in that each module has its own model, repository (which contains interface definitions only) services and controller packages and that the layout of each module is identical at the top level. Another discussion to have here is the organisation of your project's package structure, where you can ask the same kind of questions: do you organise 'by class type' or 'by functional area' as shown above? You may have noticed that the dentists-repository modules can be fairly near the end of the build cycle as it only contains the implementation of the repository classes and not their interface definitions. You may have also noticed that dentists-web is again a separate module. This is because you're a pretty savvy business guy and in keeping the JSPs etc. in their own module, you hope to re-skin your app and sell it to that other Welsh dentist down the road: Williams The Puller. From a test perspective, each module contains its own unit tests, but there's a separate integration test module that, as it'll take longer to execute can be run when required. There are generally two ways of defining integration tests: firstly by putting them in their own module, which I prefer, and secondly by using a integration test naming convention such as somefileIT.java, and running all *IT.java files separately to all *Test.java files. Your two identical selves have proposed two different solutions to the same problem, so I guess that it's now time to takes a look at the pros and cons of each. Taking the 'by class type' solution first, what can be said about it? On the plus side, it's pretty maintainable in that you always known where to find stuff. Need a service class? Then that's in the dentist-service module. Also, the build order is very straight forward. On the down side, organisation 'by class type' is prone to problems with circular dependencies creeping in and classes with totally different responsibilities are all mixed up together making it difficult to re-use functionality in other projects without unnecessarily dragging in the who shebang. So, what about the pros and cons of the 'by functional area' approach? To my way of thinking, given the package structure of each module, it's just about as easy to locate a class using this technique as it is when using 'by class type'. The real benefit of using this approach is that it's far simpler to re-use a functional area of code in other projects. For example, I've worked on many projects in different companies and have implemented auditing several times. Each time I implement it I usually do it in roughly the same way, so wouldn't it be good just to reuse the first implementation? Not withstanding code ownership issues... The same idea also applies to dentists-user-details; the requirement to manage names and addresses applies equally as well to a shoe sales web site as it does a dental practice. And the downside? One of the benefits of this approach is that the modules are highly decoupled, but from experience no matter how hard you try, you always end up with more coupling that you'd like. You may have already spotted that both of these proposals are not 100% pure; 'by class type' contains a bit of 'by functionality' and conversely 'by functional area' contains a couple of 'by class type' modules. This may be avoidable, but I'm purposely being pragmatic. As I said earlier you always see a utils module in a project. Furthermore creating a separate database module allows you to change your project's database implementation fairly easily, which may make testing easier in some circumstances and likewise, having a separate web module allows you to re-skin your code should you be lucky enough to sell the same product to multiple customers with their own branding. Finally, one of the unwritten truths in software development is that once you've organised your project into its sub-modules you'll rarely get the opportunity to reorganise and improve them: there usually isn't the time or the political will as doing so costs money; however, it should be remembered that, in Agile terms, project module composition is, like code, a form of technical debt, which if done badly also costs you a lot of cash. It therefore seems a really good idea, as a team, to plan out your project thoroughly before starting to code. So be radical, do some design or have a meeting, you know it'll be worth it in the end.

February 8, 2013

by Roger Hughes

· 28,652 Views · 1 Like

Escaping Solr Query Characters In Python

I’ve been working in some Python Solr client code. One area where bugs have cropped up is in query terms that need to be escaped before passing to Solr. For example, how do you send Solr an argument term with a “:” in it? Or a “(“? It turns out that generally you just put a \ in front of the character you want to escape. So to search for “:” in the “funnychars” field, you would send q=funnychars:\:. Php programmer Mats Lindh has solved this pretty well, using str_replace. str_replace is a convenient, general-purpose string replacement function that lets you do batch string replacement. For example you can do: $matches = array("Mary","lamb","fleece"); $replacements = array("Harry","dog","fur"); str_replace($matches, $replacements,"Mary had a little lamb, its fleece was white as snow"); Python doesn’t quite have str_replace. There is translate which does single character to single character batch replacement. That can’t be used for escaping because the destination values are strings(ie “\:”), not single characters. There’s a general purpose “str_replace” drop-in replacement at this Stackoverflow question: edits =[("Mary","Harry"),("lamb","dog"),("fleece","fur")]# etc.for search, replace in edits: s = s.replace(search, replace) You’ll notice that this algorithm requires multiple passes through the string for search/replacement. This is because that earlier search/replaces may impact later search/replaces. For example, what if edits was this: edits =[("Mary","Harry"),("Harry","Larry"),("Larry","Barry")] First our str_replace will replace Mary with Harry in pass 1, then Harry with Larry in pass 2, etc. It turns out that escaping characters is a narrower string replacement case that can be done more efficiently without too much complication. The only character that one needs to worry about impacting other rules is escaping the \, as the other rules insert \ characters, we wouldn’t want them double escaped. Aside from this caveat, all the escaping rules can be processed from a single pass through the string which my solution below does, performing a heck of a lot faster: # These rules all independent, order of# escaping doesn't matter escapeRules ={'+':r'\+','-':r'\-','&':r'\&','|':r'\|','!':r'\!','(':r'$',')':r'$','{':r'\{','}':r'\}','[':r'\[',']':r'\]','^':r'\^','~':r'\~','*':r'\*','?':r'\?',':':r'\:','"':r'\"',';':r'\;',' ':r'\ '}defescapedSeq(term):""" Yield the next string based on the next character (either this char or escaped version """forcharin term:ifcharin escapeRules.keys():yield escapeRules[char]else:yieldchardefescapeSolrArg(term):""" Apply escaping to the passed in query terms escaping special characters like : , etc""" term = term.replace('\\',r'\\')# escape \ firstreturn"".join([nextStr for nextStr in escapedSeq(term)]) Aside from being a good general solution to this problem, in some basic benchmarks, this turns out to be about 5 orders of magnitude faster than doing it the naive way! Pretty cool, but you’ll probably rarely notice the difference. Nevertheless it could matter in specialized cases if you are automatically constructing and batching large/gnarly queries that require a lot of work to escape. Anyway, enjoy! I’d love to hear what you think!

February 8, 2013

by Doug Turnbull

· 4,778 Views

Hbase Error: Region is not online: -ROOT-„0

If you are running HBase and commands are giving you an error that looks like this: Fri Oct 05 21:45:02 UTC 2012, org.apache.hadoop.hbase.client.ScannerCallable@74f2db2d, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region is not online: -ROOT-„0 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2859) at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2071) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364) at org.apache.hadoop.hbase.ipc. HBaseServer$Handler.run(HBaseServer.java:1345) Your HBase master is failing to start because Zookeeper is giving it an incorrect location for where the -ROOT- table is located. If you go into the HBase webui, you’ll see “Assigning ROOT region” as the status of the master node on startup. To fix this: $ hbase zkcli zookeeper_cli> rmr /hbase/root-region-server Then restart your master node, and you should be fine.

February 7, 2013

by George London

· 12,868 Views

How to Compress and Uncompress a Java Byte Array Using Deflater/Inflater

Here is a small code snippet which shows an utility class that offers two methods to compress and extract a Java byte array.

February 6, 2013

by Ralf Quebbemann

· 135,176 Views · 2 Likes

Local WebHooks with Mule Cloud Connect and LocalTunnel v2

When using an external API for WebHooks or Callbacks as discussed in Chapters 3 and 5 of Getting Started with Mule Cloud Connect; The API provider running somewhere out there on the web needs to callback your application that is happily running in isolation on your local machine. For an API provider to callback your application, the application must be accessible over the web. Sure, you could upload and test your application on a public facing server, but you may find it quicker and easier to work on your local development machine and these are typically behind firewalls, NAT, or otherwise not able to provide a public URL. You need a way to make your local application available over the web. There are a few good services and tools out there to help with this. Examples include ProxyLocal, and Forward.io. Alternatively, you can set up your own reverse SSH Tunnel if you already have a remote system to forward your requests, but this is cumbersome to say the least. I find Localtunnel to be an excellent fit for this need and localtunnel have just recently released v2 of its service with a host of new features and enhancements. More information can be found here: http://progrium.com/blog/2012/12/25/localtunnel-v2-available-in-beta/ Installing Localtunnel Those familiar with version 1 of the service will know that the v1 Localtunnel client was written in Ruby and required Rubygems to install it. The v2 client is now written in Python and can instead be installed via easy_install or pip. If instead you're interested in using Localtunnel v1, then I have wrote a previous blog post on the subject here: http://blogs.mulesoft.org/connector-callback-testing-local/ To get started, you will first need to check that you have Python installed. Localtunnel requires Python 2.6 or later. Most systems come with Python installed as standard, but if not you can check via the following command: $ python -version More info on installing Python can be found here: http://wiki.python.org/moin/BeginnersGuide/Download Once complete, you will need easy_install to install the Localtunnel client.If you don't have easy_install after you install Python, you can install it with this bootstrap script: $ curl http://peak.telecommunity.com/dist/ez_setup.py | python Once complete, you can install the Localtunnel client using the following command: $ easy_install localtunnel First run with LocalTunnel Once installed, creating a tunnel is as simple as running the following command: $ localtunnel-beta 8082 The parameter after the command: "8000" is the local port we want Localtunnel to forward to. So whatever port your app is running on should replace this value. Each time you run the command you should get output similar to the following: Port 8082 is now accessible from http://fb0322605126.v2.localtunnel.com ... Note: As v2 is still in beta; the command local-tunnel-beta will eventually be installed as just localtunnel. This lets you keep the v1 just in case anything goes wrong with v2 during the beta. Configuring the Connector Now onto Mule! To demonstrate I will use the Twilio Cloud Connector example from Chapter 5. Twilio has an awesome WebHook implementation with great debugging tools. Twilio uses callbacks to tell you about the status of your requests; When you use Twilio to a place a phone call or send an SMS the Twilio API allows you to send a URL where you'll receive information about the phone call once it ends or the status of the outbound SMS message after it's processed. This example uses the Twilio Cloud Connector to send a simple SMS message. The most important thing to note is that the "status-callback-flow-ref" attribute. All connector operations that support callback's will have an optional attribute ending in "-flow-ref". In this case : "status-callback-flow-ref". As the name suggests, this attribute should reference a flow. This value must be a valid flow id from within your configuration. It is this flow that will be used to listen for the callback. Notice that the flow has no inbound endpoint? This is where the magic happens; when Twilio process the SMS message it will send a callback automatically to that flow without you having to define an inbound endpoint. The connector automatically generates an inbound endpoint and sends the auto generated URL to Twilio for you. Customizing the Callback The URL generated for the callback URL is built using 'localhost' as the host, the 'http.port' environment variable or 'localPort' value as the port and the path of the URL is typically just a random generated string or static value. So if I run this locally it would send Twilio my non public address, something like: http://localhost:80/...vv3v3er342fvvn. Each connector that accepts HTTP callbacks will provide you with an optional http-callback-config child element to override these settings. These settings can be set at the connector's config level as follows: Here we have amended the previous example to add the additonal http-callback-config configuration. The configuration takes three additional arguments: domain, localPort and remotePort. These settings will be used to constuct the URL that is passed to the external system. The URL will be the same as the default generated URL of the HTTP inbound-endpoint except that the host is replaced by the 'domain' setting (or its default value) and the port is replaced by the 'remotePort' setting (or its default value). In this case we have used the domain from the URL that Localtunnel generated for us earlier: fb0322605126.v2.localtunnel.com and set the localPort to 8082 as we run the Localtunnel command using port 8082 and the remotePort to 80 as the localtunnel server just runs on port 80. And that's it! If you run this configuration you should start seeing your callback being printed to the console. The same goes for any OAuth connectors too. If your using any OAuth connectors built using the DevKit OAuth modules, you can configure the OAuth callback in a similar fashion. A full Mule/Twilio WebHook project can be found here: https://github.com/ryandcarter/GettingStarted-MuleCloudConnect-OReilly/tree/master/chapter05/twilio-webhooks

February 5, 2013

by Ryan Carter

· 4,951 Views

Drools Decision Tables with Camel and Spring

As I've shown it in my previous post JBoss Drools are a very useful rules engine. The only problem is that creating the rules in the Rule language might be pretty complicated for a non-technical person. That's why one can provide an easy way for creating business rules - decision tables created in a spreadsheet! In the following example I will show you a really complicated business rule example converted to a decision table in a spreadsheet. As a backend we will have Drools, Camel and Spring. To begin with let us take a look at our imaginary business problem. Let us assume that we are running a business that focuses on selling products (either Medical or Electronic). We are shipping our products to several countries (PL, USA, GER, SWE, UK, ESP) and depending on the country there are different law regulations concerning the buyer's age. In some countries you can buy products when you are younger than in others. What is more depending on the country from which the buyer and the product comes from and on the quantity of products, the buyer might get a discount. As you can see there is a substantial number of conditions needed to be fullfield in this scenario (imagine the number of ifs needed to program this :P ). Another problem would be the business side (as usual). Anybody who has been working on a project knows how fast the requirements are changing. If one entered all the rules in the code he would have to redeploy the software each time the requirements changed. That's why it is a good practice to divide the business logic from the code itself. Anyway, let's go back to our example. To begin with let us take a look at the spreadsheets (before that it is worth taking a look at the JBoss website with precise description of how the decision table should look like): The point of entry of our program is the first spreadsheet that checks if the given user should be granted with the possibility of buying a product (it will be better if you download the spreadsheets and play with them from Too Much Coding's repository at Bitbucket: user_table.xls and product_table.xls, or Github user_table.xls and product_table.xls): user_table.xls (tables worksheet) Once the user has been approved he might get a discount: product_table.xls (tables worksheet) product_table.xls (lists worksheet) As you can see in the images the business problem is quite complex. Each row represents a rule, and each column represents a condition. Do you remember the rules syntax from my recent post? So you would understand the hidden part of the spreadsheet that is right above the first visible row: The rows from 2 to 6 represent some fixed configuration values such as rule set, imports ( you've already seen that in my recent post) and functions. Next in row number 7 you can find the name of the RuleTable. Then in row number 8 you have in our scenario either a CONDITION or an ACTION - so in other words either the LHS or rhe RHS respectively. Row number 9 is both representation of types presented in the condition and the binding to a variable. In row number 10 we have the exact LHS condition. Row number 11 shows the label of columns. From row number 12 we have the rules one by one. You can find the spreadsheets in the sources. Now let's take a look at the code. Let's start with taking a look at the schemas defining the Product and the User. Person.xsd User.xsd Due to the fact that we are using maven we may use a plugin that will convert the XSD into Java classes. part of the pom.xml org.apache.maven.plugins maven-compiler-plugin 2.5.1 org.codehaus.mojo jaxb2-maven-plugin 1.5 xjc xjc pl.grzejszczak.marcin.drools.decisiontable.model ${project.basedir}/src/main/resources/xsd Thanks to this plugin we have our generated by JAXB classes in the pl.grzejszczak.marcin.decisiontable.model package. Now off to the drools-context.xml file where we've defined all the necessary beans as far as Drools are concerned: As you can see in comparison to the application context from the recent post there are some differences. First instead of passing the DRL file as the resource inside the knowledge base we are providing the Decision table (DTABLE). I've decided to pass in two seperate files but you can provide one file with several worksheets and access those worksheets (through the decisiontable-conf element). Also there is an additional element called node. We have to choose an implementation of the Node interface (Execution, Grid...) for the Camel route to work properly as you will see in a couple of seconds in the Spring application context file. applicationContext.xml As you can see in order to access the Drools Camel Component we have to provide the node through which we will access the proper knowledge session. We have defined two routes - the first one ends at the Drools component that accesses the users knowledge session and the other the products knowledge session. We have a ProductService interface implementation called ProductServiceImpl that given an input User and Product objects pass them through the Camel's Producer Template to two Camel routes each ending at the Drools components. The concept behind this product service is that we are first processing the User if he can even buy the software and then we are checking what kind of a discount he would receive. From the service's point of view in fact we are just sending the object out and waiting for the response. Finally having reveived the response we are passing the User and the Product to the Financial Service implementation that will bill the user for the products that he has bought or reject his offer if needed. ProductServiceImpl.java package pl.grzejszczak.marcin.drools.decisiontable.service; import org.apache.camel.CamelContext; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.stereotype.Component; import pl.grzejszczak.marcin.drools.decisiontable.model.Product; import pl.grzejszczak.marcin.drools.decisiontable.model.User; import static com.google.common.collect.Lists.newArrayList; /** * Created with IntelliJ IDEA. * User: mgrzejszczak * Date: 14.01.13 */ @Component("productServiceImpl") public class ProductServiceImpl implements ProductService { private static final Logger LOGGER = LoggerFactory.getLogger(ProductServiceImpl.class); @Autowired CamelContext camelContext; @Autowired FinancialService financialService; @Override public void runProductLogic(User user, Product product) { LOGGER.debug("Running product logic - first acceptance Route, then discount Route"); camelContext.createProducerTemplate().sendBody("direct:acceptanceRoute", newArrayList(user, product)); camelContext.createProducerTemplate().sendBody("direct:discountRoute", newArrayList(user, product)); financialService.processOrder(user, product); } } Another crucial thing to remember about is that the Camel Drools Component requires the Command object as the input. As you can see, in the body we are sending a list of objects (and these are not Command objects). I did it on purpose since in my opinion it is better not to bind our code to a concrete solution. What if we find out that there is a better solution than Drools? Will we change all the code that we have created or just change the Camel route to point at our new solution? That's why Camel has the TypeConverters. We have our own here as well. First of all let's take a look at the implementation. ProductTypeConverter.java package pl.grzejszczak.marcin.drools.decisiontable.converter; import org.apache.camel.Converter; import org.drools.command.Command; import org.drools.command.CommandFactory; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import pl.grzejszczak.marcin.drools.decisiontable.model.Product; import java.util.List; /** * Created with IntelliJ IDEA. * User: mgrzejszczak * Date: 30.01.13 * Time: 21:42 */ @Converter public class ProductTypeConverter { private static final Logger LOGGER = LoggerFactory.getLogger(ProductTypeConverter.class); @Converter public static Command toCommandFromList(List inputList) { LOGGER.debug("Executing ProductTypeConverter's toCommandFromList method"); return CommandFactory.newInsertElements(inputList); } @Converter public static Command toCommand(Product product) { LOGGER.debug("Executing ProductTypeConverter's toCommand method"); return CommandFactory.newInsert(product); } } There is a good tutorial on TypeConverters on the Camel website - if you needed some more indepth info about it. Anyway, we are annotating our class and the functions used to convert different types into one another. What is important here is that we are showing Camel how to convert a list and a single product to Commands. Due to type erasure this will work regardless of the provided type that is why even though we are giving a list of Product and User, the toCommandFromList function will get executed. In addition to this in order for the type converter to work we have to provide the fully quallified name of our class (FQN) in the /META-INF/services/org/apache/camel/TypeConverter file. TypeConverter pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter In order to properly test our functionality one should write quite a few tests that would verify the rules. A pretty good way would be to have input files stored in the test resources folders that are passed to the rule engine and then the result would be compared against the verified output (unfortunately it is rather impossible to make the business side develop such a reference set of outputs). Anyway let's take a look at the unit test that verifies only a few of the rules and the logs that are produced from running those rules: ProductServiceImplTest.java package pl.grzejszczak.marcin.drools.decisiontable.service.drools; import org.apache.commons.lang.builder.ReflectionToStringBuilder; import org.apache.commons.lang.builder.ToStringStyle; import org.junit.Test; import org.junit.runner.RunWith; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import pl.grzejszczak.marcin.drools.decisiontable.model.*; import pl.grzejszczak.marcin.drools.decisiontable.service.ProductService; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertTrue; /** * Created with IntelliJ IDEA. * User: mgrzejszczak * Date: 03.02.13 * Time: 16:06 */ @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration("classpath:applicationContext.xml") public class ProductServiceImplTest { private static final Logger LOGGER = LoggerFactory.getLogger(ProductServiceImplTest.class); @Autowired ProductService objectUnderTest; @Test public void testRunProductLogicUserPlUnderageElectronicCountryPL() throws Exception { int initialPrice = 1000; int userAge = 6; int quantity = 10; User user = createUser("Smith", CountryType.PL, userAge); Product product = createProduct("Electronic", initialPrice, CountryType.PL, ProductType.ELECTRONIC, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); assertTrue(product.getPrice() == initialPrice); assertEquals(DecisionType.REJECTED, user.getDecision()); } @Test public void testRunProductLogicUserPlHighAgeElectronicCountryPLLowQuantity() throws Exception { int initialPrice = 1000; int userAge = 19; int quantity = 1; User user = createUser("Smith", CountryType.PL, userAge); Product product = createProduct("Electronic", initialPrice, CountryType.PL, ProductType.ELECTRONIC, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); assertTrue(product.getPrice() == initialPrice); assertEquals(DecisionType.ACCEPTED, user.getDecision()); } @Test public void testRunProductLogicUserPlHighAgeElectronicCountryPLHighQuantity() throws Exception { int initialPrice = 1000; int userAge = 19; int quantity = 8; User user = createUser("Smith", CountryType.PL, userAge); Product product = createProduct("Electronic", initialPrice, CountryType.PL, ProductType.ELECTRONIC, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); double expectedDiscount = 0.1; assertTrue(product.getPrice() == initialPrice * (1 - expectedDiscount)); assertEquals(DecisionType.ACCEPTED, user.getDecision()); } @Test public void testRunProductLogicUserUsaLowAgeElectronicCountryPLHighQuantity() throws Exception { int initialPrice = 1000; int userAge = 19; int quantity = 8; User user = createUser("Smith", CountryType.USA, userAge); Product product = createProduct("Electronic", initialPrice, CountryType.PL, ProductType.ELECTRONIC, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); assertTrue(product.getPrice() == initialPrice); assertEquals(DecisionType.REJECTED, user.getDecision()); } @Test public void testRunProductLogicUserUsaHighAgeMedicalCountrySWELowQuantity() throws Exception { int initialPrice = 1000; int userAge = 22; int quantity = 4; User user = createUser("Smith", CountryType.USA, userAge); Product product = createProduct("Some name", initialPrice, CountryType.SWE, ProductType.MEDICAL, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); assertTrue(product.getPrice() == initialPrice); assertEquals(DecisionType.ACCEPTED, user.getDecision()); } @Test public void testRunProductLogicUserUsaHighAgeMedicalCountrySWEHighQuantity() throws Exception { int initialPrice = 1000; int userAge = 22; int quantity = 8; User user = createUser("Smith", CountryType.USA, userAge); Product product = createProduct("Some name", initialPrice, CountryType.SWE, ProductType.MEDICAL, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); double expectedDiscount = 0.25; assertTrue(product.getPrice() == initialPrice * (1 - expectedDiscount)); assertEquals(DecisionType.ACCEPTED, user.getDecision()); } private void printInputs(User user, Product product) { LOGGER.debug(ReflectionToStringBuilder.reflectionToString(user, ToStringStyle.MULTI_LINE_STYLE)); LOGGER.debug(ReflectionToStringBuilder.reflectionToString(product, ToStringStyle.MULTI_LINE_STYLE)); } private User createUser(String name, CountryType countryType, int userAge){ User user = new User(); user.setUserName(name); user.setUserCountry(countryType); user.setUserAge(userAge); return user; } private Product createProduct(String name, double price, CountryType countryOfOrigin, ProductType productType, int quantity){ Product product = new Product(); product.setPrice(price); product.setCountryOfOrigin(countryOfOrigin); product.setName(name); product.setType(productType); product.setQuantity(quantity); return product; } } Of course the log.debugs in the tests are totally redundant but I wanted you to quickly see that the rules are operational :) Sorry for the length of the logs but I wrote a few tests to show different combinations of rules (in fact it's better too have too many logs than the other way round :) ) pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@1d48043[ userName=Smith userAge=6 userCountry=PL decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1e8f2a0[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=10 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Sorry, according to your age (< 18) and country (PL) you can't buy this product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:29 Sorry, user has been rejected... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@1d48043[ userName=Smith userAge=6 userCountry=PL decision=REJECTED decisionDescription=Sorry, according to your age (< 18) and country (PL) you can't buy this product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1e8f2a0[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=10 ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@b28f30[ userName=Smith userAge=19 userCountry=PL decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@d6a0e0[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=1 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations, you have successfully bought the product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Sorry, no discount will be granted. pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:25 User has been approved - processing the order... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@b28f30[ userName=Smith userAge=19 userCountry=PL decision=ACCEPTED decisionDescription=Congratulations, you have successfully bought the product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@d6a0e0[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo=Sorry, no discount will be granted. quantity=1 ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@14510ac[ userName=Smith userAge=19 userCountry=PL decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1499616[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=8 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations, you have successfully bought the product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations - you've been granted a 10% discount! pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:25 User has been approved - processing the order... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@14510ac[ userName=Smith userAge=19 userCountry=PL decision=ACCEPTED decisionDescription=Congratulations, you have successfully bought the product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1499616[ name=Electronic type=ELECTRONIC price=900.0 countryOfOrigin=PL additionalInfo=Congratulations - you've been granted a 10% discount! quantity=8 ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@17667bd[ userName=Smith userAge=19 userCountry=USA decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@ad9f5d[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=8 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Sorry, according to your age (< 18) and country (USA) you can't buy this product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:29 Sorry, user has been rejected... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@17667bd[ userName=Smith userAge=19 userCountry=USA decision=REJECTED decisionDescription=Sorry, according to your age (< 18) and country (USA) you can't buy this product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@ad9f5d[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=8 ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@9ff588[ userName=Smith userAge=22 userCountry=USA decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1b0d2d0[ name=Some name type=MEDICAL price=1000.0 countryOfOrigin=SWE additionalInfo= quantity=4 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations, you have successfully bought the product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:25 User has been approved - processing the order... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@9ff588[ userName=Smith userAge=22 userCountry=USA decision=ACCEPTED decisionDescription=Congratulations, you have successfully bought the product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1b0d2d0[ name=Some name type=MEDICAL price=1000.0 countryOfOrigin=SWE additionalInfo= quantity=4 ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@1b27882[ userName=Smith userAge=22 userCountry=USA decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@5b84b[ name=Some name type=MEDICAL price=1000.0 countryOfOrigin=SWE additionalInfo= quantity=8 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations, you have successfully bought the product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations, you are granted a discount pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:25 User has been approved - processing the order... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@1b27882[ userName=Smith userAge=22 userCountry=USA decision=ACCEPTED decisionDescription=Congratulations, you have successfully bought the product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@5b84b[ name=Some name type=MEDICAL price=750.0 countryOfOrigin=SWE additionalInfo=Congratulations, you are granted a discount quantity=8 ] In this post I've presented how you can push some of your developing work to your BA by giving him a tool which he can be able to work woth - the Decision Tables in a spreadsheet. What is more now you will now how to integrate Drools with Camel. Hopefully you will see how you can simplify (thus minimize the cost of implementing and supporting) the implementation of business rules bearing in mind how prone to changes they are. I hope that this example will even better illustrate how difficult it would be to implement all the business rules in Java than in the previous post about Drools. If you have any experience with Drools in terms of decision tables, integration with Spring and Camel please feel free to leave a comment here or on my blog - let's have a discussion on that :) All the code is available at Too Much Coding repository at Bitbucket and GitHub.

February 5, 2013

by Marcin Grzejszczak

· 25,621 Views · 1 Like

Tutorial: Deploying an API on EC2 from AWS

Curator's Note: This article was co-authored by Andrzej Jarzyna. At 3scale we find Amazon to be a fantastic platform for running APIs due to the complete control you have on the application stack. For people new to AWS the learning curve is quite steep. So we put together our best practices into this short tutorial. Besides Amazon EC2 we will use the Ruby Grape gem to create the API interface and an Nginx proxy to handle access control. Best of all everything in this tutorial is completely FREE! For the purpose of this tutorial you will need a running API based on Ruby and Thin server. If you don’t have one you can simply clone an example repo as described below (in the “Deploying the Application” section). If you are interested in the background of this example (Sentiment API), you can see a couple of previous guides which 3scale has published. Here we use version_1 of the API(‘API up and running in 10 minutes‘) with some extra sentiment analysis functionality (this part is covered in the second tutorial of the Sentiment API tutorial). Now we will start the creation and configuration of the Amazon EC2 instance. If you already have an EC2 instance (micro or not), you can jump to the next step -> Preparing Instance for Deployment. Creating and configuring EC2 Instance Let’s start by signing up for the Amazon Elastic Compute Cloud (Amazon EC2). For our needs the free tier http://aws.amazon.com/free/ is enough, covering all the basic needs. Once the account is created go to the EC2 dashboard under your AWS Management Console and click on the Launch Instance button. That will transfer you to a popup window where you will continue the process: Choose the classic wizard Choose an AMI (Ubuntu Server 12.04.1 LTS 32bit, T1micro instance) leaving all the other settings for Instance Details as default Create a keypair and download it – this will be the key which you will use to make an ssh connection to the server, it’s VERY IMPORTANT! Add inbound rules for the firewall with source always 0.0.0.0/0 (HTTP, HTTPS, ALL ICMP, TCP port 3000 used by the Ruby thin server) Preparing Instance for Deployment Now, as we have the instance created and running, we can directly connect there from our console (Windows users from PuTTY). Right click on your instance, connect and choose Connect with a standalone SSH Client. Follow the steps and change the username to ubuntu (instead of root) in the given example. After executing this step you are connected to your instance. We will have to install new packages. Some of them require root credentials, so you will have to set a new root password: sudo passwd root. Then login as root: su root. Now with root credentials execute: sudo apt-get update and switch back to your normal user with exit command and install all the required packages: install some libraries which will be required by rvm, ruby and git: sudo apt-get install build-essential git zlib1g-dev libssl-dev libreadline-gplv2-dev imagemagick libxml2-dev libxslt1-dev openssl libreadline6 libreadline6-dev zlib1g libyaml-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison libpq-dev libpq5 libeditline-dev install git (on Linux rather than from Source): http://www.git-scm.com/book/en/Getting-Started-Installing-Git install rvm: https://rvm.io/rvm/install/ install ruby rvm install 1.9.3 rvm use 1.9.3 --default Deploying the Application Our sample Sentiment API is located on Github. Try cloning the repository: git clone [email protected]:jerzyn/api-demo.git you can once again review the code and tutorial on creating and deploying this app here: http://www.3scale.net/2012/06/the-10-minute-api-up-running-3scale-grape-heroku-api-10-minutes/ and here http://www.3scale.net/2012/07/how-to-out-of-the-box-api-analytics/ note the changes (we are using only v1, as authentication will go through the proxy). Now you can deploy the app by issuing: bundle install. Now you can start the thin server: thin start. To access the API directly (i.e. without any security or access control) access: your-public-dns:3000/v1/words/awesome.json (you can find your-public-dns in the AWS EC2 Dashboard->Instances in the details window of your instance) For the Nginx integration you will have to create an elastic IP address. Inside the AWS EC2 dashboard create an elastic IP in the same region as your instance and associate that IP to it (you won’t have to pay anything for the elastic IP as long as it is associated with your instance in the same region). OPTIONAL: If you want to assign a custom domain to your amazon instance you will have to do one thing: add an A record to the DNS record of your domain mapping the domain to the elastic IP address you have previously created. Your domain provider should either give you some way to set the A record (the IPv4 address), or it will give you a way to edit the nameservers of your domain. If they do not allow you to set the A record directly, find a DNS management service, register your domain as a zone there and the service will give you the nameservers to enter in the admin panel of your domain provider. You can then add the A record for the domain. Some possible DNS management services include ZoneEdit (basic, free), Amazon route 53, etc. At this point you API is open to the world. This is good and bad – great that you are sharing, but bad in the sense that without rate limits a few apps could kill the resources of your server, and you have no insight into who is using your API and how it is being used. The solution is to add some management for your API… Enabling API Management with 3scale Rather than reinvent the wheel and implement rate limits, access controls and analytics from scratch we will leverage the handy 3scale API Management service. Get your free 3scale account, activate and log-in to the new instance through the provided links. The first time you log-in you can choose the option for some sample data to be created, so you will have some API keys to use later. Next you would probably like to go through the tour to get a glimpse on the system functionality (optional) and then start with the implementation. To get some instant results we will start with the sandbox proxy which can be used while in development. Then we will also configure an Nginx proxy which can scale up for full production deployments. There is some documentation on the configuration of the API proxy at 3scale: https://support.3scale.net/howtos/api-configuration/nginx-proxy and for more advanced configuration options here: https://support.3scale.net/howtos/api-configuration/nginx-proxy-advanced Once you sign into your 3scale account, Launch your API on the main Dashboard screen or Go to API->Select the service (API)->Integration in the sidebar->Proxy Set the address of of your API backend – this has to be the Elastic IP address unless the custom domain has been set, including http protocol and port 3000. Now you can save and turn on the sandbox proxy to test your API by hitting the sandbox endpoint (after creating some app credentials in 3scale): http://sandbox-endpoint/v1/words/awesome.json?app_id=APP_ID&app_key=APP_KEY where, APP_ID and APP_KEY are id and key of one of the sample applications which you created when you first logged into your 3scale account (if you missed that step just create a developer account and an application within that account). Try it without app credentials, next with incorrect credentials, and then once authenticated within and over any rate limits that you have defined. Only once it is working to your satisfaction do you need to download the config files for Nginx. Note: any time you have errors check whether you can access the API directly: your-public-dns:3000/v1/words/awesome.json. If that is not available, then you need to check if the AWS instance is running and if the Thin Server is running on the instance. Implement an Nginx Proxy for Access Control In order to streamline this step we recommend that you install the fantastic OpenResty web application that is basically a bundle of the standard Nginx core with almost all the necessary 3rd party Nginx modules built-in. Install dependencies: sudo apt-get install libreadline-dev libncurses5-dev libpcre3-dev perl Compile and install Nginx: cd ~ sudo wget http://agentzh.org/misc/nginx/ngx_openresty-1.2.3.8.tar.gz sudo tar -zxvf ngx_openresty-1.2.3.8.tar.gz cd ngx_openresty-1.2.3.8/ ./configure --prefix=/opt/openresty --with-luajit --with-http_iconv_module -j2 make sudo make install In the config file make the following changes: edit the .conf file from nginx download in line 28, which is preceded by info to change your server name put the correct domain (of your Elastic IP or custom domain name) in line 78 change the path to the .lua file, downloaded together with the .conf file. We are almost finished! Our last step is to start the NGINX proxy and put some traffic through it. If it is not running yet (remember, that thin server has to be started first), please go to your EC2 instance terminal (the one you were connecting through ssh before) and start it now: sudo /opt/openresty/nginx/sbin/nginx -p /opt/openresty/nginx/ -c /opt/openresty/nginx/conf/YOUR-CONFIG-FILE.conf The last step will be verifying that the traffic goes through with a proper authorization. To do that, access: http://your-public-dns/v1/words/awesome.json?app_id=APP_ID&app_key=APP_KEY where, APP_ID and APP_KEY are key and id of the application you want to access through the API call. Once everything is confirmed as working correctly, you will want to block public access to the API backend on port 3000, which bypasses any access controls. If encounter some problems with the Nginx configuration or need a more detailed guide, I encourage you to check the 3scale guide on configuring Nginx proxy: https://support.3scale.net/howtos/api-configuration/nginx-proxy. You can go completely wild with customization of your API gateway. If you want to dive more into the 3scale system configuration (like usage and monitoring of your API traffic) feel encouraged to browse our Quickstart guides and HowTo’s.

February 4, 2013

by Steven Willmott

· 17,816 Views

Performance of Graph vs. Relational Databases

Curator's Note: Here's a post from back in 2011 that offers a general look at the difference in performance of graph databases and relational databases. A few weeks ago, Emil Eifrem, CEO of Neo Technology gave a webinar introduction to graph databases. I watched it as a lead up to my own presentation on graph databases and Neo4j. Right around the 54 minute mark, Emil talks about a very interesting experiment showing the performance difference between a relational database and a graph database for a certain type of problem called "arbitrary path query", specifically, given 1,000 users with an average of 50 "friend" relationships each, determine if one person is connected to another in 4 or fewer hops. Against a popular open-source relational database, the query took around 2,000 ms. For a graph database, the same determination took 2 ms. So the graph database was 1,000 times faster for this particular use case. Not satisfied with that, they then decided to run the same experiment with 1,000,000 users. The graph database took 2 ms. They stopped the relational database after several days of waiting for results. I showed this clip at my presentation to a few people who stuck around afterwards, and we tried to figure out why the graph database had the same performance with 1,000 times the data, while the relational database became unusably slow. The answer has to do with the way in which each type of database searches information. Relational databases search all of the data looking for anything that meets the search criteria. The larger the set of data, the longer it takes to find matches, because the database has to examine everything in the collection. Here's an example: let's assume there is a table with "friend" relationships: > SELECT * FROM friends; +-------------+--------------+ | user_id | friend_id | +-------------+--------------+ | 1 | 2 | | 1 | 3 | | 1 | 4 | | 2 | 5 | | 2 | 6 | | 2 | 7 | | 3 | 8 | | 3 | 9 | | 3 | 10 | +-------------+--------------+ In order to see if user "9" is connected to user "2", the database has to find all the friends of user "9", and see if user "2" is in that list. If not, find all of their friends, and then see if user "2" is in that list. The database has to scan the entire table each time. This means that if you double the number of rows in the table, you've doubled the amount of data to search, and thus doubled the amount of time it takes to find what you are looking for. (Even with indexing, it still has to find the values in the index tree, which involves traversing that tree. The index tree grows larger with each new record, meaning the time it takes to traverse grows larger as well. And for each search, you always start at the root of the tree.) Each new batch of friends to look at requires an entirely new scan of the table/index. So more records leads to more search time. Conversely, a graph database looks only at records that are directly connected to other records. If it is given a limit on how many "hops" it is allowed to make, it can ignore everything more than that number of hops away. Only the blue records are ever seen during the search. And since a graph traversal "remembers" where it is at any time, it never has to start from the beginning, only from its last known position. You could add another ring of records around the outside, or add a thousand more rings of records outside, but the search would never need to look at them because they are more steps away than the limit. The only way to increase the number of records searched (and thereby decrease performance) is to add more records within the 2-step limit, or add more relationships between the existing records. So the reason why having 1,000 vs. 1,000,000 records causes such a stark difference between a relational and a graph database is that relational database performance decreases in relation to the number of records in the table, while graph database performance decreases in relation to the number of connections between the records.

February 4, 2013

by Josh Adell

· 35,921 Views · 2 Likes

Repository Pattern, Done Right

the repository pattern has been discussed a lot lately. especially about it’s usefulness since the introduction of or/m libraries. this post (which is the third in a series about the data layer) aims to explain why it’s still a great choice. let’s start with the definition : a repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. client objects construct query specifications declaratively and submit them to repository for satisfaction. objects can be added to and removed from the repository, as they can from a simple collection of objects, and the mapping code encapsulated by the repository will carry out the appropriate operations behind the scenes the repository pattern is used to create an abstraction between your domain and data layer. that is, when you use the repository you should not have to have any knowledge about the underlying data source or the data layer (i.e. entity framework, nhibernate or similar). why do we need it? read the abstractions part of my data layer article. it explains the basics to why we should use repositories or similar abstractions. but let’s also examine some simple business logic: var brokentrucks = _session.query().where(x => x.state == 1); foreach (var truck in brokentrucks) { if (truck.calculatereponsetime().totaldays > 30) sendemailtomanager(truck); } what does that give us? broken trucks? well. no. the statement was copied from another place in the code and the developer had forgot to update the query. any unit tests would likely just check that some trucks are returned and that they are emailed to the manager. so we basically have two problems here: a) most developers will likely just check the name of the variable and not on the query. b) any unit tests are against the business logic and not the query. both those problems would have been fixed with repositories. since if we create repositories we also have unit tests which targets the data layer only. implementations here are some different implementations with descriptions. base classes these classes can be reused for all different implementations. unitofwork the unit of work represents a transaction when used in data layers. typically the unit of work will roll back the transaction if savechanges() has not been invoked before being disposed. public interface iunitofwork : idisposable { void savechanges(); } paging we also need to have page results. public class pagedresult { ienumerable _items; int _totalcount; public pagedresult(ienumerable items, int totalcount) { _items = items; _totalcount = totalcount; } public ienumerable items { get { return _items; } } public int totalcount { get { return _totalcount; } } } we can with the help of that create methods like: public class userrepository { public pagedresult find(int pagenumber, int pagesize) { } } sorting finally we prefer to do sorting and page items, right? var constraints = new queryconstraints() .sortby("firstname") .page(1, 20); var page = repository.find("jon", constraints); do note that i used the property name, but i could also have written constraints.sortby(x => x.firstname) . however, that is a bit hard to write in web applications where we get the sort property as a string. the class is a bit big, but you can find it at github . in our repository we can apply the constraints as (if it supports linq): public class userrepository { public pagedresult find(string text, queryconstraints constraints) { var query = _dbcontext.users.where(x => x.firstname.startswith(text) || x.lastname.startswith(text)); var count = query.count(); //easy var items = constraints.applyto(query).tolist(); return new pagedresult(items, count); } } the extension methods are also available at github . basic contract i usually start use a small definition for the repository, since it makes my other contracts less verbose. do note that some of my repository contracts do not implement this interface (for instance if any of the methods do not apply). public interface irepository where tentity : class { tentity getbyid(tkey id); void create(tentity entity); void update(tentity entity); void delete(tentity entity); } i then specialize it per domain model: public interface itruckrepository : irepository { ienumerable findbrokentrucks(); ienumerable find(string text); } that specialization is important. it keeps the contract simple. only create methods that you know that you need. entity framework do note that the repository pattern is only useful if you have pocos which are mapped using code first. otherwise you’ll just break the abstraction using the entities. the repository pattern isn’t very useful then. what i mean is that if you use the model designer you’ll always get a perfect representation of the database (but as classes). the problem is that those classes might not be a perfect representation of your domain model. hence you got to cut corners in the domain model to be able to use your generated db classes. if you on the other hand uses code first you can modify the models to be a perfect representation of your domain model (if the db is reasonable similar to it). you don’t have to worry about your changes being overwritten as they would have been by the model designer. you can follow this article if you want to get a foundation generated for you. base class public class entityframeworkrepository where tentity : class { private readonly dbcontext _dbcontext; public entityframeworkrepository(dbcontext dbcontext) { if (dbcontext == null) throw new argumentnullexception("dbcontext"); _dbcontext = dbcontext; } protected dbcontext dbcontext { get { return _dbcontext; } } public void create(tentity entity) { if (entity == null) throw new argumentnullexception("entity"); dbcontext.set().add(entity); } public tentity getbyid(tkey id) { return _dbcontext.set().find(id); } public void delete(tentity entity) { if (entity == null) throw new argumentnullexception("entity"); dbcontext.set().attach(entity); dbcontext.set().remove(entity); } public void update(tentity entity) { if (entity == null) throw new argumentnullexception("entity"); dbcontext.set().attach(entity); dbcontext.entry(entity).state = entitystate.modified; } } then i go about and do the implementation: public class truckrepository : entityframeworkrepository, itruckrepository { private readonly truckerdbcontext _dbcontext; public truckrepository(truckerdbcontext dbcontext) { _dbcontext = dbcontext; } public ienumerable findbrokentrucks() { //compare having this statement in a business class compared //to invoking the repository methods. which says more? return _dbcontext.trucks.where(x => x.state == 3).tolist(); } public ienumerable find(string text) { return _dbcontext.trucks.where(x => x.modelname.startswith(text)).tolist(); } } unit of work the unit of work implementation is simple for entity framework: public class entityframeworkunitofwork : iunitofwork { private readonly dbcontext _context; public entityframeworkunitofwork(dbcontext context) { _context = context; } public void dispose() { } public void savechanges() { _context.savechanges(); } } nhibernate i usually use fluent nhibernate to map my entities. imho it got a much nicer syntax than the built in code mappings. you can use nhibernate mapping generator to get a foundation created for you. but you do most often have to clean up the generated files a bit. base class public class nhibernaterepository where tentity : class { isession _session; public nhibernaterepository(isession session) { _session = session; } protected isession session { get { return _session; } } public tentity getbyid(string id) { return _session.get(id); } public void create(tentity entity) { _session.saveorupdate(entity); } public void update(tentity entity) { _session.saveorupdate(entity); } public void delete(tentity entity) { _session.delete(entity); } } implementation public class truckrepository : nhibernaterepository, itruckrepository { public truckrepository(isession session) : base(session) { } public ienumerable findbrokentrucks() { return _session.query().where(x => x.state == 3).tolist(); } public ienumerable find(string text) { return _session.query().where(x => x.modelname.startswith(text)).tolist(); } } unit of work public class nhibernateunitofwork : iunitofwork { private readonly isession _session; private itransaction _transaction; public nhibernateunitofwork(isession session) { _session = session; _transaction = _session.begintransaction(); } public void dispose() { if (_transaction != null) _transaction.rollback(); } public void savechanges() { if (_transaction == null) throw new invalidoperationexception("unitofwork have already been saved."); _transaction.commit(); _transaction = null; } } typical mistakes here are some mistakes which can be stumbled upon when using or/ms. do not expose linq methods let’s get it straight. there are no complete linq to sql implementations. they all are either missing features or implement things like eager/lazy loading in their own way. that means that they all are leaky abstractions. so if you expose linq outside your repository you get a leaky abstraction. you could really stop using the repository pattern then and use the or/m directly. public interface irepository { iqueryable query(); // [...] } those repositories really do not serve any purpose. they are just lipstick on a pig (yay, my favorite) those who use them probably don’t want to face the truth: or are just not reading very good: learn about lazy loading lazy loading can be great. but it’s a curse for all which are not aware of it. if you don’t know what it is, google . if you are not careful you could get 101 executed queries instead of 1 if you traverse a list of 100 items. invoke tolist() before returning the query is not executed in the database until you invoke tolist() , firstordefault() etc. so if you want to be able to keep all data related exceptions in the repositories you have to invoke those methods. get is not the same as search there are to types of reads which are made in the database. the first one is to search after items. i.e. the user want to identify the items that he/she like to work with. the second one is when the user has identified the item and want to work with it. those queries are different. in the first one, the user only want’s to get the most relevant information. in the second one, the user likely want’s to get all information. hence in the former one you should probably return userlistitem or similar while the other case returns user . that also helps you to avoid the lazy loading problems. i usually let search methods start with findxxxx() while those getting the entire item starts with getxxxx() . also don’t be afraid of creating specialized pocos for the searches. two searches doesn’t necessarily have to return the same kind of entity information. summary don’t be lazy and try to make too generic repositories. it gives you no upsides compared to using the or/m directly. if you want to use the repository pattern, make sure that you do it properly.

February 4, 2013

by Jonas Gauffin

· 12,275 Views

Sending Keystrokes to Other Apps with Windows API and C#

Recently I had to tackle a task where I needed to send keystrokes to another application, that are initiated from a .NET Windows app.

February 1, 2013

by Denzel D.

· 48,512 Views · 1 Like

Link List: MongoDB Drivers for Java

I am working on a Spring MVC app that demonstrates all of the different MongoDB Java APIs. Some Links http://code.google.com/p/morphia/wiki/QuickStart http://www.mongodb.org/display/DOCS/Java+Tutorial http://jongo.org/#updating https://github.com/hibernate/hibernate-ogm https://openshift.redhat.com/community/blogs/configuring-hibernateogm-for-your-jboss-app-using-mongodb-on-openshift-paas http://www.hibernate.org/subprojects/ogm.html https://github.com/impetus-opensource/Kundera/wiki https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes http://blog.fisharefriends.us/morphia-vs-spring-data-mongodb/ http://www.ibm.com/developerworks/java/library/j-morphia/index.html Maven POM Settings For Various Drivers morphia Morphia http://morphia.googlecode.com/svn/mavenrepo/ default sonatype-nexus Kundera Public Repository https://oss.sonatype.org/content/repositories/releases true false kundera-missing Kundera Public Missing Resources Repository http://kundera.googlecode.com/svn/maven2/maven-missing-resources true true com.google.code.morphia morphia 0.99 org.hibernate.ogm hibernate-ogm-core 4.0.0-SNAPSHOT provided org.mongodb mongo-java-driver 2.10.1 org.springframework.data spring-data-mongodb 1.0.4.RELEASE Hibernate OGM for MongoDB https://community.jboss.org/wiki/PortingSeamHotelBookingExampleToOGM https://github.com/ajf8/seam-booking-ogm https://openshift.redhat.com/community/blogs/configuring-hibernateogm-for-your-jboss-app-using-mongodb-on-openshift-paas https://github.com/openshift/openshift-ogm-quickstart Kundera (JPA for MongoDB) https://github.com/impetus-opensource/Kundera-Examples/wiki/Using-Kundera-with-Spring https://github.com/impetus-opensource/Kundera https://github.com/impetus-opensource/kundera-mongo-performance https://github.com/impetus-opensource/Kundera-Examples https://github.com/impetus-opensource/Kundera/wiki/Sample-Codes-and-Examples https://github.com/impetus-opensource/Kundera-Examples/wiki/Twitter https://dzone.com/articles/sqlifying-nosql-–-are-orm https://github.com/xamry/twitample https://github.com/impetus-opensource/Kundera-Examples/wiki/Cross-datastore-persistence-using-Kundera http://prabhubuzz.wordpress.com/2012/05/25/mongodb-cassandra-jpa-service-using-kundera/ http://gora.apache.org/ http://xamry.wordpress.com/2011/05/02/working-with-mongodb-using-kundera/ https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes https://github.com/impetus-opensource/Kundera/wiki/Concepts

February 1, 2013

by Tim Spann

CORE

· 3,979 Views · 1 Like

Building SOLID Databases: Open/Closed Principle

Like the Single Responsibility Principle, the Open/Closed Principle is pretty easy to apply to object-relational design in PostgreSQL, very much unlike the Liskov Substitution Principle which will be the subject of next week's post. However, the Open/Closed principle applies only in a weaker way to relational design and hence object-relational design for much the same reason that the Liskov Substitution Principle applies in completely different ways. This instalment thus begins the beginning of what is likely to be a full rotation going from similarity to difference and back to similarity again. As is always the case, relations can be thought of as specialized "fact classes" which are then manipulated and used to create usable information. Because they model facts, and not behavior, design concerns often apply differently to them. Additionally this is the first real taste of complexity in how object and relational paradigms often combine in an object-relational setup. The Open/Closed Principle in Application Design In application design and development the Open/Closed Principle facilitates code stability. The basic principle is that one should be able to extend a module without modifying the source code. In the LedgerSMB 1.4 codebase, for example, there are places where we prepare a screen for entering report criteria but wrap the functions which do so in other functions which set up report-specific input information. The function then is open to extension without modification and so the complexity of the function is limited to the most common aspects of generating report filter screens. Similarly customizations may simply inherit existing code interfaces and add new routines. This allows modified routines to exist without possibly interrupting other callers where needed. In addition to code stability (meaning lack of rate of changes to code due to changing requirements), there are two other frequently-overlooked benefits to respecting the open-closed principle. The first is that software tends to be quite complex and state management is a major source of bugs in virtually all software programs. When a frequently used subroutine is changed, it may introduce unexpected changes which are still compliant with the interface specification and test cases, but nonetheless introduce or uncover bugs elsewhere in the application. In essence the consequences of changing code are not always easily foreseen. The major problem here is that state often changes when interfaces are called, and consequently changing code behind a currently used interface means that there may be subtle state changes that are not adequately thought through. As the complexity of an interface grows, so too this problem grows. Instead if interfaces are built for flexibility, either via accepting additional data which can be passed on to the next stage, or via inheritance, then state is more easily assured, bugs are less likely to surface, and the application can more quickly be adapted to changing business rules. Problems Defining "Extension" and "Modification" in a db schema Inside the database, state changes are well defined and follow very specific rules. Consequently it is not sufficient to define "modification" as a "change in source code." If it were, an "alter table add column... " would always qualify. But is this the case? or is it merely an extension? In general merely adding an optional field can't possibly pose the state problems seen in the application environments, but it does possibly pose some problems. Defining modification and extension is thus a problem. In general some sorts of modification are more dangerous than others. For this reason we may look at these both in a strict view, where ideally tables are left alone and we add new joining tables to extend, and a more relaxed view where extension includes adding columns which do not change internal data constraints (i.e. where all previous insert and update statements would remain valid, and no constraints are relaxed). In general, what makes a database nice in terms of relational design, is that one can prove of disprove whether a schema change is backwards compatible using math. If it is not backwards-compatible, then it is always modification. If it is backwards compatible, it is always extension when using the relaxed view. Another point that must be born in mind is that relational math is quite capable of creating new synthetic relations based on the fact that relations are structurally transparent and encapsulation is typically weak, while state management issues are managed through a very well-developed framework of transaction management, locking, and, frequently, snapshot views. When you combine these techniques with declarative constraints on values, one has a very robust state engine which is based on a very different approach than object-oriented applications managing application behavior. Problems with modifying table schemas In a case where software may be managed via overlapping deployment cycles, certain problems can occur when extending tables by adding columns. This is because the knowledge of the deployment cycle typically only goes one way--- the extension team has knowledge of the base team's past cycles while the base team typically has no knowledge of the extending team's work. This is typical in cases where software is deployed and then customized. Typically adding fields to tables makes extension easy, but the cost is that major version upgrades of the base package may overwrite or clobber extensions or may fail. In essence a relaxed standard takes on risk that upgrades of the base package may not go so smoothly. On the other hand, if the software is deployed via a single deployment cycle, as is typical of purely inhouse applications and commercial software, these problems are avoided, and extension by adding fields does not break anything. The obvious problem here is that these categories are not mutually exclusive, however much they appear to be. The commercial software may be extended by a second team on-site, and therefore a single deployment cycle on one entity does not guarantee a single deployment cycle. Object/Relational Interfaces and the OCP Because object-relational interfaces can encapsulate data, and often can be made to run in certain security contexts (which can be made to cascade), the open/closed principle has a number of applications in ensuring testable database interfaces where security barriers are involved. For example, we might have an automation system where computers connect to the database in various roles to pull job information. Rather than having separate accounts for each computer, we can assign them the same login role, but filter out the data they can see based on the client IP address. This would increase the level of containment in the event of a system compromise. So we might create an interface of something like my_client() which instantiates the client information from the client IP address, and use that in various functions to filter. Consequently we might just run: select * from get_jobs(); The problem of course with such a system is that of testability. We can't readily test the output because it depends on client IP address. So we might instead create a more flexible interface, available only to superusers, which accepts a client object which we can instantiate by name, integer id, or the like. In that case we may have a query that can be called by dba's and test suites like this, where 123 is the internal id of the client: SELECT * FROM get_jobs(client(123)); The get_jobs() function for the production clients would now look like this: CREATE OR REPLACE FUNCTION get_jobs() RETURNS SETOF jobs LANGUAGE SQL SECURITY DEFINER AS $$ SELECT * FROM get_jobs(my_client()); $$; We have essentially built an API which is open to extension for security controls but closed to modification. This means we can run test cases on the underlying database cases even on production (since these can be in transactions that roll back), and push tests of the my_client() interface to the clients themselves, to verify their proper setup. A Functional Example: Arbitrary data type support in pg_message_queue There are a few cases, however, where the open-closed principle has more direct applicability. In pg_message_queue 0.1, only text, bytea, and xml queues were supported. Since virtually anything can be put in a text field, this was deemed to be sufficient at the time. However in 0.2, I wanted to be able to support JSON queues but in a way that would not preclude the extension from running on PostgreSQL 9.1. The solution was to return to the open/closed principle and build a system which could be extended easily for arbitrary types. The result was much more powerful than initially hoped for (and in fact now I am using queues with integer and ip address payloads). In 0.1, the code looked like this: CREATE TABLE pg_mq_base ( msg_id bigserial not null, sent_at timestamp not null default now(), sent_by name not null default session_user, delivered_at timestamp ); CREATE TABLE pg_mq_xml ( payload xml not null, primary key (msg_id) ) inherits (pg_mq_base); CREATE TABLE pg_mq_text ( payload text not null, primary key (msg_id) ) inherits (pg_mq_base); CREATE TABLE pg_mq_bytea ( payload bytea not null, primary key (msg_id) ) inherits (pg_mq_base); The approach here was to use table inheritance so that new queue types could be easily added. When queues are added a table is created like one of the other tables, including all indexes etc. The relevant portion of the pg_mq_create_queue function is: EXECUTE 'CREATE TABLE ' || quote_ident(t_table_name) || '( like ' || quote_ident('pg_mq_' || in_payload_type ) || ' INCLUDING ALL )'; The problem here is that while it was possible to extend this, one couldn't do so very easily without modifying the source code of the functions. In 0.2, we reduced the latter part to: -- these are types for return values only. they are not for storage. -- using tables because types don't inherit CREATE TABLE pg_mq_text (payload text) inherits (pg_mq_base); CREATE TABLE pg_mq_bin (payload bytea) inherits (pg_mq_base); But the real change that the payload type for the queue. The table creation portion of pg_mq_create_queue is now: EXECUTE 'CREATE TABLE ' || quote_ident(t_table_name) || '( like pg_mq_base INCLUDING ALL, payload ' || in_payload_type || ' NOT NULL )'; This has the advantage of allowing payloads of any type known to PostgreSQL. We can have queues for mac addresses, internet addresses, GIS data, and even complex types if we want to do more object-relational processing on output. This approach will become more important after 0.3 is out and we begin working on object-relational interfaces on pg_message_queue. Conclusions The Open/Closed principle is where we start to see a mismatch between object-oriented application programming and object-relational database design. It's not that the basic principle doesn't apply, but just that it does so in often strange and counter-intuitive ways.

January 31, 2013

by Chris Travers

· 6,011 Views

ActiveMQ: JDBC Master Slave with MySQL

In this post I'll document the simple configuration needed by ActiveMQ to configure the JDBC persistence adapter and setting up MySQL as the persistent storage. With this type of configuration you can also configure a Master/Slave broker setup by having more than one broker connect to the same database instance. List of Binaries used for this Example mysql-5.5.20-osx10.6-x86_64 mysql-connector-java-5.1.18-bin.jar apache-activemq-5.5.1-fuse-01-13 Configuring MySQL Download and install MySQL. Note for OS X users: The dmg provides a simple install that contains a Startup Item package which will configure MySQL to start automatically after each reboot as well as a Preference Pane plugin which will get added to the Settings panel to allow you to start/stop and configure autostart of MySQL. Once you have you have MySQL installed and properly configured, you will need to start the MySQL Monitor to create a user and database. macbookpro-251a:bin jsherman$ ./mysql -u root Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 29 Server version: 5.5.20 MySQL Community Server (GPL) Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> Then create the database for ActiveMQ mysql> CREATE DATABASE activemq; Then create a user and grant them privileges for the database mysql> CREATE USER 'activemq'@'%'localhost' IDENTIFIED BY 'activemq'; mysql> GRANT ALL ON activemq.* TO 'activemq'@'localhost'; mysql> exit Now log back into the MySQL Monitor and access the activemq database with the activemq user to make sure everything is okay macbookpro-251a:bin jsherman$ ./mysql -u activemq -p activemq Enter password: Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 28 Server version: 5.5.20 MySQL Community Server (GPL) Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>exit ActiveMQ Broker Configuration Download the latest FuseSource distribution of ActiveMQ. In the broker's configuration file, activemq.xml, add the following persistence adapter to configure a JDBC connection to MySQL. Then, just after the ending broker element () add the following bean Copy the MySQL diver to the ActiveMQ lib directory, mysql-connector-java-5.1.18-bin.jar was used in this example. Now start your broker, you should see the following output if running from the console INFO | Using Persistence Adapter: JDBCPersistenceAdapter(org.apache.commons.dbcp.BasicDataSource@303bc1a1) INFO | Database adapter driver override recognized for : [mysql-ab_jdbc_driver] - adapter: class org.apache.activemq.store.jdbc.adapter.MySqlJDBCAdapter INFO | Database lock driver override not found for : [mysql-ab_jdbc_driver]. Will use default implementation. INFO | Attempting to acquire the exclusive lock to become the Master broker INFO | Becoming the master on dataSource: org.apache.commons.dbcp.BasicDataSource@303bc1a1 INFO | ActiveMQ 5.5.1-fuse-01-13 JMS Message Broker (jdbcBroker1) is starting Now you can check your database in MySQL and see that ActiveMQ has created the required tables. mysql> USE activemq; SHOW TABLES; +--------------------+ | Tables_in_activemq | +--------------------+ | ACTIVEMQ_ACKS | | ACTIVEMQ_LOCK | | activemq_msgs | +--------------------+ 3 rows in set (0.00 sec) mysql> If you configure multiple brokers to use this same database instance in the jdbcPersistenceAdapter element then these brokers will attempt to acquire a lock, if they are unable to get a database lock the will wait until the lock becomes available. This can be seen by starting a second broker using the above JDBC persistence configuration. INFO | PListStore:activemq-data/jdbcBroker/tmp_storage started INFO | Using Persistence Adapter: JDBCPersistenceAdapter(org.apache.commons.dbcp.BasicDataSource@78979f67) INFO | Database adapter driver override recognized for : [mysql-ab_jdbc_driver] - adapter: class org.apache.activemq.store.jdbc.adapter.MySqlJDBCAdapter As you can see the second broker did not fully initialize as it is waiting to acquire the database lock. If the master broker is killed, then you see the slave will acquire the database lock and becomes the new master. INFO | Database lock driver override not found for : [mysql-ab_jdbc_driver]. Will use default implementation. INFO | Attempting to acquire the exclusive lock to become the Master broker INFO | Becoming the master on dataSource: org.apache.commons.dbcp.BasicDataSource@2e19fc25 INFO | ActiveMQ 5.5.1-fuse-01-13 JMS Message Broker (jdbcBroker2) is starting INFO | For help or more information please see: http://activemq.apache.org/ INFO | Listening for connections at: tcp://macbookpro-251a.home:61617 INFO | Connector openwire Started INFO | ActiveMQ JMS Message Broker (jdbcBroker2, ID:macbookpro-251a.home-53193-1328656157052-0:1) started Summary As you can see, it is fairly simple and straight forward to configure a robust highly-available messaging system using ActiveMQ with database persistence.

January 30, 2013

by Mitch Pronschinske

· 16,742 Views

JDBC Realm and Form Based Authentication with GlassFish 3.1.2.2 and Primefaces 3.4

One of the most popular posts on my blog is the short tutorial about the JDBC Security Realm and form based Authentication on GlassFish with Primefaces. After I received some comments about it that it isn't any longer working with latest GlassFish 3.1.2.2 I thought it might be time to revisit it and present an updated version. Here we go: Preparation As in the original tutorial I am going to rely on some stuff. Make sure to have a recent NetBeans 7.3 beta2 (which includes GlassFish 3.1.2.2) and the MySQL Community Server (5.5.x) installed. You should have verified that everything is up an running and that you can start GlassFish and the MySQL Server also is started. Some Basics A GlassFish authentication realm, also called a security policy domain or security domain, is a scope over which the GlassFish Server defines and enforces a common security policy. GlassFish Server is preconfigured with the file, certificate, and administration realms. In addition, you can set up LDAP, JDBC, digest, Oracle Solaris, or custom realms. An application can specify which realm to use in its deployment descriptor. If you want to store the user credentials for your application in a database your first choice is the JDBC realm. Prepare the Database Fire up NetBeans and switch to the Services tab. Right click the "Databases" node and select "Register MySQL Server". Fill in the details of your installation and click "ok". Right click the new MySQL node and select "connect". Now you see all the already available databases. Right click again and select "Create Database". Enter "jdbcrealm" as the new database name. Remark: We're not going to do all that with a separate database user. This is something that is highly recommended but I am using the root user in this examle. If you have a user you can also grant full access to it here. Click "ok". You get automatically connected to the newly created database. Expand the bold node and right click on "Tables". Select "Execute Command" or enter the table details via the wizard. CREATE TABLE USERS ( `USERID` VARCHAR(255) NOT NULL, `PASSWORD` VARCHAR(255) NOT NULL, PRIMARY KEY (`USERID`) ); CREATE TABLE USERS_GROUPS ( `GROUPID` VARCHAR(20) NOT NULL, `USERID` VARCHAR(255) NOT NULL, PRIMARY KEY (`GROUPID`) ); That is all for now with the database. Move on to the next paragraph. Let GlassFish know about MySQL First thing to do is to get the latest and greatest MySQL Connector/J from the MySQL website which is 5.1.22 at the time of writing this. Extract the mysql-connector-java-5.1.22-bin.jar file and drop it into your domain folder (e.g. glassfish\domains\domain1\lib). Done. Now it is finally time to create a project. Basic Project Setup Start a new maven based web application project. Choose "New Project" > "Maven" > Web Application and hit next. Now enter a name (e.g. secureapp) and all the needed maven cordinates and hit next. Choose your configured GlassFish 3+ Server. Select Java EE 6 Web as your EE version and hit "Finish". Now we need to add some more configuration to our GlassFish domain.Right click on the newly created project and select "New > Other > GlassFish > JDBC Connection Pool". Enter a name for the new connection pool (e.g. SecurityConnectionPool) and underneath the checkbox "Extract from Existing Connection:" select your registered MySQL connection. Click next. review the connection pool properties and click finish. The newly created Server Resources folder now shows your sun-resources.xml file. Follow the steps and create a "New > Other > GlassFish > JDBC Resource" pointing the the created SecurityConnectionPool (e.g. jdbc/securityDatasource).You will find the configured things under "Other Sources / setup" in a file called glassfish-resources.xml. It gets deployed to your server together with your application. So you don't have to care about configuring everything with the GlassFish admin console.Additionally we still need Primefaces. Right click on your project, select "Properties" change to "Frameworks" category and add "JavaServer Faces". Switch to the Components tab and select "PrimeFaces". Finish by clicking "OK". You can validate if that worked by opening the pom.xml and checking for the Primefaces dependency. 3.4 should be there. Feel free to change the version to latest 3.4.2. Final GlassFish Configuration Now it is time to fire up GlassFish and do the realm configuration. In NetBeans switch to the "Services" tab again and right click on the "GlassFish 3+" node. Select "Start" and watch the Output window for a successful start. Right click again and select "View Domain Admin Console", which should open your default browser pointing you to http://localhost:4848/. Select "Configurations > server-config > Security > Realms" and click "New..." on top of the table. Enter a name (e.g. JDBCRealm) and select the com.sun.enterprise.security.auth.realm.jdbc.JDBCRealm from the drop down. Fill in the following values into the textfields: JAAS jdbcRealm JNDI jdbc/securityDatasource User Table users User Name Column username Password Column password Group Table groups Group Name Column groupname Leave all the other defaults/blanks and select "OK" in the upper right corner. You are presented with a fancy JavaScript warning window which tells you to _not_ leave the Digest Algorithm Field empty. I field a bug about it. It defaults to SHA-256. Which is different to GlassFish versions prior to 3.1 which used MD5 here. The older version of this tutorial didn't use a digest algorithm at all ("none"). This was meant to make things easier but isn't considered good practice at all. So, let's stick to SHA-256 even for development, please. Secure your application Done with configuring your environment. Now we have to actually secure the application. First part is to think about the resources to protect. Jump to your Web Pages folder and create two more folders. One named "admin" and another called "users". The idea behind this is, to have two separate folders which could be accessed by users belonging to the appropriate groups. Now we have to create some pages. Open the Web Pages/index.xhtml and replace everything between the h:body tags with the following: Select where you want to go: Now add a new index.xhtml to both users and admin folders. Make them do something like this: Hello Admin|User On to the login.xhtml. Create it with the following content in the root of your Web Pages folder. Username: Password: As you can see, whe have the basic Primefaces p:panel component which has a simple html form which points to the predefined action j_security_check. This is, where all the magic is happening. You also have to include two input fields for username and password with the predefined names j_username and j_password. Now we are going to create the loginerror.xhtml which is displayed, if the user did not enter the right credentials. (use the same DOCTYPE and header as seen in the above example). Sorry, you made an Error. Please try again: Login The only magic here is the href link of the Login anchor. We need to get the correct request context and this could be done by accessing the faces context. If a user without the appropriate rights tries to access a folder he is presented a 403 access denied error page. If you like to customize it, you need to add it and add the following lines to your web.xml: 403 /faces/403.xhtml That snippet defines, that all requests that are not authorized should go to the 403 page. If you have the web.xml open already, let's start securing your application. We need to add a security constraint for any protected resource. Security Constraints are least understood by web developers, even though they are critical for the security of Java EE Web applications. Specifying a combination of URL patterns, HTTP methods, roles and transport constraints can be daunting to a programmer or administrator. It is important to realize that any combination that was intended to be secure but was not specified via security constraints, will mean that the web container will allow those requests. Security Constraints consist of Web Resource Collections (URL patterns, HTTP methods), Authorization Constraint (role names) and User Data Constraints (whether the web request needs to be received over a protected transport such as TLS). Admin Pages Protected Admin Area /faces/admin/* GET POST HEAD PUT OPTIONS TRACE DELETE admin NONE All Access None Protected User Area /faces/users/* GET POST HEAD PUT OPTIONS TRACE DELETE NONE If the constraints are in place you have to define, how the container should challenge the user. A web container can authenticate a web client/user using either HTTP BASIC, HTTP DIGEST, HTTPS CLIENT or FORM based authentication schemes. In this case we are using FORM based authentication and define the JDBCRealm FORM JDBCRealm /faces/login.xhtml /faces/loginerror.xhtml The realm name has to be the name that you assigned the security realm before. Close the web.xml and open the sun-web.xml to do a mapping from the application role-names to the actual groups that are in the database. This abstraction feels weird, but it has some reasons. It was introduced to have the option of mapping application roles to different group names in enterprises. I have never seen this used extensively but the feature is there and you have to configure it. Other appservers do make the assumption that if no mapping is present, role names and group names do match. GlassFish doesn't think so. Therefore you have to put the following into the glassfish-web.xml. You can create it via a right click on your project's WEB-INF folder, selecting "New > Other > GlassFish > GlassFish Descriptor" admin admin hat was it _basically_ ... everything you need is in place. The only thing that is missing are the users in the database. It is still empty ...We need to add a test user: Adding a Test-User to the Database And again we start by right clicking on the jdbcrealm database on the "Services" tab in NetBeans. Select "Execute Command" and insert the following: INSERT INTO USERS VALUES ("admin", "8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918"); INSERT INTO USERS_GROUPS VALUES ("admin", "admin"); You can login with user: admin and password: admin and access the secured area. Sample code to generate the hash could look like this: try { MessageDigest md = MessageDigest.getInstance("SHA-256"); String text = "admin"; md.update(text.getBytes("UTF-8")); // Change this to "UTF-16" if needed byte[] digest = md.digest(); BigInteger bigInt = new BigInteger(1, digest); String output = bigInt.toString(16); System.out.println(output); } catch (NoSuchAlgorithmException | UnsupportedEncodingException ex) { Logger.getLogger(PasswordTest.class.getName()).log(Level.SEVERE, null, ex); } Have fun securing your apps and keep the questions coming! In case you need it, the complete source code is on https://github.com/myfear/JDBCRealmExample

January 29, 2013

by Markus Eisele

· 39,411 Views · 1 Like

Profiling MySQL Memory Usage With Valgrind Massif

This post comes from Roel Van de Paar at the MySQL Performance Blog. There are times where you need to know exactly how much memory the mysqld server (or any other program) is using, where (i.e. for what function) it was allocated, how it got there (a backtrace, please!), and at what point in time the allocation happened. For example; you may have noticed a sharp memory increase after executing a particular query. Or, maybe mysqld is seemingly using too much memory overall. Or again, maybe you noticed mysqld’s memory profile slowly growing overtime, indicating a possible memory bug. Whatever the reason, there is a simple but powerful way to profile MySQL memory usage; the Massif tool from Valgrind. An excerpt from the Massif manual page (Heap memory being simply the allotted pool of memory for use by programs); Massif tells you not only how much heap memory your program is using, it also gives very detailed information that indicates which parts of your program are responsible for allocating the heap memory. Firstly, we need to get the Valgrind program. Though you could use the latest version which comes with your OS (think yum or apt-get install Valgrind), I prefer to obtain & compile the latest release (3.8.1 at the moment): sudo yum remove valgrind* # or apt-get etc. sudo yum install wget make gcc gcc-c++ libtool libaio-devel bzip2 glibc* wget http://valgrind.org/downloads/valgrind-3.8.1.tar.bz2 # Or newer tar -xf valgrind-3.8.1.tar.bz2 cd valgrind-3.8.1 ./configure make sudo make install valgrind --version # check version to be same as what was downloaded (3.8.1 here) There are several advantages to self-compiling: When using the latest version of Valgrind, even compiled ‘out of the box’ (i.e. with no changes), you will likely see less issues then with earlier versions. For example, earlier versions may have too-small Valgrind-internal memory tracking allocations hardcoded. In other words; you may not be able to run your huge-buffer-pool under Valgrind without it complaining quickly. If you self compile, and those Valgrind-internal limits are still too small, you can easily change them before compiling. An often bumped up setting is VG_N_SEGMENTS in coregrind/m_aspacemgr/aspacemgr-linux.c (when you see ‘Valgrind: FATAL: VG_N_SEGMENTS is too low’) Newer releases [better] support newer hardware and software. Once ‘valgrind –version’ returns the correct installed version, you’re ready to go. In this example, we’ll write the output to /tmp/massif.out. If you prefer to use another location (and are therefore bound to set proper file rights etc.) use: $ touch /your_location/massif.out $ chown user:group /your_location/massif.out # Use the user mysqld will now run under (check 'user' setting in my.cnf also) (see here if this is not clear) Now, before you run mysqld under Valgrind, make sure debug symbols are present. Debug symbols are present when the binary is not stripped of them (downloaded ‘GA’ [generally available] packages may contain optimized or stripped binaries, which are optimized for speed rather than debugging). If the binaries you have are stripped, you have a few options to get a debug build of mysqld to use for memory profiling purposes: Download the appropriate debuginfo packages (these may not be available for all releases). Download debug binaries of the same server version as you are currently using, and simply use the debug mysqld as a drop-in replacement for your current mysqld (i.e. shutdown, mv mysqld mysqld.old, cp /debug_bin_path/mysqld ./mysqld, startup). If you have (through download or from past storage) the source code available (of the same server version as you are currently using) then simply debug-compile the source and use the mysqld binary as a drop-in replacement as shown in the last point. (For example, Percona Server 5.5 source can be debug-compiled by using ‘./build/build-binary –debug ..’). Valgrind Massif needs the debug symbol information to be present, so that it can print stack traces that show where memory is consumed. Without debug symbols available, you would not be able to see the actual function call responsible for memory usage. If you’re not sure if you have stripped binaries, simply test the procedure below and see what output you get. Once you’re all set with debug symbols, shutdown your mysqld server using your standard shutdown procedure, and then re-start it manually under Valgrind using the Massif tool: $ valgrind --tool=massif --massif-out-file=/tmp/massif.out /path/to/mysqld {mysqld options...} Note that ‘{mysqld options}’ could for instance include –default-file=/etc/my.cnf (if this is where your my.cnf file is located) in order to point mysqld to your settings file etc. After mysqld is properly started (check if you can login with your mysql client), you would execute whatever steps you think are necessary to increase memory usage/trigger the memory problem. You could also just leave the server running for some time (for example, if you have experienced memory increase over time). Once you’ve done that, shutdown mysqld (again using your normal shutdown procedure), and then use the ms_print tool on the masif.out file to output a textual graph of memory usage: ms_print /tmp/massif.out An partial example output from a recent customer problem we worked on: 96.51% (68,180,839B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->50.57% (35,728,995B) 0x7A3CB0: my_malloc (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | ->10.10% (7,135,744B) 0x7255BB: Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->10.10% (7,135,744B) 0x728DAA: Log_event::read_log_event(st_io_cache*, st_mysql_mutex*, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->10.10% (7,135,744B) 0x5300A8: ??? (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->10.10% (7,135,744B) 0x5316EC: handle_slave_sql (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->10.10% (7,135,744B) 0x3ECF60677B: start_thread (in /lib64/libpthread-2.5.so) | | | ->10.10% (7,135,744B) 0x3ECEAD325B: clone (in /lib64/libc-2.5.so) [...] And, a few snapshots later: 92.81% (381,901,760B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->84.91% (349,404,796B) 0x7A3CB0: my_malloc (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | ->27.73% (114,084,096B) 0x7255BB: Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->27.73% (114,084,096B) 0x728DAA: Log_event::read_log_event(st_io_cache*, st_mysql_mutex*, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->27.73% (114,084,096B) 0x5300A8: ??? (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->27.73% (114,084,096B) 0x5316EC: handle_slave_sql (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->27.73% (114,084,096B) 0x3ECF60677B: start_thread (in /lib64/libpthread-2.5.so) | | | ->27.73% (114,084,096B) 0x3ECEAD325B: clone (in /lib64/libc-2.5.so) As you can see, a fair amount of (and in this case ‘too much’) memory is being allocated to the Log_event::read_log_event function. You can also see the memory allocated to the function grow significantly accross the snapshots. This example helped to pin down a memory leak bug on a filtered slave (read more in the actual bug report). Besides running Valgrind Massif in the way above, you can also change Massif’s snapshot options and other cmd line options to match the snapshot frequency etc. to your specific requirements. However, you’ll likely find that the default options will perform well in most scenario’s. For the technically advanced, you can take things one step further: use Valgrind’s gdbserver to obtain Massif snapshots on demand (i.e. you can command-line initiate Massif snapshots just before, during and after executing any commands which may alter memory usage significantly). Conclusion: using Valgrind Massif, and potentially Valgrind’s gdbserver (which was not used in the resolution of the example bug discussed), will help you to analyze the ins and outs of mysqld’s (or any other programs) memory usage. Credits: Staff @ a Percona customer, Ovais, Laurynas, Sergei, George, Vladislav, Raghavendra, Ignacio, myself & others at Percona all combined efforts leading to the information you can read above.

January 26, 2013

by Peter Zaitsev

· 4,776 Views

Sorting Text Files with MapReduce

in my last post i wrote about sorting files in linux. decently large files (in the tens of gb’s) can be sorted fairly quickly using that approach. but what if your files are already in hdfs, or ar hundreds of gb’s in size or larger? in this case it makes sense to use mapreduce and leverage your cluster resources to sort your data in parallel. mapreduce should be thought of as a ubiquitous sorting tool, since by design it sorts all the map output records (using the map output keys), so that all the records that reach a single reducer are sorted. the diagram below shows the internals of how the shuffle phase works in mapreduce. given that mapreduce already performs sorting between the map and reduce phases, then sorting files can be accomplished with an identity function (one where the inputs to the map and reduce phases are emitted directly). this is in fact what the sort example that is bundled with hadoop does. you can look at the how the example code works by examining the org.apache.hadoop.examples.sort class. to use this example code to sort text files in hadoop, you would use it as follows: shell$ export hadoop_home=/usr/lib/hadoop shell$ $hadoop_home/bin/hadoop jar $hadoop_home/hadoop-examples.jar sort \ -informat org.apache.hadoop.mapred.keyvaluetextinputformat \ -outformat org.apache.hadoop.mapred.textoutputformat \ -outkey org.apache.hadoop.io.text \ -outvalue org.apache.hadoop.io.text \ /hdfs/path/to/input \ /hdfs/path/to/output this works well, but it doesn’t offer some of the features that i commonly rely upon in linux’s sort, such as sorting on a specific column, and case-insensitive sorts. linux-esque sorting in mapreduce i’ve started a new github repo called hadoop-utils , where i plan to roll useful helper classes and utilities. the first one is a flexible hadoop sort. the same hadoop example sort can be accomplished with the hadoop-utils sort as follows: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ /hdfs/path/to/input \ /hdfs/path/to/output to bring sorting in mapreduce closer to the linux sort, the --key and --field-separator options can be used to specify one or more columns that should be used for sorting, as well as a custom separator (whitespace is the default). for example, imagine you had a file in hdfs called /input/300names.txt which contained first and last names: shell$ hadoop fs -cat 300names.txt | head -n 5 roy franklin mario gardner willis romero max wilkerson latoya larson to sort on the last name you would run: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --key 2 \ /input/300names.txt \ /hdfs/path/to/output the syntax of --key is pos1[,pos2] , where the first position (pos1) is required, and the second position (pos2) is optional - if it’s omitted then pos1 through the rest of the line is used for sorting. just like the linux sort, --key is 1-based, so --key 2 in the above example will sort on the second column in the file. lzop integration another trick that this sort utility has is its tight integration with lzop, a useful compression codec that works well with large files in mapreduce (see chapter 5 of hadoop in practice for more details on lzop). it can work with lzop input files that span multiple splits, and can also lzop-compress outputs, and even create lzop index files. you would do this with the codec and lzop-index options: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --key 2 \ --codec com.hadoop.compression.lzo.lzopcodec \ --map-codec com.hadoop.compression.lzo.lzocodec \ --lzop-index \ /hdfs/path/to/input \ /hdfs/path/to/output multiple reducers and total ordering if your sort job runs with multiple reducers (either because mapreduce.job.reduces in mapred-site.xml has been set to a number larger than 1, or because you’ve used the -r option to specify the number of reducers on the command-line), then by default hadoop will use the hashpartitioner to distribute records across the reducers. use of the hashpartitioner means that you can’t concatenate your output files to create a single sorted output file. to do this you’ll need total ordering , which is supported by both the hadoop example sort and the hadoop-utils sort - the hadoop-utils sort enables this with the --total-order option. shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --total-order 0.1 10000 10 \ /hdfs/path/to/input \ /hdfs/path/to/output the syntax is for this option is unintuitive so let’s look at what each field means. more details on total ordering can be seen in chapter 4 of hadoop in practice . more details for details on how to download and run the hadoop-utils sort take a look at the cli guide in the github project page .

January 26, 2013

by Alex Holmes

· 15,460 Views

Building SOLID Databases: Single Responsibility and Normalization

Introduction This instalment will cover the single responsibility principle in object-relational design, and its relationship both to data normalization and object-oriented application programming. While single responsibility is a fairly easy object-oriented principle to apply here, I think it is critical to explore in depth because it helps provide a clearer framework to address object-relational design. As in later instalments I will be using snippets of code developed elsewhere for other areas. These will not be full versions of what was written, but versions sufficient to show the basics of data structure and interface. Relations and Classes: Similarities Objects and classes, in the surface, look deceptively similar, to the point where one can look at relations as sets of classes, and in fact this equivalence is the basis of object-relational database design. Objects are data structures used to store state, which have identity and are tightly bound to interface. Relations are data structures which store state, and if they meet second normal form, have identity in the form of a primary key. Object-relational databases then provide interface and thus in an object-relational database, relations are classes, and contain sets of objects of a certain class. Relations and Classes: Differences So similar are objects and classes in structure that a very common mistake is to simply use a relational database management system as a simple object store. This tends to result in brittle database leading to a relatively brittle application. Consequently many of us see this approach as something of an anti-pattern. The reason why this doesn't work terribly well is not that the basic equivalence is bad but that relations and classes are used in very different ways. On the application layer, classes are used to model (and control) behavior, while in the database, relations and tuples are used to model information. Thus tying database structures to application classes in this way essentially overloads the data structures, turning the structures into reporting objects as well as behavior objects. Relations thus need to be seen not only as classes but as specialized classes used for persistent storage and reporting. Thus they have fundamentally different requirements than the behavior classes in the application and thus they have different reasons to change. An application class typically changes when there is a need for a change in behavior, while a relation should only change when there is a change in data retention and reporting. Relations have traditionally tended to be divorced from interface and this provides a great deal of power. While classes tend to be fairly opaque, relations tend to be very transparent. The reason here is that while both represent state information whether it is application state or other facts, objects traditionally encapsulate behavior (and thus act as building blocks of behavior), relations always encapsulate information and are building blocks of information. Thus the data structures of relations must be transparent while object-oriented design tends to push for less transparency and more abstraction. It is worth noting then that because these systems are designed to do different things, there are many DBA's who suggest encapsulating the entire database behind an API, defined by stored procedures. The typical problem with this approach is that loose coupling of the application to the interface is difficult (but see one of the first posts on this blog for a solution). When the db interface is tightly coupled to the application, then typically one ends up with problems on several levels, and it tends to sacrifice good application design for good relational database design. Single Responsibility Principle in Application Programming The single responsibility principle holds that every class should have a single responsibility which it should entirely encapsulate. A responsibility is defined as a reason to change. The canonical example of a violation of this principle is a class which might format and print a report. Because both data changes and cosmetic changes may require a change to the class, this would be a violation of the principle at issue. In an ideal world, we'd separate out the printing and the formatting so that cosmetic changes do not require changes when data changes are made and vice versa. The problem of course with the canonical example is that it is not self-contained. If you change the data in the report, it will almost certainly require cosmetic changes. You can try to automate those changes but only within limits, and you can abstract interfaces (dependency inversion) but in the end if you change the data in the report enough, cosmetic changes will become necessary. Additionally a "reason to change" is epistemologically problematic. Reasons foreseen are rarely if ever atomic, and so there is a real question as far as how far one pushes this. In terms of formatting a report, do we want to break out the class that adjusts to paper size so that if we want to move from US Letter to A4 we no longer have to change the rest of the cosmetic layout? Perfect separation of responsibilities in that example is thus impossible, as it probably always is --- you can only change business rules to a certain point before interfaces must change, and when that happens the cascading flow of necessary changes can be quite significant. The database is, however, quite different in that that responsibility of database-level code (including DDL and DML) is limited to the proposition that we should construct answers from known facts. This makes a huge difference in terms of single responsibility, and it is possible to develop mathematical definitions for single responsibility. Not only is this possible but it has been done. All of the normal forms from third on up address single responsibility. The Definition of Third Normal Form Quoting Wikipedia, Codd's definition states that a table is in 3NF if and only if both of the following conditions hold: The relation R (table) is in second normal form (2NF) Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R. A non-prime attribute is an attribute not part of a superkey. In essence what third normal form states is that every relation must contain a superkey and values functionally and directly dependent on that superkey. This will become more important as we look at how data anomilies dovetail with single responsibility. Normalization and Single Responsibility The process of database normalization is an attempt to create relational databases where data anomalies do not exist. Data anomalies occur where modifying data either requires modifying other data to maintain accuracy (where no independent fact changes are recorded), or where existing data may project current or historical facts not in existence (join anomilies). This process occurs by breaking down keys and superkeys, and their dependencies, such that data is tracked in smaller, self-contained units. Beginning at third normal form, one can see relations are forming single responsibilities of managing data directly dependent on their superkeys. From this point forward, relations' structures would change (assuming no decision to further decompose a relation into a higher normal form) if and only if a change is made to what data is tracked that is directly dependent on a superkey. The responsibility of the database layer is the storage of facts and the synthesis of answers. Since the storage relations themselves handle the first, normalization is a prerequisite to good object-relational design. The one major caveat here however is that first normal form's atomicity requirement must be interpreted slightly differently in object-relational setups because more complex data structures can be atomic compared to a purely relational design. In a purely relational database, the data types that can be used are relatively minor and therefore facts must be maximally decomposed. For example we might store an IP address plus network mask as 4 ints for the address and an int for the network mask, or we might store as a single 32-bit int plus another int for the network mask but the latter poses problems of display that the former does not. In an object-relational database, however, we might store the address as an array of 4 ints for IP v4 or, if we need better performance we might build a custom type. If storage is not a problem but ease of maintenance is, we might even define relations, domains, and such to hold IP addresses, and then store the tuple in a column with appropriate functional interfaces. None of these approaches necessarily violate first normal form, as long as the data type involved properly and completely encapsulates the required behavior. Where such encapsulation is problematic, however, they would violate 1NF because they can no longer be treated as atomic values. In all cases, the specific value has a 1:1 correspondence to an IP address. Additionally where the needs are different, storage, application interface, and reporting classes should be different (this can be handled with updateable views, object-relational interfaces, and the like). Object-Relational Interfaces and Single Responsibility For purely relational databases, normalization is sufficient to address single responsibility. Object-relational designs bring some additional complexity because some behavior may be encapsulated in the object interfaces. There are two fundamental cases where this may make a difference, namely in terms of compositional patterns and in terms of encapsulated data within columns. A compositional pattern in PostgreSQL typically would occur when we use table inheritance to manage commonly co-occuring fields which occur in ways which are functionally dependent on many other fields in a database. For example, we might have a notes abstract table, and then have various tables which inherit this, possibly as part of other larger tables. A common case where composition makes a big difference is in managing notes. People may want to attach notes to all kinds of other data in a database, and so one cannot say that the text or subject of a note is mutually dependent, A typical purely relational approach is to either have many independently managed notes tables or have a single global note table which stores notes for everything, and then have multiple join tables to add join dependencies. The problem with this is that the note data is usually dependent logically, if not mathematically, on the join dependency, and so there is no reasonable way of expressing this without a great deal of complexity in the database design. An object-relational approach might be to have multiple notes tables, but have them inherit the table structure of a common notes table. This table can then be expanded, interfaces added as needed, and it should fill the single responsibility principle even though we might not be able to say that there is a natural functional dependency within the table itself. The second case has to do with storing complex information in columns. Here stability and robustness of code is especially important, and traditional approaches of the single responsibility principle apply directly to the contained data type. Example: Machine Configuration Database and SMTP configuration One of my current projects is building a network configuration database for a LedgerSMBhosting business I am helping to found (more on this soon). For competitive reasons I cannot display my whole code here. However, what I would like to do is show a very abbreviated version here as I used to solve a very specific issue. One of the basic challenges in a network configuration database is that the direct functional dependencies for a given machine may become quite complex when we assume that a given piece of network software is not likely to be running more than once on a given machine. Additionally we often want to ensure that certain sorts of software are set to be configured for certain types of machines, and so constraints can exist that force wider tables. The width and complexity of some configuration tables can possibly pose a management problem over time for the reason that they may not be obviously broken into manageable chunks of columns. One possible solution is to decompose the storage class into smaller mix-ins, each of which expresses a set of functional dependencies on a specific key, fully encapsulating a single responsibility. The overall storage class then exists to manage cross-mixin constraints and handle the actual physical storage. The data can then be presented as a unified table, or as multiple joined tables (and this works even where views would add significant complexity). In this way the smaller sub-tables can be given the responsibility of managing the configuration of specific pieces of software. We might therefore have tables like: -- abstract table, contains no data CREATE TABLE mi_smtp_config ( mi_id bigint, smtp_hostname text, smtp_forward_to text ); CREATE TABLE machine_instance ( mi_id bigserial, mi_name text not null, inservice_date date not null, retire_date date. .... ) INHERITS (mi_smtp_config, ...); The major advantage to this approach is that we can easily check and add which fields are set up to configure which software, without looking through a much larger, wider table. This also provides additional interfaces for related data, and the like. For example, "select * from mi_smtp_config" is directly equivalent of "select (mi::mi_smtp_config).* from machine_instance mi; Conclusions When we think of relations as specialized "fact classes" as opposed to "behavior classes" in the application world, the idea of the single responsibility principle works quite well with relational databases, particularly when paired with other encapsulation processes like stored procedures and views. In object-relational designs, the principle can be used as a general guide for further decomposing relations into mix-in classes, or creating intelligent data types for attributes, and it becomes possible to solve a number of problems in this regard without breaking normalization rules.

January 25, 2013

by Chris Travers

· 7,925 Views