Databases Resources

The Latest Databases Topics

Solving the Detached Many-to-Many Problem with the Entity Framework

Introduction This article is part of the ongoing series I’ve been writing recently, but can be read as a standalone article. I’m going to do a better job of integrating the changes documented here into the ongoing solution I’ve been building. However, considering how much time and effort I put into solving this issue, I’ve decided to document the approach independently in case it is of use to others in the interim. The Problem Defined This issue presents itself when you are dealing with disconnected/detached Entity Framework POCO objects,. as the DbContext doesn’t track changes to entities. Specifically, trouble occurs with entities participating in a many-to-many relationship, where the EF has hidden a “join table” from the model itself. The problem with detached entities is that the data context has no way of knowing what changes have been made to an object graph, without fetching the data from the data store and doing an entity-by-entity comparison – and that assuming it’s possible to fetch the same way as it was originally. In this solution, all the entities are detached, don’t use proxy types and are designed to move between WCF service boundaries. Some Inspiration There are no out-of-the-box solutions that I’m aware of which can process POCO object graphs that are detached. I did find an interesting solution called GraphDiff which is available from github and also as a NuGet package, but it didn’t work with the latest RC version of the Entity Framework (v6). I also found a very comprehensive article on how to implement a generic repository pattern with the Entity Framework, but it was unable to handle detached many-to-many relationships. In any case, I highly recommend a read of this article, it was inspiration for some of the approach I’ve ended up taking with my own design. The Approach This morning I put together a simple data model with the relationships that I wanted to support with detached entities. I’ve attached the solution with a sample schema and test data at the bottom of this article. If you prefer to open and play with it, be sue to add the Entity Framework (v6 RC) via NuGet, I’ve omitted it for file size and licensing reasons). Here’s a logical view of the model I wanted to support: Here’s the schema view from SQL Server: Here’s the Entity Model which is generated from the above SQL schema: In the spirit of punching myself in the head, I’ve elected to have one table implement an identity specification (meaning the underlying schema allocated PK ID values) whereas the other two tables the ID must be specified. Theoretically, if I can handle the entity types in a generic fashion, then this solution can scale out to larger and more complex models. The scenarios I’m specifically looking to solve in this solution with detached object graphs are as follows: Add a relationship (many-to-many) Add a relationship (FK-based) Update a related entity (many-to-many) Update a related entity (FK-based) Remove a relationship (many-to-many) Remove a relationship (FK-based) Per the above, here’s the scenarios within the context of the above data model: Add a new Secondary entity to a Primary entity Add an Other entity to a Secondary entity Update a Secondary entity by updating a Primary entity Update an Other entity from a Secondary entity (or Primary entity) Remove (but not delete!) a Secondary entity from a Primary entity Remove (but not delete) a Other entity from a Secondary entity Establishing Test Data Just to give myself a baseline, the data model is populated (by default) with the following data. This gives us some “existing entities” to query and modify. More Work for the Consumer Although I tried my best, I couldn’t come to a design which didn’t require the consuming client to do slightly more work to enable this to work properly. Unfortunately the best place for change tracking to occur with disconnected entities is with the layer making changes – be it a business layer or something downstream. To this effect, entities will need to implement a property which reflects the state of the entity (added, modified, deleted etc.). For the object graph to be updated/managed successfully, the consumer of the entities needs to set the entity state properly. This isn’t at all as bad as it sounds, but it’s not nothing. Establishing some Scaffolding After generating the data model, the first thing to be done is ensure each entity derives from the same base class. (“EntityBase”) this is used later to establish the active state of an entity when it needs to be processed. I’ve also created an enum (“ObjectState”) which is a property of the base class and a helper function which maps ObjectState to an EF EntityState. In case this isn’t clear, here’s a class view: Constructing Data Access To ensure that the usage is consistent, I’ve defined a single Data Access class, mainly to establish the pattern for handling detached object graphs. I can’t stress enough that this is not intended as a guide to an appropriate way to structure your data access – I’ll be updating my ongoing series of articles to go into more detail – this is only to articulate a design approach to handling detached object graphs. Having said all that, here’s a look at my “DataAccessor” class, which can be used with generic data access entities (by way of generics): As with my ongoing project, the Entity Framework DbContext is instantiated by this class on construction, and implements IDisposable to ensure the DbContext is disposed properly upon construction. Here’s the constructor showing the EF configuration options I’m using: public DataAccessor() { _accessor = new SampleEntities(); _accessor.Configuration.LazyLoadingEnabled = false; _accessor.Configuration.ProxyCreationEnabled = false; } Updating an Entity We start with a basic scenario to ensure that the scaffolding has been implemented properly. The scenario is to query for a Primary entity and then change a property and update the entity in the data store. [TestMethod] public void UpdateSingleEntity() { Primary existing = null; String existingValue = String.Empty; using (DataAccessor a = new DataAccessor()) { existing = a.DataContext.Primaries.Include("Secondaries").First(); Assert.IsNotNull(existing); existingValue = existing.Title; existing.Title = "Unit " + DateTime.Now.ToString("MMdd hh:mm:ss"); } using (DataAccessor b = new DataAccessor()) { existing.State = ObjectState.Modified; b.InsertOrUpdate(existing); } using (DataAccessor c = new DataAccessor()) { existing.Title = existingValue; existing.State = ObjectState.Modified; c.InsertOrUpdate(existing); } } You’ll noticed that there is nothing particularly significant here, except that the object’s State is reset toModified between operations. Updating a Many-to-Many Relationship Now things get interesting. I’m going to query for a Primary entity, then I’ll update both a property of thePrimary entity itself, and a property of one of the entity’s relationships. [TestMethod] public void UpdateManyToMany() { Primary existing = null; Secondary other = null; String existingValue = String.Empty; String existingOtherValue = String.Empty; using (DataAccessor a = new DataAccessor()) { //Note that we include the navigation property in the query existing = a.DataContext.Primaries.Include("Secondaries").First(); Assert.IsTrue(existing.Secondaries.Count() > 1, "Should be at least 1 linked item"); } //save the original description existingValue = existing.Description; //set a new dummy value (with a date/time so we can see it working) existing.Description = "Edit " + DateTime.Now.ToString("yyyyMMdd hh:mm:ss"); existing.State = ObjectState.Modified; other = existing.Secondaries.First(); //save the original value existingOtherValue = other.AlternateDescription; //set a new value other.AlternateDescription = "Edit " + DateTime.Now.ToString("yyyyMMdd hh:mm:ss"); other.State = ObjectState.Modified; //a new data access class (new DbContext) using (DataAccessor b = new DataAccessor()) { //single method to handle inserts and updates //set a breakpoint here to see the result in the DB b.InsertOrUpdate(existing); } //return the values to the original ones existing.Description = existingValue; other.AlternateDescription = existingOtherValue; existing.State = ObjectState.Modified; other.State = ObjectState.Modified; using (DataAccessor c = new DataAccessor()) { //update the entities back to normal //set a breakpoint here to see the data before it reverts back c.InsertOrUpdate(existing); } } If we actually run this unit test and set the breakpoints accordingly, you’ll see the following in the database: Database at Breakpoint #1 / Database at Breakpoint #2 Database when Unit Test completes You’ll notice at the second breakpoint that the description of the first entities have both been updated. Examining the Insert/Update Code The function exposed by the “data access” class really just passes through to another private function which does the heavy lifting. This is mainly in case we need to reuse the logic, since it essentially processes state action on attached entities. public void InsertOrUpdate(params T[] entities) where T : EntityBase { ApplyStateChanges(entities); DataContext.SaveChanges(); } Here’s the definition of the ApplyStateChanges function, which I’ll discuss below: private void ApplyStateChanges(params T[] items) where T : EntityBase { DbSet dbSet = DataContext.Set(); foreach (T item in items) { //loads related entities into the current context dbSet.Attach(item); if (item.State == ObjectState.Added || item.State == ObjectState.Modified) { dbSet.AddOrUpdate(item); } else if (item.State == ObjectState.Deleted) { dbSet.Remove(item); } foreach (DbEntityEntry entry in DataContext.ChangeTracker.Entries() .Where(c => c.Entity.State != ObjectState.Processed && c.Entity.State != ObjectState.Unchanged)) { var y = DataContext.Entry(entry.Entity); y.State = HelperFunctions.ConvertState(entry.Entity.State); entry.Entity.State = ObjectState.Processed; } } } Notes on this Implementation What this function does is to iterate through the items to be examined, attach them to the current Data Context (which also attaches their children), act on each item accordingly (add/update/remove) and then process new entities which have been added to the Data Context’s change tracker. For each newly “discovered” entity (and ignoring entities which are unchanged or have already been examined), each entity’s DbEntityEntry is set according to the entity’s ObjectState (which is set by the calling client). Doing this allows the Entity Framework to understand what actions it needs to perform on the entities when SaveChanges() is invoked later. You’ll also note that I set the entity’s state to “Processed” when it has been examined, so we don’t act on it more than once (for performance purposes). Fun note: the AddOrUpdate extension method is something I found in theSystem.Data.Entity.Migrations namespace and it acts as an ‘Upsert’ operation, inserting or updating entities depending on whether they exist or not already. Bonus! That’s it for adding and updating, believe it or not. Corresponding Unit Test The following unit test establishes the creation of a new many-to-many entity, it is then removed (by relationship) and then finally deleted altogether from the database: [TestMethod] public void AddRemoveRelationship() { Primary existing = null; using (DataAccessor a = new DataAccessor()) { existing = a.DataContext.Primaries.Include("Secondaries") .FirstOrDefault(); Assert.IsNotNull(existing); } Secondary newEntity = new Secondary(); newEntity.State = ObjectState.Added; newEntity.AlternateTitle = "Unit"; newEntity.AlternateDescription = "Test"; newEntity.SecondaryId = 1000; existing.Secondaries.Add(newEntity); using (DataAccessor a = new DataAccessor()) { //breakpoint #1 here a.InsertOrUpdate(existing); } newEntity.State = ObjectState.Unchanged; existing.State = ObjectState.Modified; using (DataAccessor b = new DataAccessor()) { //breakpoint #2 here b.RemoveEntities(existing, x => x.Secondaries, newEntity); } using (DataAccessor c = new DataAccessor()) { //breakpoint #3 here c.Delete(newEntity); } } Test Results: Pre-Test – Breakpoint #1 / Breakpoint #2 Breakpoint #3 / Post execution (new entity deleted) SQL Profile Trace Removing a Many-to-Many Relationship Now this is where it gets tricky. I’d like to have something a little more polished, but the best I have come up with to date is a separate operation on the data provider which exposes functionality akin to “remove relationship”. The fundamental problem with how the EF POCO entities work without any modifications, is when they are detached, to remove a many-to-many relationship, the relationship to be removed is physically removed from the collection. When the object graph is sent back for processing, there’s a missing related entity, and the service or data context would have to make an assumption that the omission was on purpose, not to mention that it would have to compare against data currently in the data store. To make this easier, I’ve implemented a function called “RemoveEnttiies” which alters the relationship between the parent and the child/children. The one bug catch is that you need to specify the navigation property or collection, which might make it slightly undesirable to implement generically. In any case, I’ve provided two options – with the navigation property as a string parameter or as a LINQ expression – they both do the same thing. public void RemoveEntities(T parent, Expression> expression, params T2[] children) where T : EntityBase where T2 : EntityBase { DataContext.Set().Attach(parent); ObjectContext obj = DataContext.ToObjectContext(); foreach (T2 child in children) { DataContext.Set().Attach(child); obj.ObjectStateManager.ChangeRelationshipState(parent, child, expression, EntityState.Deleted); } DataContext.SaveChanges(); } Notes on this Implementation The “ToObjectContext” is an extension method, and is akin to (DataContext as IObjectContextAdapter).ObjectContext. This is to expose a more fundamental part of the Entity Framework’s object model. We need this level of access to get to the functionality which controls relationships. For each child to be removed (note: not deleted from the physical database), we nominate the parent object, the child, the navigation property (collection) and the nature of the relationship change (delete). Note that this will NOT WORK for Foreign Key defined relationships – more on that below. To delete entities which have active relationships, you’ll need to drop the relationship before attempting to delete or else you’ll have data integrity/referential integrity errors, unless you have accounted for cascading deletion (which I haven’t). Example execution: using (DataAccessor c = new DataAccessor()) { //c.RemoveEntities(existing, "Secondaries", s); //(or can use an expression): c.RemoveEntities(existing, x => x.Secondaries, s); } Removing FK Relationships As mentioned above, you can’t just edit the relationship to remove an FK-based relationship. Instead, you have to follow the EF practice of setting the FK entity to NULL. Here’s a Unit Test which demonstrates how this is achieved: Secondary s = ExistingEntity(); using (DataAccessor c = new DataAccessor()) { s.Other = null; s.OtherId = null; s.State = ObjectState.Modified; o.State = ObjectState.Unchanged; c.InsertOrUpdate(s); } We use the same “Insert or Update’ call – being aware that you have to set the ObjectState properties accordingly. Note: I’m in the process of testing the reverse removal – i.e. what happens if you want to remove a Secondaryentity from an Other entity’s collection. Deleting Entities This is fairly straightforward, but I’ve taken a few more precautions to ensure that the entity to be deleted is valid no the server side. public void Delete(params T[] entities) where T : EntityBase { foreach (T entity in entities) { T attachedEntity = Exists(entity); if (attachedEntity != null) { var attachedEntry = DataContext.Entry(attachedEntity); attachedEntry.State = EntityState.Deleted; } } DataContext.SaveChanges(); } To understand the above, you should take a look at the implementation of the “Exists” function which essentially checks the data store and local cache to see if there is an attached representation: protected T Exists(T entity) where T : EntityBase { var objContext = ((IObjectContextAdapter)this.DataContext) .ObjectContext; var objSet = objContext.CreateObjectSet(); var entityKey = objContext.CreateEntityKey(objSet.EntitySet.Name, entity); DbSet set = DataContext.Set(); var keys = (from x in entityKey.EntityKeyValues select x.Value).ToArray(); //Remember, there can by surrogate keys, so don't assume there's //just one column/one value //If a surrogate key isn't ordered properly, the Set().Find() //method will fail, use attributes on the entity to determine the //proper order. //context.Configuration.AutoDetectChangesEnabled = false; return set.Find(keys); } This is a fairly expensive operation which is why it’s pretty much reserved for deletes and not more frequent operations. It essentially determines the target entity’s primary key and then checks whether the entity exists or not. Note: I haven’t tested this on entities with surrogate keys, but I’ll get to it at some point. If you have surrogate key tables, you can define the PK key order using attributes on the model entity, but I haven’t done this (yet). Summary This article is the culmination of about two days of heavy analysis and investigation. I’ve got a whole lot more to contribute on this topic, but for now, I felt it was worthy enough to post as-is. What you’ve got here is still incredibly rough, and I haven’t done nearly enough testing. To be honest, I was quite excited by the initial results, which is why I decided to write this post. there’s an incredibly good chance that I’ve missed something in the design and implementation, so please be aware of that. I’ll be continuing to refine this approach in my main series of articles with much cleaner implementation. In the meantime though, if any of this helps anyone out there struggling with detached entities, I hope it helps. There’s precious few articles and samples that are up to date, and very few that seem to work. This is provided without any warranty of any kind! If you find any issues please e-mail me [email protected] and I’ll attempt to refactor/debug and find ways around some of the inherent limitations. In the meantime, there are a few helpful links I’ve come across in my travels on the WWW. See below. Example Solution Files [ Files ] Note: you’ll need to add the Entity Framework v6 RC package via NuGet, I haven’t included it in the archive. Helpful Links http://blog.magnusmontin.net/2013/05/30/generic-dal-using-entity-framework/ https://github.com/refactorthis/GraphDiff http://stackoverflow.com/questions/11686225/dbset-find-method-ridiculously-slow-compared-to-singleordefault-on-id http://stackoverflow.com/questions/10381106/cannot-update-many-to-many-relationships-in-entity-framework http://stackoverflow.com/questions/8413248/how-to-save-an-updated-many-to-many-collection-on-detached-entity-framework-4-1 http://stackoverflow.com/questions/6018711/generic-way-to-check-if-entity-exists-in-entity-framework

September 18, 2013

by Rob Sanders

· 163,539 Views

Introduction to ElasticSearch

Learn about ElasticSearch, an open source tool developed with Java. It is a Lucene-based, scalable, full-text search engine, and a data analysis tool.

September 17, 2013

by Hüseyin Akdoğan

CORE

· 12,113 Views · 5 Likes

EasyNetQ: Big Breaking Changes in the Advanced Bus

EasyNetQ is my little, easy to use, client API for RabbitMQ. It’s been doing really well recently. As I write this, it has 24,653 downloads on NuGet, making it by far the most popular high-level RabbitMQ API. The goal of EasyNetQ is to make working with RabbitMQ as easy as possible. I wanted junior developers to be able to use basic messaging patterns out-of-the-box with just a few lines of code and have EasyNetQ do all the heavy lifting: exchange-binding-queue configuration, error management, connection management, serialization, thread handling; all the things that make working against the low level AMQP C# API, provided by RabbitMQ, such a steep learning curve. To meet this goal, EasyNetQ has to be a very opinionated library. It has a set way of configuring exchanges, bindings and queues based on the .NET type of your messages. However, right from the first release, many users said that they liked the connection management, thread handling, and error management, but wanted to be able to set up their own broker topology. To support this, we introduced the advanced API, an idea stolen shamelessly from Ayende’s RavenDB client. You access the advanced bus (IAdvancedBus) via the Advanced property on IBus: var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced; Sometimes something can seem like a good idea at the time, and then later you think, “WTF! Why on earth did I do that?” It happens to me all the time. I thought it would be cool if I created the exchange-binding-queue topology and then passed it to the publish and subscribe methods, which would then internally declare the exchanges and queues and do the binding. I implemented a tasty little visitor pattern in my ITopologyVisitor. I optimized for my own programming pleasure, rather than an a simple, obvious, easy-to-understand API. I realized a while ago that a more straightforward set of declares on IAdvancedBus would be a far more obvious and intentional design. To this end, I’ve refactored the advanced bus to separate declares from publishing and consuming. I just pushed the changes to NuGet and have also updated the Advanced Bus documentation. Note that these are breaking changes, so please be careful if you are upgrading to the latest version, 0.12, and upwards. Here is a taste of how it works: Declare a queue, exchange and binding, and consume raw message bytes: var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced; var queue = advancedBus.QueueDeclare("my_queue"); var exchange = advancedBus.ExchangeDeclare("my_exchange", ExchangeType.Direct); advancedBus.Bind(exchange, queue, "routing_key"); advancedBus.Consume(queue, (body, properties, info) => Task.Factory.StartNew(() => { var message = Encoding.UTF8.GetString(body); Console.Out.WriteLine("Got message: '{0}'", message); })); Note that I’ve renamed ‘Subscribe’ to ‘Consume’ to better reflect the underlying AMQP method. Declare an exchange and publish a message: var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced; var exchange = advancedBus.ExchangeDeclare("my_exchange", ExchangeType.Direct); using (var channel = advancedBus.OpenPublishChannel()) { var body = Encoding.UTF8.GetBytes("Hello World!"); channel.Publish(exchange, "routing_key", new MessageProperties(), body); } You can also delete exchanges, queues and bindings: var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced; // declare some objects var queue = advancedBus.QueueDeclare("my_queue"); var exchange = advancedBus.ExchangeDeclare("my_exchange", ExchangeType.Direct); var binding = advancedBus.Bind(exchange, queue, "routing_key"); // and then delete them advancedBus.BindingDelete(binding); advancedBus.ExchangeDelete(exchange); advancedBus.QueueDelete(queue); advancedBus.Dispose(); I think these changes make for a much better advanced API. Have a look at the documentation for the details.

September 13, 2013

by Mike Hadlow

· 12,401 Views

How to shard a cron

Sharding is a database partitioning technique that distributed Aggregates such as rows or documents across multiple servers; this choice for horizontal queries trades in some client complexity (whose queries must include a shard key such as a zip code or a customer id) for the capability of distributing the dataset between multiple servers, scaling not only the read but also the write capacity. On the application side, there are several singleton processes - for example cron configurations - that are usually run only once on the whole data set. For a certain category of singleton processes we can switch to a shard-like architecture that can scale first to multiple processes and when necessary to multiple servers. Step 1: identify the candidate process Take a look at your crontab or at your process scheduling configuration if you use another infrastructure. Some of the processes are aggregations of data producing statistics, and their work can already be distributed with patterns such as MapReduce. The kind of processes interesting for client sharding is the one where an operation is performed over every single element of the data set. Each element is an Aggregate and as such does not interact, in a single transaction, with other ones. Therefore these processes are intrinsically parallelizable: you only need a way to distribute the load. Some examples of shard-able processes are: rebuild the data aggregates for a new day for each user perform some consistency checks on every customer order send all pending orders perform a renewal for each user subscription Step 2: choose a shard key Once you have identified the aggregates along which to parallelize a length operation, the choice of the shard key will usually be straightforward. This key must be uniformly distributed between the N shards you want to create on the client side. Some examples: the zip code for customers the numerical, sequential id for orders a UUID for uploaded videos the transaction reference number for money transactions Step 3: divide the work with the shard key A process before the application of sharding is usually composed of two phases: select all Aggregates whose satisfy condition C apply operation O to all the selected Aggregates. The first operation can be sometime sharded directly, transforming each of the N processes in: select all Aggregates whose satisfy condition C and whose shard key is equal to this shard's number modulo N. apply operation O to all the selected Aggregates. For example, the first operation for shard 0 of 4 can be accomplished by an SQL query: SELECT * FROM aggregate_table WHERE outdated=true // condition C AND aggregate_id % 4=0// sharding Precalculating aggregate_id % 4 can improve the performance of the query, depending on your database; however it can make more difficult to rescale the number of processes. When you switch to 8 or 16 client shards it will be necessary to stop all current running processes, recalculate the column and restart the new batch. Furthermore, the performance of ALTER TABLE is usually not good on large tables which are the subject of this client sharding technique. Some times we're not able to divide the query in a partition of the original data set directly in the database. The pattern becomes: select id (and shard key if different) for all Aggregates whose satisfy condition C. filter the subset by only considering the id whose modulo N is equal to this shard's number. apply operation O to all the selected Aggregates. For example, I use this second form while using multiple client with MongoDB. I do not know if it's possible to query ObjectIds by their modulo, so I resort to selecting all of them and then filtering them out on the client side: $id = (string) $document['_id']; $numericalValue = hexdec(substr($id, -4)); // without substr() the conversion will overflow 32-bit integers if ($numericalValues % $shards == $shard) { ... } This is only useful under the assumption that is not the query that's taking too much time in the original process, but the application of the O operation to all of its results. Note also that these solutions needs well-behaving processes that do not intervene on the data of each other: the filtering is left to the programmer. In general, it is also necessary to guarantee mutual exclusion with the original singleton process; this usually comes up when deploying the battery of N crons, as they should not start until the last of the singleton processes has terminated not to be started again.

September 4, 2013

by Giorgio Sironi

· 6,880 Views

API Gateway and API Portal - The pillars of API Management and the evolution of SOA

API Management solutions must combine an API Portal (for signing up developers) with an API Gateway (to link back to the enterprise). But where do these come from, and what is the relationship with SOA? To answer these questions, first let's look at a bit of history: In the 2000's, we had the SOA Gateway and the SOA Registry, working hand-in-hand. This was "SOA Governance". The SOA Registry (with a Repository) was intended to be the "central store of truth" for information about Web Services. It was often the public face of SOA Governance, the part which people could see. Usually the services in the registry took the form of heavyweight SOAP services, defined by WSDLs. The problem was that developers were often forced to register their SOAP services in the registry, rather than feeling that it was something beneficial to them. Browsing the registry was also a chore, involving the use of UDDI, also a heavyweight protocol (in fact, it was built on SOAP). Fast-forward to the current decade, and we find that the SOA Registry has been replaced by the API Portal. An API portal is also the "central store of truth", but now it includes REST APIs definitions (usually expressed using a Swagger-type format) as well as SOAP services. The API Portal is designed to be useful and helpful to developers who wish to build apps, rather than feeling like a chore to use. The lesson of SOA was that an attitude of "If we build it, they will come" (or "If we put it in the SOA Registry, people will use it") does not work. You have to make it into a pleasant experience for developers. API portals work for the very reason that SOA registries did not work: usability. Just like the SOA Gateway worked with the SOA Registry, so the API Gateway works hand-in-hand with the API Portal. Together, the combination of the API Portal with the API Gateway constitutes "API Management". The API Portal is for developers to sign up to use APIs, receive API Keys and quotas, and the API Gateway operates at runtime, managing the API Key usage and enforcing the API usage quotas. The API Gateway also performs the very important task of bridging from the technologies used by API clients (REST, OAuth) to the technologies used in the enterprise (Kerberos, SAML, or proprietary identity tokens such as CA SiteMinder smsession tokens). For more on this bridging, check out my webinar with Jason Cardinal from Identica tomorrow on "Bridging APIs to Enterprise Infrastructure". Gartner defines the combination of SOA Governance and API Management as "Application Services Governance". I'm proud to say that Axway (which acquired Vordel in 2012) is recognized by Gartner as a Leader in the category of Application Services Governance. We've seen an evolution of technologies (SOAP to REST) and approach (the UDDI registry to the web-based API Portal) in the journey from SOA Governance to API Management. From 30,000 feet, SOA Governance and API Management might look similar, but the new approach of API Management has already outshone SOA. The API Gateway and API Portal are key to this.

September 3, 2013

by Mitch Pronschinske

· 7,874 Views

Assigning UUIDs to Neo4j Nodes and Relationships

TL;DR: This blog post features a small demo project on github: neo4j-uuid and explains how to automatically assign UUIDs to nodes and relationships in Neo4j. A very brief introduction into Neo4j 1.9′s KernelExtensionFactory is included as well. A Little Rant on Neo4j Node/Relationship IDs In a lot of use cases there is demand for storing a reference to a Neo4j node or relationship in a third party system. The first naive idea probably is to use the internal node/relationship id that Neo4j provides. Do not do that! Ever! You ask why? Well, Neo4j’s id is basically a offset in one of the store files Neo4j uses (with some math involved). Assume you delete couple of nodes. This produces holes in the store files that Neo4j might reclaim when creating new nodes later on. And since the id is a file offset there is a chance that the new node will have exactly the same id like the previously deleted node. If you don’t synchronously update all node id references stored elsewhere, you’re in trouble. If neo4j would be completely redeveloped from scratch the getId() method would not be part of the public API. As long as you use node ids only inside a request of an application for example, there’s nothing wrong. To repeat myself: Never ever store a node id in a third party system. I have officially warned you. UUIDs Enough of ranting, let’s see what we can do to safely store node references in an external system. Basically we need an identifier that has no semantics in contrast to the node id. A common approach to this is using Universally Unique Identifiers (UUID). Java JDK offers a UUID implementation, so we could potentially use UUID.randomUUID(). Unfortunately random UUIDs are slow to generate. A preferred approach is to use the machine’s MAC and a timestamp as base for the UUID – this should provide enough uniqueness. There a nice library out there at http://wiki.fasterxml.com/JugHome providing exactly what we need. Automatic UUID Assignments For convenience it would be great if all fresh created nodes and relationships get automatically assigned a uuid property without doing this explicitly. Fortunately Neo4j supports TransactionEventHandlers, a callback interface pluging into transaction handling. A TransactionEventHandler has a chance to modify or veto any transaction. It’s a sharp tool which can have significant negative performance impact if used the wrong way. I’ve implemented a UUIDTransactionEventHandler that performs the following tasks: Populate a UUID property for each new node or relationship Reject a transaction if a manual modification of a UUID is attempted; either assignment or removal public class UUIDTransactionEventHandler implements TransactionEventHandler { public static final String UUID_PROPERTY_NAME = "uuid"; private final TimeBasedGenerator uuidGenerator = Generators.timeBasedGenerator(); @Override public Object beforeCommit(TransactionData data) throws Exception { checkForUuidChanges(data.removedNodeProperties(), "remove"); checkForUuidChanges(data.assignedNodeProperties(), "assign"); checkForUuidChanges(data.removedRelationshipProperties(), "remove"); checkForUuidChanges(data.assignedRelationshipProperties(), "assign"); populateUuidsFor(data.createdNodes()); populateUuidsFor(data.createdRelationships()); return null; } @Override public void afterCommit(TransactionData data, java.lang.Object state) { } @Override public void afterRollback(TransactionData data, java.lang.Object state) { } /** * @param propertyContainers set UUID property for a iterable on nodes or relationships */ private void populateUuidsFor(Iterable propertyContainers) { for (PropertyContainer propertyContainer : propertyContainers) { if (!propertyContainer.hasProperty(UUID_PROPERTY_NAME)) { final UUID uuid = uuidGenerator.generate(); final StringBuilder sb = new StringBuilder(); sb.append(Long.toHexString(uuid.getMostSignificantBits())).append(Long.toHexString(uuid.getLeastSignificantBits())); propertyContainer.setProperty(UUID_PROPERTY_NAME, sb.toString()); } } } private void checkForUuidChanges(Iterable> changeList, String action) { for (PropertyEntry removedProperty : changeList) { if (removedProperty.key().equals(UUID_PROPERTY_NAME)) { throw new IllegalStateException("you are not allowed to " + action + " " + UUID_PROPERTY_NAME + " properties"); } } } } Setting up Using KernelExtensionFactory There are two remaining tasks for full automation of UUID assignments: We need to setup autoindexing for uuid properties to have a convenient way to look up nodes or relationships by UUID We need to register UUIDTransactionEventHandler with the graph database Since version 1.9 Neo4j has the notion of KernelExtensionFactory. Using KernelExtensionFactory you can supply a class that receives lifecycle callbacks when e.g. Neo4j is started or stopped. This is the right place for configuring autoindexing and setting up the TransactionEventHandler. Since JVM’s ServiceLoader is used KernelExtenstionFactories need to be registered in a file META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory by listing all implementations you want to use: org.neo4j.extension.uuid.UUIDKernelExtensionFactory KernelExtensionFactories can declare dependencies, therefore declare a inner interface (“Dependencies” in code) below that just has getters. Using proxies Neo4j will implement this class and supply you with the required dependencies. The dependencies are match on requested type, see Neo4j’s source code what classes are supported for being dependencies. KernelExtensionFactories must implement a newKernelExtension method that is supposed to return a instance of LifeCycle. For our UUID project we return a instance of UUIDLifeCycle: package org.neo4j.extension.uuid; import org.neo4j.graphdb.GraphDatabaseService; import org.neo4j.graphdb.PropertyContainer; import org.neo4j.graphdb.event.TransactionEventHandler; import org.neo4j.graphdb.factory.GraphDatabaseSettings; import org.neo4j.graphdb.index.AutoIndexer; import org.neo4j.graphdb.index.IndexManager; import org.neo4j.kernel.configuration.Config; import org.neo4j.kernel.lifecycle.LifecycleAdapter; import java.util.Map; /** * handle the setup of auto indexing for UUIDs and registers a {@link UUIDTransactionEventHandler} */ class UUIDLifeCycle extends LifecycleAdapter { private TransactionEventHandler transactionEventHandler; private GraphDatabaseService graphDatabaseService; private IndexManager indexManager; private Config config; UUIDLifeCycle(GraphDatabaseService graphDatabaseService, Config config) { this.graphDatabaseService = graphDatabaseService; this.indexManager = graphDatabaseService.index(); this.config = config; } /** * since {@link org.neo4j.kernel.NodeAutoIndexerImpl#start()} is called *after* {@link org.neo4j.extension.uuid.UUIDLifeCycle#start()} it would apply config settings for auto indexing. To prevent this we change config here. * @throws Throwable */ @Override public void init() throws Throwable { Map params = config.getParams(); params.put(GraphDatabaseSettings.node_auto_indexing.name(), "true"); params.put(GraphDatabaseSettings.relationship_auto_indexing.name(), "true"); config.applyChanges(params); } @Override public void start() throws Throwable { startUUIDIndexing(indexManager.getNodeAutoIndexer()); startUUIDIndexing(indexManager.getRelationshipAutoIndexer()); transactionEventHandler = new UUIDTransactionEventHandler(); graphDatabaseService.registerTransactionEventHandler(transactionEventHandler); } @Override public void stop() throws Throwable { stopUUIDIndexing(indexManager.getNodeAutoIndexer()); stopUUIDIndexing(indexManager.getRelationshipAutoIndexer()); graphDatabaseService.unregisterTransactionEventHandler(transactionEventHandler); } void startUUIDIndexing(AutoIndexer autoIndexer) { autoIndexer.startAutoIndexingProperty(UUIDTransactionEventHandler.UUID_PROPERTY_NAME); } void stopUUIDIndexing(AutoIndexer autoIndexer) { autoIndexer.stopAutoIndexingProperty(UUIDTransactionEventHandler.UUID_PROPERTY_NAME); } } Most of the code is pretty much straight forward, l.44/45 set up autoindexing for uuid property. l48 registers the UUIDTransactionEventHandler with the graph database. Not that obvious is the code in the init() method. Neo4j’s NodeAutoIndexerImpl configures autoindexing itself and switches it on or off depending on the respective config option. However we want to have autoindexing always switched on. Unfortunately NodeAutoIndexerImpl is run after our code and overrides our settings. That’s we l.37-40 tweaks the config settings to force nice behaviour of NodeAutoIndexerImpl. Looking up Nodes or Relationships for UUID For completeness the project also contains a trivial unmanaged extension for looking up nodes and relationships using the REST interface, see UUIDRestInterface. By sending a HTTP GET to http://localhost:7474/db/data/node/ the node’s internal id returned. Build System and Testing For building the project, Gradle is used; build.gradle is trivial. Of course couple of tests are included. As a long standing addict I’ve obviously used Spock for testing. See the test code here. Final Words A downside of this implementation is that each and every node and relationships gets indexed. Indexing always trades write performance for read performance. Keep that in mind. It might make sense to get rid of unconditional auto indexing and put some domain knowledge into the TransactionEventHandler to assign only those nodes uuids and index them that are really used for storing in an external system.

August 22, 2013

by Stefan Armbruster

· 11,712 Views

DyngoDB: A MongoDB Interface for DynamoDB

You might be asking yourself, 'why do I need a MongoDB-like experience for DynamoDB when there are already full-MongoDB cloud services like MMS, MongoLab, MongoHQ and MongoDirector? One developer believes there is a need and has set up an experimental project called DyngoDB. It provides the MongoDB-style interface in front of Amazon's DynamoDB and their CloudSearch service. Apparently, in the developer's case, he only wants the MongoDB interface but prefers the DynamoDB storage engine. We'll have to see if other developers also have this specific set of preferences.

August 20, 2013

by Mitch Pronschinske

· 5,780 Views

I was facing a problem where while a person logs out his session is invalidated but the JSESSIONID still remained in the browser. As a result while logging in the Java API used to get the request from the browser along with a JSESSIONID(Just the ID since the session was invalidated) and would create the new session with the same ID. To fix this problem I used the above code so that whenever a user logs out the entire JSESSIONID becomes empty and thus cookie wont exist for that site.Anyone using JAVA can utilize this in their code. @RequestMapping(value = "/logout", method = RequestMethod.POST) public void logout(HttpServletRequest request, HttpServletResponse response) { /* Getting session and then invalidating it */ HttpSession session = request.getSession(false); if (request.isRequestedSessionIdValid() && session != null) { session.invalidate(); } handleLogOutResponse(response); } /** * This method would edit the cookie information and make JSESSIONID empty * while responding to logout. This would further help in order to. This would help * to avoid same cookie ID each time a person logs in * @param response */ private void handleLogOutResponse(HttpServletResponse response) { Cookie[] cookies = request.getCookies(); for (Cookie cookie : cookies) { cookie.setMaxAge(0); cookie.setValue(null); cookie.setPath("/"); response.addCookie(cookie); } }

August 15, 2013

by Shiv Kumar Ganesh

· 41,370 Views · 2 Likes

neo4j: Extracting a subgraph as an adjacency matrix and calculating eigenvector centrality with JBLAS

Earlier in the week I wrote a blog post showing how to calculate the eigenvector centrality of an adjacency matrix using JBLAS and the next step was to work out the eigenvector centrality of a neo4j sub graph. There were 3 steps involved in doing this: Export the neo4j sub graph as an adjacency matrix Run JBLAS over it to get eigenvector centrality scores for each node Write those scores back into neo4j I decided to make use of the Paul Revere data set from Kieran Healy’s blog post which consists of people and groups that they had membership of. The script to import the data is on my fork of the revere repository. Having imported the data the next step was to write a cypher query which would give me the people in anadjacency matrix with the number in each column/row intersection showing how many common groups that pair of people had. I thought it’d be easier to build this query incrementally so I started out writing a query which would return one row of the adjacency matrix: MATCH p1:Person, p2:Person WHERE p1.name = "Paul Revere" WITH p1, p2 MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2 WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links ORDER BY p2 RETURN p1, COLLECT(links) AS row Here we start with Paul Revere and then find the relationships between him and every other person by way of a common group membership. We use an optional relationship since we need to include a value in each column/row of our adjacency matrix we need to return a 0 value for anyone he doesn’t intersect with. If we run that query we get back the following: +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | p1 | row | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Paul Revere" | [2,1,1,1,1,1,1,1,1,1,1,1,1,1,2,3,1,1,1,1,1,1,3,3,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,2,1,1,2,1,2,1,1,1,1,1,0,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,2,1,1,1,1,1,1,2,1,3,1,3,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,0,1,0,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,2,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,3,1,1,1,1,3,1,1,1,1,0,1,2,1,1,1,1,1,1,1] | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ As it turns outs we’ve only got to remove the WHERE clause and order everybody and we’ve get the adjacency matrix for everyone: MATCH p1:Person, p2:Person WITH p1, p2 MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2 WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links ORDER BY p2 RETURN p1, COLLECT(links) AS row ORDER BY p1 +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | p1 | row | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Abiel Ruddock" | [0,1,1,1,0,1,0,1,0,0,1,1,1,0,1,2,0,1,0,1,1,1,2,2,1,0,0,1,1,0,1,1,1,1,1,0,0,0,0,1,1,0,0,2,2,0,0,1,1,2,1,1,1,0,1,0,1,1,0,0,2,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,0,2,1,2,1,0,0,0,0,1,1,0,1,0,0,1,0,2,0,0,1,0,0,0,1,0,0,2,0,1,0,1,1,1,0,0,1,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,2,0,0,1,1,0,0,2,0,1,2,1,1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,2,1,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,0,1,0,0,2,1,0,0,1,1,1,1,0,1,0,0,0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,2,1,1,0,0,2,0,1,0,0,0,0,1,0,1,0,1,0,1,0] | | "Abraham Hunt" | [1,0,1,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,1,1,0,1,1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,1,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,1,0] | ... +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 254 rows 9897 ms The next step was to wire up the query results with the JBLAS code that I wrote in the previous post. I ended up with the following: public class Neo4jAdjacencyMatrixSpike { public static void main(String[] args) throws SQLException { ClientResponse response = client() .resource("http://localhost:7474/db/data/cypher") .entity(queryAsJson(), MediaType.APPLICATION_JSON) .accept(MediaType.APPLICATION_JSON) .post(ClientResponse.class); JsonNode result = response.getEntity(JsonNode.class); ArrayNode rows = (ArrayNode) result.get("data"); List principalEigenvector = JBLASSpike.getPrincipalEigenvector(new DoubleMatrix(asMatrix(rows))); List people = asPeople(rows); updatePeopleWithEigenvector(people, principalEigenvector); System.out.println(sort(people).take(10)); } private static double[][] asMatrix(ArrayNode rows) { double[][] matrix = new double[rows.size()][254]; int rowCount = 0; for (JsonNode row : rows) { ArrayNode matrixRow = (ArrayNode) row.get(2); double[] rowInMatrix = new double[254]; matrix[rowCount] = rowInMatrix; int columnCount = 0; for (JsonNode jsonNode : matrixRow) { matrix[rowCount][columnCount] = jsonNode.asInt(); columnCount++; } rowCount++; } return matrix; } // rest cut for brevity } Here we are taking the query and then converting it into an array of arrays before passing it to our JBLAS code to calculate the principal eigenvector. We then return the top 10 people: Person{name='William Cooper', eigenvector=0.172604992239612, nodeId=68}, Person{name='Nathaniel Barber', eigenvector=0.17260499223961198, nodeId=18}, Person{name='John Hoffins', eigenvector=0.17260499223961195, nodeId=118}, Person{name='Paul Revere', eigenvector=0.17171142003936804, nodeId=207}, Person{name='Caleb Davis', eigenvector=0.16383970722169897, nodeId=71}, Person{name='Caleb Hopkins', eigenvector=0.16383970722169897, nodeId=121}, Person{name='Henry Bass', eigenvector=0.16383970722169897, nodeId=21}, Person{name='Thomas Chase', eigenvector=0.16383970722169897, nodeId=54}, Person{name='William Greenleaf', eigenvector=0.16383970722169897, nodeId=104}, Person{name='Edward Proctor', eigenvector=0.15600043886738055, nodeId=201} I get back the same 10 people as Kieran Healy although they have different eigenvector values. As far as I understand the absolute value doesn’t matter, what’s more important is the relative score to other people so I think we’re ok. The final step was to write these eigenvector values back into neo4j which we can do with the following code: private static void updateNeo4jWithEigenvectors(List people) { for (Person person : people) { ObjectNode request = JsonNodeFactory.instance.objectNode(); request.put("query", "START p = node({nodeId}) SET p.eigenvectorCentrality={value}"); ObjectNode params = JsonNodeFactory.instance.objectNode(); params.put("nodeId", person.nodeId); params.put("value", person.eigenvector); request.put("params", params); client() .resource("http://localhost:7474/db/data/cypher") .entity(request, MediaType.APPLICATION_JSON) .accept(MediaType.APPLICATION_JSON) .post(ClientResponse.class); } } Now we might use that eigenvector centrality value in other queries, such as one to show who the most central/potentially influential people are in each group: MATCH g:Group<-[:MEMBER_OF]-p WITH g.name AS group, p.name AS personName, p.eigenvectorCentrality as eigen ORDER BY eigen DESC WITH group, COLLECT(personName) AS people RETURN group, HEAD(people) + [HEAD(TAIL(people))] + [HEAD(TAIL(TAIL(people)))] AS mostCentral +--------------------------------------------------------------------------+ | group | mostCentral | +--------------------------------------------------------------------------+ | "StAndrewsLodge" | ["Paul Revere","Joseph Warren","Thomas Urann"] | | "BostonCommittee" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "LoyalNine" | ["Caleb Hopkins","William Greenleaf","Caleb Davis"] | | "LondonEnemies" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "LongRoomClub" | ["Paul Revere","John Hancock","Benjamin Clarke"] | | "NorthCaucus" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "TeaParty" | ["William Cooper","Nathaniel Barber","John Hoffins"] | +--------------------------------------------------------------------------+ 7 rows 280 ms Our top ten feature frequently although it’s interesting that only one of them is in the ‘LongRoomClub’ group which perhaps indicates that people in that group are less likely to be members of the other ones. I’d be interested if anyone can think of other potential uses for eigenvector centrality once we’ve got it back in the graph. All the code described in this post is on github if you want to take it for a spin.

August 12, 2013

by Mark Needham

· 5,692 Views

EclipseLink MOXy and the Java API for JSON Processing - Object Model APIs

The Java API for JSON Processing (JSR-353) is the Java standard for producing and consuming JSON which was introduced as part of Java EE 7. JSR-353 includes object (DOM like) and stream (StAX like) APIs. In this post I will demonstrate the initial JSR-353 support we have added to MOXy's JSON binding in EclipseLink 2.6. You can now use MOXy to marshal to: javax.json.JsonArrayBuilder javax.json.JsonObjectBuilder And unmarshal from: javax.json.JsonStructure javax.json.JsonObject javax.json.JsonArray You can try this out today using a nightly build of EclipseLink 2.6.0: http://www.eclipse.org/eclipselink/downloads/nightly.php The JSR-353 reference implementation is available here: https://java.net/projects/jsonp/downloads/download/ri/javax.json-ri-1.0.zip Java Model Below is the simple customer model that we will use for this post. Note for this example we are only using the standard JAXB (JSR-222) annotations. Customer package blog.jsonp.moxy; import java.util.*; import javax.xml.bind.annotation.*; @XmlType(propOrder={"id", "firstName", "lastName", "phoneNumbers"}) public class Customer { private int id; private String firstName; private String lastName; private List phoneNumbers = new ArrayList(); public int getId() { return id; } public void setId(int id) { this.id = id; } public String getFirstName() { return firstName; } public void setFirstName(String firstName) { this.firstName = firstName; } @XmlElement(nillable=true) public String getLastName() { return lastName; } public void setLastName(String lastName) { this.lastName = lastName; } @XmlElement public List getPhoneNumbers() { return phoneNumbers; } } PhoneNumber package blog.jsonp.moxy; import javax.xml.bind.annotation.*; @XmlAccessorType(XmlAccessType.FIELD) public class PhoneNumber { private String type; private String number; public String getType() { return type; } public void setType(String type) { this.type = type; } public String getNumber() { return number; } public void setNumber(String number) { this.number = number; } } jaxb.properties To specify MOXy as your JAXB provider you need to include a file called jaxb.properties in the same package as your domain model with the following entry (see: Specifying EclipseLink MOXy as your JAXB Provider) javax.xml.bind.context.factory=org.eclipse.persistence.jaxb.JAXBContextFactory Marshal Demo In the demo code below we will use a combination of JSR-353 and MOXy APIs to produce JSON. JSR-353's JsonObjectBuilder and JsonArrayBuilder are used to produces instances of JsonObject and JsonArray. We can use MOXy to marshal to these builders by wrapping them in instances of MOXy's JsonObjectBuilderResult and JsonArrayBuilderResult. package blog.jsonp.moxy; import java.util.*; import javax.json.*; import javax.json.stream.JsonGenerator; import javax.xml.bind.*; import org.eclipse.persistence.jaxb.JAXBContextProperties; import org.eclipse.persistence.oxm.json.*; public class MarshalDemo { public static void main(String[] args) throws Exception { // Create the EclipseLink JAXB (MOXy) Marshaller Map jaxbProperties = new HashMap(2); jaxbProperties.put(JAXBContextProperties.MEDIA_TYPE, "application/json"); jaxbProperties.put(JAXBContextProperties.JSON_INCLUDE_ROOT, false); JAXBContext jc = JAXBContext.newInstance(new Class[] {Customer.class}, jaxbProperties); Marshaller marshaller = jc.createMarshaller(); // Create the JsonArrayBuilder JsonArrayBuilder customersArrayBuilder = Json.createArrayBuilder(); // Build the First Customer Customer customer = new Customer(); customer.setId(1); customer.setFirstName("Jane"); customer.setLastName(null); PhoneNumber phoneNumber = new PhoneNumber(); phoneNumber.setType("cell"); phoneNumber.setNumber("555-1111"); customer.getPhoneNumbers().add(phoneNumber); // Marshal the First Customer Object into the JsonArray JsonArrayBuilderResult result = new JsonArrayBuilderResult(customersArrayBuilder); marshaller.marshal(customer, result); // Build List of PhoneNumer Objects for Second Customer List phoneNumbers = new ArrayList(2); PhoneNumber workPhone = new PhoneNumber(); workPhone.setType("work"); workPhone.setNumber("555-2222"); phoneNumbers.add(workPhone); PhoneNumber homePhone = new PhoneNumber(); homePhone.setType("home"); homePhone.setNumber("555-3333"); phoneNumbers.add(homePhone); // Marshal the List of PhoneNumber Objects JsonArrayBuilderResult arrayBuilderResult = new JsonArrayBuilderResult(); marshaller.marshal(phoneNumbers, arrayBuilderResult); customersArrayBuilder // Use JSR-353 APIs for Second Customer's Data .add(Json.createObjectBuilder() .add("id", 2) .add("firstName", "Bob") .addNull("lastName") // Included Marshalled PhoneNumber Objects .add("phoneNumbers", arrayBuilderResult.getJsonArrayBuilder()) ) .build(); // Write JSON to System.out Map jsonProperties = new HashMap(1); jsonProperties.put(JsonGenerator.PRETTY_PRINTING, true); JsonWriterFactory writerFactory = Json.createWriterFactory(jsonProperties); JsonWriter writer = writerFactory.createWriter(System.out); writer.writeArray(customersArrayBuilder.build()); writer.close(); } } Highlighted lines: 36, 37, 38, 54, 55, 64 Output Below is the output from running the marshal demo (MarshalDemo). The highlighted portions (lines 2-12 and 18-25) correspond to the portions that were populated from our Java model. [ { "id":1, "firstName":"Jane", "lastName":null, "phoneNumbers":[ { "type":"cell", "number":"555-1111" } ] }, { "id":2, "firstName":"Bob", "lastName":null, "phoneNumbers":[ { "type":"work", "number":"555-2222" }, { "type":"home", "number":"555-3333" } ] } ] Highlighted lines: 2-12, 18-25 Unmarshal Demo MOXy enables you to unmarshal from a JSR-353 JsonStructure (JsonObject or JsonArray). To do this simply wrap the JsonStructure in an instance of MOXy's JsonStructureSource and use one of the unmarshal operations that takes an instance of Source. package blog.jsonp.moxy; import java.io.FileInputStream; import java.util.*; import javax.json.*; import javax.xml.bind.*; import org.eclipse.persistence.jaxb.JAXBContextProperties; import org.eclipse.persistence.oxm.json.JsonStructureSource; public class UnmarshalDemo { public static void main(String[] args) throws Exception { try (FileInputStream is = new FileInputStream("src/blog/jsonp/moxy/input.json")) { // Create the EclipseLink JAXB (MOXy) Unmarshaller Map jaxbProperties = new HashMap(2); jaxbProperties.put(JAXBContextProperties.MEDIA_TYPE, "application/json"); jaxbProperties.put(JAXBContextProperties.JSON_INCLUDE_ROOT, false); JAXBContext jc = JAXBContext.newInstance(new Class[] {Customer.class}, jaxbProperties); Unmarshaller unmarshaller = jc.createUnmarshaller(); // Parse the JSON JsonReader jsonReader = Json.createReader(is); // Unmarshal Root Level JsonArray JsonArray customersArray = jsonReader.readArray(); JsonStructureSource arraySource = new JsonStructureSource(customersArray); List customers = (List) unmarshaller.unmarshal(arraySource, Customer.class) .getValue(); for(Customer customer : customers) { System.out.println(customer.getFirstName()); } // Unmarshal Nested JsonObject JsonObject customerObject = customersArray.getJsonObject(1); JsonStructureSource objectSource = new JsonStructureSource(customerObject); Customer customer = unmarshaller.unmarshal(objectSource, Customer.class) .getValue(); for(PhoneNumber phoneNumber : customer.getPhoneNumbers()) { System.out.println(phoneNumber.getNumber()); } } } } Highlighted lines: 27-30, 37-39 Input (input.json) The following JSON input will be converted to a JsonArray using a JsonReader. [ { "id":1, "firstName":"Jane", "lastName":null, "phoneNumbers":[ { "type":"cell", "number":"555-1111" } ] }, { "id":2, "firstName":"Bob", "lastName":null, "phoneNumbers":[ { "type":"work", "number":"555-2222" }, { "type":"home", "number":"555-3333" } ] } ] Highlighted lines: 4, 15, 20, 24 Output Below is the output from running the unmarshal demo (UnmarshalDemo). Jane Bob 555-2222 555-3333

August 7, 2013

by Blaise Doughan

· 13,792 Views

NoSQL with JPA

EclipseLink, reference implementation of JPA, has JPA support for NoSQL databases (MongoDB and Oracle NoSQL) as of the version 2.4. In this tutorial we will discuss the use of MongoDB database with the JPA support of EclipseLink. The transaction previously done using the console and native java driver will be done in a web application with the help of EclipseLink. Tools and technologies used in the sample application are as follows: MongoDB version 2.4.1 MongoDB Java Driver version 2.11.1 JSF version 2.2 PrimeFaces version 3.5 EclipseLink version 2.4 Jetty 7.x Maven Plugin JDK version 1.7 Maven 3.0.4 Project Dependencies org.glassfish javax.faces 2.2.0-SNAPSHOT org.primefaces primefaces 3.5 org.primefaces.themes bootstrap 1.0.10 org.eclipse.persistence org.eclipse.persistence.jpa 2.4.0-SNAPSHOT org.eclipse.persistence org.eclipse.persistence.nosql 2.4.0-SNAPSHOT jboss jboss-j2ee 4.2.2.GA org.mongodb mongo-java-driver 2.11.1 commons-fileupload commons-fileupload 1.3 Entity Class @Entity @NoSql(dataFormat=DataFormatType.MAPPED) public class Article implements Serializable { public Article() { } @Id @GeneratedValue @Field(name="_id") private String id; @ElementCollection private List categoryLists = new ArrayList(); @Basic private String title; @Basic private String content; @Basic @Temporal(javax.persistence.TemporalType.DATE) private Date date; @Basic private String author; @ElementCollection private List tagLists = new ArrayList(); @NoSQL notation sets the data format and type and maps the NoSQL data. Because of using MongoDB in our sample application and documents in MongoDB stored in BSON format, MAP is used as data type. @ElementCollection notation maps the embedded collection into the parent document. Because more than one category and tag associated with an article would be a matter in our sample application, we map them as an element collection. Embedded Objects @Embeddable @NoSql(dataFormat=DataFormatType.MAPPED) public class Categories implements Serializable { @Basic private String category; @Embeddable @NoSql(dataFormat=DataFormatType.MAPPED) public class Tags implements Serializable { @Basic private String tag; We see @Embeddable notation at the top of the Categories and Tags’ class unlike Article entity class. The documents stored in the parent document are mapped with this notation. Please note that embedded objects do not need unique field. persistence.xml com.kodcu.entity.Article com.com.kodcu.entity.Categories com.kodcu.entity.Tags CRUD Operations index.xhtml MyBean.java public void saveArticle() { em.getTransaction().begin(); if(null == article.getId()) em.persist(article); else em.merge(article); em.getTransaction().commit(); } public void removeArticle() { em.getTransaction().begin(); em.remove(selectArticle); em.getTransaction().commit(); } 6. Demo Application Real content above and the demo application, can be accessed at NoSQL with JPA

August 6, 2013

by Hüseyin Akdoğan

CORE

· 31,666 Views · 1 Like

Getting started with CQEngine: LINQ for Java, Only Faster

CQEngine or collection query engine is a library that allows you to build indices over java collections and query them for objects using exposed properties. It offers similar capability to LINQ in .net but is thought to be faster because it builds indices over collections before querying them and uses set theory instead of iterations. In this post we will see how to query a simple collection of objects, in our example, a collection of users of a hypothetical system, using CQEngine. In a subsequent post, we will also see how iteratively searching a collection compares to querying via CQEngine. The first step is to get the CQEengine jar file. Download the jar from the CQEngine website or if you are using Maven, add the following dependency. com.googlecode.cqengine cqengine 1.0.3 Next, lets create the Class whose object we will be searching for: package co.syntx.examples.cqengine; import com.googlecode.cqengine.attribute.Attribute; import com.googlecode.cqengine.attribute.SimpleAttribute; public class User { private String username; private String password; private String fullname; private Role role; public User(String username, String password, String fullname, Role role) { super(); this.username = username; this.password = password; this.fullname = fullname; this.role = role; } public static final Attribute FULL_NAME = new SimpleAttribute("fullname") { public String getValue(User user) { return user.fullname; } }; public static final Attribute USERNAME = new SimpleAttribute("username") { public String getValue(User user) { return user.username; } }; public String getUsername() { return username; } public void setUsername(String username) { this.username = username; } public String getPassword() { return password; } public void setPassword(String password) { this.password = password; } public String getFullname() { return fullname; } public void setFullname(String fullname) { this.fullname = fullname; } public Role getRole() { return role; } public void setRole(Role role) { this.role = role; } } Next, we write a class, to perform our searches. I will go function by function. 1. Function to Build a Test Indexed Collection: In the following function, we build an indexed collection, define indices on attributes, and populate this collection with a certain number of objects. In actual usage, your collection will probably be filled with objects being returned from the DB, read from a file or other similar scenarios. public void buildIndexedCollection(int size) throws Exception { indexedUsers = CQEngine.newInstance(); indexedUsers.addIndex(HashIndex.onAttribute(User.FULL_NAME)); indexedUsers.addIndex(SuffixTreeIndex.onAttribute(User.FULL_NAME)); for (int i = 0; i < size; i++) { String username = RandomStringGenerator.generateRandomString(8,RandomStringGenerator.Mode.ALPHANUMERIC); String password = RandomStringGenerator.generateRandomString(8,RandomStringGenerator.Mode.ALPHANUMERIC); String fullname = RandomStringGenerator.generateRandomString(5,RandomStringGenerator.Mode.ALPHA) + " " + RandomStringGenerator.generateRandomString(5,RandomStringGenerator.Mode.ALPHA); Role role = new Role(); role.setName("admin"); indexedUsers.add(new User(username, password, fullname, role)); } } In line 3 we are initializing a new Indexed Collection, a reference of which is stored in the class variable indexedUsers. The reference is of type IndexedCollection In lines 4 and 5, we define two indices i) a Hash Index suitable for equal style queries. ii) a Suffix Index suitable for ends with style queried. For the purpose of this example, we are building indices only on the the Full name field. In line 8, we use a random string generator to populate dummy objects. In line 14 we add our object to our indexed collection. 2. Function to Perform Indexed Search for Exact Matches: In this function, we are querying for names that exactly match a given name. The equal function takes in the attribute upon which to perform the query and the value to search. The method equal is statically imported via a import static com.googlecode.cqengine.query.QueryFactory.*; In the example below, we are looping the results returned by the retrieve method and not doing anything with it. In your case, you may choose to return the Iterator returned by retrieve. public void indexedSearchForEquals(String fullname) throws Exception { Query query = equal(User.FULL_NAME, fullname); for (User user : indexedUsers.retrieve(query)) { // System.out.println(user.getFullname()); } } 3. Function to Perform Indexed Search for Ends With Matches: In this function, we are querying for names that end with a certain suffix. public void indexedSearchForEndsWith(String endswith) throws Exception { Query query1 = endsWith(User.FULL_NAME, endswith); for (User user : indexedUsers.retrieve(query1)) { // System.out.println(user.getFullname()); } } 4. Function to Perform Indexed Search for Equals or Ends With Matches: This function is a combination of both queries mentioned below and has an or relation between them. public void indexedSearchForEqualOrEndsWith(String equals, String ends) throws Exception { Query query = or(equal(User.FULL_NAME, equals),endsWith(User.FULL_NAME, ends)); for (User user : indexedUsers.retrieve(query)) { // System.out.println(user.getFullname()); } } 5. Putting it together: All the functions above belong to a class called CQEngineTest. We create a new object, build a test collection, then search for either exact matches, strings that end with a certain suffix or either. CQEngineTest test = new CQEngineTest(); test.buildIndexedCollection(size); test.indexedSearchForEqualOrEndsWith("test", "test"); In this example, we have used a Hash Index and a Suffix Tree Index. There are many other types of indices that you can choose depending on the type of query that you want to perform. A list of these indices and when to use them can be found on the cqengine project page. Also, apart from equal or endswith operations, there are others that you would typically expect to find. In a subsequent post we will also see how an indexed search compares with a typical iterative search in terms of times.

August 5, 2013

by Faheem Sohail

· 28,112 Views · 1 Like

JPA Searching Using Lucene - A Working Example with Spring and DBUnit

Working Example on Github There's a small, self contained mavenised example project over on Github to accompany this post - check it out here:https://github.com/adrianmilne/jpa-lucene-spring-demo Running the Demo See the README file over on GitHub for details of running the demo. Essentially - it's just running the Unit Tests, with the usual maven build and test results output to the console - example below. This is the result of running the DBUnit test, which inserts Book data into the HSQL database using JPA, and then uses Lucene to query the data, testing that the expected Books are returned (i.e. only those int he SCI-FI category, containing the word 'Space', and ensuring that any with 'Space' in the title appear before those with 'Space' only in the description. The Book Entity Our simple example stores Books. The Book entity class below is a standard JPA Entity with a few additional annotations to identify it to Lucene: @Indexed - this identifies that the class will be added to the Lucene index. You can define a specific index by adding the 'index' attribute to the annotation. We're just choosing the simplest, minimal configuration for this example. In addition to this - you also need to specify which properties on the entity are to be indexed, and how they are to be indexed. For our example we are again going for the default option by just adding an @Field annotation with no extra parameters. We are adding one other annotation to the 'title' field - @Boost - this is just telling Lucene to give more weight to search term matches that appear in this field (than the same term appearing in the description field). This example is purposefully kept minimal in terms of the ins-and-outs of Lucene (I may cover that in a later post) - we're really just concentrating on the integration with JPA and Spring for now. package com.cor.demo.jpa.entity; import javax.persistence.Entity; import javax.persistence.EnumType; import javax.persistence.Enumerated; import javax.persistence.GeneratedValue; import javax.persistence.Id; import javax.persistence.Lob; import org.hibernate.search.annotations.Boost; import org.hibernate.search.annotations.Field; import org.hibernate.search.annotations.Indexed; /** * Book JPA Entity. */ @Entity @Indexed public class Book { @Id @GeneratedValue private Long id; @Field @Boost(value = 1.5f) private String title; @Field @Lob private String description; @Field @Enumerated(EnumType.STRING) private BookCategory category; public Book(){ } public Book(String title, BookCategory category, String description){ this.title = title; this.category = category; this.description = description; } public Long getId() { return id; } public void setId(Long id) { this.id = id; } public String getTitle() { return title; } public void setTitle(String title) { this.title = title; } public BookCategory getCategory() { return category; } public void setCategory(BookCategory category) { this.category = category; } public String getDescription() { return description; } public void setDescription(String description) { this.description = description; } @Override public String toString() { return "Book [id=" + id + ", title=" + title + ", description=" + description + ", category=" + category + "]"; } } The Book Manager The BookManager class acts as a simple service layer for the Book operations - used for adding books and searching books. As you can see, the JPA database resources are autowired in by Spring from the application-context.xml. We are just using an in-memory hsql database in this example. package com.cor.demo.jpa.manager; import java.util.List; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; import javax.persistence.PersistenceContextType; import javax.persistence.Query; import org.hibernate.search.jpa.FullTextEntityManager; import org.hibernate.search.jpa.Search; import org.hibernate.search.query.dsl.QueryBuilder; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.context.annotation.Scope; import org.springframework.stereotype.Component; import org.springframework.transaction.annotation.Transactional; import com.cor.demo.jpa.entity.Book; import com.cor.demo.jpa.entity.BookCategory; /** * Manager for persisting and searching on Books. Uses JPA and Lucene. */ @Component @Scope(value = "singleton") public class BookManager { /** Logger. */ private static Logger LOG = LoggerFactory.getLogger(BookManager.class); /** JPA Persistence Unit. */ @PersistenceContext(type = PersistenceContextType.EXTENDED, name = "booksPU") private EntityManager em; /** Hibernate Full Text Entity Manager. */ private FullTextEntityManager ftem; /** * Method to manually update the Full Text Index. This is not required if inserting entities * using this Manager as they will automatically be indexed. Useful though if you need to index * data inserted using a different method (e.g. pre-existing data, or test data inserted via * scripts or DbUnit). */ public void updateFullTextIndex() throws Exception { LOG.info("Updating Index"); getFullTextEntityManager().createIndexer().startAndWait(); } /** * Add a Book to the Database. */ @Transactional public Book addBook(Book book) { LOG.info("Adding Book : " + book); em.persist(book); return book; } /** * Delete All Books. */ @SuppressWarnings("unchecked") @Transactional public void deleteAllBooks() { LOG.info("Delete All Books"); Query allBooks = em.createQuery("select b from Book b"); List books = allBooks.getResultList(); // We need to delete individually (rather than a bulk delete) to ensure they are removed // from the Lucene index correctly for (Book b : books) { em.remove(b); } } @SuppressWarnings("unchecked") @Transactional public void listAllBooks() { LOG.info("List All Books"); LOG.info("------------------------------------------"); Query allBooks = em.createQuery("select b from Book b"); List books = allBooks.getResultList(); for (Book b : books) { LOG.info(b.toString()); getFullTextEntityManager().index(b); } } /** * Search for a Book. */ @SuppressWarnings("unchecked") @Transactional public List search(BookCategory category, String searchString) { LOG.info("------------------------------------------"); LOG.info("Searching Books in category '" + category + "' for phrase '" + searchString + "'"); // Create a Query Builder QueryBuilder qb = getFullTextEntityManager().getSearchFactory().buildQueryBuilder().forEntity(Book.class).get(); // Create a Lucene Full Text Query org.apache.lucene.search.Query luceneQuery = qb.bool() .must(qb.keyword().onFields("title", "description").matching(searchString).createQuery()) .must(qb.keyword().onField("category").matching(category).createQuery()).createQuery(); Query fullTextQuery = getFullTextEntityManager().createFullTextQuery(luceneQuery, Book.class); // Run Query and print out results to console List result = (List) fullTextQuery.getResultList(); // Log the Results LOG.info("Found Matching Books :" + result.size()); for (Book b : result) { LOG.info(" - " + b); } return result; } /** * Convenience method to get Full Test Entity Manager. Protected scope to assist mocking in Unit * Tests. * @return Full Text Entity Manager. */ protected FullTextEntityManager getFullTextEntityManager() { if (ftem == null) { ftem = Search.getFullTextEntityManager(em); } return ftem; } /** * Get the JPA Entity Manager (required for the DBUnit Tests). * @return Entity manager */ protected EntityManager getEntityManager() { return em; } /** * Sets the JPA Entity Manager (required to assist with mocking in Unit Test) * @param em EntityManager */ protected void setEntityManager(EntityManager em) { this.em = em; } } application-context.xml This is the Spring configuration file. You can see in the JPA Entity Manager configuration the key for 'hibernate.search.default.indexBase' is added to the jpaPropertyMap to tell Lucene where to create the index. We have also externalised the database login credentials to a properties file (as you may wish to change these for different environments), for example by updating the propertyConfigurer to look for and use a different external properties if it finds one on the file system). classpath:/system.properties Testing Using DBUnit In the project is an example of using DBUnit with Spring to test adding and searching against the database using DBUnit to populate the database with test data, exercise the Book Manager search operations and then clean the database down. This is a great way to test database functionality and can be easily integrated into maven and continuous build environments. Because DBUnit bypasses the standard JPA insertion calls - the data does not get automatically added to the Lucene index. We have a method exposed on the service interface to update the Full Text index 'updateFullTextIndex()' - calling this causes Lucene to update the index with the current data in the database. This can be useful when you are adding search to pre-populated databases to index the existing content. package com.cor.demo.jpa.manager; import java.io.InputStream; import java.util.List; import org.dbunit.DBTestCase; import org.dbunit.database.DatabaseConnection; import org.dbunit.database.IDatabaseConnection; import org.dbunit.dataset.IDataSet; import org.dbunit.dataset.xml.FlatXmlDataSetBuilder; import org.dbunit.operation.DatabaseOperation; import org.hibernate.impl.SessionImpl; import org.junit.After; import org.junit.Before; import org.junit.Test; import org.junit.runner.RunWith; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import com.cor.demo.jpa.entity.Book; import com.cor.demo.jpa.entity.BookCategory; /** * DBUnit Test - loads data defined in 'test-data-set.xml' into the database to run tests against the * BookManager. More thorough (and ultimately easier in this context) than using mocks. */ @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations = { "classpath:/application-context.xml" }) public class BookManagerDBUnitTest extends DBTestCase { /** Logger. */ private static Logger LOG = LoggerFactory.getLogger(BookManagerDBUnitTest.class); /** Book Manager Under Test. */ @Autowired private BookManager bookManager; @Before public void setup() throws Exception { DatabaseOperation.CLEAN_INSERT.execute(getDatabaseConnection(), getDataSet()); } @After public void tearDown() { deleteBooks(); } @Override protected IDataSet getDataSet() throws Exception { InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("test-data-set.xml"); FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder(); return builder.build(inputStream); } /** * Get the underlying database connection from the JPA Entity Manager (DBUnit needs this connection). * @return Database Connection * @throws Exception */ private IDatabaseConnection getDatabaseConnection() throws Exception { return new DatabaseConnection(((SessionImpl) (bookManager.getEntityManager().getDelegate())).connection()); } /** * Tests the expected results for searching for 'Space' in SCF-FI books. */ @Test public void testSciFiBookSearch() throws Exception { bookManager.listAllBooks(); bookManager.updateFullTextIndex(); List results = bookManager.search(BookCategory.SCIFI, "Space"); assertEquals("Expected 2 results for SCI FI search for 'Space'", 2, results.size()); assertEquals("Expected 1st result to be '2001: A Space Oddysey'", "2001: A Space Oddysey", results.get(0).getTitle()); assertEquals("Expected 2nd result to be 'Apollo 13'", "Apollo 13", results.get(1).getTitle()); } private void deleteBooks() { LOG.info("Deleting Books...-"); bookManager.deleteAllBooks(); } } The source data for the test is defined in an xml file.

August 5, 2013

by Adrian Milne

· 31,372 Views

What Is NoSQL?

Dan McCreary and Ann Kelly, authors of 'Making Sense of NoSQL,' discuss the business drivers and motivations that make NoSQL so popular to organizations today.

August 1, 2013

by Eric Gregory

· 21,298 Views · 4 Likes

Jersey Client: Testing External Calls

Jim and I have been doing a bit of work over the last week which involved calling neo4j’s HA status URI to check whether or not an instance was a master/slave and we’ve been using jersey-client. The code looked roughly like this: class Neo4jInstance { private Client httpClient; private URI hostname; public Neo4jInstance(Client httpClient, URI hostname) { this.httpClient = httpClient; this.hostname = hostname; } public Boolean isSlave() { String slaveURI = hostname.toString() + ":7474/db/manage/server/ha/slave"; ClientResponse response = httpClient.resource(slaveURI).accept(TEXT_PLAIN).get(ClientResponse.class); return Boolean.parseBoolean(response.getEntity(String.class)); } } While writing some tests against this code we wanted to stub out the actual calls to the HA slave URI so we could simulate both conditions and a brief search suggested that mockito was the way to go. We ended up with a test that looked like this: @Test public void shouldIndicateInstanceIsSlave() { Client client = mock( Client.class ); WebResource webResource = mock( WebResource.class ); WebResource.Builder builder = mock( WebResource.Builder.class ); ClientResponse clientResponse = mock( ClientResponse.class ); when( builder.get( ClientResponse.class ) ).thenReturn( clientResponse ); when( clientResponse.getEntity( String.class ) ).thenReturn( "true" ); when( webResource.accept( anyString() ) ).thenReturn( builder ); when( client.resource( anyString() ) ).thenReturn( webResource ); Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave(); assertTrue(isSlave); } which is pretty gnarly but does the job. I thought there must be a better way so I continued searching and eventually came across this post on the mailing list which suggested creating a custom ClientHandler and stubbing out requests/responses there. I had a go at doing that and wrapped it with a little DSL that only covers our very specific use case: private static ClientBuilder client() { return new ClientBuilder(); } static class ClientBuilder { private String uri; private int statusCode; private String content; public ClientBuilder requestFor(String uri) { this.uri = uri; return this; } public ClientBuilder returns(int statusCode) { this.statusCode = statusCode; return this; } public Client create() { return new Client() { public ClientResponse handle(ClientRequest request) throws ClientHandlerException { if (request.getURI().toString().equals(uri)) { InBoundHeaders headers = new InBoundHeaders(); headers.put("Content-Type", asList("text/plain")); return createDummyResponse(headers); } throw new RuntimeException("No stub defined for " + request.getURI()); } }; } private ClientResponse createDummyResponse(InBoundHeaders headers) { return new ClientResponse(statusCode, headers, new ByteArrayInputStream(content.getBytes()), messageBodyWorkers()); } private MessageBodyWorkers messageBodyWorkers() { return new MessageBodyWorkers() { public Map> getReaders(MediaType mediaType) { return null; } public Map> getWriters(MediaType mediaType) { return null; } public String readersToString(Map> mediaTypeListMap) { return null; } public String writersToString(Map> mediaTypeListMap) { return null; } public MessageBodyReader getMessageBodyReader(Class tClass, Type type, Annotation[] annotations, MediaType mediaType) { return (MessageBodyReader) new StringProvider(); } public MessageBodyWriter getMessageBodyWriter(Class tClass, Type type, Annotation[] annotations, MediaType mediaType) { return null; } public List getMessageBodyWriterMediaTypes(Class tClass, Type type, Annotation[] annotations) { return null; } public MediaType getMessageBodyWriterMediaType(Class tClass, Type type, Annotation[] annotations, List mediaTypes) { return null; } }; } public ClientBuilder content(String content) { this.content = content; return this; } } If we change our test to use this code it now looks like this: @Test public void shouldIndicateInstanceIsSlave() { Client client = client().requestFor("http://localhost:7474/db/manage/server/ha/slave"). returns(200). content("true"). create(); Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave(); assertTrue(isSlave); } Is there a better way? In Ruby I’ve used WebMock to achieve this and Ashok pointed me towards WebStub which looks nice except I’d need to pass in the hostname + port rather than constructing that in the code.

August 1, 2013

by Mark Needham

· 10,827 Views

AWS: Attaching an EBS volume on an EC2 instance and making it available for use

I recently wanted to attach an EBS volume to an existing EC2 instance that I had running and since it was for a one off tasks (famous last words) I decided to configure it manually. I created the EBS volume through the AWS console and one thing that initially caught me out is that the EC2 instance and EBS volume need to be in the same region and zone. Therefore if I create my EC2 instance in ‘eu-west-1b’ then I need to create my EBS volume in ‘eu-west-1b’ as well otherwise I won’t be able to attach it to that instance. I attached the device as /dev/sdf although the UI gives the following warning: Linux Devices: /dev/sdf through /dev/sdp Note: Newer linux kernels may rename your devices to /dev/xvdf through /dev/xvdp internally, even when the device name entered here (and shown in the details) is /dev/sdf through /dev/sdp. After attaching the EBS volume to the EC2 instance my next step was to SSH onto my EC2 instance and make the EBS volume available. The first step is to create a file system on the volume: $ sudo mkfs -t ext3 /dev/sdf mke2fs 1.42 (29-Nov-2011) Could not stat /dev/sdf --- No such file or directory The device apparently does not exist; did you specify it correctly? It turns out that warning was handy and the device has in fact been renamed. We can confirm this by callingfdisk: $ sudo fdisk -l Disk /dev/xvda1: 8589 MB, 8589934592 bytes 255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/xvda1 doesn't contain a valid partition table Disk /dev/xvdf: 53.7 GB, 53687091200 bytes 255 heads, 63 sectors/track, 6527 cylinders, total 104857600 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/xvdf doesn't contain a valid partition table /dev/xvdf is the one we’re interested in so I re-ran the previous command: $ sudo mkfs -t ext3 /dev/xvdf mke2fs 1.42 (29-Nov-2011) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=0 blocks, Stripe width=0 blocks 3276800 inodes, 13107200 blocks 655360 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 400 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424 Allocating group tables: done Writing inode tables: done Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: done Once I’d done that I needed to create a mount point for the volume and I thought the best place was probably a directory under /mnt: $ sudo mkdir /mnt/ebs The final step is to mount the volume: $ sudo mount /dev/xvdf /mnt/ebs And if we run df we can see that it’s ready to go: $ df -h Filesystem Size Used Avail Use% Mounted on /dev/xvda1 7.9G 883M 6.7G 12% / udev 288M 8.0K 288M 1% /dev tmpfs 119M 164K 118M 1% /run none 5.0M 0 5.0M 0% /run/lock none 296M 0 296M 0% /run/shm /dev/xvdf 50G 180M 47G 1% /mnt/ebs

July 31, 2013

by Mark Needham

· 11,981 Views

OLAP Operation in R

OLAP (Online Analytical Processing) is a very common way to analyze raw transaction data by aggregating along different combinations of dimensions. This is a well-established field in Business Intelligence / Reporting. In this post, I will highlight the key ideas in OLAP operation and illustrate how to do this in R. Facts and Dimensions The core part of OLAP is a so-called "multi-dimensional data model", which contains two types of tables; "Fact" table and "Dimension" table A Fact table contains records each describe an instance of a transaction. Each transaction records contains categorical attributes (which describes contextual aspects of the transaction, such as space, time, user) as well as numeric attributes (called "measures" which describes quantitative aspects of the transaction, such as no of items sold, dollar amount). A Dimension table contain records that further elaborates the contextual attributes, such as user profile data, location details ... etc. In a typical setting of Multi-dimensional model ... Each fact table contains foreign keys that references the primary key of multiple dimension tables. In the most simple form, it is called a STAR schema. Dimension tables can contain foreign keys that references other dimensional tables. This provides a sophisticated detail breakdown of the contextual aspects. This is also called a SNOWFLAKE schema. Also this is not a hard rule, Fact table tends to be independent of other Fact table and usually doesn't contain reference pointer among each other. However, different Fact table usually share the same set of dimension tables. This is also called GALAXY schema. But it is a hard rule that Dimension table NEVER points / references Fact table A simple STAR schema is shown in following diagram. Each dimension can also be hierarchical so that the analysis can be done at different degree of granularity. For example, the time dimension can be broken down into days, weeks, months, quarter and annual; Similarly, location dimension can be broken down into countries, states, cities ... etc. Here we first create a sales fact table that records each sales transaction. # Setup the dimension tables state_table <- data.frame(key=c("CA", "NY", "WA", "ON", "QU"), name=c("California", "new York", "Washington", "Ontario", "Quebec"), country=c("USA", "USA", "USA", "Canada", "Canada")) month_table <- data.frame(key=1:12, desc=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), quarter=c("Q1","Q1","Q1","Q2","Q2","Q2","Q3","Q3","Q3","Q4","Q4","Q4")) prod_table <- data.frame(key=c("Printer", "Tablet", "Laptop"), price=c(225, 570, 1120)) # Function to generate the Sales table gen_sales <- function(no_of_recs) { # Generate transaction data randomly loc <- sample(state_table$key, no_of_recs, replace=T, prob=c(2,2,1,1,1)) time_month <- sample(month_table$key, no_of_recs, replace=T) time_year <- sample(c(2012, 2013), no_of_recs, replace=T) prod <- sample(prod_table$key, no_of_recs, replace=T, prob=c(1, 3, 2)) unit <- sample(c(1,2), no_of_recs, replace=T, prob=c(10, 3)) amount <- unit*prod_table[prod,]$price sales <- data.frame(month=time_month, year=time_year, loc=loc, prod=prod, unit=unit, amount=amount) # Sort the records by time order sales <- sales[order(sales$year, sales$month),] row.names(sales) <- NULL return(sales) } # Now create the sales fact table sales_fact <- gen_sales(500) # Look at a few records head(sales_fact) month year loc prod unit amount 1 1 2012 NY Laptop 1 225 2 1 2012 CA Laptop 2 450 3 1 2012 ON Tablet 2 2240 4 1 2012 NY Tablet 1 1120 5 1 2012 NY Tablet 2 2240 6 1 2012 CA Laptop 1 225 Multi-dimensional Cube Now, we turn this fact table into a hypercube with multiple dimensions. Each cell in the cube represents an aggregate value for a unique combination of each dimension. # Build up a cube revenue_cube <- tapply(sales_fact$amount, sales_fact[,c("prod", "month", "year", "loc")], FUN=function(x){return(sum(x))}) # Showing the cells of the cude revenue_cube , , year = 2012, loc = CA month prod 1 2 3 4 5 6 7 8 9 10 11 12 Laptop 1350 225 900 675 675 NA 675 1350 NA 1575 900 1350 Printer NA 2280 NA NA 1140 570 570 570 NA 570 1710 NA Tablet 2240 4480 12320 3360 2240 4480 3360 3360 5600 2240 2240 3360 , , year = 2013, loc = CA month prod 1 2 3 4 5 6 7 8 9 10 11 12 Laptop 225 225 450 675 225 900 900 450 675 225 675 1125 Printer NA 1140 NA 1140 570 NA NA 570 NA 1140 1710 1710 Tablet 3360 3360 1120 4480 2240 1120 7840 3360 3360 1120 5600 4480 , , year = 2012, loc = NY month prod 1 2 3 4 5 6 7 8 9 10 11 12 Laptop 450 450 NA NA 675 450 675 NA 225 225 NA 450 Printer NA 2280 NA 2850 570 NA NA 1710 1140 NA 570 NA Tablet 3360 13440 2240 2240 2240 5600 5600 3360 4480 3360 4480 3360 , , year = 2013, loc = NY ..... dimnames(revenue_cube) $prod [1] "Laptop" "Printer" "Tablet" $month [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" $year [1] "2012" "2013" $loc [1] "CA" "NY" "ON" "QU" "WA" OLAP Operations Here are some common operations of OLAP Slice Dice Rollup Drilldown Pivot "Slice" is about fixing certain dimensions to analyze the remaining dimensions. For example, we can focus in the sales happening in "2012", "Jan", or we can focus in the sales happening in "2012", "Jan", "Tablet". # Slice # cube data in Jan, 2012 revenue_cube[, "1", "2012",] loc prod CA NY ON QU WA Laptop 1350 450 NA 225 225 Printer NA NA NA 1140 NA Tablet 2240 3360 5600 1120 2240 # cube data in Jan, 2012 revenue_cube["Tablet", "1", "2012",] CA NY ON QU WA 2240 3360 5600 1120 2240 "Dice" is about limited each dimension to a certain range of values, while keeping the number of dimensions the same in the resulting cube. For example, we can focus in sales happening in [Jan/ Feb/Mar, Laptop/Tablet, CA/NY]. revenue_cube[c("Tablet","Laptop"), c("1","2","3"), , c("CA","NY")] , , year = 2012, loc = CA month prod 1 2 3 Tablet 2240 4480 12320 Laptop 1350 225 900 , , year = 2013, loc = CA month prod 1 2 3 Tablet 3360 3360 1120 Laptop 225 225 450 , , year = 2012, loc = NY month prod 1 2 3 Tablet 3360 13440 2240 Laptop 450 450 NA , , year = 2013, loc = NY month prod 1 2 3 Tablet 3360 4480 6720 Laptop 450 NA 225 "Rollup" is about applying an aggregation function to collapse a number of dimensions. For example, we want to focus in the annual revenue for each product and collapse the location dimension (ie: we don't care where we sold our product). apply(revenue_cube, c("year", "prod"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) prod year Laptop Printer Tablet 2012 22275 31350 179200 2013 25200 33060 166880 "Drilldown" is the reverse of "rollup" and applying an aggregation function to a finer level of granularity. For example, we want to focus in the annual and monthly revenue for each product and collapse the location dimension (ie: we don't care where we sold our product). apply(revenue_cube, c("year", "month", "prod"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) , , prod = Laptop month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 2250 2475 1575 1575 2250 1800 1575 1800 900 2250 1350 2475 2013 2250 900 1575 1575 2250 2475 2025 1800 2025 2250 3825 2250 , , prod = Printer month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 1140 5700 570 3990 4560 2850 1140 2850 2850 1710 3420 570 2013 1140 4560 3420 4560 2850 1140 570 3420 1140 3420 3990 2850 , , prod = Tablet month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 14560 23520 17920 12320 10080 14560 13440 15680 25760 12320 11200 7840 2013 8960 11200 10080 7840 14560 10080 29120 15680 15680 8960 12320 22400 "Pivot" is about analyzing the combination of a pair of selected dimensions. For example, we want to analyze the revenue by year and month. Or we want to analyze the revenue by product and location. apply(revenue_cube, c("year", "month"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 17950 31695 20065 17885 16890 19210 16155 20330 29510 16280 15970 10885 2013 12350 16660 15075 13975 19660 13695 31715 20900 18845 14630 20135 27500 apply(revenue_cube, c("prod", "loc"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) loc prod CA NY ON QU WA Laptop 16425 9450 7650 7425 6525 Printer 15390 19950 7980 10830 10260 Tablet 90720 117600 45920 34720 57120 I hope you can get a taste of the richness of data processing model in R. However, since R is doing all the processing in RAM. This requires your data to be small enough so it can fit into the local memory in a single machine.

July 30, 2013

by Ricky Ho

· 18,031 Views · 3 Likes

JMS vs RabbitMQ

Definition : JMS : Java Message Service is an API that is part of Java EE for sending messages between two or more clients. There are many JMS providers such as OpenMQ (glassfish’s default), HornetQ(Jboss), and ActiveMQ. RabbitMQ: is an open source message broker software which uses the AMQP standard and is written by Erlang. Messaging Model: JMS supports two models: one to one and publish/subscriber. RabbitMQ supports the AMQP model which has 4 models : direct, fanout, topic, headers. Data types: JMS supports 5 different data types but RabbitMQ supports only the binary data type. Workflow strategy: In AMQP, producers send to the exchange then the queue, but in JMS, producers send to the queue or topic directly. Technology compatibility: JMS is specific for java users only, but RabbitMQ supports many technologies. Performance: If you would like to know more about their performance, this benchmark is a good place to start, but look for others as well.

July 30, 2013

by Saeid Siavashi

· 51,784 Views · 16 Likes

Stepping through LMDB: Making Everything Easier

okay, i know that i have been critical about the lmdb codebase so far. but one thing that i really want to point out for it is that it was pretty easy to actually get things working on windows. it wasn’t smooth, in the sense that i had to muck around with the source a bit (hack endianess, remove a bunch of unix specific header files, etc). but that took less than an hour, and it was pretty much it. since i am by no means an experienced c developer, i consider this a major win. compare that to leveldb, which flat out won’t run on windows no matter how much time i spent trying, and it is a pleasure . also, stepping through the code i am starting to get a sense of how it works that is much different than the one i had when i just read the code. it is like one of those 3d images, you suddenly see something. the first thing that became obvious is that i totally missed the significance of the lock file. lmdb actually create two files: lock.mdb data.mdb lock.mdb is used to synchronized data between different readers. it seems to mostly be there if you want to have multiple writers using different processes. that is a very interesting model for an embedded database, i’ve to admit. not something that i think other embedded databases are offering. in order to do that, it create two named mutexes (one for read and one for write). a side note on windows support: lmdb supports windows, but it is very much a 2nd class citizen. you can see it in things like path not found error turning into a no such process error (because it try to use getlasterror() codes as c codes), or when it doesn’t create a directory even though not creating it would fail. i am currently debugging through the code and fixing such issues as i go along (but no, i am doing heavy handed magic fixes, just to get past this stage to the next one, otherwise i would have sent a pull request). here is one such example. here is the original code: but readfile in win32 will return false if the file is empty, so you actually need to write something like this to make the code work: past that hurdle, i think that i get a lot more about what is going on with the way lmdb works than before. let us start with the way data.mdb works. it is important to note that for pretty much everything in lmdb we use the system page size. by default, that is 4kb. the data file starts with 2 pages allocated. those page contain the following information: looking back at how couchdb did things, i am pretty sure that those two pages are going to be pretty important . i am guess that they would always contain the root of the data in the file. there is also the last transaction on them, which is what i imagine determine how something gets committed. i don’t know yet, as i said, guessing based on how couchdb works. i’ll continue this review in another time. next time, transactions…

July 27, 2013

by Oren Eini

· 9,538 Views

Using Morphia to Map Java Objects in MongoDB

MongoDB is an open source document-oriented NoSQL database system which stores data as JSON-like documents with dynamic schemas. As it doesn't store data in tables as is done in the usual relational database setup, it doesn't map well to the JPA way of storing data. Morphia is an open source lightweight type-safe library designed to bridge the gap between the MongoDB Java driver and domain objects. It can be an alternative to SpringData if you're not using the Spring Framework to interact with MongoDB. This post will cover the basics of persisting and querying entities along the lines of JPA by using Morphia and a MongoDB database instance. There are four POJOs this example will be using. First we have BaseEntity which is an abstract class containing the Id and Version fields: package com.city81.mongodb.morphia.entity; import org.bson.types.ObjectId; import com.google.code.morphia.annotations.Id; import com.google.code.morphia.annotations.Property; import com.google.code.morphia.annotations.Version; public abstract class BaseEntity { @Id @Property("id") protected ObjectId id; @Version @Property("version") private Long version; public BaseEntity() { super(); } public ObjectId getId() { return id; } public void setId(ObjectId id) { this.id = id; } public Long getVersion() { return version; } public void setVersion(Long version) { this.version = version; } } Whereas JPA would use @Column to rename the attribute, Morphia uses @Property. Another difference is that @Property needs to be on the variable whereas @Column can be on the variable or the get method. The main entity we want to persist is the Customer class: package com.city81.mongodb.morphia.entity; import java.util.List; import com.google.code.morphia.annotations.Embedded; import com.google.code.morphia.annotations.Entity; @Entity public class Customer extends BaseEntity { private String name; private List accounts; @Embedded private Address address; public String getName() { return name; } public void setName(String name) { this.name = name; } public List getAccounts() { return accounts; } public void setAccounts(List accounts) { this.accounts = accounts; } public Address getAddress() { return address; } public void setAddress(Address address) { this.address = address; } } As with JPA, the POJO is annotated with @Entity. The class also shows an example of @Embedded: The Address class is also annotated with @Embedded as shown below: package com.city81.mongodb.morphia.entity; import com.google.code.morphia.annotations.Embedded; @Embedded public class Address { private String number; private String street; private String town; private String postcode; public String getNumber() { return number; } public void setNumber(String number) { this.number = number; } public String getStreet() { return street; } public void setStreet(String street) { this.street = street; } public String getTown() { return town; } public void setTown(String town) { this.town = town; } public String getPostcode() { return postcode; } public void setPostcode(String postcode) { this.postcode = postcode; } } Finally, we have the Account class of which the customer class has a collection of: package com.city81.mongodb.morphia.entity; import com.google.code.morphia.annotations.Entity; @Entity public class Account extends BaseEntity { private String name; public String getName() { return name; } public void setName(String name) { this.name = name; } } The above show only a small subset of what annotations can be applied to domain classes. More can be found at http://code.google.com/p/morphia/wiki/AllAnnotations The Example class shown below goes through the steps involved in connecting to the MongoDB instance, populating the entities, persisting them and then retrieving them: package com.city81.mongodb.morphia; import java.net.UnknownHostException; import java.util.ArrayList; import java.util.List; import com.city81.mongodb.morphia.entity.Account; import com.city81.mongodb.morphia.entity.Address; import com.city81.mongodb.morphia.entity.Customer; import com.google.code.morphia.Datastore; import com.google.code.morphia.Key; import com.google.code.morphia.Morphia; import com.mongodb.Mongo; import com.mongodb.MongoException; /** * A MongoDB and Morphia Example * */ public class Example { public static void main( String[] args ) throws UnknownHostException, MongoException { String dbName = new String("bank"); Mongo mongo = new Mongo(); Morphia morphia = new Morphia(); Datastore datastore = morphia.createDatastore(mongo, dbName); morphia.mapPackage("com.city81.mongodb.morphia.entity"); Address address = new Address(); address.setNumber("81"); address.setStreet("Mongo Street"); address.setTown("City"); address.setPostcode("CT81 1DB"); Account account = new Account(); account.setName("Personal Account"); List accounts = new ArrayList(); accounts.add(account); Customer customer = new Customer(); customer.setAddress(address); customer.setName("Mr Bank Customer"); customer.setAccounts(accounts); Key savedCustomer = datastore.save(customer); System.out.println(savedCustomer.getId()); } Executing the first few lines will result in the creation of a Datastore. This interface will provide the ability to get, delete and save objects in the 'bank' MongoDB instance. The mapPackage method call on the morphia object determines what objects are mapped by that instance of Morphia. In this case all those in the package supplied. Other alternatives exist to map classes, including the method map which takes a single class (this method can be chained as the returning object is the morphia object), or passing a Set of classes to the Morphia constructor. After creating instances of the entities, they can be saved by calling save on the datastore instance and can be found using the primary key via the get method. The output from the Example class would look something like the below: 11-Jul-2012 13:20:06 com.google.code.morphia.logging.MorphiaLoggerFactory chooseLoggerFactory INFO: LoggerImplFactory set to com.google.code.morphia.logging.jdk.JDKLoggerFactory 4ffd6f7662109325c6eea24f Mr Bank Customer There are many other methods on the Datastore interface and they can be found along with the other Javadocs at http://morphia.googlecode.com/svn/site/morphia/apidocs/index.html An alternative to using the Datastore directly is to use the built in DAO support. This can be done by extending the BasicDAO class as shown below for the Customer entity: package com.city81.mongodb.morphia.dao; import com.city81.mongodb.morphia.entity.Customer; import com.google.code.morphia.Morphia; import com.google.code.morphia.dao.BasicDAO; import com.mongodb.Mongo; public class CustomerDAO extends BasicDAO { public CustomerDAO(Morphia morphia, Mongo mongo, String dbName) { super(mongo, morphia, dbName); } } To then make use of this, the Example class can be changed (and enhanced to show a query and a delete): ... CustomerDAO customerDAO = new CustomerDAO(morphia, mongo, dbName); customerDAO.save(customer); Query query = datastore.createQuery(Customer.class); query.and( query.criteria("accounts.name").equal("Personal Account"), query.criteria("address.number").equal("81"), query.criteria("name").contains("Bank") ); QueryResults retrievedCustomers = customerDAO.find(query); for (Customer retrievedCustomer : retrievedCustomers) { System.out.println(retrievedCustomer.getName()); System.out.println(retrievedCustomer.getAddress().getPostcode()); System.out.println(retrievedCustomer.getAccounts().get(0).getName()); customerDAO.delete(retrievedCustomer); } ... With the output from running the above shown below: 11-Jul-2012 13:30:46 com.google.code.morphia.logging.MorphiaLoggerFactory chooseLoggerFactory INFO: LoggerImplFactory set to com.google.code.morphia.logging.jdk.JDKLoggerFactory Mr Bank Customer CT81 1DB Personal Account This post only covers a few brief basics of Morphia but shows how it can help bridge the gap between JPA and NoSQL.

July 25, 2013

by Geraint Jones

· 76,277 Views