The Latest Databases Topics

ElasticSearch: Java API
ElasticSearch provides a Java API that executes all operations asynchronously through a client object.
September 30, 2013
by Hüseyin Akdoğan, DZone Core
· 137,571 Views · 4 Likes
Parallel SQL in C#
I've been wanting to get back to C# for a while, and I finally had the opportunity. I've also wanted to try the Task library in .NET and see whether I could get it to do something interesting. The result is below.

The code, built as a .NET 4 project, runs two SQL SELECT statements against the AdventureWorks2012 database. There are three tasks: ParallelTask1, ParallelTask2, and a timing task. Each parallel task takes a connection string and a query as inputs and reports progress through a status-message callback. One important point is that a task has to be self-contained, which is why the connection is instantiated inside the task. I also added a timing task (ParallelTiming) that emits a ping message at a fixed interval.

The code in Main controls the whole thing: it starts the three tasks with their parameters, waits for them to complete, and then prints the messages they return.

Try it out; it's good fun, and all you need is SQL Server, AdventureWorks, and something to build C# projects. You can download the code here. Have fun!
/// Parallel_SQL demonstration code
/// From Nick Haslam
/// http://blog.nhaslam.com
/// 16/9/2013
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Parallel_SQL
{
    class Program
    {
        /// <summary>
        /// First Parallel task
        /// </summary>
        /// <param name="sConnString">Connection string details</param>
        /// <param name="sQuery">Query to execute</param>
        /// <param name="StatusMessage">Status message to pass back</param>
        static Task<string> ParallelTask1(string sConnString, string sQuery, Action<string> StatusMessage)
        {
            return Task.Factory.StartNew(() =>
            {
                // The connection is created inside the task so the task is self-contained
                using (SqlConnection conn = new SqlConnection(sConnString))
                {
                    conn.Open();
                    StatusMessage("Running Query");
                    using (SqlCommand sqlCommand = new SqlCommand(sQuery, conn))
                    using (SqlDataReader reader = sqlCommand.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            StatusMessage(reader[0].ToString());
                        }
                    }
                }
                return "Task 1 Complete";
            });
        }

        /// <summary>
        /// Second Parallel task
        /// </summary>
        /// <param name="sConnString">Connection string details</param>
        /// <param name="sQuery">Query to execute</param>
        /// <param name="StatusMessage">Status message to pass back</param>
        static Task<string> ParallelTask2(string sConnString, string sQuery, Action<string> StatusMessage)
        {
            return Task.Factory.StartNew(() =>
            {
                using (SqlConnection conn = new SqlConnection(sConnString))
                {
                    conn.Open();
                    StatusMessage("Running Query");
                    using (SqlCommand sqlCommand = new SqlCommand(sQuery, conn))
                    using (SqlDataReader reader = sqlCommand.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            StatusMessage(reader[0].ToString());
                        }
                    }
                }
                return "Task 2 Complete";
            });
        }

        /// <summary>
        /// Timing Task
        /// </summary>
        /// <param name="iMSPause">Milliseconds between ping</param>
        /// <param name="StatusMessage">Status message to pass back</param>
        static Task<string> ParallelTiming(int iMSPause, Action<string> StatusMessage)
        {
            return Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < 10; i++)
                {
                    System.Threading.Thread.Sleep(iMSPause);
                    StatusMessage("******************** PING ********************");
                }
                return "Timing task done";
            });
        }

        static void Main(string[] args)
        {
            string sConnString = "server=.; Trusted_Connection=yes; database=AdventureWorks2012;";
            try
            {
                var Task1Control = ParallelTask1(sConnString,
                    "SELECT top 500 TransactionID FROM Production.TransactionHistory",
                    (update) => { Console.WriteLine(String.Format("{0} - {1}", DateTime.Now, update)); });
                var Task2Control = ParallelTask2(sConnString,
                    "SELECT top 500 SalesOrderDetailID FROM sales.SalesOrderDetail",
                    (update) => { Console.WriteLine(String.Format("{0} - \t\t{1}", DateTime.Now, update)); });
                var TimingTaskControl = ParallelTiming(250,
                    (update) => { Console.WriteLine(String.Format("{0} - \t\t\t{1}", DateTime.Now, update)); });

                // Await completion of the tasks (reading Result blocks until each task finishes)
                Console.WriteLine("Task 1 Status - {0}", Task1Control.Result);
                Console.WriteLine("Task 2 Status - {0}", Task2Control.Result);
                Console.WriteLine("Timing Task Status - {0}", TimingTaskControl.Result);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.ToString());
            }
            Console.ReadKey();
        }
    }
}
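ParallelTask1 and ParallelTask2 differ only in the message they return, so the same pattern can probably be expressed with one parameterised helper. The following is a minimal sketch under stated assumptions, not the author's code: the names ParallelSketch, RunQueryTask and RunBoth are hypothetical, and the rowSource delegate is a stand-in for the SqlCommand/SqlDataReader loop so that the sketch runs without a database. It uses only Task.Factory.StartNew and Task.WaitAll, both of which are available in .NET 4.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // One parameterised task in place of ParallelTask1/ParallelTask2.
    // rowSource is an assumption: it stands in for opening a connection
    // and reading rows, so everything the task needs is passed in.
    public static Task<string> RunQueryTask(
        string name,
        Func<IEnumerable<string>> rowSource,
        Action<string> statusMessage)
    {
        return Task.Factory.StartNew(() =>
        {
            statusMessage(name + ": running query");
            foreach (string row in rowSource())
            {
                statusMessage(name + ": " + row);
            }
            return name + " complete";
        });
    }

    // Starts two tasks, blocks until both have finished, then collects results.
    public static string[] RunBoth(Action<string> statusMessage)
    {
        Task<string> first = RunQueryTask("Task 1", () => new[] { "1", "2", "3" }, statusMessage);
        Task<string> second = RunQueryTask("Task 2", () => new[] { "10", "20" }, statusMessage);

        // WaitAll rethrows task failures wrapped in an AggregateException.
        Task.WaitAll(first, second);
        return new[] { first.Result, second.Result };
    }
}
```

Calling ParallelSketch.RunBoth(Console.WriteLine) prints the interleaved status lines; the order of those lines can vary from run to run because the two tasks execute concurrently.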
September 29, 2013
by Nick Haslam
· 22,643 Views · 31 Likes
"Lazy" Database Synchronization Using RabbitMQ
The Problem

Obviously, there are tons of different ways to sync databases, so why should it be described again? Let's imagine that we have an unusual situation with the restrictions below:

  • A future system will have a Head Office (HO) and a couple of Branch Offices (BOs).
  • All offices are located in different places, and some of them have difficulties with the internet connection. It could even be a situation where the internet is available for only 1-2 hours per day.
  • Almost all vital data is created in the HO and should be presented as read-only in BOs.
  • Data exchange should be limited with appropriate permissions (for example, if an operator has created some sensitive data in the HO for BO1, only BO1 should have access to it).
  • The HO should have access to all information that has been created or modified in BOs.

Given all of these points, we decided to write our own DB sync mechanism.

Basic Idea

Due to connection degradation between the HO and BOs, we have to sync everything within short-term sessions. Since there is no need to send information to all branches in general cases, we should be able to orchestrate data flow. Those thoughts bring us to the idea that we might implement some kind of RPC where an event occurs in one office and is reproduced (replayed) in another. Message queues (MQ) are a perfect solution to sync data between branches. RabbitMQ is my favorite MQ, so I will use it in this example. Also, this application will use the .NET stack, which has a convenient API client implementation for RabbitMQ called EasyNetQ.

High Level Application Architecture

Since the idea is to replay actions on other system instances, we should be able to divide them into single business-logic operations. The best way to achieve this is by using the Aggregate Roots approach. The main idea is to have separate objects that are divided by domain entities, where each call to a method of those objects is a single change to the state of the business logic.
For example, if we have some domain object Document and the ability to Get, Upsert, or Apply/Unapply, then we should describe its root as (pseudocode):

public class DocumentRoot
{
    public Document Get(Id) { ... }
    public Document Upsert(Document) { ... }
    public bool Apply(Id) { ... }
    public bool UnApply(Id) { ... }
}

Also, it's very important to ensure that each call runs in a transaction in order to avoid data loss. This can be achieved using simple method interception (for example Autofac + Castle.Proxy). In other words, the core process will look like this:

Keep in mind things such as entity primary keys, because data will be populated between different system instances, and we'll need to be sure that IDs will be the same. Also, collisions are possible when using simple auto-incrementing PKs, so our choice is GUID. With the help of a base repository, it's very simple to implement new GUID storage during object creation.

Let's assume that we have an ExchangeInformation object that holds all the data needed to restore a root call on a remote system. It will contain info about the method name, type name, input, and output params; this data can be obtained from a root interceptor. It should also have the list of new IDs. It's not hard to get them too, even though we'll need to implement the UnitOfWork pattern on an ORM type to support transactions. This will allow us to place our ExchangeInformation in that UoW object (for example, within Entity Framework it's DbContext).

Here is the implementation (using EF) of saving any changes in a domain within the base generic repository, where the base entity looks like:

public class EntityBase
{
    public long Id { get; set; }
    public Guid Guid { get; set; }
}

public virtual void Save(T entity)
{
    DbEntityEntry entry = Context.Entry(entity);
    if (entity.Guid == Guid.Empty)
    {
        try
        {
            // When replaying a remote call, reuse the GUID generated at the source;
            // otherwise generate a new one and record it for the outgoing message.
            Guid newGuid = Context.ExchangeInformation.IsExchangeRestore
                ? Context.ExchangeInformation.NewGuids[0]
                : Guid.NewGuid();
            if (Context.ExchangeInformation.IsExchangeRestore)
            {
                Context.ExchangeInformation.NewGuids.RemoveAt(0);
            }
            else
            {
                Context.ExchangeInformation.NewGuids.Add(newGuid);
            }
            entity.Guid = newGuid;
        }
        catch
        {
            throw new Exception("Failed to restore exchange, no guid found");
        }
        entry.State = EntityState.Added;
        return;
    }
    Context.Entry(entity).State = EntityState.Modified;
}

One more important note: to avoid code duplication, it's necessary to use GUIDs on clients, because if they operate on any other IDs we'll need to write two different implementations of every method.

Big Picture

With the preparation complete, we can proceed with architecture design. Since every system instance should be able to send and receive new data, we can declare two RMQ topics: input and output. Also, because message flow must be orchestrated, queues for each system instance should be created within the output topic. The simplest strategy for a routing implementation is to use the branch office GUID as a key. So at the moment we know how to do the following:

  • Save the source event in one office.
  • Put this event into selected queues (how the selection is made depends on the situation: read from the entity, call some additional method, use attributes etc.)

The next step is a solution for how to make output events from one office appear in the input queue of the other office. RabbitMQ has two plugins for that: Federation and Shovel. They are quite similar, but Shovel works at a lower level and has more options to control the synchronization process, so we'll use Shovel to link queues. Shovel handles connection degradation very well and has a lot of additional configurable options like message republishing properties, routing etc.

Now it's time to combine all the pieces into a single picture:

Aggregators here are simple RabbitMQ consumers that handle incoming messages from other offices and launch the appropriate methods.
One other problem is restoring transferred params. From my point of view, the best way is to use Json.Net with type serialization and restore them on a remote system instance with a small hack:

private object[] GetParams(MethodInfo methodInfo, ExchangeInformation information, ExchangeMessage message)
{
    ParameterInfo[] methodParams = methodInfo.GetParameters();
    var listParams = new List<object>();
    for (int ii = 0; ii < methodParams.Length; ii++)
    {
        // Assumes InputParamsString holds one JSON string per parameter, in call order,
        // serialized with type names so that "$type" is present.
        var jObject = JsonConvert.DeserializeObject<JObject>(information.InputParamsString[ii]);
        string typeName = jObject["$type"].ToString();
        listParams.Add(jObject.ToObject(Type.GetType(typeName)));
    }
    return listParams.ToArray();
}

Of course, real code also needs checks for parameter-count mismatches, failed deserialization, and so on.

Conclusions

The approach I've described is very easy to implement, and it has lots of additional places that can be customized. For example, any other method can be executed before, instead of, or after restoration on a target branch to change the logic of DOM behavior. The main issue is that collisions can occur if two BOs edit the same object at the same time. It's not hard to detect this situation by adding a hash to EntityBase. Nevertheless, a human decision is needed to resolve conflicts, so a simple UI is necessary in the HO where the operator can choose which data is correct.
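The hash idea mentioned in the conclusions could look roughly like the sketch below. This is only an illustration under assumptions: DocumentEntity, its Title and Body fields, and the ConflictDetector class are hypothetical names that do not come from the article, and SHA-256 is simply one reasonable choice of hash. The sending office includes the hash of the state it started editing from; the receiving office compares that hash with the hash of its current copy and flags a conflict when they differ.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical entity used only for this sketch.
public class DocumentEntity
{
    public Guid Guid { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
}

public static class ConflictDetector
{
    // Hashes the business fields of the entity.
    // Each field is length-prefixed so ("ab", "c") and ("a", "bc") hash differently.
    public static string ComputeHash(DocumentEntity entity)
    {
        string title = entity.Title ?? "";
        string body = entity.Body ?? "";
        string payload = title.Length + ":" + title + body.Length + ":" + body;

        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(payload));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // incomingBaseHash is the hash of the state the remote office edited from.
    // If the local copy no longer matches it, someone else changed the entity.
    public static bool IsConflict(DocumentEntity local, string incomingBaseHash)
    {
        return !string.Equals(ComputeHash(local), incomingBaseHash, StringComparison.Ordinal);
    }
}
```

When IsConflict returns true, the incoming message would be parked for the HO operator instead of being replayed automatically, which matches the manual resolution step described above.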
September 25, 2013
by Vladimir Kornev
· 18,265 Views · 2 Likes
Connecting to SQL Azure with SQL Management Studio
Intro

If you want to manage your SQL Databases in Azure using tools that you're a little more familiar and comfortable with, for example SQL Management Studio, how do you go about connecting? You could read the help article from Microsoft, or you can follow my intuitive screen-based instructions, below:

Assumptions

1. I'm assuming you have a version of SQL Management Studio already installed. I believe you'll need at least SQL Server 2008 R2's version or newer.
2. I'm further assuming you've already created a SQL Database in Azure.

Steps to Connect SSMS to SQL Azure

1. Authenticate to the Azure Portal.
2. Click on SQL Databases.
3. Click on Servers.
4. Click on the name of the Server you wish to connect to.
5. Click on Configure. If not already in place, click on 'Add to the allowed IP addresses' to add your current IP address (or specify an address you wish to connect from) and click 'Save'.
6. Open SQL Management Studio and connect to Database services (usually comes up by default).
  • Enter the fully qualified server name (the server name followed by .database.windows.net)
  • Change to SQL Server Authentication
  • Enter the login preferred (if a new database, the username you specified when you created the DB server)
  • Enter the correct password
7. Hit the Connect button.

Troubleshooting

  • Ensure you have the appropriate ports open outbound from your local network or connection (typically port 1433).
  • Ensure you have allowed the correct public IP address you're trying to connect from via the Azure Portal (steps 1-5 above).
  • Ensure you are using the correct server name and user name. For SSMS, this is the server name (in step 4) followed by .database.windows.net
  • Ensure you are using SQL Server Authentication. For SSMS the username format is
  • If you forgot the password of your username, you can reset the password in the Azure Portal: in step 4, click on Dashboard:

Lastly…

You can click on the Database (in step 2) to see your connection options:
September 25, 2013
by Rob Sanders
· 262,919 Views
Solving the Detached Many-to-Many Problem with the Entity Framework
Introduction

This article is part of the ongoing series I've been writing recently, but it can be read as a standalone article. I'm going to do a better job of integrating the changes documented here into the ongoing solution I've been building. However, considering how much time and effort I put into solving this issue, I've decided to document the approach independently in case it is of use to others in the interim.

The Problem Defined

This issue presents itself when you are dealing with disconnected/detached Entity Framework POCO objects, as the DbContext doesn't track changes to entities. Specifically, trouble occurs with entities participating in a many-to-many relationship, where the EF has hidden a "join table" from the model itself. The problem with detached entities is that the data context has no way of knowing what changes have been made to an object graph without fetching the data from the data store and doing an entity-by-entity comparison, and that assumes it's possible to fetch the data the same way it was originally fetched. In this solution, all the entities are detached, don't use proxy types and are designed to move across WCF service boundaries.

Some Inspiration

There are no out-of-the-box solutions that I'm aware of which can process POCO object graphs that are detached. I did find an interesting solution called GraphDiff, which is available from GitHub and also as a NuGet package, but it didn't work with the latest RC version of the Entity Framework (v6). I also found a very comprehensive article on how to implement a generic repository pattern with the Entity Framework, but it was unable to handle detached many-to-many relationships. In any case, I highly recommend a read of this article; it was the inspiration for some of the approach I've ended up taking with my own design.

The Approach

This morning I put together a simple data model with the relationships that I wanted to support with detached entities.
I've attached the solution with a sample schema and test data at the bottom of this article. If you prefer to open and play with it, be sure to add the Entity Framework (v6 RC) via NuGet (I've omitted it for file size and licensing reasons).

Here's a logical view of the model I wanted to support:

Here's the schema view from SQL Server:

Here's the Entity Model which is generated from the above SQL schema:

In the spirit of punching myself in the head, I've elected to have one table implement an identity specification (meaning the underlying schema allocates PK ID values), whereas for the other two tables the ID must be specified. Theoretically, if I can handle the entity types in a generic fashion, then this solution can scale out to larger and more complex models.

The scenarios I'm specifically looking to solve in this solution with detached object graphs are as follows:

  • Add a relationship (many-to-many)
  • Add a relationship (FK-based)
  • Update a related entity (many-to-many)
  • Update a related entity (FK-based)
  • Remove a relationship (many-to-many)
  • Remove a relationship (FK-based)

Per the above, here are the scenarios within the context of the above data model:

  • Add a new Secondary entity to a Primary entity
  • Add an Other entity to a Secondary entity
  • Update a Secondary entity by updating a Primary entity
  • Update an Other entity from a Secondary entity (or Primary entity)
  • Remove (but not delete!) a Secondary entity from a Primary entity
  • Remove (but not delete) an Other entity from a Secondary entity

Establishing Test Data

Just to give myself a baseline, the data model is populated (by default) with the following data. This gives us some "existing entities" to query and modify.

More Work for the Consumer

Although I tried my best, I couldn't come up with a design which didn't require the consuming client to do slightly more work to make this work properly.
Unfortunately, the best place for change tracking to occur with disconnected entities is the layer making the changes, be it a business layer or something downstream. To this effect, entities will need to implement a property which reflects the state of the entity (added, modified, deleted etc.). For the object graph to be updated/managed successfully, the consumer of the entities needs to set the entity state properly. This isn't at all as bad as it sounds, but it's not nothing.

Establishing some Scaffolding

After generating the data model, the first thing to be done is to ensure each entity derives from the same base class ("EntityBase"). This is used later to establish the active state of an entity when it needs to be processed. I've also created an enum ("ObjectState"), which is a property of the base class, and a helper function which maps ObjectState to an EF EntityState. In case this isn't clear, here's a class view:

Constructing Data Access

To ensure that the usage is consistent, I've defined a single Data Access class, mainly to establish the pattern for handling detached object graphs. I can't stress enough that this is not intended as a guide to an appropriate way to structure your data access (I'll be updating my ongoing series of articles to go into more detail); this is only to articulate a design approach to handling detached object graphs.

Having said all that, here's a look at my "DataAccessor" class, which can be used with generic data access entities (by way of generics):

As with my ongoing project, the Entity Framework DbContext is instantiated by this class on construction, and the class implements IDisposable to ensure the DbContext is disposed of properly when the accessor is disposed.
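The scaffolding described under "Establishing some Scaffolding" might look roughly like the following. This is a sketch under assumptions, not the class view from the original solution: the ObjectState values are the ones that appear in the article's code, but the mapping for Processed is a guess, and the TrackedState enum is a local stand-in for EF6's System.Data.Entity.EntityState so that the sketch compiles without the Entity Framework package.

```csharp
using System;

// State set by the consuming client on each detached entity.
public enum ObjectState
{
    Unchanged,
    Added,
    Modified,
    Deleted,
    Processed
}

// Stand-in for EF6's EntityState (assumption: swap in the real enum in an EF project).
public enum TrackedState
{
    Detached,
    Unchanged,
    Added,
    Deleted,
    Modified
}

// Common base class so every entity carries its client-side state.
public abstract class EntityBase
{
    public ObjectState State { get; set; }
}

public static class HelperFunctions
{
    // Maps the client-side state to the state the change tracker should use.
    public static TrackedState ConvertState(ObjectState state)
    {
        switch (state)
        {
            case ObjectState.Added:
                return TrackedState.Added;
            case ObjectState.Modified:
                return TrackedState.Modified;
            case ObjectState.Deleted:
                return TrackedState.Deleted;
            default:
                // Unchanged and Processed entities need no further work (assumed mapping).
                return TrackedState.Unchanged;
        }
    }
}
```

Keeping the client-facing enum separate from the EF enum means the entities themselves stay free of any Entity Framework reference, which fits entities that travel across service boundaries.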
Here's the constructor showing the EF configuration options I'm using:

public DataAccessor()
{
    _accessor = new SampleEntities();
    _accessor.Configuration.LazyLoadingEnabled = false;
    _accessor.Configuration.ProxyCreationEnabled = false;
}

Updating an Entity

We start with a basic scenario to ensure that the scaffolding has been implemented properly. The scenario is to query for a Primary entity, then change a property and update the entity in the data store.

[TestMethod]
public void UpdateSingleEntity()
{
    Primary existing = null;
    String existingValue = String.Empty;

    using (DataAccessor a = new DataAccessor())
    {
        existing = a.DataContext.Primaries.Include("Secondaries").First();
        Assert.IsNotNull(existing);
        existingValue = existing.Title;
        existing.Title = "Unit " + DateTime.Now.ToString("MMdd hh:mm:ss");
    }

    using (DataAccessor b = new DataAccessor())
    {
        existing.State = ObjectState.Modified;
        b.InsertOrUpdate(existing);
    }

    using (DataAccessor c = new DataAccessor())
    {
        // restore the original title
        existing.Title = existingValue;
        existing.State = ObjectState.Modified;
        c.InsertOrUpdate(existing);
    }
}

You'll notice that there is nothing particularly significant here, except that the object's State is reset to Modified between operations.

Updating a Many-to-Many Relationship

Now things get interesting. I'm going to query for a Primary entity, then I'll update both a property of the Primary entity itself and a property of one of the entity's relationships.
[TestMethod]
public void UpdateManyToMany()
{
    Primary existing = null;
    Secondary other = null;
    String existingValue = String.Empty;
    String existingOtherValue = String.Empty;

    using (DataAccessor a = new DataAccessor())
    {
        //Note that we include the navigation property in the query
        existing = a.DataContext.Primaries.Include("Secondaries").First();
        Assert.IsTrue(existing.Secondaries.Count() >= 1, "Should be at least 1 linked item");
    }

    //save the original description
    existingValue = existing.Description;
    //set a new dummy value (with a date/time so we can see it working)
    existing.Description = "Edit " + DateTime.Now.ToString("yyyyMMdd hh:mm:ss");
    existing.State = ObjectState.Modified;

    other = existing.Secondaries.First();
    //save the original value
    existingOtherValue = other.AlternateDescription;
    //set a new value
    other.AlternateDescription = "Edit " + DateTime.Now.ToString("yyyyMMdd hh:mm:ss");
    other.State = ObjectState.Modified;

    //a new data access class (new DbContext)
    using (DataAccessor b = new DataAccessor())
    {
        //single method to handle inserts and updates
        //set a breakpoint here to see the result in the DB
        b.InsertOrUpdate(existing);
    }

    //return the values to the original ones
    existing.Description = existingValue;
    other.AlternateDescription = existingOtherValue;
    existing.State = ObjectState.Modified;
    other.State = ObjectState.Modified;

    using (DataAccessor c = new DataAccessor())
    {
        //update the entities back to normal
        //set a breakpoint here to see the data before it reverts back
        c.InsertOrUpdate(existing);
    }
}

If we actually run this unit test and set the breakpoints accordingly, you'll see the following in the database:

Database at Breakpoint #1 / Database at Breakpoint #2

Database when Unit Test completes

You'll notice at the second breakpoint that the descriptions of both entities have been updated.
Examining the Insert/Update Code

The function exposed by the "data access" class really just passes through to another private function which does the heavy lifting. This is mainly in case we need to reuse the logic, since it essentially processes state action on attached entities.

public void InsertOrUpdate<T>(params T[] entities) where T : EntityBase
{
    ApplyStateChanges(entities);
    DataContext.SaveChanges();
}

Here's the definition of the ApplyStateChanges function, which I'll discuss below:

private void ApplyStateChanges<T>(params T[] items) where T : EntityBase
{
    DbSet<T> dbSet = DataContext.Set<T>();
    foreach (T item in items)
    {
        //loads related entities into the current context
        dbSet.Attach(item);
        if (item.State == ObjectState.Added || item.State == ObjectState.Modified)
        {
            dbSet.AddOrUpdate(item);
        }
        else if (item.State == ObjectState.Deleted)
        {
            dbSet.Remove(item);
        }

        //apply the client-supplied state to every tracked entity not yet processed
        foreach (DbEntityEntry<EntityBase> entry in DataContext.ChangeTracker.Entries<EntityBase>()
            .Where(c => c.Entity.State != ObjectState.Processed &&
                        c.Entity.State != ObjectState.Unchanged))
        {
            var y = DataContext.Entry(entry.Entity);
            y.State = HelperFunctions.ConvertState(entry.Entity.State);
            entry.Entity.State = ObjectState.Processed;
        }
    }
}

Notes on this Implementation

This function iterates through the items to be examined, attaches them to the current Data Context (which also attaches their children), acts on each item accordingly (add/update/remove) and then processes new entities which have been added to the Data Context's change tracker. For each newly "discovered" entity (ignoring entities which are unchanged or have already been examined), the entity's DbEntityEntry is set according to the entity's ObjectState (which is set by the calling client). Doing this allows the Entity Framework to understand what actions it needs to perform on the entities when SaveChanges() is invoked later.
You'll also note that I set the entity's state to "Processed" when it has been examined, so we don't act on it more than once (for performance purposes).

Fun note: the AddOrUpdate extension method is something I found in the System.Data.Entity.Migrations namespace, and it acts as an 'Upsert' operation, inserting or updating entities depending on whether they already exist. Bonus!

That's it for adding and updating, believe it or not.

Corresponding Unit Test

The following unit test creates a new many-to-many entity, then removes it (by relationship), and finally deletes it altogether from the database:

[TestMethod]
public void AddRemoveRelationship()
{
    Primary existing = null;
    using (DataAccessor a = new DataAccessor())
    {
        existing = a.DataContext.Primaries.Include("Secondaries")
            .FirstOrDefault();
        Assert.IsNotNull(existing);
    }

    Secondary newEntity = new Secondary();
    newEntity.State = ObjectState.Added;
    newEntity.AlternateTitle = "Unit";
    newEntity.AlternateDescription = "Test";
    newEntity.SecondaryId = 1000;
    existing.Secondaries.Add(newEntity);

    using (DataAccessor a = new DataAccessor())
    {
        //breakpoint #1 here
        a.InsertOrUpdate(existing);
    }

    newEntity.State = ObjectState.Unchanged;
    existing.State = ObjectState.Modified;

    using (DataAccessor b = new DataAccessor())
    {
        //breakpoint #2 here
        b.RemoveEntities(existing, x => x.Secondaries, newEntity);
    }

    using (DataAccessor c = new DataAccessor())
    {
        //breakpoint #3 here
        c.Delete(newEntity);
    }
}

Test Results:

Pre-Test - Breakpoint #1 / Breakpoint #2

Breakpoint #3 / Post execution (new entity deleted)

SQL Profile Trace

Removing a Many-to-Many Relationship

Now this is where it gets tricky. I'd like to have something a little more polished, but the best I have come up with to date is a separate operation on the data provider which exposes functionality akin to "remove relationship".
The fundamental problem with how unmodified EF POCO entities work is that, when they are detached, removing a many-to-many relationship means physically removing the related entity from the collection. When the object graph is sent back for processing, there's a missing related entity, and the service or data context would have to assume that the omission was on purpose, not to mention that it would have to compare against data currently in the data store.

To make this easier, I've implemented a function called "RemoveEntities" which alters the relationship between the parent and the child/children. The one big catch is that you need to specify the navigation property or collection, which might make it slightly undesirable to implement generically. In any case, I've provided two options, with the navigation property as a string parameter or as a LINQ expression; they both do the same thing.

public void RemoveEntities<T, T2>(T parent, Expression<Func<T, object>> expression, params T2[] children)
    where T : EntityBase
    where T2 : EntityBase
{
    DataContext.Set<T>().Attach(parent);
    ObjectContext obj = DataContext.ToObjectContext();
    foreach (T2 child in children)
    {
        DataContext.Set<T2>().Attach(child);
        //mark only the relationship as deleted; the child row itself is kept
        obj.ObjectStateManager.ChangeRelationshipState(parent, child, expression, EntityState.Deleted);
    }
    DataContext.SaveChanges();
}

Notes on this Implementation

"ToObjectContext" is an extension method, and is akin to (DataContext as IObjectContextAdapter).ObjectContext. This exposes a more fundamental part of the Entity Framework's object model. We need this level of access to get to the functionality which controls relationships. For each child to be removed (note: not deleted from the physical database), we nominate the parent object, the child, the navigation property (collection) and the nature of the relationship change (delete). Note that this will NOT WORK for Foreign Key defined relationships; more on that below.
To delete entities which have active relationships, you'll need to drop the relationship before attempting to delete, or else you'll have data integrity/referential integrity errors, unless you have accounted for cascading deletion (which I haven't).

Example execution:

using (DataAccessor c = new DataAccessor())
{
    //c.RemoveEntities(existing, "Secondaries", s);
    //(or can use an expression):
    c.RemoveEntities(existing, x => x.Secondaries, s);
}

Removing FK Relationships

As mentioned above, you can't just edit the relationship to remove an FK-based relationship. Instead, you have to follow the EF practice of setting the FK entity to NULL. Here's a Unit Test which demonstrates how this is achieved:

Secondary s = ExistingEntity();
using (DataAccessor c = new DataAccessor())
{
    s.Other = null;
    s.OtherId = null;
    s.State = ObjectState.Modified;
    o.State = ObjectState.Unchanged;
    c.InsertOrUpdate(s);
}

We use the same "Insert or Update" call, being aware that you have to set the ObjectState properties accordingly.

Note: I'm in the process of testing the reverse removal, i.e. what happens if you want to remove a Secondary entity from an Other entity's collection.

Deleting Entities

This is fairly straightforward, but I've taken a few more precautions to ensure that the entity to be deleted is valid on the server side.
public void Delete<T>(params T[] entities) where T : EntityBase
{
    foreach (T entity in entities)
    {
        T attachedEntity = Exists(entity);
        if (attachedEntity != null)
        {
            var attachedEntry = DataContext.Entry(attachedEntity);
            attachedEntry.State = EntityState.Deleted;
        }
    }
    DataContext.SaveChanges();
}

To understand the above, you should take a look at the implementation of the "Exists" function, which essentially checks the data store and local cache to see if there is an attached representation:

protected T Exists<T>(T entity) where T : EntityBase
{
    var objContext = ((IObjectContextAdapter)this.DataContext)
        .ObjectContext;
    var objSet = objContext.CreateObjectSet<T>();
    var entityKey = objContext.CreateEntityKey(objSet.EntitySet.Name, entity);
    DbSet<T> set = DataContext.Set<T>();
    var keys = (from x in entityKey.EntityKeyValues
                select x.Value).ToArray();
    //Remember, there can be composite keys, so don't assume there's
    //just one column/one value
    //If a composite key isn't ordered properly, the Set<T>().Find()
    //method will fail, use attributes on the entity to determine the
    //proper order.
    //context.Configuration.AutoDetectChangesEnabled = false;
    return set.Find(keys);
}

This is a fairly expensive operation, which is why it's pretty much reserved for deletes and not more frequent operations. It essentially determines the target entity's primary key and then checks whether the entity exists or not.

Note: I haven't tested this on entities with composite keys, but I'll get to it at some point. If you have tables with composite keys, you can define the PK key order using attributes on the model entity, but I haven't done this (yet).

Summary

This article is the culmination of about two days of heavy analysis and investigation. I've got a whole lot more to contribute on this topic, but for now, I felt it was worthy enough to post as-is. What you've got here is still incredibly rough, and I haven't done nearly enough testing.
To be honest, I was quite excited by the initial results, which is why I decided to write this post. There's an incredibly good chance that I've missed something in the design and implementation, so please be aware of that. I'll be continuing to refine this approach in my main series of articles with a much cleaner implementation. In the meantime though, if you are struggling with detached entities, I hope this helps. There are precious few articles and samples that are up to date, and very few that seem to work.

This is provided without any warranty of any kind! If you find any issues, please e-mail me [email protected] and I'll attempt to refactor/debug and find ways around some of the inherent limitations. In the meantime, there are a few helpful links I've come across in my travels on the WWW. See below.

Example Solution Files

[ Files ]

Note: you'll need to add the Entity Framework v6 RC package via NuGet; I haven't included it in the archive.

Helpful Links

  • http://blog.magnusmontin.net/2013/05/30/generic-dal-using-entity-framework/
  • https://github.com/refactorthis/GraphDiff
  • http://stackoverflow.com/questions/11686225/dbset-find-method-ridiculously-slow-compared-to-singleordefault-on-id
  • http://stackoverflow.com/questions/10381106/cannot-update-many-to-many-relationships-in-entity-framework
  • http://stackoverflow.com/questions/8413248/how-to-save-an-updated-many-to-many-collection-on-detached-entity-framework-4-1
  • http://stackoverflow.com/questions/6018711/generic-way-to-check-if-entity-exists-in-entity-framework
September 18, 2013
by Rob Sanders
· 163,499 Views
Introduction to ElasticSearch
Learn about ElasticSearch, an open source tool developed with Java. It is a Lucene-based, scalable, full-text search engine, and a data analysis tool.
September 17, 2013
by Hüseyin Akdoğan DZone Core CORE
· 12,113 Views · 5 Likes
EasyNetQ: Big Breaking Changes in the Advanced Bus
EasyNetQ is my little, easy to use, client API for RabbitMQ. It’s been doing really well recently. As I write this, it has 24,653 downloads on NuGet, making it by far the most popular high-level RabbitMQ API. The goal of EasyNetQ is to make working with RabbitMQ as easy as possible. I wanted junior developers to be able to use basic messaging patterns out-of-the-box with just a few lines of code and have EasyNetQ do all the heavy lifting: exchange-binding-queue configuration, error management, connection management, serialization, thread handling; all the things that make working against the low-level AMQP C# API provided by RabbitMQ such a steep learning curve.

To meet this goal, EasyNetQ has to be a very opinionated library. It has a set way of configuring exchanges, bindings and queues based on the .NET type of your messages. However, right from the first release, many users said that they liked the connection management, thread handling, and error management, but wanted to be able to set up their own broker topology. To support this, we introduced the advanced API, an idea stolen shamelessly from Ayende’s RavenDB client. You access the advanced bus (IAdvancedBus) via the Advanced property on IBus:

    var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced;

Sometimes something can seem like a good idea at the time, and then later you think, “WTF! Why on earth did I do that?” It happens to me all the time. I thought it would be cool if I created the exchange-binding-queue topology and then passed it to the publish and subscribe methods, which would then internally declare the exchanges and queues and do the binding. I implemented a tasty little visitor pattern in my ITopologyVisitor. I optimized for my own programming pleasure, rather than for a simple, obvious, easy-to-understand API. I realized a while ago that a more straightforward set of declares on IAdvancedBus would be a far more obvious and intentional design.
To this end, I’ve refactored the advanced bus to separate declares from publishing and consuming. I just pushed the changes to NuGet and have also updated the Advanced Bus documentation. Note that these are breaking changes, so please be careful if you are upgrading to version 0.12 or later. Here is a taste of how it works.

Declare a queue, exchange and binding, and consume raw message bytes:

    var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced;

    var queue = advancedBus.QueueDeclare("my_queue");
    var exchange = advancedBus.ExchangeDeclare("my_exchange", ExchangeType.Direct);
    advancedBus.Bind(exchange, queue, "routing_key");

    advancedBus.Consume(queue, (body, properties, info) => Task.Factory.StartNew(() =>
        {
            var message = Encoding.UTF8.GetString(body);
            Console.Out.WriteLine("Got message: '{0}'", message);
        }));

Note that I’ve renamed ‘Subscribe’ to ‘Consume’ to better reflect the underlying AMQP method.

Declare an exchange and publish a message:

    var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced;
    var exchange = advancedBus.ExchangeDeclare("my_exchange", ExchangeType.Direct);

    using (var channel = advancedBus.OpenPublishChannel())
    {
        var body = Encoding.UTF8.GetBytes("Hello World!");
        channel.Publish(exchange, "routing_key", new MessageProperties(), body);
    }

You can also delete exchanges, queues and bindings:

    var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced;

    // declare some objects
    var queue = advancedBus.QueueDeclare("my_queue");
    var exchange = advancedBus.ExchangeDeclare("my_exchange", ExchangeType.Direct);
    var binding = advancedBus.Bind(exchange, queue, "routing_key");

    // and then delete them
    advancedBus.BindingDelete(binding);
    advancedBus.ExchangeDelete(exchange);
    advancedBus.QueueDelete(queue);

    advancedBus.Dispose();

I think these changes make for a much better advanced API. Have a look at the documentation for the details.
September 13, 2013
by Mike Hadlow
· 12,374 Views
How to shard a cron
Sharding is a database partitioning technique that distributes Aggregates such as rows or documents across multiple servers. This horizontal partitioning trades some client complexity (queries must include a shard key such as a zip code or a customer id) for the capability of distributing the dataset between multiple servers, scaling not only the read but also the write capacity. On the application side, there are several singleton processes, for example cron configurations, that are usually run only once on the whole data set. For a certain category of singleton processes we can switch to a shard-like architecture that can scale first to multiple processes and, when necessary, to multiple servers.

Step 1: identify the candidate process

Take a look at your crontab, or at your process scheduling configuration if you use another infrastructure. Some of the processes are aggregations of data producing statistics, and their work can already be distributed with patterns such as MapReduce. The kind of process that is interesting for client sharding is one where an operation is performed over every single element of the data set. Each element is an Aggregate and as such does not interact, in a single transaction, with other ones. Therefore these processes are intrinsically parallelizable: you only need a way to distribute the load. Some examples of shard-able processes are:
  • rebuild the data aggregates for a new day for each user
  • perform some consistency checks on every customer order
  • send all pending orders
  • perform a renewal for each user subscription

Step 2: choose a shard key

Once you have identified the aggregates along which to parallelize a lengthy operation, the choice of the shard key will usually be straightforward. This key must be uniformly distributed between the N shards you want to create on the client side.
Some examples:
  • the zip code for customers
  • the numerical, sequential id for orders
  • a UUID for uploaded videos
  • the transaction reference number for money transactions

Step 3: divide the work with the shard key

Before sharding is applied, a process is usually composed of two phases:
  • select all Aggregates that satisfy condition C
  • apply operation O to all the selected Aggregates.

The first phase can sometimes be sharded directly, transforming each of the N processes into:
  • select all Aggregates that satisfy condition C and whose shard key modulo N is equal to this shard's number.
  • apply operation O to all the selected Aggregates.

For example, the first phase for shard 0 of 4 can be accomplished by an SQL query:

    SELECT * FROM aggregate_table
    WHERE outdated = true       -- condition C
      AND aggregate_id % 4 = 0  -- sharding

Precalculating aggregate_id % 4 in a column can improve the performance of the query, depending on your database; however, it can make it more difficult to rescale the number of processes. When you switch to 8 or 16 client shards it will be necessary to stop all currently running processes, recalculate the column and restart the new batch. Furthermore, the performance of ALTER TABLE is usually not good on large tables, which are exactly the subject of this client sharding technique.

Sometimes we're not able to partition the original data set directly in the database query. The pattern then becomes:
  • select the id (and the shard key, if different) for all Aggregates that satisfy condition C.
  • filter the subset by only considering the ids whose value modulo N is equal to this shard's number.
  • apply operation O to all the selected Aggregates.

For example, I use this second form when running multiple clients against MongoDB.
I do not know if it's possible to query ObjectIds by their modulo, so I resort to selecting all of them and then filtering them on the client side:

    $id = (string) $document['_id'];
    // without substr() the conversion will overflow 32-bit integers
    $numericalValue = hexdec(substr($id, -4));
    if ($numericalValue % $shards == $shard) {
        ...
    }

This is only useful under the assumption that it is not the query that takes too much time in the original process, but the application of operation O to all of its results. Note also that these solutions need well-behaved processes that do not touch each other's data: the filtering is left to the programmer. In general, it is also necessary to guarantee mutual exclusion with the original singleton process; this usually comes up when deploying the battery of N crons, as they should not start until the last run of the singleton process has terminated and can no longer be started again.
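The same client-side filter can be sketched in Java. This is only an illustration under stated assumptions, not code from the article: the class name ShardFilter, its two methods and the sample ids are hypothetical, and ids are assumed to be hexadecimal strings of at least four characters (as MongoDB ObjectIds are when rendered as text).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assigns hex ids to client shards, mirroring the
// substr($id, -4) / hexdec() / modulo steps of the PHP snippet.
public class ShardFilter {

    // Reads the last four hex digits (a value in 0..65535) and reduces it modulo the shard count.
    public static int shardOf(String hexId, int shards) {
        String tail = hexId.substring(hexId.length() - 4);
        return Integer.parseInt(tail, 16) % shards;
    }

    // Keeps only the ids that belong to the given shard number.
    public static List<String> idsForShard(List<String> ids, int shards, int shard) {
        List<String> mine = new ArrayList<>();
        for (String id : ids) {
            if (shardOf(id, shards) == shard) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        // Made-up sample ids, for illustration only.
        List<String> ids = Arrays.asList(
                "507f1f77bcf86cd799439010",
                "507f1f77bcf86cd799439011",
                "507f1f77bcf86cd799439014");
        // Each of the 4 processes would call this with its own shard number.
        System.out.println(idsForShard(ids, 4, 0));
    }
}
```

Every process runs the same selection and differs only in the shard number it is started with, so moving from 4 to 8 processes only changes the arguments, not the stored data.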
September 4, 2013
by Giorgio Sironi
· 6,855 Views
API Gateway and API Portal - The pillars of API Management and the evolution of SOA
API Management solutions must combine an API Portal (for signing up developers) with an API Gateway (to link back to the enterprise). But where do these come from, and what is the relationship with SOA? To answer these questions, first let's look at a bit of history.

In the 2000s, we had the SOA Gateway and the SOA Registry, working hand-in-hand. This was "SOA Governance". The SOA Registry (with a Repository) was intended to be the "central store of truth" for information about Web Services. It was often the public face of SOA Governance, the part which people could see. Usually the services in the registry took the form of heavyweight SOAP services, defined by WSDLs. The problem was that developers were often forced to register their SOAP services in the registry, rather than feeling that it was something beneficial to them. Browsing the registry was also a chore, involving the use of UDDI, also a heavyweight protocol (in fact, it was built on SOAP).

Fast-forward to the current decade, and we find that the SOA Registry has been replaced by the API Portal. An API Portal is also the "central store of truth", but now it includes REST API definitions (usually expressed using a Swagger-type format) as well as SOAP services. The API Portal is designed to be useful and helpful to developers who wish to build apps, rather than feeling like a chore to use. The lesson of SOA was that an attitude of "If we build it, they will come" (or "If we put it in the SOA Registry, people will use it") does not work. You have to make it into a pleasant experience for developers. API Portals work for the very reason that SOA registries did not work: usability. Just as the SOA Gateway worked with the SOA Registry, so the API Gateway works hand-in-hand with the API Portal. Together, the combination of the API Portal with the API Gateway constitutes "API Management".
The API Portal is for developers to sign up to use APIs, receive API Keys and quotas, and the API Gateway operates at runtime, managing the API Key usage and enforcing the API usage quotas. The API Gateway also performs the very important task of bridging from the technologies used by API clients (REST, OAuth) to the technologies used in the enterprise (Kerberos, SAML, or proprietary identity tokens such as CA SiteMinder smsession tokens). For more on this bridging, check out my webinar with Jason Cardinal from Identica tomorrow on "Bridging APIs to Enterprise Infrastructure". Gartner defines the combination of SOA Governance and API Management as "Application Services Governance". I'm proud to say that Axway (which acquired Vordel in 2012) is recognized by Gartner as a Leader in the category of Application Services Governance. We've seen an evolution of technologies (SOAP to REST) and approach (the UDDI registry to the web-based API Portal) in the journey from SOA Governance to API Management. From 30,000 feet, SOA Governance and API Management might look similar, but the new approach of API Management has already outshone SOA. The API Gateway and API Portal are key to this.
September 3, 2013
by Mitch Pronschinske
· 7,846 Views
Assigning UUIDs to Neo4j Nodes and Relationships
TL;DR: This blog post features a small demo project on GitHub, neo4j-uuid, and explains how to automatically assign UUIDs to nodes and relationships in Neo4j. A very brief introduction to Neo4j 1.9’s KernelExtensionFactory is included as well.

A Little Rant on Neo4j Node/Relationship IDs

In a lot of use cases there is demand for storing a reference to a Neo4j node or relationship in a third party system. The first naive idea probably is to use the internal node/relationship id that Neo4j provides. Do not do that! Ever! You ask why? Well, Neo4j’s id is basically an offset in one of the store files Neo4j uses (with some math involved). Assume you delete a couple of nodes. This produces holes in the store files that Neo4j might reclaim when creating new nodes later on. And since the id is a file offset, there is a chance that the new node will have exactly the same id as the previously deleted node. If you don’t synchronously update all node id references stored elsewhere, you’re in trouble. If Neo4j were redeveloped completely from scratch, the getId() method would not be part of the public API. As long as you use node ids only inside a single request of an application, for example, there’s nothing wrong. To repeat myself: never ever store a node id in a third party system. I have officially warned you.

UUIDs

Enough of ranting, let’s see what we can do to safely store node references in an external system. Basically we need an identifier that, in contrast to the node id, carries no semantics. A common approach to this is using Universally Unique Identifiers (UUIDs). The Java JDK offers a UUID implementation, so we could potentially use UUID.randomUUID(). Unfortunately random UUIDs are slow to generate. A preferred approach is to use the machine’s MAC address and a timestamp as the base for the UUID; this should provide enough uniqueness. There is a nice library out there at http://wiki.fasterxml.com/JugHome providing exactly what we need.
Automatic UUID Assignments

For convenience it would be great if all freshly created nodes and relationships were automatically assigned a uuid property without doing this explicitly. Fortunately Neo4j supports TransactionEventHandlers, a callback interface plugging into transaction handling. A TransactionEventHandler has a chance to modify or veto any transaction. It’s a sharp tool which can have a significant negative performance impact if used the wrong way. I’ve implemented a UUIDTransactionEventHandler that performs the following tasks:
  • Populate a UUID property for each new node or relationship
  • Reject a transaction if a manual modification of a UUID is attempted, either assignment or removal

    public class UUIDTransactionEventHandler implements TransactionEventHandler {

        public static final String UUID_PROPERTY_NAME = "uuid";

        private final TimeBasedGenerator uuidGenerator = Generators.timeBasedGenerator();

        @Override
        public Object beforeCommit(TransactionData data) throws Exception {
            checkForUuidChanges(data.removedNodeProperties(), "remove");
            checkForUuidChanges(data.assignedNodeProperties(), "assign");
            checkForUuidChanges(data.removedRelationshipProperties(), "remove");
            checkForUuidChanges(data.assignedRelationshipProperties(), "assign");

            populateUuidsFor(data.createdNodes());
            populateUuidsFor(data.createdRelationships());
            return null;
        }

        @Override
        public void afterCommit(TransactionData data, java.lang.Object state) {
        }

        @Override
        public void afterRollback(TransactionData data, java.lang.Object state) {
        }

        /**
         * @param propertyContainers set UUID property for an iterable of nodes or relationships
         */
        private void populateUuidsFor(Iterable<? extends PropertyContainer> propertyContainers) {
            for (PropertyContainer propertyContainer : propertyContainers) {
                if (!propertyContainer.hasProperty(UUID_PROPERTY_NAME)) {
                    final UUID uuid = uuidGenerator.generate();
                    final StringBuilder sb = new StringBuilder();
                    sb.append(Long.toHexString(uuid.getMostSignificantBits())).append(Long.toHexString(uuid.getLeastSignificantBits()));
                    propertyContainer.setProperty(UUID_PROPERTY_NAME, sb.toString());
                }
            }
        }

        private <T extends PropertyContainer> void checkForUuidChanges(Iterable<PropertyEntry<T>> changeList, String action) {
            for (PropertyEntry<T> removedProperty : changeList) {
                if (removedProperty.key().equals(UUID_PROPERTY_NAME)) {
                    throw new IllegalStateException("you are not allowed to " + action + " " + UUID_PROPERTY_NAME + " properties");
                }
            }
        }
    }

Setting up Using KernelExtensionFactory

There are two remaining tasks for full automation of UUID assignments:
  • We need to set up autoindexing for uuid properties to have a convenient way to look up nodes or relationships by UUID
  • We need to register UUIDTransactionEventHandler with the graph database

Since version 1.9 Neo4j has the notion of KernelExtensionFactory. Using KernelExtensionFactory you can supply a class that receives lifecycle callbacks when, for example, Neo4j is started or stopped. This is the right place for configuring autoindexing and setting up the TransactionEventHandler. Since the JVM’s ServiceLoader is used, KernelExtensionFactories need to be registered in a file META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory by listing all implementations you want to use:

    org.neo4j.extension.uuid.UUIDKernelExtensionFactory

KernelExtensionFactories can declare dependencies; to do so, declare an inner interface (“Dependencies” in the code) that just has getters. Using proxies, Neo4j will implement this interface and supply you with the required dependencies. The dependencies are matched on the requested type; see Neo4j’s source code for the classes that are supported as dependencies. KernelExtensionFactories must implement a newKernelExtension method that is supposed to return an instance of Lifecycle. For our UUID project we return an instance of UUIDLifeCycle:

    package org.neo4j.extension.uuid;

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.PropertyContainer;
    import org.neo4j.graphdb.event.TransactionEventHandler;
    import org.neo4j.graphdb.factory.GraphDatabaseSettings;
    import org.neo4j.graphdb.index.AutoIndexer;
    import org.neo4j.graphdb.index.IndexManager;
    import org.neo4j.kernel.configuration.Config;
    import org.neo4j.kernel.lifecycle.LifecycleAdapter;

    import java.util.Map;

    /**
     * handle the setup of auto indexing for UUIDs and registers a {@link UUIDTransactionEventHandler}
     */
    class UUIDLifeCycle extends LifecycleAdapter {

        private TransactionEventHandler transactionEventHandler;
        private GraphDatabaseService graphDatabaseService;
        private IndexManager indexManager;
        private Config config;

        UUIDLifeCycle(GraphDatabaseService graphDatabaseService, Config config) {
            this.graphDatabaseService = graphDatabaseService;
            this.indexManager = graphDatabaseService.index();
            this.config = config;
        }

        /**
         * since {@link org.neo4j.kernel.NodeAutoIndexerImpl#start()} is called *after*
         * {@link org.neo4j.extension.uuid.UUIDLifeCycle#start()} it would apply config
         * settings for auto indexing. To prevent this we change config here.
         * @throws Throwable
         */
        @Override
        public void init() throws Throwable {
            Map<String, String> params = config.getParams();
            params.put(GraphDatabaseSettings.node_auto_indexing.name(), "true");
            params.put(GraphDatabaseSettings.relationship_auto_indexing.name(), "true");
            config.applyChanges(params);
        }

        @Override
        public void start() throws Throwable {
            startUUIDIndexing(indexManager.getNodeAutoIndexer());
            startUUIDIndexing(indexManager.getRelationshipAutoIndexer());
            transactionEventHandler = new UUIDTransactionEventHandler();
            graphDatabaseService.registerTransactionEventHandler(transactionEventHandler);
        }

        @Override
        public void stop() throws Throwable {
            stopUUIDIndexing(indexManager.getNodeAutoIndexer());
            stopUUIDIndexing(indexManager.getRelationshipAutoIndexer());
            graphDatabaseService.unregisterTransactionEventHandler(transactionEventHandler);
        }

        void startUUIDIndexing(AutoIndexer<? extends PropertyContainer> autoIndexer) {
            autoIndexer.startAutoIndexingProperty(UUIDTransactionEventHandler.UUID_PROPERTY_NAME);
        }

        void stopUUIDIndexing(AutoIndexer<? extends PropertyContainer> autoIndexer) {
            autoIndexer.stopAutoIndexingProperty(UUIDTransactionEventHandler.UUID_PROPERTY_NAME);
        }
    }

Most of the code is pretty straightforward: start() sets up autoindexing for the uuid property through the two startUUIDIndexing() calls and then registers the UUIDTransactionEventHandler with the graph database. Less obvious is the code in the init() method. Neo4j’s NodeAutoIndexerImpl configures autoindexing itself and switches it on or off depending on the respective config option. However, we want to have autoindexing always switched on. Unfortunately NodeAutoIndexerImpl is run after our code and overrides our settings. That’s why init() tweaks the config settings to force nice behaviour of NodeAutoIndexerImpl.

Looking up Nodes or Relationships by UUID

For completeness the project also contains a trivial unmanaged extension for looking up nodes and relationships using the REST interface, see UUIDRestInterface. By sending an HTTP GET to http://localhost:7474/db/data/node/ the node’s internal id is returned.
Build System and Testing

For building the project, Gradle is used; build.gradle is trivial. Of course a couple of tests are included. As a long-standing addict I’ve obviously used Spock for testing. See the test code here.

Final Words

A downside of this implementation is that each and every node and relationship gets indexed. Indexing always trades write performance for read performance. Keep that in mind. It might make sense to get rid of unconditional auto indexing and put some domain knowledge into the TransactionEventHandler, so that uuids are assigned to, and indexed for, only those nodes that are really referenced from an external system.
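One detail of the handler shown earlier may be worth a second look: it builds the uuid string by concatenating Long.toHexString() of the two 64-bit halves, and Long.toHexString() omits leading zeros, so the stored string is not always 32 characters long and is harder to convert back into a UUID. The sketch below illustrates this with the JDK's java.util.UUID only (not the generator library used in the project); the class name UuidHex and the method toFixedWidthHex are hypothetical, and the fixed-width variant is a suggestion, not what the project does.

```java
import java.util.UUID;

// Illustration only: compares the concatenation used in the handler with a
// zero-padded alternative. Names here are hypothetical.
public class UuidHex {

    // Same concatenation as in the handler: leading zeros of each half are dropped.
    public static String naiveHex(UUID uuid) {
        return Long.toHexString(uuid.getMostSignificantBits())
                + Long.toHexString(uuid.getLeastSignificantBits());
    }

    // Pads each half to 16 hex digits, so the result is always 32 characters.
    public static String toFixedWidthHex(UUID uuid) {
        return String.format("%016x%016x",
                uuid.getMostSignificantBits(),
                uuid.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        // Hand-picked bits whose upper half starts with a zero nibble.
        UUID uuid = new UUID(0x0123456789abcdefL, 0x8000000000000001L);
        System.out.println(naiveHex(uuid).length());        // 31
        System.out.println(toFixedWidthHex(uuid).length()); // 32
    }
}
```

Whether the variable length matters depends on how the property is consumed; for pure lookups by exact string it is probably harmless.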
August 22, 2013
by Stefan Armbruster
· 11,695 Views
DyngoDB: A MongoDB Interface for DynamoDB
You might be asking yourself, 'why do I need a MongoDB-like experience for DynamoDB when there are already full-MongoDB cloud services like MMS, MongoLab, MongoHQ and MongoDirector?' One developer believes there is a need and has set up an experimental project called DyngoDB. It provides a MongoDB-style interface in front of Amazon's DynamoDB and their CloudSearch service. Apparently, in the developer's case, he only wants the MongoDB interface but prefers the DynamoDB storage engine. We'll have to see if other developers also have this specific set of preferences.
August 20, 2013
by Mitch Pronschinske
· 5,754 Views
Destroy Cookie while Logging out.
I was facing a problem where, when a person logged out, the session was invalidated but the JSESSIONID cookie still remained in the browser. As a result, on the next login the Java API would receive the request from the browser along with the old JSESSIONID (just the ID, since the session had been invalidated) and would create the new session with the same ID. To fix this problem I used the code below, so that whenever a user logs out the cookies are emptied and expired, and the JSESSIONID cookie no longer exists for that site. Anyone using Java can use this in their code.

    @RequestMapping(value = "/logout", method = RequestMethod.POST)
    public void logout(HttpServletRequest request, HttpServletResponse response) {
        /* Getting session and then invalidating it */
        HttpSession session = request.getSession(false);
        if (request.isRequestedSessionIdValid() && session != null) {
            session.invalidate();
        }
        handleLogOutResponse(request, response);
    }

    /**
     * Empties and expires every cookie sent with the request (including
     * JSESSIONID) while responding to logout. This helps to avoid the same
     * cookie ID being reused each time a person logs in.
     *
     * @param request  the logout request carrying the cookies
     * @param response the response the expired cookies are added to
     */
    private void handleLogOutResponse(HttpServletRequest request, HttpServletResponse response) {
        Cookie[] cookies = request.getCookies();
        if (cookies == null) {
            return; // getCookies() returns null when the request carries no cookies
        }
        for (Cookie cookie : cookies) {
            cookie.setMaxAge(0);
            cookie.setValue(null);
            cookie.setPath("/");
            response.addCookie(cookie);
        }
    }
August 15, 2013
by Shiv Kumar Ganesh
· 41,348 Views · 2 Likes
neo4j: Extracting a subgraph as an adjacency matrix and calculating eigenvector centrality with JBLAS
Earlier in the week I wrote a blog post showing how to calculate the eigenvector centrality of an adjacency matrix using JBLAS, and the next step was to work out the eigenvector centrality of a neo4j sub graph. There were 3 steps involved in doing this:
  • Export the neo4j sub graph as an adjacency matrix
  • Run JBLAS over it to get eigenvector centrality scores for each node
  • Write those scores back into neo4j

I decided to make use of the Paul Revere data set from Kieran Healy’s blog post, which consists of people and the groups that they were members of. The script to import the data is on my fork of the revere repository. Having imported the data, the next step was to write a cypher query which would give me the people in an adjacency matrix, with the number in each column/row intersection showing how many common groups that pair of people had. I thought it’d be easier to build this query incrementally, so I started out writing a query which would return one row of the adjacency matrix:

    MATCH p1:Person, p2:Person
    WHERE p1.name = "Paul Revere"
    WITH p1, p2
    MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2
    WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links
    ORDER BY p2
    RETURN p1, COLLECT(links) AS row

Here we start with Paul Revere and then find the relationships between him and every other person by way of a common group membership. We use an optional relationship because we need a value in each column/row of our adjacency matrix, so the query has to return a 0 for anyone he doesn’t intersect with.
If we run that query we get back the following: +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | p1 | row | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Paul Revere" | [2,1,1,1,1,1,1,1,1,1,1,1,1,1,2,3,1,1,1,1,1,1,3,3,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,2,1,1,2,1,2,1,1,1,1,1,0,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,2,1,1,1,1,1,1,2,1,3,1,3,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,0,1,0,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,2,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,3,1,1,1,1,3,1,1,1,1,0,1,2,1,1,1,1,1,1,1] | 
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ As it turns outs we’ve only got to remove the WHERE clause and order everybody and we’ve get the adjacency matrix for everyone: MATCH p1:Person, p2:Person WITH p1, p2 MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2 WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links ORDER BY p2 RETURN p1, COLLECT(links) AS row ORDER BY p1 +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | p1 | row | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Abiel Ruddock" | 
[0,1,1,1,0,1,0,1,0,0,1,1,1,0,1,2,0,1,0,1,1,1,2,2,1,0,0,1,1,0,1,1,1,1,1,0,0,0,0,1,1,0,0,2,2,0,0,1,1,2,1,1,1,0,1,0,1,1,0,0,2,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,0,2,1,2,1,0,0,0,0,1,1,0,1,0,0,1,0,2,0,0,1,0,0,0,1,0,0,2,0,1,0,1,1,1,0,0,1,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,2,0,0,1,1,0,0,2,0,1,2,1,1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,2,1,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,0,1,0,0,2,1,0,0,1,1,1,1,0,1,0,0,0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,2,1,1,0,0,2,0,1,0,0,0,0,1,0,1,0,1,0,1,0] | | "Abraham Hunt" | [1,0,1,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,1,1,0,1,1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,1,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,1,0] | ... +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 254 rows 9897 ms The next step was to wire up the query results with the JBLAS code that I wrote in the previous post. 
I ended up with the following:

    public class Neo4jAdjacencyMatrixSpike {
        public static void main(String[] args) throws SQLException {
            ClientResponse response = client()
                    .resource("http://localhost:7474/db/data/cypher")
                    .entity(queryAsJson(), MediaType.APPLICATION_JSON)
                    .accept(MediaType.APPLICATION_JSON)
                    .post(ClientResponse.class);

            JsonNode result = response.getEntity(JsonNode.class);
            ArrayNode rows = (ArrayNode) result.get("data");

            List<Double> principalEigenvector = JBLASSpike.getPrincipalEigenvector(new DoubleMatrix(asMatrix(rows)));

            List<Person> people = asPeople(rows);
            updatePeopleWithEigenvector(people, principalEigenvector);

            System.out.println(sort(people).take(10));
        }

        private static double[][] asMatrix(ArrayNode rows) {
            double[][] matrix = new double[rows.size()][254];
            int rowCount = 0;
            for (JsonNode row : rows) {
                ArrayNode matrixRow = (ArrayNode) row.get(2);
                double[] rowInMatrix = new double[254];
                matrix[rowCount] = rowInMatrix;
                int columnCount = 0;
                for (JsonNode jsonNode : matrixRow) {
                    matrix[rowCount][columnCount] = jsonNode.asInt();
                    columnCount++;
                }
                rowCount++;
            }
            return matrix;
        }

        // rest cut for brevity
    }

Here we take the query results and convert them into an array of arrays before passing it to our JBLAS code to calculate the principal eigenvector.
We then return the top 10 people: Person{name='William Cooper', eigenvector=0.172604992239612, nodeId=68}, Person{name='Nathaniel Barber', eigenvector=0.17260499223961198, nodeId=18}, Person{name='John Hoffins', eigenvector=0.17260499223961195, nodeId=118}, Person{name='Paul Revere', eigenvector=0.17171142003936804, nodeId=207}, Person{name='Caleb Davis', eigenvector=0.16383970722169897, nodeId=71}, Person{name='Caleb Hopkins', eigenvector=0.16383970722169897, nodeId=121}, Person{name='Henry Bass', eigenvector=0.16383970722169897, nodeId=21}, Person{name='Thomas Chase', eigenvector=0.16383970722169897, nodeId=54}, Person{name='William Greenleaf', eigenvector=0.16383970722169897, nodeId=104}, Person{name='Edward Proctor', eigenvector=0.15600043886738055, nodeId=201} I get back the same 10 people as Kieran Healy, although with different eigenvector values. As far as I understand, the absolute values don’t matter; what’s more important is each person’s score relative to the others, so I think we’re ok.
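The point about relative scores can be checked with a small sketch. Multiplying an eigenvector by any positive constant gives another valid eigenvector for the same eigenvalue, so two tools can report different absolute values while producing the same ranking. The class name `EigenvectorScaling` and the sample numbers are illustrative assumptions and are not part of the original code; unit-length normalization is only one of several common conventions.

```java
import java.util.Arrays;
import java.util.Comparator;

final class EigenvectorScaling {

    // Rescale a vector to unit Euclidean length (one common convention, assumed here).
    static double[] normalize(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        double norm = Math.sqrt(sumOfSquares);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    // Return the indices of the scores, ordered from highest to lowest score.
    static Integer[] ranking(double[] scores) {
        Integer[] idx = new Integer[scores.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> -scores[i]));
        return idx;
    }

    public static void main(String[] args) {
        double[] raw = {3.0, 1.0, 2.0};       // hypothetical centrality scores
        double[] scaled = {30.0, 10.0, 20.0}; // the same vector multiplied by 10

        // Both rankings list the same indices in the same order.
        System.out.println(Arrays.toString(ranking(raw)));
        System.out.println(Arrays.toString(ranking(scaled)));

        // After normalization the two vectors also agree in absolute value.
        System.out.println(Arrays.toString(normalize(raw)));
        System.out.println(Arrays.toString(normalize(scaled)));
    }
}
```

Normalizing both result sets the same way before comparing them is a simple way to confirm that two implementations agree.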
The final step was to write these eigenvector values back into neo4j which we can do with the following code: private static void updateNeo4jWithEigenvectors(List people) { for (Person person : people) { ObjectNode request = JsonNodeFactory.instance.objectNode(); request.put("query", "START p = node({nodeId}) SET p.eigenvectorCentrality={value}"); ObjectNode params = JsonNodeFactory.instance.objectNode(); params.put("nodeId", person.nodeId); params.put("value", person.eigenvector); request.put("params", params); client() .resource("http://localhost:7474/db/data/cypher") .entity(request, MediaType.APPLICATION_JSON) .accept(MediaType.APPLICATION_JSON) .post(ClientResponse.class); } } Now we might use that eigenvector centrality value in other queries, such as one to show who the most central/potentially influential people are in each group: MATCH g:Group<-[:MEMBER_OF]-p WITH g.name AS group, p.name AS personName, p.eigenvectorCentrality as eigen ORDER BY eigen DESC WITH group, COLLECT(personName) AS people RETURN group, HEAD(people) + [HEAD(TAIL(people))] + [HEAD(TAIL(TAIL(people)))] AS mostCentral +--------------------------------------------------------------------------+ | group | mostCentral | +--------------------------------------------------------------------------+ | "StAndrewsLodge" | ["Paul Revere","Joseph Warren","Thomas Urann"] | | "BostonCommittee" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "LoyalNine" | ["Caleb Hopkins","William Greenleaf","Caleb Davis"] | | "LondonEnemies" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "LongRoomClub" | ["Paul Revere","John Hancock","Benjamin Clarke"] | | "NorthCaucus" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "TeaParty" | ["William Cooper","Nathaniel Barber","John Hoffins"] | +--------------------------------------------------------------------------+ 7 rows 280 ms Our top ten feature frequently although it’s interesting that only one of them is in the ‘LongRoomClub’ group 
which perhaps indicates that people in that group are less likely to be members of the other ones. I’d be interested to hear if anyone can think of other potential uses for eigenvector centrality once we’ve got it back in the graph. All the code described in this post is on GitHub if you want to take it for a spin.
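For readers who want to see the "most central people per group" logic outside of Cypher, here is a rough in-memory sketch of the same idea: sort by centrality descending, collect names per group, and keep the first few. The `Membership` class, the `topPerGroup` method and the sample rows are hypothetical and only stand in for the data the query reads from the graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MostCentralPerGroup {

    // Hypothetical row: one entry per (group, person) membership.
    static final class Membership {
        final String group;
        final String person;
        final double centrality;

        Membership(String group, String person, double centrality) {
            this.group = group;
            this.person = person;
            this.centrality = centrality;
        }
    }

    // Mirrors ORDER BY eigen DESC, then COLLECT per group, then keep the first `limit` names.
    static Map<String, List<String>> topPerGroup(List<Membership> rows, int limit) {
        List<Membership> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble((Membership m) -> m.centrality).reversed());

        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Membership m : sorted) {
            List<String> names = result.computeIfAbsent(m.group, g -> new ArrayList<>());
            if (names.size() < limit) {
                names.add(m.person);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up scores, for illustration only.
        List<Membership> rows = Arrays.asList(
                new Membership("GroupA", "Person1", 0.10),
                new Membership("GroupA", "Person2", 0.30),
                new Membership("GroupB", "Person3", 0.20),
                new Membership("GroupA", "Person4", 0.25));
        System.out.println(topPerGroup(rows, 2));
    }
}
```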
August 12, 2013
by Mark Needham
· 5,674 Views
EclipseLink MOXy and the Java API for JSON Processing - Object Model APIs
The Java API for JSON Processing (JSR-353) is the Java standard for producing and consuming JSON which was introduced as part of Java EE 7. JSR-353 includes object (DOM like) and stream (StAX like) APIs. In this post I will demonstrate the initial JSR-353 support we have added to MOXy's JSON binding in EclipseLink 2.6. You can now use MOXy to marshal to: javax.json.JsonArrayBuilder javax.json.JsonObjectBuilder And unmarshal from: javax.json.JsonStructure javax.json.JsonObject javax.json.JsonArray You can try this out today using a nightly build of EclipseLink 2.6.0: http://www.eclipse.org/eclipselink/downloads/nightly.php The JSR-353 reference implementation is available here: https://java.net/projects/jsonp/downloads/download/ri/javax.json-ri-1.0.zip Java Model Below is the simple customer model that we will use for this post. Note for this example we are only using the standard JAXB (JSR-222) annotations. Customer package blog.jsonp.moxy; import java.util.*; import javax.xml.bind.annotation.*; @XmlType(propOrder={"id", "firstName", "lastName", "phoneNumbers"}) public class Customer { private int id; private String firstName; private String lastName; private List phoneNumbers = new ArrayList(); public int getId() { return id; } public void setId(int id) { this.id = id; } public String getFirstName() { return firstName; } public void setFirstName(String firstName) { this.firstName = firstName; } @XmlElement(nillable=true) public String getLastName() { return lastName; } public void setLastName(String lastName) { this.lastName = lastName; } @XmlElement public List getPhoneNumbers() { return phoneNumbers; } } PhoneNumber package blog.jsonp.moxy; import javax.xml.bind.annotation.*; @XmlAccessorType(XmlAccessType.FIELD) public class PhoneNumber { private String type; private String number; public String getType() { return type; } public void setType(String type) { this.type = type; } public String getNumber() { return number; } public void setNumber(String number) { 
this.number = number; } } jaxb.properties To specify MOXy as your JAXB provider you need to include a file called jaxb.properties in the same package as your domain model with the following entry (see: Specifying EclipseLink MOXy as your JAXB Provider) javax.xml.bind.context.factory=org.eclipse.persistence.jaxb.JAXBContextFactory Marshal Demo In the demo code below we will use a combination of JSR-353 and MOXy APIs to produce JSON. JSR-353's JsonObjectBuilder and JsonArrayBuilder are used to produces instances of JsonObject and JsonArray. We can use MOXy to marshal to these builders by wrapping them in instances of MOXy's JsonObjectBuilderResult and JsonArrayBuilderResult. package blog.jsonp.moxy; import java.util.*; import javax.json.*; import javax.json.stream.JsonGenerator; import javax.xml.bind.*; import org.eclipse.persistence.jaxb.JAXBContextProperties; import org.eclipse.persistence.oxm.json.*; public class MarshalDemo { public static void main(String[] args) throws Exception { // Create the EclipseLink JAXB (MOXy) Marshaller Map jaxbProperties = new HashMap(2); jaxbProperties.put(JAXBContextProperties.MEDIA_TYPE, "application/json"); jaxbProperties.put(JAXBContextProperties.JSON_INCLUDE_ROOT, false); JAXBContext jc = JAXBContext.newInstance(new Class[] {Customer.class}, jaxbProperties); Marshaller marshaller = jc.createMarshaller(); // Create the JsonArrayBuilder JsonArrayBuilder customersArrayBuilder = Json.createArrayBuilder(); // Build the First Customer Customer customer = new Customer(); customer.setId(1); customer.setFirstName("Jane"); customer.setLastName(null); PhoneNumber phoneNumber = new PhoneNumber(); phoneNumber.setType("cell"); phoneNumber.setNumber("555-1111"); customer.getPhoneNumbers().add(phoneNumber); // Marshal the First Customer Object into the JsonArray JsonArrayBuilderResult result = new JsonArrayBuilderResult(customersArrayBuilder); marshaller.marshal(customer, result); // Build List of PhoneNumer Objects for Second Customer List 
phoneNumbers = new ArrayList(2); PhoneNumber workPhone = new PhoneNumber(); workPhone.setType("work"); workPhone.setNumber("555-2222"); phoneNumbers.add(workPhone); PhoneNumber homePhone = new PhoneNumber(); homePhone.setType("home"); homePhone.setNumber("555-3333"); phoneNumbers.add(homePhone); // Marshal the List of PhoneNumber Objects JsonArrayBuilderResult arrayBuilderResult = new JsonArrayBuilderResult(); marshaller.marshal(phoneNumbers, arrayBuilderResult); customersArrayBuilder // Use JSR-353 APIs for Second Customer's Data .add(Json.createObjectBuilder() .add("id", 2) .add("firstName", "Bob") .addNull("lastName") // Included Marshalled PhoneNumber Objects .add("phoneNumbers", arrayBuilderResult.getJsonArrayBuilder()) ) .build(); // Write JSON to System.out Map jsonProperties = new HashMap(1); jsonProperties.put(JsonGenerator.PRETTY_PRINTING, true); JsonWriterFactory writerFactory = Json.createWriterFactory(jsonProperties); JsonWriter writer = writerFactory.createWriter(System.out); writer.writeArray(customersArrayBuilder.build()); writer.close(); } } Highlighted lines: 36, 37, 38, 54, 55, 64 Output Below is the output from running the marshal demo (MarshalDemo). The highlighted portions (lines 2-12 and 18-25) correspond to the portions that were populated from our Java model. [ { "id":1, "firstName":"Jane", "lastName":null, "phoneNumbers":[ { "type":"cell", "number":"555-1111" } ] }, { "id":2, "firstName":"Bob", "lastName":null, "phoneNumbers":[ { "type":"work", "number":"555-2222" }, { "type":"home", "number":"555-3333" } ] } ] Highlighted lines: 2-12, 18-25 Unmarshal Demo MOXy enables you to unmarshal from a JSR-353 JsonStructure (JsonObject or JsonArray). To do this simply wrap the JsonStructure in an instance of MOXy's JsonStructureSource and use one of the unmarshal operations that takes an instance of Source. 
package blog.jsonp.moxy; import java.io.FileInputStream; import java.util.*; import javax.json.*; import javax.xml.bind.*; import org.eclipse.persistence.jaxb.JAXBContextProperties; import org.eclipse.persistence.oxm.json.JsonStructureSource; public class UnmarshalDemo { public static void main(String[] args) throws Exception { try (FileInputStream is = new FileInputStream("src/blog/jsonp/moxy/input.json")) { // Create the EclipseLink JAXB (MOXy) Unmarshaller Map jaxbProperties = new HashMap(2); jaxbProperties.put(JAXBContextProperties.MEDIA_TYPE, "application/json"); jaxbProperties.put(JAXBContextProperties.JSON_INCLUDE_ROOT, false); JAXBContext jc = JAXBContext.newInstance(new Class[] {Customer.class}, jaxbProperties); Unmarshaller unmarshaller = jc.createUnmarshaller(); // Parse the JSON JsonReader jsonReader = Json.createReader(is); // Unmarshal Root Level JsonArray JsonArray customersArray = jsonReader.readArray(); JsonStructureSource arraySource = new JsonStructureSource(customersArray); List customers = (List) unmarshaller.unmarshal(arraySource, Customer.class) .getValue(); for(Customer customer : customers) { System.out.println(customer.getFirstName()); } // Unmarshal Nested JsonObject JsonObject customerObject = customersArray.getJsonObject(1); JsonStructureSource objectSource = new JsonStructureSource(customerObject); Customer customer = unmarshaller.unmarshal(objectSource, Customer.class) .getValue(); for(PhoneNumber phoneNumber : customer.getPhoneNumbers()) { System.out.println(phoneNumber.getNumber()); } } } } Highlighted lines: 27-30, 37-39 Input (input.json) The following JSON input will be converted to a JsonArray using a JsonReader. 
[ { "id":1, "firstName":"Jane", "lastName":null, "phoneNumbers":[ { "type":"cell", "number":"555-1111" } ] }, { "id":2, "firstName":"Bob", "lastName":null, "phoneNumbers":[ { "type":"work", "number":"555-2222" }, { "type":"home", "number":"555-3333" } ] } ] Highlighted lines: 4, 15, 20, 24 Output Below is the output from running the unmarshal demo (UnmarshalDemo). Jane Bob 555-2222 555-3333
August 7, 2013
by Blaise Doughan
· 13,771 Views
NoSQL with JPA
EclipseLink, the reference implementation of JPA, has provided JPA support for NoSQL databases (MongoDB and Oracle NoSQL) since version 2.4. In this tutorial we will discuss using MongoDB through the JPA support of EclipseLink. The operations previously performed with the console and the native Java driver will now be performed in a web application with the help of EclipseLink. Tools and technologies used in the sample application are as follows: MongoDB version 2.4.1 MongoDB Java Driver version 2.11.1 JSF version 2.2 PrimeFaces version 3.5 EclipseLink version 2.4 Jetty 7.x Maven Plugin JDK version 1.7 Maven 3.0.4 Project Dependencies org.glassfish javax.faces 2.2.0-SNAPSHOT org.primefaces primefaces 3.5 org.primefaces.themes bootstrap 1.0.10 org.eclipse.persistence org.eclipse.persistence.jpa 2.4.0-SNAPSHOT org.eclipse.persistence org.eclipse.persistence.nosql 2.4.0-SNAPSHOT jboss jboss-j2ee 4.2.2.GA org.mongodb mongo-java-driver 2.11.1 commons-fileupload commons-fileupload 1.3 Entity Class @Entity @NoSql(dataFormat=DataFormatType.MAPPED) public class Article implements Serializable { public Article() { } @Id @GeneratedValue @Field(name="_id") private String id; @ElementCollection private List categoryLists = new ArrayList(); @Basic private String title; @Basic private String content; @Basic @Temporal(javax.persistence.TemporalType.DATE) private Date date; @Basic private String author; @ElementCollection private List tagLists = new ArrayList(); The @NoSql annotation sets the data format and type and maps the NoSQL data. Because our sample application uses MongoDB, which stores documents in BSON format, MAPPED is used as the data format. The @ElementCollection annotation maps an embedded collection into the parent document. Because an article in our sample application can have more than one category and more than one tag, we map both as element collections.
Embedded Objects @Embeddable @NoSql(dataFormat=DataFormatType.MAPPED) public class Categories implements Serializable { @Basic private String category; @Embeddable @NoSql(dataFormat=DataFormatType.MAPPED) public class Tags implements Serializable { @Basic private String tag; Unlike the Article entity class, the Categories and Tags classes carry the @Embeddable annotation. Documents stored inside a parent document are mapped with this annotation. Please note that embedded objects do not need a unique field. persistence.xml com.kodcu.entity.Article com.kodcu.entity.Categories com.kodcu.entity.Tags CRUD Operations index.xhtml MyBean.java public void saveArticle() { em.getTransaction().begin(); if(null == article.getId()) em.persist(article); else em.merge(article); em.getTransaction().commit(); } public void removeArticle() { em.getTransaction().begin(); em.remove(selectArticle); em.getTransaction().commit(); } Demo Application The original content and the demo application can be accessed at NoSQL with JPA
August 6, 2013
by Hüseyin Akdoğan DZone Core CORE
· 31,649 Views · 1 Like
Getting started with CQEngine: LINQ for Java, Only Faster
CQEngine, or Collection Query Engine, is a library that allows you to build indices over Java collections and query them for objects using exposed properties. It offers a capability similar to LINQ in .NET but is thought to be faster because it builds indices over collections before querying them and uses set theory instead of iteration. In this post we will see how to use CQEngine to query a simple collection of objects; in our example, a collection of users of a hypothetical system. In a subsequent post, we will also see how iteratively searching a collection compares to querying via CQEngine. The first step is to get the CQEngine jar file. Download the jar from the CQEngine website or, if you are using Maven, add the following dependency. com.googlecode.cqengine cqengine 1.0.3 Next, let's create the class whose objects we will be searching for: package co.syntx.examples.cqengine; import com.googlecode.cqengine.attribute.Attribute; import com.googlecode.cqengine.attribute.SimpleAttribute; public class User { private String username; private String password; private String fullname; private Role role; public User(String username, String password, String fullname, Role role) { super(); this.username = username; this.password = password; this.fullname = fullname; this.role = role; } public static final Attribute<User, String> FULL_NAME = new SimpleAttribute<User, String>("fullname") { public String getValue(User user) { return user.fullname; } }; public static final Attribute<User, String> USERNAME = new SimpleAttribute<User, String>("username") { public String getValue(User user) { return user.username; } }; public String getUsername() { return username; } public void setUsername(String username) { this.username = username; } public String getPassword() { return password; } public void setPassword(String password) { this.password = password; } public String getFullname() { return fullname; } public void setFullname(String fullname) { this.fullname = fullname; } public Role getRole() { return role; } public void setRole(Role
role) { this.role = role; } } Next, we write a class to perform our searches. I will go function by function. 1. Function to Build a Test Indexed Collection: In the following function, we build an indexed collection, define indices on attributes, and populate this collection with a certain number of objects. In actual usage, your collection will probably be filled with objects returned from the DB, read from a file, or obtained in some similar way. public void buildIndexedCollection(int size) throws Exception { indexedUsers = CQEngine.newInstance(); indexedUsers.addIndex(HashIndex.onAttribute(User.FULL_NAME)); indexedUsers.addIndex(SuffixTreeIndex.onAttribute(User.FULL_NAME)); for (int i = 0; i < size; i++) { String username = RandomStringGenerator.generateRandomString(8,RandomStringGenerator.Mode.ALPHANUMERIC); String password = RandomStringGenerator.generateRandomString(8,RandomStringGenerator.Mode.ALPHANUMERIC); String fullname = RandomStringGenerator.generateRandomString(5,RandomStringGenerator.Mode.ALPHA) + " " + RandomStringGenerator.generateRandomString(5,RandomStringGenerator.Mode.ALPHA); Role role = new Role(); role.setName("admin"); indexedUsers.add(new User(username, password, fullname, role)); } } The first statement initializes a new indexed collection, a reference to which is stored in the class variable indexedUsers. The reference is of type IndexedCollection<User>. The two addIndex calls define two indices: i) a Hash Index suitable for equal-style queries, and ii) a Suffix Tree Index suitable for ends-with-style queries. For the purpose of this example, we are building indices only on the full name field. Inside the loop, we use a random string generator to populate dummy objects, and the final statement of the loop adds each object to our indexed collection. 2. Function to Perform Indexed Search for Exact Matches: In this function, we are querying for names that exactly match a given name. The equal function takes the attribute on which to perform the query and the value to search for.
The method equal is statically imported via import static com.googlecode.cqengine.query.QueryFactory.*; In the example below, we loop over the results returned by the retrieve method without doing anything with them. In your case, you may choose to return the result set returned by retrieve. public void indexedSearchForEquals(String fullname) throws Exception { Query<User> query = equal(User.FULL_NAME, fullname); for (User user : indexedUsers.retrieve(query)) { // System.out.println(user.getFullname()); } } 3. Function to Perform Indexed Search for Ends With Matches: In this function, we are querying for names that end with a certain suffix. public void indexedSearchForEndsWith(String endswith) throws Exception { Query<User> query1 = endsWith(User.FULL_NAME, endswith); for (User user : indexedUsers.retrieve(query1)) { // System.out.println(user.getFullname()); } } 4. Function to Perform Indexed Search for Equals or Ends With Matches: This function combines both queries described above with an or relation between them. public void indexedSearchForEqualOrEndsWith(String equals, String ends) throws Exception { Query<User> query = or(equal(User.FULL_NAME, equals),endsWith(User.FULL_NAME, ends)); for (User user : indexedUsers.retrieve(query)) { // System.out.println(user.getFullname()); } } 5. Putting it together: All the functions above belong to a class called CQEngineTest. We create a new object, build a test collection, then search for exact matches, for strings that end with a certain suffix, or for either. CQEngineTest test = new CQEngineTest(); test.buildIndexedCollection(size); test.indexedSearchForEqualOrEndsWith("test", "test"); In this example, we have used a Hash Index and a Suffix Tree Index. There are many other types of indices that you can choose from, depending on the type of query that you want to perform. A list of these indices and when to use them can be found on the CQEngine project page.
Also, apart from the equal and endsWith operations, there are others that you would typically expect to find. In a subsequent post we will also see how an indexed search compares with a typical iterative search in terms of time.
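To make the difference between an iterative search and a hash-indexed search concrete, here is a minimal sketch using only the Java standard library. It does not use CQEngine and does not reflect how CQEngine is implemented internally; the class name `IndexVersusScan`, its methods and the sample names are assumptions made for illustration. It shows the general trade-off: a scan inspects every element on every query, while an index is built once and then answers an equality query with a single lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IndexVersusScan {

    // Iterative search: examine every element on every query.
    static List<String> scanForEqual(List<String> fullNames, String wanted) {
        List<String> matches = new ArrayList<>();
        for (String name : fullNames) {
            if (name.equals(wanted)) {
                matches.add(name);
            }
        }
        return matches;
    }

    // Build a hash index once: attribute value -> all elements carrying that value.
    static Map<String, List<String>> buildHashIndex(List<String> fullNames) {
        Map<String, List<String>> index = new HashMap<>();
        for (String name : fullNames) {
            index.computeIfAbsent(name, k -> new ArrayList<>()).add(name);
        }
        return index;
    }

    // Indexed search: one hash lookup instead of a pass over the whole collection.
    static List<String> lookupEqual(Map<String, List<String>> index, String wanted) {
        return index.getOrDefault(wanted, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Hypothetical full names, for illustration only.
        List<String> names = Arrays.asList("Ada Byron", "Alan Kay", "Ada Byron");
        Map<String, List<String>> index = buildHashIndex(names);

        System.out.println(scanForEqual(names, "Ada Byron"));
        System.out.println(lookupEqual(index, "Ada Byron"));
    }
}
```

The index costs extra memory and build time up front, which is why it pays off mainly when the same collection is queried many times.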
August 5, 2013
by Faheem Sohail
· 28,079 Views · 1 Like
JPA Searching Using Lucene - A Working Example with Spring and DBUnit
Working Example on GitHub There's a small, self-contained mavenised example project over on GitHub to accompany this post - check it out here: https://github.com/adrianmilne/jpa-lucene-spring-demo Running the Demo See the README file over on GitHub for details of running the demo. Essentially, it's just running the unit tests, with the usual Maven build and test results output to the console. The DBUnit test inserts Book data into the HSQL database using JPA, and then uses Lucene to query the data, testing that the expected Books are returned (i.e. only those in the SCI-FI category containing the word 'Space', with any that have 'Space' in the title appearing before those with 'Space' only in the description). The Book Entity Our simple example stores Books. The Book entity class below is a standard JPA Entity with a few additional annotations to identify it to Lucene: @Indexed - this identifies that the class will be added to the Lucene index. You can define a specific index by adding the 'index' attribute to the annotation. We're just choosing the simplest, minimal configuration for this example. In addition to this, you also need to specify which properties on the entity are to be indexed, and how they are to be indexed. For our example we are again going for the default option by just adding an @Field annotation with no extra parameters. We are adding one other annotation to the 'title' field - @Boost - which tells Lucene to give more weight to search term matches that appear in this field than to the same term appearing in the description field. This example is purposefully kept minimal in terms of the ins-and-outs of Lucene (I may cover that in a later post) - we're really just concentrating on the integration with JPA and Spring for now.
package com.cor.demo.jpa.entity; import javax.persistence.Entity; import javax.persistence.EnumType; import javax.persistence.Enumerated; import javax.persistence.GeneratedValue; import javax.persistence.Id; import javax.persistence.Lob; import org.hibernate.search.annotations.Boost; import org.hibernate.search.annotations.Field; import org.hibernate.search.annotations.Indexed; /** * Book JPA Entity. */ @Entity @Indexed public class Book { @Id @GeneratedValue private Long id; @Field @Boost(value = 1.5f) private String title; @Field @Lob private String description; @Field @Enumerated(EnumType.STRING) private BookCategory category; public Book(){ } public Book(String title, BookCategory category, String description){ this.title = title; this.category = category; this.description = description; } public Long getId() { return id; } public void setId(Long id) { this.id = id; } public String getTitle() { return title; } public void setTitle(String title) { this.title = title; } public BookCategory getCategory() { return category; } public void setCategory(BookCategory category) { this.category = category; } public String getDescription() { return description; } public void setDescription(String description) { this.description = description; } @Override public String toString() { return "Book [id=" + id + ", title=" + title + ", description=" + description + ", category=" + category + "]"; } } The Book Manager The BookManager class acts as a simple service layer for the Book operations - used for adding books and searching books. As you can see, the JPA database resources are autowired in by Spring from the application-context.xml. We are just using an in-memory hsql database in this example. 
package com.cor.demo.jpa.manager; import java.util.List; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; import javax.persistence.PersistenceContextType; import javax.persistence.Query; import org.hibernate.search.jpa.FullTextEntityManager; import org.hibernate.search.jpa.Search; import org.hibernate.search.query.dsl.QueryBuilder; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.context.annotation.Scope; import org.springframework.stereotype.Component; import org.springframework.transaction.annotation.Transactional; import com.cor.demo.jpa.entity.Book; import com.cor.demo.jpa.entity.BookCategory; /** * Manager for persisting and searching on Books. Uses JPA and Lucene. */ @Component @Scope(value = "singleton") public class BookManager { /** Logger. */ private static Logger LOG = LoggerFactory.getLogger(BookManager.class); /** JPA Persistence Unit. */ @PersistenceContext(type = PersistenceContextType.EXTENDED, name = "booksPU") private EntityManager em; /** Hibernate Full Text Entity Manager. */ private FullTextEntityManager ftem; /** * Method to manually update the Full Text Index. This is not required if inserting entities * using this Manager as they will automatically be indexed. Useful though if you need to index * data inserted using a different method (e.g. pre-existing data, or test data inserted via * scripts or DbUnit). */ public void updateFullTextIndex() throws Exception { LOG.info("Updating Index"); getFullTextEntityManager().createIndexer().startAndWait(); } /** * Add a Book to the Database. */ @Transactional public Book addBook(Book book) { LOG.info("Adding Book : " + book); em.persist(book); return book; } /** * Delete All Books. 
*/ @SuppressWarnings("unchecked") @Transactional public void deleteAllBooks() { LOG.info("Delete All Books"); Query allBooks = em.createQuery("select b from Book b"); List books = allBooks.getResultList(); // We need to delete individually (rather than a bulk delete) to ensure they are removed // from the Lucene index correctly for (Book b : books) { em.remove(b); } } @SuppressWarnings("unchecked") @Transactional public void listAllBooks() { LOG.info("List All Books"); LOG.info("------------------------------------------"); Query allBooks = em.createQuery("select b from Book b"); List books = allBooks.getResultList(); for (Book b : books) { LOG.info(b.toString()); getFullTextEntityManager().index(b); } } /** * Search for a Book. */ @SuppressWarnings("unchecked") @Transactional public List search(BookCategory category, String searchString) { LOG.info("------------------------------------------"); LOG.info("Searching Books in category '" + category + "' for phrase '" + searchString + "'"); // Create a Query Builder QueryBuilder qb = getFullTextEntityManager().getSearchFactory().buildQueryBuilder().forEntity(Book.class).get(); // Create a Lucene Full Text Query org.apache.lucene.search.Query luceneQuery = qb.bool() .must(qb.keyword().onFields("title", "description").matching(searchString).createQuery()) .must(qb.keyword().onField("category").matching(category).createQuery()).createQuery(); Query fullTextQuery = getFullTextEntityManager().createFullTextQuery(luceneQuery, Book.class); // Run Query and print out results to console List result = (List) fullTextQuery.getResultList(); // Log the Results LOG.info("Found Matching Books :" + result.size()); for (Book b : result) { LOG.info(" - " + b); } return result; } /** * Convenience method to get Full Test Entity Manager. Protected scope to assist mocking in Unit * Tests. * @return Full Text Entity Manager. 
*/ protected FullTextEntityManager getFullTextEntityManager() { if (ftem == null) { ftem = Search.getFullTextEntityManager(em); } return ftem; } /** * Get the JPA Entity Manager (required for the DBUnit Tests). * @return Entity manager */ protected EntityManager getEntityManager() { return em; } /** * Sets the JPA Entity Manager (required to assist with mocking in Unit Test) * @param em EntityManager */ protected void setEntityManager(EntityManager em) { this.em = em; } } application-context.xml This is the Spring configuration file. In the JPA Entity Manager configuration you can see that the key 'hibernate.search.default.indexBase' is added to the jpaPropertyMap to tell Lucene where to create the index. We have also externalised the database login credentials to a properties file, as you may wish to change these for different environments (for example, by updating the propertyConfigurer to look for and use a different external properties file if it finds one on the file system). classpath:/system.properties Testing Using DBUnit The project includes an example of using DBUnit with Spring to test adding to and searching against the database. DBUnit populates the database with test data, the test exercises the Book Manager search operations, and the database is then cleaned down. This is a great way to test database functionality and can be easily integrated into Maven and continuous build environments. Because DBUnit bypasses the standard JPA insertion calls, the data does not get automatically added to the Lucene index. We have a method exposed on the service interface to update the full text index, 'updateFullTextIndex()' - calling this causes Lucene to update the index with the current data in the database. This can be useful when you are adding search to pre-populated databases and need to index the existing content.
package com.cor.demo.jpa.manager; import java.io.InputStream; import java.util.List; import org.dbunit.DBTestCase; import org.dbunit.database.DatabaseConnection; import org.dbunit.database.IDatabaseConnection; import org.dbunit.dataset.IDataSet; import org.dbunit.dataset.xml.FlatXmlDataSetBuilder; import org.dbunit.operation.DatabaseOperation; import org.hibernate.impl.SessionImpl; import org.junit.After; import org.junit.Before; import org.junit.Test; import org.junit.runner.RunWith; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import com.cor.demo.jpa.entity.Book; import com.cor.demo.jpa.entity.BookCategory; /** * DBUnit Test - loads data defined in 'test-data-set.xml' into the database to run tests against the * BookManager. More thorough (and ultimately easier in this context) than using mocks. */ @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations = { "classpath:/application-context.xml" }) public class BookManagerDBUnitTest extends DBTestCase { /** Logger. */ private static Logger LOG = LoggerFactory.getLogger(BookManagerDBUnitTest.class); /** Book Manager Under Test. */ @Autowired private BookManager bookManager; @Before public void setup() throws Exception { DatabaseOperation.CLEAN_INSERT.execute(getDatabaseConnection(), getDataSet()); } @After public void tearDown() { deleteBooks(); } @Override protected IDataSet getDataSet() throws Exception { InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("test-data-set.xml"); FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder(); return builder.build(inputStream); } /** * Get the underlying database connection from the JPA Entity Manager (DBUnit needs this connection). 
* @return Database Connection * @throws Exception */ private IDatabaseConnection getDatabaseConnection() throws Exception { return new DatabaseConnection(((SessionImpl) (bookManager.getEntityManager().getDelegate())).connection()); } /** * Tests the expected results for searching for 'Space' in SCI-FI books. */ @Test public void testSciFiBookSearch() throws Exception { bookManager.listAllBooks(); bookManager.updateFullTextIndex(); List<Book> results = bookManager.search(BookCategory.SCIFI, "Space"); assertEquals("Expected 2 results for SCI FI search for 'Space'", 2, results.size()); assertEquals("Expected 1st result to be '2001: A Space Oddysey'", "2001: A Space Oddysey", results.get(0).getTitle()); assertEquals("Expected 2nd result to be 'Apollo 13'", "Apollo 13", results.get(1).getTitle()); } private void deleteBooks() { LOG.info("Deleting Books...-"); bookManager.deleteAllBooks(); } } The source data for the test is defined in an XML file.
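To illustrate why the test expects the title match to come before the description-only match, here is a toy scoring sketch of the effect of the boost on the title field. It is deliberately simplified and is not Lucene's actual scoring formula, which also takes term frequency, document frequency and field length into account. The class name `BoostSketch`, the `score` method and the two book descriptions are hypothetical; only the 1.5 weight and the two titles come from the code above.

```java
import java.util.Locale;

final class BoostSketch {

    static final double TITLE_BOOST = 1.5;       // mirrors @Boost(value = 1.5f) on the title field
    static final double DESCRIPTION_BOOST = 1.0; // default weight for an unboosted field

    // Toy score: add a field's weight when the field contains the term (case-insensitive).
    static double score(String title, String description, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        double score = 0.0;
        if (title.toLowerCase(Locale.ROOT).contains(needle)) {
            score += TITLE_BOOST;
        }
        if (description.toLowerCase(Locale.ROOT).contains(needle)) {
            score += DESCRIPTION_BOOST;
        }
        return score;
    }

    public static void main(String[] args) {
        // The descriptions are invented for this sketch; the real test data is not shown here.
        double titleMatch = score("2001: A Space Oddysey", "A voyage to Jupiter", "Space");
        double descriptionMatch = score("Apollo 13", "A mission in space goes wrong", "Space");

        // The title match scores higher, so it would be ranked first.
        System.out.println(titleMatch > descriptionMatch);
    }
}
```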
August 5, 2013
by Adrian Milne
· 31,307 Views
What Is NoSQL?
Dan McCreary and Ann Kelly, authors of 'Making Sense of NoSQL,' discuss the business drivers and motivations that make NoSQL so popular to organizations today.
August 1, 2013
by Eric Gregory
· 21,271 Views · 4 Likes
Jersey Client: Testing External Calls
Jim and I have been doing a bit of work over the last week which involved calling neo4j’s HA status URI to check whether an instance was a master or a slave, and we’ve been using jersey-client. The code looked roughly like this:

class Neo4jInstance {
    private Client httpClient;
    private URI hostname;

    public Neo4jInstance(Client httpClient, URI hostname) {
        this.httpClient = httpClient;
        this.hostname = hostname;
    }

    public Boolean isSlave() {
        String slaveURI = hostname.toString() + ":7474/db/manage/server/ha/slave";
        ClientResponse response = httpClient.resource(slaveURI).accept(TEXT_PLAIN).get(ClientResponse.class);
        return Boolean.parseBoolean(response.getEntity(String.class));
    }
}

While writing some tests against this code we wanted to stub out the actual calls to the HA slave URI so we could simulate both conditions, and a brief search suggested that mockito was the way to go. We ended up with a test that looked like this:

@Test
public void shouldIndicateInstanceIsSlave() {
    Client client = mock( Client.class );
    WebResource webResource = mock( WebResource.class );
    WebResource.Builder builder = mock( WebResource.Builder.class );
    ClientResponse clientResponse = mock( ClientResponse.class );

    when( builder.get( ClientResponse.class ) ).thenReturn( clientResponse );
    when( clientResponse.getEntity( String.class ) ).thenReturn( "true" );
    when( webResource.accept( anyString() ) ).thenReturn( builder );
    when( client.resource( anyString() ) ).thenReturn( webResource );

    Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave();

    assertTrue(isSlave);
}

which is pretty gnarly but does the job. I thought there must be a better way so I continued searching and eventually came across this post on the mailing list which suggested creating a custom ClientHandler and stubbing out requests/responses there.

I had a go at doing that and wrapped it with a little DSL that only covers our very specific use case:

private static ClientBuilder client() {
    return new ClientBuilder();
}

static class ClientBuilder {
    private String uri;
    private int statusCode;
    private String content;

    public ClientBuilder requestFor(String uri) {
        this.uri = uri;
        return this;
    }

    public ClientBuilder returns(int statusCode) {
        this.statusCode = statusCode;
        return this;
    }

    public ClientBuilder content(String content) {
        this.content = content;
        return this;
    }

    public Client create() {
        return new Client() {
            // Answer the one stubbed URI and fail loudly for anything else.
            public ClientResponse handle(ClientRequest request) throws ClientHandlerException {
                if (request.getURI().toString().equals(uri)) {
                    InBoundHeaders headers = new InBoundHeaders();
                    headers.put("Content-Type", asList("text/plain"));
                    return createDummyResponse(headers);
                }
                throw new RuntimeException("No stub defined for " + request.getURI());
            }
        };
    }

    private ClientResponse createDummyResponse(InBoundHeaders headers) {
        return new ClientResponse(statusCode, headers, new ByteArrayInputStream(content.getBytes()), messageBodyWorkers());
    }

    // Only getMessageBodyReader is needed to read a String entity; the rest return null.
    private MessageBodyWorkers messageBodyWorkers() {
        return new MessageBodyWorkers() {
            public Map<MediaType, List<MessageBodyReader>> getReaders(MediaType mediaType) { return null; }
            public Map<MediaType, List<MessageBodyWriter>> getWriters(MediaType mediaType) { return null; }
            public String readersToString(Map<MediaType, List<MessageBodyReader>> mediaTypeListMap) { return null; }
            public String writersToString(Map<MediaType, List<MessageBodyWriter>> mediaTypeListMap) { return null; }
            public <T> MessageBodyReader<T> getMessageBodyReader(Class<T> tClass, Type type, Annotation[] annotations, MediaType mediaType) {
                return (MessageBodyReader<T>) new StringProvider();
            }
            public <T> MessageBodyWriter<T> getMessageBodyWriter(Class<T> tClass, Type type, Annotation[] annotations, MediaType mediaType) { return null; }
            public <T> List<MediaType> getMessageBodyWriterMediaTypes(Class<T> tClass, Type type, Annotation[] annotations) { return null; }
            public <T> MediaType getMessageBodyWriterMediaType(Class<T> tClass, Type type, Annotation[] annotations, List<MediaType> mediaTypes) { return null; }
        };
    }
}

If we change our test to use this code it now looks like this:

@Test
public void shouldIndicateInstanceIsSlave() {
    Client client = client().requestFor("http://localhost:7474/db/manage/server/ha/slave").
                             returns(200).
                             content("true").
                             create();

    Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave();

    assertTrue(isSlave);
}

Is there a better way? In Ruby I’ve used WebMock to achieve this, and Ashok pointed me towards WebStub, which looks nice except that I’d need to pass in the hostname and port rather than constructing them in the code.
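One other option, offered here as a sketch and not as something from the post, is to skip stubbing the client altogether and point the code at a throwaway HTTP server on a free local port. The example below uses only the JDK (the built-in com.sun.net.httpserver.HttpServer and HttpURLConnection) instead of jersey-client, and its isSlave method is a hypothetical stand-in for Neo4jInstance that takes a base URI already containing the port, which is the same change WebStub would require.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class StubServerSketch {

    // Hypothetical stand-in for Neo4jInstance.isSlave(): the base URI must include the port.
    static boolean isSlave(URI baseUri) throws IOException {
        URI slaveUri = URI.create(baseUri + "/db/manage/server/ha/slave");
        HttpURLConnection connection = (HttpURLConnection) slaveUri.toURL().openConnection();
        connection.setRequestProperty("Accept", "text/plain");
        try (InputStream body = connection.getInputStream()) {
            String text = new String(body.readAllBytes(), StandardCharsets.UTF_8);
            return Boolean.parseBoolean(text.trim());
        } finally {
            connection.disconnect();
        }
    }

    // Starts a stub on a free local port that answers the HA slave path with the given body.
    static HttpServer startStub(String responseBody) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/db/manage/server/ha/slave", exchange -> {
            byte[] bytes = responseBody.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain");
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    // Runs isSlave against a stub that returns responseBody, then shuts the stub down.
    static boolean checkAgainstStub(String responseBody) {
        try {
            HttpServer server = startStub(responseBody);
            try {
                URI baseUri = URI.create("http://127.0.0.1:" + server.getAddress().getPort());
                return isSlave(baseUri);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stub says true  -> isSlave = " + checkAgainstStub("true"));
        System.out.println("stub says false -> isSlave = " + checkAgainstStub("false"));
    }
}
```

The trade-off is that the test exercises a real HTTP round trip, so it is slower than the in-memory stub but does not depend on jersey-client internals such as MessageBodyWorkers.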
August 1, 2013
by Mark Needham
· 10,799 Views
AWS: Attaching an EBS volume on an EC2 instance and making it available for use
I recently wanted to attach an EBS volume to an existing EC2 instance that I had running, and since it was for a one-off task (famous last words) I decided to configure it manually.

I created the EBS volume through the AWS console, and one thing that initially caught me out is that the EC2 instance and EBS volume need to be in the same region and availability zone. Therefore if I create my EC2 instance in ‘eu-west-1b’ then I need to create my EBS volume in ‘eu-west-1b’ as well, otherwise I won’t be able to attach it to that instance.

I attached the device as /dev/sdf although the UI gives the following warning:

Linux Devices: /dev/sdf through /dev/sdp
Note: Newer linux kernels may rename your devices to /dev/xvdf through /dev/xvdp internally, even when the device name entered here (and shown in the details) is /dev/sdf through /dev/sdp.

After attaching the EBS volume to the EC2 instance my next step was to SSH onto my EC2 instance and make the EBS volume available. The first step is to create a file system on the volume:

$ sudo mkfs -t ext3 /dev/sdf
mke2fs 1.42 (29-Nov-2011)
Could not stat /dev/sdf --- No such file or directory

The device apparently does not exist; did you specify it correctly?

It turns out that warning was handy and the device has in fact been renamed. We can confirm this by calling fdisk:

$ sudo fdisk -l

Disk /dev/xvda1: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/xvda1 doesn't contain a valid partition table

Disk /dev/xvdf: 53.7 GB, 53687091200 bytes
255 heads, 63 sectors/track, 6527 cylinders, total 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/xvdf doesn't contain a valid partition table

/dev/xvdf is the one we’re interested in, so I re-ran the previous command against it:

$ sudo mkfs -t ext3 /dev/xvdf
mke2fs 1.42 (29-Nov-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
3276800 inodes, 13107200 blocks
655360 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
400 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Once I’d done that I needed to create a mount point for the volume, and I thought the best place was probably a directory under /mnt:

$ sudo mkdir /mnt/ebs

The final step is to mount the volume:

$ sudo mount /dev/xvdf /mnt/ebs

And if we run df we can see that it’s ready to go:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.9G  883M  6.7G  12% /
udev            288M  8.0K  288M   1% /dev
tmpfs           119M  164K  118M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            296M     0  296M   0% /run/shm
/dev/xvdf        50G  180M   47G   1% /mnt/ebs
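The renaming that tripped up the first mkfs call can be checked before running any commands. The following is a minimal sketch, assuming only the /dev/sd to /dev/xvd mapping described in the console warning; the class and method names are hypothetical and it is not part of any AWS tooling.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class EbsDeviceName {

    // Maps /dev/sdX to /dev/xvdX; any other name is returned unchanged.
    static String xenName(String attachedName) {
        String prefix = "/dev/sd";
        if (attachedName.startsWith(prefix)) {
            return "/dev/xvd" + attachedName.substring(prefix.length());
        }
        return attachedName;
    }

    // Returns whichever of the attached name or its xvd form exists on this host, if either does.
    static Optional<Path> resolve(String attachedName) {
        Path asAttached = Paths.get(attachedName);
        if (Files.exists(asAttached)) {
            return Optional.of(asAttached);
        }
        Path renamed = Paths.get(xenName(attachedName));
        if (Files.exists(renamed)) {
            return Optional.of(renamed);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String attached = args.length > 0 ? args[0] : "/dev/sdf";
        System.out.println(attached + " may appear as " + xenName(attached));
        System.out.println("found on this host: "
                + resolve(attached).map(Path::toString).orElse("neither name"));
    }
}
```

Running it on the instance with the name entered in the console (for example /dev/sdf) shows which device node to pass to mkfs and mount.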
July 31, 2013
by Mark Needham
· 11,954 Views