DZone
The Latest Databases Topics

Reasons to Move from DataTables to Generic Collections
These days, hardly anyone in the community writes or speaks about using DataTables and DataSets for data operations. Still, many real projects are built on them, and many developers remain happy to use them. It is not always easy to replace DataTables with typed generic lists, particularly in large projects. But now is the right time to move, as future developers may not even learn about DataTables :). Generic collections have a number of advantages over DataTables, and once you know how beneficial they are, it is hard to imagine a day without them. Here are the reasons to move from DataTables to collections that I can think of:

  • A DataTable stores values as boxed objects, which have to be unboxed whenever they are read. This adds runtime overhead. Values in generic collections are strongly typed, so no boxing is involved.
  • Unboxing happens at runtime, as does the type checking. A mismatch between the source and target types leads to a runtime exception, which can cause a number of issues when using DataTables. With collections, types are checked at compile time, so such mismatches are caught during compilation.
  • .NET languages have very nice support for creating collections, such as object initializers and collection initializers. There are no such features for DataTables.
  • LINQ queries can be used on both DataTables and collections, but writing queries against generic collections is a better experience because of the IntelliSense support in Visual Studio.
  • DataTables are framework specific; we often see issues with serializing and deserializing them in web services. Generic collections are easier to serialize and deserialize, so they can be used in any service and consumed from a client written in any language.
  • ORMs are becoming increasingly popular, and they use generic collections for all data operations.
  • Mocking DataTables in unit tests is a pain, as it involves creating the structure of the table wherever it is needed. A generic collection needs a class defined just once.

These are my opinions on preferring collections over DataTables. Any feedback is welcome. Happy coding!
October 21, 2013
by Rabi Kiran Srirangam
· 30,125 Views · 3 Likes
Database vs. Data Science
One thing that Big Data certainly made happen is that it brought the database/infrastructure community and the data analysis/statistics/machine learning communities closer together. As always, each community had its own set of models, methods, and ideas about how to structure and interpret the world. You can still see these differences when looking at current Big Data projects, and I think it's important to be aware of the distinctions in order to better understand the relationships between different projects.

Because, let's face it, every project claims to re-invent Big Data. With Hadoop and MapReduce being something like the founding fathers of Big Data, other projects have since appeared. Most notably, there are stream processing projects like Twitter's Storm, which move from batch-oriented processing to event-based processing that is better suited to real-time, low-latency work. Spark is something different again: a bit like Hadoop, but with greater emphasis on iterative algorithms and in-memory processing, to achieve that landmark "100x faster than Hadoop" every current project seems to need to sport. Twitter's Summingbird project tries to bridge the gap between MapReduce and stream processing by providing a high-level set of operators which can then run on either MapReduce or Storm.

However, both Spark and Summingbird leave me sort of flat, because you can see that they come from a database background, which means that there will still be a considerable gap to serious machine learning.

So, what exactly is the difference? In the end, it's the difference between relational algebra and linear algebra. In the database world, you model relationships between objects, which you encode in tables, with foreign keys to link up entries between different tables.
Probably the most important insight of the database world was to develop a query language: a declarative description of what you want to extract from your database, leaving the optimization of the query and the exact details of how to perform it efficiently to the database guys.

The machine learning community, on the other hand, has its roots in linear algebra and probability theory. Objects are usually encoded as a feature vector, that is, a list of numbers describing different properties of an object. Data is often collected in matrices where each row corresponds to an object and each column to a feature, not unlike a table in a database.

However, the operations you perform in order to do data analysis are quite different from the database world. Take something as basic as linear regression: you try to learn a linear function $f(x) = \sum_{i=1}^{d} w_i x_i$ in a $d$-dimensional space (that is, where your objects are described by a $d$-dimensional vector) given $n$ examples $X_i$ and $Y_i$, where the $X_i$ are the features describing your objects and $Y_i$ is the real number you attach to $X_i$. One way to "learn" $w$ is to tune it such that the quadratic error on the training examples is minimal. The solution can be written in closed form as $w = (XX^T)^{-1} X Y$, where $X$ is the matrix built from the $X_i$ (putting the $X_i$ in the columns of $X$), and $Y$ is the vector of outputs $Y_i$.

In order to solve this, you need to solve the linear equation $(XX^T) w = XY$, which can be done by one of a large number of algorithms, starting with Gaussian elimination, which you've probably learned in your undergrad studies, or the conjugate gradient algorithm, or by first computing a Cholesky decomposition. All of these algorithms have in common that they are iterative. They go through a number of operations, for example $O(d^3)$ for the Gaussian elimination case. They also need to store intermediate results.
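To make the linear regression step concrete, here is a minimal pure-Python sketch that builds $XX^T$ and $XY$ from a list of examples and solves $(XX^T) w = XY$ by Gaussian elimination with partial pivoting. The function names (`fit_linear_regression`, `solve_gaussian`) and the toy data are illustrative assumptions, not part of the original post, and a real project would normally use a numerical library instead.

```python
def solve_gaussian(a, b):
    """Solve a * w = b for a square matrix a (list of rows) by Gaussian elimination."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - known) / m[r][r]
    return w


def fit_linear_regression(xs, ys):
    """xs: list of n feature vectors (each of length d); ys: list of n targets."""
    d = len(xs[0])
    # (X X^T)[i][j] = sum over examples of x[i] * x[j]
    gram = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    # (X Y)[i] = sum over examples of x[i] * y
    xy = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve_gaussian(gram, xy)


# Toy data generated from y = 2*x0 + 3*x1 (an assumption for illustration).
examples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [2.0, 3.0, 5.0, 7.0]
weights = fit_linear_regression(examples, targets)
print(weights)
```

Note how the elimination loops touch individual matrix entries and keep intermediate state in the augmented matrix, which is exactly the kind of access pattern that is awkward to express as joins and aggregates.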
Gaussian elimination and Cholesky decomposition have rather elementary operations acting on individual entries, while the conjugate gradient algorithm performs a matrix-vector multiplication in each iteration.

Most importantly, these algorithms can only be expressed very badly in SQL! It's certainly not impossible, but you'd need to store your data in much different ways than you would in idiomatic database usage.

So, it's not about whether or not your framework can support iterative algorithms without significant latency; it's about understanding that joins, group bys, and count() won't get you far, and that you need scalar products, matrix-vector and matrix-matrix multiplications instead. You don't need indices for most ML algorithms, except maybe for quickly finding the k nearest neighbors, because most algorithms tend to either take in the whole data set in each iteration or stream the whole set past some model which is iteratively updated, as in stochastic gradient descent. I'm not sure projects like Spark or Stratosphere have fully grasped the significance of this yet.

Database infrastructure-inspired Big Data has its place when it comes to extracting and preprocessing data, but eventually, you move from database land to machine learning land, which invariably means linear algebra land (or probability theory land, which often also reduces to linear algebra-like computations).

What often happens today is that you either painstakingly have to break down your linear algebra into MapReduce jobs, or you actively look for algorithms which fit the database view better.

I think we're still at the beginning of what is possible. Or, to be a bit more aggressive, claims that existing (infrastructure, database, parallelism inspired) frameworks provide you with sophisticated data analytics are wildly exaggerated.
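As a rough illustration of the point about conjugate gradient, the sketch below solves a small symmetric positive definite system using nothing but scalar products and one matrix-vector multiplication per iteration. It is the textbook form of the algorithm written in plain Python; the helper names and the 2x2 example system are assumptions made for this illustration.

```python
def dot(u, v):
    """Scalar product of two vectors."""
    return sum(x * y for x, y in zip(u, v))


def mat_vec(a, v):
    """Matrix-vector product; a is a list of rows."""
    return [dot(row, v) for row in a]


def conjugate_gradient(a, b, tol=1e-10, max_iter=1000):
    """Solve a * x = b for a symmetric positive definite matrix a."""
    x = [0.0] * len(b)
    r = b[:]          # residual b - a*x, which equals b for x = 0
    p = r[:]          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = mat_vec(a, p)                 # the one matrix-vector product per iteration
        alpha = rs_old / dot(p, ap)        # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction: the residual plus a multiple of the previous direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small example system (hypothetical): [[4, 1], [1, 3]] * x = [1, 2]
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

Every step here is a vector or matrix-vector operation over the whole data, with a small amount of state (`x`, `r`, `p`) carried between iterations, so an index on individual rows would not help.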
They take care of a very important problem by giving you a reliable infrastructure to scale your data analysis code, but there's still a lot of work that needs to be done on your side. High-level DSLs like Apache Hive or Pig are a first step in this direction but are still too rooted in the database world, IMHO.

In summary, one should be aware of the difference between a framework which is mostly concerned with scaling and a tool which actually provides some piece of data analysis. And even if it comes with basic database-like analytics mechanisms, there is still a long way to go to do some serious data science.

That's why we also think that streamdrill occupies an interesting spot: it is a bit of infrastructure, allowing you to process a serious amount of event data, but it also provides valuable analysis based on algorithms you wouldn't want to implement yourself, even if you had some Big Data framework like Hadoop at hand. That's a direction I would like to see more of in the future.

Note: Just saw that Spark has a logistic regression example on their landing page. Well, doing matrix operations explicitly via map() on collections doesn't count in my view ;)
October 18, 2013
by Mikio Braun
· 11,383 Views · 1 Like
Generating SQL Railroad Diagrams
Simple Talk - How to get SQL Railroad Diagrams from MSDN BNF syntax notation.

In SQL Server Books Online, in the Transact-SQL Reference (Database Engine), every SQL statement has its syntax represented in Backus-Naur Form (BNF) notation. For a programmer in a hurry, this should be ideal because it is the only quick way to understand and appreciate all the permutations of the syntax. It is a great feature once you get your eye in. It isn't the only way to get the information; you can, of course, reverse-engineer an understanding of the syntax from the examples, but your understanding won't be complete, and you'll have wasted time doing it. BNF is a good start in representing the syntax: Oracle and SQLite go one step further and have proper railroad diagrams for their syntax, which is a far more accessible way of doing it.

There are three problems with the BNF on MSDN. Firstly, it isn't a standard version of BNF, but an ancient fork from EBNF, inherited from Sybase. Secondly, it is excruciatingly difficult to understand, and thirdly, it has a number of syntactic and semantic errors. The page describing DML triggers, for example, currently has an absurd BNF error that makes it state that all statements in the body of the trigger must be separated by commas. There are a few other detail problems too. Here is the offending syntax for a DML trigger, pasted from MSDN. ...

I've been trying to create railroad diagrams for all the important SQL Server SQL statements, as good as you'd find for Oracle, and have so far published the CREATE TABLE and ALTER TABLE railroad diagrams based on the BNF. Although I'd been aware of the errors, I never realised until recently how many there are. Then, Colin Daley created a translator for the SQL Server dialect of BNF which outputs the standard EBNF notation used by the W3C. The example MSDN BNF for the trigger would be rendered as ...
Colin's intention was to allow anyone to paste SQL Server's BNF notation into his website-based parser, and from this generate classic railroad diagrams via Gunther Rademacher's Railroad Diagram Generator. Colin's application does this for you: you're not aware that you are moving to a different site. Because Colin's 'translator' is a parser, it will pick up syntax errors. Once you've fixed the syntax errors, you will get the syntax in the form of a human-readable railroad diagram and, in this form, the semantic mistakes become glaringly obvious. Gunther's Railroad Diagram Generator is brilliant. To be able, after correcting the MSDN dialect of BNF, to generate a standard EBNF, and from there to create railroad diagrams for SQL Server's syntax that are as good as Oracle's, is a great boon, and many thanks to Colin for the idea. Here is the result of the W3C EBNF from Colin's application then being run through the Railroad Diagram Generator. Now that's much better, you'll agree. This is pretty easy to understand, and at this point any error is immediately obvious.

This should be seriously useful, and it is to me. However, there is that snag. The BNF is generally incorrect, and you can't expect the average visitor to mess about with it. The answer is, of course, to correct the BNF on MSDN and maybe even add railroad diagrams for the syntax. Stop giggling! I agree it won't happen. In the meantime, we need to collaboratively store and publish these corrected syntaxes ourselves as we do them. How? GitHub? SQL Server Central? Simple-Talk? What should those of us who use the system do with our corrected EBNF so that anyone can use them without hassle?

Grammar Translator
If you are familiar with the Grammar Translator, go ahead and create railroad diagrams from the Transact-SQL Reference. Otherwise, please see the FAQ. In particular, be sure to try the tutorial.

Welcome to Railroad Diagram Generator!
This is a tool for creating syntax diagrams, also known as railroad diagrams, from context-free grammars specified in EBNF. Syntax diagrams have been used for decades now, so the concept is well-known, and some tools for diagram generation are in existence. The features of this one are usage of the W3C's EBNF notation, web-scraping of grammars from W3C specifications, online editing of grammars, diagram presentation in SVG, and it was completely written in web languages (XQuery, XHTML, CSS, JavaScript).

There's nothing like a diagram to help grok something (and the MSDN BNF SQL stuff really makes my brain hurt...)
October 18, 2013
by Greg Duncan
· 9,149 Views
Extracting File Metadata with C# and the .NET Framework
Windows Explorer (the shell) provides extended file property information which can be quite valuable. The challenge is how to extract this information, given that the .NET Framework has somewhat limited support for this type of extraction.
October 14, 2013
by Rob Sanders
· 64,190 Views
SSL Performance Overhead in MySQL
This post comes from Ernie Souhrada at the MySQL Performance Blog.

Note: This is part 1 of what will be a two-part series on the performance implications of using in-flight data encryption.

Some of you may recall my security webinar from back in mid-August; one of the follow-up questions that I was asked was about the performance impact of enabling SSL connections. My answer was 25%, based on some 2011 data that I had seen over on yaSSL's website, but I included the caveat that it is workload-dependent, because the most expensive part of using SSL is establishing the connection. Not long thereafter, I received a request to conduct some more specific benchmarks surrounding SSL usage in MySQL, and today I'm going to show the results.

First, the testing environment. All tests were performed on an Intel Core i7-2600K 3.4GHz CPU (8 cores, HT included) with 32GB of RAM and CentOS 6.4. The disk subsystem is a 2-disk RAID-0 of Samsung 830 SSDs, although since we're only concerned with measuring the overhead added by using SSL connections, we'll only be conducting read-only tests with a dataset that fits completely in the buffer pool. The version of MySQL used for this experiment is Community Edition 5.6.13, and the testing tools are sysbench 0.5 and Perl.

We conduct two tests, each one designed to simulate one of the most common MySQL usage patterns. The first is connection pooling, often seen in the Java world, where some small set of connections is established by, for example, the servlet container and then just passed around to the application as needed. The second is one-request-per-connection, typical in the LAMP world, where the script that displays a given page might connect to the database, run a couple of queries, and then disconnect.

Test 1: Connection pool

For the first test, I ran sysbench in read-only mode at concurrency levels of 1, 2, 4, 8, 16, and 32 threads, first with no encryption and then with SSL enabled and key lengths of 1024, 2048, and 4096 bits.
Eight sysbench tables were prepared, each containing 100,000 rows, resulting in a total data size of approximately 256MB. The size of my InnoDB buffer pool was 4GB, and before conducting each official measurement run, I ran a warm-up run to prime the buffer pool. Each official test run lasted 10 minutes; this might seem short, but unlike, say, a PCIe flash storage device, I would not expect the variable under observation to really change that much over time or need time to stabilize. The basic sysbench syntax used is shown below.

    #!/bin/bash
    for ssl in on off ; do
      for threads in 1 2 4 8 16 32 ; do
        sysbench --test=/usr/share/sysbench/oltp.lua --mysql-user=msandbox$ssl --mysql-password=msandbox \
          --mysql-host=127.0.0.1 --mysql-port=5613 --mysql-db=sbtest --mysql-ssl=$ssl \
          --oltp-tables-count=8 --num-threads=$threads --oltp-dist-type=uniform --oltp-read-only=on \
          --report-interval=10 --max-time=600 --max-requests=0 run > sb-ssl_${ssl}-threads-${threads}.out
      done
    done

If you're not familiar with sysbench, the important thing to know about it for our purposes is that it does not connect and disconnect after each query or after each transaction. It establishes N connections to the database (where N is the number of threads) and runs queries through them until the test is over. This behavior provides our connection-pool simulation. The assumption, given what we know about where SSL is the slowest, is that the performance penalty here should be the lowest.
First, let's look at raw throughput, measured in queries per second. The average throughput and standard deviation (both in queries per second) for each test configuration are shown below in tabular format, by number of threads:

    SSL key size  1 thread            2 threads           4 threads           8 threads           16 threads          32 threads
    SSL off       9250.18 (1005.82)   18297.61 (689.22)   33910.31 (446.02)   50077.60 (1525.37)  49844.49 (934.86)   49651.09 (498.68)
    1024-bit      2406.53 (288.53)    4650.56 (558.58)    9183.33 (1565.41)   26007.11 (345.79)   25959.61 (343.55)   25913.69 (192.90)
    2048-bit      2448.43 (290.02)    4641.61 (510.91)    8951.67 (1043.99)   26143.25 (360.84)   25872.10 (324.48)   25764.48 (370.33)
    4096-bit      2427.95 (289.00)    4641.32 (547.57)    8991.37 (1005.89)   26058.09 (432.86)   25990.13 (439.53)   26041.27 (780.71)

So, given that this is an 8-core machine and IO isn't a factor, we would expect throughput to max out at 8 threads, so the levelling-off of performance is expected. What we also see is that it doesn't seem to make much difference what key length is used, which is also largely expected. However, I definitely didn't think the encryption overhead would be so high.

Next is 95th-percentile latency from the same test. In tabular format, the raw numbers (average and standard deviation), by number of threads:

    SSL key size  1 thread        2 threads       4 threads       8 threads       16 threads      32 threads
    SSL off       1.882 (0.522)   1.728 (0.167)   1.764 (0.145)   2.459 (0.523)   6.616 (0.251)   27.307 (0.817)
    1024-bit      6.151 (0.241)   6.442 (0.180)   6.677 (0.289)   4.535 (0.507)   11.481 (1.403)  37.152 (0.393)
    2048-bit      6.083 (0.277)   6.510 (0.081)   6.693 (0.043)   4.498 (0.503)   11.222 (1.502)  37.387 (0.393)
    4096-bit      6.120 (0.268)   6.454 (0.119)   6.690 (0.043)   4.571 (0.727)   11.194 (1.395)  37.26 (0.307)

With the exception of 8 and 32 threads, the latency introduced by the use of SSL is roughly constant at 4 to 5 ms, regardless of the key length or the number of threads. I'm not surprised that there's a large jump in latency at 32 threads, but I don't have an immediate explanation for the improvement in the SSL latency numbers at 8 threads.
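To put a number on "so high", the following Python sketch computes the relative throughput drop for each key size directly from the averages reported above. The figures are copied from the throughput table; the function and variable names are just illustrative choices.

```python
THREADS = [1, 2, 4, 8, 16, 32]

# Average queries per second from the throughput table above.
THROUGHPUT_QPS = {
    "SSL off":  [9250.18, 18297.61, 33910.31, 50077.60, 49844.49, 49651.09],
    "1024-bit": [2406.53, 4650.56, 9183.33, 26007.11, 25959.61, 25913.69],
    "2048-bit": [2448.43, 4641.61, 8951.67, 26143.25, 25872.10, 25764.48],
    "4096-bit": [2427.95, 4641.32, 8991.37, 26058.09, 25990.13, 26041.27],
}


def throughput_drop_percent(baseline, encrypted):
    """Percentage of baseline throughput lost, per concurrency level."""
    return [100.0 * (1.0 - enc / base) for base, enc in zip(baseline, encrypted)]


baseline = THROUGHPUT_QPS["SSL off"]
for key_size, qps in THROUGHPUT_QPS.items():
    if key_size == "SSL off":
        continue
    drops = throughput_drop_percent(baseline, qps)
    summary = ", ".join(f"{t} thr: {d:.1f}%" for t, d in zip(THREADS, drops))
    print(f"{key_size}: {summary}")
```

On these numbers the drop is roughly three quarters of the unencrypted throughput at 1 to 4 threads and close to half at 8 threads and above, for all three key sizes.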
Test 2: Connection time

For the second test, I wrote a simple Perl script to just connect and disconnect from the database as fast as possible. We know that it's the connection setup which is the slowest part of SSL, and the previous test already shows us roughly what we can expect for SSL encryption overhead when sending data once the connection has been established, so let's see just how much overhead SSL adds to connection time. The basic script to do this is quite simple (non-SSL version shown):

    #!/usr/bin/perl
    use DBI;
    use Time::HiRes qw(time);
    $start = time;
    for (my $i=0; $i<100; $i++) {
      my $dbh = DBI->connect("dbi:mysql:host=127.0.0.1;port=5613", "msandbox", "msandbox", undef);
      $dbh->disconnect;
      undef $dbh;
    }
    printf "%.6f\n", time - $start;

As with test #1, I ran test #2 with no encryption and with SSL key lengths of 1024, 2048, and 4096 bits, and I conducted 10 trials of each configuration. Then I took the elapsed time for each test and converted it to connections per second. Here are the averages and standard deviations:

    Encryption  Average connections per second  Standard deviation
    None        2701.75                         165.54
    1024-bit    77.04                           6.14
    2048-bit    28.183                          1.713
    4096-bit    5.45                            0.015

Yes, that's right: 4096-bit SSL connections are nearly 500 times slower to establish than unencrypted connections, approaching three orders of magnitude. Really, the connection overhead for any level of SSL usage is quite high when compared to the unencrypted test, and it's certainly much higher than my original quoted number of 25%.

Analysis and parting thoughts

So, what do we take away from this? The first thing, of course, is that SSL overhead is a lot higher than 25%, particularly if your application uses anything close to the one-connection-per-request pattern.
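The size of the connection penalty can be checked with a few lines of Python. This sketch takes the average connection rates from the table above and derives the slowdown factor, its base-10 logarithm (the "orders of magnitude"), and the average time per connection; the names used are illustrative.

```python
import math

# Average connections per second from the table above.
CONNECTIONS_PER_SECOND = {
    "none": 2701.75,
    "1024-bit": 77.04,
    "2048-bit": 28.183,
    "4096-bit": 5.45,
}


def slowdown_factor(baseline_rate, rate):
    """How many times slower a configuration is than the baseline."""
    return baseline_rate / rate


baseline = CONNECTIONS_PER_SECOND["none"]
for name, rate in CONNECTIONS_PER_SECOND.items():
    factor = slowdown_factor(baseline, rate)
    ms_per_connection = 1000.0 / rate
    print(f"{name:>8}: {factor:7.1f}x slower, "
          f"10^{math.log10(factor):.2f}, "
          f"{ms_per_connection:.2f} ms per connection")
```

With these figures, a 1024-bit connection is about 35 times slower to establish than an unencrypted one, a 2048-bit connection about 96 times, and a 4096-bit connection just under 500 times, which works out to well over 100 ms per connection in the 4096-bit case.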
For a system which establishes and maintains long-running connections, the initial connection overhead becomes a non-factor, regardless of the encryption strength, but there's still a rather large performance penalty compared to the unencrypted connection.

This leads directly into the second point, which is that connection pooling is by far a more efficient method of using SSL if your application can support it.

But what if connection pooling isn't an option, MySQL's SSL performance is insufficient, and you still need full encryption of data in-flight? Run the encryption component of your system at a lower layer: a VPN with hardware crypto would be the fastest approach, but even something as simple as an SSH tunnel or OpenVPN *might* be faster than SSL within MySQL. I'll be exploring some of these solutions in a follow-up post.

And finally... when in doubt, run your own benchmarks. I don't have an explanation for why the yaSSL numbers are so different from these (maybe yaSSL is a faster SSL library than OpenSSL, or maybe they used a different cipher; if you're curious, the original 25% number came from slides 56-58 of this presentation), but in any event, this does illustrate why it's important to run tests on your own hardware and with your own workload when you're interested in finding out how well something will perform, rather than taking someone else's word for it.
October 11, 2013
by Peter Zaitsev
· 6,773 Views
Large Dataset Retrieval in Mule
Recently, a customer asked how to perform large dataset retrieval in Mule. The documentation page briefly explains how this may be achieved; however, as far as I can tell, there is no working example. This blog post aims to explain in detail how large dataset retrieval works in Mule by giving an example.

The customer wanted to transfer items from one database to another by performing a batch select and then a batch insert. The 'batch insert' part is pretty straightforward and is done automatically by Mule when the payload is of type List. The batch select, however, is handled differently. In order to retrieve all the records, we will use the Batch Manager to compute the ID ranges for the next batch of records to be retrieved. This is provided out of the box with Mule EE.

We start by defining the database which will be used throughout the example to retrieve and insert records. For simplicity's sake we are going to use the Derby in-memory database. NOTE: the records should be identified by a key which is unique and in sequential numeric order.

    CREATE TABLE table1(KEY1 INTEGER GENERATED BY DEFAULT AS IDENTITY(START WITH 1) NOT NULL PRIMARY KEY, KEY2 VARCHAR(255));
    CREATE TABLE table2(KEY1 VARCHAR(255), KEY2 VARCHAR(255));
    INSERT INTO table1(KEY2) VALUES ('TEST1');
    INSERT INTO table1(KEY2) VALUES ('TEST2');
    INSERT INTO table1(KEY2) VALUES ('TEST3');
    INSERT INTO table1(KEY2) VALUES ('TEST4');
    INSERT INTO table1(KEY2) VALUES ('TEST5');
    INSERT INTO table1(KEY2) VALUES ('TEST6');
    INSERT INTO table1(KEY2) VALUES ('TEST7');
    INSERT INTO table1(KEY2) VALUES ('TEST8');
    INSERT INTO table1(KEY2) VALUES ('TEST9');
    INSERT INTO table1(KEY2) VALUES ('TEST10');

As explained before, the select query is based on the ID ranges that are computed by the Batch Manager when nextBatch() is called. This returns a map with the lower and upper IDs to be selected. In our case, we store this map in a flow variable named 'boundaries'.
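The ID-range idea can be tried outside Mule. The Python sketch below is not Mule's Batch Manager API: `SequentialBatcher` is a hypothetical stand-in, SQLite (from the standard library) replaces Derby, and it assumes that both bounds are inclusive and that an empty batch means all records have been read, which only holds for gap-free sequential keys as noted above.

```python
import sqlite3


class SequentialBatcher:
    """Hypothetical stand-in for a batch manager; not Mule's actual API."""

    def __init__(self, batch_size, starting_point=0):
        self.batch_size = batch_size
        self.starting_point = starting_point  # last ID already processed

    def next_batch(self):
        """Return the inclusive ID range of the next batch as a map."""
        lower = self.starting_point + 1
        return {"lowerId": lower, "upperId": lower + self.batch_size - 1}

    def complete_batch(self, upper_id):
        """Record that everything up to upper_id has been processed."""
        self.starting_point = upper_id


def build_demo_db():
    """Create in-memory tables shaped like table1 and table2 from the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1(KEY1 INTEGER PRIMARY KEY AUTOINCREMENT, KEY2 VARCHAR(255))")
    conn.execute("CREATE TABLE table2(KEY1 VARCHAR(255), KEY2 VARCHAR(255))")
    conn.executemany("INSERT INTO table1(KEY2) VALUES (?)",
                     [(f"TEST{i}",) for i in range(1, 11)])
    return conn


def copy_in_batches(conn, batcher):
    """Copy table1 into table2 one ID range at a time; return rows copied."""
    copied = 0
    while True:
        boundaries = batcher.next_batch()
        rows = conn.execute(
            "SELECT KEY1, KEY2 FROM table1 WHERE KEY1 BETWEEN ? AND ?",
            (boundaries["lowerId"], boundaries["upperId"]),
        ).fetchall()
        if not rows:
            # Nothing left: reset so that a later run starts from the first record.
            batcher.complete_batch(0)
            return copied
        conn.executemany("INSERT INTO table2(KEY1, KEY2) VALUES (?, ?)",
                         [(str(key1), key2) for key1, key2 in rows])
        batcher.complete_batch(boundaries["upperId"])
        copied += len(rows)


demo_conn = build_demo_db()
print(copy_in_batches(demo_conn, SequentialBatcher(batch_size=3)))
```

Selecting by key range rather than by OFFSET keeps each query cheap on a primary key, at the cost of relying on the keys being sequential.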
After configuring the database and the JDBC connector, we need to configure the Batch Manager. This consists of specifying the idStore (a text file), which the Batch Manager uses to store the starting point for the next batch. We also need to configure the batch size and the starting point on the Batch Manager.

In the documentation, you will find a reference to the noArgsWrapper. Its job is to invoke the nextBatch() method on the Batch Manager. However, we find this confusing and misleading, so instead we use a simple MEL expression which calls nextBatch() directly.

Now we have to configure the main flow where we perform the batch select. Given that the records are retrieved in batches, the flow has to be called multiple times until all of the records are retrieved. To solve this, we created a composite source so that at the end of the flow, if we haven't retrieved all the records, we re-trigger the same flow using the VM queue.

Once the current batch is finished, we need to call completeBatch() to tell the Batch Manager that we are done with the current batch and ready to process the next. If this is not done, the Batch Manager will still consider the previous batch as 'processing'. Furthermore, we have to check whether we have retrieved all of the records so we can stop processing. We do this by checking the size of the payload that is returned from the JDBC outbound endpoint. If the payload size is 0 (no more records to be retrieved), we call the completeBatch() method with -1, telling the Batch Manager that the whole batch is complete. We must also set the starting point for the next batch to 0. This is required so that when the flow is triggered again from the HTTP inbound endpoint, it starts processing from the first record. If the batch is not complete, we call the completeBatch() method (from the BatchManager class) with the current upperId.
This sets the new starting point for the next batch to be processed. Finally, we end the flow with a VM outbound endpoint on 'batch', which triggers the main flow to process the next batch of records.

When no more records are returned:

    app.registry.seqBatchManager.completeBatch(-1);
    app.registry.seqBatchManager.setStartingPointForNextBatch(0);

When the batch is not yet complete:

    app.registry.seqBatchManager.completeBatch(flowVars.boundaries.upperId);

A complete Mule configuration of the main flow is shown below.
October 2, 2013
by Clare Cini
· 10,421 Views
Getting Started with NHibernate and ASP.NET MVC- CRUD Operations
In this post we are going to learn how to use NHibernate in an ASP.NET MVC application.

What is NHibernate: ORMs (Object Relational Mappers) are quite popular these days. An ORM maps database entities to class objects, so you do not have to write data-access code and SQL queries by hand. It generates the SQL for us and fetches the data on our behalf. NHibernate is an ORM which is a port of the popular Java ORM Hibernate. It provides a framework for mapping domain model classes to a traditional relational database. It frees us from writing repetitive ADO.NET code, as it acts as our database layer. Let's get started with NHibernate.

How to download: There are two ways to download this ORM: as a NuGet package or from the SourceForge site.
NuGet - http://www.nuget.org/packages/NHibernate/
SourceForge - http://sourceforge.net/projects/nhibernate/

Creating a table for CRUD: I am going to use SQL Server 2012 Express edition as the database. The table has four fields: Id, First Name, Last Name, and Designation.

Creating an ASP.NET MVC project for NHibernate: Let's create an ASP.NET MVC project for NHibernate via File -> New Project -> ASP.NET MVC 4 Web Application.

Installing the NuGet package for NHibernate: I installed the NuGet package from the Package Manager Console.

NHibernate configuration file: NHibernate needs a configuration file that sets the database connection and other details. Create a file named 'hibernate.cfg.xml' in the Models\Nhibernate folder of your application with the following property values:

    connection.provider: NHibernate.Connection.DriverConnectionProvider
    connection.driver_class: NHibernate.Driver.SqlClientDriver
    connection.connection_string: Server=(local);database=LocalDatabase;Integrated Security=SSPI;
    dialect: NHibernate.Dialect.MsSql2012Dialect

These are the different settings for NHibernate. You need to select the driver class, connection provider, and dialect that match your database.
If you are using another database such as Oracle or MySQL, you will have a different configuration. NHibernate can work with most popular relational databases.

Creating a model class for NHibernate: Now it's time to create the model class for our CRUD operations. The code is below. The property names are identical to the database table columns.

    namespace NhibernateMVC.Models
    {
        public class Employee
        {
            // Properties are virtual so that NHibernate can create proxies for the class.
            public virtual int Id { get; set; }
            public virtual string FirstName { get; set; }
            public virtual string LastName { get; set; }
            public virtual string Designation { get; set; }
        }
    }

Creating a mapping file between class and table: Now we need an XML mapping file between the class and the table, named "Employee.hbm.xml", in the Nhibernate folder.

Creating a class to open a session for NHibernate: I have created a class in the Models folder called NHibertnateSession, with a static function in it that opens an NHibernate session.

    using System.Web;
    using NHibernate;
    using NHibernate.Cfg;

    namespace NhibernateMVC.Models
    {
        public class NHibertnateSession
        {
            public static ISession OpenSession()
            {
                // Load the connection settings from hibernate.cfg.xml.
                var configuration = new Configuration();
                var configurationPath = HttpContext.Current.Server.MapPath(@"~\Models\Nhibernate\hibernate.cfg.xml");
                configuration.Configure(configurationPath);

                // Register the Employee class-to-table mapping.
                var employeeConfigurationFile = HttpContext.Current.Server.MapPath(@"~\Models\Nhibernate\Employee.hbm.xml");
                configuration.AddFile(employeeConfigurationFile);

                // Note: building a session factory is expensive; real applications
                // usually build it once and reuse it.
                ISessionFactory sessionFactory = configuration.BuildSessionFactory();
                return sessionFactory.OpenSession();
            }
        }
    }

Listing: Now that our open-session method is ready, it's time to write the controller code to fetch data from the database. The code is below.
    using System;
    using System.Web.Mvc;
    using NHibernate;
    using NHibernate.Linq;
    using System.Linq;
    using NhibernateMVC.Models;

    namespace NhibernateMVC.Controllers
    {
        public class EmployeeController : Controller
        {
            public ActionResult Index()
            {
                using (ISession session = NHibertnateSession.OpenSession())
                {
                    // Fetch all employees through the NHibernate LINQ provider.
                    var employees = session.Query<Employee>().ToList();
                    return View(employees);
                }
            }
        }
    }

Here I get a session via the OpenSession method and then query the database to fetch the employees. Let's create a view for this; you can do so by right-clicking inside the action method above and choosing to add a view. We are going to create a strongly typed view. Our listing screen is now ready; once you run the project, it fetches the data.

Create/Add: Now it's time to write the add-employee code, shown below. I have used the session.Save method to save the new employee. The first method returns a blank view, and the second method, with the HttpPost attribute, saves the data into the database.

    public ActionResult Create()
    {
        return View();
    }

    [HttpPost]
    public ActionResult Create(Employee emplolyee)
    {
        try
        {
            using (ISession session = NHibertnateSession.OpenSession())
            {
                using (ITransaction transaction = session.BeginTransaction())
                {
                    // Insert the new employee and commit the transaction.
                    session.Save(emplolyee);
                    transaction.Commit();
                }
            }
            return RedirectToAction("Index");
        }
        catch (Exception exception)
        {
            return View();
        }
    }

Now let's create a strongly typed Create view by right-clicking on the action and adding a view. Once you run the application and click on Create New, it loads the create screen.

Edit/Update: Now let's create the edit functionality with NHibernate and ASP.NET MVC. For that I have written two action methods, one for loading the edit view and another for saving the data. The code is below.
    public ActionResult Edit(int id)
    {
        using (ISession session = NHibertnateSession.OpenSession())
        {
            var employee = session.Get<Employee>(id);
            return View(employee);
        }
    }

    [HttpPost]
    public ActionResult Edit(int id, Employee employee)
    {
        try
        {
            using (ISession session = NHibertnateSession.OpenSession())
            {
                var employeetoUpdate = session.Get<Employee>(id);
                employeetoUpdate.Designation = employee.Designation;
                employeetoUpdate.FirstName = employee.FirstName;
                employeetoUpdate.LastName = employee.LastName;
                using (ITransaction transaction = session.BeginTransaction())
                {
                    session.Save(employeetoUpdate);
                    transaction.Commit();
                }
            }
            return RedirectToAction("Index");
        }
        catch
        {
            return View();
        }
    }

In the first action method I fetch the existing employee via the Get method of the NHibernate session. In the second, I fetch the current employee again and apply the updated details. You can create the view via right-click, then Add View. I have created a strongly typed view for edit.

Details: Now it’s time to create a details view where the user can see the employee details. I have written the following logic for it.

    public ActionResult Details(int id)
    {
        using (ISession session = NHibertnateSession.OpenSession())
        {
            var employee = session.Get<Employee>(id);
            return View(employee);
        }
    }

You can add the view by right-clicking on View in the action method.

Delete: Now it’s time to write the delete functionality. The code follows.
    public ActionResult Delete(int id)
    {
        using (ISession session = NHibertnateSession.OpenSession())
        {
            var employee = session.Get<Employee>(id);
            return View(employee);
        }
    }

    [HttpPost]
    public ActionResult Delete(int id, Employee employee)
    {
        try
        {
            using (ISession session = NHibertnateSession.OpenSession())
            {
                using (ITransaction transaction = session.BeginTransaction())
                {
                    session.Delete(employee);
                    transaction.Commit();
                }
            }
            return RedirectToAction("Index");
        }
        catch (Exception)
        {
            return View();
        }
    }

The first action method shows the delete confirmation view, and the second performs the actual delete with the session’s Delete method. That’s it. It’s very easy to implement CRUD operations with NHibernate. Stay tuned for more.
October 1, 2013
by Jalpesh Vadgama
· 47,277 Views
ElasticSearch: Java API
ElasticSearch provides a Java API that executes all operations asynchronously through a client object.
September 30, 2013
by Hüseyin Akdoğan
· 137,553 Views · 4 Likes
Parallel SQL in C#
I’ve been wanting to get back to playing with C# for a while, and I finally had the opportunity. I’ve also been wanting to play with the Task library in .NET and see if I could get it to do something interesting. The result is below.

The code, running in a .NET 4 project, runs two SQL SELECT statements against the AdventureWorks2012 database. There are three tasks: ParallelTask1, ParallelTask2, and a timing task. Each parallel task takes a connection string and a query as inputs, and reports back through a status-message callback. One important point is that a task has to be self-contained, which is why the connection is instantiated within the task. I also added a timing task (ParallelTiming) so I could emit a ping message.

The whole thing is controlled by the code in Main, which starts the three tasks with their parameters. It then waits for the tasks to complete and prints the messages they return.

Try it out; it’s good fun, and all you need is SQL Server, AdventureWorks, and something to build C# projects. You can download the code here. Have fun!
    /// Parallel_SQL demonstration code
    /// From Nick Haslam
    /// http://blog.nhaslam.com
    /// 16/9/2013
    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;

    namespace Parallel_SQL
    {
        class Program
        {
            /// <summary>
            /// First Parallel task
            /// </summary>
            /// <param name="sConnString">Connection string details</param>
            /// <param name="sQuery">Query to execute</param>
            /// <param name="StatusMessage">Status message to pass back</param>
            static Task<string> ParallelTask1(string sConnString, string sQuery, Action<string> StatusMessage)
            {
                return Task.Factory.StartNew(() =>
                {
                    // The connection is created inside the task so the task is self-contained
                    using (SqlConnection conn = new SqlConnection(sConnString))
                    {
                        conn.Open();
                        StatusMessage("Running Query");
                        using (SqlCommand sqlCommand = new SqlCommand(sQuery, conn))
                        using (SqlDataReader reader = sqlCommand.ExecuteReader())
                        {
                            while (reader.Read())
                            {
                                StatusMessage(reader[0].ToString());
                            }
                        }
                    }
                    return "Task 1 Complete";
                });
            }

            /// <summary>
            /// Second Parallel task
            /// </summary>
            /// <param name="sConnString">Connection string details</param>
            /// <param name="sQuery">Query to execute</param>
            /// <param name="StatusMessage">Status message to pass back</param>
            static Task<string> ParallelTask2(string sConnString, string sQuery, Action<string> StatusMessage)
            {
                return Task.Factory.StartNew(() =>
                {
                    using (SqlConnection conn = new SqlConnection(sConnString))
                    {
                        conn.Open();
                        StatusMessage("Running Query");
                        using (SqlCommand sqlCommand = new SqlCommand(sQuery, conn))
                        using (SqlDataReader reader = sqlCommand.ExecuteReader())
                        {
                            while (reader.Read())
                            {
                                StatusMessage(reader[0].ToString());
                            }
                        }
                    }
                    return "Task 2 Complete";
                });
            }

            /// <summary>
            /// Timing Task
            /// </summary>
            /// <param name="iMSPause">Milliseconds between ping</param>
            /// <param name="StatusMessage">Status message to pass back</param>
            static Task<string> ParallelTiming(int iMSPause, Action<string> StatusMessage)
            {
                return Task.Factory.StartNew(() =>
                {
                    for (int i = 0; i < 10; i++)
                    {
                        System.Threading.Thread.Sleep(iMSPause);
                        StatusMessage("******************** PING ********************");
                    }
                    return "Timing task done";
                });
            }

            static void Main(string[] args)
            {
                string sConnString = "server=.; Trusted_Connection=yes; database=AdventureWorks2012;";
                try
                {
                    var Task1Control = ParallelTask1(sConnString,
                        "SELECT top 500 TransactionID FROM Production.TransactionHistory",
                        (update) => { Console.WriteLine(String.Format("{0} - {1}", DateTime.Now, update)); });
                    var Task2Control = ParallelTask2(sConnString,
                        "SELECT top 500 SalesOrderDetailID FROM sales.SalesOrderDetail",
                        (update) => { Console.WriteLine(String.Format("{0} - \t\t{1}", DateTime.Now, update)); });
                    var TimingTaskControl = ParallelTiming(250,
                        (update) => { Console.WriteLine(String.Format("{0} - \t\t\t{1}", DateTime.Now, update)); });

                    // Reading .Result blocks until each task has completed
                    Console.WriteLine("Task 1 Status - {0}", Task1Control.Result);
                    Console.WriteLine("Task 2 Status - {0}", Task2Control.Result);
                    Console.WriteLine("Timing Task Status - {0}", TimingTaskControl.Result);
                }
                catch (Exception e)
                {
                    Console.WriteLine(e.ToString());
                }
                Console.ReadKey();
            }
        }
    }
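For readers without SQL Server to hand, the same start-in-parallel, report-through-a-callback, then collect-results pattern can be sketched with simulated work. This is only an illustration under stated assumptions: `ParallelSketch`, `RunWorker` and `RunAll` are hypothetical names, `Thread.Sleep` stands in for the query, and `Task.Run` and `Task.WhenAll` require .NET 4.5 or later (on .NET 4, `Task.Factory.StartNew` and `Task.WaitAll` play the same roles).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // Hypothetical stand-in for a query task: reports each simulated "row"
    // through the callback, then returns a completion message.
    public static Task<string> RunWorker(string name, int rows, Action<string> statusMessage)
    {
        return Task.Run(() =>
        {
            statusMessage(name + ": running");
            for (int i = 0; i < rows; i++)
            {
                Thread.Sleep(10); // simulated I/O latency instead of a real reader.Read()
                statusMessage(name + ": row " + i);
            }
            return name + " complete";
        });
    }

    // Starts two workers at once and blocks until both have finished.
    public static string[] RunAll(Action<string> statusMessage)
    {
        Task<string>[] tasks =
        {
            RunWorker("Task 1", 3, statusMessage),
            RunWorker("Task 2", 3, statusMessage)
        };
        // Task.WhenAll returns the results in the order the tasks were passed in.
        return Task.WhenAll(tasks).GetAwaiter().GetResult();
    }

    public static void Main()
    {
        // The callback is invoked from several threads, so collect into a thread-safe queue.
        var log = new ConcurrentQueue<string>();
        string[] results = RunAll(log.Enqueue);
        foreach (string line in log) Console.WriteLine(line);
        foreach (string result in results) Console.WriteLine(result);
    }
}
```

The order of the interleaved status lines varies from run to run, which is the point of the exercise; only the order of the final results is fixed.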
September 29, 2013
by Nick Haslam
· 22,593 Views · 31 Likes
"Lazy" Database Synchronization Using RabbitMQ
The Problem

Obviously, there are tons of different ways to sync databases, so why describe another one? Let's imagine that we have an unusual situation with the restrictions below:

  • A future system will have a Head Office (HO) and a couple of Branch Offices (BOs).
  • All offices are located in different places, and some of them have difficulties with the internet connection. The internet might even be available for only 1-2 hours per day.
  • Almost all vital data is created in the HO and should be presented as read-only in BOs.
  • Data exchange should be limited with appropriate permissions (for example, if an operator has created some sensitive data in the HO for BO1, only BO1 should have access to it).
  • The HO should have access to all information that has been created or modified in BOs.

Given all of these points, the final decision was to write our own DB sync mechanism.

Basic Idea

Due to connection degradation between the HO and BOs, we have to sync everything within short-term sessions. Since there is generally no need to send information to all branches, we should be able to orchestrate the data flow. Those thoughts bring us to the idea of implementing some kind of RPC, where an event occurs in one office and is reproduced (replayed) in another. Message queues (MQ) are a perfect solution for syncing data between branches. RabbitMQ is my favorite MQ, so I will use it in this example. This application will also use the .NET stack, which has a convenient RabbitMQ client API called EasyNetQ.

High Level Application Architecture

To replay actions on other system instances, we should be able to divide them into single business-logic operations. The best way to achieve this is by using the Aggregate Roots approach. The main idea is to have separate objects that are divided by domain entities, where each call to a method of those objects is a single change to the state of the business logic.
For example, if we have some domain object Document and the ability to Get, Upsert, or Apply/Unapply, then we should describe its root as (pseudocode):

    public class DocumentRoot
    {
        public Document Get(Id) { ... }
        public Document Upsert(Document) { ... }
        public bool Apply(Id) { ... }
        public bool UnApply(Id) { ... }
    }

It's also very important to ensure that each call runs in a transaction in order to avoid data loss. This can be achieved using simple method interception (for example, Autofac + Castle.Proxy). In other words, every root call passes through an interceptor that wraps it in a transaction.

Keep entity primary keys in mind: data will be propagated between different system instances, and we need to be sure that IDs will be the same everywhere. Collisions are possible with simple auto-incrementing PKs, so our choice is GUID. With the help of a base repository, it's very simple to store a new GUID during object creation.

Let's assume that we have an ExchangeInformation object that holds all the data needed to restore a root call on a remote system. It contains the method name, type name, and input and output params; this data can be obtained from a root interceptor. It should also hold the list of new IDs. Those are not hard to get either, though we'll need to implement the UnitOfWork pattern on the ORM type to support transactions. This allows us to place our ExchangeInformation in that UoW object (for example, within Entity Framework it's DbContext).

Here is the implementation (using EF) of saving any changes in a domain within the base generic repository, where the base entity looks like:

    public class EntityBase
    {
        public long Id { get; set; }
        public Guid Guid { get; set; }
    }

    public virtual void Save(T entity)
    {
        DbEntityEntry entry = Context.Entry(entity);
        if (entity.Guid == Guid.Empty)
        {
            try
            {
                // On the source office a new GUID is generated and recorded;
                // on a remote office the recorded GUID is replayed instead.
                Guid newGuid = Context.ExchangeInformation.IsExchangeRestore
                    ? Context.ExchangeInformation.NewGuids[0]
                    : Guid.NewGuid();
                if (Context.ExchangeInformation.IsExchangeRestore)
                {
                    Context.ExchangeInformation.NewGuids.RemoveAt(0);
                }
                else
                {
                    Context.ExchangeInformation.NewGuids.Add(newGuid);
                }
                entity.Guid = newGuid;
            }
            catch
            {
                throw new Exception("Failed to restore exchange, no guid found");
            }
            entry.State = EntityState.Added;
            return;
        }
        Context.Entry(entity).State = EntityState.Modified;
    }

One more important note: to avoid code duplication, it's necessary to use GUIDs on clients, because if they operate on any other IDs we'll need to write two different implementations of every method.

Big Picture

With the preparation complete, we can proceed with the architecture design. Since every system instance should be able to send and receive new data, we can declare two RMQ topics: input and output. Also, because message flow must be orchestrated, queues for each system instance should be created within the output topic. The simplest strategy for a routing implementation is to use the branch office GUID as a key. So at the moment we know how to do the following:

  • Save the source event in one office.
  • Put this event into selected queues (how the selection is made depends on the situation: read from the entity, call some additional method, use attributes, etc.).

The next step is to make output events from one office appear in the input queue of the other office. RabbitMQ has two plugins for that: Federation and Shovel. They are quite similar, but Shovel works at a lower level and has more options to control the synchronization process, so we'll use Shovel to link queues. Shovel handles connection degradation very well and has a lot of additional configurable options, such as message republishing properties and routing.

Putting the pieces together, each office runs aggregators: simple RabbitMQ consumers that handle incoming messages from other offices and launch the appropriate methods.
One other problem is restoring the transferred params. From my point of view, the best way is to use Json.Net with type serialization and restore them on a remote system instance with a small hack:

    private object[] GetParams(MethodInfo methodInfo, ExchangeInformation information, ExchangeMessage message)
    {
        ParameterInfo[] methodParams = methodInfo.GetParameters();
        var listParams = new List<object>();
        for (int ii = 0; ii < methodParams.Length; ii++)
        {
            // Assumption: InputParamsString holds one serialized JSON string per parameter
            var jObject = JsonConvert.DeserializeObject<JObject>(information.InputParamsString[ii]);
            // "$type" is written by Json.Net when type name handling is enabled
            string typeName = jObject["$type"].ToString();
            listParams.Add(jObject.ToObject(Type.GetType(typeName)));
        }
        return listParams.ToArray();
    }

Of course, appropriate checks for parameter-count mismatches, valid deserialization, and so on are required.

Conclusions

The approach I've described is very easy to implement, and it has lots of places that can be customized. For example, any other method can be executed before, instead of, or after restoration on a target branch to change the logic of DOM behavior. The main issue is that collisions can occur if two BOs edit the same object at the same time. It's not hard to detect this situation by adding a hash to EntityBase. Nevertheless, a human decision is needed to resolve conflicts, so a simple UI is necessary in the HO where the operator can choose which data is correct.
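The GUID record-and-replay idea behind the Save method can be sketched in isolation. This is a minimal illustration, assuming a simplified `ExchangeInformation` with only the two members used here; `GuidReplay` and `NextGuid` are hypothetical names, and no ORM or message queue is involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified version of the article's ExchangeInformation.
public class ExchangeInformation
{
    public bool IsExchangeRestore { get; set; }
    public List<Guid> NewGuids { get; } = new List<Guid>();
}

public static class GuidReplay
{
    // Source office: generate a GUID and record it.
    // Target office: consume the recorded GUIDs in the same order.
    public static Guid NextGuid(ExchangeInformation info)
    {
        if (!info.IsExchangeRestore)
        {
            Guid created = Guid.NewGuid();
            info.NewGuids.Add(created);
            return created;
        }
        if (info.NewGuids.Count == 0)
        {
            throw new InvalidOperationException("Failed to restore exchange, no GUID found");
        }
        Guid replayed = info.NewGuids[0];
        info.NewGuids.RemoveAt(0);
        return replayed;
    }

    public static void Main()
    {
        // The original call creates two entities in the source office.
        var source = new ExchangeInformation();
        Guid first = NextGuid(source);
        Guid second = NextGuid(source);

        // Simulate the message arriving at a remote office.
        var target = new ExchangeInformation { IsExchangeRestore = true };
        target.NewGuids.AddRange(source.NewGuids);

        // Replaying the same call yields the same identifiers, in the same order.
        Console.WriteLine(NextGuid(target) == first);
        Console.WriteLine(NextGuid(target) == second);
    }
}
```

The design relies on the replayed method creating entities in exactly the same order as the original call; if that order can differ between offices, the recorded GUIDs would need to be keyed rather than queued.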
September 25, 2013
by Vladimir Kornev
· 18,251 Views · 2 Likes
Connecting to SQL Azure with SQL Management Studio
Intro

If you want to manage your SQL Databases in Azure using tools that you’re a little more familiar and comfortable with, such as SQL Management Studio, how do you go about connecting? You could read the help article from Microsoft, or you can follow my screen-based instructions below.

Assumptions

1. I’m assuming you have a version of SQL Management Studio already installed. I believe you’ll need at least SQL Server 2008 R2’s version or newer.
2. I’m further assuming you’ve already created a SQL Database in Azure.

Steps to Connect SSMS to SQL Azure

1. Authenticate to the Azure Portal.
2. Click on SQL Databases.
3. Click on Servers.
4. Click on the name of the Server you wish to connect to.
5. Click on Configure. If not already in place, click on ‘Add to the allowed IP addresses’ to add your current IP address (or specify an address you wish to connect from) and click ‘Save’.
6. Open SQL Management Studio and connect to Database services (this usually comes up by default):
  • Enter the fully qualified server name (.database.windows.net)
  • Change to SQL Server Authentication
  • Enter the preferred login (for a new database, the username you specified when you created the DB server)
  • Enter the correct password
7. Hit the Connect button.

Troubleshooting

  • Ensure you have the appropriate ports open outbound from your local network or connection (typically port 1433).
  • Ensure you have allowed the correct public IP address you’re trying to connect from via the Azure Portal (steps 1-5 above).
  • Ensure you are using the correct server name and user name. For SSMS, this is the server name (in step 4) followed by .database.windows.net
  • Ensure you are using SQL Server Authentication. For SSMS the username format is
  • If you forgot the password for your username, you can reset it in the Azure Portal: in step 4, click on Dashboard.

Lastly…

You can click on the Database (in step 2) to see your connection options.
September 25, 2013
by Rob Sanders
· 262,895 Views
Solving the Detached Many-to-Many Problem with the Entity Framework
Introduction

This article is part of the ongoing series I’ve been writing recently, but it can be read as a standalone article. I’m going to do a better job of integrating the changes documented here into the ongoing solution I’ve been building. However, considering how much time and effort I put into solving this issue, I’ve decided to document the approach independently in case it is of use to others in the interim.

The Problem Defined

This issue presents itself when you are dealing with disconnected/detached Entity Framework POCO objects, as the DbContext doesn’t track changes to detached entities. Specifically, trouble occurs with entities participating in a many-to-many relationship, where the EF has hidden a “join table” from the model itself. The problem with detached entities is that the data context has no way of knowing what changes have been made to an object graph without fetching the data from the data store and doing an entity-by-entity comparison, and that’s assuming it’s possible to fetch the data the same way as it was fetched originally. In this solution, all the entities are detached, don’t use proxy types, and are designed to move across WCF service boundaries.

Some Inspiration

There are no out-of-the-box solutions that I’m aware of which can process POCO object graphs that are detached. I did find an interesting solution called GraphDiff, which is available from GitHub and also as a NuGet package, but it didn’t work with the latest RC version of the Entity Framework (v6). I also found a very comprehensive article on how to implement a generic repository pattern with the Entity Framework, but it was unable to handle detached many-to-many relationships. In any case, I highly recommend reading that article; it was the inspiration for some of the approach I’ve ended up taking with my own design.

The Approach

This morning I put together a simple data model with the relationships that I wanted to support with detached entities.
I’ve attached the solution with a sample schema and test data at the bottom of this article. If you prefer to open and play with it, be sure to add the Entity Framework (v6 RC) via NuGet (I’ve omitted it for file size and licensing reasons). The original post illustrates the model with three diagrams: a logical view of the model, the schema view from SQL Server, and the Entity Model generated from that schema.

In the spirit of punching myself in the head, I’ve elected to have one table implement an identity specification (meaning the underlying schema allocates PK ID values), whereas for the other two tables the ID must be specified. Theoretically, if I can handle the entity types in a generic fashion, then this solution can scale out to larger and more complex models.

The scenarios I’m specifically looking to solve in this solution with detached object graphs are as follows:

  • Add a relationship (many-to-many)
  • Add a relationship (FK-based)
  • Update a related entity (many-to-many)
  • Update a related entity (FK-based)
  • Remove a relationship (many-to-many)
  • Remove a relationship (FK-based)

Here are the same scenarios within the context of the above data model:

  • Add a new Secondary entity to a Primary entity
  • Add an Other entity to a Secondary entity
  • Update a Secondary entity by updating a Primary entity
  • Update an Other entity from a Secondary entity (or Primary entity)
  • Remove (but not delete!) a Secondary entity from a Primary entity
  • Remove (but not delete) an Other entity from a Secondary entity

Establishing Test Data

Just to give myself a baseline, the data model is populated (by default) with test data. This gives us some “existing entities” to query and modify.

More Work for the Consumer

Although I tried my best, I couldn’t come up with a design which didn’t require the consuming client to do slightly more work for this to function properly.
Unfortunately, the best place for change tracking to occur with disconnected entities is in the layer making the changes, be it a business layer or something downstream. To this effect, entities will need to implement a property which reflects the state of the entity (added, modified, deleted, etc.). For the object graph to be updated and managed successfully, the consumer of the entities needs to set the entity state properly. This isn’t at all as bad as it sounds, but it’s not nothing.

Establishing some Scaffolding

After generating the data model, the first thing to be done is to ensure each entity derives from the same base class (“EntityBase”). This is used later to establish the active state of an entity when it needs to be processed. I’ve also created an enum (“ObjectState”), which is a property of the base class, and a helper function which maps ObjectState to an EF EntityState.

Constructing Data Access

To ensure that the usage is consistent, I’ve defined a single Data Access class, mainly to establish the pattern for handling detached object graphs. I can’t stress enough that this is not intended as a guide to an appropriate way to structure your data access; I’ll be updating my ongoing series of articles to go into more detail. This is only to articulate a design approach to handling detached object graphs.

Having said all that, my “DataAccessor” class can be used with any data access entity by way of generics. As with my ongoing project, the Entity Framework DbContext is instantiated by this class on construction, and the class implements IDisposable to ensure the DbContext is disposed properly when the accessor is disposed.
Here’s the constructor showing the EF configuration options I’m using:

    public DataAccessor()
    {
        _accessor = new SampleEntities();
        _accessor.Configuration.LazyLoadingEnabled = false;
        _accessor.Configuration.ProxyCreationEnabled = false;
    }

Updating an Entity

We start with a basic scenario to ensure that the scaffolding has been implemented properly. The scenario is to query for a Primary entity, change a property, and update the entity in the data store.

    [TestMethod]
    public void UpdateSingleEntity()
    {
        Primary existing = null;
        String existingValue = String.Empty;
        using (DataAccessor a = new DataAccessor())
        {
            existing = a.DataContext.Primaries.Include("Secondaries").First();
            Assert.IsNotNull(existing);
            existingValue = existing.Title;
            existing.Title = "Unit " + DateTime.Now.ToString("MMdd hh:mm:ss");
        }
        using (DataAccessor b = new DataAccessor())
        {
            existing.State = ObjectState.Modified;
            b.InsertOrUpdate(existing);
        }
        using (DataAccessor c = new DataAccessor())
        {
            // restore the original title
            existing.Title = existingValue;
            existing.State = ObjectState.Modified;
            c.InsertOrUpdate(existing);
        }
    }

You’ll notice that there is nothing particularly significant here, except that the object’s State is reset to Modified between operations.

Updating a Many-to-Many Relationship

Now things get interesting. I’m going to query for a Primary entity, then update both a property of the Primary entity itself and a property of one of the entity’s relationships.
    [TestMethod]
    public void UpdateManyToMany()
    {
        Primary existing = null;
        Secondary other = null;
        String existingValue = String.Empty;
        String existingOtherValue = String.Empty;
        using (DataAccessor a = new DataAccessor())
        {
            //Note that we include the navigation property in the query
            existing = a.DataContext.Primaries.Include("Secondaries").First();
            Assert.IsTrue(existing.Secondaries.Count() > 0, "Should be at least 1 linked item");
        }
        //save the original description
        existingValue = existing.Description;
        //set a new dummy value (with a date/time so we can see it working)
        existing.Description = "Edit " + DateTime.Now.ToString("yyyyMMdd hh:mm:ss");
        existing.State = ObjectState.Modified;
        other = existing.Secondaries.First();
        //save the original value
        existingOtherValue = other.AlternateDescription;
        //set a new value
        other.AlternateDescription = "Edit " + DateTime.Now.ToString("yyyyMMdd hh:mm:ss");
        other.State = ObjectState.Modified;
        //a new data access class (new DbContext)
        using (DataAccessor b = new DataAccessor())
        {
            //single method to handle inserts and updates
            //set a breakpoint here to see the result in the DB
            b.InsertOrUpdate(existing);
        }
        //return the values to the original ones
        existing.Description = existingValue;
        other.AlternateDescription = existingOtherValue;
        existing.State = ObjectState.Modified;
        other.State = ObjectState.Modified;
        using (DataAccessor c = new DataAccessor())
        {
            //update the entities back to normal
            //set a breakpoint here to see the data before it reverts back
            c.InsertOrUpdate(existing);
        }
    }

If we run this unit test with the breakpoints set, we can inspect the database at breakpoint #1, at breakpoint #2, and when the unit test completes. You’ll notice at the second breakpoint that the descriptions of both entities have been updated.
Examining the Insert/Update Code

The function exposed by the “data access” class really just passes through to another private function which does the heavy lifting. This is mainly in case we need to reuse the logic, since it essentially processes state actions on attached entities.

    public void InsertOrUpdate<T>(params T[] entities) where T : EntityBase
    {
        ApplyStateChanges(entities);
        DataContext.SaveChanges();
    }

Here’s the definition of the ApplyStateChanges function, which I’ll discuss below:

    private void ApplyStateChanges<T>(params T[] items) where T : EntityBase
    {
        DbSet<T> dbSet = DataContext.Set<T>();
        foreach (T item in items)
        {
            //loads related entities into the current context
            dbSet.Attach(item);
            if (item.State == ObjectState.Added || item.State == ObjectState.Modified)
            {
                dbSet.AddOrUpdate(item);
            }
            else if (item.State == ObjectState.Deleted)
            {
                dbSet.Remove(item);
            }
            foreach (DbEntityEntry<EntityBase> entry in DataContext.ChangeTracker.Entries<EntityBase>()
                .Where(c => c.Entity.State != ObjectState.Processed && c.Entity.State != ObjectState.Unchanged))
            {
                var y = DataContext.Entry(entry.Entity);
                y.State = HelperFunctions.ConvertState(entry.Entity.State);
                entry.Entity.State = ObjectState.Processed;
            }
        }
    }

Notes on this Implementation

This function iterates through the items to be examined, attaches them to the current Data Context (which also attaches their children), acts on each item accordingly (add/update/remove), and then processes new entities which have been added to the Data Context’s change tracker. For each newly “discovered” entity (ignoring entities which are unchanged or have already been examined), the entity’s DbEntityEntry state is set according to the entity’s ObjectState (which is set by the calling client). Doing this allows the Entity Framework to understand what actions it needs to perform on the entities when SaveChanges() is invoked later.
You’ll also note that I set the entity’s state to “Processed” once it has been examined, so we don’t act on it more than once (for performance purposes). Fun note: the AddOrUpdate extension method is something I found in the System.Data.Entity.Migrations namespace, and it acts as an ‘upsert’ operation, inserting or updating entities depending on whether they already exist. Bonus! That’s it for adding and updating, believe it or not.

Corresponding Unit Test

The following unit test creates a new many-to-many entity, removes it (by relationship), and then finally deletes it altogether from the database:

    [TestMethod]
    public void AddRemoveRelationship()
    {
        Primary existing = null;
        using (DataAccessor a = new DataAccessor())
        {
            existing = a.DataContext.Primaries.Include("Secondaries")
                .FirstOrDefault();
            Assert.IsNotNull(existing);
        }
        Secondary newEntity = new Secondary();
        newEntity.State = ObjectState.Added;
        newEntity.AlternateTitle = "Unit";
        newEntity.AlternateDescription = "Test";
        newEntity.SecondaryId = 1000;
        existing.Secondaries.Add(newEntity);
        using (DataAccessor a = new DataAccessor())
        {
            //breakpoint #1 here
            a.InsertOrUpdate(existing);
        }
        newEntity.State = ObjectState.Unchanged;
        existing.State = ObjectState.Modified;
        using (DataAccessor b = new DataAccessor())
        {
            //breakpoint #2 here
            b.RemoveEntities(existing, x => x.Secondaries, newEntity);
        }
        using (DataAccessor c = new DataAccessor())
        {
            //breakpoint #3 here
            c.Delete(newEntity);
        }
    }

The original post shows the test results as database snapshots (pre-test, at breakpoints #1, #2 and #3, and post-execution with the new entity deleted), along with a SQL Profiler trace.

Removing a Many-to-Many Relationship

Now this is where it gets tricky. I’d like to have something a little more polished, but the best I have come up with to date is a separate operation on the data provider which exposes functionality akin to “remove relationship”.
The fundamental problem with how the EF POCO entities work without any modifications is that, when they are detached, removing a many-to-many relationship means physically removing the related entity from the collection. When the object graph is sent back for processing, there’s a missing related entity, and the service or data context would have to assume that the omission was on purpose, not to mention that it would have to compare against data currently in the data store.

To make this easier, I’ve implemented a function called “RemoveEntities” which alters the relationship between the parent and the child/children. The one big catch is that you need to specify the navigation property or collection, which might make it slightly undesirable to implement generically. In any case, I’ve provided two options, with the navigation property as a string parameter or as a LINQ expression; they both do the same thing.

    public void RemoveEntities<T, T2>(T parent, Expression<Func<T, object>> expression, params T2[] children)
        where T : EntityBase
        where T2 : EntityBase
    {
        DataContext.Set<T>().Attach(parent);
        ObjectContext obj = DataContext.ToObjectContext();
        foreach (T2 child in children)
        {
            DataContext.Set<T2>().Attach(child);
            //marks only the relationship (the join-table row) as deleted, not the child
            obj.ObjectStateManager.ChangeRelationshipState(parent, child, expression, EntityState.Deleted);
        }
        DataContext.SaveChanges();
    }

Notes on this Implementation

“ToObjectContext” is an extension method, akin to (DataContext as IObjectContextAdapter).ObjectContext. It exposes a more fundamental part of the Entity Framework’s object model. We need this level of access to get to the functionality which controls relationships. For each child to be removed (note: not deleted from the physical database), we nominate the parent object, the child, the navigation property (collection), and the nature of the relationship change (delete). Note that this will NOT WORK for relationships defined by a foreign key; more on that below.
To delete entities which have active relationships, you’ll need to drop the relationship before attempting to delete, or else you’ll get referential integrity errors, unless you have accounted for cascading deletion (which I haven’t). Example execution:

    using (DataAccessor c = new DataAccessor())
    {
        //c.RemoveEntities(existing, "Secondaries", s);
        //(or can use an expression):
        c.RemoveEntities(existing, x => x.Secondaries, s);
    }

Removing FK Relationships

As mentioned above, you can’t just edit the relationship to remove an FK-based relationship. Instead, you have to follow the EF practice of setting the FK entity to NULL. Here’s a unit test which demonstrates how this is achieved:

    Secondary s = ExistingEntity();
    //keep a reference to the related entity before clearing the relationship
    Other o = s.Other;
    using (DataAccessor c = new DataAccessor())
    {
        s.Other = null;
        s.OtherId = null;
        s.State = ObjectState.Modified;
        o.State = ObjectState.Unchanged;
        c.InsertOrUpdate(s);
    }

We use the same “Insert or Update” call, being aware that you have to set the ObjectState properties accordingly. Note: I’m in the process of testing the reverse removal, i.e. what happens if you want to remove a Secondary entity from an Other entity’s collection.

Deleting Entities

This is fairly straightforward, but I’ve taken a few more precautions to ensure that the entity to be deleted is valid on the server side.
    public void Delete<T>(params T[] entities) where T : EntityBase
    {
        foreach (T entity in entities)
        {
            T attachedEntity = Exists(entity);
            if (attachedEntity != null)
            {
                var attachedEntry = DataContext.Entry(attachedEntity);
                attachedEntry.State = EntityState.Deleted;
            }
        }
        DataContext.SaveChanges();
    }

To understand the above, take a look at the implementation of the “Exists” function, which checks the data store and local cache to see if there is an attached representation:

    protected T Exists<T>(T entity) where T : EntityBase
    {
        var objContext = ((IObjectContextAdapter)this.DataContext).ObjectContext;
        var objSet = objContext.CreateObjectSet<T>();
        var entityKey = objContext.CreateEntityKey(objSet.EntitySet.Name, entity);
        DbSet<T> set = DataContext.Set<T>();
        var keys = (from x in entityKey.EntityKeyValues select x.Value).ToArray();
        //Remember, there can be composite keys, so don't assume there's
        //just one column/one value
        //If a composite key isn't ordered properly, the Set().Find()
        //method will fail, use attributes on the entity to determine the
        //proper order.
        //context.Configuration.AutoDetectChangesEnabled = false;
        return set.Find(keys);
    }

This is a fairly expensive operation, which is why it’s pretty much reserved for deletes and not more frequent operations. It determines the target entity’s primary key and then checks whether the entity exists or not. Note: I haven’t tested this on entities with composite keys, but I’ll get to it at some point. If you have tables with composite keys, you can define the PK column order using attributes on the model entity, but I haven’t done this (yet).

Summary

This article is the culmination of about two days of heavy analysis and investigation. I’ve got a whole lot more to contribute on this topic, but for now I felt it was worth posting as-is. What you’ve got here is still incredibly rough, and I haven’t done nearly enough testing.
To be honest, I was quite excited by the initial results, which is why I decided to write this post. There’s an incredibly good chance that I’ve missed something in the design and implementation, so please be aware of that. I’ll be continuing to refine this approach in my main series of articles with a much cleaner implementation. In the meantime, I hope this helps anyone out there struggling with detached entities. There are precious few articles and samples that are up to date, and very few that seem to work. This is provided without any warranty of any kind! If you find any issues please e-mail me [email protected] and I’ll attempt to refactor/debug and find ways around some of the inherent limitations. There are also a few helpful links I’ve come across in my travels on the WWW. See below.

Example Solution Files [ Files ]

Note: you’ll need to add the Entity Framework v6 RC package via NuGet; I haven’t included it in the archive.

Helpful Links
  • http://blog.magnusmontin.net/2013/05/30/generic-dal-using-entity-framework/
  • https://github.com/refactorthis/GraphDiff
  • http://stackoverflow.com/questions/11686225/dbset-find-method-ridiculously-slow-compared-to-singleordefault-on-id
  • http://stackoverflow.com/questions/10381106/cannot-update-many-to-many-relationships-in-entity-framework
  • http://stackoverflow.com/questions/8413248/how-to-save-an-updated-many-to-many-collection-on-detached-entity-framework-4-1
  • http://stackoverflow.com/questions/6018711/generic-way-to-check-if-entity-exists-in-entity-framework
September 18, 2013
by Rob Sanders
· 163,447 Views
Introduction to ElasticSearch
Learn about ElasticSearch, an open source tool developed with Java. It is a Lucene-based, scalable, full-text search engine, and a data analysis tool.
September 17, 2013
by Hüseyin Akdoğan, DZone Core
· 12,113 Views · 5 Likes
EasyNetQ: Big Breaking Changes in the Advanced Bus
EasyNetQ is my little, easy-to-use client API for RabbitMQ. It’s been doing really well recently. As I write this, it has 24,653 downloads on NuGet, making it by far the most popular high-level RabbitMQ API. The goal of EasyNetQ is to make working with RabbitMQ as easy as possible. I wanted junior developers to be able to use basic messaging patterns out-of-the-box with just a few lines of code and have EasyNetQ do all the heavy lifting: exchange-binding-queue configuration, error management, connection management, serialization, thread handling; all the things that make working against the low-level AMQP C# API, provided by RabbitMQ, such a steep learning curve. To meet this goal, EasyNetQ has to be a very opinionated library. It has a set way of configuring exchanges, bindings and queues based on the .NET type of your messages. However, right from the first release, many users said that they liked the connection management, thread handling, and error management, but wanted to be able to set up their own broker topology. To support this, we introduced the advanced API, an idea stolen shamelessly from Ayende’s RavenDB client. You access the advanced bus (IAdvancedBus) via the Advanced property on IBus:

    var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced;

Sometimes something can seem like a good idea at the time, and then later you think, “WTF! Why on earth did I do that?” It happens to me all the time. I thought it would be cool if I created the exchange-binding-queue topology and then passed it to the publish and subscribe methods, which would then internally declare the exchanges and queues and do the binding. I implemented a tasty little visitor pattern in my ITopologyVisitor. I optimized for my own programming pleasure, rather than a simple, obvious, easy-to-understand API. I realized a while ago that a more straightforward set of declares on IAdvancedBus would be a far more obvious and intentional design.
To this end, I’ve refactored the advanced bus to separate declares from publishing and consuming. I just pushed the changes to NuGet and have also updated the Advanced Bus documentation. Note that these are breaking changes, so please be careful if you are upgrading to version 0.12 or later. Here is a taste of how it works.

Declare a queue, exchange and binding, and consume raw message bytes:

    var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced;
    var queue = advancedBus.QueueDeclare("my_queue");
    var exchange = advancedBus.ExchangeDeclare("my_exchange", ExchangeType.Direct);
    advancedBus.Bind(exchange, queue, "routing_key");
    advancedBus.Consume(queue, (body, properties, info) => Task.Factory.StartNew(() =>
    {
        var message = Encoding.UTF8.GetString(body);
        Console.Out.WriteLine("Got message: '{0}'", message);
    }));

Note that I’ve renamed ‘Subscribe’ to ‘Consume’ to better reflect the underlying AMQP method.

Declare an exchange and publish a message:

    var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced;
    var exchange = advancedBus.ExchangeDeclare("my_exchange", ExchangeType.Direct);
    using (var channel = advancedBus.OpenPublishChannel())
    {
        var body = Encoding.UTF8.GetBytes("Hello World!");
        channel.Publish(exchange, "routing_key", new MessageProperties(), body);
    }

You can also delete exchanges, queues and bindings:

    var advancedBus = RabbitHutch.CreateBus("host=localhost").Advanced;
    // declare some objects
    var queue = advancedBus.QueueDeclare("my_queue");
    var exchange = advancedBus.ExchangeDeclare("my_exchange", ExchangeType.Direct);
    var binding = advancedBus.Bind(exchange, queue, "routing_key");
    // and then delete them
    advancedBus.BindingDelete(binding);
    advancedBus.ExchangeDelete(exchange);
    advancedBus.QueueDelete(queue);
    advancedBus.Dispose();

I think these changes make for a much better advanced API. Have a look at the documentation for the details.
September 13, 2013
by Mike Hadlow
· 12,227 Views
How to shard a cron
Sharding is a database partitioning technique that distributes Aggregates such as rows or documents across multiple servers. This form of horizontal partitioning trades some client complexity (queries must include a shard key, such as a zip code or a customer id) for the capability of distributing the dataset between multiple servers, scaling not only the read but also the write capacity. On the application side, there are several singleton processes, for example cron configurations, that are usually run only once on the whole data set. For a certain category of singleton processes we can switch to a shard-like architecture that can scale first to multiple processes and, when necessary, to multiple servers.

Step 1: identify the candidate process

Take a look at your crontab, or at your process scheduling configuration if you use another infrastructure. Some of the processes are aggregations of data producing statistics, and their work can already be distributed with patterns such as MapReduce. The kind of process that is interesting for client sharding is the one where an operation is performed over every single element of the data set. Each element is an Aggregate and as such does not interact, in a single transaction, with other ones. Therefore these processes are intrinsically parallelizable: you only need a way to distribute the load. Some examples of shard-able processes are:
  • rebuild the data aggregates for a new day for each user
  • perform some consistency checks on every customer order
  • send all pending orders
  • perform a renewal for each user subscription

Step 2: choose a shard key

Once you have identified the aggregates along which to parallelize a lengthy operation, the choice of the shard key will usually be straightforward. This key must be uniformly distributed between the N shards you want to create on the client side.
Some examples:
  • the zip code for customers
  • the numerical, sequential id for orders
  • a UUID for uploaded videos
  • the transaction reference number for money transactions

Step 3: divide the work with the shard key

Before sharding is applied, a process is usually composed of two phases:
  • select all Aggregates that satisfy condition C.
  • apply operation O to all the selected Aggregates.

The first phase can sometimes be sharded directly, transforming each of the N processes into:
  • select all Aggregates that satisfy condition C and whose shard key modulo N is equal to this shard's number.
  • apply operation O to all the selected Aggregates.

For example, the first phase for shard 0 of 4 can be accomplished by an SQL query:

    SELECT * FROM aggregate_table
    WHERE outdated = true        -- condition C
      AND aggregate_id % 4 = 0   -- sharding

Precalculating aggregate_id % 4 can improve the performance of the query, depending on your database; however, it can make it more difficult to rescale the number of processes. When you switch to 8 or 16 client shards it will be necessary to stop all currently running processes, recalculate the column and restart the new batch. Furthermore, the performance of ALTER TABLE is usually not good on the large tables which are the subject of this client sharding technique.

Sometimes we're not able to partition the original data set directly in the database query. The pattern becomes:
  • select the id (and the shard key, if different) for all Aggregates that satisfy condition C.
  • filter the subset by only considering the ids whose modulo N is equal to this shard's number.
  • apply operation O to all the selected Aggregates.

For example, I use this second form while using multiple clients with MongoDB.
I do not know if it's possible to query ObjectIds by their modulo, so I resort to selecting all of them and then filtering them on the client side:

    $id = (string) $document['_id'];
    // without substr() the conversion will overflow 32-bit integers
    $numericalValue = hexdec(substr($id, -4));
    if ($numericalValue % $shards == $shard) {
        ...
    }

This is only useful under the assumption that it is not the query that takes too much time in the original process, but the application of operation O to all of its results. Note also that these solutions need well-behaved processes that do not interfere with each other's data: the filtering is left to the programmer. In general, it is also necessary to guarantee mutual exclusion with the original singleton process; this usually comes up when deploying the battery of N crons, as they should not start until the last of the singleton processes has terminated, and the singleton must not be started again afterwards.
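The client-side filtering step can also be sketched in Java. This is only a minimal illustration, assuming ids are non-empty hexadecimal strings in the style of MongoDB ObjectIds; the class and method names (ShardFilter, shardOf, idsForShard) are hypothetical and do not come from the original PHP crons.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: names are illustrative, not taken from the original crons.
final class ShardFilter {

    // Maps a hexadecimal id to a shard number in [0, shards).
    // Only the last 4 hex digits are parsed, so the value always fits in an int.
    static int shardOf(String hexId, int shards) {
        String tail = hexId.substring(Math.max(0, hexId.length() - 4));
        return Integer.parseInt(tail, 16) % shards;
    }

    // Keeps only the ids that belong to the given shard; operation O would
    // then be applied to the returned ids by this shard's process.
    static List<String> idsForShard(List<String> ids, int shard, int shards) {
        List<String> selected = new ArrayList<>();
        for (String id : ids) {
            if (shardOf(id, shards) == shard) {
                selected.add(id);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up ids, standing in for the result of "select id where condition C"
        List<String> ids = Arrays.asList(
                "507f1f77bcf86cd799439011",
                "507f1f77bcf86cd799439012",
                "507f1f77bcf86cd799439015");
        int shards = 4;
        for (int shard = 0; shard < shards; shard++) {
            System.out.println("shard " + shard + ": " + idsForShard(ids, shard, shards));
        }
    }
}
```

Each of the N cron processes would call idsForShard with its own shard number, so every id is handled by exactly one process as long as all processes agree on N.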
September 4, 2013
by Giorgio Sironi
· 6,820 Views
API Gateway and API Portal - The pillars of API Management and the evolution of SOA
API Management solutions must combine an API Portal (for signing up developers) with an API Gateway (to link back to the enterprise). But where do these come from, and what is the relationship with SOA? To answer these questions, first let's look at a bit of history. In the 2000s, we had the SOA Gateway and the SOA Registry, working hand-in-hand. This was "SOA Governance". The SOA Registry (with a Repository) was intended to be the "central store of truth" for information about Web Services. It was often the public face of SOA Governance, the part which people could see. Usually the services in the registry took the form of heavyweight SOAP services, defined by WSDLs. The problem was that developers were often forced to register their SOAP services in the registry, rather than feeling that it was something beneficial to them. Browsing the registry was also a chore, involving the use of UDDI, also a heavyweight protocol (in fact, it was built on SOAP). Fast-forward to the current decade, and we find that the SOA Registry has been replaced by the API Portal. An API Portal is also the "central store of truth", but now it includes REST API definitions (usually expressed using a Swagger-type format) as well as SOAP services. The API Portal is designed to be useful and helpful to developers who wish to build apps, rather than feeling like a chore to use. The lesson of SOA was that an attitude of "If we build it, they will come" (or "If we put it in the SOA Registry, people will use it") does not work. You have to make it into a pleasant experience for developers. API portals work for the very reason that SOA registries did not work: usability. Just as the SOA Gateway worked with the SOA Registry, so the API Gateway works hand-in-hand with the API Portal. Together, the combination of the API Portal with the API Gateway constitutes "API Management".
The API Portal is for developers to sign up to use APIs, receive API Keys and quotas, and the API Gateway operates at runtime, managing the API Key usage and enforcing the API usage quotas. The API Gateway also performs the very important task of bridging from the technologies used by API clients (REST, OAuth) to the technologies used in the enterprise (Kerberos, SAML, or proprietary identity tokens such as CA SiteMinder smsession tokens). For more on this bridging, check out my webinar with Jason Cardinal from Identica tomorrow on "Bridging APIs to Enterprise Infrastructure". Gartner defines the combination of SOA Governance and API Management as "Application Services Governance". I'm proud to say that Axway (which acquired Vordel in 2012) is recognized by Gartner as a Leader in the category of Application Services Governance. We've seen an evolution of technologies (SOAP to REST) and approach (the UDDI registry to the web-based API Portal) in the journey from SOA Governance to API Management. From 30,000 feet, SOA Governance and API Management might look similar, but the new approach of API Management has already outshone SOA. The API Gateway and API Portal are key to this.
September 3, 2013
by Mitch Pronschinske
· 7,793 Views
Assigning UUIDs to Neo4j Nodes and Relationships
TL;DR: This blog post features a small demo project on github, neo4j-uuid, and explains how to automatically assign UUIDs to nodes and relationships in Neo4j. A very brief introduction to Neo4j 1.9’s KernelExtensionFactory is included as well.

A Little Rant on Neo4j Node/Relationship IDs

In a lot of use cases there is demand for storing a reference to a Neo4j node or relationship in a third party system. The first naive idea probably is to use the internal node/relationship id that Neo4j provides. Do not do that! Ever! You ask why? Well, Neo4j’s id is basically an offset in one of the store files Neo4j uses (with some math involved). Assume you delete a couple of nodes. This produces holes in the store files that Neo4j might reclaim when creating new nodes later on. And since the id is a file offset, there is a chance that the new node will have exactly the same id as the previously deleted node. If you don’t synchronously update all node id references stored elsewhere, you’re in trouble. If Neo4j were completely redeveloped from scratch, the getId() method would not be part of the public API. As long as you use node ids only inside a single request of an application, for example, there’s nothing wrong. To repeat myself: never ever store a node id in a third party system. I have officially warned you.

UUIDs

Enough of ranting, let’s see what we can do to safely store node references in an external system. Basically we need an identifier that, in contrast to the node id, has no semantics. A common approach to this is using Universally Unique Identifiers (UUID). The JDK offers a UUID implementation, so we could potentially use UUID.randomUUID(). Unfortunately random UUIDs are slow to generate. A preferred approach is to use the machine’s MAC and a timestamp as the base for the UUID; this should provide enough uniqueness. There is a nice library out there at http://wiki.fasterxml.com/JugHome providing exactly what we need.
Automatic UUID Assignments

For convenience it would be great if all freshly created nodes and relationships automatically got a uuid property assigned, without doing this explicitly. Fortunately Neo4j supports TransactionEventHandlers, a callback interface plugging into transaction handling. A TransactionEventHandler has a chance to modify or veto any transaction. It’s a sharp tool which can have significant negative performance impact if used the wrong way. I’ve implemented a UUIDTransactionEventHandler that performs the following tasks:
  • Populate a UUID property for each new node or relationship
  • Reject a transaction if a manual modification of a UUID is attempted; either assignment or removal

    import com.fasterxml.uuid.Generators;
    import com.fasterxml.uuid.impl.TimeBasedGenerator;
    import org.neo4j.graphdb.PropertyContainer;
    import org.neo4j.graphdb.event.PropertyEntry;
    import org.neo4j.graphdb.event.TransactionData;
    import org.neo4j.graphdb.event.TransactionEventHandler;

    import java.util.UUID;

    public class UUIDTransactionEventHandler implements TransactionEventHandler<Object> {

        public static final String UUID_PROPERTY_NAME = "uuid";

        private final TimeBasedGenerator uuidGenerator = Generators.timeBasedGenerator();

        @Override
        public Object beforeCommit(TransactionData data) throws Exception {
            // veto any manual change of an existing uuid property
            checkForUuidChanges(data.removedNodeProperties(), "remove");
            checkForUuidChanges(data.assignedNodeProperties(), "assign");
            checkForUuidChanges(data.removedRelationshipProperties(), "remove");
            checkForUuidChanges(data.assignedRelationshipProperties(), "assign");

            // assign uuids to everything created in this transaction
            populateUuidsFor(data.createdNodes());
            populateUuidsFor(data.createdRelationships());
            return null;
        }

        @Override
        public void afterCommit(TransactionData data, java.lang.Object state) {
        }

        @Override
        public void afterRollback(TransactionData data, java.lang.Object state) {
        }

        /**
         * @param propertyContainers set UUID property for an iterable of nodes or relationships
         */
        private void populateUuidsFor(Iterable<? extends PropertyContainer> propertyContainers) {
            for (PropertyContainer propertyContainer : propertyContainers) {
                if (!propertyContainer.hasProperty(UUID_PROPERTY_NAME)) {
                    final UUID uuid = uuidGenerator.generate();
                    final StringBuilder sb = new StringBuilder();
                    sb.append(Long.toHexString(uuid.getMostSignificantBits()))
                      .append(Long.toHexString(uuid.getLeastSignificantBits()));
                    propertyContainer.setProperty(UUID_PROPERTY_NAME, sb.toString());
                }
            }
        }

        private <T extends PropertyContainer> void checkForUuidChanges(Iterable<PropertyEntry<T>> changeList, String action) {
            for (PropertyEntry<T> removedProperty : changeList) {
                if (removedProperty.key().equals(UUID_PROPERTY_NAME)) {
                    throw new IllegalStateException("you are not allowed to " + action + " " + UUID_PROPERTY_NAME + " properties");
                }
            }
        }
    }

Setting up Using KernelExtensionFactory

There are two remaining tasks for full automation of UUID assignments:
  • We need to set up autoindexing for uuid properties to have a convenient way to look up nodes or relationships by UUID
  • We need to register UUIDTransactionEventHandler with the graph database

Since version 1.9 Neo4j has the notion of KernelExtensionFactory. Using KernelExtensionFactory you can supply a class that receives lifecycle callbacks when e.g. Neo4j is started or stopped. This is the right place for configuring autoindexing and setting up the TransactionEventHandler. Since the JVM’s ServiceLoader is used, KernelExtensionFactory implementations need to be registered in a file META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory by listing all implementations you want to use:

    org.neo4j.extension.uuid.UUIDKernelExtensionFactory

KernelExtensionFactories can declare dependencies: to do so, declare an inner interface (“Dependencies” in the code) that just has getters. Using proxies, Neo4j will implement this interface and supply you with the required dependencies. The dependencies are matched on the requested type; see Neo4j’s source code for which classes are supported as dependencies. KernelExtensionFactories must implement a newKernelExtension method that is supposed to return an instance of Lifecycle.
For our UUID project we return an instance of UUIDLifeCycle:

    package org.neo4j.extension.uuid;

    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.PropertyContainer;
    import org.neo4j.graphdb.event.TransactionEventHandler;
    import org.neo4j.graphdb.factory.GraphDatabaseSettings;
    import org.neo4j.graphdb.index.AutoIndexer;
    import org.neo4j.graphdb.index.IndexManager;
    import org.neo4j.kernel.configuration.Config;
    import org.neo4j.kernel.lifecycle.LifecycleAdapter;

    import java.util.Map;

    /**
     * handle the setup of auto indexing for UUIDs and registers a {@link UUIDTransactionEventHandler}
     */
    class UUIDLifeCycle extends LifecycleAdapter {

        private TransactionEventHandler transactionEventHandler;
        private GraphDatabaseService graphDatabaseService;
        private IndexManager indexManager;
        private Config config;

        UUIDLifeCycle(GraphDatabaseService graphDatabaseService, Config config) {
            this.graphDatabaseService = graphDatabaseService;
            this.indexManager = graphDatabaseService.index();
            this.config = config;
        }

        /**
         * since {@link org.neo4j.kernel.NodeAutoIndexerImpl#start()} is called *after*
         * {@link org.neo4j.extension.uuid.UUIDLifeCycle#start()} it would apply config settings
         * for auto indexing. To prevent this we change config here.
         * @throws Throwable
         */
        @Override
        public void init() throws Throwable {
            Map<String, String> params = config.getParams();
            params.put(GraphDatabaseSettings.node_auto_indexing.name(), "true");
            params.put(GraphDatabaseSettings.relationship_auto_indexing.name(), "true");
            config.applyChanges(params);
        }

        @Override
        public void start() throws Throwable {
            startUUIDIndexing(indexManager.getNodeAutoIndexer());
            startUUIDIndexing(indexManager.getRelationshipAutoIndexer());
            transactionEventHandler = new UUIDTransactionEventHandler();
            graphDatabaseService.registerTransactionEventHandler(transactionEventHandler);
        }

        @Override
        public void stop() throws Throwable {
            stopUUIDIndexing(indexManager.getNodeAutoIndexer());
            stopUUIDIndexing(indexManager.getRelationshipAutoIndexer());
            graphDatabaseService.unregisterTransactionEventHandler(transactionEventHandler);
        }

        void startUUIDIndexing(AutoIndexer autoIndexer) {
            autoIndexer.startAutoIndexingProperty(UUIDTransactionEventHandler.UUID_PROPERTY_NAME);
        }

        void stopUUIDIndexing(AutoIndexer autoIndexer) {
            autoIndexer.stopAutoIndexingProperty(UUIDTransactionEventHandler.UUID_PROPERTY_NAME);
        }
    }

Most of the code is pretty straightforward: the two startUUIDIndexing calls in start() set up autoindexing for the uuid property, and the registerTransactionEventHandler call registers the UUIDTransactionEventHandler with the graph database. Less obvious is the code in the init() method. Neo4j’s NodeAutoIndexerImpl configures autoindexing itself and switches it on or off depending on the respective config option. However, we want to have autoindexing always switched on. Unfortunately NodeAutoIndexerImpl is run after our code and overrides our settings. That’s why init() tweaks the config settings to force nice behaviour of NodeAutoIndexerImpl.

Looking up Nodes or Relationships for UUID

For completeness the project also contains a trivial unmanaged extension for looking up nodes and relationships using the REST interface, see UUIDRestInterface. By sending an HTTP GET to http://localhost:7474/db/data/node/ the node’s internal id is returned.
Build System and Testing

For building the project, Gradle is used; build.gradle is trivial. Of course a couple of tests are included. As a long-standing addict I’ve obviously used Spock for testing. See the test code here.

Final Words

A downside of this implementation is that each and every node and relationship gets indexed. Indexing always trades write performance for read performance. Keep that in mind. It might make sense to get rid of unconditional auto indexing and put some domain knowledge into the TransactionEventHandler, so that uuids are assigned and indexed only for those nodes that are really stored in an external system.
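One detail worth double-checking when turning a UUID into a string with Long.toHexString, as the handler does: that method omits leading zeros, so the two halves are not fixed-width and two different UUIDs can in principle produce the same concatenated string. The following is a minimal sketch of a zero-padded alternative, not the project's code; the class and method names (UuidHex, unpadded, padded) are hypothetical, and it uses only java.util.UUID from the JDK rather than the time-based generator.

```java
import java.util.UUID;

// Hypothetical helper for illustration; not part of the neo4j-uuid project.
final class UuidHex {

    // Same concatenation style as in the handler: leading zeros of each half are dropped.
    static String unpadded(UUID uuid) {
        return Long.toHexString(uuid.getMostSignificantBits())
                + Long.toHexString(uuid.getLeastSignificantBits());
    }

    // Zero-padded variant: always 32 hex digits, 16 per half.
    static String padded(UUID uuid) {
        return String.format("%016x%016x",
                uuid.getMostSignificantBits(),
                uuid.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        // Two artificial UUIDs chosen to show the ambiguity
        UUID a = new UUID(0x1L, 0x23L);
        UUID b = new UUID(0x12L, 0x3L);
        System.out.println(unpadded(a) + " vs " + unpadded(b));
        System.out.println(padded(a) + " vs " + padded(b));
    }
}
```

With time-based UUIDs a collision of this kind is unlikely in practice, but a fixed-width string is cheap and also keeps the stored values uniform in length.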
August 22, 2013
by Stefan Armbruster
· 11,673 Views
DyngoDB: A MongoDB Interface for DynamoDB
You might be asking yourself, 'why do I need a MongoDB-like experience for DynamoDB when there are already full-MongoDB cloud services like MMS, MongoLab, MongoHQ and MongoDirector?' One developer believes there is a need and has set up an experimental project called DyngoDB. It provides a MongoDB-style interface in front of Amazon's DynamoDB and their CloudSearch service. Apparently, in the developer's case, he only wants the MongoDB interface but prefers the DynamoDB storage engine. We'll have to see if other developers also have this specific set of preferences.
August 20, 2013
by Mitch Pronschinske
· 5,735 Views
Destroy Cookie while Logging out.
I was facing a problem where, when a person logged out, their session was invalidated but the JSESSIONID cookie still remained in the browser. As a result, on the next login the Java API would receive the request from the browser along with the old JSESSIONID (just the ID, since the session had been invalidated) and would create the new session with the same ID. To fix this problem I used the code below, so that whenever a user logs out the JSESSIONID cookie is emptied and expired, and thus no longer exists for that site. Anyone using Java can utilize this in their code.

    @RequestMapping(value = "/logout", method = RequestMethod.POST)
    public void logout(HttpServletRequest request, HttpServletResponse response) {
        /* Getting session and then invalidating it */
        HttpSession session = request.getSession(false);
        if (request.isRequestedSessionIdValid() && session != null) {
            session.invalidate();
        }
        handleLogOutResponse(request, response);
    }

    /**
     * Empties and expires every cookie sent with the request (including
     * JSESSIONID) while responding to logout. This avoids the same cookie
     * ID being reused each time a person logs in.
     * @param request
     * @param response
     */
    private void handleLogOutResponse(HttpServletRequest request, HttpServletResponse response) {
        Cookie[] cookies = request.getCookies();
        if (cookies == null) {
            // getCookies() returns null when the request carried no cookies
            return;
        }
        for (Cookie cookie : cookies) {
            cookie.setMaxAge(0);
            cookie.setValue(null);
            cookie.setPath("/");
            response.addCookie(cookie);
        }
    }
August 15, 2013
by Shiv Kumar Ganesh
· 41,325 Views · 2 Likes
neo4j: Extracting a subgraph as an adjacency matrix and calculating eigenvector centrality with JBLAS
Earlier in the week I wrote a blog post showing how to calculate the eigenvector centrality of an adjacency matrix using JBLAS, and the next step was to work out the eigenvector centrality of a neo4j sub graph. There were 3 steps involved in doing this:
  • Export the neo4j sub graph as an adjacency matrix
  • Run JBLAS over it to get eigenvector centrality scores for each node
  • Write those scores back into neo4j

I decided to make use of the Paul Revere data set from Kieran Healy’s blog post, which consists of people and the groups that they were members of. The script to import the data is on my fork of the revere repository. Having imported the data, the next step was to write a cypher query which would give me the people in an adjacency matrix, with the number in each column/row intersection showing how many common groups that pair of people had. I thought it’d be easier to build this query incrementally, so I started out writing a query which would return one row of the adjacency matrix:

    MATCH p1:Person, p2:Person
    WHERE p1.name = "Paul Revere"
    WITH p1, p2
    MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2
    WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links
    ORDER BY p2
    RETURN p1, COLLECT(links) AS row

Here we start with Paul Revere and then find the relationships between him and every other person by way of a common group membership. We use an optional relationship because we need to include a value in each column/row of our adjacency matrix, which means returning a 0 value for anyone he doesn’t intersect with.
If we run that query we get back the following: +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | p1 | row | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Paul Revere" | [2,1,1,1,1,1,1,1,1,1,1,1,1,1,2,3,1,1,1,1,1,1,3,3,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,2,1,1,2,1,2,1,1,1,1,1,0,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,2,1,1,1,1,1,1,2,1,3,1,3,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,0,1,0,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,2,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,3,1,1,1,1,3,1,1,1,1,0,1,2,1,1,1,1,1,1,1] | 
+---------------------------------------------------------------+

As it turns out, we only have to remove the WHERE clause and order everybody, and we get the adjacency matrix for everyone:

    MATCH p1:Person, p2:Person
    WITH p1, p2
    MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2
    WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links
    ORDER BY p2
    RETURN p1, COLLECT(links) AS row
    ORDER BY p1

+---------------------------------------------------------------+
| p1 | row |
+---------------------------------------------------------------+
| "Abiel Ruddock" |
[0,1,1,1,0,1,0,1,0,0,1,1,1,0,1,2,0,1,0,1,1,1,2,2,1,0,0,1,1,0,1,1,1,1,1,0,0,0,0,1,1,0,0,2,2,0,0,1,1,2,1,1,1,0,1,0,1,1,0,0,2,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,0,2,1,2,1,0,0,0,0,1,1,0,1,0,0,1,0,2,0,0,1,0,0,0,1,0,0,2,0,1,0,1,1,1,0,0,1,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,2,0,0,1,1,0,0,2,0,1,2,1,1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,2,1,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,0,1,0,0,2,1,0,0,1,1,1,1,0,1,0,0,0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,2,1,1,0,0,2,0,1,0,0,0,0,1,0,1,0,1,0,1,0] | | "Abraham Hunt" | [1,0,1,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,1,1,0,1,1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,1,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,1,0] | ... +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 254 rows 9897 ms The next step was to wire up the query results with the JBLAS code that I wrote in the previous post. 
I ended up with the following:

public class Neo4jAdjacencyMatrixSpike {
    public static void main(String[] args) throws SQLException {
        // Run the Cypher query through neo4j's REST endpoint
        ClientResponse response = client()
                .resource("http://localhost:7474/db/data/cypher")
                .entity(queryAsJson(), MediaType.APPLICATION_JSON)
                .accept(MediaType.APPLICATION_JSON)
                .post(ClientResponse.class);

        JsonNode result = response.getEntity(JsonNode.class);
        ArrayNode rows = (ArrayNode) result.get("data");

        // Turn the rows into a matrix and let JBLAS find the principal eigenvector
        List<Double> principalEigenvector = JBLASSpike.getPrincipalEigenvector(new DoubleMatrix(asMatrix(rows)));
        List<Person> people = asPeople(rows);
        updatePeopleWithEigenvector(people, principalEigenvector);

        System.out.println(sort(people).take(10));
    }

    private static double[][] asMatrix(ArrayNode rows) {
        // 254 is the number of people, so the matrix is 254 x 254
        double[][] matrix = new double[rows.size()][254];

        int rowCount = 0;
        for (JsonNode row : rows) {
            // The collected link counts are read from index 2 of each result row
            ArrayNode matrixRow = (ArrayNode) row.get(2);
            double[] rowInMatrix = new double[254];
            matrix[rowCount] = rowInMatrix;

            int columnCount = 0;
            for (JsonNode jsonNode : matrixRow) {
                matrix[rowCount][columnCount] = jsonNode.asInt();
                columnCount++;
            }
            rowCount++;
        }
        return matrix;
    }

    // rest cut for brevity
}

Here we run the query, convert the result into an array of arrays, and pass it to our JBLAS code to calculate the principal eigenvector.
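If you don’t have the previous post to hand, here is a rough sketch of what calculating the principal eigenvector involves. It uses plain-Java power iteration rather than JBLAS, so it is not the JBLASSpike code from the post. The class name, the method and the three-person matrix are all made up for illustration, and it assumes a square matrix with non-negative entries such as the link counts above.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the JBLAS step; not the JBLASSpike class from the post. */
final class PowerIterationSketch {

    /**
     * Approximates the principal eigenvector of a square matrix with
     * non-negative entries, scaled to unit Euclidean length.
     */
    static double[] principalEigenvector(double[][] matrix, int iterations) {
        int n = matrix.length;
        double[] vector = new double[n];
        Arrays.fill(vector, 1.0 / Math.sqrt(n));

        for (int step = 0; step < iterations; step++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                // Start from vector[i], i.e. multiply by (A + I) rather than A.
                // The eigenvectors are the same, but the shift stops the
                // iteration oscillating on bipartite graphs.
                double sum = vector[i];
                for (int j = 0; j < n; j++) {
                    sum += matrix[i][j] * vector[j];
                }
                next[i] = sum;
            }

            // Rescale to unit length so the values do not grow without bound
            double norm = 0.0;
            for (double value : next) {
                norm += value * value;
            }
            norm = Math.sqrt(norm);
            for (int i = 0; i < n; i++) {
                next[i] /= norm;
            }
            vector = next;
        }
        return vector;
    }

    public static void main(String[] args) {
        // Made-up graph: person 0 is linked to both of the others
        double[][] adjacency = {
            {0, 1, 1},
            {1, 0, 0},
            {1, 0, 0}
        };
        // Person 0 ends up with the highest score (roughly 0.707 against 0.5)
        System.out.println(Arrays.toString(principalEigenvector(adjacency, 100)));
    }
}
```

A library routine like the JBLAS one is the better choice for the real 254 x 254 matrix; the sketch is only meant to show what the number represents.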
We then print the top 10 people:

Person{name='William Cooper', eigenvector=0.172604992239612, nodeId=68},
Person{name='Nathaniel Barber', eigenvector=0.17260499223961198, nodeId=18},
Person{name='John Hoffins', eigenvector=0.17260499223961195, nodeId=118},
Person{name='Paul Revere', eigenvector=0.17171142003936804, nodeId=207},
Person{name='Caleb Davis', eigenvector=0.16383970722169897, nodeId=71},
Person{name='Caleb Hopkins', eigenvector=0.16383970722169897, nodeId=121},
Person{name='Henry Bass', eigenvector=0.16383970722169897, nodeId=21},
Person{name='Thomas Chase', eigenvector=0.16383970722169897, nodeId=54},
Person{name='William Greenleaf', eigenvector=0.16383970722169897, nodeId=104},
Person{name='Edward Proctor', eigenvector=0.15600043886738055, nodeId=201}

I get back the same 10 people as Kieran Healy, although they have different eigenvector values. As far as I understand, the absolute values don’t matter, since any multiple of an eigenvector is still an eigenvector. What’s more important is each person’s score relative to everyone else’s, so I think we’re OK.
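To make the point about relative scores concrete, here is a small sketch showing that rescaling a set of scores leaves the ranking unchanged. The class and method names are hypothetical, the four scores are rounded from the output above, and dividing by the largest entry is just one of several possible conventions. I don’t know which one the other analysis used.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

/** Hypothetical illustration: rescaling eigenvector scores keeps the ranking. */
final class EigenvectorScalingSketch {

    /** Returns the indices of the scores, ordered from highest to lowest score. */
    static int[] ranking(double[] scores) {
        return IntStream.range(0, scores.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> -scores[i]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    /** Rescales the scores so the largest is 1.0; assumes the largest score is positive. */
    static double[] scaleToMaxOne(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(1.0);
        return Arrays.stream(scores).map(score -> score / max).toArray();
    }

    public static void main(String[] args) {
        // Rounded scores for William Cooper, Paul Revere, Caleb Davis and Edward Proctor
        double[] unitLength = {0.1726, 0.1717, 0.1638, 0.1560};
        double[] maxOne = scaleToMaxOne(unitLength);

        // The absolute values differ but the order of the people does not
        System.out.println(Arrays.toString(maxOne));
        System.out.println(Arrays.equals(ranking(unitLength), ranking(maxOne)));
    }
}
```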
The final step was to write these eigenvector values back into neo4j, which we can do with the following code:

private static void updateNeo4jWithEigenvectors(List<Person> people) {
    for (Person person : people) {
        // One parameterised Cypher statement per person
        ObjectNode request = JsonNodeFactory.instance.objectNode();
        request.put("query", "START p = node({nodeId}) SET p.eigenvectorCentrality={value}");

        ObjectNode params = JsonNodeFactory.instance.objectNode();
        params.put("nodeId", person.nodeId);
        params.put("value", person.eigenvector);
        request.put("params", params);

        client()
                .resource("http://localhost:7474/db/data/cypher")
                .entity(request, MediaType.APPLICATION_JSON)
                .accept(MediaType.APPLICATION_JSON)
                .post(ClientResponse.class);
    }
}

Now we can use that eigenvector centrality value in other queries, such as one to show who the most central/potentially influential people are in each group:

MATCH g:Group<-[:MEMBER_OF]-p
WITH g.name AS group, p.name AS personName, p.eigenvectorCentrality as eigen
ORDER BY eigen DESC
WITH group, COLLECT(personName) AS people
RETURN group, HEAD(people) + [HEAD(TAIL(people))] + [HEAD(TAIL(TAIL(people)))] AS mostCentral

+--------------------------------------------------------------------------+
| group             | mostCentral                                          |
+--------------------------------------------------------------------------+
| "StAndrewsLodge"  | ["Paul Revere","Joseph Warren","Thomas Urann"]       |
| "BostonCommittee" | ["William Cooper","Nathaniel Barber","John Hoffins"] |
| "LoyalNine"       | ["Caleb Hopkins","William Greenleaf","Caleb Davis"]  |
| "LondonEnemies"   | ["William Cooper","Nathaniel Barber","John Hoffins"] |
| "LongRoomClub"    | ["Paul Revere","John Hancock","Benjamin Clarke"]     |
| "NorthCaucus"     | ["William Cooper","Nathaniel Barber","John Hoffins"] |
| "TeaParty"        | ["William Cooper","Nathaniel Barber","John Hoffins"] |
+--------------------------------------------------------------------------+
7 rows
280 ms

Our top ten feature frequently, although it’s interesting that only one of them is in the ‘LongRoomClub’ group, which perhaps indicates that people in that group are less likely to be members of the other ones.

I’d be interested if anyone can think of other potential uses for eigenvector centrality once we’ve got it back in the graph.

All the code described in this post is on GitHub if you want to take it for a spin.
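As an aside on the per-group query above, the same "most central people in each group" idea can be sketched on the client side in Java. This is only an illustration: the Membership class, the method name and the sample data are all made up, and it assumes the group, person name and centrality have already been fetched from the graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical client-side version of the per-group ranking. */
final class MostCentralPerGroupSketch {

    /** Made-up row shape: one group membership plus that person's centrality. */
    static final class Membership {
        final String group;
        final String person;
        final double centrality;

        Membership(String group, String person, double centrality) {
            this.group = group;
            this.person = person;
            this.centrality = centrality;
        }
    }

    /** Returns up to 'limit' people per group, most central first. */
    static Map<String, List<String>> mostCentral(List<Membership> memberships, int limit) {
        // Sort everything by centrality, highest first (like ORDER BY eigen DESC)
        List<Membership> sorted = new ArrayList<>(memberships);
        sorted.sort(Comparator.comparingDouble((Membership m) -> m.centrality).reversed());

        // Collect names per group, stopping once a group has 'limit' entries
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Membership membership : sorted) {
            List<String> people = result.computeIfAbsent(membership.group, g -> new ArrayList<>());
            if (people.size() < limit) {
                people.add(membership.person);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented data, not taken from the Paul Revere data set
        List<Membership> memberships = Arrays.asList(
                new Membership("GroupA", "Ann", 0.10),
                new Membership("GroupA", "Bob", 0.30),
                new Membership("GroupA", "Cat", 0.20),
                new Membership("GroupB", "Bob", 0.30));

        System.out.println(mostCentral(memberships, 2));
    }
}
```

In this sketch, a group with fewer members than the limit simply gets a shorter list.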
August 12, 2013
by Mark Needham
· 5,644 Views