DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
EclipseLink MOXy and the Java API for JSON Processing - Object Model APIs
The Java API for JSON Processing (JSR-353) is the Java standard for producing and consuming JSON which was introduced as part of Java EE 7. JSR-353 includes object (DOM like) and stream (StAX like) APIs. In this post I will demonstrate the initial JSR-353 support we have added to MOXy's JSON binding in EclipseLink 2.6. You can now use MOXy to marshal to: javax.json.JsonArrayBuilder javax.json.JsonObjectBuilder And unmarshal from: javax.json.JsonStructure javax.json.JsonObject javax.json.JsonArray You can try this out today using a nightly build of EclipseLink 2.6.0: http://www.eclipse.org/eclipselink/downloads/nightly.php The JSR-353 reference implementation is available here: https://java.net/projects/jsonp/downloads/download/ri/javax.json-ri-1.0.zip Java Model Below is the simple customer model that we will use for this post. Note for this example we are only using the standard JAXB (JSR-222) annotations. Customer package blog.jsonp.moxy; import java.util.*; import javax.xml.bind.annotation.*; @XmlType(propOrder={"id", "firstName", "lastName", "phoneNumbers"}) public class Customer { private int id; private String firstName; private String lastName; private List phoneNumbers = new ArrayList(); public int getId() { return id; } public void setId(int id) { this.id = id; } public String getFirstName() { return firstName; } public void setFirstName(String firstName) { this.firstName = firstName; } @XmlElement(nillable=true) public String getLastName() { return lastName; } public void setLastName(String lastName) { this.lastName = lastName; } @XmlElement public List getPhoneNumbers() { return phoneNumbers; } } PhoneNumber package blog.jsonp.moxy; import javax.xml.bind.annotation.*; @XmlAccessorType(XmlAccessType.FIELD) public class PhoneNumber { private String type; private String number; public String getType() { return type; } public void setType(String type) { this.type = type; } public String getNumber() { return number; } public void setNumber(String number) { this.number = number; } } jaxb.properties To specify MOXy as your JAXB provider you need to include a file called jaxb.properties in the same package as your domain model with the following entry (see: Specifying EclipseLink MOXy as your JAXB Provider) javax.xml.bind.context.factory=org.eclipse.persistence.jaxb.JAXBContextFactory Marshal Demo In the demo code below we will use a combination of JSR-353 and MOXy APIs to produce JSON. JSR-353's JsonObjectBuilder and JsonArrayBuilder are used to produces instances of JsonObject and JsonArray. We can use MOXy to marshal to these builders by wrapping them in instances of MOXy's JsonObjectBuilderResult and JsonArrayBuilderResult. package blog.jsonp.moxy; import java.util.*; import javax.json.*; import javax.json.stream.JsonGenerator; import javax.xml.bind.*; import org.eclipse.persistence.jaxb.JAXBContextProperties; import org.eclipse.persistence.oxm.json.*; public class MarshalDemo { public static void main(String[] args) throws Exception { // Create the EclipseLink JAXB (MOXy) Marshaller Map jaxbProperties = new HashMap(2); jaxbProperties.put(JAXBContextProperties.MEDIA_TYPE, "application/json"); jaxbProperties.put(JAXBContextProperties.JSON_INCLUDE_ROOT, false); JAXBContext jc = JAXBContext.newInstance(new Class[] {Customer.class}, jaxbProperties); Marshaller marshaller = jc.createMarshaller(); // Create the JsonArrayBuilder JsonArrayBuilder customersArrayBuilder = Json.createArrayBuilder(); // Build the First Customer Customer customer = new Customer(); customer.setId(1); customer.setFirstName("Jane"); customer.setLastName(null); PhoneNumber phoneNumber = new PhoneNumber(); phoneNumber.setType("cell"); phoneNumber.setNumber("555-1111"); customer.getPhoneNumbers().add(phoneNumber); // Marshal the First Customer Object into the JsonArray JsonArrayBuilderResult result = new JsonArrayBuilderResult(customersArrayBuilder); marshaller.marshal(customer, result); // Build List of PhoneNumer Objects for Second Customer List phoneNumbers = new ArrayList(2); PhoneNumber workPhone = new PhoneNumber(); workPhone.setType("work"); workPhone.setNumber("555-2222"); phoneNumbers.add(workPhone); PhoneNumber homePhone = new PhoneNumber(); homePhone.setType("home"); homePhone.setNumber("555-3333"); phoneNumbers.add(homePhone); // Marshal the List of PhoneNumber Objects JsonArrayBuilderResult arrayBuilderResult = new JsonArrayBuilderResult(); marshaller.marshal(phoneNumbers, arrayBuilderResult); customersArrayBuilder // Use JSR-353 APIs for Second Customer's Data .add(Json.createObjectBuilder() .add("id", 2) .add("firstName", "Bob") .addNull("lastName") // Included Marshalled PhoneNumber Objects .add("phoneNumbers", arrayBuilderResult.getJsonArrayBuilder()) ) .build(); // Write JSON to System.out Map jsonProperties = new HashMap(1); jsonProperties.put(JsonGenerator.PRETTY_PRINTING, true); JsonWriterFactory writerFactory = Json.createWriterFactory(jsonProperties); JsonWriter writer = writerFactory.createWriter(System.out); writer.writeArray(customersArrayBuilder.build()); writer.close(); } } Highlighted lines: 36, 37, 38, 54, 55, 64 Output Below is the output from running the marshal demo (MarshalDemo). The highlighted portions (lines 2-12 and 18-25) correspond to the portions that were populated from our Java model. [ { "id":1, "firstName":"Jane", "lastName":null, "phoneNumbers":[ { "type":"cell", "number":"555-1111" } ] }, { "id":2, "firstName":"Bob", "lastName":null, "phoneNumbers":[ { "type":"work", "number":"555-2222" }, { "type":"home", "number":"555-3333" } ] } ] Highlighted lines: 2-12, 18-25 Unmarshal Demo MOXy enables you to unmarshal from a JSR-353 JsonStructure (JsonObject or JsonArray). To do this simply wrap the JsonStructure in an instance of MOXy's JsonStructureSource and use one of the unmarshal operations that takes an instance of Source. package blog.jsonp.moxy; import java.io.FileInputStream; import java.util.*; import javax.json.*; import javax.xml.bind.*; import org.eclipse.persistence.jaxb.JAXBContextProperties; import org.eclipse.persistence.oxm.json.JsonStructureSource; public class UnmarshalDemo { public static void main(String[] args) throws Exception { try (FileInputStream is = new FileInputStream("src/blog/jsonp/moxy/input.json")) { // Create the EclipseLink JAXB (MOXy) Unmarshaller Map jaxbProperties = new HashMap(2); jaxbProperties.put(JAXBContextProperties.MEDIA_TYPE, "application/json"); jaxbProperties.put(JAXBContextProperties.JSON_INCLUDE_ROOT, false); JAXBContext jc = JAXBContext.newInstance(new Class[] {Customer.class}, jaxbProperties); Unmarshaller unmarshaller = jc.createUnmarshaller(); // Parse the JSON JsonReader jsonReader = Json.createReader(is); // Unmarshal Root Level JsonArray JsonArray customersArray = jsonReader.readArray(); JsonStructureSource arraySource = new JsonStructureSource(customersArray); List customers = (List) unmarshaller.unmarshal(arraySource, Customer.class) .getValue(); for(Customer customer : customers) { System.out.println(customer.getFirstName()); } // Unmarshal Nested JsonObject JsonObject customerObject = customersArray.getJsonObject(1); JsonStructureSource objectSource = new JsonStructureSource(customerObject); Customer customer = unmarshaller.unmarshal(objectSource, Customer.class) .getValue(); for(PhoneNumber phoneNumber : customer.getPhoneNumbers()) { System.out.println(phoneNumber.getNumber()); } } } } Highlighted lines: 27-30, 37-39 Input (input.json) The following JSON input will be converted to a JsonArray using a JsonReader. [ { "id":1, "firstName":"Jane", "lastName":null, "phoneNumbers":[ { "type":"cell", "number":"555-1111" } ] }, { "id":2, "firstName":"Bob", "lastName":null, "phoneNumbers":[ { "type":"work", "number":"555-2222" }, { "type":"home", "number":"555-3333" } ] } ] Highlighted lines: 4, 15, 20, 24 Output Below is the output from running the unmarshal demo (UnmarshalDemo). Jane Bob 555-2222 555-3333
August 7, 2013
by Blaise Doughan
· 13,750 Views
article thumbnail
NoSQL with JPA
EclipseLink, reference implementation of JPA, has JPA support for NoSQL databases (MongoDB and Oracle NoSQL) as of the version 2.4. In this tutorial we will discuss the use of MongoDB database with the JPA support of EclipseLink. The transaction previously done using the console and native java driver will be done in a web application with the help of EclipseLink. Tools and technologies used in the sample application are as follows: MongoDB version 2.4.1 MongoDB Java Driver version 2.11.1 JSF version 2.2 PrimeFaces version 3.5 EclipseLink version 2.4 Jetty 7.x Maven Plugin JDK version 1.7 Maven 3.0.4 Project Dependencies org.glassfish javax.faces 2.2.0-SNAPSHOT org.primefaces primefaces 3.5 org.primefaces.themes bootstrap 1.0.10 org.eclipse.persistence org.eclipse.persistence.jpa 2.4.0-SNAPSHOT org.eclipse.persistence org.eclipse.persistence.nosql 2.4.0-SNAPSHOT jboss jboss-j2ee 4.2.2.GA org.mongodb mongo-java-driver 2.11.1 commons-fileupload commons-fileupload 1.3 Entity Class @Entity @NoSql(dataFormat=DataFormatType.MAPPED) public class Article implements Serializable { public Article() { } @Id @GeneratedValue @Field(name="_id") private String id; @ElementCollection private List categoryLists = new ArrayList(); @Basic private String title; @Basic private String content; @Basic @Temporal(javax.persistence.TemporalType.DATE) private Date date; @Basic private String author; @ElementCollection private List tagLists = new ArrayList(); @NoSQL notation sets the data format and type and maps the NoSQL data. Because of using MongoDB in our sample application and documents in MongoDB stored in BSON format, MAP is used as data type. @ElementCollection notation maps the embedded collection into the parent document. Because more than one category and tag associated with an article would be a matter in our sample application, we map them as an element collection. Embedded Objects @Embeddable @NoSql(dataFormat=DataFormatType.MAPPED) public class Categories implements Serializable { @Basic private String category; @Embeddable @NoSql(dataFormat=DataFormatType.MAPPED) public class Tags implements Serializable { @Basic private String tag; We see @Embeddable notation at the top of the Categories and Tags’ class unlike Article entity class. The documents stored in the parent document are mapped with this notation. Please note that embedded objects do not need unique field. persistence.xml com.kodcu.entity.Article com.com.kodcu.entity.Categories com.kodcu.entity.Tags CRUD Operations index.xhtml MyBean.java public void saveArticle() { em.getTransaction().begin(); if(null == article.getId()) em.persist(article); else em.merge(article); em.getTransaction().commit(); } public void removeArticle() { em.getTransaction().begin(); em.remove(selectArticle); em.getTransaction().commit(); } 6. Demo Application Real content above and the demo application, can be accessed at NoSQL with JPA
August 6, 2013
by Hüseyin Akdoğan DZone Core CORE
· 31,615 Views · 1 Like
article thumbnail
Getting started with CQEngine: LINQ for Java, Only Faster
CQEngine or collection query engine is a library that allows you to build indices over java collections and query them for objects using exposed properties. It offers similar capability to LINQ in .net but is thought to be faster because it builds indices over collections before querying them and uses set theory instead of iterations. In this post we will see how to query a simple collection of objects, in our example, a collection of users of a hypothetical system, using CQEngine. In a subsequent post, we will also see how iteratively searching a collection compares to querying via CQEngine. The first step is to get the CQEengine jar file. Download the jar from the CQEngine website or if you are using Maven, add the following dependency. com.googlecode.cqengine cqengine 1.0.3 Next, lets create the Class whose object we will be searching for: package co.syntx.examples.cqengine; import com.googlecode.cqengine.attribute.Attribute; import com.googlecode.cqengine.attribute.SimpleAttribute; public class User { private String username; private String password; private String fullname; private Role role; public User(String username, String password, String fullname, Role role) { super(); this.username = username; this.password = password; this.fullname = fullname; this.role = role; } public static final Attribute FULL_NAME = new SimpleAttribute("fullname") { public String getValue(User user) { return user.fullname; } }; public static final Attribute USERNAME = new SimpleAttribute("username") { public String getValue(User user) { return user.username; } }; public String getUsername() { return username; } public void setUsername(String username) { this.username = username; } public String getPassword() { return password; } public void setPassword(String password) { this.password = password; } public String getFullname() { return fullname; } public void setFullname(String fullname) { this.fullname = fullname; } public Role getRole() { return role; } public void setRole(Role role) { this.role = role; } } Next, we write a class, to perform our searches. I will go function by function. 1. Function to Build a Test Indexed Collection: In the following function, we build an indexed collection, define indices on attributes, and populate this collection with a certain number of objects. In actual usage, your collection will probably be filled with objects being returned from the DB, read from a file or other similar scenarios. public void buildIndexedCollection(int size) throws Exception { indexedUsers = CQEngine.newInstance(); indexedUsers.addIndex(HashIndex.onAttribute(User.FULL_NAME)); indexedUsers.addIndex(SuffixTreeIndex.onAttribute(User.FULL_NAME)); for (int i = 0; i < size; i++) { String username = RandomStringGenerator.generateRandomString(8,RandomStringGenerator.Mode.ALPHANUMERIC); String password = RandomStringGenerator.generateRandomString(8,RandomStringGenerator.Mode.ALPHANUMERIC); String fullname = RandomStringGenerator.generateRandomString(5,RandomStringGenerator.Mode.ALPHA) + " " + RandomStringGenerator.generateRandomString(5,RandomStringGenerator.Mode.ALPHA); Role role = new Role(); role.setName("admin"); indexedUsers.add(new User(username, password, fullname, role)); } } In line 3 we are initializing a new Indexed Collection, a reference of which is stored in the class variable indexedUsers. The reference is of type IndexedCollection In lines 4 and 5, we define two indices i) a Hash Index suitable for equal style queries. ii) a Suffix Index suitable for ends with style queried. For the purpose of this example, we are building indices only on the the Full name field. In line 8, we use a random string generator to populate dummy objects. In line 14 we add our object to our indexed collection. 2. Function to Perform Indexed Search for Exact Matches: In this function, we are querying for names that exactly match a given name. The equal function takes in the attribute upon which to perform the query and the value to search. The method equal is statically imported via a import static com.googlecode.cqengine.query.QueryFactory.*; In the example below, we are looping the results returned by the retrieve method and not doing anything with it. In your case, you may choose to return the Iterator returned by retrieve. public void indexedSearchForEquals(String fullname) throws Exception { Query query = equal(User.FULL_NAME, fullname); for (User user : indexedUsers.retrieve(query)) { // System.out.println(user.getFullname()); } } 3. Function to Perform Indexed Search for Ends With Matches: In this function, we are querying for names that end with a certain suffix. public void indexedSearchForEndsWith(String endswith) throws Exception { Query query1 = endsWith(User.FULL_NAME, endswith); for (User user : indexedUsers.retrieve(query1)) { // System.out.println(user.getFullname()); } } 4. Function to Perform Indexed Search for Equals or Ends With Matches: This function is a combination of both queries mentioned below and has an or relation between them. public void indexedSearchForEqualOrEndsWith(String equals, String ends) throws Exception { Query query = or(equal(User.FULL_NAME, equals),endsWith(User.FULL_NAME, ends)); for (User user : indexedUsers.retrieve(query)) { // System.out.println(user.getFullname()); } } 5. Putting it together: All the functions above belong to a class called CQEngineTest. We create a new object, build a test collection, then search for either exact matches, strings that end with a certain suffix or either. CQEngineTest test = new CQEngineTest(); test.buildIndexedCollection(size); test.indexedSearchForEqualOrEndsWith("test", "test"); In this example, we have used a Hash Index and a Suffix Tree Index. There are many other types of indices that you can choose depending on the type of query that you want to perform. A list of these indices and when to use them can be found on the cqengine project page. Also, apart from equal or endswith operations, there are others that you would typically expect to find. In a subsequent post we will also see how an indexed search compares with a typical iterative search in terms of times.
August 5, 2013
by Faheem Sohail
· 28,028 Views · 1 Like
article thumbnail
JPA Searching Using Lucene - A Working Example with Spring and DBUnit
Working Example on Github There's a small, self contained mavenised example project over on Github to accompany this post - check it out here:https://github.com/adrianmilne/jpa-lucene-spring-demo Running the Demo See the README file over on GitHub for details of running the demo. Essentially - it's just running the Unit Tests, with the usual maven build and test results output to the console - example below. This is the result of running the DBUnit test, which inserts Book data into the HSQL database using JPA, and then uses Lucene to query the data, testing that the expected Books are returned (i.e. only those int he SCI-FI category, containing the word 'Space', and ensuring that any with 'Space' in the title appear before those with 'Space' only in the description. The Book Entity Our simple example stores Books. The Book entity class below is a standard JPA Entity with a few additional annotations to identify it to Lucene: @Indexed - this identifies that the class will be added to the Lucene index. You can define a specific index by adding the 'index' attribute to the annotation. We're just choosing the simplest, minimal configuration for this example. In addition to this - you also need to specify which properties on the entity are to be indexed, and how they are to be indexed. For our example we are again going for the default option by just adding an @Field annotation with no extra parameters. We are adding one other annotation to the 'title' field - @Boost - this is just telling Lucene to give more weight to search term matches that appear in this field (than the same term appearing in the description field). This example is purposefully kept minimal in terms of the ins-and-outs of Lucene (I may cover that in a later post) - we're really just concentrating on the integration with JPA and Spring for now. package com.cor.demo.jpa.entity; import javax.persistence.Entity; import javax.persistence.EnumType; import javax.persistence.Enumerated; import javax.persistence.GeneratedValue; import javax.persistence.Id; import javax.persistence.Lob; import org.hibernate.search.annotations.Boost; import org.hibernate.search.annotations.Field; import org.hibernate.search.annotations.Indexed; /** * Book JPA Entity. */ @Entity @Indexed public class Book { @Id @GeneratedValue private Long id; @Field @Boost(value = 1.5f) private String title; @Field @Lob private String description; @Field @Enumerated(EnumType.STRING) private BookCategory category; public Book(){ } public Book(String title, BookCategory category, String description){ this.title = title; this.category = category; this.description = description; } public Long getId() { return id; } public void setId(Long id) { this.id = id; } public String getTitle() { return title; } public void setTitle(String title) { this.title = title; } public BookCategory getCategory() { return category; } public void setCategory(BookCategory category) { this.category = category; } public String getDescription() { return description; } public void setDescription(String description) { this.description = description; } @Override public String toString() { return "Book [id=" + id + ", title=" + title + ", description=" + description + ", category=" + category + "]"; } } The Book Manager The BookManager class acts as a simple service layer for the Book operations - used for adding books and searching books. As you can see, the JPA database resources are autowired in by Spring from the application-context.xml. We are just using an in-memory hsql database in this example. package com.cor.demo.jpa.manager; import java.util.List; import javax.persistence.EntityManager; import javax.persistence.PersistenceContext; import javax.persistence.PersistenceContextType; import javax.persistence.Query; import org.hibernate.search.jpa.FullTextEntityManager; import org.hibernate.search.jpa.Search; import org.hibernate.search.query.dsl.QueryBuilder; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.context.annotation.Scope; import org.springframework.stereotype.Component; import org.springframework.transaction.annotation.Transactional; import com.cor.demo.jpa.entity.Book; import com.cor.demo.jpa.entity.BookCategory; /** * Manager for persisting and searching on Books. Uses JPA and Lucene. */ @Component @Scope(value = "singleton") public class BookManager { /** Logger. */ private static Logger LOG = LoggerFactory.getLogger(BookManager.class); /** JPA Persistence Unit. */ @PersistenceContext(type = PersistenceContextType.EXTENDED, name = "booksPU") private EntityManager em; /** Hibernate Full Text Entity Manager. */ private FullTextEntityManager ftem; /** * Method to manually update the Full Text Index. This is not required if inserting entities * using this Manager as they will automatically be indexed. Useful though if you need to index * data inserted using a different method (e.g. pre-existing data, or test data inserted via * scripts or DbUnit). */ public void updateFullTextIndex() throws Exception { LOG.info("Updating Index"); getFullTextEntityManager().createIndexer().startAndWait(); } /** * Add a Book to the Database. */ @Transactional public Book addBook(Book book) { LOG.info("Adding Book : " + book); em.persist(book); return book; } /** * Delete All Books. */ @SuppressWarnings("unchecked") @Transactional public void deleteAllBooks() { LOG.info("Delete All Books"); Query allBooks = em.createQuery("select b from Book b"); List books = allBooks.getResultList(); // We need to delete individually (rather than a bulk delete) to ensure they are removed // from the Lucene index correctly for (Book b : books) { em.remove(b); } } @SuppressWarnings("unchecked") @Transactional public void listAllBooks() { LOG.info("List All Books"); LOG.info("------------------------------------------"); Query allBooks = em.createQuery("select b from Book b"); List books = allBooks.getResultList(); for (Book b : books) { LOG.info(b.toString()); getFullTextEntityManager().index(b); } } /** * Search for a Book. */ @SuppressWarnings("unchecked") @Transactional public List search(BookCategory category, String searchString) { LOG.info("------------------------------------------"); LOG.info("Searching Books in category '" + category + "' for phrase '" + searchString + "'"); // Create a Query Builder QueryBuilder qb = getFullTextEntityManager().getSearchFactory().buildQueryBuilder().forEntity(Book.class).get(); // Create a Lucene Full Text Query org.apache.lucene.search.Query luceneQuery = qb.bool() .must(qb.keyword().onFields("title", "description").matching(searchString).createQuery()) .must(qb.keyword().onField("category").matching(category).createQuery()).createQuery(); Query fullTextQuery = getFullTextEntityManager().createFullTextQuery(luceneQuery, Book.class); // Run Query and print out results to console List result = (List) fullTextQuery.getResultList(); // Log the Results LOG.info("Found Matching Books :" + result.size()); for (Book b : result) { LOG.info(" - " + b); } return result; } /** * Convenience method to get Full Test Entity Manager. Protected scope to assist mocking in Unit * Tests. * @return Full Text Entity Manager. */ protected FullTextEntityManager getFullTextEntityManager() { if (ftem == null) { ftem = Search.getFullTextEntityManager(em); } return ftem; } /** * Get the JPA Entity Manager (required for the DBUnit Tests). * @return Entity manager */ protected EntityManager getEntityManager() { return em; } /** * Sets the JPA Entity Manager (required to assist with mocking in Unit Test) * @param em EntityManager */ protected void setEntityManager(EntityManager em) { this.em = em; } } application-context.xml This is the Spring configuration file. You can see in the JPA Entity Manager configuration the key for 'hibernate.search.default.indexBase' is added to the jpaPropertyMap to tell Lucene where to create the index. We have also externalised the database login credentials to a properties file (as you may wish to change these for different environments), for example by updating the propertyConfigurer to look for and use a different external properties if it finds one on the file system). classpath:/system.properties Testing Using DBUnit In the project is an example of using DBUnit with Spring to test adding and searching against the database using DBUnit to populate the database with test data, exercise the Book Manager search operations and then clean the database down. This is a great way to test database functionality and can be easily integrated into maven and continuous build environments. Because DBUnit bypasses the standard JPA insertion calls - the data does not get automatically added to the Lucene index. We have a method exposed on the service interface to update the Full Text index 'updateFullTextIndex()' - calling this causes Lucene to update the index with the current data in the database. This can be useful when you are adding search to pre-populated databases to index the existing content. package com.cor.demo.jpa.manager; import java.io.InputStream; import java.util.List; import org.dbunit.DBTestCase; import org.dbunit.database.DatabaseConnection; import org.dbunit.database.IDatabaseConnection; import org.dbunit.dataset.IDataSet; import org.dbunit.dataset.xml.FlatXmlDataSetBuilder; import org.dbunit.operation.DatabaseOperation; import org.hibernate.impl.SessionImpl; import org.junit.After; import org.junit.Before; import org.junit.Test; import org.junit.runner.RunWith; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import com.cor.demo.jpa.entity.Book; import com.cor.demo.jpa.entity.BookCategory; /** * DBUnit Test - loads data defined in 'test-data-set.xml' into the database to run tests against the * BookManager. More thorough (and ultimately easier in this context) than using mocks. */ @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(locations = { "classpath:/application-context.xml" }) public class BookManagerDBUnitTest extends DBTestCase { /** Logger. */ private static Logger LOG = LoggerFactory.getLogger(BookManagerDBUnitTest.class); /** Book Manager Under Test. */ @Autowired private BookManager bookManager; @Before public void setup() throws Exception { DatabaseOperation.CLEAN_INSERT.execute(getDatabaseConnection(), getDataSet()); } @After public void tearDown() { deleteBooks(); } @Override protected IDataSet getDataSet() throws Exception { InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("test-data-set.xml"); FlatXmlDataSetBuilder builder = new FlatXmlDataSetBuilder(); return builder.build(inputStream); } /** * Get the underlying database connection from the JPA Entity Manager (DBUnit needs this connection). * @return Database Connection * @throws Exception */ private IDatabaseConnection getDatabaseConnection() throws Exception { return new DatabaseConnection(((SessionImpl) (bookManager.getEntityManager().getDelegate())).connection()); } /** * Tests the expected results for searching for 'Space' in SCF-FI books. */ @Test public void testSciFiBookSearch() throws Exception { bookManager.listAllBooks(); bookManager.updateFullTextIndex(); List results = bookManager.search(BookCategory.SCIFI, "Space"); assertEquals("Expected 2 results for SCI FI search for 'Space'", 2, results.size()); assertEquals("Expected 1st result to be '2001: A Space Oddysey'", "2001: A Space Oddysey", results.get(0).getTitle()); assertEquals("Expected 2nd result to be 'Apollo 13'", "Apollo 13", results.get(1).getTitle()); } private void deleteBooks() { LOG.info("Deleting Books...-"); bookManager.deleteAllBooks(); } } The source data for the test is defined in an xml file.
August 5, 2013
by Adrian Milne
· 31,290 Views
article thumbnail
What Is NoSQL?
Dan McCreary and Ann Kelly, authors of 'Making Sense of NoSQL,' discuss the business drivers and motivations that make NoSQL so popular to organizations today.
August 1, 2013
by Eric Gregory
· 21,236 Views · 4 Likes
article thumbnail
Jersey Client: Testing External Calls
Jim and I have been doing a bit of work over the last week which involved calling neo4j’s HA status URI to check whether or not an instance was a master/slave and we’ve been using jersey-client. The code looked roughly like this: class Neo4jInstance { private Client httpClient; private URI hostname; public Neo4jInstance(Client httpClient, URI hostname) { this.httpClient = httpClient; this.hostname = hostname; } public Boolean isSlave() { String slaveURI = hostname.toString() + ":7474/db/manage/server/ha/slave"; ClientResponse response = httpClient.resource(slaveURI).accept(TEXT_PLAIN).get(ClientResponse.class); return Boolean.parseBoolean(response.getEntity(String.class)); } } While writing some tests against this code we wanted to stub out the actual calls to the HA slave URI so we could simulate both conditions and a brief search suggested that mockito was the way to go. We ended up with a test that looked like this: @Test public void shouldIndicateInstanceIsSlave() { Client client = mock( Client.class ); WebResource webResource = mock( WebResource.class ); WebResource.Builder builder = mock( WebResource.Builder.class ); ClientResponse clientResponse = mock( ClientResponse.class ); when( builder.get( ClientResponse.class ) ).thenReturn( clientResponse ); when( clientResponse.getEntity( String.class ) ).thenReturn( "true" ); when( webResource.accept( anyString() ) ).thenReturn( builder ); when( client.resource( anyString() ) ).thenReturn( webResource ); Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave(); assertTrue(isSlave); } which is pretty gnarly but does the job. I thought there must be a better way so I continued searching and eventually came across this post on the mailing list which suggested creating a custom ClientHandler and stubbing out requests/responses there. I had a go at doing that and wrapped it with a little DSL that only covers our very specific use case: private static ClientBuilder client() { return new ClientBuilder(); } static class ClientBuilder { private String uri; private int statusCode; private String content; public ClientBuilder requestFor(String uri) { this.uri = uri; return this; } public ClientBuilder returns(int statusCode) { this.statusCode = statusCode; return this; } public Client create() { return new Client() { public ClientResponse handle(ClientRequest request) throws ClientHandlerException { if (request.getURI().toString().equals(uri)) { InBoundHeaders headers = new InBoundHeaders(); headers.put("Content-Type", asList("text/plain")); return createDummyResponse(headers); } throw new RuntimeException("No stub defined for " + request.getURI()); } }; } private ClientResponse createDummyResponse(InBoundHeaders headers) { return new ClientResponse(statusCode, headers, new ByteArrayInputStream(content.getBytes()), messageBodyWorkers()); } private MessageBodyWorkers messageBodyWorkers() { return new MessageBodyWorkers() { public Map> getReaders(MediaType mediaType) { return null; } public Map> getWriters(MediaType mediaType) { return null; } public String readersToString(Map> mediaTypeListMap) { return null; } public String writersToString(Map> mediaTypeListMap) { return null; } public MessageBodyReader getMessageBodyReader(Class tClass, Type type, Annotation[] annotations, MediaType mediaType) { return (MessageBodyReader) new StringProvider(); } public MessageBodyWriter getMessageBodyWriter(Class tClass, Type type, Annotation[] annotations, MediaType mediaType) { return null; } public List getMessageBodyWriterMediaTypes(Class tClass, Type type, Annotation[] annotations) { return null; } public MediaType getMessageBodyWriterMediaType(Class tClass, Type type, Annotation[] annotations, List mediaTypes) { return null; } }; } public ClientBuilder content(String content) { this.content = content; return this; } } If we change our test to use this code it now looks like this: @Test public void shouldIndicateInstanceIsSlave() { Client client = client().requestFor("http://localhost:7474/db/manage/server/ha/slave"). returns(200). content("true"). create(); Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave(); assertTrue(isSlave); } Is there a better way? In Ruby I’ve used WebMock to achieve this and Ashok pointed me towards WebStub which looks nice except I’d need to pass in the hostname + port rather than constructing that in the code.
August 1, 2013
by Mark Needham
· 10,774 Views
article thumbnail
AWS: Attaching an EBS volume on an EC2 instance and making it available for use
I recently wanted to attach an EBS volume to an existing EC2 instance that I had running and since it was for a one off tasks (famous last words) I decided to configure it manually. I created the EBS volume through the AWS console and one thing that initially caught me out is that the EC2 instance and EBS volume need to be in the same region and zone. Therefore if I create my EC2 instance in ‘eu-west-1b’ then I need to create my EBS volume in ‘eu-west-1b’ as well otherwise I won’t be able to attach it to that instance. I attached the device as /dev/sdf although the UI gives the following warning: Linux Devices: /dev/sdf through /dev/sdp Note: Newer linux kernels may rename your devices to /dev/xvdf through /dev/xvdp internally, even when the device name entered here (and shown in the details) is /dev/sdf through /dev/sdp. After attaching the EBS volume to the EC2 instance my next step was to SSH onto my EC2 instance and make the EBS volume available. The first step is to create a file system on the volume: $ sudo mkfs -t ext3 /dev/sdf mke2fs 1.42 (29-Nov-2011) Could not stat /dev/sdf --- No such file or directory The device apparently does not exist; did you specify it correctly? It turns out that warning was handy and the device has in fact been renamed. We can confirm this by callingfdisk: $ sudo fdisk -l Disk /dev/xvda1: 8589 MB, 8589934592 bytes 255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/xvda1 doesn't contain a valid partition table Disk /dev/xvdf: 53.7 GB, 53687091200 bytes 255 heads, 63 sectors/track, 6527 cylinders, total 104857600 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/xvdf doesn't contain a valid partition table /dev/xvdf is the one we’re interested in so I re-ran the previous command: $ sudo mkfs -t ext3 /dev/xvdf mke2fs 1.42 (29-Nov-2011) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=0 blocks, Stripe width=0 blocks 3276800 inodes, 13107200 blocks 655360 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 400 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424 Allocating group tables: done Writing inode tables: done Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: done Once I’d done that I needed to create a mount point for the volume and I thought the best place was probably a directory under /mnt: $ sudo mkdir /mnt/ebs The final step is to mount the volume: $ sudo mount /dev/xvdf /mnt/ebs And if we run df we can see that it’s ready to go: $ df -h Filesystem Size Used Avail Use% Mounted on /dev/xvda1 7.9G 883M 6.7G 12% / udev 288M 8.0K 288M 1% /dev tmpfs 119M 164K 118M 1% /run none 5.0M 0 5.0M 0% /run/lock none 296M 0 296M 0% /run/shm /dev/xvdf 50G 180M 47G 1% /mnt/ebs
July 31, 2013
by Mark Needham
· 11,914 Views
article thumbnail
OLAP Operation in R
OLAP (Online Analytical Processing) is a very common way to analyze raw transaction data by aggregating along different combinations of dimensions. This is a well-established field in Business Intelligence / Reporting. In this post, I will highlight the key ideas in OLAP operation and illustrate how to do this in R. Facts and Dimensions The core part of OLAP is a so-called "multi-dimensional data model", which contains two types of tables; "Fact" table and "Dimension" table A Fact table contains records each describe an instance of a transaction. Each transaction records contains categorical attributes (which describes contextual aspects of the transaction, such as space, time, user) as well as numeric attributes (called "measures" which describes quantitative aspects of the transaction, such as no of items sold, dollar amount). A Dimension table contain records that further elaborates the contextual attributes, such as user profile data, location details ... etc. In a typical setting of Multi-dimensional model ... Each fact table contains foreign keys that references the primary key of multiple dimension tables. In the most simple form, it is called a STAR schema. Dimension tables can contain foreign keys that references other dimensional tables. This provides a sophisticated detail breakdown of the contextual aspects. This is also called a SNOWFLAKE schema. Also this is not a hard rule, Fact table tends to be independent of other Fact table and usually doesn't contain reference pointer among each other. However, different Fact table usually share the same set of dimension tables. This is also called GALAXY schema. But it is a hard rule that Dimension table NEVER points / references Fact table A simple STAR schema is shown in following diagram. Each dimension can also be hierarchical so that the analysis can be done at different degree of granularity. For example, the time dimension can be broken down into days, weeks, months, quarter and annual; Similarly, location dimension can be broken down into countries, states, cities ... etc. Here we first create a sales fact table that records each sales transaction. # Setup the dimension tables state_table <- data.frame(key=c("CA", "NY", "WA", "ON", "QU"), name=c("California", "new York", "Washington", "Ontario", "Quebec"), country=c("USA", "USA", "USA", "Canada", "Canada")) month_table <- data.frame(key=1:12, desc=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), quarter=c("Q1","Q1","Q1","Q2","Q2","Q2","Q3","Q3","Q3","Q4","Q4","Q4")) prod_table <- data.frame(key=c("Printer", "Tablet", "Laptop"), price=c(225, 570, 1120)) # Function to generate the Sales table gen_sales <- function(no_of_recs) { # Generate transaction data randomly loc <- sample(state_table$key, no_of_recs, replace=T, prob=c(2,2,1,1,1)) time_month <- sample(month_table$key, no_of_recs, replace=T) time_year <- sample(c(2012, 2013), no_of_recs, replace=T) prod <- sample(prod_table$key, no_of_recs, replace=T, prob=c(1, 3, 2)) unit <- sample(c(1,2), no_of_recs, replace=T, prob=c(10, 3)) amount <- unit*prod_table[prod,]$price sales <- data.frame(month=time_month, year=time_year, loc=loc, prod=prod, unit=unit, amount=amount) # Sort the records by time order sales <- sales[order(sales$year, sales$month),] row.names(sales) <- NULL return(sales) } # Now create the sales fact table sales_fact <- gen_sales(500) # Look at a few records head(sales_fact) month year loc prod unit amount 1 1 2012 NY Laptop 1 225 2 1 2012 CA Laptop 2 450 3 1 2012 ON Tablet 2 2240 4 1 2012 NY Tablet 1 1120 5 1 2012 NY Tablet 2 2240 6 1 2012 CA Laptop 1 225 Multi-dimensional Cube Now, we turn this fact table into a hypercube with multiple dimensions. Each cell in the cube represents an aggregate value for a unique combination of each dimension. # Build up a cube revenue_cube <- tapply(sales_fact$amount, sales_fact[,c("prod", "month", "year", "loc")], FUN=function(x){return(sum(x))}) # Showing the cells of the cude revenue_cube , , year = 2012, loc = CA month prod 1 2 3 4 5 6 7 8 9 10 11 12 Laptop 1350 225 900 675 675 NA 675 1350 NA 1575 900 1350 Printer NA 2280 NA NA 1140 570 570 570 NA 570 1710 NA Tablet 2240 4480 12320 3360 2240 4480 3360 3360 5600 2240 2240 3360 , , year = 2013, loc = CA month prod 1 2 3 4 5 6 7 8 9 10 11 12 Laptop 225 225 450 675 225 900 900 450 675 225 675 1125 Printer NA 1140 NA 1140 570 NA NA 570 NA 1140 1710 1710 Tablet 3360 3360 1120 4480 2240 1120 7840 3360 3360 1120 5600 4480 , , year = 2012, loc = NY month prod 1 2 3 4 5 6 7 8 9 10 11 12 Laptop 450 450 NA NA 675 450 675 NA 225 225 NA 450 Printer NA 2280 NA 2850 570 NA NA 1710 1140 NA 570 NA Tablet 3360 13440 2240 2240 2240 5600 5600 3360 4480 3360 4480 3360 , , year = 2013, loc = NY ..... dimnames(revenue_cube) $prod [1] "Laptop" "Printer" "Tablet" $month [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" $year [1] "2012" "2013" $loc [1] "CA" "NY" "ON" "QU" "WA" OLAP Operations Here are some common operations of OLAP Slice Dice Rollup Drilldown Pivot "Slice" is about fixing certain dimensions to analyze the remaining dimensions. For example, we can focus in the sales happening in "2012", "Jan", or we can focus in the sales happening in "2012", "Jan", "Tablet". # Slice # cube data in Jan, 2012 revenue_cube[, "1", "2012",] loc prod CA NY ON QU WA Laptop 1350 450 NA 225 225 Printer NA NA NA 1140 NA Tablet 2240 3360 5600 1120 2240 # cube data in Jan, 2012 revenue_cube["Tablet", "1", "2012",] CA NY ON QU WA 2240 3360 5600 1120 2240 "Dice" is about limited each dimension to a certain range of values, while keeping the number of dimensions the same in the resulting cube. For example, we can focus in sales happening in [Jan/ Feb/Mar, Laptop/Tablet, CA/NY]. revenue_cube[c("Tablet","Laptop"), c("1","2","3"), , c("CA","NY")] , , year = 2012, loc = CA month prod 1 2 3 Tablet 2240 4480 12320 Laptop 1350 225 900 , , year = 2013, loc = CA month prod 1 2 3 Tablet 3360 3360 1120 Laptop 225 225 450 , , year = 2012, loc = NY month prod 1 2 3 Tablet 3360 13440 2240 Laptop 450 450 NA , , year = 2013, loc = NY month prod 1 2 3 Tablet 3360 4480 6720 Laptop 450 NA 225 "Rollup" is about applying an aggregation function to collapse a number of dimensions. For example, we want to focus in the annual revenue for each product and collapse the location dimension (ie: we don't care where we sold our product). apply(revenue_cube, c("year", "prod"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) prod year Laptop Printer Tablet 2012 22275 31350 179200 2013 25200 33060 166880 "Drilldown" is the reverse of "rollup" and applying an aggregation function to a finer level of granularity. For example, we want to focus in the annual and monthly revenue for each product and collapse the location dimension (ie: we don't care where we sold our product). apply(revenue_cube, c("year", "month", "prod"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) , , prod = Laptop month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 2250 2475 1575 1575 2250 1800 1575 1800 900 2250 1350 2475 2013 2250 900 1575 1575 2250 2475 2025 1800 2025 2250 3825 2250 , , prod = Printer month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 1140 5700 570 3990 4560 2850 1140 2850 2850 1710 3420 570 2013 1140 4560 3420 4560 2850 1140 570 3420 1140 3420 3990 2850 , , prod = Tablet month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 14560 23520 17920 12320 10080 14560 13440 15680 25760 12320 11200 7840 2013 8960 11200 10080 7840 14560 10080 29120 15680 15680 8960 12320 22400 "Pivot" is about analyzing the combination of a pair of selected dimensions. For example, we want to analyze the revenue by year and month. Or we want to analyze the revenue by product and location. apply(revenue_cube, c("year", "month"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) month year 1 2 3 4 5 6 7 8 9 10 11 12 2012 17950 31695 20065 17885 16890 19210 16155 20330 29510 16280 15970 10885 2013 12350 16660 15075 13975 19660 13695 31715 20900 18845 14630 20135 27500 apply(revenue_cube, c("prod", "loc"), FUN=function(x) {return(sum(x, na.rm=TRUE))}) loc prod CA NY ON QU WA Laptop 16425 9450 7650 7425 6525 Printer 15390 19950 7980 10830 10260 Tablet 90720 117600 45920 34720 57120 I hope you can get a taste of the richness of data processing model in R. However, since R is doing all the processing in RAM. This requires your data to be small enough so it can fit into the local memory in a single machine.
July 30, 2013
by Ricky Ho
· 17,918 Views · 3 Likes
article thumbnail
JMS vs RabbitMQ
Definition : JMS : Java Message Service is an API that is part of Java EE for sending messages between two or more clients. There are many JMS providers such as OpenMQ (glassfish’s default), HornetQ(Jboss), and ActiveMQ. RabbitMQ: is an open source message broker software which uses the AMQP standard and is written by Erlang. Messaging Model: JMS supports two models: one to one and publish/subscriber. RabbitMQ supports the AMQP model which has 4 models : direct, fanout, topic, headers. Data types: JMS supports 5 different data types but RabbitMQ supports only the binary data type. Workflow strategy: In AMQP, producers send to the exchange then the queue, but in JMS, producers send to the queue or topic directly. Technology compatibility: JMS is specific for java users only, but RabbitMQ supports many technologies. Performance: If you would like to know more about their performance, this benchmark is a good place to start, but look for others as well.
July 30, 2013
by Saeid Siavashi
· 51,707 Views · 16 Likes
article thumbnail
Stepping through LMDB: Making Everything Easier
okay, i know that i have been critical about the lmdb codebase so far. but one thing that i really want to point out for it is that it was pretty easy to actually get things working on windows. it wasn’t smooth, in the sense that i had to muck around with the source a bit (hack endianess, remove a bunch of unix specific header files, etc). but that took less than an hour, and it was pretty much it. since i am by no means an experienced c developer, i consider this a major win. compare that to leveldb, which flat out won’t run on windows no matter how much time i spent trying, and it is a pleasure . also, stepping through the code i am starting to get a sense of how it works that is much different than the one i had when i just read the code. it is like one of those 3d images, you suddenly see something. the first thing that became obvious is that i totally missed the significance of the lock file. lmdb actually create two files: lock.mdb data.mdb lock.mdb is used to synchronized data between different readers. it seems to mostly be there if you want to have multiple writers using different processes. that is a very interesting model for an embedded database, i’ve to admit. not something that i think other embedded databases are offering. in order to do that, it create two named mutexes (one for read and one for write). a side note on windows support: lmdb supports windows, but it is very much a 2nd class citizen. you can see it in things like path not found error turning into a no such process error (because it try to use getlasterror() codes as c codes), or when it doesn’t create a directory even though not creating it would fail. i am currently debugging through the code and fixing such issues as i go along (but no, i am doing heavy handed magic fixes, just to get past this stage to the next one, otherwise i would have sent a pull request). here is one such example. here is the original code: but readfile in win32 will return false if the file is empty, so you actually need to write something like this to make the code work: past that hurdle, i think that i get a lot more about what is going on with the way lmdb works than before. let us start with the way data.mdb works. it is important to note that for pretty much everything in lmdb we use the system page size. by default, that is 4kb. the data file starts with 2 pages allocated. those page contain the following information: looking back at how couchdb did things, i am pretty sure that those two pages are going to be pretty important . i am guess that they would always contain the root of the data in the file. there is also the last transaction on them, which is what i imagine determine how something gets committed. i don’t know yet, as i said, guessing based on how couchdb works. i’ll continue this review in another time. next time, transactions…
July 27, 2013
by Oren Eini
· 9,462 Views
article thumbnail
Using Morphia to Map Java Objects in MongoDB
MongoDB is an open source document-oriented NoSQL database system which stores data as JSON-like documents with dynamic schemas. As it doesn't store data in tables as is done in the usual relational database setup, it doesn't map well to the JPA way of storing data. Morphia is an open source lightweight type-safe library designed to bridge the gap between the MongoDB Java driver and domain objects. It can be an alternative to SpringData if you're not using the Spring Framework to interact with MongoDB. This post will cover the basics of persisting and querying entities along the lines of JPA by using Morphia and a MongoDB database instance. There are four POJOs this example will be using. First we have BaseEntity which is an abstract class containing the Id and Version fields: package com.city81.mongodb.morphia.entity; import org.bson.types.ObjectId; import com.google.code.morphia.annotations.Id; import com.google.code.morphia.annotations.Property; import com.google.code.morphia.annotations.Version; public abstract class BaseEntity { @Id @Property("id") protected ObjectId id; @Version @Property("version") private Long version; public BaseEntity() { super(); } public ObjectId getId() { return id; } public void setId(ObjectId id) { this.id = id; } public Long getVersion() { return version; } public void setVersion(Long version) { this.version = version; } } Whereas JPA would use @Column to rename the attribute, Morphia uses @Property. Another difference is that @Property needs to be on the variable whereas @Column can be on the variable or the get method. The main entity we want to persist is the Customer class: package com.city81.mongodb.morphia.entity; import java.util.List; import com.google.code.morphia.annotations.Embedded; import com.google.code.morphia.annotations.Entity; @Entity public class Customer extends BaseEntity { private String name; private List accounts; @Embedded private Address address; public String getName() { return name; } public void setName(String name) { this.name = name; } public List getAccounts() { return accounts; } public void setAccounts(List accounts) { this.accounts = accounts; } public Address getAddress() { return address; } public void setAddress(Address address) { this.address = address; } } As with JPA, the POJO is annotated with @Entity. The class also shows an example of @Embedded: The Address class is also annotated with @Embedded as shown below: package com.city81.mongodb.morphia.entity; import com.google.code.morphia.annotations.Embedded; @Embedded public class Address { private String number; private String street; private String town; private String postcode; public String getNumber() { return number; } public void setNumber(String number) { this.number = number; } public String getStreet() { return street; } public void setStreet(String street) { this.street = street; } public String getTown() { return town; } public void setTown(String town) { this.town = town; } public String getPostcode() { return postcode; } public void setPostcode(String postcode) { this.postcode = postcode; } } Finally, we have the Account class of which the customer class has a collection of: package com.city81.mongodb.morphia.entity; import com.google.code.morphia.annotations.Entity; @Entity public class Account extends BaseEntity { private String name; public String getName() { return name; } public void setName(String name) { this.name = name; } } The above show only a small subset of what annotations can be applied to domain classes. More can be found at http://code.google.com/p/morphia/wiki/AllAnnotations The Example class shown below goes through the steps involved in connecting to the MongoDB instance, populating the entities, persisting them and then retrieving them: package com.city81.mongodb.morphia; import java.net.UnknownHostException; import java.util.ArrayList; import java.util.List; import com.city81.mongodb.morphia.entity.Account; import com.city81.mongodb.morphia.entity.Address; import com.city81.mongodb.morphia.entity.Customer; import com.google.code.morphia.Datastore; import com.google.code.morphia.Key; import com.google.code.morphia.Morphia; import com.mongodb.Mongo; import com.mongodb.MongoException; /** * A MongoDB and Morphia Example * */ public class Example { public static void main( String[] args ) throws UnknownHostException, MongoException { String dbName = new String("bank"); Mongo mongo = new Mongo(); Morphia morphia = new Morphia(); Datastore datastore = morphia.createDatastore(mongo, dbName); morphia.mapPackage("com.city81.mongodb.morphia.entity"); Address address = new Address(); address.setNumber("81"); address.setStreet("Mongo Street"); address.setTown("City"); address.setPostcode("CT81 1DB"); Account account = new Account(); account.setName("Personal Account"); List accounts = new ArrayList(); accounts.add(account); Customer customer = new Customer(); customer.setAddress(address); customer.setName("Mr Bank Customer"); customer.setAccounts(accounts); Key savedCustomer = datastore.save(customer); System.out.println(savedCustomer.getId()); } Executing the first few lines will result in the creation of a Datastore. This interface will provide the ability to get, delete and save objects in the 'bank' MongoDB instance. The mapPackage method call on the morphia object determines what objects are mapped by that instance of Morphia. In this case all those in the package supplied. Other alternatives exist to map classes, including the method map which takes a single class (this method can be chained as the returning object is the morphia object), or passing a Set of classes to the Morphia constructor. After creating instances of the entities, they can be saved by calling save on the datastore instance and can be found using the primary key via the get method. The output from the Example class would look something like the below: 11-Jul-2012 13:20:06 com.google.code.morphia.logging.MorphiaLoggerFactory chooseLoggerFactory INFO: LoggerImplFactory set to com.google.code.morphia.logging.jdk.JDKLoggerFactory 4ffd6f7662109325c6eea24f Mr Bank Customer There are many other methods on the Datastore interface and they can be found along with the other Javadocs at http://morphia.googlecode.com/svn/site/morphia/apidocs/index.html An alternative to using the Datastore directly is to use the built in DAO support. This can be done by extending the BasicDAO class as shown below for the Customer entity: package com.city81.mongodb.morphia.dao; import com.city81.mongodb.morphia.entity.Customer; import com.google.code.morphia.Morphia; import com.google.code.morphia.dao.BasicDAO; import com.mongodb.Mongo; public class CustomerDAO extends BasicDAO { public CustomerDAO(Morphia morphia, Mongo mongo, String dbName) { super(mongo, morphia, dbName); } } To then make use of this, the Example class can be changed (and enhanced to show a query and a delete): ... CustomerDAO customerDAO = new CustomerDAO(morphia, mongo, dbName); customerDAO.save(customer); Query query = datastore.createQuery(Customer.class); query.and( query.criteria("accounts.name").equal("Personal Account"), query.criteria("address.number").equal("81"), query.criteria("name").contains("Bank") ); QueryResults retrievedCustomers = customerDAO.find(query); for (Customer retrievedCustomer : retrievedCustomers) { System.out.println(retrievedCustomer.getName()); System.out.println(retrievedCustomer.getAddress().getPostcode()); System.out.println(retrievedCustomer.getAccounts().get(0).getName()); customerDAO.delete(retrievedCustomer); } ... With the output from running the above shown below: 11-Jul-2012 13:30:46 com.google.code.morphia.logging.MorphiaLoggerFactory chooseLoggerFactory INFO: LoggerImplFactory set to com.google.code.morphia.logging.jdk.JDKLoggerFactory Mr Bank Customer CT81 1DB Personal Account This post only covers a few brief basics of Morphia but shows how it can help bridge the gap between JPA and NoSQL.
July 25, 2013
by Geraint Jones
· 76,113 Views
article thumbnail
Jersey: Listing all Resources, Paths, Verbs to Build an Entry Point/Index for an API
I’ve been playing around with Jersey over the past couple of days and one thing I wanted to do was create an entry point or index which listed all my resources, the available paths and the verbs they accepted. Guido Simone explained a neat way of finding the paths and verbs for a specific resource using Jersey’sIntrospectionModeller: AbstractResource resource = IntrospectionModeller.createResource(JacksonResource.class); System.out.println("Path is " + resource.getPath().getValue()); String uriPrefix = resource.getPath().getValue(); for (AbstractSubResourceMethod srm :resource.getSubResourceMethods()) { String uri = uriPrefix + "/" + srm.getPath().getValue(); System.out.println(srm.getHttpMethod() + " at the path " + uri + " return " + srm.getReturnType().getName()); } If we run that against j4-minimal‘s JacksonResource class we get the following output: Path is /jackson GET at the path /jackson/{who} return com.g414.j4.minimal.JacksonResource$Greeting GET at the path /jackson/awesome/{who} return javax.ws.rs.core.Response That’s pretty neat but I didn’t want to have to manually list all my resources since I’ve already done that using Guice . I needed a way to programatically get hold of them and I partially found the way to do this from this postwhich suggests using Application.getSingletons(). I actually ended up using Application.getClasses() and I ended up with ResourceListingResource: @Path("/") public class ResourceListingResource { @GET @Produces(MediaType.APPLICATION_JSON) public Response showAll( @Context Application application, @Context HttpServletRequest request) { String basePath = request.getRequestURL().toString(); ObjectNode root = JsonNodeFactory.instance.objectNode(); ArrayNode resources = JsonNodeFactory.instance.arrayNode(); root.put( "resources", resources ); for ( Class aClass : application.getClasses() ) { if ( isAnnotatedResourceClass( aClass ) ) { AbstractResource resource = IntrospectionModeller.createResource( aClass ); ObjectNode resourceNode = JsonNodeFactory.instance.objectNode(); String uriPrefix = resource.getPath().getValue(); for ( AbstractSubResourceMethod srm : resource.getSubResourceMethods() ) { String uri = uriPrefix + "/" + srm.getPath().getValue(); addTo( resourceNode, uri, srm, joinUri(basePath, uri) ); } for ( AbstractResourceMethod srm : resource.getResourceMethods() ) { addTo( resourceNode, uriPrefix, srm, joinUri( basePath, uriPrefix ) ); } resources.add( resourceNode ); } } return Response.ok().entity( root ).build(); } private void addTo( ObjectNode resourceNode, String uriPrefix, AbstractResourceMethod srm, String path ) { if ( resourceNode.get( uriPrefix ) == null ) { ObjectNode inner = JsonNodeFactory.instance.objectNode(); inner.put("path", path); inner.put("verbs", JsonNodeFactory.instance.arrayNode()); resourceNode.put( uriPrefix, inner ); } ((ArrayNode) resourceNode.get( uriPrefix ).get("verbs")).add( srm.getHttpMethod() ); } private boolean isAnnotatedResourceClass( Class rc ) { if ( rc.isAnnotationPresent( Path.class ) ) { return true; } for ( Class i : rc.getInterfaces() ) { if ( i.isAnnotationPresent( Path.class ) ) { return true; } } return false; } } The only change I’ve made from Guido Simone’s solution is that I also call resource.getResourceMethods()because resource.getSubResourceMethods() only returns methods which have a @Path annotation. Since we’ll sometimes define our path at the class level and then define different verbs that operate on that resource it misses some methods out. If we run a cURL command (piped through python to make it look nice) against the root we get the following output: $ curl http://localhost:8080/ -w "\n" 2>/dev/null | python -mjson.tool { "resources": [ { "/bench": { "path": "http://localhost:8080/bench", "verbs": [ "GET", "POST", "PUT", "DELETE" ] } }, { "/sample/{who}": { "path": "http://localhost:8080/sample/{who}", "verbs": [ "GET" ] } }, { "/jackson/awesome/{who}": { "path": "http://localhost:8080/jackson/awesome/{who}", "verbs": [ "GET" ] }, "/jackson/{who}": { "path": "http://localhost:8080/jackson/{who}", "verbs": [ "GET" ] } }, { "/": { "path": "http://localhost:8080/", "verbs": [ "GET" ] } } ] }
July 24, 2013
by Mark Needham
· 20,957 Views · 2 Likes
article thumbnail
Getting started with JPA and Mule
Working with JPA managed entities in Mule applications can be difficult. Since the JPA session is not propagated between message processors, transformers are typically needed to produce an entity from a message’s payload, pass it to a component for processing, then serialize it back to an un-proxied representation for further processing. Transactions have been complicated too. Its difficult to coordinate a transaction between multiple components that are operating with JPA entity payloads. Finally the lack of support for JPA queries makes it difficult to load objects without working with raw SQL and the JDBC transport. Mule Support for JPA Entities The JPA module aims to simplify working with JPA managed entities with Mule. It provides message processors that map to an EntityManager’s methods. The message processors participate in Mule transactions, making it easy to structure JPA transactions within Mule flows. The JPA module also provides a @PersistenceContext implementation. This allows Mule components to participate in JPA transactions. Installing the JPA Module To install the JPA Module you need to click on “Help” followed by “Install New Software…” from Mule Studio. Select the “MuleStudio Cloud Connectors Update Site” from the “Work With” drop-down list then find the “Mule Java Persistence API Module Mule Extension.” This is illustrated below: Fetching JPA Entities JPA query language or criteria queries can be executed using the “query” MP. Supplying a statement to the query will execute the given query and return the results to the next message processor, as illustrated in the following Gist: The queryParameters-ref defines the parameters. In this case the message’s payload as the parameters to the query. The following query illustrates how a Map payload could be used to populate query parameters: The query processor also supports criteria queries by setting the queryParameters-ref to an instance of a CriteriaQuery, as illustrated in the functional test snippet below. CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder(); CriteriaQuery criteriaQuery = criteriaBuilder.createQuery(Dog.class); Root from = criteriaQuery.from(Dog.class); Predicate condition = criteriaBuilder.equal(from.get("name"), "Cujo"); criteriaQuery.where(condition); runFlowWithPayloadAndExpect("testQuery", expectedResults, criteriaQuery); You can use the ”find” MP to load a single object if you know its ID: Transactions and Entity Operations The default behavior of most JPA providers, like Hibernate, is to provide proxies on entity relationships to avoid loading full object graphs into memory. When these objects are detached from the JPA session, however, attempts to access relations in the object will often fail because the proxied session is no longer available. This complicates using JPA is Mule applications as JPA objects pass between message processors and inbetween flows and the session subsequently becomes unavailable. The JPA module allows you to avoid this by wrapping your operations in a transactional block. Let’s first look at how to persist an object then query it within a transaction. The below assumes the message’s payload is an instance of the Dog domain class. Now let’s see how we can use the merge processor to attach a JPA object to a new session. This can be useful when passing a JPA entity from one flow to another. ....other processing here.... Detaching an entity is just as simple: Component Operations with JPA The real power of using JPA with Mule is allowing your business services to participate in Mule managed JPA transactions. A @PersistenceContext EntityManager reference in your component class will cause Mule to inject a reference to a transactional flow’s current EntityManager for that method, as illustrated in the following class: public class DogServiceImpl { @PersistenceContext EntityManager entityManager; public Dog groom(Dog dog) { return entityManager.merge(dog); } } We can now wire the component up in a flow: Conclusion JPA is an important part of the JEE ecosystem and hopefully this module will simplify your use of JPA managed entities in Mule applications.
July 24, 2013
by John D'Emic
· 8,159 Views
article thumbnail
Algorithm of the Week: Spatial Indexing with Quadtrees and Hilbert Curves
some time ago at oredev, after the sessions, there was "birds of a feather" - a sort of mini-unconference. anyone could write up a topic on the whiteboard; interested individuals added their names, and each group got allocated a room to chat about the topic. i joined the "spatial indexing" group, and we spent a fascinating hour and a half talking about spatial indexing methods, reminding me of several interesting algorithms and techniques. spatial indexing is increasingly important as more and more data and applications are geospatially-enabled. efficiently querying geospatial data, however, is a considerable challenge: because the data is two-dimensional (or sometimes, more), you can't use standard indexing techniques to query on position. spatial indexes solve this through a variety of techniques. in this post, we'll cover several - quadtrees , geohashes (not to be confused with geohashing ), and space-filling curves - and reveal how they're all interrelated. quadtrees quadtrees are a very straightforward spatial indexing technique. in a quadtree, each node represents a bounding box covering some part of the space being indexed, with the root node covering the entire area. each node is either a leaf node - in which case it contains one or more indexed points, and no children, or it is an internal node, in which case it has exactly four children, one for each quadrant obtained by dividing the area covered in half along both axes - hence the name. a representation of how a quadtree divides an indexed area. source: wikipedia inserting data into a quadtree is simple: starting at the root, determine which quadrant your point occupies. recurse to that node and repeat, until you find a leaf node. then, add your point to that node's list of points. if the list exceeds some pre-determined maximum number of elements, split the node, and move the points into the correct subnodes. a representation of how a quadtree is structured internally. to query a quadtree, starting at the root, examine each child node, and check if it intersects the area being queried for. if it does, recurse into that child node. whenever you encounter a leaf node, examine each entry to see if it intersects with the query area, and return it if it does. note that a quadtree is very regular - it is, in fact, a trie , since the values of the tree nodes do not depend on the data being inserted. a consequence of this is that we can uniquely number our nodes in a straightforward manner: simply number each quadrant in binary (00 for the top left, 10 for the top right, and so forth), and the number for a node is the concatenation of the quadrant numbers for each of its ancestors, starting at the root. using this system, the bottom right node in the sample image would be numbered 11 01. if we define a maximum depth for our tree, then, we can calculate a point's node number without reference to the tree - simply normalize the node's coordinates to an appropriate integer range (for example, 32 bits each), and then interleave the bits from the x and y coordinates -each pair of bits specifies a quadrant in the hypothetical quadtree. geohashes this system might seem familiar: it's a geohash ! at this point, you can actually throw out the quadtree itself - the node number, or geohash, contains all the information we need about its location in the tree. each leaf node in a full-height tree is a complete geohash, and each internal node is represented by the range from its smallest leaf node to its largest one. thus, you can efficiently locate all the points under any internal node by indexing on the geohash by performing a query for everything within the numeric range covered by the desired node. querying once we've thrown away the tree itself becomes a little more complex. instead of refining our search set recursively, we need to construct a search set ahead of time. first, find the smallest prefix (or quadtree node) that completely covers the query area. in the worst case, this may be substantially larger than the actual query area - for example, a small shape in the center of the indexed area that intersects all four quadrants would require selecting the root node for this step. the aim, now, is to construct a set of prefixes that completely covers the query region, while including as little area outside the region as possible. if we had no other constraints, we could simply select the set of leaf nodes that intersect the query area - but that would result in a lot of queries. another constraint, then, is that we want to minimise the number of distinct ranges we have to query for. one approach to doing this is to start by setting a maximum number of ranges we're willing to have. construct a set of ranges, initially populated with the prefix we identified earlier. pick the node in the set that can be subdivided without exceeding the maximum range count and will remove the most unwanted area from the query region. repeat this until there are no ranges in the set that can be further subdivided. finally, examine the resulting set, and join any adjacent ranges, if possible. the diagram below demonstrates how this works for a query on a circular area with a limit of 5 query ranges. how a query for a region is broken into a series of geohash prefixes/ranges. this approach works well, and it allows us to avoid the need to do recursive lookups - the set of range lookups we do execute can all be done in parallel. since each lookup can be expected to require a disk seek, parallelizing our queries allows us to substantially cut down the time required to return the results. still, we can do better. you may notice that all the areas we need to query in the above diagram are adjacent, yet we can only merge two of them (the two in the bottom right of the selected area) into a single range query, requiring us to do 4 separate queries. this is due in part to the order that our geohashing approach 'visits' subregions, working left to right, then top to bottom in each quad. the discontinuity as we go from top right to bottom left quad results in us having to split up some ranges that we could otherwise make contiguous. if we were to visit regions in a different order, perhaps we could minimise or eliminate these discontinuities, resulting in more areas that can be treated as adjacent and fetched with a single query. with an improvement in efficiency like that, we could do fewer queries for the same area covered, or conversely, the same number of queries, but including less extraneous area. illustrates the order in which the geohashing approach 'visits' each quad. hilbert curves suppose instead, we visit regions in a 'u' shape. within each quad, of course, we also visit subquads in the same 'u' shape, but aligned so as to match up with neighbouring quads. if we organise the orientation of these 'u's correctly, we can completely eliminate any discontinuities, and visit the entire area at whatever resolution we choose continuously, fully exploring each region before moving on to the next. not only does this eliminate discontinuities, but it also improves the overall locality. the pattern we get if we do this may look familiar - it's a hilbert curve. hilbert curves are part of a class of one-dimensional fractals known as space-filling curves , so named because they are one dimensional lines that nevertheless fill all available space in a fixed area. they're fairly well known, in part thanks to xkcd's use of them for a map of the internet . as you can see, they're also of use for spatial indexing, since they exhibit exactly the locality and continuity required. for example, if we take another look at the example we used for finding the set of queries required to encompass a circle above, we find that we can reduce the number of queries by one: the small region in the lower left is now contiguous with the region to its right, and whilst the two regions at the bottom are no longer contiguous with each other, the rightmost one is now contiguous with the large area in the upper right. illustrates the order in which a hilbert curve 'visits' each quad. one thing that our elegant new system is lacking, so far, is a way of converting between a pair of (x,y) coordinates and the corresponding position in the hilbert curve. with geohashing it was easy and obvious - just interleave the x and y coordinates - but there's no obvious way to modify that for a hilbert curve. searching the internet, you're likely to come across many descriptions of how hilbert curves are drawn, but few if any descriptions of how to find the position of an arbitrary point. to figure this out, we need to take a closer look at how the hilbert cure can be recursively constructed. the first thing to observe is that although most references to hilbert curves focus on how to draw the curve, this is a distraction from the essential property of the curve, and its importance to us: it's an ordering for points on a plane. if we express a hilbert curve in terms of this ordering, drawing the curve itself becomes trivial - simply a matter of connecting the dots. forget about how to connect adjacent sub-curves, and instead focus on how we can recursively enumerate the points. hilbert curves are all about ordering a set of points on a 2d plane at the root level, enumerating the points is simple: pick a direction and a start point, and proceed around the four quadrants, numbering them 0 to 3. the difficulty is introduced when we want to determine the order we visit the sub-quadrants in while maintaining the overall adjacency property. examination reveals that each of the sub-quadrants' curves is a simple transformation of the original curve: there are only four possible transformations. naturally, this applies recursively to sub-sub quadrants, and so forth. the curve we use for a given quadrant is determined by the curve we used for the square it's in, and the quadrant's position. with a little work, we can construct a table that encapsulates this: suppose we want to use this table to determine the position of a point on a third-level hilbert curve. for the sake of this example, assume our point has coordinates (5,2) starting with the first square on the diagram, find the quadrant your point is in - in this case, it's the upper right quadrant. the first part of our hilbert curve position, then, is 3 (11 in binary). next, we consult the square shown in the inset of square 3 - in this case, it's the second square. repeat the process: which sub-quadrant does our point fall into? here, it's the lower left one, meaning the next part of our position is 1, and the square we should consult next is the second one again. repeating the process one final time, we find our point falls in the upper right sub-sub-quadrant, our final coordinate is 3 (11 in binary). stringing them together, we now know the position of our point on the curve is 110111 binary, or 55. let's be a little more methodical, and write methods to convert between x,y coordinates and hilbert curve positions. first, we need to express our diagram above in terms a computer can understand: hilbert_map = { 'a': {(0, 0): (0, 'd'), (0, 1): (1, 'a'), (1, 0): (3, 'b'), (1, 1): (2, 'a')}, 'b': {(0, 0): (2, 'b'), (0, 1): (1, 'b'), (1, 0): (3, 'a'), (1, 1): (0, 'c')}, 'c': {(0, 0): (2, 'c'), (0, 1): (3, 'd'), (1, 0): (1, 'c'), (1, 1): (0, 'b')}, 'd': {(0, 0): (0, 'a'), (0, 1): (3, 'c'), (1, 0): (1, 'd'), (1, 1): (2, 'd')}, } in the snippet above, each element of 'hilbert_map' corresponds to one of the four squares in the diagram above. to make things easier to follow, i've identified each one with a letter - 'a' is the first square, 'b' the second, and so forth. the value for each square is a dict, mapping x and y coordinates for the (sub-)quadrant to the position along the line (the first part of the value tuple) and the square to use next (the second part of the value tuple). here's how we can use this to translate x and y coordinates into a hilbert curve position: def point_to_hilbert(x, y, order=16): current_square = 'a' position = 0 for i in range(order - 1, -1, -1): position <<= 2 quad_x = 1 if x & (1 << i) else 0 quad_y = 1 if y & (1 << i) else 0 quad_position, current_square = hilbert_map[current_square][(quad_x, quad_y)] position |= quad_position return position the input to this function is the integer x and y coordinates, and the order of the curve. an order 1 curve fills a 2x2 grid, an order 2 curve fills a 4x4 grid, and so forth. our x and y coordinates, then, should be normalized to a range of 0 to 2order-1. the function works by stepping over each bit of the x and y coordinates, starting with the most significant. for each, it determines which (sub-)quadrant the coordinate lies in, by testing the corresponding bit, then fetches the position along the line and the next square to use from the table we defined earlier. the curve position is set as the least significant 2 bits on the position variable, and at the beginning of the next loop, it's left-shifted to make room for the next set of coordinates. let's check that we've written the function correctly by running our example from above through it: >>> point_to_hilbert(5,2,3)55 presto! for a further test, we can use the function to generate a complete list of ordered points for a hilbert curve, then use a spreadsheet to graph them and see if we get a hilbert curve. enter the following expression into an interactive python interpreter: >>> points =[(x, y)for x in range(8)for y in range(8)]>>> sorted_points = sorted(points, key=lambda k: point_to_hilbert(k[0], k[1],3))>>>print'\n'.join('%s,%s'% x for x in sorted_points) take the resulting text, paste it into a file called 'hilbert.csv', open it in your favorite spreadsheet, and instruct it to generate a scatter plot. the result is, of course, a nicely plotted hilbert curve! the inverse of point_to_hilbert is a straightforward reversal of the hilbert_map; implementing it is left as an exercise for the reader. conclusion there you have it - spatial indexing, from quadtrees to geohashes to hilbert curves. one final observation: if you express the ordered sequence of x,y coordinates required to draw a hilbert curve in binary, do you notice anything interesting about the ordering? does it remind you of anything? just to wrap up, a caveat: all of the indexing methods i've described today are only well-suited to indexing points. if you want to index lines, polylines, or polygons, you're probably out of luck with these methods - and so far as i'm aware, the only known algorithm for effectively indexing shapes is the r-tree , an entirely different and more complex beast.
July 23, 2013
by Nick Johnson
· 43,704 Views
article thumbnail
Log4j 2: Performance close to insane
Recently a respected member of the Apache community tried Log4j 2 and wrote on Twitter: (Quote from Mark Struberg: @TheASF #log4j2 rocks big times! Performance is close to insane ^^ http://logging.apache.org/log4j/2.x/ ) It happened shortly after Remko Popma contributed something which is now called the “AsyncLoggers”. Some of you might know Log4j 2 has AsyncAppenders already. They are similar like the ones you can find in Log4j 1 and other logging frameworks. I am honest: I wasn’t so excited about the new feature until I read the tweet on its performance and became curious. Clearly Java logging has many goals. Among them: logging must be as fast as hell. Nobody wants his logging framework to become a bottleneck. Of course you’ll always have a cost when logging. There is some operation the CPU must perform. Something is happening, even when you decide NOT to write a log statement. Logging is expected to be invisible. Until now, the well-known logging frameworks were similar in speed. Benchmarks are unreliable after all. We have made some benchmarks over at Apache Logging. Sometimes one logging frameworks wins, sometimes the other. But at the end of the day you can say they are all very good and you can choose whatever your liking is. Until we got Remko’s contribution and Log4j 2 became “insanely fast”. Small software projects running one thread might not care about performance so much. When running a SaaS you simply don’t know when your app gets so much attraction that you need to scale. Then you suddenly need some extra power. With Log4j 2, running 64 threads might bring you twelve times more logging throughput than with comparable frameworks. We speak of more than 18,000,000 messages per second, while others do around 1,500,000 or less in the same environment. I saw the chart, but simply couldn’t believe it. There must be something wrong. I rechecked. I ran the tests myself. It’s like that: Log4j 2 is insanely fast. Async Performance, last read on July 19, 2013 As of now, we have a logging framework which performs lots better than every other logging framework out there. As of now we need to justify our decision when we do not want to use Log4j 2, if speed matters.Everything else than Log4j 2 can become a bottleneck and a risk. With such a fast logging framework you might even consider to log a bit more in production than you did before. Eventually I wrote Remko an e-mail and asked him what exactly the difference between the old AsyncAppenders and the new Asynchronous Loggers is. The difference between old AsynAppenders and new AsyncLoggers “The Asynchronous Loggers do two things differently than the AsyncAppender”, he told me, “they try to do the minimum amount of work before handing off the log message to another thread, and they use a different mechanism to pass information between the producer and consumer threads. AsyncAppender uses an ArrayBlockingQueue to hand off the messages to the thread that writes to disk, and Asynchronous Loggers use the LMAX Disruptor library. Especially the Disruptor has made a large performance difference.” In other terms, the AsyncAppender use a first-in-first-out Queue to work through messages. But the Async Logger uses something new – the Disruptor. To be honest, I had never heard of it. And furthermore, I never thought much about scaling my logging framework. When somebody said “scale the system”, I thought about the database, the app server and much more, but usually not logging. In production, logging was off. End of story. But Remko thinks about scaling when it comes to logging. “Looking at the performance test results for the Asynchronous Loggers, the first thing you notice is that some ways of logging scale much better than others. By scaling better I mean that you get more throughput when you add more threads. If your throughput increases a constant amount with every thread you add, you have linear scalability. This is very desirable but can be difficult to achieve.”, he wrote me. “Comparing synchronous to asynchronous, you would expect any asynchronous mechanism to scale much better than synchronous logging because you don’t do the I/O in the producing thread any more, and we all know that ‘I/O is slow’ (and I’ll get back to this in a bit)”. Yes, exactly my understanding. I thought it would be enough to send something to a queue, and something else would pick it up and write the message. The app would go on. This is exactly what the old AsyncAppender does, wrote Remko: “With AsyncAppender, all your application thread needs to do is create a LogEvent object and put it on the ArrayBlockingQueue; the consuming thread will then take these events off the queue and do all the time-consuming work. That is, the work of turning the event into bytes and writing these bytes to the I/O device. Since the application threads do not need to do the I/O, you would expect this to scale better, meaning adding threads will allow you to log more events.” If you believed that like me, take a seat and a deep breath. We were wrong. “What may surprise you is that this is not the case.”, he wrote. “If you look at the performance numbers for the AsyncAppenders of all logging frameworks, you’ll see that every time you double the number of threads, your throughput per thread roughly halves.” “So your total throughput remains more or less flat! AsyncAppenders are faster than synchronous logging, but they are similar in the sense that neither of them gives you more total throughput when you add more threads.”, he told me. It hit me like a hammer. Basically instead of making your logging faster with adding more threads you made basically: nothing. After all Appenders didn’t scale until now. I asked Remko why this was the case. “It turns out that queues are not the most optimal data structure to pass information between threads. The concurrent queues that are part of the standard Java libraries use locks to make sure that values don’t get corrupted and to ensure data visibility between threads.”. LMAX Disruptor? “The LMAX team did a lot of research on this and found that these queues have a lot of lock contention. An interesting thing they found is that queues are always either full or empty: If your producer is faster, your queue will be full most of the time (and that may be a problem in itself ). If your consumer is fast enough, your queue will be empty most of the time. Either way, you will have contention on the head or on the tail of the queue, where both the producer and the consumer thread want to update the same field. To resolve this, the LMAX team came up with the Disruptor library, which is a lock-free data structure for passing messages between threads. Here is a performance comparison between the Disruptor and ArrayBlockingQueue:Performance Comparison.” Wow. After all these years of Java programming I actually felt a bit like a Junior programmer again. I missed the LMAX disruptor and even never considered it a performance problem to use the Queue. I wonder what other performance problems I did not discover so far. I realized, I had to re-learn Java. I asked Remko how he could find a library like the LMAX disruptor. I mean nobody writes software, creates an instance of a Queue-class, doubts its performance and finally searches the internet for “something better”. Or are there really people of that kind? “How I found about the Disruptor? The short answer is, it was all a mistake.”, he started. “Okay, perhaps that was a bit too short, so here is the longer answer: a colleague of mine wrote a small logger, essentially adding a time-stamped log message to a queue, with a background thread that took these strings off the queue and wrote them to disk. He did this because he needed better performance than what he could get with log4j-1.x. I did some testing and found it was faster, I don’t remember exactly by how much. I was quite surprised because I had been using log4j for years and had never thought it would be easily outperformed. Until then I had assumed that the well-known libraries would be fast, because, well… To be honest, I had just assumed. So this was a bit of an eye-opener for me. However, the custom logger was a bit bare-bones in terms of functionality so I started to look around for alternatives.” “Before I start talking about the Disruptor, I have to confess something. I recently went back to see how much faster the custom logger was than log4j-1.x, but when I measured it it was actually slower! It turned out that I had been comparing the custom logger to an old beta of log4j-2.0, I think beta3 or beta4. AsyncAppender in those betas still had a performance issue (LOG4J2-153 if you’re curious). If I had compared the custom logger to the AsyncAppender in log4j-1.x, I would have found that log4j-1.x was faster and I would not have thought about it further. But because I made this mistake I started to look for other high-performance logging libraries that were richer in functionality. I did not find such a logging library, but I ran into a whole bunch of other interesting stuff, including the Disruptor. Eventually I decided to try to combine Log4j-2, which has a very nice code base, with the Disruptor. The result of this was eventually accepted into Log4j-2 itself, and the rest, as they say, was history.” “One thing I came across that I should mention here is Peter Lawrey’sChronicle library. Chronicle uses memory-mapped files to write tens of millions of messages per second to disk with very low latency. Remember that above I said that “we all know that I/O is slow”? Chronicle shows that synchronous I/O can be very, very fast.“. “It was via Peter’s work that I came across the Disruptor. There is a lot of good material out there about the Disruptor. Just to give you a few pointers: Martin Fowler: LMAX Trisha Lee on LMAX under the hood (slightly outdated now but the most detailed material I know of) …and video presentations like this The Disruptor google group is also highly recommended. Recommended readings on Java performance in general are: Martin Thompson’s “Mechanical Sympathy” Martin Thompson Presentations. Martin Thompson has done a number of articles and presentations on various aspects of high performance computing in java. He does a great job of making the complex stuff that is going on under the hood accessible.” My bookmarks folder went full after reading this e-mail, and I appreciate the lots of starting points for improving my knowledge on Java performance. Should I use AsyncLoggers by default? I was sure I want to use the new Async Loggers. This all sounds just fantastic. But on the other hand, I am a bit scared and even a little paranoid to include new dependencies or new technologies like the new Log4j 2 Async Loggers. I asked Remko if he would use the new feature by default or if he would enable them just for a few, limited use cases. “I use Async Loggers by default, yes.”, he wrote me. “One use case when you would _not_ want to use asynchronous logging is when you use logging for audit purposes. In that case a logging error is a problem that your application needs to know about and deal with. I believe that most applications are different, in that they don’t care too much about logging errors. Most applications don’t want to stop if a logging exception occurs, in fact, they don’t even want to know about it. By default, appenders in Log4j-2.0 will suppress exceptions so the application doesn’t need to try/catch every log statement. If that is your usage, then you will not lose anything by using asynchronous loggers, so you get only the benefits, which is improved performance.” “One nice little detail I should mention is that both Async Loggers and Async Appenders fix something that has always bothered me in Log4j-1.x, which is that they will flush the buffer after logging the last event in the queue. With Log4j-1.x, if you used buffered I/O, you often could not see the last few log events, as they were still stuck in the memory buffer. Your only option was setting immediateFlush to true, which forces disk I/O on every single log event and has a performance impact. With Async Loggers and Appenders in Log4j-2.0 your log statements are all flushed to disk, so they are always visible, but this happens in a very efficient manner.” Isn’t it risky to log to use Log4js AsyncLoggers? But considering that Log4j-1 had serious threading issues and the modern world uses cloud computing and clustering all the time to scale their apps,isn’t asynchronous logging some kind of additional risk? Or is it safe? I knew my questions would sound like the questions of a decision maker, not of an developer. But the whole LMAX thing was so new to me and since I maintain the old and really ugly Log4j 1 code, I simply had to ask. Remko: “There are a number of questions in there. First, is Log4j-2 safer from a concurrency perspective than Log4j-1.x? I believe so. The Log4j-2 team has put in considerable effort to support multi-threaded applications, and the asynchronous loggers are just a very recent and relatively small addition to the project. Log4j-2 uses more granular locking than log4j-1.x, and is architecturally simpler, which should result in fewer issues, and any issues that do come up will be easier to fix.” “On the other hand, Log4j-2 is still in beta and is under active development, although recently I think most effort is being spent on fixing things and tying up loose ends rather than adding new features. I believe it is stable enough for production use. If you are considering using Log4j-2, for performance or other reasons, I’d suggest you do your due diligence and test, just like you would before adopting any other 3rd party library in your project.” (Sidenote: A stable version of Log4j2 can be expected soon, most likely autumn 2013). Sounded good to me. And yes, I can perfectly agree with that from my own observations on the project, though I personally did not write code in the Log4j 2 repository. “The other question I see is: Is asynchronous logging riskier than synchronous logging? I don’t think so, in fact, if your application is multi threaded the opposite may be the case: once the log event has been handed off to the consumer thread that does the I/O, there is only that one thread dealing with the layouts, appenders and all the other logging-related components. So after the hand-off you’re single-threaded and you don’t need to worry about any threading issues like deadlock and liveliness etc any more.” “You can take this one step further and make your business logic completely single-threaded, using the disruptor for all I/O or communication with external systems. Single-threaded business logic without lock contention can be blazingly fast. The results at LMAX (6 million transactions/sec, with less than 10 ms latency) speak for themselves.” Reading Remko’s message I learned three things. First, I had to learn more about Java performance. Second, I definitely want to make my applications use Log4j 2. As first step, I will enable it in my Struts 2 apps, which I use often. Third, a web application framework using the LMAX Disruptor might blow us all away. I would like to give a big thank you and a hug to Remko Popma for answering my questions and working on this blog post with me. All errors are my own.
July 20, 2013
by Christian Grobmeier
· 7,444 Views · 1 Like
article thumbnail
Playing with NHibernate - Inverse and Cascade Mapping Attributes
I have to admit that NHibernate provides a really flexible way of handling class inheritance and parent-child relationship.
July 12, 2013
by Mariano Vazquez
· 36,376 Views
article thumbnail
JAX RS: Streaming a Response using StreamingOutput
A couple of weeks ago Jim and I were building out a neo4j unmanaged extension from which we wanted to return the results of a traversal which had a lot of paths. Our code initially looked a bit like this: package com.markandjim @Path("/subgraph") public class ExtractSubGraphResource { private final GraphDatabaseService database; public ExtractSubGraphResource(@Context GraphDatabaseService database) { this.database = database; } @GET @Produces(MediaType.TEXT_PLAIN) @Path("/{nodeId}/{depth}") public Response hello(@PathParam("nodeId") long nodeId, @PathParam("depth") int depth) { Node node = database.getNodeById(nodeId); final Traverser paths = Traversal.description() .depthFirst() .relationships(DynamicRelationshipType.withName("whatever")) .evaluator( Evaluators.toDepth(depth) ) .traverse(node); StringBuilder allThePaths = new StringBuilder(); for (org.neo4j.graphdb.Path path : paths) { allThePaths.append(path.toString() + "\n"); } return Response.ok(allThePaths.toString()).build(); } } We then compiled that into a JAR, placed it in ‘plugins’ and added the following line to ‘conf/neo4j-server.properties’: org.neo4j.server.thirdparty_jaxrs_classes=com.markandjim=/unmanaged After we’d restarted the neo4j server we were able to call this end point using cURL like so: $ curl -v http://localhost:7474/unmanaged/subgraph/1000/10 This approach works quite well but Jim pointed out that it was quite inefficient to load all those paths up into memory so we thought it would be quite cool if we could stream it as we got to each path. Traverser wraps an iterator so we are lazily evaluating the result set in any case. After a bit of searching we came StreamingOutput which is exactly what we need. We adapted our code to use that instead: package com.markandjim @Path("/subgraph") public class ExtractSubGraphResource { private final GraphDatabaseService database; public ExtractSubGraphResource(@Context GraphDatabaseService database) { this.database = database; } @GET @Produces(MediaType.TEXT_PLAIN) @Path("/{nodeId}/{depth}") public Response hello(@PathParam("nodeId") long nodeId, @PathParam("depth") int depth) { Node node = database.getNodeById(nodeId); final Traverser paths = Traversal.description() .depthFirst() .relationships(DynamicRelationshipType.withName("whatever")) .evaluator( Evaluators.toDepth(depth) ) .traverse(node); StreamingOutput stream = new StreamingOutput() { @Override public void write(OutputStream os) throws IOException, WebApplicationException { Writer writer = new BufferedWriter(new OutputStreamWriter(os)); for (org.neo4j.graphdb.Path path : paths) { writer.write(path.toString() + "\n"); } writer.flush(); } }; return Response.ok(stream).build(); } As far as I can tell the only discernible difference between the two approaches is that you get an almost immediate response from the streamed approached whereas the first approach has to put everything in the StringBuilder first. Both approaches make use of chunked transfer encoding which according to tcpdump seems to have a maximum packet size of 16332 bytes: 00:10:27.361521 IP localhost.7474 > localhost.55473: Flags [.], seq 6098196:6114528, ack 179, win 9175, options [nop,nop,TS val 784819663 ecr 784819662], length 16332 00:10:27.362278 IP localhost.7474 > localhost.55473: Flags [.], seq 6147374:6163706, ack 179, win 9175, options [nop,nop,TS val 784819663 ecr 784819663], length 16332
July 10, 2013
by Mark Needham
· 114,187 Views · 4 Likes
article thumbnail
Changing MySQL Binary Log Files Location to Another Directory
What is the Default? Usually in most installations, binary log files are located in the MySQL default directory (/var/lib/mysql) just next to the data files. Why Should I Move the Binary Logs to Another Directory? Each data modification (INSERT, UPDATE, DELETE...) and data definition (ALTER, ADD, DROP...) statement that you perform in your server are recorded in the Log files. Therefore, each time you make any of these statements, you actually update both your data files and your log files. The result is high IO utilization that is focused on a specific disk area. A common recommendation in the database field is to separate these files to two different disks in order to get a better performance. How to Perform it? Change the log-bin variable in the my.cnf to log-bin=/path/to/new/directory/mysql-bin Purge as many files as you can (PURGE BINLOG...) in order to minimize the number of moved files (see stop 4). Stop the master (service mysql stop). Move the files to the new directory: mv /var/lib/mysql/mysql-bin.* /path/to/new/directory Start the master again (service mysql start). Bottom Line Few steps and your server is ready for more traffic and data. Keep Performing, Moshe Kaplan
June 30, 2013
by Moshe Kaplan
· 49,157 Views
article thumbnail
Linksheet: Apache Cassandra
Curator's note: This collection of resources collected by Tim Spann will help you get started with Apache Cassandra, the open source distributed database system originally developed at Facebook. http://hector-client.github.io/hector/build/html/index.html https://www.openshift.com/blogs/cassandra-on-openshift https://github.com/shekhargulati/cassandra-openshift-quickstart-java http://whyjava.wordpress.com/2012/02/23/logging-jaxws-soap-request-and-response-using-a-java-property/ https://dzone.com/articles/creating-your-first-java https://github.com/boneill42/naughty-or-nice http://www.slideshare.net/DataStax/college-credit-creating-your-first-app-in-java-with-cassandra
June 30, 2013
by Tim Spann DZone Core CORE
· 5,276 Views · 1 Like
article thumbnail
Integration of Amazon Redshift Data Warehouse with Talend Data Integration
In this blog post, I will show you how to "ETL" all kinds of data to Amazon’s cloud data warehouse Redshift wit Talend’s big data components. Let’s begin with a short introduction to Amazon Redshift (copied from website): "Amazon Redshift is [part of Amazon Web Services (AWS) and] a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. With a few clicks in the AWS Management Console, customers can launch a Redshift cluster, starting with a few hundred gigabytes and scaling to a petabyte or more, for under $1,000 per terabyte per year. Traditional data warehouses require significant time and resource to administer, especially for large datasets. In addition, the financial cost associated with building, maintaining, and growing self-managed, on-premise data warehouses is very high. Amazon Redshift not only significantly lowers the cost of a data warehouse, but also makes it easy to analyze large amounts of data very quickly.“ Sounds interesting! And indeed, we already see companies using Talend’s Redshift connectors. From Talend perspective it is not much more than just another database. If you have ever used a Talend connector, you can integrate to Redshift within some minutes. In the next sections, I will describe all necessary steps and give some hints regarding configuration issues and performance improvements. Be aware: You need Talend Open Studio for Data Integration (Apache License, open source) or any Talend Enterprise Edition / Platform which contains the Cloud components to see and use Amazon Redshift connectors. The open source edition offers all connectors and functionality to integrate with Amazon Redshift. However, Enterprise versions offer some more features (e.g. versioning), comfort (e.g. wizards) and commercial support. Setup Amazon Redshift Setup of Amazon Redshift is very easy. Just follow Amazon‘s getting started guide: http://docs.aws.amazon.com/redshift/latest/gsg/welcome.html. Like every other AWS guide, it is very easy to understand and use. Be aware, that you just have to do step 1, 2 and 3 of the getting started guide for using it with Talend. Some hints: - Step 1 („before you begin“): Just sign up. Client tools and drivers are not necessary because they are already installed within Talend Studio. - Step 2 („launch a cluster“): Yes, please start your cluster! - Step 3(„authorize access“): If you are not sure what to do here, select Connection Type = CIDR/IP. Find out your IP address (http://whatismyipaddress.com) and enter it with „/32“ at the end. Example: „192.168.1.1/32“ Now you can connect to Amazon Redshift from your Talend Studio on your local computer. Step 4 (connect) and step 5 (create table, data, queries) are not necessary, this will be done from Talend Studio. Of course, you should not forget to delete your cluster (step 7) when you are done. Otherwise, you will pay for every hour, even if you do not access your DWH. Connect to Amazon Redshift from Talend Studio Create a new connection to Amazon Redshift database as you do with every other relational database. The easiest way is to use „DB Connection Wizard“ in metadata. Just enter your connection information and check if it works. You get all information about configuration from Amazon Web Console. The connection string looks something like this: „jdbc:paraccel://talend-demo-cluster.cp8t6c5.eu-west-1.redshift.amazonaws.com:5439/dev“ Next, right click on the created connection and select „retrieve schema“. „public“ is the default schema which you (have to) use. Now, you are ready to use this connection within Talend Jobs to write to Amazon Redshift and read from it. Create Talend Jobs (Write, Read, Delete) Amazon Redshift components work like any other Talend (relational) database components. Look at www.help.talend.com for more information if you have not used them before (or just try them out, they are very self-explanatory). You just have to drag&drop your connection from metadata . Afterwards, you can easily write data (tRedShiftOutput), read data (tRedshiftInput), or do any other queries such as delete or copy (tRedShiftRow). In the following job, I start with deleting all content in the Amazon Redshift table. Then, I read data from a MySQL table and insert it into an Amazon Redshift table. The table is created automatically (as I have configured it this way). After this subjob is finished, I read the data again, and store it to a CSV file (which is also created automatically). Of course, this is no business use case, but it shows how to use different Amazon Redshift components. Query Data from Amazon Redshift You can connect to Amazon Redshift directly from Talend Studio to explore and query data of the DWH. Thus, no other database tool is required. Just right click on your Amazon Redshift connection in metadata and select „edit queries“. Here you can define, execute and save SQL queries. Improve Performance Write performance of Amazon Redshift is relatively low compared to „classical“ relational databases (in your data center) as you have to upload all data into the cloud. Different alternatives exist to improve performance: - Bulk inserts: „Extended insert“ (in advanced settings) improves performance a lot, but still not to hyperspeed… Also, as it is bulk, you can just do inserts! It is not compatible to „rejects“ or „updates“ - AWS S3 and COPY command: S3 is Amazon’s „simple storage service“, a key-value store – also called NoSQL today – for storing very large objects. You can use Amazon Redshift’s COPY command (http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) to transfer data from S3 to Amazon Redshift with good performance. Though, you still have to copy data to S3 before, same „cloud problem“ here. The COPY command can be used with tRedshiftRow, so no problem at all from Talend perspective. To transfer data to S3, you can either use the Talend S3 components from Talendforge, Talend’s open source community (http://www.talendforge.org/exchange), or use camel-s3, an Apache Camel component which is included in Talend ESB. The latter is an option, if you use Talend Data Services which combines Talend DI and Talend ESB in its unified platform. Summary You need not be a cloud or DWH expert, or an expert developer to integrate with Amazon’s cloud data warehouse Redshift. It is very easy with Talend’s integration solutions. Just drag&drop, configure, do some graphical mappings / transformations (if necessary), that’s it. Code is generated. Job runs. You can integrate Amazon Redshift almost as simple as any other relational database. Just be aware of some cloud specific security and performance issues. With Talend, you can easily „ETL“ all data from different sources to Redshift and store it there for under $1,000 per terabyte per year – even with the open source version! Best regards, Kai Wähner (Contact and feedback via @KaiWaehner, www.kai-waehner.de, LinkedIn / Xing) This is content from my blog: http://www.kai-waehner.de/blog/2013/06/26/integration-of-amazon-redshift-cloud-data-warehouse-aws-saas-dwh-with-talend-data-integration-di-big-data-bd-enterprise-service-bus-esb/
June 27, 2013
by Kai Wähner DZone Core CORE
· 20,449 Views · 1 Like
  • Previous
  • ...
  • 503
  • 504
  • 505
  • 506
  • 507
  • 508
  • 509
  • 510
  • 511
  • 512
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×