DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
Hibernate 3 with Spring
1. Overview This article will focus on setting up Hibernate 3 with Spring – we’ll look at how to configure Spring 3 with Hibernate 3 using both Java and XML Configuration. 2. Maven To add the Spring Persistence dependencies to the pom, please see the Spring with Maven article. Continuing with Hibernate 3, the Maven dependencies are simple: org.hibernate hibernate-core 3.6.10.Final Then, to enable Hibernate to use its proxy model, we need javassist as well: org.javassist javassist 3.17.1-GA And since we’re going to use MySQL for this tutorial, we’ll also need: mysql mysql-connector-java 5.1.25 runtime 3. Java Spring Configuration for Hibernate 3 Setting up Hibernate 3 with Spring and Java configuration is straightforward: import java.util.Properties; import javax.sql.DataSource; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.ComponentScan; import org.springframework.context.annotation.Configuration; import org.springframework.context.annotation.PropertySource; import org.springframework.core.env.Environment; import org.springframework.dao.annotation.PersistenceExceptionTranslationPostProcessor; import org.springframework.jdbc.datasource.DriverManagerDataSource; import org.springframework.orm.hibernate3.HibernateTransactionManager; import org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean; import org.springframework.transaction.annotation.EnableTransactionManagement; import com.google.common.base.Preconditions; @Configuration @EnableTransactionManagement @PropertySource({ "classpath:persistence-mysql.properties" }) @ComponentScan({ "org.baeldung.spring.persistence" }) public class PersistenceConfig { @Autowired private Environment env; @Bean public AnnotationSessionFactoryBean sessionFactory() { AnnotationSessionFactoryBean sessionFactory = new AnnotationSessionFactoryBean(); sessionFactory.setDataSource(restDataSource()); sessionFactory.setPackagesToScan(new String[] { "org.baeldung.spring.persistence.model" }); sessionFactory.setHibernateProperties(hibernateProperties()); return sessionFactory; } @Bean public DataSource restDataSource() { DriverManagerDataSource dataSource = new DriverManagerDataSource(); dataSource.setDriverClassName(env.getProperty("jdbc.driverClassName")); dataSource.setUrl(env.getProperty("jdbc.url")); dataSource.setUsername(env.getProperty("jdbc.user")); dataSource.setPassword(env.getProperty("jdbc.pass")); return dataSource; } @Bean public HibernateTransactionManager transactionManager() { HibernateTransactionManager txManager = new HibernateTransactionManager(); txManager.setSessionFactory(sessionFactory().getObject()); return txManager; } @Bean public PersistenceExceptionTranslationPostProcessor exceptionTranslation() { return new PersistenceExceptionTranslationPostProcessor(); } Properties hibernateProperties() { return new Properties() { { setProperty("hibernate.hbm2ddl.auto", env.getProperty("hibernate.hbm2ddl.auto")); setProperty("hibernate.dialect", env.getProperty("hibernate.dialect")); } }; } } Compared to the XML Configuration – described next – there is a small difference in the way one bean in the configuration access another. In XML there is no difference between pointing to a bean or pointing to a bean factory capable of creating that bean. Since the Java configuration is type-safe – pointing directly to the bean factory is no longer an option – we need to retrieve the bean from the bean factory manually: txManager.setSessionFactory(sessionFactory().getObject()); 4. XML Spring Configuration for Hibernate 3 Simillary, Hibernate 3 can be configured using XML Configuration as well: ${hibernate.hbm2ddl.auto} ${hibernate.dialect} Then, this XML file is boostrapped into the Spring context: @Configuration @EnableTransactionManagement @ImportResource({ "classpath:persistenceConfig.xml" }) public class PersistenceXmlConfig { // } For both types of configuration, the JDBC and Hibernate specific properties are stored in a properties file: # jdbc.X jdbc.driverClassName=com.mysql.jdbc.Driver jdbc.url=jdbc:mysql://localhost:3306/spring_hibernate_dev?createDatabaseIfNotExist=true jdbc.user=tutorialuser jdbc.pass=tutorialmy5ql # hibernate.X hibernate.dialect=org.hibernate.dialect.MySQL5Dialect hibernate.show_sql=false hibernate.hbm2ddl.auto=create-drop 5. Spring, Hibernate and MySQL The example above uses MySQL 5 as the underlying database configured with Hibernate – however, Hibernate supports several underlying SQL Databases. 5.1. The Driver The Driver class name is configured via the jdbc.driverClassName property provided to the DataSource. In the example above, it is set to com.mysql.jdbc.Driver from the mysql-connector-java dependency we defined in the pom, at the start of the article. 5.2. The Dialect The Dialect is configured via the hibernate.dialect property provided to the Hibernate SessionFactory. In the example above, this is set to org.hibernate.dialect.MySQL5Dialect as we are using MySQL 5 as the underlying Database. There are several other dialects supporting MySQL: org.hibernate.dialect.MySQL5InnoDBDialect – for MySQL 5.x with the InnoDB storage engine org.hibernate.dialect.MySQLDialect – for MySQL prior to 5.x org.hibernate.dialect.MySQLInnoDBDialect – for MySQL prior to 5.x with the InnoDB storage engine org.hibernate.dialect.MySQLMyISAMDialect – for all MySQL versions with the ISAM storage engine Hibernate supports SQL Dialects for every supported Database. 6. Usage At this point, Hibernate 3 is fully configured with Spring and we can inject the raw HibernateSessionFactory directly whenever we need to: public abstract class FooHibernateDAO{ @Autowired SessionFactory sessionFactory; ... protected Session getCurrentSession(){ return sessionFactory.getCurrentSession(); } } 7. Conclusion In this example, we configured Hiberate 3 with Spring – both with Java and XML configuration. The implementation of this simple project can be found in the github project – this is an Eclipse based project, so it should be easy to import and run as it is.
May 8, 2013
by Eugen Paraschiv
· 12,962 Views · 1 Like
article thumbnail
Neo4j/Cypher: Returning a Row with Zero Count When No Relationship Exists
I’ve been trying to see if I can match some of the football stats that OptaJoe posts on twitter and one that I was looking at yesterday was around the number of red cards different teams have received. 1 – Sunderland have picked up their first PL red card of the season. The only team without one now are Man Utd. Angels. To refresh this is the sub graph that we’ll need to look at to work it out: I started off with the following query which traverses out from each match, finds the players who were sent off in the match and then groups the sendings off by the team they were playing for: START game = node:matches('match_id:*') MATCH game<-[:sent_off_in]-player-[:played]->likeThis-[:in]->game, likeThis-[:for]->team RETURN team.name, COUNT(game) AS redCards ORDER BY redCards LIMIT 5 When we run this we get the following results: +------------------------------+ | team.name | redCards | +------------------------------+ | "Sunderland" | 1 | | "West Ham United" | 1 | | "Norwich City" | 1 | | "Reading" | 1 | | "Liverpool" | 2 | +------------------------------+ 5 rows The problem we have here is that it hasn’t returned Manchester United because they haven’t yet received any red cards and therefore none of their players match the ‘sent_off_in’ relationship. I ran into something similar in a post I wrote about a month ago where I was working out which day of the week players scored on. The first step towards getting Manchester United to return with a count of 0 is to make the ‘sent_off_in’ relationship optional. However, that on its own that isn’t enough because it now returns a count of all the player performances for each team: START game = node:matches('match_id:*') MATCH game<-[?:sent_off_in]-player-[:played]->likeThis-[:in]->game, likeThis-[:for]->team RETURN team.name, COUNT(game) AS redCards ORDER BY redCards ASC LIMIT 5 +-----------------------------+ | team.name | redCards | +-----------------------------+ | "Chelsea" | 448 | | "Wigan Athletic" | 459 | | "Fulham" | 460 | | "Liverpool" | 466 | | "Everton" | 467 | +-----------------------------+ 5 rows Instead what we need to do is collect up all the ‘sent_off_in’ relationships and sum them up. We can use the COLLECT function to do that and the neat thing about COLLECT is that it doesn’t bother collecting the empty relationships so we end up with exactly what we need: START game = node:matches('match_id:*') MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game, likeThis-[:for]->team RETURN team.name, COLLECT(r) AS redCards LIMIT 5 +-----------------------------------------------------------------------------------------------------+ | team.name | redCards | +-----------------------------------------------------------------------------------------------------+ | "Wigan Athletic" | [:sent_off_in[26443] {},:sent_off_in[37785] {}] | | "Everton" | [:sent_off_in[6795] {minute:61},:sent_off_in[21735] {},:sent_off_in[34594] {}] | | "Newcastle United" | [:sent_off_in[434] {minute:75},:sent_off_in[32389] {},:sent_off_in[34915] {}] | | "Southampton" | [:sent_off_in[49393] {minute:70},:sent_off_in[49392] {minute:82}] | | "West Ham United" | [:sent_off_in[21734] {minute:67}] | +-----------------------------------------------------------------------------------------------------+ 5 rows We then just need to call the LENGTH function to work out how many red cards there are in each collection and then we’re done: START game = node:matches('match_id:*') MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game, likeThis-[:for]->team RETURN team.name, LENGTH(COLLECT(r)) AS redCards ORDER BY redCards LIMIT 5 +--------------------------------+ | team.name | redCards | +--------------------------------+ | "Manchester United" | 0 | | "West Ham United" | 1 | | "Sunderland" | 1 | | "Norwich City" | 1 | | "Reading" | 1 | +--------------------------------+ 5 rows
April 30, 2013
by Mark Needham
· 5,860 Views
article thumbnail
Multipart Upload on S3 with jclouds
1. Goal In the previous article, we looked at how we can use the generic Blob APIs from jclouds to upload content to S3. In this article we will use the S3 specific asynchronous API from jclouds to upload content and leverage the multipart upload functionality provided by S3. 2. Preparation 2.1. Set up the custom API The first part of the upload process is creating the jclouds API – this is a custom API for Amazon S3: public AWSS3AsyncClient s3AsyncClient() { String identity = ... String credentials = ... BlobStoreContext context = ContextBuilder.newBuilder("aws-s3"). credentials(identity, credentials).buildView(BlobStoreContext.class); RestContext providerContext = context.unwrap(); return providerContext.getAsyncApi(); } 2.2. Determining the number of parts for the content Amazon S3 has a 5 MB limit for each part to be uploaded. As such, the first thing we need to do is determine the right number of parts that we can split our content into so that we don’t have parts below this 5 MB limit: public static int getMaximumNumberOfParts(byte[] byteArray) { int numberOfParts= byteArray.length / fiveMB; // 5*1024*1024 if (numberOfParts== 0) { return 1; } return numberOfParts; } 2.3. Breaking the content into parts Were going to break the byte array into a set number of parts: public static List breakByteArrayIntoParts(byte[] byteArray, int maxNumberOfParts) { List parts = Lists. newArrayListWithCapacity(maxNumberOfParts); int fullSize = byteArray.length; long dimensionOfPart = fullSize / maxNumberOfParts; for (int i = 0; i < maxNumberOfParts; i++) { int previousSplitPoint = (int) (dimensionOfPart * i); int splitPoint = (int) (dimensionOfPart * (i + 1)); if (i == (maxNumberOfParts - 1)) { splitPoint = fullSize; } byte[] partBytes = Arrays.copyOfRange(byteArray, previousSplitPoint, splitPoint); parts.add(partBytes); } return parts; } We’re going to test the logic of breaking the byte array into parts – we’re going to generate some bytes, split the byte array, recompose it back together using Guava and verify that we get back the original: @Test public void given16MByteArray_whenFileBytesAreSplitInto3_thenTheSplitIsCorrect() { byte[] byteArray = randomByteData(16); int maximumNumberOfParts = S3Util.getMaximumNumberOfParts(byteArray); List fileParts = S3Util.breakByteArrayIntoParts(byteArray, maximumNumberOfParts); assertThat(fileParts.get(0).length + fileParts.get(1).length + fileParts.get(2).length, equalTo(byteArray.length)); byte[] unmultiplexed = Bytes.concat(fileParts.get(0), fileParts.get(1), fileParts.get(2)); assertThat(byteArray, equalTo(unmultiplexed)); } To generate the data, we simply use the support from Random: byte[] randomByteData(int mb) { byte[] randomBytes = new byte[mb * 1024 * 1024]; new Random().nextBytes(randomBytes); return randomBytes; } 2.4. Creating the Payloads Now that we have determined the correct number of parts for our content and we managed to break the content into parts, we need to generate the Payload objects for the jclouds API: public static List createPayloadsOutOfParts(Iterable fileParts) { List payloads = Lists.newArrayList(); for (byte[] filePart : fileParts) { byte[] partMd5Bytes = Hashing.md5().hashBytes(filePart).asBytes(); Payload partPayload = Payloads.newByteArrayPayload(filePart); partPayload.getContentMetadata().setContentLength((long) filePart.length); partPayload.getContentMetadata().setContentMD5(partMd5Bytes); payloads.add(partPayload); } return payloads; } 3. Upload The upload process is a flexible multi-step process – this means: the upload can be started before having all the data – data can be uploaded as it’s coming in data is uploaded in chunks – if one of these operations fails, it can simply be retrieved chunks can be uploaded in parallel – this can greatly increase the upload speed, especially in the case of large files 3.1. Initiating the Upload operation The first step in the Upload operation is to initiate the process. This request to S3 must contain the standard HTTP headers – the Content-MD5 header in particular needs to be computed. Were going to use the Guava hash function support here: Hashing.md5().hashBytes(byteArray).asBytes(); This is the md5 hash of the entire byte array, not of the parts yet. To initiate the upload, and for all further interactions with S3, we’re going to use the AWSS3AsyncClient – the asynchronous API we created earlier: ObjectMetadata metadata = ObjectMetadataBuilder.create().key(key).contentMD5(md5Bytes).build(); String uploadId = s3AsyncApi.initiateMultipartUpload(container, metadata).get(); The key is the handle assigned to the object – this needs to be a unique identifier specified by the client. Also notice that, even though we’re using the async version of the API, we’re blocking for the result of this operation – this is because we will need the result of the initialize to be able to move forward. The result of the operation is an upload id returned by S3 – this will identify the upload throughout it’s lifecycle and will be present in all subsequent upload operations. 3.2. Uploading the Parts The next step is uploading the parts. Our goal here is to send these requests in parallel, as the upload parts operation represent the bulk of the upload process: List> ongoingOperations = Lists.newArrayList(); for (int partNumber = 0; partNumber < filePartsAsByteArrays.size(); partNumber++) { ListenableFuture future = s3AsyncApi.uploadPart( container, key, partNumber + 1, uploadId, payloads.get(partNumber)); ongoingOperations.add(future); } The part numbers need to be continuous but the order in which the requests are send is not relevant. After all of the upload part requests have been submitted, we need to wait for their responses so that we can collect the individual ETag value of each part: Function, String> getEtagFromOp = new Function, String>() { public String apply(ListenableFuture ongoingOperation) { try { return ongoingOperation.get(); } catch (InterruptedException | ExecutionException e) { throw new IllegalStateException(e); } } }; List etagsOfParts = Lists.transform(ongoingOperations, getEtagFromOp); If, for whatever reason, one of the upload part operations fails, the operation can be retried until it succeeds. The logic above does not contain the retry mechanism, but building it in should be straightforward enough. 3.3. Completing the Upload operation The final step of the upload process is completing the multipart operation. The S3 API requires the responses from the previous parts upload as a Map, which we can now easily create from the list of ETags that we obtained above: Map parts = Maps.newHashMap(); for (int i = 0; i < etagsOfParts.size(); i++) { parts.put(i + 1, etagsOfParts.get(i)); } And finally, send the complete request: s3AsyncApi.completeMultipartUpload(container, key, uploadId, parts).get(); This will return final ETag of the finished object and will complete the entire upload process. 4. Conclusion In this article we built a multipart enabled, fully parallel upload operation to S3, using the custom S3 jclouds API. This operation is ready to be used as is, but it can be improved in a few ways. First, retry logic should be added around the upload operations to better deal with failures. Next, for really large files, even though the mechanism is sending all upload multipart requests in parallel, a throttling mechanism should still limit the number of parallel requests being sent. This is both to avoid bandwidth becoming a bottleneck as well as to make sure Amazon itself doesn’t flag the upload process as exceeding an allowed limit of requests per second – the Guava RateLimiter can potentially be very well suited for this. P.S. You might dig following me on Twitter.
April 21, 2013
by Eugen Paraschiv
· 6,589 Views · 1 Like
article thumbnail
Upload on S3 with the jclouds Library
There are several good ways to upload content to an S3 bucket in the Java world – in this article we’ll look at what the jclouds library provides for this purpose. To use jclouds – specifically the APIs discussed in this article, this simple Maven dependency should be added to the pom of the project: org.jclouds jclouds-allblobstore 1.5.9 1. Uploading to Amazon S3 The first step, in order to access any of these APIs, is to create a BlobStoreContext: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(BlobStoreContext.class); This represents the entry-point to a general key-value storage service, such as Amazon S3 – but not limited to it. For the more specific S3 only implementation, the context can be created similarly: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(S3BlobStoreContext.class); And even more specifically: BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); When the authenticated context is no longer needed, closing it is required to release all resources – threads and connections – associated to it. 2. The four S3 APIs of jclouds The jclouds library provides four different APIs to upload content to S3 bucket, ranging from simple but inflexible to complex and powerful, all obtained via the BlobStoreContext. Let’s start with the simplest. 2.1. Upload via the Map API The easiest way jclouds can be used to interact with an S3 bucket is by representing that bucket as a Map. The API is obtained from the context: InputStreamMap bucket = context.createInputStreamMap("bucketName"); Then, to upload a simple HTML file: bucket.putString("index1.html", "hello world1"); The InputStreamMap API exposes several other types of PUT operations – files, raw bytes – both for single and bulk. A simple integration test can be used as an example: @Test public void whenFileIsUploadedToS3WithMapApi_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); InputStreamMap bucket = context.createInputStreamMap("bucketName"); bucket.putString("index1.html", "hello world1"); context.close(); } 2.2. Upload via BlobMap Using the simple Map API is straightforward but ultimately limited – for example, there is no way to pass in metadata about the content being uploaded. When more flexibility and customization is necessary, this simplified approach to uploading data to S3 via a Map is no longer enough. The next API we’ll look at is the Blob Map API – this is obtained from the context: BlobMap bucket = context.createBlobMap("bucketName"); The API allows the client to access more lower level details, such as Content-Length, Content-Type, Content-Encoding, eTag hash and others; to upload new content in the bucket: Blob blob = bucket.blobBuilder().name("index2.html"). payload("hello world2"). contentType("text/html").calculateMD5().build(); The API also allows setting a variety of payloads on the create request. A simple integration test for uploading a basic HTML file to S3 via the Blob Map API: @Test public void whenFileIsUploadedToS3WithBlobMap_thenNoExceptions() throws IOException { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobMap bucket = context.createBlobMap("bucketName"); Blob blob = bucket.blobBuilder().name("index2.html"). payload("hello world2"). contentType("text/html").calculateMD5().build(); bucket.put(blob.getMetadata().getName(), blob); context.close(); } 2.3. Upload via BlobStore The previous APIs had no way to upload content using multipart upload – this makes them ill suited when working with large files. This limitation is addressed by the next API we’re going to look at – the synchronous BlobStore API. This is obtained from the context: BlobStore blobStore = context.getBlobStore(); To use the multipart support and upload a file to S3: Blob blob = blobStore.blobBuilder("index3.html"). payload("hello world3").contentType("text/html").build(); blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); The payload builder is the same one that was being used by the BlobMap API, so the same flexibility in specifying lower level metadata information about the blob is available here. The difference is the PutOptions supported by the PUT operation of the API – namely the multipart support. The previous integration test now has multipart enabled: @Test public void whenFileIsUploadedToS3WithBlobStore_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobStore blobStore = context.getBlobStore(); Blob blob = blobStore.blobBuilder("index3.html"). payload("hello world3").contentType("text/html").build(); blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); context.close(); } 2.4. Upload via AsyncBlobStore While the previous BlobStore API was synchronous, there is also an asynchronous API for BlobStore – AsyncBlobStore. The API is similarly obtained from the context: AsyncBlobStore blobStore = context.getAsyncBlobStore(); The only difference between the two is that the async API is returning ListenableFuture for the PUT asynchronous operation: Blob blob = blobStore.blobBuilder("index4.html"). .payload("hello world4").build(); blobStore.putBlob("bucketName", blob).get(); The integration test displaying this operation is similar to the synchronous one: @Test public void whenFileIsUploadedToS3WithBlobStore_thenNoExceptions() { BlobStoreContext context = ContextBuilder.newBuilder("aws-s3").credentials(identity, credentials) .buildView(AWSS3BlobStoreContext.class); BlobStore blobStore = context.getBlobStore(); Blob blob = blobStore.blobBuilder("index4.html"). payload("hello world4").contentType("text/html").build(); Future putOp = blobStore.putBlob("bucketName", blob, PutOptions.Builder.multipart()); putOp.get(); context.close(); } 3. Conclusion In this article, we analysed the four APIs that the jclouds library provides to upload content to Amazon S3. These four APIs are generic and they work with other key-value storage services as well – such as Microsoft Azure Storage for example. In the next article we’ll look at the Amazon specific S3 API available in jclouds – the AWSS3Client. We’ll implement the operation of uploading a large file, dynamically calculate the optimal number of parts for any given file, and perform the upload of all parts in parallel. P.S. You might dig following me on Twitter.
April 18, 2013
by Eugen Paraschiv
· 8,860 Views · 1 Like
article thumbnail
ActiveMQ and .NET combined!
ActiveMQ is one of the most popular messaging frameworks. For sure the most popular open source framework. Many people think that ActiveMQ works only with Java and this is not true at all. ActiveMQ can work with almost every popular language (including JavaScript!) through numerous protocols which it supports. Today I will show you how to use ActiveMQ in .NET-based solutions. Project setup Using VS 2010's Extension Manger I installed NuGet Package Manager. After installation and VS 2010 restart, I created a project called ActiveMQNMS. I right-clicked it and selected "Manage NuGet packages...". In the search field I typed: "ActiveMQ". There was a package called Apache.NMS.ActiveMQ. I installed it. (Note: ActiveMQ has one dependency - Apache.NMS package. The NMS package provides a unified API for working with different messaging frameworks and providers.) Starting ActiveMQ I already had ActiveMQ installed on my machine. If you don't have one, download it from http://activemq.apache.org. The default instance listens on 61616 port. However, mine is listening on 62626. If you want to run my code, please remember to change the port. To start ActiveMQ I executed: activemq-5.5.0\bin\activemq Depending on configured ports, you can use ActiveMQ web console to manage your queues, topics, subscribers, connections, embedded Apache Camel, etc. I'm using 8282 port, and the console URL is: http://localhost:8282/admin. Test stub In general the .NET API is almost a copy of the Java API. So if you're familiar with JMS and/or ActiveMQ you don't need any documentation. Please note TestIntialize and TestCleanup methods. using System; using Apache.NMS; using Apache.NMS.ActiveMQ; using Microsoft.VisualStudio.TestTools.UnitTesting; namespace ActiveMQNMS { [Serializable] public class Person { public string FirstName { get; set; } public string LastName { get; set; } } [TestClass] public class ActiveMqTest { private IConnection _connection; private ISession _session; private const String QUEUE_DESTINATION = "DotNet.ActiveMQ.Test.Queue"; [TestInitialize] public void TestInitialize() { IConnectionFactory factory = new ConnectionFactory("tcp://localhost:62626"); _connection = factory.CreateConnection(); _connection.Start(); _session = _connection.CreateSession(); } [TestCleanup] public void TestCleanup() { _session.Close(); _connection.Close(); } } } Writing Producer Here is the producer: [TestMethod] public void TestA() { IDestination dest = _session.GetQueue(QUEUE_DESTINATION); using (IMessageProducer producer = _session.CreateProducer(dest)) { var person = new Person { FirstName = "Łukasz", LastName = "Budnik" }; var objectMessage = producer.CreateObjectMessage(person); producer.Send(objectMessage); } } Run the test and refresh "Queues" list in ActiveMQ web console. You should see DotNet.ActiveMQ.Test.Queue queue with 1 enqueued and pending message. Purge the queue by hitting the purge link or you simply delete it. Writing Consumer Now we have to consume the message. Here is the code: [TestMethod] public void TestB() { Person person = null; IDestination dest = _session.GetQueue(QUEUE_DESTINATION); using (IMessageConsumer consumer = _session.CreateConsumer(dest)) { IMessage message; while ((message = consumer.Receive(TimeSpan.FromMilliseconds(2000))) != null) { var objectMessage = message as IObjectMessage; if (objectMessage != null) { person = objectMessage.Body as Person; if (person != null) { Assert.AreEqual("Łukasz", person.FirstName); Assert.AreEqual("Budnik", person.LastName); } } else { Assert.Fail("Object Message is null"); } } } if (person == null) { Assert.Fail("Person object is null"); } } Run tests. Refresh "Queues" tab in ActiveMQ web console. You should see 1 message enqueued and 1 message dequeued. As expected. Summary That's all. Simple, isn't it? ActiveMQ works very, very nicely with .NET. I have to find some performance comparison for ActiveMQ and MS or pure .C#/NET messaging frameworks. Or maybe you have it? Please share. cheers, Łukasz
April 15, 2013
by Łukasz Budnik
· 29,406 Views
article thumbnail
Introduction to SmartSVN
SmartSVN is a powerful and easy-to-use graphical client for Apache Subversion. There are several clients for Subversion, but here are just a few reasons you should try SmartSVN: It’s cross-platform – SmartSVN runs on Windows, Linux and Mac OS X, so you can continue using the operating system (OS) that works the best for you. It can also be integrated into your OS, via Mac’s Finder Integration or Windows Shell. Everything you need, out of the box – SmartSVN comes complete with all the tools you need to manage your Subversion projects: Conflict solver – this feature combines the freedom of a general, three-way-merge with the ability to detect and resolve any conflicts that occur during the development lifecycle. File compare – this allows you to make inner-line comparisons and directly edit the compared files. Built-in SSH client – allows users to access servers using the SSH protocol. This security-conscious protocol encrypts every piece of communication between the client and the server, for additional protection. A complete view of your project at a glance – the most important files (such as conflicted, modified or missing files) are placed at the top of the file list. SmartSVN also highlights which directories contain local modifications, which directories have been changed in the repository, and whether individual files have been modified locally or in the central repo. This makes it easy to get a quick overview of the state of your project. Fully customizable – maximize productivity by fine-tuning your SmartSVN installation to suit your particular needs: Change keyboard shortcuts, write your own plugin with the SmartSVN API, group revisions to personalize your display, create Change Sets, and alter the context menus and toolbars to suit you. You can learn more about customizing SmartSVN at our ‘5 Ways to Customize SmartSVN’ blog post. Comprehensive bug tracker support – Trac and JIRA are both fully supported. Multitude of support options – SmartSVN users have access to a range of free support, from refcards to blogsand documentation, the SmartSVN forum and a Twitter account maintained by our open source experts. If you need extra support with your SmartSVN installation, expert email support is included with SmartSVN Professional licenses. Want to learn more about SmartSVN? On April 18th, WANdisco will be be holding a free ‘Introduction to SmartSVN’ webinar covering everything you need to get off to a great start with this popular client: Repository basics Checkouts, working folders, editing files and commits Reporting on changes Simple branching Simple merging This webinar is free so register now.
April 13, 2013
by Jessica Thornsby
· 6,592 Views
article thumbnail
Application Services Governance Components
Application Services Governance is a necessary step towards building a responsive IT organization and achieving business agility. By guiding teams through a streamlined application services development process, Application Services Governance Platforms optimize IT effectiveness, raise software quality, and reduce delivery timeframes. Governance relies on policy, people, process and technology to guide business activity and consistently deliver positive outcomes. Effective governance channels business activity towards the ‘right’ path; by making the right actions the path of least resistance. To efficiently guide teams and demonstrate policy compliance benefits, Application Services Governance Platforms provide policy management, developer portals, repositories, service integration and composition, and business value dashboards. Effective governance encompasses the entire IT solution spanning APIs, services, business processes, data, and application delivery. While most governance solutions focus on web services, leading Application Services Governance Platforms bridge API governance, SOA governance, Cloud deployment governance, data governance, and application delivery governance. Additionally, the governance experience must be tailored for the participant’s project role. Portals may be personalized to present notifications, tasks, actions, and reports suitable for application service creators, publishers, subscribers, consumers, or business managers. Application delivery governance segments participants into developers, quality assurance testers, operations, project managers, and application users. End-user Application Services Governance priorities are evolving toward bridging service governance with API governance, extending application lifecycle management to embrace cloud deployment environments, and focusing on visualizing asset business value. Key governance challenges include meeting mobile application demands, implementing efficient self-service provisioning, right-sizing governance practices (not too heavy or light), and defining appropriate policy tiers. Governance Components To efficiently guide teams and demonstrate policy compliance benefits, Application Services Governance Platforms provide policy management, developer portals, repositories, service integration and composition, and business value dashboards. Figure 1 Application Services Governance Components Policy Management Policy management is used to specify the correct behavior, detail exception thresholds, and define corrective actions or notifications. Leading application services governance platforms deliver advanced policy management by conforming to a flexible architecture, addressing relevant policy categories, and spanning all lifecycle phases. A comprehensive Application Services Governance Platform manages: Design-time Policy Run-time Policy Security Policy Developer access Policy Service and API Lifecycle Management Policy Application Lifecycle Management Policy Within these six broad categories, application services governance commonly encompasses service level policies, usage policies, version policies, subscription policies, and access control policies. Registries serve as policy stores for many types of runtime policies including security policies, lifecycle management workflow policies, API policies, service description, service contracts, service consumption, service usage, service lifecycle management, service level agreements (SLAs) and XACML authorization policies. Leading platforms have built-in support for a number of policy standards including WS-Policy, XACML 3.0, and SCXML. Cloud foundation and cloud middleware components deliver sophisticated run-time policy enforcement for tenant partitioning, service level management, application provisioning, tenant access, and resource management. All run-time infrastructure products should serve as well-integrated policy enforcement points that may delegate policy decisions to external decision points or internally cache and process policy assertions. Identity Management infrastructure components serve as a policy decision point and a policy manager for sophisticated security policies encoded in XACML. The Application Service Governance Platforms use workflow engines to execute governance workflow, present task lists, and manage approvals. Complex Event Processor components can be configured as policy decision points, which use time-based policy pattern matching to evaluate run-time service, message, REST resource, and event traffic. For more information on policy management, read the detailed policy management blog post. Developer Portal and Repository Portals serve as the viewport into policy management, service integration and composition, and business value dashboards. The Application Service Governance portals should deliver an application service governance experience tuned for self-service, on-demand access, and safe API usage. Developer portals are often contextually personalized to fit the project and user’s role. For example, a developer portal may fit the needs of API creators and API publishers who are defining, documenting, and publishing APIs. The portal’s user experience may enable API creators and publishers to monitor, manage, and analyze API usage. A developer portal may also be personalized to deliver a user experience tailored for API consumers. API developers who are consuming APIs can find, explore, subscribe and evaluate APIs. Developer portals are often tuned to facilitate service meta-data and lifecycle management for service creators. Service and integration developers who are consuming services can find and explore services. A developer portal should guide teams toward effective and efficient governance when building service implementation and service consumption code. Advanced developer portals capabilities include overlaying build management governance, test governance (i.e. unit, integration, performance), implementation lifecycle governance, and deployment governance. An Application Services Governance Platform should enable flexible organization, classification & documentation of services, APIs, and any IT asset. Key repository capabilities include governing and managing: Any type of metadata in any structure Service, API, or artifact associations and relationships Schema definitions and namespaces Users and Roles User subscriptions Service level agreements Developer documentation Social taxonomies (e.g. ratings, comments, tags) Implementation artifacts (i.e. code, test cases) Service Integration and Composition Service integration and composition for APIs, web services, or business process are often implemented using tools provided by the run-time infrastructure vendor. Application Services Governance components must integrate into diverse run-time infrastructure containers and development tooling. Synchronizing policy, development artifacts, and deployment packages requires tight integration between design-time tools, development tools, run-time management consoles, and application services governance portals and repositories. Business Value Dashboards To gauge governance effectiveness and enhanced business value, analytic dashboards assess policy compliance, quality of service, service usage, architecture coherence, and team performance. The Application Services Governance platform should capture service tier subscription information, collects usage statistics, and integrate with billing and payment systems that deliver show-back or charge-back reports. Subscription and usage reports help teams understand asset adoption (by version, by service) and usage (by version, by service). By understanding adoption and usage, business owners and architects can intelligently invest future development resources, properly plan infrastructure scale, and rationalize the portfolio. Dashboards also present a service overview, number of services, service lifecycle stage, schema re-use, service dependencies, upgrade impacts, development team productivity, and project progress. Governance Lifecycle Phases API management portals and SOA Governance Registries must work together to keep API lifecycle stages synchronized with backend service implementation stages. An API Governance experience may provide a straightforward set of lifecycle stages (e.g., created, published, deprecated, retired, blocked) that may be customized by the development team. SOA Governance Registries facilitates service metadata management and governance across design, implementation, test, and run-time operations. Figure 2 below depicts the intersection of the two governance views. Figure 2: API and Service Lifecycle Views Application delivery governance usually relies on ad hoc tools and processes, knitted together by end-user delivery managers. Application Services Governance Platforms should span project inception, development, quality assurance, production deployment, production management, maintenance, and retirement. Figure 3 illustrates service implementation activities governed by an application delivery governance product. Figure 3: Implementation activities governed by application services delivery governance Application Services Governance Drivers The IT focus on API, DevOps, and Cloud scale is driving resurgent interest in Application Services Governance. As development teams support mobile applications by fielding web APIs, they are creating a new ‘demand layer’ in front of existing service implementations. Both API and SOA success requires creating loosely coupled consumer-provider connections, enforcing a separation of concerns between consumer and provider, and exposing a set of re-usable, shared services, and gaining service consumer adoption. With traditional SOA Governance, many development teams publish services, yet struggle to create a service architecture that is widely shared, re-used, and adopted across internal development teams. In today’s connected business world, API and SOA are the business. An effective governance approach must address human collaboration stumbling blocks. By publishing managed APIs, establishing API manager and publisher roles, extending the governance registry, facilitating API management practices (e.g self-service key management, self-service provisioning, service tier management, and usage visualization),and offering APIs through developer portal, organizations can overcome collaboration, trust, and adoption hurdles while enhancing SOA success. By publishing managed APIs, establishing API manager and publisher roles, extending the governance registry, and offering APIs through an API Store, team have a new opportunity to increase service re-use and enhance IT business value. For more information on how teams can complement SOA Governance with API Governance, read the promoting services with API Management white paper. Because services are often imbedded in application solutions, leading Application Services Governance platforms wrap services governance inside application delivery governance. When operation team members use traditional point tools (i.e. Puppet, Chef, Jenkins,Selenium) to achieve DevOps benefits, the teams spend a considerable amount of time and effort creating agile workflow, effective governance, seamless activity transitions, and on-demand self-service access. A configurable DevOps PaaS can implement governance best practices and be readily adopted by teams without extensive implementation effort. Effective application delivery governance presents a simplified and unified user experience to complex development tools, processes, and team hand-offs. By integrating software promotion best practices, test automation, continuous integration, and issue tracking, application delivery governance raises software quality while reducing delivery timeframes. For more information, read about how to accelerate agility and maintain governance with DevOps PaaS. Recommended Reading Policy Management for Application Services Governance Application Services Governance Requires More Than a SOA Registry API and SOA Convergence Promoting services with API Management white paper Accelerate agility and maintain governance with DevOps PaaS Governance Registry Brings Integrity to SaaS Platform Gartner’s analysis of WSO2 SOA Governance
April 13, 2013
by Chris Haddad
· 5,906 Views · 2 Likes
article thumbnail
Complex Event Processing Made Easy (using Esper)
The following is a very simple example of event stream processing (using the ESPER engine). Note - a full working example is available over on GitHub: https://github.com/corsoft/esper-demo-nuclear What is Complex Event processing (CEP)? Complex Event Processing (CEP), or Event Stream Stream Processing (ESP) are technologies commonly used in Event-Driven systems. These type of systems consume, and react to a stream of event data in real time. Typically these will be things like financial trading, fraud identification and process monitoring systems – where you need to identify, make sense of, and react quickly to emerging patterns in a stream of data events. Key Components of a CEP system A CEP system is like your typical database model turned upside down. Whereas a typical database stores data, and runs queries against the data, a CEP data stores queries, and runs data through the queries. To do this it basically needs: Data – in the form of ‘Events’ Queries – using EPL (‘Event Processing Language’) Listeners – code that ‘does something’ if the queries return results A Simple Example - A Nuclear Power Plant Take the example of a Nuclear Power Station.. Now, this is just an example – so please try and suspend your disbelief if you know something about Nuclear Cores, Critical Temperatures, and the like. It’s just an example. I could have picked equally unbelievable financial transaction data. But ... Monitoring the Core Temperature Now I don’t know what the core is, or if it even exists in reality – but for this example lets assume our power station has one, and if it gets too hot – well, very bad things happen.. Lets also assume that we have temperature gauges (thermometers?) in place which take a reading of the core temperature every second – and send the data to a central monitoring system. What are the requirements? We need to be warned when 3 types of events are detected: MONITOR just tell us the average temperature every 10 seconds - for information purposes WARNING WARN us if we have 2 consecutive temperatures above a certain threshold CRITICAL ALERT us if we have 4 consecutive events, with the first one above a certain threshold, and each subsequent one greater than the last – and the last one being 1.5 times greater than the first. This is trying to alert us that we have a sudden, rising escalating temperature spike – a bit like the diagram below. And let’s assume this is a very bad thing. Using Esper There are a number of ways you could approach building a system to handle these requirements. For the purpose of this post though - we will look at using Esper to tackle this problem How we approach this with Esper is: Using Esper – we can create 3 queries (using EPL - Esper Query Language) to model each of these event patterns. We then attach a listener to each query - this will be triggered when the EPL detects a matching pattern of events) We create an Esper service, and register these queries (and their listeners) We can then just throw Temperature data through the service – and let Esper tell alert the listeners when we get matches. (A working example of this simple solution is available on Githib - see link above) Our Simple ESPER Solution At the core of the system are the 3 queries for detecting the events. Query 1 – MONITOR (Just monitor the average temperature) select avg(value) as avg_val from TemperatureEvent.win:time_batch(10 sec) Query 2 – WARN (Tell us if we have 2 consecutive events which breach a threshold) select * from TemperatureEvent " match_recognize ( measures A as temp1, B as temp2 pattern (A B) define A as A.temperature > 400, B as B.temperature > 400) Query 3 – CRITICAL - 4 consecutive rising values above all above 100 with the fourth value being 1.5x greater than the first select * from TemperatureEvent match_recognize ( measures A as temp1, B as temp2, C as temp3, D as temp4 pattern (A B C D) define A as A.temperature > 100, B as (A.temperature < B.value), C as (B.temperature < C.value), D as (C.temperature < D.value) and D.value > (A.value * 1.5)) Some Code Snippets TemperatureEvent We assume our incoming data arrives in the form of a TemperatureEvent POJO If it doesn't - we can convert it to one, e.g. if it comes in via a JMS queue, our queue listener can convert it to a POJO. We don't have to do this, but doing so decouples us from the incoming data structure, and gives us more flexibility if we start to do more processing in our Java code outside the core Esper queries. An example of our POJO is below package com.cor.cep.event; package com.cor.cep.event; import java.util.Date; /** * Immutable Temperature Event class. * The process control system creates these events. * The TemperatureEventHandler picks these up * and processes them. */ public class TemperatureEvent { /** Temperature in Celcius. */ private int temperature; /** Time temerature reading was taken. */ private Date timeOfReading; /** * Single value constructor. * @param value Temperature in Celsius. */ /** * Temerature constructor. * @param temperature Temperature in Celsius * @param timeOfReading Time of Reading */ public TemperatureEvent(int temperature, Date timeOfReading) { this.temperature = temperature; this.timeOfReading = timeOfReading; } /** * Get the Temperature. * @return Temperature in Celsius */ public int getTemperature() { return temperature; } /** * Get time Temperature reading was taken. * @return Time of Reading */ public Date getTimeOfReading() { return timeOfReading; } @Override public String toString() { return "TemperatureEvent [" + temperature + "C]"; } } Handling this Event In our main handler class - TemperatureEventHandler.java, we initialise the Esper service. We register the package containing our TemperatureEvent so the EPL can use it. We also create our 3 statements and add a listener to each statement /** * Auto initialise our service after Spring bean wiring is complete. */ @Override public void afterPropertiesSet() throws Exception { initService(); } /** * Configure Esper Statement(s). */ public void initService() { Configuration config = new Configuration(); // Recognise domain objects in this package in Esper. config.addEventTypeAutoName("com.cor.cep.event"); epService = EPServiceProviderManager.getDefaultProvider(config); createCriticalTemperatureCheckExpression(); createWarningTemperatureCheckExpression(); createTemperatureMonitorExpression(); } An example of creating the Critical Temperature warning and attaching the listener /** * EPL to check for a sudden critical rise across 4 events, * where the last event is 1.5x greater than the first. * This is checking for a sudden, sustained escalating * rise in the temperature */ private void createCriticalTemperatureCheckExpression() { LOG.debug("create Critical Temperature Check Expression"); EPAdministrator epAdmin = epService.getEPAdministrator(); criticalEventStatement = epAdmin.createEPL(criticalEventSubscriber.getStatement()); criticalEventStatement.setSubscriber(criticalEventSubscriber); } And finally - an example of the listener for the Critical event. This just logs some debug - that's as far as this demo goes. package com.cor.cep.subscriber; import java.util.Map; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.stereotype.Component; import com.cor.cep.event.TemperatureEvent; /** * Wraps Esper Statement and Listener. No dependency on Esper libraries. */ @Component public class CriticalEventSubscriber implements StatementSubscriber { /** Logger */ private static Logger LOG = LoggerFactory.getLogger(CriticalEventSubscriber.class); /** Minimum starting threshold for a critical event. */ private static final String CRITICAL_EVENT_THRESHOLD = "100"; /** * If the last event in a critical sequence is this much greater * than the first - issue a critical alert. */ private static final String CRITICAL_EVENT_MULTIPLIER = "1.5"; /** * {@inheritDoc} */ public String getStatement() { // Example using 'Match Recognise' syntax. String criticalEventExpression = "select * from TemperatureEvent " + "match_recognize ( " + "measures A as temp1, B as temp2, C as temp3, D as temp4 " + "pattern (A B C D) " + "define " + " A as A.temperature > " + CRITICAL_EVENT_THRESHOLD + ", " + " B as (A.temperature < B.temperature), " + " C as (B.temperature < C.temperature), " + " D as (C.temperature < D.temperature) " + "and D.temperature > " + "(A.temperature * " + CRITICAL_EVENT_MULTIPLIER + ")" + ")"; return criticalEventExpression; } /** * Listener method called when Esper has detected a pattern match. */ public void update(Map eventMap) { // 1st Temperature in the Critical Sequence TemperatureEvent temp1 = (TemperatureEvent) eventMap.get("temp1"); // 2nd Temperature in the Critical Sequence TemperatureEvent temp2 = (TemperatureEvent) eventMap.get("temp2"); // 3rd Temperature in the Critical Sequence TemperatureEvent temp3 = (TemperatureEvent) eventMap.get("temp3"); // 4th Temperature in the Critical Sequence TemperatureEvent temp4 = (TemperatureEvent) eventMap.get("temp4"); StringBuilder sb = new StringBuilder(); sb.append("***************************************"); sb.append("\n* [ALERT] : CRITICAL EVENT DETECTED! "); sb.append("\n* " + temp1 + " > " + temp2 + " > " + temp3 + " > " + temp4); sb.append("\n***************************************"); LOG.debug(sb.toString()); } } The Running Demo Full instructions for running the demo can be found here: https://github.com/corsoft/esper-demo-nuclear An example of the running demo is shown below - it generates random Temperature events and sends them through the Esper processor (in the real world this would come in via a JMS queue, http endpoint or socket listener). When any of our 3 queries detect a match - debug is dumped to the console. In a real world solution each of these 3 listeners would handle the events differently - maybe by sending messages to alert queues/endpoints for other parts of the system to pick up the processing. Conclusions Using a system like Esper is a neat way to monitor and spot patterns in data in real time with minimal code. This is obviously (and intentionally) a very bare bones demo, barely touching the surface of the capabilities available. Check out the Esper web site for more info and demos. Esper also has a plugin for Apache Camel Integration engine - which allows you to configure you EPL queries directly in XML Spring camel routes, removing the need for any Java code completely (we will possibly cover this in a later blog post!)
April 11, 2013
by Adrian Milne
· 68,346 Views · 4 Likes
article thumbnail
Add Custom Post Meta Data To Post List Table
One of the best thing about WordPress is that you can customise almost anything. In the admin area you can see a list of all the posts you have added in WordPress. Within this table it shows the basic information for each of the posts, the title, the author, the category, tags, comments and the date the post was published. WordPress has a number of different filters and actions that allow you to edit the output of the column so you can add your own data to this list. For example if you have custom post meta data which is useful information you want to display on the list of posts you can add new custom columns to the list. In this article you will learn how to add new columns to the post list, how you can add data to the column and how you can make this column sortable. Add New Columns First we start off by adding the new column to the list, for this we use the WordPress filter manage_edit-post_columns. This will allow you to edit the output of the columns by adding new values to the column array. The callback function on this filter will pass in one parameter which are the current columns on the list, the return of this function will be the new columns on the post table. This means that we can add additional values to the array to add extra columns to the table. The following code will add a new column to the table just after the title column. // Add a column to the edit post list add_filter( 'manage_edit-post_columns', 'add_new_columns'); /** * Add new columns to the post table * * @param Array $columns - Current columns on the list post */ function add_new_columns( $columns ) { $column_meta = array( 'meta' => 'Custom Column' ); $columns = array_slice( $columns, 0, 2, true ) + $column_meta + array_slice( $columns, 2, NULL, true ); return $columns; } Add Columns To Custom Post Types If you have custom post types in your site and want to add additional columns to this list, WordPress comes with built in filters you can apply to add new columns to this table. add_filter( 'manage_${post_type}_posts_columns', 'add_new_columns'); If you have a custom post type of portfolio then you can use the following code to add a column to the list of portfolio post types. function add_portfolio_columns($columns) { return array_merge($columns, array('client' => __('Client'), 'project_date' =>__( 'Project Date'))); } add_filter('manage_portfolio_posts_columns' , 'add_portfolio_columns'); Add Data To Custom Columns Once you have created the new columns for the posts list you can now add data to the new columns by using the WordPress action manage_posts_custom_column. Adding an action to this will be called on each column, from this call we can get data for the post display this on the post list. The following code will check what column we are on and get the custom post meta data for the current post and display this in the column. // Add action to the manage post column to display the data add_action( 'manage_posts_custom_column' , 'custom_columns' ); /** * Display data in new columns * * @param $column Current column * * @return Data for the column */ function custom_columns( $column ) { global $post; switch ( $column ) { case 'meta': $metaData = get_post_meta( $post->ID, 'twitter_url', true ); echo $metaData; break; } } Add Data To Custom Post Type Columns Along with being able to add a filter to custom post types by adding new columns, WordPress has a built in action you can use to add data to custom columns. In the above example we add a new column just for post types of a portfolio to add two new columns to the list, using the below code you can add data to these new columns. function custom_portfolio_column( $column, $post_id ) { switch ( $column ) { case 'project_date': echo get_post_meta( $post_id , 'project_date' , true ); break; case 'client': echo get_post_meta( $post_id , 'client' , true ); break; } } add_action( 'manage_portfolio_posts_custom_column' , 'custom_portfolio_column' ); Make Columns Sortable By default the new custom columns are not sortable so this makes it hard to find data that you need. To sort the custom columns WordPress has another filter manage_edit-post_sortable_columns you can use to assign which columns are sortable. When this action is ran the function will pass in a parameter of all the columns which are currently sortable, by adding your new custom columns to this list will now make these columns sortable. The value you give this will be used in the URL so WordPress understands which column to order by. The following to allow you to sort by the custom column meta. // Register the column as sortable function register_sortable_columns( $columns ) { $columns['meta'] = 'Custom Column'; return $columns; } add_filter( 'manage_edit-post_sortable_columns', 'register_sortable_columns' ); That's all the information you need to change the way posts are listed in your admin area. What useful information do you wish was displayed in the post list?
April 10, 2013
by Paul Underwood
· 13,622 Views
article thumbnail
Configuring Apache SolrCloud on Amazon VPC
We are going to construct an Apache SolrCloud (4.1) with 12 node EC2 instance(s) inside Amazon VPC in this post. Since the search data stored inside the SolrCloud is critical, we are going to build High availability at Solr Node level as well as AZ level. This setup will be done inside private subnet of Amazon VPC and will leverage 3 Availability Zones of the Amazon EC2 Region. Deployment architecture of the setup is given below: A small brief about setup: 3 Zookeepers will be deployed on 3 Availability Zones. ZK EC2 instances will be deployed on the Private subnet of the Amazon VPC. 3 Solr Shard EC2 instances will be deployed on Private subnet of Availability Zone 1 inside Amazon VPC. 3 Solr Replica EC2 instances will be deployed on Private subnet of Availability Zone 2 inside Amazon VPC. 3 Solr Replica EC2 instances will be deployed on Private subnet of Availability Zone 3 inside Amazon VPC. EBS optimized + PIOPS EC2 instances can be used for Solr EC2 Nodes To know more about SolrCloud Deployment best practices on Amazon VPC, Refer article: http://harish11g.blogspot.in/2013/03/Apache-Solr-cloud-on-Amazon-EC2-AWS-VPC-implementation-deployment.html Step 1: Creating Virtual Private Cloud on AWS Create a VPC with Public and Private Subnets. Assume the Load balancer and Web/App Servers can reside on the public subnet and Apache Solr Cloud will reside on the private subnet of the VPC. Step 2: Assigning the IP for the Subnets Create the subnet with its IP range. Chose the Availability zone for this subnet. Step 3: Multiple Subnets on Multiple AZ’s Create multiple subnets in Multiple AZ for building a Highly available setup for SolCloud Step 4: Install Java for Zookeeper & Solr Amazon Linux is chosen as the EC2 OS variant. Execute the following instructions on the respective EC2 nodes after their launch. EC2 instances should be launched in Multi-AZ in Multiple VPC Private Subnets. Solr uses Zookeeper as the cluster configuration and coordinator. Zookeeper is a distributed file system containing information about all the Solr Nodes. Solrconfig.xml, Schema.xml etc are stored in the repository.We have used Oracle-Sun Java over OpenJDK “sudo -s” “cd /opt” “wget --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2Ftechnetwork%2Fjava%2Fjavase%2Fdownloads%2Fjdk-7u3-download-1501626.html;" http://download.oracle.com/otn-pub/java/jdk/7u13-b20/jdk-7u13-linux-x64.rpm” “mv jdk-7u10-linux-x64.rpm?AuthParam=1357217677_76ec3d8d9a3644f4b9ec1ea79e1fcf33 jdk-7u10-linux-x64.rpm jdk-7u10-linux-x64.rpm” “sudo rpm -ivh jdk-7u10-linux-x64.rpm” “alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_10/jre/bin/java 20000” “alternatives --install /usr/bin/javaws javaws /usr/java/jdk1.7.0_10/jre/bin/javaws 20000” “alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_10/bin/javac 20000” “alternatives --install /usr/bin/jar jar /usr/java/jdk1.7.0_10/bin/jar 20000” “alternatives --install /usr/bin/java java /usr/java/jre1.7.0_10/bin/java 20000” “alternatives --install /usr/bin/javaws javaws /usr/java/jre1.7.0_10/bin/javaws 20000” “alternatives --configure java” Add JAVA_HOME in .bash_profile: “vim ~/.bash_profile” export JAVA_HOME="/usr/java/jdk1.7.0_09" export PATH=$PATH:$JAVA_HOME/bin Restart the instance. “init 6” Check the version of Java installed using “java -version” command Step 5: Configure the ZooKeeper (v3.4.5) Ensemble: Since single Zookeeper is not ideal for a large Solr cluster (because of SPOF), it is recommended to configure multiple Zookeepers in concert as an ensemble .In this step we will install and configure 3 ZooKeeper EC2 nodes spanning across 3 different Availability Zones in respective Private Subnets inside a VPC.Zookeeper will be configured on Amazon Linux. “sudo yum update” “sudo -s” “ cd /opt” “wget http://apache.techartifact.com/mirror/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz” “tar -xzvf zookeeper-3.4.5.tar.gz” “rm zookeeper-3.4.5.tar.gz” “cd zookeeper-3.4.5” “cp conf/zoo_sample.cfg conf/zoo.cfg” Add the following lines in zoo.cfg “vim conf/zoo.cfg” dataDir=/data server.1=[zk-server01-ip]:2888:3888 server.2=[zk-server02-ip]:2888:3888 server.3=[zk-server03-ip]:2888:3888 “cd /opt/zookeeper/data” “vim myid” 1 or 2 or 3 respectively on each ZooKeeper EC2 instances in Multi-AZ #Starting ZooKeeper Program. “bin/zkServer.sh start” Follow the above steps in all the ZooKeeper servers. ReferClustered (Multi-Server) SetupandConfiguration Parameters for understandingquorum_port,leader_election_port and the filemyid. Every ZooKeeper node needs to know about every other ZK EC2 node in the ensemble, and a majority of EC2’s (called a Quorum) are needed to provide the service. Make sure the VPC IP of all the Zookeepers are given in every ZK node, like the one in following command. server.1=:: server.2=:: server.3=:: Step 6: Configuring Solr 4.1 EC2 node In this step we will install and configure 3 Apache Solr4.1 Shard EC2 instances in a single Amazon AZ and 2 Solr Replicas in another AZ in their respective Private subnets. Please note that we have to specify all the ZooKeeper (ZK) hosts on every Solr instance as below. Note: Solr gets comes with jetty in default, it is suggested to use tomcat for production nodes. Perform the following after launching EC2 instances in Multi-AZ in Multiple VPC Private Subnets. “sudo -s” “yum update” “cd /opt” “wget http://apache.techartifact.com/mirror/lucene/solr/4.1.0/apache-solr-4.1.0.tgz” “tar -xzvf apache-solr-4.1.0.tgz” “rm -f apache-solr-4.1.0.tgz” On Solr Shard/Replica Instances: “cd /opt/apache-solr-4.0.0/example/” “vim /opt/apache-solr-4.0.0/example/solr/collection1/conf/solrconfig.xml” Change /var/data/solr to /data Starting Solr4.1 Shard/Replica Java Program. “java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=SolrCloud4.1-Conf -DnumShards=3 -DzkHost=[zk-server01-ip]:2181,[zk-server02-ip]:2181,[zk-server03-ip]:2181 -jar start.jar “java -DzkHost= DzkHost=:,:,: -jar start.jar” -DnumShards: the number of shards that will be present. Note that once set, this number cannot be increased or decreased without re-indexing the entire data set. (Dynamically changing the number of shards is part of the Solr roadmap!) -DzkHost: a comma-separated list of ZooKeeper servers. -Dbootstrap_confdir, -Dcollection.configName: these parameters are specified only when starting up the first Solr instance. This will enable the transfer of configuration files to ZooKeeper. Subsequent Solr instances need to just point to the ZooKeeper ensemble. The above command with –DnumShards=3 specifies that it is a 3-shard cluster. The first Solr EC2 node automatically becomes shard1 and the second Solr EC2 node automatically becomes shard2 …. What happens when we launch fourth Solr instance in this cluster? Since it’s a 3-shard cluster, the fourth Solr EC2 node automatically becomes a replica of shard1 and the fifth Solr EC2 node becomes a replica of shard2. Step 7: AWS Security Group TCP Ports to be enabled: Configure the following TCP ports on the AWS security group to allow access between Solr and ZK nodes deployed in Multiple AZ. Solr Shards/Replicas will connect to ZK through TCP Port 2181 Solr Web Interface with Jetty container through TCP Port 8983 Solr Web Interface with Tomcat container through TCP Port 8080 Every instance that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. We can accomplish this with the series of lines of the form server.id=host:port:port For example, server.1=[vpc-ip]:2888:3888 server.2=[vpc-ip]:2888:3888 server.3=[vpc-ip]:2888:3888 TCP Ports 2888, 3888 should be opened for ZK Ensemble.
April 5, 2013
by Harish Ganesan
· 7,776 Views
article thumbnail
Weekend Project: Send sensor data from Arduino to MongoDB
Arduino is an open-source electronics platform that can acknowledge and interact with its environment through a variety of sensor types. It’s great for hardware prototyping and one-off projects. I just got an Arduino Board from our friends at SendGrid, who also gave me a little tutorial in the art of Arduino hacking. Inspired by the tutorial and armed with this new board, I bought a passive infared (PIR) motion sensor from my local Radio Shack. Now I was ready to play; in particular, I wanted to be able to collect that continuous stream of hardware sensor data into a MongoDB database for logging, trend analysis, system event correlation, etc. To this end, I created the demo project “mongodb-motion”, which I’ve made public on Github. In the “mongodb-motion” Github repo, you will find an Arudino project that writes motion sensor data to a cloud MongoDB database at MongoLab and sends alerts via email based on certain criteria. I built this demo using Node.js and the MongoLab REST API. Below, I’ll go through exactly what hardware you need to make your own “mongodb-motion” project a success, and how the code actually works. What You Need The hardware used in this demo includes: an Arduino UNO R3 and a Parallax PIR motion sensor. How the Code Works You can use a variety of motion sensors with the Arduino. In this particular experiment, I used a PIR motion sensor. The PIR motion sensor behaves like a switch, with ‘down’ events emitted on motion detection and ‘up’ events a few seconds after motion ceases to be detected. On the receiving side, I used JohnnyFive, an appropriately named Node.js package that accepts sensor events and sends messages to the Arduino board. With the two ends set, I’ll move on to the project’s configuration file. In this demo, I’ve included a configuration file, config-sample.js, where credentials for the MongoLab REST API and for the email SMTP server can be added. In my case, I used the SendGrid SMTP service. The configuration file also has two callbacks that determine when an email is emitted, one for each type of event – “detect” and “ceased”. I’ve used this feature to automatically send an email alert if an event timestamp is between 7:00pm and 8:00am, ostensibly when my office should be motionless… I’m out there watching you, office! Once you’ve customized this config-sample.js file, be sure to rename it to config.js in order for it to be usable. If you inspect the project code, you’ll notice that the MongoLab REST API is called in the logMsg() function, using an https.request. Building this little demo has given me some new ideas for hardware hacking the cloud. I hope you give it a try too. Thanks to the Arduino, Node.js and Javascript communities, and special thanks to Rick Waldon for Johnny Five, SendGrid for the UNO board, and a big shout out to @swiftalphaone for the Waza tutorial.
April 3, 2013
by Ben Wen
· 17,780 Views
article thumbnail
HashSet vs. TreeSet vs. LinkedHashSet
in a set, there are no duplicate elements. that is one of the major reasons to use a set. there are 3 commonly used implementations of set in java: hashset, treeset and linkedhashset. when and which to use is an important question. in brief, if we want a fast set, we should use hashset; if we need a sorted set, then treeset should be used; if we want a set that can be read by following its insertion order, linkedhashset should be used. 1. set interface set interface extends collection interface. in a set, no duplicates are allowed. every element in a set must be unique. we can simply add elements to a set, and finally we will get a set of elements with duplicates removed automatically. 2. hashset vs. treeset vs. linkedhashset hashset is implemented using a hash table. elements are not ordered. the add, remove, and contains methods has constant time complexity o(1). treeset is implemented using a tree structure(red-black tree in algorithm book). the elements in a set are sorted, but the add, remove, and contains methods has time complexity of o(log (n)). it offers several methods to deal with the ordered set like first(), last(), headset(), tailset(), etc. linkedhashset is between hashset and treeset. it is implemented as a hash table with a linked list running through it, so it provides the order of insertion. the time complexity of basic methods is o(1). 3. treeset example treeset tree = new treeset(); tree.add(12); tree.add(63); tree.add(34); tree.add(45); iterator iterator = tree.iterator(); system.out.print("tree set data: "); while (iterator.hasnext()) { system.out.print(iterator.next() + " "); } output is sorted as follows: tree set data: 12 34 45 63 now let's define a dog class as follows: class dog { int size; public dog(int s) { size = s; } public string tostring() { return size + ""; } } let's add some dogs to treeset like the following: import java.util.iterator; import java.util.treeset; public class testtreeset { public static void main(string[] args) { treeset dset = new treeset(); dset.add(new dog(2)); dset.add(new dog(1)); dset.add(new dog(3)); iterator iterator = dset.iterator(); while (iterator.hasnext()) { system.out.print(iterator.next() + " "); } } } compile ok, but run-time error occurs: exception in thread "main" java.lang.classcastexception: collection.dog cannot be cast to java.lang.comparable at java.util.treemap.put(unknown source) at java.util.treeset.add(unknown source) at collection.testtreeset.main(testtreeset.java:22) because treeset is sorted, the dog object need to implement java.lang.comparable's compareto() method like the following: class dog implements comparable{ int size; public dog(int s) { size = s; } public string tostring() { return size + ""; } @override public int compareto(dog o) { return size - o.size; } } the output is: 1 2 3 4. hashset example hashset dset = new hashset(); dset.add(new dog(2)); dset.add(new dog(1)); dset.add(new dog(3)); dset.add(new dog(5)); dset.add(new dog(4)); iterator iterator = dset.iterator(); while (iterator.hasnext()) { system.out.print(iterator.next() + " "); } output: 5 3 2 1 4 note the order is not certain. 5. linkedhashset example linkedhashset dset = new linkedhashset(); dset.add(new dog(2)); dset.add(new dog(1)); dset.add(new dog(3)); dset.add(new dog(5)); dset.add(new dog(4)); iterator iterator = dset.iterator(); while (iterator.hasnext()) { system.out.print(iterator.next() + " "); } the order of the output is certain and it is the insertion order. 2 1 3 5 4 6. performance testing the following method tests the performance of the three class on add() method. public static void main(string[] args) { random r = new random(); hashset hashset = new hashset(); treeset treeset = new treeset(); linkedhashset linkedset = new linkedhashset(); // start time long starttime = system.nanotime(); for (int i = 0; i < 1000; i++) { int x = r.nextint(1000 - 10) + 10; hashset.add(new dog(x)); } // end time long endtime = system.nanotime(); long duration = endtime - starttime; system.out.println("hashset: " + duration); // start time starttime = system.nanotime(); for (int i = 0; i < 1000; i++) { int x = r.nextint(1000 - 10) + 10; treeset.add(new dog(x)); } // end time endtime = system.nanotime(); duration = endtime - starttime; system.out.println("treeset: " + duration); // start time starttime = system.nanotime(); for (int i = 0; i < 1000; i++) { int x = r.nextint(1000 - 10) + 10; linkedset.add(new dog(x)); } // end time endtime = system.nanotime(); duration = endtime - starttime; system.out.println("linkedhashset: " + duration); } from the output below, we can clearly wee that hashset is the fastest one. hashset: 2244768 treeset: 3549314 linkedhashset: 2263320 if you enjoyed this article and want to learn more about java collections, check out this collection of tutorials and articles on all things java collections.
March 29, 2013
by Ryan Wang
· 181,564 Views · 3 Likes
article thumbnail
AWS VPC NAT Instance Failover and High Availability
Amazon Virtual Private Cloud (VPC) is a great way to setup an isolated portion of AWS and control the network topology. It is a great way to extend your data center and use AWS for burst requirements. With the latest VPC for Everyone announcement, what was earlier "Classic" and "VPC" in AWS will soon be only VPC. That is, every deployment in AWS will be on a VPC even though one might not need all the additional features that VPC provides. One might eventually start looking at utilizing VPC features such as multiple Subnets, Network isolation, Network ACLs, etc.. Those who have already worked with VPC's understand the role of NAT Instance in a VPC. When you create a VPC, you create them with multiple Subnets (Public and Private). Instances launched in the Public Subnet have direct internet connectivity to send and receive internet traffic through the internet gateway of the VPC. Typically, internet facing servers such as web servers are kept in the Public Subnet. A Private Subnet can be used to launch Instances that do not require direct access from the internet. Instances in a Private Subnet can access the Internet without exposing their private IP address by routing their traffic through a Network Address Translation (NAT) instance in the Public Subnet. AWS provides an AMI that can be launched as a NAT Instance. Following diagram is the representation of a standard VPC that gets provisioned through the AWS Management Console wizard. Standard Private and Public Subnets in a VPC The above architecture has A Public Subnet that has direct internet connectivity through the Internet Gateway. Web Instances can be placed within the Public Subnet The custom Route Table associated with Public Subnet will have the necessary routing information to route traffic to the Internet Gateway A NAT Instance is also provisioned in the Public Subnet A Private Subnet that has outbound internet connectivity through the NAT Instance in the Public Subnet The Main Route Table is by default associated with the Private Subnet. This will have necessary routing information to route internet traffic to the NAT Instance Instances in the Private Subnet will use the NAT Instance for outbound internet connectivity. For example, DB backups from standby that needs to be stored in S3. Background programs that make external web services calls Of course, the above architecture has limited High Availability since all the Subnets are created within the same Availability Zone. We can avoid this by creating multiple Subnets in multiple Availability Zones. Public and Private Subnets with multiple Availability Zones Additional Subnets (Public and Private) are created in one another Availability Zone Both Private Subnets are attached to the Main Routing Table Both Public Subnets are attached to the same Custom Routing Table Instances in the Private Subnet still continue to use the NAT Instance for outbound internet connectivity Though we increased the High Availability by utilizing multiple Availability Zones, the NAT Instance is still a Single Point of Failure. NAT Instance is just another EC2 Instance that can become unavailable any time. The updated architecture below uses two NAT Instances to provide failover and High Availability for the NAT Instances NAT Instance High Availability Each Subnet is associated with its own Route Table NAT1 is provisioned in Public Subnet 1 NAT2 is provisioned in Public Subnet 2 Private Subnet 1's Route Table (RT) has routing entry to NAT1 for internet traffic Private Subnet 2's Route Table (RT) has routing entry to NAT2 for internet traffic NAT Instance HA Illustration A script can be installed on both the NAT Instances to monitor each other and swap the routing table association if one of them fails. For example, if NAT1 detects that NAT2 is not responding to its ping requests, it can change the Route Table of Private Subnet 2 to NAT1 for internet traffic. Once NAT2 becomes operational again, a reverse swapping can happen. AWS has a pretty good documentation on this and a sample script for the swapping. Apart from HA, the above architecture also provides better overall throughput, since during normal conditions, both NAT Instances can be used to drive the outbound internet requirements of the VPC. If there are workloads that requires a lot of outbound internet connectivity, having more than one NAT Instance would make sense. Of course, you are still limited with one NAT Instance per Subnet.
March 28, 2013
by Raghuraman Balachandran
· 18,767 Views
article thumbnail
Accessing AWS Without Key and Secret
If you are using Amazon Web Services(AWS), you are probably aware how to access and use resources like SNS, SQS, S3 using key and secret. With the aws-java-sdk that is straight forward: AmazonSNSClient snsClient = new AmazonSNSClient( new BasicAWSCredentials("your key", "your secret")) One of the difficulties with this approach is storing the key/secret securely especially when there are different set of these for different environments. Using java property files, combined with maven or spring profiles might help a little bit to externalize the key/secret out of your source code, but still doesn't solve the issue of securely accessing these resources. Amazon has another service to help you in this occasion. No, no, this is not one more service to pay for in order to use the previous services. It is a free service, actually it is a feature of the amazon account. AWS Identity and Access Management (IAM) lets you securely control access to AWS services and resources for your users, you can manage users and groups and define permissions for AWS resources. One interesting functionality of IAM is the ability to assign roles to EC2 instances. The idea is you create roles with sets of permissions and you launch an EC2 instance by assigning the role to the instance. And when you deploy an application on that instance, the application doesn't need to have access key and secret in order to access other amazon resource. The application will use the role credentials to sign the requests. This has a number of benefits like a centralized place to control all the instances credentials, reduced risk with auto refreshing credentials and so on. Here is a short video demonstrating how to assign roles to an EC2 instance: Once you have role based security enabled for an instance, to access other resources from that instances you have to create and AwsClient using the chained credential provider: AmazonSNSClient snsClient = new AmazonSNSClient( new DefaultAWSCredentialsProviderChain()) The provider will search your system properties, environment properties and finally call instance metadata API to retrieve the role credentials in chain of responsibility fashion. It will also refresh the credentials in the background periodically depending on its expiration period. And finally, if you want to use role based security from Camel applications running on Amazon, all you have to do is create an instance of the client with configured chained credentials object and don't specify any key or secret: from("direct:start") .to("aws-sns://MyTopic?amazonSNSClient=#snsClient");
March 26, 2013
by Bilgin Ibryam
· 14,390 Views
article thumbnail
Connect Apache OFBiz with the Real World
What would you expect from someone who is OFBiz and Camel committer? To integrate them for fun? Fine, here it is. In addition to being fun, I believe this integration will be of real benefit for the OFBiz community, because despite the fact of being a complete ERP software, OFBiz lacks the ability to easily integrate with external systems. The goal of this project is instead of reinventing the wheel and trying to integrate OFBiz with each system separately, integrate it with Camel and let Camel do what it does best: connect your application with every possible protocol and system out there. Quick OFBiz introduction The Apache Open For Business Project is an open source, enterprise automation software. It consist mainly from two parts: A full-stack framework for rapid business application development. It has Entity Engine for the data layer (imagine something like iBATIS and Hibernate combined). It is the same entity engine that powers millions of Attlasian Jira instances. But don't get me wrong, it is not meant for usage outside of OFBiz framework, so use it only as OFBiz data layer. Service Engine - this might be hard to grasp for someone only with DDD background, but OFBiz doesn't have any domain objects. Instead for the service layer it uses SOA and has thousands of services that contains the business logic. A service is an atomic bit of isolated business logic, usually reading and updating the database. If you need you can make services triggering each other using ECAs(event-condition-action) which is kind of rule engine that allows define pre/post conditions for triggering other service calls when a service is executed. The service itself can be written written in java, groovy or simple language (an XML DSL for simple database manipulation) and usually requires authentication, authorisation and finally executed in a transaction. UI widgets - an XML DSL which allows you easily create complex pages with tables, forms and trees. And the really great thing about this framework is that 'The whole is greater than the sum of its parts' - all of the above layers works together amazingly: if you have an entity definition (a table) in your data layer, you can use it in your service layer during your service interface definition or its implementation. It takes one line of code(a long one) to create a service which has as input parameters the table columns and return the primary key as result of the service. Then if you are creating a screen with tables or forms, you can base it on your entity definitions or service definitions. It is again only few lines of code to create a form with fields mapping to a service or entity fields. Out of the box business applications. These are vertical applications for managing the full life cycle of a business domain like ordering, accounting, manufacturing and many more in a horizontally integrated manner. So creating an order from order or ecommerce application will interact with facility to check whether a product is available, and after the order is created will create accounting transaction in accounting application. Before the order is shipped from the facility, it will create invoices and once the invoice is paid, it will close the order. You get the idea. Camel in 30 seconds Apache Camel is an integration framework based on known Enterprise Integration Patterns(EIP). Camel can also be presented as consisting of two artifacts: The routing framework which can be defined using java, scala, xml DSL with all the EIPs like Pipe, Filter, Router, Splitter, Aggregator, Throttler, Normalizer and many more. Components and transformers ie all the different connectors to more than 100 different applications and protocols like: AMQP, AWS, web services, REST, MongoDB, Twitter, Websocket, you name it. If you can imagine a tool, that enables you to consume data from one system, then manipulate the data (transform, filter, split, aggregate) and send it to other systems, using a declarative, concise, English-like DSL without any boilerplate code - that's Apache Camel. Let OFBiz talk to all of the systems Camel do The main interaction point with OFBiz are either by using the Entity Engine for direct data manipulation or by calling services through Service Engine. The latter is preferred because it ensures that the user executing the service is authorised to do so, the operation is transactional to ensure data integrity, and also all the business rules are satisfied (there might be other services that have to be executed with ECA rules). So if we can create an OFBiz endpoint in Camel and execute OFBiz services from Camel messages, that would allow OFBiz to receive notifications from Camel endpoints. What about the other way around - making OFBiz notify Camel endpoints? The ideal way would be to have an OFBiz service that sends the IN parameters to Camel endpoints as message body and headers and return the reply message as OFBiz service response. If you are wondering: why is it so great, what is an endpoint, where is the real world, who is gonna win Euro2012... have a look at the complete list of available Camel components, and you will find out the answer. Running Camel in OFBiz container I've started an experimental ofbiz-camel project on github which allows you to do all of the above. It demonstrates how to poll files from a directory using Camel and create notes in OFBiz with the content of the file using createNote service. The project also has an OFBiz service, that enables sending messages from OFBiz to Camel. For example using that service it is possible to send a message to Camel file://data endpoint, and Camel will create a file in the data folder using the service parameters. The integration between OFBiz and Camel is achieved by running Camel in an OFBiz container as part of the OFBiz framework. This makes quite tight integration, but ensures that there will not be any http, rmi or any other overhead in between. It is still WIP and may change totally. Running Camel and OFBiz separately Another approach is KISS: run Camel and OFBiz as they are - separate applications, and let them interact with RMI, WS* or something else. This doesn't require any much coding, but only configuring both systems to talk to each other. I've created a simple demo camel-ofbiz-rmi which demonstrates how to listen for tweets with a specific keyword and store them in OFBiz as notes by calling createNote service using RMI. It uses Camel's twitter and rmi components and requires only configuration. Notice that this example demonstrates only one way interaction: from Camel to OFBiz. In order to invoke a Camel endpoint from OFBiz you can you have to write some RMI, WS* or other code. PS: I'm looking forward to hear your real world integration requirements for OFBiz.
March 25, 2013
by Bilgin Ibryam
· 18,207 Views · 1 Like
article thumbnail
MySQL 5.6 Compatible Percona XtraBackup 2.0.6 Released
This post comes from Hrvoje Matijakovic at the MySQL Performance Blog. Percona is glad to announce the release of Percona XtraBackup 2.0.6 for MySQL 5.6 on March 20, 2013. Downloads are available from our download site here and Percona Software Repositories. This release is the current GA (Generally Available) stable release in the 2.0 series. New Features: XtraBackup 2.0.6 has implemented basic support for MySQL 5.6, Percona Server 5.6 and MariaDB 10.0. Basic support means that these versions are are recognized by XtraBackup, and that backup/restore works as long as no 5.6-specific features are used (such as GTID, remote/transportable tablespaces, separate undo tablespace, 5.6-style buffer pool dump files). Bugs Fixed: Individual InnoDB tablespaces with size less than 1MB were extended to 1MB on the backup prepare operation. This led to a large increase in disk usage in cases when there are many small InnoDB tablespaces. Bug fixed #950334 (Daniel Frett, Alexey Kopytov). Fixed the issue that caused databases corresponding to inaccessible datadir subdirectories to be ignored by XtraBackup without warning or error messages. This was happening because InnoDB code silently ignored datadir subdirectories it could not open. Bug fixed #664986 (Alexey Kopytov). Under some circumstances XtraBackup could fail to copy a tablespace with a high --parallel option value and a low innodb_open_files value. Bug fixed #870119 (Alexey Kopytov). Fix for the bug #711166 introduced a regression that caused individual partition backups to fail when used with --include option in innobackupex or the --tables option in xtrabackup. Bug fixed #1130627 (Alexey Kopytov). innobackupex didn’t add the file-per-table setting for table-independent backups. Fixed by making XtraBackup auto-enable innodb_file_per_table when the --export option is used. Bug fixed #930062 (Alexey Kopytov). Under some circumstances XtraBackup could fail on a backup prepare with innodb_flush_method=O_DIRECT. Bug fixed #1055547 (Alexey Kopytov). innobackupex did not pass the --tmpdir option to the xtrabackup binary resulting in the server’s tmpdir always being used for temporary files. Bug fixed #1085099 (Alexey Kopytov). XtraBackup for MySQL 5.6 has improved the error reporting for unrecognized server versions. Bug fixed #1087219 (Alexey Kopytov). Fixed the missing rpm dependency for Perl Time::HiRes package that caused innobackupex to fail on minimal CentOS installations. Bug fixed #1121573 (Alexey Bychko). innobackupex would fail when --no-lock and --rsync were used in conjunction. Bug fixed #1123335 (Sergei Glushchenko). Fix for the bug #1055989 introduced a regression that caused xtrabackup_pid file to remain in the temporary dir after execution. Bug fixed #1114955 (Alexey Kopytov). Unnecessary debug messages have been removed from the XtraBackup output. Bug fixed #1131084 (Alexey Kopytov). Other bug fixes: bug fixed #1153334 (Alexey Kopytov), bug fixed #1098498 (Laurynas Biveinis), bug fixed #1132763 (Laurynas Biveinis), bug fixed #1142229 (Laurynas Biveinis), bug fixed #1130581 (Laurynas Biveinis). Release notes with all the bugfixes for Percona XtraBackup 2.0.6 for MySQL 5.6 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.
March 24, 2013
by Peter Zaitsev
· 2,925 Views
article thumbnail
Neo4j/Cypher: WITH, COLLECT, and EXTRACT
As I mentioned in my last post I’m trying to get the hang of the WITH statement in neo4j’s cypher query language and I found another application when trying to work out which opponents teams played on certain days. I started out with a query which grouped the data set by day and showed the opponents that were played on that day: START team = node:teams('name:"Manchester United"') MATCH team-[h:home_team|away_team]-game-[:on_day]-day RETURN DISTINCT day.name, COLLECT(TRIM(REPLACE(REPLACE(game.name, "Manchester United", ""), "vs", ""))) +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | day.name | opponents | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Sunday" | ["Liverpool","Everton","Southampton","Liverpool","Newcastle United","Chelsea","Manchester City","Swansea City","Tottenham Hotspur"] | | "Wednesday" | ["Southampton","West Ham United","Newcastle United"] | | "Monday" | ["Everton"] | | "Saturday" | ["Reading","Fulham","Wigan Athletic","Tottenham Hotspur","Stoke City","Arsenal","Queens Park Rangers","Sunderland","West Bromwich Albion","Norwich City","Reading","Aston Villa","Norwich City","Fulham","Queens Park Rangers"] | | "Tuesday" | ["Wigan Athletic"] | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 5 rows The way we’ve got the opponents is a bit of a hack – the name of the two teams is in the ‘name’ property of a game node and we’ve removed ‘Manchester United’ and the word ‘vs’ to get the opponent’s name. I thought it’d be cool if we could separate the games on each day based on whether Manchester United were playing at home or away. With a lot of help from Wes Freeman we ended up with the following query which does the job: START team = node:teams('name:"Manchester United"') MATCH team-[h:home_team|away_team]-game-[:on_day]-day WITH day.name as d, game, team, h MATCH team-[:home_team|away_team]-game-[:home_team|away_team]-opp WITH d, COLLECT([type(h),opp.name]) AS games RETURN d, EXTRACT(c in FILTER(x in games: HEAD(x) = "home_team") : HEAD(TAIL(c))) AS home, EXTRACT(c in FILTER(x in games: HEAD(x) = "away_team") : HEAD(TAIL(c))) AS away We use a similar approach with COLLECT as in the previous post whereby we have a collection of tuples describing whether Manchester United were at home or not and who they were playing. A neat thing that Wes pointed out is that since there are only 2 teams per game we’re able to get the opponent node easily because it’s the only other node that can match the ‘home_team|away_team” relationship since we’ve already matched our team. If we run the query just up to the last WITH we get the following result: +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | d | games | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Sunday" | [["home_team","Liverpool"],["home_team","Everton"],["away_team","Southampton"],["away_team","Liverpool"],["away_team","Newcastle United"],["away_team","Chelsea"],["away_team","Manchester City"],["away_team","Swansea City"],["away_team","Tottenham Hotspur"]] | | "Wednesday" | [["home_team","Southampton"],["home_team","West Ham United"],["home_team","Newcastle United"]] | | "Monday" | [["away_team","Everton"]] | | "Saturday" | [["home_team","Reading"],["home_team","Fulham"],["home_team","Wigan Athletic"],["home_team","Tottenham Hotspur"],["home_team","Stoke City"],["home_team","Arsenal"],["home_team","Queens Park Rangers"],["home_team","Sunderland"],["home_team","West Bromwich Albion"],["home_team","Norwich City"],["away_team","Reading"],["away_team","Aston Villa"],["away_team","Norwich City"],["away_team","Fulham"],["away_team","Queens Park Rangers"]] | | "Tuesday" | [["away_team","Wigan Athletic"]] | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 5 rows We then use the FILTER function to choose either the opponents Manchester United played at home or away and then we use the EXTRACT function to get the opponent from the tuple: +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | d | home | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Sunday" | ["Liverpool","Everton"] | | "Wednesday" | ["Southampton","West Ham United","Newcastle United"] | | "Monday" | [] | | "Saturday" | ["Reading","Fulham","Wigan Athletic","Tottenham Hotspur","Stoke City","Arsenal","Queens Park Rangers","Sunderland","West Bromwich Albion","Norwich City"] | | "Tuesday" | [] | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 5 rows +-----------------------------------------------------------------------------------------------------------------------------+ | d | away | +-----------------------------------------------------------------------------------------------------------------------------+ | "Sunday" | ["Southampton","Liverpool","Newcastle United","Chelsea","Manchester City","Swansea City","Tottenham Hotspur"] | | "Wednesday" | [] | | "Monday" | ["Everton"] | | "Saturday" | ["Reading","Aston Villa","Norwich City","Fulham","Queens Park Rangers"] | | "Tuesday" | ["Wigan Athletic"] | +-----------------------------------------------------------------------------------------------------------------------------+ (I ran the query twice alternating between the last two lines so that it’s readable here. In actual fact the away teams would be in a column next to the home teams) I thought it was quite interesting how many games Manchester United play away on a Sunday – I think all of those games were probably televised so I thought they’d be more evenly split between home and away matches. Adding televised matches is perhaps another layer to add to the graph. It’s probably more useful to summarise how many games were played on each day at home and away rather than who they’re against and we can use the REDUCE function to do this: START team = node:teams('name:"Manchester United"') MATCH team-[h:home_team|away_team]-game-[:on_day]-day WITH day.name as dayName, game, team, h MATCH team-[:home_team|away_team]-game-[:home_team|away_team]-opp WITH dayName, COLLECT([type(h),opp.name]) AS games RETURN dayName, REDUCE(homeGames=0, game in EXTRACT(c in FILTER(x in games: head(x) = "home_team") : HEAD(TAIL(c))) : homeGames + 1) as home, REDUCE(awayGames=0, game in EXTRACT(c in FILTER(x in games: head(x) = "away_team") : HEAD(TAIL(c))) : awayGames + 1) as away, REDUCE(totalGames=0, game in games : totalGames + 1) as total +-----------------------------------+ | dayName | home | away | total | +-----------------------------------+ | "Sunday" | 2 | 7 | 9 | | "Wednesday" | 3 | 0 | 3 | | "Monday" | 0 | 1 | 1 | | "Saturday" | 10 | 5 | 15 | | "Tuesday" | 0 | 1 | 1 | +-----------------------------------+ 5 rows An alternative way of writing the initial query would be the following which Michael Hunger suggested on the thread: START team = node:teams('name:"Manchester United"') MATCH p=team-[:home_team|away_team]-game-[:home_team|away_team]-(), game-[:on_day]-day WITH day.name as dayName, COLLECT([LAST(p), HEAD(RELS(p))]) AS opponents WITH dayName, EXTRACT(y in FILTER(x in opponents: TYPE(HEAD(TAIL(x))) = "home_team") : HEAD(y)) AS home, EXTRACT(y in FILTER(x in opponents : TYPE(HEAD(TAIL(x))) = "away_team") : HEAD(y)) AS away RETURN dayName, EXTRACT(team in home: team.name) AS homeOpponents, EXTRACT(team in away: team.name) AS awayOpponents ORDER BY dayName Here we take a slightly different approach where we make use of functions that we can apply to a matching path. We create a collection of tuples where LAST(p) matches the opponent node and HEAD(RELS(p)) matches the ‘home_team’ or ‘away_team’ relationship accordingly. We then filter the collection to find the times that we played at home and away. This is done by taking the second value from the tuple and then calling TYPE on it which either returns ‘home_team’ or ‘away_team’. We then extract the first value from the tuple which is the opponent node. In the last part of the query we extract the name from the opponent nodes.
March 22, 2013
by Mark Needham
· 17,881 Views
article thumbnail
OpenJPA: Memory Leak Case Study
This article will provide the complete root cause analysis details and resolution of a Java heap memory leak (Apache OpenJPA leak) affecting an Oracle Weblogic server 10.0 production environment. This post will also demonstrate the importance to follow the Java Persistence API best practices when managing the javax.persistence.EntityManagerFactory lifecycle. Environment specifications Java EE server: Oracle Weblogic Portal 10.0 OS: Solaris 10 JDK: Oracle/Sun HotSpot JVM 1.5 32-bit @2 GB capacity Java Persistence API: Apache OpenJPA 1.0.x (JPA 1.0 specifications) RDBMS: Oracle 10g Platform type: Web Portal Troubleshooting tools Quest Foglight for Java (Java heap monitoring) MAT (Java heap dump analysis) Problem description & observations The problem was initially reported by our Weblogic production support team following production outages. An initial root cause analysis exercise did reveal the following facts and observations: Production outages were observed on regular basis after ~2 weeks of traffic. The failures were due to Java heap (OldGen) depletion e.g. OutOfMemoryError: Java heap space error found in the Weblogic logs. A Java heap memory leak was confirmed after reviewing the Java heap OldGen space utilization over time from Foglight monitoring tool along with the Java verbose GC historical data. Following the discovery of the above problems, the decision was taken to move to the next phase of the RCA and perform a JVM heap dump analysis of the affected Weblogic (JVM) instances. JVM heap dump analysis ** A video explaining the following JVM Heap Dump analysis is now available here. In order to generate a JVM heap dump, the supported team did use the HotSpot 1.5 jmap utility which generated a heap dump file (heap.bin) of about ~1.5 GB. The heap dump file was then analyzed using the Eclipse Memory Analyzer Tool. Now let’s review the heap dump analysis so we can understand the source of the OldGen memory leak. MAT provides an initial Leak Suspects report which can be very useful to highlight your high memory contributors. For our problem case, MAT was able to identify a leak suspect contributing to almost 600 MB or 40% of the total OldGen space capacity. At this point we found one instance of java.util.LinkedList using almost 600 MB of memory and loaded to one of our application parent class loader (@ 0x7e12b708). The next step was to understand the leaking objects along with the source of retention. MAT allows you to inspect any class loader instance of your application, providing you with capabilities to inspect the loaded classes & instances. Simply search for the desired object by providing the address e.g. 0x7e12b708 and then inspect the loaded classes & instances by selecting List Objects > with outgoing references. As you can see from the above snapshot, the analysis was quite revealing. What we found was one instance of org.apache.openjpa.enhance.PCRegistry at the source of the memory retention; more precisely the culprit was the _listeners field implemented as a LinkedList. For your reference, the Apache OpenJPA PCRegistry is used internally to track the registered persistence-capable classes. Find below a snippet of the PCRegistry source code from Apache OpenJPA version 1.0.4 exposing the _listeners field. /** * Tracks registered persistence-capable classes. * * @since 0.4.0 * @author Abe White */ publicclass PCRegistry { // DO NOT ADD ADDITIONAL DEPENDENCIES TO THIS CLASS privatestaticfinal Localizer _loc = Localizer.forPackage (PCRegistry.class); // map of pc classes to meta structs; weak so the VM can GC classes privatestaticfinal Map _metas = new ConcurrentReferenceHashMap (ReferenceMap.WEAK, ReferenceMap.HARD); // register class listeners privatestaticfinal Collection _listeners = new LinkedList(); …………………………………………………………………………………… Now the question is why is the memory footprint of this internal data structure so big and potentially leaking over time? The next step was to deep dive into the _listeners LinkedLink instance in order to review the leaking objects. We finally found that the leaking objects were actually the JDBC & SQL mapping definitions (metadata) used by our application in order to execute various queries against our Oracle database. A review of the JPA specifications, OpenJPA documentation and source did confirm that the root cause was associated with a wrong usage of the javax.persistence.EntityManagerFactory such of lack of closure of a newly created EntityManagerFactory instance. If you look closely at the above code snapshot, you will realize that the close() method is indeed responsible to cleanup any recently used metadata repository instance. It did also raise another concern, why are we creating such Factory instances over and over… The next step of the investigation was to perform a code walkthrough of our application code, especially around the life cycle management of the JPA EntityManagerFactory and EntityManager objects. Root cause and solution A code walkthrough of the application code did reveal that the application was creating a new instance of EntityManagerFactory on each single request and not closing it properly. public class Application { @Resource private UserTransaction utx = null; // Initialized on each application request and not closed! @PersistenceUnit(unitName = "UnitName") private EntityManagerFactory emf = Persistence.createEntityManagerFactory("PersistenceUnit"); public EntityManager getEntityManager() { return this.emf.createEntityManager(); } public void businessMethod() { // Create a new EntityManager instance via from the newly created EntityManagerFactory instance // Do something... // Close the EntityManager instance } } This code defect and improver use of JPA EntityManagerFactory was causing a leak or accumulation of metadata repository instances within the OpenJPA _listeners data structure demonstrated from the earlier JVM heap dump analysis. The solution of the problem was to centralize the management & life cycle of the thread safe javax.persistence.EntityManagerFactory via the Singleton pattern. The final solution was implemented as per below: Create and maintain only one static instance of javax.persistence.EntityManagerFactory per application class loader and implemented via the Singleton Pattern. Create and dispose new instances of EntityManager for each application request. Please review this discussion from Stackoverflow as the solution we implemented is quite similar. Following the implementation of the solution to our production environment, no more Java heap OldGen memory leak is observed. Please feel free to provide your comments and share your experience on the same.
March 21, 2013
by Pierre - Hugues Charbonneau
· 9,226 Views
article thumbnail
Entity Framework 5/6 vs NHibernate 3 – The State of Affairs
It has been almost two years since I've last compared NHibernate and Entity Framework, so with the recentalpha version of EF 6, it's about time to look at the current state of affair. I've been using NHibernate for more than 6 years so obviously I'm a bit biased. But I can't ignore that EF's feature list is growing and some of the things I like about the NHibernate eco-system such as code-based mappings and automatic migrations have found a place in EF. Moreover, EF is now open-source, so they're accepting pull requests as well. Rather than doing a typical feature-by-feature comparison, I'll be looking at those aspects of an object-relational mapper that I think are important when building large-scale enterprise applications. So let's see how those frameworks match up. Just for your information, I've been looking at Entity Framework 6 Alpha 3 and NHibernate 3.3.1GA. Support for rich domain models When you're practicing Domain Driven Design it is crucial to be able to model your domain using the right object-oriented principles. For example, you should be able to encapsulate data and only expose properties if that is needed by the functional requirements. If you model an association using a UML qualifier, you should be able to implement that using a IDictionary. Similarly, collection properties should be based on IEnumerableor any of the newer read-only collections introduced in .NET 4.5 so that your collections are protected by external changes. NHibernate supports all these requirements and adds quite a lot of flexibility like ordered and unordered sets. Unfortunately, neither EF5 or 6 supports mapping private fields (yet) nor can you directly use a dictionary class. In fact, EF only supports ICollections of entities, so collections of value objects are out of the question. One notable type that still isn't fully supported is the enum. It was introduced in EF5, but only if you target .NET 4.5. EF6 will fortunately fixes this so that it is also available in .NET 4.0 applications. A good ORM should also allow your domain model to be as persistence ignorant as possible. In other words, you shouldn't need to decorate your classes with attributes or subclass some framework-provided base-class (something you might remember from Linq2Sql). Both frameworks impose some limitations such as protected default constructors or virtual members, but that's not going to be too much of an issue. Vendor support Although Microsoft makes us believe that corporate clients only use SQL Server or SQL Azure, we all know that the opposite is much more true. The big drawback of EF compared to NH is that the latter has all the providers built-in. So whenever a new version of the framework is released you don't have to worry about vendor support. Both EF5 and NH 3.3 support various flavors of SQL Server/Azure, SQLite, PostgreSQL, Oracle, Sybase, Firebird and DB2. Most of these providers originate from EF 4, so they don’t support code-first (migrations) or the new DBContext façade. EF6 is still an alpha release and its provider model seems to contain some breaking changes so don't expect any support for anything other than Microsoft's own databases anytime soon. Support switching databases for automated testing purposes Our architecture uses a repository pattern implementation that allows swapping the actual data mapper on-the-fly. Since we're heavily practicing Test Driven Development, we use this opportunity to approach our testing in different ways. We use an in-memory Dictionary for unit tests where the subject-under-test simply needs some data to be setup in a specific way (using Test Data Builders). We use an in-memory SQLite database when we want to verify that NHibernate can process the LINQ query correctly and performs sufficiently using NHProf. We use an actual SQL Server for unit tests that verify that our mapping against the database schema is correct. We have some integration code that interacts with a third-party Oracle system that is tested on SQL Server on a local development box, but uses Oracle on our automated SpecFlow build. So you can imagine switching between database providers without changing the mapping code is quite essential for us. During development, we decided that we did not care about the actual class that represented the integration tables, so we tried to use the Entity Framework model-first approach. Unfortunately, when you do that, you're basically locking yourself to a particular database. After switching back to our normal NHibernate approach, changing the connection string during deployment was enough to switch between SQL Server and Oracle. Fortunately this has also been possible since EF 4.1 and Jason Short wrote a good blog post about that. Automatic schema migration When you're practicing an agile methodology such as Scrum, you'll probably try to deliver a potentially shippable release at the end of every sprint. Part of being agile is that functionality can be added at any time where some of that might be affecting the database schema. The most traditional way of dealing with that is to generate or hand-write SQL scripts that are applied during deployment. The problem with SQL scripts is that they are tedious to write, might contain bugs, and are often closely coupled to the database vendor. Wouldn't it be great if the ORM framework would support some way of figuring out what version of the schema is being used and automatically upgrade the database scheme as part of your normal development cycle? Or what about the ability to revert the schema to an older version? The good news that this exists for both frameworks, but with a caveat. For instance, NHibernate doesn't support this out-of-the-box (although you can generate the initial schema). But with the help of another open-source project, Fluent Migrations, you can get very far. We currently use it in an enterprise system and it works like a charm. The caveat is that the support for the various databases is not always at the same level. For instance, SQLite doesn't allow renaming a column and Fluent Migrations doesn't support it (although theoretically it could create a new column, copy the old data over, and drop the old column). As an example of a fluent migration supporting both an update as well as a rollback, check out this snippet. [Migration(1)] public class TestCreateAndDropTableMigration: Migration { public override void Up() { Create.Table("TestTable") .WithColumn("Id").AsInt32().NotNullable().PrimaryKey().Identity() .WithColumn("Name").AsString(255).NotNullable().WithDefaultValue("Anonymous"); Create.Table("TestTable2") .WithColumn("Id").AsInt32().NotNullable().PrimaryKey().Identity() .WithColumn("Name").AsString(255).Nullable() .WithColumn("TestTableId").AsInt32().NotNullable(); Create.Index("ix_Name").OnTable("TestTable2").OnColumn("Name").Ascending() .WithOptions().NonClustered(); Create.Column("Name2").OnTable("TestTable2").AsBoolean().Nullable(); Create.ForeignKey("fk_TestTable2_TestTableId_TestTable_Id") .FromTable("TestTable2").ForeignColumn("TestTableId") .ToTable("TestTable").PrimaryColumn("Id"); Insert.IntoTable("TestTable").Row(new { Name = "Test" }); } public override void Down() { Delete.Table("TestTable2"); Delete.Table("TestTable"); } } Entity Framework has something similar built-in since version 5. It's called Code-First Migrations and looks surprisingly similar to Fluent Migrations. Just like the NHibernate solution has some limitations, EF's has as well and that is support from vendors. At the time of this writing not a single vendor supports Code-First Migrations. On the other hand, if you're only using SQL Server, SQL Express, SQL Compact or SQL Azure, there's nothing from stopping you to use it. Code-based mapping If you remember the old days of NHibernate, you might recall those ugly XML files that were needed to configure the mapping of your .NET classes to the underlying database. Fluent NHibernate has been offering a very nice fluent API for replacing those mappings with code. Not only does this prevent errors in the XML, it is also a very refactor-friendly approach. We've been using it for years and the extensive (and customizable) convention-basedmapping engine even allows auto-mapping entities to tables without the need of explicit mapping code. Strangely enough, NHibernate 3.2 has introduced a brand new fluent API that directly competes with Fluent NHibernate. Because of lack of documentation, I've never bothered to look at it, especially since Fluent NHibernate has been doing its job remarkedly. But during my research for this post I noticed that Adam Bar has written a very extensive series on the new API, and he actually managed to raise a renewed interest in the new API. Until Entity Framework 4.1 the only way to set-up the mapping was through its data model designer (not to be confused with an OO designer). But apparently the team behind it learned from Fluent Nhibernate and decided to introduce their own code-first approach, surprisingly named Code-First. In terms of convention-based mapping, it was quite limited, especially compared to Fluent NHibernate. EF 6 is going to introduce a lot of hooks for changing the conventions, both on property level as well as on class level. Supporting custom types and collections One of the guidelines in my own coding guidelines is to consider wrapping primitive types with more domain-specific types. Conincedentily it is also one of the rules of Object Calisthenetics. In Domain Driven Design these types are called Value Objects and their purpose is to encapsulate all the data and behavior associated with a recurring domain concept. For instance, rather than having two separate DateTime properties to represent a period and separate methods for determining whether some point of time occurs within that period, I would prefer to have a dedicated Period class that contains all that logic. This approach results in a design that contains less duplication and is easier to understand. Contrary to NHibernate, Entity Framework doesn't offer anything like this and as far as I know, doesn't plan to. NH on the other hand offers a myriad of options for creating custom types, custom collections or even composite types. Granted, you have to do a bit of digging to find the right documentation (and StackOverflow is your friend here), but if you do, it really helps to enrich your domain model. Query flexibility Some would argue that EF's LINQ support is much more mature, and until NH 3.2 I would have agreed. But since then, NH's LINQ support has improved substantially. For instance, during development we use an in-memory SQLite database in our query-related unit tests to make sure the query can actually be executed by NH. Before 3.2, we regularly ran into strange cast exceptions or exceptions because of unsupported expressions. Since 3.2, we've never seen those anymore. I haven't tried to run all our existing queries against EF, but I have no doubts that it would have any issue with it. In terms of non-LINQ querying, EF supports Entity SQL as well as native SQL (although I don’t know if all vendors are supported). NHibernate offers the HQL, QueryOver and Criteria APIs next to native vendor-specific SQL. Both frameworks support stored procedures. All in all plenty of flexibility. Extensibility EF 6 uses the service locator pattern to allow replacing certain aspects of the framework at runtime. This is a good starting point for extensibility, but unfortunately the team always demonstrates this by replacing the pluralization service. As if someone would actually like to do that. Nonetheless, I'm sure the team's plan is to expose more extension points in the near future. NH has a very extensive set of observable collections called listeners that can be used to hook into virtually every part of the framework. We've been using it for cross-cutting concerns, for hooking up auditing services and also for some CQRS related aspects. You can also tweak a lot of NH's behavior throughconfiguration properties (although you'll have to Google…eh…Bing for the right examples). Other notable features Each of the frameworks has some unique features that don't fit in any of the other topics I've discussed up to now. A short summary: Entity Framework 6 adds async/await support, a feature that NHibernate may never get due to the impact it has on the entire architecture. It also has built-in support for automatically reconnecting to the database, which is particularly useful for a known issue with SQL Azure. Both NHibernate as well as the Entity Framework support .NET 4.5, but only the latter gains some significant performance improvements from it. NHibernate also offers a unique feature called Futures that you can use to compose a set of queries and send them to the database as a single request. The Entity Framework allows creating a DBContext with an existing open connection. As far as I know that's not possible in NHibernate. Version 6 of the Entity Framework adds spatial support, something for which you need a 3rd party library to get that in NHibernate. Pedro Sousa wrote an in-depth blog post series about that. Wrap-up The big difference between Entity Framework and NHibernate from a developer perspective is that the former offers an integrated set of services whereas the latter requires the combination of several open-source libraries. That on itself is not a big issue - that's why we have NuGet, don't we? - but we've noticed that those libraries are not always up-to-date soon enough when new NHibernate versions are released. From that same perspective NHibernate does offer a lot of flexibility and clearly shows its maturity. On the other hand, you could also see that as a potential barrier for new developers. It's just much easier to get started with Entity Framework than with NH. The documentation on Entity Framework is quite comprehensive and even the new functionality for version 6 is extensively documented using feature specifications. The NHibernatedocumentation has always been lagging behind a bit. For instance, the new mapping system is not even mentioned even though the reference documentation mentions the correct version. The information is available, but you just have to search a bit. The fact that the EH is being developed using a Git source control repository is also a big plus. Just look at themany pull requests they've been taking in. On the other hand, to my surprise somebody moved the NHibernate source code to GitHub while I wasn't paying attention. So on that aspect they are equals. And does NHibernate have a future at all? Some would argue it is dead already. I don't agree though. Just look at the statistics on GitHub; 240 forks, almost 200 pull requests and a lot of commits in the last few months. I do agree that NoSQL solutions like RavenDB are extremely powerful and offer a lot of fun and flexibility for development, but the fact of the matter is that they are still not widely accepted by enterprises with a history in SQL Server or Oracle. Nevertheless, the RAD aspect of EF cannot be ignored and is important for small short-running projects where SQL Server is the norm. And for those projects, I would wholeheartedly recommend EF. But for the bigger systems where a NoSQL solution is not an option, especially those based on Domain Driven Design, NHibernate is still the king in town. As usual, I might have overlooked a feature or misinterpreted some aspect of both frameworks. If so, leave a comment, email me or drop me a tweet at @ddoomen.
March 20, 2013
by Dennis Doomen
· 38,956 Views
article thumbnail
MongoDB: Replication Lag and the Facts of Life
Curator's Note: The content of this article was originally written over at the MongoLab blog. So you’re checking in on your latest awesome application one day — it’s really getting traction! You’re proud of its uptime record, thanks in part to the MongoDB replica set underneath it. But now … something’s wrong. Users are complaining that some of their data has gone missing. Others are noticing stuff they deleted has suddenly reappeared. What’s going on?!? Don’t worry… we’ll get to the bottom of this! In doing so, we’ll examine a source of risk that’s easy to overlook in a MongoDB application: replication lag — what it means, why it happens, and what you can do about it. Here’s what we’re going to cover: What is replication lag? Why is lag problematic? What causes a secondary to fall behind? How do I measure lag? How do I monitor for lag? What can I do to minimize lag? Tip #1: Make sure your secondary has enough horsepower Tip #2: Consider adjusting your write concern Tip #3: Plan for index builds Tip #4: Take backups without blocking Tip #5: Be sure capped collections have an _id field & a unique index Tip #6: Check for replication errors Don’t let replication lag take you by surprise. Continuing this cautionary tale… Seriously, wtf?! You were doing everything right! Using MongoDB with a well-designed schema and lovingly-tuned indexes, your application back-end has been handling thousands of transactions per second without breaking a sweat. You’ve got multiple nodes arranged in a replica set with no single point of failure. Your application tier’s Mongo driver connections are aware of the replica set and can follow changes in the PRIMARY node during failover. All critical writes are “safe” writes. Your app has been up without interruption for almost six months now! How could this have happened? This unsettling situation has the hallmarks of an insidious foe in realm of high-availability data stewardship: unchecked replication lag. Closely monitoring a MongoDB replica set for replication lag is critical. What is replication lag? As you probably know, like many data stores MongoDB relies on replication — making redundant copies of data — to meet design goals around availability. In a perfect world, data replication would be instantaneous; but in reality, thanks to pesky laws of physics, some delay is inevitable — it’s a fact of life. We need to be able to reason about how it affects us so as to manage around the phenomenon appropriately. Let’s start with definitions… For a given secondary node, replication lag is the delay between the time an operation occurs on the primary and the time that same operation gets applied on the secondary. For the replica set as a whole, replication lag is (for most purposes) the smallest replication lag found among all its secondary nodes. In a smoothly running replica set, all secondaries closely follow changes on the primary, fetching each group of operations from its oplog and replaying them approximately as fast as they occur. That is, replication lag remains as close to zero as possible. Reads from any node are then reasonably consistent; and, should the current primary become unavailable, the secondary that assumes the PRIMARY role will be able to serve to clients a dataset that is almost identical to the original. For a variety of reasons, however, secondaries may fall behind. Sometimes elevated replication lag is transient and will remedy itself without intervention. Other times, replication lag remains high or continues to rise, indicating a systemic problem that needs to be addressed. In either case, the larger the replication lag grows and the longer it remains that way, the more exposure your database has to the associated risks. Why is lag problematic? Significant replication lag creates failure modes that can be problematic for a MongoDB database deployment that is meant to be highly available. Here’s why: If your replica set fails over to a secondary that is significantly behind the primary, a lot of un-replicated data may be on the original primary that will need to be manually reconciled. This will be painful or impossible if the original primary is unrecoverable. If the failed primary cannot be recovered quickly, you may be forced to run on a node whose data is not up-to-date, or forced to take down your database altogether until the primary can be recovered. If you have only one secondary, and it falls farther behind than the earliest history retained in the primary’s oplog, your secondary will require a full resynchronization from the primary. During the resync, your cluster will lack the redundancy of a valid secondary; the cluster will not return to high availability until the entire data set is copied. If you only take backups from your secondary (which we highly recommend), backups must be suspended for the duration of the resync. Replication lag makes it more likely that results of any read operations distributed across secondaries will be inconsistent. A “safe” write with ‘w’ > 1 — i.e., requiring multiple nodes acknowledge the write before it returns — will incur latency proportional to the current replication lag, and/or may time out. Strictly speaking, the problem of replication lag is distinct from the problem of data durability. But as the last point above regarding multi-node write concern illustrates, the two concepts are most certainly linked. Data that has not yet been replicated is not completely protected from single-node failure; and client writes specified to be safe from single-node failure must block until replication catches up to them. What causes a secondary to fall behind? In general, a secondary falls behind on replication any time it cannot keep up with the rate at which the primary is writing data. Some common causes: Secondary is weak To have the best chance of keeping up, a secondary host should match the primary host’s specs for CPU, disk IOPS, and network I/O. If it’s outmatched by the primary on any of these specs, a secondary may fall behind during periods of sustained write activity. Depending on load this will, at best, create brief excursions in replication lag and, at worst, cause the secondary to fall irretrievably behind. Bursty writes In the wake of a burst of write activity on the primary, a secondary may not be able to fetch and apply the ops quickly enough. If the secondary is underpowered, this effect can be quite dramatic. But even when the nodes have evenly matched specs, such a situation is possible. For example, a command like: db.coll.update({x: 7}, {$set: {y: 42}, {multi: true} can place an untold number of separate “update” ops in the primary’s oplog. To keep up, a secondary must fetch those ops (max 4MB at a time for each getMore command!), read into RAM any index and data pages necessary to satisfy each _id lookup (remember: each oplog entry references a single target document by _id; the original query about “x” is never directly reflected the oplog), and finally perform the update op, altering the document and placing the corresponding entry into its oplog; and it must do all this in the same amount of time that the primary does merely the last step. Multiplied by a large enough number of ops, that disparity can amount to a noticeable lag. Map/reduce output A specific type of the extreme write burst scenario might be a command like: db.coll.mapReduce( ... { out: other_coll ... }) Index build It may surprise you to learn that, even if you build an index in the background on the primary, it will be built in the foreground on each secondary. There is currently no way to build indexes in the background on secondary nodes (cf. SERVER-2771). Therefore, whenever a secondary builds an index, it will block all other operations, including replication, for the duration. If the index builds quickly, this may not be a problem; but long-running index builds can swiftly manifest as significant replication lag. Secondary is locked for backup One of the suggested methods for backing up data in a replica set involves explicitly locking a secondary against changes while the backup is taken. Assuming the primary is still conducting business as usual, of course replication lag will climb until the backup is complete and the lock is released. Secondary is offline Similarly, if the secondary is not running or cannot reach the primary for whatever reason, it cannot make progress against the replication backlog. When it rejoins the replica set, the replication lag will naturally reflect the time spent away. How do I measure lag? Run the db.printSlaveReplicationInfo() command To determine the current replication lag of your replica set, you can use the mongo shell and run the db.printSlaveReplicationInfo() command. rs-ds046297:PRIMARY db.printSlaveReplicationInfo() source: ds046297-a1.mongolab.com:46297 syncedTo: Tue Mar 05 2013 07:48:19 GMT-0800 (PST) = 7475 secs ago (2.08hrs) source: ds046297-a2.mongolab.com:46297 syncedTo: Tue Mar 05 2013 07:48:19 GMT-0800 (PST) = 7475 secs ago (2.08hrs) More than 2 hours — whoa, isn’t that a lot? Maybe! See, those “syncedTo” times don’t have much to do with the clock on the wall; they’re just the timestamp on the last operation that the replica has copied over from the PRIMARY. If the last write operation on the PRIMARY happened 5 minutes ago, then yes: 2 hours is a lot. On the other hand, if the last op was 2.08 hours ago, then this is golden! To fill in that missing piece of the story, we can use the db.printReplicationInfo() command. rs-ds046297:PRIMARY db.printReplicationInfo() configured oplog size: 1024MB log length start to end: 5589secs (1.55hrs) oplog first event time: Tue Mar 05 2013 06:15:19 GMT-0800 (PST) oplog last event time: Tue Mar 05 2013 07:48:19 GMT-0800 (PST) now: Tue Mar 05 2013 09:53:07 GMT-0800 (PST) Let’s see … PRIMARY’s “oplog last event time” – SECONDARY’s “syncedTo” = 0.0. Yay. As fun as that subtraction may be, it’s seldom called for. If there is a steady flow of write operations, the last op on the PRIMARY will usually have been quite recent. Thus, a figure like “2.08 hours” should probably raise eyebrows; you would expect to see a nice low number there instead — perhaps as high as a few seconds. And, having seen a low number, there would be no need to qualify its context with the second command. Examine the “repl lag” graph in MMS You can also view recent and historical replication lag using the MongoDB Monitoring Service (MMS) from 10gen. On the Status tab of each SECONDARY node, you’ll find the repl lag graph: How do I monitor for lag? It is critical that the replication lag of your replica set(s) be monitored continuously. Since you have to sleep occasionally, this is a job best done by robots. It is essential that these robots be reliable, and that they notify you promptly whenever a replica set is lagging too far behind. Here are a couple ways you can make sure this is taken care of: If MongoLab is hosting your replica set, relax! For any multi-node, highly-available replica set we host for you, you can monitor replication lag in our UI and by default you will receive automated alerts whenever the replication lag exceeds 10 minutes. You can also set up an alert using the MMS system. Its exciting new features allow you to configure a replication lag alert: What can I do to minimize lag? Out of courtesy (for them or for ourselves), we would like to make those lag-monitoring automata’s lives as boring as possible. Here are some tips: Tip #1: Make sure your secondary has enough horsepower It’s not uncommon for people to run under-powered secondaries to save money — this can be fine if the write load is light. But in scenarios where the write load is heavy, the secondary might not be able to keep up with the primary. To avoid this, you should beef up your secondary so that it’s as powerful as your primary. Specifically, a SECONDARY node should have enough network bandwidth that it can retrieve ops from the PRIMARY’s oplog at roughly the rate they’re created and also enough storage throughput that it can apply the ops — i.e., read any affected documents and their index entries into RAM, and commit the altered documents back to disk — at that same rate. CPU rarely becomes a bottleneck, but it may need to be considered if there are many index keys to compute and insert for the documents that are being added or changed. Tip #2: Consider adjusting your write concern Your secondary may be lagging simply because your primary’s oplog is filling up faster than it can be replicated. Even with an equally-brawny SECONDARY node, the PRIMARY will always be capable of depositing 4MB in its memory-mapped oplog in a fraction of the time those same 4MB will need to make it across a TCP/IP connection. One viable way to apply some back-pressure to the primary might be to adjust your write concern. If you are currently using a write concern that does not acknowledge writes (aka “fire-and-forget” mode), you can change your write concern to require an acknowledgement from the primary (w:1) and/or a write to the primary’s journal (j:true). Doing so will slow down the rate at which the concerned connection can generate new ops needing replication. Other times it may be appropriate to use a ‘w’ > 1 or a ‘w’ set to “majority” to ensure that each write to the cluster is replicated to more than one node before the command returns. Requiring confirmation that a write has replicated to secondaries will effectively guarantee that those secondaries have caught up (at least up to the timestamp of this write) before the next command on the same connection can produce more ops in the backlog. As previously alluded to, choosing the most appropriate write concern for the data durability requirements of your application — or for particular critical write operations within the application — is something you must give thought to irrespective of the replication lag issue we’re focusing on here. But you should be aware of the interrelationship: just as the durability guarantee of w>1 can be used as a means of forcing a periodic “checkpoint” on replication, excessive replication lag can show up as a surprisingly high latency (or timeout) for that very occasional critical write operation where you’ve used “w: majority” to make sure it’s truly committed. Adjust to taste Having servers acknowledge every write can be a big hit to system throughput. If it makes sense for your application, you can amortize that penalty by doing inserts in batches, requiring acknowledgement only at the end of each batch. The smaller the batch, the greater the back-pressure on PRIMARY data creation rate, and correspondingly greater potential adverse impact to overall throughput. Don’t overdo it Using a large value for ‘w’ can itself be problematic. It represents a demand that w nodes finish working through their existing backlog before the command returns. So, if replication lag is high (in the sense of there being a large volume of data waiting to copy over) when the write command is issued, the command execution time will suffer a proportionally high latency. Also, if enough nodes go offline such that ‘w’ cannot be satisfied, you have effectively locked up your database. This is basically the opposite of “high availability.” Tip #3: Plan for index builds As mentioned earlier, an index build on a secondary is a foreground, blocking operation. If you’re going to create an index that is sizeable, perhaps you can arrange to do it during a period of low write activity on the primary. Alternately, if you have more than one secondary, you can follow the steps here to minimize the impact of building large indexes. Tip #4: Take backups without blocking Earlier we discussed the technique of locking the secondary to do a backup. There are other alternatives to consider here, including filesystem snapshots and “point-in-time” backups using the “--oplog” option of mongodump without locking. These are preferable to locking the secondary during a period of active writes if there’s any chance you’ll use the secondary for anything other than backups. Tip #5: Be sure capped collections have an _id field & a unique index Reliable replication is not possible unless there is a unique index on the _id field. Before MongoDB version 2.2, capped collections did not have an _id field or index by default. If you have a collection like this, you should create an index on the _id field, specifying unique: true. Failing to do this can, in certain situations, cause replication to halt entirely. So … this should not be regarded as optional. Tip #6: Check for replication errors If you see that replication lag is only increasing (and never falling), your replica set could be experiencing replication errors. To check for errors, run rs.status() and look at the errmsg field in the result. Additionally, check the log file of your secondary and look for error messages there. One specific example: if you see “RS102 too stale to catch up” in the secondary’s mongodb.log or in the errmsg field when running rs.status(), it means that secondary has fallen so far behind that there is not enough history retained by the primary (its “oplog size”) to bring it up to date. In this case, your secondary will require a full resynchronization from the primary. In general, though, what you do in response to an error depends on the error. Sometimes you can simply restart the mongod process for your secondary; but the majority of the time you will need to understand the root cause of the error before you can fix the problem. Don’t let replication lag take you by surprise. At the end of the day, replication lag is just one more source of risk in any high-availability system that we need to understand and design around. Striking the right balance between performance and “safety” of write operations is an exercise in risk management — the “right” balance will be different in different situations. For an application on a tight budget with occasional spikes in write volume, for example, you might decide that a large replication lag in the wake of those spikes is acceptable given the goals of the application, and so an underpowered secondary makes sense. At the opposite extreme, for an application where every write is precious and sacred, the required “majority” write concern will mean you have essentially no tolerance for replication lag above the very minimum possible. The good news is that MongoDB makes this all very configurable, even on an operation by operation basis. We hope this article has given you some insight into the phenomenon of replication lag that will enable you to reason about the risk it poses for a high-availability MongoDB application, and armed you with some tools for managing it. As always, let us know if we can help!
March 19, 2013
by Todd O. Dampier
· 43,962 Views · 1 Like
  • Previous
  • ...
  • 505
  • 506
  • 507
  • 508
  • 509
  • 510
  • 511
  • 512
  • 513
  • 514
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×