DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Languages Topics

article thumbnail
MySQL 5.6 Compatible Percona XtraBackup 2.0.6 Released
This post comes from Hrvoje Matijakovic at the MySQL Performance Blog. Percona is glad to announce the release of Percona XtraBackup 2.0.6 for MySQL 5.6 on March 20, 2013. Downloads are available from our download site here and Percona Software Repositories. This release is the current GA (Generally Available) stable release in the 2.0 series. New Features: XtraBackup 2.0.6 has implemented basic support for MySQL 5.6, Percona Server 5.6 and MariaDB 10.0. Basic support means that these versions are are recognized by XtraBackup, and that backup/restore works as long as no 5.6-specific features are used (such as GTID, remote/transportable tablespaces, separate undo tablespace, 5.6-style buffer pool dump files). Bugs Fixed: Individual InnoDB tablespaces with size less than 1MB were extended to 1MB on the backup prepare operation. This led to a large increase in disk usage in cases when there are many small InnoDB tablespaces. Bug fixed #950334 (Daniel Frett, Alexey Kopytov). Fixed the issue that caused databases corresponding to inaccessible datadir subdirectories to be ignored by XtraBackup without warning or error messages. This was happening because InnoDB code silently ignored datadir subdirectories it could not open. Bug fixed #664986 (Alexey Kopytov). Under some circumstances XtraBackup could fail to copy a tablespace with a high --parallel option value and a low innodb_open_files value. Bug fixed #870119 (Alexey Kopytov). Fix for the bug #711166 introduced a regression that caused individual partition backups to fail when used with --include option in innobackupex or the --tables option in xtrabackup. Bug fixed #1130627 (Alexey Kopytov). innobackupex didn’t add the file-per-table setting for table-independent backups. Fixed by making XtraBackup auto-enable innodb_file_per_table when the --export option is used. Bug fixed #930062 (Alexey Kopytov). Under some circumstances XtraBackup could fail on a backup prepare with innodb_flush_method=O_DIRECT. Bug fixed #1055547 (Alexey Kopytov). innobackupex did not pass the --tmpdir option to the xtrabackup binary resulting in the server’s tmpdir always being used for temporary files. Bug fixed #1085099 (Alexey Kopytov). XtraBackup for MySQL 5.6 has improved the error reporting for unrecognized server versions. Bug fixed #1087219 (Alexey Kopytov). Fixed the missing rpm dependency for Perl Time::HiRes package that caused innobackupex to fail on minimal CentOS installations. Bug fixed #1121573 (Alexey Bychko). innobackupex would fail when --no-lock and --rsync were used in conjunction. Bug fixed #1123335 (Sergei Glushchenko). Fix for the bug #1055989 introduced a regression that caused xtrabackup_pid file to remain in the temporary dir after execution. Bug fixed #1114955 (Alexey Kopytov). Unnecessary debug messages have been removed from the XtraBackup output. Bug fixed #1131084 (Alexey Kopytov). Other bug fixes: bug fixed #1153334 (Alexey Kopytov), bug fixed #1098498 (Laurynas Biveinis), bug fixed #1132763 (Laurynas Biveinis), bug fixed #1142229 (Laurynas Biveinis), bug fixed #1130581 (Laurynas Biveinis). Release notes with all the bugfixes for Percona XtraBackup 2.0.6 for MySQL 5.6 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.
March 24, 2013
by Peter Zaitsev
· 2,942 Views
article thumbnail
Extracting the Elements of the Java Collection- The Java 8 Way
We all have extensively used Collection classes like List, Map and their derived versions. And each time we used them we had to iterate through them to either find some element or update the elements or find different elements matching some condition. Consider a List of Person as shown below: List personList = new ArrayList<>(); personList.add(new Person("Virat", "Kohli",22)); personList.add(new Person("Arun", "Kumar",25)); personList.add(new Person("Rajesh", "Mohan", 32)); personList.add(new Person("Rahul", "Dravid", 35)); To find out all the Person instances with age greater than 30, we would do: List olderThan30OldWay = new ArrayList<>(); for ( Person p : personList){ if ( p.age >= 30){ olderThan30OldWay.add(p); } } System.out.println(olderThan30OldWay); and this gives me the output as: [Rajesh Mohan, 32, Rahul Dravid, 35] The code is easy to write, but is it not a bit more verbose, especially the iteration part? Why would we have to iterate? Would it not be cool if there was an API which would iterate the contents and give us the end result i.e we give the source List and use a series of method calls to get us the result List we are looking for? Yes, this is possible in other languages like Scala, Groovy which support passing closures and also support internal iteration. But is there a solution for Java developers? Yes, this exact problem is being solved by introducing support for Lambda Expressions(closures) and a enhanced Collection API to leverage the lambda expression support. The sad news is that it’s going to be part of Java 8 and will take some time to be into mainstream development. Leveraging the Java 8 enhancements to the above scenario As I said before the Collections API is being enhanced to support the use of Lambda Expression and more about it can be read here. Instead of adding all the new APIs to the Collection class the JDK team created a new concept called “Stream” and added most of the APIs in that class. “Stream” is a sequence of elements obtained from the Collection from which it is created. To read more about the origin of Stream class please refer to this document. To implement the example I started with using the enhancements in Java 8 we would be using few new APIs namely: stream(), filter(), collect(), Collectors.toCollection(). stream(): Uses the collection on which this API is called to create an instance of Stream class. filter():This method accepts a lambda expression which takes in one parameter and returns a boolean value. This lambda expression is written as a replacement for implementing the Predicate class. collect(): There are 2 overloaded versions of this method. The one I am using here takes an instance of Collector. This method takes the contents of the stream and constructs another collection. This construction logic is defined by the Collector. Collectors.toCollection(): Collectors is a factory for Collector. And the toCollection() takes a Lambda expression/Method reference which should return a new instance of any derivatives of Collection class. With brief introduction to the APIs used, let me show the code which is equivalent to the first code sample: List olderThan30 = //Create a Stream from the personList personList.stream(). //filter the element to select only those with age >= 30 filter(p -> p.age >= 30). //put those filtered elements into a new List. collect(Collectors.toCollection(() -> new ArrayList())); System.out.println(olderThan30); The above code uses both Internal iteration and lambda expressions to make it intuitive, concise and soothing to the eye. If you are not familiar with the idea of Lambda Expressions, check out my previous entry which covers in brief about Lambda expressions.
March 23, 2013
by Mohamed Sanaulla
· 78,372 Views · 2 Likes
article thumbnail
5 Ways Objects Can Communicate With Each Other Heading Towards Decoupling
Way 1. Simple method call Object A calls a method on object B. This is clearly the simplest type of communication between two objects but is also the way which results in the highest coupling. Object A’s class has a dependency upon object B’s class. Wherever you try to take object A’s class, object B’s class (and all of its dependencies) are coming with it. Way 2. Decouple the callee from the caller Object A’s class declares an interface and calls a method on that interface. Object B’s class implements that interface. This is a step in the right direction as object A’s class has no dependency on object B’s class. However, something else has to create object B and introduce it to object A for it to call. So we have created the need for an additional class which has a dependency upon object B’s class. We have also created a dependency from B to A. However, these can be a small price to pay if we are serious about taking object A’s class off to other projects. Way 3. Use an Adaptor Object A’s class declares an interface and calls a method on that interface. An adaptor class implements the interface and wraps object B, forwarding calls to it. This frees up object B’s class from being dependent on object A’s class. Now we are getting closer to some real decoupling. This is particularly useful if object B’s class is a third-party class which we have no control over. Way 4. Dependency Injection Dependency injection is used to find, create and call object B. This amounts to deferring until runtime how object A will talk to object B. This way certainly feels to have the lowest coupling, but in reality just shifts the coupling problem into the wiring realm. At least before we could rely on the compiler to ensure that there was a concrete object on the other end of each call – and furthermore we had the convenience of using the development tools to help us unpick the interaction between objects. Way 5. Chain of command pattern The chain of command pattern is used to allow object A to effectively say “does anyone know how to handle this call?”. Object B, which is listening out for these cries for help, picks up the message and figures out for itself if it is able to respond. This approach does mean that object A has to be ready for the outcome that nobody is able to respond, however it buys us great flexibility in how the responder is implemented. Chain of command – way 5 – is the decoupling winner and here's an example to help explain why. Let object A be a raster image file viewer, with responsibilities for allowing the user to pick the file to open, and zoom in and out on the image as it is displayed. Let object B be a loader which has the responsibility of opening a gif file and returning an array of colored pixels. Our aim is to avoid tying object A's class to object B's class because object B's class uses a third party library. Additionally, object A doesn't want to know about how the image file is interpreted, or even if it is a gif, jpg, png or whatever. In this example object B, or more likely a wrapper of object B, will declare a method which equips it to respond to any requests to open an image file. The method will respond with an array of pixels if the file is of a format it recognizes, or respond with null if it does not recognize the format. The framework then simply asks handlers in turn until one provides a non-null response. With this framework in place we are now free to slide in more image loaders with the addition of just one more handler class. And furthermore, on the source end of the call, we can add other classes to not just view the images, but print them, edit them or manipulate them in any other way we choose. In conclusion, we can see that decoupling can be achieved and yield flexibility, but this does not mean it is appropriate for every call from one object to another. The best thing to do is start with straight method calls, but keep cohesion in mind. Then if at a later stage it becomes necessary to swap in and out different objects it won't be too hard to extract an interface and put in place a decoupling mechanism.
March 22, 2013
by Paul Wells
· 43,201 Views · 2 Likes
article thumbnail
OpenJPA: Memory Leak Case Study
This article will provide the complete root cause analysis details and resolution of a Java heap memory leak (Apache OpenJPA leak) affecting an Oracle Weblogic server 10.0 production environment. This post will also demonstrate the importance to follow the Java Persistence API best practices when managing the javax.persistence.EntityManagerFactory lifecycle. Environment specifications Java EE server: Oracle Weblogic Portal 10.0 OS: Solaris 10 JDK: Oracle/Sun HotSpot JVM 1.5 32-bit @2 GB capacity Java Persistence API: Apache OpenJPA 1.0.x (JPA 1.0 specifications) RDBMS: Oracle 10g Platform type: Web Portal Troubleshooting tools Quest Foglight for Java (Java heap monitoring) MAT (Java heap dump analysis) Problem description & observations The problem was initially reported by our Weblogic production support team following production outages. An initial root cause analysis exercise did reveal the following facts and observations: Production outages were observed on regular basis after ~2 weeks of traffic. The failures were due to Java heap (OldGen) depletion e.g. OutOfMemoryError: Java heap space error found in the Weblogic logs. A Java heap memory leak was confirmed after reviewing the Java heap OldGen space utilization over time from Foglight monitoring tool along with the Java verbose GC historical data. Following the discovery of the above problems, the decision was taken to move to the next phase of the RCA and perform a JVM heap dump analysis of the affected Weblogic (JVM) instances. JVM heap dump analysis ** A video explaining the following JVM Heap Dump analysis is now available here. In order to generate a JVM heap dump, the supported team did use the HotSpot 1.5 jmap utility which generated a heap dump file (heap.bin) of about ~1.5 GB. The heap dump file was then analyzed using the Eclipse Memory Analyzer Tool. Now let’s review the heap dump analysis so we can understand the source of the OldGen memory leak. MAT provides an initial Leak Suspects report which can be very useful to highlight your high memory contributors. For our problem case, MAT was able to identify a leak suspect contributing to almost 600 MB or 40% of the total OldGen space capacity. At this point we found one instance of java.util.LinkedList using almost 600 MB of memory and loaded to one of our application parent class loader (@ 0x7e12b708). The next step was to understand the leaking objects along with the source of retention. MAT allows you to inspect any class loader instance of your application, providing you with capabilities to inspect the loaded classes & instances. Simply search for the desired object by providing the address e.g. 0x7e12b708 and then inspect the loaded classes & instances by selecting List Objects > with outgoing references. As you can see from the above snapshot, the analysis was quite revealing. What we found was one instance of org.apache.openjpa.enhance.PCRegistry at the source of the memory retention; more precisely the culprit was the _listeners field implemented as a LinkedList. For your reference, the Apache OpenJPA PCRegistry is used internally to track the registered persistence-capable classes. Find below a snippet of the PCRegistry source code from Apache OpenJPA version 1.0.4 exposing the _listeners field. /** * Tracks registered persistence-capable classes. * * @since 0.4.0 * @author Abe White */ publicclass PCRegistry { // DO NOT ADD ADDITIONAL DEPENDENCIES TO THIS CLASS privatestaticfinal Localizer _loc = Localizer.forPackage (PCRegistry.class); // map of pc classes to meta structs; weak so the VM can GC classes privatestaticfinal Map _metas = new ConcurrentReferenceHashMap (ReferenceMap.WEAK, ReferenceMap.HARD); // register class listeners privatestaticfinal Collection _listeners = new LinkedList(); …………………………………………………………………………………… Now the question is why is the memory footprint of this internal data structure so big and potentially leaking over time? The next step was to deep dive into the _listeners LinkedLink instance in order to review the leaking objects. We finally found that the leaking objects were actually the JDBC & SQL mapping definitions (metadata) used by our application in order to execute various queries against our Oracle database. A review of the JPA specifications, OpenJPA documentation and source did confirm that the root cause was associated with a wrong usage of the javax.persistence.EntityManagerFactory such of lack of closure of a newly created EntityManagerFactory instance. If you look closely at the above code snapshot, you will realize that the close() method is indeed responsible to cleanup any recently used metadata repository instance. It did also raise another concern, why are we creating such Factory instances over and over… The next step of the investigation was to perform a code walkthrough of our application code, especially around the life cycle management of the JPA EntityManagerFactory and EntityManager objects. Root cause and solution A code walkthrough of the application code did reveal that the application was creating a new instance of EntityManagerFactory on each single request and not closing it properly. public class Application { @Resource private UserTransaction utx = null; // Initialized on each application request and not closed! @PersistenceUnit(unitName = "UnitName") private EntityManagerFactory emf = Persistence.createEntityManagerFactory("PersistenceUnit"); public EntityManager getEntityManager() { return this.emf.createEntityManager(); } public void businessMethod() { // Create a new EntityManager instance via from the newly created EntityManagerFactory instance // Do something... // Close the EntityManager instance } } This code defect and improver use of JPA EntityManagerFactory was causing a leak or accumulation of metadata repository instances within the OpenJPA _listeners data structure demonstrated from the earlier JVM heap dump analysis. The solution of the problem was to centralize the management & life cycle of the thread safe javax.persistence.EntityManagerFactory via the Singleton pattern. The final solution was implemented as per below: Create and maintain only one static instance of javax.persistence.EntityManagerFactory per application class loader and implemented via the Singleton Pattern. Create and dispose new instances of EntityManager for each application request. Please review this discussion from Stackoverflow as the solution we implemented is quite similar. Following the implementation of the solution to our production environment, no more Java heap OldGen memory leak is observed. Please feel free to provide your comments and share your experience on the same.
March 21, 2013
by Pierre - Hugues Charbonneau
· 9,242 Views
article thumbnail
Using Lambda Expression to Sort a List in Java 8 using NetBeans Lambda Support
As part of JSR 335Lambda expressions are being introduced to the Java language from Java 8 onwards and this is a major change in the Java language.
March 21, 2013
by Mohamed Sanaulla
· 346,112 Views · 6 Likes
article thumbnail
Entity Framework 5/6 vs NHibernate 3 – The State of Affairs
It has been almost two years since I've last compared NHibernate and Entity Framework, so with the recentalpha version of EF 6, it's about time to look at the current state of affair. I've been using NHibernate for more than 6 years so obviously I'm a bit biased. But I can't ignore that EF's feature list is growing and some of the things I like about the NHibernate eco-system such as code-based mappings and automatic migrations have found a place in EF. Moreover, EF is now open-source, so they're accepting pull requests as well. Rather than doing a typical feature-by-feature comparison, I'll be looking at those aspects of an object-relational mapper that I think are important when building large-scale enterprise applications. So let's see how those frameworks match up. Just for your information, I've been looking at Entity Framework 6 Alpha 3 and NHibernate 3.3.1GA. Support for rich domain models When you're practicing Domain Driven Design it is crucial to be able to model your domain using the right object-oriented principles. For example, you should be able to encapsulate data and only expose properties if that is needed by the functional requirements. If you model an association using a UML qualifier, you should be able to implement that using a IDictionary. Similarly, collection properties should be based on IEnumerableor any of the newer read-only collections introduced in .NET 4.5 so that your collections are protected by external changes. NHibernate supports all these requirements and adds quite a lot of flexibility like ordered and unordered sets. Unfortunately, neither EF5 or 6 supports mapping private fields (yet) nor can you directly use a dictionary class. In fact, EF only supports ICollections of entities, so collections of value objects are out of the question. One notable type that still isn't fully supported is the enum. It was introduced in EF5, but only if you target .NET 4.5. EF6 will fortunately fixes this so that it is also available in .NET 4.0 applications. A good ORM should also allow your domain model to be as persistence ignorant as possible. In other words, you shouldn't need to decorate your classes with attributes or subclass some framework-provided base-class (something you might remember from Linq2Sql). Both frameworks impose some limitations such as protected default constructors or virtual members, but that's not going to be too much of an issue. Vendor support Although Microsoft makes us believe that corporate clients only use SQL Server or SQL Azure, we all know that the opposite is much more true. The big drawback of EF compared to NH is that the latter has all the providers built-in. So whenever a new version of the framework is released you don't have to worry about vendor support. Both EF5 and NH 3.3 support various flavors of SQL Server/Azure, SQLite, PostgreSQL, Oracle, Sybase, Firebird and DB2. Most of these providers originate from EF 4, so they don’t support code-first (migrations) or the new DBContext façade. EF6 is still an alpha release and its provider model seems to contain some breaking changes so don't expect any support for anything other than Microsoft's own databases anytime soon. Support switching databases for automated testing purposes Our architecture uses a repository pattern implementation that allows swapping the actual data mapper on-the-fly. Since we're heavily practicing Test Driven Development, we use this opportunity to approach our testing in different ways. We use an in-memory Dictionary for unit tests where the subject-under-test simply needs some data to be setup in a specific way (using Test Data Builders). We use an in-memory SQLite database when we want to verify that NHibernate can process the LINQ query correctly and performs sufficiently using NHProf. We use an actual SQL Server for unit tests that verify that our mapping against the database schema is correct. We have some integration code that interacts with a third-party Oracle system that is tested on SQL Server on a local development box, but uses Oracle on our automated SpecFlow build. So you can imagine switching between database providers without changing the mapping code is quite essential for us. During development, we decided that we did not care about the actual class that represented the integration tables, so we tried to use the Entity Framework model-first approach. Unfortunately, when you do that, you're basically locking yourself to a particular database. After switching back to our normal NHibernate approach, changing the connection string during deployment was enough to switch between SQL Server and Oracle. Fortunately this has also been possible since EF 4.1 and Jason Short wrote a good blog post about that. Automatic schema migration When you're practicing an agile methodology such as Scrum, you'll probably try to deliver a potentially shippable release at the end of every sprint. Part of being agile is that functionality can be added at any time where some of that might be affecting the database schema. The most traditional way of dealing with that is to generate or hand-write SQL scripts that are applied during deployment. The problem with SQL scripts is that they are tedious to write, might contain bugs, and are often closely coupled to the database vendor. Wouldn't it be great if the ORM framework would support some way of figuring out what version of the schema is being used and automatically upgrade the database scheme as part of your normal development cycle? Or what about the ability to revert the schema to an older version? The good news that this exists for both frameworks, but with a caveat. For instance, NHibernate doesn't support this out-of-the-box (although you can generate the initial schema). But with the help of another open-source project, Fluent Migrations, you can get very far. We currently use it in an enterprise system and it works like a charm. The caveat is that the support for the various databases is not always at the same level. For instance, SQLite doesn't allow renaming a column and Fluent Migrations doesn't support it (although theoretically it could create a new column, copy the old data over, and drop the old column). As an example of a fluent migration supporting both an update as well as a rollback, check out this snippet. [Migration(1)] public class TestCreateAndDropTableMigration: Migration { public override void Up() { Create.Table("TestTable") .WithColumn("Id").AsInt32().NotNullable().PrimaryKey().Identity() .WithColumn("Name").AsString(255).NotNullable().WithDefaultValue("Anonymous"); Create.Table("TestTable2") .WithColumn("Id").AsInt32().NotNullable().PrimaryKey().Identity() .WithColumn("Name").AsString(255).Nullable() .WithColumn("TestTableId").AsInt32().NotNullable(); Create.Index("ix_Name").OnTable("TestTable2").OnColumn("Name").Ascending() .WithOptions().NonClustered(); Create.Column("Name2").OnTable("TestTable2").AsBoolean().Nullable(); Create.ForeignKey("fk_TestTable2_TestTableId_TestTable_Id") .FromTable("TestTable2").ForeignColumn("TestTableId") .ToTable("TestTable").PrimaryColumn("Id"); Insert.IntoTable("TestTable").Row(new { Name = "Test" }); } public override void Down() { Delete.Table("TestTable2"); Delete.Table("TestTable"); } } Entity Framework has something similar built-in since version 5. It's called Code-First Migrations and looks surprisingly similar to Fluent Migrations. Just like the NHibernate solution has some limitations, EF's has as well and that is support from vendors. At the time of this writing not a single vendor supports Code-First Migrations. On the other hand, if you're only using SQL Server, SQL Express, SQL Compact or SQL Azure, there's nothing from stopping you to use it. Code-based mapping If you remember the old days of NHibernate, you might recall those ugly XML files that were needed to configure the mapping of your .NET classes to the underlying database. Fluent NHibernate has been offering a very nice fluent API for replacing those mappings with code. Not only does this prevent errors in the XML, it is also a very refactor-friendly approach. We've been using it for years and the extensive (and customizable) convention-basedmapping engine even allows auto-mapping entities to tables without the need of explicit mapping code. Strangely enough, NHibernate 3.2 has introduced a brand new fluent API that directly competes with Fluent NHibernate. Because of lack of documentation, I've never bothered to look at it, especially since Fluent NHibernate has been doing its job remarkedly. But during my research for this post I noticed that Adam Bar has written a very extensive series on the new API, and he actually managed to raise a renewed interest in the new API. Until Entity Framework 4.1 the only way to set-up the mapping was through its data model designer (not to be confused with an OO designer). But apparently the team behind it learned from Fluent Nhibernate and decided to introduce their own code-first approach, surprisingly named Code-First. In terms of convention-based mapping, it was quite limited, especially compared to Fluent NHibernate. EF 6 is going to introduce a lot of hooks for changing the conventions, both on property level as well as on class level. Supporting custom types and collections One of the guidelines in my own coding guidelines is to consider wrapping primitive types with more domain-specific types. Conincedentily it is also one of the rules of Object Calisthenetics. In Domain Driven Design these types are called Value Objects and their purpose is to encapsulate all the data and behavior associated with a recurring domain concept. For instance, rather than having two separate DateTime properties to represent a period and separate methods for determining whether some point of time occurs within that period, I would prefer to have a dedicated Period class that contains all that logic. This approach results in a design that contains less duplication and is easier to understand. Contrary to NHibernate, Entity Framework doesn't offer anything like this and as far as I know, doesn't plan to. NH on the other hand offers a myriad of options for creating custom types, custom collections or even composite types. Granted, you have to do a bit of digging to find the right documentation (and StackOverflow is your friend here), but if you do, it really helps to enrich your domain model. Query flexibility Some would argue that EF's LINQ support is much more mature, and until NH 3.2 I would have agreed. But since then, NH's LINQ support has improved substantially. For instance, during development we use an in-memory SQLite database in our query-related unit tests to make sure the query can actually be executed by NH. Before 3.2, we regularly ran into strange cast exceptions or exceptions because of unsupported expressions. Since 3.2, we've never seen those anymore. I haven't tried to run all our existing queries against EF, but I have no doubts that it would have any issue with it. In terms of non-LINQ querying, EF supports Entity SQL as well as native SQL (although I don’t know if all vendors are supported). NHibernate offers the HQL, QueryOver and Criteria APIs next to native vendor-specific SQL. Both frameworks support stored procedures. All in all plenty of flexibility. Extensibility EF 6 uses the service locator pattern to allow replacing certain aspects of the framework at runtime. This is a good starting point for extensibility, but unfortunately the team always demonstrates this by replacing the pluralization service. As if someone would actually like to do that. Nonetheless, I'm sure the team's plan is to expose more extension points in the near future. NH has a very extensive set of observable collections called listeners that can be used to hook into virtually every part of the framework. We've been using it for cross-cutting concerns, for hooking up auditing services and also for some CQRS related aspects. You can also tweak a lot of NH's behavior throughconfiguration properties (although you'll have to Google…eh…Bing for the right examples). Other notable features Each of the frameworks has some unique features that don't fit in any of the other topics I've discussed up to now. A short summary: Entity Framework 6 adds async/await support, a feature that NHibernate may never get due to the impact it has on the entire architecture. It also has built-in support for automatically reconnecting to the database, which is particularly useful for a known issue with SQL Azure. Both NHibernate as well as the Entity Framework support .NET 4.5, but only the latter gains some significant performance improvements from it. NHibernate also offers a unique feature called Futures that you can use to compose a set of queries and send them to the database as a single request. The Entity Framework allows creating a DBContext with an existing open connection. As far as I know that's not possible in NHibernate. Version 6 of the Entity Framework adds spatial support, something for which you need a 3rd party library to get that in NHibernate. Pedro Sousa wrote an in-depth blog post series about that. Wrap-up The big difference between Entity Framework and NHibernate from a developer perspective is that the former offers an integrated set of services whereas the latter requires the combination of several open-source libraries. That on itself is not a big issue - that's why we have NuGet, don't we? - but we've noticed that those libraries are not always up-to-date soon enough when new NHibernate versions are released. From that same perspective NHibernate does offer a lot of flexibility and clearly shows its maturity. On the other hand, you could also see that as a potential barrier for new developers. It's just much easier to get started with Entity Framework than with NH. The documentation on Entity Framework is quite comprehensive and even the new functionality for version 6 is extensively documented using feature specifications. The NHibernatedocumentation has always been lagging behind a bit. For instance, the new mapping system is not even mentioned even though the reference documentation mentions the correct version. The information is available, but you just have to search a bit. The fact that the EH is being developed using a Git source control repository is also a big plus. Just look at themany pull requests they've been taking in. On the other hand, to my surprise somebody moved the NHibernate source code to GitHub while I wasn't paying attention. So on that aspect they are equals. And does NHibernate have a future at all? Some would argue it is dead already. I don't agree though. Just look at the statistics on GitHub; 240 forks, almost 200 pull requests and a lot of commits in the last few months. I do agree that NoSQL solutions like RavenDB are extremely powerful and offer a lot of fun and flexibility for development, but the fact of the matter is that they are still not widely accepted by enterprises with a history in SQL Server or Oracle. Nevertheless, the RAD aspect of EF cannot be ignored and is important for small short-running projects where SQL Server is the norm. And for those projects, I would wholeheartedly recommend EF. But for the bigger systems where a NoSQL solution is not an option, especially those based on Domain Driven Design, NHibernate is still the king in town. As usual, I might have overlooked a feature or misinterpreted some aspect of both frameworks. If so, leave a comment, email me or drop me a tweet at @ddoomen.
March 20, 2013
by Dennis Doomen
· 38,972 Views
article thumbnail
Encode Email Addresses With PHP
Spam bots will crawl your pages exactly the same way Search Engines crawls your pages, but while Search Engines are crawling to index your content, spam bots are crawling to find certain information. One of the most common bits of data they are searching for are email addresses, this is because they can store these email addresses in a list and send out spam emails. Displaying your email address on your web page makes it very easy for your visitors to store this and send you email, but you will need to get around the spam bot problem to do this you will need to HTML encode your email address. While encoding your email address will make it display correctly in the browser in the source code the email address will be encoded so that spam bots can not read the email. Below is a PHP code snippet which will allow you to pass in an email address and encode it so that you can display the email address on your website. This uses the PHP function ord() to return an ASCII value for the character used in the email. /** * Encode an email address to display on your website */ function encode_email_address( $email ) { $output = ''; for ($i = 0; $i < strlen($email); $i++) { $output .= '&#'.ord($email[$i]).';'; } return $output; } To use in on your page all you have to do is pass in a parameter and it will return the encoded email you can now use on your website. $encodedEmail = encode_email_address( '[email protected]' ); printf('%s', $encodedEmail, $encodedEmail);
March 20, 2013
by Paul Underwood
· 10,198 Views
article thumbnail
Algorithm of the Week: Aho-Corasick String Matching Algorithm in Haskell
let’s say you have a large piece of text and a dictionary of keywords. how do you quickly locate all the keywords? aho-corasick algorithm diagram well, there are many ways really, you could even iterate through the whole thing and compare words to keywords. but it turns out that’s going to be very slow. at least o(n_keywords * n_words) complexity. essentially you’re making as many passes over the text as your dictionary is big. in 1975 a couple of ibm researchers – alfred aho and margaret corasick – discovered an algorithm that can do this in a single pass. the aho-corasick string matching algorithm . i implemented it in haskell and it takes 0.005s to find 8 different keywords in oscar wilde’s the nightingale and the rose – a 12kb text. a quick naive keyword search implemented in python takes 0.023s . not a big difference practically speaking, but imagine a situation with megabytes of text and thousands of words in the dictionary. the authors mention printing out the result as a major bottleneck in their assessment of the algorithm. yep, printing . the aho-corasick algorithm at the core of this algorithm are three functions: the three functions of aho-corasick algorithm a parser based on a state machine, which maps (state, char) pairs to states and occasionally emits an output. this is called the goto function a failure function, which tells the goto function which state to jump into when the character it just read doesn’t match anything an output function, which maps states to outputs – potentially more than one per state the algorithm works in two stages. it will first construct the goto, failure and output functions. the complexity of this operation hinges solely on the size of our dictionary. then it iterates over the input text to produce all the matches. using state machines for parsing text is a well known trick – the real genius of this algorithm rests in that failure function if you ask me. it makes lateral transitions between states when the algorithm climbs itself into a wall. say you have she and hers in the dictionary. the goto machine eats your input string one character at the time. let’s say it’s already read s h . the next input is an e so it outputs she and reaches a final state. next it reads an r , but the state didn’t expect any more inputs, so the failure function puts us on the path towards hers . this is a bit tricky to explain in text, i suggest you look at the picture from the original article and look at what’s happening. my haskell implementation the first implementation i tried, relied on manully mapping inputs to outputs for the goto, failure and output functions by using pattern recognition. not very pretty, extremely hardcoded, but it worked and was easy to make. building the functions dynamically proved a bit trickier. type goto = map (int, char) int type failure = map int int type output = map int [string] first off, we build the goto function. -- builds the goto function build_goto::goto -> string -> (goto, string) build_goto m s = (add_one 0 m s, s) -- adds one string to goto function add_one::int -> goto -> [char] -> goto add_one _ m [] = m add_one state m (c:rest) | member key m = add_one (frommaybe 0 $ map.lookup key m) m rest | otherwise = add_one max (map.insert key max m) rest where key = (state, c) max = (size m)+1 essentially this builds a flattened prefix tree in a hashmap of (state, char) pairs mapping to the next state. it makes sure to avoid adding new edges to the three as much as possible. the reason it’s not simply a prefix tree are those lateral transitions; doing them in a tree would require backtracking and repeating of steps, so we haven’t achieved anything. once we have the goto function, building the output is trivial. -- builds the output function build_output::(?m::goto) => [string] -> output build_output [] = empty build_output (s:rest) = map.insert (fin 0 s) (list.filter (\x -> elem x dictionary) $ list.tails s) $ build_output rest -- returns the state in which an input string ends without using failures fin::(?m::goto) => int -> [char] -> int fin state [] = state fin state (c:rest) = fin next rest where next = frommaybe 0 $ map.lookup (state, c) ?m we are essentially going over the dictionary, finding the final state for each word and building a hash table mapping final states to their outputs. building the failure function was trickiest, because we need a way to iterate over the depths at which nodes are position in the goto state machine. but we threw that info away by using a hashmap. -- tells us which nodes in the goto state machine are at which traversal depth nodes_at_depths::(?m::goto) => [[int]] nodes_at_depths = list.map (\i -> list.filter (>0) $ list.map (\l -> if i < length l then l!!i else -1) paths) [0..(maximum $ list.map length paths)-1] where paths = list.map (path 0) dictionary we now have a list of lists, that tells us at which depth certain nodes are. -- builds the failure function build_fail::(?m::goto) => [[int]] -> int -> failure build_fail nodes 0 = fst $ mapaccuml (\f state -> (map.insert state 0 f, state)) empty (nodes!!0) build_fail nodes d = fst $ mapaccuml (\f state -> (map.insert state (decide_fail state lower) f, state)) lower (nodes!!d) where lower = build_fail nodes (d-1) -- inner step of building the failure function decide_fail::(?m::goto) => int -> failure -> int decide_fail state lower = findwithdefault 0 (s, c) ?m where (s', c) = key' state $ assocs ?m s = findwithdefault 0 s' lower -- gives us the key associated with a certain state (how to get there) key'::int -> [((int, char), int)] -> (int, char) key' _ [] = (-1, '_') -- this is ugly, being of maybe type would be better key' state ((k, v):rest) | state == v = k | otherwise = key' state rest here we are going over the list of nodes at depths and deciding what the failure should be for each depth based on the failures of depth-1. at depth zero, all failures go to the zeroth state. an important part of this process was inverting the goto hashmap so values point to keys, which is essentially what the key’ function does. finally, we can use the whole algorithm like this: main = do let ?m = fst $ mapaccuml build_goto empty dictionary let ?f = build_fail nodes_at_depths $ (length $ nodes_at_depths)-1 ?out = build_output dictionary print $ ahocorasick text a bit more involved than the usual example of haskell found online, it’s still pretty cool you can see the whole code on github here .
March 19, 2013
by Swizec Teller
· 21,965 Views
article thumbnail
Binding to JSON & XML - Handling Collections
One of EclipseLink JAXB (MOXy)'s strengths is the ability to map an object model to both JSON and XML with a single set of metadata. The one weakness had been that you needed to compromise on the JSON key or XML element for collection properties. I'm happy to say that this issue has been solved in EclipseLink 2.5 (and EclipseLink 2.4.2), and I will demonstrate below with an example. You can try this out today by downloading an EclipseLink 2.5.0 (or EclipseLink 2.4.2) nightly build starting on March 15, 2013 from: http://www.eclipse.org/eclipselink/downloads/nightly.php Domain Model By default a JAXB (JSR-222) implementation will not output a grouping element around collection data. This can be done through the use of the @XmlElementWrapper annotation (see: JAXB & Collection Properties). This grouping element often has a plural name and is a better fit for the key of a JSON array than the repeating element defined by the @XmlElement annotation is. package blog.json.collections; import java.util.*; import javax.xml.bind.annotation.*; @XmlRootElement @XmlType(propOrder={"name", "emailAddresses"}) public class Customer { private String name; private List emailAddresses = new ArrayList(); public String getName() { return name; } public void setName(String name) { this.name = name; } @XmlElementWrapper(name="email-addresses") @XmlElement(name="email-address") public List getEmailAddresses() { return emailAddresses; } public void setEmailAddresses(List<'String> emailAddresses) { this.emailAddresses = emailAddresses; } } Demo We will specify the JSON_WRAPPER_AS_ARRAY_NAME property with a true value to tell MOXy that it should use the grouping element as the name for the JSON array value. Then we will use the same Marshaller to output the same object to both XML and JSON. package blog.json.collections; import java.util.*; import javax.xml.bind.*; import org.eclipse.persistence.jaxb.MarshallerProperties; public class Demo { public static void main(String[] args) throws Exception { Customer customer = new Customer(); customer.setName("Jane Doe"); customer.getEmailAddresses().add("[email protected]"); customer.getEmailAddresses().add("[email protected]"); Map properties = new HashMap(1); properties.put(MarshallerProperties.JSON_WRAPPER_AS_ARRAY_NAME, true); JAXBContext jc = JAXBContext.newInstance(new Class[] {Customer.class}, properties); Marshaller marshaller = jc.createMarshaller(); marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true); // Output XML marshaller.marshal(customer, System.out); // Output JSON marshaller.setProperty(MarshallerProperties.MEDIA_TYPE, "application/json"); marshaller.marshal(customer, System.out); } } XML Output Below is the XML output from running the demo code. We see that email-addresses is marshalled as the grouping element which contains an email-address element for each item in the collection. Jane Doe [email protected] [email protected] JSON Output The following JSON output is produced from the same metadata. The only difference is that we told MOXy to use the grouping element as the name for JSON array values. { "customer" : { "name" : "Jane Doe", "email-addresses" : [ "[email protected]", "[email protected]" ] } } JAX-RS You can easily use MOXy as your JSON-binding provider in a JAX-RS environment (see: MOXy as your JAX-RS JSON Provider - MOXyJsonProvider). You can specify that the grouping element should be used as the JSON array name with the wrapperAsArrayName property on MOXyJsonProvider. package blog.json.collections; import java.util.*; import javax.ws.rs.core.Application; import org.eclipse.persistence.jaxb.rs.MOXyJsonProvider; public class CustomerApplication extends Application { @Override public Set> getClasses() { HashSet> set = new HashSet>(1); set.add(CustomerService.class); return set; } @Override public Set
March 18, 2013
by Blaise Doughan
· 8,588 Views
article thumbnail
Client For ActiveMQ
This Post explains Topics in Active MQ (Message Broker) with Subscribing and Publishing. For this we will write two java clients. As we did for wso2 Message Broker TopicSubscriber.java to Subcribe for messages TopicPublisher.java to to Publish the messages Let's Start. [1] Get Active MQ from http://activemq.apache.org/download.html [1.1] Start Active MQ from \bin\activemq.bat You can see the started server form http://localhost:8161/admin/ [2] Create Porject "Client" on IDE that you preferred [3] Add activemq-all-5.7.0.jar to lib Dir in the project (activemq-all-5.7.0.jar can be found in root folder) [4] Creat class "TopicSubscriber.java" to Subcribe for messages package simple; import java.util.Properties; import javax.jms.JMSException; import javax.jms.Message; import javax.jms.MessageListener; import javax.jms.Session; import javax.jms.TextMessage; import javax.jms.Topic; import javax.jms.TopicConnection; import javax.jms.TopicConnectionFactory; import javax.jms.TopicSession; import javax.naming.InitialContext; import javax.naming.NamingException; public class TopicSubscriber { private String topicName = "news.sport"; private String initialContextFactory = "" +"org.apache.activemq.jndi.ActiveMQInitialContextFactory"; private String connectionString = "tcp://" +"localhost:61616"; private boolean messageReceived = false; public static void main(String[] args) { TopicSubscriber subscriber = new TopicSubscriber(); subscriber.subscribeWithTopicLookup(); } public void subscribeWithTopicLookup() { Properties properties = new Properties(); TopicConnection topicConnection = null; properties.put("java.naming.factory.initial", initialContextFactory); properties.put("connectionfactory.QueueConnectionFactory", connectionString); properties.put("topic." + topicName, topicName); try { InitialContext ctx = new InitialContext(properties); TopicConnectionFactory topicConnectionFactory = (TopicConnectionFactory) ctx .lookup("QueueConnectionFactory"); topicConnection = topicConnectionFactory.createTopicConnection(); System.out .println("Create Topic Connection for Topic " + topicName); while (!messageReceived) { try { TopicSession topicSession = topicConnection .createTopicSession(false, Session.AUTO_ACKNOWLEDGE); Topic topic = (Topic) ctx.lookup(topicName); // start the connection topicConnection.start(); // create a topic subscriber javax.jms.TopicSubscriber topicSubscriber = topicSession .createSubscriber(topic); TestMessageListener messageListener = new TestMessageListener(); topicSubscriber.setMessageListener(messageListener); Thread.sleep(5000); topicSubscriber.close(); topicSession.close(); } catch (JMSException e) { e.printStackTrace(); } catch (NamingException e) { e.printStackTrace(); } catch (InterruptedException e) { e.printStackTrace(); } } } catch (NamingException e) { throw new RuntimeException("Error in initial context lookup", e); } catch (JMSException e) { throw new RuntimeException("Error in JMS operations", e); } finally { if (topicConnection != null) { try { topicConnection.close(); } catch (JMSException e) { throw new RuntimeException( "Error in closing topic connection", e); } } } } public class TestMessageListener implements MessageListener { public void onMessage(Message message) { try { System.out.println("Got the Message : " + ((TextMessage) message).getText()); messageReceived = true; } catch (JMSException e) { e.printStackTrace(); } } } } [5] Creat class "TopicPublisher.java" to to Publish the messages package simple; import javax.jms.*; import javax.naming.InitialContext; import javax.naming.NamingException; import java.util.Properties; public class TopicPublisher { private String topicName = "news.sport"; private String initialContextFactory = "org.apache.activemq" +".jndi.ActiveMQInitialContextFactory"; private String connectionString = "tcp://localhost:61616"; public static void main(String[] args) { TopicPublisher publisher = new TopicPublisher(); publisher.publishWithTopicLookup(); } public void publishWithTopicLookup() { Properties properties = new Properties(); TopicConnection topicConnection = null; properties.put("java.naming.factory.initial", initialContextFactory); properties.put("connectionfactory.QueueConnectionFactory", connectionString); properties.put("topic." + topicName, topicName); try { // initialize // the required connection factories InitialContext ctx = new InitialContext(properties); TopicConnectionFactory topicConnectionFactory = (TopicConnectionFactory) ctx .lookup("QueueConnectionFactory"); topicConnection = topicConnectionFactory.createTopicConnection(); try { TopicSession topicSession = topicConnection.createTopicSession( false, Session.AUTO_ACKNOWLEDGE); // create or use the topic System.out.println("Use the Topic " + topicName); Topic topic = (Topic) ctx.lookup(topicName); javax.jms.TopicPublisher topicPublisher = topicSession .createPublisher(topic); String msg = "Hi, I am Test Message"; TextMessage textMessage = topicSession.createTextMessage(msg); topicPublisher.publish(textMessage); System.out.println("Publishing message " +textMessage); topicPublisher.close(); topicSession.close(); Thread.sleep(20); } catch (InterruptedException e) { e.printStackTrace(); } } catch (JMSException e) { throw new RuntimeException("Error in JMS operations", e); } catch (NamingException e) { throw new RuntimeException("Error in initial context lookup", e); } } } [6] Firstly Run "TopicSubscriber.java" and then run "TopicPublisher.java" Here is out put from both TopicSubscriber:: Create Topic Connection for Topic news.sport Got the Message : Hi, I am Test Message TopicPublisher:: Use the Topic news.sport Publishing message ActiveMQTextMessage {commandId = 0, responseRequired = false, messageId = ID:Madhuka-THINK-51683-1359787878456-1:1:1:1:1, originalDestination = null, originalTransactionId = null, producerId = null, destination = topic://news.sport, transactionId = null, expiration = 0, timestamp = 1359787878729, arrival = 0, brokerInTime = 0, brokerOutTime = 0, correlationId = null, replyTo = null, persistent = true, type = null, priority = 4, groupID = null, groupSequence = 0, targetConsumerId = null, compressed = false, userID = null, content = null, marshalledProperties = null, dataStructure = null, redeliveryCounter = 0, size = 0, properties = null, readOnlyProperties = false, readOnlyBody = false, droppable = false, text = Hi, I am Test Message} [More] Here is full message that we have send to TopicSubscriber. We can get that any parameter in above. Here is sample to get TimeStamp and ID from JMS message. public class TestMessageListener implements MessageListener { public void onMessage(Message message) { try { System.out.println("Got the Message TimeStamp: " + message.getJMSTimestamp()); System.out.println("Got the Message JMS ID : " + message.getJMSMessageID()); messageReceived = true; } catch (JMSException e) { e.printStackTrace(); } } } Now go to ActiveMQ server at http://localhost:8161/admin/ See that Topic and message count for that topic. Now you time to check more in 'Active MQ'
March 16, 2013
by Madhuka Udantha
· 18,334 Views
article thumbnail
Bring Ruby VCR to Javascript testing with Capybara and puffing-billy
Let’s say you are writing an application in Ruby. You are probably talking to every API under the sun and are happily writing tests to make sure your code isn’t failing. Because you don’t want to rely on 3rd parties or an internet connection to make your tests pass or fail you mock everything with let’s say, Webmock. This also makes your tests much much faster. After all even the fastest internet is much slower than the processor talking to its memory. If you’re too lazy to mock out every API under the sun, you might use VCR to record requests and play them back later. The main advantage being, you don’t have to worry about meticulously reimplementing everything, and you can nuke the recordings at any time to make sure your code still works against the real API. Life is good. Enter Javascript, stage left Then Javascript becomes more and more prominent. Suddenly your application’s logic is shifting from backend to browser and before you know it, most of your tests are pretty irrelevant. You’re fine for a while with Capybara or Cucumber. Launch a headless browser, click around the site from the comfort of RSpec, make sure users see what they’re supposed to. Balance restored. Then you add a payment form. Or something. Suddenly your frontend is talking to an API. In case of Stripe or Balanced it’s even a feature. A great benefit for the user. jQuery(function($) { $('#payment-form').submit(function(event) { var $form = $(this); // Disable the submit button to prevent repeated clicks $form.find('button').prop('disabled', true); Stripe.createToken($form, stripeResponseHandler); // Prevent the form from submitting with the default action return false; }); }); Well that sucks, you’re suddenly back to square one. Your tests take minutes to execute. Your tests fail without an internet connection. Your tests rely on some 3rd party service being up. Your tests suck. Who wants to code when running ~5 tests takes 3 minutes? Nobody. Enter puffing-billy, stage right The problem is that neither Webmock nor VCR can handle requests originating in a browser because they happen in a different thread and they can’t mess around with those. Luckily, a year ago Olly Smith, created puffing-billy. The idea was great – spin up a web proxy, tell your headless browser to use it, when your code makes a request it will go through the proxy, which will try to use a Webmock to handle it, otherwise pass it on to the vast internet. But who wants to mock everything out manually? Over the past few weeks I set upon the task of fixing this problem and restoring sanity to my life. Good tests are transparent to the application and I’ll be damned if I use any of the suggested solutions on the internet like “Well you just put a switch in your code that knows if you’re in a test and then doesn’t talk to Stripe” Screw that. This morning I submitted a pull request to puffing-billy. I added the ability for puffing-billy to behave like it was VCR, but for your browser. When a request is made, it gets cached. The cache is then persisted between sessions, and requests are played back to the browser as needed. It’s not as sophisticated as VCR just yet, but it gets the job done and my test runtime has gone from 3 minutes to just under a minute. That’s a big deal in my book! The caching even understands that some URL’s are needlessly different on every request (social buttons, analytics etc.) so you can configure it to normalize those requests to a single recording that is played back every time. Your tests don’t really rely on gAnalytics working right? And the best thing is, you don’t even have to change your tests. You add something like this in your spec_helper.rb: Billy.configure do |c| c.cache = true c.ignore_params = ["http://www.google-analytics.com/__utm.gif", "http://b.siftscience.com/i.gif", "https://r.twimg.com/jot", "http://p.twitter.com/t.gif", "http://p.twitter.com/f.gif", "http://www.facebook.com/plugins/like.php", "https://www.facebook.com/dialog/oauth", "http://cdn.api.twitter.com/1/urls/count.json"] c.persist_cache = true c.cache_path = 'spec/req_cache/' end # need to call this because of a race condition between persist_cache # being set and the proxy being loaded for the first time Billy.proxy.restore_cache Capybara.javascript_driver = :poltergeist_billy A test for the payment form looks the same as usual: scenario "physical product" do product = start_buying build(:product, :physical, user: @seller, active: true) VCR.use_cassette('Balanced/purchase_with_cc') do within '#new_order' do fill_in 'order_email', with: Faker::Internet.safe_email fill_in_address fill_in_card click_on 'Buy Now' end page.should have_css('#receipt', :visible => true) end validate_receipt product, @seller end Puffing-billy will transparently cache every requests the browser makes and VCR records any requests made by your backend logic. It’s pretty sweet. What do you guys think? I only have 20 days of Ruby experience and the internet has told me it really wants something like this, but I couldn’t find anyone who’s already made it.
March 15, 2013
by Swizec Teller
· 8,003 Views
article thumbnail
From Java to PHP
We are welcoming some new colleagues that come from a Java background in the Onebip team, both from the development and operations field. Here's a primer on learning PHP in this situation, that you may find useful when introducing similar people in your PHP-based projects. The absolute basics Before being able to discuss meaningful matters over PHP code, these pages from the manual should be sent to each interested developer. What is the basic syntax and the available data types of PHP. Primitive types are diffused like in Java, but there's no autoboxing as there is no object equivalent for them. Strings are immutable and one of the most important types. They can be defined with single or double quotes, and their manipulation functions follow the C api. Arrays are the glue of the language, working as lists, sets and maps while wrapped into objects. They can always be traversed with foreach(), and keep an eye on the available functions to avoid rewriting array_search() or sort() by accident. Operators work differently on primitive values and on objects: == is different from === in both these context, but in two ways. Other everyday operators are `.` and `+=`. PHP's object paradigm is borrowed from Java: it supports concrete and abstract classes, interfaces, and the private, protected and public scopes. Type hints (which are actually not hints but strong preconditions) are what is most similar to Java static type safety mechanisms. I personally strongly favors their usage. Namespaces are packages, use statements are imports. However, you don't always have to write them. The deployment model of PHP consists of an interpreter instantiated N times, running on many shared nothing processes. Don't care about: The installation process of Apache and PHP: if you work in a team, you will get good help on that, and you're going to do this only once per project probably. For example, we provide a virtual machine ready for development. The function syntax is also not very useful by itself in 2013, skip directly to objects and their methods. Take a look at anonymous functions, however. Exceptions work mostly as in Java, so don't bother reading about them before coding. Sessions are also evil in web services nowadays, so ignore anything that start with $_SESSION or session_*(): write shared nothing services (that may be restricted to our team and projects.) In general, ignore also low-level APIs like setcookie() or exec() if you already have an higher-level abstraction in the application you're working on, being this abstraction a library or your own code. It's important to know how cookies are transmitted according to HTTP, but coming from another language you already know the protocol. So you write objects that go into a graph At least that's how I see programming these days. However, you have to exit the process sometimes, and PHP provides several APIs for that. The database APIs such as the mongo and PDO extensions, working respectively with MongoDB and relational database. However, if you already use a database abstraction layer, learn just that as there is nothing interesting to see at this lower level. The DateTime object and its cousins, at least know how to call its constructor and the format() method. You should know these APIs exist in order to look them up on demand: json_*() functions, the SimpleXMLElement class, the mcrypt extension and hash_hmac(), mail(). They're just a click away at php.net/function_or_class_name, and I think these names are pretty self-explanatory to tell you when you should read about them. Watch out for: Some topics are always controversial, and get confusing for PHP beginners. Raise your alert level when you feel you're encountering some strange behavior. the difference between == and === comes out often, and it cannot be reduced to "just use === by default" since all primitive types coming from HTTP requests in the classic x-www-form-urlencoded Content-Type are usually strings. php.ini settings may change the output of your application and the actual flow of execution depending on error reporting levels and display options. One never ceases to learn, so when you encounter some problematic directive, take a quick read of the rest of the directives for that extension. Since some people believe web development is the concatenation of strings (it is not), it is tempting to reinvent the wheel; for example, writing functions for composing URLS from paths and query strings. http_build_query() solves that problem as other Composer-ready packages do, so take a look around before starting to implement commodity algorithms yourself. Stack Overflow and the PHP manual itself are good places to start if your problem has been already solved by someone else in the last 10 years. Perform a kata To test your level of proficiency with PHP, execute a small kata with TDD which incidentally will also check your usage of PHPUnit. For example, classic katas are: The Fizz Buzz, maybe with a web interface. The Roman numbers kata: write a function that transforms 42 in XLII. The Game of Life, although it is probably too complex for your first PHP project if you didn't already solve it in another language.
March 13, 2013
by Giorgio Sironi
· 26,585 Views
article thumbnail
In-Memory Data Grids
Introduction The IT buzzword of 2012 is without a doubt Big Data. It’s new and here to stay, and for a good reason. Big data is data that exceeds the processing capacity of conventional database systems. Great examples are CERN with the Large Hadron Collider, whose experiments generate 25 petabytes of data annually, or Walmart, which handles more than one million customer transaction every hour. Problems These vast amounts of data leave us with two problems. Problem 1: To gain value from this data, one must choose an alternative way to process it. The value of big data to an organization falls into two categories: analytical use, and enabling new products. Big data analytics can reveal insights hidden previously by data too costly to process, such as peer influence among customers, revealed by analyzing shoppers’ transactions, social and geographical data. Being able to process every item of data in reasonable time removes the troublesome need for sampling and promotes an investigative approach to data, in contrast to the somewhat static nature of running predetermined reports. Problem 2: The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. Remember the CERN case where the LHC produces over 25 Petabytes of data annually? No “classic” database architecture or setup is capable of holding these amounts of data. Solutions Fortunately, both problems can be solved by implementing the correct infrastructure and rethinking data storage. There are two critical factors in Big Data environments: size and speed. We already discussed the vast amounts of data and desire to be able to access and process the data fast. The latter is the main differentiator from more traditional data warehouses. Just imagine what you can do when you can access all your data real-time. Enter big data. A common Big Data implementation is an in-memory data grid that lives in a distributed cluster, ensuring both speed, by storing data in-memory, and capacity by using scalability features provided by a cluster. As a bonus, availability is ensured by using a distributed cluster. As for the data storage, there are typically two kinds: in-memory databases and in-memory data grids. But first some background. It is not a new attempt to use main memory as a storage area instead of a disk. In our daily lives there are numerous examples of main memory databases (MMDB), as they perform much faster than disk-based databases. An every day example is a mobile phone. When you SMS or call someone most mobile service providers use MMDB to get the information on your contact as soon as possible. The same applies to your phone. When someone calls you, the caller details are looked up in the contacts application, usually providing a name and sometimes a picture. In memory data grids In Memory Data Grid (IMDG) is the same as MMDB in that it stores data in main memory, but it has a totally different architecture. The features of IMDG can be summarized as follows: Data is distributed and stored on multiple servers. Each server operates in the active mode. A data model is usually object-oriented (serialized) and non-relational. According to the necessity, you often need to add or reduce servers. No traditional database features such as tables. In other words, IMDG is designed to store data in main memory, ensure scalability and store an object itself. These days, there are many IMDG products, both commercial and open source. Some of the most commonly used products are: Hazelcast (http://www.hazelcast.com) JBoss Infinispan (http://www.jboss.org/infinispan) GridGain DataGrid (http://www.gridgain.com/features/in-memory-data-grid/) VMware Gemfire (http://www.vmware.com/nl/products/application-platform/vfabric-gemfire/overview.html) Oracle Coherence (http://www.oracle.com/technetwork/middleware/coherence/overview/index.html) Gigaspaces XAP (http://www.gigaspaces.com/datagrid) Terracotta Enterprise Suite (http://terracotta.org/products/enterprise-suite) Why Memory? The main reasons for using main memory for data storage are once again the two main themes of Big Data: speed and capacity. The processing performance of main memory is 800 times faster than an HDD and up to 40 times faster than an. Moreover, the latest x86 server supports main memory of hundreds of GB per server. It is said that the limit of a traditional processing database’s (OLTP) data capacity is approximately 1 TB and that the OLTP processing data capacity would not increment well. If servers using main memory of 1 TB or larger become more commonly used, you will be able to conduct operations with the entire data placed in main memory, at least in the field of OLTP. IMDG Architecture To use main memory as a storage area, two weak points should be overcome: Limited capacity: involves data that exceeds the maximum capacity of the main memory of the server Reliability: involves data loss in case of a (system) failure. IMDG overcomes the limit of capacity by ensuring horizontal scalability using a distributed architecture, and resolves the issue of reliability through a replication system as part of the grid (or a distributed cluster). Now let’s discuss how an IMDG actually works. First of all, it is important to understand that an IMDG is not the same as an in-memory database, also referred to as MMDB (main memory databases). Typical examples of MMDBs are Oracle TimesTen or Sap Hana. MMDBs are full database products that simply reside in memory. As a result of being a full-blown database, they also carry the weight and overhead of database management features. IMDG is different. No tables, indexes, triggers, stored procedures, process managers etc. Just plain storage. The data model used in IMDG is key-value pairs. A key-value pair is a list with only two parts: a key and a value. The key can be used for storing and retrieving the values in the list. A key can be compared to the index or primary key of a table in a database. Note that IMDG are closely tied to development environments such as Java as the key-value pairs are represented by the structures provided by such a programming environment. Most IMDGs are written in Java, and can only be used within other Java applications. Therefore, the values of key-value pairs can be anything supported by Java, ranging from simple data types such as a string or number, to complex objects. This overcomes the two important hurdles: as you can store complex Java objects as value, there’s no need to translate these objects into a relational datamodel (which is the case in more traditional applications using a database for storage). Furthermore, the seeming limitation of being able to store only one value per key, is actually no limitation at all. Large memory sizes Most of the products introduced above use Java as an implementation language. Java reserves and uses a part of the RAM (internal memory) for dynamic memory allocation. This reserved memory space is called the Java heap. All runtime objects created by a Java application are stored in heap. Using large amounts of data causes two problems. Size limitation: By default, the heap size is 128 MB, but for current business applications, this limit is reached easily. Once the heap is “full”, no new objects can be created and the Java application will show some nasty errors. Performance: It is possible to increase the size of the heap, but this introduces some new problems. When a heap reaches a size of more than 4 gigabytes, Java will have serious issues with memory managements, causing your application to slow down or even freeze. Java has a feature called Garbage Collector, which periodically scans the heap and checks each object if it is still valid and being used. If not, the garbage collector removes the object and defragments the newly available space. The problem is, the larger the heap size, the more work to do for the garbage collector, resulting in performance degradation. Imagine a large bank has a Java application that manages customers, accounts and transactions. We have seen that an IMDG allows the application to store and access all data very quickly by caching it in memory, instead of storing the data in relatively slow databases. Let’s assume the combined data has a size of 40 gigabytes. Storing it in heap is simply not possible, considering the performance penalties of Java’s memory management capabilities. The graph below illustrates the garbage collection pause time when placing cached data in heap: Terracotta’s BigMemory product has a method to overcome these limitations. The method is to use an off-heap memory (direct buffer). Data will not be stored in Java’s heap, but directly in the available internal memory (RAM). Since, this is not subject to Java’s garbage collector, there are no performance penalties. The differences on performance are significant, as can be seen in the graph below: Using off-heap storage has some major benefits: You can use all the available memory on your machine, not just the memory that is allocated to the heap (usually less that 512 Mb). This allows you to store more data in a in-memory data grid, greatly speeding up your application. The heap can be relieved by storing data in native memory, speeding up Java applications as less heap space has to be garbage collected. Clustering, fail over and high availability So far, we have seen IMDG features that are applicable to a single server. However, the real power of IMDG lies in it’s networking and clustering capabilities, providing features as data replication, data synchronization between clients, fail over and high availability. To achieve this, a cluster of servers (or server array) acts a backbone of the infrastructure. Applications (that still can have their own IMDG or off-heap cache) that are connected to the cluster can share, replicate and backup their data with either the cluster or other applications. The graph below depicts a typical setup using Terracotta's BigMemory: The caches on the application servers are usually referred to as “level 1” cache, while the data cache on the server array is referred to as “level 2” cache. There are many different scenarios possible for storing, clustering, synchronizing and replicating data. Covering all these topics goes far beyond the scope of this article. For more information, consult the technical documentation of the product of your choice. Conclusion Big Data brings us some new challenges. First of all, storing and accessing vast amounts of data makes us rethink traditional methods and technologies. Next, there’s the question what to do with all the available data. The potential value for marketing, financial and other businesses is huge. In order to facilitate Big Data, in-memory data grids are considered the best option. IMDGs with off-heap storage are even more powerful, allowing data centric enterprise application to overcome certain limits of the Java platform, such as memory and performance constraints. As the amount of data that (large) companies produce and store, grows exponentially, databases will hit a limit. Accessing your data without a performance penalty simply will not be possible. The answer to this is using an IMDG.
March 13, 2013
by Roy Prins
· 32,634 Views · 5 Likes
article thumbnail
Database Concepts for a java Dev: Database Normalization
In this part, I will be briefing about different types of Database Normalizations using a sample data model. What is Database Normalization? Normalization is the process of efficiently organizing data in the database. Primary Goal of Normalization? Eliminating redundant data & ensuring meaningful data dependencies. Types of Normalization The following are the three most common normal forms in the database normalization process First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) Sample Data Model for Demonstration The following data model will be used to demonstrate all the three normal forms First Normal Form (1NF) First Normal Form (1NF) sets the very basic rules for an organized database: Create separate set of tables for each group of related data and identify each row with a unique columns [primary key] or set of columns [composite key] Eliminate duplicate columns from the table The following data model depicts the tables after 1NF rules are applied - Second Normal Form (2NF) Second Normal Form (2NF) further addresses the concept of removing duplicate data: Meet all the requirements of the first normal form Remove subsets of data that apply to multiple rows of a table and place them in separate tables Create relationships between these new tables and their predecessors through the use of foreign keys So basically the objective of the Second Normal Form is to take that is only partly dependent on the primary key and enter that data into another table. The following data model depicts the tables after 2NF rules are applied. Data from EMPLOYEE_TABLE is split into 2 tables – EMPLOYEE_TABLE and EMPLOYEE_HR_TABLE. Similarly data from CUSTOMER_TABLE is moved to CUSTOMER_TABLE and CUSTOMER_ORDER table Third Normal Form (3NF) Third normal form (3NF) goes one large step further: Meet all the requirements of the second normal form. Remove columns that are not dependent upon the primary key. The following data model depicts the tables after 3NF rules are applied. Further state and country details are moved to their own tables because they are not dependent on the primary key. Advantages of Normalizing the Database There are several advantages of normalization - Data can be stored as small atomic pieces Saves space Increases speed Reduces data anomalies Easy maintenance Other parts of this series include: Part 1 – ACID Properties Part 2 – Keys Part 4 – Database Transactions [coming soon] Part 5 – Indexes [coming soon]
March 13, 2013
by Jagadeesh Motamarri
· 10,928 Views · 1 Like
article thumbnail
Maven's Non-Resolvable Parent POM Problem
Need help dealing with Maven's non-resolvable parent problem? Check out this post to learn how.
March 12, 2013
by Roger Hughes
· 462,959 Views · 8 Likes
article thumbnail
Consider Replacing Spring XML Configuration with JavaConfig
Spring articles are becoming a trend on this blog, I should probably apply for a SpringSource position Colleagues of mine sometimes curse me for my stubbornness in using XML configuration for Spring. Yes, it seems so 2000′s but XML has definite advantages: Configuration is centralized, it’s not scattered among all different components so you can have a nice overview of beans and their wirings in a single place If you need to split your files, no problem, Spring let you do that. It then reassembles them at runtime through internal tags or external context files aggregation Only XML configuration allows for explicit wiring – as opposed to autowiring. Sometimes, the latter is a bit too magical for my own taste. Its apparent simplicity hides real complexity: not only do we need to switch between by-type and by-name autowiring, but more importantly, the strategy for choosing the relevant bean among all eligible ones escapes but the more seasoned Spring developers. Profiles seem to make this easier, but is relatively new and is known to few Last but not least, XML is completely orthogonal to the Java file: there’s no coupling between the 2 so that the class can be used in more than one context with different configurations The sole problem with XML is that you have to wait until runtime to discover typos in a bean or some other stupid boo-boo. On the other side, using Spring IDE plugin (or the integrated Spring Tools Suite) definitely can help you there. An interesting alternative to both XML and direct annotations on bean classes is JavaConfig, a former separate project embedded into Spring itself since v3.0. It merges XML decoupling advantage with Java compile-time checks. JavaConfig can be seen as the XML file equivalent, only written in Java. The whole documentation is of course available online, but this article will just let you kickstart using JavaConfig. As an example, let us migrate from the following XML file to a JavaConfig The equivalent file is the following: import java.net.MalformedURLException; import java.net.URL; import javax.swing.Icon; import javax.swing.ImageIcon; import javax.swing.JButton; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; @Configuration public class MigratedConfiguration { @Bean public JButton button() { return new JButton("Hello World"); } @Bean public JButton anotherButton() { return new JButton(icon()); } @Bean public Icon icon() throws MalformedURLException { URL url = new URL("http://morevaadin.com/assets/images/learning_vaadin_cover.png"); return new ImageIcon(url); } } Usage is simpler than simple: annotate the main class with @Configuration and individual producer methods with @Bean. The only drawback, IMHO, is that it uses autowiring. Apart from that, It just works. Note that in a Web environment, the web deployment descriptor should be updated with the following lines: contextClass org.springframework.web.context.support.AnnotationConfigWebApplicationContext contextConfigLocation com.packtpub.learnvaadin.springintegration.SpringIntegrationConfiguration Sources for this article are available in Maven/Eclipse format here. To go further: Java-based container configuration documentation AnnotationConfigWebApplicationContextJavaDoc @ContextConfigurationJavaDoc (to configure Spring Test to use JavaConfig)
March 10, 2013
by Nicolas Fränkel
· 88,750 Views · 1 Like
article thumbnail
Advanced ListenableFuture Capabilities
Last time we familiarized ourselves with ListenableFuture. I promised to introduced more advanced techniques, namely transformations and chaining. Let's start from something straightforward. Say we have our ListenableFuture which we got from some asynchronous service. We also have a simple method: Document parse(String xml) {//... We don't need String, we need Document. One way would be to simply resolve Future (wait for it) and do the processing on String. But much more elegant solution is to apply transformation once the results are available and treat our method as if was always returning ListenableFuture. This is pretty straightforward: final ListenableFuture future = //... final ListenableFuture documentFuture = Futures.transform(future, new Function() { @Override public Document apply(String contents) { return parse(contents); } }); or more readable: final Function parseFun = new Function() { @Override public Document apply(String contents) { return parse(contents); } }; final ListenableFuture future = //... final ListenableFuture documentFuture = Futures.transform(future, parseFun); Java syntax is a bit limiting, but please focus on what we just did. Futures.transform() doesn't wait for underlying ListenableFuture to apply parse() transformation. Instead, under the hood, it registers a callback, wishing to be notified whenever given future finishes. This transformation is applied dynamically and transparently for us at right moment. We still have Future, but this time wrapping Document. So let's go one step further. We also have an asynchronous, possibly long-running method that calculates relevance (whatever that is in this context) of a given Document: ListenableFuture calculateRelevance(Document pageContents) {//... Can we somehow chain it with ListenableFuture we already have? First attempt: final Function> relevanceFun = new Function>() { @Override public ListenableFuture apply(Document input) { return calculateRelevance(input); } }; final ListenableFuture future = //... final ListenableFuture documentFuture = Futures.transform(future, parseFun); final ListenableFuture> relevanceFuture = Futures.transform(documentFuture, relevanceFun); Ouch! Future of future of Double, that doesn't look good. Once we resolve outer future we need to wait for inner one as well. Definitely not elegant. Can we do better? final AsyncFunction relevanceAsyncFun = new AsyncFunction() { @Override public ListenableFuture apply(Document pageContents) throws Exception { return calculateRelevance(pageContents); } }; final ListenableFuture future = //comes from ListeningExecutorService final ListenableFuture documentFuture = Futures.transform(future, parseFun); final ListenableFuture relevanceFuture = Futures.transform(documentFuture, relevanceAsyncFun); Please look very carefully at all types and results. Notice the difference between Function and AsyncFunction. Initially we got an asynchronous method returning future of String. Later on we transformed it to seamlessly turn String into XML Document. This transformation happens asynchronously, when inner future completes. Having future of Document we would like to call a method that requires Document and returns future of Double. If we call relevanceFuture.get(), our Future object will first wait for inner task to complete and having its result (String -> Document) will wait for outer task and return Double. We can also register callbacks on relevanceFuture which will fire when outer task (calculateRelevance()) finishes. If you are still here, the are even more crazy transformations. Remember that all this happens in a loop. For each web site we got ListenableFuture which we asynchronously transformed to ListenableFuture. So in the end we work with a List>. This also means that in order to extract all the results we either have to register listener for each and every ListenableFuture or wait for each of them. Which doesn't progress us at all. But what if we could easily transform from List> to ListenableFuture>? Read carefully - from list of futures to future of list. In other words, rather than having a bunch of small futures we have one future that will complete when all child futures complete - and the results are mapped one-to-one to target list. Guess what, Guava can do this! final List> relevanceFutures = //...; final ListenableFuture> futureOfRelevance = Futures.allAsList(relevanceFutures); Of course there is no waiting here as well. Wrapper ListenableFuture> will be notified every time one of its child futures complete. The moment the last child ListenableFuture completes, outer future completes as well. Everything is event-driven and completely hidden from you. Do you think that's it? Say we would like to compute the biggest relevance in the whole set. As you probably know by now, we won't wait for a List. Instead we will register transformation from List to Double! final ListenableFuture maxRelevanceFuture = Futures.transform(futureOfRelevance, new Function, Double>() { @Override public Double apply(List relevanceList) { return Collections.max(relevanceList); } }); Finally, we can listen for completion event of maxRelevanceFuture and e.g. send results (asynchronously!) using JMS. Here is a complete code if you lost track: private Document parse(String xml) { return //... } private final Function parseFun = new Function() { @Override public Document apply(String contents) { return parse(contents); } }; private ListenableFuture calculateRelevance(Document pageContents) { return //... } final AsyncFunction relevanceAsyncFun = new AsyncFunction() { @Override public ListenableFuture apply(Document pageContents) throws Exception { return calculateRelevance(pageContents); } }; //... final ListeningExecutorService pool = MoreExecutors.listeningDecorator( Executors.newFixedThreadPool(10) ); final List> relevanceFutures = new ArrayList<>(topSites.size()); for (final URL siteUrl : topSites) { final ListenableFuture future = pool.submit(new Callable() { @Override public String call() throws Exception { return IOUtils.toString(siteUrl, StandardCharsets.UTF_8); } }); final ListenableFuture documentFuture = Futures.transform(future, parseFun); final ListenableFuture relevanceFuture = Futures.transform(documentFuture, relevanceAsyncFun); relevanceFutures.add(relevanceFuture); } final ListenableFuture> futureOfRelevance = Futures.allAsList(relevanceFutures); final ListenableFuture maxRelevanceFuture = Futures.transform(futureOfRelevance, new Function, Double>() { @Override public Double apply(List relevanceList) { return Collections.max(relevanceList); } }); Futures.addCallback(maxRelevanceFuture, new FutureCallback() { @Override public void onSuccess(Double result) { log.debug("Result: {}", result); } @Override public void onFailure(Throwable t) { log.error("Error :-(", t); } }); Was it worth it? Yes and no. Yes, because we learned some really important constructs and primitives used together with futures/promises: chaining, mapping (transforming) and reducing. The solution is beautiful in terms of CPU utilization - no waiting, blocking, etc. Remember that the biggest strength of Node.js is its "no-blocking" policy. Also in Netty futures are ubiquitous. Last but not least, it feels very functional. On the other hand, mainly due to Java syntax verbosity and lack of type inference (yes, we will jump into Scala soon) code seems very unreadable, hard to follow and maintain. Well, to some degree this holds true for all message driven systems. But as long as we don't invent better APIs and primitives, we must learn to live and take advantage of asynchronous, highly parallel computations. If you want to experiment with ListenableFuture even more, don't forget to read official documentation.
March 7, 2013
by Tomasz Nurkiewicz
· 23,501 Views · 1 Like
article thumbnail
Sequelize, the JavaScript ORM, in practice
node.js is well-know for its good connectivity with nosql databases. a less know fact is that it's also very efficient with relational databases. among the dozens orms out there in javascript, one stands out for relational databases: sequelize . it's quite easy to learn but there are not many pointers about how to organize model code with this module. here are a few tips we learned by using sequelize in a medium size project. sequelize 101 sequelize claims to supports mysql, postgresql and sqlite. the sequelize docs explain the first steps with the javascript orm. first, initialize a database connection, then a few models, without worrying about primary or foreign keys: var sequelize = new sequelize('database', 'username'[, 'password']) var project = sequelize.define('project', { title: sequelize.string, description: sequelize.text }); var task = sequelize.define('task', { title: sequelize.string, description: sequelize.text, deadline: sequelize.date }); project.hasmany(task); next, create new instances and persist them, query the database, etc. // create an instance var task = task.build({title: 'very important task'}) task.title // ==> 'very important task' // persist an instance task.save() .error(function(err) { // error callback }) .success(function() { // success callback }); // query persistence for instances var tasks = task.all({ where: ['dealdine < ?', new date()] }) .error(function(err) { // error callback }) .success(function() { // success callback }); sequelize uses promises so you can chain error and success callbacks, and it all plays well with unit tests. all that is pretty well documented, but the sequelize documentation only covers the basic usage. once you start using sequelize in real world projects, finding the right way to implement a feature gets trickier. model file structure all the examples in the sequelize documentation show all model declarations grouped in a single file. once a project reaches production size, this is not a viable approach. the alternative is to import models from a module using sequelize.import() . the problem is that relationships rely on several models. when you declare a relationship, models from both sides of the relationship must already be imported. you should not import model files from other model files because of node.js module caching policy (more on that later on); instead, you can define relationships in a standalone file. here is the file structure we've been working with: models/ index.js # import all models and creates relationships phonenumber.js task.js user.js ... and here is how the main models/index.js initializes the entire model: var sequelize = require('sequelize'); var config = require('config').database; // we use node-config to handle environments // initialize database connection var sequelize = new sequelize( config.name, config.username, config.password, config.options ); // load models var models = [ 'phonenumber', 'task', 'user' ]; models.foreach(function(model) { module.exports[model] = sequelize.import(__dirname + '/' + model); }); // describe relationships (function(m) { m.phonenumber.belongsto(m.user); m.task.belongsto(m.user); m.user.hasmany(m.task); m.user.hasmany(m.phonenumber); })(module.exports); // export connection module.exports.sequelize = sequelize; using models in code from other parts of the application, if you need a model class, require the models/index.js instead of the standalone model file. that way, you don't have to repeat the sequelize initialization. var models = require('./models'); var user = models.user; var user = user.build({ first_name: "john", last_name: "doe "}); the problem is, when you require the models/index.js file, node may use a cached version of the module... or not: from nodejs.org : multiple calls to require('foo') may not cause the module code to be executed multiple times. (...) modules are cached based on their resolved filename. since modules may resolve to a different filename based on the location of the calling module (loading from node_modules folders), it is not a guarantee that require('foo') will always return the exact same object, if it would resolve to different files. that means that using require('./models') to get the models may create more than one connection to the database. to avoid that, the models variable must be somehow singleton-esque. this can be easily achieved, if you're using a framework like expressjs , by attaching the models module to the application: app.set('models', require('./models')); and when you need to require a class of the model in a controller, use this application setting rather than a direct import: var user = app.get('models').user; accessing other models sequelize models can be extended with class and instance methods. you can add abilities to model classes, much like in a true activerecord implementation. but a problem arises when adding a method that depends on another model: how can a model access another one? // in models/user.js module.exports = function(sequelize, datatypes) { return sequelize.define('user', { first_name: datatypes.string, last_name: datatypes.string, }, { instancemethods: { counttasks: function() { // how to implement this method ? } } }); }; if the two models share a relationship, there is a way. here, one user has many tasks , that makes the task model accessible through user.associations['tasks'].target . and here is yet another problem: since sequelize doesn't use prototype-based inheritance, how can a user instance gain access to the user class? digging into the sequelize source brings the protected __factory to the light. with all this undocumented knowledge, it is now possible to write the counttasks() instance method: counttasks: function() { return this.__factory.associations['tasks'].target.count({ where: { user_id: this.id } }); } note that task.count() returns a promise, so counttasks() also returns a promise: user.counttasks().success(function(nbtasks) { // do somethig with the user task count }); extending models (a.k.a behaviors) what if you need to reuse several methods across several models? sequelize doesn't have a behavior system per se (or "concerns" in the ruby on rails terminology), although it's quite easy to implement . for now, you're condemned to import common methods before the call to sequelize.define() and use sequelize.utils._.extend() to add it to the instancemethods or classmethods object: // in models/friendlyurl.js module.exports = function(keys) { return { geturl: function() { var ret = ''; keys.foreach(function(key) { ret += this[key]; }) return ret .tolowercase() .replace(/^\s+|\s+$/g, "") // trim whitespace .replace(/[_|\s]+/g, "-") .replace(/[^a-z0-9-]+/g, "") .replace(/[-]+/g, "-") .replace(/^-+|-+$/g, ""); } }; } // in models/user.js var friendlyurlmethods = require('./friendlyurl')(['first_name', 'last_name']); module.exports = function(sequelize, datatypes) { return sequelize.define('user', { first_name: datatypes.string, last_name: datatypes.string, }, { instancemethods: sequelize.utils._.extend({}, friendlyurlmethods, { counttasks: function() { return this.__factory.associations['tasks'].target.count({ where: { user_id: this.id } }); } }); }) }; now the user model instances gain access to a geturl() method: var user = user.build({ first_name: 'john', last_name: 'doe' }); user.geturl(); // 'john_doe' a limitation of this trick is that you must define behaviors before the actual model. this forbids behaviors accessing other models. query series sequelize provides a tool called the querychainer to ease the resynchronization of queries. new sequelize.utils.querychainer() .add(user, 'find', [id]) .add(task, 'findall') .error(function(err) { /* hmm not good :> */ }) .success(function(results) { var user = results[0]; var tasks = results[1]; // do things with the results }); after using it a little, this utility turns out to be very limited. most notably, querychainer executes all queries in parallel by default. and you only get access to the results of the queries in the final callback - no way to pass values from one query to the other. we've found it much more convenient to use a generic resynchronizing module like async , which provides the wonderful async.auto() utility. this method lets you list tasks together with dependencies, and determines which task can be run in parallel, and which must be run in series. async.auto([ user: function(next) { user.find(id).complete(next); }, tasks: ['user', function(next) { tasks.findall({ where: { user_id: user.id } }).complete(next); }] ], function(err, results) { var user = results.user; var tasks = results.tasks; // do things with the results }); notice the complete() method, which is an alternative to the two success() and error() callbacks. complete() accepts a callback with the signature (err, res) , which is more natural in the node.js world, and compatible with async . prefetching one thing orms are usually good at is minimizing queries. sequelize offers a prefetching feature, allowing to group two queries in a single one using a join. for instance, if you want to retrieve a task together with the related user, write the query as follows: task.find({ where: { id: id } }, include: ['user']) .error(function(err) { // error callback }) .success(function(task) { task.getuser(); // does not trigger a new query }); this is another undocumented feature, although the documentation should be updated soon . migrations sequelize provides a migration command line utility. but because it only allows modifying the model using sequelize commands (and not calling any asynchronous method of your own ), this migration command falls short. as of now, we've been handling migrations manually using numbered sql files and a custom utility to run them in order. custom sql queries sequelize is built over database adapters, and as such provides a way to execute arbitrary sql queries against the database. here is an example: var util = require('util'); var query = 'select * from `task` ' + 'left join `user` on `task`.`userid` = `user`.`id` ' + 'where `user`.`last_name` = %s'; var escapedname = sequelize.constructor.utils.escape(last_name); querywithparams = util.format(query, escapedname); sequelize.query(querywithparams, task) .error(function(err) { // error callback }) .success(function(tasks) { task.getuser(); // does not trigger a new query }); sequelize.query() returns a promise just like other query functions. if you provide the model to use for hydration ( task in this case), the query() method returns model instances rather than a simple json. note that you must escape values by hand before concatenating them into the sql query. for strings, sequelize.constructor.utils.escape() is your friend. for integers, util.format('%d') should do the trick. conclusion is sequelize ready for prime time ? almost. the learning curve is made longer by an incomplete documentation, but most of the features required by a production-level orm are there. however, i wouldn't recommend it for production just yet if you're not ready to run on your own fork, since the rate at which prs are merged in the sequelize github repository is low.
March 5, 2013
by Francois Zaninotto
· 52,864 Views · 2 Likes
article thumbnail
MySQL optimizer: ANALYZE TABLE and Waiting for table flush
This post comes from Miguel Angel Nieto at the MySQL Performance Blog. The MySQL optimizer makes the decision of what execution plan to use based on the information provided by the storage engines. That information is not accurate in some engines like InnoDB and they are based in statistics calculations therefore sometimes some tune is needed. In InnoDB these statistics are calculated automatically, check the following blog post for more information: http://www.mysqlperformanceblog.com/2011/10/06/when-does-innodb-update-table-statistics-and-when-it-can-bite/ There are some variables to tune how that statistics are calculated but we need to wait until the gathering process triggers again to see if there is any improvement. Usually the first step to try to get back to the previous execution plan is to force that process with ANALYZE TABLE that is usually fast enough to not cause troubles. Let’s see an example of how a simple and fast ANALYZE can cause a downtime. Waiting for table flush: In order to trigger this problem we need: - Lot of concurrency - A long running query - Run an ANALYZE TABLE on a table accessed by the long running query So first we need a long running query against table t: SELECT * FROM t WHERE c > '%c%'; Then in our efforts to get a better execution plan for another query we run ANALYZE TABLE: mysql> analyze table t; +--------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +--------+---------+----------+----------+ | test.t | analyze | status | OK | +--------+---------+----------+----------+ 1 row in set (0.00 sec) Perfect, very fast! But then some seconds later we realize that our application is down. Let’s see the process list. I’ve removed most of the columns to make it clearer: +------+-------------------------+---------------------------------+ | Time | State | Info | | 32 | Writing to net | select * from t where c > '%0%' | | 12 | Waiting for table flush | select * from test.t where i=1 | | 12 | Waiting for table flush | select * from test.t where i=2 | | 12 | Waiting for table flush | select * from test.t where i=3 | | 11 | Waiting for table flush | select * from test.t where i=7 | | 10 | Waiting for table flush | select * from test.t where i=11 | | 11 | Waiting for table flush | select * from test.t where i=5 | | 11 | Waiting for table flush | select * from test.t where i=4 | | 11 | Waiting for table flush | select * from test.t where i=9 | | 11 | Waiting for table flush | select * from test.t where i=8 | | 11 | Waiting for table flush | select * from test.t where i=12 | | 11 | Waiting for table flush | select * from test.t where i=14 | | 10 | Waiting for table flush | select * from test.t where i=6 | | 10 | Waiting for table flush | select * from test.t where i=15 | | 10 | Waiting for table flush | select * from test.t where i=10 | [...] The ANALYZE TABLE runs perfect but after it the rest of the threads that are running a query against that table need to wait. This is because MySQL has detected that the underlying table has changed and it needs to close and reopen it using FLUSH. Therefore the table will be locked until all queries that are using that table finish. There are only two solutions to this situation, wait until the long query finishes or kill the query. Also, we have to take in account that killing a query could cause even more troubles. If we are dealing with a write query on InnoDB the rollback process could take even more time to finish than the original query. On the other hand, if the table is MyISAM there will be no rollback process so all the already updated rows can’t be recovered. This particular example is not only a problem of ANALYZE. Other commands like FLUSH TABLES, ALTER, RENAME, OPTIMIZE or REPAIR can cause threads to wait on “Waiting for tables”, “Waiting for table” and “Waiting for table flush”. Conclusion Before running an ANALYZE table or any other command listed before, check the running queries. If the table that you are going to work on is very used the recommendation is to run it during the low peak of load or a maintenance window.
February 28, 2013
by Peter Zaitsev
· 12,059 Views
article thumbnail
How Expensive is a Method Call in Java?
We have all been there. Looking at the poorly designed code while listening to the author’s explanations about how one should never sacrifice performance over design. And you just cannot convince the author to get rid of his 500-line methods because chaining method calls would destroy the performance. Well, it might have been true in 1996 or so. But since then JVM has evolved to be an amazing piece of software. One way to find out about it is to start looking more deeply into optimizations carried out by the virtual machine. The arsenal of techniques applied by the JVM is quite extensive, but lets look into one of them in more details. Namely method inlining . It is easiest to explain via the following sample: int sum(int a, int b, int c, int d) { return sum(sum(a, b),sum(c, d)); } int sum(int a, int b) { return a + b; } When this code is run, the JVM will figure out that it can replace it with a more effective, so called “inlined” code: int sum(int a, int b, int c, int d) { return a + b + c + d; } You have to pay attention that this optimization is done by the virtual machine and not by the compiler. It is not transparent at the first place why this decision was made. After all – if you look at the sample code above – why postpone optimization when compilation can produce more efficient bytecode? But considering also other not-so obvious cases, JVM is the best place to carry out the optimization: JVM is equipped with runtime data besides static analysis. During runtime JVM can make better decisions based on what methods are executed most often, what loads are redundant, when is it safe to use copy propagation, etc. JVM has got information about the underlying architecture – number of cores, heap size and configuration and can thus make the best selection based on this information. But let us see those assumptions in practice. I have created a small test application which uses several different ways to add together 1024 integers. A relatively reasonable one, where the implementation just iterates over the array containing 1024 integers and sums the result together. This implementation is available in InlineSummarizer.java . Recursion based divide-and-conquer approach. I take the original 1024 – element array and recursively divide it into halves – the first recursion depth thus gives me two 512-element arrays, the second depth has four 256-element arrays and so forth. In order to sum together all the 1024 elements I introduce 1023 additional method invocations. This implementation is attached as RecursiveSummarizer.java . Naive divide-and-conquer approach. This one also divides the original 1024-element array, but via calling additional instance methods on the separated halves – namely I nest sum512(), sum256(), sum128(), …, sum2() calls until I have summarized all the elements. As with recursion, I introduce 1023 additional method invocations in the source code . And I have a test class to run all those samples. The first results are from unoptimized code: As seen from the above, the inlined code is the fastest. And the ones where we have introduced 1023 additional method invocations are slower by ~25,000ns. But this image has to be interpreted with a caveat – it is a snapshot from the runs where JIT has not yet fully optimized the code. In my mid-2010 MB Pro it took between 200 and 3000 runs depending on the implementation. The more realistic results are below. I have ran all the summarizer implementations for more than 1,000,000 times and discarded the runs where JIT has not yet managed to perform it’s magic. We can see that even though inlined code still performed best, the iterative approach also flew at a decent speed. But recursion is notably different – when iterative approach close in with just 20% overhead, RecursiveSummarizer takes 340% of the time the inlined code needs to complete. Apparently this is something one should be aware of – when you use recursion, the JVM is helpless and cannot inline method calls. So be aware of this limitation when using recursion. Recursion aside – method overheads are close to being non-existent. Just 205 ns difference between having 1023 additional method invocations in your source code. Remember, those were nanoseconds (10^-9 s) over there that we used for measurement. So thanks to JIT we can safely neglect most of the overhead introduced by method invocations. The next time when your coworker is hiding his lousy design decisions behind the statement that popping through a call stack is not efficient, let him go through a small JIT crash course first. And if you wish to be well-equipped to block his future absurd statements, subscribe to either our RSS or Twitter feed and we are glad to provide you future case studies. Full disclosure: the inspiration for the test case used in this article was triggered by Tomasz Nurkiewicz blog post .
February 27, 2013
by Nikita Salnikov-Tarnovski
· 13,035 Views
  • Previous
  • ...
  • 426
  • 427
  • 428
  • 429
  • 430
  • 431
  • 432
  • 433
  • 434
  • 435
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×