The Latest Databases Topics

Simplifying the Data Access Layer with Spring and Java Generics
1. Overview

This is the second of a series of articles about Persistence with Spring. The previous article discussed setting up the persistence layer with Spring 3.1 and Hibernate, without using templates. This article will focus on simplifying the Data Access Layer by using a single, generified DAO, which will result in elegant data access, with no unnecessary clutter. Yes, in Java.

The Persistence with Spring series:

  • Part 1 – The Persistence Layer with Spring 3.1 and Hibernate
  • Part 3 – The Persistence Layer with Spring 3.1 and JPA
  • Part 4 – The Persistence Layer with Spring Data JPA
  • Part 5 – Transaction configuration with JPA and Spring 3.1

2. The DAO mess

Most production codebases have some kind of DAO layer. Usually the implementation ranges from a raw class with no inheritance to some kind of generified class, but one thing is consistent – there is always more than one. Most likely, there are as many DAOs as there are entities in the system. Also, depending on the level of generics involved, the actual implementations can vary from heavily duplicated code to almost empty, with the bulk of the logic grouped in an abstract class.

2.1. A Generic DAO

Instead of having multiple implementations – one for each entity in the system – a single parametrized DAO can be used in such a way that it still takes full advantage of the type safety provided by generics. Two implementations of this concept are presented next, one for a Hibernate centric persistence layer and the other focusing on JPA. These implementations are by no means complete – only some data access methods are included, but they can easily be made more thorough.

2.2. The Abstract Hibernate DAO

    import java.io.Serializable;
    import java.util.List;

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.springframework.beans.factory.annotation.Autowired;

    public abstract class AbstractHibernateDAO< T extends Serializable > {

       // the entity class this DAO instance works with; set by the client
       private Class< T > clazz;

       @Autowired
       SessionFactory sessionFactory;

       public void setClazz( Class< T > clazzToSet ){
          this.clazz = clazzToSet;
       }

       @SuppressWarnings( "unchecked" )
       public T findOne( Long id ){
          return (T) this.getCurrentSession().get( this.clazz, id );
       }
       @SuppressWarnings( "unchecked" )
       public List< T > findAll(){
          return this.getCurrentSession()
             .createQuery( "from " + this.clazz.getName() ).list();
       }

       public void save( T entity ){
          this.getCurrentSession().persist( entity );
       }
       public void update( T entity ){
          this.getCurrentSession().merge( entity );
       }

       public void delete( T entity ){
          this.getCurrentSession().delete( entity );
       }
       public void deleteById( Long entityId ){
          // look the entity up with findOne, then delete it
          T entity = this.findOne( entityId );
          this.delete( entity );
       }

       protected Session getCurrentSession(){
          return this.sessionFactory.getCurrentSession();
       }
    }

The DAO uses the Hibernate API directly, without relying on any Spring templates (such as HibernateTemplate). The use of templates, as well as the management of the SessionFactory which is autowired in the DAO, were covered in the previous post of the series.

2.3. The Abstract JPA DAO

    import java.io.Serializable;
    import java.util.List;

    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;

    public abstract class AbstractJpaDAO< T extends Serializable > {

       // the entity class this DAO instance works with; set by the client
       private Class< T > clazz;

       @PersistenceContext
       EntityManager entityManager;

       public void setClazz( Class< T > clazzToSet ){
          this.clazz = clazzToSet;
       }

       public T findOne( Long id ){
          return this.entityManager.find( this.clazz, id );
       }
       @SuppressWarnings( "unchecked" )
       public List< T > findAll(){
          return this.entityManager.createQuery( "from " + this.clazz.getName() )
             .getResultList();
       }

       public void save( T entity ){
          this.entityManager.persist( entity );
       }
       public void update( T entity ){
          this.entityManager.merge( entity );
       }

       public void delete( T entity ){
          this.entityManager.remove( entity );
       }
       public void deleteById( Long entityId ){
          // look the entity up with findOne, then delete it
          T entity = this.findOne( entityId );
          this.delete( entity );
       }
    }

Similar to the Hibernate DAO implementation, the Java Persistence API is used here directly, again not relying on the now deprecated Spring JpaTemplate.

2.4. The Generic DAO

Now, the actual implementation of the generic DAO is as simple as it can be – it contains no logic. Its only purpose is to be injected by the Spring container in a service layer (or in whatever other type of client of the Data Access Layer):

    @Repository
    @Scope( BeanDefinition.SCOPE_PROTOTYPE )
    public class GenericJpaDAO< T extends Serializable >
     extends AbstractJpaDAO< T > implements IGenericDAO< T >{
       //
    }

    @Repository
    @Scope( BeanDefinition.SCOPE_PROTOTYPE )
    public class GenericHibernateDAO< T extends Serializable >
     extends AbstractHibernateDAO< T > implements IGenericDAO< T >{
       //
    }

First, note that the generic implementation is itself parametrized – allowing the client to choose the correct parameter on a case by case basis. This means that the client gets all the benefits of type safety without needing to create multiple artifacts for each entity.

Second, notice the prototype scope of these generic DAO implementations. Using this scope means that the Spring container will create a new instance of the DAO each time it is requested (including on autowiring). That will allow a service to use multiple DAOs with different parameters for different entities, as needed.

The reason this scope is so important is the way Spring initializes beans in the container. Leaving the generic DAO without a scope would mean using the default singleton scope, which would lead to a single instance of the DAO living in the container. Because the entity class is stored in a field of that instance, every client would share whichever class was set last, which rules out any scenario with more than one entity.

3. The Service

There is now a single DAO to be injected by Spring; also, the Class needs to be specified:

    @Service
    class FooService implements IFooService{

       IGenericDAO< Foo > dao;

       @Autowired
       public void setDao( IGenericDAO< Foo > daoToSet ){
          this.dao = daoToSet;
          this.dao.setClazz( Foo.class );
       }

       // ...

    }

Spring autowires the new DAO instance using setter injection so that the implementation can be customized with the Class object. After this point, the DAO is fully parametrized and ready to be used by the service.

4. Conclusion

This article discussed the simplification of the Data Access Layer by providing a single, reusable implementation of a generic DAO. This implementation was presented in both a Hibernate and a JPA based environment. The result is a streamlined persistence layer, with no unnecessary clutter.

For a step by step introduction about setting up the Spring context using Java based configuration and the basic Maven pom for the project, see this article. The next article of the Persistence with Spring series will focus on setting up the DAL layer with Spring 3.1 and JPA. In the meantime, you can check out the full implementation in the github project. If you read this far, you should follow me on twitter here.
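The role of the Class token and of per-client DAO instances can be tried out without Spring, Hibernate or JPA. The following is a minimal sketch under stated assumptions: InMemoryGenericDao and its map-backed store are hypothetical stand-ins invented for illustration, not part of the article's code or of any library. It only shows that each DAO instance carries its own entity class, which is the state that a shared singleton instance would overwrite.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the abstract DAOs in the article (no Spring, no JPA).
class InMemoryGenericDao<T extends Serializable> {

    // Shared "database": one id -> entity table per entity class.
    private final Map<Class<?>, Map<Long, Object>> store;

    // Per-instance state, set by the client exactly as in the article's setClazz().
    private Class<T> clazz;

    InMemoryGenericDao(Map<Class<?>, Map<Long, Object>> store) {
        this.store = store;
    }

    public void setClazz(Class<T> clazzToSet) {
        this.clazz = clazzToSet;
    }

    private Map<Long, Object> table() {
        if (clazz == null) {
            throw new IllegalStateException("setClazz() must be called before use");
        }
        return store.computeIfAbsent(clazz, k -> new HashMap<>());
    }

    public void save(Long id, T entity) {
        table().put(id, entity);
    }

    public T findOne(Long id) {
        // Class.cast gives a checked conversion; it returns null for a missing id.
        return clazz.cast(table().get(id));
    }

    public List<T> findAll() {
        List<T> result = new ArrayList<>();
        for (Object o : table().values()) {
            result.add(clazz.cast(o));
        }
        return result;
    }

    public void deleteById(Long id) {
        table().remove(id);
    }
}

class GenericDaoDemo {
    public static void main(String[] args) {
        Map<Class<?>, Map<Long, Object>> store = new HashMap<>();

        // Two separate instances, one per entity type (String and Integer stand in for entities).
        InMemoryGenericDao<String> names = new InMemoryGenericDao<>(store);
        names.setClazz(String.class);
        InMemoryGenericDao<Integer> counts = new InMemoryGenericDao<>(store);
        counts.setClazz(Integer.class);

        names.save(1L, "foo");
        counts.save(1L, 42);
        System.out.println(names.findOne(1L) + " / " + counts.findOne(1L));
    }
}
```

Because the two instances hold different Class tokens, the same id can be used for both entity types without interference. With a single shared instance, the second setClazz call would silently change what the first client reads.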
January 5, 2012
by Eugen Paraschiv
· 25,017 Views · 1 Like
How to deploy a neo4j instance in Amazon EC2 in 10 minutes
Neo4j is a high-performance, NoSQL graph database with all the features of a mature and robust database. In this post I will explain how to deploy a neo4j instance in the Amazon EC2 web service. For this tutorial to take you no more than 10 minutes, you should be comfortable running bash commands like mv, tar, ssh and scp (secure copy). I also assume that you have an Amazon Web Services account and that you are familiar with the process of launching instances. If not, I strongly recommend that you follow this starting guide and complete it until you manage to connect to your instance with ssh.

Start by downloading the latest stable version of neo4j, which you can find here. The "Community Edition" works well for development purposes. Do not forget to select the Unix version of the server. This will download a tar.gz file which you will copy to your EC2 instance later.

While the neo4j server downloads, open the AWS Management Console and launch a Basic 32-bit Amazon Linux AMI. If you want to launch an Ubuntu AMI, please notice that it doesn't ship with Java, which is required for running neo4j. If you are not familiar with key pairs, pem files or security groups, I strongly suggest that you follow the EC2 starting guide I mentioned above.

You can either create a new security group or use the default, but you will need to configure a new security rule for the neo4j server port. After launching the instance, create a TCP rule on port 7474 with source 0.0.0.0/0. Here you are opening port 7474 to anyone. If you are planning to use the neo4j REST API and call it remotely from another server, for example a Rails application hosted in Heroku, you may want to change the source field to the address of your Heroku server for security reasons. Do not forget to open port 22 (SSH); this is typically the first rule people create after launching an instance.

You are almost done! You should now install neo4j in your instance. Open a terminal on your local machine and navigate to the path where you downloaded neo4j. Copy the file to your Amazon instance by using the scp command:

    scp -i your_pem_file.pem neo4j-community-1.6.M01-unix.tar.gz ec2-user@YOUR_PUBLIC_INSTANCE_DNS:/home/ec2-user

Please notice that you will need to change the path to your pem file (typically placed in ~/.ssh), the filename of the neo4j server you just downloaded and the public DNS of your instance.

Now connect to your instance with SSH:

    ssh -i your_pem_file.pem ec2-user@YOUR_PUBLIC_INSTANCE_DNS

Untar the neo4j server:

    tar xvfz neo4j-community-1.6.M01-unix.tar.gz

Move it to /usr/local and rename the folder to neo4j:

    sudo mv neo4j-community-1.6.M01 /usr/local/neo4j

Almost done! You should now open neo4j-server.properties under the conf directory and add the following line:

    org.neo4j.server.webserver.address=0.0.0.0

This line allows anyone to connect remotely to your neo4j database server. Now run the start script from the neo4j server folder:

    sudo ./bin/neo4j start

Finally, open a browser and access the webadmin interface of your neo4j database by typing http://YOUR_PUBLIC_INSTANCE_DNS:7474. You should see the Neo4j Monitoring and Management Tool, pretty cool! If not, ask me.

You can now try using the REST API and the curl command to insert nodes and relationships. I hope this post helped you, good luck!

Follow me on Twitter @negarnil

Source: http://www.cloudtmp.com/java/how-to-deploy-a-neo4j-instance-in-amazon-ec2-in-10-minutes/
December 27, 2011
by Nicolas Garnil
· 27,395 Views · 1 Like
How to order by multiple columns using Lambdas and LINQ in C#
Today, I was required to order a list of records by Name and then by ID. A simple task, but I did spend some time working out how to do it with a lambda expression in C#. C# provides the OrderBy, OrderByDescending, ThenBy and ThenByDescending methods. You can use them in your lambda expression to order the records as per your requirement.

Assume your list is "Phones" and contains the following data:

    using System.Collections.Generic;
    using System.Linq;

    public class Phone
    {
        public int ID { get; set; }
        public string Name { get; set; }
    }

    public class Phones : List<Phone>
    {
        public Phones()
        {
            Add(new Phone { ID = 1, Name = "Windows Phone 7" });
            Add(new Phone { ID = 5, Name = "iPhone" });
            Add(new Phone { ID = 2, Name = "Windows Phone 7" });
            Add(new Phone { ID = 3, Name = "Windows Mobile 6.1" });
            Add(new Phone { ID = 6, Name = "Android" });
            Add(new Phone { ID = 10, Name = "BlackBerry" });
        }
    }

If you were to use a LINQ query, it would look like the one below:

    dataGridView1.DataSource = (from m in new Phones()
                                orderby m.Name, m.ID
                                select m).ToList();

Simple, isn't it? It is just as simple using a lambda expression. The lambda equivalent of the above LINQ query uses OrderBy for the first key and ThenBy for the second, both ascending to match the query:

    dataGridView1.DataSource = new Phones().OrderBy(a => a.Name).ThenBy(a => a.ID).ToList();

For descending order, use OrderByDescending and ThenByDescending instead.
December 24, 2011
by Senthil Kumar
· 136,787 Views
Easy URL rewriting in ASP.NET 4.0 web forms
In this post I am going to explain URL rewriting in greater detail. The post covers the basics of URL rewriting and shows how we can do it in just a few lines of code.

Why do we need URL rewriting?

Let's consider a simple scenario: we want to display a customer's details on an ASP.NET page. How will the page know which customer to display? The simplest way is to use a query string: we pass a customer id, which uniquely identifies the customer, in the query string. So our URL will look like this:

    Customer.aspx?Id=1

This will work, but the problem with the above URL is that it is neither user friendly nor search engine friendly. Who is going to remember which query string parameter to pass, and why that parameter is needed? Also, when a search engine crawls the site it reads this URL blindly, because the query string tells the crawler nothing about the page. As a result, the page will be ranked lower by search engines. With URL rewriting, our URL becomes cleaner, shorter and simpler, like this:

    Customer/1

Here anybody can understand that the URL is about a customer and that the page shows customer details. Even a search engine crawler can tell that the page is about customers. That is why we need URL rewriting.

URL rewriting and ASP.NET

In earlier versions of ASP.NET we had to write a lot of code for URL rewriting, but with ASP.NET 4.0 you can rewrite URLs in just a few lines of code. So let's start a demo where I will show how easily we can rewrite URLs. First, create an ASP.NET web forms application via File->New->Project; a dialog box will open just like below, and from it I created an empty project called UrlRewriting.

After creating the project I added global.asax, where we are going to write the URL mapping logic, and then I added an ASP.NET page called Customer.aspx which will display customer information, just like below.

Everything is now ready, so let's start writing code. The first thing we need to do is define routes. A route maps a URL to a physical page. First I will create a static function called RegisterRoutes which maps a route to a particular file, and then I am going to call it from the Application_Start event of global.asax. Following is the code for that:

    using System;
    using System.Web.Routing;

    namespace UrlRewriting
    {
        public class Global : System.Web.HttpApplication
        {
            protected void Application_Start(object sender, EventArgs e)
            {
                RegisterRoutes(RouteTable.Routes);
            }

            public static void RegisterRoutes(RouteCollection routeCollection)
            {
                // maps URLs such as Customer/1 to the physical page Customer.aspx
                routeCollection.MapPageRoute("RouteForCustomer", "Customer/{Id}", "~/Customer.aspx");
            }
        }
    }

Now that the mapping code is done, let's write the code for the Customer.aspx page. I have the following code in the Page_Load event of Customer.aspx:

    using System;

    namespace UrlRewriting
    {
        public partial class Customer : System.Web.UI.Page
        {
            protected void Page_Load(object sender, EventArgs e)
            {
                // "Id" is the name of the placeholder in the route template
                string id = Page.RouteData.Values["Id"].ToString();
                Response.Write("Customer Details page");
                Response.Write(string.Format("Displaying information for customer : {0}", id));
            }
        }
    }

In the above code you can see that I am getting the value from the page's route data and then just printing it. In the real world the page would fetch the customer data from a database and show the customer details.

Now let's run the application. It prints the details of the customer whose id is passed in the URL; if you pass 1 as the id, it will look like the following. If you put 2 in the URL, it will print information about customer 2. That's it. So we have enabled URL rewriting in ASP.NET in just a few lines of code.

In the next post I am going to explain redirection with URL rewriting. Hope you like this post. Stay tuned for more. Till then, happy programming!
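To make the idea of a route template more concrete, here is a minimal, language-neutral sketch written in Java. It is not ASP.NET's routing implementation; RouteTemplateSketch and its match method are hypothetical names used only for illustration. It shows how a template such as "Customer/{Id}" can be compared segment by segment against a requested path, with the braces marking the segment whose value is captured. Comparing literal segments case-insensitively is a choice made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of route-template matching; not the ASP.NET implementation.
class RouteTemplateSketch {

    /** Returns the captured placeholder values, or null if the path does not match the template. */
    static Map<String, String> match(String template, String path) {
        String[] templateParts = stripSlashes(template).split("/");
        String[] pathParts = stripSlashes(path).split("/");
        if (templateParts.length != pathParts.length) {
            return null; // different number of segments can never match
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < templateParts.length; i++) {
            String t = templateParts[i];
            String p = pathParts[i];
            if (t.startsWith("{") && t.endsWith("}")) {
                if (p.isEmpty()) {
                    return null; // a placeholder needs a non-empty value
                }
                values.put(t.substring(1, t.length() - 1), p);
            } else if (!t.equalsIgnoreCase(p)) {
                return null; // literal segment differs
            }
        }
        return values;
    }

    // Removes leading and trailing slashes so "/Customer/1/" and "Customer/1" are treated alike.
    private static String stripSlashes(String s) {
        return s.replaceAll("^/+|/+$", "");
    }

    public static void main(String[] args) {
        Map<String, String> values = match("Customer/{Id}", "/Customer/1");
        System.out.println(values == null ? "no match" : "Id = " + values.get("Id"));
    }
}
```

The captured map plays the same role as the route data read in Page_Load: the page asks for the value stored under the placeholder name instead of parsing a query string.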
December 10, 2011
by Jalpesh Vadgama
· 28,487 Views
MySQL vs. Neo4j on a Large-Scale Graph Traversal
This post presents an analysis of MySQL (a relational database) and Neo4j (a graph database) in a side-by-side comparison on a simple graph traversal. The data set that was used was an artificially generated graph with natural statistics. The graph has 1 million vertices and 4 million edges. The degree distribution of this graph on a log-log plot is provided below. A visualization of a 1,000 vertex subset of the graph is diagrammed above.

Loading the Graph

The graph data set was loaded into both MySQL and Neo4j. In MySQL a single table was used with the following schema.

    create table graph (
      outv int not null,
      inv int not null
    );
    create index outv_index using btree on graph (outv);
    create index inv_index using btree on graph (inv);

After loading the data, the table appears as below. The first line reads: "vertex 0 is connected to vertex 1."

    mysql> select * from graph limit 10;
    +------+-----+
    | outv | inv |
    +------+-----+
    |    0 |   1 |
    |    0 |   2 |
    |    0 |   6 |
    |    0 |   7 |
    |    0 |   8 |
    |    0 |   9 |
    |    0 |  10 |
    |    0 |  12 |
    |    0 |  19 |
    |    0 |  25 |
    +------+-----+
    10 rows in set (0.04 sec)

The 1 million vertex graph data set was also loaded into Neo4j. In Gremlin, the graph edges appear as below. The first line reads: "vertex 0 is connected to vertex 992915."

    gremlin> g.E[1..10]
    ==>e[183][0-related->992915]
    ==>e[182][0-related->952836]
    ==>e[181][0-related->910150]
    ==>e[180][0-related->897901]
    ==>e[179][0-related->871349]
    ==>e[178][0-related->857804]
    ==>e[177][0-related->798969]
    ==>e[176][0-related->773168]
    ==>e[175][0-related->725516]
    ==>e[174][0-related->700292]

Warming Up the Caches

Before traversing the graph data structure in both MySQL and Neo4j, each database had a "warm up" procedure run on it. In MySQL, a "select * from graph" was evaluated and all of the results were iterated through. In Neo4j, every vertex in the graph was iterated through and the outgoing edges of each vertex were retrieved. Finally, for both MySQL and Neo4j, the experiment discussed next was run twice in a row and the results of the second run were evaluated.

Traversing the Graph

The traversal that was evaluated on each database started from some root vertex and emanated n steps out. There was no sorting, no distinct-ing, etc. The only two variables for the experiments are the length of the traversal and the root vertex to start the traversal from. In MySQL, the following 5 queries denote traversals of length 1 through 5. Note that the "?" is a variable parameter of the query that denotes the root vertex.

    select a.inv from graph as a where a.outv=?
    select b.inv from graph as a, graph as b where a.inv=b.outv and a.outv=?
    select c.inv from graph as a, graph as b, graph as c where a.inv=b.outv and b.inv=c.outv and a.outv=?
    select d.inv from graph as a, graph as b, graph as c, graph as d where a.inv=b.outv and b.inv=c.outv and c.inv=d.outv and a.outv=?
    select e.inv from graph as a, graph as b, graph as c, graph as d, graph as e where a.inv=b.outv and b.inv=c.outv and c.inv=d.outv and d.inv=e.outv and a.outv=?

For Neo4j, the Blueprints Pipes framework was used. A pipe of length n was constructed using the following static method.

    public static Pipeline createPipeline(final Integer steps) {
      final ArrayList<Pipe> pipes = new ArrayList<Pipe>();
      for (int i = 0; i < steps; i++) {
        // one step = follow the outgoing edges, then move to the vertex at the head of each edge
        Pipe pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
        Pipe pipe2 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
        pipes.add(pipe1);
        pipes.add(pipe2);
      }
      return new Pipeline(pipes);
    }

For both MySQL and Neo4j, the results of the query (SQL and Pipes) were iterated through. Thus, all results were retrieved for each query. In MySQL, this was done as follows.

    while (resultSet.next()) {
      resultSet.getInt(finalColumn);
    }

In Neo4j, this is done as follows.

    while (pipeline.hasNext()) {
      pipeline.next();
    }

Experimental Results

The artificial graph dataset was constructed with a "rich get richer", preferential attachment model. Thus, the vertices created earlier are the most dense (i.e. highest number of adjacent vertices). This property was used to limit the amount of time it would take to evaluate the tests for each traversal. Only the first 250 vertices were used as roots of the traversals. Before presenting timing results, note that all of these experiments were run on a MacBook Pro with a 2.66GHz Intel Core 2 Duo and 4GB of RAM at 1067 MHz DDR3. The packages used were Java 1.6, MySQL JDBC 5.0.8, and Blueprints Pipes 0.1.2.

    java version "1.6.0_17"
    Java(TM) SE Runtime Environment (build 1.6.0_17-b04-248-10M3025)
    Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01-101, mixed mode)

The following Java virtual machine parameters were used:

    -Xmx1000m -Xms500m

Below are the total running times for both MySQL (red) and Neo4j (blue) for traversals of length 1, 2, 3, and 4.

The raw data is presented below along with the total number of vertices returned by each traversal, which, of course, is the same for both MySQL and Neo4j given that it's the same graph data set being processed. Also realize that traversals can loop and thus, many of the same vertices are returned multiple times. Finally, note that only Neo4j has the running time for a traversal of length 5. MySQL did not finish after waiting 2 hours to complete. In comparison, Neo4j took 14.37 minutes to complete a 5 step traversal.

    [mysql steps-1] time(ms):124 -- vertices_returned:11360
    [mysql steps-2] time(ms):922 -- vertices_returned:162640
    [mysql steps-3] time(ms):8851 -- vertices_returned:2206437
    [mysql steps-4] time(ms):112930 -- vertices_returned:28125623
    [mysql steps-5] n/a

    [neo4j steps-1] time(ms):27 -- vertices_returned:11360
    [neo4j steps-2] time(ms):474 -- vertices_returned:162640
    [neo4j steps-3] time(ms):3366 -- vertices_returned:2206437
    [neo4j steps-4] time(ms):49312 -- vertices_returned:28125623
    [neo4j steps-5] time(ms):862399 -- vertices_returned:358765631

Next, the individual data points for both MySQL and Neo4j are presented in the plot below. Each point denotes how long it took to return n number of vertices for the varying traversal lengths.

Finally, the data below provides the number of vertices returned per millisecond (on average) for each of the traversals. Again, MySQL did not finish in its 2 hour limit for a traversal of length 5.

    [mysql steps-1] vertices/ms:91.6128847554668
    [mysql steps-2] vertices/ms:176.399127537985
    [mysql steps-3] vertices/ms:249.286746556076
    [mysql steps-4] vertices/ms:249.053599519823
    [mysql steps-5] n/a

    [neo4j steps-1] vertices/ms:420.740351166341
    [neo4j steps-2] vertices/ms:343.122344772028
    [neo4j steps-3] vertices/ms:655.507125256186
    [neo4j steps-4] vertices/ms:570.360621871775
    [neo4j steps-5] vertices/ms:416.00886711325

Conclusion

In conclusion, given a traversal of an artificial graph with natural statistics, the graph database Neo4j was faster than the relational database MySQL at every traversal length measured. However, no attempts have been made to optimize the Java VM, the SQL queries, etc. These experiments were run with both Neo4j and MySQL "out of the box" and with a "natural syntax" for both types of queries.

Source: http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/
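The traversal semantics used in the experiment (n steps out from a root, no sorting, no de-duplication, loops allowed) can be reproduced on a toy graph. The sketch below is an assumption-laden illustration, not the benchmark code: TraversalSketch is a hypothetical class that keeps a plain in-memory adjacency list and materializes each frontier eagerly, whereas the SQL self-joins and the Pipes pipeline both stream their results. It is only meant to show why the number of returned vertices grows so quickly with the number of steps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the n-step traversal; not the code used in the experiment.
class TraversalSketch {

    // outV -> list of inV, the in-memory analogue of the two-column "graph" table.
    private final Map<Integer, List<Integer>> out = new HashMap<>();

    void addEdge(int outV, int inV) {
        out.computeIfAbsent(outV, k -> new ArrayList<>()).add(inV);
    }

    /** Counts the vertices reached after exactly `steps` hops from `root`, duplicates included. */
    long countReached(int root, int steps) {
        List<Integer> frontier = new ArrayList<>();
        frontier.add(root);
        for (int i = 0; i < steps; i++) {
            List<Integer> next = new ArrayList<>();
            for (int v : frontier) {
                // Each occurrence of v contributes all of its neighbours again,
                // which mirrors one more self-join (or one more pair of pipes).
                next.addAll(out.getOrDefault(v, Collections.<Integer>emptyList()));
            }
            frontier = next;
        }
        return frontier.size();
    }

    public static void main(String[] args) {
        TraversalSketch g = new TraversalSketch();
        g.addEdge(0, 1);
        g.addEdge(0, 2);
        g.addEdge(1, 2);
        g.addEdge(2, 0);
        g.addEdge(2, 3);
        for (int steps = 1; steps <= 3; steps++) {
            System.out.println("steps-" + steps + " vertices_returned:" + g.countReached(0, steps));
        }
    }
}
```

Because the same vertex can appear in a frontier several times, the count is a number of paths rather than a number of distinct vertices, which matches the note above that many vertices are returned multiple times.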
December 5, 2011
by Marko Rodriguez
· 58,349 Views · 1 Like
Create Your Own XML/JSON/HTML API with PHP
Develop your own API service for your PHP projects.
December 1, 2011
by Andrei Prikaznov
· 68,129 Views
Zero Downtime – What is it and why is it important?
For most large web applications, uptime is of foremost importance. Any outage can be seen by customers as a frustration, or as an opportunity to move to a competitor. What's more, for a site that also includes e-commerce, it can mean real lost sales.

Zero Downtime describes a site without service interruption. To achieve such lofty goals, redundancy becomes a critical requirement at every level of your infrastructure. If you're using cloud hosting, are you redundant to alternate availability zones and regions? Are you using geographically distributed load balancing? Do you have multiple clustered databases on the backend, and multiple load-balanced webservers?

All of these measures will increase uptime, but may not bring you close to zero downtime. For that you'll need thorough testing. The solution is to pull the trigger on sections of your infrastructure, and prove that it fails over quickly without noticeable outage. The ultimate test is the outage itself.

Sean Hull on Quora: What is zero downtime and why is it important?

Source: http://www.iheavy.com/2011/06/23/zero-downtime-what-is-it-and-why-is-it-important/
November 23, 2011
by Sean Hull
· 26,149 Views
Eventual Consistency in NoSQL Databases: Theory and Practice
One of NoSQL's goals: handle previously-unthinkable amounts of data. One of unthinkable-amounts-of-data's problems: previously-improbable events become extremely probable, precisely because the set of interactions is so large. Flip a coin a hundred times, and you're not likely to get 50 heads in a row. But flip it a few quadrillion times (a run of 50 heads has probability 2^-50, about 1 in 10^15, at any given starting point), and you probably will find a 50-heads streak. So NoSQL's performance strength is also its mathematical weakness.

This order of scale can result in lots of problems, but one of the most common is consistency -- the C in ACID -- clearly a fundamental desideratum for any database system, but in principle much harder to achieve for NoSQL databases than for others. Emerging database technologies have forced developers and computer scientists to define more exactly what kind of consistency is really needed, for any given application.

Two years ago, ACM (the Association for Computing Machinery) published an extremely helpful examination of the attenuated notion of consistency called 'eventual consistency'. Their summary:

    Data inconsistency in large-scale reliable distributed systems must be tolerated for two reasons: improving read and write performance under highly concurrent conditions; and handling partition cases where a majority model would render part of the system unavailable even though the nodes are up and running.

The article surveys technical solutions as well as user considerations that might soften the undesirability of anything less than perfect, instantaneous consistency. It's not long (4 pages plus pictures), and explains some deep database issues quite clearly.

On the more practical side of the problem: Russell Brown recently gave a talk at the NoSQL Exchange 2011 on exactly this topic. More specifically, he showed how some distributed systems (Riak in particular) try to minimize conflicts, and suggested some ways to reconcile conflicts automatically using smart semantic techniques.
Check out the NoSQL Exchange page for Russell's talk here, which includes an embedded video. But read the ACM article first for a broader overview, since Russell launches into technical details pretty quickly.
November 22, 2011
by John Esposito
· 12,476 Views
Adventures in Archiving with MongoDB
Most databases have them: a small handful of big tables that grow at a substantially faster rate than any of their peers (think tweets, clicks, check-ins, etc). Over time, the sheer size of these tables causes queries against them to slow down, and increases the size and time taken for backups. In some cases, not all of this data needs to be "live" so that it can be randomly accessed forever, and it can be archived once some criteria have been met. I like to think of this as Eventually Irrelevant (if I may coin a term).

This post was authored by Brian Ploetz

Many database products support partitioning features, which allow you to arrange data physically based on some criteria (typically date, range, list, or hash-based partitioning). These partitioning features allow you to drop partitions from the live database when appropriate. While MongoDB supports horizontal partitioning via sharding, it does not currently support the notion of dropping shards. Even if it did, the criteria for what documents to drop from a collection would most likely not be based on your shard key, so it's unlikely you'd want to drop a shard anyway. You'd want to drop data from all shards.

Capped Collections and the forthcoming TTL-based Capped Collections feature are not quite what we're looking for either. Capped Collections have very strict rules (including not being able to shard them), and for both normal and TTL-based Capped Collections, as old documents are aged out they are simply dropped on the floor. What we need is more control over the dropping process, such that we can guarantee that our data has been copied to an archive database/data warehouse before being dropped from the main database.

I set out to find the most efficient way of manually archiving documents from a large collection given the tools currently available in MongoDB. For the purposes of this exercise, let's assume we have a large collection of ad clicks which are associated with ads.
We want to archive clicks which are associated with ads which expired more than 3 months ago.

MapReduce

MongoDB wants you to do bulk operations using MapReduce. Since all of the examples and documentation revolve around aggregation, I suspected I wouldn't be able to use MapReduce as the backbone of my archiving process. After some experimenting, I found that my intuition was correct. Upon attempting to insert/remove documents within the map or reduce functions, I would quickly run into "exception 10293 internal error: locks are not upgradeable" errors:

    Sun Sep 25 21:22:24 [conn5] update warehouse.clicks query: { _id: ObjectId('4d6bf191db0b4b2b1fad5b65') } exception 10293 internal error: locks are not upgradeable: { "opid" : 4248173, "active" : false, "waitingForLock" : false, "op" : "update", "ns" : "?", "query" : { "_id" : { "$oid" : "4d6bf191db0b4b2b1fad5b65" } }, "client" : "0.0.0.0:0", "desc" : "conn" } 0ms
    Sun Sep 25 21:22:24 [conn5] Assertion: 10293:internal error: locks are not upgradeable: { "opid" : 4248174, "active" : false, "waitingForLock" : false, "op" : "update", "ns" : "?", "query" : { "_id" : { "$oid" : "4d6bf198db0b4b2b20ad5b65" } }, "client" : "0.0.0.0:0", "desc" : "conn" }
    0x10008de9b 0x1002064f7 0x1002c1ec6 0x1002c4f3d 0x1002c5f7a 0x1000a8181 0x100147df4 0x1004dc68e 0x1004efc93 0x1004dc71b 0x1004dcb71 0x10049a078 0x100158efa 0x100377760 0x100389d0c 0x10034c204 0x10034d877 0x100180cc4 0x100184649 0x1002b9e89
    0 mongod 0x000000010008de9b _ZN5mongo11msgassertedEiPKc + 315
    1 mongod 0x00000001002064f7 _ZN5mongo10MongoMutex19_writeLockedAlreadyEv + 263
    2 mongod 0x00000001002c1ec6 _ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE + 886
    3 mongod 0x00000001002c4f3d _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5661
    4 mongod 0x00000001002c5f7a _ZN5mongo14DBDirectClient3sayERNS_7MessageE + 106
    5 mongod 0x00000001000a8181 _ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb + 273
    6 mongod 0x0000000100147df4 _ZN5mongo12mongo_updateEP9JSContextP8JSObjectjPlS4_ + 660
    7 mongod 0x00000001004dc68e js_Invoke + 3864
    8 mongod 0x00000001004efc93 js_Interpret + 71932
    9 mongod 0x00000001004dc71b js_Invoke + 4005
    10 mongod 0x00000001004dcb71 js_InternalInvoke + 404
    11 mongod 0x000000010049a078 JS_CallFunction + 86
    12 mongod 0x0000000100158efa _ZN5mongo7SMScope6invokeEyRKNS_7BSONObjEib + 666
    13 mongod 0x0000000100377760 _ZN5mongo2mr8JSMapper3mapERKNS_7BSONObjE + 96
    14 mongod 0x0000000100389d0c _ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb + 1740
    15 mongod 0x000000010034c204 _ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb + 628
    16 mongod 0x000000010034d877 _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi + 2151
    17 mongod 0x0000000100180cc4 _ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi + 52
    18 mongod 0x0000000100184649 _ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 10585
    19 mongod 0x00000001002b9e89 _ZN5mongo13receivedQueryERNS_6ClientERNS_10DbResponseERNS_7MessageE + 569

Upon killing the client which kicked off the MapReduce process, these errors would continue to spin out of control, and I'd have to bounce the server. While MapReduce's divide and conquer approach is ideal for what we're trying to accomplish, unless the ability to obtain a write lock within a MapReduce function is supported, it is not a viable option for now.

db.eval()

While a server side script is certainly a viable option for writing an archiving process, it has a fundamental flaw in that it won't work for sharded collections, which could be a non-starter for some folks.
An example archiving process using db.eval() would look something like this:

    // Copy one click into the archive database, then delete the original.
    archiveClick = function archiveClick(doc) {
        if (expiredAdIds.indexOf(doc.ad_id) != -1) {
            archivedb = db.getMongo().getDB("archive");
            archivedb.clicks.save(doc);
            // before we remove the original document, make sure the archive worked
            if (archivedb.getLastError() == null) {
                db.clicks.remove({_id: doc._id});
            } else {
                throw "could not archive document with _id " + doc._id;
            }
        }
    }

    // Collect the ids of ads that ended 3+ months ago, then archive their clicks.
    archiveClicks = function archiveClicks() {
        threeMonthsAgo = new Date();
        threeMonthsAgo.setMonth(threeMonthsAgo.getMonth()-3);
        expiredAdIds = [];
        getExpiredAds = function(ad) {expiredAdIds.push(ad._id.str);}
        db.ads.find({end_date: {$lte: threeMonthsAgo}}).forEach(getExpiredAds);
        // stored so that archiveClick can see the list on the server
        db.system.js.save({_id: "expiredAdIds", value: expiredAdIds});
        db.clicks.find().forEach(archiveClick);
        db.system.js.remove({_id: "expiredAdIds"});
    }

    db.system.js.save({_id: "archiveClick", value: archiveClick});
    db.system.js.save({_id: "archiveClicks", value: archiveClicks});
    db.runCommand({$eval: "archiveClicks()", nolock: true});

Note the nolock: true passed along with the $eval command. This ensures the mongod isn’t locked for other operations while this runs.

The Holy Grail: Define Custom archive() Functions On Collections

In order for our archiving process to work for sharded collections, we will need to add new functions to collection objects in the shell to perform the archiving. Frankly, this feels a lot more natural than db.eval(). I wanted to support two modes of archiving: immediate archiving, and a mark & sweep approach which allows you to queue documents to be archived at one point in time, and perform the actual archiving at a later time (off-peak hours). I’ve made these functions available in my mongodb-archive project on GitHub.
Download the archive.js file, and use it like so:

    load("archive.js");

    // find the ads that ended three or more months ago
    threeMonthsAgo = new Date();
    threeMonthsAgo.setMonth(threeMonthsAgo.getMonth()-3);
    expiredAdIds = [];
    getExpiredAds = function(ad) {expiredAdIds.push(ad._id.str);}
    db.ads.find({end_date: {$lte: threeMonthsAgo}}).forEach(getExpiredAds);

    // the archive lives on a separate mongod
    archiveConnection = new Mongo("localhost:27018");
    archiveDB = archiveConnection.getDB("archive");
    archiveCollection = archiveDB.getCollection("clicks");

    for (var i = 0; i < expiredAdIds.length; i++) {
        print("archiving clicks for ad_id: " + expiredAdIds[i]);
        db.clicks.archive({"ad_id": expiredAdIds[i]}, archiveCollection);
        print("");
    }

The mark/sweep variant would look like:

    db.clicks.queueForArchive({"ad_id": expiredAdIds[i]});
    // ....some time later.....
    db.clicks.archiveQueued(archiveCollection);

The biggest factors in the performance of the archiving process are really no different from the things that impact the performance of your MongoDB database overall: working set size, indexes, disk speed, and lock contention. The more indexes there are on the source collection, the longer the remove() will take. The more indexes there are on the archive collection, the longer the save() will take. The more of the data that you wish to archive is resident in memory, the faster the process will be.

On my MacBook Pro (2.3GHz i7, 8GB RAM) with a 5400 RPM SATA drive and a clicks collection containing over 11 million documents, I was able to archive ~500 documents per second with the immediate archiving approach.
Example output from mongostat when this was running: Archive DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 1 0 0 3 0 208m 2.85g 15m 3.8 0 0|0 0|0 1k 1k 2 17:13:02 0 0 45 0 0 46 0 208m 2.85g 15m 3.4 0 0|0 0|0 61k 6k 2 17:13:03 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 8 0 0 9 0 208m 2.85g 15m 0.1 0 0|0 0|0 10k 2k 2 17:13:04 0 0 47 0 0 48 0 208m 2.85g 15m 3.4 0 0|0 0|0 66k 7k 2 17:13:05 0 0 0 0 0 1 1 208m 2.85g 15m 0 0 0|0 0|0 62b 1k 2 17:13:06 0 0 18 0 0 19 0 208m 2.85g 15m 2.6 0 0|0 0|0 24k 3k 2 17:13:07 0 0 79 0 0 80 0 208m 2.85g 16m 1.1 0 0|0 0|0 111k 11k 2 17:13:08 0 0 131 0 0 132 0 208m 2.85g 16m 8.8 0 0|0 0|0 181k 17k 2 17:13:09 0 0 216 0 0 217 0 208m 2.85g 16m 3.8 0 0|0 0|0 291k 27k 2 17:13:10 0 0 425 0 0 426 0 208m 2.85g 17m 10.5 0 0|0 0|0 600k 53k 2 17:13:11 0 0 490 0 0 490 0 208m 2.85g 19m 7.3 0 0|0 0|1 658k 61k 2 17:13:12 0 0 630 0 0 632 0 208m 2.85g 20m 11.3 0 0|0 0|0 844k 78k 2 17:13:13 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 818 0 0 818 0 208m 2.85g 22m 13.6 0 0|0 0|0 1m 101k 2 17:13:14 0 0 686 0 0 688 0 208m 2.85g 21m 9.6 0 0|0 0|0 932k 85k 2 17:13:15 0 0 0 0 0 1 0 208m 2.85g 22m 0 0 0|0 0|0 62b 1k 2 17:13:16 0 0 178 0 0 179 0 208m 2.85g 22m 1.2 0 0|0 0|0 252k 23k 2 17:13:17 0 0 974 0 0 975 0 208m 2.85g 23m 12.3 0 0|0 0|0 1m 121k 2 17:13:18 0 0 920 0 0 921 0 208m 2.85g 26m 10.1 0 0|0 0|0 1m 114k 2 17:13:19 0 0 810 0 0 811 0 208m 2.85g 26m 8.1 0 0|0 0|0 1m 100k 2 17:13:20 0 0 612 0 0 613 0 208m 2.85g 27m 8.1 0 0|0 0|0 798k 76k 2 17:13:21 0 0 0 0 0 1 0 208m 2.85g 27m 0 0 0|0 0|0 62b 1k 2 17:13:22 0 0 0 0 0 1 0 208m 2.85g 27m 0 0 0|0 0|0 62b 1k 2 17:13:23 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 30 0 0 31 0 208m 2.85g 
27m 1.1 0 0|0 0|0 46k 5k 2 17:13:24 0 0 852 0 0 853 0 208m 2.85g 27m 14.8 0 0|0 0|0 1m 106k 2 17:13:25 0 0 768 0 0 769 0 208m 2.85g 29m 9.5 0 0|0 0|0 1m 95k 2 17:13:26 0 0 867 0 0 868 0 208m 2.85g 32m 13.4 0 0|0 0|0 1m 107k 2 17:13:27 0 0 705 0 0 706 0 208m 2.85g 31m 10.8 0 0|0 0|0 936k 88k 2 17:13:28 0 0 331 0 0 332 0 208m 2.85g 32m 3.9 0 0|0 0|0 446k 42k 2 17:13:29 0 0 0 0 0 1 0 208m 2.85g 32m 0 0 0|0 0|0 62b 1k 2 17:13:30 0 0 551 0 0 552 0 208m 2.85g 32m 6.1 0 0|0 0|0 739k 69k 2 17:13:31 0 0 728 0 0 729 0 208m 2.85g 35m 12.2 0 0|0 0|0 949k 90k 2 17:13:32 0 0 991 0 0 992 0 208m 2.85g 35m 11.7 0 0|0 0|0 1m 123k 2 17:13:33 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 935 0 0 935 0 208m 2.85g 38m 10.9 0 0|0 0|1 1m 116k 2 17:13:34 0 0 431 0 0 433 0 208m 2.85g 39m 5.9 0 0|0 0|0 563k 54k 2 17:13:35 0 0 0 0 0 1 0 208m 2.85g 39m 0 0 0|0 0|0 62b 1k 2 17:13:36 0 0 518 0 0 519 0 208m 2.85g 40m 5 0 0|0 0|0 693k 65k 2 17:13:37 0 0 904 0 0 905 0 208m 2.85g 40m 8.9 0 0|0 0|0 1m 112k 2 17:13:38 0 0 958 0 0 959 0 208m 2.85g 41m 14.2 0 0|0 0|0 1m 119k 2 17:13:39 0 0 899 0 0 900 0 208m 2.85g 44m 11.1 0 0|0 0|0 1m 111k 2 17:13:40 0 0 401 0 0 402 0 208m 2.85g 42m 5.9 0 0|0 0|0 540k 50k 2 17:13:41 0 0 856 0 0 857 0 208m 2.85g 44m 9.4 0 0|0 0|0 1m 106k 2 17:13:42 0 0 232 0 0 233 0 208m 2.85g 45m 2.9 0 0|0 0|0 311k 29k 1 17:13:43 Source DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 2 0 12 1 13 0 56g 115g 2.82g 52.2 0 0|0 0|1 1k 822k 2 17:13:02 0 0 0 42 0 43 0 56g 115g 2.65g 88.1 0 0|0 0|1 5k 4k 2 17:13:03 0 0 0 14 0 15 1 56g 115g 1.88g 101 0 0|0 0|1 1k 2k 2 17:13:04 0 0 0 33 1 35 0 56g 115g 1.89g 53.8 16.6 0|0 1|0 4k 4k 2 17:13:05 0 0 0 0 0 1 0 56g 115g 1.9g 0 0 0|0 1|0 62b 1k 2 17:13:06 0 0 0 34 0 35 0 56g 115g 1.9g 47.3 0 0|0 0|0 4k 4m 2 17:13:07 0 0 0 91 0 92 0 56g 115g 1.89g 86.8 0 0|0 0|0 12k 8k 2 17:13:08 insert query 
update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 149 0 150 0 56g 115g 1.9g 82.5 0 0|0 0|0 20k 13k 2 17:13:09 0 0 0 287 0 287 0 56g 115g 1.9g 74.3 0 0|0 0|1 38k 25k 2 17:13:10 0 0 0 387 0 389 0 56g 115g 1.91g 64.6 0 0|0 0|0 52k 33k 2 17:13:11 0 0 0 580 0 581 0 56g 115g 1.93g 59 0 0|0 0|0 78k 49k 2 17:13:12 0 0 0 685 0 686 0 56g 115g 1.93g 47.9 0 0|0 0|0 93k 58k 2 17:13:13 0 0 0 729 0 730 0 56g 115g 1.95g 42.6 0 0|0 0|0 99k 61k 2 17:13:14 0 0 0 551 1 552 0 56g 115g 1.97g 34.5 0 0|0 1|0 74k 47k 2 17:13:15 0 0 0 0 0 1 0 56g 115g 1.98g 0 0 0|0 1|0 62b 1k 2 17:13:16 0 0 0 368 0 368 0 56g 115g 1.96g 20 0 0|0 0|1 50k 4m 2 17:13:17 0 0 0 1032 0 1034 0 56g 115g 2g 35.9 0 0|0 0|0 140k 87k 2 17:13:18 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 834 0 835 0 56g 115g 2.01g 42.3 0 0|0 0|0 113k 70k 2 17:13:19 0 0 0 922 0 922 0 56g 115g 2.02g 38.9 0 0|0 0|1 125k 77k 2 17:13:20 0 0 0 338 1 340 0 56g 115g 2.03g 13 0 0|0 1|0 46k 29k 2 17:13:21 0 0 0 0 0 1 0 56g 115g 2.04g 0 0 0|0 1|0 62b 1k 2 17:13:22 0 0 0 0 0 1 0 56g 115g 2.04g 0 0 0|0 1|0 62b 1k 2 17:13:23 0 0 0 185 0 186 0 56g 115g 2.02g 24.7 0 0|0 0|0 25k 4m 2 17:13:24 0 0 0 920 0 920 0 56g 115g 2.02g 29.9 0 0|0 0|1 125k 77k 2 17:13:25 0 0 0 741 0 742 0 56g 115g 2.04g 49 0 0|0 0|1 100k 62k 2 17:13:26 0 0 0 886 0 887 0 56g 115g 2.05g 33 0 0|0 0|1 120k 74k 2 17:13:27 0 0 0 683 0 684 0 56g 115g 2.08g 57.3 0 0|0 0|1 92k 58k 2 17:13:28 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 138 1 140 0 56g 115g 2.06g 14.5 0 0|0 1|0 18k 12k 2 17:13:29 0 0 0 12 0 12 0 56g 115g 2.07g 0.2 0 0|0 0|1 1k 4m 2 17:13:30 0 0 0 761 0 763 0 56g 115g 2.08g 46.5 0 0|0 0|0 103k 64k 2 17:13:31 0 0 0 762 0 762 0 56g 115g 2.09g 47 0 0|0 0|1 103k 64k 2 17:13:32 0 0 0 957 0 959 0 56g 115g 2.1g 35.3 0 0|0 0|0 130k 80k 2 17:13:33 0 0 0 
905 0 905 0 56g 115g 2.13g 38.5 0 0|1 0|1 123k 76k 2 17:13:34 0 0 0 239 1 241 0 56g 115g 2.12g 14.9 0 0|0 1|0 32k 21k 2 17:13:35 0 0 0 0 0 1 0 56g 115g 2.13g 0 0 0|0 1|0 62b 1k 2 17:13:36 0 0 0 780 0 781 0 56g 115g 2.13g 42.8 0 0|0 0|0 106k 4m 2 17:13:37 0 0 0 805 0 806 0 56g 115g 2.14g 39.8 0 0|0 0|0 109k 68k 2 17:13:38 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 988 0 989 0 56g 115g 2.16g 34.8 0 0|0 0|0 134k 83k 2 17:13:39 0 0 0 953 0 953 0 56g 115g 2.17g 39.9 0 0|0 0|1 129k 80k 2 17:13:40 0 0 0 375 1 376 0 56g 115g 2.18g 13.9 0 0|1 0|1 51k 1m 2 17:13:41 0 0 0 867 0 869 0 56g 115g 2.19g 41.7 0 0|0 0|0 118k 73k 1 17:13:42 Total: MongoDB shell version: 2.0.1 connecting to: localhost:27017/prodcopy archiving clicks for ad_id: 4d6c12497b90420373000056 archiving documents... archived 19045 documents in 40701ms. For the queued approach with this same set up, I was able to queue ~1200 documents per second, and the archiving process performed at about the same ~500 documents per second. 
Example output from mongostat while this was running: Archive DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:44 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:45 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:46 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:47 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:48 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:49 0 0 248 0 0 249 0 208m 2.85g 17m 6 0 0|0 0|0 283k 31k 2 17:16:50 0 0 780 0 0 781 0 208m 2.85g 18m 8.2 0 0|0 0|0 883k 97k 2 17:16:51 0 0 1256 0 0 1257 0 208m 2.85g 22m 13.7 0 0|0 0|0 1m 155k 2 17:16:52 0 0 1084 0 0 1085 0 208m 2.85g 21m 10 0 0|0 0|0 1m 134k 2 17:16:53 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 1108 0 0 1109 0 208m 2.85g 25m 9.8 0 0|0 0|0 1m 137k 2 17:16:54 0 0 1108 0 0 1109 0 208m 2.85g 24m 9.8 0 0|0 0|0 1m 137k 2 17:16:55 0 0 1110 0 0 1111 0 208m 2.85g 27m 11 0 0|0 0|0 1m 137k 2 17:16:56 0 0 1166 0 0 1167 0 208m 2.85g 31m 11.3 0 0|0 0|0 1m 144k 2 17:16:57 0 0 1246 0 0 1247 0 208m 2.85g 31m 10.6 0 0|0 0|0 1m 154k 2 17:16:58 0 0 1094 0 0 1095 0 208m 2.85g 35m 9.6 0 0|0 0|0 1m 135k 2 17:16:59 0 0 957 0 0 958 0 208m 2.85g 33m 9.6 0 0|0 0|0 1m 119k 2 17:17:00 0 0 721 0 0 722 0 464m 3.35g 38m 8.9 0 0|0 0|0 1m 90k 2 17:17:01 0 0 243 0 0 243 0 464m 3.35g 34m 9.7 0 0|0 0|0 445k 31k 2 17:17:02 0 0 921 0 0 923 0 464m 3.35g 37m 30.1 0 0|0 0|0 1m 114k 2 17:17:03 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 668 0 0 669 0 464m 3.35g 38m 9.6 0 0|0 0|0 962k 83k 2 17:17:04 0 0 995 0 0 995 0 464m 3.35g 41m 21.2 0 0|0 0|1 1m 123k 2 17:17:05 0 0 467 0 0 468 0 464m 3.35g 27m 12 0 0|1 0|1 679k 58k 2 17:17:06 0 0 1 0 0 2 1 464m 3.35g 17m 85.5 0 0|1 0|1 1k 1k 2 17:17:07 0 0 337 
0 0 338 0 464m 3.35g 19m 36.2 0 0|0 0|1 499k 42k 2 17:17:08 0 0 917 0 0 919 0 464m 3.35g 21m 20.4 0 0|0 0|0 1m 114k 2 17:17:09 0 0 1022 0 0 1022 0 464m 3.35g 25m 22.7 0 0|0 0|1 1m 126k 2 17:17:10 0 0 970 0 0 971 0 464m 3.35g 24m 27.4 0 0|0 0|1 1m 120k 2 17:17:11 0 0 166 0 0 168 0 464m 3.35g 25m 1.2 0 0|0 0|0 251k 21k 2 17:17:12 0 0 862 0 0 863 0 464m 3.35g 28m 23.1 0 0|0 0|0 1m 107k 2 17:17:13 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 888 0 0 889 0 464m 3.35g 28m 28.8 0 0|0 0|0 1m 110k 2 17:17:14 0 0 540 0 0 541 0 464m 3.35g 30m 12.2 0 0|0 0|0 802k 67k 2 17:17:15 0 0 879 0 0 880 0 464m 3.35g 31m 22.3 0 0|0 0|0 1m 109k 2 17:17:16 0 0 473 0 0 474 0 464m 3.35g 36m 17.2 0 0|0 0|0 1m 59k 1 17:17:17 Source DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 56g 115g 1.86g 0 0 0|0 0|0 62b 1k 1 17:16:29 0 0 0 0 0 1 0 56g 115g 1.86g 0 0 0|0 0|0 62b 1k 1 17:16:30 1 1 1 0 1 3 0 56g 115g 1.87g 51.1 0 0|0 0|1 475b 697k 2 17:16:31 0 0 0 0 0 1 0 56g 115g 349m 129 0 0|0 0|1 62b 1k 2 17:16:32 0 0 0 0 0 1 0 56g 115g 367m 101 0 0|0 0|1 62b 1k 2 17:16:33 0 0 0 0 0 1 0 56g 115g 384m 73.4 0 0|0 0|1 62b 1k 2 17:16:34 0 0 0 0 0 1 0 56g 115g 428m 112 0 0|0 0|1 62b 1k 2 17:16:35 0 0 0 0 0 1 0 56g 115g 436m 94.8 0 0|0 0|1 62b 1k 2 17:16:36 0 0 0 0 0 1 0 56g 115g 450m 108 0 0|0 0|1 62b 1k 2 17:16:37 0 0 0 0 0 1 0 56g 115g 461m 110 0 0|0 0|1 62b 1k 2 17:16:38 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 56g 115g 480m 95.5 0 0|0 0|1 62b 1k 2 17:16:39 0 0 0 0 0 1 0 56g 115g 494m 73.5 0 0|0 0|1 62b 1k 2 17:16:40 0 0 0 0 0 1 0 56g 115g 531m 120 0 0|0 0|1 62b 1k 2 17:16:41 0 0 0 0 0 1 0 56g 115g 541m 108 0 0|0 0|1 62b 1k 2 17:16:42 0 0 0 0 0 1 0 56g 115g 564m 74.6 0 0|0 0|1 62b 1k 2 17:16:43 0 0 0 0 0 1 0 56g 115g 579m 127 0 0|0 0|1 62b 
1k 2 17:16:44 0 0 0 0 0 1 0 56g 115g 605m 100 0 0|0 0|1 62b 1k 2 17:16:45 0 0 0 0 0 1 0 56g 115g 631m 98.5 0 0|0 0|1 62b 1k 2 17:16:46 0 0 0 0 0 1 0 56g 115g 657m 89.1 0 0|0 0|1 62b 1k 2 17:16:47 0 0 0 0 0 1 0 56g 115g 662m 87.8 0 0|0 0|1 62b 1k 2 17:16:48 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 56g 115g 680m 93.5 0 0|0 0|1 62b 1k 2 17:16:49 0 1 0 530 1 531 0 56g 115g 767m 81.3 0 0|0 0|1 72k 4m 2 17:16:50 0 0 0 884 0 886 0 56g 115g 721m 48 0 0|0 0|0 120k 74k 2 17:16:51 0 0 0 1079 0 1080 0 56g 115g 698m 33.6 0 0|0 0|0 146k 90k 2 17:16:52 0 0 0 1294 0 1294 0 56g 115g 720m 23.8 0 0|1 0|1 175k 108k 2 17:16:53 0 0 0 1118 1 1120 0 56g 115g 740m 31.9 0 0|0 0|0 152k 4m 2 17:16:54 0 0 0 1111 0 1112 0 56g 115g 718m 33.3 0 0|0 0|0 151k 93k 2 17:16:55 0 0 0 1097 0 1098 0 56g 115g 706m 33.1 0 0|0 0|0 149k 92k 2 17:16:56 0 0 0 1097 0 1098 0 56g 115g 701m 33.2 0 0|0 0|0 149k 92k 2 17:16:57 0 0 0 1091 1 1092 0 56g 115g 696m 35.1 0 0|0 0|0 148k 4m 2 17:16:58 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 1149 0 1150 0 56g 115g 718m 21.3 0 0|0 0|0 156k 96k 2 17:16:59 0 0 0 1091 0 1091 0 56g 115g 707m 31.2 0 0|0 0|1 148k 91k 2 17:17:00 0 0 0 468 0 470 0 56g 115g 714m 9.3 0 0|0 0|0 63k 40k 2 17:17:01 0 0 0 332 0 333 0 56g 115g 694m 27.3 0 0|0 0|0 45k 28k 2 17:17:02 0 0 0 1013 1 1014 0 56g 115g 713m 17.8 0 0|0 0|0 137k 4m 2 17:17:03 0 0 0 551 0 552 0 56g 115g 691m 39.9 0 0|0 0|0 74k 47k 2 17:17:04 0 0 0 1140 0 1141 0 56g 115g 709m 19.7 0 0|0 0|0 155k 95k 2 17:17:05 0 0 0 126 0 127 0 56g 115g 709m 2.1 0 0|0 0|0 17k 11k 2 17:17:06 0 0 0 92 0 92 0 56g 115g 710m 1.6 0 0|0 0|1 12k 8k 2 17:17:07 0 0 0 605 1 607 0 56g 115g 700m 29.1 0 0|0 0|0 82k 4m 2 17:17:08 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 814 0 814 0 56g 115g 699m 17.3 
0 0|0 0|1 110k 68k 2 17:17:09 0 0 0 1036 0 1037 0 56g 115g 708m 18.4 0 0|0 0|0 140k 87k 2 17:17:10 0 0 0 820 0 821 0 56g 115g 715m 14.4 0 0|1 0|1 111k 69k 2 17:17:11 0 0 0 298 0 300 0 56g 115g 683m 24.2 0 0|0 0|0 40k 26k 2 17:17:12 0 0 0 907 1 908 0 56g 115g 699m 20.4 0 0|0 0|0 123k 4m 2 17:17:13 0 0 0 946 0 947 0 56g 115g 707m 16.4 0 0|0 0|0 128k 79k 2 17:17:14 0 0 0 374 0 375 0 56g 115g 677m 36.7 0 0|0 0|0 50k 32k 2 17:17:15 0 0 0 905 1 905 0 56g 115g 689m 16.2 0 0|0 0|0 123k 821k 2 17:17:16 0 0 0 259 0 261 0 56g 115g 690m 5.1 0 0|0 0|0 35k 22k 1 17:17:17 0 0 0 0 0 1 0 56g 115g 690m 0 0 0|0 0|0 62b 1k 1 17:17:18 Total: MongoDB shell version: 2.0.1 connecting to: localhost:27017/prodcopy archiving clicks for ad_id: 4d6c12497b90420373000062 ensuring sparse index on field "arch" marking documents for archive... queued 22227 documents for archive in 19369ms. archiving queued documents to archive.clicks... archived 22227 queued documents to archive.clicks in 26901ms. Upgrading the hard drive to an SSD, I was able to almost triple the performance of the archiving step, and archive ~1300 documents per second. Obviously this is all a giant hack, and it would be nice if MongoDB supported some kind of partitioning feature so we didn’t have to do any of this manually. But until then, this is the best I could get given the tools currently available. Source: http://blog.brianploetz.com/post/12131083486/adventures-in-archiving-with-mongodb
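As a quick sanity check on the throughput figures quoted in the MongoDB archiving article above, the rates can be recomputed from the totals the shell printed (19045 documents in 40701ms for the immediate run; 22227 documents queued in 19369ms and then archived in 26901ms for the queued run). This is only a sketch: the docsPerSecond helper and the runs object are illustrative names, not part of archive.js.

```javascript
// Recompute documents-per-second from the totals printed in the article.
// docsPerSecond and runs are hypothetical names used only for this check.
function docsPerSecond(documents, elapsedMs) {
  return documents / (elapsedMs / 1000);
}

const runs = {
  immediateArchive: docsPerSecond(19045, 40701),       // immediate approach
  queueOnly: docsPerSecond(22227, 19369),              // "mark" step
  sweepOnly: docsPerSecond(22227, 26901),              // "sweep" step
  queuedEndToEnd: docsPerSecond(22227, 19369 + 26901), // mark + sweep together
};

for (const [name, rate] of Object.entries(runs)) {
  console.log(name + ": " + Math.round(rate) + " docs/sec");
}
```

On these totals the queued run only lands near the quoted ~500 documents per second when the mark and sweep steps are timed together; the sweep step on its own comes out faster. Which interval the author had in mind is not stated, so treat that reading as an assumption.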
November 21, 2011
by Mitch Pronschinske
· 14,357 Views
Freight Management System on NetBeans
Lynden is a family of transportation and logistics companies specializing in shipping to Alaska and other locations worldwide. Over land, on the water, in the air - or in any combination - Lynden has been helping customers solve transportation problems for over a century.

The Lynden Freight Management System is a NetBeans Platform application which serves a dual purpose as both a planning and freight tracking tool. The Planning module allows terminal managers to see all freight that is currently inbound to their location, as well as freight that is scheduled to depart from their location, so they can make the most efficient use of their dock space and resources. The Trace module allows customer service personnel to search for customer account information, view the tracking history of any given freight item in the system, and display any documents related to the shipment, such as bills of lading or delivery receipts.

NetBeans Platform

Lynden has benefited from the NetBeans Platform as it allows developers to focus on the business logic of our applications rather than the underlying "plumbing". We are able to leverage built-in support for event handling, enabling/disabling functionality on UI controls, dockable windows, and automatic updates for our application with minimal work compared to rolling our own framework.

We chose to go the desktop application route as we have a number of existing desktop applications here that this application will likely need to interface with at some point, as well as a commercial set of rich UI components that we have been using for some time now. For the initial deployment, we will be pushing the installer out to employee PCs via the Landesk remote desktop administration tool. Future updates to various modules within the application will be done via the update center functionality built into the NetBeans Platform.
November 19, 2011
by Rob Terpilowski
· 11,744 Views · 3 Likes
ASP.NET MVC: Converting business objects to select list items
Some of our business classes are used to fill dropdown boxes or select lists. And often you have some base class for all your business classes. In this posting I will show you how to use a base business class to write an extension method that converts a collection of business objects to ASP.NET MVC select list items without writing a lot of code.

BusinessBase, BaseEntity and other base classes

I prefer to have some base class for all my business classes so I can easily use them regardless of their type in the contexts I need.

NB! Some guys say that it is a good idea to have a base class for all your business classes, and they also suggest you have the mappings done the same way in the database. Other guys say that it is good to have a base class, but you don’t have to have one master table in the database that contains the identities of all your business objects. It is up to you how and what you prefer to do, but whatever you do – think and analyze first, please. :)

To keep things maximally simple I will use a very primitive base class in this example. This class has only an Id property and that’s it.

    public class BaseEntity
    {
        public virtual long Id { get; set; }
    }

Now we have Id in the base class and we have one more question to solve – how to better visualize our business objects? To users an ID is not enough; they want something more informative. We can define some abstract property that all classes must implement. But there is also another option we can use – overriding the ToString() method in our business classes.

    public class Product : BaseEntity
    {
        public virtual string SKU { get; set; }
        public virtual string Name { get; set; }

        // Fall back to the default text when the product has no name.
        public override string ToString()
        {
            if (string.IsNullOrEmpty(Name))
                return base.ToString();
            return Name;
        }
    }

Although you can add more functionality and properties to your base class, we are at the point where we have what we needed: identity and a human readable presentation of business objects.

Writing list items converter

Now we can write a method that creates list items for us.

    using System.Collections.Generic;
    using System.Web.Mvc;

    public static class BaseEntityExtensions
    {
        public static IEnumerable<SelectListItem> ToSelectListItems<T>
            (this IList<T> baseEntities) where T : BaseEntity
        {
            return ToSelectListItems((IEnumerator<BaseEntity>) baseEntities.GetEnumerator());
        }

        public static IEnumerable<SelectListItem> ToSelectListItems
            (this IEnumerator<BaseEntity> baseEntities)
        {
            var items = new HashSet<SelectListItem>();

            while (baseEntities.MoveNext())
            {
                var item = new SelectListItem();
                var entity = baseEntities.Current;

                // Id becomes the option value, ToString() the visible text
                item.Value = entity.Id.ToString();
                item.Text = entity.ToString();
                items.Add(item);
            }

            return items;
        }
    }

You can see here two overloads of the same method. One works with IList<T> and the other with IEnumerator<BaseEntity>. Although mostly my repositories return IList<T> when querying data, there are always situations where I can use more abstract types and interfaces.

Using extension methods in code

In your code you can use the ToSelectListItems() extension methods as shown in the following code fragment.

    ...
    var model = new MyFormModel();
    model.Statuses = _myRepository.ListStatuses().ToSelectListItems();
    ...

You can call this method on all your business classes that extend your base entity. Wanna have some fun with this code? Write an overload for the extension method that accepts the selected item ID.
November 11, 2011
by Gunnar Peipman
· 13,631 Views
Creating a backup in Team Foundation Server 2010 using the Power Tools
Over the last few years the product team has been putting the finishing touches on a backup module for the Team Foundation Server Administration Console. Why, you might ask, do you need another way to back up? Surely you can just back up the bits? Well, you could, but as TFS has a lot of moving parts, creating a backup can get very complicated:

  • Required permissions
  • Identify databases
  • Create tables in databases
  • Create a stored procedure for marking tables
  • Create a stored procedure for marking all tables at once
  • Create a stored procedure to automatically mark tables
  • Create a scheduled job to run the table-marking procedure
  • Create a maintenance plan for full backups
  • Create a maintenance plan for differential backups
  • Create a maintenance plan for transaction backups
  • Back up additional Lab Management components

- from “Back Up Team Foundation Server” on MSDN

There are a heck of a lot of databases that, depending on your environment, might be spread over your entire network.

Figure: Deployment topologies (where is my data?) from MSDN

So, how is this problem solved? Well, the TFS team has created a tool to create all of the backups and all of the jobs, as well as managing the backup location for you. This sounds fantastic, but how about in practice? Was it really that easy? Well… not really… here is the extra stuff I found out:

  • Your account must own the share. Owning the folder does not cut it (see Error #1- TF254027).
  • SQL must be running under a domain account or Network Service. SQL must also have permission to the share, and the validation will get confused if you use “localsystem” instead of Network Service or a domain account (see Error #2- TF254027).
  • The account running SQL must have permission to create SPNs. The account that is used for SQL must be able to both see and create service principal names in Active Directory (see Error #3: Terminating your TFS server).
Once you learn how to google without keywords and read your server’s mind, you will have a nice backup system going…

Error #1- TF254027

I initially got an error because the accounts did not really have full control over the target location. This is a problem with the share. Although I have full permission for \\fileserver1\share\tfsbackups, it is just a folder under the \\fileserver1\share\ location, and I do not have permission to change the sharing settings there.

Figure: TF254027 is caused by permission issues

    [info @16:36:34.342] granting account root_company\tfssqlbox$ permission on folder \\fileserver1\share\tfsbackups
    [info @16:36:34.348] system.unauthorizedaccessexception: attempted to perform an unauthorized operation.
       at system.security.accesscontrol.win32.setsecurityinfo(resourcetype type, string name, safehandle handle, securityinfos securityinformation, securityidentifier owner, securityidentifier group, genericacl sacl, genericacl dacl)
       at system.security.accesscontrol.nativeobjectsecurity.persist(string name, safehandle handle, accesscontrolsections includesections, object exceptioncontext)
       at system.security.accesscontrol.filesystemsecurity.persist(string fullpath)
       at microsoft.teamfoundation.powertools.admin.helpers.filehelper.grantfolderpermission(string account, string path)
    [info @16:36:34.350] granting account root_company\tfs.services permission on folder \\fileserver1\share\tfsbackups
    [info @16:36:34.352] system.unauthorizedaccessexception: attempted to perform an unauthorized operation.
       at system.security.accesscontrol.win32.setsecurityinfo(resourcetype type, string name, safehandle handle, securityinfos securityinformation, securityidentifier owner, securityidentifier group, genericacl sacl, genericacl dacl)
       at system.security.accesscontrol.nativeobjectsecurity.persist(string name, safehandle handle, accesscontrolsections includesections, object exceptioncontext)
       at system.security.accesscontrol.filesystemsecurity.persist(string fullpath)
       at microsoft.teamfoundation.powertools.admin.helpers.filehelper.grantfolderpermission(string account, string path)
    [error @16:36:34.352] granting permission to account root_company\tfssqlbox$ on path \\fileserver1\share\tfsbackups failed

Figure: The log files get to the root of the problem, but not the reason

After much messing around I have found that you can’t use a sub-folder of a share that you do not have permission for. You require permission to the share itself to apply permissions.

Error #2- TF254027

Let’s try this again with a share that we control. I will create a backup share on the TFS server, and at least then I control the permissions.

Figure: The next error looks the same, but it is subtly different

    [info @18:12:05.813] "verify: grant backup plan permissions\root\verifybackuppathpermissionsgrantedsuccessfully(verifybackuppathpermissionsgrantedsuccessfully): exiting verification with state completed and result success"
    [info @18:12:05.813] verify: grant backup plan permissions\root\verifydummybackupcreation(verifytestbackupcreatedsuccessfully): starting verification
    [info @18:12:05.813] verify test backup created successfully
    [info @18:12:05.813] starting creating backup test validation
    [error @18:12:06.132] microsoft.sqlserver.management.smo.failedoperationexception: backup failed for server 'sqlserver1'. ---> microsoft.sqlserver.management.common.executionfailureexception: an exception occurred while executing a transact-sql statement or batch.
---> system.data.sqlclient.sqlexception: cannot open backup device '\\tfsserver1\tfsbackup\temp_20111104111205.bak'. operating system error 5(access is denied.). backup database is terminating abnormally. at microsoft.sqlserver.management.common.connectionmanager.executetsql(executetsqlaction action, object execobject, dataset filldataset, boolean catchexception) at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype) --- end of inner exception stack trace --- at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype) at microsoft.sqlserver.management.common.serverconnection.executenonquery(stringcollection sqlcommands, executiontypes executiontype) at microsoft.sqlserver.management.smo.executionmanager.executenonquery(stringcollection queries) at microsoft.sqlserver.management.smo.backuprestorebase.executesql(server server, stringcollection queries) at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv) --- end of inner exception stack trace --- at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv) at microsoft.teamfoundation.powertools.admin.helpers.backupfactory.testbackupcreation(string path) [error @18:12:06.184] !verify error!: account root_comapny\martin.hinshelwood failed to create backups using path \\tfsserver1\tfsbackup [info @18:12:06.184] "verify: grant backup plan permissions\root\verifydummybackupcreation(verifytestbackupcreatedsuccessfully): exiting verification with state completed and result error" [info @18:12:06.184] !verify result!: 5 completed, 0 skipped: 4 success, 1 errors, 0 warnings [info @18:12:06.197] verify: backup tasks verifications(vcontainer): starting verification [info @18:12:06.197] a generic container node that does not contribute to results [info @18:12:06.197] "verify: backup tasks verifications(vcontainer): exiting verification with state ignore and result ignore" [info 
@18:12:06.197] verify: backup tasks verifications\root(vcontainer): starting verification [info @18:12:06.197] a generic container node that does not contribute to results [info @18:12:06.197] "verify: backup tasks verifications\root(vcontainer): exiting verification with state ignore and result ignore" [info @18:12:06.197] verify: backup tasks verifications\root\verifydummybackupcreation(verifytestbackupcreatedsuccessfully): starting verification [info @18:12:06.197] verify test backup created successfully [info @18:12:06.197] starting creating backup test validation [error @18:12:06.389] microsoft.sqlserver.management.smo.failedoperationexception: backup failed for server sqlserver1'. ---> microsoft.sqlserver.management.common.executionfailureexception: an exception occurred while executing a transact-sql statement or batch. ---> system.data.sqlclient.sqlexception: cannot open backup device '\\tfsserver1\tfsbackup\temp_20111104111206.bak'. operating system error 5(access is denied.). backup database is terminating abnormally. 
at microsoft.sqlserver.management.common.connectionmanager.executetsql(executetsqlaction action, object execobject, dataset filldataset, boolean catchexception) at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype) --- end of inner exception stack trace --- at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype) at microsoft.sqlserver.management.common.serverconnection.executenonquery(stringcollection sqlcommands, executiontypes executiontype) at microsoft.sqlserver.management.smo.executionmanager.executenonquery(stringcollection queries) at microsoft.sqlserver.management.smo.backuprestorebase.executesql(server server, stringcollection queries) at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv) --- end of inner exception stack trace --- at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv) at microsoft.teamfoundation.powertools.admin.helpers.backupfactory.testbackupcreation(string path) figure: this time the error is lying and is from sql not locally as it implies it looks like the problem is that sql server can’t write to that folder, but i can and the machine account can. lets try this from the sql server itself, and with a native backup. figure: sql server can’t write to that location dam… so even a native sql backup can’t write to this location. title: microsoft sql server management studio ------------------------------ backup failed for server 'sqlserver1'. (microsoft.sqlserver.smoextended) for help, click: http://go.microsoft.com/fwlink?prodname=microsoft+sql+server&prodver=10.50.2500.0+((kj_pcu_main).110617-0038+)&evtsrc=microsoft.sqlserver.management.smo.exceptiontemplates.failedoperationexceptiontext&evtid=backup+server&linkid=20476 ------------------------------ additional information: system.data.sqlclient.sqlerror: cannot open backup device '\\tfsserver1\tfsbackup\moo.bak'. 
operating system error 5(access is denied.). (microsoft.sqlserver.smo)
for help, click: http://go.microsoft.com/fwlink?prodname=microsoft+sql+server&prodver=10.50.2500.0+((kj_pcu_main).110617-0038+)&linkid=20476
------------------------------
buttons: ok
------------------------------

figure: sql server errors suck even more

as it turns out, sql server is running under “localservice”, which is not authenticating against our share. so we need to change the account that sql server runs under.

error #3: terminating your tfs server

as we should always use the sql server configuration manager to change these things, i fired it up, and since i already have a domain account for running tfs under, i decided to use that one.

figure: this is easy

when you apply, it will ask you to restart sql, and then it should be all complete. let’s check tfs and make sure that everything is running…

figure: omg! what just happened!

oh shit: i think i just broke tfs. why can’t tfs connect? let’s try the sql management studio and see.

figure: what is an sspi? this does not look good…

after i had hastily changed the service account back to the original value and made sure that this fixed tfs, i also wanted to figure out why it broke. usually i would just ask shad (one of my extremely technical colleagues), but alas he is on his honeymoon.

some googling turned up an spn issue. the account that sql runs under must be able to both read and write service principal names for itself in active directory. this can be set, but only by a domain admin.

dynamically set spn’s for sql service accounts

so let’s go with network service instead. if we change the account that sql server runs under to “network service”, then i can add permission for “root_company\sqlserver1$” to my share and get it working. yes, servers have ad accounts as well.
November 5, 2011
by Martin Hinshelwood
· 9,979 Views
Recommendation Engine Models
In a classical recommendation system model, there are "users" and "items". Each user has associated metadata (or content) such as age, gender, race and other demographic information. Each item also has metadata, such as a text description, price, weight, etc. On top of that, there are interactions (or transactions) between users and items: for example, userA downloads or purchases movieB, or userX gives a rating of 5 to productY.

Given all the metadata of users and items, as well as their interactions over time, can we answer the following questions?

What is the probability that userX purchases itemY?
What rating will userX give to itemY?
What are the top k unseen items that should be recommended to userX?

Content-based Approach

In this approach, we use the metadata to categorize users and items and then match them at the category level. One example is recommending jobs to candidates: we can run an IR/text search to match the user's resume against the job descriptions. Another example is recommending an item that is "similar" to one that the user has purchased. Similarity is measured according to the items' metadata, and various distance functions can be used. The goal is to find the k nearest neighbors of the item we know the user likes.

Collaborative Filtering Approach

In this approach, we look purely at the interactions between users and items, and use them to make our recommendations. The interaction data can be represented as a matrix in which each cell represents the interaction between one user and one item. For example, the cell can contain the rating that the user gives to the item (in which case the cell is a numeric value), or it can be just a binary value indicating whether the interaction between user and item has happened (e.g. "1" if userX has purchased itemY, and "0" otherwise).

The matrix is also extremely sparse, meaning that most of the cells are unfilled. We need to be careful about how we treat these unfilled cells. There are 2 common ways:
Treat the unknown cells as "0", making them equivalent to the user giving a rating of 0. This may or may not be a good idea, depending on your application scenario.
Guess what the missing value should be. For example, to guess how userX will rate itemA given that we know his rating of itemB, we can look at all users (or only those in the same age group as userX) who have rated both itemA and itemB and compute the average rating each item received from them. We then use the average ratings of itemA and itemB to interpolate userX's rating of itemA from his rating of itemB.

User-based Collaboration Filter

In this model, we do the following:

Find a group of users that are “similar” to userX
Find all movies liked by this group that haven’t been seen by userX
Rank these movies and recommend them to userX

This introduces the concept of user-to-user similarity, which is basically the similarity between 2 row vectors of the user/item matrix. To compute the K nearest neighbors of a particular user, a naive implementation computes the similarity to all other users and picks the top K.

Different similarity functions can be used. Jaccard similarity is defined as the number of movies that both users have seen (the intersection) divided by the number of movies that either of them has seen (the union). Pearson similarity first normalizes each user's ratings by subtracting that user's mean rating and then computes the cosine similarity.

There are two problems with this approach:

Comparing userX and userY is expensive, as they have millions of attributes
Finding the top k users similar to userX requires computing the similarity of userX with every other userY

Locality Sensitive Hashing and Minhash

To resolve problem 1, we approximate the similarity using a cheap estimation function called minhash. The idea is to find a hash function h() such that the probability of h(userX) = h(userY) is proportional to the similarity of userX and userY. If we can find 100 such h() functions, we can just count the number of functions for which h(userX) = h(userY) to determine how similar userX is to userY.
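The counting idea can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not the author's implementation: the hash family h(x) = (a*x + b) mod p, the small seeded generator, and all function names are hypothetical choices, and item ids are assumed to be non-negative integers below 2^21 so the arithmetic stays exact.

```javascript
// Exact Jaccard similarity between two Sets of item ids.
function jaccard(a, b) {
  let common = 0;
  for (const x of a) if (b.has(x)) common++;
  const union = a.size + b.size - common;
  return union === 0 ? 0 : common / union;
}

// Build n hash functions h(x) = (a*x + b) mod p (an assumed hash family).
// p is prime, so each h maps distinct ids below p to distinct values.
function makeHashFunctions(n, seed) {
  const p = 2147483647; // 2^31 - 1
  let state = seed;     // must be an integer in 1..p-1
  const next = () => (state = (state * 48271) % p); // small Lehmer generator
  const fns = [];
  for (let i = 0; i < n; i++) {
    const a = next();
    const b = next();
    fns.push(x => (a * x + b) % p);
  }
  return fns;
}

// Signature: for each hash function, the minimum hash over the user's items.
function minhashSignature(items, hashFns) {
  return hashFns.map(h => {
    let min = Infinity;
    for (const x of items) min = Math.min(min, h(x));
    return min;
  });
}

// The fraction of positions where two signatures agree estimates
// the Jaccard similarity of the underlying sets.
function estimateSimilarity(sigA, sigB) {
  let same = 0;
  for (let i = 0; i < sigA.length; i++) if (sigA[i] === sigB[i]) same++;
  return same / sigA.length;
}
```

Each user's signature is computed once and stored, so comparing two users costs one pass over a short signature instead of a pass over millions of attributes.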
The idea is depicted as follows ...

It will be expensive to permute the rows if the number of rows is large. Remember that the purpose of h(c1) is to return the row number of the first row that is 1. So we can scan each row of c1 to see if it is 1; if so, we apply a function newRowNum = hash(rowNum) to simulate a permutation, and we keep the minimum of the newRowNum values seen so far. As an optimization, instead of doing one column at a time, we can do it a row at a time. The algorithm is as follows ...

To solve problem 2, we need to avoid computing the similarity of every other user to userX. The idea is to hash users into buckets such that similar users fall into the same bucket. Therefore, instead of comparing against all users, we only compute the similarity for those users who are in the same bucket as userX. To do this, we horizontally partition the column into b bands, each with r rows. By picking the parameters b and r, we can control the likelihood (a function of similarity) that two users will fall into the same bucket in at least one band.

Item-based Collaboration Filter

If we transpose the user/item matrix and do the same thing, we can compute item-to-item similarity. In this model, we do the following:

Find the set of movies that userX likes (from interaction data)
Find a group of movies that are similar to the set of movies that we know userX likes
Rank these movies and recommend them to userX

It turns out that computing an item-based collaboration filter has more benefits than computing user-to-user similarity, for the following reasons:

The number of items is typically smaller than the number of users
Users' tastes change over time, so the user similarity matrix needs to be updated more frequently, while item-to-item similarity tends to be more stable and requires fewer updates

Singular Value Decomposition

If we look back at the matrix, we can see that the matrix multiplication is equivalent to mapping an item from the item space to the user space.
In other words, if we view each of the existing items as an axis in the user space (notice that each user is a vector of their ratings on existing items), then multiplying a new item by the matrix gives a vector of the same form as a user. So we can then compute a dot product between this projected new item and a user to determine their similarity. It turns out that this is equivalent to mapping the user to the item space and computing a dot product there. In other words, multiplying by the matrix is equivalent to mapping between item space and user space.

Now let's imagine there is a hidden concept space in between. Instead of jumping directly from user space to item space, we can think of jumping from user space to a concept space, and then to the item space. Notice that here we first map the user space to the concept space and also map the item space to the concept space. Then we match both user and item in the concept space. This is a generalization of our recommender.

We can use SVD to factor the matrix into 2 parts. Let P be the m by n matrix (m rows and n columns). P = UDV, where U is an m by m matrix whose columns are the eigenvectors of P*transpose(P), and V is an n by n matrix whose rows are the eigenvectors of transpose(P)*P. D is an m by n diagonal matrix containing the singular values of P, which are the square roots of the eigenvalues of P*transpose(P) (or, equivalently, of transpose(P)*P). In other words, we can decompose P into U*squareroot(D) and squareroot(D)*V.

Notice that D can be thought of as the strength of each "concept" in the concept space, and its values are ordered by magnitude in decreasing order. If we remove some of the weakest concepts by setting them to zero, we reduce the number of non-zero elements in D, which effectively generalizes the concept space (making it focus on the important concepts).

Calculating the SVD of a matrix with large dimensions is expensive. Fortunately, if our goal is to compute an SVD approximation (with k non-zero diagonal values), we can use the random projection mechanism as described here.
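For reference, the factorization and its truncation to the k strongest concepts can be written out as below. This is a sketch in the notation used above (V holds its eigenvectors as rows); the symbols P_k, U_k, D_k, V_k, a_u, b_i and the predicted score r̂ are introduced here for illustration only.

```latex
\[
\begin{aligned}
P &= U D V,
  & P \in \mathbb{R}^{m \times n},\;
    U \in \mathbb{R}^{m \times m},\;
    D \in \mathbb{R}^{m \times n},\;
    V \in \mathbb{R}^{n \times n} \\
P \approx P_k &= U_k D_k V_k,
  & U_k \in \mathbb{R}^{m \times k},\;
    D_k \in \mathbb{R}^{k \times k},\;
    V_k \in \mathbb{R}^{k \times n} \\
P_k &= \bigl(U_k D_k^{1/2}\bigr)\bigl(D_k^{1/2} V_k\bigr) \\
\hat{r}_{ui} &= a_u \, b_i,
  & a_u = \text{row } u \text{ of } U_k D_k^{1/2},\;
    b_i = \text{column } i \text{ of } D_k^{1/2} V_k
\end{aligned}
\]
```

Read this way, a_u is user u expressed in the k-dimensional concept space, b_i is item i in the same space, and their product is entry (u, i) of the truncated matrix P_k.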
Association Rule Based

In this model, we use the market-basket association rule algorithm to discover rules like:

{item1, item2} => {item3, item4, item5}

We represent each user as a basket and each viewing as an item (notice that we ignore the rating and use a binary value). After that, we use an association rule mining algorithm to detect frequent item sets and the association rules. Then, for each user, we match the user's previously viewed items against the set of rules to determine what other movies we should recommend.

Evaluate the recommender

After we have a recommender, how do we evaluate its performance? The basic idea is to separate the data into a training set and a test set. For the test set, we remove certain user-to-movie interactions (changing certain cells from 1 to 0), pretending that the user hasn't seen the item. Then we use the training set to train a recommender and feed the test set (with the removed interactions) to the recommender. The performance is measured by how much overlap there is between the recommended items and the ones that we have removed. In other words, a good recommender should be able to recover the set of items that we have removed from the test set.

Leverage tagging information on items

In some cases, items have explicit tags associated with them (we can consider the tags to be a user-annotated concept space added to the items). Consider each item as described by a vector of tags. Users can then be auto-tagged based on the items they have interacted with. For example, if userX purchases itemY, which is tagged with Z1 and Z2, then the user's weights for tags Z1 and Z2 increase in her existing tag vector. We can use a time decay mechanism to update the user's tag vector as follows:

current_user_tag = alpha * item_tag + (1 - alpha) * prev_user_tag

To recommend items to the user, we simply calculate the top k items by computing the dot product of the user tag vector and each item tag vector (i.e. the cosine similarity, when both vectors are normalized).
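The tag-vector update and the top-k lookup can be sketched as follows. This is a minimal illustration, assuming tag vectors are stored as plain objects mapping tag name to weight; the function names and the item shape ({ id, tags }) are hypothetical and do not come from the original article.

```javascript
// Blend an item's tags into the user's tag vector with time decay:
// current_user_tag = alpha * item_tag + (1 - alpha) * prev_user_tag
function updateUserTags(prevUserTags, itemTags, alpha) {
  const tags = new Set([...Object.keys(prevUserTags), ...Object.keys(itemTags)]);
  const current = {};
  for (const t of tags) {
    current[t] = alpha * (itemTags[t] || 0) + (1 - alpha) * (prevUserTags[t] || 0);
  }
  return current;
}

// Dot product of two sparse tag vectors (missing tags count as 0).
function dot(a, b) {
  let sum = 0;
  for (const t of Object.keys(a)) sum += a[t] * (b[t] || 0);
  return sum;
}

// Cosine similarity: the dot product of the two vectors after normalization.
function cosine(a, b) {
  const na = Math.sqrt(dot(a, a));
  const nb = Math.sqrt(dot(b, b));
  return na === 0 || nb === 0 ? 0 : dot(a, b) / (na * nb);
}

// Score every item against the user's tag vector and keep the k best.
function topK(userTags, items, k) {
  return items
    .map(item => ({ id: item.id, score: cosine(userTags, item.tags) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A larger alpha makes the user's profile follow recent purchases more closely, while a smaller alpha keeps more of the older history.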
Source: http://horicky.blogspot.com/2011/09/recommendation-engine.html
November 2, 2011
by Ricky Ho
· 26,735 Views · 2 Likes
Avoid Lazy JPA Collections
Hibernate (and JPA in general) has collection mappings: @OneToMany, @ManyToMany, @ElementCollection. All of these are lazy by default. This means the collections are specific implementations of the List or Set interface that hold a reference to the persistence session, and the values are loaded from the database only when the collection is accessed. That saves unnecessary database queries if you only occasionally use the collection.

However, there’s a problem with that: the LazyInitializationException, which in my observation is the 2nd most commonly asked-about exception (after NullPointerException). The session is usually open for your service layer and is closed as soon as you return the entity to the view layer. When you then try to iterate the uninitialized collection in your view (a JSP, for example), the collection throws LazyInitializationException, because the session it holds a reference to is already closed and it can’t fetch the items.

How is this solved? With the so-called OpenSessionInView / OpenEntityManagerInView “patterns”. In short: you make a filter that opens the session when the request starts and closes it after the view has been rendered (and not after the service layer finishes). Some people call that an anti-pattern, because it leaks persistence handling into the view layer and complicates the setup. I wouldn’t say it’s that bad: generally it solves the problem without introducing other problems.

But in all the recent projects I’ve been involved in, we haven’t used OpenSessionInView, and it works fine. It works fine because we aren’t using lazy collections. But then, you’ll rightly point out, you will be fetching “the whole world” when you load a single entity. Well, no. There are two types of *ToMany mappings:

value-type mappings, where the collection logically does not hold more than a dozen elements.
This is in most cases @ElementCollection, and also @*ToMany with items like “Category” or “Price” that are just more complex value objects but do not hold any other mappings themselves. Another common feature of these types of collections is that they are usually displayed in the UI together with their owning entity. It is most likely that you want to display the categories of an article, for example. For this type of collection, EAGER is the better option. You’ll have to fetch them anyway, so why not let Hibernate (or any JPA implementation) think of some clever join? As I said, the collections are logically not bigger than a dozen or two, so fetching them won’t be a performance hit. And, logically, they won’t fetch a big object graph with them.

mappings across the big, core entities. This can be “all orders made by the user”, “all users for the organization”, “all items of the supplier”, etc. You certainly don’t want to fetch them eagerly, because if you fetch 2000 users for an organization, each of which has 1000 orders, and an order has 3 items on average, each of which has a collection of all the people who have purchased it, you’ll end up with your entire database in memory. Obviously you need lazy collections, right? Well, no. In that case you should not be using collection mappings at all. These types of relations are, in 99% of the cases, displayed in paged lists in the UI, or in search results. They are never displayed all on one screen, and never should be (and they should rarely be returned in one API call, if your application provides something like a REST API). You have to make queries for them, and use query.setMaxResults() and query.setFirstResult() (or limit them with some restrictive criteria). Furthermore, having the collections mapped means someone will try to use them at some point, which may fail. And if the object is serialized (XML, JSON, etc.), the collection contents will be fetched, which is something you almost certainly don’t want to happen.
(A draft idea here: JPA could have a PagedList collection that would allow paged lazy fetching, thus eliminating the need for a query.)

So what did I just say: that you should never use lazy collections, that you should use eager collections for very simple, shallow mappings and paged queries for the bigger ones? Well, not exactly. Lazy collections are there and they have their applications, though these are rather limited. Or at least they are applicable far less often than they are used. Here’s an example scenario where I found one applicable.

In my side project I have a Message entity, and it holds a collection of Picture entities. When a user uploads a picture, it is stored in that collection. A message can have no more than 10 pictures, so the collection could very well be eager. But then, Message is the most commonly used entity: it’s fetched on virtually every request. And only some messages have pictures (how many of the tweets in your stream have a picture upload?). So I don’t want Hibernate to make queries just to find out there are no pictures for a given message. Hence I store the number of pictures in a separate field, make the pictures collection lazy, and Hibernate.initialize(..) it manually only if the number of pictures is > 0.

So there are scenarios where the entity has optional collections that fall into the first category above (“small, shallow collections”). If a collection is small, shallow and optional (say, used in less than 20% of the cases), then you should go with lazy to save unnecessary queries. For everything else, having lazy collections will make your life harder.

From http://techblog.bozho.net/?p=645
October 28, 2011
by Bozhidar Bozhanov
· 21,366 Views · 1 Like
Magic: The Gathering in JavaScript and HTML5
as a user interface fan, i could not miss the chance to develop with html 5. so the goal of this post is to walk through a graphic application that uses javascript and html 5. we will see through examples one way (among others) to develop this kind of project.

application overview
tools
the html 5 page
data gathering
cards loading & cache handling
cards display
mouse management
state storage
animations
handling multi-devices
conclusion
to go further

application overview

we will produce an application that will let us display a magic the gathering ©(courtesy of www.wizards.com/magic ) cards collection. users will be able to scroll and zoom using the mouse (like bing maps, for example). you can see the final result here: http://bolaslenses.catuhe.com

the project source files can be downloaded here: http://www.catuhe.com/msdn/bolaslenses.zip

cards are stored on windows azure storage and use the azure content distribution network ( cdn : a service that deploys data near the final users) in order to achieve maximum performance. an asp.net service is used to return the cards list (using json format).

tools

to write our application, we will use visual studio 2010 sp1 with web standards update . this extension adds intellisense support in html 5 pages (which is a really important thing ). so, our solution will contain an html 5 page side by side with .js files (these files will contain javascript scripts). as for debugging, it is possible to set a breakpoint directly in the .js files under visual studio. it is also possible to use the developer bar of internet explorer 9 (use the f12 key to display it).

debug with visual studio 2010
debug with internet explorer 9 (f12/developer bar)

so, we have a modern developer environment with intellisense and debug support. therefore, we are ready to start, and first of all we will write the html 5 page.
the html 5 page our page will be built around an html 5 canvas which will be used to draw the cards: cards scanned by mwshq team magic the gathering official site : http://www.wizards.com/magic bolas lenses your browser does not support html5 canvas. loading data... if we dissect this page, we can note that it is divided into two parts: the header part with the title, the logo and the special mentions the main part (section) holds the canvas and the tooltips that will display the status of the application. there is also a hidden image ( backimage ) used as source for not yet loaded cards. to build the layout of the page, a style sheet ( full.css ) is applied. style sheets are a mechanism used to change the tags styles (in html, a style defines the entire display options for a tag): html, body { height: 100%; } body { background-color: #888888; font-size: .85em; font-family: "segoe ui, trebuchet ms" , verdana, helvetica, sans-serif; margin: 0; padding: 0; color: #696969; } a:link { color: #034af3; text-decoration: underline; } a:visited { color: #505abc; } a:hover { color: #1d60ff; text-decoration: none; } a:active { color: #12eb87; } header, footer, nav, section { display: block; } table { width: 100%; } header, #header { position: relative; margin-bottom: 0px; color: #000; padding: 0; } #title { font-weight: bold; color: #fff; border: none; font-size: 60px !important; vertical-align: middle; margin-left: 70px } #legal { text-align: right; color: white; font-size: 14px; width: 50%; position: absolute; top: 15px; right: 10px } #leftheader { width: 50%; vertical-align: middle; } section { margin: 20px 20px 20px 20px; } #maincanvas{ border: 4px solid #000000; } #cardscount { font-weight: bolder; font-size: 1.1em; } .tooltip { position: absolute; bottom: 5px; color: black; background-color: white; margin-right: auto; margin-left: auto; left: 35%; right: 35%; padding: 5px; width: 30%; text-align: center; border-radius: 10px; -webkit-border-radius: 10px; 
-moz-border-radius: 10px; box-shadow: 2px 2px 2px #333333; }
#bolaslogo { width: 64px; height: 64px; }
#picturecell { float: left; width: 64px; margin: 5px 5px 5px 5px; vertical-align: middle; }

thus, this sheet is responsible for setting up the following display:

style sheets are powerful tools that allow an infinite number of displays. however, they are sometimes complicated to set up (for example if a tag is affected by a class, an identifier and its container). to simplify this setup, the developer bar of internet explorer 9 is particularly useful because we can use it to see the hierarchy of styles that is applied to a tag. for example, let’s take a look at the waittext tooltip with the developer bar. to do this, you must press f12 in internet explorer and use the selector to choose the tooltip:

once the selection is done, we can see the styles hierarchy:

thus, we can see that our div received its styles from the body tag and the .tooltip entry of the style sheet. with this tool, it becomes possible to see the effect of each style (each one can be disabled). it is also possible to add a new style on the fly.

another important point of this window is the ability to change the rendering mode of internet explorer 9. indeed, we can test how, for example, internet explorer 8 will handle the same page. to do this, go to the [ browser mode ] menu and select the engine of internet explorer 8. this change will especially impact our tooltip, as it uses border-radius (rounded edges) and box-shadow, which are features of css 3:

internet explorer 9
internet explorer 8

our page provides graceful degradation, as it still works (with no annoying visual difference) when the browser does not support all the required technologies. now that our interface is ready, we will take a look at the data source to retrieve the cards to display.
the server provides the cards list using json format on this url: http://bolaslenses.catuhe.com/home/listofcards/?colorstring=0

it takes one parameter ( colorstring ) to select a specific color (0 = all).

when developing with javascript, there is a good reflex to have (a reflex that is good in other languages too, but really important in javascript): one must ask whether what we want to develop has already been done in an existing framework. indeed, there is a multitude of open source projects around javascript. one of them is jquery, which provides a plethora of convenient services.

thus, in our case, to connect to the url of our server and get the cards list, we could go through an xmlhttprequest and have fun parsing the returned json. or we can use jquery . so we will use the getJSON function, which will take care of everything for us:

function getlistofcards() {
    var url = "http://bolaslenses.catuhe.com/home/listofcards/?jsoncallback=?";
    $.getJSON(url, { colorstring: "0" }, function (data) {
        listofcards = data;
        $("#cardscount").text(listofcards.length + " cards displayed");
        $("#waittext").slideToggle("fast");
    });
}

as we can see, our function stores the cards list in the listofcards variable and calls two jquery functions:

text that changes the text of a tag
slideToggle that hides (or shows) a tag by animating its height

the listofcards list contains objects whose format is:

id : unique identifier of the card
path : relative path of the card (without the extension)

it should be noted that the url of the server is called with the “ ?jsoncallback=? ” suffix. indeed, ajax calls are constrained in terms of security to connect only to the same address as the calling script. however, there is a solution called jsonp that will allow us to make a concerted call to the server (which of course must be aware of the operation). and fortunately, jquery can handle it all alone by just adding the right suffix.
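to make the jsonp mechanism a little more concrete, here is a minimal sketch of what happens behind that suffix. it is only an illustration: the helper names (buildJsonpUrl, wrapJsonpResponse, simulateJsonp) and the callback name passed to them are hypothetical, and a real client such as jquery generates the callback name and injects a script tag instead of evaluating a string.

```javascript
// Replace the "=?" placeholder with a concrete callback name and append
// the query parameters, roughly as a JSONP client does before the request.
function buildJsonpUrl(url, params, callbackName) {
  const query = Object.keys(params)
    .map(k => encodeURIComponent(k) + "=" + encodeURIComponent(params[k]))
    .join("&");
  const withCallback = url.replace("=?", "=" + callbackName);
  return withCallback + (withCallback.includes("?") ? "&" : "?") + query;
}

// What a cooperating server sends back: not raw JSON, but a script
// that calls the requested function with the JSON as its argument.
function wrapJsonpResponse(callbackName, data) {
  return callbackName + "(" + JSON.stringify(data) + ");";
}

// Simulate the browser executing the returned script: the callback
// is in scope under the agreed name and receives the data.
function simulateJsonp(callbackName, data) {
  let received;
  const script = wrapJsonpResponse(callbackName, data);
  new Function(callbackName, script)(d => { received = d; });
  return received;
}
```

because the response is loaded as a script rather than read through xmlhttprequest, the same-origin restriction does not apply, which is why the server has to cooperate by wrapping its json in the callback.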
once we have our cards list, we can set up the pictures loading and caching.

cards loading & cache handling

the main trick of our application is to draw only the cards effectively visible on the screen. the display window is defined by a zoom level and an offset (x, y) in the overall system.

var visucontrol = { zoom : 0.25, offsetx : 0, offsety : 0 };

the overall system is defined by 14819 cards that are spread over 200 columns and 75 rows. also, we must be aware that each card is available in three versions:

high definition: 480x680 without compression (.jpg suffix)
medium definition: 240x340 with standard compression (.50.jpg suffix)
low definition: 120x170 with strong compression (.25.jpg suffix)

thus, depending on the zoom level, we will load the correct version to optimize network transfers. to do this, we will develop a function that will give an image for a given card. this function will be configured to download a certain level of quality. in addition, it will be linked with the lower quality level, to return that version if the card for the current level is not yet downloaded:

function imagecache(substr, replacementcache) {
    var extension = substr;
    var backimage = document.getElementById("backimage");

    this.load = function (card) {
        var localcache = this;
        if (this[card.id] != undefined)
            return;
        var img = new Image();
        localcache[card.id] = { image: img, isloaded: false };
        currentdownloads++;
        img.onload = function () {
            localcache[card.id].isloaded = true;
            currentdownloads--;
        };
        img.onerror = function() {
            currentdownloads--;
        };
        img.src = "http://az30809.vo.msecnd.net/" + card.path + extension;
    };

    this.getreplacementfromlowercache = function (card) {
        if (replacementcache == undefined)
            return backimage;
        return replacementcache.getimageforcard(card);
    };

    this.getimageforcard = function(card) {
        var img;
        if (this[card.id] == undefined) {
            this.load(card);
            img = this.getreplacementfromlowercache(card);
        }
        else {
            if (this[card.id].isloaded)
                img = this[card.id].image;
            else
                img =
                    this.getreplacementfromlowercache(card);
        }
        return img;
    };
}

an imagecache is built by giving the associated suffix and the underlying cache. here you can see two important functions:

load : this function will load the right picture and will store it in a cache (the msecnd.net url is the azure cdn address of the cards)
getimageforcard : this function returns the card picture from the cache if already loaded. otherwise it requests the underlying cache to return its version (and so on)

so to handle our 3 levels of caches, we have to declare three variables:

var imagescache25 = new imagecache(".25.jpg");
var imagescache50 = new imagecache(".50.jpg", imagescache25);
var imagescachefull = new imagecache(".jpg", imagescache50);

selecting the right cache depends only on the zoom:

function getcorrectimagecache() {
    if (visucontrol.zoom <= 0.25)
        return imagescache25;
    if (visucontrol.zoom <= 0.8)
        return imagescache50;
    return imagescachefull;
}

to give feedback to the user, we will add a timer that will manage a tooltip that indicates the number of images currently being downloaded:

function updatestats() {
    var stats = $("#stats");
    stats.html(currentdownloads + " card(s) currently downloaded.");
    if (currentdownloads == 0 && statsvisible) {
        statsvisible = false;
        stats.slideToggle("fast");
    }
    else if (currentdownloads > 1 && !statsvisible) {
        statsvisible = true;
        stats.slideToggle("fast");
    }
}

setInterval(updatestats, 200);

again we note the use of jquery to simplify animations. we will now discuss the display of cards.
cards display

to draw our cards, we need to actually fill the canvas using its 2d context (which exists only if the browser supports html 5 canvas):

var maincanvas = document.getElementById("maincanvas");
var drawingcontext = maincanvas.getContext('2d');

the drawing will be made by the processlistofcards function (called 60 times per second):

function processlistofcards() {
    if (listofcards == undefined) {
        drawwaitmessage();
        return;
    }

    maincanvas.width = document.getElementById("center").clientWidth;
    maincanvas.height = document.getElementById("center").clientHeight;
    totalcards = listofcards.length;

    var localcardwidth = cardwidth * visucontrol.zoom;
    var localcardheight = cardheight * visucontrol.zoom;
    var effectivetotalcardsinwidth = colscount * localcardwidth;
    var rowscount = Math.ceil(totalcards / colscount);
    var effectivetotalcardsinheight = rowscount * localcardheight;

    initialx = (maincanvas.width - effectivetotalcardsinwidth) / 2.0 - localcardwidth / 2.0;
    initialy = (maincanvas.height - effectivetotalcardsinheight) / 2.0 - localcardheight / 2.0;

    // clear
    clearcanvas();

    // computing of the viewing area
    var initialoffsetx = initialx + visucontrol.offsetx * visucontrol.zoom;
    var initialoffsety = initialy + visucontrol.offsety * visucontrol.zoom;
    var startx = Math.max(Math.floor(-initialoffsetx / localcardwidth) - 1, 0);
    var starty = Math.max(Math.floor(-initialoffsety / localcardheight) - 1, 0);
    var endx = Math.min(startx + Math.floor((maincanvas.width - initialoffsetx - startx * localcardwidth) / localcardwidth) + 1, colscount);
    var endy = Math.min(starty + Math.floor((maincanvas.height - initialoffsety - starty * localcardheight) / localcardheight) + 1, rowscount);

    // getting current cache
    var imagecache = getcorrectimagecache();

    // render
    for (var y = starty; y < endy; y++) {
        for (var x = startx; x < endx; x++) {
            var localx = x * localcardwidth + initialoffsetx;
            var localy = y * localcardheight + initialoffsety;

            // clip
            if (localx > maincanvas.width)
                continue;
            if
            (localy > maincanvas.height)
                continue;
            if (localx + localcardwidth < 0)
                continue;
            if (localy + localcardheight < 0)
                continue;

            var card = listofcards[x + y * colscount];
            if (card == undefined)
                continue;

            // get from cache
            var img = imagecache.getimageforcard(card);

            // render
            try {
                if (img != undefined)
                    drawingcontext.drawImage(img, localx, localy, localcardwidth, localcardheight);
            } catch (e) {
                $.grep(listofcards, function (item) { return item.image != img; });
            }
        }
    };

    // scroll bars
    drawscrollbars(effectivetotalcardsinwidth, effectivetotalcardsinheight, initialoffsetx, initialoffsety);

    // fps
    computefps();
}

this function is built around many key points:

if the cards list is not yet loaded, we display a tooltip indicating that the download is in progress:

var pointcount = 0;

function drawwaitmessage() {
    pointcount++;
    if (pointcount > 200)
        pointcount = 0;
    var points = "";
    for (var index = 0; index < pointcount / 10; index++)
        points += ".";
    $("#waittext").html("loading...please wait" + points);
}

subsequently, we define the position of the display window (in terms of cards and coordinates), then we proceed to clean the canvas:

function clearcanvas() {
    maincanvas.width = document.body.clientWidth - 50;
    maincanvas.height = document.body.clientHeight - 140;
    drawingcontext.fillStyle = "rgb(0, 0, 0)";
    drawingcontext.fillRect(0, 0, maincanvas.width, maincanvas.height);
}

then we browse the cards list and call the drawImage function of the canvas context.
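the viewing-area computation is easier to follow when isolated from the drawing code. the function below is a simplified sketch of the same idea, not the article's exact formula: its name, its parameters and the rounding it uses are assumptions made for illustration, and it takes the already zoomed card size and the screen position of the grid's top-left corner as inputs.

```javascript
// Compute which columns and rows of the card grid intersect the canvas.
// offsetX / offsetY: screen position of the top-left corner of the grid.
// cardW / cardH: size of one card on screen (already multiplied by zoom).
// The end indices are exclusive, so they can be used directly in a for loop.
function visibleRange(canvasWidth, canvasHeight, offsetX, offsetY, cardW, cardH, cols, rows) {
  const startX = Math.max(Math.floor(-offsetX / cardW), 0);
  const startY = Math.max(Math.floor(-offsetY / cardH), 0);
  const endX = Math.min(Math.ceil((canvasWidth - offsetX) / cardW), cols);
  const endY = Math.min(Math.ceil((canvasHeight - offsetY) / cardH), rows);
  return { startX, startY, endX, endY };
}
```

with a 100x100 canvas, 10x10 cards and the grid shifted 25 pixels to the left, only columns 2 to 12 and rows 0 to 9 are visited, whatever the total size of the collection.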
The current image is provided by the active cache (depending on the zoom):

    // Get from cache
    var img = imageCache.getImageForCard(card);

    // Render
    try {
        if (img != undefined)
            drawingContext.drawImage(img, localX, localY, localCardWidth, localCardHeight);
    } catch (e) {
        $.grep(listOfCards, function (item) {
            return item.image != img;
        });
    }

We also have to draw the scroll bars with the roundedRectangle function, which uses quadratic curves:

    function roundedRectangle(x, y, width, height, radius) {
        drawingContext.beginPath();
        drawingContext.moveTo(x + radius, y);
        drawingContext.lineTo(x + width - radius, y);
        drawingContext.quadraticCurveTo(x + width, y, x + width, y + radius);
        drawingContext.lineTo(x + width, y + height - radius);
        drawingContext.quadraticCurveTo(x + width, y + height, x + width - radius, y + height);
        drawingContext.lineTo(x + radius, y + height);
        drawingContext.quadraticCurveTo(x, y + height, x, y + height - radius);
        drawingContext.lineTo(x, y + radius);
        drawingContext.quadraticCurveTo(x, y, x + radius, y);
        drawingContext.closePath();
        drawingContext.stroke();
        drawingContext.fill();
    }

    function drawScrollbars(effectiveTotalCardsInWidth, effectiveTotalCardsInHeight, initialOffsetX, initialOffsetY) {
        drawingContext.fillStyle = "rgba(255, 255, 255, 0.6)";
        drawingContext.lineWidth = 2;

        // Vertical
        var totalScrollHeight = effectiveTotalCardsInHeight + mainCanvas.height;
        var scaleHeight = mainCanvas.height - 20;
        var scrollHeight = mainCanvas.height / totalScrollHeight;
        var scrollStartY = (-initialOffsetY + mainCanvas.height * 0.5) / totalScrollHeight;
        roundedRectangle(mainCanvas.width - 8, scrollStartY * scaleHeight + 10, 5, scrollHeight * scaleHeight, 4);

        // Horizontal
        var totalScrollWidth = effectiveTotalCardsInWidth + mainCanvas.width;
        var scaleWidth = mainCanvas.width - 20;
        var scrollWidth = mainCanvas.width / totalScrollWidth;
        var scrollStartX = (-initialOffsetX + mainCanvas.width * 0.5) / totalScrollWidth;
        roundedRectangle(scrollStartX * scaleWidth + 10, mainCanvas.height - 8, scrollWidth * scaleWidth, 5, 4);
    }

And finally, we need to compute the number of frames per second:

    function computeFPS() {
        if (previous.length > 60) {
            previous.splice(0, 1);
        }
        var start = (new Date).getTime();
        previous.push(start);
        var sum = 0;

        for (var id = 0; id < previous.length - 1; id++) {
            sum += previous[id + 1] - previous[id];
        }

        var diff = 1000.0 / (sum / previous.length);

        $("#cardscount").text(diff.toFixed() + " fps. " + listOfCards.length + " cards displayed");
    }

Drawing cards relies heavily on the browser's ability to speed up canvas rendering. For the record, here are the performance figures on my machine with the minimum zoom level (0.05):

    Browser                                           FPS
    Internet Explorer 9                               30
    Firefox 5                                         30
    Chrome 12                                         17
    iPad (with a zoom level of 0.8)                   7
    Windows Phone Mango (with a zoom level of 0.8)    20 (!!)

The site even works on mobile phones and tablets as long as they support HTML 5.

Here we can see the inner power of HTML 5 browsers that can handle a full screen of cards more than 30 times per second!

Mouse management

To browse our cards collection, we have to manage the mouse (including its wheel).

For the scrolling, we'll just handle the onmousemove, onmouseup and onmousedown events.
The onmouseup and onmousedown events are used to detect whether the mouse button is pressed:

    var mouseDown = 0;

    document.body.onmousedown = function (e) {
        mouseDown = 1;
        getMousePosition(e);

        previousX = posx;
        previousY = posy;
    };

    document.body.onmouseup = function () {
        mouseDown = 0;
    };

The onmousemove event is connected to the canvas and used to move the view:

    var previousX = 0;
    var previousY = 0;
    var posx = 0;
    var posy = 0;

    function getMousePosition(eventArgs) {
        var e;

        if (!eventArgs)
            e = window.event;
        else {
            e = eventArgs;
        }

        if (e.offsetX || e.offsetY) {
            posx = e.offsetX;
            posy = e.offsetY;
        }
        else if (e.clientX || e.clientY) {
            posx = e.clientX;
            posy = e.clientY;
        }
    }

    function onMouseMove(e) {
        if (!mouseDown)
            return;
        getMousePosition(e);

        mouseMoveFunc(posx, posy, previousX, previousY);

        previousX = posx;
        previousY = posy;
    }

This function (onMouseMove) calculates the current position and also provides the previous value, so that the registered callback can move the offset of the display window:

    function move(posx, posy, previousX, previousY) {
        currentAddX = (posx - previousX) / visuControl.zoom;
        currentAddY = (posy - previousY) / visuControl.zoom;
    }

    mouseHelper.registerMouseMove(mainCanvas, move);

Note that jQuery also provides tools to manage mouse events.

For the management of the wheel, we have to adapt to different browsers that do not behave the same way on this point:

    function wheel(event) {
        var delta = 0;
        if (event.wheelDelta) {
            delta = event.wheelDelta / 120;
            if (window.opera)
                delta = -delta;
        } else if (event.detail) { /** Mozilla case. */
            delta = -event.detail / 3;
        }
        if (delta) {
            wheelFunc(delta);
        }

        if (event.preventDefault)
            event.preventDefault();
        event.returnValue = false;
    }

We can see that everyone does what he wants :).
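The browser differences handled above can be folded into one small pure function, which also makes the logic easy to check without a browser. This is only a sketch under assumptions: normalizeWheel is a hypothetical name, and the deltaY branch relies on the standardized wheel event, which the original article does not use. The legacy branches follow the same conventions as the article's code (wheelDelta in multiples of 120, detail in multiples of 3 with the opposite sign).

```javascript
// Hypothetical helper (not from the original article): convert a wheel-like
// event object into a signed number of notches, positive meaning "wheel up".
function normalizeWheel(event) {
    // Standard "wheel" event: deltaY is positive when scrolling down.
    if (typeof event.deltaY === "number" && event.deltaY !== 0) {
        return event.deltaY > 0 ? -1 : 1;
    }
    // Legacy "mousewheel" event: wheelDelta is a multiple of 120, positive when scrolling up.
    if (event.wheelDelta) {
        return event.wheelDelta / 120;
    }
    // Legacy Mozilla "DOMMouseScroll" event: detail is a multiple of 3, positive when scrolling down.
    if (event.detail) {
        return -event.detail / 3;
    }
    return 0;
}

// Plain objects stand in for real events here.
console.log(normalizeWheel({ wheelDelta: 240 }));
console.log(normalizeWheel({ detail: 3 }));
```

Because the function only reads properties from the object it receives, a handler can pass the real event to it and forward any non-zero result to the zoom logic.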
The function to register with this event is:

    mouseHelper.registerWheel = function (func) {
        wheelFunc = func;

        if (window.addEventListener)
            window.addEventListener('DOMMouseScroll', wheel, false);

        window.onmousewheel = document.onmousewheel = wheel;
    };

And we use this function to change the zoom with the wheel:

    // Mouse
    mouseHelper.registerWheel(function (delta) {
        currentAddZoom += delta / 500.0;
    });

Finally, we add a bit of inertia when moving the mouse (and the zoom) to make the movement smoother:

    // Inertia
    var inertia = 0.92;
    var currentAddX = 0;
    var currentAddY = 0;
    var currentAddZoom = 0;

    function doInertia() {
        visuControl.offsetX += currentAddX;
        visuControl.offsetY += currentAddY;
        visuControl.zoom += currentAddZoom;

        var effectiveTotalCardsInWidth = colsCount * cardWidth;

        var rowsCount = Math.ceil(totalCards / colsCount);
        var effectiveTotalCardsInHeight = rowsCount * cardHeight;

        var maxOffsetX = effectiveTotalCardsInWidth / 2.0;
        var maxOffsetY = effectiveTotalCardsInHeight / 2.0;

        if (visuControl.offsetX < -maxOffsetX + cardWidth)
            visuControl.offsetX = -maxOffsetX + cardWidth;
        else if (visuControl.offsetX > maxOffsetX)
            visuControl.offsetX = maxOffsetX;

        if (visuControl.offsetY < -maxOffsetY + cardHeight)
            visuControl.offsetY = -maxOffsetY + cardHeight;
        else if (visuControl.offsetY > maxOffsetY)
            visuControl.offsetY = maxOffsetY;

        if (visuControl.zoom < 0.05)
            visuControl.zoom = 0.05;
        else if (visuControl.zoom > 1)
            visuControl.zoom = 1;

        processListOfCards();

        currentAddX *= inertia;
        currentAddY *= inertia;
        currentAddZoom *= inertia;

        // Epsilon
        if (Math.abs(currentAddX) < 0.001)
            currentAddX = 0;
        if (Math.abs(currentAddY) < 0.001)
            currentAddY = 0;
    }

This kind of small function does not cost a lot to implement, but adds a lot to the quality of the user experience.

State storage

Also to provide a better user experience, we will save the display window’s position and zoom.
To do this, we use localStorage, which saves key/value pairs for the long term (the data is retained after the browser is closed) and makes them accessible only to pages from the same origin:

    function saveConfig() {
        if (window.localStorage == undefined)
            return;

        // Zoom
        window.localStorage["zoom"] = visuControl.zoom;

        // Offsets
        window.localStorage["offsetx"] = visuControl.offsetX;
        window.localStorage["offsety"] = visuControl.offsetY;
    }

    // Restore data
    if (window.localStorage != undefined) {
        var storedZoom = window.localStorage["zoom"];
        if (storedZoom != undefined)
            visuControl.zoom = parseFloat(storedZoom);

        var storedOffsetX = window.localStorage["offsetx"];
        if (storedOffsetX != undefined)
            visuControl.offsetX = parseFloat(storedOffsetX);

        var storedOffsetY = window.localStorage["offsety"];
        if (storedOffsetY != undefined)
            visuControl.offsetY = parseFloat(storedOffsetY);
    }

Animations

To add even more dynamism to our application, we allow our users to double-click on a card to zoom and focus on it.

Our system should animate three values: the two offsets (X, Y) and the zoom.
To do this, we use a function that is responsible for animating a variable from a source value to a destination value over a given duration:

    var animationHelper = function (root, name) {
        var paramName = name;
        this.animate = function (current, to, duration) {
            var offset = (to - current);
            var ticks = Math.floor(duration / 16);
            var offsetPart = offset / ticks;
            var ticksCount = 0;

            var intervalID = setInterval(function () {
                current += offsetPart;
                root[paramName] = current;
                ticksCount++;

                if (ticksCount == ticks) {
                    clearInterval(intervalID);
                    root[paramName] = to;
                }
            }, 16);
        };
    };

This function is used as follows:

    // Prepare animations parameters
    var zoomAnimationHelper = new animationHelper(visuControl, "zoom");
    var offsetXAnimationHelper = new animationHelper(visuControl, "offsetX");
    var offsetYAnimationHelper = new animationHelper(visuControl, "offsetY");
    var speed = 1.1 - visuControl.zoom;

    zoomAnimationHelper.animate(visuControl.zoom, 1.0, 1000 * speed);
    offsetXAnimationHelper.animate(visuControl.offsetX, targetOffsetX, 1000 * speed);
    offsetYAnimationHelper.animate(visuControl.offsetY, targetOffsetY, 1000 * speed);

The advantage of the animationHelper function is that it can animate as many parameters as you wish (and that with nothing more than the setInterval function!).

Handling multiple devices

Finally, we will ensure that our page can also be viewed on tablet PCs and even on phones.

To do this, we use a feature of CSS 3: media queries. With this technology, we can apply style sheets according to queries such as a specific display size. Here, if the screen width is less than 480 pixels, the following style sheet is added:

    #legal {
        font-size: 8px;
    }

    #title {
        font-size: 30px !important;
    }

    #waittext {
        font-size: 12px;
    }

    #bolaslogo {
        width: 48px;
        height: 48px;
    }

    #picturecell {
        width: 48px;
    }

Conclusion

HTML 5 / CSS 3 / JavaScript and Visual Studio 2010 make it possible to develop portable and efficient solutions (within the limits of browsers that support HTML 5, of course) with some great features such as hardware-accelerated rendering.

This kind of development is also simplified by the use of frameworks like jQuery.

Also, I am especially a fan of JavaScript, which turns out to be a very powerful dynamic language. Of course, C# or VB.NET developers have to change their reflexes, but for the development of web pages it's worth it.

In conclusion, I think that the best way to be convinced is to try!

To go further

Internet Explorer Test Drive: http://ie.microsoft.com/testdrive/
Internet Explorer 9 guide for developers: http://msdn.microsoft.com/en-us/ie/ff468705
W3C site for HTML 5: http://dev.w3.org/html5/spec/overview.html
Internet Explorer site: http://msdn.microsoft.com/en-us/ie/aa740469

About the author

David Catuhe is a developer evangelist for Microsoft France in charge of user experience development tools (from XAML to DirectX/XNA and HTML5). He defines himself as a geek and likes coding anything related to graphics. Before working for Microsoft, he founded a company that developed a realtime 3D engine written with DirectX ( www.vertice.fr ).

Source: http://blogs.msdn.com/b/eternalcoding/archive/2011/07/25/feedback-of-a-graphic-development-using-html5-amp-javascript.aspx
October 24, 2011
by David Catuhe
· 25,134 Views · 1 Like
RDF data in Neo4J - the Tinkerpop story
My previous blog post discussed the use of Neo4J as an RDF triple store. Michael Hunger however informed me that the neo-rdf-sail component is no longer under active development and advised me to have a look at Tinkerpop’s Sail implementation.

As mentioned in my previous blog post, I recently got asked to implement a storage and querying platform for biological RDF (Resource Description Framework) data. Traditional RDF stores are not really an option, as my solution should also provide the ability to calculate shortest paths between random subjects. Calculating shortest paths is however one of the strong selling points of Graph Databases, and more specifically Neo4J. Unfortunately, the neo-rdf-sail component, which suits my requirements perfectly, is no longer under active development. Tinkerpop’s Sail implementation, however, fills the void with an even better alternative!

1. What is Tinkerpop?

Tinkerpop is an open source project that provides an entire stack of technologies within the Graph Database space. At the core of this stack is the Blueprints framework. Blueprints can be considered the JDBC of Graph Databases. By providing a collection of generic interfaces, it allows you to develop graph-based applications without introducing explicit dependencies on concrete Graph Database implementations. Additionally, Blueprints provides concrete bindings for the Neo4J, OrientDB and Dex Graph Databases. On top of Blueprints, the Tinkerpop team developed an entire range of graph technologies, including Gremlin, a powerful, domain-specific language designed for traversing graphs. Hence, once a Blueprints binding is available for a particular Graph Database, an entire range of technologies can be leveraged.

2. Tinkerpop and Sail

Last time, I talked about exposing a Neo4J Graph Database (containing RDF triples) through the Sail interface, which is part of the openrdf.org project.
By doing so, we can reuse an entire range of RDF utilities (parsers and query evaluators) that are part of the openrdf.org project. The Blueprints framework provides us with a similar ability: each Graph Database binding that implements the Tinkerpop TransactionalGraph and IndexableGraph interfaces can be exposed as a GraphSail, which is Tinkerpop’s implementation of the Sail interface. Once you have your Sail available, storing and querying RDF is analogous to the piece of code shown in my previous blog article.

    // Create the sail graph database
    graph = new MyNeo4jGraph("var/flights", 100000);
    graph.setTransactionMode(TransactionalGraph.Mode.MANUAL);
    sail = new GraphSail(graph);

    // Initialize the sail store
    sail.initialize();

    // Get the sail repository connection
    connection = new SailRepository(sail).getConnection();

    // Import the data
    connection.add(getResource("sneeair.rdf"), null, RDFFormat.RDFXML);

    // Execute SPARQL query
    TupleQuery durationquery = connection.prepareTupleQuery(QueryLanguage.SPARQL,
        "PREFIX io: " +
        "PREFIX fl: " +
        "SELECT ?number ?departure ?destination " +
        "WHERE { " +
        "?flight io:flight ?number . " +
        "?flight fl:flightFromCityName ?departure . " +
        "?flight fl:flightToCityName ?destination . " +
        "?flight io:duration \"1:35\" . " +
        "}");
    TupleQueryResult result = durationquery.evaluate();

The first two lines of code require some more clarification. A TransactionalGraph can be run in MANUAL or AUTOMATIC transaction mode. In AUTOMATIC mode, transactions are basically ignored, in the sense that each item that gets created is immediately persisted in the underlying Graph Database. Although this fits my needs, AUTOMATIC mode is extremely slow in the case of Neo4J because of the continuous IO access. MANUAL mode, on the other hand, is very fast; a new transaction is created at the moment the import of the RDF data file starts and is only committed to the Neo4J data store once all RDF triples are parsed and created.
Unfortunately, MANUAL mode does not scale either in my specific situation: as some of my RDF data files contain over 50 million RDF triples, they cannot fit into memory (i.e. Java heap space error). Requiring fast imports, I extended the default Neo4J Blueprints binding to support intermediate commits. I based my implementation on Neo4J’s best practices for big transactions. The idea is rather simple: you specify the maximum number of items that can be kept in memory before they should be committed to the Neo4J data store. Once this number is reached, the current transaction is committed and a new one is automatically started. Simple, but very effective!

    public class MyNeo4jGraph extends Neo4jGraph {

        private long numberOfItems = 0;
        private long maxNumberOfItems = 1;

        public MyNeo4jGraph(final String directory, long maxNumberOfItems) {
            super(directory, null);
            this.maxNumberOfItems = maxNumberOfItems;
        }

        public MyNeo4jGraph(final String directory, final Map configuration, long maxNumberOfItems) {
            super(directory, configuration);
            this.maxNumberOfItems = maxNumberOfItems;
        }

        public Vertex addVertex(final Object id) {
            Vertex vertex = super.addVertex(id);
            commitIfRequired();
            return vertex;
        }

        public Edge addEdge(final Object id, final Vertex outVertex, final Vertex inVertex, final String label) {
            Edge edge = super.addEdge(id, outVertex, inVertex, label);
            commitIfRequired();
            return edge;
        }

        private void commitIfRequired() {
            // Check whether commit should be executed
            if (++numberOfItems % maxNumberOfItems == 0) {
                // Stop the transaction
                stopTransaction(Conclusion.SUCCESS);
                // Immediately start a new one
                startTransaction();
            }
        }

    }

3. Shortest path calculation

Although Blueprints allows you to abstract away the Neo4J implementation details, it still provides you with access to the raw Neo4J data store if needed. Hence, one can still use the graph algorithms provided in the neo4j-graph-algo component to calculate shortest paths between random subjects.
The complete source code can be found on the Datablend public GitHub repository.
October 24, 2011
by Davy Suvee
· 25,272 Views
How to Load or Save Image using Hibernate – MySQL
This tutorial will walk you through how to save and load an image from a database (MySQL) using Hibernate.

Requirements

For this sample project, we are going to use:

  • Eclipse IDE (you can use your favorite IDE);
  • MySQL (you can use any other database, make sure to change the column type if required);
  • Hibernate jars and dependencies (you can download the sample project with all required jars);
  • JUnit - for testing (jar also included in the sample project).

PrintScreen

When we finish implementing this sample project, it should look like this:

Database Model

Before we get started with the sample project, we have to run this SQL script in MySQL:

    DROP SCHEMA IF EXISTS `blog` ;
    CREATE SCHEMA IF NOT EXISTS `blog` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci ;
    USE `blog` ;

    -- -----------------------------------------------------
    -- Table `blog`.`BOOK`
    -- -----------------------------------------------------
    DROP TABLE IF EXISTS `blog`.`BOOK` ;

    CREATE TABLE IF NOT EXISTS `blog`.`BOOK` (
      `BOOK_ID` INT NOT NULL AUTO_INCREMENT ,
      `BOOK_NAME` VARCHAR(45) NOT NULL ,
      `BOOK_IMAGE` MEDIUMBLOB NOT NULL ,
      PRIMARY KEY (`BOOK_ID`) )
    ENGINE = InnoDB;

This script will create a table BOOK, which we are going to use in this tutorial.

Book POJO

We are going to use a simple POJO in this project. A Book has an ID, a name and an image, which is represented by an array of bytes. As we are going to persist an image into the database, we have to use the BLOB type. MySQL has some variations of BLOBs; you can check the difference between them here. In this example, we are going to use the Medium Blob, which can store L + 3 bytes, where L < 2^24. Make sure you do not forget to add the column definition on the Column annotation.
    package com.loiane.model;

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Lob;
    import javax.persistence.Table;

    @Entity
    @Table(name="BOOK")
    public class Book {

        @Id
        @GeneratedValue
        @Column(name="BOOK_ID")
        private long id;

        @Column(name="BOOK_NAME", nullable=false)
        private String name;

        @Lob
        @Column(name="BOOK_IMAGE", nullable=false, columnDefinition="mediumblob")
        private byte[] image;

        public long getId() {
            return id;
        }

        public void setId(long id) {
            this.id = id;
        }

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public byte[] getImage() {
            return image;
        }

        public void setImage(byte[] image) {
            this.image = image;
        }
    }

Hibernate Config

This configuration file contains the required info used to connect to the database.

    com.mysql.jdbc.Driver jdbc:mysql://localhost/blog root root org.hibernate.dialect.MySQLDialect 1 true

Hibernate Util

The HibernateUtil class helps in creating the SessionFactory from the Hibernate configuration file.

    package com.loiane.hibernate;

    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.AnnotationConfiguration;

    import com.loiane.model.Book;

    public class HibernateUtil {

        private static final SessionFactory sessionFactory;

        static {
            try {
                sessionFactory = new AnnotationConfiguration()
                        .configure()
                        .addPackage("com.loiane.model") //the fully qualified package name
                        .addAnnotatedClass(Book.class)
                        .buildSessionFactory();
            } catch (Throwable ex) {
                System.err.println("Initial SessionFactory creation failed." + ex);
                throw new ExceptionInInitializerError(ex);
            }
        }

        public static SessionFactory getSessionFactory() {
            return sessionFactory;
        }
    }

DAO

In this class, we created two methods: one to save a Book instance into the database and another one to load a Book instance from the database.
    package com.loiane.dao;

    import org.hibernate.HibernateException;
    import org.hibernate.Session;
    import org.hibernate.Transaction;

    import com.loiane.hibernate.HibernateUtil;
    import com.loiane.model.Book;

    public class BookDAOImpl {

        /**
         * Inserts a row in the BOOK table.
         * There is no need to pass the id, it will be generated.
         * @param book
         * @return an instance of the object Book
         */
        public Book saveBook(Book book) {
            Session session = HibernateUtil.getSessionFactory().openSession();
            Transaction transaction = null;
            try {
                transaction = session.beginTransaction();
                session.save(book);
                transaction.commit();
            } catch (HibernateException e) {
                // The transaction is still null if beginTransaction() itself failed
                if (transaction != null) {
                    transaction.rollback();
                }
                e.printStackTrace();
            } finally {
                session.close();
            }
            return book;
        }

        /**
         * Loads a book from the database.
         * @param bookId id of the book to be retrieved
         * @return the Book, or null if it could not be loaded
         */
        public Book getBook(Long bookId) {
            Session session = HibernateUtil.getSessionFactory().openSession();
            try {
                Book book = (Book) session.get(Book.class, bookId);
                return book;
            } catch (HibernateException e) {
                e.printStackTrace();
            } finally {
                session.close();
            }
            return null;
        }
    }

Test

To test it, first we need to create a Book instance and set an image to the image attribute. To do so, we need to load an image from the hard drive, and we are going to use the one located in the images folder. Then we can call the DAO class and save it into the database. Then we can try to load the image. Just to make sure it is the same image we loaded, we are going to save it to the hard drive.
    package com.loiane.test;

    import static org.junit.Assert.assertNotNull;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;

    import org.junit.AfterClass;
    import org.junit.BeforeClass;
    import org.junit.Test;

    import com.loiane.dao.BookDAOImpl;
    import com.loiane.model.Book;

    public class TestBookDAO {

        private static BookDAOImpl bookDAO;

        @BeforeClass
        public static void runBeforeClass() {
            bookDAO = new BookDAOImpl();
        }

        @AfterClass
        public static void runAfterClass() {
            bookDAO = null;
        }

        /**
         * Test method for {@link com.loiane.dao.BookDAOImpl#saveBook()}.
         */
        @Test
        public void testSaveBook() {

            //File file = new File("images\\extjsfirstlook.jpg"); //windows
            File file = new File("images/extjsfirstlook.jpg");
            byte[] bFile = new byte[(int) file.length()];

            try {
                FileInputStream fileInputStream = new FileInputStream(file);
                fileInputStream.read(bFile);
                fileInputStream.close();
            } catch (Exception e) {
                e.printStackTrace();
            }

            Book book = new Book();
            book.setName("Ext JS 4 First Look");
            book.setImage(bFile);

            bookDAO.saveBook(book);

            assertNotNull(book.getId());
        }

        /**
         * Test method for {@link com.loiane.dao.BookDAOImpl#getBook()}.
         */
        @Test
        public void testGetBook() {

            Book book = bookDAO.getBook((long) 1);

            assertNotNull(book);

            try {
                //FileOutputStream fos = new FileOutputStream("images\\output.jpg"); //windows
                FileOutputStream fos = new FileOutputStream("images/output.jpg");
                fos.write(book.getImage());
                fos.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

To verify if it was really saved, let’s check the table Book:

and if we right click… and choose to see the image we just saved, we will see it:

Source Code Download

You can download the complete source code (or fork/clone the project – git) from:

Github: https://github.com/loiane/hibernate-image-example
BitBucket: https://bitbucket.org/loiane/hibernate-image-example/downloads

Happy Coding!

From http://loianegroner.com/2011/10/how-to-load-or-save-image-using-hibernate-mysql/
October 24, 2011
by Loiane Groner
· 98,051 Views · 2 Likes
OpenStreetMap API framework for PHP
OpenStreetMap is a global project with the aim of collaboratively collecting map data, and today Ken Guest has submitted his PHP package for communicating with the OSM API to the public and the PEAR PEPr review process:

So over the last while, I’ve been working on a PHP package imaginatively named Services_Openstreetmap, for interacting with the openstreetmap API. I initially needed it so I could search for certain POIs and tabulate the results; it’s now also capable of adding data to the openstreetmap database – nodes and other elements can be created, updated and so on. It will even access the details of the user that is being used to modify that data, which is one difference between it and the other single purpose OSM frameworks. --Ken Guest

You can view the submission here, and you should definitely take a look at openstreetmap.org if you haven't already. This is good news for PHP developers looking to use this project more heavily in their applications.
October 22, 2011
by Mitch Pronschinske
· 16,268 Views
How to retrieve/extract metadata information from audio files using Java and Apache Tika API?
I guess I’m writing this post after a long time. This time, I’m writing about the Apache Tika API, which a friend of mine and I tried out to extract metadata from the audio files it supports: .mp3, .aiff, .au, .midi, .wav. To make it clear, here’s a screenshot of the information shown by Windows Vista about an audio file:

We wanted to extract this using Java and, after some googling, found that Apache Tika would help. We needed this metadata to index audio files so that they are searchable in a search application that we’re building using Apache Lucene.

Here’s a sample Java program that extracts metadata from an MP3 file:

    package singz.samples.search.audio.metadata;

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.tika.exception.TikaException;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;
    import org.apache.tika.parser.mp3.Mp3Parser;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    /**
     * @author Singaram Subramanian
     * Extract metadata of an audio file using Apache Tika API
     *
     */
    public class AudioMetadataExtractorDemo {

        public static void main(String[] args) {

            // This audio file has metadata embedded in XMP (Extensible Metadata Platform) standard
            // created by Adobe Systems Inc. XMP standardizes the definition, creation, and
            // processing of extensible metadata.
            String audioFileLoc = "c:\\pop\\backstreetboys_showmethemeaningofbeinglonely.mp3";

            try {

                InputStream input = new FileInputStream(new File(audioFileLoc));
                ContentHandler handler = new DefaultHandler();
                Metadata metadata = new Metadata();
                Parser parser = new Mp3Parser();
                ParseContext parseCtx = new ParseContext();
                parser.parse(input, handler, metadata, parseCtx);
                input.close();

                // List all metadata
                String[] metadataNames = metadata.names();

                for (String name : metadataNames) {
                    System.out.println(name + ": " + metadata.get(name));
                }

                // Retrieve the necessary info from metadata
                // Names - title, xmpDM:artist etc. - mentioned below may differ based
                // on the standard used for processing and storing standardized and/or
                // proprietary information relating to the contents of a file.

                System.out.println("Title: " + metadata.get("title"));
                System.out.println("Artists: " + metadata.get("xmpDM:artist"));
                System.out.println("Genre: " + metadata.get("xmpDM:genre"));

            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } catch (SAXException e) {
                e.printStackTrace();
            } catch (TikaException e) {
                e.printStackTrace();
            }
        }
    }

Maven POM XML

    <project>
      <modelVersion>4.0.0</modelVersion>
      <groupId>singz.samples.search.audio</groupId>
      <artifactId>audiometadataextractor</artifactId>
      <version>0.0.1</version>
      <packaging>jar</packaging>
      <name>audiometadataextractor</name>
      <url>http://maven.apache.org</url>
      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      </properties>
      <dependencies>
        <dependency>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-core</artifactId>
          <version>0.10</version>
        </dependency>
        <dependency>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-parsers</artifactId>
          <version>0.10</version>
        </dependency>
      </dependencies>
    </project>

Output

    xmpDM:releaseDate: 2001
    xmpDM:audioChannelType: Stereo
    xmpDM:album: Top 100 Pop
    Author: Backstreet Boys
    xmpDM:artist: Backstreet Boys
    channels: 2
    xmpDM:audioSampleRate: 44100
    xmpDM:logComment: eng
    xmpDM:trackNumber: 04
    version: MPEG 3 Layer III Version 1
    xmpDM:composer: null
    xmpDM:audioCompressor: MP3
    title: Show Me The Meaning Of Being Lonely
    samplerate: 44100
    xmpDM:genre: Pop
    Content-Type: audio/mpeg
    Title: Show Me The Meaning Of Being Lonely
    Artists: Backstreet Boys
    Genre: Pop

About Apache Tika

http://tika.apache.org/index.html

“The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.”

http://www.lucidimagination.com/devzone/technical-articles/content-extraction-tika#article.tika

“Apache Tika is a content type detection and content extraction framework. Tika provides a general application programming interface that can be used to detect the content type of a document and also parse textual content and metadata from several document formats. Tika does not try to understand the full variety of different document formats by itself but instead delegates the real work to various existing parser libraries such as Apache POI for Microsoft formats, PDFBox for Adobe PDF, Neko HTML for HTML etc.

The grand idea behind Tika is that it offers a generic interface for parsing multiple formats. The Tika API hides the technical differences of the various parser implementations. This means that you don’t have to learn and consume one API for every format you use but can instead use a single API – the Tika API. Internally Tika usually delegates the parsing work to existing parsing libraries and adapts the parse result so that client applications can easily manage variety of formats.

Tika aims to be efficient in using available resources (mainly RAM) while parsing. The Tika API is stream oriented so that the parsed source document does not need to be loaded into memory all at once but only as it is needed. Ultimately, however, the amount of resources consumed is mandated by the parser libraries that Tika uses.

At the time of writing this, Tika supports directly around 30 document formats. See list of supported document formats. The list of supported document formats is not limited by Tika in any way. In the simplest case you can add support for new document formats by implementing a thin adapter that that implements the Parser interface for the new document format.”

About XMP standard

http://en.wikipedia.org/wiki/extensible_metadata_platform

“The Adobe Extensible Metadata Platform (XMP) is a standard, created by Adobe Systems Inc., for processing and storing standardized and proprietary information relating to the contents of a file.

XMP standardizes the definition, creation, and processing of extensible metadata. Serialized XMP can be embedded into a significant number of popular file formats, without breaking their readability by non-XMP-aware applications. Embedding metadata avoids many problems that occur when metadata is stored separately. XMP is used in PDF, photography and photo editing applications.

XMP can be used in several file formats such as PDF, JPEG, JPEG 2000, JPEG XR, GIF, PNG, HTML, TIFF, Adobe Illustrator, PSD, MP3, MP4, Audio Video Interleave, WAV, RF64, Audio Interchange File Format, PostScript, Encapsulated PostScript, and proposed for DjVu. In a typical edited JPEG file, XMP information is typically included alongside Exif and IPTC Information Interchange Model data.”

From http://singztechmusings.wordpress.com/2011/10/17/how-to-retrieveextract-metadata-information-from-audio-files-using-java-and-apache-tika-api/
October 20, 2011
by Singaram Subramanian
· 34,185 Views
Handling PHP Sessions in Windows Azure
One of the challenges in building a distributed web application is in handling sessions. When you have multiple instances of an application running and session data is written to local files (as is the default behavior for the session handling functions in PHP), a user session can be lost when a session is started on one instance but subsequent requests are directed (via a load balancer) to other instances. To successfully manage sessions across multiple instances, you need a common data store. In this post I’ll show you how the Windows Azure SDK for PHP makes this easy by storing session data in Windows Azure Table storage.

In the 4.0 release of the Windows Azure SDK for PHP, session handling via Windows Azure Table and Blob storage was included in the newly added SessionHandler class.

Note: The SessionHandler class supports storing session data in Table storage or Blob storage. I will focus on using Table storage in this post largely because I haven’t been able to come up with a scenario in which using Blob storage would be better (or even necessary). If you have ideas about how/why Blob storage would be better, I’d love to hear them.

The SessionHandler class makes it possible to write code for handling sessions in the same way you always have, but the session data is stored in a Windows Azure Table instead of local files. To accomplish this, precede your usual session handling code with these lines:

    require_once 'Microsoft/WindowsAzure/Storage/Table.php';
    require_once 'Microsoft/WindowsAzure/SessionHandler.php';
    $storageClient = new Microsoft_WindowsAzure_Storage_Table('table.core.windows.net', 'your storage account name', 'your storage account key');
    $sessionHandler = new Microsoft_WindowsAzure_SessionHandler($storageClient, 'sessionstable');
    $sessionHandler->register();

Now you can call session_start() and other session functions as you normally would. Nicely, it just works.
Really, that’s all there is to using the SessionHandler, but I found it interesting to take a look at how it works. The first interesting thing to note is that the register method simply calls the session_set_save_handler function to map the session handling functionality to custom functions. Here’s what the method looks like from the source code:

    public function register()
    {
        return session_set_save_handler(
            array($this, 'open'),
            array($this, 'close'),
            array($this, 'read'),
            array($this, 'write'),
            array($this, 'destroy'),
            array($this, 'gc')
        );
    }

The reading, writing, and deleting of session data is only slightly more complicated. When writing session data, the key-value pairs that make up the data are first serialized and then base64 encoded. The serialization of the data allows for lots of flexibility in the data you want to store (i.e. you don’t have to worry about matching some schema in the data store). When storing data in a table, each entry must have a partition key and row key that uniquely identify it. The partition key is a string (“sessions” by default, but this is changeable in the class constructor) and the row key is the session ID. (For more information about the structure of Tables, see this post.) Finally, the data is either updated (if it already exists in the Table) or a new entry is inserted. Here’s a portion of the write function:

    $serializedData = base64_encode(serialize($serializedData));
    $sessionRecord = new Microsoft_WindowsAzure_Storage_DynamicTableEntity($this->_sessionContainerPartition, $id);
    $sessionRecord->sessionExpires = time();
    $sessionRecord->serializedData = $serializedData;
    try {
        $this->_storage->updateEntity($this->_sessionContainer, $sessionRecord);
    } catch (Microsoft_WindowsAzure_Exception $unknownRecord) {
        $this->_storage->insertEntity($this->_sessionContainer, $sessionRecord);
    }

Not surprisingly, when session data is read from the table, it is retrieved by session ID, base64 decoded, and unserialized.
Again, here’s a snippet that shows what is happening:

    $sessionRecord = $this->_storage->retrieveEntityById(
        $this->_sessionContainer,
        $this->_sessionContainerPartition,
        $id
    );
    return unserialize(base64_decode($sessionRecord->serializedData));

As you can see, the SessionHandler class makes good use of the storage APIs in the SDK. To learn more about the SessionHandler class (and the storage APIs), check out the documentation on Codeplex. You can, of course, get the complete source code here: http://phpazure.codeplex.com/SourceControl/list/changesets.

As I investigated the session handling in the Windows Azure SDK for PHP, I noticed the conspicuous absence of support for SQL Azure as a session store. I’m curious about how many people would prefer to use SQL Azure over Azure Tables as a session store. If you have an opinion on this, please let me know in the comments.
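To make the write and read paths described in this post concrete without a storage account, here is a minimal sketch that runs on its own. It is not the SDK's code: InMemoryTable and SketchSessionStore are hypothetical names, and a plain PHP array stands in for Table storage. What it does mirror is the serialize-then-base64 encoding, the partition key plus row key addressing, and the update-first, insert-on-failure pattern.

```php
<?php
// Hypothetical in-memory stand-in for Table storage (not part of the SDK).
class InMemoryTable
{
    // Rows indexed by partition key, then by row key.
    private $rows = [];

    // Fails when the entity does not exist yet, like an update against a missing row.
    public function updateEntity(string $partitionKey, string $rowKey, array $entity): void
    {
        if (!isset($this->rows[$partitionKey][$rowKey])) {
            throw new RuntimeException('Entity not found');
        }
        $this->rows[$partitionKey][$rowKey] = $entity;
    }

    public function insertEntity(string $partitionKey, string $rowKey, array $entity): void
    {
        $this->rows[$partitionKey][$rowKey] = $entity;
    }

    public function retrieveEntity(string $partitionKey, string $rowKey): ?array
    {
        return $this->rows[$partitionKey][$rowKey] ?? null;
    }
}

// Hypothetical store that follows the write/read steps described in the post.
class SketchSessionStore
{
    private $table;
    private $partitionKey;

    public function __construct(InMemoryTable $table, string $partitionKey = 'sessions')
    {
        $this->table = $table;
        $this->partitionKey = $partitionKey;
    }

    public function write(string $id, $data): void
    {
        // Serialize first, then base64 encode, so any PHP value fits in one string column.
        $entity = [
            'sessionExpires' => time(),
            'serializedData' => base64_encode(serialize($data)),
        ];
        try {
            // Try to update an existing row for this session ID...
            $this->table->updateEntity($this->partitionKey, $id, $entity);
        } catch (RuntimeException $notFound) {
            // ...and fall back to inserting a new row when there is none.
            $this->table->insertEntity($this->partitionKey, $id, $entity);
        }
    }

    public function read(string $id)
    {
        $entity = $this->table->retrieveEntity($this->partitionKey, $id);
        if ($entity === null) {
            return null; // no session stored under this ID
        }
        // Reverse the write path: base64 decode, then unserialize.
        return unserialize(base64_decode($entity['serializedData']));
    }
}

$store = new SketchSessionStore(new InMemoryTable());
$store->write('abc123', ['user' => 'alice', 'visits' => 1]); // no row yet: inserted
$store->write('abc123', ['user' => 'alice', 'visits' => 2]); // row exists: updated
$restored = $store->read('abc123');
print_r($restored);
```

The session ID 'abc123' and the stored values are made up for illustration. Trying the update first keeps the common case (an existing session) to a single storage call, at the cost of an extra call the first time a session is written.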
October 19, 2011
by Brian Swan
· 7,869 Views