Data Engineering Resources

The Latest Data Engineering Topics

Most databases have them: a small handful of big tables that grow at a substantially faster rate than any of their peers (think tweets, clicks, check-ins, etc). Over time, the sheer size of these tables cause queries against them to slow down, and increase the size and time taken for backups. In some cases, not all of this data needs to be “live” so that it can be randomly accessed forever, and can be archived once some criteria has been met. I like to think of this as Eventually Irrelevant (if I may coin a term). This post was authored by Brian Ploetz Many database products support partitioning features, which allow you to arrange data physically based on some criteria (typically date, range, list, or hash-based partitioning). These partitioning features allow you to drop partitions from the live database when appropriate. While MongoDB supports horizontal partitioning via sharding, it does not currently support the notion of dropping shards. Even if it did, the criteria for what documents to drop from a collection would most likely not be based on your shard key, so it’s unlikely you’d want to drop a shard anyways. You’d want to drop data from all shards. Capped Collections and the forthcoming TTL-based Capped Collections feature are not quite what we’re looking for either. Capped Collections have very strict rules (including not being able to shard them), and for both normal and TTL-based Capped Collections, as old documents are aged out they are simply dropped on the floor. What we need is more control over the dropping process, such that we can guarantee that our data has been copied to an archive database/data warehouse before being dropped from the main database. I set out to find the most efficient way of manually archiving documents from a large collection given the tools currently available in MongoDB. For the purposes of this exercise, let’s assume we have a large collection of ad clicks which are associated with ads. We want to archive clicks which are associated with ads which expired more than 3 months ago. MapReduce MongoDB wants you to do bulk operations using MapReduce. Since all of the examples and documentation revolve around aggregation, I suspected I wouldn’t be able to use MapReduce as the backbone of my archiving process. After some experimenting, I found that my intuition was correct. Upon attempting to insert/remove documents within the map or reduce functions, I would quickly run into exception 10293 internal error: locks are not upgradeable errors: Sun Sep 25 21:22:24 [conn5] update warehouse.clicks query: { _id: ObjectId('4d6bf191db0b4b2b1fad5b65') } exception 10293 internal error: locks are not upgradeable: { "opid" : 4248173, "active" : false, "waitingForLock" : false, "op" : "update", "ns" : "?", "query" : { "_id" : { "$oid" : "4d6bf191db0b4b2b1fad5b65" } }, "client" : "0.0.0.0:0", "desc" : "conn" } 0ms Sun Sep 25 21:22:24 [conn5] Assertion: 10293:internal error: locks are not upgradeable: { "opid" : 4248174, "active" : false, "waitingForLock" : false, "op" : "update", "ns" : "?", "query" : { "_id" : { "$oid" : "4d6bf198db0b4b2b20ad5b65" } }, "client" : "0.0.0.0:0", "desc" : "conn" } 0x10008de9b 0x1002064f7 0x1002c1ec6 0x1002c4f3d 0x1002c5f7a 0x1000a8181 0x100147df4 0x1004dc68e 0x1004efc93 0x1004dc71b 0x1004dcb71 0x10049a078 0x100158efa 0x100377760 0x100389d0c 0x10034c204 0x10034d877 0x100180cc4 0x100184649 0x1002b9e89 0 mongod 0x000000010008de9b _ZN5mongo11msgassertedEiPKc + 315 1 mongod 0x00000001002064f7 _ZN5mongo10MongoMutex19_writeLockedAlreadyEv + 263 2 mongod 0x00000001002c1ec6 _ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE + 886 3 mongod 0x00000001002c4f3d _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5661 4 mongod 0x00000001002c5f7a _ZN5mongo14DBDirectClient3sayERNS_7MessageE + 106 5 mongod 0x00000001000a8181 _ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb + 273 6 mongod 0x0000000100147df4 _ZN5mongo12mongo_updateEP9JSContextP8JSObjectjPlS4_ + 660 7 mongod 0x00000001004dc68e js_Invoke + 3864 8 mongod 0x00000001004efc93 js_Interpret + 71932 9 mongod 0x00000001004dc71b js_Invoke + 4005 10 mongod 0x00000001004dcb71 js_InternalInvoke + 404 11 mongod 0x000000010049a078 JS_CallFunction + 86 12 mongod 0x0000000100158efa _ZN5mongo7SMScope6invokeEyRKNS_7BSONObjEib + 666 13 mongod 0x0000000100377760 _ZN5mongo2mr8JSMapper3mapERKNS_7BSONObjE + 96 14 mongod 0x0000000100389d0c _ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb + 1740 15 mongod 0x000000010034c204 _ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb + 628 16 mongod 0x000000010034d877 _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi + 2151 17 mongod 0x0000000100180cc4 _ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi + 52 18 mongod 0x0000000100184649 _ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 10585 19 mongod 0x00000001002b9e89 _ZN5mongo13receivedQueryERNS_6ClientERNS_10DbResponseERNS_7MessageE + 569 Upon killing the client which kicked off the MapReduce process, theses errors would continue to spin out of control, and I’d have to bounce the server. While MapReduce’s divide and conquer approach is ideal for what we’re trying to accomplish, unless the ability to obtain a write lock within a MapReduce function is supported, it is not a viable option for now. db.eval() While a server side script is certainly a viable option for writing an archiving process, it has a fundamental flaw in that it won’t work for sharded collections, which could be a non-starter for some folks. An example archiving process using db.eval() would look something like this: archiveClick = function archiveClick(doc) { if (expiredAdIds.indexOf(doc.ad_id) != -1) { archivedb = db.getMongo().getDB("archive"); archivedb.clicks.save(doc); // before we remove the original document, make sure the archive worked if (archivedb.getLastError() == null) { db.clicks.remove({_id: doc._id}); } else { throw "could not archive document with _id " + doc._id; } } } archiveClicks = function archiveClicks() { threeMonthsAgo = new Date(); threeMonthsAgo.setMonth(threeMonthsAgo.getMonth()-3); expiredAdIds = []; getExpiredAds = function(ad) {expiredAdIds.push(ad._id.str);} db.ads.find({end_date: {$lte: threeMonthsAgo}).forEach(getExpiredAds); db.system.js.save({_id: "expiredAdIds", value: expiredAdIds}); db.clicks.find().forEach(archiveClick); db.system.js.remove({_id: "expiredAdIds"}); } db.system.js.save({_id: "archiveClick", value: archiveClick}); db.system.js.save({_id: "archiveClicks", value: archiveClicks}); db.runCommand({$eval: "archiveClicks()", nolock: true}); Note the nolock: true sent to db.eval(). This ensures the mongod isn’t locked for other operations while this runs. The Holy Grail: Define Custom archive() Functions On Collections In order for our archiving process to work for sharded collections, we will need to add new functions to collection objects in the shell to perform the archiving. Frankly, this feels a lot more natural than db.eval(). I wanted to support two modes of archiving: immediate archiving, and a mark & sweep approach which allows you to queue documents to be archived at one point in time, and perform the actual archiving at a later time (off-peak hours). I’ve made these functions available in my mongodb-archive project on GitHub. Download the archive.js file, and use it like so: load("archive.js"); threeMonthsAgo = new Date(); threeMonthsAgo.setMonth(threeMonthsAgo.getMonth()-3); expiredAdIds = []; getExpiredAds = function(ad) {expiredAdIds.push(ad._id.str);} db.ads.find({end_date: {$lte: threeMonthsAgo}).forEach(getExpiredAds); archiveConnection = new Mongo("localhost:27018"); archiveDB = archiveConnection.getDB("archive"); archiveCollection = archiveDB.getCollection("clicks"); for (var i = 0; i < expiredAdIds.length; i++) { print("archiving clicks for ad_id: " + expiredAdIds[i]); db.clicks.archive({"ad_id": expiredAdIds[i]}, archiveCollection); print(""); } The mark/sweep variant would look like: db.clicks.queueForArchive({"ad_id": expiredAdIds[i]}); // ....some time later..... db.clicks.archiveQueued(archiveCollection); The biggest factors on the performance of the archiving process are really no different than the things that impact the performance of your MongoDB database overall: working set size, indexes, disk speed, and lock contention. The more indexes there are on the source collection, the longer the remove() will take. The more indexes there are on the archive collection, the longer the save() will take. The more of the data that you wish to archive is resident in memory, the faster the process will be. On my MacBook Pro (2.3GHz i7, 8GB RAM) with a 5400 RPM SATA drive and a clicks collection containing over 11 million documents, I was able to archive ~500 documents per second with the immediate archiving approach. Example output from mongostat when this was running: Archive DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 1 0 0 3 0 208m 2.85g 15m 3.8 0 0|0 0|0 1k 1k 2 17:13:02 0 0 45 0 0 46 0 208m 2.85g 15m 3.4 0 0|0 0|0 61k 6k 2 17:13:03 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 8 0 0 9 0 208m 2.85g 15m 0.1 0 0|0 0|0 10k 2k 2 17:13:04 0 0 47 0 0 48 0 208m 2.85g 15m 3.4 0 0|0 0|0 66k 7k 2 17:13:05 0 0 0 0 0 1 1 208m 2.85g 15m 0 0 0|0 0|0 62b 1k 2 17:13:06 0 0 18 0 0 19 0 208m 2.85g 15m 2.6 0 0|0 0|0 24k 3k 2 17:13:07 0 0 79 0 0 80 0 208m 2.85g 16m 1.1 0 0|0 0|0 111k 11k 2 17:13:08 0 0 131 0 0 132 0 208m 2.85g 16m 8.8 0 0|0 0|0 181k 17k 2 17:13:09 0 0 216 0 0 217 0 208m 2.85g 16m 3.8 0 0|0 0|0 291k 27k 2 17:13:10 0 0 425 0 0 426 0 208m 2.85g 17m 10.5 0 0|0 0|0 600k 53k 2 17:13:11 0 0 490 0 0 490 0 208m 2.85g 19m 7.3 0 0|0 0|1 658k 61k 2 17:13:12 0 0 630 0 0 632 0 208m 2.85g 20m 11.3 0 0|0 0|0 844k 78k 2 17:13:13 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 818 0 0 818 0 208m 2.85g 22m 13.6 0 0|0 0|0 1m 101k 2 17:13:14 0 0 686 0 0 688 0 208m 2.85g 21m 9.6 0 0|0 0|0 932k 85k 2 17:13:15 0 0 0 0 0 1 0 208m 2.85g 22m 0 0 0|0 0|0 62b 1k 2 17:13:16 0 0 178 0 0 179 0 208m 2.85g 22m 1.2 0 0|0 0|0 252k 23k 2 17:13:17 0 0 974 0 0 975 0 208m 2.85g 23m 12.3 0 0|0 0|0 1m 121k 2 17:13:18 0 0 920 0 0 921 0 208m 2.85g 26m 10.1 0 0|0 0|0 1m 114k 2 17:13:19 0 0 810 0 0 811 0 208m 2.85g 26m 8.1 0 0|0 0|0 1m 100k 2 17:13:20 0 0 612 0 0 613 0 208m 2.85g 27m 8.1 0 0|0 0|0 798k 76k 2 17:13:21 0 0 0 0 0 1 0 208m 2.85g 27m 0 0 0|0 0|0 62b 1k 2 17:13:22 0 0 0 0 0 1 0 208m 2.85g 27m 0 0 0|0 0|0 62b 1k 2 17:13:23 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 30 0 0 31 0 208m 2.85g 27m 1.1 0 0|0 0|0 46k 5k 2 17:13:24 0 0 852 0 0 853 0 208m 2.85g 27m 14.8 0 0|0 0|0 1m 106k 2 17:13:25 0 0 768 0 0 769 0 208m 2.85g 29m 9.5 0 0|0 0|0 1m 95k 2 17:13:26 0 0 867 0 0 868 0 208m 2.85g 32m 13.4 0 0|0 0|0 1m 107k 2 17:13:27 0 0 705 0 0 706 0 208m 2.85g 31m 10.8 0 0|0 0|0 936k 88k 2 17:13:28 0 0 331 0 0 332 0 208m 2.85g 32m 3.9 0 0|0 0|0 446k 42k 2 17:13:29 0 0 0 0 0 1 0 208m 2.85g 32m 0 0 0|0 0|0 62b 1k 2 17:13:30 0 0 551 0 0 552 0 208m 2.85g 32m 6.1 0 0|0 0|0 739k 69k 2 17:13:31 0 0 728 0 0 729 0 208m 2.85g 35m 12.2 0 0|0 0|0 949k 90k 2 17:13:32 0 0 991 0 0 992 0 208m 2.85g 35m 11.7 0 0|0 0|0 1m 123k 2 17:13:33 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 935 0 0 935 0 208m 2.85g 38m 10.9 0 0|0 0|1 1m 116k 2 17:13:34 0 0 431 0 0 433 0 208m 2.85g 39m 5.9 0 0|0 0|0 563k 54k 2 17:13:35 0 0 0 0 0 1 0 208m 2.85g 39m 0 0 0|0 0|0 62b 1k 2 17:13:36 0 0 518 0 0 519 0 208m 2.85g 40m 5 0 0|0 0|0 693k 65k 2 17:13:37 0 0 904 0 0 905 0 208m 2.85g 40m 8.9 0 0|0 0|0 1m 112k 2 17:13:38 0 0 958 0 0 959 0 208m 2.85g 41m 14.2 0 0|0 0|0 1m 119k 2 17:13:39 0 0 899 0 0 900 0 208m 2.85g 44m 11.1 0 0|0 0|0 1m 111k 2 17:13:40 0 0 401 0 0 402 0 208m 2.85g 42m 5.9 0 0|0 0|0 540k 50k 2 17:13:41 0 0 856 0 0 857 0 208m 2.85g 44m 9.4 0 0|0 0|0 1m 106k 2 17:13:42 0 0 232 0 0 233 0 208m 2.85g 45m 2.9 0 0|0 0|0 311k 29k 1 17:13:43 Source DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 2 0 12 1 13 0 56g 115g 2.82g 52.2 0 0|0 0|1 1k 822k 2 17:13:02 0 0 0 42 0 43 0 56g 115g 2.65g 88.1 0 0|0 0|1 5k 4k 2 17:13:03 0 0 0 14 0 15 1 56g 115g 1.88g 101 0 0|0 0|1 1k 2k 2 17:13:04 0 0 0 33 1 35 0 56g 115g 1.89g 53.8 16.6 0|0 1|0 4k 4k 2 17:13:05 0 0 0 0 0 1 0 56g 115g 1.9g 0 0 0|0 1|0 62b 1k 2 17:13:06 0 0 0 34 0 35 0 56g 115g 1.9g 47.3 0 0|0 0|0 4k 4m 2 17:13:07 0 0 0 91 0 92 0 56g 115g 1.89g 86.8 0 0|0 0|0 12k 8k 2 17:13:08 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 149 0 150 0 56g 115g 1.9g 82.5 0 0|0 0|0 20k 13k 2 17:13:09 0 0 0 287 0 287 0 56g 115g 1.9g 74.3 0 0|0 0|1 38k 25k 2 17:13:10 0 0 0 387 0 389 0 56g 115g 1.91g 64.6 0 0|0 0|0 52k 33k 2 17:13:11 0 0 0 580 0 581 0 56g 115g 1.93g 59 0 0|0 0|0 78k 49k 2 17:13:12 0 0 0 685 0 686 0 56g 115g 1.93g 47.9 0 0|0 0|0 93k 58k 2 17:13:13 0 0 0 729 0 730 0 56g 115g 1.95g 42.6 0 0|0 0|0 99k 61k 2 17:13:14 0 0 0 551 1 552 0 56g 115g 1.97g 34.5 0 0|0 1|0 74k 47k 2 17:13:15 0 0 0 0 0 1 0 56g 115g 1.98g 0 0 0|0 1|0 62b 1k 2 17:13:16 0 0 0 368 0 368 0 56g 115g 1.96g 20 0 0|0 0|1 50k 4m 2 17:13:17 0 0 0 1032 0 1034 0 56g 115g 2g 35.9 0 0|0 0|0 140k 87k 2 17:13:18 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 834 0 835 0 56g 115g 2.01g 42.3 0 0|0 0|0 113k 70k 2 17:13:19 0 0 0 922 0 922 0 56g 115g 2.02g 38.9 0 0|0 0|1 125k 77k 2 17:13:20 0 0 0 338 1 340 0 56g 115g 2.03g 13 0 0|0 1|0 46k 29k 2 17:13:21 0 0 0 0 0 1 0 56g 115g 2.04g 0 0 0|0 1|0 62b 1k 2 17:13:22 0 0 0 0 0 1 0 56g 115g 2.04g 0 0 0|0 1|0 62b 1k 2 17:13:23 0 0 0 185 0 186 0 56g 115g 2.02g 24.7 0 0|0 0|0 25k 4m 2 17:13:24 0 0 0 920 0 920 0 56g 115g 2.02g 29.9 0 0|0 0|1 125k 77k 2 17:13:25 0 0 0 741 0 742 0 56g 115g 2.04g 49 0 0|0 0|1 100k 62k 2 17:13:26 0 0 0 886 0 887 0 56g 115g 2.05g 33 0 0|0 0|1 120k 74k 2 17:13:27 0 0 0 683 0 684 0 56g 115g 2.08g 57.3 0 0|0 0|1 92k 58k 2 17:13:28 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 138 1 140 0 56g 115g 2.06g 14.5 0 0|0 1|0 18k 12k 2 17:13:29 0 0 0 12 0 12 0 56g 115g 2.07g 0.2 0 0|0 0|1 1k 4m 2 17:13:30 0 0 0 761 0 763 0 56g 115g 2.08g 46.5 0 0|0 0|0 103k 64k 2 17:13:31 0 0 0 762 0 762 0 56g 115g 2.09g 47 0 0|0 0|1 103k 64k 2 17:13:32 0 0 0 957 0 959 0 56g 115g 2.1g 35.3 0 0|0 0|0 130k 80k 2 17:13:33 0 0 0 905 0 905 0 56g 115g 2.13g 38.5 0 0|1 0|1 123k 76k 2 17:13:34 0 0 0 239 1 241 0 56g 115g 2.12g 14.9 0 0|0 1|0 32k 21k 2 17:13:35 0 0 0 0 0 1 0 56g 115g 2.13g 0 0 0|0 1|0 62b 1k 2 17:13:36 0 0 0 780 0 781 0 56g 115g 2.13g 42.8 0 0|0 0|0 106k 4m 2 17:13:37 0 0 0 805 0 806 0 56g 115g 2.14g 39.8 0 0|0 0|0 109k 68k 2 17:13:38 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 988 0 989 0 56g 115g 2.16g 34.8 0 0|0 0|0 134k 83k 2 17:13:39 0 0 0 953 0 953 0 56g 115g 2.17g 39.9 0 0|0 0|1 129k 80k 2 17:13:40 0 0 0 375 1 376 0 56g 115g 2.18g 13.9 0 0|1 0|1 51k 1m 2 17:13:41 0 0 0 867 0 869 0 56g 115g 2.19g 41.7 0 0|0 0|0 118k 73k 1 17:13:42 Total: MongoDB shell version: 2.0.1 connecting to: localhost:27017/prodcopy archiving clicks for ad_id: 4d6c12497b90420373000056 archiving documents... archived 19045 documents in 40701ms. For the queued approach with this same set up, I was able to queue ~1200 documents per second, and the archiving process performed at about the same ~500 documents per second. Example output from mongostat while this was running: Archive DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:44 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:45 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:46 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:47 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:48 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:49 0 0 248 0 0 249 0 208m 2.85g 17m 6 0 0|0 0|0 283k 31k 2 17:16:50 0 0 780 0 0 781 0 208m 2.85g 18m 8.2 0 0|0 0|0 883k 97k 2 17:16:51 0 0 1256 0 0 1257 0 208m 2.85g 22m 13.7 0 0|0 0|0 1m 155k 2 17:16:52 0 0 1084 0 0 1085 0 208m 2.85g 21m 10 0 0|0 0|0 1m 134k 2 17:16:53 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 1108 0 0 1109 0 208m 2.85g 25m 9.8 0 0|0 0|0 1m 137k 2 17:16:54 0 0 1108 0 0 1109 0 208m 2.85g 24m 9.8 0 0|0 0|0 1m 137k 2 17:16:55 0 0 1110 0 0 1111 0 208m 2.85g 27m 11 0 0|0 0|0 1m 137k 2 17:16:56 0 0 1166 0 0 1167 0 208m 2.85g 31m 11.3 0 0|0 0|0 1m 144k 2 17:16:57 0 0 1246 0 0 1247 0 208m 2.85g 31m 10.6 0 0|0 0|0 1m 154k 2 17:16:58 0 0 1094 0 0 1095 0 208m 2.85g 35m 9.6 0 0|0 0|0 1m 135k 2 17:16:59 0 0 957 0 0 958 0 208m 2.85g 33m 9.6 0 0|0 0|0 1m 119k 2 17:17:00 0 0 721 0 0 722 0 464m 3.35g 38m 8.9 0 0|0 0|0 1m 90k 2 17:17:01 0 0 243 0 0 243 0 464m 3.35g 34m 9.7 0 0|0 0|0 445k 31k 2 17:17:02 0 0 921 0 0 923 0 464m 3.35g 37m 30.1 0 0|0 0|0 1m 114k 2 17:17:03 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 668 0 0 669 0 464m 3.35g 38m 9.6 0 0|0 0|0 962k 83k 2 17:17:04 0 0 995 0 0 995 0 464m 3.35g 41m 21.2 0 0|0 0|1 1m 123k 2 17:17:05 0 0 467 0 0 468 0 464m 3.35g 27m 12 0 0|1 0|1 679k 58k 2 17:17:06 0 0 1 0 0 2 1 464m 3.35g 17m 85.5 0 0|1 0|1 1k 1k 2 17:17:07 0 0 337 0 0 338 0 464m 3.35g 19m 36.2 0 0|0 0|1 499k 42k 2 17:17:08 0 0 917 0 0 919 0 464m 3.35g 21m 20.4 0 0|0 0|0 1m 114k 2 17:17:09 0 0 1022 0 0 1022 0 464m 3.35g 25m 22.7 0 0|0 0|1 1m 126k 2 17:17:10 0 0 970 0 0 971 0 464m 3.35g 24m 27.4 0 0|0 0|1 1m 120k 2 17:17:11 0 0 166 0 0 168 0 464m 3.35g 25m 1.2 0 0|0 0|0 251k 21k 2 17:17:12 0 0 862 0 0 863 0 464m 3.35g 28m 23.1 0 0|0 0|0 1m 107k 2 17:17:13 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 888 0 0 889 0 464m 3.35g 28m 28.8 0 0|0 0|0 1m 110k 2 17:17:14 0 0 540 0 0 541 0 464m 3.35g 30m 12.2 0 0|0 0|0 802k 67k 2 17:17:15 0 0 879 0 0 880 0 464m 3.35g 31m 22.3 0 0|0 0|0 1m 109k 2 17:17:16 0 0 473 0 0 474 0 464m 3.35g 36m 17.2 0 0|0 0|0 1m 59k 1 17:17:17 Source DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 56g 115g 1.86g 0 0 0|0 0|0 62b 1k 1 17:16:29 0 0 0 0 0 1 0 56g 115g 1.86g 0 0 0|0 0|0 62b 1k 1 17:16:30 1 1 1 0 1 3 0 56g 115g 1.87g 51.1 0 0|0 0|1 475b 697k 2 17:16:31 0 0 0 0 0 1 0 56g 115g 349m 129 0 0|0 0|1 62b 1k 2 17:16:32 0 0 0 0 0 1 0 56g 115g 367m 101 0 0|0 0|1 62b 1k 2 17:16:33 0 0 0 0 0 1 0 56g 115g 384m 73.4 0 0|0 0|1 62b 1k 2 17:16:34 0 0 0 0 0 1 0 56g 115g 428m 112 0 0|0 0|1 62b 1k 2 17:16:35 0 0 0 0 0 1 0 56g 115g 436m 94.8 0 0|0 0|1 62b 1k 2 17:16:36 0 0 0 0 0 1 0 56g 115g 450m 108 0 0|0 0|1 62b 1k 2 17:16:37 0 0 0 0 0 1 0 56g 115g 461m 110 0 0|0 0|1 62b 1k 2 17:16:38 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 56g 115g 480m 95.5 0 0|0 0|1 62b 1k 2 17:16:39 0 0 0 0 0 1 0 56g 115g 494m 73.5 0 0|0 0|1 62b 1k 2 17:16:40 0 0 0 0 0 1 0 56g 115g 531m 120 0 0|0 0|1 62b 1k 2 17:16:41 0 0 0 0 0 1 0 56g 115g 541m 108 0 0|0 0|1 62b 1k 2 17:16:42 0 0 0 0 0 1 0 56g 115g 564m 74.6 0 0|0 0|1 62b 1k 2 17:16:43 0 0 0 0 0 1 0 56g 115g 579m 127 0 0|0 0|1 62b 1k 2 17:16:44 0 0 0 0 0 1 0 56g 115g 605m 100 0 0|0 0|1 62b 1k 2 17:16:45 0 0 0 0 0 1 0 56g 115g 631m 98.5 0 0|0 0|1 62b 1k 2 17:16:46 0 0 0 0 0 1 0 56g 115g 657m 89.1 0 0|0 0|1 62b 1k 2 17:16:47 0 0 0 0 0 1 0 56g 115g 662m 87.8 0 0|0 0|1 62b 1k 2 17:16:48 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 56g 115g 680m 93.5 0 0|0 0|1 62b 1k 2 17:16:49 0 1 0 530 1 531 0 56g 115g 767m 81.3 0 0|0 0|1 72k 4m 2 17:16:50 0 0 0 884 0 886 0 56g 115g 721m 48 0 0|0 0|0 120k 74k 2 17:16:51 0 0 0 1079 0 1080 0 56g 115g 698m 33.6 0 0|0 0|0 146k 90k 2 17:16:52 0 0 0 1294 0 1294 0 56g 115g 720m 23.8 0 0|1 0|1 175k 108k 2 17:16:53 0 0 0 1118 1 1120 0 56g 115g 740m 31.9 0 0|0 0|0 152k 4m 2 17:16:54 0 0 0 1111 0 1112 0 56g 115g 718m 33.3 0 0|0 0|0 151k 93k 2 17:16:55 0 0 0 1097 0 1098 0 56g 115g 706m 33.1 0 0|0 0|0 149k 92k 2 17:16:56 0 0 0 1097 0 1098 0 56g 115g 701m 33.2 0 0|0 0|0 149k 92k 2 17:16:57 0 0 0 1091 1 1092 0 56g 115g 696m 35.1 0 0|0 0|0 148k 4m 2 17:16:58 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 1149 0 1150 0 56g 115g 718m 21.3 0 0|0 0|0 156k 96k 2 17:16:59 0 0 0 1091 0 1091 0 56g 115g 707m 31.2 0 0|0 0|1 148k 91k 2 17:17:00 0 0 0 468 0 470 0 56g 115g 714m 9.3 0 0|0 0|0 63k 40k 2 17:17:01 0 0 0 332 0 333 0 56g 115g 694m 27.3 0 0|0 0|0 45k 28k 2 17:17:02 0 0 0 1013 1 1014 0 56g 115g 713m 17.8 0 0|0 0|0 137k 4m 2 17:17:03 0 0 0 551 0 552 0 56g 115g 691m 39.9 0 0|0 0|0 74k 47k 2 17:17:04 0 0 0 1140 0 1141 0 56g 115g 709m 19.7 0 0|0 0|0 155k 95k 2 17:17:05 0 0 0 126 0 127 0 56g 115g 709m 2.1 0 0|0 0|0 17k 11k 2 17:17:06 0 0 0 92 0 92 0 56g 115g 710m 1.6 0 0|0 0|1 12k 8k 2 17:17:07 0 0 0 605 1 607 0 56g 115g 700m 29.1 0 0|0 0|0 82k 4m 2 17:17:08 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 814 0 814 0 56g 115g 699m 17.3 0 0|0 0|1 110k 68k 2 17:17:09 0 0 0 1036 0 1037 0 56g 115g 708m 18.4 0 0|0 0|0 140k 87k 2 17:17:10 0 0 0 820 0 821 0 56g 115g 715m 14.4 0 0|1 0|1 111k 69k 2 17:17:11 0 0 0 298 0 300 0 56g 115g 683m 24.2 0 0|0 0|0 40k 26k 2 17:17:12 0 0 0 907 1 908 0 56g 115g 699m 20.4 0 0|0 0|0 123k 4m 2 17:17:13 0 0 0 946 0 947 0 56g 115g 707m 16.4 0 0|0 0|0 128k 79k 2 17:17:14 0 0 0 374 0 375 0 56g 115g 677m 36.7 0 0|0 0|0 50k 32k 2 17:17:15 0 0 0 905 1 905 0 56g 115g 689m 16.2 0 0|0 0|0 123k 821k 2 17:17:16 0 0 0 259 0 261 0 56g 115g 690m 5.1 0 0|0 0|0 35k 22k 1 17:17:17 0 0 0 0 0 1 0 56g 115g 690m 0 0 0|0 0|0 62b 1k 1 17:17:18 Total: MongoDB shell version: 2.0.1 connecting to: localhost:27017/prodcopy archiving clicks for ad_id: 4d6c12497b90420373000062 ensuring sparse index on field "arch" marking documents for archive... queued 22227 documents for archive in 19369ms. archiving queued documents to archive.clicks... archived 22227 queued documents to archive.clicks in 26901ms. Upgrading the hard drive to an SSD, I was able to almost triple the performance of the archiving step, and archive ~1300 documents per second. Obviously this is all a giant hack, and it would be nice if MongoDB supported some kind of partitioning feature so we didn’t have to do any of this manually. But until then, this is the best I could get given the tools currently available. Source: http://blog.brianploetz.com/post/12131083486/adventures-in-archiving-with-mongodb

November 21, 2011

by Mitch Pronschinske

· 14,395 Views

Freight Management System on NetBeans

Lynden is a family of transportation and logistics companies specialized in shipping to Alaska and other locations worldwide. Over land, on the water, in the air - or in any combination - Lynden has been helping customers solve transportation problems for over a century. The Lynden Freight Management System is a NetBeans Platform application which serves a dual purpose as both a planning and freight tracking tool. The Planning module allows terminal managers to see all freight that is currently inbound to their location as well as freight that is scheduled to depart from their location so they can make the most efficient use of their dock space and resources as possible. The Trace module allows customer service personnel to search for customer account information, view the tracking history of any given freight item in the system as well as display any documents related to the shipment, such as bills of lading or delivery receipts. NetBeans Platform Lynden has benefited from the NetBeans Platform as it allows developers to focus on the business logic of our applications rather than the underlying "plumbing". We are able to leverage built-in support for event handling, enable/disable functionality on UI controls, dockable windows, and automatic updates for our application with minimal work compared to rolling our own framework. We chose to go the desktop application route as we have a number of existing desktop applications here that this application will likely need to interface with at some point, as well as a commercial set of rich UI components that we have been using for some time now. For the initial deployment, we will be pushing the installer out to employee PCs via the Landesk remote desktop administration tool. Future updates to various modules within the application will be done via the update center functionality built into the NetBeans Platform. Screenshots

November 19, 2011

by Rob Terpilowski

· 11,765 Views · 3 Likes

Tackling the Circular Dependency in Java...

Let me first define what we mean by circular dependency in OOAD terms vis-a-vis Java. Suppose we have a class called A which has class B’s Object. (in UML terms A HAS B). at the same time we class B is also composed of Object of class A (in UML terms B HAS A). obviously this represents circular dependency because while creating the object of A, the compiler must know the size of B... on the other hand while creating object of B, the compiler must know the size of A. this is something like egg vs. chicken problem... this may be possible in real life situation as well. for example suppose a multi storied building has a lift. so in the UML terms, the building HAS lift... but at the same time, suppose, while constructing the lift object, we need to give it the information about the building object to access various functionalities of the Building class... for example, suppose the speed of the lift is set depending on the number of floors of the Building... hence while constructing the Lift object it must access the functionalities of the Building object which will give the number of floors the building has got...hence in UML terms the lift HAS building... so this is a sure case of circular dependency... in real java code it will look something as follows: public class Building { private Lift lift; private int floor; public Building(){ lift = new Lift(); setFloor(15); } public int getFloor(){ return floor; } public void setFloor(int floor){ this.floor = floor; } }//end of class building //class Lift public class Lift { private Building building; private int Speed; public Lift(){ building = new Building(); setSpeed(); } public void setSpeed(){ if (building.getFloor()>20){ //one set of functionalities //may be the the speed of the lift will be more this.Speed = 10; } else { //different set of functionalities //may be the speed of the lift will be less this.Speed = 5; } } public int getSpeed(){ return Speed; } }//end of class Lift As it becomes clear from the above code, that while creating the Building object it will create the Lift object, and while creating the Lift object, it will try to create a Building object to access some of its functionalities... So, ultimately it will go out of memory and we get a StackOverflow runtime exception... So how do we handle this problem in Java? We actually tackle this problem by declaring an IBuildingProxy interface and by deriving our Building class from that... the lift class, instead of Having Building object, it Has IBuildingProxy... the source code of the solution looks like the following... public interface IBuildingProxy { int getFloor(); void setFloor(int floor); } public class Building implements IBuildingProxy{ private Lift lift; private int floor; public Building(){ lift = new Lift(this); setFloor(15); } public int getFloor(){ return floor; } public void setFloor(int floor){ this.floor = floor; } } public class Lift { private IBuildingProxy building; private int Speed; public Lift(Building b){ this.building = b; setSpeed(); } private void setSpeed(){ if (building.getFloor()>20){ / /one set of functionalities //may be the the speed of the lift will be more this.Speed = 10; } else { //different set of functionalities //may be the speed of the lift will be less this.Speed = 5; } } public int getSpeed(){ return Speed; } } public class CircularDependencyTest { public static void main(String[] args){ Building b = new Building(); Lift l = new Lift(b); } } So whats the principle behind such work around... It will be clear soon... As it becomes clear from the code that Building HAS Lift... That is not a problem... Now when it comes to solve the part that Lift HAS Building, instead of the Building object, we have created an IBuildingProxy interface and we pass it to the Lift class... what it essentially means, that the building class knows the memory requirement to initialize the Lift object, and as the Lift class HAS just a proxy interface of the Building, it does not have to care for the Building's memory requirement... and that solves the problem... Hope this discussion becomes helpful for Java learners...

November 17, 2011

by Somenath Mukhopadhyay

· 32,488 Views · 2 Likes

ASP.NET MVC: Converting business objects to select list items

Some of our business classes are used to fill dropdown boxes or select lists. And often you have some base class for all your business classes. In this posting I will show you how to use base business class to write extension method that converts collection of business objects to ASP.NET MVC select list items without writing a lot of code. BusinessBase, BaseEntity and other base classes I prefer to have some base class for all my business classes so I can easily use them regardless of their type in contexts I need. NB! Some guys say that it is good idea to have base class for all your business classes and they also suggest you to have mappings done same way in database. Other guys say that it is good to have base class but you don’t have to have one master table in database that contains identities of all your business objects. It is up to you how and what you prefer to do but whatever you do – think and analyze first, please. :) To keep things maximally simple I will use very primitive base class in this example. This class has only Id property and that’s it. public class BaseEntity { public virtual long Id { get; set; } } Now we have Id in base class and we have one more question to solve – how to better visualize our business objects? To users ID is not enough, they want something more informative. We can define some abstract property that all classes must implement. But there is also another option we can use – overriding ToString() method in our business classes. public class Product : BaseEntity { public virtual string SKU { get; set; } public virtual string Name { get; set; } public override string ToString() { if (string.IsNullOrEmpty(Name)) return base.ToString(); return Name; } } Although you can add more functionality and properties to your base class we are at point where we have what we needed: identity and human readable presentation of business objects. Writing list items converter Now we can write method that creates list items for us. public static class BaseEntityExtensions { public static IEnumerable ToSelectListItems (this IList baseEntities) where T : BaseEntity { return ToSelectListItems((IEnumerator) baseEntities.GetEnumerator()); } public static IEnumerable ToSelectListItems (this IEnumerator baseEntities) { var items = new HashSet(); while (baseEntities.MoveNext()) { var item = new SelectListItem(); var entity = baseEntities.Current; item.Value = entity.Id.ToString(); item.Text = entity.ToString(); items.Add(item); } return items; } } You can see here to overloads of same method. One works with List and the other with IEnumerator. Although mostly my repositories return IList when querying data there are always situations where I can use more abstract types and interfaces. Using extension methods in code In your code you can use ToSelectListItems() extension methods like shown on following code fragment. ... var model = new MyFormModel(); model.Statuses = _myRepository.ListStatuses().ToSelectListItems(); ... You can call this method on all your business classes that extend your base entity. Wanna have some fun with this code? Write overload for extension method that accepts selected item ID.

November 11, 2011

by Gunnar Peipman

· 13,648 Views

Creating a backup in Team Foundation Server 2010 using the Power Tools

over the last few years the product team has been putting their finishing touches on a backup module for the team foundation server administration console. why you might ask do you need another way to backup? surely you can just backup the bits? well, you could, but as tfs has a lot of moving parts it can get very complicated to creating a backup . required permissions identify databases create tables in databases create a stored procedure for marking tables create a stored procedure for marking all tables at once create a stored procedure to automatically mark tables create a scheduled job to run the table-marking procedure create a maintenance plan for full backups create a maintenance plan for differential backups create a maintenance plan for transaction backups back up additional lab management components -from “back up team foundation server” on msdn there are a heck of a lot of databases that, depending on your environment, might be spread over your entire network. figure: deployment topologies (where is my data?) from msdn so, how is this problem solved. well the tfs team have create a tool to create all of the backups and all of the job as well as managing the backup location for you. this sounds fantastic, but how about in practice. was it really that easy? well….not really…here is the extra stuff i found out: your account must own the share owning the folder does not cut it (see error #1- tf254027). sql must be running under a domain account or network service sql must also have permission to the share, and the validation will get confused if you use “localsystem” instead of network service or a domain account (see error #2- tf254027) the account running sql must have permission to create spn’s the account that is used for sql must be able to both see and create service principal names in active directory (see error #3: terminating your tfs server). once you learn how to google without keywords and read your servers mind you will have a nice backup system going… error #1- tf254027 i initially got an error because the accounts did not really have full control over the target location. this is a problem with the share. although i have full permission for \\fileserver1\share\tfsbackups it is just a folder under the \\fileserver1\share\ location and i do not have permission to change the sharing settings there. figure: tf254027 is caused by permission issues [info @16:36:34.342] granting account root_company\tfssqlbox$ permission on folder \\fileserver1\share\tfsbackups [info @16:36:34.348] system.unauthorizedaccessexception: attempted to perform an unauthorized operation. at system.security.accesscontrol.win32.setsecurityinfo(resourcetype type, string name, safehandle handle, securityinfos securityinformation, securityidentifier owner, securityidentifier group, genericacl sacl, genericacl dacl) at system.security.accesscontrol.nativeobjectsecurity.persist(string name, safehandle handle, accesscontrolsections includesections, object exceptioncontext) at system.security.accesscontrol.filesystemsecurity.persist(string fullpath) at microsoft.teamfoundation.powertools.admin.helpers.filehelper.grantfolderpermission(string account, string path) [info @16:36:34.350] granting account root_company\tfs.services permission on folder \\fileserver1\share\tfsbackups [info @16:36:34.352] system.unauthorizedaccessexception: attempted to perform an unauthorized operation. at system.security.accesscontrol.win32.setsecurityinfo(resourcetype type, string name, safehandle handle, securityinfos securityinformation, securityidentifier owner, securityidentifier group, genericacl sacl, genericacl dacl) at system.security.accesscontrol.nativeobjectsecurity.persist(string name, safehandle handle, accesscontrolsections includesections, object exceptioncontext) at system.security.accesscontrol.filesystemsecurity.persist(string fullpath) at microsoft.teamfoundation.powertools.admin.helpers.filehelper.grantfolderpermission(string account, string path) [error @16:36:34.352] granting permission to account root_company\tfssqlbox$ on path \\fileserver1\share\tfsbackups failed figure: the log files get to the root of the problem, but not the reason after much messing around i have found that you can’t use a sub-folder of a share that you do not have permission for. you require permission to the share itself to apply permissions. error #2- tf254027 lets try this again with a share that we control. i will create a backup share on the tfs server and at least then i control then permissions. figure: the next error looks the same, but it is subtly different [info @18:12:05.813] "verify: grant backup plan permissions\root\verifybackuppathpermissionsgrantedsuccessfully(verifybackuppathpermissionsgrantedsuccessfully): exiting verification with state completed and result success" [info @18:12:05.813] verify: grant backup plan permissions\root\verifydummybackupcreation(verifytestbackupcreatedsuccessfully): starting verification [info @18:12:05.813] verify test backup created successfully [info @18:12:05.813] starting creating backup test validation [error @18:12:06.132] microsoft.sqlserver.management.smo.failedoperationexception: backup failed for server 'sqlserver1'. ---> microsoft.sqlserver.management.common.executionfailureexception: an exception occurred while executing a transact-sql statement or batch. ---> system.data.sqlclient.sqlexception: cannot open backup device '\\tfsserver1\tfsbackup\temp_20111104111205.bak'. operating system error 5(access is denied.). backup database is terminating abnormally. at microsoft.sqlserver.management.common.connectionmanager.executetsql(executetsqlaction action, object execobject, dataset filldataset, boolean catchexception) at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype) --- end of inner exception stack trace --- at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype) at microsoft.sqlserver.management.common.serverconnection.executenonquery(stringcollection sqlcommands, executiontypes executiontype) at microsoft.sqlserver.management.smo.executionmanager.executenonquery(stringcollection queries) at microsoft.sqlserver.management.smo.backuprestorebase.executesql(server server, stringcollection queries) at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv) --- end of inner exception stack trace --- at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv) at microsoft.teamfoundation.powertools.admin.helpers.backupfactory.testbackupcreation(string path) [error @18:12:06.184] !verify error!: account root_comapny\martin.hinshelwood failed to create backups using path \\tfsserver1\tfsbackup [info @18:12:06.184] "verify: grant backup plan permissions\root\verifydummybackupcreation(verifytestbackupcreatedsuccessfully): exiting verification with state completed and result error" [info @18:12:06.184] !verify result!: 5 completed, 0 skipped: 4 success, 1 errors, 0 warnings [info @18:12:06.197] verify: backup tasks verifications(vcontainer): starting verification [info @18:12:06.197] a generic container node that does not contribute to results [info @18:12:06.197] "verify: backup tasks verifications(vcontainer): exiting verification with state ignore and result ignore" [info @18:12:06.197] verify: backup tasks verifications\root(vcontainer): starting verification [info @18:12:06.197] a generic container node that does not contribute to results [info @18:12:06.197] "verify: backup tasks verifications\root(vcontainer): exiting verification with state ignore and result ignore" [info @18:12:06.197] verify: backup tasks verifications\root\verifydummybackupcreation(verifytestbackupcreatedsuccessfully): starting verification [info @18:12:06.197] verify test backup created successfully [info @18:12:06.197] starting creating backup test validation [error @18:12:06.389] microsoft.sqlserver.management.smo.failedoperationexception: backup failed for server sqlserver1'. ---> microsoft.sqlserver.management.common.executionfailureexception: an exception occurred while executing a transact-sql statement or batch. ---> system.data.sqlclient.sqlexception: cannot open backup device '\\tfsserver1\tfsbackup\temp_20111104111206.bak'. operating system error 5(access is denied.). backup database is terminating abnormally. at microsoft.sqlserver.management.common.connectionmanager.executetsql(executetsqlaction action, object execobject, dataset filldataset, boolean catchexception) at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype) --- end of inner exception stack trace --- at microsoft.sqlserver.management.common.serverconnection.executenonquery(string sqlcommand, executiontypes executiontype) at microsoft.sqlserver.management.common.serverconnection.executenonquery(stringcollection sqlcommands, executiontypes executiontype) at microsoft.sqlserver.management.smo.executionmanager.executenonquery(stringcollection queries) at microsoft.sqlserver.management.smo.backuprestorebase.executesql(server server, stringcollection queries) at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv) --- end of inner exception stack trace --- at microsoft.sqlserver.management.smo.backup.sqlbackup(server srv) at microsoft.teamfoundation.powertools.admin.helpers.backupfactory.testbackupcreation(string path) figure: this time the error is lying and is from sql not locally as it implies it looks like the problem is that sql server can’t write to that folder, but i can and the machine account can. lets try this from the sql server itself, and with a native backup. figure: sql server can’t write to that location dam… so even a native sql backup can’t write to this location. title: microsoft sql server management studio ------------------------------ backup failed for server 'sqlserver1'. (microsoft.sqlserver.smoextended) for help, click: http://go.microsoft.com/fwlink?prodname=microsoft+sql+server&prodver=10.50.2500.0+((kj_pcu_main).110617-0038+)&evtsrc=microsoft.sqlserver.management.smo.exceptiontemplates.failedoperationexceptiontext&evtid=backup+server&linkid=20476 ------------------------------ additional information: system.data.sqlclient.sqlerror: cannot open backup device '\\tfsserver1\tfsbackup\moo.bak'. operating system error 5(access is denied.). (microsoft.sqlserver.smo) for help, click: http://go.microsoft.com/fwlink?prodname=microsoft+sql+server&prodver=10.50.2500.0+((kj_pcu_main).110617-0038+)&linkid=20476 ------------------------------ buttons: ok ------------------------------ figure: sql server errors suck even more as it turns out, sql server is running under “localserivce” which is not authenticating against our share. so we need to change the service that tfs runs under. error #3: terminating your tfs server as we should always use the sql server configuration manager to change these things i fired it up and since i already have a domain account for running tfs under i decided to use that one. figure: this is easy when you apply it will ask you to restart sql, but it should be all complete. lets check tfs and make sure that everything is running… figure: omg! what just happened! oh shit: i think i just broke tfs. why can’t tfs connect? lets try the sql management studio and see. figure: what is a sspi? this does not look good… after i have hastily changed the service account back to the original value and made use that his fixed tfs i wanted to also figure out why it broke. usually i would just ask shad (one of my extremely technical colleagues) but alas he is on his honeymoon. some googling turned up an spn issue. the account that sql runs under must be able to both read and write service principal names for itself in active directory. this can be set, but only be a domain admin. dynamically set spn’s for sql service accounts so lets go with network service instead. if we change the account that sql server runs under to “network service” then i can add permission for “root_company\sqlserver1$” to my share and get it working. yes, servers have ad accounts as well.

November 5, 2011

by Martin Hinshelwood

· 9,997 Views

Recommendation Engine Models

In a classical model of recommendation system, there are "users" and "items". User has associated metadata (or content) such as age, gender, race and other demographic information. Items also has its metadata such as text description, price, weight ... etc. On top of that, there are interaction (or transaction) between user and items, such as userA download/purchase movieB, userX give a rating 5 to productY ... etc. Now given all the metadata of user and item, as well as their interaction over time, can we answer the following questions ... What is the probability that userX purchase itemY ? What rating will userX give to itemY ? What is the top k unseen items that should be recommended to userX ? Content-based Approach In this approach, we make use of the metadata to categorize user and item and then match them at the category level. One example is to recommend jobs to candidates, we can do a IR/text search to match the user's resume with the job descriptions. Another example is to recommend an item that is "similar" to the one that the user has purchased. Similarity is measured according to the item's metadata and various distance function can be used. The goal is to find k nearest neighbors of the item we know the user likes. Collaborative Filtering Approach In this approach, we look purely at the interactions between user and item, and use that to perform our recommendation. The interaction data can be represented as a matrix. Notice that each cell represents the interaction between user and item. For example, the cell can contain the rating that user gives to the item (in the case the cell is a numeric value), or the cell can be just a binary value indicating whether the interaction between user and item has happened. (e.g. a "1" if userX has purchased itemY, and "0" otherwise. The matrix is also extremely sparse, meaning that most of the cells are unfilled. We need to be careful about how we treat these unfilled cells, there are 2 common ways ... Treat these unknown cells as "0". Make them equivalent to user giving a rate "0". This may or may not be a good idea depends on your application scenarios. Guess what the missing value should be. For example, to guess what userX will rate itemA given we know his has rate on itemB, we can look at all users (or those who is in the same age group of userX) who has rate both itemA and itemB, then compute an average rating from them. Use the average rating of itemA and itemB to interpolate userX's rating on itemA given his rating on itemB. User-based Collaboration Filter In this model, we do the following Find a group of users that is “similar” to user X Find all movies liked by this group that hasn’t been seen by user X Rank these movies and recommend to user X This introduces the concept of user-to-user similarity, which is basically the similarity between 2 row vectors of the user/item matrix. To compute the K nearest neighbor of a particular users. A naive implementation is to compute the "similarity" for all other users and pick the top K. Different similarity functions can be used. Jaccard distance function is defined as the number of intersections of movies that both users has seen divided by the number of union of movies they both seen. Pearson similarity is first normalizing the user's rating and then compute the cosine distance. There are two problems with this approach Compare userX and userY is expensive as they have millions of attributes Find top k similar users to userX require computing all pairs of userX and userY Location Sensitive Hashing and Minhash To resolve problem 1, we approximate the similarity using a cheap estimation function, called minhash. The idea is to find a hash function h() such that the probability of h(userX) = h(userY) is proportion to the similarity of userX and userY. And if we can find 100 of h() function, we can just count the number of such function where h(userX) = h(userY) to determine how similar userX is to userY. The idea is depicted as follows ... It will be expensive to permute the rows if the number of rows is large. Remember that the purpose of h(c1) is to return row number of the first row that is 1. So we can scan each row of c1 to see if it is 1, if so we apply a function newRowNum = hash(rowNum) to simulate a permutation. Take the minimum of the newRowNum seen so far. As an optimization, instead of doing one column at a time, we can do it a row at the time, the algorithm is as follows To solve problem 2, we need to avoid computing all other users' similarity to userX. The idea is to hash users into buckets such that similar users will be fall into the same bucket. Therefore, instead of computing all users, we only compute the similarity of those users who is in the same bucket of userX. The idea is to horizontally partition the column into b bands, each with r rows. By pick the parameter b and r, we can control the likelihood (function of similarity) that they will fall into the same bucket in at least one band. Item-based Collaboration Filter If we transpose the user/item matrix and do the same thing, we can compute the item to item similarity. In this model, we do the following ... Find the set of movies that user X likes (from interaction data) Find a group of movies that is similar to these set of movies that we know user X likes Rank these movies and recommend to user X It turns out that computing item-based collaboration filter has more benefit than computing user to user similarity for the following reasons ... Number of items typically smaller than number of users While user's taste will change over time and hence the similarity matrix need to be updated more frequent, item to item similarity tends to be more stable and requires less update. Singular Value Decomposition If we look back at the matrix, we can see the matrix multiplication is equivalent to mapping an item from the item space to the user space. In other words, if we view each of the existing item as an axis in the user space (notice, each user is a vector of their rating on existing items), then multiplying a new item with the matrix gives the same vector like the user. So we can then compute a dot product with this projected new item with user to determine its similarity. It turns out that this is equivalent to map the user to the item space and compute a dot product there. In other words, multiply the matrix is equivalent to mapping between item space and user space. Now lets imagine there is a hidden concept space in between. Instead of jumping directly from user space to item space, we can think of jumping from user space to a concept space, and then to the item space. Notice that here we first map the user space to the concept space and also map the item space to the concept space. Then we match both user and item at the concept space. This is a generalization of our recommender. We can use SVD to factor the matrix into 2 parts. Let P be the m by n matrix (m rows and n columns). P = UDV where U is an m by m matrix, each column represents the eigenvectors of P*transpose(P). And V is an n by n matrix with each row represents the eigenvector of transpose(P)*P. D is a diagonal matrix containing eigenvalues of P*transpose(P), or transpose(P)*P. In other words, we can decompose P into U*squareroot(D) and squareroot(D)*V. Notice that D can be thought as the strength of each "concept" in the concept space. And the value is order in terms of their magnitude in decreasing order. If we remove some of the weakest concept by making them zero, we reduce the number of non-zero elements in D, which effective generalize the concept space (make them focus in the important concepts). Calculate SVD decomposition for matrix with large dimensions is expensive. Fortunately, if our goal is to compute an SVD approximation (with k diagonal non-zero value), we can use the random projection mechanism as describer here. Association Rule Based In this model, we use the market/basket association rule algorithm to discover rule like ... {item1, item2} => {item3, item4, item5} We represent each user as a basket and each viewing as an item (notice that we ignore the rating and use a binary value). After that we use association rule mining algorithm to detect frequent item set and the association rules. Then for each user, we match the user's previous viewing items to the set of rules to determine what other movies should we recommend. Evaluate the recommender After we have a recommender, how do we evaluate the performance of it ? The basic idea is to use separate the data into the training set and the test set. For the test set, we remove certain user-to-movies interaction (change certain cells from 1 to 0) and pretending the user hasn't seen the item. Then we use the training set to train a recommender and then fit the test set (with removed interaction) to the recommender. The performance is measured by how much overlap between the recommended items with the one that we have removed. In other words, a good recommender should be able to recover the set of items that we have removed from the test set. Leverage tagging information on items In some cases, items has explicit tags associated with them (we can considered the tags is a user-annotated concept space added to the items). Consider each item is described with a vector of tags. Now user can also be auto-tagged based on the items they have interacted. For example, if userX purchase itemY which is tagged with Z1, and Z2. Then user will increase her tag Z1 and Z2 in her existing tag vector. We can use a time decay mechanism to update the user's tag vector as follows ... current_user_tag = alpha * item_tag + (1 - alpha) * prev_user_tag To recommend an item to the user, we simply need to calculate the top k items by computing the dot product (ie: cosine distance) of the user tag vector and the item tag vector. Source: http://horicky.blogspot.com/2011/09/recommendation-engine.html

November 2, 2011

by Ricky Ho

· 26,793 Views · 2 Likes

Avoid Lazy JPA Collections

Hibernate (and actually JPA) has collection mappings: @OneToMany, @ManyToMany, @ElementCollection. All of these are by default lazy. This means the collections are specific implementations of the List or Set interface that hold a reference to the persistent session and the values are loaded from the database only if the collection is accessed. That saves unnecessary database queries if you only occasionally use the collection. However, there’s a problem with that. The problem that manifests itself through the exception that in my observations is 2nd most commonly asked exception (after NullPointerException) – the LazyInitializationException. The problem is that the session is usually open for your service layer and is closed as soon as you return the entity to the view layer. And when you try to iterate the uninitialized collection in your view (jsp for example), the collection throws LazyInitializationException, because the session that they hold a reference to is already closed and they can’t fetch the items. How is this solved? The so called OpenSessionInView / OpenEntityManagerInView “patterns”. In short: you make a filter that opens the session when the request starts and closes it after the view has been rendered (and not after the service layer finishes). Some people call that an anti-pattern, because it leaks persistence handling into the view layer, and complicates the setup. I wouldn’t say it’s that bad: generally it solves the problem without introducing other problems. But in all recent project I’ve been involved, we aren’t using OpenSessionInView, and it works fine. It works fine because we aren’t using lazy collections. But then, you’ll rightly point, you will be fetching “the whole world” when you load a single entity. Well, no. There are two types of *ToMany mappings: value-type mappings where the collection logically does not hold more than a dozen elements. This is in most cases @ElementCollection, and also @*ToMany with items like “Category” or “Price” that are just more complex value objects, but that do not hold any other mappings themselves. Another common feature of these types of collections is that they are usually displayed in the UI together with their owning entity. It is most likely that you want to display the categories of an article, for example. For this type of collections EAGER is the better option. You’ll have to fetch them anyway, why not let hibernate (or any jpa implementation) think of some clever join? As I said – the collections are logically not bigger than a dozen or two, so fetching them won’t be a performance hit. And, logically, they won’t fetch a big object graph with them. mappings across the big, core entities. This can be “all orders made by the user” or “all users for the organization”, “all items of the supplier”, etc. You certainly don’t want to fetch them eagerly. Because if you fetch 2000 users for an organization, which in turn have 1000 orders each, and an order has 3 items on average which in turn have a collection of all people who have purchased it.. you’ll end up with your entire database in memory. Obviously you need lazy collections, right? Well, no. In that case you should not be using collection mappings at all. These types of relations are, in 99% of the cases, displayed in paged lists in the UI. Or in search results. They are never (and should never) be displayed all on one screen (or should rarely be returned in one API call, if your application provides something like a REST API). You have to make queries for them, and use query.setMaxResults and query.setFirstResult() (or limit them with some restrictive criteria). Furthermore having the collections mapped means someone will try to use them at some point, which may fail. And if the object is serialized (xml, json, etc.) the collection contents will be fetched. Something you almost certainly don’t want to happen. (A draft idea here: JPA could have a PagedList collection that would allow paged lazy fetching, thus eliminating the need for a query) So what did I just say – that you should never use lazy collections. Use eager collections for very simple, shallow mappings, and use paged queries for the bigger ones. Well, not exactly. Lazy collections are there and they have application, though it is rather limited. Or at least they are way less applicable than they are used. Here’s an example scenario where I found it applicable. In my side-project I have a Message entity, and it holds a collection of Picture entities. When a user uploads a picture, it is stored in that collection. A message can have no more than 10 pictures, so the collection could very well be eager. But then, Message is the most commonly used entity – it’s fetched virtually on every request. But only some messages have pictures (how many of the tweets on your stream have a a picture upload?). So I don’t want hibernate to make queries just to find out there are no pictures for a given message. Hence I store the number of pictures in a separate field, make the pictures collection lazy, and Hibernate.initialize(..) it manually only if the number of pictures is > 0. So there are scenarios, when the entity has optional collections that fall into the first category above (“small, shallow collections”). So if it is small, shallow and optional (say, used in less than 20% of the cases), then you should go with Lazy to save unnecessary queries. For everything else – having lazy collections will make your life harder. From http://techblog.bozho.net/?p=645

October 28, 2011

by Bozhidar Bozhanov

· 21,408 Views · 1 Like

Smart Batching

How often have we all heard that “batching” will increase latency? As someone with a passion for low-latency systems this surprises me. In my experience when batching is done correctly, not only does it increase throughput, it can also reduce average latency and keep it consistent. Well then, how can batching magically reduce latency? It comes down to what algorithm and data structures are employed. In a distributed environment we are often having to batch up messages/events into network packets to achieve greater throughput. We also employ similar techniques in buffering writes to storage to reduce the number of IOPS. That storage could be a block device backed file-system or a relational database. Most IO devices can only handle a modest number of IO operations per second, so it is best to fill those operations efficiently. Many approaches to batching involve waiting for a timeout to occur and this will by its very nature increase latency. The batch can also get filled before the timeout occurs making the latency even more unpredictable. Figure 1. Figure 1. above depicts decoupling the access to an IO device, and therefore the contention for access to it, by introducing a queue like structure to stage the messages/events to be sent and a thread doing the batching for writing to the device. The Algorithm An approach to batching uses the following algorithm in Java pseudo code: public final class NetworkBatcher implements Runnable { private final NetworkFacade network; private final Queue queue; private final ByteBuffer buffer; public NetworkBatcher(final NetworkFacade network, final int maxPacketSize, final Queue queue) { this.network = network; buffer = ByteBuffer.allocate(maxPacketSize); this.queue = queue; } @Override public void run() { while (!Thread.currentThread().isInterrupted()) { while (null == queue.peek()) { employWaitStrategy(); // block, spin, yield, etc. } Message msg; while (null != (msg = queue.poll())) { if (msg.size() > buffer.remaining()) { sendBuffer(); } buffer.put(msg.getBytes()); } sendBuffer(); } } private void sendBuffer() { buffer.flip(); network.send(buffer); buffer.clear(); } } Basically, wait for data to become available and as soon as it is, send it right away. While sending a previous message or waiting on new messages, a burst of traffic may arrive which can all be sent in a batch, up to the size of the buffer, to the underlying resource. This approach can use ConcurrentLinkedQueue which provides low-latency and avoid locks. However it has an issue in not creating back pressure to stall producing/publishing threads if they are outpacing the batcher whereby the queue could grow out of control because it is unbounded. I’ve often had to wrap ConcurrentLinkedQueue to track its size and thus create back pressure. This size tracking can add 50% to the processing cost of using this queue in my experience. This algorithm respects the single writer principle and can often be employed when writing to a network or storage device, and thus avoid lock contention in third party API libraries. By avoiding the contention we avoid the J-Curve latency profile normally associated with contention on resources, due to the queuing effect on locks. With this algorithm, as load increases, latency stays constant until the underlying device is saturated with traffic resulting in a more "bathtub" profile than the J-Curve. Let’s take a worked example of handling 10 messages that arrive as a burst of traffic. In most systems traffic comes in bursts and is seldom uniformly spaced out in time. One approach will assume no batching and the threads write to device API directly as in Figure 1. above. The other will use a lock free data structure to collect the messages plus a single thread consuming messages in a loop as per the algorithm above. For the example let’s assume it takes 100µs to write a single buffer to the network device as a synchronous operation and have it acknowledged. The buffer will ideally be less than the MTU of the network in size when latency is critical. Many network sub-systems are asynchronous and support pipelining but we will make the above assumption to clarify the example. If the network operation is using a protocol like HTTP under REST or Web Services then this assumption matches the underlying implementation. Best (µs) Average (µs) Worst (µs) Packets Sent Serial 100 500 1,000 10 Smart Batching 100 150 200 1-2 The absolute lowest latency will be achieved if a message is sent from the thread originating the data directly to the resource, if the resource is un-contended. The table above shows what happens when contention occurs and a queuing effect kicks in. With the serial approach 10 individual packets will have to be sent and these typically need to queue on a lock managing access to the resource, therefore they get processed sequentially. The above figures assume the locking strategy works perfectly with no perceivable overhead which is unlikely in a real application. For the batching solution it is likely all 10 packets will be picked up in first batch if the concurrent queue is efficient, thus giving the best case latency scenario. In the worst case only one message is sent in the first batch with the other nine following in the next. Therefore in the worst case scenario one message has a latency of 100µs and the following 9 have a latency of 200µs thus giving a worst case average of 190µs which is significantly better than the serial approach. This is one good example when the simplest solution is just a bit too simple because of the contention. The batching solution helps achieve consistent low-latency under burst conditions and is best for throughput. It also has a nice effect across the network on the receiving end in that the receiver has to process fewer packets and therefore makes the communication more efficient both ends. Most hardware handles data in buffers up to a fixed size for efficiency. For a storage device this will typically be a 4KB block. For networks this will be the MTU and is typically 1500 bytes for Ethernet. When batching, it is best to understand the underlying hardware and write batches down in ideal buffer size to be optimally efficient. However keep in mind that some devices need to envelope the data, e.g. the Ethernet and IP headers for network packets so the buffer needs to allow for this. There will always be an increased latency from a thread switch and the cost of exchange via the data structure. However there are a number of very good non-blocking structures available using lock-free techniques. For the Disruptor this type of exchange can be achieved in as little as 50-100ns thus making the choice of taking the smart batching approach a no brainer for low-latency or high-throughput distributed systems. This technique can be employed for many problems and not just IO. The core of the Disruptor uses this technique to help rebalance the system when the publishers burst and outpace the EventProcessors. The algorithm can be seen inside the BatchEventProcessor. Note: For this algorithm to work the queueing structure must handle the contention better than the underlying resource. Many queue implementations are extremely poor at managing contention. Use science and measure before coming to a conclusion. Batching with the Disruptor The code below shows the same algorithm in action using the Disruptor's EventHandler mechanism. In my experience, this is a very effective technique for handling any IO device efficiently and keeping latency low when dealing with load or burst traffic. public final class NetworkBatchHandler implements EventHander { private final NetworkFacade network; private final ByteBuffer buffer; public NetworkBatchHandler(final NetworkFacade network, final int maxPacketSize) { this.network = network; buffer = ByteBuffer.allocate(maxPacketSize); } public void onEvent(Message msg, long sequence, boolean endOfBatch) throws Exception { if (msg.size() > buffer.remaining()) { sendBuffer(); } buffer.put(msg.getBytes()); if (endOfBatch) { sendBuffer(); } } private void sendBuffer() { buffer.flip(); network.send(buffer); buffer.clear(); } } The endOfBatch parameter greatly simplifies the handling of the batch compared to the double loop in the algorithm above. I have simplified the examples to illustrate the algorithm. Clearly error handling and other edge conditions need to be considered. Separation of IO from Work Processing There is another very good reason to separate the IO from the threads doing the work processing. Handing off the IO to another thread means the worker thread, or threads, can continue processing without blocking in a nice cache friendly manner. I've found this to be critical in achieving high-performance throughput. If the underlying IO device or resource becomes briefly saturated then the messages can be queued for the batcher thread allowing the work processing threads to continue. The batching thread then feeds the messages to the IO device in the most efficient way possible allowing the data structure to handle the burst and if full apply the necessary back pressure, thus providing a good separation of concerns in the workflow. Conclusion So there you have it. Smart Batching can be employed in concert with the appropriate data structures to achieve consistent low-latency and maximum throughput. From http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html

October 26, 2011

by Martin Thompson

· 11,415 Views · 1 Like

Magic: The Gathering in JavaScript and HTML5

as a user interface fan, i could not miss the development with html 5. so the goal of this post is to walk through a graphic application that uses javascript and html 5. we will see through examples one way (among others) to develop this kind of project. application overview tools the html 5 page data gathering cards loading & cache handling cards display mouse management state storage animations handling multi-devices conclusion to go further application overview we will produce an application that will let us display a magic the gathering ©(courtesy of www.wizards.com/magic ) cards collection. users will be able to scroll and zoom using the mouse (like bing maps, for example). you can see the final result here: http://bolaslenses.catuhe.com the project source files can be downloaded here: http://www.catuhe.com/msdn/bolaslenses.zip cards are stored on windows azure storage and use the azure content distribution network ( cdn : a service that deploys data near the final users) in order to achieve maximum performances. an asp.net service is used to return cards list (using json format). tools to write our application, we will use visual studio 2010 sp1 with web standards update . this extension adds intellisense support in html 5 page (which is a really important thing ). so, our solution will contain an html 5 page side by side with .js files (these files will contain javascript scripts). about debug, it is possible to set a breakpoint directly in the .js files under visual studio. it is also possible to use the developer bar of internet explorer 9 (use f12 key to display it). debug with visual studio 2010 debug with internet explorer 9 (f12/developer bar) so, we have a modern developer environment with intellisense and debug support. therefore, we are ready to start and first of all, we will write the html 5 page. the html 5 page our page will be built around an html 5 canvas which will be used to draw the cards: cards scanned by mwshq team magic the gathering official site : http://www.wizards.com/magic bolas lenses your browser does not support html5 canvas. loading data... if we dissect this page, we can note that it is divided into two parts: the header part with the title, the logo and the special mentions the main part (section) holds the canvas and the tooltips that will display the status of the application. there is also a hidden image ( backimage ) used as source for not yet loaded cards. to build the layout of the page, a style sheet ( full.css ) is applied. style sheets are a mechanism used to change the tags styles (in html, a style defines the entire display options for a tag): html, body { height: 100%; } body { background-color: #888888; font-size: .85em; font-family: "segoe ui, trebuchet ms" , verdana, helvetica, sans-serif; margin: 0; padding: 0; color: #696969; } a:link { color: #034af3; text-decoration: underline; } a:visited { color: #505abc; } a:hover { color: #1d60ff; text-decoration: none; } a:active { color: #12eb87; } header, footer, nav, section { display: block; } table { width: 100%; } header, #header { position: relative; margin-bottom: 0px; color: #000; padding: 0; } #title { font-weight: bold; color: #fff; border: none; font-size: 60px !important; vertical-align: middle; margin-left: 70px } #legal { text-align: right; color: white; font-size: 14px; width: 50%; position: absolute; top: 15px; right: 10px } #leftheader { width: 50%; vertical-align: middle; } section { margin: 20px 20px 20px 20px; } #maincanvas{ border: 4px solid #000000; } #cardscount { font-weight: bolder; font-size: 1.1em; } .tooltip { position: absolute; bottom: 5px; color: black; background-color: white; margin-right: auto; margin-left: auto; left: 35%; right: 35%; padding: 5px; width: 30%; text-align: center; border-radius: 10px; -webkit-border-radius: 10px; -moz-border-radius: 10px; box-shadow: 2px 2px 2px #333333; } #bolaslogo { width: 64px; height: 64px; } #picturecell { float: left; width: 64px; margin: 5px 5px 5px 5px; vertical-align: middle; } thus, this sheet is responsible for setting up the following display: style sheets are powerful tools that allow an infinite number of displays. however, they are sometimes complicated to setup (for example if a tag is affected by a class, an identifier and its container). to simplify this setup, the development bar of internet explorer 9 is particularly useful because we can use it to see styles hierarchy that is applied to a tag. for example let’s take a look at the waittext tooltip with the development bar. to do this, you must press f12 in internet explorer and use the selector to choose the tooltip: once the selection is done, we can see the styles hierarchy: thus, we can see that our div received its styles from the body tag and the . tooltip entry of the style sheet. with this tool, it becomes possible to see the effect of each style (which can be disabled). it is also possible to add new style on the fly. another important point of this window is the ability to change the rendering mode of internet explorer 9. indeed, we can test how, for example, internet explorer 8 will handle the same page. to do this, go to the [ browser mode ] menu and select the engine of internet explorer 8. this change will especially impact our tooltip as it uses border-radius (rounded edge) and box-shadow that are features of css 3: internet explorer 9 internet explorer 8 our page provides a graceful degradation as it still works (with no annoying visual difference) when the browser does not support all the required technologies. now that our interface is ready, we will take a look at the data source to retrieve the cards to display. the server provides the cards list using json format on this url: http://bolaslenses.catuhe.com/ home/listofcards/?colorstring=0 it takes one parameter ( colorstring ) to select a specific color (0 = all). when developing with javascript, there is a good reflex to have (reflex also good in other languages too, but really important in javascript): one must ask whether what we want to develop has not been already done in an existing framework. indeed, there is a multitude of open source projects around javascript. one of them is jquery which provides a plethora of convenient services. thus, in our case to connect to the url of our server and get the cards list, we could go through a xmlhttprequest and have fun to parse the returned json. or we can use jquery . so we will use the getjson function which will take care of everything for us: function getlistofcards() { var url = "http://bolaslenses.catuhe.com/home/listofcards/?jsoncallback=?"; $.getjson(url, { colorstring: "0" }, function (data) { listofcards = data; $("#cardscount").text(listofcards.length + " cards displayed"); $("#waittext").slidetoggle("fast"); }); } as we can see, our function stores the cards list in the listofcards variable and calls two jquery functions: text that change the text of a tag slidetoggle that hides (or shows) a tag by animating its height the listofcards list contains objects whose format is: id : unique identifier of the card path : relative path of the card (without the extension) it should be noted that the url of the server is called with the “ ?jsoncallback=? ” suffix. indeed, ajax calls are constrained in terms of security to connect only to the same address as the calling script. however, there is a solution called jsonp that will allow us to make a concerted call to the server (which of course must be aware of the operation). and fortunately, jquery can handle it all alone by just adding the right suffix. once we have our cards list, we can set up the pictures loading and caching. cards loading & cache handling the main trick of our application is to draw only the cards effectively visible on the screen. the display window is defined by a zoom level and an offset (x, y) in the overall system. var visucontrol = { zoom : 0.25, offsetx : 0, offsety : 0 }; the overall system is defined by 14819 cards that are spread over 200 columns and 75 rows. also, we must be aware that each card is available in three versions: high definition: 480x680 without compression (.jpg suffix) medium definition: 240x340 with standard compression (.50.jpg suffix) low definition: 120x170 with strong compression (.25.jpg suffix) thus, depending on the zoom level, we will load the correct version to optimize networks transfer. to do this we will develop a function that will give an image for a given card. this function will be configured to download a certain level of quality. in addition it will be linked with lower quality level to return it if the card for the current level is not yet uploaded: function imagecache(substr, replacementcache) { var extension = substr; var backimage = document.getelementbyid("backimage"); this.load = function (card) { var localcache = this; if (this[card.id] != undefined) return; var img = new image(); localcache[card.id] = { image: img, isloaded: false }; currentdownloads++; img.onload = function () { localcache[card.id].isloaded = true; currentdownloads--; }; img.onerror = function() { currentdownloads--; }; img.src = "http://az30809.vo.msecnd.net/" + card.path + extension; }; this.getreplacementfromlowercache = function (card) { if (replacementcache == undefined) return backimage; return replacementcache.getimageforcard(card); }; this.getimageforcard = function(card) { var img; if (this[card.id] == undefined) { this.load(card); img = this.getreplacementfromlowercache(card); } else { if (this[card.id].isloaded) img = this[card.id].image; else img = this.getreplacementfromlowercache(card); } return img; }; } an imagecache is built by giving the associated suffix and the underlying cache. here you can see two important functions: load : this function will load the right picture and will store it in a cache (the msecnd.net url is the azure cdn address of the cards) getimageforcard : this function returns the card picture from the cache if already loaded. otherwise it requests the underlying cache to return its version (and so on) so to handle our 3 levels of caches, we have to declare three variables: var imagescache25 = new imagecache(".25.jpg"); var imagescache50 = new imagecache(".50.jpg", imagescache25); var imagescachefull = new imagecache(".jpg", imagescache50); selecting the right cover is only depending on zoom: function getcorrectimagecache() { if (visucontrol.zoom <= 0.25) return imagescache25; if (visucontrol.zoom <= 0.8) return imagescache50; return imagescachefull; } to give a feedback to the user, we will add a timer that will manage a tooltip that indicates the number of images currently loaded: function updatestats() { var stats = $("#stats"); stats.html(currentdownloads + " card(s) currently downloaded."); if (currentdownloads == 0 && statsvisible) { statsvisible = false; stats.slidetoggle("fast"); } else if (currentdownloads > 1 && !statsvisible) { statsvisible = true; stats.slidetoggle("fast"); } } setinterval(updatestats, 200); again we note the use of jquery to simplify animations. we will now discuss the display of cards. cards display to draw our cards, we need to actually fill the canvas using its 2d context (which exists only if the browser supports html 5 canvas): var maincanvas = document.getelementbyid("maincanvas"); var drawingcontext = maincanvas.getcontext('2d'); the drawing will be made by processlistofcards function (called 60 times per second): function processlistofcards() { if (listofcards == undefined) { drawwaitmessage(); return; } maincanvas.width = document.getelementbyid("center").clientwidth; maincanvas.height = document.getelementbyid("center").clientheight; totalcards = listofcards.length; var localcardwidth = cardwidth * visucontrol.zoom; var localcardheight = cardheight * visucontrol.zoom; var effectivetotalcardsinwidth = colscount * localcardwidth; var rowscount = math.ceil(totalcards / colscount); var effectivetotalcardsinheight = rowscount * localcardheight; initialx = (maincanvas.width - effectivetotalcardsinwidth) / 2.0 - localcardwidth / 2.0; initialy = (maincanvas.height - effectivetotalcardsinheight) / 2.0 - localcardheight / 2.0; // clear clearcanvas(); // computing of the viewing area var initialoffsetx = initialx + visucontrol.offsetx * visucontrol.zoom; var initialoffsety = initialy + visucontrol.offsety * visucontrol.zoom; var startx = math.max(math.floor(-initialoffsetx / localcardwidth) - 1, 0); var starty = math.max(math.floor(-initialoffsety / localcardheight) - 1, 0); var endx = math.min(startx + math.floor((maincanvas.width - initialoffsetx - startx * localcardwidth) / localcardwidth) + 1, colscount); var endy = math.min(starty + math.floor((maincanvas.height - initialoffsety - starty * localcardheight) / localcardheight) + 1, rowscount); // getting current cache var imagecache = getcorrectimagecache(); // render for (var y = starty; y < endy; y++) { for (var x = startx; x < endx; x++) { var localx = x * localcardwidth + initialoffsetx; var localy = y * localcardheight + initialoffsety; // clip if (localx > maincanvas.width) continue; if (localy > maincanvas.height) continue; if (localx + localcardwidth < 0) continue; if (localy + localcardheight < 0) continue; var card = listofcards[x + y * colscount]; if (card == undefined) continue; // get from cache var img = imagecache.getimageforcard(card); // render try { if (img != undefined) drawingcontext.drawimage(img, localx, localy, localcardwidth, localcardheight); } catch (e) { $.grep(listofcards, function (item) { return item.image != img; }); } } }; // scroll bars drawscrollbars(effectivetotalcardsinwidth, effectivetotalcardsinheight, initialoffsetx, initialoffsety); // fps computefps(); } this function is built around many key points: if the cards list is not yet loaded, we display a tooltip indicating that download is in progress: var pointcount = 0; function drawwaitmessage() { pointcount++; if (pointcount > 200) pointcount = 0; var points = ""; for (var index = 0; index < pointcount / 10; index++) points += "."; $("#waittext").html("loading...please wait" + points); subsequently, we define the position of the display window (in terms of cards and coordinates), then we proceed to clean the canvas: function clearcanvas() { maincanvas.width = document.body.clientwidth - 50; maincanvas.height = document.body.clientheight - 140; drawingcontext.fillstyle = "rgb(0, 0, 0)"; drawingcontext.fillrect(0, 0, maincanvas.width, maincanvas.height); } then we browse the cards list and call the drawimage function of the canvas context. the current image is provided by the active cache (depending on the zoom): // get from cache var img = imagecache.getimageforcard(card); // render try { if (img != undefined) drawingcontext.drawimage(img, localx, localy, localcardwidth, localcardheight); } catch (e) { $.grep(listofcards, function (item) { return item.image != img; }); we also have to draw the scroll bar with the roundedrectangle function that uses quadratic curves: function roundedrectangle(x, y, width, height, radius) { drawingcontext.beginpath(); drawingcontext.moveto(x + radius, y); drawingcontext.lineto(x + width - radius, y); drawingcontext.quadraticcurveto(x + width, y, x + width, y + radius); drawingcontext.lineto(x + width, y + height - radius); drawingcontext.quadraticcurveto(x + width, y + height, x + width - radius, y + height); drawingcontext.lineto(x + radius, y + height); drawingcontext.quadraticcurveto(x, y + height, x, y + height - radius); drawingcontext.lineto(x, y + radius); drawingcontext.quadraticcurveto(x, y, x + radius, y); drawingcontext.closepath(); drawingcontext.stroke(); drawingcontext.fill(); } function drawscrollbars(effectivetotalcardsinwidth, effectivetotalcardsinheight, initialoffsetx, initialoffsety) { drawingcontext.fillstyle = "rgba(255, 255, 255, 0.6)"; drawingcontext.linewidth = 2; // vertical var totalscrollheight = effectivetotalcardsinheight + maincanvas.height; var scaleheight = maincanvas.height - 20; var scrollheight = maincanvas.height / totalscrollheight; var scrollstarty = (-initialoffsety + maincanvas.height * 0.5) / totalscrollheight; roundedrectangle(maincanvas.width - 8, scrollstarty * scaleheight + 10, 5, scrollheight * scaleheight, 4); // horizontal var totalscrollwidth = effectivetotalcardsinwidth + maincanvas.width; var scalewidth = maincanvas.width - 20; var scrollwidth = maincanvas.width / totalscrollwidth; var scrollstartx = (-initialoffsetx + maincanvas.width * 0.5) / totalscrollwidth; roundedrectangle(scrollstartx * scalewidth + 10, maincanvas.height - 8, scrollwidth * scalewidth, 5, 4); } and finally, we need to compute the number of frames per second: function computefps() { if (previous.length > 60) { previous.splice(0, 1); } var start = (new date).gettime(); previous.push(start); var sum = 0; for (var id = 0; id < previous.length - 1; id++) { sum += previous[id + 1] - previous[id]; } var diff = 1000.0 / (sum / previous.length); $("#cardscount").text(diff.tofixed() + " fps. " + listofcards.length + " cards displayed"); } drawing cards relies heavily on the browser's ability to speed up canvas rendering. for the record, here are the performances on my machine with the minimum zoom level (0.05): browser fps internet explorer 9 30 firefox 5 30 chrome 12 17 ipad (with a zoom level of 0.8) 7 windows phone mango (with a zoom level of 0.8) 20 (!!) the site even works on mobile phones and tablets as long as they support html 5. here we can see the inner power of html 5 browsers that can handle a full screen of cards more than 30 times per second! mouse management to browse our cards collection, we have to manage the mouse (including its wheel). for the scrolling, we'll just handle the onmouvemove , onmouseup and onmousedown events. onmouseup and onmousedown events will be used to detect if the mouse is clicked or not: var mousedown = 0; document.body.onmousedown = function (e) { mousedown = 1; getmouseposition(e); previousx = posx; previousy = posy; }; document.body.onmouseup = function () { mousedown = 0; }; the onmousemove event is connected to the canvas and used to move the view: var previousx = 0; var previousy = 0; var posx = 0; var posy = 0; function getmouseposition(eventargs) { var e; if (!eventargs) e = window.event; else { e = eventargs; } if (e.offsetx || e.offsety) { posx = e.offsetx; posy = e.offsety; } else if (e.clientx || e.clienty) { posx = e.clientx; posy = e.clienty; } } function onmousemove(e) { if (!mousedown) return; getmouseposition(e); mousemovefunc(posx, posy, previousx, previousy); previousx = posx; previousy = posy; } this function (onmousemove) calculates the current position and provides also the previous value in order to move the offset of the display window: function move(posx, posy, previousx, previousy) { currentaddx = (posx - previousx) / visucontrol.zoom; currentaddy = (posy - previousy) / visucontrol.zoom; } mousehelper.registermousemove(maincanvas, move); note that jquery also provides tools to manage mouse events. for the management of the wheel, we will have to adapt to different browsers that do not behave the same way on this point: function wheel(event) { var delta = 0; if (event.wheeldelta) { delta = event.wheeldelta / 120; if (window.opera) delta = -delta; } else if (event.detail) { /** mozilla case. */ delta = -event.detail / 3; } if (delta) { wheelfunc(delta); } if (event.preventdefault) event.preventdefault(); event.returnvalue = false; } we can see that everyone does what he wants :). the function to register with this event is: mousehelper.registerwheel = function (func) { wheelfunc = func; if (window.addeventlistener) window.addeventlistener('dommousescroll', wheel, false); window.onmousewheel = document.onmousewheel = wheel; }; and we will use this function to change the zoom with the wheel: // mouse mousehelper.registerwheel(function (delta) { currentaddzoom += delta / 500.0; }); finally we will add a bit of inertia when moving the mouse (and the zoom) to give some kind of smoothness: // inertia var inertia = 0.92; var currentaddx = 0; var currentaddy = 0; var currentaddzoom = 0; function doinertia() { visucontrol.offsetx += currentaddx; visucontrol.offsety += currentaddy; visucontrol.zoom += currentaddzoom; var effectivetotalcardsinwidth = colscount * cardwidth; var rowscount = math.ceil(totalcards / colscount); var effectivetotalcardsinheight = rowscount * cardheight var maxoffsetx = effectivetotalcardsinwidth / 2.0; var maxoffsety = effectivetotalcardsinheight / 2.0; if (visucontrol.offsetx < -maxoffsetx + cardwidth) visucontrol.offsetx = -maxoffsetx + cardwidth; else if (visucontrol.offsetx > maxoffsetx) visucontrol.offsetx = maxoffsetx; if (visucontrol.offsety < -maxoffsety + cardheight) visucontrol.offsety = -maxoffsety + cardheight; else if (visucontrol.offsety > maxoffsety) visucontrol.offsety = maxoffsety; if (visucontrol.zoom < 0.05) visucontrol.zoom = 0.05; else if (visucontrol.zoom > 1) visucontrol.zoom = 1; processlistofcards(); currentaddx *= inertia; currentaddy *= inertia; currentaddzoom *= inertia; // epsilon if (math.abs(currentaddx) < 0.001) currentaddx = 0; if (math.abs(currentaddy) < 0.001) currentaddy = 0; } this kind of small function does not cost a lot to implement, but adds a lot to the quality of user experience. state storage also to provide a better user experience, we will save the display window’s position and zoom. to do this, we will use the service of localstorage (which saves pairs of keys / values for the long term (the data is retained after the browser is closed) and only accessible by the current window object): function saveconfig() { if (window.localstorage == undefined) return; // zoom window.localstorage["zoom"] = visucontrol.zoom; // offsets window.localstorage["offsetx"] = visucontrol.offsetx; window.localstorage["offsety"] = visucontrol.offsety; } // restore data if (window.localstorage != undefined) { var storedzoom = window.localstorage["zoom"]; if (storedzoom != undefined) visucontrol.zoom = parsefloat(storedzoom); var storedoffsetx = window.localstorage["offsetx"]; if (storedoffsetx != undefined) visucontrol.offsetx = parsefloat(storedoffsetx); var storedoffsety = window.localstorage["offsety"]; if (storedoffsety != undefined) visucontrol.offsety = parsefloat(storedoffsety); } animations to add even more dynamism to our application we will allow our users to double-click on a card to zoom and focus on it. our system should animate three values: the two offsets (x, y) and the zoom. to do this, we will use a function that will be responsible of animating a variable from a source value to a destination value with a given duration: var animationhelper = function (root, name) { var paramname = name; this.animate = function (current, to, duration) { var offset = (to - current); var ticks = math.floor(duration / 16); var offsetpart = offset / ticks; var tickscount = 0; var intervalid = setinterval(function () { current += offsetpart; root[paramname] = current; tickscount++; if (tickscount == ticks) { clearinterval(intervalid); root[paramname] = to; } }, 16); }; }; the use of this function is: // prepare animations parameters var zoomanimationhelper = new animationhelper(visucontrol, "zoom"); var offsetxanimationhelper = new animationhelper(visucontrol, "offsetx"); var offsetyanimationhelper = new animationhelper(visucontrol, "offsety"); var speed = 1.1 - visucontrol.zoom; zoomanimationhelper.animate(visucontrol.zoom, 1.0, 1000 * speed); offsetxanimationhelper.animate(visucontrol.offsetx, targetoffsetx, 1000 * speed); offsetyanimationhelper.animate(visucontrol.offsety, targetoffsety, 1000 * speed); the advantage of the animationhelper function is that it is able to animate as many parameters as you wish (and that only with the settimer function!) handling multi-devices finally we will ensure that our page can also be seen on tablets pc and even on phones. to do this, we will use a feature of css 3: the media-queries . with this technology, we can apply style sheets according to some queries such as a specific display size: here we see that if the screen width is less than 480 pixels, the following style sheet will be added: #legal { font-size: 8px; } #title { font-size: 30px !important; } #waittext { font-size: 12px; } #bolaslogo { width: 48px; height: 48px; } #picturecell { width: 48px; } finally we will ensure that our page can also be seen on tablets pc and even on phones. to do this, we will use a feature of css 3: #legal { font-size: 8px; } #title { font-size: 30px !important; } #waittext { font-size: 12px; } #bolaslogo { width: 48px; height: 48px; } #picturecell { width: 48px; } conclusion html 5 / css 3 / javascript and visual studio 2010 allow to develop portable and efficient solutions (within the limits of browsers that support html 5 of course) with some great features such as hardware accelerated rendering. this kind of development is also simplified by the use of frameworks like jquery. also, i am especially fan of javascript that turns out to be a very powerful dynamic language. of course, c# or vb.net developers have to change theirs reflexes but for the development of web pages it's worth. in conclusion, i think that the best to be convinced is to try! to go further internet explorer test drive: http://ie.microsoft.com/testdrive/ internet explorer 9 guide for developer : http://msdn.microsoft.com/en-us/ie/ff468705 w3c site for html 5 : http://dev.w3.org/html5/spec/overview.html internet explorer site : http://msdn.microsoft.com/en-us/ie/aa740469 about the author david catuhe is a developer evangelist for microsoft france in charge of user experience development tools (from xaml to directx/xna and html5). he defines himself as a geek and likes coding all that refer to graphics. before working for microsoft, he founded a company that developed a realtime 3d engine written with directx ( www.vertice.fr ). source: http://blogs.msdn.com/b/eternalcoding/archive/2011/07/25/feedback-of-a-graphic-development-using-html5-amp-javascript.aspx

October 24, 2011

by David Catuhe

· 25,151 Views · 1 Like

RDF data in Neo4J - the Tinkerpop story

My previous blog post discussed the use of Neo4J as a RDF triple store. Michael Hunger however informed me that the neo-rdf-sail component is no longer under active development and advised me to have a look at Tinkerpop’s Sail implementation. As mentioned in my previous blog post, I recently got asked to implement a storage and querying platform for biological RDF (Resource Description Framework) data. Traditional RDF stores are not really an option as my solution should also provide the ability to calculate shortest paths between random subjects. Calculating shortest path is however one of the strong selling points of Graph Databases and more specifically Neo4J. Unfortunately, the neo-rdf-sail component, which suits my requirements perfectly, is no longer under active development. Tinkerpop’s Sail implementation however, fills the void with an even better alternative! 1. What is Tinkerpop? Tinkerpop is an open source project that provides an entire stack of technologies within the Graph Database space. At the core of this stack is the Blueprints framework. Blueprints can be considered as the JDBC of Graph Databases. By providing a collection of generic interfaces, it allows to develop graph-based applications, without introducing explicit dependencies on concrete Graph Database implementations. Additionally, Blueprints provides concrete bindings for the Neo4J, OrientDB and Dex Graph Databases. On top of Blueprints, the Tinkerpop team developed an entire range of graph technologies, including Gremlin, a powerful, domain-specific language designed for traversing graphs. Hence, once a Blueprints binding is available for a particular Graph Database, an entire range of technologies can be leveraged. 2. Tinkerpop and Sail Last time, I talked about exposing a Neo4J Graph Database (containing RDF triples) through the Sail interface, which is part of the openrdf.org project. By doing so, we can reuse an entire range of RDF utilities (parsers and query evaluators) that are part of the openrdf.org project. The Blueprints framework provides us with a similar ability: each Graph Database binding that implements the Tinkerpop TransactionalGraph and IndexableGraph interfaces can be exposed as a GraphSail, which is Tinkerpop’s implementation of the Sail interface. Once you have your Sail available, storing and querying RDF is analogous to the piece of code shown in my previous blog article. // Create the sail graph database graph = new MyNeo4jGraph("var/flights", 100000); graph.setTransactionMode(TransactionalGraph.Mode.MANUAL); sail = new GraphSail(graph); // Initialize the sail store sail.initialize(); // Get the sail repository connection connection = new SailRepository(sail).getConnection(); // Import the data connection.add(getResource("sneeair.rdf"), null, RDFFormat.RDFXML); // Execute SPARQL query TupleQuery durationquery = connection.prepareTupleQuery(QueryLanguage.SPARQL, "PREFIX io: " + "PREFIX fl: " + "SELECT ?number ?departure ?destination " + "WHERE { " + "?flight io:flight ?number . " + "?flight fl:flightFromCityName ?departure . " + "?flight fl:flightToCityName ?destination . " + "?flight io:duration \"1:35\" . " + "}"); TupleQueryResult result = durationquery.evaluate(); The two first lines of code require some more clarification. A TransactionalGraph can be run in MANUAL or AUTOMATIC transaction mode. In AUTOMATIC mode, transactions are basically ignored, in the sense that each item that gets created is immediately persisted in the underlying Graph Database. Although this fits my needs, AUTOMATIC mode is extremely slow in case of Neo4J because of the continuous IO access. MANUAL mode on the other hand is very fast; a new transaction is created at the moment the import of the RDF data file starts and is only committed to the Neo4J data store once all RDF triples are parsed and created. Unfortunately, MANUAL mode does not scale either in my specific situation; as some of my RDF data files contain over 50 million RDF triples, they can not fit into memory (i.e. Java heap space error). Requiring fast imports, I extended the default Neo4J Blueprints binding to support intermediate commits. I based my implementation on Neo4J’s best practices for big transactions. The idea is rather simple: you specify the maximum number of items that can be kept in memory, before they should be committed to the Neo4J data store. Once this number is reached, the current transaction is committed and a new one is automatically started. Simple, but very effective! public class MyNeo4jGraph extends Neo4jGraph { private long numberOfItems = 0; private long maxNumberOfItems = 1; public MyNeo4jGraph(final String directory, long maxNumberOfItems) { super(directory, null); this.maxNumberOfItems = maxNumberOfItems; } public MyNeo4jGraph(final String directory, final Map configuration, long maxNumberOfItems) { super(directory, configuration); this.maxNumberOfItems = maxNumberOfItems; } public Vertex addVertex(final Object id) { Vertex vertex = super.addVertex(id); commitIfRequired(); return vertex; } public Edge addEdge(final Object id, final Vertex outVertex, final Vertex inVertex, final String label) { Edge edge = super.addEdge(id, outVertex, inVertex, label); commitIfRequired(); return edge; } private void commitIfRequired() { // Check whether commit should be executed if (++numberOfItems % maxNumberOfItems == 0) { // Stop the transaction stopTransaction(Conclusion.SUCCESS); // Immediately start a new one startTransaction(); } } } 3. Shortest path calculation Although Blueprints allows you to abstract away the Neo4J implementation details, it still provides you with access to the raw Neo4J data store if needed. Hence, one can still use the graph algorithms provided in the neo4j-graph-algo component to calculate shortest paths between random subjects. The complete source code can be found on the Datablend public GitHub repository.

October 24, 2011

by Davy Suvee

· 25,306 Views

How to Load or Save Image using Hibernate – MySQL

This tutorial will walk you throughout how to save and load an image from database (MySQL) using Hibernate. Requirements For this sampel project, we are going to use: Eclipse IDE (you can use your favorite IDE); MySQL (you can use any other database, make sure to change the column type if required); Hibernate jars and dependencies (you can download the sample project with all required jars); JUnit - for testing (jar also included in the sample project). PrintScreen When we finish implementing this sample projeto, it should look like this: Database Model Before we get started with the sample projet, we have to run this sql script into MySQL: DROP SCHEMA IF EXISTS `blog` ; CREATE SCHEMA IF NOT EXISTS `blog` DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci ; USE `blog` ; -- ----------------------------------------------------- -- Table `blog`.`BOOK` -- ----------------------------------------------------- DROP TABLE IF EXISTS `blog`.`BOOK` ; CREATE TABLE IF NOT EXISTS `blog`.`BOOK` ( `BOOK_ID` INT NOT NULL AUTO_INCREMENT , `BOOK_NAME` VARCHAR(45) NOT NULL , `BOOK_IMAGE` MEDIUMBLOB NOT NULL , PRIMARY KEY (`BOOK_ID`) ) ENGINE = InnoDB; This script will create a table BOOK, which we are going to use in this tutorial. Book POJO We are going to use a simple POJO in this project. A Book has an ID, a name and an image, which is represented by an array of bytes. As we are going to persist an image into the database, we have to use the BLOB type. MySQLhas some variations of BLOBs, you can check the difference between them here. In this example, we are going to use the Medium Blob, which can store L + 3 bytes, where L < 2^24. Make sure you do not forget to add the column definition on the Column annotation. package com.loiane.model; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.GeneratedValue; import javax.persistence.Id; import javax.persistence.Lob; import javax.persistence.Table; @Entity @Table(name="BOOK") public class Book { @Id @GeneratedValue @Column(name="BOOK_ID") private long id; @Column(name="BOOK_NAME", nullable=false) private String name; @Lob @Column(name="BOOK_IMAGE", nullable=false, columnDefinition="mediumblob") private byte[] image; public long getId() { return id; } public void setId(long id) { this.id = id; } public String getName() { return name; } public void setName(String name) { this.name = name; } public byte[] getImage() { return image; } public void setImage(byte[] image) { this.image = image; } } Hibernate Config This configuration file contains the required info used to connect to the database. com.mysql.jdbc.Driver jdbc:mysql://localhost/blog root root org.hibernate.dialect.MySQLDialect 1 true Hibernate Util The HibernateUtil class helps in creating the SessionFactory from the Hibernate configuration file. package com.loiane.hibernate; import org.hibernate.SessionFactory; import org.hibernate.cfg.AnnotationConfiguration; import com.loiane.model.Book; public class HibernateUtil { private static final SessionFactory sessionFactory; static { try { sessionFactory = new AnnotationConfiguration() .configure() .addPackage("com.loiane.model") //the fully qualified package name .addAnnotatedClass(Book.class) .buildSessionFactory(); } catch (Throwable ex) { System.err.println("Initial SessionFactory creation failed." + ex); throw new ExceptionInInitializerError(ex); } } public static SessionFactory getSessionFactory() { return sessionFactory; } } DAO In this class, we created two methods: one to save a Book instance into the database and another one to load a Book instance from the database. package com.loiane.dao; import org.hibernate.HibernateException; import org.hibernate.Session; import org.hibernate.Transaction; import com.loiane.hibernate.HibernateUtil; import com.loiane.model.Book; public class BookDAOImpl { /** * Inserts a row in the BOOK table. * Do not need to pass the id, it will be generated. * @param book * @return an instance of the object Book */ public Book saveBook(Book book) { Session session = HibernateUtil.getSessionFactory().openSession(); Transaction transaction = null; try { transaction = session.beginTransaction(); session.save(book); transaction.commit(); } catch (HibernateException e) { transaction.rollback(); e.printStackTrace(); } finally { session.close(); } return book; } /** * Delete a book from database * @param bookId id of the book to be retrieved */ public Book getBook(Long bookId) { Session session = HibernateUtil.getSessionFactory().openSession(); try { Book book = (Book) session.get(Book.class, bookId); return book; } catch (HibernateException e) { e.printStackTrace(); } finally { session.close(); } return null; } } Test To test it, first we need to create a Book instance and set an image to the image attribute. To do so, we need to load an image from the hard drive, and we are going to use the one located in the images folder. Then we can call the DAO class and save into the database. Then we can try to load the image. Just to make sure it is the same image we loaded, we are going to save it in the hard drive. package com.loiane.test; import static org.junit.Assert.assertNotNull; import java.io.File; import java.io.FileInputStream; import java.io.FileOutputStream; import org.junit.AfterClass; import org.junit.BeforeClass; import org.junit.Test; import com.loiane.dao.BookDAOImpl; import com.loiane.model.Book; public class TestBookDAO { private static BookDAOImpl bookDAO; @BeforeClass public static void runBeforeClass() { bookDAO = new BookDAOImpl(); } @AfterClass public static void runAfterClass() { bookDAO = null; } /** * Test method for {@link com.loiane.dao.BookDAOImpl#saveBook()}. */ @Test public void testSaveBook() { //File file = new File("images\\extjsfirstlook.jpg"); //windows File file = new File("images/extjsfirstlook.jpg"); byte[] bFile = new byte[(int) file.length()]; try { FileInputStream fileInputStream = new FileInputStream(file); fileInputStream.read(bFile); fileInputStream.close(); } catch (Exception e) { e.printStackTrace(); } Book book = new Book(); book.setName("Ext JS 4 First Look"); book.setImage(bFile); bookDAO.saveBook(book); assertNotNull(book.getId()); } /** * Test method for {@link com.loiane.dao.BookDAOImpl#getBook()}. */ @Test public void testGetBook() { Book book = bookDAO.getBook((long) 1); assertNotNull(book); try{ //FileOutputStream fos = new FileOutputStream("images\\output.jpg"); //windows FileOutputStream fos = new FileOutputStream("images/output.jpg"); fos.write(book.getImage()); fos.close(); }catch(Exception e){ e.printStackTrace(); } } } To verify if it was really saved, let’s check the table Book: and if we right click… and choose to see the image we just saved, we will see it: Source Code Download You can download the complete source code (or fork/clone the project – git) from: Github: https://github.com/loiane/hibernate-image-example BitBucket: https://bitbucket.org/loiane/hibernate-image-example/downloads Happy Coding! From http://loianegroner.com/2011/10/how-to-load-or-save-image-using-hibernate-mysql/

October 24, 2011

by Loiane Groner

· 98,082 Views · 2 Likes

OpenStreetMap API framework for PHP

OpenStreetMap is a global project with an aim of collaboratively collecting map data, and today Ken Guest has submitted his PHP package for communitcating with the OSM API to the public and the PEAR PEPr review process: So over the last while, I’ve been working on a PHP package imaginatively named Services_Openstreetmap, for interacting with the openstreetmap API. I initially needed it so I could search for certain POIs and tabulate the results; it’s now also capable of adding data to the openstreetmap database – nodes and other elements can be created, updated and so on. It will even access the details of the user that is being used to modify that data, which is one difference between it and the other single purpose OSM frameworks. --Ken Guest You can view the submission here, and you should definitely take a look at openstreetmap.org if you haven't already. Good news for PHP developers looking to use this project more heavily in their applications.

October 22, 2011

by Mitch Pronschinske

· 16,297 Views

How to retrieve/extract metadata information from audio files using Java and Apache Tika API?

i guess, i’m writing this post after a long time. this time, i’m writing about apache tika api that a friend of mine and i tried out to extract/retrieve metadata information from audio files supported by it – .mp3, .aiff, .au, .midi, .wav. to make it clear, here’s a screenshot of the information shown by windows vista about an audio file: we wanted to extract this using java and with googling, found that apache tika would help. we needed this metadata to index audio files for it to be searchable in a search application that we’re building using apache lucene . here’s a sample java program that extracts metadata from an mp3 file: package singz.samples.search.audio.metadata; import java.io.file; import java.io.fileinputstream; import java.io.filenotfoundexception; import java.io.ioexception; import java.io.inputstream; import org.apache.tika.exception.tikaexception; import org.apache.tika.metadata.metadata; import org.apache.tika.parser.parsecontext; import org.apache.tika.parser.parser; import org.apache.tika.parser.mp3.mp3parser; import org.xml.sax.contenthandler; import org.xml.sax.saxexception; import org.xml.sax.helpers.defaulthandler; /** * @author singaram subramanian * extract metadata of an audio file using apache tika api * */ public class audiometadataextractordemo { public static void main(string[] args) { // this audio file has metadata embedded in xmp (extensible metadata platform) standard // created by adobe systems inc. xmp standardizes the definition, creation, and // processing of extensible metadata. string audiofileloc = "c:\\pop\\backstreetboys_showmethemeaningofbeinglonely.mp3"; try { inputstream input = new fileinputstream(new file(audiofileloc)); contenthandler handler = new defaulthandler(); metadata metadata = new metadata(); parser parser = new mp3parser(); parsecontext parsectx = new parsecontext(); parser.parse(input, handler, metadata, parsectx); input.close(); // list all metadata string[] metadatanames = metadata.names(); for(string name : metadatanames){ system.out.println(name + ": " + metadata.get(name)); } // retrieve the necessary info from metadata // names - title, xmpdm:artist etc. - mentioned below may differ based // on the standard used for processing and storing standardized and/or // proprietary information relating to the contents of a file. system.out.println("title: " + metadata.get("title")); system.out.println("artists: " + metadata.get("xmpdm:artist")); system.out.println("genre: " + metadata.get("xmpdm:genre")); } catch (filenotfoundexception e) { e.printstacktrace(); } catch (ioexception e) { e.printstacktrace(); } catch (saxexception e) { e.printstacktrace(); } catch (tikaexception e) { e.printstacktrace(); } } } maven pom xml 4.0.0 singz.samples.search.audio audiometadataextractor 0.0.1 jar audiometadataextractor http://maven.apache.org utf-8 org.apache.tika tika-core 0.10 org.apache.tika tika-parsers 0.10 output xmpdm:releasedate: 2001 xmpdm:audiochanneltype: stereo xmpdm:album: top 100 pop author: backstreet boys xmpdm:artist: backstreet boys channels: 2 xmpdm:audiosamplerate: 44100 xmpdm:logcomment: eng xmpdm:tracknumber: 04 version: mpeg 3 layer iii version 1 xmpdm:composer: null xmpdm:audiocompressor: mp3 title: show me the meaning of being lonely samplerate: 44100 xmpdm:genre: pop content-type: audio/mpeg title: show me the meaning of being lonely artists: backstreet boys genre: pop about apache tika http://tika.apache.org/index.html “the apache tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.” http://www.lucidimagination.com/devzone/technical-articles/content-extraction-tika#article.tika “apache tika is a content type detection and content extraction framework. tika provides a general application programming interface that can be used to detect the content type of a document and also parse textual content and metadata from several document formats. tika does not try to understand the full variety of different document formats by itself but instead delegates the real work to various existing parser libraries such as apache poi for microsoft formats, pdfbox for adobe pdf, neko html for html etc. the grand idea behind tika is that it offers a generic interface for parsing multiple formats. the tika api hides the technical differences of the various parser implementations. this means that you don’t have to learn and consume one api for every format you use but can instead use a single api – the tika api. internally tika usually delegates the parsing work to existing parsing libraries and adapts the parse result so that client applications can easily manage variety of formats. tika aims to be efficient in using available resources (mainly ram) while parsing. the tika api is stream oriented so that the parsed source document does not need to be loaded into memory all at once but only as it is needed. ultimately, however, the amount of resources consumed is mandated by the parser libraries that tika uses. at the time of writing this, tika supports directly around 30 document formats. see list of supported document formats . the list of supported document formats is not limited by tika in any way. in the simplest case you can add support for new document formats by implementing a thin adapter that that implements the parser interface for the new document format.” about xmp standard http://en.wikipedia.org/wiki/extensible_metadata_platform “the adobe extensible metadata platform ( xmp ) is a standard, created by adobe systems inc. , for processing and storing standardized and proprietary information relating to the contents of a file. xmp standardizes the definition, creation, and processing of extensible metadata . serialized xmp can be embedded into a significant number of popular file formats, without breaking their readability by non-xmp-aware applications. embedding metadata avoids many problems that occur when metadata is stored separately. xmp is used in pdf , photography and photo editing applications. xmp can be used in several file formats such as pdf , jpeg , jpeg 2000 , jpeg xr , gif , png , html , tiff , adobe illustrator , psd , mp3 , mp4 , audio video interleave , wav , rf64 , audio interchange file format , postscript , encapsulated postscript , and proposed for djvu . in a typical edited jpeg file, xmp information is typically included alongside exif and iptc information interchange model data.” from http://singztechmusings.wordpress.com/2011/10/17/how-to-retrieveextract-metadata-information-from-audio-files-using-java-and-apache-tika-api/

October 20, 2011

by Singaram Subramanian

· 34,226 Views

Handling PHP Sessions in Windows Azure

One of the challenges in building a distributed web application is in handling sessions. When you have multiple instances of an application running and session data is written to local files (as is the default behavior for the session handling functions in PHP) a user session can be lost when a session is started on one instance but subsequent requests are directed (via a load balancer) to other instances. To successfully manage sessions across multiple instances, you need a common data store. In this post I’ll show you how the Windows Azure SDK for PHP makes this easy by storing session data in Windows Azure Table storage. In the 4.0 release of the Windows Azure SDK for PHP, session handling via Windows Azure Table and Blob storage was included in the newly added SessionHandler class. Note: The SessionHandler class supports storing session data in Table storage or Blob storage. I will focus on using Table storage in this post largely because I haven’t been able to come up with a scenario in which using Blob storage would be better (or even necessary). If you have ideas about how/why Blob storage would be better, I’d love to hear them. The SessionHandler class makes it possible to write code for handling sessions in the same way you always have, but the session data is stored on a Windows Azure Table instead of local files. To accomplish this, precede your usual session handling code with these lines: require_once 'Microsoft/WindowsAzure/Storage/Table.php'; require_once 'Microsoft/WindowsAzure/SessionHandler.php'; $storageClient = new Microsoft_WindowsAzure_Storage_Table('table.core.windows.net', 'your storage account name', 'your storage account key'); $sessionHandler = new Microsoft_WindowsAzure_SessionHandler($storageClient , 'sessionstable'); $sessionHandler->register(); Now you can call session_start() and other session functions as you normally would. Nicely, it just works. Really, that’s all there is to using the SessionHandler, but I found it interesting to take a look at how it works. The first interesting thing to note is that the register method is simply calling the session_set_save_handler function to essentially map the session handling functionality to custom functions. Here’s what the method looks like from the source code: public function register() { return session_set_save_handler(array($this, 'open'), array($this, 'close'), array($this, 'read'), array($this, 'write'), array($this, 'destroy'), array($this, 'gc') ); } The reading, writing, and deleting of session data is only slightly more complicated. When writing session data, the key-value pairs that make up the data are first serialized and then base64 encoded. The serialization of the data allows for lots of flexibility in the data you want to store (i.e. you don’t have to worry about matching some schema in the data store). When storing data in a table, each entry must have a partition key and row key that uniquely identify it. The partition key is a string (“sessions” by default, but this is changeable in the class constructor) and the the row key is the session ID. (For more information about the structure of Tables, see this post.) Finally, the data is either updated (it it already exists in the Table) or a new entry is inserted. Here’s a portion of the write function: $serializedData = base64_encode(serialize($serializedData)); $sessionRecord = new Microsoft_WindowsAzure_Storage_DynamicTableEntity($this->_sessionContainerPartition, $id); $sessionRecord->sessionExpires = time(); $sessionRecord->serializedData = $serializedData; try { $this->_storage->updateEntity($this->_sessionContainer, $sessionRecord); } catch (Microsoft_WindowsAzure_Exception $unknownRecord) { $this->_storage->insertEntity($this->_sessionContainer, $sessionRecord); } Not surprisingly, when session data is read from the table, it is retrieved by session ID, base64 decoded, and unserialized. Again, here’s a snippet that show’s what is happening: $sessionRecord = $this->_storage->retrieveEntityById( $this->_sessionContainer, $this->_sessionContainerPartition, $id ); return unserialize(base64_decode($sessionRecord->serializedData)); As you can see, the SessionHandler class makes good use of the storage APIs in the SDK. To learn more about the SessionHandler class (and the storage APIs), check out the documentation on Codeplex. You can, of course, get the complete source code here: http://phpazure.codeplex.com/SourceControl/list/changesets. As I investigated the session handling in the Windows Azure SDK for PHP, I noticed that the absence of support for SQL Azure as a session store was conspicuous. I’m curious about how many people would prefer to use SQL Azure over Azure Tables as a session store. If you have an opinion on this, please let me know in the comments.

October 19, 2011

by Brian Swan

· 7,896 Views

Getters and Setters Are Not Evil

Every now and then some OOP purist comes and tells us that getters and setters are evil, because they break encapsulation. And you should never, ever use getters and setters because this is a sign of a bad design and leads to maintainability nightmares. Well, don’t worry, because those people are wrong. Not completely wrong of course, because getters and setters can break encapsulation, but in the usual scenario for regular business projects they don’t. What is the purpose of encapsulation? First, to hide how exactly an object performs its job. And to protect the internal data of an object, so that no external object can violate its state space. In other words, only the object knows which combination of field values is valid and which isn’t. Exposing fields to the outside world can leave the object in inconsistent state. For example what if you could change the backing array in an ArrayList, without setting the size field? The ArrayList instance will be inconsistent and will be violating its contract. So no getter and setter for the array list internal array. But the majority of objects for which people generate getters and setters are simple data holders. They don’t have any rules to enforce on their state, the state space consists of all possible combinations of values, and furthermore – there is nothing they can do with that data. And before you call me “anemic”, it doesn’t matter if you are doing “real OOP” with domain-driven design, where you have business logic & state in the same object, or you are doing fat service layer + anemic objects. Why it doesn’t matter? Because even in domain-driven projects you have DTOs. And DTOs are simply data holders, which need getters and setters. Another thing is that in many cases your object state is public anyway. Tools use reflection to make use of objects – view technologies use EL to access objects, ORMs use reflection to persist your entities, jackson uses reflection to serialize your objects to JSON, jasper reports uses reflection to get details from its model, etc. Virtually anything you do in the regular project out there requires data being passed outside of the application: to the user, to the database, to the printer, as a result of an API call. And you have to know what that data is. In EL you have ${foo.bar} – with, or without a getter, you consume that field. In an ORM you need to know what database types to use. In the documentation of your JSON API you should specify the structure (another topic here is whether rest-like services need documentation). The overall point here is that you win nothing by not having getters and setters on your data holder objects. Their internal state is public anyway, and it has to be. And any change in those fields means a change has to be made in other places. Change is something people fear – “you will have to change it everywhere in your project” .. well, yeah, you have, because it has changed. If you change the structure of an address from String to an Address class, it’s likely that you should revisit all places it is used and split it there as well. If you change a double to BigDecimal you’d better go and fix all your calculations. Another point – the above examples emphasized on reading the data. However, you must set that data somehow. You have roughly 3 options – constructor, builder, setters. A constructor with 15 arguments is obviously not an option. A builder for every object is just too verbose. So we use setters, because it is more practical and more readable. And that’s the main point here – setters and getters are practical when used on data holder objects. I have supported quite big projects that had a lot of setters and getters, and I had absolutely no problem with that. In fact, tracing “who sets that data” is the same as “where did this object (that encapsulates its data) came from”. And yes, in an ideal OO world you wouldn’t need data holders / DTOs, and there will be no flow of data in the system. But in the real world there is. To conclude – be careful with setters and getters on non-data-only objects. Encapsulation is a real thing. When designing a library, a component or some base frameworks in your project – don’t simply generate getters and setters. But for the regular data object – don’t worry, there’s no evil in that. From http://techblog.bozho.net/?p=621

October 14, 2011

by Bozhidar Bozhanov

· 23,732 Views · 1 Like

The Goal of software development

The Goal by Eli Goldratt is a business book in the form of a novel, where the protagonist must save his factory from closing due to very low productivity. The Goal is not limited to the management of a large organization (not even to for-profit companies): you simply have to define different units of measurement, like goal units instead of making money, the default goal. In fact, from the applications of the Theory of Constraints in our field I think it applies to software development too. What follows is my translation of the themes of The Goal to our field. The Goal is... Mking money, of course. For the ones of you with knowledge in accounting, the original goal is: raising throghput, the amounts of items sold (not produced) in the unit of time. Lowering investment/inventory, all the money tied up in the system in the form of assets that could be sold or products that stay in a warehouse. Lowering operational expense, all the money that we have spent as support and that cannot be recovered. How does these measurements apply to software development? A team does not always have an impact on contract negotiation, so often talking about money is far from everyday reality (kudos to you if you can apply that point of the Agile Manifesto.) The goal for software development can be translated, in my opinion, to: raising the throughput, the amount of features delivered (deployed, not implemented or tested) in the unit of time. You can measure this amount in story points, since feature vary in size. lowering investment/inventory, all the time tied up in the system in the form of undeployed or untested features that clutter the code base. In a minor part, also investment in the form of hardware, but that's by far less important than the team's time. lower operational expense, the time spent by developers every day in order to support the development. Automation is a kind of time investment that will bring more time (and quality) in the future, lowering operational expense. Like for material products, WIP has storage and opportunity costs which goes into the operational expense. Kanban is a tool that tries to reduce WIP in order to foster the two latter points. Throughput accounting This kind of throughput accounting is emphasized in the novel, over the use of cost accounting, where each developer (ehm, factory worker) has to be occupied all the time, even if the work he is doing isn't moving towards the goal: neverending refactoring. Specification of a feature which cannot be implemented until two months are gone, and will have to be rewritten. Implementation of features which won't be merged with the main branch any time soon. With Test-Driven Development, we are getting good at moving a feature from implemented to tested directly in the same commit. Yet the missing step is getting the feature to the users: maybe that's also what Continuous Deployment is all about... Dependent events Dependent events and statistical fluctuations are production systems topics that make a balanced plant close to bankruptcy: however, we're not at the point in which we can model our team as precisely as a factory. The basic point is that a plant in which everyone is working all the time is inefficient: when an early stage (like defining a specification or implementing a feature) gets delayed, downstream step such as deployment are dalayed too. Converely, when an upstream step finish earlier, the downstream stage is already at maximum efficiency and cannot process the intermediate result faster. I wonder if this applies to software development too. In a factory, workers are specialized and can do just a few jobs across the plant. Since workers and machines have different production rates, there will be just one bottleneck: the slowest one. If products have to pass from the bottleneck, anyone producing faster than the bottleneck will just accumulate WIP in front of him. Continuing with our example, if the analyst or domain expert is churning out specifications for new features every day, most of them are just WIP in front of the development team. Once there is an established buffer, any additional specification won't raise throughput any faster; instead, it will raise the inventory (partial features) and the time spent in managing it. I think this is not always true in the most technical phases of development instead. For example, in a small team a developer may be moved to testing or refactoring, or setting up Continuous Integration or evaluation of a new library. Unless you have a DBA which can just manage databases, your developer is not fixed into a stage of the system. The bottleneck The previous example featured a bottleneck, the most famous concept of the Theory of Constraints introduced in the book. This translation to the software development case is mine and could be incomplete. A feature (or a user story) has to pass in a series of stations where different people will work on it to make it real: specification from a domain expert, implementation from a technical team, extended testing with optional customer validation, deployment which should be fast but at the same time must not kill the current version of the application. Each station has an average velocity. By definition, there is a station which is the slowest and can process fewest story points in the unit of time. This is the bottleneck. (This is not always true, as velocity may vary greatly in time with the addition of new people or hidden lines discovered in a feature. Becoming good at estimation and stabilizing a team are two objectives that help reach the assumption.) You can identify a bottleneck by looking at where is the WIP: it will accumulate in front of it. If you already have a kanban board, this phase is simpler... Once identified, the throughput of the system can only be improved by raising the bottleneck's capacity enough so that it is no more a bottleneck. You can move people to it (keeping an eye on communication costs); ensure it is used at maximum efficiency by freeing the specialized developers from other mundane tasks. Now you can restart and find a new bottleneck... Conclusions This is just a little introduction to the themes of the Theory of Constraints and Goldratt's teachings; I don't pretend to explain the whole book in an article. It is also against the Socratic method: you should reach the answers yourself, and these are just examples from my experience. There is more to Goldratt and The Goal than bottlenecks and throughput, such as continuous improvement. If you're working or managing in a software development team, I suggest you to read this book if you have the opportunity. Even when freelancing, it is an eye-opener in moving towards a Goal instead of busyworking; and it's written with a never boring teaching method.

October 4, 2011

by Giorgio Sironi

· 21,635 Views

Brute forcing a bin packing problem

Even a basic planning problem, such as bin packing, can be notoriously hard to solve and scale. One might consider the brute force algorithm. Let's take a look at how that algorithm works out on the cloud balance example of Drools Planner: Given a set of servers with different hardware (CPU, memory and network bandwidth) and given a set of processes with different hardware requirements, assign each process to 1 server and minimize the total cost of the active servers. The brute force algorithm is simple: try every combination between processes where each process is assigned to each server. For example, if we have 6 processes (P0, P1, P2, P3, P4, P5) and 2 servers (S0, S1), we'd try these solutions: P0->S0, P1->S0, P2->S0, P3->S0, P4->S0, P5->S0 P0->S0, P1->S0, P2->S0, P3->S0, P4->S0, P5->S1 P0->S0, P1->S0, P2->S0, P3->S0, P4->S1, P5->S0 P0->S0, P1->S0, P2->S0, P3->S0, P4->S1, P5->S1 ... P0->S1, P1->S1, P2->S1, P3->S1, P4->S1, P5->S1 On my machine, it takes 15ms to calculate the score of these 2^6 combinations. When I scale out to 9 processes and 3 servers, which are 3^9 combinations, it becomes 1582ms. So it scales like this: Notice that despite that the number of processes has not even doubled, the running time multiplied by 100! For comparison, I 've added the running time of the First Fit algorithm. And it gets worse: for 12 processes and 4 servers, which are 4^12 combinations, it take more than 17 minutes: What if we want to distribute 3000 processes over 1000 servers? With this kind of scalability, it will take too long. In fact, the brute force algorithm is useless. Luckily, Drools Planner implements several other optimization algorithms, which can handle such loads. If you want to know more about them, take a look at the Drools Planner manual or come to my talk at JUDCon London (31 Oct - 1 Nov). This article was originally posted on the Drools & jBPM blog.

September 26, 2011

by Geoffrey De Smet

· 9,956 Views

Hibernate by Example - Part 1 (Orphan removal)

So i thought to do a series of hibernate examples showing various features of hibernate. In the first part i wanted to show about the Delete Orphan feature and how it may be used with the use of a story line. So let us begin :) Prerequisites: In order for you to try out the following example you will need the below mentioned JAR files: org.springframework.aop-3.0.6.RELEASE.jar org.springframework.asm-3.0.6.RELEASE.jar org.springframework.aspects-3.0.6.RELEASE.jar org.springframework.beans-3.0.6.RELEASE.jar org.springframework.context.support-3.0.6.RELEASE.jar org.springframework.context-3.0.6.RELEASE.jar org.springframework.core-3.0.6.RELEASE.jar org.springframework.jdbc-3.0.6.RELEASE.jar org.springframework.orm-3.0.6.RELEASE.jar org.springframework.transaction-3.0.6.RELEASE.jar. org.springframework.expression-3.0.6.RELEASE.jar commons-logging-1.0.4.jar log4j.jar aopalliance-1.0.jar dom4j-1.1.jar hibernate-commons-annotations-3.2.0.Final.jar hibernate-core-3.6.4.Final.jar hibernate-jpa-2.0-api-1.0.0.Final.jar javax.persistence-2.0.0.jar jta-1.1.jar javassist-3.1.jar slf4j-api-1.6.2.jar mysql-connector-java-5.1.13-bin.jar commons-collections-3.0.jar For anyone who want the eclipse project to try this out, you can download it with the above mentioned JAR dependencies here. Introduction: Its year 2011. And The Justice League has grown out of proportion and are searching for a developer to help with creating a super hero registering system. A developer competent in Hibernate and ORM is ready to do the system and handle the persistence layer using Hibernate. For simplicity, He will be using a simple stand alone application to persist super heroes. This is how this example will layout: Table design Domain classes and Hibernate mappings DAO & Service classes Spring configuration for the application A simple main class to show how it all works Let the Journey Begin...................... Table Design: The design consists of three simple tables as illustrated by the diagram below; As you can see its a simple one-to-many relationship linked by a Join Table. The Join Table will be used by Hibernate to fill the Super hero list which is in the domain class which we will go on to see next. Domain classes and Hibernate mappings: There are mainly only two domain classes as the Join table in linked with the primary owning entity which is the Justice League entity. So let us go on to see how the domain classes are constructed with annotations; package com.justice.league.domain; import java.io.Serializable; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.GeneratedValue; import javax.persistence.GenerationType; import javax.persistence.Id; import javax.persistence.Table; import org.hibernate.annotations.Type; @Entity @Table(name = "SuperHero") public class SuperHero implements Serializable { /** * */ private static final long serialVersionUID = -6712720661371583351L; @Id @GeneratedValue(strategy = GenerationType.AUTO) @Column(name = "super_hero_id") private Long superHeroId; @Column(name = "super_hero_name") private String name; @Column(name = "power_description") private String powerDescription; @Type(type = "yes_no") @Column(name = "isAwesome") private boolean isAwesome; public Long getSuperHeroId() { return superHeroId; } public void setSuperHeroId(Long superHeroId) { this.superHeroId = superHeroId; } public String getName() { return name; } public void setName(String name) { this.name = name; } public String getPowerDescription() { return powerDescription; } public void setPowerDescription(String powerDescription) { this.powerDescription = powerDescription; } public boolean isAwesome() { return isAwesome; } public void setAwesome(boolean isAwesome) { this.isAwesome = isAwesome; } } As i am using MySQL as the primary database, i have used the GeneratedValue strategy as GenerationType.AUTO which will do the auto incrementing whenever a new super hero is created. All other mappings are familiar to everyone with the exception of the last variable where we map a boolean to a Char field in the database. We use Hibernate's @Type annotation to represent true & false as Y & N within the database field. Hibernate has many @Type implementations which you can read about here. In this instance we have used this type. Ok now that we have our class to represent the Super Heroes, lets go on to see how our Justice League domain class looks like which keeps tab of all super heroes who have pledged allegiance to the League. package com.justice.league.domain; import java.io.Serializable; import java.util.ArrayList; import java.util.List; import javax.persistence.CascadeType; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.FetchType; import javax.persistence.GeneratedValue; import javax.persistence.GenerationType; import javax.persistence.Id; import javax.persistence.JoinColumn; import javax.persistence.JoinTable; import javax.persistence.OneToMany; import javax.persistence.Table; @Entity @Table(name = "JusticeLeague") public class JusticeLeague implements Serializable { /** * */ private static final long serialVersionUID = 763500275393020111L; @Id @GeneratedValue(strategy = GenerationType.AUTO) @Column(name = "justice_league_id") private Long justiceLeagueId; @Column(name = "justice_league_moto") private String justiceLeagueMoto; @Column(name = "number_of_members") private Integer numberOfMembers; @OneToMany(cascade = { CascadeType.ALL }, fetch = FetchType.EAGER, orphanRemoval = true) @JoinTable(name = "JUSTICE_LEAGUE_SUPER_HERO", joinColumns = { @JoinColumn(name = "justice_league_id") }, inverseJoinColumns = { @JoinColumn(name = "super_hero_id") }) private List superHeroList = new ArrayList(0); public Long getJusticeLeagueId() { return justiceLeagueId; } public void setJusticeLeagueId(Long justiceLeagueId) { this.justiceLeagueId = justiceLeagueId; } public String getJusticeLeagueMoto() { return justiceLeagueMoto; } public void setJusticeLeagueMoto(String justiceLeagueMoto) { this.justiceLeagueMoto = justiceLeagueMoto; } public Integer getNumberOfMembers() { return numberOfMembers; } public void setNumberOfMembers(Integer numberOfMembers) { this.numberOfMembers = numberOfMembers; } public List getSuperHeroList() { return superHeroList; } public void setSuperHeroList(List superHeroList) { this.superHeroList = superHeroList; } } The important fact to note here is the annotation @OneToMany(cascade = { CascadeType.ALL }, fetch = FetchType.EAGER, orphanRemoval = true). Here we have set orphanRemoval = true. So what does that do exactly? Ok so say that you have a group of Super Heroes in your League. And say one Super Hero goes haywire. So we need to remove Him/Her from the League. With JPA cascade this is not possible as it does not detect Orphan records and you will wind up with the database having the deleted Super Hero(s) whereas your collection still has a reference to it. Prior to JPA 2.0 you did not have the orphanRemoval support and the only way to delete orphan records was to use the following Hibernate specific(or ORM specific) annotation which is now deprecated; @org.hibernate.annotations.Cascade(org.hibernate.annotations.CascadeType.DELETE_ORPHAN) But with the introduction of the attribute orphanRemoval, we are now able to handle the deletion of orphan records through JPA. Now that we have our Domain classes DAO & Service classes: To keep with good design standards i have separated the DAO(Data access object) layer and the service layer. So let us see the DAO interface and implementation. Note that i have used HibernateTemplatethrough HibernateDAOSupportso as to keep away any Hibernate specific detail out and access everything in a unified manner using Spring. package com.justice.league.dao; import org.springframework.transaction.annotation.Propagation; import org.springframework.transaction.annotation.Transactional; import com.justice.league.domain.JusticeLeague; @Transactional(propagation = Propagation.REQUIRED, readOnly = false) public interface JusticeLeagueDAO { public void createOrUpdateJuticeLeagure(JusticeLeague league); public JusticeLeague retrieveJusticeLeagueById(Long id); } In the interface layer i have defined the Transaction handling as Required. This is done so that whenever you do not need a transaction you can define that at the method level of that specific method and in more situations you will need a transaction with the exception of data retrieval methods. According to the JPA spec you need a valid transaction for insert/delete/update functions. So lets take a look at the DAO implementation; package com.justice.league.dao.hibernate; import org.springframework.beans.factory.annotation.Qualifier; import org.springframework.orm.hibernate3.support.HibernateDaoSupport; import org.springframework.transaction.annotation.Propagation; import org.springframework.transaction.annotation.Transactional; import com.justice.league.dao.JusticeLeagueDAO; import com.justice.league.domain.JusticeLeague; @Qualifier(value="justiceLeagueHibernateDAO") public class JusticeLeagueHibernateDAOImpl extends HibernateDaoSupport implements JusticeLeagueDAO { @Override public void createOrUpdateJuticeLeagure(JusticeLeague league) { if (league.getJusticeLeagueId() == null) { getHibernateTemplate().persist(league); } else { getHibernateTemplate().update(league); } } @Transactional(propagation = Propagation.NOT_SUPPORTED, readOnly = false) public JusticeLeague retrieveJusticeLeagueById(Long id){ return getHibernateTemplate().get(JusticeLeague.class, id); } } Here i have defined an @Qualifier to let Spring know that this is the Hibernate implementation of the DAO class. Note the package name which ends with hibernate. This as i see is a good design concept to follow where you separate your implementation(s) into separate packages to keep the design clean. Ok lets move on to the service layer implementation. The service layer in this instance is just acting as a mediation layer to call the DAO methods. But in a real world application you will probably have other validations, security related procedures etc handled within the service layer. package com.justice.league.service; import com.justice.league.domain.JusticeLeague; public interface JusticeLeagureService { public void handleJusticeLeagureCreateUpdate(JusticeLeague justiceLeague); public JusticeLeague retrieveJusticeLeagueById(Long id); } package com.justice.league.service.impl; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.beans.factory.annotation.Qualifier; import org.springframework.stereotype.Component; import com.justice.league.dao.JusticeLeagueDAO; import com.justice.league.domain.JusticeLeague; import com.justice.league.service.JusticeLeagureService; @Component("justiceLeagueService") public class JusticeLeagureServiceImpl implements JusticeLeagureService { @Autowired @Qualifier(value = "justiceLeagueHibernateDAO") private JusticeLeagueDAO justiceLeagueDAO; @Override public void handleJusticeLeagureCreateUpdate(JusticeLeague justiceLeague) { justiceLeagueDAO.createOrUpdateJuticeLeagure(justiceLeague); } public JusticeLeague retrieveJusticeLeagueById(Long id){ return justiceLeagueDAO.retrieveJusticeLeagueById(id); } } Few things to note here. First of all the @Component binds this service implementation with the name justiceLeagueService within the spring context so that we can refer to the bean as a bean with an id of name justiceLeagueService. And we have auto wired the JusticeLeagueDAO and defined an @Qualifier so that it will be bound to the Hibernate implementation. The value of the Qualifier should be the same name we gave the class level Qualifier within the DAO Implementation class. And Lastly let us look at the Spring configuration which wires up all these together; Spring configuration for the application: com.justice.league.**.* org.hibernate.dialect.MySQLDialect com.mysql.jdbc.Driver jdbc:mysql://localhost:3306/my_test root password true org.hibernate.dialect.MySQLDialect Note that i have used the HibernateTransactionManager in this instance as i am running it stand alone. If you are running it within an application server you will almost always use a JTA Transaction manager. I have also used auto creation of tables by hibernate for simplicity purposes. The packagesToScan property instructs to scan through all sub packages(including nested packaged within them) under the root package com.justice.league.**.* to be scanned for @Entity annotated classes. We have also bounded the session factory to the justiceLeagueDAO so that we can work with the Hibernate Template. For testing purposes you can have the tag create initially if you want, and let hibernate create the tables for you. Ok so now that we have seen the building blocks of the application, lets see how this all works by first creating some super heroes within the Justice League A simple main class to show how it all works: As the first example lets see how we are going to persist the Justice League with a couple of Super Heroes; package com.test; import java.util.ArrayList; import java.util.List; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; import com.justice.league.domain.JusticeLeague; import com.justice.league.domain.SuperHero; import com.justice.league.service.JusticeLeagureService; public class TestSpring { /** * @param args */ public static void main(String[] args) { ApplicationContext ctx = new ClassPathXmlApplicationContext( "spring-context.xml"); JusticeLeagureService service = (JusticeLeagureService) ctx .getBean("justiceLeagueService"); JusticeLeague league = new JusticeLeague(); List superHeroList = getSuperHeroList(); league.setSuperHeroList(superHeroList); league.setJusticeLeagueMoto("Guardians of the Galaxy"); league.setNumberOfMembers(superHeroList.size()); service.handleJusticeLeagureCreateUpdate(league); } private static List getSuperHeroList() { List superHeroList = new ArrayList(); SuperHero superMan = new SuperHero(); superMan.setAwesome(true); superMan.setName("Clark Kent"); superMan.setPowerDescription("Faster than a speeding bullet"); superHeroList.add(superMan); SuperHero batMan = new SuperHero(); batMan.setAwesome(true); batMan.setName("Bruce Wayne"); batMan.setPowerDescription("I just have some cool gadgets"); superHeroList.add(batMan); return superHeroList; } } And if we go to the database and check this we will see the following output; mysql> select * from superhero; +---------------+-----------+-----------------+-------------------------------+ | super_hero_id | isAwesome | super_hero_name | power_description | +---------------+-----------+-----------------+-------------------------------+ | 1 | Y | Clark Kent | Faster than a speeding bullet | | 2 | Y | Bruce Wayne | I just have some cool gadgets | +---------------+-----------+-----------------+-------------------------------+ mysql> select * from justiceleague; +-------------------+-------------------------+-------------------+ | justice_league_id | justice_league_moto | number_of_members | +-------------------+-------------------------+-------------------+ | 1 | Guardians of the Galaxy | 2 | +-------------------+-------------------------+-------------------+ So as you can see we have persisted two super heroes and linked them up with the Justice League. Now let us see how that delete orphan works with the below example; package com.test; import java.util.ArrayList; import java.util.List; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; import com.justice.league.domain.JusticeLeague; import com.justice.league.domain.SuperHero; import com.justice.league.service.JusticeLeagureService; public class TestSpring { /** * @param args */ public static void main(String[] args) { ApplicationContext ctx = new ClassPathXmlApplicationContext( "spring-context.xml"); JusticeLeagureService service = (JusticeLeagureService) ctx .getBean("justiceLeagueService"); JusticeLeague league = service.retrieveJusticeLeagueById(1l); List superHeroList = league.getSuperHeroList(); /** * Here we remove Batman(a.k.a Bruce Wayne) out of the Justice League * cos he aint cool no more */ for (int i = 0; i < superHeroList.size(); i++) { SuperHero superHero = superHeroList.get(i); if (superHero.getName().equalsIgnoreCase("Bruce Wayne")) { superHeroList.remove(i); break; } } service.handleJusticeLeagureCreateUpdate(league); } } Here we first retrieve the Justice League record by its primary key. Then we loop through and remove Batman off the League and again call the createOrUpdate method. As we have the remove orphan defined, any Super Hero not in the list which is in the database will be deleted. Again if we query the database we will see that batman has been removed now as per the following; mysql> select * from superhero; +---------------+-----------+-----------------+-------------------------------+ | super_hero_id | isAwesome | super_hero_name | power_description | +---------------+-----------+-----------------+-------------------------------+ | 1 | Y | Clark Kent | Faster than a speeding bullet | +---------------+-----------+-----------------+-------------------------------+ So that's it. The story of how Justice League used Hibernate to remove Batman automatically without being bothered to do it themselves. Next up look forward to how Captain America used Hibernate Criteria to build flexible queries in order to locate possible enemies. Watch out!!!! Have a great day people and thank you for reading!!!! If you have any suggestions or comments pls do leave them by. From http://dinukaroshan.blogspot.com/2011/09/hibernate-by-example-part-1-orphan.html

September 24, 2011

by Dinuka Arseculeratne

· 47,829 Views

Practical PHP Refactoring: Replace Record with Data Class

We often find ourselves tempted by the shortcut of using directly a record-like data structure provided by the language or a framework. There are many example scenarios where a record emerges: when you use a database for persistence (not only a with relational one) there can be a data structure containing the results of a query. when you use associative arrays, often they have a small number of fixed keys. Record-like structures are the equivalent of C structs, or Ruby hashes. This refactoring is a generalization of Replace Array with Object: in this case the starting point is not just an array which always has the same number and type of fields, but any data structure homoegeneous to it: Zend_Db_Table_Row, implementing a Row Data Gateway and giving access to a row in the database. stdClass instances (or arrays) fetched by PDOStatement. In some languages (see C) the record is a particular data structure which is not an object; in PHP, it is always an array or an object of some vendor class. Even ORMs based on Active Record are more advanced than this kind of usage: they usually let you add methods on the model classes, which are populated with data by the ORM itself. In this refactoring, we are talking about a data structure managed by the language of the persistence layer, and whose code you cannot modify. Why replacing a record-like structure? Generic classes do not let you add methods to manipulate their data: when using directly a Zend_Db_Table_Row or an associative array for storage of the result of a query, you have to resort continuously to foreign methods. Each time you need new logic, you have to compromise encapsulation on the object itself, and these methods get duplicated in various instance of client code. Solutions There are different ways to eliminate the coupling to a record-like structure. The first is to refactor to subclassing, where the data structure becomes an Active Record. You extends the vendor class with your own one: this option is only available if the structure is defined as an object. Moreover, the name of the class to instantiate must be configurable in the persistence mechanism. Zend_Db_Table_Record supports this kind of usage. A second option is to refactor to composition: Zend_Db examples exist also for this case. This approach can be used to avoid large hierarchies: your model classes compose the Zend_Db objects and hide the database; methods can be added for once at your own models. A third and final alternative is to use hydration: data is copied to your model objects, and records are thrown away after the fact. Doctrine 2 and Data Mappers in general choose this approach. Steps I will describe a short procedure for refactoring to hydration since it is the most complex approach and it is always applicable. The other are instead specific for the particular data structure (for example with Zend_Db you have to write some subclasses and configure some protected fields to contain the right class names.) Create a new class, as one of your models. Its state should be represented by one row (or more rows joined into one) in the database. This class should gain a private field for each of the record's fields, usually with getters and setters. This class should accept in the constructor or in a Factory Method an instance of the record, so that it can produce a new instance. If you want to decouple from the persistsnce, or you want to go two ways (also save and not only visualize, since this is not just a presentation model), look for a Data Mapper such as Doctrine 2, which will even hide all the record structures from you and manager associations where objects compose other ones. Example In the example, the data of a single user are returned in an array. I chose an array as the record-like structure to minimize the external dependencies of this code. find(42); $this->assertEquals('Giorgio', $giorgio['name']); } } /** * This is a Fake Table Data Gateway. The machinery for making it work with * a database will be distracting for our purposes, so they will be omitted. */ class UsersTable { /** * @return mixed the returned value can be a Zend_Db_Table_Row, * an Active Record, a stdClass, an associative array... * It should just represent a single entity. */ public function find($id) { // execute a PDOStatement and fetch the data return array('id' => 42, 'name' => 'Giorgio'); } } After the refactoring, we have a User class where we can add all the methods we need: find(42); $this->assertEquals('Giorgio', $giorgio->getName()); } } /** * This is a Fake Table Data Gateway. The machinery for making it work with * a database will be distracting for our purposes, so they will be omitted. */ class UsersTable { /** * @return mixed the returned value can be a Zend_Db_Table_Row, * an Active Record, a stdClass, an associative array... * It should just represent a single entity. */ public function find($id) { // execute a PDOStatement and fetch the data return User::fromRecord(array('id' => 42, 'name' => 'Giorgio')); } } class User { private $id; private $name; public static function fromRecord(array $record) { $object = new self(); $object->id = $record['id']; $object->name = $record['name']; return $object; } public function getName() { return $this->name; } }

September 19, 2011

by Giorgio Sironi

· 11,058 Views

EC2 Interview – AWS Interview – Cloud Interview – 8 Questions

If you're looking for a cloud expert, specifically someone who knows Amazon Web Services and EC2, you'll want to have a battery of questions to assess their knowledge.

September 15, 2011

by Sean Hull

· 111,875 Views · 1 Like