Data Resources

The Latest Data Topics

Kendo UI is an HTML5 and jQuery-based framework that helps you create modern web applications. It provides data binding, animations, UI widgets such as Grid and Chart, a drag-and-drop API, and touch support.

Download Kendo UI from here. Once you download it, you get these folders:

Navigate to the 'example' folder for examples of the various widgets.

If you want to start developing web applications using Kendo UI, you need to add the required files to your project. You need to add the below files in the script folder:

And you need to add the below files in the style folder:

Even though I have added the script files and CSS files to the script and style folders respectively, you are free to keep them anywhere you want. After adding these files, you need to link them in the header of the HTML page. You can add the references as below:

In a later post I will go into the details of Kendo UI and play around with all its other aspects. However, working with any of the widgets is very intuitive. For example, if you want to work with the Kendo AutoComplete widget, you can do that as below:

And using jQuery you can assign the values as below:

Putting all the HTML and script code together in test.htm (Kendo UI demo):

When you run test.htm in your browser, you should get this output:

In later posts I will go into detail about all the widgets. I hope this post is useful. Thanks for reading.

Source: http://debugmode.net/2012/02/18/introduction-to-telerik-kedno-ui/

February 20, 2012

by Dhananjay Kumar


Mining Data from PDF Files with Python

PDF files aren't pleasant.

The good news is that they're documented (http://www.adobe.com/devnet/pdf/pdf_reference.html). The bad news is that they're rather complex.

I found four Python packages for reading PDF files.

http://pybrary.net/pyPdf/ - weak
http://www.swftools.org/gfx_tutorial.html - depends on binary XPDF
http://blog.didierstevens.com/programs/pdf-tools/ - limited
http://www.unixuser.org/~euske/python/pdfminer/ - acceptable

I elected to work with PDFMiner for two reasons: (1) it is pure Python, and (2) it is reasonably complete.

This is not, however, much of an endorsement. The implementation (while seemingly correct for my purposes) needs a fair amount of cleanup. Here's one example of remarkably poor programming.

    # Connect the parser and document objects.
    parser.set_document(doc)
    doc.set_parser(parser)

Only one of these two is needed; the other is trivially handled as part of the setter method.

Also, the package seems to rely on a huge volume of isinstance type checking. It's not clear if proper polymorphism is even possible. But some kind of filter that picked elements by type might be nicer than a lot of isinstance checks.

Annotation Extraction

While the code is shabby, the good news is that PDFMiner seems to reliably extract the annotations on a PDF form. In a couple of hours, I had this example of how to read a PDF document and collect the data filled into the form.

    from pdfminer.pdfparser import PDFParser, PDFDocument
    from pdfminer.psparser import PSLiteral
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter, PDFTextExtractionNotAllowed
    from pdfminer.pdfdevice import PDFDevice
    from pdfminer.pdftypes import PDFObjRef
    from pdfminer.layout import LAParams, LTTextBoxHorizontal
    from pdfminer.converter import PDFPageAggregator
    from collections import defaultdict, namedtuple

    TextBlock= namedtuple("TextBlock", ["x", "y", "text"])

    class Parser( object ):
        """Parse the PDF.

        1.  Get the annotations into the self.fields dictionary.

        2.  Get the text into a dictionary of text blocks.
            The key to the dictionary is page number (1-based).
            The value in the dictionary is a sequence of items in (-y, x) order.
            That is approximately top-to-bottom, left-to-right.
        """
        def __init__( self ):
            self.fields = {}
            self.text= {}

        def load( self, open_file ):
            self.fields = {}
            self.text= {}

            # Create a PDF parser object associated with the file object.
            parser = PDFParser(open_file)
            # Create a PDF document object that stores the document structure.
            doc = PDFDocument()
            # Connect the parser and document objects.
            parser.set_document(doc)
            doc.set_parser(parser)
            # Supply the password for initialization.
            # (If no password is set, give an empty string.)
            doc.initialize('')
            # Check if the document allows text extraction. If not, abort.
            if not doc.is_extractable:
                raise PDFTextExtractionNotAllowed
            # Create a PDF resource manager object that stores shared resources.
            rsrcmgr = PDFResourceManager()
            # Set parameters for analysis.
            laparams = LAParams()
            # Create a PDF page aggregator object.
            device = PDFPageAggregator(rsrcmgr, laparams=laparams)
            # Create a PDF interpreter object.
            interpreter = PDFPageInterpreter(rsrcmgr, device)

            # Process each page contained in the document.
            for pgnum, page in enumerate( doc.get_pages() ):
                interpreter.process_page(page)
                if page.annots:
                    self._build_annotations( page )
                txt= self._get_text( device )
                self.text[pgnum+1]= txt

        def _build_annotations( self, page ):
            for annot in page.annots.resolve():
                if isinstance( annot, PDFObjRef ):
                    annot= annot.resolve()
                assert annot['Type'].name == "Annot", repr(annot)
                if annot['Subtype'].name == "Widget":
                    if annot['FT'].name == "Btn":
                        assert annot['T'] not in self.fields
                        self.fields[ annot['T'] ] = annot['V'].name
                    elif annot['FT'].name == "Tx":
                        assert annot['T'] not in self.fields
                        self.fields[ annot['T'] ] = annot['V']
                    elif annot['FT'].name == "Ch":
                        assert annot['T'] not in self.fields
                        self.fields[ annot['T'] ] = annot['V']
                        # Alternative choices in annot['Opt']
                    else:
                        raise Exception( "Unknown Widget" )
                else:
                    raise Exception( "Unknown Annotation" )

        def _get_text( self, device ):
            text= []
            layout = device.get_result()
            for obj in layout:
                if isinstance( obj, LTTextBoxHorizontal ):
                    if obj.get_text().strip():
                        text.append( TextBlock(obj.x0, obj.y1, obj.get_text().strip()) )
            text.sort( key=lambda row: (-row.y, row.x) )
            return text

        def is_recognized( self ):
            """Check for Copyright as well as Revision information on each page."""
            bottom_page_1 = self.text[1][-3:]
            bottom_page_2 = self.text[2][-3:]
            pg1_rev= "Rev 2011.01.17" == bottom_page_1[2].text
            pg2_rev= "Rev 2011.01.17" == bottom_page_2[0].text
            return pg1_rev and pg2_rev

This gives us a dictionary of field names and values, essentially transforming the PDF form into the same kind of data that comes from an HTML POST request.

An important part is that we don't want much of the background text, just enough to confirm the version of the form file itself.

The cryptic text.sort( key=lambda row: (-row.y, row.x) ) will sort the text blocks into order from top-to-bottom and left-to-right. For the most part, a page footer will show up last. This is not guaranteed, however. In a multi-column layout, the footer can be so close to the bottom of a column that PDFMiner may put the two text blocks together.

The other unfortunate part is the extremely long (and opaque) setup required to get the data from the page.

Source: http://slott-softwarearchitect.blogspot.com/2012/02/pdf-reading.html
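The (-y, x) sort key discussed in the article can be tried in isolation, without PDFMiner installed. The sketch below reuses the TextBlock tuple from the listing; the coordinates and the reading_order helper are invented for illustration and do not come from a real form.

```python
from collections import namedtuple

TextBlock = namedtuple("TextBlock", ["x", "y", "text"])

def reading_order(blocks):
    """Return blocks sorted top-to-bottom, then left-to-right.

    PDF coordinates grow upward, so a larger y is higher on the page;
    negating y puts the highest block first.
    """
    return sorted(blocks, key=lambda row: (-row.y, row.x))

# Hypothetical coordinates, made up for this example only.
blocks = [
    TextBlock(72.0, 40.0, "Rev 2011.01.17"),
    TextBlock(300.0, 700.0, "Right column heading"),
    TextBlock(72.0, 700.0, "Left column heading"),
    TextBlock(72.0, 650.0, "Body text"),
]
ordered = [b.text for b in reading_order(blocks)]
print(ordered)
```

With well-separated blocks like these the footer sorts last, which is the property is_recognized relies on; the multi-column caveat in the article still applies to real documents.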

February 14, 2012

by Steven Lott


StAXON - JSON via StAX

XML is for dinosaurs, right? Everybody uses JSON these days. So you do, don’t you? But what about things like XSD, XSLT, JAXB, XPath, etc.: is it all evil?

In this article, I’d like to introduce the StAXON project (APL2), which tries to give you the best of both worlds: JSON outside, but XML inside. One benefit of this is that you can integrate JSON with powerful XML-related technologies for free.

StAXON lets you read and write JSON using the Java Streaming API for XML (javax.xml.stream), also known as StAX. More specifically, StAXON provides implementations of the

StAX Cursor API (XMLStreamReader and XMLStreamWriter)
StAX Event API (XMLEventReader and XMLEventWriter)
StAX Factory API (XMLInputFactory and XMLOutputFactory)

for JSON.

You may know the Jettison project, which also has XMLStreamReader and XMLStreamWriter implementations. However, StAXON aims to provide a more comprehensive and consistent solution and tries to avoid some of the issues users are having with Jettison.

Anyway, let’s get started and see what this “anti-aging substance” for XML can do.

Setup

Add the following dependency to your Maven POM file:

    <dependency>
        <groupId>de.odysseus.staxon</groupId>
        <artifactId>staxon</artifactId>
        <version>1.0</version>
    </dependency>

or get the latest StAXON JAR from the Downloads page and add it to your classpath.

Mapping Convention

The purpose of StAXON’s mapping convention is to generate a more compact JSON.
It borrows the "$" syntax for text elements from the Badgerfish convention but attempts to avoid needless text-only JSON objects:

Element names become object properties:
    <alice/> <–> {"alice":null}
Attributes go in properties whose names begin with "@":
    <alice charlie="david"/> <–> {"alice":{"@charlie":"david"}}
Text-only elements go to a simple key/value property:
    <alice>bob</alice> <–> {"alice":"bob"}
Otherwise, text content is mapped to the "$" property:
    <alice charlie="david">bob</alice> <–> {"alice":{"@charlie":"david","$":"bob"}}
Nested elements go to nested properties:
    <alice><bob>charlie</bob></alice> <–> {"alice":{"bob":"charlie"}}
A default namespace declaration goes in the element’s "@xmlns" property:
    <alice xmlns="http://foo.com"/> <–> {"alice":{"@xmlns":"http://foo.com"}}
A prefixed namespace declaration goes in the element’s "@xmlns:" property:

    <customer><name>John Doe</name><phone>555-1111</phone></customer>

However, with our JSON-based writer, the output is

    {"customer":{"name":"John Doe","phone":"555-1111"}}

Reading JSON

Create a JSON-based reader:

    String json = "{\"customer\":{\"name\":\"John Doe\",\"phone\":\"555-1111\"}}";
    XMLInputFactory factory = new JsonXMLInputFactory();
    XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(json));

Read your document:

    assert reader.getEventType() == XMLStreamConstants.START_DOCUMENT;
    reader.nextTag();
    assert reader.isStartElement() && "customer".equals(reader.getLocalName());
    reader.next();
    assert reader.isStartElement() && "name".equals(reader.getLocalName());
    reader.next();
    assert reader.hasText() && "John Doe".equals(reader.getText());
    reader.nextTag();
    assert reader.isEndElement();
    reader.next();
    assert reader.isStartElement() && "phone".equals(reader.getLocalName());
    reader.next();
    assert reader.hasText() && "555-1111".equals(reader.getText());
    reader.nextTag();
    assert reader.isEndElement();
    reader.next();
    assert reader.isEndElement();
    reader.next();
    assert reader.getEventType() == XMLStreamConstants.END_DOCUMENT;
    reader.close();

Factory Configuration

The JsonXMLInputFactory and JsonXMLOutputFactory classes can be configured via the standard setProperty(String, Object) API.
The factory classes define several constants for the properties they support. However, the JsonXMLConfig interface provides a convenient way to hold the configuration of both the input and output factories:

    JsonXMLConfig config = new JsonXMLConfigBuilder().
        virtualRoot("customer").
        prettyPrint(true).
        build();
    XMLInputFactory inputFactory = new JsonXMLInputFactory(config);
    ...
    XMLOutputFactory outputFactory = new JsonXMLOutputFactory(config);
    ...

Virtual Roots

Set the virtualRoot configuration property to strip the root element from the JSON representation, e.g.

    {
      "name" : "John Doe",
      "phone" : "555-1111"
    }

As XML requires a single root element, but JSON documents often don’t have one, this is an important feature required to read and write existing JSON formats.

Mastering Arrays

What about JSON arrays? Unfortunately, there’s nothing like this in XML. And to be honest, this causes most of the trouble when writing JSON via an XML API like StAX. Simply omitting the array boundaries would lead to non-unique JSON properties, which is usually not desired.

StAXON provides several ways to deal with JSON arrays. At the core is the idea of using XML processing instructions to tell the writer that an array is about to start: the <?xml-multiple?> processing instruction maps a sequence of XML elements with the same name to a JSON array. The processing instruction optionally takes the array element tag name (with prefix) as data. There’s no end-of-array hint, as StAXON detects the end of an array sequence and closes it automatically.

Consider the following JSON document:

    {
      "alice" : {
        "bob" : [ "edgar", "charlie" ],
        "peter" : null
      }
    }

In order to get a "bob" array instead of two separate "bob" properties, we need to provide XML events corresponding to

    <alice>
      <?xml-multiple bob?>
      <bob>edgar</bob>
      <bob>charlie</bob>
      <peter/>
    </alice>

I.e., with the cursor API, you would just insert

    writer.writeProcessingInstruction(JsonXMLStreamConstants.MULTIPLE_PI_TARGET);

to start an array.
Initiating Arrays with Element Paths

Sometimes it is not desired or even impossible to generate processing instructions to control arrays. This may be the case if the actual writing isn’t done by your code, but by some other framework like JAXB or similar, and you only provide a stream writer. Addressing such a scenario, wouldn’t it be nice to be able to tell the writer beforehand which elements should trigger a JSON array? This is where the XMLMultipleStreamWriter and XMLMultipleEventWriter wrappers step in. E.g., to specify a sequence of bob elements below root element alice as a multiple path:

    writer = new XMLMultipleStreamWriter(writer, true, "/alice/bob");

The boolean parameter specifies whether our paths include the root node (alice). That is, we could also use

    writer = new XMLMultipleStreamWriter(writer, false, "/bob");

To wrap all bob fields into arrays (not just alice children), we can use a relative path, without a leading slash:

    writer = new XMLMultipleStreamWriter(writer, false, "bob");

Now we (or some legacy code, framework, …) may write our document, and the writer will take care of triggering the bob array for us.

Triggering Arrays Automatically

Finally, if nothing else works for you, you may also let StAXON determine array boundaries fully automatically. Use this only if you cannot provide processing instructions and cannot provide the paths of the elements that should be wrapped into JSON arrays. However, using this method has several drawbacks:

The writer basically needs to cache the entire document in memory, eating both space and time.
The writer will not be able to produce empty arrays or arrays with a single element.

To enable this feature, set the JsonXMLOutputFactory.PROP_AUTO_ARRAY property to true.

Triggering Document Arrays

StAXON’s writer implementation allows you to wrap a sequence of documents into a JSON array.
To do this, write the PI before writing anything else:

    writer.writeProcessingInstruction(JsonXMLStreamConstants.MULTIPLE_PI_TARGET);
    writer.writeStartDocument(); // first array component
    ...
    writer.writeEndDocument();
    writer.writeStartDocument(); // second array component
    ...
    writer.writeEndDocument();
    ...
    writer.close();

The writer.close() call is crucial here, as it will close the JSON array.

Using JAXB

Consider a JAXB-annotated Customer class:

    @JsonXML(virtualRoot = true, prettyPrint = true, multiplePaths = "phone")
    @XmlRootElement
    public class Customer {
        public String name;
        public List<String> phone;
    }

The @JsonXML annotation is used to configure the mapping details. In the above example, the customer root element is stripped from the JSON representation, phone elements are wrapped into an array and the JSON output is nicely formatted, e.g.

    {
      "name" : "John Doe",
      "phone" : [ "555-1111" ]
    }

Now, the JsonXMLMapper class enables dead-simple mapping to and from JSON:

    /*
     * Create mapper instance.
     */
    JsonXMLMapper mapper = new JsonXMLMapper(Customer.class);

    /*
     * Read customer.
     */
    InputStream input = getClass().getResourceAsStream("input.json");
    Customer customer = mapper.readObject(input);
    input.close();

    /*
     * Write back to console
     */
    mapper.writeObject(System.out, customer);

Using JAX-RS

StAXON provides the staxon-jaxrs module, which enables your RESTful services to serialize/deserialize JAXB-annotated classes to/from JSON. It includes the following JAX-RS @Provider classes:

de.odysseus.staxon.json.jaxrs.jaxb.JsonXMLObjectProvider is used to read and write JSON objects
de.odysseus.staxon.json.jaxrs.jaxb.JsonXMLArrayProvider is used to read and write JSON arrays

In order to select the StAXON message body readers/writers for your resource, a @JsonXML annotation is required.
When used with JAX-RS, the @JsonXML annotation can be placed on

a model type (@XmlRootElement or @XmlType) to configure its serialization and deserialization
a JAX-RS resource method to configure serialization of the result type
a parameter of a JAX-RS resource method to configure deserialization of the parameter type

If a @JsonXML annotation is present at a model type and a resource method or parameter, the latter will override the model type annotation. If neither is present, StAXON will not handle the resource.

You can find a sample project using Jersey with StAXON here.

Using XPath

XPath is another standard that can be easily adopted for use with JSON. The Java XPath API (javax.xml.xpath) doesn’t let us provide an XMLStreamReader or similar as a source, but requires a Document Object Model (DOM). Therefore, we need to read our JSON into a DOM first to apply expressions against that DOM.

This could be done by performing an XSLT identity transformation to a DOMResult. However, StAXON provides the DOMEventConsumer class to translate XML events to DOM nodes, which should be faster and simpler than leveraging XSLT.

Once we have a DOM, there’s nothing special with applying XPath expressions.

    StringReader json = new StringReader("{\"edgar\":\"david\",\"bob\":\"charlie\"}");

    /*
     * Our sample JSON has no root element, so specify "alice" as virtual root
     */
    JsonXMLConfig config = new JsonXMLConfigBuilder().virtualRoot("alice").build();

    /*
     * create event reader
     */
    XMLEventReader reader = new JsonXMLInputFactory(config).createXMLEventReader(json);

    /*
     * parse JSON into Document Object Model (DOM)
     */
    Document document = DOMEventConsumer.consume(reader);

    /*
     * evaluate an XPath expression
     */
    XPath xpath = XPathFactory.newInstance().newXPath();
    System.out.println(xpath.evaluate("//alice/bob", document));

Running the above sample will print charlie to the console.

What else?
In the end, using an XML API to read and write JSON may still look like a compromise, but it may turn out to be a good choice. The availability of a StAX implementation for JSON acts as a door opener to powerful XML-related technologies and easily enables dual-format (XML and JSON) services.

There’s more we can do with StAXON: XSD, XSLT, XQuery, XML-JSON/JSON-XML conversions, to name a few. Please check the Wiki for some of those.
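To make the mapping convention from the article concrete, here is a minimal Python sketch of the XML-to-JSON direction. It is not StAXON code and does not reflect how StAXON is implemented: the function names are hypothetical, namespaces are ignored, and repeated child elements simply overwrite each other because the array handling discussed in the article is left out.

```python
import json
import xml.etree.ElementTree as ET

def element_to_value(elem):
    """Map one XML element to a JSON-ready value (hypothetical helper)."""
    attrs = {"@" + name: value for name, value in elem.attrib.items()}
    children = list(elem)
    text = (elem.text or "").strip()
    if not attrs and not children:
        # Empty element -> null; text-only element -> plain string.
        return text or None
    obj = dict(attrs)
    if text:
        # Text next to attributes or children goes to the "$" property.
        obj["$"] = text
    for child in children:
        # No array support: a repeated tag overwrites the earlier one.
        obj[child.tag] = element_to_value(child)
    return obj

def xml_to_json(xml_text):
    """Convert a small XML document to compact JSON text."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_value(root)}, separators=(",", ":"))

print(xml_to_json('<alice charlie="david">bob</alice>'))
```

Running the rules through a few of the article's own pairs is a quick way to check one's reading of the convention before relying on the real library.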

February 8, 2012

by Christoph Beck


Algorithm of the Week: Data Compression with Prefix Encoding

Prefix encoding, sometimes called front encoding, is yet another algorithm that tries to remove duplicated data in order to reduce its size. Its principles are simple; however, this algorithm tends to be difficult to implement. To understand why, let’s first take a look at its nature.

Have a look at the following dictionary.

use
used
useful
usefully
useless
uselessly
uselessness

Instead of keeping all these words in plain text or transferring all of them over a network, we can compress (encode) them with prefix encoding.

It’s clear that each of these words begins with the prefix “use”, which is also the first word in the list. So we can easily compress them into the following array.

    $data = array(
        0 => 'use',
        1 => '0d',
        2 => '0ful',
        3 => '0fully',
        4 => '0less',
        5 => '0lessly',
        6 => '0lessness',
    );

It’s clear that this is not the best compression, and we can go even further by using not only the first word as a prefix.

    $data = array(
        0 => 'use',
        1 => '0d',
        2 => '0ful',
        3 => '2ly',
        4 => '0less',
        5 => '4ly',
        6 => '4ness',
    );

Now the compression is better, and the good news is that decompression is a fairly simple process. However, the tricky part is compression itself. The problem is that it is quite difficult to choose an appropriate prefix. In our first example this is simple, but most of the time in practice we will have more heterogeneous data. Indeed, the process of compression can be very difficult for randomly generated data, and the algorithm will be not only slow, but difficult to implement.

The good thing is that this algorithm can be used in many cases once we know the data format in advance. So let’s see three examples where this algorithm can be very handy.

Application

Here are three examples of prefix encoding. As I stated above, the process of compression can be very difficult for random data, so it is good practice to use it only if you know the format of the input data in advance.
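The claim that decompression is fairly simple can be checked with a short Python sketch of a decoder for the second array in the dictionary example. It assumes, as that example implies, that a leading run of digits is the index of an earlier entry used as the prefix, and that no word itself begins with a digit; the decode function name is made up for this illustration.

```python
import re

def decode(entries):
    """Expand entries such as '2ly' into full words.

    A leading run of digits is read as the index of an earlier, already
    decoded word that serves as the prefix; an entry without leading
    digits is taken literally.
    """
    words = []
    for entry in entries:
        match = re.match(r"(\d+)(.*)", entry)
        if match:
            prefix = words[int(match.group(1))]
            words.append(prefix + match.group(2))
        else:
            words.append(entry)
    return words

data = ["use", "0d", "0ful", "2ly", "0less", "4ly", "4ness"]
decoded = decode(data)
print(decoded)
```

Because every reference points backwards, a single pass is enough; the hard part the article describes, choosing good prefixes during compression, does not show up in the decoder at all.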
Date and time prefixes

We humans often skip the first two digits of a year, so for instance we don’t always write 1995 or 1996, but use the shorter ‘95 and ‘96. Thus years can be encoded with shorter strings.

input: (1991, 1992, 1993, 1994, 1995, 1996)
output: (91, 92, 93, 94, 95, 96)

The problem is that small changes in the input stream can confuse the decoder. If we add years from the 21st century, we lose the uniqueness of the data.

input: (1998, 1992, 1999, 2011, 2012)
output: (98, 92, 99, 11, 12)

Now the decoder can decode the last two values as (1911, 1912), since “19” is considered to be the prefix. So we must know in advance that the prefix is exactly the same for each of the values. If not, the encoding format must be different. For instance, we can also encode the prefix, with some special marker.

input: (1998, 1992, 1932, 1924, 2001, 2012)
output: (#19, 98, 92, 32, 24, #20, 01, 12)

Once the decoder reads the # character, it will know to decode the following number as a prefix.

This can be used in practice for date and time formats. Let’s say we have some datetime values, but we know that all of them are on the same day.

2012-01-31 15:33:45
2012-01-31 16:12:11
2012-01-31 17:32:35
2012-01-31 18:54:34

Obviously we can omit the date part of these strings and send (keep) only the time. Once again, we must be absolutely sure that all these values are on the same day. If not, we can use the encoding strategy of the previous example.

Phone numbers

Phone numbers are the typical case for prefix encoding. Not only the international codes, but also the mobile network operators use prefixes for their phone numbers. Thus if we have to transfer phone numbers from, let’s say, the UK, we can replace the leading “+44” with something shorter. If you happen to code a phone book for a mobile device, you can save some space by compressing the data using prefix encoding, and thus the user will have more space and will be able to store more phone numbers on his or her mobile device.
Phone number prefixes can also be used for database normalization. Thus you can store them in a separate db table and leave only the unique numbers in the phonebook.

Geo Coordinates

Using the same example from my previous post, we can send geo coordinates by removing a common prefix at large zoom levels. Indeed, when you need to send lots of markers to your map application, you can expect all of these markers to be fairly close to each other at a large zoom level, and therefore to share the same prefix.

The coordinates of those points can have a common prefix, as in the example below with the subway stations.

LatLon(40.762959,-73.985989)
LatLon(40.761886,-73.983629)
LatLon(40.762861,-73.981612)
LatLon(40.764616,-73.98056)

We can see that all of these geo points have the same prefix (40.76x, -73.98x), so we can send the prefix only once.

Prefix: (40.76,-73.98)
Data:
LatLon(2959,5989)
LatLon(1886,3629)
LatLon(2861,1612)
LatLon(4616,056)

These are only three examples of prefix encoding, and this algorithm is very useful when transferring homogeneous data.

Suffix Encoding

Suffix encoding is practically the same algorithm as prefix encoding, with the small difference that we use it to encode repeated suffixes. As in the examples below, suffix encoding can be useful for replacing repeating last name suffixes.

Johnson
Clark
Jackson

Or company names.

Apple Inc.
Google Inc.
Yahoo! Inc.

Here we can replace “ Inc.” with something else, but shorter.

Related posts:
Computer Algorithms: Data Compression with Relative Encoding
Computer Algorithms: Data Compression with Run-length Encoding
Computer Algorithms: Data Compression with Diagram Encoding and Pattern Substitution

Source: http://www.stoimen.com/blog/2012/02/06/computer-algorithms-data-compression-with-prefix-encoding/
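The marker scheme from the date and time section, where "#" introduces a new century prefix, can be sketched in Python as follows. This is only an illustration under simple assumptions: years are four-digit integers, the token format matches the article's example, and the function names are hypothetical.

```python
def encode_years(years):
    """Emit '#<prefix>' whenever the century prefix changes, then two-digit suffixes."""
    tokens, current = [], None
    for year in years:
        prefix, suffix = divmod(year, 100)
        if prefix != current:
            tokens.append("#%d" % prefix)
            current = prefix
        tokens.append("%02d" % suffix)
    return tokens

def decode_years(tokens):
    """Rebuild full years; a '#' token sets the prefix for the values that follow."""
    years, prefix = [], None
    for token in tokens:
        if token.startswith("#"):
            prefix = int(token[1:])
        else:
            years.append(prefix * 100 + int(token))
    return years

years = [1998, 1992, 1932, 1924, 2001, 2012]
tokens = encode_years(years)
print(tokens)
print(decode_years(tokens))
```

The marker costs one extra token per change of prefix, so the scheme pays off when values with the same prefix arrive in long runs; data that alternates between centuries on every value would end up longer than the input.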

February 7, 2012

by Stoimen Popov


Using the Android Parcel

A short definition of an Android Parcel would be that of a message container for lightweight, high-performance inter-process communication (IPC). On Android, a "process" is a standard Linux one, and one process cannot normally access the memory of another process, so with Parcels, the Android system decomposes objects into primitives that can be marshaled/unmarshaled across process boundaries.

But Parcels can also be used within the same process, to pass data across different components of the same application. As an example, a typical Android application has several screens, called "Activities", and needs to communicate data or actions from one Activity to the next. To write an object that can be passed through, we can implement the Parcelable interface. Android itself provides a built-in Parcelable object called an Intent, which is used to pass information from one component to another.

Using an Intent is pretty straightforward. Let's say we're collecting user data from our initial screen called CollectDataActivity.

    // inside CollectDataActivity, construct intent to pass along the next Activity, i.e. screen
    Intent in = new Intent(this, ProcessDataActivity.class);
    in.putExtra("userid", id); // (key,value) pairs
    in.putExtra("age", age);
    in.putExtra("phone", phone);
    in.putExtra("is_registered", true);
    // call next Activity --> next screen comes up
    startActivity(in);

We need to collect that information from our data collection screen to process it. So all we do is the following:

    // inside ProcessDataActivity, get the info needed from previous Activity
    Intent in = this.getIntent();
    in.getLongExtra("userid", 0L);
    in.getIntExtra("age", 0);
    in.getStringExtra("phone");
    in.getBooleanExtra("is_registered", false); // false = default value overridden by user input

Again, pretty straightforward. We retrieve the data using the same keys used to send it, and using our Intent's corresponding methods for each data type.
But even when communicating with Intents, we can still use Parcels to pass data within the intent. For instance, we can do the above in a more elegant way using a custom, Parcelable User class.

In the first Activity:

    // in CollectDataActivity, populate the Parcelable User object using its setter methods
    User usr = new User();
    usr.setId(id); // collected from user input
    // etc..

    // pass it to another component
    Intent in = new Intent(this, ProcessDataActivity.class);
    in.putExtra("user", usr);
    startActivity(in);

In the second Activity:

    // in ProcessDataActivity retrieve User
    Intent intent = getIntent();
    User usr = (User) intent.getParcelableExtra("user");

And this is what a Parcelable User class looks like:

    import android.os.Parcel;
    import android.os.Parcelable;

    public class User implements Parcelable {

        private long id;
        private int age;
        private String phone;
        private boolean registered;

        // No-arg Ctor
        public User(){}

        // all getters and setters go here
        //...

        /** Used to give additional hints on how to process the received parcel.*/
        @Override
        public int describeContents() {
            // ignore for now
            return 0;
        }

        @Override
        public void writeToParcel(Parcel pc, int flags) {
            pc.writeLong(id);
            pc.writeInt(age);
            pc.writeString(phone);
            pc.writeInt( registered ? 1 :0 );
        }

        /** Static field used to regenerate object, individually or as arrays */
        public static final Parcelable.Creator<User> CREATOR = new Parcelable.Creator<User>() {
            public User createFromParcel(Parcel pc) {
                return new User(pc);
            }
            public User[] newArray(int size) {
                return new User[size];
            }
        };

        /**Ctor from Parcel, reads back fields IN THE ORDER they were written */
        public User(Parcel pc){
            id = pc.readLong();
            age = pc.readInt();
            phone = pc.readString();
            registered = ( pc.readInt() == 1 );
        }
    }

What we did was:

Make our User class implement the Parcelable interface. Parcelable is not a marker interface, hence what follows:
Implement its describeContents method, which in this case does nothing.
Implement its abstract method writeToParcel, which takes the current state of the object and writes it to a Parcel.
Add a static field called CREATOR to our class, which is an object implementing the Parcelable.Creator interface.
Add a constructor that takes a Parcel as a parameter. The CREATOR calls that constructor to rebuild our object.

This looks like a lot of extra code at first, but bear in mind that, as in most cases, our application might evolve to incorporate more data from the user... Sometimes we need to pass complex objects from one component to another, and passing an object yields a cleaner design.

The same logic applies to communicating between an Activity (foreground UI) and a background Service. We would just call the startService method instead of startActivity and pass it our Parcelable User object. Note that a Service does not run in a separate process by default.

At this point, there are a couple of questions that may be raised:

Isn't using an IPC-friendly, custom object for in-process communication simply overkill?
Why would we want to use Parcelable, when we already have built-in Java serialization?

The answer to the first concern is... maybe. But communicating through a custom object rather than through a list of key-value pairs is more OO, and it has no noticeable negative performance impact.

As for the second question, why not simply have User implement Serializable, a theoretically simpler marker interface? In one word, performance. Using Parcels is more efficient than serializing, at the price of some added complexity.

That extra efficiency in turn has its limits: passing an image (Bitmap) using Parcelable is generally not a good idea (although Bitmap does in fact implement Parcelable). A much more memory-efficient way would be to pass only its URI or Resource ID, so that other Android components in your application can have access to it.
Another limitation of Parcelable is that it must not be used for general-purpose serialization to storage, since the underlying implementation may vary with different versions of the Android OS. So yes, Parcels are faster by design, but as a high-performance transport, not as a replacement for a general-purpose serialization mechanism.

Having said all that, since our User object is Parcelable, it can now be sent from this application to another one running in another process, in particular through an interface implementing a remote service. In an upcoming post, we'll look at IPC and Android's Interface Definition Language (AIDL).

from Tony's Blog
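The rule that a Parcel must be read back in the same order it was written is easy to see in a language-neutral sketch. The Python below is only an analogy: it uses the standard struct module with a made-up byte layout (long, int, length-prefixed string, 0/1 flag) and says nothing about Android's actual Parcel format; write_user and read_user are hypothetical names.

```python
import struct

HEADER = "<qiI"  # 8-byte id, 4-byte age, 4-byte string length (layout assumed for this sketch)

def write_user(user_id, age, phone, registered):
    """Flatten the fields into bytes in a fixed order."""
    raw = phone.encode("utf-8")
    return (struct.pack(HEADER, user_id, age, len(raw))
            + raw
            + struct.pack("<i", 1 if registered else 0))

def read_user(buf):
    """Read the fields back in exactly the order they were written."""
    user_id, age, length = struct.unpack_from(HEADER, buf, 0)
    offset = struct.calcsize(HEADER)
    phone = buf[offset:offset + length].decode("utf-8")
    (flag,) = struct.unpack_from("<i", buf, offset + length)
    return user_id, age, phone, flag == 1

payload = write_user(42, 30, "555-1111", True)
restored = read_user(payload)
print(restored)
```

Nothing in the byte stream names the fields, so swapping two reads of the same size would silently produce wrong values; the same reasoning explains the "IN THE ORDER they were written" comment on the User constructor.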

February 4, 2012

by Tony Siciliani


wxPython: wx.ListCtrl Tips and Tricks

Previously, we covered some tips and tricks for the Grid control. In this article, we will go over a few tips and tricks for the wx.ListCtrl widget when it’s in “report” mode. Take a look at the tips below:

How to create a simple ListCtrl
How to sort the rows of a ListCtrl
How to make the ListCtrl cells editable in place
Associating objects with ListCtrl rows
Alternate the row colors of a ListCtrl

How to create a simple ListCtrl

The list control is a pretty common widget. In Windows, you will see the list control in Windows Explorer. It has four modes: icon, small icon, list, and report. They roughly match up with the icons, tiles, list, and details views in Windows Explorer respectively. We’re going to focus on the ListCtrl in report mode because that’s the mode that most developers use it in. Here’s a simple example of how to create a list control:

    import wx

    ########################################################################
    class MyForm(wx.Frame):

        #----------------------------------------------------------------------
        def __init__(self):
            wx.Frame.__init__(self, None, wx.ID_ANY, "List Control Tutorial")

            # Add a panel so it looks correct on all platforms
            panel = wx.Panel(self, wx.ID_ANY)
            self.index = 0

            self.list_ctrl = wx.ListCtrl(panel, size=(-1,100),
                                         style=wx.LC_REPORT
                                         |wx.BORDER_SUNKEN
                                         )
            self.list_ctrl.InsertColumn(0, 'Subject')
            self.list_ctrl.InsertColumn(1, 'Due')
            self.list_ctrl.InsertColumn(2, 'Location', width=125)

            btn = wx.Button(panel, label="Add Line")
            btn.Bind(wx.EVT_BUTTON, self.add_line)

            sizer = wx.BoxSizer(wx.VERTICAL)
            sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5)
            sizer.Add(btn, 0, wx.ALL|wx.CENTER, 5)
            panel.SetSizer(sizer)

        #----------------------------------------------------------------------
        def add_line(self, event):
            line = "Line %s" % self.index
            self.list_ctrl.InsertStringItem(self.index, line)
            self.list_ctrl.SetStringItem(self.index, 1, "01/19/2010")
            self.list_ctrl.SetStringItem(self.index, 2, "USA")
            self.index += 1

    #----------------------------------------------------------------------
    # Run the program
    if __name__ == "__main__":
        app = wx.App(False)
        frame = MyForm()
        frame.Show()
        app.MainLoop()

As you can probably tell from the code above, it’s really easy to create a ListCtrl instance. Notice that we set the style to report mode using the wx.LC_REPORT flag. To add column headers, we call the ListCtrl’s InsertColumn method and pass an integer to tell the ListCtrl which column is which and a string for the user’s convenience. Yes, the columns are zero-based, so the first column is number zero, the second column is number one, etc.

The next important piece is contained in the button’s event handler, add_line, where we learn how to add rows of data to the ListCtrl. The typical method to use is the InsertStringItem method. If you wanted an image added to each row as well, then you’d use a more complicated method like InsertColumnInfo along with the InsertImageStringItem method. You can see how to use them in the wxPython demo. We’re sticking with the easy stuff in this article.

Anyway, when you call InsertStringItem you give it the correct row index and a string. You use the SetStringItem method to set the data for the other columns of the row. Notice that the SetStringItem method requires three parameters: the row index, the column index and a string. Lastly, we increment the row index so we don’t overwrite anything. Now you can get out there and make your own! Let’s continue and find out how to sort rows!

How to sort the rows of a ListCtrl

The ListCtrl widget has had some extra scripts written for it that add functionality to the widget. These scripts are called mixins. You can read about them here. For this recipe, we’ll be using the ColumnSorterMixin mixin. The code below is a stripped-down version of one of the wxPython demo examples.
import wx import wx.lib.mixins.listctrl as listmix musicdata = { 0 : ("Bad English", "The Price Of Love", "Rock"), 1 : ("DNA featuring Suzanne Vega", "Tom's Diner", "Rock"), 2 : ("George Michael", "Praying For Time", "Rock"), 3 : ("Gloria Estefan", "Here We Are", "Rock"), 4 : ("Linda Ronstadt", "Don't Know Much", "Rock"), 5 : ("Michael Bolton", "How Am I Supposed To Live Without You", "Blues"), 6 : ("Paul Young", "Oh Girl", "Rock"), } ######################################################################## class TestListCtrl(wx.ListCtrl): #---------------------------------------------------------------------- def __init__(self, parent, ID=wx.ID_ANY, pos=wx.DefaultPosition, size=wx.DefaultSize, style=0): wx.ListCtrl.__init__(self, parent, ID, pos, size, style) ######################################################################## class TestListCtrlPanel(wx.Panel, listmix.ColumnSorterMixin): #---------------------------------------------------------------------- def __init__(self, parent): wx.Panel.__init__(self, parent, -1, style=wx.WANTS_CHARS) self.index = 0 self.list_ctrl = TestListCtrl(self, size=(-1,100), style=wx.LC_REPORT |wx.BORDER_SUNKEN |wx.LC_SORT_ASCENDING ) self.list_ctrl.InsertColumn(0, "Artist") self.list_ctrl.InsertColumn(1, "Title", wx.LIST_FORMAT_RIGHT) self.list_ctrl.InsertColumn(2, "Genre") items = musicdata.items() index = 0 for key, data in items: self.list_ctrl.InsertStringItem(index, data[0]) self.list_ctrl.SetStringItem(index, 1, data[1]) self.list_ctrl.SetStringItem(index, 2, data[2]) self.list_ctrl.SetItemData(index, key) index += 1 # Now that the list exists we can init the other base class, # see wx/lib/mixins/listctrl.py self.itemDataMap = musicdata listmix.ColumnSorterMixin.__init__(self, 3) self.Bind(wx.EVT_LIST_COL_CLICK, self.OnColClick, self.list_ctrl) sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5) self.SetSizer(sizer) #---------------------------------------------------------------------- # 
Used by the ColumnSorterMixin, see wx/lib/mixins/listctrl.py def GetListCtrl(self): return self.list_ctrl #---------------------------------------------------------------------- def OnColClick(self, event): print "column clicked" event.Skip() ######################################################################## class MyForm(wx.Frame): #---------------------------------------------------------------------- def __init__(self): wx.Frame.__init__(self, None, wx.ID_ANY, "List Control Tutorial") # Add a panel so it looks correct on all platforms panel = TestListCtrlPanel(self) #---------------------------------------------------------------------- # Run the program if __name__ == "__main__": app = wx.App(False) frame = MyForm() frame.Show() app.MainLoop() This code is a little on the odd side in that the mixin is inherited by the wx.Panel-based class rather than by the wx.ListCtrl class. You can do it either way, as long as you rearrange the code accordingly. Let’s home in on the key differences between this example and the previous one. The first important difference is in the loop where we insert the list control’s data. Here we call the list control’s SetItemData method, which the sorting relies on: it associates each row with the matching key in the music data dict, so the mixin can look up a row’s values in self.itemDataMap no matter where the row ends up. Next we initialize the ColumnSorterMixin and tell it how many columns the list control has. We could have left the EVT_LIST_COL_CLICK binding out of this example, since it has nothing to do with the actual sorting of the rows, but it was left in to show how to catch the user’s column click event. The rest of the code is self-explanatory. If you want to know about the requirements for this mixin, especially when you have images in your rows, please see the relevant section in the source (i.e. listctrl.py).
Now, wasn’t that easy? Let’s continue our journey and find out how to make the cells editable! How to make the ListCtrl cells editable in place Sometimes, the programmer will want to allow the user to click on a cell and edit it in place. This is kind of a lightweight version of the wx.grid.Grid control. Here’s an example: import wx import wx.lib.mixins.listctrl as listmix ######################################################################## class EditableListCtrl(wx.ListCtrl, listmix.TextEditMixin): ''' TextEditMixin allows any column to be edited. ''' #---------------------------------------------------------------------- def __init__(self, parent, ID=wx.ID_ANY, pos=wx.DefaultPosition, size=wx.DefaultSize, style=0): """Constructor""" wx.ListCtrl.__init__(self, parent, ID, pos, size, style) listmix.TextEditMixin.__init__(self) ######################################################################## class MyPanel(wx.Panel): """""" #---------------------------------------------------------------------- def __init__(self, parent): """Constructor""" wx.Panel.__init__(self, parent) rows = [("Ford", "Taurus", "1996", "Blue"), ("Nissan", "370Z", "2010", "Green"), ("Porche", "911", "2009", "Red") ] self.list_ctrl = EditableListCtrl(self, style=wx.LC_REPORT) self.list_ctrl.InsertColumn(0, "Make") self.list_ctrl.InsertColumn(1, "Model") self.list_ctrl.InsertColumn(2, "Year") self.list_ctrl.InsertColumn(3, "Color") index = 0 for row in rows: self.list_ctrl.InsertStringItem(index, row[0]) self.list_ctrl.SetStringItem(index, 1, row[1]) self.list_ctrl.SetStringItem(index, 2, row[2]) self.list_ctrl.SetStringItem(index, 3, row[3]) index += 1 sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5) self.SetSizer(sizer) ######################################################################## class MyFrame(wx.Frame): """""" #---------------------------------------------------------------------- def __init__(self): """Constructor""" 
wx.Frame.__init__(self, None, wx.ID_ANY, "Editable List Control") panel = MyPanel(self) self.Show() #---------------------------------------------------------------------- if __name__ == "__main__": app = wx.App(False) frame = MyFrame() app.MainLoop() In this script, we put the TextEditMixin in our wx.ListCtrl class instead of our wx.Panel, which is the opposite of the previous example. The mixin itself does all the heavy lifting. Again, you’ll have to check out the mixin’s source to really understand how it works. Associating objects with ListCtrl rows This subject comes up a lot: How do I associate data (i.e. objects) with my ListCtrl’s rows? Well, we’re going to find out exactly how to do that with the following code: import wx ######################################################################## class Car(object): """""" #---------------------------------------------------------------------- def __init__(self, make, model, year, color="Blue"): """Constructor""" self.make = make self.model = model self.year = year self.color = color ######################################################################## class MyPanel(wx.Panel): """""" #---------------------------------------------------------------------- def __init__(self, parent): """Constructor""" wx.Panel.__init__(self, parent) rows = [Car("Ford", "Taurus", "1996"), Car("Nissan", "370Z", "2010"), Car("Porche", "911", "2009", "Red") ] self.list_ctrl = wx.ListCtrl(self, size=(-1,100), style=wx.LC_REPORT |wx.BORDER_SUNKEN ) self.list_ctrl.Bind(wx.EVT_LIST_ITEM_SELECTED, self.onItemSelected) self.list_ctrl.InsertColumn(0, "Make") self.list_ctrl.InsertColumn(1, "Model") self.list_ctrl.InsertColumn(2, "Year") self.list_ctrl.InsertColumn(3, "Color") index = 0 self.myRowDict = {} for row in rows: self.list_ctrl.InsertStringItem(index, row.make) self.list_ctrl.SetStringItem(index, 1, row.model) self.list_ctrl.SetStringItem(index, 2, row.year) self.list_ctrl.SetStringItem(index, 3, row.color) self.myRowDict[index] 
= row index += 1 sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5) self.SetSizer(sizer) #---------------------------------------------------------------------- def onItemSelected(self, event): """""" currentItem = event.m_itemIndex car = self.myRowDict[currentItem] print car.make print car.model print car.color print car.year ######################################################################## class MyFrame(wx.Frame): """""" #---------------------------------------------------------------------- def __init__(self): """Constructor""" wx.Frame.__init__(self, None, wx.ID_ANY, "List Control Tutorial") panel = MyPanel(self) self.Show() #---------------------------------------------------------------------- if __name__ == "__main__": app = wx.App(False) frame = MyFrame() app.MainLoop() The list control widget actually doesn’t have a built-in way to accomplish this feat. If you want that, then you’ll want to check out the ObjectListView widget, which wraps the ListCtrl and gives it a lot more functionality. In the meantime, we’ll take a minute and go over the code above. The first piece is just a plain Car class with four attributes. Then in the MyPanel class, we create a list of Car objects that we’ll use for the ListCtrl’s data. To add the data to the ListCtrl, we use a for loop to iterate over the list. We also associate each row with a Car object using a Python dictionary. We use the row’s index for the key and the dict’s value ends up being the Car object. This allows us to access all the Car/row object’s data later on in the onItemSelected method. Let’s check that out! In onItemSelected, we grab the row’s index with the following little trick: event.m_itemIndex. Then we use that value as the key for our dictionary so that we can gain access to the Car object associated with that row. At this point, we just print out all the Car object’s attributes, but you could do whatever you want here. 
This basic idea could easily be extended to use a result set from a SqlAlchemy query for the ListCtrl’s data. Hopefully you get the general idea. Now if you were paying close attention, like Robin Dunn (creator of wxPython) was, then you might notice some really silly logic errors in this code. Did you find them? Well, you won’t see it unless you sort the rows, delete a row or insert a row. Do you see it now? Yes, I stupidly based the “unique” key in my dictionary on the row’s position, which will change if any of those events happen. So let’s look at a better example: import wx ######################################################################## class Car(object): """""" #---------------------------------------------------------------------- def __init__(self, make, model, year, color="Blue"): """Constructor""" self.id = id(self) self.make = make self.model = model self.year = year self.color = color ######################################################################## class MyPanel(wx.Panel): """""" #---------------------------------------------------------------------- def __init__(self, parent): """Constructor""" wx.Panel.__init__(self, parent) rows = [Car("Ford", "Taurus", "1996"), Car("Nissan", "370Z", "2010"), Car("Porche", "911", "2009", "Red") ] self.list_ctrl = wx.ListCtrl(self, size=(-1,100), style=wx.LC_REPORT |wx.BORDER_SUNKEN ) self.list_ctrl.Bind(wx.EVT_LIST_ITEM_SELECTED, self.onItemSelected) self.list_ctrl.InsertColumn(0, "Make") self.list_ctrl.InsertColumn(1, "Model") self.list_ctrl.InsertColumn(2, "Year") self.list_ctrl.InsertColumn(3, "Color") index = 0 self.myRowDict = {} for row in rows: self.list_ctrl.InsertStringItem(index, row.make) self.list_ctrl.SetStringItem(index, 1, row.model) self.list_ctrl.SetStringItem(index, 2, row.year) self.list_ctrl.SetStringItem(index, 3, row.color) self.list_ctrl.SetItemData(index, row.id) self.myRowDict[row.id] = row index += 1 sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, 
wx.ALL|wx.EXPAND, 5) self.SetSizer(sizer) #---------------------------------------------------------------------- def onItemSelected(self, event): """""" currentItem = event.m_itemIndex car = self.myRowDict[self.list_ctrl.GetItemData(currentItem)] print car.make print car.model print car.color print car.year ######################################################################## class MyFrame(wx.Frame): """""" #---------------------------------------------------------------------- def __init__(self): """Constructor""" wx.Frame.__init__(self, None, wx.ID_ANY, "List Control Tutorial") panel = MyPanel(self) self.Show() #---------------------------------------------------------------------- if __name__ == "__main__": app = wx.App(False) frame = MyFrame() app.MainLoop() In this example, we add a new attribute to our Car class that creates a unique id for each instance that is created using Python’s handy id builtin. Then in the loop where we add the data to the list control, we call the widget’s SetItemData method and give it the row index and the car instance’s unique id. Now it doesn’t matter where the row ends up because it’s had the unique id affixed to it. Finally, we have to modify the onItemSelected to get the right object. The magic happens in this code: # this code was helpfully provided by Robin Dunn car = self.myRowDict[self.list_ctrl.GetItemData(currentItem)] Cool, huh? Our last example will cover how to alternate the row colors, so let’s take a look! Alternate the row colors of a ListCtrl As this section’s title suggests, we will look at how to alternate colors of the rows of a ListCtrl. 
Here’s the code: import wx import wx.lib.mixins.listctrl as listmix ######################################################################## class MyPanel(wx.Panel): """""" #---------------------------------------------------------------------- def __init__(self, parent): """Constructor""" wx.Panel.__init__(self, parent) rows = [("Ford", "Taurus", "1996", "Blue"), ("Nissan", "370Z", "2010", "Green"), ("Porche", "911", "2009", "Red") ] self.list_ctrl = wx.ListCtrl(self, style=wx.LC_REPORT) self.list_ctrl.InsertColumn(0, "Make") self.list_ctrl.InsertColumn(1, "Model") self.list_ctrl.InsertColumn(2, "Year") self.list_ctrl.InsertColumn(3, "Color") index = 0 for row in rows: self.list_ctrl.InsertStringItem(index, row[0]) self.list_ctrl.SetStringItem(index, 1, row[1]) self.list_ctrl.SetStringItem(index, 2, row[2]) self.list_ctrl.SetStringItem(index, 3, row[3]) if index % 2: self.list_ctrl.SetItemBackgroundColour(index, "white") else: self.list_ctrl.SetItemBackgroundColour(index, "yellow") index += 1 sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5) self.SetSizer(sizer) ######################################################################## class MyFrame(wx.Frame): """""" #---------------------------------------------------------------------- def __init__(self): """Constructor""" wx.Frame.__init__(self, None, wx.ID_ANY, "List Control w/ Alternate Colors") panel = MyPanel(self) self.Show() #---------------------------------------------------------------------- if __name__ == "__main__": app = wx.App(False) frame = MyFrame() app.MainLoop() The code above will alternate each row’s background color. Thus you should see yellow and white rows. We do this by calling the ListCtrl instance’s SetItemBackgroundColour method. If you were using a virtual list control, then you’d want to override the OnGetItemAttr method. To see an example of the latter method, open up your copy of the wxPython demo; there’s one in there. 
Wrapping Up

We’ve covered a lot of ground here. You should now be able to do a lot more with your wx.ListCtrl than when you started, assuming you’re new to using it, of course. Feel free to ask questions in the comments or suggest future recipes. I hope you found this helpful!

Note: All examples were tested on Windows XP with Python 2.5 and wxPython 2.8.10.1. They were also tested on Windows 7 Professional with Python 2.6.

Additional Reading

The official wxPython wx.ListCtrl documentation
The ListControls wiki page
ListCtrl Tooltips wiki page
The ObjectListView website
The UltimateListCtrl, a pure Python implementation now included with wxPython

Source Code: listctrl.zip, listctrl.tar

Source: http://www.blog.pythonlibrary.org/2011/01/04/wxpython-wx-listctrl-tips-and-tricks/
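The row-association pitfall from the article (keying the lookup dictionary on the row position) can be reproduced without any GUI. The following is only a sketch in plain Python 3, not wxPython code: the `rows` list stands in for the visible rows of the list control, the second element of each tuple plays the role of the value stored with SetItemData, and the `uid` counter is a hypothetical replacement for the `id(self)` call used in the article.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical id source; the article uses id(self) instead


@dataclass
class Car:
    make: str
    model: str
    year: str
    color: str = "Blue"
    uid: int = field(default_factory=lambda: next(_ids))


cars = [Car("Ford", "Taurus", "1996"),
        Car("Nissan", "370Z", "2010"),
        Car("Porsche", "911", "2009", "Red")]

# Approach 1: key the lookup table on the row position (fragile).
by_position = {index: car for index, car in enumerate(cars)}

# Approach 2: key it on a per-object id and keep that id with the row,
# which is the job SetItemData/GetItemData do in the article.
by_uid = {car.uid: car for car in cars}

# Each simulated row is (displayed text, item data); sorting moves both together.
rows = [(car.make, car.uid) for car in cars]
rows.sort(key=lambda row: row[0], reverse=True)  # the user sorts descending

selected = 0  # the user clicks the first visible row
shown_make, item_data = rows[selected]

wrong = by_position[selected]  # the car that used to be in row 0
right = by_uid[item_data]      # follows the row wherever it moved

print(shown_make, wrong.make, right.make)
```

After the sort, the position-based lookup returns the car that originally occupied row 0, while the id-based lookup still returns the car whose text is actually displayed in the clicked row.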

February 2, 2012

by Mike Driscoll

· 23,129 Views

In-memory Cache Implementation in C#

The simplest in-memory cache implementation should support:

Addition of objects into the cache, either via key-value or via an object creation mechanism
Deletion of objects from the cache based on key or object type
Querying the cache store to check the existence of an object

There are several ways to achieve this using multiple design patterns. But if we were to implement those design patterns in our applications, we would end up designing a framework similar to the Enterprise Library Caching block. So, to keep things fairly simple, we need a simple implementation that caches objects in memory and is thread-safe for multi-threaded applications. For that, you can just copy this piece of code into your application and you should be all set with an in-memory cache.

    using System;
    using System.Collections.Generic;

    public static class CacheStore
    {
        /// <summary>In-memory cache dictionary</summary>
        private static Dictionary<string, object> _cache;
        private static object _sync;

        /// <summary>Cache initializer</summary>
        static CacheStore()
        {
            _cache = new Dictionary<string, object>();
            _sync = new object();
        }

        /// <summary>Check if an object exists in cache</summary>
        /// <typeparam name="T">Type of object</typeparam>
        /// <param name="key">Name of key in cache</param>
        /// <returns>True, if yes; False, otherwise</returns>
        public static bool Exists<T>(string key) where T : class
        {
            Type type = typeof(T);
            lock (_sync)
            {
                // Same key layout (key + type name) as Get, Create, Add and Remove
                return _cache.ContainsKey(key + type.Name);
            }
        }

        /// <summary>Check if an object exists in cache</summary>
        /// <typeparam name="T">Type of object</typeparam>
        /// <returns>True, if yes; False, otherwise</returns>
        public static bool Exists<T>() where T : class
        {
            Type type = typeof(T);
            lock (_sync)
            {
                return _cache.ContainsKey(type.Name);
            }
        }

        /// <summary>Get an object from cache</summary>
        /// <typeparam name="T">Type of object</typeparam>
        /// <returns>Object from cache</returns>
        public static T Get<T>() where T : class
        {
            Type type = typeof(T);
            lock (_sync)
            {
                if (_cache.ContainsKey(type.Name) == false)
                    throw new ApplicationException("An object of the desired type does not exist: " + type.Name);
                return (T)_cache[type.Name];
            }
        }

        /// <summary>Get an object from cache</summary>
        /// <typeparam name="T">Type of object</typeparam>
        /// <param name="key">Name of key in cache</param>
        /// <returns>Object from cache</returns>
        public static T Get<T>(string key) where T : class
        {
            Type type = typeof(T);
            lock (_sync)
            {
                if (_cache.ContainsKey(key + type.Name) == false)
                    throw new ApplicationException(String.Format("An object with key '{0}' does not exist", key));
                return (T)_cache[key + type.Name];
            }
        }

        /// <summary>Create an instance of the object and add it in cache</summary>
        /// <typeparam name="T">Class whose object is to be created</typeparam>
        /// <returns>Object of the class</returns>
        public static T Create<T>(string key, params object[] constructorParameters) where T : class
        {
            Type type = typeof(T);
            T value = (T)Activator.CreateInstance(type, constructorParameters);
            lock (_sync)
            {
                if (_cache.ContainsKey(key + type.Name))
                    throw new ApplicationException(String.Format("An object with key '{0}' already exists", key));
                _cache.Add(key + type.Name, value);
            }
            return value;
        }

        /// <summary>Create an instance of the object and add it in cache</summary>
        /// <typeparam name="T">Class whose object is to be created</typeparam>
        /// <returns>Object of the class</returns>
        public static T Create<T>(params object[] constructorParameters) where T : class
        {
            Type type = typeof(T);
            T value = (T)Activator.CreateInstance(type, constructorParameters);
            lock (_sync)
            {
                if (_cache.ContainsKey(type.Name))
                    throw new ApplicationException(String.Format("An object of type '{0}' already exists", type.Name));
                _cache.Add(type.Name, value);
            }
            return value;
        }

        public static void Add<T>(string key, T value)
        {
            Type type = typeof(T);
            if (value.GetType() != type)
                throw new ApplicationException(String.Format("The type of value passed to cache {0} does not match the cache type {1} for key {2}", value.GetType().FullName, type.FullName, key));
            lock (_sync)
            {
                if (_cache.ContainsKey(key + type.Name))
                    throw new ApplicationException(String.Format("An object with key '{0}' already exists", key));
                _cache.Add(key + type.Name, value);
            }
        }

        /// <summary>Remove an object type from cache</summary>
        /// <typeparam name="T">Type of object</typeparam>
        public static void Remove<T>()
        {
            Type type = typeof(T);
            lock (_sync)
            {
                if (_cache.ContainsKey(type.Name) == false)
                    throw new ApplicationException(String.Format("An object of type '{0}' does not exist in cache", type.Name));
                _cache.Remove(type.Name);
            }
        }

        /// <summary>Remove an object stored with a key from cache</summary>
        /// <typeparam name="T">Type of object</typeparam>
        /// <param name="key">Key of the object</param>
        public static void Remove<T>(string key)
        {
            Type type = typeof(T);
            lock (_sync)
            {
                if (_cache.ContainsKey(key + type.Name) == false)
                    throw new ApplicationException(String.Format("An object with key '{0}' does not exist in cache", key));
                _cache.Remove(key + type.Name);
            }
        }
    }

Every method has 2 overloads:

With key as a parameter: the method stores the value under a key for a particular object type. This means that for a particular object type (say, Employee) you can have multiple cached objects (say, multiple employees in an organization).
Without key as a parameter: the method uses the type of the object as the key. This means that for a particular object type (say, ConfigurationSettings) there will be a single object in the cache (say, the configuration values).

An implementation example using CacheStore:

    MonoAssemblyResolver targetAssembly = null;
    if (CacheStore.Exists<MonoAssemblyResolver>(projMapping.TargetAssemblyPath))
    {
        targetAssembly = CacheStore.Get<MonoAssemblyResolver>(projMapping.TargetAssemblyPath);
    }
    else
    {
        targetAssembly = new MonoAssemblyResolver(projMapping.TargetAssemblyPath);
        CacheStore.Add<MonoAssemblyResolver>(projMapping.TargetAssemblyPath, targetAssembly);
    }

Since this uses plain C# and is lightweight, it can be used in ASP.NET MVC, Silverlight, WPF, or Windows Phone applications. So happy coding! Source: http://www.ganshani.com/2012/01/31/in-memory-cache-implementation-in-c
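For readers who want to compare the idea across languages, here is a rough Python analogue of the cache described above. It is a sketch under stated assumptions, not a port of the author's class: the method names, the `(type name, key)` tuple used as dictionary key, and the `get_or_create` helper are all choices made for this illustration. The helper is included because checking for existence and then adding in two separate calls, as in the usage example above, leaves a window in which another thread may add the same key first.

```python
import threading
from typing import Any, Callable, Dict, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class CacheStore:
    """Process-wide cache keyed on (type name, optional key)."""

    _cache: Dict[Tuple[str, Optional[str]], Any] = {}
    _sync = threading.Lock()  # non-reentrant, so each method takes it exactly once

    @classmethod
    def exists(cls, kind: Type[T], key: Optional[str] = None) -> bool:
        with cls._sync:
            return (kind.__name__, key) in cls._cache

    @classmethod
    def add(cls, kind: Type[T], value: T, key: Optional[str] = None) -> None:
        if type(value) is not kind:
            raise TypeError(f"value is {type(value).__name__}, expected {kind.__name__}")
        with cls._sync:
            if (kind.__name__, key) in cls._cache:
                raise KeyError(f"an object with key {key!r} already exists")
            cls._cache[(kind.__name__, key)] = value

    @classmethod
    def get(cls, kind: Type[T], key: Optional[str] = None) -> T:
        with cls._sync:
            return cls._cache[(kind.__name__, key)]  # raises KeyError if missing

    @classmethod
    def remove(cls, kind: Type[T], key: Optional[str] = None) -> None:
        with cls._sync:
            del cls._cache[(kind.__name__, key)]  # raises KeyError if missing

    @classmethod
    def get_or_create(cls, kind: Type[T], factory: Callable[[], T],
                      key: Optional[str] = None) -> T:
        """Look up and, if needed, create the value under a single lock."""
        with cls._sync:
            slot = (kind.__name__, key)
            if slot not in cls._cache:
                cls._cache[slot] = factory()
            return cls._cache[slot]


class Settings:  # hypothetical cached type, for the demonstration only
    def __init__(self, path: str) -> None:
        self.path = path


first = CacheStore.get_or_create(Settings, lambda: Settings("/etc/app.conf"), key="main")
second = CacheStore.get_or_create(Settings, lambda: Settings("/other"), key="main")
print(first is second, second.path)
```

The second call returns the object created by the first one, so the second factory is never invoked. As in the C# version, only the short type name is part of the key, so two classes with the same name would share a slot.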

February 2, 2012

by Punit Ganshani

· 77,741 Views · 1 Like

Algorithm of the Week: Data Compression with Relative Encoding

Overview

Relative encoding is another data compression algorithm. While run-length encoding, bitmap encoding, and diagram and pattern substitution try to reduce repeating data, relative encoding has a slightly different goal. Run-length encoding searches for long runs of repeating elements, while pattern substitution and bitmap encoding try to “map” where the repetitions occur. The problem with these algorithms is that the input stream is not always made up of repeating elements. If the input contains many repeating elements, there is clearly some way of reducing them. However, that doesn’t mean we cannot compress data when there are no repetitions. It all depends on the data. Let’s say we have the following stream to compress.

1, 2, 3, 4, 5, 6, 7

It's hard to imagine how this stream of data can be compressed. The same problem may occur when trying to compress the alphabet. The letters of the alphabet are the basic building blocks of words, so they are the minimal unit of word construction and therefore hard to compress. Fortunately, this isn’t always true. An algorithm that tries to deal with non-repeating data is relative encoding. Consider the following input stream: years from a given decade (the 90s).

1991, 1991, 1999, 1998, 1991, 1993, 1992, 1992

Here we have 39 characters and we can reduce them. A natural approach is to remove the leading “19”, as we humans often do.

91, 91, 99, 98, 91, 93, 92, 92

Now we have a shorter string, but we can go even further by keeping only the first year. All other years are then expressed relative to this year.

91, 0, 8, 7, 0, 2, 1, 1

Now the volume of transferred data is reduced a lot (from 39 to 16 characters, a reduction of more than 50%). However, there are some questions we need to answer first, because the stream won’t always be formatted in such a pretty way. How about the next stream?

91, 94, 95, 95, 98, 100, 101, 102, 105, 110

We see that the value 100 is roughly in the middle of the interval, so it is handy to use it as the base value for the relative encoding. Thus the stream above becomes:

-9, -6, -5, -5, -2, 100, 1, 2, 5, 10

The problem is that we can’t always decide so easily which value should be the base value. What if the data were dispersed in a different way?

96, 97, 98, 99, 100, 101, 102, 103, 999, 1000, 1001, 1002

Now the value “100” isn’t as useful, because compressing the stream gives something like this:

-4, -3, -2, -1, 100, 1, 2, 3, 899, 900, 901, 902

Grouping the relative values around several base values is far handier.

(-4, -3, -2, -1, 100, 1, 2, 3) (-1, 1000, 1, 2)

However, deciding which value should be the base value isn’t that easy, and the encoding format is no longer trivial. On the other hand, this type of encoding can be useful in some specific cases, as we can see below.

Implementation

The implementation of this algorithm depends on the specific task and the format of the data stream. Assuming that we have to transfer the stream of years as JSON from a web server to a browser, here’s a short PHP snippet.

    <?php
    // JSON: [1991,1991,1999,1998,1999,1998,1995,1997,1994,1993]
    $years = array(1991,1991,1999,1998,1999,1998,1995,1997,1994,1993);

    function relative_encoding($input)
    {
        $output = array();
        $inputLength = count($input);

        // The first element is kept as the base value
        $base = $input[0];
        $output[] = $base;

        // Every other element is stored as its difference from the base
        for ($i = 1; $i < $inputLength; $i++) {
            $output[] = $input[$i] - $base;
        }

        return $output;
    }

    // JSON: [1991,0,8,7,8,7,4,6,3,2]
    echo json_encode(relative_encoding($years));

Application

This algorithm may be very useful in many cases, such as this one: there are plenty of map applications around the web. Some products such as Google Maps, Yahoo! Maps, and Bing Maps are quite famous, while there are also very useful open source projects like OpenStreetMap. The web sites using these apps number in the thousands.

A typical use case is to transfer lots of geographic coordinates from a web server to a browser using JSON. Any point on Earth is expressed relative to the point (0,0), which is located near the west coast of Africa. However, at large zoom levels, when there are tons of markers, we can transfer the information with relative encoding instead.

[Figure: San Francisco with some markers on it; the coordinates are relative to the point (0,0) on Earth.]

Map markers can be expressed relative to the (0,0) point on Earth, but that is often wasteful. It can be far more useful to encode those markers relative to the center of the city, which saves some space.

Relative encoding can be useful for map markers at a large zoom level, but this type of compression can be tricky, for example when dragging the map and updating the marker array. We must also group markers if we have to load more than one city. That’s why we must be careful when implementing it. But it can be very useful: on the initial load of the map, for instance, we can reduce the data and speed up the load time.

The point is that with relative encoding we store only the changes with respect to a base value, much like version control systems do, and thus reduce data transfer and load. Without relative encoding, each item is stored on its own: it doesn’t depend on the adjacent items and is completely independent of them. With relative encoding, we keep full information only for the first item, and every other item is stored relative to it.

Source: http://www.stoimen.com/blog/2012/01/30/computer-algorithms-data-compression-with-relative-encoding/
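The article shows only the encoding direction, in PHP. As a complement, here is a minimal Python sketch of the same scheme together with decoding and a simple grouped variant. The function names and the `max_offset` threshold are assumptions made for this illustration, and the grouped variant uses the first value of each group as its base, whereas the article's example picks a value from the middle of each group.

```python
from typing import List, Sequence


def relative_encode(values: Sequence[int]) -> List[int]:
    """Keep the first value as the base; store the rest as offsets from it."""
    if not values:
        return []
    base = values[0]
    return [base] + [v - base for v in values[1:]]


def relative_decode(encoded: Sequence[int]) -> List[int]:
    """Invert relative_encode by adding the base back to every offset."""
    if not encoded:
        return []
    base = encoded[0]
    return [base] + [base + offset for offset in encoded[1:]]


def encode_grouped(values: Sequence[int], max_offset: int) -> List[List[int]]:
    """Start a new group whenever a value is farther than max_offset from
    the current group's base (the first value of that group)."""
    groups: List[List[int]] = []
    for v in values:
        if groups and abs(v - groups[-1][0]) <= max_offset:
            groups[-1].append(v - groups[-1][0])
        else:
            groups.append([v])
    return groups


def decode_grouped(groups: Sequence[Sequence[int]]) -> List[int]:
    """Decode each group independently and concatenate the results."""
    return [v for group in groups for v in relative_decode(group)]


years = [1991, 1991, 1999, 1998, 1999, 1998, 1995, 1997, 1994, 1993]
encoded_years = relative_encode(years)

stream = [96, 97, 98, 99, 100, 101, 102, 103, 999, 1000, 1001, 1002]
grouped = encode_grouped(stream, max_offset=50)

print(encoded_years)
print(grouped)
```

For the `years` list, the encoded form matches the JSON shown in the PHP comment, and both decoders restore the original input exactly, so the scheme is lossless. The same functions could be applied separately to latitude and longitude values after scaling them to integers, with a city center as the base.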

January 31, 2012

by Stoimen Popov

· 17,723 Views

Visualize Maven Project Dependencies with dependency:tree and Dot Diagram Output

The dependency:tree goal of the Maven Dependency Plugin supports several graph output formats from version 2.4 onward. This is how you would create a diagram showing all dependencies in the com.example group in the dot format:

    mvn dependency:tree -Dincludes=com.example -DoutputType=dot -DappendOutput=true -DoutputFile=/path/to/output.dot

To actually produce an image from the .dot file you can use one of the dot renderers, for example this online dot renderer (paste into the right text box, press enter). You could also generate the output in another format, for example graphml, and visualize it in Eclipse. From http://theholyjava.wordpress.com/2012/01/13/visualize-maven-project-dependencies-with-dependencytree-and-dot-diagram-output/
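Besides rendering the file as an image, the dot output can also be inspected programmatically. The Python sketch below extracts the edges with a regular expression and lists which artifacts depend on a given one. The sample text is hypothetical and only assumes that each dependency appears as a quoted `"a" -> "b"` pair; check the file your plugin version actually produces before relying on this.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical sample standing in for the contents of /path/to/output.dot
SAMPLE_DOT = '''digraph "com.example:app:jar:1.0" {
    "com.example:app:jar:1.0" -> "com.example:core:jar:1.0:compile" ;
    "com.example:app:jar:1.0" -> "com.example:web:jar:1.0:compile" ;
    "com.example:web:jar:1.0:compile" -> "com.example:core:jar:1.0:compile" ;
}'''

# Matches one edge of the form "source" -> "target"
EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def read_edges(dot_text: str) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for every quoted edge in the text."""
    return EDGE.findall(dot_text)


def dependents(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each artifact to the set of artifacts that depend on it."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        result[target].add(source)
    return result


edges = read_edges(SAMPLE_DOT)
used_by = dependents(edges)

for artifact, users in sorted(used_by.items()):
    print(artifact, "<-", ", ".join(sorted(users)))
```

To run it against a real file, replace `SAMPLE_DOT` with the text read from the path passed to -DoutputFile.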

January 25, 2012

by Jakub Holý

· 31,643 Views

Algorithm of the Week: Data Compression with Diagram Encoding and Pattern Substitution

Two variants of run-length encoding are the diagram encoding and the pattern substitution algorithms. The diagram encoding is actually a very simple algorithm. Run-length encoding needs an input stream that consists of many repeating elements, “aaaaaaaa” for instance, and such runs are very rare in natural language. So-called “diagrams”, on the other hand, are plentiful in almost any natural language. In plain English there are diagrams such as “the”, “and”, “ing” (in the word “waiting” for example), “ a”, “ t”, “ e” and many doubled letters. We can even extend those diagrams by adding the surrounding spaces. Thus we can encode not only “the”, but “ the ”, which is 5 characters (2 spaces and 3 letters), with something shorter.

Plain English also has many doubled letters, but unfortunately these are nothing special for run-length encoding, so the compression ratio will be small. Even worse, the encoded text may turn out to be longer than the input message. Let’s see some examples. Say we have to encode the message “successfully accomplished”, which contains four doubled letters. Run-length encoding replaces those 8 characters with another 8 characters, which doesn’t help us at all.

    // 8 chars replaced by 8 chars!?
    input: "successfully accomplished"
    output: "su2ce2sfu2ly a2complished"

The problem gets worse if the input text contains numbers, “2” in particular. Then we have to choose an escape symbol (“@” for example) to mark where an encoded run begins. Thus the input message “2 successfully accomplished tasks” will be encoded as “2 su@2ce@2sfu@2ly a@2complished tasks”. Now the output message is longer than the input string.

    // the compressed message is longer!!!
    input: "2 successfully accomplished tasks"
    output: "2 su@2ce@2sfu@2ly a@2complished tasks"

Again, if the input stream contains the escape symbol, we have to find another one, and it is often too difficult to find a short escape symbol that doesn’t appear in the input text without a full scan of the text. That is why run-length encoding isn’t a good solution for compressing plain text, where long runs rarely appear.

Of course, there are exceptions. One such exception is lossy text compression with run-length encoding. It is intuitively clear that lossy compression of text is rarely useful, especially when you have to decompress exactly the same text. However, there are some cases where lossy compression may be useful. One such case is removing spaces. Indeed, the text “successfully      accomplished” (with a run of six spaces) carries exactly the same information as “successfully accomplished”. In this case we can simply remove the extra spaces. We could use a marker to indicate the long run of spaces, like “successfully@6 accomplished”, in order to decompress the input string with absolutely no loss, but we can also throw those symbols away. This decision depends on the goal. With the same goal in mind we can remove new lines and tabs, but only if we’re sure that the sense of the text is preserved. Yet again, the problem is that such long runs rarely occur in ordinary texts. That is why it’s better to use diagram encoding for plain text compression instead of run-length encoding.

A Few Questions

After understanding the principles of diagram encoding, let’s see some examples. In the example above it is better to replace the doubled letters with something shorter. Let’s say # for “cc”, @ for “ss” and % for “ll”. Thus the input text will be compressed as “su#e@fu%y a#omplished”, which is shorter. But yet again, what will happen if the input message contains one of the substitution symbols?
Also, we can’t know in advance whether the text has many doubled letters, or whether there are enough suitable substitution symbols for them. A better approach is to replace patterns. Run-length encoding isn't a good approach for text compression, because long runs rarely appear in a natural language.

Pattern Substitution

The pattern substitution algorithm is a variant of the diagram encoding. As I said above, a very commonly used pattern in plain English is “ the ”, which is five characters long. We can replace it with something like “$%”, for example. In this case the message “I send the message” will become “I send$%message”. However, there are some obstacles to overcome. The first problem is that we need to know the language and somehow define the commonly used patterns in a dictionary. What would happen with a message written in a language we know nothing about? Let’s say Latin, like the example below.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras venenatis, sapien eget suscipit placerat, justo quam blandit mauris, quis tempor ante sapien sodales augue. Praesent ut mauris quam. Phasellus scelerisque, ante quis consequat tristique, metus turpis consectetur leo, vitae facilisis sapien mi eu sapien. Praesent vitae ligula elit, et faucibus augue. Sed rhoncus sodales dolor ut gravida. In quis augue ac nulla auctor mattis sed sed libero. Donec eget purus eget enim tempor porta vitae eget diam. Mauris aliquet malesuada ipsum, non pulvinar urna vestibulum ac. Donec feugiat velit vitae nunc cursus imperdiet. Donec accumsan faucibus dictum. Phasellus sed mauris sapien. Maecenas mi metus, tincidunt sed rhoncus nec, sodales non sapien.

Clearly, without knowing Latin it isn’t easy to tell which patterns are commonly used. Pattern substitution works best when you know the set of words and characters in advance. The second problem is related to decompression.
It is obvious that we need to define a dictionary, and the same dictionary must be used when decoding the message. It also helps to find many patterns longer than three characters. If we can't, the compression ratio will be low. Unfortunately such patterns aren’t very common in any natural language. Diagram encoding and pattern substitution are far more suitable for text compression than run-length encoding. In fact, pattern substitution is very effective at compressing programming languages.

Application

It is interesting to ask how to use diagram encoding or pattern substitution to compress natural language text, especially when we don’t know the language in detail. The answer hides in the question. We won’t compress natural languages, but machine languages. Machine (programming) languages are limited to a much smaller set of words and symbols. Isn’t that true for any programming language? Take PHP, where words like “function”, “while”, “for”, “break”, “switch”, “foreach” are used very often, or HTML with its defined set of tags. Perhaps the best example is CSS, where only the values of the properties can vary. CSS files also tend to have many new lines, tabs and spaces, which only humans read.

The question here is why we should compress those file types. It’s clear that in compressed form they are useless, both for humans and machines. That is true, but what if we have to store versions of those files in a DB, as a kind of backup? Imagine you’re working for a web hosting company that has to store daily versions of the sites it’s hosting. The volume of stored information, even for small companies hosting only a few sites, can be enormous. Compressing those files with a conventional compression tool isn’t a good idea, because then we have to save a copy of the entire site every day, even though the difference between daily versions of a site can be small.
A version control system is another solution, but then you have to store the plain text of the files. Perhaps a better approach is to compress the text using pattern substitution and then save only the differences, a kind of version control that can be done with “relative encoding”. Using this method we can save lots of disk space and at the same time compress and decompress easily. Another good thing is that the changes to the initial files, which are all you need to save, can also be compressed.

Implementation

The implementation of this algorithm is again in PHP and tries only to describe the main principles of compression. In this case I tried to compress a CSS file using the approach above. Although this example is quite primitive, we can see some interesting facts. First of all, you only need encoding and decoding dictionaries. The encoding and decoding processes are practically the same replacement loop, so you don’t need to implement two different functions; decoding uses the inverted dictionary. Note that the last entry of the dictionary below simply deletes spaces, which is lossy and cannot be undone. The example uses the native PHP function str_replace, because its purpose is not to show how to search and replace within a string, but to illustrate pattern substitution. It assumes that today’s programming languages have string manipulation functions suitable for this task.

    $str = file_get_contents('large_style_file.css');

    // pattern => replacement; the last entry removes spaces (lossy)
    $encoding_dict = array(
        "\n"      => '$0',
        'text'    => '$1',
        'color'   => '$2',
        'display' => '$3',
        'font'    => '$4',
        'width'   => '$5',
        'height'  => '$6',
        ' '       => '',
    );

    // apply every substitution of the dictionary, in order
    function replace_patterns($input, $dict) {
        foreach ($dict as $pattern => $replace) {
            $input = str_replace($pattern, $replace, $input);
        }
        return $input;
    }

    $result = replace_patterns($str, $encoding_dict);

By replacing only a few CSS properties I reduced the file size by about 35% (as shown in the diagram below). The initial file is 202 KB, while compressed it’s only 131 KB. Of course, it all depends on the CSS file, but how about replacing all property names with shorter ones? Perhaps then the compression will be even better. Source: http://www.stoimen.com/blog/2012/01/23/computer-algorithms-data-compression-with-diagram-encoding-and-pattern-substitution/
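The encode and decode symmetry described in the article can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's implementation: the dictionary contents, the `encode`/`decode` names and the sample CSS string are hypothetical, and the lossy space-removal entry is left out so that the round trip is exact.

```python
# Hypothetical dictionary: frequent CSS keywords mapped to short tokens.
ENCODING_DICT = {
    "display": "$3",
    "height": "$6",
    "width": "$5",
    "color": "$2",
    "text": "$1",
    "font": "$4",
}


def replace_patterns(text, mapping):
    """Apply every pattern -> replacement pair of the mapping in order."""
    for pattern, replacement in mapping.items():
        text = text.replace(pattern, replacement)
    return text


def encode(text, mapping=ENCODING_DICT):
    # Assumption: no token may already occur in the input, otherwise the
    # decoder could not tell a token from original text.
    for token in mapping.values():
        if token in text:
            raise ValueError("token %r already occurs in the input" % token)
    return replace_patterns(text, mapping)


def decode(text, mapping=ENCODING_DICT):
    # Decoding is the same replacement loop run with the inverted dictionary.
    inverted = {token: pattern for pattern, token in mapping.items()}
    return replace_patterns(text, inverted)


css = "h1 { color: red; font-size: 12px; width: 100px; text-align: left; }"
encoded = encode(css)
decoded = decode(encoded)
print(encoded)
print(decoded == css)
```

The check in `encode` addresses the question raised earlier in the article about input that already contains a substitution symbol: here the sketch simply refuses such input instead of escaping it.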

January 24, 2012

by Stoimen Popov

· 23,827 Views

The Persistence Layer with Spring Data JPA

This is the fourth of a series of articles about Persistence with Spring. This article will focus on the configuration and implementation of the persistence layer with Spring 3.1, JPA and Spring Data. For a step by step introduction about setting up the Spring context using Java based configuration and the basic Maven pom for the project, see this article. The Persistence with Spring series: Part 1 – The Persistence Layer with Spring 3.1 and Hibernate Part 3 – The Persistence Layer with Spring 3.1 and JPA Part 5 – Transaction configuration with JPA and Spring 3.1

No More DAO implementations

As I discussed in a previous post, the DAO layer usually consists of a lot of boilerplate code that can and should be simplified. The advantages of such a simplification are manifold: a decrease in the number of artifacts that need to be defined and maintained, simplification and consistency of data access patterns, and consistency of configuration. Spring Data takes this simplification one step further and makes it possible to remove the DAO implementations entirely: the interface of the DAO is now the only artifact that needs to be explicitly defined.

The Spring Data managed DAO

In order to start leveraging the Spring Data programming model with JPA, a DAO interface needs to extend the JPA specific Repository interface, JpaRepository, in Spring’s interface hierarchy. This will enable Spring Data to find this interface and automatically create an implementation for it. Also, by extending the interface we get most if not all of the relevant generic CRUD methods for standard data access available in the DAO.

Defining custom access methods and queries

As discussed, by extending one of the Repository interfaces, the DAO will already have some basic CRUD methods (and queries) defined and implemented.
To define more specific access methods, Spring JPA supports quite a few options: you can either simply define a new method in the interface, or you can provide the actual JPQL query by using the @Query annotation. A third option to define custom queries is to make use of JPA Named Queries, but this has the disadvantage that it either involves XML or burdens the domain class with the queries. In addition to these, Spring Data introduces a more flexible and convenient API, similar to the JPA Criteria API, only more readable and reusable. The advantages of this API become more pronounced when dealing with a large number of fixed queries that could be expressed more concisely through a smaller number of reusable blocks that keep occurring in different combinations.

Automatic Custom Queries

When Spring Data creates a new Repository implementation, it analyzes all the methods defined by the interfaces and tries to automatically generate queries from the method names. While this has limitations, it is a very powerful and elegant way of defining new custom access methods with very little effort. For example, if the managed entity has a name field (and the Java Bean standard getter and setter for that field), defining the findByName method in the DAO interface will automatically generate the correct query:

    public interface IFooDAO extends JpaRepository< Foo, Long >{
       Foo findByName( final String name );
    }

This is a relatively simple example; a much larger set of keywords is supported by the query creation mechanism. If the parser cannot match the property named in the method with a field of the domain object, the following exception is thrown:

    java.lang.IllegalArgumentException: No property nam found for type class org.rest.model.Foo

Manual Custom Queries

In addition to deriving the query from the method name, a custom query can be manually specified with the method level @Query annotation.
For even more fine grained control over the creation of queries, such as using named parameters or modifying existing queries, the reference is a good place to start.

Spring Data transaction configuration

The actual implementation of the Spring Data managed DAO, SimpleJpaRepository, uses annotations to define and configure transactions. A read only @Transactional annotation is used at the class level, which is then overridden for the non read-only methods. The rest of the transaction semantics are default, but these can be easily overridden manually per method.

Exception Translation without the template

One of the responsibilities of Spring ORM templates (JpaTemplate, HibernateTemplate) is exception translation: translating JPA exceptions, which tie the API to JPA, to Spring’s DataAccessException hierarchy. Without the template to do that, exception translation can still be enabled by annotating the DAOs with the @Repository annotation. That, coupled with a Spring bean postprocessor, will advise all @Repository beans with all the implementations of PersistenceExceptionTranslator found in the Container, providing exception translation without using the template. The fact that exception translation is indeed active can easily be verified with an integration test:

    @Test( expected = DataAccessException.class )
    public void whenAUniqueConstraintIsBroken_thenSpringSpecificExceptionIsThrown(){
       String name = "randomName";
       this.service.save( new Foo( name ) );
       this.service.save( new Foo( name ) );
    }

Exception translation is done through proxies; in order for Spring to be able to create proxies around the DAO classes, these must not be declared final.

Spring Data Configuration

To activate the Spring JPA repository support, the jpa namespace is defined and used to specify the package where the DAO interfaces are located (via the jpa:repositories element and its base-package attribute). At this point, there is no equivalent Java based configuration; support for it is however in the works.
The Spring Java or XML configuration

The JPA configuration with Spring 3.1 has already been discussed in detail in the previous article of this series. Spring Data also takes advantage of the Spring support for the JPA @PersistenceContext annotation, which it uses to wire the EntityManager into the Spring factory bean responsible for creating the actual DAO implementations, JpaRepositoryFactoryBean. In addition to the already discussed configuration, there is one last missing piece: including the Spring Data XML configuration in the overall persistence configuration:

    @Configuration
    @EnableTransactionManagement
    @ImportResource( "classpath*:*springDataConfig.xml" )
    public class PersistenceJPAConfig{
       ...
    }

The Maven configuration

In addition to the Maven configuration for JPA defined in a previous article, the spring-data-jpa dependency is added:

    <dependency>
       <groupId>org.springframework.data</groupId>
       <artifactId>spring-data-jpa</artifactId>
       <version>1.0.2.RELEASE</version>
    </dependency>

Conclusion

This article covered the configuration and implementation of the persistence layer with Spring 3.1, JPA 2 and Spring JPA (part of the Spring Data umbrella project), using both XML and Java based configuration. The various methods of defining more advanced custom queries were discussed, as well as configuration with the new jpa namespace and transactional semantics. The final result is a new and elegant take on data access with Spring, with almost no actual implementation work. You can check out the full implementation in the github project. From the original The Persistence Layer with Spring Data JPA, part of the Persistence with Spring series

January 20, 2012

by Eugen Paraschiv

· 154,856 Views · 2 Likes

Algorithm of the Week: Data Compression with Bitmaps

In my previous post we saw how to compress data consisting of very long runs of repeating elements. This type of compression is known as “run-length encoding” and can be very handy when transferring data with no loss. The problem is that the data must follow a specific format. Thus the string “aaaaaaaabbbbbbbb” can be compressed as “a8b8”. Now a string with length 16 is compressed into a string with length 4, which is 25% of its initial length, without losing any information. There will be a problem if the characters (elements) are dispersed in a different way. What would happen if the characters are the same, but they don’t form long runs? What if the string was “abababababababab”? The same length, the same characters, but we cannot use run-length encoding! Indeed, using this algorithm we’ll get at best the same string.

In this case, however, we can notice another fact. The string consists of many repeating elements, although they are not arranged one after another. We can compress this string with a bitmap. This means that we can save the positions of the occurrences of a given element as a sequence of bits, which can be easily converted into a decimal value. In the example above the string “abababababababab” can be compressed as “1010101010101010”, which is 43690 in decimal, and even better AAAA in hexadecimal. Thus the long string can be compressed. When decompressing (decoding) the message we can convert again from decimal/hexadecimal into binary and match the occurrences of the characters. The example above is too simple, so let’s say only one of the characters is repeating and the rest of the string consists of different characters, like this: “abacadaeafagahai”. Then we can use a bitmap only for the character “a”, “1010101010101010”, and compress the string as “AAAA bcdefghi”. As you can see, all the example strings are exactly 16 characters long, and that is a limitation.
Using bitmaps with data of variable length is a bit tricky, and it is not always easy (or even possible) to decompress it. Basically, bitmap compression saves the positions of an element that is repeated very often in the message.

On the other hand, bitmap compression is not only applicable to strings. We can also compress arrays, objects or any kind of data. The example from my previous post is very suitable. There we had to transfer a large array from a server to the client (browser) using JSON. That data was very suitable for “run-length encoding”. Now let’s assume we have the same data, a set of different years, which this time are dispersed in a different way.

    $data = array(
        0 => 1991, 1 => 1992, 2 => 1993, 3 => 1994,
        4 => 1991, 5 => 1992, 6 => 1993, 7 => 1992,
        8 => 1991, 9 => 1991, 10 => 1991, 11 => 1992,
        12 => 1992, 13 => 1991, 14 => 1991, 15 => 1992,
        ...
    );

The JSON encoded message will be the following (a simple but very large javascript array).

    [1991,1992,1993,1994,1991,1992,1993,1992,1991,1991,1991,1992,1992,1991,1991,1992, ...]

However, if we use bitmap compression we’ll get a “shorter” array.

    $data = array(
        0 => array(1991, '1000100011100110'),
        1 => array(1992, '0100010100011001'),
        2 => array(1993, '0010001000000000'),
        3 => array(1994, '0001000000000000'),
    );

Now the JSON is:

    [[1991,"1000100011100110"],[1992,"0100010100011001"],[1993,"0010001000000000"],[1994,"0001000000000000"]]

The compression ratio gets better and better as the uncompressed data grows. In fact, most of us know bitmap compression from images, because this algorithm is widely used for image compression. We can imagine how successful it can be when compressing black and white images (as black and white can be represented as 0s and 1s). Actually it is also used for more than two colors (256 for instance), and again the level of compression is very high.
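The year-array example above can be sketched in Python. This is only an illustration under stated assumptions: the function names `bitmap_encode` and `bitmap_decode` are hypothetical, and the bit strings are kept as plain text rather than packed into numbers.

```python
def bitmap_encode(values):
    """Return {value: bit string}, one bit per position in the input."""
    distinct = list(dict.fromkeys(values))  # keeps first-seen order
    return {
        v: "".join("1" if item == v else "0" for item in values)
        for v in distinct
    }


def bitmap_decode(bitmaps):
    """Rebuild the original list from the per-value bit strings."""
    length = len(next(iter(bitmaps.values()), ""))
    result = [None] * length
    for value, bits in bitmaps.items():
        for position, bit in enumerate(bits):
            if bit == "1":
                result[position] = value
    return result


# The sixteen years listed in the article's example array.
years = [1991, 1992, 1993, 1994, 1991, 1992, 1993, 1992,
         1991, 1991, 1991, 1992, 1992, 1991, 1991, 1992]
encoded = bitmap_encode(years)
print(encoded[1991])
print(bitmap_decode(encoded) == years)
```

Each distinct value costs one bit per position, so this layout only pays off when there are few distinct values compared with the length of the data.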
Implementation

The following implementation in PHP aims only to illustrate the bitmap compression algorithm. As we know, this algorithm is applicable to any kind of data structure.

    // too many repeating "a" characters
    $msg = 'aazahalavaatalawacamaahakafaaaqaaaiauaacaaxaauaxaaaaaapaayatagaaoafaawayazavaaaazaaabararaaaaakakaaqaarazacajaazavanazaaaeanaaoajauaaaaaxalaraaapabataaavaaab';

    function bitmap($message) {
        $bits = $rest = '';
        $length = strlen($message);
        // one bit per character: 1 for "a", 0 for anything else
        for ($i = 0; $i < $length; $i++) {
            $v = $message[$i];
            if ($v == 'a') {
                $bits .= '1';
            } else {
                $bits .= '0';
                $rest .= $v;
            }
        }
        // note: bindec() returns a float for bit strings this long,
        // so the printed number is only an approximation
        return number_format(bindec($bits), 0, '.', '') . $rest;
    }

    echo bitmap($msg);

    // sample run (on a different message)
    // uncompressed: acaaaaadaaaabalaaeaaaaganaaxakaavawamaasavajawaaaayaauaaadalanagaeaeamaarafalaazaaaiasaanaahaaazaraxaalaahaaawaaajasamahaajaakarapanaakaoakaanawalaacamauaamaal
    // compressed: 152299251941730035874325065523548237677352452096zhlvtlwcmhkfqiucxuxpytgofwyzvzbrrkkqrzcjzvnzenojuxlrpbtvb

Application

This algorithm is very useful when there is an element in our data that repeats very often, so you need to investigate the nature of the data you want to compress. Because of this, the algorithm is used for image compression in formats such as PNG8 or GIF. Source: http://www.stoimen.com/blog/2012/01/16/computer-algorithms-data-compression-with-bitmaps/
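A round trip is easier to show in a language with arbitrary-precision integers. The following Python sketch is not the author's code; the names `compress` and `decompress` are hypothetical, and it assumes the message length is stored next to the packed bitmap so that leading zero bits can be restored when decoding.

```python
def compress(message, frequent):
    """Split a message into (length, packed bitmap, remaining characters)."""
    bits = "".join("1" if ch == frequent else "0" for ch in message)
    rest = "".join(ch for ch in message if ch != frequent)
    packed = int(bits, 2) if bits else 0  # Python ints never overflow
    return len(message), packed, rest


def decompress(length, packed, rest, frequent):
    """Inverse of compress(): rebuild the message from its three parts."""
    if length == 0:
        return ""
    # zfill restores the leading zeros that the integer form drops
    bits = format(packed, "b").zfill(length)
    others = iter(rest)
    return "".join(frequent if bit == "1" else next(others) for bit in bits)


length, packed, rest = compress("abacadaeafagahai", "a")
print(format(packed, "X"), rest)
print(decompress(length, packed, rest, "a"))
```

For the 16-character example from the article this yields the bitmap AAAA and the remaining characters bcdefghi, and the same two functions also handle much longer messages without losing precision.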

January 17, 2012

by Stoimen Popov

· 20,347 Views

Mocking of 'Open' as a Context Manager Made Simple In Python

Using open as a context manager is a great way to ensure your file handles are closed properly and is becoming common:

    with open('/some/path', 'w') as f:
        f.write('something')

The issue is that even if you mock out the call to open it is the returned object that is used as a context manager (and has __enter__ and __exit__ called). Using MagicMock from the mock library, we can mock out context managers very simply. However, mocking open is fiddly enough that a helper function is useful. Here mock_open creates and configures a MagicMock that behaves as a file context manager.

    from mock import inPy3k, MagicMock, patch

    if inPy3k:
        # Python 3 has no file type, so list the attributes explicitly
        file_spec = ['_CHUNK_SIZE', '__enter__', '__eq__', '__exit__',
            '__format__', '__ge__', '__gt__', '__hash__', '__iter__', '__le__',
            '__lt__', '__ne__', '__next__', '__repr__', '__str__',
            '_checkClosed', '_checkReadable', '_checkSeekable',
            '_checkWritable', 'buffer', 'close', 'closed', 'detach',
            'encoding', 'errors', 'fileno', 'flush', 'isatty',
            'line_buffering', 'mode', 'name', 'newlines', 'peek', 'raw',
            'read', 'read1', 'readable', 'readinto', 'readline', 'readlines',
            'seek', 'seekable', 'tell', 'truncate', 'writable', 'write',
            'writelines']
    else:
        file_spec = file

    def mock_open(mock=None, data=None):
        if mock is None:
            mock = MagicMock(spec=file_spec)

        # the handle is what open() returns and what "with" enters
        handle = MagicMock(spec=file_spec)
        handle.write.return_value = None
        if data is None:
            handle.__enter__.return_value = handle
        else:
            handle.__enter__.return_value = data
        mock.return_value = handle
        return mock

    >>> m = mock_open()
    >>> with patch('__main__.open', m, create=True):
    ...     with open('foo', 'w') as h:
    ...         h.write('some stuff')
    ...
    >>> m.assert_called_once_with('foo', 'w')
    >>> m.mock_calls
    [call('foo', 'w'),
     call().__enter__(),
     call().write('some stuff'),
     call().__exit__(None, None, None)]
    >>> handle = m()
    >>> handle.write.assert_called_once_with('some stuff')

And for reading files, using a StringIO to represent the file handle:

    >>> from StringIO import StringIO
    >>> m = mock_open(data=StringIO('foo bar baz'))
    >>> with patch('__main__.open', m, create=True):
    ...     with open('foo') as h:
    ...         result = h.read()
    ...
    >>> m.assert_called_once_with('foo')
    >>> assert result == 'foo bar baz'

Note that the StringIO will only be used for the data if open is used as a context manager. If you just configure and use mocks they will work whichever way open is used. This helper function will be built into mock 0.9. Source: http://www.voidspace.org.uk/python/weblog/arch_d7_2012_01_07.shtml
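For readers on Python 3, the standard library's unittest.mock module ships a `mock_open` helper, which takes the file contents through its `read_data` parameter. The sketch below shows the same write and read checks with it; the functions `save_greeting` and `load_text` are hypothetical stand-ins for code under test, and `builtins.open` is patched on the assumption that the code calls the built-in open directly.

```python
from unittest.mock import mock_open, patch


def save_greeting(path):
    """Hypothetical code under test: writes through a context manager."""
    with open(path, "w") as handle:
        handle.write("some stuff")


def load_text(path):
    """Hypothetical code under test: reads a whole file."""
    with open(path) as handle:
        return handle.read()


# Writing: verify how open() was called and what was written.
m = mock_open()
with patch("builtins.open", m):
    save_greeting("foo")
m.assert_called_once_with("foo", "w")
m().write.assert_called_once_with("some stuff")

# Reading: read_data supplies the content returned by read().
m = mock_open(read_data="foo bar baz")
with patch("builtins.open", m):
    result = load_text("foo")
print(result)
```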

January 15, 2012

by Michael Foord

· 18,150 Views

Big Data Chapter Excerpt: Implementing Schemas with Apache Thrift

This is an excerpt from the upcoming Manning book about Big Data. Big Data Principles and Best Practices of Scalable Realtime Data Systems By Nathan Marz and Samuel E. Ritchie

Thrift is a widely used project that originated at Facebook. It can be used for making language-neutral RPC servers, but developers use it for its schema-creation capabilities. In this article based on chapter 2, author Nathan Marz discusses the workhorses of Thrift, the struct and union type definitions, and Thrift’s built-in mechanisms for evolving a schema over time.

Structs

The following code shows how to define a struct using the Thrift Interface Definition Language (IDL). Defining a struct is like defining a class in an object-oriented language: you specify all the data the object contains. The difference is that a Thrift struct only contains data and doesn't specify any extra behavior for the object. Fields in a struct can be: primitive types like strings, ints, longs, and doubles (in the Thrift IDL, these are referred to as string, i32, i64, and double, respectively); collections of other types (Thrift supports list, map, and set); or another Thrift struct or union.

    struct Person {
      1: string twitter_username;
      2: string full_name;
      3: list<string> interests;
    }

The following code listing shows how to serialize a struct with Java. As you can see, we're using ArrayList, a native Java data structure, as part of the Person object.
    List<String> interests = new ArrayList<String>() {{
        add("hadoop");
        add("nosql");
    }};
    Person person = new Person("joesmith", "Joe Smith", interests);
    TSerializer serializer = new TSerializer();
    byte[] serialized = serializer.serialize(person);

Here's how to deserialize a Person object in Python. When the object is deserialized, it will be using native Python data structures for any collection types.

    person = Person()
    deserialize(person, serialized_bytes)

Fields in structs can be defined as being either required or optional. If a field is defined as required, then a value for that field must be provided or else Thrift will give an error upon serialization or deserialization. If a field is optional, the value will be null if not provided. You should always declare fields as being either required or optional. The following code listing shows how to define a struct containing required and optional fields.

    struct Tweet {
      1: required string text;
      2: required i64 id;
      3: required i64 timestamp;
      4: required Person person;
      5: optional i64 response_to_tweet_id;
    }

Unions

You can also define unions in Thrift. A union is a struct that must have exactly one field set. Unions are useful for representing polymorphic data. The following listing shows how to define a "PersonID" using a Thrift union that can be one of many different kinds of identifiers.

    union PersonID {
      1: string email;
      2: i64 facebook_id;
      3: i64 twitter_id;
    }

Evolving a schema

Thrift is designed so that schemas can be evolved over time. The key to evolving Thrift schemas over time is the numeric identifiers used for every field. Those ids are used to identify fields in their serialized form. When you want to change the schema but still be backward compatible with existing data, you must obey the following rules. Fields may be renamed. This is because the serialized form of an object uses the field ids to identify fields, not the names. Fields may be removed, but you must be sure never to reuse that field id.
When deserializing, Thrift will skip over any fields that don't match an id it's expecting. So the data for that field will just be ignored in the existing data. If you were to reuse that field id, Thrift will try to deserialize that old data into your new field which will lead to either invalid or incorrect data. Only optional fields can be added to existing structs. You can't add required fields because existing data won't have that field and will not be deserializable. Note that this point does not apply to unions since unions have no notion of required and optional fields.

Summary

In a relational database, the schema language is part of the database system and is integrated with how the database stores and processes that data. In the Big Data world, you use your own serialization framework that's separate from the storage and processing pieces. You get the flexibility to fine-tune this component to work exactly as needed to fit your data model. There are a few different open source serialization frameworks available, namely Thrift, Protocol Buffers, and Avro. We discussed our favorite, Apache Thrift, because it’s mature and supports most languages, but you could use any of these tools for defining a schema. Last updated: January 11, 2012
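The schema evolution rules can be illustrated with a toy model in Python. To be clear about the assumptions: this is not Thrift's wire format or API. A serialized record is modeled as a plain dict keyed by field id, and the schemas `PERSON_V1` and `PERSON_V2`, the `handle` and `location` fields, and both functions are hypothetical names chosen for the illustration.

```python
# Toy schemas: field id -> field name (not real Thrift definitions).
PERSON_V1 = {1: "twitter_username", 2: "full_name", 3: "interests"}
# Version 2 renames field 1, removes field 3 (its id is never reused)
# and adds a new optional field under a fresh id.
PERSON_V2 = {1: "handle", 2: "full_name", 4: "location"}


def serialize(record, schema):
    """Store values under their numeric field ids, not their names."""
    ids = {name: field_id for field_id, name in schema.items()}
    return {ids[name]: value for name, value in record.items()}


def deserialize(data, schema):
    """Read by field id; unknown ids are skipped, absent fields stay None."""
    record = {name: None for name in schema.values()}
    for field_id, value in data.items():
        if field_id in schema:
            record[schema[field_id]] = value
    return record


old_bytes = serialize(
    {"twitter_username": "joesmith", "full_name": "Joe Smith",
     "interests": ["hadoop", "nosql"]},
    PERSON_V1,
)
print(deserialize(old_bytes, PERSON_V2))
```

Data written with the old schema still loads under the new one: the renamed field keeps its value because the id is unchanged, the removed field is skipped, and the newly added optional field comes back empty.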

January 12, 2012

by Chris Smith

· 11,543 Views

Local and Distributed Graph Traversal Engines

In the graph database space, there are two types of traversal engines: local and distributed. Local traversal engines are typically for single-machine graph databases and are used for real-time production applications. Distributed traversal engines are typically for multi-machine graph databases and are used for batch processing applications. This divide is quite sharp in the community, but there is nothing that prevents the unification of both models. A discussion of this divide and its unification is presented in this post.

Local traversal engines

In a local traversal engine, there is typically a single processing agent that obeys a program. The agent is called a traverser and the program it follows is called a path description. In Gremlin, a friend-of-a-friend path description is defined as such:

    g.v(1).outE('friend').inV.outE('friend').inV

When this path description is interpreted by a traverser over a graph, an instance of the description is realized as actual paths in the graph that match that description. For example, the Gremlin traverser starts at vertex 1 and then steps to its outgoing friend-edges. Next, it will move to the head/target vertices of those edges (i.e. vertices 3 and 4). After that, it will go to the friend-edges of those vertices and finally, to the head of the previous edges (i.e. vertices 6 and 7). In this way, a single traverser is following all the paths that are exposed with each new atomic graph operation (i.e. each new step after a .). The abstract syntax is: step.step.step. What is returned by this friend-of-a-friend path description, on the example graph diagrammed, is vertices 6 and 7.

In many situations, it's not the end of the path that is desired, but some side-effect of the traversal. For example, as the traverser walks it can update some global data structure such as a ranking of the vertices. This idea is presented in the path description below, where the outE.inV path is looped over 1000 times. Each time a vertex is traversed over, the map m is updated. This global map m maintains keys that are vertices and values that denote the number of times that each vertex has been touched (groupCount's behavior).

    m = [:]
    g.v(1).outE.inV.groupCount(m).loop(3){it.loops < 1000}

The local traversal engine pattern is abstractly diagrammed on the right, where a single traverser is obeying some path description (a.b.c) and in doing so, moving around on a graph and updating a global data structure (the red boxed map). Given the need for traversers to move from element to element, graph databases of this form tend to support strong data locality by means of a direct-reference graph data structure (i.e. vertices have pointers to edges and edges to vertices). A few examples of such graph databases include Neo4j, OrientDB, and DEX.

Distributed traversal engines

In a distributed traversal engine, a traversal is represented as a flow of messages between the elements of the graph. Generally, each element (e.g. vertex) is operating independently of the other elements. Each element is seen as its own processor with its own (usually homogeneous) program to execute. Elements communicate with each other via message passing. When no more messages have been passed, the traversal is complete and the results of the traversal are typically represented as a distributed data structure over the elements. Graph databases of this nature tend to use the bulk synchronous parallel model of distributed computing. Each step is synchronized in a manner analogous to a clock cycle in hardware. Instances of this model include Agrapa, Pregel, Trinity, and GoldenOrb. An example of distributed graph traversing is now presented using a ranking algorithm in Java. [Note: in this example, edges are not first class citizens. This is typical of the state of the art in distributed traversal engines. They tend to be for single-relational, unlabeled-edge graphs.]
    public void evaluateStep(int step) {
        if(!this.inbox.isEmpty() && step < 1000) {
            this.rank = this.rank + this.inbox.size();
            for(Vertex vertex : this.adjacentVertices()) {
                for(int i=0; i
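The listing above breaks off partway through. As a rough, self-contained illustration of the same message-passing pattern (not the author's original code), the sketch below lets every vertex absorb its inbox, add the message count to its rank, and forward that many messages to each neighbor, with a barrier between supersteps. The names BspVertex, BspRankDemo, nextInbox and sync are hypothetical, and messages are stored as counts rather than objects to keep the example small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** A vertex acting as its own small processor (hypothetical class for illustration). */
class BspVertex {
    final String id;
    final List<BspVertex> adjacent = new ArrayList<>();
    long rank = 0;      // number of messages this vertex has absorbed so far
    long inbox = 0;     // messages visible in the current superstep
    long nextInbox = 0; // messages sent to this vertex during the current superstep

    BspVertex(String id) {
        this.id = id;
    }

    /** Runs one superstep; returns true if this vertex sent any messages. */
    boolean evaluateStep(int step, int maxSteps) {
        if (inbox == 0 || step >= maxSteps) {
            return false;
        }
        rank += inbox;
        for (BspVertex neighbor : adjacent) {
            neighbor.nextInbox += inbox; // one outgoing message per message received
        }
        return !adjacent.isEmpty();
    }

    /** Barrier between supersteps: messages just sent become the next inbox. */
    void sync() {
        inbox = nextInbox;
        nextInbox = 0;
    }
}

class BspRankDemo {
    /** Seeds one message at the start vertex and runs until no vertex sends anything. */
    static void run(List<BspVertex> graph, BspVertex start, int maxSteps) {
        start.inbox = 1; // note: this also counts the start vertex once
        for (int step = 0; step < maxSteps; step++) {
            boolean anySent = false;
            for (BspVertex v : graph) {
                anySent |= v.evaluateStep(step, maxSteps);
            }
            for (BspVertex v : graph) {
                v.sync();
            }
            if (!anySent) {
                break; // no more messages: the traversal is complete
            }
        }
    }

    public static void main(String[] args) {
        BspVertex a = new BspVertex("a");
        BspVertex b = new BspVertex("b");
        BspVertex c = new BspVertex("c");
        a.adjacent.add(b);
        a.adjacent.add(c);
        b.adjacent.add(c);
        run(Arrays.asList(a, b, c), a, 1000);
        for (BspVertex v : Arrays.asList(a, b, c)) {
            System.out.println(v.id + " rank=" + v.rank);
        }
    }
}
```

The result lives on the vertices themselves (the rank field) rather than in one global map, which is the main contrast with the groupCount example in the local engine.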

January 10, 2012

by Marko Rodriguez

· 8,462 Views

Simplifying the Data Access Layer with Spring and Java Generics

1. Overview

This is the second of a series of articles about Persistence with Spring. The previous article discussed setting up the persistence layer with Spring 3.1 and Hibernate, without using templates. This article will focus on simplifying the Data Access Layer by using a single, generified DAO, which will result in elegant data access, with no unnecessary clutter. Yes, in Java.

The Persistence with Spring series:
Part 1 – The Persistence Layer with Spring 3.1 and Hibernate
Part 3 – The Persistence Layer with Spring 3.1 and JPA
Part 4 – The Persistence Layer with Spring Data JPA
Part 5 – Transaction configuration with JPA and Spring 3.1

2. The DAO mess

Most production codebases have some kind of DAO layer. Usually the implementation ranges from a raw class with no inheritance to some kind of generified class, but one thing is consistent – there is always more than one. Most likely, there are as many DAOs as there are entities in the system. Also, depending on the level of generics involved, the actual implementations can vary from heavily duplicated code to almost empty, with the bulk of the logic grouped in an abstract class.

2.1. A Generic DAO

Instead of having multiple implementations – one for each entity in the system – a single parametrized DAO can be used in such a way that it still takes full advantage of the type safety provided by generics. Two implementations of this concept are presented next, one for a Hibernate-centric persistence layer and the other focusing on JPA. These implementations are by no means complete – only some data access methods are included, but they can easily be made more thorough.

2.2. The Abstract Hibernate DAO

    import java.io.Serializable;
    import java.util.List;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.springframework.beans.factory.annotation.Autowired;

    public abstract class AbstractHibernateDAO< T extends Serializable > {
       private Class< T > clazz;

       @Autowired
       SessionFactory sessionFactory;

       public void setClazz( Class< T > clazzToSet ){
          this.clazz = clazzToSet;
       }

       @SuppressWarnings( "unchecked" )
       public T findOne( Long id ){
          return (T) this.getCurrentSession().get( this.clazz, id );
       }
       @SuppressWarnings( "unchecked" )
       public List< T > findAll(){
          return this.getCurrentSession()
             .createQuery( "from " + this.clazz.getName() ).list();
       }

       public void save( T entity ){
          this.getCurrentSession().persist( entity );
       }
       public void update( T entity ){
          this.getCurrentSession().merge( entity );
       }
       public void delete( T entity ){
          this.getCurrentSession().delete( entity );
       }
       public void deleteById( Long entityId ){
          // look the entity up with findOne; the class defines no getById method
          T entity = this.findOne( entityId );
          this.delete( entity );
       }

       protected Session getCurrentSession(){
          return this.sessionFactory.getCurrentSession();
       }
    }

The DAO uses the Hibernate API directly, without relying on any Spring templates (such as HibernateTemplate). The use of templates, as well as the management of the SessionFactory which is autowired in the DAO, were covered in the previous post of the series.

2.3. The Abstract JPA DAO

    import java.io.Serializable;
    import java.util.List;
    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;

    public abstract class AbstractJpaDAO< T extends Serializable > {
       private Class< T > clazz;

       @PersistenceContext
       EntityManager entityManager;

       public void setClazz( Class< T > clazzToSet ){
          this.clazz = clazzToSet;
       }

       public T findOne( Long id ){
          return this.entityManager.find( this.clazz, id );
       }
       @SuppressWarnings( "unchecked" )
       public List< T > findAll(){
          return this.entityManager.createQuery( "from " + this.clazz.getName() )
             .getResultList();
       }

       public void save( T entity ){
          this.entityManager.persist( entity );
       }
       public void update( T entity ){
          this.entityManager.merge( entity );
       }
       public void delete( T entity ){
          this.entityManager.remove( entity );
       }
       public void deleteById( Long entityId ){
          // look the entity up with findOne; the class defines no getById method
          T entity = this.findOne( entityId );
          this.delete( entity );
       }
    }

Similar to the Hibernate DAO implementation, the Java Persistence API is used here directly, again not relying on the now deprecated Spring JpaTemplate.

2.4. The Generic DAO

Now, the actual implementation of the generic DAO is as simple as it can be – it contains no logic. Its only purpose is to be injected by the Spring container in a service layer (or in whatever other type of client of the Data Access Layer):

    @Repository
    @Scope( BeanDefinition.SCOPE_PROTOTYPE )
    public class GenericJpaDAO< T extends Serializable >
       extends AbstractJpaDAO< T > implements IGenericDAO< T >{
       //
    }

    @Repository
    @Scope( BeanDefinition.SCOPE_PROTOTYPE )
    public class GenericHibernateDAO< T extends Serializable >
       extends AbstractHibernateDAO< T > implements IGenericDAO< T >{
       //
    }

First, note that the generic implementation is itself parametrized – allowing the client to choose the correct parameter on a case by case basis. This means that the client gets all the benefits of type safety without needing to create multiple artifacts for each entity. Second, notice the prototype scope of these generic DAO implementations.
Using this scope means that the Spring container will create a new instance of the DAO each time it is requested (including on autowiring). That will allow a service to use multiple DAOs with different parameters for different entities, as needed. The reason this scope is so important is due to the way Spring initializes beans in the container. Leaving the generic DAO without a scope would mean using the default singleton scope, which would lead to a single instance of the DAO living in the container. Every client would then share that one instance and its single Class setting, which is too restrictive for any more complex scenario.

3. The Service

There is now a single DAO to be injected by Spring; also, the Class needs to be specified:

    @Service
    class FooService implements IFooService{

       IGenericDAO< Foo > dao;

       @Autowired
       public void setDao( IGenericDAO< Foo > daoToSet ){
          this.dao = daoToSet;
          this.dao.setClazz( Foo.class );
       }

       // ...
    }

Spring autowires the new DAO instance using setter injection so that the implementation can be customized with the Class object. After this point, the DAO is fully parametrized and ready to be used by the service.

4. Conclusion

This article discussed the simplification of the Data Access Layer by providing a single, reusable implementation of a generic DAO. This implementation was presented in both a Hibernate and a JPA based environment. The result is a streamlined persistence layer, with no unnecessary clutter. For a step by step introduction about setting up the Spring context using Java based configuration and the basic Maven pom for the project, see this article. The next article of the Persistence with Spring series will focus on setting up the DAL with Spring 3.1 and JPA. In the meantime, you can check out the full implementation in the github project. If you read this far, you should follow me on twitter here.
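To make the scope argument concrete without a Spring container, here is a minimal plain-Java sketch. It assumes an in-memory map in place of a database, and the names SimpleGenericDao, InMemoryGenericDao and GenericDaoDemo are hypothetical stand-ins, not the article's classes. It shows one DAO instance per entity type (what prototype scope gives you) next to a single shared instance whose Class setting gets overwritten (what singleton scope would give you).

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, trimmed-down counterpart of the IGenericDAO interface. */
interface SimpleGenericDao<T extends Serializable> {
    void setClazz(Class<T> clazzToSet);
    T findOne(Long id);
    List<T> findAll();
    void save(Long id, T entity);
}

/** In-memory implementation: one map per entity class plays the role of a table. */
class InMemoryGenericDao<T extends Serializable> implements SimpleGenericDao<T> {
    private final Map<Class<?>, Map<Long, Object>> store;
    private Class<T> clazz;

    InMemoryGenericDao(Map<Class<?>, Map<Long, Object>> store) {
        this.store = store;
    }

    public void setClazz(Class<T> clazzToSet) {
        this.clazz = clazzToSet;
    }

    public T findOne(Long id) {
        return clazz.cast(table().get(id));
    }

    public List<T> findAll() {
        List<T> result = new ArrayList<>();
        for (Object row : table().values()) {
            result.add(clazz.cast(row));
        }
        return result;
    }

    public void save(Long id, T entity) {
        table().put(id, entity);
    }

    private Map<Long, Object> table() {
        return store.computeIfAbsent(clazz, key -> new HashMap<>());
    }
}

class GenericDaoDemo {
    /** Prototype-style wiring: every client receives its own DAO instance. */
    static String prototypeStyle() {
        Map<Class<?>, Map<Long, Object>> db = new HashMap<>();
        InMemoryGenericDao<String> names = new InMemoryGenericDao<>(db);
        names.setClazz(String.class);
        InMemoryGenericDao<Integer> counts = new InMemoryGenericDao<>(db);
        counts.setClazz(Integer.class);
        names.save(1L, "foo");
        counts.save(1L, 42);
        return names.findOne(1L) + ":" + counts.findOne(1L);
    }

    /** Singleton-style wiring: one shared instance, so the last setClazz call wins. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object singletonStyle() {
        Map<Class<?>, Map<Long, Object>> db = new HashMap<>();
        InMemoryGenericDao shared = new InMemoryGenericDao(db);
        shared.setClazz(String.class);
        shared.save(1L, "foo");
        shared.setClazz(Integer.class); // a second client reconfigures the same instance
        return shared.findOne(1L);      // the first client now reads the wrong table
    }

    public static void main(String[] args) {
        System.out.println("prototype: " + prototypeStyle());
        System.out.println("singleton: " + singletonStyle());
    }
}
```

In the shared-instance case the first client silently loses sight of its own data, which is the kind of restriction the prototype scope avoids.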

January 5, 2012

by Eugen Paraschiv

· 25,029 Views · 1 Like

JAXB and Joda-Time: Dates and Times

Joda-Time provides an alternative to the Date and Calendar classes currently provided in Java SE. Since they are provided in a separate library, JAXB does not provide a default mapping for these classes. We can supply the necessary mapping via XmlAdapters. In this post we will cover the following Joda-Time types: DateTime, DateMidnight, LocalDate, LocalTime, LocalDateTime.

Java Model

The following domain model will be used for this example:

    package blog.jodatime;

    import javax.xml.bind.annotation.XmlRootElement;
    import javax.xml.bind.annotation.XmlType;

    import org.joda.time.DateMidnight;
    import org.joda.time.DateTime;
    import org.joda.time.LocalDate;
    import org.joda.time.LocalDateTime;
    import org.joda.time.LocalTime;

    @XmlRootElement
    @XmlType(propOrder={ "dateTime", "dateMidnight", "localDate", "localTime", "localDateTime"})
    public class Root {

        private DateTime dateTime;
        private DateMidnight dateMidnight;
        private LocalDate localDate;
        private LocalTime localTime;
        private LocalDateTime localDateTime;

        public DateTime getDateTime() { return dateTime; }
        public void setDateTime(DateTime dateTime) { this.dateTime = dateTime; }

        public DateMidnight getDateMidnight() { return dateMidnight; }
        public void setDateMidnight(DateMidnight dateMidnight) { this.dateMidnight = dateMidnight; }

        public LocalDate getLocalDate() { return localDate; }
        public void setLocalDate(LocalDate localDate) { this.localDate = localDate; }

        public LocalTime getLocalTime() { return localTime; }
        public void setLocalTime(LocalTime localTime) { this.localTime = localTime; }

        public LocalDateTime getLocalDateTime() { return localDateTime; }
        public void setLocalDateTime(LocalDateTime localDateTime) { this.localDateTime = localDateTime; }
    }

XmlAdapters

Since Joda-Time and XML Schema both represent date and time information according to ISO 8601, the implementation of the XmlAdapters is quite trivial.
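The reason such adapters can stay trivial is that parsing and toString() round-trip through the same ISO 8601 text. The sketch below illustrates that symmetry using only the JDK: it swaps in java.time types for Joda-Time so that it runs without extra jars, and IsoTextAdapter is a hypothetical stand-in for JAXB's XmlAdapter, not the real class.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.function.Function;

/** Hypothetical, dependency-free analogue of an XmlAdapter from String to T. */
class IsoTextAdapter<T> {
    private final Function<String, T> parser;

    IsoTextAdapter(Function<String, T> parser) {
        this.parser = parser;
    }

    /** XML text to object, playing the role of unmarshal. */
    T unmarshal(String text) {
        return parser.apply(text);
    }

    /** Object to XML text, playing the role of marshal; relies on an ISO 8601 toString(). */
    String marshal(T value) {
        return value.toString();
    }
}

class IsoRoundTripDemo {
    static final IsoTextAdapter<LocalDate> DATE = new IsoTextAdapter<>(LocalDate::parse);
    static final IsoTextAdapter<LocalTime> TIME = new IsoTextAdapter<>(LocalTime::parse);
    static final IsoTextAdapter<LocalDateTime> DATE_TIME = new IsoTextAdapter<>(LocalDateTime::parse);

    public static void main(String[] args) {
        String date = DATE.marshal(LocalDate.of(2011, 5, 30));
        String time = TIME.marshal(LocalTime.of(11, 2, 30));
        String dateTime = DATE_TIME.marshal(LocalDateTime.of(2011, 5, 30, 11, 2, 30));
        // Each value survives a marshal/unmarshal round trip unchanged.
        System.out.println(date + " -> " + DATE.unmarshal(date));
        System.out.println(time + " -> " + TIME.unmarshal(time));
        System.out.println(dateTime + " -> " + DATE_TIME.unmarshal(dateTime));
    }
}
```

The exact text differs slightly between libraries (java.time omits zero milliseconds, for instance), so treat this as an illustration of the round-trip idea rather than of Joda-Time's output format.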
DateTimeAdapter

    package blog.jodatime;

    import javax.xml.bind.annotation.adapters.XmlAdapter;
    import org.joda.time.DateTime;

    public class DateTimeAdapter extends XmlAdapter<String, DateTime> {
        public DateTime unmarshal(String v) throws Exception { return new DateTime(v); }
        public String marshal(DateTime v) throws Exception { return v.toString(); }
    }

DateMidnightAdapter

    package blog.jodatime;

    import javax.xml.bind.annotation.adapters.XmlAdapter;
    import org.joda.time.DateMidnight;

    public class DateMidnightAdapter extends XmlAdapter<String, DateMidnight> {
        public DateMidnight unmarshal(String v) throws Exception { return new DateMidnight(v); }
        public String marshal(DateMidnight v) throws Exception { return v.toString(); }
    }

LocalDateAdapter

    package blog.jodatime;

    import javax.xml.bind.annotation.adapters.XmlAdapter;
    import org.joda.time.LocalDate;

    public class LocalDateAdapter extends XmlAdapter<String, LocalDate> {
        public LocalDate unmarshal(String v) throws Exception { return new LocalDate(v); }
        public String marshal(LocalDate v) throws Exception { return v.toString(); }
    }

LocalTimeAdapter

    package blog.jodatime;

    import javax.xml.bind.annotation.adapters.XmlAdapter;
    import org.joda.time.LocalTime;

    public class LocalTimeAdapter extends XmlAdapter<String, LocalTime> {
        public LocalTime unmarshal(String v) throws Exception { return new LocalTime(v); }
        public String marshal(LocalTime v) throws Exception { return v.toString(); }
    }

LocalDateTimeAdapter

    package blog.jodatime;

    import javax.xml.bind.annotation.adapters.XmlAdapter;
    import org.joda.time.LocalDateTime;

    public class LocalDateTimeAdapter extends XmlAdapter<String, LocalDateTime> {
        public LocalDateTime unmarshal(String v) throws Exception { return new LocalDateTime(v); }
        public String marshal(LocalDateTime v) throws Exception { return v.toString(); }
    }

Registering the XmlAdapters

We will use the @XmlJavaTypeAdapters annotation to register the Joda-Time types at the package level.
This means that whenever these types are found on a field/property on a class within this package the XmlAdapter will automatically be applied.

    @XmlJavaTypeAdapters({
        @XmlJavaTypeAdapter(type=DateTime.class, value=DateTimeAdapter.class),
        @XmlJavaTypeAdapter(type=DateMidnight.class, value=DateMidnightAdapter.class),
        @XmlJavaTypeAdapter(type=LocalDate.class, value=LocalDateAdapter.class),
        @XmlJavaTypeAdapter(type=LocalTime.class, value=LocalTimeAdapter.class),
        @XmlJavaTypeAdapter(type=LocalDateTime.class, value=LocalDateTimeAdapter.class)
    })
    package blog.jodatime;

    import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;
    import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapters;

    import org.joda.time.DateMidnight;
    import org.joda.time.DateTime;
    import org.joda.time.LocalDate;
    import org.joda.time.LocalDateTime;
    import org.joda.time.LocalTime;

Demo

To run the following demo you will need the Joda-Time jar on your classpath. It can be obtained here: http://sourceforge.net/projects/joda-time/files/joda-time/

    package blog.jodatime;

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Marshaller;

    import org.joda.time.DateMidnight;
    import org.joda.time.DateTime;
    import org.joda.time.LocalDate;
    import org.joda.time.LocalDateTime;
    import org.joda.time.LocalTime;

    public class Demo {

        public static void main(String[] args) throws Exception {
            Root root = new Root();
            root.setDateTime(new DateTime(2011, 5, 30, 11, 2, 30, 0));
            root.setDateMidnight(new DateMidnight(2011, 5, 30));
            root.setLocalDate(new LocalDate(2011, 5, 30));
            root.setLocalTime(new LocalTime(11, 2, 30));
            root.setLocalDateTime(new LocalDateTime(2011, 5, 30, 11, 2, 30));

            JAXBContext jc = JAXBContext.newInstance(Root.class);
            Marshaller marshaller = jc.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
            marshaller.marshal(root, System.out);
        }
    }

Output

The following is the output from our demo code:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <root>
        <dateTime>2011-05-30T11:02:30.000-04:00</dateTime>
        <dateMidnight>2011-05-30T00:00:00.000-04:00</dateMidnight>
        <localDate>2011-05-30</localDate>
        <localTime>11:02:30.000</localTime>
        <localDateTime>2011-05-30T11:02:30.000</localDateTime>
    </root>

From http://blog.bdoughan.com/2011/05/jaxb-and-joda-time-dates-and-times.html

December 29, 2011

by Blaise Doughan

· 15,725 Views

The “4+1” View Model of Software Architecture

In November 1995, while working as lead software architect at Hughes Aircraft of Canada, Philippe Kruchten published a paper entitled "Architectural Blueprints—The “4+1” View Model of Software Architecture". The intent was to come up with a mechanism to separate the different aspects of a software system into different views of the system. Why? Because different stakeholders always have different interests in a software system. Some aspects of a system are relevant to the developers; others are relevant to system administrators. The developers want to know about things like classes; system administrators want to know about deployment, hardware and network configurations and don't care about classes. Similar points can be made for testers, project managers and customers. Kruchten thought it made sense to decompose architecture into distinct views so stakeholders could get what they wanted. In total there were 5 views in his approach but he decided to call it 4 + 1. We'll discuss why it's called 4 + 1 later! But first, let's have a look at each of the different views.

The logical view

This contains information about the various parts of the system. In UML the logical view is modelled using Class, Object, State machine and Interaction diagrams (e.g. Sequence diagrams). Its relevance is really to developers.

The process view

This describes the concurrent processes within the system. It encompasses some non-functional requirements such as performance and availability. In UML, Activity diagrams - which can be used to model concurrent behaviour - are used to model the process view.

The development view

The development view focusses on software modules and subsystems. In UML, Package and Component diagrams are used to model the development view.

The physical view

The physical view describes the physical deployment of the system. For example, how many nodes are used and what is deployed on what node.
Thus, the physical view concerns some non-functional requirements such as scalability and availability. In UML, Deployment diagrams are used to model the physical view.

The use case view

This view describes the functionality of the system from the perspective of the outside world. It contains diagrams describing what the system is supposed to do from a black box perspective. This view typically contains Use Case diagrams. All other views use this view to guide them.

Why is it called the 4 + 1 instead of just 5?

Well, this is because of the special significance the use case view has. When all other views are finished, it's effectively redundant. However, all other views would not be possible without it. It details the high-level requirements of the system. The other views detail how those requirements are realised.

4 + 1 came before UML

It's important to remember the 4 + 1 approach was put forward two years before the introduction of UML, which did not appear in its first guise until 1997. UML is how most enterprise architectures are modelled and the 4 + 1 approach is still relevant to UML today. UML 2.0 has 13 different types of diagrams - each diagram type can be categorised into one of the 4 + 1 views. UML is 4 + 1 friendly!

So is it important?

The 4 + 1 approach isn't just about satisfying different stakeholders. It makes modelling easier to do because it makes it easier to organise. A typical project will contain numerous diagrams of the various types. For example, a project may contain a few hundred sequence diagrams and several class diagrams. Grouping diagrams of similar types and purpose puts an emphasis on separating concerns. Sure isn't it just the same with Java? Grouping Java classes of similar purpose and related responsibilities into packages means organisation is better. Similarly, grouping different components into different jar files means organisation is better.
Modelling tools will usually support the 4 + 1 approach and this means projects will have templates for how to split the various types of diagrams. When projects across a company follow industry standard templates, things are again better organised. The 4 + 1 approach also gives architects a way to prioritise modelling concerns. It is rare that a project will have enough time to model every single diagram possible for an architecture. Architects can prioritise different views. For example, for a business domain intensive project it would make sense to prioritise the logical view. In a project with high concurrency and complex timing it would make sense to ensure the process view gets ample time. Similarly, the 4 + 1 approach makes it possible for stakeholders to get the parts of the model that are relevant to them.

References:
Architectural Blueprints—The “4+1” View Model of Software Architecture (paper): http://www.cs.ubc.ca/~gregor/teaching/papers/4+1view-architecture.pdf
Learning UML 2.0 by Russ Miles & Kim Hamilton. O'Reilly

From http://dublintech.blogspot.com/2011/05/41-view-model-of-software-architecture.html

December 28, 2011

by Alex Staveley

· 53,932 Views

Enabling JMX in Hibernate, Ehcache, Quartz, DBCP and Spring

A collection of short how-tos for enabling JMX in several popular Java technologies. Continuing our journey with JMX (see: ...JMX for human beings) we will learn how to enable JMX support (typically statistics and monitoring capabilities) in some popular frameworks. Most of this information can be found on the projects' home pages, but I decided to collect it here with the addition of a few useful tips.

Hibernate (with Spring support)

Exposing Hibernate statistics with JMX is pretty simple; however, some nasty workarounds are required when the JPA API is used to obtain the underlying SessionFactory:

    class JmxLocalContainerEntityManagerFactoryBean() extends LocalContainerEntityManagerFactoryBean {

        override def createNativeEntityManagerFactory() = {
            val managerFactory = super.createNativeEntityManagerFactory()
            registerStatisticsMBean(managerFactory)
            managerFactory
        }

        def registerStatisticsMBean(managerFactory: EntityManagerFactory) {
            managerFactory match {
                case impl: EntityManagerFactoryImpl =>
                    val mBean = new StatisticsService();
                    mBean.setStatisticsEnabled(true)
                    mBean.setSessionFactory(impl.getSessionFactory);
                    val name = new ObjectName("org.hibernate:type=Statistics,application=spring-pitfalls")
                    ManagementFactory.getPlatformMBeanServer.registerMBean(mBean, name);
                case _ =>
            }
        }
    }

Note that I have created a subclass of Spring's built-in LocalContainerEntityManagerFactoryBean. By overriding the createNativeEntityManagerFactory() method I can access the EntityManagerFactory, and by downcasting it to org.hibernate.ejb.EntityManagerFactoryImpl I am able to register the Hibernate MBean. One more thing is left: obviously we have to use our custom subclass instead of org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean. Also, in order to collect the actual statistics instead of just seeing zeroes all the way down, we must set the hibernate.generate_statistics flag.
    @Bean
    def entityManagerFactoryBean() = {
        val entityManagerFactoryBean = new JmxLocalContainerEntityManagerFactoryBean()
        entityManagerFactoryBean.setDataSource(dataSource())
        entityManagerFactoryBean.setJpaVendorAdapter(jpaVendorAdapter())
        entityManagerFactoryBean.setPackagesToScan("com.blogspot.nurkiewicz")
        entityManagerFactoryBean.setJpaPropertyMap(
            Map(
                "hibernate.hbm2ddl.auto" -> "create",
                "hibernate.format_sql" -> "true",
                "hibernate.ejb.naming_strategy" -> classOf[ImprovedNamingStrategy].getName,
                "hibernate.generate_statistics" -> true.toString
            ).asJava
        )
        entityManagerFactoryBean
    }

Here is a sample of what we can expect to see in JVisualVM (don't forget to install all plugins!):

In addition we get nice Hibernate logging:

    HQL: select generatedAlias0 from Book as generatedAlias0, time: 10ms, rows: 20

EhCache

Monitoring caches is very important, especially in applications where you expect values to generally be present in the cache. I tend to query the database as often as needed to avoid unnecessary method arguments or local caching. Everything to make the code as simple as possible. However, this approach only works when caching on the database layer works correctly. Similar to Hibernate, enabling JMX monitoring in EhCache is a two-step process. First you need to expose the provided MBean in the MBeanServer:

    @Bean(initMethod = "init", destroyMethod = "dispose")
    def managementService = new ManagementService(ehCacheManager(), platformMBeanServer(), true, true, true, true, true)

    @Bean
    def platformMBeanServer() = ManagementFactory.getPlatformMBeanServer

    def ehCacheManager() = ehCacheManagerFactoryBean.getObject

    @Bean
    def ehCacheManagerFactoryBean = {
        val ehCacheManagerFactoryBean = new EhCacheManagerFactoryBean
        ehCacheManagerFactoryBean.setShared(true)
        ehCacheManagerFactoryBean.setCacheManagerName("spring-pitfalls")
        ehCacheManagerFactoryBean
    }

Note that I explicitly set the CacheManager name.
This is not required, but this name is used as part of the MBean name and the default one contains a hashCode value, which is not very pleasant. The final touch is to enable statistics on a per-cache basis: Now we can happily monitor various caching characteristics of every cache separately: As we can see, the percentage of cache misses increases. Never a good thing. Even if we don't enable cache statistics, enabling JMX is still a good idea since we get a lot of management operations for free, including flushing and clearing caches (useful during debugging and testing).

Quartz scheduler

In my humble opinion Quartz scheduler is a very underrated library, but I will write a separate article about it. This time we will only learn how to monitor it via JMX. Fortunately it's as simple as adding:

    org.quartz.scheduler.jmx.export=true

to the quartz.properties file. The JMX support in Quartz could have been slightly broader, but one can still query e.g. which jobs are currently running. By the way, the new major version of Quartz (2.x) brings very nice DSL-like support for scheduling:

    val job = newJob(classOf[MyJob])
    val trigger = newTrigger().
        withSchedule(
            repeatSecondlyForever()
        ).
        startAt(
            futureDate(30, SECOND)
        )
    scheduler.scheduleJob(job.build(), trigger.build())

Apache Commons DBCP

Apache Commons DBCP is the most reasonable JDBC pooling library I have come across. There is also c3p0, but it doesn't seem to be actively developed any more. Tomcat JDBC Connection Pool looked promising, but since it's bundled in Tomcat, your JDBC drivers can no longer be packaged in the WAR. The only problem with DBCP is that it does not support JMX. At all (see this two and a half year old issue). Fortunately this can be easily worked around. Besides, we will learn how to use Spring's built-in JMX support. It looks like the standard BasicDataSource has everything we need; all we have to do is expose the existing metrics via JMX.
With Spring it is dead-simple – just subclass BasicDataSource and add the @ManagedAttribute annotation over the desired attributes:

    @ManagedResource
    class ManagedBasicDataSource extends BasicDataSource {

        @ManagedAttribute override def getNumActive = super.getNumActive
        @ManagedAttribute override def getNumIdle = super.getNumIdle
        @ManagedAttribute def getNumOpen = getNumActive + getNumIdle

        @ManagedAttribute override def getMaxActive: Int= super.getMaxActive
        @ManagedAttribute override def setMaxActive(maxActive: Int) {
            super.setMaxActive(maxActive)
        }

        @ManagedAttribute override def getMaxIdle = super.getMaxIdle
        @ManagedAttribute override def setMaxIdle(maxIdle: Int) {
            super.setMaxIdle(maxIdle)
        }

        @ManagedAttribute override def getMinIdle = super.getMinIdle
        @ManagedAttribute override def setMinIdle(minIdle: Int) {
            super.setMinIdle(minIdle)
        }

        @ManagedAttribute override def getMaxWait = super.getMaxWait
        @ManagedAttribute override def setMaxWait(maxWait: Long) {
            super.setMaxWait(maxWait)
        }

        @ManagedAttribute override def getUrl = super.getUrl
        @ManagedAttribute override def getUsername = super.getUsername
    }

Here are a few data source metrics going crazy during a load test:

JMX support in the Spring framework itself is pretty simple. As you have seen above, exposing an arbitrary attribute or operation is just a matter of adding an annotation. You only have to remember to enable JMX support using either XML or Java (also see: SPR-8943 : Annotation equivalent to with @Configuration): or:

    @Bean
    def annotationMBeanExporter() = new AnnotationMBeanExporter()

This article wasn't particularly exciting. However, the knowledge of JMX metrics will enable us to write simple yet fancy dashboards in no time. Stay tuned!

From http://nurkiewicz.blogspot.com/2011/12/enabling-jmx-in-hibernate-ehcache-qurtz.html
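For readers who want to see what sits underneath annotations like @ManagedAttribute, here is a minimal sketch that uses only the plain JMX API shipped with the JDK. It registers a standard MBean with the platform MBeanServer and reads one attribute back. PoolStats, PoolStatsMBean and the example.jmx object name are hypothetical, chosen to echo the data source metrics above; nothing here is DBCP or Spring code.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxRegistrationDemo {

    /** Standard MBean contract: a public interface named after the class plus "MBean". */
    public interface PoolStatsMBean {
        int getNumActive();
        int getNumIdle();
        int getNumOpen();
    }

    /** Hypothetical statistics holder standing in for a real connection pool. */
    public static class PoolStats implements PoolStatsMBean {
        private final int active;
        private final int idle;

        public PoolStats(int active, int idle) {
            this.active = active;
            this.idle = idle;
        }

        @Override public int getNumActive() { return active; }
        @Override public int getNumIdle() { return idle; }
        @Override public int getNumOpen() { return active + idle; }
    }

    /** Registers the bean and reads the NumOpen attribute back through JMX. */
    static Object registerAndRead(int active, int idle) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("example.jmx:type=PoolStats,application=demo");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name); // allow the demo to be run repeatedly
            }
            server.registerMBean(new PoolStats(active, idle), name);
            return server.getAttribute(name, "NumOpen");
        } catch (Exception e) {
            throw new IllegalStateException("JMX registration failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("NumOpen = " + registerAndRead(3, 2));
    }
}
```

Each getter becomes a read-only attribute named after the method without its "get" prefix, which is the same naming that shows up in tools such as JVisualVM.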

December 22, 2011

by Tomasz Nurkiewicz

· 12,629 Views

How to create offline HTML5 web apps in 5 easy steps

Among all the cool new features introduced by HTML5, the possibility of caching web pages for offline use is definitely one of my favorites. Today, I’m glad to show you how you can create a page that will be available for offline browsing.

Getting started

1 – Add HTML5 doctype

The first thing to do is create a valid HTML5 document. The HTML5 doctype is easier to remember than the ones used for XHTML:

    <!DOCTYPE html>
    ...

Create a file named index.html, or get the example files from my CSS3 media queries article to use as a basis for this tutorial. In case you need it, the full HTML5 specs are available on the W3C website.

2 – Add .htaccess support

The file we’re going to create to cache our web page is called a manifest file. Before creating it, we first have to add a directive to the .htaccess file (assuming your server is Apache). Open the .htaccess file, which is located in your website root, and add the following code:

    AddType text/cache-manifest .manifest

This directive makes sure that every .manifest file is served as text/cache-manifest. If the file isn’t served with that type, the whole manifest will have no effect and the page will not be available offline.

3 – Create the manifest file

Now, things are going to be more interesting as we create a manifest file. Create a new file and save it as offline.manifest. Then, paste the following code in it. I’ll explain it later.

    CACHE MANIFEST
    #This is a comment

    CACHE:
    index.html
    style.css
    image.jpg
    image-med.jpg
    image-small.jpg
    notre-dame.jpg

Right now, you have a perfectly working manifest file. The way it works is very simple: after the CACHE: declaration, you list each file you want to make available offline. That’s enough for caching a simple web page like the one from my example, but HTML5 caching has other interesting possibilities.
For example, consider the following manifest file:

    CACHE MANIFEST
    #This is a comment

    CACHE:
    index.html
    style.css

    NETWORK:
    search.php
    login.php

    FALLBACK:
    /api offline.html

As in the first manifest file, we have a CACHE: declaration that caches index.html and style.css. But we also have the NETWORK: declaration, which is used to specify files that shouldn’t be cached, such as a login page. The last declaration is FALLBACK:. This declaration allows you to redirect the user to a particular file (in this example, offline.html) if a resource (/api) isn’t available offline.

4 – Link your manifest file to the html document

Now, both your manifest file and your main html document are ready. The only thing you still have to do is to link the manifest file to the html document. Doing this is easy: simply add the manifest attribute to the html element as shown below:

    <html manifest="offline.manifest">

5 – Test it

Once done, you’re ready to go. If you visit your index.html file with Firefox 3.5+, you should see a banner like this one:

The other browsers I’ve tested (Chrome, Safari, Android and iPhone) do not warn about the file caching, and the file is automatically cached. Below you’ll find the browser compatibility of this technique. As usual, Internet Explorer does not support it.

IE: No support
Firefox: 3.5+
Safari: 4.0+
Chrome: 5.0+
Opera: 10.6+
iPhone: 2.1+
Android: 2.0+

Source: http://www.catswhocode.com/blog/how-to-create-offline-html5-web-apps-in-5-easy-steps
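To make the role of the three manifest sections concrete, here is a small sketch that groups the entries of a manifest by section. It is a simplified reading of the format for illustration only, not the algorithm browsers actually run, and the class name ManifestSections is hypothetical. It assumes section headers end with a colon and that entries listed before any header belong to the CACHE section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ManifestSections {

    /** Groups manifest entries under CACHE, NETWORK and FALLBACK. */
    static Map<String, List<String>> parse(String manifest) {
        String[] lines = manifest.split("\\R");
        if (lines.length == 0 || !lines[0].trim().equals("CACHE MANIFEST")) {
            throw new IllegalArgumentException("first line must be CACHE MANIFEST");
        }
        Map<String, List<String>> sections = new LinkedHashMap<>();
        for (String section : new String[] {"CACHE", "NETWORK", "FALLBACK"}) {
            sections.put(section, new ArrayList<>());
        }
        String current = "CACHE"; // entries before any header are cached explicitly
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            boolean isHeader = line.endsWith(":")
                    && sections.containsKey(line.substring(0, line.length() - 1));
            if (isHeader) {
                current = line.substring(0, line.length() - 1);
            } else {
                sections.get(current).add(line);
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        String manifest = String.join("\n",
                "CACHE MANIFEST",
                "#This is a comment",
                "CACHE:",
                "index.html",
                "style.css",
                "NETWORK:",
                "search.php",
                "login.php",
                "FALLBACK:",
                "/api offline.html");
        System.out.println(parse(manifest));
    }
}
```

A FALLBACK entry is kept here as a single line with two parts (the resource prefix and the page to serve instead); splitting it further is left out to keep the sketch short.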

December 22, 2011

by Jean-Baptiste Jung

· 24,071 Views