Databases Resources

The Latest Databases Topics

Practical PHP Refactoring: Convert Procedural Design to Objects

Even in languages where there are no constructs but classes, there is no constraint that can force a programmer into writing object-oriented code. In many cases, just wrapping a series of functions into classes do not result in the design. The Convert Procedural Design to Objects has great benefits, but it reaches a very large scale (potentially the whole application). What does object-oriented mean? In 2011, there is no reason to write procedural code anymore in a web application: all libraries and frameworks worth inclusion are object-oriented, even a part of the PHP code (SPL but most importantly PDO, and even DateTime). All other successful languages in the web space are either object-oriented, functional, or both. Software design literature is based on objects and their patterns. However, using class and extends keywords does not suffice to produce an object-oriented design; entire books are written on this topic. This refactoring tries to solve a common case of procedural design shoehorned into an object model: classes containing behavior, and depending on many other ones. dumb classes only being a container for data, or worse primitive types with no methods at all. It is common in procedural design to segregate responsibilities in this procedure/record pattern, but high level methods can be added on these dumb classes to encapsulate a bit of the data they are containing, and simplify the procedural classes using them. It is just a starting point towards "object-orientation", but often an overlooked one. The Tell Don't Ask principle summarizes what we would like to do in very few words: Procedural code gets information then makes decisions. Object-oriented code tells objects to do things. -- Alec Sharp Instead of an infinite series of calls from a procedure to getters and setters, we want to pass messages even to the lower level objects. Steps A preliminary step is to turn primitive data structures into a data object wrapping them and providing getters. If you see variables like arrays or strings passed around in the code to refactor, this step is necessary to provide a class to accomodate potential new methods. Inline the procedural code into a single class. This step makes us able to extract code along different lines than the original ones in the rest of the refactoring: for example, procedural code is often divided in temporal steps, while objects may segregate different parts of the available data instead. Extract methods on the procedural class. See the next steps for hints on what to extract. Methods that have one of the dumb objects as argument can be moved on the object itself, by eliminating it as a parameter but maintaining the remaining ones. Move Method should free the original giant class from any unrelated responsibilities. The goal is to remove logic from the procedural class as much as possibile, going into an opposite direction with regard to the original design; Fowler notes that in some cases the procedural class totally disappears. Example One of my popular examples is invoice calculation: the computation of fields like total price and due taxes from a series of information. In this procedural design, we have one invoice and a bunch of rows modelled with Primitive Obsession (as arrays). assertEquals(4640, $invoice->total()); } } class Invoice { private $rows; public function __construct($rows) { $this->rows = $rows; } public function total() { $total = 0; foreach ($this->rows as $row) { $rowTotal = $row[0] + $row[0] * $row[1] / 100; $total += $rowTotal; } return $total; } } We introduce the Row class, but the design is now worse: it adds a bunch of lines of code (the new class) without the new entity giving us something in return. The Row object has no responsibilities, and we just have to write getters and sometimes setters. At least we're writing down parts of our model for documentation (giving names to the net price and tax rate numbers), but we aren't sure this model is the most versatile one. assertEquals(4640, $invoice->total()); } } class Invoice { private $rows; public function __construct($rows) { $this->rows = $rows; } public function total() { $total = 0; foreach ($this->rows as $row) { $rowTotal = $row->getNetPrice() + $row->getTaxRate() * $row->getNetPrice() / 100; $total += $rowTotal; } return $total; } } class Row { public function __construct($netPrice, $taxRate) { $this->netPrice = $netPrice; $this->taxRate = $taxRate; } public function getNetPrice() { return $this->netPrice; } public function getTaxRate() { return $this->taxRate; } } For the scope of this small example all business logic is already in a single class, thus we don't have to inline anything. Let's extract a first method instead: class Invoice { private $rows; public function __construct($rows) { $this->rows = $rows; } public function total() { $total = 0; foreach ($this->rows as $row) { $total += $this->rowTotal($row); } return $total; } public function rowTotal($row) { return $row->getNetPrice() + $row->getTaxRate() * $row->getNetPrice() / 100; } } That was a small enough step. In a real situation, the extracted code may be 100-line long, so we would want to test the extraction has been successful before doing anything else. In fact, since the test still passes, we can notice this method has a Row object in its arguments, so it can be moved on Row now that its logic has been clearly isolated: $this->field references should become additional parameters of the method before moving it. Other parameters should just remain formal parameters. Calls to $this->anotherMethod() would be more difficult to treat, as you have the options of moving anothetMethod() in the Row class too, or to extract an interface containing anotherMethod() and pass $this. While moving the code, we change the references to $row to $this, and check that the method scope is public. We also rename the method to total() instead of rowTotal(). { private $rows; public function __construct($rows) { $this->rows = $rows; } public function total() { $total = 0; foreach ($this->rows as $row) { $total += $row->total(); } return $total; } } class Row { public function __construct($netPrice, $taxRate) { $this->netPrice = $netPrice; $this->taxRate = $taxRate; } public function getNetPrice() { return $this->netPrice; } public function getTaxRate() { return $this->taxRate; } public function total() { return $this->getNetPrice() + $this->getTaxRate() * $this->getNetPrice() / 100; } } Finally, we inline the getters, since they're not used from outside the Row class. They will be introduced again in the future in case there is a real need for them: as a rule of thumb we avoid exposing any state from Row that is not necessary. class Row { public function __construct($netPrice, $taxRate) { $this->netPrice = $netPrice; $this->taxRate = $taxRate; } public function total() { return $this->netPrice + $this->taxRate * $this->netPrice / 100; } }

February 8, 2012

by Giorgio Sironi

· 18,244 Views

wxPython: wx.ListCtrl Tips and Tricks

Previously, we covered some tips and tricks for the Grid control. In this article, we will go over a few tips and tricks for the wx.ListCtrl widget when it’s in “report” mode. Take a look at the tips below: How to create a simple ListCtrl How to sort the rows of a ListCtrl How to make the ListCtrl cells editable in place Associating objects with ListCtrl rows Alternate the row colors of a ListCtrl How to create a simple ListCtrl The list control is a pretty common widget. In Windows, you will see the list control in Windows Explorer. It has four modes: icon, small icon, list, and report. They roughly match up with icons, tiles, list, and details views in Windows Explorer respectively. We’re going to focus on the ListCtrl in Report mode because that’s the mode that most developers use it in. Here’s a simple example of how to create a list control: import wx ######################################################################## class MyForm(wx.Frame): #---------------------------------------------------------------------- def __init__(self): wx.Frame.__init__(self, None, wx.ID_ANY, "List Control Tutorial") # Add a panel so it looks the correct on all platforms panel = wx.Panel(self, wx.ID_ANY) self.index = 0 self.list_ctrl = wx.ListCtrl(panel, size=(-1,100), style=wx.LC_REPORT |wx.BORDER_SUNKEN ) self.list_ctrl.InsertColumn(0, 'Subject') self.list_ctrl.InsertColumn(1, 'Due') self.list_ctrl.InsertColumn(2, 'Location', width=125) btn = wx.Button(panel, label="Add Line") btn.Bind(wx.EVT_BUTTON, self.add_line) sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5) sizer.Add(btn, 0, wx.ALL|wx.CENTER, 5) panel.SetSizer(sizer) #---------------------------------------------------------------------- def add_line(self, event): line = "Line %s" % self.index self.list_ctrl.InsertStringItem(self.index, line) self.list_ctrl.SetStringItem(self.index, 1, "01/19/2010") self.list_ctrl.SetStringItem(self.index, 2, "USA") self.index += 1 #---------------------------------------------------------------------- # Run the program if __name__ == "__main__": app = wx.App(False) frame = MyForm() frame.Show() app.MainLoop() As you can probably tell from the code above, it’s really easy to create a ListCtrl instance. Notice that we set the style to report mode using the wx.LC_REPORT flag. To add column headers, we call the ListCtrl’s InsertColumn method and pass an integer to tell the ListCtrl which column is which and a string for the user’s convenience. Yes, the columns are zero-based, so the first column is number zero, the second column is number one, etc. The next important piece is contained in the button’s event handler, add_line, where we learn how to add rows of data to the ListCtrl. The typical method to use is the InsertStringItem method. If you wanted an image added to each row as well, then you’d use a more complicated method like InsertColumnInfo along with the InsertImageStringItem method. You can see how to use them in the wxPython demo. We’re sticking with the easy stuff in this article. Anyway, when you call InsertStringItem you give it the correct row index and a string. You use the SetStringItem method to set the data for the other columns of the row. Notice that the SetStringItem method requires three parameters: the row index, the column index and a string. Lastly, we increment the row index so we don’t overwrite anything. Now you can get out there and make your own! Let’s continue and find out how to sort rows! How to sort the rows of a ListCtrl The ListCtrl widget has had some extra scripts written for it that add functionality to the widget. These scripts are called mixins. You can read about them here. For this recipe, we’ll be using the ColumnSorterMixin mixin. The code below is a stripped down version of one of the wxPython demo examples. import wx import wx.lib.mixins.listctrl as listmix musicdata = { 0 : ("Bad English", "The Price Of Love", "Rock"), 1 : ("DNA featuring Suzanne Vega", "Tom's Diner", "Rock"), 2 : ("George Michael", "Praying For Time", "Rock"), 3 : ("Gloria Estefan", "Here We Are", "Rock"), 4 : ("Linda Ronstadt", "Don't Know Much", "Rock"), 5 : ("Michael Bolton", "How Am I Supposed To Live Without You", "Blues"), 6 : ("Paul Young", "Oh Girl", "Rock"), } ######################################################################## class TestListCtrl(wx.ListCtrl): #---------------------------------------------------------------------- def __init__(self, parent, ID=wx.ID_ANY, pos=wx.DefaultPosition, size=wx.DefaultSize, style=0): wx.ListCtrl.__init__(self, parent, ID, pos, size, style) ######################################################################## class TestListCtrlPanel(wx.Panel, listmix.ColumnSorterMixin): #---------------------------------------------------------------------- def __init__(self, parent): wx.Panel.__init__(self, parent, -1, style=wx.WANTS_CHARS) self.index = 0 self.list_ctrl = TestListCtrl(self, size=(-1,100), style=wx.LC_REPORT |wx.BORDER_SUNKEN |wx.LC_SORT_ASCENDING ) self.list_ctrl.InsertColumn(0, "Artist") self.list_ctrl.InsertColumn(1, "Title", wx.LIST_FORMAT_RIGHT) self.list_ctrl.InsertColumn(2, "Genre") items = musicdata.items() index = 0 for key, data in items: self.list_ctrl.InsertStringItem(index, data[0]) self.list_ctrl.SetStringItem(index, 1, data[1]) self.list_ctrl.SetStringItem(index, 2, data[2]) self.list_ctrl.SetItemData(index, key) index += 1 # Now that the list exists we can init the other base class, # see wx/lib/mixins/listctrl.py self.itemDataMap = musicdata listmix.ColumnSorterMixin.__init__(self, 3) self.Bind(wx.EVT_LIST_COL_CLICK, self.OnColClick, self.list_ctrl) sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5) self.SetSizer(sizer) #---------------------------------------------------------------------- # Used by the ColumnSorterMixin, see wx/lib/mixins/listctrl.py def GetListCtrl(self): return self.list_ctrl #---------------------------------------------------------------------- def OnColClick(self, event): print "column clicked" event.Skip() ######################################################################## class MyForm(wx.Frame): #---------------------------------------------------------------------- def __init__(self): wx.Frame.__init__(self, None, wx.ID_ANY, "List Control Tutorial") # Add a panel so it looks the correct on all platforms panel = TestListCtrlPanel(self) #---------------------------------------------------------------------- # Run the program if __name__ == "__main__": app = wx.App(False) frame = MyForm() frame.Show() app.MainLoop() This code is a little on the odd side in that we have inherit the mixin in the wx.Panel based class rather than the wx.ListCtrl class. You can do it either way though as long as you rearrange the code correctly. Anyway, we are going to home in on the key differences between this example and the previous one. The first difference of major importance is in the looping construct where we insert the list control’s data. Here we include the list control’s SetItemData method to include the necessary inner-workings that allow the sorting to take place. As you might have guessed, this method associates the row index with the music data dict’s key. Next we instantiate the ColumnSorterMixin and tell it how many columns there are in the list control. We could have left the EVT_LIST_COL_CLICK binding off this example as it has nothing to do with the actual sorting of the rows, but in the interest of increasing your knowledge, it was left in. All it does is show you how to catch the user’s column click event. The rest of the code is self-explanatory. If you want to know about the requirements for this mixin, especially when you have images in your rows, please see the relevant section in the source (i.e. listctrl.py). Now, wasn’t that easy? Let’s continue our journey and find out how to make the cells editable! How to make the ListCtrl cells editable in place Sometimes, the programmer will want to allow the user to click on a cell and edit it in place. This is kind of a lightweight version of the wx.grid.Grid control. Here’s an example: import wx import wx.lib.mixins.listctrl as listmix ######################################################################## class EditableListCtrl(wx.ListCtrl, listmix.TextEditMixin): ''' TextEditMixin allows any column to be edited. ''' #---------------------------------------------------------------------- def __init__(self, parent, ID=wx.ID_ANY, pos=wx.DefaultPosition, size=wx.DefaultSize, style=0): """Constructor""" wx.ListCtrl.__init__(self, parent, ID, pos, size, style) listmix.TextEditMixin.__init__(self) ######################################################################## class MyPanel(wx.Panel): """""" #---------------------------------------------------------------------- def __init__(self, parent): """Constructor""" wx.Panel.__init__(self, parent) rows = [("Ford", "Taurus", "1996", "Blue"), ("Nissan", "370Z", "2010", "Green"), ("Porche", "911", "2009", "Red") ] self.list_ctrl = EditableListCtrl(self, style=wx.LC_REPORT) self.list_ctrl.InsertColumn(0, "Make") self.list_ctrl.InsertColumn(1, "Model") self.list_ctrl.InsertColumn(2, "Year") self.list_ctrl.InsertColumn(3, "Color") index = 0 for row in rows: self.list_ctrl.InsertStringItem(index, row[0]) self.list_ctrl.SetStringItem(index, 1, row[1]) self.list_ctrl.SetStringItem(index, 2, row[2]) self.list_ctrl.SetStringItem(index, 3, row[3]) index += 1 sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5) self.SetSizer(sizer) ######################################################################## class MyFrame(wx.Frame): """""" #---------------------------------------------------------------------- def __init__(self): """Constructor""" wx.Frame.__init__(self, None, wx.ID_ANY, "Editable List Control") panel = MyPanel(self) self.Show() #---------------------------------------------------------------------- if __name__ == "__main__": app = wx.App(False) frame = MyFrame() app.MainLoop() In this script, we put the TextEditMixin in our wx.ListCtrl class instead of our wx.Panel, which is the opposite of the previous example. The mixin itself does all the heavy lifting. Again, you’ll have to check out the mixin’s source to really understand how it works. Associating objects with ListCtrl rows This subject comes up a lot: How do I associate data (i.e. objects) with my ListCtrl’s rows? Well, we’re going to find out exactly how to do that with the following code: import wx ######################################################################## class Car(object): """""" #---------------------------------------------------------------------- def __init__(self, make, model, year, color="Blue"): """Constructor""" self.make = make self.model = model self.year = year self.color = color ######################################################################## class MyPanel(wx.Panel): """""" #---------------------------------------------------------------------- def __init__(self, parent): """Constructor""" wx.Panel.__init__(self, parent) rows = [Car("Ford", "Taurus", "1996"), Car("Nissan", "370Z", "2010"), Car("Porche", "911", "2009", "Red") ] self.list_ctrl = wx.ListCtrl(self, size=(-1,100), style=wx.LC_REPORT |wx.BORDER_SUNKEN ) self.list_ctrl.Bind(wx.EVT_LIST_ITEM_SELECTED, self.onItemSelected) self.list_ctrl.InsertColumn(0, "Make") self.list_ctrl.InsertColumn(1, "Model") self.list_ctrl.InsertColumn(2, "Year") self.list_ctrl.InsertColumn(3, "Color") index = 0 self.myRowDict = {} for row in rows: self.list_ctrl.InsertStringItem(index, row.make) self.list_ctrl.SetStringItem(index, 1, row.model) self.list_ctrl.SetStringItem(index, 2, row.year) self.list_ctrl.SetStringItem(index, 3, row.color) self.myRowDict[index] = row index += 1 sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5) self.SetSizer(sizer) #---------------------------------------------------------------------- def onItemSelected(self, event): """""" currentItem = event.m_itemIndex car = self.myRowDict[currentItem] print car.make print car.model print car.color print car.year ######################################################################## class MyFrame(wx.Frame): """""" #---------------------------------------------------------------------- def __init__(self): """Constructor""" wx.Frame.__init__(self, None, wx.ID_ANY, "List Control Tutorial") panel = MyPanel(self) self.Show() #---------------------------------------------------------------------- if __name__ == "__main__": app = wx.App(False) frame = MyFrame() app.MainLoop() The list control widget actually doesn’t have a built-in way to accomplish this feat. If you want that, then you’ll want to check out the ObjectListView widget, which wraps the ListCtrl and gives it a lot more functionality. In the meantime, we’ll take a minute and go over the code above. The first piece is just a plain Car class with four attributes. Then in the MyPanel class, we create a list of Car objects that we’ll use for the ListCtrl’s data. To add the data to the ListCtrl, we use a for loop to iterate over the list. We also associate each row with a Car object using a Python dictionary. We use the row’s index for the key and the dict’s value ends up being the Car object. This allows us to access all the Car/row object’s data later on in the onItemSelected method. Let’s check that out! In onItemSelected, we grab the row’s index with the following little trick: event.m_itemIndex. Then we use that value as the key for our dictionary so that we can gain access to the Car object associated with that row. At this point, we just print out all the Car object’s attributes, but you could do whatever you want here. This basic idea could easily be extended to use a result set from a SqlAlchemy query for the ListCtrl’s data. Hopefully you get the general idea. Now if you were paying close attention, like Robin Dunn (creator of wxPython) was, then you might notice some really silly logic errors in this code. Did you find them? Well, you won’t see it unless you sort the rows, delete a row or insert a row. Do you see it now? Yes, I stupidly based the “unique” key in my dictionary on the row’s position, which will change if any of those events happen. So let’s look at a better example: import wx ######################################################################## class Car(object): """""" #---------------------------------------------------------------------- def __init__(self, make, model, year, color="Blue"): """Constructor""" self.id = id(self) self.make = make self.model = model self.year = year self.color = color ######################################################################## class MyPanel(wx.Panel): """""" #---------------------------------------------------------------------- def __init__(self, parent): """Constructor""" wx.Panel.__init__(self, parent) rows = [Car("Ford", "Taurus", "1996"), Car("Nissan", "370Z", "2010"), Car("Porche", "911", "2009", "Red") ] self.list_ctrl = wx.ListCtrl(self, size=(-1,100), style=wx.LC_REPORT |wx.BORDER_SUNKEN ) self.list_ctrl.Bind(wx.EVT_LIST_ITEM_SELECTED, self.onItemSelected) self.list_ctrl.InsertColumn(0, "Make") self.list_ctrl.InsertColumn(1, "Model") self.list_ctrl.InsertColumn(2, "Year") self.list_ctrl.InsertColumn(3, "Color") index = 0 self.myRowDict = {} for row in rows: self.list_ctrl.InsertStringItem(index, row.make) self.list_ctrl.SetStringItem(index, 1, row.model) self.list_ctrl.SetStringItem(index, 2, row.year) self.list_ctrl.SetStringItem(index, 3, row.color) self.list_ctrl.SetItemData(index, row.id) self.myRowDict[row.id] = row index += 1 sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5) self.SetSizer(sizer) #---------------------------------------------------------------------- def onItemSelected(self, event): """""" currentItem = event.m_itemIndex car = self.myRowDict[self.list_ctrl.GetItemData(currentItem)] print car.make print car.model print car.color print car.year ######################################################################## class MyFrame(wx.Frame): """""" #---------------------------------------------------------------------- def __init__(self): """Constructor""" wx.Frame.__init__(self, None, wx.ID_ANY, "List Control Tutorial") panel = MyPanel(self) self.Show() #---------------------------------------------------------------------- if __name__ == "__main__": app = wx.App(False) frame = MyFrame() app.MainLoop() In this example, we add a new attribute to our Car class that creates a unique id for each instance that is created using Python’s handy id builtin. Then in the loop where we add the data to the list control, we call the widget’s SetItemData method and give it the row index and the car instance’s unique id. Now it doesn’t matter where the row ends up because it’s had the unique id affixed to it. Finally, we have to modify the onItemSelected to get the right object. The magic happens in this code: # this code was helpfully provided by Robin Dunn car = self.myRowDict[self.list_ctrl.GetItemData(currentItem)] Cool, huh? Our last example will cover how to alternate the row colors, so let’s take a look! Alternate the row colors of a ListCtrl As this section’s title suggests, we will look at how to alternate colors of the rows of a ListCtrl. Here’s the code: import wx import wx.lib.mixins.listctrl as listmix ######################################################################## class MyPanel(wx.Panel): """""" #---------------------------------------------------------------------- def __init__(self, parent): """Constructor""" wx.Panel.__init__(self, parent) rows = [("Ford", "Taurus", "1996", "Blue"), ("Nissan", "370Z", "2010", "Green"), ("Porche", "911", "2009", "Red") ] self.list_ctrl = wx.ListCtrl(self, style=wx.LC_REPORT) self.list_ctrl.InsertColumn(0, "Make") self.list_ctrl.InsertColumn(1, "Model") self.list_ctrl.InsertColumn(2, "Year") self.list_ctrl.InsertColumn(3, "Color") index = 0 for row in rows: self.list_ctrl.InsertStringItem(index, row[0]) self.list_ctrl.SetStringItem(index, 1, row[1]) self.list_ctrl.SetStringItem(index, 2, row[2]) self.list_ctrl.SetStringItem(index, 3, row[3]) if index % 2: self.list_ctrl.SetItemBackgroundColour(index, "white") else: self.list_ctrl.SetItemBackgroundColour(index, "yellow") index += 1 sizer = wx.BoxSizer(wx.VERTICAL) sizer.Add(self.list_ctrl, 0, wx.ALL|wx.EXPAND, 5) self.SetSizer(sizer) ######################################################################## class MyFrame(wx.Frame): """""" #---------------------------------------------------------------------- def __init__(self): """Constructor""" wx.Frame.__init__(self, None, wx.ID_ANY, "List Control w/ Alternate Colors") panel = MyPanel(self) self.Show() #---------------------------------------------------------------------- if __name__ == "__main__": app = wx.App(False) frame = MyFrame() app.MainLoop() The code above will alternate each row’s background color. Thus you should see yellow and white rows. We do this by calling the ListCtrl instance’s SetItemBackgroundColour method. If you were using a virtual list control, then you’d want to override the OnGetItemAttr method. To see an example of the latter method, open up your copy of the wxPython demo; there’s one in there. Wrapping Up We’ve covered a lot of ground here. You should now be able to do a lot more with your wx.ListCtrl than when you started, assuming you’re new to using it, of course. Feel free to ask questions in the comments or suggest future recipes. I hope you found this helpful! Note: All examples were tested on Windows XP with Python 2.5 and wxPython 2.8.10.1. They were also tested on Windows 7 Professional with Python 2.6 Additional Reading The official wxPython wx.ListCtrl documentation The ListControls wiki page ListCtrl Tooltips wiki page The ObjectListView website The UltimateListCtrl, a pure Python implementation now included with wxPython Source Code listctrl.zip listctrl.tar Source: http://www.blog.pythonlibrary.org/2011/01/04/wxpython-wx-listctrl-tips-and-tricks/

February 2, 2012

by Mike Driscoll

· 23,181 Views

Mapping Mongodb ISODate to Spring Roo Entity

I have been inserting log4j entries into a mongodb database and each entry has been given an ISODate timestamp: "timestamp" : ISODate("2012-01-17T22:30:19.839Z") To create a mapping for this, I had to manually add the timestamp as Spring Roo did not allow timestamp to be used as it was a reserved word. So I manually added: @DateTimeFormat(style="MM/dd/yyyy") private java.util.Date timestamp; But I started getting the following error: Invalid style specification: MM/dd/yyyy The stack trace for that error was: org.joda.time.format.DateTimeFormat.createFormatterForStyle(DateTimeFormat.java:702) org.joda.time.format.DateTimeFormat.patternForStyle(DateTimeFormat.java:212) com.comcast.uivr.web.LoggingController_Roo_Controller.ajc$interMethod$com_comcast_uivr_web_LoggingController_Roo_Controller$com_comcast_uivr_web_LoggingController$addDateTimeFormatPatterns(LoggingController_Roo_Controller.aj:98) com.comcast.uivr.web.LoggingController.ajc$interMethodDispatch2$com_comcast_uivr_web$addDateTimeFormatPatterns(LoggingController.java:1) com.comcast.uivr.web.LoggingController_Roo_Controller.ajc$interMethodDispatch1$com_comcast_uivr_web_LoggingController_Roo_Controller$com_comcast_uivr_web_LoggingController$addDateTimeFormatPatterns(LoggingController_Roo_Controller.aj) com.comcast.uivr.web.LoggingController_Roo_Controller.ajc$interMethod$com_comcast_uivr_web_LoggingController_Roo_Controller$com_comcast_uivr_web_LoggingController$list(LoggingController_Roo_Controller.aj:66) com.comcast.uivr.web.LoggingController.list(LoggingController.java:1) sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke(Method.java:597) org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:212) org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126) org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96) org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617) org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578) org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80) org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:900) org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:827) org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882) org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778) javax.servlet.http.HttpServlet.service(HttpServlet.java:617) javax.servlet.http.HttpServlet.service(HttpServlet.java:717) org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77) org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88) org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857) org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) java.lang.Thread.run(Thread.java:662) To fix this I attempted to add the ISO date format for the @DateTimeFormat @DateTimeFormat(style="yyyyMMdd'T'HHmmss.SSSZ") private java.util.Date timestamp; Which still did not work and had the error. To resolve this I shitched to use ISO.DATE_TIME as the style: @DateTimeFormat(iso=ISO.DATE_TIME) private java.util.Date timestamp; From http://www.baselogic.com/blog/development/springframework/mapping-mongodb-isodate-spring-roo-entity/

January 30, 2012

by Mick Knutson

· 23,954 Views · 2 Likes

Streaming Files from MongoDB GridFS

Not too long ago I tweeted what I felt was a small triumph on my latest project, streaming files from MongoDB GridFS for downloads (rather than pulling the whole file into memory and then serving it up). I promised to blog about this but unfortunately my specific usage was a little coupled to the domain on my project so I couldn’t just show it off as is. So I’ve put together an example node.js+GridFS application and shared it on github and will use this post to explain how I accomplished it. GridFS Module First off, special props go to tjholowaychuk who responded in the #node.js irc channel when I asked if anyone has had luck with using GridFS from mongoose. A lot of my resulting code is derived from an gist he shared with me. Anyway, to the code. I’ll describe how I’m using gridfs and after setting the ground work illustrate how simple it is to stream files from GridFS. I created a gridfs module that basically accesses GridStore through mongoose (which I use throughout my application) that can also share the db connection created when connecting mongoose to the mongodb server. mongoose = require "mongoose" request = require "request" GridStore = mongoose.mongo.GridStore Grid = mongoose.mongo.Grid ObjectID = mongoose.mongo.BSONPure.ObjectID We can’t get files from mongodb if we cannot put anything into it, so let’s create a putFile operation. exports.putFile = (path, name, options..., fn) -> db = mongoose.connection.db options = parse(options) options.metadata.filename = name new GridStore(db, name, "w", options).open (err, file) -> return fn(err) if err file.writeFile path, fn parse = (options) -> opts = {} if options.length > 0 opts = options[0] if !opts.metadata opts.metadata = {} opts This really just delegates to the putFile operation that exists in GridStore as part of the mongodb module. I also have a little logic in place to parse options, providing defaults if none were provided. One interesting feature to note is that I store the filename in the metadata because at the time I ran into a funny issue where files retrieved from gridFS had the id as the filename (even though a look in mongo reveals that the filename is in fact in the database). Now the get operation. The original implementation of this simply passed the contents as a buffer to the provided callback by calling store.readBuffer(), but this is now changed to pass the resulting store object to the callback. The value in this is that the caller can use the store object to access metadata, contentType, and other details. The user can also determine how they want to read the file (either into memory or using a ReadableStream). exports.get = (id, fn) -> db = mongoose.connection.db id = new ObjectID(id) store = new GridStore(db, id, "r", root: "fs" ) store.open (err, store) -> return fn(err) if err # band-aid if "#{store.filename}" == "#{store.fileId}" and store.metadata and store.metadata.filename store.filename = store.metadata.filename fn null, store This code just has a small blight in that it checks to see if the filename and fileId are equal. If they are, it then checks to see if metadata.filename is set and sets store.filename to the value found there. I’ve tabled the issue to investigate further later. The Model In my specific instance, I wanted to attach files to a model. In this example, let’s pretend that we have an Application for something (job, a loan application, etc) that we can attach any number of files to. Think of tax receipts, a completed application, other scanned documents. ApplicationSchema = new mongoose.Schema( name: String files: [ mongoose.Schema.Mixed ] ) ApplicationSchema.methods.addFile = (file, options, fn) -> gridfs.putFile file.path, file.filename, options, (err, result) => @files.push result @save fn Here I define files as an array of Mixed object types (meaning they can be anything) and a method addFile which basically takes an object that at least contains a path and filename attribute. It uses this to save the file to gridfs and stores the resulting gridstore file object in the files array (this contains stuff like an id, uploadDate, contentType, name, size, etc). Handling Requests This all plugs in to the request handler to handle form submissions to /new. All this entails is creating an Application model instance, adding the uploaded file from the request (in this case we named the file field “file”, hence req.files.file) and saving it. app.post "/new", (req, res) -> application = new Application() application.name = req.body.name opts = content_type: req.files.file.type application.addFile req.files.file, opts, (err, result) -> res.redirect "/" Now the sum of all this work allows us to reap the rewards by making it super simple to download a requested file from gridFS. app.get "/file/:id", (req, res) -> gridfs.get req.params.id, (err, file) -> res.header "Content-Type", file.type res.header "Content-Disposition", "attachment; filename=#{file.filename}" file.stream(true).pipe(res) Here we simply look up a file by id and use the resulting file object to set Content-Type and Content-Disposition fields and finally make use of ReadableStream::pipe to write the file out to the response object (which is an instance of WritableStream). This is the piece of magic that streams data from MongoDB to the client side. Ideas This is just a humble beginning. Other ideas include completely encapsulating gridfs within the model. Taking things further we could even turn the gridfs model into a mongoose plugin to allow completely blackboxed usage of gridfs. Feel free to check the project out and let me know if you have ideas to take it even further. Fork away! Source: http://blog.james-carr.org/2012/01/09/streaming-files-from-mongodb-gridfs/

January 23, 2012

by James Carr

· 22,418 Views

The Persistence Layer with Spring Data JPA

This is the forth of a series of articles about Persistence with Spring. This article will focus on the configuration and implementation of the persistence layer with Spring 3.1, JPA and Spring Data. For a step by step introduction about setting up the Spring context using Java based configuration and the basic Maven pom for the project, see this article. The Persistence with Spring series: Part 1 – The Persistence Layer with Spring 3.1 and Hibernate Part 3 – The Persistence Layer with Spring 3.1 and JPA Part 5 – Transaction configuration with JPA and Spring 3.1 No More DAO implementations As I discussed in a previous post, the DAO layer usually consists of a lot of boilerplate code that can and should be simplified. The advantages of such a simplification are many fold: a decrease in the number of artifacts that need to be defined and maintained, simplification and consistency of data access patterns and consistency of configuration. Spring Data takes this simplification one step forward and makes it possible to remove the DAO implementations entirely – the interface of the DAO is now the only artifact that need to be explicitly defined. The Spring Data managed DAO In order to start leveraging the Spring Data programming model with JPA, a DAO interface needs to extend the JPA specific Repository interface - JpaRepository – in Spring’s interface hierarchy. This will enable Spring Data to find this interface and automatically create an implementation for it. Also, by extending the interface we get most if not all relevant CRUD generic methods for standard data access available in the DAO. Defining custom access method and queries As discussed, by implementing one of the Repository interfaces, the DAO will already have some basic CRUD methods (and queries) defined and implemented. To define more specific access methods, Spring JPA supports quite a few options – you can either simply define a new method in the interface, or you can provide the actual JPQ query by using the @Query annotation. A third option to define custom queries is to make use of JPA Named Queries, but this has the disadvantage that it either involves XML or burdening the domain class with the queries. In addition to these, Spring Data introduces a more flexible and convenient API, similar to the JPA Criteria API, only more readable and reusable. The advantages of this API will become more pronounced when dealing with a large number of fixed queries that could potentially be more concisely expressed through a smaller number of reusable blocks that keep occurring in different combinations. Automatic Custom Queries When Spring Data creates a new Repository implementation, it analyzes all the methods defined by the interfaces and tries to automatically generate queries from the method name. While this has limitations, it is a very powerful and elegant way of defining new custom access methods with very little effort. For example, if the managed entity has a name field (and the Java Bean standard getter and setter for that field), defining the findByName method in the DAO interface will automatically generate the correct query: public interface IFooDAO extends JpaRepository< Foo, Long >{ Foo findByName( final String name ); } This is a relatively simple example; a much larger set of keywords is supported by query creation mechanism. In the case that the parser cannot match the property with the domain object field, the following exception is thrown: java.lang.IllegalArgumentException: No property nam found for type class org.rest.model.Foo Manual Custom Queries In addition to deriving the query from the method name, a custom query can be manually specified with the method level @Query annotation. For even more fine grained control over the creation of queries, such as using named parameters or modifying existing queries, the reference is a good place to start. Spring Data transaction configuration The actual implementation of the Spring Data managed DAO – SimpleJpaRepository – uses annotations to define and configure transactions. A read only @Transactional annotation is used at the class level, which is then overridden for the non read-only methods. The rest of the transaction semantics are default, but these can be easily overridden manually per method. Exception Translation without the template One of the responsibilities of Spring ORM templates (JpaTemplate, HibernateTemplate) is exception translation – translating JPA exceptions – which tie the API to JPA – to Spring’s DataAccessException hierarchy. Without the template to do that, exception translation can still be enabled by annotating the DAOs with the @Repository annotation. That, coupled with a Spring bean postprocessor will advice all @Repository beans with all the implementations of PersistenceExceptionTranslator found in the Container – to provide exception translation without using the template. The fact that exception translation is indeed active can easily be verified with an integration test: @Test( expected = DataAccessException.class ) public void whenAUniqueConstraintIsBroken_thenSpringSpecificExceptionIsThrown(){ String name = "randomName"; this.service.save( new Foo( name ) ); this.service.save( new Foo( name ) ); } Exception translation is done through proxies; in order for Spring to be able to create proxies around the DAO classes, these must not be declared final. Spring Data Configuration To activate the Spring JPA repository support, the jpa namespace is defined and used to specify the package where to DAO interfaces are located: At this point, there is no equivalent Java based configuration – support for it is however in the works. The Spring Java or XML configuration The JPA configuration with Spring 3.1 has already been carefully discussed in the previous article of this series. Spring Data also takes advantage of the Spring support for the JPA @PersistenceContext annotation which it uses to wire the EntityManager into the Spring factory bean responsible with creating the actual DAO implementations – JpaRepositoryFactoryBean. In addition to the already discussed configuration, there is one last missing piece – including the Spring Data XML configuration in the overall persistence configuration: @Configuration @EnableTransactionManagement @ImportResource( "classpath*:*springDataConfig.xml" ) public class PersistenceJPAConfig{ ... } The Maven configuration In addition to the Maven configuration for JPA defined in a previous article, the spring-data-jpa dependency is addeed: org.springframework.data spring-data-jpa 1.0.2.RELEASE Conclusion This article covered the configuration and implementation of the persistence layer with Spring 3.1, JPA 2 and Spring JPA (part of the Spring Data umbrella project), using both XML and Java based configuration. The various method of defining more advanced custom queries are discussed, as well as configuration with the new jpa namespace and transactional semantics. The final result is a new and elegant take on data access with Spring, with almost no actual implementation work. You can check out the full implementation in the github project. From the originalThe Persistence Layer with Spring Data JPA of the Persistence with Spring series

January 20, 2012

by Eugen Paraschiv

· 154,998 Views · 2 Likes

Local and Distributed Graph Traversal Engines

in the graph database space, there are two types of traversal engines: local and distributed. local traversal engines are typically for single-machine graph databases and are used for real-time production applications. distributed traversal engines are typically for multi-machine graph databases and are used for batch processing applications. this divide is quite sharp in the community, but there is nothing that prevents the unification of both models. a discussion of this divide and its unification is presented in this post. local traversal engines in a local traversal engine, there is typically a single processing agent that obeys a program. the agent is called a traverser and the program it follows is called a path description . in gremlin , a friend-of-a-friend path description is defined as such: g.v(1).oute('friend').inv.oute('friend').inv when this path description is interpreted by a traverser over a graph, an instance of the description is realized as actual paths in the graph that match that description. for example, the gremlin traverser starts at vertex 1 and then steps to its outgoing friend -edges. next, it will move to the head/target vertices of those edges (i.e. vertices 3 and 4). after that, it will go to the friend -edges of those vertices and finally, to the head of the previous edges (i.e. vertices 6 and 7). in this way, a single traverser is following all the paths that are exposed with each new atomic graph operation (i.e. each new step after a .). the abstract syntax being: step.step.step . what is returned by this friend-of-a-friend path description, on the example graph diagrammed, is vertices 6 and 7. in many situations, its not the end of the path that is desired, but some side-effect of the traversal. for example, as the traverser walks it can update some global data structure such as a ranking of the vertices. this idea is presented in the path description below, where the oute.inv path is looped over 1000 times. each time a vertex is traversed over, the map m is updated. this global map m maintains keys that are vertices and values that denote the number of times that each vertex has been touched ( groupcount ‘s behavior). m = [:] g.v(1).oute.inv.groupcount(m).loop(3){it.loops < 1000} the local traversal engine pattern is abstractly diagrammed on the right, where a single traverser is obeying some path description ( a.b.c ) and in doing so, moving around on a graph and updating a global data structure (the red boxed map). given the need for traversers to move from element to element, graph databases of this form tend to support strong data locality by means of a direct-reference graph data structure (i.e. vertices have pointers to edges and edges to vertices). a few examples of such graph databases include neo4j , orientdb , and dex . distributed traversal engines in a distributed traversal engine, a traversal is represented as a flow of messages between the elements of the graph. generally, each element (e.g. vertex) is operating independently of the other elements. each element is seen as its own processor with its own (usually homogenous) program to execute. elements communicate with each other via message passing . when no more messages have been passed, the traversal is complete and the results of the traversal are typically represented as a distributed data structure over the elements. graph databases of this nature tend to use the bulk synchronous parallel model of distributed computing. each step is synchronized in a manner analogous to a clock cycle in hardware. instances of this model include agrapa , pregel , trinity , and goldenorb . an example of distributed graph traversing is now presented using a ranking algorithm in java. [ note : in this example, edges are not first class citizens. this is typical of the state of the art in distributed traversal engines. they tend to be for single-relational, unlabeled-edge graphs.] public void evaluatestep(int step) { if(!this.inbox.isempty() && step < 1000) { this.rank = this.rank + this.inbox.size(); for(vertex vertex : this.adjacentvertices()) { for(int i=0; i

January 10, 2012

by Marko Rodriguez

· 8,535 Views

Solr Select Query GET vs POST Request

Learn the differences between GET and POST requests.

January 10, 2012

by Bas De Nooijer

· 24,318 Views

Searching relational content with Lucene's BlockJoinQuery

Lucene's 3.4.0 release adds a new feature called index-time join (also sometimes called sub-documents, nested documents or parent/child documents), enabling efficient indexing and searching of certain types of relational content. Most search engines can't directly index relational content, as documents in the index logically behave like a single flat database table. Yet, relational content is everywhere! A job listing site has each company joined to the specific listings for that company. Each resume might have separate list of skills, education and past work experience. A music search engine has an artist/band joined to albums and then joined to songs. A source code search engine would have projects joined to modules and then files. Perhaps the PDF documents you need to search are immense, so you break them up and index each section as a separate Lucene document; in this case you'll have common fields (title, abstract, author, date published, etc.) for the overall document, joined to the sub-document (section) with its own fields (text, page number, etc.). XML documents typically contain nested tags, representing joined sub-documents; emails have attachments; office documents can embed other documents. Nearly all search domains have some form of relational content, often requiring more than one join. If such content is so common then how do search applications handle it today? One obvious "solution" is to simply use a relational database instead of a search engine! If relevance scores are less important and you need to do substantial joining, grouping, sorting, etc., then using a database could be best overall. Most databases include some form a text search, some even using Lucene. If you still want to use a search engine, then one common approach is to denormalize the content up front, at index-time, by joining all tables and indexing the resulting rows, duplicating content in the process. For example, you'd index each song as a Lucene document, copying over all fields from the song's joined album and artist/band. This works correctly, but can be horribly wasteful as you are indexing identical fields, possibly including large text fields, over and over. Another approach is to do the join yourself, outside of Lucene, by indexing songs, albums and artist/band as separate Lucene documents, perhaps even in separate indices. At search-time, you first run a query against one collection, for example the songs. Then you iterate through all hits, gathering up (joining) the full set of corresponding albums and then run a second query against the albums, with a large OR'd list of the albums from the first query, repeating this process if you need to join to artist/band as well. This approach will also work, but doesn't scale well as you may have to create possibly immense follow-on queries. Yet another approach is to use a software package that has already implemented one of these approaches for you! elasticsearch, Apache Solr, Apache Jackrabbit, Hibernate Search and many others all handle relational content in some way. With BlockJoinQuery you can now directly search relational content yourself! Let's work through a simple example: imagine you sell shirts online. Each shirt has certain common fields such as name, description, fabric, price, etc. For each shirt you have a number of separate stock keeping units or SKUs, which have their own fields like size, color, inventory count, etc. The SKUs are what you actually sell, and what you must stock, because when someone buys a shirt they buy a specific SKU (size and color). Maybe you are lucky enough to sell the incredible Mountain Three-wolf Moon Short Sleeve Tee, with these SKUs (size, color): small, blue small, black medium, black large, gray Perhaps a user first searches for "wolf shirt", gets a bunch of hits, and then drills down on a particular size and color, resulting in this query: name:wolf AND size=small AND color=blue which should match this shirt. name is a shirt field while the size and color are SKU fields. But if the user drills down instead on a small gray shirt: name:wolf AND size=small AND color=gray then this shirt should not match because the small size only comes in blue and black. How can you run these queries using BlockJoinQuery? Start by indexing each shirt (parent) and all of its SKUs (children) as separate documents, using the new IndexWriter.addDocuments API to add one shirt and all of its SKUs as a single document block. This method atomically adds a block of documents into a single segment as adjacent document IDs, which BlockJoinQuery relies on. You should also add a marker field to each shirt document (e.g. type = shirt), as BlockJoinQuery requires a Filter identifying the parent documents. To run a BlockJoinQuery at search-time, you'll first need to create the parent filter, matching only shirts. Note that the filter must use FixedBitSet under the hood, like CachingWrapperFilter: Filter shirts = new CachingWrapperFilter( new QueryWrapperFilter( new TermQuery( new Term("type", "shirt")))); Create this filter once, up front and re-use it any time you need to perform this join. Then, for each query that requires a join, because it involves both SKU and shirt fields, start with the child query matching only SKU fields: BooleanQuery skuQuery = new BooleanQuery(); skuQuery.add(new TermQuery(new Term("size", "small")), Occur.MUST); skuQuery.add(new TermQuery(new Term("color", "blue")), Occur.MUST); Next, use BlockJoinQuery to translate hits from the SKU document space up to the shirt document space: BlockJoinQuery skuJoinQuery = new BlockJoinQuery( skuQuery, shirts, ScoreMode.None); The ScoreMode enum decides how scores for multiple SKU hits should be aggregated to the score for the corresponding shirt hit. In this query you don't need scores from the SKU matches, but if you did you can aggregate with Avg, Max or Total instead. Finally you are now free to build up an arbitrary shirt query using skuJoinQuery as a clause: BooleanQuery query = new BooleanQuery(); query.add(new TermQuery(new Term("name", "wolf")), Occur.MUST); query.add(skuJoinQuery, Occur.MUST); You could also just run skuJoinQuery as-is if the query doesn't have any shirt fields. Finally, just run this query like normal! The returned hits will be only shirt documents; if you'd also like to see which SKUs matched for each shirt, use BlockJoinCollector: BlockJoinCollector c = new BlockJoinCollector( Sort.RELEVANCE, // sort 10, // numHits true, // trackScores false // trackMaxScore ); searcher.search(query, c); The provided Sort must use only shirt fields (you cannot sort by any SKU fields). When each hit (a shirt) is competitive, this collector will also record all SKUs that matched for that shirt, which you can retrieve like this: TopGroups hits = c.getTopGroups( skuJoinQuery, skuSort, 0, // offset 10, // maxDocsPerGroup 0, // withinGroupOffset true // fillSortFields ); Set skuSort to the sort order for the SKUs within each shirt. The first offset hits are skipped (use this for paging through shirt hits). Under each shirt, at most maxDocsPerGroup SKUs will be returned. Use withinGroupOffset if you want to page within the SKUs. If fillSortFields is true then each SKU hit will have values for the fields from skuSort. The hits returned by BlockJoinCollector.getTopGroups are SKU hits, grouped by shirt. You'd get the exact same results if you had denormalized up-front and then used grouping to group results by shirt. You can also do more than one join in a single query; the joins can be nested (parent to child to grandchild) or parallel (parent to child1 and parent to child2). However, there are some important limitations of index-time joins: The join must be computed at index-time and "compiled" into the index, in that all joined child documents must be indexed along with the parent document, as a single document block. Different document types (for example, shirts and SKUs) must share a single index, which is wasteful as it means non-sparse data structures like FieldCache entries consume more memory than they would if you had separate indices. If you need to re-index a parent document or any of its child documents, or delete or add a child, then the entire block must be re-indexed. This is a big problem in some cases, for example if you index "user reviews" as child documents then whenever a user adds a review you'll have to re-index that shirt as well as all its SKUs and user reviews. There is no QueryParser support, so you need to programmatically create the parent and child queries, separating according to parent and child fields. The join can currently only go in one direction (mapping child docIDs to parent docIDs), but in some cases you need to map parent docIDs to child docIDs. For example, when searching songs, perhaps you want all matching songs sorted by their title. You can't easily do this today because the only way to get song hits is to group by album or band/artist. The join is a one (parent) to many (children), inner join. As usual, patches are welcome! There is work underway to create a more flexible, but likely less performant, query-time join capability, which should address a number of the above limitations. Source: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

January 9, 2012

by Michael Mccandless

· 14,819 Views

The Persistence Layer with Spring 3.1 and JPA

1. Overview This is the third of a series of articles about Persistence with Spring. This article will focus on the configuration and implementation of Spring with JPA. For a step by step introduction about setting up the Spring context using Java based configuration and the basic Maven pom for the project, see this article. The Persistence with Spring series: Part 1 – The Persistence Layer with Spring 3.1 and Hibernate Part 2 – Simplifying the Data Access Layer with Spring and Java Generics Part 4 – The Persistence Layer with Spring Data JPA Part 5 – Transaction configuration with JPA and Spring 3.1 2. No More Spring Templates As of Spring 3.1, the JpaTemplate and the corresponding JpaDaoSupport have been deprecated in favor of using the native Java Persistence API. Also, both of these classes are only relevant for JPA 1 (from the JpaTemplate javadoc): Note that this class did not get upgraded to JPA 2.0 and never will. As a consequence, it is now best practice to use the Java Persistence API directly instead of the JpaTemplate, which will effectively decouple the DAO layer implementation from Spring entirely. Exception Translation without the template One of the responsibilities of JpaTemplate is exception translation – translating the low level exceptions – which tie the API to JPA – into higher level, generic Spring exceptions. Without the template to do that, exception translation can still be enabled by annotating the DAOs with the @Repository annotation. That, coupled with a Spring bean postprocessor will advice all @Repository beans with all the implementations of PersistenceExceptionTranslator found in the Container – to provide exception translation without using the template. Exception translation is done through proxies; in order for Spring to be able to create proxies around the DAO classes, these must not be declared final. Injecting the JPA EntityManager with Spring without the template The EntityManager is the API of the persistence context; this can be injected directly in the DAO. The Spring Container is capable of acting as a JPA container and of injecting the EntityManager by honoring the @PersistenceContext (both as field-level and a method-level annotation). For this to work, the PersistenceAnnotationBeanPostProcessor bean must exist in the Spring Container. The bean can be either created explicitly by defining it in the configuration, or automatically, by defining context:annotation-config or context:component-scan in the configuration. 3. The Spring Java configuration The EntityManager is set up in the configuration by creating a Spring factory bean to manage it; this will allow the PersistenceAnnotationBeanPostProcessor to retrieve it from the Container. There are two options to set this up – either the simpler LocalEntityManagerFactoryBean or the more flexible LocalContainerEntityManagerFactoryBean. The latter option is used here, so that additional properties can be configured on it: @Configuration @EnableTransactionManagement public class PersistenceJPAConfig{ @Bean public LocalContainerEntityManagerFactoryBean entityManagerFactoryBean(){ LocalContainerEntityManagerFactoryBean factoryBean = new LocalContainerEntityManagerFactoryBean(); factoryBean.setDataSource( this.restDataSource() ); factoryBean.setPackagesToScan( new String[ ] { "org.rest" } ); JpaVendorAdapter vendorAdapter = new HibernateJpaVendorAdapter(){ { // JPA properties ... } }; factoryBean.setJpaVendorAdapter( vendorAdapter ); factoryBean.setJpaProperties( this.additionlProperties() ); return factoryBean; } @Bean public DataSource restDataSource(){ DriverManagerDataSource dataSource = new DriverManagerDataSource(); dataSource.setDriverClassName( this.driverClassName ); dataSource.setUrl( this.url ); dataSource.setUsername( "restUser" ); dataSource.setPassword( "restmy5ql" ); return dataSource; } @Bean public PlatformTransactionManager transactionManager(){ JpaTransactionManager transactionManager = new JpaTransactionManager(); transactionManager.setEntityManagerFactory( this.entityManagerFactoryBean().getObject() ); return transactionManager; } @Bean public PersistenceExceptionTranslationPostProcessor exceptionTranslation(){ return new PersistenceExceptionTranslationPostProcessor(); } } Also, note that cglib must be on the classpath for Java @Configuration classes to work; to better understand the need for cglib as a dependency, see this article. 4. The Spring XML configuration The same Spring configuration with XML: There is a relatively small difference between the way Spring is configured in XML and the new Java based configuration – in XML, a reference to another bean can point to either the bean or a bean factory for that bean. In Java however, since the types are different, the compiler doesn’t allow it, and so the EntityManagerFactory is first retrieved from it’s bean factory and then passed to the transaction manager: txManager.setEntityManagerFactory( this.entityManagerFactoryBean().getObject() ); 5. Going full XML-less Usually, JPA defines a persistence unit through the META-INF/persistence.xml file. Starting with Spring 3.1, this XML file is no longer necessary – the LocalContainerEntityManagerFactoryBean now supports a ‘packagesToScan’ property where the packages to scan for @Entity classes can be specified. The persistence.xml file was the last piece of XML to be removed – now, JPA can be fully set up with no XML. 5.1. The JPA Properties JPA properties would usually be specified in the persistence.xml file; alternatively, the properties can be specified directly to the entity manager factory bean: factoryBean.setJpaProperties( this.additionlProperties() ); As a side-note, if Hibernate would be the persistence provider, then this would be the way to specify Hibernate specific properties. 5.2. The DAO Each DAO will be based on an parametrized, abstract DAO class class with support for the common generic operations: public abstract class AbstractJpaDAO< T extends Serializable > { private Class< T > clazz; @PersistenceContext EntityManager entityManager; public void setClazz( Class< T > clazzToSet ){ this.clazz = clazzToSet; } public T findOne( Long id ){ return this.entityManager.find( this.clazz, id ); } public List< T > findAll(){ return this.entityManager.createQuery( "from " + this.clazz.getName() ) .getResultList(); } public void save( T entity ){ this.entityManager.persist( entity ); } public void update( T entity ){ this.entityManager.merge( entity ); } public void delete( T entity ){ this.entityManager.remove( entity ); } public void deleteById( Long entityId ){ T entity = this.getById( entityId ); this.delete( entity ); } } A few aspects are interesting here – as discussed, the abstract DAO does not extend any Spring template (such as JpaTemplate). Instead, the JPA EntityManager is injected directly in the DAO, and will have the role of the main Persistence API Also, note that the entity Class is passed in the constructor to be used in the generic operations: @Repository public class FooDAO extends AbstractHibernateDAO< Foo > implements IFooDAO{ public FooDAO(){ setClazz(Foo.class ); } } 6. The Maven configuration In addition to the Maven configuration defined in a previous article, the following dependencies are addeed: spring-orm (which also has spring-tx as its dependency) and hibernate-entitymanager: org.springframework spring-orm 3.2.2.RELEASE org.hibernate hibernate-entitymanager 4.2.0.Final runtime mysql mysql-connector-java 5.1.24 runtime Note that the MySQL dependency is included as a reference – a driver is needed to configure the datasource, but any Hibernate supported database will do. 7. Conclusion This article covered the configuration and implementation of the persistence layer with JPA 2 and Spring 3.1, using both XML and Java based configuration. The reasons to stop relying on templates for the DAO layer was discussed, as well as getting rid of the last piece of XML usually associated with JPA – the persistence.xml. The final result is a lightweight, clean DAO implementation, with almost no compile-time reliance on Spring. You can check out the full implementation in the github project. OriginalThe Persistence Layer with Spring 3.1 and JPA from the Persistence with Spring series

January 7, 2012

by Eugen Paraschiv

· 63,041 Views · 1 Like

Simplifying the Data Access Layer with Spring and Java Generics

1. Overview This is the second of a series of articles about Persistence with Spring. The previous article discussed setting up the persistence layer with Spring 3.1 and Hibernate, without using templates. This article will focus on simplifying the Data Access Layer by using a single, generified DAO, which will result in elegant data access, with no unnecessary clutter. Yes, in Java. The Persistence with Spring series: Part 1 – The Persistence Layer with Spring 3.1 and Hibernate Part 3 – The Persistence Layer with Spring 3.1 and JPA Part 4 – The Persistence Layer with Spring Data JPA Part 5 – Transaction configuration with JPA and Spring 3.1 2. The DAO mess Most production codebases have some kind of DAO layer. Usually the implementation ranges from a raw class with no inheritance to some kind of generified class, but one thing is consistent – there is always more then one. Most likely, there are as many DAOs as there are entities in the system. Also, depending on the level of generics involved, the actual implementations can vary from heavily duplicated code to almost empty, with the bulk of the logic grouped in an abstract class. 2.1. A Generic DAO Instead of having multiple implementations – one for each entity in the system – a single parametrized DAO can be used in such a way that it still takes full advantage of the type safety provided by generics. Two implementations of this concept are presented next, one for a Hibernate centric persistence layer and the other focusing on JPA. These implementation are by no means complete – only some data access methods are included, but they can be easily be made more thorough. 2.2. The Abstract Hibernate DAO public abstract class AbstractHibernateDAO< T extends Serializable > { private Class< T > clazz; @Autowired SessionFactory sessionFactory; public void setClazz( Class< T > clazzToSet ){ this.clazz = clazzToSet; } public T findOne( Long id ){ return (T) this.getCurrentSession().get( this.clazz, id ); } public List< T > findAll(){ return this.getCurrentSession() .createQuery( "from " + this.clazz.getName() ).list(); } public void save( T entity ){ this.getCurrentSession().persist( entity ); } public void update( T entity ){ this.getCurrentSession().merge( entity ); } public void delete( T entity ){ this.getCurrentSession().delete( entity ); } public void deleteById( Long entityId ){ T entity = this.getById( entityId ); this.delete( entity ); } protected Session getCurrentSession(){ return this.sessionFactory.getCurrentSession(); } } The DAO uses the Hibernate API directly, without relying on any Spring templates (such as HibernateTemplate). Using of templates, as well as management of the SessionFactory which is autowired in the DAO were covered in the previous post of the series. 2.3. The Abstract JPA DAO public abstract class AbstractJpaDAO< T extends Serializable > { private Class< T > clazz; @PersistenceContext EntityManager entityManager; public void setClazz( Class< T > clazzToSet ){ this.clazz = clazzToSet; } public T findOne( Long id ){ return this.entityManager.find( this.clazz, id ); } public List< T > findAll(){ return this.entityManager.createQuery( "from " + this.clazz.getName() ) .getResultList(); } public void save( T entity ){ this.entityManager.persist( entity ); } public void update( T entity ){ this.entityManager.merge( entity ); } public void delete( T entity ){ this.entityManager.remove( entity ); } public void deleteById( Long entityId ){ T entity = this.getById( entityId ); this.delete( entity ); } } Similar to the Hibernate DAO implementation, the Java Persistence API is used here directly, again not relying on the now deprecated Spring JpaTemplate. 2.4. The Generic DAO Now, the actual implementation of the generic DAO is as simple as it can be – it contains no logic. Its only purpose is to be injected by the Spring container in a service layer (or in whatever other type of client of the Data Access Layer): @Repository @Scope( BeanDefinition.SCOPE_PROTOTYPE ) public class GenericJpaDAO< T extends Serializable > extends AbstractJpaDAO< T > implements IGenericDAO< T >{ // } @Repository @Scope( BeanDefinition.SCOPE_PROTOTYPE ) public class GenericHibernateDAO< T extends Serializable > extends AbstractHibernateDAO< T > implements IGenericDAO< T >{ // } First, note that the generic implementation is itself parametrized – allowing the client to choose the correct parameter in a case by case basis. This will mean that the clients gets all the benefits of type safety without needing to create multiple artifacts for each entity. Second, notice the prototype scope of these generic DAO implementation. Using this scope means that the Spring container will create a new instance of the DAO each time it is requested (including on autowiring). That will allow a service to use multiple DAOs with different parameters for different entities, as needed. The reason this scope is so important is due to the way Spring initializes beans in the container. Leaving the generic DAO without a scope would mean using the default singleton scope, which would lead to a single instance of the DAO living in the container. That would obviously be majorly restrictive for any kind of more complex scenario. 3. The Service There is now a single DAO to be injected by Spring; also, the Class needs to be specified: @Service class FooService implements IFooService{ IGenericDAO< Foo > dao; @Autowired public void setDao( IGenericDAO< Foo > daoToSet ){ this.dao = daoToSet; this.dao.setClazz( Foo.class ); } // ... } Spring autowires the new DAO insteince using setter injection so that the implementation can be customized with the Class object. After this point, the DAO is fully parametrized and ready to be used by the service. 4. Conclusion This article discussed the simplification of the Data Access Layer by providing a single, reusable implementation of a generic DAO. This implementation was presented in both a Hibernate and a JPA based environment. The result is a streamlined persistence layer, with no unnecessary clutter. For a step by step introduction about setting up the Spring context using Java based configuration and the basic Maven pom for the project, see this article. The next article of the Persistence with Spring series will focus on setting up the DAL layer with Spring 3.1 and JPA. In the meantime, you can check out the full implementation in the github project. If you read this far, you should follow me on twitter here.

January 5, 2012

by Eugen Paraschiv

· 25,132 Views · 1 Like

How to deploy a neo4j instance in Amazon EC2 in 10 minutes

Neo4j is a high-performance, NOSQL graph database with all the features of a mature and robust database. In this post I will explain how to deploy a neo4j instance in Amazon EC2 web service. For this tutorial to take you no more than 10 minutes you should be able to execute properly some bash commands like mv, tar, ssh and scp (secure copy). I also assume that you have an account in Amazon Web Services and you are familiar to the process of launching instances. If not, I strongly recommend you to follow this starting guide and complete it till you manage to connect to your instance with ssh. Start downloading the latest stable version of neo4j. Which you can find here. The “Community Edition” fits well for development purposes. Do not forget to select the Unix version of the server. This will download a tar.gz file which you will copy to your EC2 instance later. While you download the neo4j server open the AWS Management Console and launch a Basic 32-bit Amazon Linux AMI. If you want to launch an Ubuntu AMI please notice that it doesn’t ship with Java, which is required for running neo4j. If you are not familiar with key pairs, pem files or security groups I insist you to follow the EC2 starting guide I mentioned above. You can either create a new security group or use the default, but you will need to configure a new security rule for the neo4j server port. After launching the instance, create a TCP rule on port 7474 with source 0.0.0.0/0. Here you are opening port 7474 for anyone. If you are planning to use the neo4j REST API and remotely call it from another server, for example a Rails application hosted in Heroku, for security reasons, you may want to change the source field to the address of your Heroku server. Do not forget to open port 22 (SSH), this is typically the first rule normal people create after launching an instance. You are almost done! You should now install neo4j in your instance. Open a terminal in your localhost and navigate to the path where you downloaded neo4j. Copy the file to your Amazon instance by using the scp command: scp -i your_pem_file.pem neo4j-community-1.6.M01-unix.tar.gz ec2-user@YOUR_PUBLIC_INSTANCE_DNS:/home/ec2-user Please notice that you will need to change the path to your pem file, typically placed in ~/.ssh, the filename of the neo4j server you just downloaded and the plublic DNS of your instance. Now connect to your instance with SSH: ssh -i your_pem_file.pem ec2-user@YOUR_PUBLIC_INSTANCE_DNS Untar the neo4j server: tar xvfz neo4j-community-1.6.M01-unix.tar.gz.tar.gz Move it to /usr/local and rename the folder to neo4j: sudo mv neo4j-community-1.6.M01 /usr/local/neo4j Almost done!!! You should now open neo4j-server.properties under the conf directory and add the following line: org.neo4j.server.webserver.address=0.0.0.0 This lines allows anyone to connect remotely to your neo4j database server. Now run the start script. From the neo4j server folder. sudo ./bin/neo4j start Finally, open a browser and access the webadmin interface of your neo4j database by typing http://YOUR_PUBLIC_INSTANCE_DNS:7474. You should see the Neo4j Monitoring and Management Tool, pretty cool! If not, ask me You can now try using the REST API and the curl bash command to insert nodes and relationships. I hope this post helped you, good luck! Follow me on Twitter @negarnil Source: http://www.cloudtmp.com/java/how-to-deploy-a-neo4j-instance-in-amazon-ec2-in-10-minutes/

December 27, 2011

by Nicolas Garnil

· 27,440 Views · 1 Like

How to order by multiple columns using Lambas and LINQ in C#

Today, I was required to order a list of records based on a Name and then ID. A simple one, but I did spend some time on how to do it with Lambda Expression in C#. C# provides the OrderBy, OrderByDescending, ThenBy, ThenByDescending. You can use them in your lambda expression to order the records as per your requirement. Assuming your list is “Phones” and contains the following data public class Phone { public int ID { get; set; } public string Name { get; set; } } public class Phones : List { public Phones() { Add(new Phone { ID = 1, Name = "Windows Phone 7" }); Add(new Phone { ID = 5, Name = "iPhone" }); Add(new Phone { ID = 2, Name = "Windows Phone 7" }); Add(new Phone { ID = 3, Name = "Windows Mobile 6.1" }); Add(new Phone { ID = 6, Name = "Android" }); Add(new Phone { ID = 10, Name = "BlackBerry" }); } } If you were to use LINQ Query , the query will look like the one below dataGridView1.DataSource = (from m in new Phones() orderby m.Name, m.ID select m).ToList(); Simple isn’t it ? It very simple using Lamba expression too. Your Lambda’s expression for the above LINQ query will look like the one below dataGridView1.DataSource = new Phones().OrderByDescending(a => a.Name).ThenByDescending(a => a.ID).ToList();

December 24, 2011

by Senthil Kumar

· 136,944 Views

Easy URL rewriting in ASP.NET 4.0 web forms

In this post I am going to explain URL rewriting in greater details. This post will contain basic of URL rewriting and will explain how we can do URL rewriting in fewer lines of code. Why we need URL rewriting? Let’s consider a simpler scenario we want to display a customer details on a ASP.NET Page so how our page will know that for which customer we need to display details? The simplest way of doing is to use query string we will pass a customer id which uniquely identifies customer in query string. So our url will look like this. Customer.aspx?Id=1 This will work but the problem with above URL is that its not user friendly and search engine friendly. who is going to remember that what query string parameter I am going to pass and why we need that parameter. Also when search engine will crawl this site it will going to read this URL blindly as this url is not informative because it query string is not readable for search engine crawlers. So your search engine will be ranked lower as this URL is not readable to search engine crawlers.Now when do a URL rewriting our URL will be cleaner shorter and simpler like this. Customers/Id/1/ Here anybody in world can understand it talking about customer and this page will used to show customer details.Even search engine crawler will also know that you are talking about customers. That is why we need URL rewriting. URL rewriting and ASP.NET In earlier versions of ASP.NET we have to write lots of code for URL rewriting but Now with ASP.NET 4.0 you can easily rewrite in fewer lines of code. So let’s start a Demo where I will demonstrate you how we can easily rewrite URLs. So let’s first create a ASP. NET web form application via File->New->Project and a dialog box will open just like below and then created a empty project called URL rewriting. After creating a project I have added global.asax – where we are going to write url mapping logic and then I have added an asp.net page called which will display customer information just like below. So everything is now ready let’s start writing code. First thing we need to do is to define routes. Route will map a URL to physical page.First I will create static function called Register route which map route to particular file and then I am going to call this from application_start event of global.asax. Following is code for that. using System; using System.Web.Routing; namespace UrlRewriting { public class Global : System.Web.HttpApplication { protected void Application_Start(object sender, EventArgs e) { RegisterRoutes(RouteTable.Routes); } public static void RegisterRoutes(RouteCollection routeCollection) { routeCollection.MapPageRoute("RouteForCustomer", "Customer/{Id}", "~/Customer.aspx"); } } } Now as mapping code has been done let’s right code for customer.aspx page. I have have following code in page_load event of customer.aspx using System; namespace UrlRewriting { public partial class Customer : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { string id = Page.RouteData.Values["Id"].ToString(); Response.Write("Customer Details page"); Response.Write(string.Format("Displaying information for customer : {0}",id)); } } } Here in above code you can see that I am getting value from page route data and then just printing it. In real world it will fetch customer data from database and show customer details on page. Now let’s run that application. It will print a details of customer as I have passed Id in URL suppose you pass 1 as id in URL then it will look like following. Now if you put 2 in url it will print information about customer 2. That’s it. So we have enabled URL rewriting in asp.net in fewer lines of code. In next post I am going to explain redirection with URL rewriting. Hope you like this post. Stay tuned for more.. Till then Happy programming

December 10, 2011

by Jalpesh Vadgama

· 28,568 Views

MySQL vs. Neo4j on a Large-Scale Graph Traversal

this post presents an analysis of mysql (a relational database) and neo4j (a graph database) in a side-by-side comparison on a simple graph traversal. the data set that was used was an artificially generated graph with natural statistics. the graph has 1 million vertices and 4 million edges. the degree distribution of this graph on a log-log plot is provided below. a visualization of a 1,000 vertex subset of the graph is diagrammed above. loading the graph the graph data set was loaded both into mysql and neo4j. in mysql a single table was used with the following schema. create table graph ( outv int not null, inv int not null ); create index outv_index using btree on graph (outv); create index inv_index using btree on graph (inv); after loading the data, the table appears as below. the first line reads: “vertex 0 is connected to vertex 1.” mysql> select * from graph limit 10; +------+-----+ | outv | inv | +------+-----+ | 0 | 1 | | 0 | 2 | | 0 | 6 | | 0 | 7 | | 0 | 8 | | 0 | 9 | | 0 | 10 | | 0 | 12 | | 0 | 19 | | 0 | 25 | +------+-----+ 10 rows in set (0.04 sec) the 1 million vertex graph data set was also loaded into neo4j. in gremlin , the graph edges appear as below. the first line reads: “vertex 0 is connected to vertex 992915.” gremlin> g.e[1..10] ==>e[183][0-related->992915] ==>e[182][0-related->952836] ==>e[181][0-related->910150] ==>e[180][0-related->897901] ==>e[179][0-related->871349] ==>e[178][0-related->857804] ==>e[177][0-related->798969] ==>e[176][0-related->773168] ==>e[175][0-related->725516] ==>e[174][0-related->700292] warming up the caches before traversing the graph data structure in both mysql and neo4j, each database had a “ warm up ” procedure run on it. in mysql, a “select * from graph” was evaluated and all of the results were iterated through. in neo4j, every vertex in the graph was iterated through and the outgoing edges of each vertex were retrieved. finally, for both mysql and neo4j, the experiment discussed next was run twice in a row and the results of the second run were evaluated. traversing the graph the traversal that was evaluated on each database started from some root vertex and emanated n-steps out. there was no sorting, no distinct-ing, etc. the only two variables for the experiments are the length of the traversal and the root vertex to start the traversal from. in mysql, the following 5 queries denote traversals of length 1 through 5. note that the “?” is a variable parameter of the query that denotes the root vertex. select a.inv from graph as a where a.outv=? select b.inv from graph as a, graph as b where a.inv=b.outv and a.outv=? select c.inv from graph as a, graph as b, graph as c where a.inv=b.outv and b.inv=c.outv and a.outv=? select d.inv from graph as a, graph as b, graph as c, graph as d where a.inv=b.outv and b.inv=c.outv and c.inv=d.outv and a.outv=? select e.inv from graph as a, graph as b, graph as c, graph as d, graph as e where a.inv=b.outv and b.inv=c.outv and c.inv=d.outv and d.inv=e.outv and a.outv=? for neo4j, the blueprints pipes framework was used. a pipe of length n was constructed using the following static method. public static pipeline createpipeline(final integer steps) { final arraylist pipes = new arraylist(); for (int i = 0; i < steps; i++) { pipe pipe1 = new vertexedgepipe(vertexedgepipe.step.out_edges); pipe pipe2 = new edgevertexpipe(edgevertexpipe.step.in_vertex); pipes.add(pipe1); pipes.add(pipe2); } return new pipeline(pipes); } for both mysql and neo4j, the results of the query (sql and pipes) were iterated through. thus, all results were retrieved for each query. in mysql, this was done as follows. while (resultset.next()) { resultset.getint(finalcolumn); } in neo4j, this is done as follows. while (pipeline.hasnext()) { pipeline.next(); } experimental results the artificial graph dataset was constructed with a “ rich get richer “, preferential attachment model . thus, the vertices created earlier are the most dense (i.e. highest number of adjacent vertices). this property was used to limit the amount of time it would take to evaluate the tests for each traversal. only the first 250 vertices were used as roots of the traversals. before presenting timing results, note that all of these experiments were run on a macbook pro with a 2.66ghz intel core 2 duo and 4gigs of ram at 1067 mhz ddr3. the packages used were java 1.6, mysql jdbc 5.0.8, and blueprints pipes 0.1.2. java version "1.6.0_17" java(tm) se runtime environment (build 1.6.0_17-b04-248-10m3025) java hotspot(tm) 64-bit server vm (build 14.3-b01-101, mixed mode) the following java virtual machine parameters were used: -xmx1000m -xms500m below are the total running times for both mysql (red) and neo4j (blue) for traversals of length 1, 2, 3, and 4. the raw data is presented below along with the total number of vertices returned by each traversal—which, of course, is the same for both mysql and neo4j given that its the same graph data set being processed. also realize that traversals can loop and thus, many of the same vertices are returned multiple times. finally, note that only neo4j has the running time for a traversal of length 5. mysql did not finish after waiting 2 hours to complete. in comparison, neo4j took 14.37 minutes to complete a 5 step traversal. [mysql steps-1] time(ms):124 -- vertices_returned:11360 [mysql steps-2] time(ms):922 -- vertices_returned:162640 [mysql steps-3] time(ms):8851 -- vertices_returned:2206437 [mysql steps-4] time(ms):112930 -- vertices_returned:28125623 [mysql steps-5] n/a [neo4j steps-1] time(ms):27 -- vertices_returned:11360 [neo4j steps-2] time(ms):474 -- vertices_returned:162640 [neo4j steps-3] time(ms):3366 -- vertices_returned:2206437 [neo4j steps-4] time(ms):49312 -- vertices_returned:28125623 [neo4j steps-5] time(ms):862399 -- vertices_returned:358765631 next, the individual data points for both mysql and neo4j are presented in the plot below. each point denotes how long it took to return n number of vertices for the varying traversal lengths. finally, the data below provides the number of vertices returned per millisecond (on average) for each of the traversals. again, mysql did not finish in its 2 hour limit for a traversal of length 5. [mysql steps-1] vertices/ms:91.6128847554668 [mysql steps-2] vertices/ms:176.399127537985 [mysql steps-3] vertices/ms:249.286746556076 [mysql steps-4] vertices/ms:249.053599519823 [mysql steps-5] n/a [neo4j steps-1] vertices/ms:420.740351166341 [neo4j steps-2] vertices/ms:343.122344772028 [neo4j steps-3] vertices/ms:655.507125256186 [neo4j steps-4] vertices/ms:570.360621871775 [neo4j steps-5] vertices/ms:416.00886711325 conclusion in conclusion, given a traversal of an artificial graph with natural statistics, the graph database neo4j is more optimal than the relational database mysql. however, no attempts have been made to optimize the java vm, the sql queries, etc. these experiments were run with both neo4j and mysql “out of the box” and with a “natural syntax” for both types of queries. source: http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/

December 5, 2011

by Marko Rodriguez

· 58,582 Views · 1 Like

Create Your Own XML/JSON/HTML API with PHP

Develop your own API service for your PHP projects.

December 1, 2011

by Andrei Prikaznov

· 68,246 Views

Zero Downtime – What is it and why is it important?

For most large web applications, uptime is of foremost importants. Any outage can be seen by customers as a frustration, or opportunity to move to a competitor. What's more for a site that also includes e-commerce, it can mean real lost sales. Zero Downtime describes a site without service interruption. To achieve such lofty goals, redundancy becomes a critical requirement at every level of your infrastructure. If you're using cloud hosting, are you redundant to alternate availability zones and regions? Are you using geographically distributed load balancing? Do you have multiple clustered databases on the backend, and multiple webservers load balanced. All of these requirements will increase uptime, but may not bring you close to zero downtime. For that you'll need thorough testing. The solution is to pull the trigger on sections of your infrastructure, and prove that it fails over quickly without noticeable outage. The ultimate test is the outage itself. Sean Hull on Quora: What is zero downtime and why is it important? Source: http://www.iheavy.com/2011/06/23/zero-downtime-what-is-it-and-why-is-it-important/

November 23, 2011

by Sean Hull

· 26,149 Views

Eventual Consistency in NoSQL Databases: Theory and Practice

One of NoSQL's goals: handle previously-unthinkable amounts of data. One of unthinkable-amounts-of-data's problems: previously-improbable events become extremely probable, precisely because the set of interactions is so large. Flip a coin a hundred times, and you're not likely to get 50 heads in a row. But flip it a few trillion times, and you probably will find some 50-heads streaks. So NoSQL's performance strength is also its mathematical weakness. This order of scale can result in lots of problems, but one of the most common is consistency -- the C in ACID -- clearly a fundamental desideratum for any database system, but in principle much harder to acheive for NoSQL databases than for others. Emerging database technologies have forced developers and computer scientists to define more exactly what kind of consistency is really needed, for any given application. Two years ago, ACM (the Association for Computing Machinery) published an extremely helpful examination of the attenuated notion of consistency called 'eventual consistency'. Their summary: Data inconsistency in large-scale reliable distributed systems must be tolerated for two reasons: improving read and write performance under highly concurrent conditions; and handling partition cases where a majority model would render part of the system unavailable even though the nodes are up and running. The article surveys technical solutions as well as user considerations that might soften the undesirability of anything less than perfect, instantaneous consistency. It's not long (4 pages plus pictures), and explains some deep database issues quite clearly. On the more practical side of the problem: Russell Brown recently gave a talk at the NoSQL Exchange 2011 on exactly this topic. More specifically, he showed how some distributed systems (Riak in particular) try to minimize conflicts, and suggested some ways to reconcile conflicts automatically using smart semantic techniques. Check out the NoSQL Exchange page for Russell's talk here, which includes an embedded video. But read the ACM article first for a broader overview, since Russell launches into technical details pretty quickly.

November 22, 2011

by John Esposito

· 12,613 Views

Adventures in Archiving with MongoDB

Most databases have them: a small handful of big tables that grow at a substantially faster rate than any of their peers (think tweets, clicks, check-ins, etc). Over time, the sheer size of these tables cause queries against them to slow down, and increase the size and time taken for backups. In some cases, not all of this data needs to be “live” so that it can be randomly accessed forever, and can be archived once some criteria has been met. I like to think of this as Eventually Irrelevant (if I may coin a term). This post was authored by Brian Ploetz Many database products support partitioning features, which allow you to arrange data physically based on some criteria (typically date, range, list, or hash-based partitioning). These partitioning features allow you to drop partitions from the live database when appropriate. While MongoDB supports horizontal partitioning via sharding, it does not currently support the notion of dropping shards. Even if it did, the criteria for what documents to drop from a collection would most likely not be based on your shard key, so it’s unlikely you’d want to drop a shard anyways. You’d want to drop data from all shards. Capped Collections and the forthcoming TTL-based Capped Collections feature are not quite what we’re looking for either. Capped Collections have very strict rules (including not being able to shard them), and for both normal and TTL-based Capped Collections, as old documents are aged out they are simply dropped on the floor. What we need is more control over the dropping process, such that we can guarantee that our data has been copied to an archive database/data warehouse before being dropped from the main database. I set out to find the most efficient way of manually archiving documents from a large collection given the tools currently available in MongoDB. For the purposes of this exercise, let’s assume we have a large collection of ad clicks which are associated with ads. We want to archive clicks which are associated with ads which expired more than 3 months ago. MapReduce MongoDB wants you to do bulk operations using MapReduce. Since all of the examples and documentation revolve around aggregation, I suspected I wouldn’t be able to use MapReduce as the backbone of my archiving process. After some experimenting, I found that my intuition was correct. Upon attempting to insert/remove documents within the map or reduce functions, I would quickly run into exception 10293 internal error: locks are not upgradeable errors: Sun Sep 25 21:22:24 [conn5] update warehouse.clicks query: { _id: ObjectId('4d6bf191db0b4b2b1fad5b65') } exception 10293 internal error: locks are not upgradeable: { "opid" : 4248173, "active" : false, "waitingForLock" : false, "op" : "update", "ns" : "?", "query" : { "_id" : { "$oid" : "4d6bf191db0b4b2b1fad5b65" } }, "client" : "0.0.0.0:0", "desc" : "conn" } 0ms Sun Sep 25 21:22:24 [conn5] Assertion: 10293:internal error: locks are not upgradeable: { "opid" : 4248174, "active" : false, "waitingForLock" : false, "op" : "update", "ns" : "?", "query" : { "_id" : { "$oid" : "4d6bf198db0b4b2b20ad5b65" } }, "client" : "0.0.0.0:0", "desc" : "conn" } 0x10008de9b 0x1002064f7 0x1002c1ec6 0x1002c4f3d 0x1002c5f7a 0x1000a8181 0x100147df4 0x1004dc68e 0x1004efc93 0x1004dc71b 0x1004dcb71 0x10049a078 0x100158efa 0x100377760 0x100389d0c 0x10034c204 0x10034d877 0x100180cc4 0x100184649 0x1002b9e89 0 mongod 0x000000010008de9b _ZN5mongo11msgassertedEiPKc + 315 1 mongod 0x00000001002064f7 _ZN5mongo10MongoMutex19_writeLockedAlreadyEv + 263 2 mongod 0x00000001002c1ec6 _ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE + 886 3 mongod 0x00000001002c4f3d _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5661 4 mongod 0x00000001002c5f7a _ZN5mongo14DBDirectClient3sayERNS_7MessageE + 106 5 mongod 0x00000001000a8181 _ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb + 273 6 mongod 0x0000000100147df4 _ZN5mongo12mongo_updateEP9JSContextP8JSObjectjPlS4_ + 660 7 mongod 0x00000001004dc68e js_Invoke + 3864 8 mongod 0x00000001004efc93 js_Interpret + 71932 9 mongod 0x00000001004dc71b js_Invoke + 4005 10 mongod 0x00000001004dcb71 js_InternalInvoke + 404 11 mongod 0x000000010049a078 JS_CallFunction + 86 12 mongod 0x0000000100158efa _ZN5mongo7SMScope6invokeEyRKNS_7BSONObjEib + 666 13 mongod 0x0000000100377760 _ZN5mongo2mr8JSMapper3mapERKNS_7BSONObjE + 96 14 mongod 0x0000000100389d0c _ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb + 1740 15 mongod 0x000000010034c204 _ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb + 628 16 mongod 0x000000010034d877 _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi + 2151 17 mongod 0x0000000100180cc4 _ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi + 52 18 mongod 0x0000000100184649 _ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 10585 19 mongod 0x00000001002b9e89 _ZN5mongo13receivedQueryERNS_6ClientERNS_10DbResponseERNS_7MessageE + 569 Upon killing the client which kicked off the MapReduce process, theses errors would continue to spin out of control, and I’d have to bounce the server. While MapReduce’s divide and conquer approach is ideal for what we’re trying to accomplish, unless the ability to obtain a write lock within a MapReduce function is supported, it is not a viable option for now. db.eval() While a server side script is certainly a viable option for writing an archiving process, it has a fundamental flaw in that it won’t work for sharded collections, which could be a non-starter for some folks. An example archiving process using db.eval() would look something like this: archiveClick = function archiveClick(doc) { if (expiredAdIds.indexOf(doc.ad_id) != -1) { archivedb = db.getMongo().getDB("archive"); archivedb.clicks.save(doc); // before we remove the original document, make sure the archive worked if (archivedb.getLastError() == null) { db.clicks.remove({_id: doc._id}); } else { throw "could not archive document with _id " + doc._id; } } } archiveClicks = function archiveClicks() { threeMonthsAgo = new Date(); threeMonthsAgo.setMonth(threeMonthsAgo.getMonth()-3); expiredAdIds = []; getExpiredAds = function(ad) {expiredAdIds.push(ad._id.str);} db.ads.find({end_date: {$lte: threeMonthsAgo}).forEach(getExpiredAds); db.system.js.save({_id: "expiredAdIds", value: expiredAdIds}); db.clicks.find().forEach(archiveClick); db.system.js.remove({_id: "expiredAdIds"}); } db.system.js.save({_id: "archiveClick", value: archiveClick}); db.system.js.save({_id: "archiveClicks", value: archiveClicks}); db.runCommand({$eval: "archiveClicks()", nolock: true}); Note the nolock: true sent to db.eval(). This ensures the mongod isn’t locked for other operations while this runs. The Holy Grail: Define Custom archive() Functions On Collections In order for our archiving process to work for sharded collections, we will need to add new functions to collection objects in the shell to perform the archiving. Frankly, this feels a lot more natural than db.eval(). I wanted to support two modes of archiving: immediate archiving, and a mark & sweep approach which allows you to queue documents to be archived at one point in time, and perform the actual archiving at a later time (off-peak hours). I’ve made these functions available in my mongodb-archive project on GitHub. Download the archive.js file, and use it like so: load("archive.js"); threeMonthsAgo = new Date(); threeMonthsAgo.setMonth(threeMonthsAgo.getMonth()-3); expiredAdIds = []; getExpiredAds = function(ad) {expiredAdIds.push(ad._id.str);} db.ads.find({end_date: {$lte: threeMonthsAgo}).forEach(getExpiredAds); archiveConnection = new Mongo("localhost:27018"); archiveDB = archiveConnection.getDB("archive"); archiveCollection = archiveDB.getCollection("clicks"); for (var i = 0; i < expiredAdIds.length; i++) { print("archiving clicks for ad_id: " + expiredAdIds[i]); db.clicks.archive({"ad_id": expiredAdIds[i]}, archiveCollection); print(""); } The mark/sweep variant would look like: db.clicks.queueForArchive({"ad_id": expiredAdIds[i]}); // ....some time later..... db.clicks.archiveQueued(archiveCollection); The biggest factors on the performance of the archiving process are really no different than the things that impact the performance of your MongoDB database overall: working set size, indexes, disk speed, and lock contention. The more indexes there are on the source collection, the longer the remove() will take. The more indexes there are on the archive collection, the longer the save() will take. The more of the data that you wish to archive is resident in memory, the faster the process will be. On my MacBook Pro (2.3GHz i7, 8GB RAM) with a 5400 RPM SATA drive and a clicks collection containing over 11 million documents, I was able to archive ~500 documents per second with the immediate archiving approach. Example output from mongostat when this was running: Archive DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 1 0 0 3 0 208m 2.85g 15m 3.8 0 0|0 0|0 1k 1k 2 17:13:02 0 0 45 0 0 46 0 208m 2.85g 15m 3.4 0 0|0 0|0 61k 6k 2 17:13:03 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 8 0 0 9 0 208m 2.85g 15m 0.1 0 0|0 0|0 10k 2k 2 17:13:04 0 0 47 0 0 48 0 208m 2.85g 15m 3.4 0 0|0 0|0 66k 7k 2 17:13:05 0 0 0 0 0 1 1 208m 2.85g 15m 0 0 0|0 0|0 62b 1k 2 17:13:06 0 0 18 0 0 19 0 208m 2.85g 15m 2.6 0 0|0 0|0 24k 3k 2 17:13:07 0 0 79 0 0 80 0 208m 2.85g 16m 1.1 0 0|0 0|0 111k 11k 2 17:13:08 0 0 131 0 0 132 0 208m 2.85g 16m 8.8 0 0|0 0|0 181k 17k 2 17:13:09 0 0 216 0 0 217 0 208m 2.85g 16m 3.8 0 0|0 0|0 291k 27k 2 17:13:10 0 0 425 0 0 426 0 208m 2.85g 17m 10.5 0 0|0 0|0 600k 53k 2 17:13:11 0 0 490 0 0 490 0 208m 2.85g 19m 7.3 0 0|0 0|1 658k 61k 2 17:13:12 0 0 630 0 0 632 0 208m 2.85g 20m 11.3 0 0|0 0|0 844k 78k 2 17:13:13 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 818 0 0 818 0 208m 2.85g 22m 13.6 0 0|0 0|0 1m 101k 2 17:13:14 0 0 686 0 0 688 0 208m 2.85g 21m 9.6 0 0|0 0|0 932k 85k 2 17:13:15 0 0 0 0 0 1 0 208m 2.85g 22m 0 0 0|0 0|0 62b 1k 2 17:13:16 0 0 178 0 0 179 0 208m 2.85g 22m 1.2 0 0|0 0|0 252k 23k 2 17:13:17 0 0 974 0 0 975 0 208m 2.85g 23m 12.3 0 0|0 0|0 1m 121k 2 17:13:18 0 0 920 0 0 921 0 208m 2.85g 26m 10.1 0 0|0 0|0 1m 114k 2 17:13:19 0 0 810 0 0 811 0 208m 2.85g 26m 8.1 0 0|0 0|0 1m 100k 2 17:13:20 0 0 612 0 0 613 0 208m 2.85g 27m 8.1 0 0|0 0|0 798k 76k 2 17:13:21 0 0 0 0 0 1 0 208m 2.85g 27m 0 0 0|0 0|0 62b 1k 2 17:13:22 0 0 0 0 0 1 0 208m 2.85g 27m 0 0 0|0 0|0 62b 1k 2 17:13:23 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 30 0 0 31 0 208m 2.85g 27m 1.1 0 0|0 0|0 46k 5k 2 17:13:24 0 0 852 0 0 853 0 208m 2.85g 27m 14.8 0 0|0 0|0 1m 106k 2 17:13:25 0 0 768 0 0 769 0 208m 2.85g 29m 9.5 0 0|0 0|0 1m 95k 2 17:13:26 0 0 867 0 0 868 0 208m 2.85g 32m 13.4 0 0|0 0|0 1m 107k 2 17:13:27 0 0 705 0 0 706 0 208m 2.85g 31m 10.8 0 0|0 0|0 936k 88k 2 17:13:28 0 0 331 0 0 332 0 208m 2.85g 32m 3.9 0 0|0 0|0 446k 42k 2 17:13:29 0 0 0 0 0 1 0 208m 2.85g 32m 0 0 0|0 0|0 62b 1k 2 17:13:30 0 0 551 0 0 552 0 208m 2.85g 32m 6.1 0 0|0 0|0 739k 69k 2 17:13:31 0 0 728 0 0 729 0 208m 2.85g 35m 12.2 0 0|0 0|0 949k 90k 2 17:13:32 0 0 991 0 0 992 0 208m 2.85g 35m 11.7 0 0|0 0|0 1m 123k 2 17:13:33 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 935 0 0 935 0 208m 2.85g 38m 10.9 0 0|0 0|1 1m 116k 2 17:13:34 0 0 431 0 0 433 0 208m 2.85g 39m 5.9 0 0|0 0|0 563k 54k 2 17:13:35 0 0 0 0 0 1 0 208m 2.85g 39m 0 0 0|0 0|0 62b 1k 2 17:13:36 0 0 518 0 0 519 0 208m 2.85g 40m 5 0 0|0 0|0 693k 65k 2 17:13:37 0 0 904 0 0 905 0 208m 2.85g 40m 8.9 0 0|0 0|0 1m 112k 2 17:13:38 0 0 958 0 0 959 0 208m 2.85g 41m 14.2 0 0|0 0|0 1m 119k 2 17:13:39 0 0 899 0 0 900 0 208m 2.85g 44m 11.1 0 0|0 0|0 1m 111k 2 17:13:40 0 0 401 0 0 402 0 208m 2.85g 42m 5.9 0 0|0 0|0 540k 50k 2 17:13:41 0 0 856 0 0 857 0 208m 2.85g 44m 9.4 0 0|0 0|0 1m 106k 2 17:13:42 0 0 232 0 0 233 0 208m 2.85g 45m 2.9 0 0|0 0|0 311k 29k 1 17:13:43 Source DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 2 0 12 1 13 0 56g 115g 2.82g 52.2 0 0|0 0|1 1k 822k 2 17:13:02 0 0 0 42 0 43 0 56g 115g 2.65g 88.1 0 0|0 0|1 5k 4k 2 17:13:03 0 0 0 14 0 15 1 56g 115g 1.88g 101 0 0|0 0|1 1k 2k 2 17:13:04 0 0 0 33 1 35 0 56g 115g 1.89g 53.8 16.6 0|0 1|0 4k 4k 2 17:13:05 0 0 0 0 0 1 0 56g 115g 1.9g 0 0 0|0 1|0 62b 1k 2 17:13:06 0 0 0 34 0 35 0 56g 115g 1.9g 47.3 0 0|0 0|0 4k 4m 2 17:13:07 0 0 0 91 0 92 0 56g 115g 1.89g 86.8 0 0|0 0|0 12k 8k 2 17:13:08 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 149 0 150 0 56g 115g 1.9g 82.5 0 0|0 0|0 20k 13k 2 17:13:09 0 0 0 287 0 287 0 56g 115g 1.9g 74.3 0 0|0 0|1 38k 25k 2 17:13:10 0 0 0 387 0 389 0 56g 115g 1.91g 64.6 0 0|0 0|0 52k 33k 2 17:13:11 0 0 0 580 0 581 0 56g 115g 1.93g 59 0 0|0 0|0 78k 49k 2 17:13:12 0 0 0 685 0 686 0 56g 115g 1.93g 47.9 0 0|0 0|0 93k 58k 2 17:13:13 0 0 0 729 0 730 0 56g 115g 1.95g 42.6 0 0|0 0|0 99k 61k 2 17:13:14 0 0 0 551 1 552 0 56g 115g 1.97g 34.5 0 0|0 1|0 74k 47k 2 17:13:15 0 0 0 0 0 1 0 56g 115g 1.98g 0 0 0|0 1|0 62b 1k 2 17:13:16 0 0 0 368 0 368 0 56g 115g 1.96g 20 0 0|0 0|1 50k 4m 2 17:13:17 0 0 0 1032 0 1034 0 56g 115g 2g 35.9 0 0|0 0|0 140k 87k 2 17:13:18 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 834 0 835 0 56g 115g 2.01g 42.3 0 0|0 0|0 113k 70k 2 17:13:19 0 0 0 922 0 922 0 56g 115g 2.02g 38.9 0 0|0 0|1 125k 77k 2 17:13:20 0 0 0 338 1 340 0 56g 115g 2.03g 13 0 0|0 1|0 46k 29k 2 17:13:21 0 0 0 0 0 1 0 56g 115g 2.04g 0 0 0|0 1|0 62b 1k 2 17:13:22 0 0 0 0 0 1 0 56g 115g 2.04g 0 0 0|0 1|0 62b 1k 2 17:13:23 0 0 0 185 0 186 0 56g 115g 2.02g 24.7 0 0|0 0|0 25k 4m 2 17:13:24 0 0 0 920 0 920 0 56g 115g 2.02g 29.9 0 0|0 0|1 125k 77k 2 17:13:25 0 0 0 741 0 742 0 56g 115g 2.04g 49 0 0|0 0|1 100k 62k 2 17:13:26 0 0 0 886 0 887 0 56g 115g 2.05g 33 0 0|0 0|1 120k 74k 2 17:13:27 0 0 0 683 0 684 0 56g 115g 2.08g 57.3 0 0|0 0|1 92k 58k 2 17:13:28 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 138 1 140 0 56g 115g 2.06g 14.5 0 0|0 1|0 18k 12k 2 17:13:29 0 0 0 12 0 12 0 56g 115g 2.07g 0.2 0 0|0 0|1 1k 4m 2 17:13:30 0 0 0 761 0 763 0 56g 115g 2.08g 46.5 0 0|0 0|0 103k 64k 2 17:13:31 0 0 0 762 0 762 0 56g 115g 2.09g 47 0 0|0 0|1 103k 64k 2 17:13:32 0 0 0 957 0 959 0 56g 115g 2.1g 35.3 0 0|0 0|0 130k 80k 2 17:13:33 0 0 0 905 0 905 0 56g 115g 2.13g 38.5 0 0|1 0|1 123k 76k 2 17:13:34 0 0 0 239 1 241 0 56g 115g 2.12g 14.9 0 0|0 1|0 32k 21k 2 17:13:35 0 0 0 0 0 1 0 56g 115g 2.13g 0 0 0|0 1|0 62b 1k 2 17:13:36 0 0 0 780 0 781 0 56g 115g 2.13g 42.8 0 0|0 0|0 106k 4m 2 17:13:37 0 0 0 805 0 806 0 56g 115g 2.14g 39.8 0 0|0 0|0 109k 68k 2 17:13:38 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 988 0 989 0 56g 115g 2.16g 34.8 0 0|0 0|0 134k 83k 2 17:13:39 0 0 0 953 0 953 0 56g 115g 2.17g 39.9 0 0|0 0|1 129k 80k 2 17:13:40 0 0 0 375 1 376 0 56g 115g 2.18g 13.9 0 0|1 0|1 51k 1m 2 17:13:41 0 0 0 867 0 869 0 56g 115g 2.19g 41.7 0 0|0 0|0 118k 73k 1 17:13:42 Total: MongoDB shell version: 2.0.1 connecting to: localhost:27017/prodcopy archiving clicks for ad_id: 4d6c12497b90420373000056 archiving documents... archived 19045 documents in 40701ms. For the queued approach with this same set up, I was able to queue ~1200 documents per second, and the archiving process performed at about the same ~500 documents per second. Example output from mongostat while this was running: Archive DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:44 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:45 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:46 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:47 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:48 0 0 0 0 0 1 0 208m 2.85g 18m 0 0 0|0 0|0 62b 1k 2 17:16:49 0 0 248 0 0 249 0 208m 2.85g 17m 6 0 0|0 0|0 283k 31k 2 17:16:50 0 0 780 0 0 781 0 208m 2.85g 18m 8.2 0 0|0 0|0 883k 97k 2 17:16:51 0 0 1256 0 0 1257 0 208m 2.85g 22m 13.7 0 0|0 0|0 1m 155k 2 17:16:52 0 0 1084 0 0 1085 0 208m 2.85g 21m 10 0 0|0 0|0 1m 134k 2 17:16:53 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 1108 0 0 1109 0 208m 2.85g 25m 9.8 0 0|0 0|0 1m 137k 2 17:16:54 0 0 1108 0 0 1109 0 208m 2.85g 24m 9.8 0 0|0 0|0 1m 137k 2 17:16:55 0 0 1110 0 0 1111 0 208m 2.85g 27m 11 0 0|0 0|0 1m 137k 2 17:16:56 0 0 1166 0 0 1167 0 208m 2.85g 31m 11.3 0 0|0 0|0 1m 144k 2 17:16:57 0 0 1246 0 0 1247 0 208m 2.85g 31m 10.6 0 0|0 0|0 1m 154k 2 17:16:58 0 0 1094 0 0 1095 0 208m 2.85g 35m 9.6 0 0|0 0|0 1m 135k 2 17:16:59 0 0 957 0 0 958 0 208m 2.85g 33m 9.6 0 0|0 0|0 1m 119k 2 17:17:00 0 0 721 0 0 722 0 464m 3.35g 38m 8.9 0 0|0 0|0 1m 90k 2 17:17:01 0 0 243 0 0 243 0 464m 3.35g 34m 9.7 0 0|0 0|0 445k 31k 2 17:17:02 0 0 921 0 0 923 0 464m 3.35g 37m 30.1 0 0|0 0|0 1m 114k 2 17:17:03 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 668 0 0 669 0 464m 3.35g 38m 9.6 0 0|0 0|0 962k 83k 2 17:17:04 0 0 995 0 0 995 0 464m 3.35g 41m 21.2 0 0|0 0|1 1m 123k 2 17:17:05 0 0 467 0 0 468 0 464m 3.35g 27m 12 0 0|1 0|1 679k 58k 2 17:17:06 0 0 1 0 0 2 1 464m 3.35g 17m 85.5 0 0|1 0|1 1k 1k 2 17:17:07 0 0 337 0 0 338 0 464m 3.35g 19m 36.2 0 0|0 0|1 499k 42k 2 17:17:08 0 0 917 0 0 919 0 464m 3.35g 21m 20.4 0 0|0 0|0 1m 114k 2 17:17:09 0 0 1022 0 0 1022 0 464m 3.35g 25m 22.7 0 0|0 0|1 1m 126k 2 17:17:10 0 0 970 0 0 971 0 464m 3.35g 24m 27.4 0 0|0 0|1 1m 120k 2 17:17:11 0 0 166 0 0 168 0 464m 3.35g 25m 1.2 0 0|0 0|0 251k 21k 2 17:17:12 0 0 862 0 0 863 0 464m 3.35g 28m 23.1 0 0|0 0|0 1m 107k 2 17:17:13 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 888 0 0 889 0 464m 3.35g 28m 28.8 0 0|0 0|0 1m 110k 2 17:17:14 0 0 540 0 0 541 0 464m 3.35g 30m 12.2 0 0|0 0|0 802k 67k 2 17:17:15 0 0 879 0 0 880 0 464m 3.35g 31m 22.3 0 0|0 0|0 1m 109k 2 17:17:16 0 0 473 0 0 474 0 464m 3.35g 36m 17.2 0 0|0 0|0 1m 59k 1 17:17:17 Source DB: insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 56g 115g 1.86g 0 0 0|0 0|0 62b 1k 1 17:16:29 0 0 0 0 0 1 0 56g 115g 1.86g 0 0 0|0 0|0 62b 1k 1 17:16:30 1 1 1 0 1 3 0 56g 115g 1.87g 51.1 0 0|0 0|1 475b 697k 2 17:16:31 0 0 0 0 0 1 0 56g 115g 349m 129 0 0|0 0|1 62b 1k 2 17:16:32 0 0 0 0 0 1 0 56g 115g 367m 101 0 0|0 0|1 62b 1k 2 17:16:33 0 0 0 0 0 1 0 56g 115g 384m 73.4 0 0|0 0|1 62b 1k 2 17:16:34 0 0 0 0 0 1 0 56g 115g 428m 112 0 0|0 0|1 62b 1k 2 17:16:35 0 0 0 0 0 1 0 56g 115g 436m 94.8 0 0|0 0|1 62b 1k 2 17:16:36 0 0 0 0 0 1 0 56g 115g 450m 108 0 0|0 0|1 62b 1k 2 17:16:37 0 0 0 0 0 1 0 56g 115g 461m 110 0 0|0 0|1 62b 1k 2 17:16:38 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 56g 115g 480m 95.5 0 0|0 0|1 62b 1k 2 17:16:39 0 0 0 0 0 1 0 56g 115g 494m 73.5 0 0|0 0|1 62b 1k 2 17:16:40 0 0 0 0 0 1 0 56g 115g 531m 120 0 0|0 0|1 62b 1k 2 17:16:41 0 0 0 0 0 1 0 56g 115g 541m 108 0 0|0 0|1 62b 1k 2 17:16:42 0 0 0 0 0 1 0 56g 115g 564m 74.6 0 0|0 0|1 62b 1k 2 17:16:43 0 0 0 0 0 1 0 56g 115g 579m 127 0 0|0 0|1 62b 1k 2 17:16:44 0 0 0 0 0 1 0 56g 115g 605m 100 0 0|0 0|1 62b 1k 2 17:16:45 0 0 0 0 0 1 0 56g 115g 631m 98.5 0 0|0 0|1 62b 1k 2 17:16:46 0 0 0 0 0 1 0 56g 115g 657m 89.1 0 0|0 0|1 62b 1k 2 17:16:47 0 0 0 0 0 1 0 56g 115g 662m 87.8 0 0|0 0|1 62b 1k 2 17:16:48 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 0 0 1 0 56g 115g 680m 93.5 0 0|0 0|1 62b 1k 2 17:16:49 0 1 0 530 1 531 0 56g 115g 767m 81.3 0 0|0 0|1 72k 4m 2 17:16:50 0 0 0 884 0 886 0 56g 115g 721m 48 0 0|0 0|0 120k 74k 2 17:16:51 0 0 0 1079 0 1080 0 56g 115g 698m 33.6 0 0|0 0|0 146k 90k 2 17:16:52 0 0 0 1294 0 1294 0 56g 115g 720m 23.8 0 0|1 0|1 175k 108k 2 17:16:53 0 0 0 1118 1 1120 0 56g 115g 740m 31.9 0 0|0 0|0 152k 4m 2 17:16:54 0 0 0 1111 0 1112 0 56g 115g 718m 33.3 0 0|0 0|0 151k 93k 2 17:16:55 0 0 0 1097 0 1098 0 56g 115g 706m 33.1 0 0|0 0|0 149k 92k 2 17:16:56 0 0 0 1097 0 1098 0 56g 115g 701m 33.2 0 0|0 0|0 149k 92k 2 17:16:57 0 0 0 1091 1 1092 0 56g 115g 696m 35.1 0 0|0 0|0 148k 4m 2 17:16:58 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 1149 0 1150 0 56g 115g 718m 21.3 0 0|0 0|0 156k 96k 2 17:16:59 0 0 0 1091 0 1091 0 56g 115g 707m 31.2 0 0|0 0|1 148k 91k 2 17:17:00 0 0 0 468 0 470 0 56g 115g 714m 9.3 0 0|0 0|0 63k 40k 2 17:17:01 0 0 0 332 0 333 0 56g 115g 694m 27.3 0 0|0 0|0 45k 28k 2 17:17:02 0 0 0 1013 1 1014 0 56g 115g 713m 17.8 0 0|0 0|0 137k 4m 2 17:17:03 0 0 0 551 0 552 0 56g 115g 691m 39.9 0 0|0 0|0 74k 47k 2 17:17:04 0 0 0 1140 0 1141 0 56g 115g 709m 19.7 0 0|0 0|0 155k 95k 2 17:17:05 0 0 0 126 0 127 0 56g 115g 709m 2.1 0 0|0 0|0 17k 11k 2 17:17:06 0 0 0 92 0 92 0 56g 115g 710m 1.6 0 0|0 0|1 12k 8k 2 17:17:07 0 0 0 605 1 607 0 56g 115g 700m 29.1 0 0|0 0|0 82k 4m 2 17:17:08 insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn time 0 0 0 814 0 814 0 56g 115g 699m 17.3 0 0|0 0|1 110k 68k 2 17:17:09 0 0 0 1036 0 1037 0 56g 115g 708m 18.4 0 0|0 0|0 140k 87k 2 17:17:10 0 0 0 820 0 821 0 56g 115g 715m 14.4 0 0|1 0|1 111k 69k 2 17:17:11 0 0 0 298 0 300 0 56g 115g 683m 24.2 0 0|0 0|0 40k 26k 2 17:17:12 0 0 0 907 1 908 0 56g 115g 699m 20.4 0 0|0 0|0 123k 4m 2 17:17:13 0 0 0 946 0 947 0 56g 115g 707m 16.4 0 0|0 0|0 128k 79k 2 17:17:14 0 0 0 374 0 375 0 56g 115g 677m 36.7 0 0|0 0|0 50k 32k 2 17:17:15 0 0 0 905 1 905 0 56g 115g 689m 16.2 0 0|0 0|0 123k 821k 2 17:17:16 0 0 0 259 0 261 0 56g 115g 690m 5.1 0 0|0 0|0 35k 22k 1 17:17:17 0 0 0 0 0 1 0 56g 115g 690m 0 0 0|0 0|0 62b 1k 1 17:17:18 Total: MongoDB shell version: 2.0.1 connecting to: localhost:27017/prodcopy archiving clicks for ad_id: 4d6c12497b90420373000062 ensuring sparse index on field "arch" marking documents for archive... queued 22227 documents for archive in 19369ms. archiving queued documents to archive.clicks... archived 22227 queued documents to archive.clicks in 26901ms. Upgrading the hard drive to an SSD, I was able to almost triple the performance of the archiving step, and archive ~1300 documents per second. Obviously this is all a giant hack, and it would be nice if MongoDB supported some kind of partitioning feature so we didn’t have to do any of this manually. But until then, this is the best I could get given the tools currently available. Source: http://blog.brianploetz.com/post/12131083486/adventures-in-archiving-with-mongodb

November 21, 2011

by Mitch Pronschinske

· 14,417 Views

Freight Management System on NetBeans

Lynden is a family of transportation and logistics companies specialized in shipping to Alaska and other locations worldwide. Over land, on the water, in the air - or in any combination - Lynden has been helping customers solve transportation problems for over a century. The Lynden Freight Management System is a NetBeans Platform application which serves a dual purpose as both a planning and freight tracking tool. The Planning module allows terminal managers to see all freight that is currently inbound to their location as well as freight that is scheduled to depart from their location so they can make the most efficient use of their dock space and resources as possible. The Trace module allows customer service personnel to search for customer account information, view the tracking history of any given freight item in the system as well as display any documents related to the shipment, such as bills of lading or delivery receipts. NetBeans Platform Lynden has benefited from the NetBeans Platform as it allows developers to focus on the business logic of our applications rather than the underlying "plumbing". We are able to leverage built-in support for event handling, enable/disable functionality on UI controls, dockable windows, and automatic updates for our application with minimal work compared to rolling our own framework. We chose to go the desktop application route as we have a number of existing desktop applications here that this application will likely need to interface with at some point, as well as a commercial set of rich UI components that we have been using for some time now. For the initial deployment, we will be pushing the installer out to employee PCs via the Landesk remote desktop administration tool. Future updates to various modules within the application will be done via the update center functionality built into the NetBeans Platform. Screenshots

November 19, 2011

by Rob Terpilowski

· 11,786 Views · 3 Likes

ASP.NET MVC: Converting business objects to select list items

Some of our business classes are used to fill dropdown boxes or select lists. And often you have some base class for all your business classes. In this posting I will show you how to use base business class to write extension method that converts collection of business objects to ASP.NET MVC select list items without writing a lot of code. BusinessBase, BaseEntity and other base classes I prefer to have some base class for all my business classes so I can easily use them regardless of their type in contexts I need. NB! Some guys say that it is good idea to have base class for all your business classes and they also suggest you to have mappings done same way in database. Other guys say that it is good to have base class but you don’t have to have one master table in database that contains identities of all your business objects. It is up to you how and what you prefer to do but whatever you do – think and analyze first, please. :) To keep things maximally simple I will use very primitive base class in this example. This class has only Id property and that’s it. public class BaseEntity { public virtual long Id { get; set; } } Now we have Id in base class and we have one more question to solve – how to better visualize our business objects? To users ID is not enough, they want something more informative. We can define some abstract property that all classes must implement. But there is also another option we can use – overriding ToString() method in our business classes. public class Product : BaseEntity { public virtual string SKU { get; set; } public virtual string Name { get; set; } public override string ToString() { if (string.IsNullOrEmpty(Name)) return base.ToString(); return Name; } } Although you can add more functionality and properties to your base class we are at point where we have what we needed: identity and human readable presentation of business objects. Writing list items converter Now we can write method that creates list items for us. public static class BaseEntityExtensions { public static IEnumerable ToSelectListItems (this IList baseEntities) where T : BaseEntity { return ToSelectListItems((IEnumerator) baseEntities.GetEnumerator()); } public static IEnumerable ToSelectListItems (this IEnumerator baseEntities) { var items = new HashSet(); while (baseEntities.MoveNext()) { var item = new SelectListItem(); var entity = baseEntities.Current; item.Value = entity.Id.ToString(); item.Text = entity.ToString(); items.Add(item); } return items; } } You can see here to overloads of same method. One works with List and the other with IEnumerator. Although mostly my repositories return IList when querying data there are always situations where I can use more abstract types and interfaces. Using extension methods in code In your code you can use ToSelectListItems() extension methods like shown on following code fragment. ... var model = new MyFormModel(); model.Statuses = _myRepository.ListStatuses().ToSelectListItems(); ... You can call this method on all your business classes that extend your base entity. Wanna have some fun with this code? Write overload for extension method that accepts selected item ID.

November 11, 2011

by Gunnar Peipman

· 13,667 Views