DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
Java EE: The Basics
wanted to go through some of the basic tenets, the technical terminology related to java ee. for many people, java ee/j2ee still mean servlets, jsps or maybe struts at best. no offence or pun intended! this is not a java ee 'bible' by any means. i am not capable enough of writing such a thing! so let us line up the 'keywords' related to java ee and then look at them one by one java ee java ee apis (specifications) containers services multitiered applications components let's try to elaborate on the above mentioned points. ok. so what is java ee? 'ee' stands for enterprise edition. that essentially makes java ee - java enterprise edition. if i had to summarize java ee in a couple of sentences, it would go something like this "java ee is a platform which defines 'standard specifications/apis' which are then implemented by vendors and used for development of enterprise (distributed, 'multi-tired', robust) 'applications'. these applications are composed of modules or 'components' which use java ee 'containers' as their run-time infrastructure." what is this 'standardized platform' based upon? what does it constitute? the platform revolves around 'standard' specifications or apis . think of these as contracts defined by a standard body e.g. enterprise java beans (ejb), java persistence api (jpa), java message service (jms) etc. these contracts/specifications/apis are implemented by different vendors e.g. glassfish, oracle weblogic, apache tomee etc alright. what about containers? containers can be visualized as 'virtual/logical partitions' . each container supports a subset of the apis/specifications defined by the java ee platform they provide run-time 'services' to the 'applications' which they host the java ee specification lists 4 types of containers ejb container web container application client container applet container java ee containers i am not going to dwell into details of these containers in this post. services?? well, 'services' are nothing but a result of the vendor implementations of the standard 'specifications' (mentioned above). examples of specifications are - jersey for jax-rs (restful services), tyrus (web sockets), eclipselink (jpa), weld (cdi) etc. the 'container' is the interface between the deployed application ('service' consumer) and the application server. here is a list of 'services' which are rendered by the 'container' to the underlying 'components' (this is not an exhaustive list) persistence - offered by the java persistence api (jpa) which drives object relational mapping (orm) and an abstraction for the database operations. messaging - the java message service (jms) provides asynchronous messaging between disparate parts of your applications. contexts & dependency injection - cdi provides loosely coupled and type safe injection of resources. web services - jaxrs and jaxws provide support for rest and soap style services respectively transaction - provided by the java transaction api (jta) implementation what is a typical java ee 'application'? what does it comprise of? applications are composed of different ' components ' which in turn are supported by their corresponding ' container ' supported 'component' types are: enterprise applications - make use of the specifications like ejb, jms, jpa etc and are executed within an ejb container web applications - they leverage the servlet api, jsp, jsf etc and are supported by a web container application client - executed in client side. they need an application client container which has a set of supported libraries and executes in a java se environment. applets - these are gui applications which execute in a web browser. how are java ee applications structured? as far as java ee 'application' architecture is concerned, they generally tend follow the n-tier model consisting of client tier, server tier and of course the database (back end) tier client tier - consists of web browsers or gui (swing, java fx) based clients. web browsers tend to talk to the 'web components' on the server tier while the gui clients interact directly with the 'business' layer within the server tier server tier - this tier comprises of the dynamic web components (jsp, jsf, servlets) and the business layer driven by ejbs, jms, jpa, jta specifications. database tier - contains 'enterprise information systems' backed by databases or even legacy data repositories. generic 3-tier java ee application architecture java ee - bare bones, basics.... as quickly and briefly as i possibly could. that's all for now! :-) stay tuned for more java ee content, specifically around the latest and greatest version of the java ee platform --> java ee 7 happy reading!
April 29, 2014
by Abhishek Gupta DZone Core CORE
· 40,595 Views · 3 Likes
article thumbnail
Neo4j & Cypher: Creating a Time Tree Down to the Day
Michael recently wrote a blog post showing how to create a time tree representing time down to the second using Neo4j’s Cypher query language, something I built on top of for a side project I’m working on. The domain I want to model is RSVPs to meetup invites – I want to understand how much in advance people respond and how likely they are to drop out at a later stage. For this problem I only need to measure time down to the day so my task is a bit easier than Michael’s. After a bit of fiddling around with leap years, I believe the following query will create a time tree representing all the days from 2011 – 2014, which covers the time the London Neo4j meetup has been running: WITH range(2011, 2014) AS years, range(1,12) as months FOREACH(year IN years | MERGE (y:Year {year: year}) FOREACH(month IN months | CREATE (m:Month {month: month}) MERGE (y)-[:HAS_MONTH]->(m) FOREACH(day IN (CASE WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31) WHEN month = 2 THEN CASE WHEN year % 4 <> 0 THEN range(1,28) WHEN year % 100 <> 0 THEN range(1,29) WHEN year % 400 <> 0 THEN range(1,29) ELSE range(1,28) END ELSE range(1,30) END) | CREATE (d:Day {day: day}) MERGE (m)-[:HAS_DAY]->(d)))) The next step is to link adjacent days together so that we can easily traverse between adjacent days without needing to go back up and down the tree. For example we should have something like this: (jan31)-[:NEXT]->(feb1)-[:NEXT]->(feb2) We can build this by first collecting all the ‘day’ nodes in date order like so: MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day) WITH year,month,day ORDER BY year.year, month.month, day.day WITH collect(day) as days RETURN days And then iterating over adjacent nodes to create the ‘NEXT’ relationship: MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day) WITH year,month,day ORDER BY year.year, month.month, day.day WITH collect(day) as days FOREACH(i in RANGE(0, length(days)-2) | FOREACH(day1 in [days[i]] | FOREACH(day2 in [days[i+1]] | CREATE UNIQUE (day1)-[:NEXT]->(day2)))) Now if we want to find the previous 5 days from the 1st February 2014 we could write the following query: MATCH (y:Year {year: 2014})-[:HAS_MONTH]->(m:Month {month: 2})-[:HAS_DAY]->(:Day {day: 1})<-[:NEXT*0..5]-(day) RETURN y,m,day If we want to we can create the time tree and then connect the day nodes all in one query by using ‘WITH *’ like so: WITH range(2011, 2014) AS years, range(1,12) as months FOREACH(year IN years | MERGE (y:Year {year: year}) FOREACH(month IN months | CREATE (m:Month {month: month}) MERGE (y)-[:HAS_MONTH]->(m) FOREACH(day IN (CASE WHEN month IN [1,3,5,7,8,10,12] THEN range(1,31) WHEN month = 2 THEN CASE WHEN year % 4 <> 0 THEN range(1,28) WHEN year % 100 <> 0 THEN range(1,29) WHEN year % 400 <> 0 THEN range(1,29) ELSE range(1,28) END ELSE range(1,30) END) | CREATE (d:Day {day: day}) MERGE (m)-[:HAS_DAY]->(d)))) WITH * MATCH (year:Year)-[:HAS_MONTH]->(month)-[:HAS_DAY]->(day) WITH year,month,day ORDER BY year.year, month.month, day.day WITH collect(day) as days FOREACH(i in RANGE(0, length(days)-2) | FOREACH(day1 in [days[i]] | FOREACH(day2 in [days[i+1]] | CREATE UNIQUE (day1)-[:NEXT]->(day2)))) Now I need to connect the RSVP events to the tree!
April 29, 2014
by Mark Needham
· 4,841 Views
article thumbnail
Neo4j 2.0.0: Query Not Prepared Correctly / Type Mismatch: Expected Map
I was playing around with Neo4j’s Cypher last weekend and found myself accidentally running some queries against an earlier version of the Neo4j 2.0 series (2.0.0). My first query started with a map and I wanted to create a person from an identifier inside the map: WITH {person: {id: 1} AS params MERGE (p:Person {id: params.person.id}) RETURN p When I ran the query I got this error: ==> SyntaxException: Type mismatch: expected Map but was Boolean, Number, String or Collection (line 1, column 62) ==> "WITH {person: {id: 1} AS params MERGE (p:Person {id: params.person.id}) RETURN p" If we try the same query in 2.0.1 it works as we’d expect: ==> +---------------+ ==> | p | ==> +---------------+ ==> | Node[1]{id:} | ==> +---------------+ ==> 1 row ==> Nodes created: 1 ==> Properties set: 1 ==> Labels added: 1 ==> 47 ms My next query was the following which links topics of interest to a person: WITH {topics: [{name: "Java"}, {name: "Neo4j"}]} AS params MERGE (p:Person {id: 2}) FOREACH(t IN params.topics | MERGE (topic:Topic {name: t.name}) MERGE (p)-[:INTERESTED_IN]->(topic) ) RETURN p In 2.0.0 that query fails like so: ==> InternalException: Query not prepared correctly! but if we try it in 2.0.1 we’ll see that it works as well: ==> +---------------+ ==> | p | ==> +---------------+ ==> | Node[4]{id:2} | ==> +---------------+ ==> 1 row ==> Nodes created: 1 ==> Relationships created: 2 ==> Properties set: 1 ==> Labels added: 1 ==> 53 ms So if you’re seeing either of those errors, then get yourself upgraded to 2.0.1 as well!
April 28, 2014
by Mark Needham
· 4,429 Views
article thumbnail
Update Row With Highest ID In MySQL
Recently needed to update the last inserted row of a table but didn't have anyway in knowing what the highest ID in the table was. I can easily do this by using the max() function to select the highest ID in the table. SELECT MAX(id) FROM table; Then I can use the result of this query in the UPDATE query to edit the record with the highest ID. But this is quite a easy query so I should be able to do this in one query by using a nested select query on the UPDATE. UPDATE table SET name='test_name' WHERE id = (SELECT max(id) FROM table) But the problem with this is that the MAX() function doesn't work inside a nested select so had to find another way of doing this. I found out that you can use an ORDER BY and a LIMIT in an UPDATE query therefore I can use a combination of these in the UPDATE query to make sure I only update the record with the highest ID, by doing a descendant order on the ID and limiting the return to only 1 record. UPDATE table SET name='test_name' ORDER BY id DESC LIMIT 1;
April 23, 2014
by Paul Underwood
· 21,532 Views
article thumbnail
Writing effective custom queries in Hibernate
There are many instances where we will have to write custom queries with hibernate. I always hated writing custom queries due to following reasons Hibernate returns List of Object arrays (List getSalaryByDepartment() { Session session = HibernateUtil.getSessionFactory().openSession(); try { List salaryByDepartments = new ArrayList(); for (Object object : contactList) { Object[] result = (Object[]) object; SalaryByDepartment salaryByDepartment = new SalaryByDepartment(); salaryByDepartment.setDeptId((Integer) result[0]); salaryByDepartment.setDepartmentName((String) result[1]); salaryByDepartment.setSalary((Double) result[2]); salaryByDepartments.add(salaryByDepartment); } return salaryByDepartments; } catch (HibernateException e) { e.printStackTrace(); return null; } finally { session.close(); } } There is a better mechanism for writing the custom queries called 'select new' but it is not widely used for some reason. It might be an overkill to write all custom queries using this mechanism, but is good for queries which return more than 2 columns. public List getNewSalaryByDepartment() { Session session = HibernateUtil.getSessionFactory().openSession(); try { List salaryByDept = session.createQuery("select " + "new hsqldb.results.SalaryByDepartment(department.id, department.departmentName, sum(employee.salary)) " + "from Employee employee, Department department " + "where employee.department.id = department.id group by department.id") .list(); return salaryByDept; } catch (HibernateException e) { e.printStackTrace(); return null; } finally { session.close(); } }
April 21, 2014
by Amar Mattey
· 27,117 Views
article thumbnail
Sampling A Neo4j Database
After reading the interesting blog post of my colleague Rik van Bruggen on “Media, Politics and Graphs” I thought it would be really cool to render it as a GrapGist. Especially, as he already shared all the queries as a GitHub Gist. Unfortunately the dataset was a bit large for a sensible GraphGist representation, so I thought about means of extracting a smaller sample of his raw data that he made available (see his blog post for the link). Considering my last blog post on creating data from sampling a cross product, this should be much easier. We know we want to have all nodes with the labels PARTY, SHOW and GENDER in our graph as well as a sample of GUEST nodes with their relationships. The first part is easy: MATCH (n) WHERE n:PARTY OR n:SHOW OR n:GENDER RETURN n; The second part uses something that was not helpful in my last exploration, namely that random sampling when applied directly to a match, is used to filter the first node-pattern in the match and then still traverse all relationships/paths emanating from that node. MATCH(n:GUEST)-[r]->() WHERE rand() < 0.1 RETURN n,r; The number you compare rand() to is the percentage you want to get back, in this example 10%. Now I have two nice queries, that can get me the data, how can I bring them together? With UNION ALL MATCH (n) WHERE n:PARTY OR n:SHOW OR n:GENDER RETURN n, null as r UNION ALL MATCH(n:GUEST)-[r]->() WHERE rand() < 0.1 RETURN n,r; And where do I get the Cypher statements from, that I can use to populate my GraphGist database setup? Fortunately my dump command made it into the Neo4j-Shell, so that we can just run it on the command-line and redirect the output into a file: bin/neo4j-shell -path talkshow/graph.db \ -c 'dump MATCH (n) WHERE n:PARTY OR n:SHOW OR n:GENDER RETURN n, null as r UNION ALL MATCH(n:GUEST)-[r]->() WHERE rand() < 0.1 RETURN n,r;' \ > talkshow/sample.cql Don’t forget the semicolon at the end! Looking at sample.cql we see something like: begin create (_0:`SHOW` {`Modularity Name`:"B&vD", `id`:"B&vD", `label`:"B&vD", `modularity_class`:3, `weighted outdegree`:0.000000}) create (_1:`SHOW` {`Modularity Name`:"P&W", `id`:"P&W", `label`:"P&W", `modularity_class`:4, `weighted outdegree`:0.000000}) create (_2:`SHOW` {`Modularity Name`:"DWDD", `id`:"DWDD", `label`:"DWDD", `modularity_class`:5, `weighted outdegree`:0.000000}) ... ... create _509-[:`VISITED` {`quantity`:1}]->_5 create _509-[:`VISITED` {`quantity`:1}]->_2 create _509-[:`VISITED` {`quantity`:1}]->_1 create _509-[:`VISITED` {`quantity`:1}]->_0 ; commit Which we can now use to populate our database for our GraphGist, and here it is in all its beauty – GraphGist: “Media, Politics and Graphs”. But actually I chose not to use Rik’s GitHub Gist with the queries, but to copy the nice text and pictures from his blog post into the GraphGist. You might notice that some of the parties go without connections. That would need some tweaking of the sampling which I leave as exercise for you. Have fun Michael
April 21, 2014
by Michael Hunger
· 3,624 Views
article thumbnail
Insert Embedded or Linked OLE Object in Word Files & EUDC Fonts Support in .NET & Java Apps
What's New in this Release? Aspose development team is happy to announce the monthly release of Aspose.Words for Java &.NET 14.3.0. Aspose.Words now supports insertion of OLE objects such as another Microsoft Word document or a Microsoft Excel chart. A new public method, InsertOleObject, has been introduced in the DocumentBuilder class. This method can be used to insert an embedded or linked OLE object from a file into a Word document. Aspose.Words’ rendering engine now partially supports EUDC (End-User-Defined-Characters) fonts. Please find below the description of how EUDC fonts works on Windows. In this first implementation, Aspose.Words uses a single EUDC font. When rendering a document to fixed-page formats, this font is searched among the specified font sources by “EUDC” family name. Starting from Aspose.Words 14.3.0, Best Fit position of data labels in pie charts is partially supported. In previous versions labels with best fit position were rendered as if they had the inside end position. Currently we use a modified Open Office algorithm to set the best fit position of data labels. The list of new and improved features in this release are listed below Public API for insertion of OLE objects both linked and embedded Outline, Shadow, Reflection, Glow and Fill text effects for rendering text inside DrawingML shapes EUDC fonts rendering partially supported “PDF Logical Structure” export reworked, significantly improving memory usage Support OLE embedding of documents and files Feature Write an article about how to work with Table of contents in Aspose.Words Add tag support Support BestFit position of Pie chart's data labels. Support style:text-decoration attribute of Paragraph tag. Preserve private characters in EUDC.TTE during rendering to Pdf Support HTML table row borders in HTML import Support text outline effect. Support text fill effect. Support text shadow effect. Support text reflection effect Support text glow effect. Support text effects applied to text in Dml Shape or in SmartArt. Implement reading text effects from rPr in DML. Support inheriting styles from parent objects. Add image compression options for different image types Add a link to the online documentation in the DLLs only release Default run properties lose lang attribute value on DOCX to DOC conversions Padding for image in table is lost when rendering to Pdf Consider using "Don't vertically align cells containing floating objects" compatibility option when exporting to HTML Shape in table's cell is improperly horizontally aligned to center after export to HTML is now fixed Floating shape is now properly vertically positioned after export to HTML Warning : Unkno/wn ProgId value 'Visio.Drawing.11'. This might cause inaccessible OLE embedding Charts (DrawingML) issue fixed and now render correctly in Pdf file Aspose.Words now properly work in Jdeveloper IDE OLE object cannot be edited after re saving the document Diagram connectors are inverted/flipped after conversion from Docx to pdf WordArt letters are condensing issue is resolved Condensed character spacing is lost is fixed Relative hyperlink with Unicode is now properly saved to Pdf Add pre-built Document Explorer JAR to Java release. Aspose.Words does not take in account style set in . Table now looks correctly while converting html to doc. Hyperlinks split into multiple fragments/links in output PDF Horizontal table position is corrected. Document.UpdateFields Does not Update TOC in DOCX Content in the output html is overlapped at many places in Html Different table justify alignment for different compatibilityMode values. CSS selectors now work for , , and elements Chart Legend/Series now render correctly in output Pdf file Data labels with best fit position are rendering at correct places in Chart Offset of the hyperlink text line is off by a few pixels in output Pdf Hyperlinks split into multiple fragments/links in output PDF is fixed Document.UpdateFields is enhanced and update SUM formula field Paragraph's first line indent increases is fixed when exported to HTM Space before a paragraph following a floating element is too large after HTML to DOCX conversion is now fixd Block-level SVG image become inline after HTML to DOCX conversion Problem with vertical paragraph spacing is resolved when importing HTML using InsertHtml method Shape rotation is fixed after conversion from Doc to Pdf Text color is changed after conversion from Docx to WordML/Doc Word Table indentation is now corrected when is placed inside Arrows on the Lines gets distorted when converting to Pdf is now fixed A space character is exported to PDF output between Japanese and Numeric characters is fixed Line Shape from Header is merged with the top border of Table in Body Docx to WordML conversion issue resolved with content control Junk text is rendered in fixed page formats Positions of some DrawingML circles are now preserved during rendering A row and some content is rendering at the bottom of previous page is fixed Header table rows and images are now preserved in PDF List items issue fixed, now line up correctly after conversion from RTF to HTML Conversion from Docm to Doc creates corrupted document now fixed Hyperlink for an icon is now preserved during HTML to PDF conversion Text overlapping is fixed after conversion from Docx to Pdf Text position change is fixed after conversion from Docx to Pdf Other most recent bug fixes are also included in this release Newly added documentation pages and articles Some new tips and articles have now been added into Aspose.Words for .NET documentation that may guide you briefly how to use Aspose.Words for performing different tasks like the followings. How Aspose.Words Uses True Type Fonts How to Extract Images from a Document Overview: Aspose.Words Aspose.Words is a word processing component that enables .NET, Java & Android applications to read, write and modify Word documents without using Microsoft Word. Other useful features include document creation, content and formatting manipulation, mail merge abilities, reporting features, TOC updated/rebuilt, Embedded OOXML, Footnotes rendering and support of DOCX, DOC, WordprocessingML, HTML, XHTML, TXT and PDF formats (requires Aspose.Pdf). It supports both 32-bit and 64-bit operating systems. You can even use Aspose.Words for .NET to build applications with Mono. More about Aspose.Words Homepage Aspose.Words for .NET Homepage Java Word Library Download Aspose.Words for .NET Download Aspose.Words for Java Demos for Aspose.Words Contact Information Aspose Pty Ltd Suite 163, 79 Longueville Road Lane Cove, NSW, 2066 Australia Aspose - Your File Format Experts [email protected] Phone: 888.277.6734 Fax: 866.810.9465
April 18, 2014
by David Zondray
· 3,071 Views
article thumbnail
Mule Meets Zuul: A Centralized Properties Management – Part I, Server Side
It is always recommended to use Spring properties with Mule, to externalize any configuration parameters (URLs, ports, user names, passwords, etc.). For example, the Acme APIfrom my previous post connects to an external database. So instead of hard-coding connectivity options inside my application code, I would create a properties file, e.g. acme.properties, as follows: acme.jdbc.host=acmedb acme.jdbc.port=3306 acme.jdbc.database=acmeProducts acme.jdbc.user=WileECoyote acme.jdbc.password=GeeWhizz Obviously, as a developer, I would use a test instance of Acme database to test my application. I’d commit the code to the version control system, including the properties file. Then my application would begin its journey from the automated build system to the Dev environment, to QA, Pre-Prod, and finally Prod – and fail to deploy on production because it wouldn’t be able to connect to the test database! Or even worse, it would connect to the test database and use it and no one would notice the problem until customers placed $0 order for an Acme widget which would normally cost $1000, all because the test database didn’t contain actual prices! Sure, I could just follow the recommendations on our web site and create multiple sets of properties, e.g. acme.dev.properties, acme.qa.properties, acme.prod.properties etc. But instead of solving the problem, it would create a few new ones. First, those properties must still be packaged within the application. Needless to say, IT guys would never give me the credentials for the production database, so I’d have to provide instructions for them on how to modify the properties file AFTER the application is deployed on the prod platform. Second, if (or rather WHEN) any of those properties will need to be changed (for example, the production DB is migrated to a new server), the whole process has to be repeated. And don’t forget about passwords and other sensitive data that should never appear in the code as open text and have to be encrypted. It seems like every single customer I’ve worked with has this problem. And there was no convincing solution until one of our customers told me about an application called Zuul. As the description on the Zuul web site says, “Zuul is a free, open source web application which can be used to centralize and manage configuration for your internal applications. It enables your operations team to control changes and your developers a centralized place to organize settings.” Of course, I couldn’t resist the urge to download it and try it out with Mule. The installation and configuration of the Zuul server was pretty straightforward. After all, Zuul is a standard web application, so I just deployed it to my local Tomcat instance, alongside with MMC which was already deployed on it. I configured the database settings to point to my local MySQL instance. For the LDAP server I used OpenLDAP. I had to download and install the Unlimited Strength JCE Policy Files. Then I started Tomcat and opened the Zuul URL in my browser and logged in as administrator. The first task is to create my environments. Navigating to Administration->Environments menu, I see three environments, prod, qa, and dev, which Zuul creates by default. Just what I need! Moreover, the prod environment is red – which means, only someone with Administrator privileges can mess with it. And while we are in the Administration screen, let’s create a new encryption key for our password values. Administration->Key Management, then click on Create New... button and populate the form: And now we can create our properties. Select Settings->Create New, give it a name, e.g. AcmeProperties. On the next screen, you’re given the option to create a new properties set from scratch, or to upload an existing properties file. Since we already have acme.properties for our dev environment, let’s just use it. Select dev environment on the left tab, then click Upload File button: Upload acme.properties and you’ll see the following screen: Now we can encrypt the database password. Just make sure the correct key is selected, then click Edit and select Encrypt. To finish the server setup, we replicate this set of properties on the qa and prod environments. Select qa tab, then click Copy Existing, then in the Search text box type dev. Your properties set "/dev/AcmeProperties.properties" will be highlighted. Click Copy button and now you have the identical set of properties in qa. Repeat the process for the prod environment. Change properties values on each environment accordingly. This concludes the server setup procedure. In the next post, I will show you how to configure Mule to use Zuul properties management. UPDATE: Zuul can be downloaded at http://www.devnull.org/zuul
April 17, 2014
by Ross Mason
· 7,495 Views · 1 Like
article thumbnail
Innodb redo log archiving
This post was originally written by Vlad Lesin for the MySQL Performance Blog. Percona Server 5.6.11-60.3 introduces a new “log archiving” feature. Percona XtraBackup 2.1.5 supports “apply archived logs.” What does it mean and how it can be used? Percona products propose three kinds of incremental backups. The first is full scan of data files and comparison the data with backup data to find some delta. This approach provides a history of changes and saves disk space by storing only data deltas. But the disadvantage is a full-data file scan that adds load to the disk subsystem. The second kind of incremental backup avoids extra disk load during data file scans. The idea is in reading only changed data pages. The information about what specific pages were changed is provided by the server itself which writes files with the information during work. It’s a good alternative but changed-pages tracking adds some small load. And Percona XtraBackup’s delta reading leads to non-sequential disk io. This is good alternative but there is one more option. The Innodb engine has a data log. It writes all operations which modify database pages to log files. This log is used in the case of unexpected server terminating to recover data. The Innodb log consists of the several log files which are filled sequentially in circular. The idea is to save those files somewhere and apply all modifications from archived logs to backup data files. The disadvantage of this approach is in using extra disk space. The advantage is there is no need to do an “explicit” backup on the host server. A simple script could sit and wait for logs to appear then scp/netcat them over to another machine. But why not use good-old replication? Maybe replication does not have such performance as logs recovering but it is more controlled and well-known. Archived logs allows you to do any number of things with them from just storing them to doing periodic log applying. You can not recover from a ‘DROP TABLE’, etc with replication. But with this framework one could maintain the idea of “point in time” backups. So the “archived logs” feature is one more option to organize incremental backups. It is not widely used as it was issued not so far and there is not A good understanding of how it works and how it can be used. We are open to any suggestions about its suggest improvements and use cases. The subject of this post is to describe how it works in depth. As log archiving is closely tied with innodb redo logs the internals of redo logs will be covered too. This post would be useful not only for DBA but also for Software Engineers because not only common principles are considered but the specific code too, and knowledge from this post can be used for further MySQL code exploring and patching. What is the innodb log and how it is written? Let’s remember what are innodb logs, why they are written, what they are used for. The Innodb engine has buffer pool. This is a cache of database pages. Any changes are done on page in buffer pool, then page is considered as “dirty,” which means it must be flushed, and pushed to the flush list which is processed periodically by special thread. If pages are not flushed to disk and server is terminated unexpectedly the changes will be lost. To avoid this innodb writes changes to redo log and recover data from redo log during start. This technique allows to delay buffer pool pages flushing. It can increase performance because several changes of one page can be accumulated in memory and then flushed by one io. Except that flushed pages can be grouped to decrease the number of non-sequential io’s. But the down-side of this approach is time for data recovering. Let’s consider how this log is stored, generated and used for data recovering. Log files Redo log consists of a several log files which are treated as a circular buffer. The number and the size of log files can be configured. Each log file has a header. The description of this header can be found in “storage/innobase/include/log0log.h” by “LOG_GROUP_ID” keyword. Each log file contains log records. Redo log records are written sequentially by log blocks of OS_FILE_LOG_BLOCK_SIZE size which is equal to 512 bytes by default and can be changed with innodb option. Each record has its LSN. LSN is a “Log Sequence Number” – the number of bytes written to log from the log creation to the certain log record. Each block consists of header, trailer and log records. Log blocks Let’s consider log block header. The first 4 bytes of the header is log block number. The block number is very similar as LSN but LSN is measured in bytes and block number is measured by OS_FILE_LOG_BLOCK_SIZE. Here is the simple formula how LSN is converted to block number: return(((ulint) (lsn / OS_FILE_LOG_BLOCK_SIZE) & 0x3FFFFFFFUL) + 1); This formula can be found in log_block_convert_lsn_to_no() function. The next two bytes is the number of bytes used in the block. The next two bytes is the offset of the first MTR log record in this block. What is MTR will be described below. Currently it can be considered as a synonym of bunch of log records which are gathered together as a description of some logical operation. For example it can be a group of log records for inserting new row to some table. This field is used when there are records of several MTR’s in one block. The next four bytes is a checkpoint number. The trailer is four bytes of log block checksum. The above description can be found in “storage/innobase/include/log0log.h” by “LOG_BLOCK_HDR_NO” keyword. Before writing to disk log blocks must be somehow formed and stored. And the question is: How log blocks are stored in memory and on disk? Where log blocks are stored before flushing to disk and how they are written and flushed? Global log object and log buffer The answer to the first part of the question is log buffer. Server holds very important global object log_sys in memory. It contains a lot of useful information about logging state. Log buffer is pointed by log_sys->buf pointer which is initialized in log_init(). I would highlight the following log_sys fields that are used for work with log buffer and flushing: log_sys->buf_size – the size of log buffer, can be set with innodb-log-buffer-size variable, the default value is 8M; log_sys->buf_free – the offset from which the next log record will be written; log_sys->max_buf_free – if log_sys->buf_free is greater then this value log buffer must be flushed, see log_free_check(); log_sys->buf_next_to_write – the offset of the next log record to write to disk; log_sys->write_lsn – the LSN up to which log is written; log_sys->flushed_to_disk_lsn – the LSN up to which log is written and flushed; log_sys->lsn – the last LSN in log buffer; So log_sys->buf_next_to_write is between 0 and log_sys->buf_free, log_sys->write_lsn is equal or less log_sys->lsn, log_sys->flushed_to_disk_lsn is less or equal to log_sys->write_lsn. The relationships for those fields can be easily traced with debugged by setting up watchpoints. Ok, we have log buffer, but how do log records come to this buffer? Where log records come from? Innodb has special objects that allow you to gather redo log records for some operations in one bunch before writing them to log buffer. These objects are called “mini-transactions” and corresponding functions and data types have “mtr” prefix in the code. The objects itself are described in mtr_t “c” structure. The most interesting fields of this structure are the following: mtr_t::log – contains log records for the mini-transaction, mtr_t::memo – contains pointers to pages which are changed or locked by the mini-transaction, it is used to push pages to flush list and release locks after logs records are copied to log buffer in mtr_commit() (see mtr_memo_pop_all() called in mtr_commit()). mtr_start() function initializes an object of mtr_t type and mtr_commit() writes log records from mtr_t::log to log_sys->buf + log_sys->buf_free. So the typical sequence of any operation which changes data is the following: mtr_start(); // initialize mtr object some_ops... // operations on data which are logged in mtr_t::log mtr_commit(); // write logged operations from mtr_t::log to log buffer log_sys->buf page_cur_insert_rec_write_log() is a good example of how mtr records can be written and mtr::memo can be filled. The low-level function which writes data to log buffer is log_write_low(). This function is invoked inside of mtr_commit() and not only copy the log records from mtr_t object to log buffer log_sys->buf but also creates a new log blocks inside of log_sys->buf, fills their header, trailer, calculates checksum. So log buffer contains log blocks which are sequentially filled with log records which are grouped in “mini-transactions” which logically can be treated as some logical operation over data which consists of a sequence of mini-operations(log records). As log records are written sequentially in log buffer one mini-transaction and even one log record can be written in two neighbour blocks. That is why the header field which would contain the offset of the first MTR in the block is necessary to calculate the point from which log records parsing can be started. This field was described in 2.2. So we have a buffer of log blocks in a memory. How is data from this buffer written to disk? The mysql documentation says that this depends on innodb_flush_log_at_trx_commit option. There can be three cases depending on the value of this option. Let’s consider each of them. Writing log buffer to disk: innodb_flush_log_at_trx_commit is 1 or 2. The first two cases is when innodb_flush_log_at_trx_commit is 1 or 2. In these cases flush log records are written for 2 and flushed for 1 on each transaction commit. If innodb_flush_log_at_trx_commit is 2 log records are flushed periodically by special thread which will be considered later. The low-level function which writes log records from buffer to file is log_group_write_buf(). But in the most cases it is not called directly but it is called from more high level log_write_up_to(). For the current case the calling stack is the following: (trx_commit_in_memory() or trx_commit_complete_for_mysql() or trx_prepare() e.t.c)-> trx_flush_log_if_needed()-> trx_flush_log_if_needed_low()-> log_write_up_to()-> log_group_write_buf(). It is quite easy to find the higher levels of calling stack, just set up breakpoint on log_group_write_buf() and execute any sql query that modifies innodb data. For example for the simple “insert” sql query the higher levels of calling stack are the following: mysql_execute_command()-> trans_commit_stmt()-> ha_commit_trans()-> TC_LOG_DUMMY::commit()-> ha_commit_low()-> innobase_commit()-> trx_commit_complete_for_mysql()-> trx_flush_log_if_needed()-> ... . log_io_complete() callback is invoked when i/o is finished for log files (see fil_aio_wait()). log_io_complete() flushes log files if this is not forbidden by innodb_flush_method or innodb_flush_log_at_trx_commit options. Writing log buffer to disk: innodb_flush_log_at_trx_commit is equal to 0 The third case is when innodb_flush_log_at_trx_commit is equal to 0. For this case log buffer is NOT written to disk on transaction commit, it is written and flushed periodically by separate thread “srv_master_thread”. If innodb_flush_log_at_trx_commit = 0 log files are flushed in the same thread by the same calls. The calling stack is the following: srv_master_thread()-> (srv_master_do_active_tasks() or srv_master_do_idle_tasks() or srv_master_do_shutdown_tasks())-> srv_sync_log_buffer_in_background()-> log_buffer_sync_in_background()->log_write_up_to()->... . Special cases for logs flushing While log_io_complete() do flushing depending on innodb_flush_log_at_trx_commit value among others log_write_up_to() has it’s own flushing criteria. This is flush_to_disk function argument. So it is possible to force log files flushing even if innodb_flush_log_at_trx_commit = 0. Here are examples of such cases: 1) buf_flush_write_block_low() Each page contains information about the last applied LSN(buf_flush_write_block_low::newest_modification), each log record is a description of change on certain page. Imagine we flushed some changed pages but log records for these pages were not flushed and server goes down. After starting the server some pages will have the newest modifications, but some of them were not flushed and the correspondent log records are lost too. We will have inconsistent database in this case. That is why log records must be flushed before the pages they refer. 2) srv_sync_log_buffer_in_background() As it was described above this function is called periodically by special thread and forces flushing. 3) log_checkpoint() When checkpoint is made log files must be reliably flushed. 4) The special handlerton innobase_flush_logs() which can be called through ha_flush_logs() from mysql server. For example ha_flush_logs() is called from MYSQL_BIN_LOG::reset_logs() when “RESET MASTER” or “RESET SLAVE” are executed. 5) srv_master_do_shutdown_tasks() – on shutdown, ha_innobase::create() – on table creating, ha_innobase::delete_table() – on table removing, innobase_drop_database() – on all database tables removing, innobase_rename_table() – on table rename e.t.c If log files are treated as circular buffer what happens when the buffer is overflown? Briefly. Innodb has a mechanism which allows you to avoid overflowing. It is called “checkpoints.” The checkpoint is a state when log files are synchronized with data files. In this case there is no need to keep the history of changes before checkpoint because all pages with the last modifications LSN less or equal to checkpoint LSN are flushed and the log files space from the last written LSN to the last checkpoint LSN can be reused. We will not describe a checkpoint process here because it is a separate interesting subject. The only thing we need to know is when checkpoint happens all pages with modification LSN less or equal to checkpoint LSN are reliably flushed. How archived logs are written by server. So the log contains information about page changes. But as we said, log files are the circular buffer. This means that they occupy fixed disk size and the oldest records can be rewritten by the newest ones as there are points when data files are synchronized with log files called checkpoints and there is no need to store the previous history of log records to guarantee database consistency. The idea is to save somewhere all log records to have the possibility of applying them to backuped data to have some kind if incremental backup. For example if we want to have an archive of log records. As log consists of log files it is reasonable to store log records in such files too, and these files are called “archived logs.” Archived log files are written to the directory which can be set with special innodb option. Each file has the same size as innodb log size and the suffix of each archived file is the LSN from which it is started. As well as log writing system log archiving system stores its data in global log_sys object. Here are the most valuable fields in log_sys from my point of view: log_sys->archive_buf, log_sys->archive_buf_size – logs archive buffer and its size, log records are copied from log buffer log_sys->buf to this buffer before writing to disk; log_sys->archiving_phase – the current phase of log archiving: LOG_ARCHIVE_READ when log records are being copied from log_sys->buf to log_sys->archive_buf, LOG_ARCHIVE_WRITE when log_sys->archive_buf is being written to disk; log_sys->archived_lsn – the LSN to which log files are written; log_sys->next_archived_lsn – the LSN to which write operations was invoked but not yet finished; log_sys->max_archived_lsn_age – the maximum difference between log_sys->lsn and log_sys->archived_lsn, if this difference exceeds the log are being archived synchronously, i.e. the difference is decreased; log_sys->archive_lock – this is rw-lock which is used for synchronizing LOG_ARCHIVE_WRITE and LOG_ARCHIVE_READ phases, it is x-locked on LOG_ARCHIVE_WRITE phase. So how is data copied from log_sys->buf to log_sys->archived_buf? log_archive_do() is used for this. It is not only set the proper state for archived log fields in log_sys but also invokes log_group_read_log_seg() with corresponding arguments which not only copy data from log buffer to archived log buffer but also invokes asynchronous write operation for archived log buffer. log_archive_do() can wait until io operations are finished using log_sys->archive_lock if corresponding function parameter is set. The main question is on what circumstances log_archive_do() is invoked, i.e. when log records are being written to archived log files. The first call stack is the following: log_free_check()-> log_check_margins()-> log_archive_margin()-> log_archive_do(). Here is text of log_free_check() with comments: /*********************************************************************// Checks if there is need for a log buffer flush or a new checkpoint, and does this if yes. Any database operation should call this when it has modified more than about 4 pages. NOTE that this function may only be called when the OS thread owns no synchronization objects except the dictionary mutex. */ UNIV_INLINE void log_free_check(void) /*================*/ { #ifdef UNIV_SYNC_DEBUG ut_ad(sync_thread_levels_empty_except_dict()); #endif /* UNIV_SYNC_DEBUG */ if (log_sys->check_flush_or_checkpoint) { log_check_margins(); } } log_sys->check_flush_or_checkpoint is set when there is no enough free space in log buffer or it is time to do checkpoint or any other bound case. log_archive_margin() is invoked only if the limit if the difference between log_sys->lsn and log_sys->archived_lsn is exceeded. Let’s refer to this difference as archived lsn age. One more call log_archive_do() is from log_open() when archived lsn age exceeds some limit. log_open() is called on each mtr_commit(). And for this case archived logs are written synchronously. The next synchronous call is from log_archive_all() during shutdown. Summarizing all above archived logs begins to be written when the log buffer is full enough to be written or when checkpoint happens or when the server is in the process of shut down. And there is no any delay between writing to archive log buffer and writing to disk. I mean there is no way to say that archived logs must be written once a second as it is possible for redo logs with innodb_flush_log_at_trx_commit = 0. As soon as data is copied to the buffer the write operation is invoked immediately for this buffer. Archived log buffer is not filled on each mtr_commit() so it does not slow down the usual logging process. The exception is when there are a lot of io operations what can be the reason of archive log age is too big. The result of big archive log age is the synchronous archived logs writing during mtr_commit(). Memory to memory copying is quite fast operation that is why the data is copied to archived log buffer and is written to disk asynchronously minimizing delays which can be caused by logs archiving. PS: Here is another call stack for writing archived log buffer to archived log files: log_io_complete()->log_io_complete_archive()->log_archive_check_completion_low()->log_archive_groups(). I propose to explore this stack yourself. Logs recovery process, how it is started and works inside. Archived logs applying. So we discovered how innodb redo logging works, and how redo logs are archived. And the last uncovered thing is how recovery works and how archived logs are applied. These two processes are very similar – that is why they are discussed in one section of this post. The story begins with innobase_start_or_create_for_mysql() which is invoked from innobase_init(). The following trident in innobase_start_or_create_for_mysql() can be used to search the relevant code: if (create_new_db) { ... } else if (srv_archive_recovery) { ... } else { ... } The second condition and the last one is the place from which archived logs applying and innodb logs recovery processes correspondingly start. These two blocks wrap two pairs of functions: recv_recovery_from_archive_start() recv_recovery_from_archive_finish() and recv_recovery_from_checkpoint_start() recv_recovery_from_checkpoint_finish() And all the magic happens in these pairs. As well as global log_sys object for redo logging there is global recv_sys object for innodb recovery and archived logs applying. It is created and initialized in recv_sys_create() and recv_sys_init() functions correspondingly. The following fields if recv_sys object are the most important from my point: recv_sys->limit_lsn – the LSN up to which recovery should be made, this value is initialized with the maximum value of uint64_t(see #define LSN_MAX) for the recovery process and with certain value which is passed as an argument of recv_recovery_from_archive_start() function and can be set via xtrabackup option for log applying; recv_sys->parse_start_lsn – the LSN from which logs parsing is started, for the the logs recovery this value equals to the last checkpoint LSN, for logs applying this is last applied LSN; recv_sys->scanned_lsn – the LSN up to which log files are scanned; recv_sys->recovered_lsn – the LSN up to which log records are applied, this value <= recv_sys->scanned_lsn; The first thing that must be done for starting recovery process is to find out the point in log files where the recovery must be started from. This is the last checkpoint LSN. recv_find_max_checkpoint() proceed this. As we can see in log_group_checkpoint() the following code writes checkpoint info into two places in the first log file depending on the checkpoint number: /* We alternate the physical place of the checkpoint info in the first log file */ if ((log_sys->next_checkpoint_no & 1) == 0) { write_offset = LOG_CHECKPOINT_1; } else { write_offset = LOG_CHECKPOINT_2; } So recv_find_max_checkpoint() reads checkpoint info from both places and selects the latest checkpoint. The same idea is applied for logs, too, but the last applied LSN instead of last checkpoint LSN must be found. Here is the call stack for reading last applied LSN: innobase_start_or_create_for_mysql()-> open_or_create_data_files()-> fil_read_first_page(). The last applied LSN is stored in the first page of data files in (min|max)_flushed_lsn fields(see FIL_PAGE_FILE_FLUSH_LSN offset). These values are written in fil_write_flushed_lsn_to_data_files() function on server shutdown. So the main difference between logs applying and recovery process at this stage is the manner of calculating LSN from which log records will be read. For logs applying the last flushed LSN is used but for recovery process it is the last checkpoint LSN. Why does this difference take place? Logs can be applied periodically. Assume we gather archived logs and apply them once an hour to have fresh backup. After applying the previous bunch of log files there can be unfinished transactions. For the recovery process any unfinished transactions are rolled back to have consistent db state at server starting. But for the logs applying process there is no need to roll back them because any unfinished transactions can be finished during the next logs applying. After calculating the start LSN the sequence of actions is the same for both recovering and applying. The next step is reading and parsing log records. See recv_group_scan_log_recs() which is invoked from recv_recovery_from_checkpoint_start_func() for logs recovering and recv_recovery_from_archive_start()->log_group_recover_from_archive_file() for logs applying. The first we read log records to some buffer and then invoke recv_scan_log_recs() to parse them. recv_scan_log_recs() checks each log block on consistency(checksum + comparing the log block number written in log block with log block number calculated from log block LSN) and other edge cases and copy it to parsing buffer recv_sys->buf with recv_sys_add_to_parsing_buf() function. The parsing buffer is then parsed by recv_parse_log_recs(). Log records are stored in hash table recv_sys->addr_hash. The key for this hash table is calculated basing on space id and page number pair. This pair refers to the page to which log records must be applied. The value of the hash table is object of recv_addr_t type. recv_addr_t type contains rec_list field which is the list of log records for applying to the (space id, page num) page (see recv_add_to_hash_table(). After parsing and storing log record in hash table recv_sys->addr_hash log records are applied. The function which is responsible for log records applying is recv_apply_hashed_log_recs(). It is invoked from recv_scan_log_recs() if there is no enough memory to store log records and at the end of recovering/applying process. For each element of recv_sys->addr_hash, i.e. for each DB page which must be changed with log records recv_recover_page() is invoked. It can be invoked as from recv_apply_hashed_log_recs() in the case if page is already in buffer pool of from buf_page_io_complete() on io completion, i.e. just after page was read from storage. Applying log records on page read completion is necessary and very convenient. Assume log records have not yet applied as we had enough memory to store the whole recovery log records. But we want for example to boot DB dictionary. I this case any records that concern to the pages of the dictionary will be applied to those pages just after reading them from storage to buffer pool. The function which applies log records to the certain page is recv_recover_page_func(). It gets the list of log records for the certain page from recv_sys->addr_hash hash table, for each element of this list it compares the lsn of last page changes with the LSN of the record, and if the former is greater the later it applies log record to the page. After applying all log records from archived logs xtrabackup writes last applied LSN to (min|max)_flushed LSN fields of each data file and finishes execution. The logs recovery process rollbacks all unfinished transactions unless this is forbidden with innodb-force-recovery parameter. Conclusion We covered the processes of redo logs writing and recovery in depth. These are very important processes as they provide data consistency on crashes. These two processes became a base for logs archiving and applying features. As log records can describe any data changes the idea is to store these records somewhere and then apply them to backups for organizing some kind of incremental backup. The features were implemented a short time ago and currently they are not widely used. So if you have something to say about them you are welcome to comment for discussion.
April 16, 2014
by Peter Zaitsev
· 6,162 Views
article thumbnail
How to Migrate from MySQL to MongoDB
In the last week I was working on a key project to migrate a BI platform from MySQL to MongoDB. We chose that database due to its support and scalability.
April 14, 2014
by Moshe Kaplan
· 115,867 Views · 6 Likes
article thumbnail
How JOIN Order Can Increase Performance in SQL Queries
Introduction All developers are very much concerned about performance. If someone say that this increase performance, all the developer are running behind it. It is not a bad practice at all. Rather as per my point of view we must span all our effort related improve the performance of query. “One common question that we find that, if we change the ordering of table join in case of inner join will effect or increase performance” To understand it lets take a simple example of Inner join. There is two tables named Table-A and Table-B. We can us the Inner Join on both the table. Like this FROM [Table-A] AS a INNER JOIN [Table-B] AS b ON a.IDNO = b.IDNO OR FROM [Table-B] AS a INNER JOIN [Table-A] AS b ON a.IDNO = b.IDNO Which one is best for performance? To answer this question we all know that whenever a SQL Query is executed the MS SQL server create several query plans with different join Order and choose the best one. That means the Join order that we are writing in the query may not be executed by execution plan. May be different join order is used by the execution plan. In the above case the execution plan decide which Join order he will chose depends on best possible costing of execution. Here [Table-A] JOIN [Table-B] or [Table-B] JOIN [Table-A], MS SQL Server knows it well that both are same. To understand it Details Lets take an Example Step-1 [ Create Base Table and Insert Some Records ] -- Item Master IF OBJECT_ID(N'dbo.tbl_ITEMDTLS', N'U')IS NOT NULL BEGIN DROP TABLE [dbo].[tbl_ITEMDTLS]; END GO CREATE TABLE [dbo].[tbl_ITEMDTLS] ( ITEMCD INT NOT NULL IDENTITY PRIMARY KEY, ITEMNAME VARCHAR(50) NOT NULL ) GO -- Inserting Records INSERT INTO [dbo].[tbl_ITEMDTLS] (ITEMNAME) VALUES ('ITEM-1'),('ITEM-2'),('ITEM-3'); -- Item UOM Master IF OBJECT_ID(N'dbo.tbl_UOMDTLS', N'U')IS NOT NULL BEGIN DROP TABLE [dbo].[tbl_UOMDTLS]; END GO CREATE TABLE [dbo].[tbl_UOMDTLS] ( UOMCD INT NOT NULL IDENTITY PRIMARY KEY, UOMNAME VARCHAR(50) NOT NULL ) GO -- Inserting Records INSERT INTO [dbo].[tbl_UOMDTLS] (UOMNAME) VALUES ('KG'),('LTR'),('GRM'); GO -- Transaction Table IF OBJECT_ID(N'dbo.tbl_SBILL', N'U')IS NOT NULL BEGIN DROP TABLE [dbo].[tbl_SBILL]; END GO CREATE TABLE [dbo].[tbl_SBILL] ( TRID INT NOT NULL IDENTITY PRIMARY KEY, ITEMCD INT NOT NULL, UOMCD INT NOT NULL, QTY DECIMAL(18,3) NOT NULL, RATE DECIMAL(18,2) NOT NULL, AMOUNT AS QTY * RATE ); GO -- Foreign Key Constraint ALTER TABLE [dbo].[tbl_SBILL] ADD CONSTRAINT FK_ITEM_tbl_SBILL FOREIGN KEY(ITEMCD) REFERENCES [dbo].[tbl_ITEMDTLS](ITEMCD); GO ALTER TABLE [dbo].[tbl_SBILL] ADD CONSTRAINT FK_UOMCD_tbl_SBILL FOREIGN KEY(UOMCD) REFERENCES [dbo].[tbl_UOMDTLS](UOMCD); -- Insert Records INSERT INTO [dbo].[tbl_SBILL] (ITEMCD, UOMCD, QTY, RATE) VALUES (1, 1, 20, 2000),(2, 3, 23, 1400); Step-2 [ Now Make Some JOIN ] SELECT b.TRID, b.ITEMCD, a.ITEMNAME, b.UOMCD, c.UOMNAME, b.QTY, b.RATE, b.AMOUNT FROM [dbo].[tbl_ITEMDTLS] AS a INNER JOIN [dbo].[tbl_SBILL] AS b ON a.ITEMCD = b.ITEMCD INNER JOIN [dbo].[tbl_UOMDTLS] AS c ON b.UOMCD = c.UOMCD; Here [tbl_ITEMDETAILS] JOIN [tbl_SALES] JOIN [tbl_UOMDETAILS] If we look at the Execution Plan We find that [tbl_SALES] JOIN [tbl_ITEMDETAILS] JOIN [tbl_UOMDETAILS] Step-2 [ Now we need to Force Order Hint to maintain Join Order ] SELECT b.TRID, b.ITEMCD, a.ITEMNAME, b.UOMCD, c.UOMNAME, b.QTY, b.RATE, b.AMOUNT FROM [dbo].[tbl_ITEMDTLS] AS a INNER JOIN [dbo].[tbl_SBILL] AS b ON a.ITEMCD = b.ITEMCD INNER JOIN [dbo].[tbl_UOMDTLS] AS c ON b.UOMCD = c.UOMCD OPTION ( QUERYRULEOFF JoinCommute); For this we need the FORCE ORDER Hint. The query optimizer uses different rules to evaluate different plan and one of the rules is called JoinCommute. We can turn it off using the undocumented query hint QUERYRULEOFF. Hope you like it.
April 14, 2014
by Joydeep Das
· 26,599 Views
article thumbnail
Be a Lazy But Productive Android Developer, Part 5: Image Loading Library
Welcome to part 5 of “Be a lazy but a productive android developer” series. If you are a lazy Android developer and looking for image loading library, which could help you to load image(s) asynchronously without writing a logic for downloading and caching images then this article is for you. This series so far: Part 1: We looked at RoboGuice, a dependency injection library by which we can reduce the boiler plate code, save time and there by achieve productivity during Android app development. Part 2: We saw and explored about Genymotion, which is a rocket speed emulator and super-fast emulator as compared to native emulator. And we can use Genymotion while developing apps and can quickly test apps and there by can achieve productivity. Part 3: We understood and explored about JSON Parsing libraries (GSON and Jackson), using which we can increase app performance, we can decrease boilerplate code and there by can optimize productivity. Part 4: We talked about Card UI and explored card library, also created a basic card and simple card list demo. In this part In this part, we are going to talk about some image libraries using which we can load image(s) asynchronously, can cache images and also can download images into the local storage. Required features for loading images Almost every android app has a need to load remote images. While loading remote images, we have to take care of below things: Image loading process must be done in background (i.e. asynchronously) to avoid blocking UI main thread. Image recycling image should be done. Image should be displayed once its loaded successfully. Images should be cached in local memory for the later use. If remote image gets failed (due to network connection or bad url or any other reasons) to load then it should be managed perfectly for avoiding duplicate requests to load the same again, instead it should load if and only if net connection is available. Memory management should be done efficiently. In short, we have to write a code to manage each and every aspects of image loading but there are some awesome libraries available, using which we can load/download image asynchronously. We just have to call the load image method and success/failure callbacks. Asynchronous image loading Consider a case where we are having 50 images and 50 titles and we try to load all the images/text into the listview, it won’t display anything until all the images get downloaded. Here Asynchronous image loading process comes in picture. Asynchronous image loading is nothing but a loading process which happens in background so that it doesn’t block main UI thread and let user to play with other loaded data on the screen. Images will be getting displayed as and when it gets downloaded from background threads. Asynchronous image loading libraries Nostra’s Universal Image loader – https://github.com/nostra13/Android-Universal-Image-Loader Picasso – http://square.github.io/picasso/ UrlImageViewHelper by Koush Volley - By Android team members @ Google Novoda’s Image loader – https://github.com/novoda/ImageLoader Let’s have a look at examples using Picasso and Universal Image loader libraries. Example 1: Nostra’s Universal Image loader Step 1: Initialize ImageLoader configuration ? public class MyApplication extends Application{ @Override public void onCreate() { // TODO Auto-generated method stub super.onCreate(); // Create global configuration and initialize ImageLoader with this configuration ImageLoaderConfiguration config = new ImageLoaderConfiguration.Builder(getApplicationContext()).build(); ImageLoader.getInstance().init(config); } } Step 2: Declare application class inside Application tag in AndroidManifest.xml file ? Step 3: Load image and display into ImageView ? ImageLoader.getInstance().displayImage(objVideo.getThumb(), holder.imgVideo); Now, Universal Image loader also provides a functionality to implement success/failure callback to check whether image loading is failed or successful. ? ImageLoader.getInstance().displayImage(photoUrl, imgView, new ImageLoadingListener() { @Override public void onLoadingStarted(String arg0, View arg1) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.VISIBLE); } @Override public void onLoadingFailed(String arg0, View arg1, FailReason arg2) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.GONE); } @Override public void onLoadingComplete(String arg0, View arg1, Bitmap arg2) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.GONE); } @Override public void onLoadingCancelled(String arg0, View arg1) { // TODO Auto-generated method stub findViewById(R.id.EL3002).setVisibility(View.GONE); } }); Example 2: Picasso Image loading straight way: ? Picasso.with(context).load("http://postimg.org/image/wjidfl5pd/").into(imageView); Image re-sizing: ? Picasso.with(context) .load(imageUrl) .resize(100, 100) .centerCrop() .into(imageView) Example 3: UrlImageViewHelper library It’s an android library that sets an ImageView’s contents from a url, manages image downloading, caching, and makes your coffee too. UrlImageViewHelper will automatically download and manage all the web images and ImageViews. Duplicate urls will not be loaded into memory twice. Bitmap memory is managed by using a weak reference hash table, so as soon as the image is no longer used by you, it will be garbage collected automatically. Image loading straight way: ? UrlImageViewHelper.setUrlDrawable(imgView, "http://yourwebsite.com/image.png"); Placeholder image when image is being downloaded: ? UrlImageViewHelper.setUrlDrawable(imgView, "http://yourwebsite.com/image.png", R.drawable.loadingPlaceHolder); Cache images for a minute only: ? UrlImageViewHelper.setUrlDrawable(imgView, "http://yourwebsite.com/image.png", null, 60000); Example 4: Volley library Yes Volley is a library developed and being managed by some android team members at Google, it was announced by Ficus Kirkpatrick during the last I/O. I wrote an article about Volley library 10 months back , read it and give it a try if you haven’t used it yet. Let’s look at an example of image loading using Volley. Step 1: Take a NetworkImageView inside your xml layout. ? Step 2: Define a ImageCache class Yes you are reading title perfectly, we have to define an ImageCache class for initializing ImageLoader object. ? public class BitmapLruCache extends LruCache implements ImageLoader.ImageCache { public BitmapLruCache() { this(getDefaultLruCacheSize()); } public BitmapLruCache(int sizeInKiloBytes) { super(sizeInKiloBytes); } @Override protected int sizeOf(String key, Bitmap value) { return value.getRowBytes() * value.getHeight() / 1024; } @Override public Bitmap getBitmap(String url) { return get(url); } @Override public void putBitmap(String url, Bitmap bitmap) { put(url, bitmap); } public static int getDefaultLruCacheSize() { final int maxMemory = (int) (Runtime.getRuntime().maxMemory() / 1024); final int cacheSize = maxMemory / 8; return cacheSize; } } Step 3: Create an ImageLoader object and load image Create an ImageLoader object and initialize it with ImageCache object and RequestQueue object. ? ImageLoader.ImageCache imageCache = new BitmapLruCache(); ImageLoader imageLoader = new ImageLoader(Volley.newRequestQueue(context), imageCache); Step 4: Load an image into ImageView ? NetworkImageView imgAvatar = (NetworkImageView) findViewById(R.id.imgDemo); imageView.setImageUrl(url, imageLoader); Which library to use? Can you decide which library you would use? Let us know which and what are the reasons? Selection of the library is always depends on the requirement. Let’s look at the few fact points about each library so that you would able to compare exactly and can take decision. Picasso: It’s just a one liner code to load image using Picasso. No need to initialize ImageLoader and to prepare a singleton instance of image loader. Picasso allows you to specify exact target image size. It’s useful when you have memory pressure or performance issues, you can trade off some image quality for speed. Picasso doesn’t provide a way to prepare and store thumbnails of local images. Sometimes you need to check image loading process is in which state, loading, finished execution, failed or cancelled image loading. Surprisingly It doesn’t provide a callback functionality to check any state. “fetch()” dose not pass back anything. “get()” is for synchronously read, and “load()” is for asynchronously draw a view. Universal Image loader (UIL): It’s the most popular image loading library out there. Actually, it’s based on the Fedor Vlasov’s project which was again probably a very first complete solution and also a most voted answer (for the image loading solution) on Stackoverflow. UIL library is better in documentation and even there’s a demo example which highlights almost all the features. UIL provides an easy way to download image. UIL uses builders for customization. Almost everything can be configured. UIL doesn’t not provide a way to specify image size directly you want to load into a view. It uses some rules based on the size of the view. Indirectly you can do it by mentioning ImageSize argument in the source code and bypass the view size checking. It’s not as flexible as Picasso. Volley: It’s officially by Android dev team, Google but still it’s not documented. It’s just not an image loading library only but an asynchronous networking library Developer has to define ImageCache class their self and has to initialize ImageLoader object with RequestQueue and ImageCache objects. So now I am sure now you can be able to compare libraries. Choosing library is a bit difficult talk because it always depends on the requirement and type of projects. If the project is large then you should go for Picasso or Universal Image loader. If the project is small then you can consider to use Volley librar, because Volley isn’t an image loading library only but it tries to solve a more generic solution.). I suggest you to start with Picasso. If you want more control and customization, go for UIL. Read more: http://blog.bignerdranch.com/3177-solving-the-android-image-loading-problem-volley-vs-picasso/ http://stackoverflow.com/questions/19995007/local-image-caching-solution-for-android-square-picasso-vs-universal-image-load https://plus.google.com/103583939320326217147/posts/bfAFC5YZ3mq Hope you liked this part of “Lazy android developer: Be productive” series. Till the next part, keep exploring image loading libraries mentioned above and enjoy!
April 11, 2014
by Paresh Mayani
· 63,973 Views · 2 Likes
article thumbnail
Visualizing SQL Statements
Usually if I concentrate I am able to understand most SQL statements. There are times though such as: When a set of tables is not familiar When I did not write the SQL statement When the SQL statement is long and involves many tables and joins When I want to discuss a statement with a colleague All of the above Having a visual representation of a SQL statement can be helpful in deciphering the statement. My visualisation tool of choice for SQL is an Open Source application called Reverse Snowflake Joins (REVJ). As the name implies, this tool shines when it comes to showing you how your tables are related. I have installed the tool on my workstation but when I am on the move I use the online version of the tool. Using the tool is straight forward, simply paste your SQL statement in the text area and generate the diagram, the online version generates an SVG image. I have at times found that the tool struggles with complex CASE statements. In such cases I remove the CASE statement and just include the fields used in the case statement. Below is a sample statement to show REVJ at work. SELECT a.prod_cat_name ,b.prod_name ,c.prod_owner_name ,p.promo_id ,pt.promo_type ,sum(s.units) as total_units ,sum(s.sale_price) as total_sale_price ,sum(prev_s.units) as prev_yr_total_units FROM product_category a JOIN product b ON a.product_cat_id = b.product_cat_id LEFT OUTER JOIN product_owner c ON a.product_cat_id = c.product_cat_id JOIN sales s ON b.product_id = s.product_id JOIN sales prev_s ON s.sale_year = prev_s.sale_year-1 LEFT OUTER JOIN promotion p ON s.promo_id = p.promo_id RIGHT OUTER JOIN promo_type pt ON p.promo_type_id = pt.promo_type_id WHERE pt.promo_type IN ('Email', 'TV') AND a.prod_cat_name = 'Electronics' AND s.sale_year >=2013 GROUP BY a.prod_cat_name ,b.prod_name ,c.prod_owner_name ,p.promo_id ,pt.promo_type HAVING sum(s.units)>100 The generated image shown below. Notice how the filters applied to each table are also shown, further simplifying the task of understanding the SQL statement. For more complex examples have a look at big samples page.
April 7, 2014
by Mpumelelo Msimanga
· 18,755 Views
article thumbnail
6 Simple Performance Tips for SQL SELECT Statements
Performance tuning SELECT statements can be a time consuming task which in my opinion follows Pareto principle’s. 20% effort is likely give you an 80% performance improvement. To get another 20% performance improvement you probably need to spend 80% of the time. Unless you work on the planet Venus where each day on Venus is equal to 243 Earth days, delivery deadlines are likely to mean you will not have enough time to put into tuning your SQL queries. After years writing and running SQL statements I began to develop a mental check-list of things I looked at when trying to improve query performance. These are the things I check before moving on to query plans and reading the sometimes complicated documentation of the database I am working on. My check-list is by no means comprehensive or scientific, more like a back of the envelope calculation but can I can say that most of the time I do get performance improvements following these simple steps. The check-list follows. Check Indexes There should be indexes on all fields used in the WHERE and JOIN portions of the SQL statement. Take the 3-Minute SQL performance test. Regardless of your score be sure to read through the answers as they are informative. Limit Size of Your Working Data Set Examine the tables used in the SELECT statement to see if you can apply filters in the WHERE clause of your statement. A classic example is when a query initially worked well when there were only a few thousand rows in the table. As the application grew the query slowed down. The solution may be as simple as restricting the query to looking at the current month’s data. When you have queries that have sub-selects, look to apply filtering to the inner statement of the sub-selects as opposed to the outer statements. Only Select Fields You Need Extra fields often increase the grain of the data returned and thus result in more (detailed) data being returned to the SQL client. Additionally: When using reporting and analytical applications, sometimes the slow report performance is because the reporting tool has to do the aggregation as data is received in detailed form. Occasionally the query may run quickly enough but your problem could be a network related issue as large amounts of detailed data are sent to the reporting server across the network. When using a column-oriented DBMS only the columns you have selected will be read from disk, the less columns you include in your query the less IO overhead. Remove Unnecessary Tables The reasons for removing unnecessary tables are the same as the reasons for removing fields not needed in the select statement. Writing SQL statements is a process that usually takes a number of iterations as you write and test your SQL statements. During development it is possible that you add tables to the query that may not have any impact on the data returned by the SQL code. Once the SQL is correct I find many people do not review their script and remove tables that do not have any impact or use in the final data returned. By removing the JOINS to these unnecessary tables you reduce the amount of processing the database has to do. Sometimes, much like removing columns you may find your reduce the data bring brought back by the database. Remove OUTER JOINS This can easier said than done and depends on how much influence you have in changing table content. One solution is to remove OUTER JOINS by placing placeholder rows in both tables. Say you have the following tables with an OUTER JOIN defined to ensure all data is returned: customer_id customer_name 1 John Doe 2 Mary Jane 3 Peter Pan 4 Joe Soap customer_id sales_person NULL Newbee Smith 2 Oldie Jones 1 Another Oldie NULL Greenhorn The solution is to add a placeholder row in the customer table and update all NULL values in the sales table to the placeholder key. customer_id customer_name 0 NO CUSTOMER 1 John Doe 2 Mary Jane 3 Peter Pan 4 Joe Soap customer_id sales_person 0 Newbee Smith 2 Oldie Jones 1 Another Oldie 0 Greenhorn Not only have you removed the need for an OUTER JOIN you have also standardised how sales people with no customers are represented. Other developers will not have to write statements such as ISNULL(customer_id, “No customer yet”). Remove Calculated Fields in JOIN and WHERE Clauses This is another one of those that may at times be easier said than done depending on your permissions to make changes to the schema. This can be done by creating a field with the calculated values used in the join on the table. Given the following SQL statement: FROM sales a JOIN budget b ON ((year(a.sale_date)* 100) + month(a.sale_date)) = b.budget_year_month Performance can be improved by adding a column with the year and month in the sales table. The updated SQL statement would be as follows: SELECT * FROM PRODUCTSFROM sales a JOIN budget b ON a.sale_year_month = b.budget_year_month Conclusion The recommendations boil down to a few short pointers check for indexes work with the smallest data set required remove unnecessary fields and tables and remove calculations in your JOIN and WHERE clauses. If all these recommendations fail to improve your SQL query performance my last suggestion is you move to Venus. All you will need is a single day to tune your SQL.
March 31, 2014
by Mpumelelo Msimanga
· 349,471 Views · 5 Likes
article thumbnail
How to Run a SQL Query Across Multiple Databases with One Query
In SQL Server management studio, using, View, Registered Servers (Ctrl+Alt+G) set up the servers that you want to execute the same query across all servers for, right click the group, select new query. Then when you execute the query, the results will come back with the first column showing you the database instance that that row came from.
March 28, 2014
by Merrick Chaffer
· 49,283 Views
article thumbnail
Documenting Your Spring API with Swagger
over the last several months, i've been developing a rest api using spring boot . my client hired an outside company to develop a native ios app, and my development team was responsible for developing its api. our main task involved integrating with epic , a popular software system used in health care. we also developed a crowd -backed authentication system, based loosely on philip sorst's angular rest security . to document our api, we used spring mvc integration for swagger (a.k.a. swagger-springmvc). i briefly looked into swagger4spring-web , but gave up quickly when it didn't recognize spring's @restcontroller. we started with swagger-springmvc 0.6.5 and found it fairly easy to integrate. unfortunately, it didn't allow us to annotate our model objects and tell clients which fields were required. we were quite pleased when a new version (0.8.2) was released that supports swagger 1.3 and its @apimodelproperty. what is swagger? the goal of swagger is to define a standard, language-agnostic interface to rest apis which allows both humans and computers to discover and understand the capabilities of the service without access to source code, documentation, or through network traffic inspection. to demonstrate how swagger works, i integrated it into josh long's x-auth-security project. if you have a boot-powered project, you should be able to use the same steps. 1. add swagger-springmvc dependency to your project. com.mangofactory swagger-springmvc 0.8.2 note: on my client's project, we had to exclude "org.slf4j:slf4j-log4j12" and add "jackson-module-scala_2.10:2.3.1" as a dependency. i did not need to do either of these in this project. 2. add a swaggerconfig class to configure swagger. the swagger-springmvc documentation has an example of this with a bit more xml. package example.config; import com.mangofactory.swagger.configuration.jacksonscalasupport; import com.mangofactory.swagger.configuration.springswaggerconfig; import com.mangofactory.swagger.configuration.springswaggermodelconfig; import com.mangofactory.swagger.configuration.swaggerglobalsettings; import com.mangofactory.swagger.core.defaultswaggerpathprovider; import com.mangofactory.swagger.core.swaggerapiresourcelisting; import com.mangofactory.swagger.core.swaggerpathprovider; import com.mangofactory.swagger.scanners.apilistingreferencescanner; import com.wordnik.swagger.model.*; import org.springframework.beans.factory.annotation.autowired; import org.springframework.beans.factory.annotation.value; import org.springframework.context.annotation.bean; import org.springframework.context.annotation.componentscan; import org.springframework.context.annotation.configuration; import java.util.arraylist; import java.util.arrays; import java.util.list; import static com.google.common.collect.lists.newarraylist; @configuration @componentscan(basepackages = "com.mangofactory.swagger") public class swaggerconfig { public static final list default_include_patterns = arrays.aslist("/news/.*"); public static final string swagger_group = "mobile-api"; @value("${app.docs}") private string docslocation; @autowired private springswaggerconfig springswaggerconfig; @autowired private springswaggermodelconfig springswaggermodelconfig; /** * adds the jackson scala module to the mappingjackson2httpmessageconverter registered with spring * swagger core models are scala so we need to be able to convert to json * also registers some custom serializers needed to transform swagger models to swagger-ui required json format */ @bean public jacksonscalasupport jacksonscalasupport() { jacksonscalasupport jacksonscalasupport = new jacksonscalasupport(); //set to false to disable jacksonscalasupport.setregisterscalamodule(true); return jacksonscalasupport; } /** * global swagger settings */ @bean public swaggerglobalsettings swaggerglobalsettings() { swaggerglobalsettings swaggerglobalsettings = new swaggerglobalsettings(); swaggerglobalsettings.setglobalresponsemessages(springswaggerconfig.defaultresponsemessages()); swaggerglobalsettings.setignorableparametertypes(springswaggerconfig.defaultignorableparametertypes()); swaggerglobalsettings.setparameterdatatypes(springswaggermodelconfig.defaultparameterdatatypes()); return swaggerglobalsettings; } /** * api info as it appears on the swagger-ui page */ private apiinfo apiinfo() { apiinfo apiinfo = new apiinfo( "news api", "mobile applications and beyond!", "https://helloreverb.com/terms/", "[email protected]", "apache 2.0", "http://www.apache.org/licenses/license-2.0.html" ); return apiinfo; } /** * configure a swaggerapiresourcelisting for each swagger instance within your app. e.g. 1. private 2. external apis * required to be a spring bean as spring will call the postconstruct method to bootstrap swagger scanning. * * @return */ @bean public swaggerapiresourcelisting swaggerapiresourcelisting() { //the group name is important and should match the group set on apilistingreferencescanner //note that swaggercache() is by defaultswaggercontroller to serve the swagger json swaggerapiresourcelisting swaggerapiresourcelisting = new swaggerapiresourcelisting(springswaggerconfig.swaggercache(), swagger_group); //set the required swagger settings swaggerapiresourcelisting.setswaggerglobalsettings(swaggerglobalsettings()); //use a custom path provider or springswaggerconfig.defaultswaggerpathprovider() swaggerapiresourcelisting.setswaggerpathprovider(apipathprovider()); //supply the api info as it should appear on swagger-ui web page swaggerapiresourcelisting.setapiinfo(apiinfo()); //global authorization - see the swagger documentation swaggerapiresourcelisting.setauthorizationtypes(authorizationtypes()); //every swaggerapiresourcelisting needs an apilistingreferencescanner to scan the spring request mappings swaggerapiresourcelisting.setapilistingreferencescanner(apilistingreferencescanner()); return swaggerapiresourcelisting; } @bean /** * the apilistingreferencescanner does most of the work. * scans the appropriate spring requestmappinghandlermappings * applies the correct absolute paths to the generated swagger resources */ public apilistingreferencescanner apilistingreferencescanner() { apilistingreferencescanner apilistingreferencescanner = new apilistingreferencescanner(); //picks up all of the registered spring requestmappinghandlermappings for scanning apilistingreferencescanner.setrequestmappinghandlermapping(springswaggerconfig.swaggerrequestmappinghandlermappings()); //excludes any controllers with the supplied annotations apilistingreferencescanner.setexcludeannotations(springswaggerconfig.defaultexcludeannotations()); // apilistingreferencescanner.setresourcegroupingstrategy(springswaggerconfig.defaultresourcegroupingstrategy()); //path provider used to generate the appropriate uri's apilistingreferencescanner.setswaggerpathprovider(apipathprovider()); //must match the swagger group set on the swaggerapiresourcelisting apilistingreferencescanner.setswaggergroup(swagger_group); //only include paths that match the supplied regular expressions apilistingreferencescanner.setincludepatterns(default_include_patterns); return apilistingreferencescanner; } /** * example of a custom path provider */ @bean public apipathprovider apipathprovider() { apipathprovider apipathprovider = new apipathprovider(docslocation); apipathprovider.setdefaultswaggerpathprovider(springswaggerconfig.defaultswaggerpathprovider()); return apipathprovider; } private list authorizationtypes() { arraylist authorizationtypes = new arraylist<>(); list authorizationscopelist = newarraylist(); authorizationscopelist.add(new authorizationscope("global", "access all")); list granttypes = newarraylist(); loginendpoint loginendpoint = new loginendpoint(apipathprovider().getappbasepath() + "/user/authenticate"); granttypes.add(new implicitgrant(loginendpoint, "access_token")); return authorizationtypes; } @bean public swaggerpathprovider relativeswaggerpathprovider() { return new apirelativeswaggerpathprovider(); } private class apirelativeswaggerpathprovider extends defaultswaggerpathprovider { @override public string getappbasepath() { return "/"; } @override public string getswaggerdocumentationbasepath() { return "/api-docs"; } } } the apipathprovider class referenced above is as follows: package example.config; import com.mangofactory.swagger.core.swaggerpathprovider; import org.springframework.beans.factory.annotation.autowired; import org.springframework.web.util.uricomponentsbuilder; import javax.servlet.servletcontext; public class apipathprovider implements swaggerpathprovider { private swaggerpathprovider defaultswaggerpathprovider; @autowired private servletcontext servletcontext; private string docslocation; public apipathprovider(string docslocation) { this.docslocation = docslocation; } @override public string getapiresourceprefix() { return defaultswaggerpathprovider.getapiresourceprefix(); } public string getappbasepath() { return uricomponentsbuilder .fromhttpurl(docslocation) .path(servletcontext.getcontextpath()) .build() .tostring(); } @override public string getswaggerdocumentationbasepath() { return uricomponentsbuilder .fromhttpurl(getappbasepath()) .pathsegment("api-docs/") .build() .tostring(); } @override public string getrequestmappingendpoint(string requestmappingpattern) { return defaultswaggerpathprovider.getrequestmappingendpoint(requestmappingpattern); } public void setdefaultswaggerpathprovider(swaggerpathprovider defaultswaggerpathprovider) { this.defaultswaggerpathprovider = defaultswaggerpathprovider; } } in src/main/resources/application.properties , add an "app.docs" property. this will need to be changed as you move your application from local -> test -> staging -> production. spring boot's externalized configuration makes this fairly simple. app.docs=http://localhost:8080 3. verify swagger produces json. after completing the above steps, you should be able to see the json swagger generates for your api. open http://localhost:8080/api-docs in your browser or curl http://localhost:8080/api-docs . { "apiversion": "1", "swaggerversion": "1.2", "apis": [ { "path": "http://localhost:8080/api-docs/mobile-api/example_newscontroller", "description": "example.newscontroller" } ], "info": { "title": "news api", "description": "mobile applications and beyond!", "termsofserviceurl": "https://helloreverb.com/terms/", "contact": "[email protected]", "license": "apache 2.0", "licenseurl": "http://www.apache.org/licenses/license-2.0.html" } } 4. copy swagger ui into your project. swagger ui is a good-looking javascript client for swagger's json. i integrated it using the following steps: git clone https://github.com/wordnik/swagger-ui cp -r swagger-ui/dist ~/dev/x-auth-security/src/main/resources/public/docs i modified docs/index.html, deleting its header () element, as well as made its url dynamic. ... $(function () { var apiurl = window.location.protocol + "//" + window.location.host; if (window.location.pathname.indexof('/api') > 0) { apiurl += window.location.pathname.substring(0, window.location.pathname.indexof('/api')) } apiurl += "/api-docs"; log('api url: ' + apiurl); window.swaggerui = new swaggerui({ url: apiurl, dom_id: "swagger-ui-container", ... after making these changes, i was able to open fire up the app with "mvn spring-boot:run" and view http://localhost:8080/docs/index.html in my browser. 5. annotate your api. there are two services in x-auth-security: one for authentication and one for news. to provide more information to the "news" service's documentation, add @api and @apioperation annotations. these annotations aren't necessary to get a service to show up in swagger ui, but if you don't specify the @api("user"), you'll end up with an ugly-looking class name instead (e.g. example_xauth_userxauthtokencontroller). @restcontroller @api(value = "news", description = "news api") class newscontroller { map entries = new concurrenthashmap(); @requestmapping(value = "/news", method = requestmethod.get) @apioperation(value = "get news", notes = "returns news items") collection entries() { return this.entries.values(); } @requestmapping(value = "/news/{id}", method = requestmethod.delete) @apioperation(value = "delete news item", notes = "deletes news item by id") newsentry remove(@pathvariable long id) { return this.entries.remove(id); } @requestmapping(value = "/news/{id}", method = requestmethod.get) @apioperation(value = "get a news item", notes = "returns a news item") newsentry entry(@pathvariable long id) { return this.entries.get(id); } @requestmapping(value = "/news/{id}", method = requestmethod.post) @apioperation(value = "update news", notes = "updates a news item") newsentry update(@requestbody newsentry news) { this.entries.put(news.getid(), news); return news; } ... } you might notice the screenshot above only shows news. this is because swaggerconfig.default_include_patterns only specifies news. the following will include all apis. public static final list default_include_patterns = arrays.aslist("/.*"); after adding these annotations and modifying swaggerconfig , you should see all available services. in swagger-springmvc 0.8.x, the ability to use @apimodel and @apimodelproperty annotations was added. this means you can annotate newsentry to specify which fields are required. @apimodel("news entry") public static class newsentry { @apimodelproperty(value = "the id of the item", required = true) private long id; @apimodelproperty(value = "content", required = true) private string content; // getters and setters } this results in the model's documentation showing up in swagger ui. if "required" isn't specified, a property shows up as optional . parting thoughts the qa engineers and 3rd party ios developers have been very pleased with our api documentation. i believe this is largely due to swagger and its nice-looking ui. the swagger ui also provides an interface to test the endpoints by entering parameters (or json) into html forms and clicking buttons. this could benefit those qa folks that prefer using selenium to test html (vs. raw rest endpoints). i've been quite pleased with swagger-springmvc, so kudos to its developers. they've been very responsive in fixing issues i've reported . the only thing i'd like is support for recognizing jsr303 annotations (e.g. @notnull) as required fields. to see everything running locally, checkout my modified x-auth-security project on github and the associated commits for this article.
March 27, 2014
by Matt Raible
· 119,995 Views · 5 Likes
article thumbnail
Goose for Database Migrations
I've been hunting for good database tools to perform that class of tasks that we all need, but that we end up re-implementing over and over again. One such task is database migrations. I've been experimenting with Goose to provide general-purpose database migration support. What Is Goose? Goose is a general purpose database migration manager. The idea is simple: You provide SQL schema files that follow a particular naming convention You provide a simple dbconf.yml file that tells Goose how to connect to your various databases Goose provides you simple tools to upgrade (goose up), check on (goose status), and even revert (goose down) schema changes. Goose does this by adding one more table inside your database. This table tracks which schema changes it has made. Based on its history, it can tell which scheme updates need to be run and which have already been run. While Goose is written in Go (golang), it is agnostic about what language your app is written in. Getting Started I got Goose up and running in less than 30 minutes, and you can probably do it faster. I already have an empty Postgres database called foo. But it has no tables. I have an existing codebase, too (MyProject). Here is the process for configuring Goose to manage the database schema management. First, I create the db/ directory, which will house all of the Goose-specific files, including my schema. $ cd MyProject $ mkdir db $ cd db $ vim dbconf.yml # Open with the editor of your choice. The dbconf.yml file contains a list of databases along with the relevant information for connecting to each. Mine looks something like this: test: driver: postgres open: user=foo dbname=foo_test sslmode=disable development: driver: postgres open: user=foo dbname=foo_dev sslmode=disable (Important: use spaces, not tabs, in YAML.) Now I have two databases configured. One for testing and one for development. By default, Goose assumes the target database is development. The above is just configured to connect to the PostgreSQL instance locally running. If I need support for a remote host, I can add host=... password=... (and remove sslmode=disable). At this point, I can generate a new migration. $ cd .. # Back to MyProject/, not in db/ $ goose create NewSchema sql goose: created db/migrations/20140311133014_NewSchema.sql $ vim db/migrations/20140311133014_NewSchema.sql # Use whatever editor you like Notice that the goose create command will create a new SQL file that follows Goose's naming convention. (That trailing sql on the command is important. goose create can also generate go migration files) My new schema file has two sections: a section for goose up and a section to rollback with goose down: -- +goose Up CREATE TABLE foo ( -- ... ); -- +goose Down DROP TABLE foo; With that done, I can now very easily create by development database: $ goose up If I want to setup test instead, I use the -env flag: $ goose -env=test up And that's it! In subsequent schema files, I may ALTER existing tables or CREATE new ones, and so on. Just about anything that your SQL engine can execute can be passed through Goose. (Though there are some formatting annotations you need to use for things like stored procedures.) Goose Pros In addition to the general ease of use of Goose, here are some additional features that I really like: You do not need your entire codebase to execute Goose. Our deployment box, for example, only has the Goose db/ directory, not the rest of the code. It is largely language neutral if you're just migrating SQL. It works with PostgreSQL, MySQL, and SQLite. The history table that it creates is human-readable, which makes it easy for me to see what's been going on. It supports environment variable interpolation. Don't want your password inside the dbconf.yml file? Just do something like this: development: driver: postgres open: user=foo dbname=foo_dev sslmode=disable password=$MY_DB_PASSWORD This will cause Goose to check the environment for a variable named $MY_DB_PASSWORD. Goose Cons Honestly, I have very few. Right now, you need the Go runtime to install and build Goose. Of course, you can compile Goose once, and then use it wherever. While it has support for Go language migrations, it would be nice to be able to write migration scripts that are executed via the shell. That way, one could use Bash, Python, Perl, or whatever else to trigger migrations. But, hey... this is a pretty minor complaint. Overall, though, Goose is a fantastic tool for handling migrations with ease.
March 27, 2014
by Matt Butcher
· 17,061 Views
article thumbnail
Integration Testing for Spring Applications with JNDI Connection Pools
We all know we need to use connection pools where ever we connect to a database. All of the modern drivers using JDBC type 4 support it. In this post we will have look at an overview ofconnection pooling in spring applications and how to deal with same context in a non JEE enviorements (like tests). Most examples of connecting to database in spring is done using DriverManagerDataSource. If you don't read the documentation properly then you are going to miss a very important point. NOTE: This class is not an actual connection pool; it does not actually pool Connections. It just serves as simple replacement for a full-blown connection pool, implementing the same standard interface, but creating new Connections on every call. Useful for test or standalone environments outside of a J2EE container, either as a DataSource bean in a corresponding ApplicationContext or in conjunction with a simple JNDI environment. Pool-assuming Connection.close() calls will simply close the Connection, so any DataSource-aware persistence code should work. Yes, by default the spring applications does not use pooled connections. There are two ways to implement the connection pooling. Depending on who is managing the pool. If you are running in a JEE environment, then it is prefered use the container for it. In a non-JEE setup there are libraries which will help the application to manage the connection pools. Lets discuss them in bit detail below. 1. Server (Container) managed connection pool (Using JNDI) When the application connects to the database server, establishing the physical actual connection takes much more than the execution of the scripts. Connection pooling is a technique that was pioneered by database vendors to allow multiple clients to share a cached set of connection objects that provide access to a database resource. The JavaWorld article gives a good overview about this. In a J2EE container, it is recommended to use a JNDI DataSource provided by the container. Such a DataSource can be exposed as a DataSource bean in a Spring ApplicationContext via JndiObjectFactoryBean, for seamless switching to and from a local DataSource bean like this class. The below articles helped me in setting up the data source in JBoss AS. 1. DebaJava Post 2. JBoss Installation Guide 3. JBoss Wiki Next step is to use these connections created by the server from the application. As mentioned in the documentation you can use the JndiObjectFactoryBean for this. It is as simple as below If you want to write any tests using springs "SpringJUnit4ClassRunner" it can't load the context becuase the JNDI resource will not be available. For tests, you can then either set up a mock JNDI environment through Spring's SimpleNamingContextBuilder, or switch the bean definition to a local DataSource (which is simpler and thus recommended). As I was looking for a good solutions to this problem (I did not want a separate context for tests) this SO answer helped me. It sort of uses the various tips given in the Javadoc to good effect. The issue with the above solution is the repetition of code to create the JNDI connections. I have solved it using a customized runner SpringWithJNDIRunner. This class adds the JNDI capabilities to the SpringJUnit4ClassRunner. It reads the data source from "test-datasource.xml" file in the class path and binds it to the JNDI resource with name "java:/my-ds". After the execution of this code the JNDI resource is available for the spring container to consume. import javax.naming.NamingException; import org.junit.runners.model.InitializationError; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; import org.springframework.mock.jndi.SimpleNamingContextBuilder; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; /** * This class adds the JNDI capabilities to the SpringJUnit4ClassRunner. * @author mkadicha * */ public class SpringWithJNDIRunner extends SpringJUnit4ClassRunner { public static boolean isJNDIactive; /** * JNDI is activated with this constructor. * * @param klass * @throws InitializationError * @throws NamingException * @throws IllegalStateException */ public SpringWithJNDIRunner(Class klass) throws InitializationError, IllegalStateException, NamingException { super(klass); synchronized (SpringWithJNDIRunner.class) { if (!isJNDIactive) { ApplicationContext applicationContext = new ClassPathXmlApplicationContext( "test-datasource.xml"); SimpleNamingContextBuilder builder = new SimpleNamingContextBuilder(); builder.bind("java:/my-ds", applicationContext.getBean("dataSource")); builder.activate(); isJNDIactive = true; } } } } To use this runner you just need to use the annotation @RunWith(SpringWithJNDIRunner.class) in your test. This class extends SpringJUnit4ClassRunner beacuse a there can only be one class in the @RunWith annotation. The JNDI is created only once is a test cycle. This class provides a clean solution to the problem. 2. Application managed connection pool If you need a "real" connection pool outside of a J2EE container, consider Apache's Jakarta Commons DBCP or C3P0. Commons DBCP's BasicDataSource and C3P0's ComboPooledDataSource are full connection pool beans, supporting the same basic properties as this class plus specific settings (such as minimal/maximal pool size etc). Below user guides can help you configure this. 1. Spring Docs 2. C3P0 Userguide 3. DBCP Userguide The below articles speaks about the general guidelines and best practices in configuring the connection pools. 1. SO question on Spring JDBC Connection pools 2. Connection pool max size in MS SQL Server 2008 3. How to decide the max number of connections 4. Monitoring the number of active connections in SQL Server 2008 Note:- All the text in italics are copied from the spring documentation of the DriverManagerDataSource.
March 26, 2014
by Manu Pk
· 25,264 Views · 1 Like
article thumbnail
Postgres and Oracle Compatibility with Hibernate
Postgres and Oracle compatibility with Hibernate There are situations your JEE application needs to support Postgres and Oracle as a Database. Hibernate should do the job here, however, there are some specifics worth mentioning. While enabling Postgres for application already running Oracle I came across following tricky parts: BLOBs support, CLOBs support, Oracle not knowing Boolean type (using Integer) instead and DUAL table. These were the tricks I had to apply to make the @Entity classes running on both of these. Please note I’ve used Postgres 9.3 with Hibernate 4.2.1.SP1. BLOBs support The problem with Postgres is that it offers 2 types of BLOB storage: bytea - data stored in table oid - table holds just identifier to data stored elsewhere I guess in the most of the situations you can live with the bytea as well as I did. The other one as far as I’ve read is to be used for some huge data (in gigabytes) as it supports streams for IO operations. Well, it sounds nice there is such a support, however using Hibernate in this case can make things quite problematic (due to need to use the specific annotations), especially if you try to achieve compatibility with Oracle. To see the trouble here, see StackOverflow: proper hibernate annotation for byte[] All- the combinations are described there: annotation postgres oracle works on ------------------------------------------------------------- byte[] + @Lob oid blob oracle byte[] bytea raw(255) postgresql byte[] + @Type(PBA) oid blob oracle byte[] + @Type(BT) bytea blob postgresql where @Type(PBA) stands for: @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") and @Type(BT) stands for: @Type(type="org.hibernate.type.BinaryType"). These result in all sorts of Postgres errors, like: ERROR: column “foo” is of type oid but expression is of type bytea or ERROR: column “foo” is of type bytea but expression is of type oid Well, there seems to be a solution, still it includes patching of Hibernate library (something I see as the last option when playing with 3.rd party library). There is also a reference to official blog post from the Hibernate guys on the topic: PostgreSQL and BLOBs. Still solution described in blog post seems not working for me and based on the comments, seems to be invalid for more people. BLOBs solved OK, so now the optimistic part. After quite some debugging I ended up with the Entity definition like this : @Lob private byte[] foo; Oracle has no trouble with that, moreover I had to customize the Postgres dialect in a way: public class PostgreSQLDialectCustom extends PostgreSQL82Dialect { @Override public SqlTypeDescriptor remapSqlTypeDescriptor(SqlTypeDescriptor sqlTypeDescriptor) { if (sqlTypeDescriptor.getSqlType() == java.sql.Types.BLOB) { return BinaryTypeDescriptor.INSTANCE; } return super.remapSqlTypeDescriptor(sqlTypeDescriptor); } } That’s it! Quite simple right? That works for persisting to bytea typed columns in Postgres (as that fits my usecase). CLOBs support The errors in misconfiguration looked something like this: org.postgresql.util.PSQLException: Bad value for type long : ... So first I’ve found (on String LOBs on PostgreSQL with Hibernate 3.6) following solution: @Lob @Type(type = "org.hibernate.type.TextType") private String foo; Well, that works, but for Postgres only. Then there was a suggestion (on StackOverflow: Postgres UTF-8 clobs with JDBC) from to go for: @Lob @Type(type="org.hibernate.type.StringClobType") private String foo; That pointed me the right direction (the funny part was that it was just a comment to some answers). It was quite close, but didn’t work for me in all cases, still resulted in errors in my tests. CLOBs solved The important was @deprecation javadocs in the org.hibernate.type.StringClobType that brought me to working one: @Lob @Type(type="org.hibernate.type.MaterializedClobType") private String foo; That works for both Postgres and Oracle, without any further hacking (on Hibernate side) needed. Boolean type Oracle knows no Boolean type and the trouble is that Postgres does. As there was also some plain SQL present, I ended up In Postgres with error: ERROR: column “foo” is of type boolean but expression is of type integer I decided to enable cast from Integer to Boolean in Postgres rather than fixing all the plain SQL places (in a way found in Forum: Automatically Casting From Integer to Boolean): update pg_cast set castcontext = 'i' where oid in ( select c.oid from pg_cast c inner join pg_type src on src.oid = c.castsource inner join pg_type tgt on tgt.oid = c.casttarget where src.typname like 'int%' and tgt.typname like 'bool%'); Please note you should run the SQL update by user with provileges to update catalogs (probably not your postgres user used for DB connection from your application), as I’ve learned on Stackoverflow: Postgres - permission denied on updating pg_catalog.pg_cast. DUAL table There is one more specific in the Oracle I came across. If you have plain SQL, in Oracle there is DUAL table provied (see more info on Wikipedia on that) that might harm you in Postgres. Still the solution is simple. In Postgres create a view that would fill the similar purpose. It can be created like this: create or replace view dual as select 1; Conclusion Well that should be it. Enjoy your cross DB compatible JEE apps.
March 26, 2014
by Peter Butkovic
· 21,900 Views · 1 Like
article thumbnail
Distributed Counters Feature Design
this is another experiment with longer posts. previously, i used the time series example as the bed on which to test some ideas regarding feature design, to explain how we work and in general work out the rough patches along the way. i should probably note that these posts are purely fiction at this point. we have no plans to include a time series feature in ravendb at this time. i am trying to work out some thoughts in the open and get your feedback. at any rate, yesterday we had a request for cassandra style counters at the mailing list. and as long as i am doing feature design series, i thought that i could talk about how i would go about implementing this. again, consider this fiction, i have no plans of implementing this at this time. the essence of what we want is to be able to… count stuff. efficiently, in a distributed manner, with optional support for cross data center replication. very roughly, the idea is to have “sub counters”, unique for every node in the system. whenever you increment the value, we log this to our own sub counter, and then replicate it out. whenever you read it, we just sum all the data we have from all the sub counters. let us outline the various parts of the solution in the same order as the one i used for time series. storage a counter is just a named 64 bits signed integer. a counter name can be any string up to 128 printable characters. the external interface of the storage would look like this: 1: public struct counterincrement 2: { 3: public string name; 4: public long change; 5: } 6: 7: public struct counter 8: { 9: public string name; 10: public string source; 11: public long value; 12: } 13: 14: public interface icounterstorage 15: { 16: void localincrementbatch(counterincrement[] batch); 17: 18: counter[] read(string name); 19: 20: void replicatedupdates(counter[] updates); 21: } as you can see, this gives us very simple interface for the storage. we can either change the data locally (which modify our own storage) or we can get an update from a replica about its changes. there really isn’t much more to it, to be fair. the localincrementbatch() increment a local value, and read() will return all the values for a counter. there is a little bit of trickery involved in how exactly one would store the counter values. for now, i think we’ll store each counter as two step values. we’ll have a tree of multi tree values that will carry each value from each source. that means that a counter will take roughly 4kb or so. this is easy to work with and nicely fit the model voron uses internally. note that we’ll outline additional requirement for storage (searching for counter by prefix, iterating over counters, addresses of other servers, stats, etc) below. i’m not showing them here because they aren’t the major issue yet. over the wire skipping out on any optimizations that might be required, we will expose the following endpoints: get /counters/read?id=users/1/visits&users/1/posts <—will return json response with all the relevant values (already summed up). { “users/1/visits”: 43, “users/1/posts”: 3 } get /counters/read?id=users/1/visits&users/1/1/posts&raw=true <—will return json response with all the relevant values, per source. { “users/1/visits”: {“rvn1”: 21, “rvn2”: 22 } , “users/1/posts”: { “rvn1”: 2, “rvn3”: 1 } } post /counters/increment <– allows to increment counters. the request is a json array of the counter name and the change. for a real system, you’ll probably need a lot more stuff, metrics, stats, etc. but this is the high level design, so this would be enough. note that we are skipping the high performance stream based writes we outlined for time series. we’ll probably won’t need them, so that doesn’t matter, but they are an option if we need them. system behavior this is where it is really not interesting, there is very little behavior here, actually. we only have to read the data from the storage, sum it up, and send it to the user. hardly what i’ll call business logic. client api the client api will probably look something like this: 1: counters.increment("users/1/posts"); 2: counters.increment("users/1/visits", 4); 3: 4: using(var batch = counters.batch()) 5: { 6: batch.increment("users/1/posts"); 7: batch.increment("users/1/visits",5); 8: batch.submit(); 9: } note that we’re offering both batch and single api. we’ll likely also want to offer a fire & forget style, which will be able to offer even better performance (because they could do batching across more than a single thread), but that is out of scope for now. for simplicity sake, we are going to have the client just a container for all of endpoints that it knows about. the container would be responsible for… updating the client visible topology, selecting the best server to use at any given point, etc. user interface there isn’t much to it. just show a list of counter values in a list. allow to search by prefix, allow to dive into a particular counter and read its raw values, but that is about it. oh, and allow to delete a counter. deleting data honestly, i really hate deletes. they are very expensive to handle properly the moment you have more than a single node. in this case, there is an inherent race condition between a delete going out and another node getting an increment. and then there is the issue of what happens if you had a node down when you did the delete, etc. this just sucks. deletion are handled normally, (with the race condition caveat, obviously), and i’ll discuss how we replicate them in a bit. high availability / scale out by definition, we actually don’t want to have storage replication here. either log shipping or consensus based. we actually do want to have different values, because we are going to be modifying things independently on many servers. that means that we need to do replication at the database level. and that leads to some interesting questions. again, the hard part here is the deletes. actually, the really hard part is what we are going to do with the new server problem. the new server problem dictates how we are going to bring a new server into the cluster. if we could fix the size of the cluster, that would make things a lot easier. however, we are actually interested in being able to dynamically grow the cluster size. therefor, there are only two real ways to do it: add a new empty node to the cluster, and have it be filled from all the other servers. add a new node by backing up an existing node, and restoring as a new node. ravendb, for example, follows the first option. but it means that in needs to track a lot more information. the second option is actually a lot simpler, because we don’t need to care about keeping around old data. however, this means that the process of bringing up a new server would now be: update all nodes in the cluster with the new node address (node isn’t up yet, replication to it will fail and be queued). backup an existing node and restore at the new node. start the new node. the order of steps is quite important. and it would be easy to get it wrong. also, on large systems, backup & restore can take a long time. operationally speaking, i would much rather just be able to do something like, bring a new node into the cluster in “silent” mode. that is, it would get information from all the other nodes, and i can “flip the switch” and make it visible to clients at any point in time. that is how you do it with ravendb, and it is an incredibly powerful system, when used properly. that means that for all intents and purposes, we don’t do real deletes. what we’ll actually do is replace the counter value with delete marker. this turns deletes into a much simple “just another write”. it has the sad implication of not free disk space on deletes, but deletes tend to be rare, and it is usually fine to add a “purge” admin option that can be run on as needed basis. but that brings us to an interesting issue, how do we actually handle replication. the topology map to simplify things, we are going to go with one way replication from a node to another. that allows complex topologies like master-master, cluster-cluster, replication chain, etc. but in the end, this is all about a single node replication to another. the first question to ask is, are we going to replicate just our local changes, or are we going to have to replicate external changes as well? the problem with replicating external changes is that you may have the following topology: now, server a got a value and sent it to server b. server b then forwarded it to server c. however, at that point, we also have a the value from server a replicated directly to server c. which value is it supposed to pick? and what about a scenario where you have more complex topology? in general, because in this type of system, we can have any node accept writes, and we actually desire this to be the case , we don’t want this behavior. we want to only replicate local data, not all the data. of course, that leads to an annoying question, what happens if we have a 3 node cluster, and one node fails catastrophically. we can bring a new node in, and the other two nodes will be able to fill in their values via replication, but what about the node that is down? the data isn’t gone, it is still right there in the other two nodes, but we need a way to pull it out. therefor, i think that the best option would be to say that nodes only replicate their local state, except in the case of a new node. a new node will be told the address of an existing node in the cluster, at which point it will: register itself in all the nodes in the cluster (discoverable from the existing node). this assumes a standard two way replication link between all servers, if this isn’t the case, the operators would have the responsibility to setup the actual replication semantics on their own. new node now starts getting updates from all the nodes in the cluster. it keeps them in a log for now, not doing anything yet. ask that node for a complete update of all of its current state. when it has all the complete state of the existing node, it replays all of the remembered logs that it didn’t have a chance to apply yet. then it announces that it is in a valid state to start accepting client connections. note that this process is likely to be very sensitive to high data volumes. that is why you’ll usually want to select a backup node to read from, and that decision is an ops decision. you’ll also want to be able to report extensively on the current status of the node, since this can take a while, and ops will be watching this very closely. server name a node requires a unique name. we can use guids, but those aren’t readable, so we can use machine name + port, but those can change. ideally, we can require the user to set us up with a unique name. that is important for readability and for being able to alter see all the values we have in all the nodes. it is important that names are never repeated, so we’ll probably have a guid there anyway, just to be on the safe side. actual replication semantics since we have the new server problem down to an automated process, we can choose the drastically simpler model of just having an internal queue per each replication destination. whenever we make a change, we also make a note of that in the queue for that destination, then we start an async replication process to that server, sending all of our updates there. it is always safe to overwrite data using replication, because we are overwriting our own data, never anyone else. and… that is about it, actually. there are probably a lot of details that i am missing / would discover if we were to actually implement this. but i think that this is a pretty good idea about what this feature is about.
March 25, 2014
by Oren Eini
· 12,589 Views · 1 Like
  • Previous
  • ...
  • 498
  • 499
  • 500
  • 501
  • 502
  • 503
  • 504
  • 505
  • 506
  • 507
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×