Tools Resources

The Latest Tools Topics

Make Jenkins Windows Service use your Preferred JRE

recently i was working on installing and configuring a new instance of jenkins . for some reason, which is out of this post’s context, i wanted to make jenkins run with a specific version of the java environment. fortunately it was something really easy. this post is mainly a reminder to me, next time i’d like to do the same jenkins by default uses the jre which located under the jre sub-directory of your jenkins installation home ( %jenkins_home ). to change this find the file named jenkins.xml in which is located in your %jenkins_home directory. edit it and look for the following section %base%\jre\bin\java now change the content of the executable property to point to your favorite jre. you can describe it as an absolute or relative path or you can even use, environment variables. save the file and restart jenkins. that’s it! enjoy!

November 26, 2013

by Patroklos Papapetrou

· 16,779 Views · 2 Likes

New in Neo4j: Optional Relationships with OPTIONAL MATCH

One of the breaking changes in Neo4j 2.0.0-RC1 compared to previous versions is that the -[?]-> syntax for matching optional relationships has been retired and replaced with the OPTIONAL MATCH construct. An example where we might want to match an optional relationship could be if we want to find colleagues that we haven’t worked with given the following model: Suppose we have the following data set: CREATE (steve:Person {name: "Steve"}) CREATE (john:Person {name: "John"}) CREATE (david:Person {name: "David"}) CREATE (paul:Person {name: "Paul"}) CREATE (sam:Person {name: "Sam"}) CREATE (londonOffice:Office {name: "London Office"}) CREATE UNIQUE (steve)-[:WORKS_IN]->(londonOffice) CREATE UNIQUE (john)-[:WORKS_IN]->(londonOffice) CREATE UNIQUE (david)-[:WORKS_IN]->(londonOffice) CREATE UNIQUE (paul)-[:WORKS_IN]->(londonOffice) CREATE UNIQUE (sam)-[:WORKS_IN]->(londonOffice) CREATE UNIQUE (steve)-[:COLLEAGUES_WITH]->(john) CREATE UNIQUE (steve)-[:COLLEAGUES_WITH]->(david) We might write the following query to find people from the same office as Steve but that he hasn’t worked with: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague MATCH (potentialColleague)-[c?:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague ==> +----------------------+ ==> | potentialColleague | ==> +----------------------+ ==> | Node[4]{name:"Paul"} | ==> | Node[5]{name:"Sam"} | ==> +----------------------+ We first find which office Steve works in and find the people who also work in that office. Then we optionally match the ‘COLLEAGUES_WITH’ relationship and only return people who Steve doesn’t have that relationship with. If we run that query in 2.0.0-RC1 we get this exception: ==> SyntaxException: Question mark is no longer used for optional patterns - use OPTIONAL MATCH instead (line 1, column 199) ==> "MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague MATCH (potentialColleague)-[c?:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague" ==> Based on that advice we might translate our query to read like this: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague If we run that we get back more people than we’d expect: ==> +------------------------+ ==> | potentialColleague | ==> +------------------------+ ==> | Node[15]{name:"John"} | ==> | Node[14]{name:"David"} | ==> | Node[13]{name:"Paul"} | ==> | Node[12]{name:"Sam"} | ==> +------------------------+ The reason this query doesn’t work as we’d expect is because the WHERE clause immediately following OPTIONAL MATCH is part of the pattern rather than being evaluated afterwards as we’ve become used to. The OPTIONAL MATCH part of the query matches a ‘COLLEAGUES_WITH’ relationship where the relationship is actually null, something of a contradiction! However, since the match is optional a row is still returned. If we include ‘c’ in the RETURN part of the query we can see that this is the case: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) WHERE c IS null RETURN potentialColleague, c ==> +---------------------------------+ ==> | potentialColleague | c | ==> +---------------------------------+ ==> | Node[15]{name:"John"} | | ==> | Node[14]{name:"David"} | | ==> | Node[13]{name:"Paul"} | | ==> | Node[12]{name:"Sam"} | | ==> +---------------------------------+ If we take out the WHERE part of the OPTIONAL MATCH the query is a bit closer to what we want: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) RETURN potentialColleague, c ==> +-----------------------------------------------+ ==> | potentialColleague | c | ==> +-----------------------------------------------+ ==> | Node[2]{name:"John"} | :COLLEAGUES_WITH[5]{} | ==> | Node[3]{name:"David"} | :COLLEAGUES_WITH[6]{} | ==> | Node[4]{name:"Paul"} | | ==> | Node[5]{name:"Sam"} | | ==> +-----------------------------------------------+ If we introduce a WITH after the OPTIONAL MATCH we can choose to filter out those people that we’ve already worked with: MATCH (person:Person)-[:WORKS_IN]->(office)<-[:WORKS_IN]-(potentialColleague) WHERE person.name = "Steve" AND office.name = "London Office" WITH person, potentialColleague OPTIONAL MATCH (potentialColleague)-[c:COLLEAGUES_WITH]-(person) WITH potentialColleague, c WHERE c IS null RETURN potentialColleague If we evaluate that query it returns the same output as our original query: ==> +----------------------+ ==> | potentialColleague | ==> +----------------------+ ==> | Node[4]{name:"Paul"} | ==> | Node[5]{name:"Sam"} | ==> +----------------------+

November 26, 2013

by Mark Needham

· 21,542 Views · 7 Likes

Neo4j: Modeling Hyper Edges in a Property Graph

At the Graph Database meet up in Antwerp, we discussed how you would model a hyper edge in a property graph like Neo4j, and I realized that I’d done this in my football graph without realizing. A hyper edge is defined as follows: A hyperedge is a connection between two or more vertices, or nodes, of a hypergraph. A hypergraph is a graph in which generalized edges (called hyperedges) may connect more than two nodes with discrete properties. In Neo4j, an edge (or relationship) can only be between itself or another node; there’s no way of creating a relationship between more than 2 nodes. I had problems when trying to model the relationship between a player and a football match because I wanted to say that a player participated in a match and represented a specific team in that match. I started out with the following model: Unfortunately, creating a direct relationship from the player to the match means that there’s no way to work out which team they played for. This information is useful because sometimes players transfer teams in the middle of a season and we want to analyze how they performed for each team. In a property graph, we need to introduce an extra node which links the match, player and team together: Although we are forced to adopt this design it actually helps us realize an extra entity in our domain which wasn’t visible before – a player’s performance in a match. If we want to capture information about a players’ performance in a match we can store it on this node. We can also easily aggregate players stats by following the played relationship without needing to worry about the matches they played in. The Neo4j manual has a few more examples of domain models containing hyper edges which are worth having a look at if you want to learn more.

November 19, 2013

by Mark Needham

· 7,054 Views

Modeling Data in Neo4j: Bidirectional Relationships

transitioning from the relational world to the beautiful world of graphs requires a shift in thinking about data. although graphs are often much more intuitive than tables, there are certain mistakes people tend to make when modelling their data as a graph for the first time. in this article, we look at one common source of confusion: bidirectional relationships. directed relationships relationships in neo4j must have a type, giving the relationship a semantic meaning, and a direction. frequently, the direction becomes part of the relationship's meaning. in other words, the relationship would be ambiguous without it. for example, the following graph shows that the czech republic defeated sweden in ice hockey. had the direction of the relationship been reversed, the swedes would be much happier. with no direction at all, the relationship would be ambiguous, since it would not be clear who the winner was. note that the existence of this relationship implies a relationship of a different type going in the opposite direction, as the next graph illustrates. this is often the case. to give another example, the fact that pulp fiction was directed_by quentin tarantino implies that quentin tarantino is_director_of pulp fiction. you could come up with a huge number of such relationship pairs. one common mistake people often make when modelling their domain in neo4j is creating both types of relationships. since one relationship implies the other, this is wasteful, both in terms of space and traversal time. neo4j can traverse relationships in both directions. more importantly, thanks to the way neo4j organizes its data, the speed of traversal does not depend on the direction of the relationships being traversed. bidirectional relationships some relationships, on the other hand, are naturally bidirectional. a classic example is facebook or real-life friendship. this relationship is mutual - when someone is your friend, you are (hopefully) his friend, too. depending on how we look at the model, we could also say such relationship is undirected. graphaware and neo technology are partner companies. since this is a mutual relationship, we could model it as bidirectional or undirected relationship, respectively. but since none of this is directly possible in neo4j, beginners often resort to the following model, which suffers from the exact same problem as the incorrect ice hockey model: an extra unnecessary relationship. neo4j apis allow developers to completely ignore relationship direction when querying the graph, if they so desire. for example, in neo4j's own query language, cypher, the key part of a query finding all partner companies of neo technology would look something like match (neo)-[:partner]-(partner) the result would be the same as executing and merging the results of the following two different queries: match (neo)-[:partner]->(partner) and match (neo)<-[:partner]-(partner) therefore, the correct (or at least most efficient) way of modelling the partner relationships is using a single partner relationship with an arbitrary direction . conclusion relationships in neo4j can be traversed in both directions with the same speed. moreover, direction can be completely ignored. therefore, there is no need to create two different relationships between nodes, if one implies the other.

November 6, 2013

by Michal Bachman

· 28,500 Views · 2 Likes

Securing Docker’s Remote API

One piece to Docker that is interesting AMAZING is the Remote API that can be used to programatically interact with docker. I recently had a situation where I wanted to run many containers on a host with a single container managing the other containers through the API. But the problem I soon discovered is that at the moment when you turn networking on it is an all or nothing type of thing… you can’t turn networking off selectively on a container by container basis. You can disable IPv4 forwarding, but you can still reach the docker remote API on the machine if you can guess the IP address of it. One solution I came up with for this is to use nginx to expose the unix socket for docker over HTTPS and utilize client-side ssl certificates to only allow trusted containers to have access. I liked this setup a lot so I thought I would share how it’s done. Disclaimer: assumes some knowledge of docker! Generate The SSL Certificates We’ll use openssl to generate and self-sign the certs. Since this is for an internal service we’ll just sign it ourselves. We also remove the password from the keys so that we aren’t prompted for it each time we start nginx. # Create the CA Key and Certificate for signing Client Certs openssl genrsa -des3 -out ca.key 4096 openssl rsa -in ca.key -out ca.key # remove password! openssl req -new -x509 -days 365 -key ca.key -out ca.crt # Create the Server Key, CSR, and Certificate openssl genrsa -des3 -out server.key 1024 openssl rsa -in server.key -out server.key # remove password! openssl req -new -key server.key -out server.csr # We're self signing our own server cert here. This is a no-no in production. openssl x509 -req -days 365 -in server.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out server.crt # Create the Client Key and CSR openssl genrsa -des3 -out client.key 1024 openssl rsa -in client.key -out client.key # no password! openssl req -new -key client.key -out client.csr # Sign the client certificate with our CA cert. Unlike signing our own server cert, this is what we want to do. openssl x509 -req -days 365 -in client.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out client.crt Another option may be to leave the passphrase in and provide it as an environment variable when running a docker container or through some other means as an extra layer of security. We’ll move ca.crt, server.key and server.crt to /etc/nginx/certs. Setup Nginx The nginx setup for this is pretty straightforward. We just listen for traffic on localhost on port 4242. We require client-side ssl certificate validation and reference the certificates we generated in the previous step. And most important of all, set up an upstream proxy to the docker unix socket. I simply overwrote what was already in /etc/nginx/sites-enabled/default. upstream docker { server unix:/var/run/docker.sock fail_timeout=0; } server { listen 4242; server localhost; ssl on; ssl_certificate /etc/nginx/certs/server.crt; ssl_certificate_key /etc/nginx/certs/server.key; ssl_client_certificate /etc/nginx/certs/ca.crt; ssl_verify_client on; access_log on; error_log /dev/null; location / { proxy_pass http://docker; proxy_redirect off; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; client_max_body_size 10m; client_body_buffer_size 128k; proxy_connect_timeout 90; proxy_send_timeout 120; proxy_read_timeout 120; proxy_buffer_size 4k; proxy_buffers 4 32k; proxy_busy_buffers_size 64k; proxy_temp_file_write_size 64k; } } One important piece to make this work is you should add the user nginx runs as to the docker group so that it can read from the socket. This could be www-data, nginx, or something else! Hack It Up! With this setup and nginx restarted, let’s first run a curl command to make sure that this setup correctly. First we’ll make a call without the client cert to double check that we get denied access then a proper one. # Is normal http traffic denied? curl -v http://localhost:4242/info # How about https, sans client cert and key? curl -v -s -k https://localhost:4242/info # And the final good request! curl -v -s -k --key client.key --cert client.crt https://localhost:4242/info For the first two we should get some run of the mill 400 http response codes before we get a proper JSON response from the final command! Woot! But wait there’s more… let’s build a container that can call the service to launch other containers! For this example we’ll simply build two containers: one that has the client certificate and key and one that doesn’t. The code for these examples are pretty straightforward and to save space I’ll leave the untrusted container out. You can view the untrusted container on github (although it is nothing exciting). First, the node.js application that will connect and display information: https = require 'https' fs = require 'fs' options = host: 172.42.1.62 port: 4242 method: 'GET' path: '/containers/json' key: fs.readFileSync('ssl/client.key') cert: fs.readFileSync('ssl/client.crt') headers: { 'Accept': 'application/json'} # not required, but being semantic here! req = https.request options, (res) -> console.log res req.end() And the Dockerfile used to build the container. Notice we add the client.crt and client.key as part of building it! FROM shykes/nodejs MAINTAINER James R. Carr ADD ssl/client* /srv/app/ssl ADD package.json /srv/app/package.json ADD app.coffee /srv/app/app.coffee RUN cd /srv/app && npm install . CMD cd /srv/app && npm start That’s about it. Run docker build . and docker run -n >IMAGE ID< and we should see a json dump to the console of the actively running containers. Doing the same in the untrusted directory should present us with some 400 error about not providing a client ssl certificate. I’ve shared a project with all this code plus a vagrant file on github for your own prusual. Enjoy!

October 31, 2013

by James Carr

· 14,313 Views

Writing Git Hooks Using Python

Since git hooks can be any executable script with an appropriate #! line, Python is more than suitable for writing your git hooks. Simply stated, git hooks are scripts which are called at different points of time in the life cycle of working with your git repository. Let’s start by creating a new git repository: ~/work> git init git-hooks-exp Initialized empty Git repository in /home/gene/work/git-hooks-exp/.git/ ~/work> cd git-hooks-exp/ ~/work/git-hooks-exp (master)> tree -al .git/ .git/ ├── branches ├── config ├── description ├── HEAD ├── hooks │ ├── applypatch-msg.sample │ ├── commit-msg.sample │ ├── post-update.sample │ ├── pre-applypatch.sample │ ├── pre-commit.sample │ ├── prepare-commit-msg.sample │ ├── pre-rebase.sample │ └── update.sample ├── info │ └── exclude ├── objects │ ├── info │ └── pack └── refs ├── heads └── tags 9 directories, 12 files Inside the .git are a number of directories and files, one of them being hooks/ which is where the hooks live. By default, you will have a number of hooks with the file names ending in .sample. They may be useful as starting points for your own scripts. However, since they all have an extension .sample, none of the hooks are actually activated. For a hook to be activated, it must have the right file name and it should be executable. Let’s see how we can write a hook using Python. We will write a post-commit hook. This hook is called immediately after you have made a commit. We are going to do something fairly useless, but quite interesting in this hook. We will take the commit SHA1 of this commit, and print how it may look like in a more human form. I do the latter using the humanhash module. You will need to have it installed. Here is how the hook looks like: #!/usr/bin/python import subprocess import humanhash # get the last commit SHA and print it after humanizing it # https://github.com/zacharyvoase/humanhash print humanhash.humanize( subprocess.check_output( ['git','rev-parse','HEAD'])) I use the subprocess.check_output() function to execute the command git rev-parse HEAD so that I can get the commit SHA1 and then call the humanhash.humanize() function with it. Save the hook as a file, post-commit in your hooks/ directory and make it executable using chmod +x .git/hooks/post-commit. Let’s see the hook in action: ~/work/git-hooks-exp (master)> touch file ~/work/git-hooks-exp (master)> git add file ~/work/git-hooks-exp (master)> git commit -m "Added a file" carbon-network-connecticut-equal [master (root-commit) 2d7880b] Added a file 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 file The commit SHA1 for the commit turned out to be 2d7880be746a1c1e75844fc1aa161e2b8d955427. Let’s check it with the humanize function and check if we get the same message as above: >>> humanhash.humanize('2d7880be746a1c1e75844fc1aa161e2b8d955427') 'carbon-network-connecticut-equal' And you can see the same message above as well. For some of the hooks, you will see that they are called with some parameters. In Python you can access them using the sys.argv attribute from the sys module, with the first member being the name of the hook of course and the others will be the parameters that the hook is called with.

October 31, 2013

by Amit Saha

· 13,621 Views

Examples of the Windows Azure Storage Services REST API

The examples in this post were updated in September to work with the current version of the Windows Azure Storage REST API. In the Windows Azure MSDN Azure Forum there are occasional questions about the Windows Azure Storage Services REST API. I have occasionally responded to these with some code examples showing how to use the API. I thought it would be useful to provide some examples of using the REST API for tables, blobs and queues – if only so I don’t have to dredge up examples when people ask how to use it. This post is not intended to provide a complete description of the REST API. The REST API is comprehensively documented (other than the lack of working examples). Since the REST API is the definitive way to address Windows Azure Storage Services I think people using the higher level Storage Client API should have a passing understanding of the REST API to the level of being able to understand the documentation. Understanding the REST API can provide a deeper understanding of why the Storage Client API behaves the way it does. Fiddler The Fiddler Web Debugging Proxy is an essential tool when developing using the REST (or Storage Client) API since it captures precisely what is sent over the wire to the Windows Azure Storage Services. Authorization Nearly every request to the Windows Azure Storage Services must be authenticated. The exception is access to blobs with public read access. The supported authentication schemes for blobs, queues and tables and these are described here. The requests must be accompanied by an Authorization header constructed by making a hash-based message authentication code using the SHA-256 hash. The following is an example of performing the SHA-256 hash for the Authorization header: public static String CreateAuthorizationHeader(String canonicalizedString) { String signature = String.Empty; using (HMACSHA256 hmacSha256 = new HMACSHA256( Convert.FromBase64String(storageAccountKey) )) { Byte[] dataToHmac = System.Text.Encoding.UTF8.GetBytes(canonicalizedString); signature = Convert.ToBase64String(hmacSha256.ComputeHash(dataToHmac)); } String authorizationHeader = String.Format( CultureInfo.InvariantCulture, "{0} {1}:{2}", AzureStorageConstants.SharedKeyAuthorizationScheme, AzureStorageConstants.Account, signature ); return authorizationHeader; } This method is used in all the examples in this post. AzureStorageConstants is a helper class containing various constants. Key is a secret key for Windows Azure Storage Services account specified by Account. In the examples given here, SharedKeyAuthorizationScheme is SharedKey. The trickiest part in using the REST API successfully is getting the correct string to sign. Fortunately, in the event of an authentication failure the Blob Service and Queue Service responds with the authorization string they used and this can be compared with the authorization string used in generating the Authorization header. This has greatly simplified the us of the REST API. Table Service API The Table Service API supports the following table-level operations: Create Table Delete Table Query Tables The Table Service API supports the following entity-level operations: Delete Entity Insert Entity Merge Entity Update Entity Query Entities These operations are implemented using the appropriate HTTP VERB: DELETE – delete GET – query MERGE – merge POST – insert PUT – update This section provides examples of the Insert Entity and Query Entities operations. Insert Entity The InsertEntity() method listed in this section inserts an entity with two String properties, Artist and Title, into a table. The entity is submitted as an ATOM entry in the body of a request POSTed to the Table Service. In this example, the ATOM entry is generated by the GetRequestContentInsertXml() method. The date must be in RFC 1123 format in the x-ms-date header supplied to the canonicalized resource used to create the Authorization string. Note that the storage service version is set to “2012-02-12″ which requires the DataServiceVersion and MaxDataServiceVersion to be set appropriately. public void InsertEntity(String tableName, String artist, String title) { String requestMethod = "POST"; String urlPath = tableName; String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String contentMD5 = String.Empty; String contentType = "application/atom+xml"; String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n{1}\n{2}\n{3}\n{4}", requestMethod, contentMD5, contentType, dateInRfc1123Format, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); UTF8Encoding utf8Encoding = new UTF8Encoding(); Byte[] content = utf8Encoding.GetBytes(GetRequestContentInsertXml(artist, title)); Uri uri = new Uri(AzureStorageConstants.TableEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Accept = "application/atom+xml,application/xml"; request.ContentLength = content.Length; request.ContentType = contentType; request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.Headers.Add("Accept-Charset", "UTF-8"); request.Headers.Add("DataServiceVersion", "2.0;NetFx"); request.Headers.Add("MaxDataServiceVersion", "2.0;NetFx"); using (Stream requestStream = request.GetRequestStream()) { requestStream.Write(content, 0, content.Length); } using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { Stream dataStream = response.GetResponseStream(); using (StreamReader reader = new StreamReader(dataStream)) { String responseFromServer = reader.ReadToEnd(); } } } private String GetRequestContentInsertXml(String artist, String title) { String defaultNameSpace = "http://www.w3.org/2005/Atom"; String dataservicesNameSpace = "http://schemas.microsoft.com/ado/2007/08/dataservices"; String metadataNameSpace = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"; XmlWriterSettings xmlWriterSettings = new XmlWriterSettings(); xmlWriterSettings.OmitXmlDeclaration = false; xmlWriterSettings.Encoding = Encoding.UTF8; StringBuilder entry = new StringBuilder(); using (XmlWriter xmlWriter = XmlWriter.Create(entry)) { xmlWriter.WriteProcessingInstruction("xml", "version=\"1.0\" encoding=\"UTF-8\""); xmlWriter.WriteWhitespace("\n"); xmlWriter.WriteStartElement("entry", defaultNameSpace); xmlWriter.WriteAttributeString("xmlns", "d", null, dataservicesNameSpace); xmlWriter.WriteAttributeString("xmlns", "m", null, metadataNameSpace); xmlWriter.WriteElementString("title", null); xmlWriter.WriteElementString("updated", String.Format("{0:o}", DateTime.UtcNow)); xmlWriter.WriteStartElement("author"); xmlWriter.WriteElementString("name", null); xmlWriter.WriteEndElement(); xmlWriter.WriteElementString("id", null); xmlWriter.WriteStartElement("content"); xmlWriter.WriteAttributeString("type", "application/xml"); xmlWriter.WriteStartElement("properties", metadataNameSpace); xmlWriter.WriteElementString("PartitionKey", dataservicesNameSpace, artist); xmlWriter.WriteElementString("RowKey", dataservicesNameSpace, title); xmlWriter.WriteElementString("Artist", dataservicesNameSpace, artist); xmlWriter.WriteElementString("Title", dataservicesNameSpace, title + "\n" + title); xmlWriter.WriteEndElement(); xmlWriter.WriteEndElement(); xmlWriter.WriteEndElement(); xmlWriter.Close(); } String requestContent = entry.ToString(); return requestContent; } This generates the following request (as captured by Fiddler): POST https://STORAGE_ACCOUNT.table.core.windows.net/authors HTTP/1.1 Accept: application/atom+xml,application/xml Content-Type: application/atom+xml x-ms-date: Sun, 08 Sep 2013 06:31:12 GMT x-ms-version: 2012-02-12 Authorization: SharedKey STORAGE_ACCOUNT:w7Uu4wHZx4fFwa2bsxd/TJVZZ1AqMPwxvW+pYtoWHd0= Accept-Charset: UTF-8 DataServiceVersion: 2.0;NetFx MaxDataServiceVersion: 2.0;NetFx Host: STORAGE_ACCOUNT.table.core.windows.net Content-Length: 514 Expect: 100-continue Connection: Keep-Alive The body of the request is: 2013-09-08T07:19:07Z Beckett Molloy 2013-09-08T07:19:07.2189243Z Beckett Molloy Molloy Note that I should have URLEncoded the PartitionKey and RowKey but did not do so for simplicity. There are, in fact, some issues with the URL encoding of spaces and other symbols. Get Entity The GetEntity() method described in this section retrieves the single entity inserted in the previous section. The particular entity to be retrieved is identified directly in the URL. public void GetEntity(String tableName, String partitionKey, String rowKey) { String requestMethod = "GET"; String urlPath = String.Format("{0}(PartitionKey='{1}',RowKey='{2}')", tableName, partitionKey, rowKey); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n{2}", requestMethod, dateInRfc1123Format, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.TableEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.Headers.Add("Accept-Charset", "UTF-8"); request.Accept = "application/atom+xml,application/xml"; request.Headers.Add("DataServiceVersion", "2.0;NetFx"); request.Headers.Add("MaxDataServiceVersion", "2.0;NetFx"); using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { Stream dataStream = response.GetResponseStream(); using (StreamReader reader = new StreamReader(dataStream)) { String responseFromServer = reader.ReadToEnd(); } } } This generates the following request (as captured by Fiddler): GET https://STORAGE_ACCOUNT.table.core.windows.net/authors(PartitionKey='Beckett',RowKey='Molloy') HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:31:14 GMT x-ms-version: 2012-02-12 Authorization: SharedKey STORAGE_ACCOUNT:1hWbr4aNq4JWCpNJY3rsLH1SkIyeFTJflbqyKMPQ1Gk= Accept-Charset: UTF-8 Accept: application/atom+xml,application/xml DataServiceVersion: 2.0;NetFx MaxDataServiceVersion: 2.0;NetFx Host: STORAGE_ACCOUNT.table.core.windows.net The Table Service generates the following response: HTTP/1.1 200 OK Cache-Control: no-cache Content-Type: application/atom+xml;charset=utf-8 ETag: W/"datetime'2013-09-08T06%3A31%3A14.1579056Z'" Server: Windows-Azure-Table/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: f4bd4c77-6fb6-42a8-8dff-81ea8d28fa2e x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:31:15 GMT Content-Length: 1108 The returned entities, in this case a single entity, are returned in ATOM entry format in the response body: https://STORAGE_ACCOUNT.table.core.windows.net/authors(PartitionKey='Beckett',RowKey='Molloy') 2013-09-08T06:31:15Z Beckett Molloy 2013-09-08T06:31:14.1579056Z Beckett Molloy Molloy Blob Service API The Blob Service API supports the following account-level operation: List Containers The Blob Service API supports the following container-level operation: Create Container Delete Container Get Container ACL Get Container Properties Get Container Metadata List Blobs Set Container ACL Set Container Metadata The Blob Service API supports the following blob-level operation: Copy Blob Delete Blob Get Blob Get Blob Metadata Get Blob Properties Lease Blob Put Blob Set Blob Metadata Set Blob Properties Snapshot Blob The Blob Service API supports the following operations on block blobs: Get Block List Put Block Put Block List The Blob Service API supports the following operations on page blobs: Get Page Regions Put Page This section provides examples of the Put Blob and Lease Blob operations. Put Blob The Blob Service and Queue Service use a different form of shared-key authentication from the Table Service so care should be taken in creating the string to be signed for authorization. The blob type, BlockBlob or PageBlob, must be specified as a request header and consequently appears in the authorization string. public void PutBlob(String containerName, String blobName) { String requestMethod = "PUT"; String urlPath = String.Format("{0}/{1}", containerName, blobName); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String content = "Andrew Carnegie was born in Dunfermline"; UTF8Encoding utf8Encoding = new UTF8Encoding(); Byte[] blobContent = utf8Encoding.GetBytes(content); Int32 blobLength = blobContent.Length; const String blobType = "BlockBlob"; String canonicalizedHeaders = String.Format( "x-ms-blob-type:{0}\nx-ms-date:{1}\nx-ms-version:{2}", blobType, dateInRfc1123Format, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n\n\n\n\n\n\n\n\n{2}\n{3}", requestMethod, blobLength, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.BlobEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-blob-type", blobType); request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.ContentLength = blobLength; using (Stream requestStream = request.GetRequestStream()) { requestStream.Write(blobContent, 0, blobLength); } using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { String ETag = response.Headers["ETag"]; } } This generates the following request: PUT https://STORAGE_ACCOUNT.blob.core.windows.net/fife/dunfermline HTTP/1.1 x-ms-blob-type: BlockBlob x-ms-date: Sun, 08 Sep 2013 06:28:29 GMT x-ms-version: 2012-02-12 Authorization: SharedKey STORAGE_ACCOUNT:ntvh/lamVmikvwHhy6vRVBIh87kibkPlEOiHyLDia6g= Host: STORAGE_ACCOUNT.blob.core.windows.net Content-Length: 39 Expect: 100-continue Connection: Keep-Alive The body of the request is: Andrew Carnegie was born in Dunfermline The Blob Service generates the following response: HTTP/1.1 201 Created Transfer-Encoding: chunked Content-MD5: RYJnWGXLyt94l5jG82LjBw== Last-Modified: Sun, 08 Sep 2013 06:28:31 GMT ETag: "0x8D07A73C5704A86" Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: b74ef0a2-294d-4581-b8f1-6cda724bbdbf x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:28:30 GMT Lease Blob The Blob Service allows a user to lease a blob for a minute at a time and so acquire a write lock on it. The use case for this is the locking of a page blob used to store the VHD backing an writeable Azure Drive. The LeaseBlob() example in this section demonstrates a subtle issue with the creation of authorization strings. The URL has a query string, comp=lease. Rather than using this directly in creating the authorization string it must be converted into comp:lease with a colon replacing the equal symbol – see modifiedURL in the example. Furthermore, the Lease Blob operation requires the use of an x-ms-lease-action to indicate whether the lease is being acquired, renewed, released or broken. public void LeaseBlob(String containerName, String blobName) { String requestMethod = "PUT"; String urlPath = String.Format("{0}/{1}?comp=lease", containerName, blobName); String modifiedUrlPath = String.Format("{0}/{1}\ncomp:lease", containerName, blobName); const Int32 contentLength = 0; String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String leaseAction = "acquire"; String leaseDuration = "60"; String canonicalizedHeaders = String.Format( "x-ms-date:{0}\nx-ms-lease-action:{1}\nx-ms-lease-duration:{2}\nx-ms-version:{3}", dateInRfc1123Format, leaseAction, leaseDuration, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, modifiedUrlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n\n\n\n\n\n\n\n\n{2}\n{3}", requestMethod, contentLength, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.BlobEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-lease-action", leaseAction); request.Headers.Add("x-ms-lease-duration", leaseDuration); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.ContentLength = contentLength; using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { String leaseId = response.Headers["x-ms-lease-id"]; } } This generates the following request: PUT https://STORAGE_ACCOUNT.blob.core.windows.net/fife/dunfermline?comp=lease HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:28:31 GMT x-ms-lease-action: acquire x-ms-lease-duration: 60 x-ms-version: 2012-02-12 Authorization: SharedKey rebus:+SQ5+RFZg3hUaws5XCRHxsDgXb1ycdRIz5EKyHJWP7s= Host: rebus.blob.core.windows.net Content-Length: 0 The Blob Service generates the following response: HTTP/1.1 201 Created Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: 4b6ff77f-f885-4f74-803a-c92920d225c3 x-ms-version: 2012-02-12 x-ms-lease-id: b1320c2c-65ad-41d6-a7bd-85a4242c0ac5 Date: Sun, 08 Sep 2013 06:28:31 GMT Content-Length: 0 Queue Service API The Queue Service API supports the following queue-level operation: List Queues The Queue Service API supports the following queue-level operation: Create Queue Delete Queue Get Queue Metadata Set Queue Metadata The Queue Service API supports the following message-level operations: Clear Messages Delete Message Get Messages Peek Messages Put Message This section provides examples of the Put Message and Get Message operations. Put Message The most obvious curiosity about Put Message is that it uses the HTTP verb POST rather than PUT. The issue is presumably the interaction of the English language and the HTTP standard which states that PUT should be idempotent and that the Put Message operation is clearly not since each invocation merely adds another message to the queue. Regardless, it did catch me out when I failed to read the documentation well enough – so take that as a warning. The content of a message posted to the queue must be formatted in a specified XML schema and must then be UTF8 encoded. public void PutMessage(String queueName, String message) { String requestMethod = "POST"; String urlPath = String.Format("{0}/messages", queueName); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String messageText = String.Format( "{0}", message); UTF8Encoding utf8Encoding = new UTF8Encoding(); Byte[] messageContent = utf8Encoding.GetBytes(messageText); Int32 messageLength = messageContent.Length; String canonicalizedHeaders = String.Format( "x-ms-date:{0}\nx-ms-version:{1}", dateInRfc1123Format, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n{1}\n\n\n\n\n\n\n\n\n{2}\n{3}", requestMethod, messageLength, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.QueueEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.ContentLength = messageLength; using (Stream requestStream = request.GetRequestStream()) { requestStream.Write(messageContent, 0, messageLength); } using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { String requestId = response.Headers["x-ms-request-id"]; } } This generates the following request: POST https://rebus.queue.core.windows.net/revolution/messages HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:34:08 GMT x-ms-version: 2012-02-12 Authorization: SharedKey rebus:nyASTVWifnxHKnj2wXwuzzzXz5CxUBZj58SToV5QFK8= Host: rebus.queue.core.windows.net Content-Length: 76 Expect: 100-continue Connection: Keep-Alive The body of the request is: Saturday in the cafe The Queue Service generates the following response: HTTP/1.1 201 Created Server: Windows-Azure-Queue/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: 14c6e73b-15d9-480c-b251-c4c01b48e529 x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:34:09 GMT Content-Length: 0 Get Messages The Get Messages operation described in this section retrieves a single message with the default message visibility timeout of 30 seconds. public void GetMessage(String queueName) { string requestMethod = "GET"; String urlPath = String.Format("{0}/messages", queueName); String storageServiceVersion = "2012-02-12"; String dateInRfc1123Format = DateTime.UtcNow.ToString("R", CultureInfo.InvariantCulture); String canonicalizedHeaders = String.Format( "x-ms-date:{0}\nx-ms-version:{1}", dateInRfc1123Format, storageServiceVersion); String canonicalizedResource = String.Format("/{0}/{1}", AzureStorageConstants.Account, urlPath); String stringToSign = String.Format( "{0}\n\n\n\n\n\n\n\n\n\n\n\n{1}\n{2}", requestMethod, canonicalizedHeaders, canonicalizedResource); String authorizationHeader = Utility.CreateAuthorizationHeader(stringToSign); Uri uri = new Uri(AzureStorageConstants.QueueEndPoint + urlPath); HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri); request.Method = requestMethod; request.Headers.Add("x-ms-date", dateInRfc1123Format); request.Headers.Add("x-ms-version", storageServiceVersion); request.Headers.Add("Authorization", authorizationHeader); request.Accept = "application/atom+xml,application/xml"; using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) { Stream dataStream = response.GetResponseStream(); using (StreamReader reader = new StreamReader(dataStream)) { String responseFromServer = reader.ReadToEnd(); } } } This generates the following request: GET https://rebus.queue.core.windows.net/revolution/messages HTTP/1.1 x-ms-date: Sun, 08 Sep 2013 06:34:11 GMT x-ms-version: 2012-02-12 Authorization: SharedKey rebus:K67XooYhokw0i0AlCzYQ4GeLLrJih1r1vSqiO9DBo0c= Accept: application/atom+xml,application/xml Host: rebus.queue.core.windows.net The Queue Service generates the following response: HTTP/1.1 200 OK Content-Type: application/xml Server: Windows-Azure-Queue/1.0 Microsoft-HTTPAPI/2.0 x-ms-request-id: efb21a86-7d66-47fd-b13d-7aa74fce0568 x-ms-version: 2012-02-12 Date: Sun, 08 Sep 2013 06:34:12 GMT Content-Length: 484 The message is returned in the response body as follows: 05fd902f-6031-4ef4-8298-ef3844ec3bc6Sun, 08 Sep 2013 06:34:11 GMTSun, 15 Sep 2013 06:34:11 GMT1AgAAAAMAAAAAAAAAAL+zgF2szgE=Sun, 08 Sep 2013 06:34:43 GMTSaturday in the cafe I noticed that some newline specifiers in strings (\n) were lost when the blog was auto-ported from Windows Live Spaces to WordPress. I have put them back in but it is possible I missed some. Consequently, in the event of a problem you should check the newlines in canonicalizedHeaders and stringToSign.

October 24, 2013

by Neil Mackenzie

· 38,843 Views

Introduction to Android Studio

Feeling good to be back at the blog . Actually, I have been managing GDG Ahmedabad, delivering android talks, and managing workshops locally and outside my region. Last month, I was quite busy in organizing the “DevFest” event for GDG Ahmedabad, and then for the preparation of my two talks for the GDG Kathmandu DevFest. I was invited to deliver two talks at DevFest, which was organized by GDG Kathmandu. I have already published slides on my Speakerdeck. I am not sure whether you have already checked and learned from my speaker deck, but still give me a chance to write about Introduction to Android studio here. What is Android Studio? It’s an Android focused IDE, designed specially for Android development. It was launched on 16th May 2013, during Google's I/O 2013 event. Android studio contains all the Android SDK tools to design, test, debug and profile your app. By looking at the development tools and environment, we can see its similar to Eclipse with the ADT plug-in, but as I have mentioned above, it's an Android focused IDE, and there are many cool features available in Android Studio that can foster and increase your development productivity. One great thing is that it depends on the IntelliJ Idea IDE, which has proved itself to be a great IDE and has been in use by many Android engineers. What is the Difference Between IntelliJ Idea and Android Studio? Nothing, in regards to Android. If you use IntelliJ… Keep using it IntelliJ 13 will have the same stuff EAP of IntelliJ Idea 13 includes all the new stuff If Not… Give Android Studio a try You may have some questions in mind regarding IntelliJ and Android Studio. If so, check the FAQ section: IntelliJ IDEA and Android Studio FAQ. Let’s Download Android Studio You can download Android Studio from the android developer site: http://developer.android.com/sdk/installing/studio.html. Cool Features of Android Studio As I have mentioned, it's similar to Eclipse with the ADT plug-in, but Android Studio has many cool features that can help you to increase development productivity. Here are the cool features: Powerful code editing (smart editing, code re-factoring) Rich layout Editor (As soon as you drag and drop views on the layout, it shows you a preview in all the screens including Nexus 4, Nexus 7, Nexus 10 and many other resolutions. Layout designing can be done much faster way as compared to eclipse.) Gradle-based build support Maven Support Template-based wizards Lint tool analysis (The Android lint tool is a static code analysis tool that checks your Android project source files for potential bugs and optimization improvements for correctness, security, performance, usability, accessibility, and internationalization). You can experience all the cool features by using Android Studio yourself Awesome Stuff Inside Darcula Theme It's actually a black-based theme. While using Android Studio, I enjoy working in Darcula theme environment. By the way, Its Darcula theme, not Dracula. I am correcting this just because I have seen many people on Stackoverflow and Google+ saying Dracula. You can set the Darcula theme in Android Studio by: File > Settings > IDE Settings > Appearance > Theme: Darcula. Preview All the Screens We can consider this is as part of the Rich layout editor feature. With this privilege, users can design layouts and can check layouts by previewing in all the possible screens, such as Nexus 4, Nexus 7, Nexus and many other devices. It helps the user to improve layout designs while providing compatibility to various resolutions available. Device Framed Screen Capture It provides ability to directly generate a screenshot of your application. Yes, it was already included in the SDK, but Android Studio provides something more: Device frame (As frames for many Nexus devices are available, you can capture screenshot in whichever frame you like most) Drop shadow Screen glare Color Preview I like this feature very much and I have found this feature helpful while working on big projects. While using Eclipse, we have to have 3rd party color chooser and picker but this feature gives privilege to select color from in-build color chooser and can also have preview in Colors.xml file. Color Preview – Activity class While using Eclipse, it’s difficult to check which color we have used. Yes, we can imagine the color by its name, but an actual preview is much better. This feature was recently introduced in Android Studio, so you must have latest version installed. Hard Coded Strings Here is another feature I like and have found useful: Whenever you use any string resources from Strings.xml, it displays actual value instead of variable name. This setting comes by default, but in case you aren’t able to get hard coded strings in your activity class, then try any of the below ways. Settings > Editor > Code Folding > Android String References OR Select String and right click on it and then go to Folding > Collapse OR CTRL + Numpad ‘-’ Create Layout Variation This provides the ability to create layout variation directly. For example: layout for the large screen, layout for Xlarge screen, etc. The great thing is that the created variant layout gets stored in particular folders like layout-xlarge, layout-large-land, etc. Should I Use Android Studio? You might have explored all the cool features, or you are ready to explore right now. But questions might have arisen in your mind: “Should I use Android Studio,” or “should we start using Android Studio right now,” or “should I continue with IntelliJ or Eclipse?” My answer is a big NO to use Android Studio as your main IDE for Android development, because currently its EARLY ACCESS PREVIEW and it's maturing over days. Engineers have been working hard to improve this IDE. So, you should wait until the BETA comes out. I agree with Carlos Vega (commented over G+) on this point: “You should at least migrate to Intellij Idea 12 so that you get familiar with the IDE’s workflow and keyboard shortcuts. That way when Android Studio reach a more stable level, you can switch without a major learning curve.” Thanks, Carlos Vega, for the input. By the way, here is the presentation I delivered at the GDG Kathmandu DevFest.

October 7, 2013

by Paresh Mayani

· 26,755 Views

Sparse and Memory-mapped Files

One of the problems with memory-mapped files is that you can’t actually map beyond the end of the file. So you can’t use that to extend your file. I had a thought about and set out to check out what happens when I create a sparse file, a file that only take space when you write to it, and at the same time, map it. As it turns out, this actually works pretty well in practice. You can do so without any issues. Here is how it works: using (var f = File.Create(path)) { int bytesReturned = 0; var nativeOverlapped = new NativeOverlapped(); if (!NativeMethod.DeviceIoControl(f.SafeFileHandle, EIoControlCode.FsctlSetSparse, IntPtr.Zero, 0, IntPtr.Zero, 0, ref bytesReturned, ref nativeOverlapped)) { throw new Win32Exception(); } f.SetLength(1024*1024*1024*64L); } This creates a sparse file that is 64 GB in size. Then we can map it normally: using (var mmf = MemoryMappedFile.CreateFromFile(path)) using (var memoryMappedViewAccessor = mmf.CreateViewAccessor(0, 1024*1024*1024*64L)) { for (long i = 0; i < memoryMappedViewAccessor.Capacity; i += buffer.Length) { memoryMappedViewAccessor.WriteArray(i, buffer, 0, buffer.Length); } } And then we can do stuff to it. And that includes writing to yet-unallocated parts of the file. This also means that you don’t have to worry about writing past the end of the file, the OS will take care of all of that for you. Happy happy, joy joy, etc. There is one problem with this method, however. It means that you have a 64 GB file, but you don’t have that much allocated. What that means in turn is that you might not have that much space available for the file. Which brings up an interesting question, what happens when you are trying to commit a new page, and the disk is out of space? Using file I/O you would get an I/O error with the right code. But when using memory mapped files, the error would actually turn up during access, which can happen pretty much anywhere. It also means that it is a Standard Exception Handling error in Windows, which requires special treatment. To test this out, I wrote the following so it would write to a disk that had only about 50 GB free. I wanted to know what would happen when it ran out of space. That is actually something that happens, and we need to be able to address this issue robustly. The kicker is that this might actually happen at any time, so that would really result is some… interesting behavior with regards to robustness. In other words, I don’t think that this is a viable option, it is a really cool trick, but I don’t think it is a very well thought out option. By the way, the result of my experiment was that we had an effectively a frozen process. No errors, nothing, just a hung. Also, I am pretty sure that WriteArray() is really slow, but I’ll check this out at another pointer in time.

October 1, 2013

by Oren Eini

· 8,161 Views

Connecting to SQL Azure with SQL Management Studio

Intro If you want to manage your SQL Databases in Azure using tools that you’re a little more familiar and comfortable with – for example – SQL Management Studio, how do you go about connecting? You could read the help article from Microsoft, or you can follow my intuitive screen-based instructions, below: Assumptions 1. I’m assuming you have a version of SQL Management Studio already installed. I believe you’ll need at least SQL Server 2008 R2’s version or newer 2. I’m further assuming you’ve already created a SQL Database in Azure Steps to Connect SSMS to SQL Azure 1. Authenticate to the Azure Portal 2. Click on SQL Databases 3. Click on Servers 4. Click on the name of the Server you wish to connect to… 5. Click on Configure… If not already in place, click on ‘Add to the allowed IP addresses’ to add your current IP address (or specify an address you wish to connect from) and click ‘Save’ 6. Open SQL Management Studio and connect to Database services (usually comes up by default) Enter the fully qualified server name (.database.windows.net) Change to SQL Server Authentication Enter the login preferred (if a new database, the username you specified when yuo created the DB server) Enter the correct password 7. Hit the Connect button Troubleshooting Ensure you have the appropriate ports open outbound from your local network or connection (typically port 1433) Ensure you have allowed the correct public IP address you’re trying to connect from via the Azure Portal (steps 1-5 above) Ensure you are using the correct server name and user name For SSMS, this is the server name (in step 4) followed by .database.windows.net Ensure you are using SQL Server Authentication For SSMS the username format is If you forgot the password of your username, you can reset the password in the Azure Portal, in step 4, click on Dashboard: Lastly… You can click on the Database (in step 2) to see your connection options:

September 25, 2013

by Rob Sanders

· 262,940 Views

Using Grep from Inside Vim

"this is my rifle. there are many like it, but this one is mine." - rifleman’s creed there are a thousand ways to grep over files. most developers i have observed keep a separate command line open just for searching. a few use an ide that has file search built in. personally, i use a couple of vim macros. in vim, you can execute a cross-file search with something like :vimgrep /dostuff()/j ../**/*.c . i don’t know about you, but the first time i saw that syntax my brain simply refused. instead, i have the following in my .vimrc file: " opens search results in a window w/ links and highlight the matches command! -nargs=+ grep execute 'silent grep! -i -r -n --exclude *.{json,pyc} . -e ' | copen | execute 'silent /' " shift-control-* greps for the word under the cursor :nmap g :grep =expand("") the first command is just a simple alias for the above mentioned native grep. like all custom commands, it must start with a capital letter (to differentiate it from native commands). you simply type :grep foobar and it will search in your current directory through all file extensions (except .json and .pyc -- you can add more to the blacklist). it also displays the results in a nice little buffer window, which you can navigate through with normal hjkl keys, and open matches in the main editor window. the second line is a key mapping that will grep for the word currently under the cursor. you can just navigate to a word and hit leader-g to issue the grep command.

September 24, 2013

by Chase Seibert

· 7,469 Views

Tuning Linux I/O Scheduler for SSDs

This post comes from Michael Rice at the NuoDB Techblog. Tuning Linux I/O Scheduler for SSDs Hello Techblog readers! I'm going to talk about tuning the Linux I/O scheduler to increase throughput and decrease latency on an SSD. I'll also cover another interesting topic about tuning the performance of NuoDB by specifying storage for the archive and journal directories. Tuning the Linux I/O Scheduler Linux gives you the option to select the I/O scheduler. The scheduler can be changed without rebooting, too! You may be asking at this point, "why would I ever want to change the I/O scheduler?" Changing the scheduler makes sense when the overhead of optimizing the I/O (re-ordering I/O requests) is unnecessary and expensive. This setting should be fine-tuned per storage device. The best setting for an SSD will not be a good setting for an HDD. The current I/O scheduler can be viewed by typing the following command: mrice@host:~$ cat /sys/block/sda/queue/scheduler noop anticipatory deadline [cfq] The current I/O scheduler (in brackets) for /dev/sda on this machine is CFQ, Completely Fair Queuing. This setting became the default in kernel 2.6.18 and it works well for HDDs. However, SSDs don't have rotating platters or magnetic heads. The I/O optimization algorithms in CFQ don't apply to SSDs. For an SSD, the NOOP I/O scheduler can reduce I/O latency and increase throughput as well as eliminate the CPU time spent re-ordering I/O requests. This scheduler typically works well on SANs, SSDs, Virtual Machines, and even fancy Fusion I/O cards. At this point you're probably thinking "OK, I'm sold! How do I change the scheduler already?" You can use the echo command as shown below: mrice@host:~$ echo noop | sudo tee /sys/block/sda/queue/scheduler To see the change, just cat the scheduler again. mrice@host:~$ cat /sys/block/sda/queue/scheduler [noop] anticipatory deadline cfq Notice that noop is now selected in brackets. This change is only temporary and will reset back to the default scheduler, CFQ in this case, when the machine reboots. You need to edit the Grub configuration in order to keep the setting permanently. However, this will change the I/O scheduler for all block devices. The problem is that NOOP is not a good choice for HDD. I'd only permanently change the setting if the machine only has SSDs. On Grub 2: Edit: /etc/default/grub Add "elevator=noop" to the GRUB_CMDLINE_LINUX_DEFAULT line. sudo update-grub At this point, you've changed the I/O scheduler to NOOP. How do you know if it made a difference? You could run a benchmark against it and compare the numbers (just remember to flush the file-system cache). The other way is to take a look at the output from iostat. I/O requests spend less time in the queue with the NOOP I/O scheduler. This can be seen with in the "await" field from iostat. Here's an example of a larger write operation with NOOP. iostat -x 1 /dev/sda Device: sda rrqm/s 0.00 wrqm/s 143819.00 r/s 6.00 w/s 2881.00 rkB/s 24.00 wkB/s 586800.00 avgrq-sz 406.53 avgqu-sz 0.94 await 0.33 r_await 3.33 w_await 0.32 svctm 0.11 %util 31.00 Tuning NuoDB Performance Now that you've learned about the NOOP I/O scheduler, I'll talk about tuning NuoDB with an SSD. If you've read the tech blogs you'll know that there are two building blocks for a NuoDB database: the Transaction Engine, TE for short, and the Storage Manager, SM. The TE is an in-memory only copy of the database (actually a portion of the database). As a result, an SSD won't help the performance of a TE because, it doesn't store atoms to disk. The SM contains two modules that write to disk: the archive and the journal. The archive stores atoms to disk when archive configuration parameter points to a file system (versus HDFS and S3). The journal, on the other hand, synchronously writes messages to disk. If you read the blog post on durability, you may remember that the "Remote Commit with Journaling" setting provides the highest level of durability but at the cost of slower speed. Using an SSD in this situation can drastically improve performance. To tune this setting, we'll need to make a nuodb directory on the SSD: mkdir -p /ssd/nuodb The SSD in this example has the mount point /ssd on this machine. Easy, right? I'm assuming you've already set the Linux I/O scheduler for the SSD to NOOP. The next step is to configure NuoDB to use this path when creating the journal directory. The journal has a direct correlation on the transaction throughput because the journal has to finish the disk write before an ACK for the transaction commit is sent by the SM back to the TE. What about the archive? The archive is decoupled from the transaction commit, the atoms will remain in memory and gradually make their way to disk. This will have very little effect on the TPS of the database. As a result, the archive directory can be placed on just a regular HDD. The quick guide for tuning performance is to put the journal on an SSD and archive can be placed on an HDD. Here are the commands: nuodbmgr --broker localhost --password bird nuodb [domain] > start process sm Database: hockey Host: s1 Process command-line options: --journal enable --journal-dir /ssd/nuodb/demo-journal Archive directory: /var/opt/nuodb/demo-archives Initialize archive: true Started: [SM] s1/127.0.0.1:37880 [ pid = 25036 ] ACTIVE nuodb [domain/hockey] > start process te Host: t1 Process command-line options: --dba-user dba --dba-password goalie Started: [TE] t1/127.0.0.1:48315 [ pid = 25052 ] ACTIVE The important point with this tuned configuration is that only the journal is on an SSD. As a result, the SSD doesn't have to be one of these massive TB drives. The costs of SSDs have significantly dropped in price and a single 128 GB or 256 GB SSD would be adequate for the journal data. This configuration should rival the local commit performance which is wicked awesome considering it is the highest durability level! I encourage you to try it out and ask some questions.

September 12, 2013

by Seth Proctor

· 22,533 Views

Introducing the Spring YARN framework for Developing Apache Hadoop YARN Applications

Originally posted on the SpringSource blog by Janne Valkealahti We're super excited to let the cat out of the bag and release support for writing YARN based applications as part of the Spring for Apache Hadoop 2.0 M1 release. In this blog post I’ll introduce you to YARN, what you can do with it, and how Spring simplifies the development of YARN based applications. If you have been following the Hadoop community over the past year or two, you’ve probably seen a lot of discussions around YARN and the next version of Hadoop's MapReduce called MapReduce v2. YARN (Yet Another Resource Negotiator) is a component of the MapReduce project created to overcome some performance issues in Hadoop's original design. The fundamental idea of MapReduce v2 is to split the functionalities of the JobTracker, Resource Management and Job Scheduling/Monitoring, into separate daemons. The idea is to have a global Resource Manager (RM) and a per-application Application Master (AM). A generic diagram for YARN component dependencies can be found from YARN architecture. MapReduce Version 2 is an application running on top of YARN. It is also possible to make similar custom YARN based application which have nothing to do with MapReduce, it is simply running YARN application. However, writing a custom YARN based application is difficult. The YARN APIs are low-level infrastructure APIs and not developer APIs. Take a look at thedocumentation for writing a YARN application to get an idea of what is involved. Starting with the 2.0 version, Spring for Apache Hadoop introduces the Spring YARN sub-project to provide support for building Spring based YARN applications. This support for YARN steps in by trying to make development easier. “Spring handles the infrastructure so you can focus on your application” applies to writing Hadoop applications as well as other types of Java applications. Spring’s YARN support also makes it easier to test your YARN application. With Spring’s YARN support, you're going to use all familiar concepts of Spring Framework itself, including configuration and generally speaking what you can do in your application. At a high level, Spring YARN provides three different components, YarnClient, YarnAppmaster andYarnContainer which together can be called a Spring YARN Application. We provide default implementations for all components while still giving the end user an option to customize as much as he or she wants. Lets take a quick look at a very simplistic Spring YARN application which runs some custom code in a Hadoop cluster. The YarnClient is used to communicate with YARN's Resource Manager. This provides management actions like submitting new application instances, listing applications and killing running applications. When submitting applications from the YarnClient, the main concerns relate to how the Application Master is configured and launched. Both the YarnAppmaster andYarnContainer share the same common launch context config logic so you'll see a lot of similarities in YarnClient and YarnAppmaster configuration. Similar to how the YarnClient will define the launch context for the YarnAppmaster, the YarnAppmaster defines the launch context for the YarnContainer. The Launch context defines the commands to start the container, localized files, command line parameters, environment variables and resource limits(memory, cpu). The YarnContainer is a worker that does the heavy lifting of what a YARN application will actually do. The YarnAppmaster is communicating with YARN Resource Manager and starts and stops YarnContainers accordingly. You can create a Spring application that launches an ApplicationMaster by using the YARN XML namespace to define a Spring Application Context. Context configuration for YarnClient defines the launch context for YarnAppmaster. This includes resources and libraries needed by YarnAppmaster and its environment settings. An example of this is shown below. Note: Future releases will provide a Java based API for configuration, similar to what is done in Spring Security 3.2. The purpose of YarnAppmaster is to control the instance of a running application. TheYarnAppmaster is responsible for controlling the lifecycle of all its YarnContainers, the whole running application after the application is submitted, as well as itself. The example above is defining a context configuration for the YarnAppmaster. Similar to what we saw in YarnClient configuration, we define local resources for the YarnContainer and its environment. The classpath setting picks up hadoop jars as well as your own application jars in default locations, change the setting if you want to use non-default directories. Also within theYarnAppmaster we define components handling the container allocation and bootstrapping. Allocator component is interacting with YARN resource manager handling the resource scheduling. Runner component is responsible for bootstrapping of allocated containers. Above example defines a simple YarnContainer context configuration. To implement the functionality of the container, you implement the interface YarnContainer. The YarnContainer interface is similar to Java’s Runnable interface, its has a run() method, as well as two additional methods related to getting environment and command line information. Below is a simple hello world application that will be run inside of a YARN container: public class MyCustomYarnContainer implements YarnContainer { private static final Log log = LogFactory.getLog(MyCustomYarnContainer.class); @Override public void run() { log.info("Hello from MyCustomYarnContainer"); } @Override public void setEnvironment(Map environment) {} @Override public void setParameters(Properties parameters) {} } We just showed the configuration of a Spring YARN Application and the core application logic so what remains is how to bootstrap the application to run inside the Hadoop cluster. The utility class, CommandLineClientRunner provides this functionality. You can you use CommandLineClientRunner either manually from a command line or use it from your own code. # java -cp org.springframework.yarn.client.CommandLineClientRunner application-context.xml yarnClient -submit A Spring YARN Application is packaged into a jar file which then can be transferred into HDFS with the rest of the dependencies. A YarnClient can transfer all needed libraries into HDFS during the application submit process but generally speaking it is more advisable to do this manually in order to avoid unnecessary network I/O. Your application wont change until new version is created so it can be copied into HDFS prior the first application submit. You can i.e. use Hadoop's hdfs dfs -copyFromLocal command. Below you can see an example of a typical project setup. src/main/java/org/example/MyCustomYarnContainer.java src/main/resources/application-context.xml src/main/resources/appmaster-context.xml src/main/resources/container-context.xml As a wild guess, we'll make a bet that you have now figured out that you are not actually configuring YARN, instead you are configuring Spring Application Contexts for all three components, YarnClient, YarnAppmaster and YarnContainer. We have just scratched the surface of what we can do with Spring YARN. While we’re preparing more blog posts, go ahead and check existing samples in GitHub. Basically, to reflect the concepts we described in this blog post, see the multi-context example in our samples repository. Future blog posts will cover topics like Unit Testing and more advanced YARN application development.

September 11, 2013

by Pieter Humphrey

· 13,727 Views

Assigning UUIDs to Neo4j Nodes and Relationships

TL;DR: This blog post features a small demo project on github: neo4j-uuid and explains how to automatically assign UUIDs to nodes and relationships in Neo4j. A very brief introduction into Neo4j 1.9′s KernelExtensionFactory is included as well. A Little Rant on Neo4j Node/Relationship IDs In a lot of use cases there is demand for storing a reference to a Neo4j node or relationship in a third party system. The first naive idea probably is to use the internal node/relationship id that Neo4j provides. Do not do that! Ever! You ask why? Well, Neo4j’s id is basically a offset in one of the store files Neo4j uses (with some math involved). Assume you delete couple of nodes. This produces holes in the store files that Neo4j might reclaim when creating new nodes later on. And since the id is a file offset there is a chance that the new node will have exactly the same id like the previously deleted node. If you don’t synchronously update all node id references stored elsewhere, you’re in trouble. If neo4j would be completely redeveloped from scratch the getId() method would not be part of the public API. As long as you use node ids only inside a request of an application for example, there’s nothing wrong. To repeat myself: Never ever store a node id in a third party system. I have officially warned you. UUIDs Enough of ranting, let’s see what we can do to safely store node references in an external system. Basically we need an identifier that has no semantics in contrast to the node id. A common approach to this is using Universally Unique Identifiers (UUID). Java JDK offers a UUID implementation, so we could potentially use UUID.randomUUID(). Unfortunately random UUIDs are slow to generate. A preferred approach is to use the machine’s MAC and a timestamp as base for the UUID – this should provide enough uniqueness. There a nice library out there at http://wiki.fasterxml.com/JugHome providing exactly what we need. Automatic UUID Assignments For convenience it would be great if all fresh created nodes and relationships get automatically assigned a uuid property without doing this explicitly. Fortunately Neo4j supports TransactionEventHandlers, a callback interface pluging into transaction handling. A TransactionEventHandler has a chance to modify or veto any transaction. It’s a sharp tool which can have significant negative performance impact if used the wrong way. I’ve implemented a UUIDTransactionEventHandler that performs the following tasks: Populate a UUID property for each new node or relationship Reject a transaction if a manual modification of a UUID is attempted; either assignment or removal public class UUIDTransactionEventHandler implements TransactionEventHandler { public static final String UUID_PROPERTY_NAME = "uuid"; private final TimeBasedGenerator uuidGenerator = Generators.timeBasedGenerator(); @Override public Object beforeCommit(TransactionData data) throws Exception { checkForUuidChanges(data.removedNodeProperties(), "remove"); checkForUuidChanges(data.assignedNodeProperties(), "assign"); checkForUuidChanges(data.removedRelationshipProperties(), "remove"); checkForUuidChanges(data.assignedRelationshipProperties(), "assign"); populateUuidsFor(data.createdNodes()); populateUuidsFor(data.createdRelationships()); return null; } @Override public void afterCommit(TransactionData data, java.lang.Object state) { } @Override public void afterRollback(TransactionData data, java.lang.Object state) { } /** * @param propertyContainers set UUID property for a iterable on nodes or relationships */ private void populateUuidsFor(Iterable propertyContainers) { for (PropertyContainer propertyContainer : propertyContainers) { if (!propertyContainer.hasProperty(UUID_PROPERTY_NAME)) { final UUID uuid = uuidGenerator.generate(); final StringBuilder sb = new StringBuilder(); sb.append(Long.toHexString(uuid.getMostSignificantBits())).append(Long.toHexString(uuid.getLeastSignificantBits())); propertyContainer.setProperty(UUID_PROPERTY_NAME, sb.toString()); } } } private void checkForUuidChanges(Iterable> changeList, String action) { for (PropertyEntry removedProperty : changeList) { if (removedProperty.key().equals(UUID_PROPERTY_NAME)) { throw new IllegalStateException("you are not allowed to " + action + " " + UUID_PROPERTY_NAME + " properties"); } } } } Setting up Using KernelExtensionFactory There are two remaining tasks for full automation of UUID assignments: We need to setup autoindexing for uuid properties to have a convenient way to look up nodes or relationships by UUID We need to register UUIDTransactionEventHandler with the graph database Since version 1.9 Neo4j has the notion of KernelExtensionFactory. Using KernelExtensionFactory you can supply a class that receives lifecycle callbacks when e.g. Neo4j is started or stopped. This is the right place for configuring autoindexing and setting up the TransactionEventHandler. Since JVM’s ServiceLoader is used KernelExtenstionFactories need to be registered in a file META-INF/services/org.neo4j.kernel.extension.KernelExtensionFactory by listing all implementations you want to use: org.neo4j.extension.uuid.UUIDKernelExtensionFactory KernelExtensionFactories can declare dependencies, therefore declare a inner interface (“Dependencies” in code) below that just has getters. Using proxies Neo4j will implement this class and supply you with the required dependencies. The dependencies are match on requested type, see Neo4j’s source code what classes are supported for being dependencies. KernelExtensionFactories must implement a newKernelExtension method that is supposed to return a instance of LifeCycle. For our UUID project we return a instance of UUIDLifeCycle: package org.neo4j.extension.uuid; import org.neo4j.graphdb.GraphDatabaseService; import org.neo4j.graphdb.PropertyContainer; import org.neo4j.graphdb.event.TransactionEventHandler; import org.neo4j.graphdb.factory.GraphDatabaseSettings; import org.neo4j.graphdb.index.AutoIndexer; import org.neo4j.graphdb.index.IndexManager; import org.neo4j.kernel.configuration.Config; import org.neo4j.kernel.lifecycle.LifecycleAdapter; import java.util.Map; /** * handle the setup of auto indexing for UUIDs and registers a {@link UUIDTransactionEventHandler} */ class UUIDLifeCycle extends LifecycleAdapter { private TransactionEventHandler transactionEventHandler; private GraphDatabaseService graphDatabaseService; private IndexManager indexManager; private Config config; UUIDLifeCycle(GraphDatabaseService graphDatabaseService, Config config) { this.graphDatabaseService = graphDatabaseService; this.indexManager = graphDatabaseService.index(); this.config = config; } /** * since {@link org.neo4j.kernel.NodeAutoIndexerImpl#start()} is called *after* {@link org.neo4j.extension.uuid.UUIDLifeCycle#start()} it would apply config settings for auto indexing. To prevent this we change config here. * @throws Throwable */ @Override public void init() throws Throwable { Map params = config.getParams(); params.put(GraphDatabaseSettings.node_auto_indexing.name(), "true"); params.put(GraphDatabaseSettings.relationship_auto_indexing.name(), "true"); config.applyChanges(params); } @Override public void start() throws Throwable { startUUIDIndexing(indexManager.getNodeAutoIndexer()); startUUIDIndexing(indexManager.getRelationshipAutoIndexer()); transactionEventHandler = new UUIDTransactionEventHandler(); graphDatabaseService.registerTransactionEventHandler(transactionEventHandler); } @Override public void stop() throws Throwable { stopUUIDIndexing(indexManager.getNodeAutoIndexer()); stopUUIDIndexing(indexManager.getRelationshipAutoIndexer()); graphDatabaseService.unregisterTransactionEventHandler(transactionEventHandler); } void startUUIDIndexing(AutoIndexer autoIndexer) { autoIndexer.startAutoIndexingProperty(UUIDTransactionEventHandler.UUID_PROPERTY_NAME); } void stopUUIDIndexing(AutoIndexer autoIndexer) { autoIndexer.stopAutoIndexingProperty(UUIDTransactionEventHandler.UUID_PROPERTY_NAME); } } Most of the code is pretty much straight forward, l.44/45 set up autoindexing for uuid property. l48 registers the UUIDTransactionEventHandler with the graph database. Not that obvious is the code in the init() method. Neo4j’s NodeAutoIndexerImpl configures autoindexing itself and switches it on or off depending on the respective config option. However we want to have autoindexing always switched on. Unfortunately NodeAutoIndexerImpl is run after our code and overrides our settings. That’s we l.37-40 tweaks the config settings to force nice behaviour of NodeAutoIndexerImpl. Looking up Nodes or Relationships for UUID For completeness the project also contains a trivial unmanaged extension for looking up nodes and relationships using the REST interface, see UUIDRestInterface. By sending a HTTP GET to http://localhost:7474/db/data/node/ the node’s internal id returned. Build System and Testing For building the project, Gradle is used; build.gradle is trivial. Of course couple of tests are included. As a long standing addict I’ve obviously used Spock for testing. See the test code here. Final Words A downside of this implementation is that each and every node and relationships gets indexed. Indexing always trades write performance for read performance. Keep that in mind. It might make sense to get rid of unconditional auto indexing and put some domain knowledge into the TransactionEventHandler to assign only those nodes uuids and index them that are really used for storing in an external system.

August 22, 2013

by Stefan Armbruster

· 11,712 Views

OpenStack Savanna: Fast Hadoop Cluster Provisioning on OpenStack

introduction openstack is one of the most popular open source cloud computing projects to provide infrastructure as a service solution. its key components are compute (nova), networking (neutron, formerly known as quantum), storage (object and block storage, swift and cinder, respectively), openstack dashboard (horizon), identity service (keystone) and image service (glance). there are other official incubated projects like metering (celiometer) and orchestration and service definition (heat). savanna is a hadoop as a service for openstack introduced by mirantis . it is still in an early phase (version .02 was released in summer 2013) and according to its roadmap version 1.0 is targeted for official openstack incubation. in principle, heat also could be used for hadoop cluster provisioning but savanna is especially tuned for providing hadoop-specific api functionality while heat is meant to be used for generic purposes. savanna architecture savanna is integrated with the core openstack components such as keystone, nova, glance, swift and horizon. it has a rest api that supports the hadoop cluster provisioning steps. savanna api is implemented as a wsgi server that, by default, listens to port 8386. in addition, savanna can also be integrated with horizon, the openstack dashboard to create a hadoop cluster from the management console. savanna also comes with a vanilla plugin that deploys a hadoop cluster image. the standard out-of-the-box vanilla plugin supports hadoop 1.1.2 version. installing savanna the simplest option to try out savanna is to use devstack in a virtual machine. i was using an ubuntu 12.04 virtual instance in my tests. in that environment we need to execute the following commands to install devstack and savanna api: $ sudo apt-get install git-core $ git clone https://github.com/openstack-dev/devstack.git $ vi localrc # edit localrc admin_password=nova mysql_password=nova rabbit_password=nova service_password=$admin_password service_token=nova # enable swift enabled_services+=,swift swift_hash=66a3d6b56c1f479c8b4e70ab5c2000f5 swift_replicas=1 swift_data_dir=$dest/data # force checkout prerequsites # force_prereq=1 # keystone is now configured by default to use pki as the token format which produces huge tokens. # set uuid as keystone token format which is much shorter and easier to work with. keystone_token_format=uuid # change the floating_range to whatever ips vm is working in. # in nat mode it is subnet vmware fusion provides, in bridged mode it is your local network. floating_range=192.168.55.224/27 # enable auto assignment of floating ips. by default savanna expects this setting to be enabled extra_opts=(auto_assign_floating_ip=true) # enable logging screen_logdir=$dest/logs/screen $ ./stack.sh # this will take a while to execute $ sudo apt-get install python-setuptools python-virtualenv python-dev $ virtualenv savanna-venv $ savanna-venv/bin/pip install savanna $ mkdir savanna-venv/etc $ cp savanna-venv/share/savanna/savanna.conf.sample savanna-venv/etc/savanna.conf # to start savanna api: $ savanna-venv/bin/python savanna-venv/bin/savanna-api --config-file savanna-venv/etc/savanna.conf to install savanna ui integrated with horizon, we need to run the following commands: $ sudo pip install savanna-dashboard $ cd /opt/stack/horizon/openstack-dashboard $ vi settings.py horizon_config = { 'dashboards': ('nova', 'syspanel', 'settings', 'savanna'), installed_apps = ( 'savannadashboard', .... $ cd /opt/stack/horizon/openstack-dashboard/local $ vi local_settings.py savanna_url = 'http://localhost:8386/v1.0' $ sudo service apache2 restart provisioning a hadoop cluster as a first step, we need to configure keystone-related environment variables to get the authentication token: ubuntu@ip-10-59-33-68:~$ vi .bashrc $ export os_auth_url=http://127.0.0.1:5000/v2.0/ $ export os_tenant_name=admin $ export os_username=admin $ export os_password=nova ubuntu@ip-10-59-33-68:~$ source .bashrc ubuntu@ip-10-59-33-68:~$ ubuntu@ip-10-59-33-68:~$ env | grep os os_password=nova os_auth_url=http://127.0.0.1:5000/v2.0/ os_username=admin os_tenant_name=admin ubuntu@ip-10-59-33-68:~$ keystone token-get +-----------+----------------------------------+ | property | value | +-----------+----------------------------------+ | expires | 2013-08-09t20:31:12z | | id | bdb582c836e3474f979c5aa8f844c000 | | tenant_id | 2f46e214984f4990b9c39d9c6222f572 | | user_id | 077311b0a8304c8e86dc0dc168a67091 | +-----------+----------------------------------+ $ export auth_token="bdb582c836e3474f979c5aa8f844c000" $ export tenant_id="2f46e214984f4990b9c39d9c6222f572" then we need to create the glance image that we want to use for our hadoop cluster. in our example we have used mirantis's vanilla image but we can also build our own image: $ wget http://savanna-files.mirantis.com/savanna-0.2-vanilla-1.1.2-ubuntu-12.10.qcow2 $ glance image-create --name=savanna-0.2-vanilla-hadoop-ubuntu.qcow2 --disk-format=qcow2 --container-format=bare < ./savanna-0.2-vanilla-1.1.2-ubuntu-12.10.qcow2 ubuntu@ip-10-59-33-68:~/devstack$ glance image-list +--------------------------------------+-----------------------------------------+-------------+------------------+-----------+--------+ | id | name | disk format | container format | size | status | +--------------------------------------+-----------------------------------------+-------------+------------------+-----------+--------+ | d0d64f5c-9c15-4e7b-ad4c-13859eafa7b8 | cirros-0.3.1-x86_64-uec | ami | ami | 25165824 | active | | fee679ee-e0c0-447e-8ebd-028050b54af9 | cirros-0.3.1-x86_64-uec-kernel | aki | aki | 4955792 | active | | 1e52089b-930a-4dfc-b707-89b568d92e7e | cirros-0.3.1-x86_64-uec-ramdisk | ari | ari | 3714968 | active | | d28051e2-9ddd-45f0-9edc-8923db46fdf9 | savanna-0.2-vanilla-hadoop-ubuntu.qcow2 | qcow2 | bare | 551699456 | active | +--------------------------------------+-----------------------------------------+-------------+------------------+-----------+--------+ $ export image_id=d28051e2-9ddd-45f0-9edc-8923db46fdf9 then we have installed httpie , an open source http client that can be used to send rest requests to savanna api: $ sudo pip install httpie from now on we will use httpie to send savanna commands. we need to register the image with savanna: $ export savanna_url="http://localhost:8386/v1.0/$tenant_id" $ http post $savanna_url/images/$image_id x-auth-token:$auth_token username=ubuntu http/1.1 202 accepted content-length: 411 content-type: application/json date: thu, 08 aug 2013 21:28:07 gmt { "image": { "os-ext-img-size:size": 551699456, "created": "2013-08-08t21:05:55z", "description": "none", "id": "d28051e2-9ddd-45f0-9edc-8923db46fdf9", "metadata": { "_savanna_description": "none", "_savanna_username": "ubuntu" }, "mindisk": 0, "minram": 0, "name": "savanna-0.2-vanilla-hadoop-ubuntu.qcow2", "progress": 100, "status": "active", "tags": [], "updated": "2013-08-08t21:28:07z", "username": "ubuntu" } } $ http $savanna_url/images/$image_id/tag x-auth-token:$auth_token tags:='["vanilla", "1.1.2", "ubuntu"]' http/1.1 202 accepted content-length: 532 content-type: application/json date: thu, 08 aug 2013 21:29:25 gmt { "image": { "os-ext-img-size:size": 551699456, "created": "2013-08-08t21:05:55z", "description": "none", "id": "d28051e2-9ddd-45f0-9edc-8923db46fdf9", "metadata": { "_savanna_description": "none", "_savanna_tag_1.1.2": "true", "_savanna_tag_ubuntu": "true", "_savanna_tag_vanilla": "true", "_savanna_username": "ubuntu" }, "mindisk": 0, "minram": 0, "name": "savanna-0.2-vanilla-hadoop-ubuntu.qcow2", "progress": 100, "status": "active", "tags": [ "vanilla", "ubuntu", "1.1.2" ], "updated": "2013-08-08t21:29:25z", "username": "ubuntu" } } then we need to create a nodegroup templates (json files) that will be sent to savanna. there is one template for the master nodes ( namenode , jobtracker ) and another template for the worker nodes such as datanode and tasktracker . the hadoop version is 1.1.2. $ vi ng_master_template_create.json { "name": "test-master-tmpl", "flavor_id": "2", "plugin_name": "vanilla", "hadoop_version": "1.1.2", "node_processes": ["jobtracker", "namenode"] } $ vi ng_worker_template_create.json { "name": "test-worker-tmpl", "flavor_id": "2", "plugin_name": "vanilla", "hadoop_version": "1.1.2", "node_processes": ["tasktracker", "datanode"] } $ http $savanna_url/node-group-templates x-auth-token:$auth_token < ng_master_template_create.json http/1.1 202 accepted content-length: 387 content-type: application/json date: thu, 08 aug 2013 21:58:00 gmt { "node_group_template": { "created": "2013-08-08t21:58:00", "flavor_id": "2", "hadoop_version": "1.1.2", "id": "b3a79c88-b6fb-43d2-9a56-310218c66f7c", "name": "test-master-tmpl", "node_configs": {}, "node_processes": [ "jobtracker", "namenode" ], "plugin_name": "vanilla", "updated": "2013-08-08t21:58:00", "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 } } $ http $savanna_url/node-group-templates x-auth-token:$auth_token < ng_worker_template_create.json http/1.1 202 accepted content-length: 388 content-type: application/json date: thu, 08 aug 2013 21:59:41 gmt { "node_group_template": { "created": "2013-08-08t21:59:41", "flavor_id": "2", "hadoop_version": "1.1.2", "id": "773b2cfb-1e05-46f4-923f-13edc7d6aac6", "name": "test-worker-tmpl", "node_configs": {}, "node_processes": [ "tasktracker", "datanode" ], "plugin_name": "vanilla", "updated": "2013-08-08t21:59:41", "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 } } the next step is to define the cluster template: $ vi cluster_template_create.json { "name": "demo-cluster-template", "plugin_name": "vanilla", "hadoop_version": "1.1.2", "node_groups": [ { "name": "master", "node_group_template_id": "b3a79c88-b6fb-43d2-9a56-310218c66f7c", "count": 1 }, { "name": "workers", "node_group_template_id": "773b2cfb-1e05-46f4-923f-13edc7d6aac6", "count": 2 } ] } $ http $savanna_url/cluster-templates x-auth-token:$auth_token < cluster_template_create.json http/1.1 202 accepted content-length: 815 content-type: application/json date: fri, 09 aug 2013 07:04:24 gmt { "cluster_template": { "anti_affinity": [], "cluster_configs": {}, "created": "2013-08-09t07:04:24", "hadoop_version": "1.1.2", "id": "{ "name": "cluster-1", "plugin_name": "vanilla", "hadoop_version": "1.1.2", "cluster_template_id" : "64c4117b-acee-4da7-937b-cb964f0471a9", "user_keypair_id": "stack", "default_image_id": "3f9fc974-b484-4756-82a4-bff9e116919b" }", "name": "demo-cluster-template", "node_groups": [ { "count": 1, "flavor_id": "2", "name": "master", "node_configs": {}, "node_group_template_id": "b3a79c88-b6fb-43d2-9a56-310218c66f7c", "node_processes": [ "jobtracker", "namenode" ], "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 }, { "count": 2, "flavor_id": "2", "name": "workers", "node_configs": {}, "node_group_template_id": "773b2cfb-1e05-46f4-923f-13edc7d6aac6", "node_processes": [ "tasktracker", "datanode" ], "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 } ], "plugin_name": "vanilla", "updated": "2013-08-09t07:04:24" } } now we are ready to create the hadoop cluster: $ vi cluster_create.json { "name": "cluster-1", "plugin_name": "vanilla", "hadoop_version": "1.1.2", "cluster_template_id" : "64c4117b-acee-4da7-937b-cb964f0471a9", "user_keypair_id": "savanna", "default_image_id": "d28051e2-9ddd-45f0-9edc-8923db46fdf9" } $ http $savanna_url/clusters x-auth-token:$auth_token < cluster_create.json http/1.1 202 accepted content-length: 1153 content-type: application/json date: fri, 09 aug 2013 07:28:14 gmt { "cluster": { "anti_affinity": [], "cluster_configs": {}, "cluster_template_id": "64c4117b-acee-4da7-937b-cb964f0471a9", "created": "2013-08-09t07:28:14", "default_image_id": "d28051e2-9ddd-45f0-9edc-8923db46fdf9", "hadoop_version": "1.1.2", "id": "d919f1db-522f-45ab-aadd-c078ba3bb4e3", "info": {}, "name": "cluster-1", "node_groups": [ { "count": 1, "created": "2013-08-09t07:28:14", "flavor_id": "2", "instances": [], "name": "master", "node_configs": {}, "node_group_template_id": "b3a79c88-b6fb-43d2-9a56-310218c66f7c", "node_processes": [ "jobtracker", "namenode" ], "updated": "2013-08-09t07:28:14", "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 }, { "count": 2, "created": "2013-08-09t07:28:14", "flavor_id": "2", "instances": [], "name": "workers", "node_configs": {}, "node_group_template_id": "773b2cfb-1e05-46f4-923f-13edc7d6aac6", "node_processes": [ "tasktracker", "datanode" ], "updated": "2013-08-09t07:28:14", "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 } ], "plugin_name": "vanilla", "status": "validating", "updated": "2013-08-09t07:28:14", "user_keypair_id": "savanna" } } after a while we can run the nova command to check if the instances are created and running: $ nova list +--------------------------------------+-----------------------+--------+------------+-------------+----------------------------------+ | id | name | status | task state | power state | networks | +--------------------------------------+-----------------------+--------+------------+-------------+----------------------------------+ | 1a9f43bf-cddb-4556-877b-cc993730da88 | cluster-1-master-001 | active | none | running | private=10.0.0.2, 192.168.55.227 | | bb55f881-1f96-4669-a94a-58cbf4d88f39 | cluster-1-workers-001 | active | none | running | private=10.0.0.3, 192.168.55.226 | | 012a24e2-fa33-49f3-b051-9ee2690864df | cluster-1-workers-002 | active | none | running | private=10.0.0.4, 192.168.55.225 | +--------------------------------------+-----------------------+--------+------------+-------------+----------------------------------+ now we can log in to the hadoop master instance and run the required hadoop commands: $ ssh -i savanna.pem [email protected] $ sudo chmod 777 /usr/share/hadoop $ sudo su hadoop $ cd /usr/share/hadoop $ hadoop jar hadoop-example-1.1.2.jar pi 10 100 savanna ui via horizon in order to create nodegroup templates, cluster templates and the cluster itself we used a command line tool -- httpie -- to send rest api calls. the same functionality is also available via horizon, the standard openstack dashboard. first we need to register the image with savanna: then we need to create the nodegroup templates: after that we have to create the cluster template: and finally we have to create the cluster:

August 20, 2013

by Istvan Szegedi

· 9,499 Views

130+ Essential Vim Commands

Since the 70′s, vi and vim are very popular text editors among programmers. 5 years ago, I wrote an article named “100 vim commands every programmer should know” and here is a reworked, updated version. Enjoy! Basics :e filename Open filename for edition :w Save file :q Exit Vim :q! Quit without saving :x Write file (if changes has been made) and exit :sav filename Saves file as filename . Repeats the last change made in normal mode 5. Repeats 5 times the last change made in normal mode Moving in the file k or Up Arrow move the cursor up one line j or Down Arrow move the cursor down one line e move the cursor to the end of the word b move the cursor to the begining of the word 0 move the cursor to the begining of the line G move the cursor to the end of the file gg move the cursor to the begining of the file L move the cursor to the bottom of the screen :59 move cursor to line 59. Replace 59 by the desired line number. 20| move cursor to column 20. % Move cursor to matching parenthesis [[ Jump to function start [{ Jump to block start Cut, copy & paste y Copy the selected text to clipboard p Paste clipboard contents dd Cut current line yy Copy current line y$ Copy to end of line D Cut to end of line Search /word Search word from top to bottom ?word Search word from bottom to top * Search the word under cursor /\cstring Search STRING or string, case insensitive /jo[ha]n Search john or joan /\< the Search the, theatre or then /the\> Search the or breathe /\< the\> Search the /\< ¦.\> Search all words of 4 letters /\/ Search fred but not alfred or frederick /fred\|joe Search fred or joe /\<\d\d\d\d\> Search exactly 4 digits /^\n\{3} Find 3 empty lines :bufdo /searchstr/ Search in all open files bufdo %s/something/somethingelse/g Search something in all the open buffers and replace it with somethingelse Replace :%s/old/new/g Replace all occurences of old by new in file :%s/onward/forward/gi Replace onward by forward, case unsensitive :%s/old/new/gc Replace all occurences with confirmation :2,35s/old/new/g Replace all occurences between lines 2 and 35 :5,$s/old/new/g Replace all occurences from line 5 to EOF :%s/^/hello/g Replace the begining of each line by hello :%s/$/Harry/g Replace the end of each line by Harry :%s/onward/forward/gi Replace onward by forward, case unsensitive :%s/ *$//g Delete all white spaces :g/string/d Delete all lines containing string :v/string/d Delete all lines containing which didn’t contain string :s/Bill/Steve/ Replace the first occurence of Bill by Steve in current line :s/Bill/Steve/g Replace Bill by Steve in current line :%s/Bill/Steve/g Replace Bill by Steve in all the file :%s/^M//g Delete DOS carriage returns (^M) :%s/\r/\r/g Transform DOS carriage returns in returns :%s#<[^>]\+>##g Delete HTML tags but keeps text :%s/^$.*$\n\1$/\1/ Delete lines which appears twice Ctrl+a Increment number under the cursor Ctrl+x Decrement number under cursor ggVGg? Change text to Rot13 Case Vu Lowercase line VU Uppercase line g~~ Invert case vEU Switch word to uppercase vE~ Modify word case ggguG Set all text to lowercase gggUG Set all text to uppercase :set ignorecase Ignore case in searches :set smartcase Ignore case in searches excepted if an uppercase letter is used :%s/\<./\u&/g Sets first letter of each word to uppercase :%s/\<./\l&/g Sets first letter of each word to lowercase :%s/.*/\u& Sets first letter of each line to uppercase :%s/.*/\l& Sets first letter of each line to lowercase Read/Write files :1,10 w outfile Saves lines 1 to 10 in outfile :1,10 w >> outfile Appends lines 1 to 10 to outfile :r infile Insert the content of infile :23r infile Insert the content of infile under line 23 File explorer :e . Open integrated file explorer :Sex Split window and open integrated file explorer :Sex! Same as :Sex but split window vertically :browse e Graphical file explorer :ls List buffers :cd .. Move to parent directory :args List files :args *.php Open file list :grep expression *.php Returns a list of .php files contening expression gf Open file name under cursor Interact with Unix :!pwd Execute the pwd unix command, then returns to Vi !!pwd Execute the pwd unix command and insert output in file :sh Temporary returns to Unix $exit Retourns to Vi Alignment :%!fmt Align all lines !}fmt Align all lines at the current position 5!!fmt Align the next 5 lines Tabs/Windows :tabnew Creates a new tab gt Show next tab :tabfirst Show first tab :tablast Show last tab :tabm n(position) Rearrange tabs :tabdo %s/foo/bar/g Execute a command in all tabs :tab ball Puts all open files in tabs :new abc.txt Edit abc.txt in new window Window spliting :e filename Edit filename in current window :split filename Split the window and open filename ctrl-w up arrow Puts cursor in top window ctrl-w ctrl-w Puts cursor in next window ctrl-w_ Maximize current window vertically ctrl-w| Maximize current window horizontally ctrl-w= Gives the same size to all windows 10 ctrl-w+ Add 10 lines to current window :vsplit file Split window vertically :sview file Same as :split in readonly mode :hide Close current window :nly Close all windows, excepted current :b 2 Open #2 in this window Auto-completion Ctrl+n Ctrl+p (in insert mode) Complete word Ctrl+x Ctrl+l Complete line :set dictionary=dict Define dict as a dictionnary Ctrl+x Ctrl+k Complete with dictionnary Marks m {a-z} Marks current position as {a-z} ' {a-z} Move to position {a-z} '' Move to previous position Abbreviations :ab mail [email protected] Define mail as abbreviation of [email protected] Text indent :set autoindent Turn on auto-indent :set smartindent Turn on intelligent auto-indent :set shiftwidth=4 Defines 4 spaces as indent size ctrl-t, ctrl-d Indent/un-indent in insert mode >> Indent << Un-indent =% Indent the code between parenthesis 1GVG= Indent the whole file Syntax highlighting :syntax on Turn on syntax highlighting :syntax off Turn off syntax highlighting :set syntax=perl Force syntax highlighting

August 15, 2013

by Jean-Baptiste Jung

· 20,687 Views · 1 Like

neo4j: Extracting a subgraph as an adjacency matrix and calculating eigenvector centrality with JBLAS

Earlier in the week I wrote a blog post showing how to calculate the eigenvector centrality of an adjacency matrix using JBLAS and the next step was to work out the eigenvector centrality of a neo4j sub graph. There were 3 steps involved in doing this: Export the neo4j sub graph as an adjacency matrix Run JBLAS over it to get eigenvector centrality scores for each node Write those scores back into neo4j I decided to make use of the Paul Revere data set from Kieran Healy’s blog post which consists of people and groups that they had membership of. The script to import the data is on my fork of the revere repository. Having imported the data the next step was to write a cypher query which would give me the people in anadjacency matrix with the number in each column/row intersection showing how many common groups that pair of people had. I thought it’d be easier to build this query incrementally so I started out writing a query which would return one row of the adjacency matrix: MATCH p1:Person, p2:Person WHERE p1.name = "Paul Revere" WITH p1, p2 MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2 WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links ORDER BY p2 RETURN p1, COLLECT(links) AS row Here we start with Paul Revere and then find the relationships between him and every other person by way of a common group membership. We use an optional relationship since we need to include a value in each column/row of our adjacency matrix we need to return a 0 value for anyone he doesn’t intersect with. If we run that query we get back the following: +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | p1 | row | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Paul Revere" | [2,1,1,1,1,1,1,1,1,1,1,1,1,1,2,3,1,1,1,1,1,1,3,3,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,2,1,1,2,1,2,1,1,1,1,1,0,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,2,1,1,1,1,1,1,2,1,3,1,3,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,0,1,0,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,4,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,2,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,1,1,1,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,1,1,2,1,1,1,1,1,1,1,1,3,1,1,1,1,3,1,1,1,1,0,1,2,1,1,1,1,1,1,1] | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ As it turns outs we’ve only got to remove the WHERE clause and order everybody and we’ve get the adjacency matrix for everyone: MATCH p1:Person, p2:Person WITH p1, p2 MATCH p = p1-[?:MEMBER_OF]->()<-[?:MEMBER_OF]-p2 WITH p1.name AS p1, p2.name AS p2, COUNT(p) AS links ORDER BY p2 RETURN p1, COLLECT(links) AS row ORDER BY p1 +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | p1 | row | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | "Abiel Ruddock" | [0,1,1,1,0,1,0,1,0,0,1,1,1,0,1,2,0,1,0,1,1,1,2,2,1,0,0,1,1,0,1,1,1,1,1,0,0,0,0,1,1,0,0,2,2,0,0,1,1,2,1,1,1,0,1,0,1,1,0,0,2,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,0,2,1,2,1,0,0,0,0,1,1,0,1,0,0,1,0,2,0,0,1,0,0,0,1,0,0,2,0,1,0,1,1,1,0,0,1,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,2,0,0,1,1,0,0,2,0,1,2,1,1,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,1,2,1,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,0,1,1,0,0,1,0,0,2,1,0,0,1,1,1,1,0,1,0,0,0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,2,1,1,0,0,2,0,1,0,0,0,0,1,0,1,0,1,0,1,0] | | "Abraham Hunt" | [1,0,1,1,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,1,1,0,1,1,0,0,0,1,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,1,1,1,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,1,0] | ... +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 254 rows 9897 ms The next step was to wire up the query results with the JBLAS code that I wrote in the previous post. I ended up with the following: public class Neo4jAdjacencyMatrixSpike { public static void main(String[] args) throws SQLException { ClientResponse response = client() .resource("http://localhost:7474/db/data/cypher") .entity(queryAsJson(), MediaType.APPLICATION_JSON) .accept(MediaType.APPLICATION_JSON) .post(ClientResponse.class); JsonNode result = response.getEntity(JsonNode.class); ArrayNode rows = (ArrayNode) result.get("data"); List principalEigenvector = JBLASSpike.getPrincipalEigenvector(new DoubleMatrix(asMatrix(rows))); List people = asPeople(rows); updatePeopleWithEigenvector(people, principalEigenvector); System.out.println(sort(people).take(10)); } private static double[][] asMatrix(ArrayNode rows) { double[][] matrix = new double[rows.size()][254]; int rowCount = 0; for (JsonNode row : rows) { ArrayNode matrixRow = (ArrayNode) row.get(2); double[] rowInMatrix = new double[254]; matrix[rowCount] = rowInMatrix; int columnCount = 0; for (JsonNode jsonNode : matrixRow) { matrix[rowCount][columnCount] = jsonNode.asInt(); columnCount++; } rowCount++; } return matrix; } // rest cut for brevity } Here we are taking the query and then converting it into an array of arrays before passing it to our JBLAS code to calculate the principal eigenvector. We then return the top 10 people: Person{name='William Cooper', eigenvector=0.172604992239612, nodeId=68}, Person{name='Nathaniel Barber', eigenvector=0.17260499223961198, nodeId=18}, Person{name='John Hoffins', eigenvector=0.17260499223961195, nodeId=118}, Person{name='Paul Revere', eigenvector=0.17171142003936804, nodeId=207}, Person{name='Caleb Davis', eigenvector=0.16383970722169897, nodeId=71}, Person{name='Caleb Hopkins', eigenvector=0.16383970722169897, nodeId=121}, Person{name='Henry Bass', eigenvector=0.16383970722169897, nodeId=21}, Person{name='Thomas Chase', eigenvector=0.16383970722169897, nodeId=54}, Person{name='William Greenleaf', eigenvector=0.16383970722169897, nodeId=104}, Person{name='Edward Proctor', eigenvector=0.15600043886738055, nodeId=201} I get back the same 10 people as Kieran Healy although they have different eigenvector values. As far as I understand the absolute value doesn’t matter, what’s more important is the relative score to other people so I think we’re ok. The final step was to write these eigenvector values back into neo4j which we can do with the following code: private static void updateNeo4jWithEigenvectors(List people) { for (Person person : people) { ObjectNode request = JsonNodeFactory.instance.objectNode(); request.put("query", "START p = node({nodeId}) SET p.eigenvectorCentrality={value}"); ObjectNode params = JsonNodeFactory.instance.objectNode(); params.put("nodeId", person.nodeId); params.put("value", person.eigenvector); request.put("params", params); client() .resource("http://localhost:7474/db/data/cypher") .entity(request, MediaType.APPLICATION_JSON) .accept(MediaType.APPLICATION_JSON) .post(ClientResponse.class); } } Now we might use that eigenvector centrality value in other queries, such as one to show who the most central/potentially influential people are in each group: MATCH g:Group<-[:MEMBER_OF]-p WITH g.name AS group, p.name AS personName, p.eigenvectorCentrality as eigen ORDER BY eigen DESC WITH group, COLLECT(personName) AS people RETURN group, HEAD(people) + [HEAD(TAIL(people))] + [HEAD(TAIL(TAIL(people)))] AS mostCentral +--------------------------------------------------------------------------+ | group | mostCentral | +--------------------------------------------------------------------------+ | "StAndrewsLodge" | ["Paul Revere","Joseph Warren","Thomas Urann"] | | "BostonCommittee" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "LoyalNine" | ["Caleb Hopkins","William Greenleaf","Caleb Davis"] | | "LondonEnemies" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "LongRoomClub" | ["Paul Revere","John Hancock","Benjamin Clarke"] | | "NorthCaucus" | ["William Cooper","Nathaniel Barber","John Hoffins"] | | "TeaParty" | ["William Cooper","Nathaniel Barber","John Hoffins"] | +--------------------------------------------------------------------------+ 7 rows 280 ms Our top ten feature frequently although it’s interesting that only one of them is in the ‘LongRoomClub’ group which perhaps indicates that people in that group are less likely to be members of the other ones. I’d be interested if anyone can think of other potential uses for eigenvector centrality once we’ve got it back in the graph. All the code described in this post is on github if you want to take it for a spin.

August 12, 2013

by Mark Needham

· 5,695 Views

Jersey Client: Testing External Calls

Jim and I have been doing a bit of work over the last week which involved calling neo4j’s HA status URI to check whether or not an instance was a master/slave and we’ve been using jersey-client. The code looked roughly like this: class Neo4jInstance { private Client httpClient; private URI hostname; public Neo4jInstance(Client httpClient, URI hostname) { this.httpClient = httpClient; this.hostname = hostname; } public Boolean isSlave() { String slaveURI = hostname.toString() + ":7474/db/manage/server/ha/slave"; ClientResponse response = httpClient.resource(slaveURI).accept(TEXT_PLAIN).get(ClientResponse.class); return Boolean.parseBoolean(response.getEntity(String.class)); } } While writing some tests against this code we wanted to stub out the actual calls to the HA slave URI so we could simulate both conditions and a brief search suggested that mockito was the way to go. We ended up with a test that looked like this: @Test public void shouldIndicateInstanceIsSlave() { Client client = mock( Client.class ); WebResource webResource = mock( WebResource.class ); WebResource.Builder builder = mock( WebResource.Builder.class ); ClientResponse clientResponse = mock( ClientResponse.class ); when( builder.get( ClientResponse.class ) ).thenReturn( clientResponse ); when( clientResponse.getEntity( String.class ) ).thenReturn( "true" ); when( webResource.accept( anyString() ) ).thenReturn( builder ); when( client.resource( anyString() ) ).thenReturn( webResource ); Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave(); assertTrue(isSlave); } which is pretty gnarly but does the job. I thought there must be a better way so I continued searching and eventually came across this post on the mailing list which suggested creating a custom ClientHandler and stubbing out requests/responses there. I had a go at doing that and wrapped it with a little DSL that only covers our very specific use case: private static ClientBuilder client() { return new ClientBuilder(); } static class ClientBuilder { private String uri; private int statusCode; private String content; public ClientBuilder requestFor(String uri) { this.uri = uri; return this; } public ClientBuilder returns(int statusCode) { this.statusCode = statusCode; return this; } public Client create() { return new Client() { public ClientResponse handle(ClientRequest request) throws ClientHandlerException { if (request.getURI().toString().equals(uri)) { InBoundHeaders headers = new InBoundHeaders(); headers.put("Content-Type", asList("text/plain")); return createDummyResponse(headers); } throw new RuntimeException("No stub defined for " + request.getURI()); } }; } private ClientResponse createDummyResponse(InBoundHeaders headers) { return new ClientResponse(statusCode, headers, new ByteArrayInputStream(content.getBytes()), messageBodyWorkers()); } private MessageBodyWorkers messageBodyWorkers() { return new MessageBodyWorkers() { public Map> getReaders(MediaType mediaType) { return null; } public Map> getWriters(MediaType mediaType) { return null; } public String readersToString(Map> mediaTypeListMap) { return null; } public String writersToString(Map> mediaTypeListMap) { return null; } public MessageBodyReader getMessageBodyReader(Class tClass, Type type, Annotation[] annotations, MediaType mediaType) { return (MessageBodyReader) new StringProvider(); } public MessageBodyWriter getMessageBodyWriter(Class tClass, Type type, Annotation[] annotations, MediaType mediaType) { return null; } public List getMessageBodyWriterMediaTypes(Class tClass, Type type, Annotation[] annotations) { return null; } public MediaType getMessageBodyWriterMediaType(Class tClass, Type type, Annotation[] annotations, List mediaTypes) { return null; } }; } public ClientBuilder content(String content) { this.content = content; return this; } } If we change our test to use this code it now looks like this: @Test public void shouldIndicateInstanceIsSlave() { Client client = client().requestFor("http://localhost:7474/db/manage/server/ha/slave"). returns(200). content("true"). create(); Boolean isSlave = new Neo4jInstance(client, URI.create("http://localhost")).isSlave(); assertTrue(isSlave); } Is there a better way? In Ruby I’ve used WebMock to achieve this and Ashok pointed me towards WebStub which looks nice except I’d need to pass in the hostname + port rather than constructing that in the code.

August 1, 2013

by Mark Needham

· 10,828 Views

AWS: Attaching an EBS volume on an EC2 instance and making it available for use

I recently wanted to attach an EBS volume to an existing EC2 instance that I had running and since it was for a one off tasks (famous last words) I decided to configure it manually. I created the EBS volume through the AWS console and one thing that initially caught me out is that the EC2 instance and EBS volume need to be in the same region and zone. Therefore if I create my EC2 instance in ‘eu-west-1b’ then I need to create my EBS volume in ‘eu-west-1b’ as well otherwise I won’t be able to attach it to that instance. I attached the device as /dev/sdf although the UI gives the following warning: Linux Devices: /dev/sdf through /dev/sdp Note: Newer linux kernels may rename your devices to /dev/xvdf through /dev/xvdp internally, even when the device name entered here (and shown in the details) is /dev/sdf through /dev/sdp. After attaching the EBS volume to the EC2 instance my next step was to SSH onto my EC2 instance and make the EBS volume available. The first step is to create a file system on the volume: $ sudo mkfs -t ext3 /dev/sdf mke2fs 1.42 (29-Nov-2011) Could not stat /dev/sdf --- No such file or directory The device apparently does not exist; did you specify it correctly? It turns out that warning was handy and the device has in fact been renamed. We can confirm this by callingfdisk: $ sudo fdisk -l Disk /dev/xvda1: 8589 MB, 8589934592 bytes 255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/xvda1 doesn't contain a valid partition table Disk /dev/xvdf: 53.7 GB, 53687091200 bytes 255 heads, 63 sectors/track, 6527 cylinders, total 104857600 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00000000 Disk /dev/xvdf doesn't contain a valid partition table /dev/xvdf is the one we’re interested in so I re-ran the previous command: $ sudo mkfs -t ext3 /dev/xvdf mke2fs 1.42 (29-Nov-2011) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) Stride=0 blocks, Stripe width=0 blocks 3276800 inodes, 13107200 blocks 655360 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 400 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424 Allocating group tables: done Writing inode tables: done Creating journal (32768 blocks): done Writing superblocks and filesystem accounting information: done Once I’d done that I needed to create a mount point for the volume and I thought the best place was probably a directory under /mnt: $ sudo mkdir /mnt/ebs The final step is to mount the volume: $ sudo mount /dev/xvdf /mnt/ebs And if we run df we can see that it’s ready to go: $ df -h Filesystem Size Used Avail Use% Mounted on /dev/xvda1 7.9G 883M 6.7G 12% / udev 288M 8.0K 288M 1% /dev tmpfs 119M 164K 118M 1% /run none 5.0M 0 5.0M 0% /run/lock none 296M 0 296M 0% /run/shm /dev/xvdf 50G 180M 47G 1% /mnt/ebs

July 31, 2013

by Mark Needham

· 11,981 Views

Why I Never Use the Maven Release Plugin

Just about every 6 months or so an article appears cursing Maven, attracting both proponents as opponents to Maven and Ant. While it’s real fun to watch (I really get a laugh when people start to advocate the return to Ant), most of the time it’s always the same arguments. Maven lacks flexibility, the plugin system sucks (when will people learn to use plugin versions…), you can’t use scripting and the all time favorite: the release plugin sucks. Well, I am a Maven addict and I’m happy to say: yes, I agree, the release plugin sucks. Big time. But here’s something you may have forgotten: you don’t need it! Even more: you shouldn’t use it. The Maven release plugin tries to make releasing software a breeze. That’s where the plugin authors got it wrong to start with. Releases are not something done on a whim. They are carefully planned and orchestrated actions, preceded by countless rules and followed by more rules. Assuming you can bundle all that in a simple mvn release:release is just plain naive. Even Maven’s most fierce supporters agree on this. The Maven release plugin just tries to do too much stuff at once: build your software, tag it, build it again, deploy it, build the site (triggering yet another build in the process) and deploy the site. And whilst doing that, running the tests x times. Most of the time, you’re making candidate releases, so building the complete documentation is a complete waste of time. Now, if you break down the release plugin into sensible steps, you’ll really save yourself a whole lot of trouble. I use these steps to release something. As a sidenote: I use git and git-flow standards (as described here). Assume the POM’s version’s currently on 1.0-SNAPSHOT. Announce the release process Very important. As I said, you don’t release on a whim. Make sure everyone on your team knows a release is pending and has all their stuff pushed to the development branch that needs to be included. Branch the development branch into a release branch. Following git-flow rules, I make a release branch 1.0. Update the POM version of the development branch. Update the version to the next release version. For example mvn versions:set -DnewVersion=2.0-SNAPSHOT. Commit and push. Now you can put resources developing towards the next release version. Update the POM version of the release branch. Update the version to the standard CR version. For example mvn versions:set -DnewVersion=1.0.CR-SNAPSHOT. Commit and push. Run tests on the release branch. Run all the tests. If one or more fail, fix them first. Create a candidate release from the release branch. Use the Maven version plugin to update your POM’s versions. For example mvn versions:set -DnewVersion=1.0.CR1. Commit and push. Make a tag on git. Use the Maven version plugin to update your POM’s versions back to the standard CR version. For example mvn versions:set -DnewVersion=1.0.CR-SNAPSHOT. Commit and push. Checkout the new tag. Do a deployment build (mvn clean deploy). Since you’ve just run your tests and fixed any failing ones, this shouldn’t fail. Put deployment on QA environment. Iterate until QA gives a green light on the candidate release. Fix bugs. Fix bugs reported on the CR releases on the release branch. Merge into development branch on regular intervals (or even better, continuous). Run tests continuously, making bug reports on failures and fixing them as you go. Create a candidate release. Use the Maven version plugin to update your POM’s versions. For example mvn versions:set -DnewVersion=1.0.CRx. Commit and push. Make a tag on git. Use the Maven version plugin to update your POM’s versions back to the standard CR version. For example mvn versions:set -DnewVersion=1.0.CR-SNAPSHOT. Commit and push. Checkout the new tag. Do a deployment build (mvn clean deploy). Since you’ve run your tests continuously, this shouldn’t fail. Put deployment on QA environment. Once QA has signed off on the release, create a final release. Check whether there are no new commits since the last release tag (if there are, slap developers as they have done stuff that wasn’t needed or asked for). Use the Maven version plugin to update your POM’s versions. For example mvn versions:set -DnewVersion=1.0. Commit and push. Tag the release branch. Merge into the master branch. Checkout the master branch. Do a deployment build (mvn clean deploy). Start production release and deployment process (in most companies, not a small feat). This can involve building the site and doing other stuff, some not even Maven related. There’s no way in hell Maven can automate this process and if you try, you’ll bump into the many pitfalls the release plugin has to offer. The release plugin is just a combination of the versions, scm, deploy and site plugin that seriously violates the single responsibility principle. The release plugin is one of the reasons Maven has gotten a bad reputation with some people. It’s long due for an overhaul, but if you ask me, they should just remove it altogether. Releasing software is a process, not a single command on the command line. The process I just described isn’t perfect in any way, but it works and I avoid using the release plugin as it just does too much stuff. Have fun bashing Maven, but please, keep it clean :) .

July 26, 2013

by Lieven Doclo

· 118,084 Views · 13 Likes