DZone
The Latest Databases Topics

Ecosystem of Hadoop Animal Zoo
Hadoop is best known for MapReduce and its distributed file system (HDFS). More recently, other productivity tools built on top of these have grown into a complete Hadoop ecosystem. Most of the projects are hosted by the Apache Software Foundation. The Hadoop ecosystem projects are listed below.

Hadoop Common
A set of components and interfaces for distributed file systems and I/O (serialization, Java RPC, persistent data structures).
http://hadoop.apache.org/

HDFS
A distributed file system that runs on large clusters of commodity hardware. The Hadoop Distributed File System (HDFS) was renamed from NDFS. It is a scalable data store for structured, semi-structured, and unstructured data.
http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfsuserguide.html
http://wiki.apache.org/hadoop/hdfs

MapReduce
MapReduce is the distributed, parallel programming model for Hadoop, inspired by Google's MapReduce research paper. Hadoop includes an implementation of the MapReduce programming model. There are two phases, not surprisingly map and reduce; to be precise, between the map and reduce phases there is another phase called sort and shuffle. The JobTracker, on the NameNode machine, manages the other cluster nodes. MapReduce programs can be written in Java. If you prefer SQL or other non-Java languages, you are still in luck: you can use a utility called Hadoop Streaming.
http://wiki.apache.org/hadoop/hadoopmapreduce

Hadoop Streaming
A utility that lets you write MapReduce code in many languages, such as C, Perl, Python, C++, and Bash. Examples include a Python mapper and an AWK reducer.
http://hadoop.apache.org/docs/r1.2.1/streaming.html

Avro
A serialization system for efficient, cross-language RPC and persistent data storage. Avro is a framework for performing remote procedure calls and data serialization. In the context of Hadoop, it can be used to pass data from one program or language to another, e.g. from C to Pig.
It is particularly suited for use with scripting languages such as Pig, because data is always stored with its schema in Avro.
http://avro.apache.org/

Apache Thrift
Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code that can be used to build RPC clients and servers that communicate seamlessly across programming languages. Instead of writing a load of boilerplate code to serialize and transport your objects and invoke remote methods, you can get right down to business.
http://thrift.apache.org/

Hive and Hue
If you like SQL, you will be delighted to hear that you can write SQL and have Hive convert it to a MapReduce job. You don't get a full ANSI SQL environment, however. Hue gives you a browser-based graphical interface for your Hive work. Hue features a file browser for HDFS, a job browser for MapReduce/YARN, an HBase browser, and query editors for Hive, Pig, Cloudera Impala, and Sqoop2. It also ships with an Oozie application for creating and monitoring workflows, a ZooKeeper browser, and an SDK.

Pig
A high-level data-flow programming language and execution environment for MapReduce coding. The Pig language is called Pig Latin. You may find the naming conventions somewhat unconventional, but you get incredible price-performance and high availability.
https://pig.apache.org/

Jaql
Jaql is a functional, declarative programming language designed especially for working with large volumes of structured, semi-structured, and unstructured data. As its name implies, a primary use of Jaql is to handle data stored as JSON documents, but Jaql can work on various types of data. For example, it can support XML, comma-separated values (CSV) data, and flat files. A "SQL within Jaql" capability lets programmers work with structured SQL data while employing a JSON data model that's less restrictive than its Structured Query Language counterparts.
1. Jaql in Google Code
2. What is Jaql?
by IBM

Sqoop
Sqoop provides bi-directional data transfer between Hadoop (HDFS) and your favorite relational database. For example, you might be storing your app data in a relational store such as Oracle; when you want to scale your application with Hadoop, you can migrate the Oracle data to HDFS using Sqoop.
http://sqoop.apache.org/

Oozie
Manages Hadoop workflows. It doesn't replace your scheduler or BPM tooling, but it provides if-then-else branching and control for Hadoop jobs.
https://oozie.apache.org/

ZooKeeper
A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building highly scalable applications. It is used to manage synchronization across the cluster.
http://zookeeper.apache.org/

HBase
Based on Google's Bigtable, HBase "is an open-source, distributed, versioned, column-oriented store" that sits on top of HDFS. It is a super-scalable key-value store that works very much like a persistent hash map (Python developers can think of a dictionary). It is not a conventional relational database; it is a distributed, column-oriented database. HBase uses HDFS for its underlying storage and supports both batch-style computations using MapReduce and point queries for random reads.
https://hbase.apache.org/

Cassandra
A column-oriented NoSQL data store that offers scalability and high availability without compromising performance. It is a perfect platform for commodity hardware and cloud infrastructure. Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.
http://cassandra.apache.org/

Flume
A real-time loader for streaming your data into Hadoop. It stores data in HDFS and HBase. Flume "channels" data between "sources" and "sinks", and its data harvesting can be either scheduled or event-driven.
Possible sources for Flume include Avro, files, and system logs, and possible sinks include HDFS and HBase.
http://flume.apache.org/

Mahout
Machine learning for Hadoop, used for predictive analytics and other advanced analysis. There are currently four main groups of algorithms in Mahout:
  • Recommendations, a.k.a. collaborative filtering
  • Classification, a.k.a. categorization
  • Clustering
  • Frequent item set mining, a.k.a. parallel frequent pattern mining
Mahout is not simply a collection of pre-existing algorithms. Many machine learning algorithms are intrinsically non-scalable; that is, given the types of operations they perform, they cannot be executed as a set of parallel processes. Algorithms in the Mahout library belong to the subset that can be executed in a distributed fashion.
http://en.wikipedia.org/wiki/list_of_machine_learning_algorithms
https://www.coursera.org/course/machlearning
https://mahout.apache.org/

FUSE
Makes HDFS look like a regular file system, so that you can use ls, rm, cd, etc. directly on HDFS data.

Whirr
Apache Whirr is a set of libraries for running cloud services. Whirr provides:
  • A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of each provider.
  • A common service API. The details of provisioning are particular to the service.
  • Smart defaults for services. You can get a properly configured system running quickly, while still being able to override settings as needed.
You can also use Whirr as a command-line tool for deploying clusters.
https://whirr.apache.org/

Giraph
An open-source graph processing API, like Pregel from Google.
https://giraph.apache.org/

Chukwa
Chukwa, an incubator project at Apache, is a data collection and analysis system built on top of HDFS and MapReduce. Tailored for collecting logs and other data from distributed monitoring systems, Chukwa provides a workflow that allows for incremental data collection, processing, and storage in Hadoop.
It is included in the Apache Hadoop distribution as an independent module.
https://chukwa.apache.org/

Drill
Apache Drill, an incubator project at Apache, is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open-source version of Google's Dremel system, which is available as an IaaS service called Google BigQuery. One explicitly stated design goal is that Drill be able to scale to 10,000 servers or more and process petabytes of data and trillions of records in seconds.
http://incubator.apache.org/drill/

Impala (Cloudera)
Released by Cloudera, Impala is an open-source project which, like Apache Drill, was inspired by Google's paper on Dremel; the purpose of both is to facilitate real-time querying of data in HDFS or HBase. Impala uses an SQL-like language that, though similar to HiveQL, is currently more limited than HiveQL. Because Impala relies on the Hive metastore, Hive must be installed on a cluster in order for Impala to work. The secret behind Impala's speed is that it "circumvents map reduce to directly access the data through a specialized distributed query engine that is very similar to those found in commercial parallel rdbmss." (source: Cloudera)
http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html
http://training.cloudera.com/elearning/impala/
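To make the map, sort-and-shuffle, and reduce phases from the MapReduce entry concrete, here is a minimal single-process sketch of a word count. It is plain JavaScript, not Hadoop code: the function names (mapLine, shuffle, reduceGroup, wordCount) are made up for illustration, and a real job would run the phases across many nodes.

```javascript
// Map phase: emit a [word, 1] pair for every word in one input line.
function mapLine(line) {
  return line
    .toLowerCase()
    .split(/\W+/)
    .filter(Boolean)
    .map((word) => [word, 1]);
}

// Sort-and-shuffle phase: group emitted values by key and sort the keys.
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

// Reduce phase: collapse each key's list of values into a single count.
function reduceGroup([word, counts]) {
  return [word, counts.reduce((sum, n) => sum + n, 0)];
}

// Run the three phases in order over an array of input lines.
function wordCount(lines) {
  const mapped = lines.flatMap(mapLine);
  return shuffle(mapped).map(reduceGroup);
}
```

Calling wordCount with an array of text lines returns an array of [word, count] pairs ordered by word.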
June 3, 2015
by Umashankar Ankuri
· 23,894 Views · 3 Likes
The Myth of Asynchronous JDBC
I keep seeing people (especially in the Scala/Typesafe world) posting about async JDBC libraries. STOP IT! Under the current APIs, async JDBC belongs in a realm with unicorns, tiger squirrels, and 8' spiders. While you might be able to move the blocking operations elsewhere, queue requests, and keep your "main" worker threads from blocking, JDBC is synchronous. At some point, somewhere, there's going to be a thread blocked waiting for a response. It's frustrating to see so many folks hyping this and muddying the waters. Unless you write your own client for a DBMS and have a DBMS that can multiplex calls over a single connection (or use some other strategy to enable this capability), DB access is going to block. It's not impossible to make the calls completely async, but nobody's built it yet. Yes, I know ajdbc is taking a stab at this capability, but even IT uses a thread pool for the blocking calls (by default). Someday we'll have async database access (it's not impossible... well, it IS with the current JDBC specification), but no general-purpose RDBMS has this right now.

The primary problems with the hype/misdirection are that #1, inexperienced programmers don't understand that they've just moved the problem; they will use the APIs and wonder why the system is so slow (oh, I have 1000 DB calls queued up waiting for my single DB thread to process the work), and #2, it reveals a serious misunderstanding of the difference between async JDBC (not possible per the current spec) and async DB access (totally possible/doable, but rare in the wild).
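The "1000 calls queued behind a single DB thread" problem is easy to put numbers on. The sketch below is only arithmetic in JavaScript, not a JDBC client: it assumes every query blocks its worker for a fixed, made-up latency and computes when each queued call would finish for a given pool size. The function name completionTimes is hypothetical.

```javascript
// Completion time (ms) of each of `calls` blocking queries that each take
// `latencyMs`, when only `workers` threads are available to run them.
function completionTimes(calls, latencyMs, workers) {
  const freeAt = new Array(workers).fill(0); // when each worker is next idle
  const done = [];
  for (let i = 0; i < calls; i++) {
    // Hand the next queued call to the worker that becomes idle first.
    let w = 0;
    for (let j = 1; j < workers; j++) {
      if (freeAt[j] < freeAt[w]) w = j;
    }
    freeAt[w] += latencyMs;
    done.push(freeAt[w]);
  }
  return done;
}
```

Under these assumptions, 1000 calls at 10 ms each on one worker means the last caller waits 10 seconds; a pool of 20 workers brings that down to half a second, but each of those 20 threads is still blocked while its query runs. The wrapper has moved the blocking, not removed it.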
May 29, 2015
by Michael Mainguy
· 17,814 Views · 2 Likes
What is API First?
API First is a design strategy for a company’s entire product line, where APIs are the basis of every product instead of being a separate side product. To understand why API First is a good idea, you first have to understand how existing APIs are generally created.

There are various models for creating APIs, but generally the API accesses the backend directly in parallel with the main product. This means that if you want to make new applications, you have to either write more systems that access the backend or extend the API so that it supports both alternative products. Additionally, an API is frequently considered to be an “extra, nice to have” product rather than an important member of the product ecosystem. This creates serious problems with resource contention as the API competes against revenue-producing products for engineering resources.

APIs as a Side Product
Think first about an example of the “usual” setup for APIs. As you can imagine, the API is separate from the main product, and even if all secondary products run off of the API, there will be a mismatch between the two experiences. Keeping everything entirely consistent is theoretically doable but a lot more work than simply restructuring the infrastructure to treat the API as a piece of core technology for the system.

There is some thought that separating out the API from the main product can protect that product from attacks via the API, but truthfully the backend system is the critical piece, and separating out the clients in this way simply makes it harder to triage and fix problems that might occur in the product or the API. Keeping everything in the same pipeline helps assure consistency and reliability, and as your product grows, it makes scaling much easier.

Imagine that a company has a product for creating and updating contact information for users. The backend returns lists of users based on a database call from the client.
In this case, the main product communicates directly with the backend system, and is likely to retrieve data in a different way as a result. When the backend system team adds a new “Location” field feature, it doesn't show up in the API until an engineer has time to add it there. This means that there will be a necessary lag between the addition of this field to the main product and availability within the API until and unless someone takes the time to write essentially duplicate code to retrieve the new fields. There’s a lot of technical debt incurred when you have multiple systems trying to reproduce a single interface.

There’s not likely to be a good process for comparing this API change to other APIs from the company, resulting in inconsistent results. In this case, the mobile app wouldn’t allow or see locations, resulting in developer dissatisfaction and customer confusion and irritation. Once you’ve established how you want your users to interact with your system, it’s best to support that everywhere.

When you have multiple teams creating products without a shared vision, you also tend to have poor communication between those teams. This can lead to bug fixes in one code base but not in the other, or inconsistencies between the items available from the system depending on which interface is being used.

API First Model
The other possibility is an API First model. All of the products run off of the same interface. This ensures that each device, application, or integration has the same resources available to use. These resources will be consistent across the entire product line. Note that just because an API resource is available within the system, you don’t have to expose it to all the world: you can decide which of the API resources is internal, partner only, or open to anyone. It’s still a great idea to have your API ready because when a major partner asks for access to some specific resources to support a use case, you have it ready to go.
API First also encourages communication between your backend team and each of the client engineering teams. Understanding use cases at a high level then helps you create APIs that are usable for the use cases you understand up front, and more likely to support future use cases that come up. Once you’re creating the API as a larger team, you’ll find many places where different teams offer complementary resources, adding up to a much better structured system.

API First makes a lot of sense for any company: as soon as you have more than one product (for most companies that’s going to be a website and a mobile application), you need to have a layer to protect the clients from changes on the server. A well-documented interface into the system, crafted with specific use cases in mind, allows you the freedom to change things around on the backend, as long as the interface doesn’t change. Integrated testing is easier, and the products running on the API will by their nature test the integrity of the system on a regular basis.

You can learn much more about APIs in my book Irresistible APIs, available from Manning Publications, Inc.
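As a rough illustration of the contact-list scenario, the sketch below routes two clients through one shared resource, so a field such as location reaches both as soon as the backend has it. Everything here is hypothetical: the in-memory backend, the listContacts function, and the two view functions are stand-ins, not code from any real product.

```javascript
// Hypothetical in-memory "backend" standing in for the contact database.
const backend = [
  { id: 1, name: 'Ada', email: 'ada@example.com', location: 'London' },
];

// The one shared API resource. Every product reads contacts through this,
// so there is no second code path that has to be taught about new fields.
function listContacts() {
  return backend.map((user) => ({ ...user }));
}

// Two hypothetical clients built on the same resource.
function webView() {
  return listContacts().map((c) => `${c.name} <${c.email}> (${c.location})`);
}

function mobileView() {
  return listContacts().map((c) => `${c.name}, ${c.location}`);
}
```

In the side-product setup the article describes, mobileView would instead sit on a separate API that has to be extended by hand before it can see location at all.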
May 29, 2015
by Kirsten Hunter
· 6,658 Views · 1 Like
Efficient Cassandra Write Pattern for Micro-Batching
The best way to write to a Cassandra cluster is with concurrent asynchronous writes. In cases where data exhibits strong temporal locality, speed can be improved.
May 20, 2015
by John Georgiadis
· 35,048 Views · 1 Like
Data Locality w/ Cassandra : How to Scan the Local Token Range of a Table...
I'm working on a mechanism that will allow HPCC to access data stored in Cassandra with data locality, leveraging the Java streaming capabilities from HPCC.
May 18, 2015
by Brian O' Neill
· 14,350 Views · 1 Like
Integrating External APIs into your Meteor.js Application
Meteor itself does not rely on REST APIs, but it can easily access data from other services. This article is an excerpt from the book Meteor in Action and explains how you can integrate third-party data into your applications by accessing RESTful URLs from the server side.

Many applications rely on external APIs to retrieve data. Getting information regarding your friends from Facebook, looking up the current weather in your area, or simply retrieving an avatar image from another website: there are endless uses for integrating additional data. They all share a common challenge: APIs must be called from the server, but an API call usually takes longer than executing the method itself. You need to ensure that the result gets back to the client, even if it takes a couple of seconds.

Let’s talk about how to integrate an external API via HTTP. Based on the IP address of a visitor, you can tell various information about their current location, e.g., coordinates, city, or timezone. There is a simple API that takes an IPv4 address and returns all these tidbits as a JSON object. The API is called Telize.

Making RESTful calls with the http package
In order to communicate with RESTful external APIs such as Telize, you need to add the http package:

    meteor add http

While the http package allows you to make HTTP calls from both client and server, the API call in this example will be performed from the server only. Many APIs require you to provide an ID as well as a secret key to identify the application that makes an API request. In those cases you should always run your requests from the server. That way you never have to share secret keys with clients.

Here is the basic concept. A user requests location information for an IP address (step 1). The client application calls a server method called geoJsonForIp (step 2) that makes an (asynchronous) call to the external API using the HTTP.get() method (step 3).
The response (step 4) is a JSON object with information regarding the geographic location associated with an IP address, which gets sent back to the client via a callback (step 5).

Using a synchronous method to query an API
Let’s add a method that queries telize.com for a given IP address as shown in the following listing. This includes only the bare essentials for querying an API for now. Remember: this code belongs in a server-side only file or inside an if (Meteor.isServer) {} block.

    Meteor.methods({
      // The method expects a valid IPv4 address
      'geoJsonForIp': function (ip) {
        console.log('Method.geoJsonForIp for', ip);
        // Construct the API URL
        var apiUrl = 'http://www.telize.com/geoip/' + ip;
        // Query the API
        var response = HTTP.get(apiUrl).data;
        return response;
      }
    });

Once the method is available on the server, querying the location of an IP works simply by calling the method with a callback from the client:

    Meteor.call('geoJsonForIp', '8.8.8.8', function (err, res) {
      console.log(res);
    });

While this solution appears to be working fine, there are two major flaws in this approach:
  • If the API is slow to respond, requests will start queuing up.
  • Should the API return an error, there is no way to return it back to the UI.

To address the issue of queuing, you can add an unblock() statement to the method:

    this.unblock();

Calling an external API should always be done asynchronously. That way you can also return possible error values back to the browser, which will solve the second issue. Let’s create a dedicated function for calling the API asynchronously to keep the method itself clean.

Using an asynchronous method to call an API
The listing below shows how to issue an HTTP.get call and return the result via a callback. It also includes error handling that can be shown on the client.
    var apiCall = function (apiUrl, callback) {
      // try…catch allows you to handle errors
      try {
        var response = HTTP.get(apiUrl).data;
        // A successful API call returns no error
        // but the contents from the JSON response
        callback(null, response);
      } catch (error) {
        var errorCode;
        var errorMessage;
        // If the API responded with an error message and a payload
        if (error.response) {
          errorCode = error.response.data.code;
          errorMessage = error.response.data.message;
        // Otherwise use a generic error message
        } else {
          errorCode = 500;
          errorMessage = 'Cannot access the API';
        }
        // Create an Error object and return it via callback
        var myError = new Meteor.Error(errorCode, errorMessage);
        callback(myError, null);
      }
    };

Inside a try…catch block, you can differentiate between a successful API call (the try block) and an error case (the catch block). A successful call passes null as the error argument of the callback; a failed call passes only an error object, with null for the actual response.

There are different types of errors, and you want to differentiate between a problem with accessing the API and an API call that got an error inside the returned response. This is what the if statement checks for: if the error object has a response property, both code and message for the error should be taken from it; otherwise you can display a generic error 500 saying that the API could not be accessed. In each case, success or failure, the callback delivers a value that can be passed back to the UI.

In order to make the API call asynchronous, you need to update the method as shown in the next code snippet. The improved code unblocks the method and wraps the API call in a wrapAsync function.
    Meteor.methods({
      'geoJsonForIp': function (ip) {
        // Avoid blocking other method calls from the same client
        this.unblock();
        var apiUrl = 'http://www.telize.com/geoip/' + ip;
        // Asynchronous call to the dedicated API calling function
        var response = Meteor.wrapAsync(apiCall)(apiUrl);
        return response;
      }
    });

Finally, to allow requests from the browser and show error messages, you should add a template similar to the following code. The HTML tags of the original listing were lost; the markup below is reconstructed from the template name and selectors used in the JavaScript that follows.

    <template name="telize">
      <p>Query the location data for an IP</p>
      <input id="ipv4" type="text" />
      <button>Look up location</button>
      {{#with location}}
        {{#if error}}
          <p>There was an error: {{error.errorType}} {{error.message}}!</p>
        {{else}}
          <p>The IP address {{location.ip}} is in {{location.city}} ({{location.country}}).</p>
        {{/if}}
      {{/with}}
    </template>

A Session variable called location is used to store the results from the API call. Clicking the button takes the content of the input box and sends it as a parameter to the geoJsonForIp method. The Session variable is set to the value of the callback.

This is the required JavaScript code for connecting the template with the method call:

    Template.telize.helpers({
      location: function () {
        return Session.get('location');
      }
    });

    Template.telize.events({
      'click button': function (evt, tpl) {
        var ip = tpl.find('input#ipv4').value;
        Meteor.call('geoJsonForIp', ip, function (err, res) {
          // The method call sets the Session variable to the callback value
          if (err) {
            Session.set('location', {error: err});
          } else {
            Session.set('location', res);
            return res;
          }
        });
      }
    });

As a result you will be able to make API calls from the browser. And that’s how to integrate an external API via HTTP!
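The error branch inside apiCall can be tried out without a Meteor server. The sketch below isolates that decision as a plain JavaScript function; describeApiError is a hypothetical helper name, and unlike the listing it also guards against a response that has no data property, which is an added assumption.

```javascript
// Plain-JavaScript version of the branch inside apiCall's catch block.
// Returns the { code, message } pair that would be handed to Meteor.Error.
function describeApiError(error) {
  // The API answered, but with an error payload: reuse its code and message.
  if (error && error.response && error.response.data) {
    return {
      code: error.response.data.code,
      message: error.response.data.message,
    };
  }
  // The API could not be reached at all: fall back to a generic error.
  return { code: 500, message: 'Cannot access the API' };
}
```

Passing an object shaped like an HTTP error with a response payload returns the API's own code and message, while a bare network error falls through to the generic 500.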
May 15, 2015
by Stephan Hochhaus
· 40,167 Views
Log Collection With Graylog on AWS
Log collection is essential to properly analyzing issues in production. An interface to search and be notified about exceptions on all your servers is a must. If you have one server, you can easily ssh to it and check the logs, of course, but for larger deployments, collecting logs centrally is far preferable to logging in to 10 machines in order to find “what happened”.

There are many options to do that, roughly separated into two groups: 3rd party services and software to be installed by you.

3rd party (or “cloud-based” if you want) log collection services include Splunk, Loggly, Papertrail, and Sumologic. They are very easy to set up and you pay for what you use. Basically, you send each message (e.g. via a custom logback appender) to a provider’s endpoint, and then use the dashboard to analyze the data. In many cases that would be the preferred way to go.

In other cases, however, company policy may frown upon using 3rd party services to store company-specific data, or additional costs may be undesired. In these cases extra effort needs to be put into installing and managing internal log collection software. Such software works in a similar way, but implementation details may differ (e.g. instead of sending messages with an appender to a target endpoint, the software, using some sort of agent, collects local logs and aggregates them). Open-source options include Graylog, FluentD, Flume, and Logstash.

After some very quick research, I considered graylog to fit our needs best, so below is a description of the installation procedure on AWS (though the first part applies regardless of the infrastructure).

The first thing to look at is the set of ready-to-use images provided by graylog, including docker, openstack, vagrant, and AWS. Unfortunately, the AWS version has two drawbacks. The first is that it uses Ubuntu, rather than the Amazon AMI. That’s not a huge issue, although some generic scripts you use in your stack may have to be rewritten.
The other was the dealbreaker – when you start it, it doesn’t run a web interface, although it claims it should. Only mongodb, elasticsearch and graylog-server are started. Having 2 instances – one web, and one for the rest would complicate things, so I opted for manual installation. Graylog has two components – the server, which handles the input, indexing and searching, and the web interface, which is a nice UI that communicates with the server. The web interface uses mongodb for metadata, and the server uses elasticsearch to store the incoming logs. Below is a bash script (CentOS) that handles the installation. Note that there is no “sudo”, because initialization scripts are executed as root on AWS. #!/bin/bash # install pwgen for password-generation yum upgrade ca-certificates --enablerepo=epel yum --enablerepo=epel -y install pwgen # mongodb cat >/etc/yum.repos.d/mongodb-org.repo <<'EOT' [mongodb-org] name=MongoDB Repository baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/ gpgcheck=0 enabled=1 EOT yum -y install mongodb-org chkconfig mongod on service mongod start # elasticsearch rpm --import https://packages.elasticsearch.org/GPG-KEY-elasticsearch cat >/etc/yum.repos.d/elasticsearch.repo <<'EOT' [elasticsearch-1.4] name=Elasticsearch repository for 1.4.x packages baseurl=http://packages.elasticsearch.org/elasticsearch/1.4/centos gpgcheck=1 gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch enabled=1 EOT yum -y install elasticsearch chkconfig --add elasticsearch # configure elasticsearch sed -i -- 's/#cluster.name: elasticsearch/cluster.name: graylog2/g' /etc/elasticsearch/elasticsearch.yml sed -i -- 's/#network.bind_host: localhost/network.bind_host: localhost/g' /etc/elasticsearch/elasticsearch.yml service elasticsearch stop service elasticsearch start # java yum -y update yum -y install java-1.7.0-openjdk update-alternatives --set java /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java # graylog wget 
https://packages.graylog2.org/releases/graylog2-server/graylog-1.0.1.tgz tar xvzf graylog-1.0.1.tgz -C /opt/ mv /opt/graylog-1.0.1/ /opt/graylog/ cp /opt/graylog/bin/graylogctl /etc/init.d/graylog sed -i -e 's/GRAYLOG2_SERVER_JAR=\${GRAYLOG2_SERVER_JAR:=graylog.jar}/GRAYLOG2_SERVER_JAR=\${GRAYLOG2_SERVER_JAR:=\/opt\/graylog\/graylog.jar}/' /etc/init.d/graylog sed -i -e 's/LOG_FILE=\${LOG_FILE:=log\/graylog-server.log}/LOG_FILE=\${LOG_FILE:=\/var\/log\/graylog-server.log}/' /etc/init.d/graylog cat >/etc/init.d/graylog <<'EOT' #!/bin/bash # chkconfig: 345 90 60 # description: graylog control sh /opt/graylog/bin/graylogctl $1 EOT chkconfig --add graylog chkconfig graylog on chmod +x /etc/init.d/graylog # graylog web wget https://packages.graylog2.org/releases/graylog2-web-interface/graylog-web-interface-1.0.1.tgz tar xvzf graylog-web-interface-1.0.1.tgz -C /opt/ mv /opt/graylog-web-interface-1.0.1/ /opt/graylog-web/ cat >/etc/init.d/graylog-web <<'EOT' #!/bin/bash # chkconfig: 345 91 61 # description: graylog web interface sh /opt/graylog-web/bin/graylog-web-interface > /dev/null 2>&1 & EOT chkconfig --add graylog-web chkconfig graylog-web on chmod +x /etc/init.d/graylog-web #configure mkdir --parents /etc/graylog/server/ cp /opt/graylog/graylog.conf.example /etc/graylog/server/server.conf sed -i -e 's/password_secret =.*/password_secret = '$(pwgen -s 96 1)'/' /etc/graylog/server/server.conf sed -i -e 's/root_password_sha2 =.*/root_password_sha2 = '$(echo -n password | shasum -a 256 | awk '{print $1}')'/' /etc/graylog/server/server.conf sed -i -e 's/application.secret=""/application.secret="'$(pwgen -s 96 1)'"/g' /opt/graylog-web/conf/graylog-web-interface.conf sed -i -e 's/graylog2-server.uris=""/graylog2-server.uris="http:\/\/127.0.0.1:12900\/"/g' /opt/graylog-web/conf/graylog-web-interface.conf service graylog start sleep 30 service graylog-web start You may also want to set a TTL (auto-expiration) for messages, so that you don’t store old logs forever. 
Here’s how # wait for the index to be created INDEXES=$(curl --silent "http://localhost:9200/_cat/indices") until [[ "$INDEXES" =~ "graylog2_0" ]]; do sleep 5 echo "Index not yet created. Indexes: $INDEXES" INDEXES=$(curl --silent "http://localhost:9200/_cat/indices") done # set each indexed message auto-expiration (ttl) curl -XPUT "http://localhost:9200/graylog2_0/message/_mapping" -d'{"message": {"_ttl" : { "enabled" : true, "default" : "15d" }}' Now you have everything running on the instance. Then you have to do some AWS-specific things (if using CloudFormation, that would include a pile of JSON). Here’s the list: you can either have an auto-scaling group with one instance, or a single instance. I prefer the ASG, though the other one is a bit simpler. The ASG gives you auto-respawn if the instance dies. set the above script to be invoked in the UserData of the launch configuration of the instance/asg (e.g. by getting it from s3 first) allow UDP port 12201 (the default logging port). That should happen for the instance/asg security group (inbound), for the application nodes security group (outbound), and also as a network ACL of your VPC. Test the UDP connection to make sure it really goes through. Keep the access restricted for all sources, except for your instances. you need to pass the private IP address of your graylog server instance to all the application nodes. That’s tricky on AWS, as private IP addresses change. That’s why you need something stable. You can’t use an ELB (load balancer), because it doesn’t support UDP. There are two options: Associate an Elastic IP with the node on startup. Pass that IP to the application nodes. But there’s a catch – if they connect to the elastic IP, that would go via NAT (if you have such), and you may have to open your instance “to the world”. So, you must turn the elastic IP into its corresponding public DNS. The DNS then will be resolved to the private IP. 
You can do that manually, in a somewhat hacky way:

GRAYLOG_ADDRESS="ec2-${GRAYLOG_ADDRESS//./-}.us-west-1.compute.amazonaws.com"

or you can use the AWS EC2 CLI to obtain the instance details of the instance that the elastic IP is associated with, and then with another call obtain its Public DNS.
    • Instead of using an Elastic IP, which limits you to a single instance, you can use Route53 (the AWS DNS manager). That way, when a graylog server instance starts, it can append itself to a Route53 record, which allows for a round-robin DNS of multiple graylog instances that are in a cluster. Manipulating the Route53 records is again done via the AWS CLI. Then you just pass the domain name to the application nodes, so that they can send messages.
  • alternatively, you can install graylog-server on all the nodes (as an agent), and point them to an elasticsearch cluster. But that’s more complicated and probably not the intended way to do it
  • configure your logging framework to send messages to graylog. There are standard GELF (the Graylog Extended Log Format) appenders, e.g. this one, and the only thing you have to do is use the Public DNS environment variable in the logback.xml (which supports environment variable resolution).

You should make the web interface accessible outside the network, so you can use an ELB for that, or the round-robin DNS mentioned above. Just make sure the security rules are tight and do not allow external tampering with your log data.

If you are not running a graylog cluster (which I won’t cover), then the single instance can potentially fail. That isn’t a great loss, as log messages can be obtained from the instances, and they are short-lived anyway. But the metadata of the web interface is important: dashboards, alerts, etc. So it’s good to do regular backups (e.g. with mongodump). Using an EBS volume is also an option.
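Going back to the Elastic IP option above: the manual conversion from an Elastic IP to its public DNS name is a plain string substitution (dots become dashes). Here is a minimal Python sketch of the same idea. The function name is hypothetical, the default region is simply the one used in the text, and the naming pattern is assumed to hold for your region (us-east-1, for instance, uses a different suffix), so check it against what the EC2 console shows.

```python
def elastic_ip_to_public_dns(ip, region="us-west-1"):
    """Build the EC2 public DNS name for a public IPv4 address.

    Assumes the "ec2-A-B-C-D.<region>.compute.amazonaws.com" pattern
    from the text; this is not looked up via any AWS API.
    """
    octets = ip.split(".")
    valid = len(octets) == 4 and all(
        o.isdigit() and 0 <= int(o) <= 255 for o in octets
    )
    if not valid:
        raise ValueError("not an IPv4 address: %r" % ip)
    # Replace every dot with a dash, exactly like the shell one-liner.
    return "ec2-%s.%s.compute.amazonaws.com" % ("-".join(octets), region)


if __name__ == "__main__":
    print(elastic_ip_to_public_dns("54.12.3.4"))
```

Querying the instance details through the AWS CLI, as mentioned above, is the more robust route because it does not depend on the naming pattern.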
Even though you send your log messages to the centralized log collector, it’s a good idea to also keep local logs, with the proper log rotation and cleanup. It’s not a trivial process, but it’s essential to have log collection, so I hope the guide has been helpful.
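To test that UDP traffic on port 12201 really reaches the server, it can help to send a hand-built GELF message instead of waiting for application logs. The sketch below is one way to do that in Python using only the standard library. The function names and the sample host and field values are made up for illustration, and it assumes the input accepts zlib-compressed GELF 1.1 payloads.

```python
import json
import socket
import time
import zlib


def build_gelf_datagram(host, short_message, level=6, **extra):
    """Return a zlib-compressed GELF 1.1 payload as bytes."""
    record = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity; 6 means informational
    }
    # GELF expects additional fields to be prefixed with an underscore.
    for key, value in extra.items():
        record["_" + key] = value
    return zlib.compress(json.dumps(record).encode("utf-8"))


def send_gelf(datagram, address, port=12201):
    """Send one datagram over UDP; 12201 is the default port from the text."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(datagram, (address, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical address; use the DNS name you pass to the application nodes.
    send_gelf(build_gelf_datagram("app-1", "connectivity test", env="staging"),
              "127.0.0.1")
```

UDP gives no delivery confirmation, so the only way to know the message arrived is to search for it in the web interface.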
May 14, 2015
by Bozhidar Bozhanov
· 19,995 Views
Collecting Transaction Per Minute from SQL Server and HammerDB
A SQL Server script file can be created to run in a loop, collecting data for a given amount of time at a specified interval.
May 11, 2015
by Greg Schulz
· 10,307 Views
8 Questions You Need to Ask About Microservices, Containers & Docker in 2015
In containers and microservices, we’re facing the greatest potential change in how we deliver and run software services since the arrival of virtual machines.
May 9, 2015
by Andrew Phillips
· 15,031 Views · 1 Like
Quick Notes: What is CAP Theorem?
The CAP theorem states that a distributed database system can guarantee at most two of the following three properties: consistency, availability, and partition tolerance.
May 5, 2015
by Ajitesh Kumar
· 26,339 Views · 3 Likes
Why Run Your Microservices on a PaaS
[This article by Chris Haddad comes to you from the DZone Guide to Cloud Development - 2015 Edition. For more information, including in-depth articles from industry experts, best solutions for PaaS, iPaaS, IaaS, and MBaaS, and more, click the link below to download your free copy of the guide.]

Microservices can be understood from two angles. First, the differential: teams that take a microservice design approach divide business solutions into distinct, full-stack business services owned by autonomous teams. Second, the integral: microservice-based applications weave multiple atomic microservices into holistic user experiences.

Unfortunately, traditional application delivery models and traditional middleware infrastructure do not address microservice-specific demands for on-demand provisioning, dynamic composition, and service level management. On the other hand, the Platform-as-a-Service (PaaS) model addresses these demands perfectly. Running microservices on a PaaS fabric decreases solution fragility, reduces operational burden, and enhances developer productivity.

To understand why, we’ll first review how microservices separate concerns from both business and object-oriented design perspectives. Second, we’ll consider how microservice-based design can complicate deployment as applications scale dynamically. Third, we’ll focus on how a PaaS environment helps to solve many of the problems both addressed and introduced by microservices-based architectures: in other words, why PaaS and microservices are a match made in heaven.

Microservices: Separating Concerns By Business Solution

A microservice approach decomposes monolithic applications according to the single responsibility pattern. In a microservice solution, each microservice interface delivers discrete business capabilities (e.g. customer profile, product catalogue, inventory, order, billing, fulfillment) within a well-defined, bounded context.
The atomic microservice interfaces reside on separate and distinct full-stack application platforms that contain separate database storage, integration flows, and web application hosting. By separating concerns onto separate full-stack platforms and not sharing database instances or web application hosts across services, every team is free to choose different runtime languages and frameworks for its own microservice. Also, every team is free to evolve its data schemas, application frameworks, and business logic without impacting other teams.

Because microservices are a relatively new design approach, many development teams may have the misconception that creating a microservice-based solution requires simply deploying small web services in containers. But this doesn’t cut quite deep enough. The correct approach is to evolve your monolithic design by applying service-oriented principles (i.e. encapsulation, loose coupling, separation of concerns) in conjunction with domain-driven design techniques and dynamic runtime application composition.

For example, in a typical ecommerce scenario, a development team applies the bounded context pattern and single responsibility pattern to refactor a monolithic application into units distinguished by business capability (see Figure 2). By creating a user experience from loosely coupled services instead of tightly coupled native-language business objects, teams have more independence to develop, evolve, and deploy each business capability separately. Obviously, the microservice design approach works best for (a) greenfield projects or (b) modernization efforts where teams focus on refactoring monolithic application assets.

The Microservice Execution Trap

Although a microservice approach decouples development dependencies and speeds up development iterations, microservices also create a challenging environment for high-performance scaling and reliable runtime execution.
More complex, loosely coupled, and dynamic environments distribute business capabilities over the entire network. Even a task as simple as responding to a single web application page request may spread out across several microservice instances residing on a distributed network topology. Martin Fowler and Stefan Tilkov (both microservice proponents) warn teams that successfully implementing a microservice approach requires choosing platforms that decrease solution fragility and reduce operational burdens.

What Platform-as-a-Service Offers

Platform-as-a-Service environments reduce microservice operational burdens when infrastructure-as-code and declarative policies are used to eliminate all manual actions and increase runtime quality of service (i.e. reliability, availability, scalability, and performance). The appropriate PaaS environment will automatically deploy, provision, and link full-stack microservices.

In a microservice architecture, teams want to rapidly release new versions and perform A/B testing across versions. When teams define instance dependencies, scaling properties, and security policies as PaaS metadata or code scripts, the runtime fabric can reduce manual effort and increase release confidence. With a DevOps-friendly PaaS, the team can experiment with new service versions and safely roll back to a prior stable release if a problem arises.

Because microservices are full-stack silos that can be composed of multiple server instances (e.g. web server, database, load balancer, integration server), a PaaS can reduce deployment complexity by automatically spinning up and linking all instances. Linking may require discovering instance locations, dynamically initializing network routes, and auto-configuring connection strings based on service version or tenant.

A traditional application will compose business functions and user experience by statically linking class files and shared object libraries.
In contrast, microservice-based applications use service composition to connect available microservices endpoints and realize a fully functional application. While many microservice proponents promote microservice-based interactions by “smart endpoints through dumb pipes,” effective service composition requires smart infrastructure building blocks to bootstrap and maintain connections between services and consumers. The right PaaS solves these problems.

Infrastructure building blocks will register service endpoint locations, associate metadata and policies, connect clients, circuit break around failures, correlate inter-service calls, and load balance traffic. A microservice-friendly PaaS will provide service registries, metadata services, discovery services, and service virtualization gateways. In the pipe, circuit breakers will automatically route traffic on failover or overload.

Smart endpoint code will dynamically connect with microservices based on discovery service responses and negotiated quality of service parameters. Rather than being hard-coded to a specific service hostname and URI, endpoint code will query for microservice location based on security assurances, performance guarantees, traffic load, service version, client tenancy, or business domain.

When services are unavailable or underperform, smart endpoints will follow the tolerant reader pattern and gracefully degrade experience or proactively recover. A few recovery options include reading from local caches or circuit tripping to backup service endpoints. In conjunction with smart endpoint actions, a smart PaaS will spin up new microservice endpoints and full-stack instances based on service level management metrics.

By following microservice architecture best practices, teams create anti-fragile applications that not only withstand a shock, but also improve performance and quality of service when stressed or experiencing failures.
To drive this non-intuitive behavior, the underlying platform environment must be ready to scale, repair, and reconnect services. PaaS service level management components will create more resilient and anti-fragile microservices by monitoring performance, elastically provisioning instances, and dynamically re-routing traffic.

Scaling an anti-fragile microservice is more difficult than scaling a web application. The PaaS should distribute microservice instances across multiple availability zones and dynamically adjust traffic to reduce latency and response time. Because transient microservice instances will rapidly start, stop, and change location, the service management layer must be completely automated and integrated with routing services.

A PaaS environment will deliver the service level management, dynamic service composition, circuit breakers, and on-demand provisioning functions required to overcome the complexity inherent within a distributed microservice-based application architecture. Running microservices on a PaaS fabric will decrease solution fragility, reduce operational burden, and enhance developer productivity. If you are pursuing a microservice design approach, make sure you choose a microservice-friendly PaaS.
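The recovery behavior described above (circuit tripping and reading from a local cache when a service underperforms) can be illustrated with a small Python sketch. This is not taken from any particular PaaS. The class name, thresholds, and the injectable clock are all assumptions chosen to keep the example short and testable.

```python
import time


class CircuitBreaker:
    """Trip after max_failures consecutive errors; retry after reset_after seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: close the circuit so the primary is tried again.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(self, primary, fallback):
        """Call primary() unless the circuit is open; degrade to fallback()."""
        if self.is_open():
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

An endpoint would wrap each remote call, for example `breaker.call(fetch_live_profile, read_cached_profile)`, where both callables are hypothetical: the first hits the microservice, the second reads a local cache.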
May 5, 2015
by Chris Haddad
· 12,076 Views · 2 Likes
A Look at Nanomsg and Scalability Protocols (Why ZeroMQ Shouldn’t Be Your First Choice)
Earlier this month, I explored ZeroMQ and how it proves to be a promising solution for building fast, high-throughput, and scalable distributed systems. Despite lending itself quite well to these types of problems, ZeroMQ is not without its flaws. Its creators have attempted to rectify many of these shortcomings through spiritual successors Crossroads I/O and nanomsg.

The now-defunct Crossroads I/O is a proper fork of ZeroMQ with the true intention being to build a viable commercial ecosystem around it. Nanomsg, however, is a reimagining of ZeroMQ: a complete rewrite in C. It builds upon ZeroMQ’s rock-solid performance characteristics while providing several vital improvements, both internal and external. It also attempts to address many of the strange behaviors that ZeroMQ can often exhibit. Today, I’ll take a look at what differentiates nanomsg from its predecessor and implement a use case for it in the form of service discovery.

Nanomsg vs. ZeroMQ

A common gripe people have with ZeroMQ is that it doesn’t provide an API for new transport protocols, which essentially limits you to TCP, PGM, IPC, and ITC. Nanomsg addresses this problem by providing a pluggable interface for transports and messaging protocols. This means support for new transports (e.g. WebSockets) and new messaging patterns beyond the standard set of PUB/SUB, REQ/REP, etc.

Nanomsg is also fully POSIX-compliant, giving it a cleaner API and better compatibility. No longer are sockets represented as void pointers and tied to a context; simply initialize a new socket and begin using it in one step. With ZeroMQ, the context internally acts as a storage mechanism for global state and, to the user, as a pool of I/O threads. This concept has been completely removed from nanomsg.
In addition to POSIX compliance, nanomsg is hoping to be interoperable at the API and protocol levels, which would allow it to be a drop-in replacement for, or otherwise interoperate with, ZeroMQ and other libraries which implement ZMTP/1.0 and ZMTP/2.0. It has yet to reach full parity, however. ZeroMQ has a fundamental flaw in its architecture. Its sockets are not thread-safe. In and of itself, this is not problematic and, in fact, is beneficial in some cases. By isolating each object in its own thread, the need for semaphores and mutexes is removed. Threads don’t touch each other and, instead, concurrency is achieved with message passing. This pattern works well for objects managed by worker threads but breaks down when objects are managed in user threads. If the thread is executing another task, the object is blocked. Nanomsg does away with the one-to-one relationship between objects and threads. Rather than relying on message passing, interactions are modeled as sets of state machines. Consequently, nanomsg sockets are thread-safe. Nanomsg has a number of other internal optimizations aimed at improving memory and CPU efficiency. ZeroMQ uses a simple trie structure to store and match PUB/SUB subscriptions, which performs nicely for sub-10,000 subscriptions but quickly becomes unreasonable for anything beyond that number. Nanomsg uses a space-optimized trie called a radix tree to store subscriptions. Unlike its predecessor, the library also offers a true zero-copy API which greatly improves performance by allowing memory to be copied from machine to machine while completely bypassing the CPU. ZeroMQ implements load balancing using a round-robin algorithm. While it provides equal distribution of work, it has its limitations. Suppose you have two datacenters, one in New York and one in London, and each site hosts instances of “foo” services. Ideally, a request made for foo from New York shouldn’t get routed to the London datacenter and vice versa. 
Unfortunately, with ZeroMQ’s round-robin balancing, this is entirely possible. One of the new user-facing features that nanomsg offers is priority routing for outbound traffic. We avoid this latency problem by assigning priority one to foo services hosted in New York for applications also hosted there. Priority two is then assigned to foo services hosted in London, giving us a failover in the event that foos in New York are unavailable.

Additionally, nanomsg offers a command-line tool for interfacing with the system called nanocat. This tool lets you send and receive data via nanomsg sockets, which is useful for debugging and health checks.

Scalability Protocols

Perhaps most interesting is nanomsg’s philosophical departure from ZeroMQ. Instead of acting as a generic networking library, nanomsg intends to provide the “Lego bricks” for building scalable and performant distributed systems by implementing what it refers to as “scalability protocols.” These scalability protocols are communication patterns which are an abstraction on top of the network stack’s transport layer. The protocols are fully separated from each other such that each can embody a well-defined distributed algorithm. The intention, as stated by nanomsg’s author Martin Sustrik, is to have the protocol specifications standardized through the IETF.

Nanomsg currently defines six different scalability protocols: PAIR, REQREP, PIPELINE, BUS, PUBSUB, and SURVEY.

PAIR (Bidirectional Communication)

PAIR implements simple one-to-one, bidirectional communication between two endpoints. Two nodes can send messages back and forth to each other.

REQREP (Client Requests, Server Replies)

The REQREP protocol defines a pattern for building stateless services to process user requests. A client sends a request, the server receives the request, does some processing, and returns a response.

PIPELINE (One-Way Dataflow)

PIPELINE provides unidirectional dataflow which is useful for creating load-balanced processing pipelines.
A producer node submits work that is distributed among consumer nodes.

BUS (Many-to-Many Communication)

BUS allows messages sent from each peer to be delivered to every other peer in the group.

PUBSUB (Topic Broadcasting)

PUBSUB allows publishers to multicast messages to zero or more subscribers. Subscribers, which can connect to multiple publishers, can subscribe to specific topics, allowing them to receive only messages that are relevant to them.

SURVEY (Ask Group a Question)

The last scalability protocol, and the one I will examine further by implementing a use case, is SURVEY. The SURVEY pattern is similar to PUBSUB in that a message from one node is broadcast to the entire group, but where it differs is that each node in the group responds to the message. This opens up a wide variety of applications because it allows you to quickly and easily query the state of a large number of systems in one go. The survey respondents must respond within a time window configured by the surveyor.

Implementing Service Discovery

As I pointed out, the SURVEY protocol has a lot of interesting applications. For example:

  • What data do you have for this record?
  • What price will you offer for this item?
  • Who can handle this request?

To continue exploring it, I will implement a basic service-discovery pattern. Service discovery is a pretty simple question that’s well-suited for SURVEY: what services are out there? Our solution will work by periodically submitting the question. As services spin up, they will connect with our service discovery system so they can identify themselves. We can tweak parameters like how often we survey the group to ensure we have an accurate list of services and how long services have to respond.
This is great because 1) the discovery system doesn’t need to be aware of what services there are (it just blindly submits the survey) and 2) when a service spins up, it will be discovered and if it dies, it will be “undiscovered.”

Here is the ServiceDiscovery class:

from collections import defaultdict
import random

from nanomsg import NanoMsgAPIError
from nanomsg import Socket
from nanomsg import SURVEYOR
from nanomsg import SURVEYOR_DEADLINE


class ServiceDiscovery(object):

    def __init__(self, port, deadline=5000):
        self.socket = Socket(SURVEYOR)
        self.port = port
        self.deadline = deadline
        self.services = defaultdict(set)

    def bind(self):
        self.socket.bind('tcp://*:%s' % self.port)
        self.socket.set_int_option(SURVEYOR, SURVEYOR_DEADLINE, self.deadline)

    def discover(self):
        if not self.socket.is_open():
            return self.services

        self.services = defaultdict(set)
        self.socket.send('service query')

        while True:
            try:
                response = self.socket.recv()
            except NanoMsgAPIError:
                break

            service, address = response.split('|')
            self.services[service].add(address)

        return self.services

    def resolve(self, service):
        providers = self.services[service]
        if not providers:
            return None

        return random.choice(tuple(providers))

    def close(self):
        self.socket.close()

The discover method submits the survey and then collects the responses. Notice we construct a SURVEYOR socket and set the SURVEYOR_DEADLINE option on it. This deadline is the number of milliseconds from when a survey is submitted to when a response must be received; adjust it accordingly based on your network topology. Once the survey deadline has been reached, a NanoMsgAPIError is raised and we break the loop. The resolve method will take the name of a service and randomly select an available provider from our discovered services.

We can then wrap ServiceDiscovery with a daemon that will periodically run discover.
import os
import time

from service_discovery import ServiceDiscovery

DEFAULT_PORT = 5555
DEFAULT_DEADLINE = 5000
DEFAULT_INTERVAL = 2000


def start_discovery(port, deadline, interval):
    discovery = ServiceDiscovery(port, deadline=deadline)
    discovery.bind()

    print 'Starting service discovery [port: %s, deadline: %s, interval: %s]' \
        % (port, deadline, interval)

    while True:
        print discovery.discover()
        time.sleep(interval / 1000)


if __name__ == '__main__':
    port = int(os.environ.get('PORT', DEFAULT_PORT))
    deadline = int(os.environ.get('DEADLINE', DEFAULT_DEADLINE))
    interval = int(os.environ.get('INTERVAL', DEFAULT_INTERVAL))

    start_discovery(port, deadline, interval)

The discovery parameters are configured through environment variables which I inject into a Docker container.

Services must connect to the discovery system when they start up. When they receive a survey, they should respond by identifying what service they provide and where the service is located. One such service might look like the following:

import os
from threading import Thread

from nanomsg import REP
from nanomsg import RESPONDENT
from nanomsg import Socket

DEFAULT_DISCOVERY_HOST = 'localhost'
DEFAULT_DISCOVERY_PORT = 5555
DEFAULT_SERVICE_NAME = 'foo'
DEFAULT_SERVICE_PROTOCOL = 'tcp'
DEFAULT_SERVICE_HOST = 'localhost'
DEFAULT_SERVICE_PORT = 9000


def register_service(service_name, service_address, discovery_host,
                     discovery_port):
    socket = Socket(RESPONDENT)
    socket.connect('tcp://%s:%s' % (discovery_host, discovery_port))

    print 'Starting service registration [service: %s %s, discovery: %s:%s]' \
        % (service_name, service_address, discovery_host, discovery_port)

    while True:
        message = socket.recv()
        if message == 'service query':
            socket.send('%s|%s' % (service_name, service_address))


def start_service(service_name, service_protocol, service_port):
    socket = Socket(REP)
    socket.bind('%s://*:%s' % (service_protocol, service_port))

    print 'Starting service %s' % service_name

    while True:
        request = socket.recv()
        print 'Request: %s' % request
        socket.send('The answer is 42')


if __name__ == '__main__':
    discovery_host = os.environ.get('DISCOVERY_HOST', DEFAULT_DISCOVERY_HOST)
    discovery_port = os.environ.get('DISCOVERY_PORT', DEFAULT_DISCOVERY_PORT)
    service_name = os.environ.get('SERVICE_NAME', DEFAULT_SERVICE_NAME)
    service_host = os.environ.get('SERVICE_HOST', DEFAULT_SERVICE_HOST)
    service_port = os.environ.get('SERVICE_PORT', DEFAULT_SERVICE_PORT)
    service_protocol = os.environ.get('SERVICE_PROTOCOL',
                                      DEFAULT_SERVICE_PROTOCOL)

    service_address = '%s://%s:%s' % (service_protocol, service_host,
                                      service_port)

    Thread(target=register_service,
           args=(service_name, service_address, discovery_host,
                 discovery_port)).start()

    start_service(service_name, service_protocol, service_port)

Once again, we configure parameters through environment variables set on a container. Note that we connect to the discovery system with a RESPONDENT socket which then responds to service queries with the service name and address. The service itself uses a REP socket that simply responds to any requests with “The answer is 42,” but it could take any number of forms such as HTTP, raw socket, etc. The full code for this example, including Dockerfiles, can be found on GitHub.

Nanomsg or ZeroMQ?

Based on all the improvements that nanomsg makes on top of ZeroMQ, you might be wondering why you would use the latter at all. Nanomsg is still relatively young. Although it has numerous language bindings, it hasn’t reached the maturity of ZeroMQ which has a thriving development community. ZeroMQ has extensive documentation and other resources to help developers make use of the library, while nanomsg has very little. Doing a quick Google search will give you an idea of the difference (about 500,000 results for ZeroMQ to nanomsg’s 13,500).

That said, nanomsg’s improvements and, in particular, its scalability protocols make it very appealing.
A lot of the strange behaviors that ZeroMQ exposes have been resolved completely or at least mitigated. It’s actively being developed and is quickly gaining more and more traction. Technically, nanomsg has been in beta since March, but it’s starting to look production-ready if it’s not there already.
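To make the priority-routing idea from the New York and London example concrete, here is a small pure-Python simulation. It does not use nanomsg and is not how the library implements the feature. The class name, the hostnames, and the availability callback are assumptions made for illustration. It round-robins among the reachable endpoints of the best priority and fails over to the next priority only when none of them is reachable.

```python
import itertools


class PriorityRouter:
    """Pick an endpoint by priority (1 is best), round-robin within a priority."""

    def __init__(self):
        self._endpoints = {}  # priority -> list of addresses
        self._cursors = {}    # priority -> round-robin counter

    def add(self, address, priority):
        self._endpoints.setdefault(priority, []).append(address)
        self._cursors.setdefault(priority, itertools.count())

    def route(self, is_available):
        """Return the next address to use, or None if nothing is reachable."""
        for priority in sorted(self._endpoints):
            live = [a for a in self._endpoints[priority] if is_available(a)]
            if live:
                return live[next(self._cursors[priority]) % len(live)]
        return None


if __name__ == "__main__":
    router = PriorityRouter()
    router.add("tcp://ny-1:9000", 1)      # hypothetical New York instances
    router.add("tcp://ny-2:9000", 1)
    router.add("tcp://london-1:9000", 2)  # hypothetical failover site
    print(router.route(lambda address: True))
```

With every endpoint reachable, requests from New York only ever see the priority-one addresses. London is chosen once both New York instances report as unavailable.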
May 4, 2015
by Tyler Treat
· 16,040 Views · 1 Like
How To: Neo4j Data Import - Minimal Example
The easiest way to import data from relational or legacy systems, like plain CSV files without headers, into Neo4j with Cypher and a graph model.
April 30, 2015
by Michael Hunger
· 7,765 Views
How to Create Multi-Column PDF Document inside .NET Applications
This technical tip shows how .NET developers can create a multi-column PDF document using Aspose.Pdf for .NET. In magazines and newspapers, articles are usually laid out in multiple columns on a single page, whereas in books text paragraphs mostly run across the whole page from left to right. Many document processing applications, such as Microsoft Word and Adobe Acrobat Writer, allow users to create multiple columns on a single page and then add data to them. Aspose.Pdf for .NET also offers this feature. To create a multi-column PDF file, we can use the Aspose.Pdf.FloatingBox class: its ColumnInfo.ColumnCount property specifies the number of columns inside the FloatingBox, and the ColumnInfo.ColumnSpacing and ColumnInfo.ColumnWidths properties specify the spacing between columns and the column widths, respectively. Please note that FloatingBox is an element inside the Document Object Model and it can have absolute positioning, as opposed to the relative positioning of other elements (i.e. Text, Graph, Image etc). Column spacing is the space between the columns; the default is 1.25cm. If the column width is not specified, Aspose.Pdf for .NET calculates the width of each column automatically according to the page size and column spacing. The code example below demonstrates the creation of two columns with Graph objects (Line) that are added to the paragraphs collection of a FloatingBox, which is then added to the paragraphs collection of a Page instance.
//C# Code Sample Document doc = new Document(); // specify the left margin info for the PDF file doc.PageInfo.Margin.Left = 40; // specify the Right margin info for the PDF file doc.PageInfo.Margin.Right = 40; Page page = doc.Pages.Add(); Aspose.Pdf.Drawing.Graph graph1 = new Aspose.Pdf.Drawing.Graph(500, 2); // Add the line to paraphraphs collection of section object page.Paragraphs.Add(graph1); //specify the coordinates for the line float[] posArr = new float[] { 1, 2, 500, 2 }; Aspose.Pdf.Drawing.Line l1 = new Aspose.Pdf.Drawing.Line(posArr); graph1.Shapes.Add(l1); //Create string variables with text containing html tags string s = "" + " How to Steer Clear of money scams " + ""; //Create text paragraphs containing HTML text HtmlFragment heading_text = new HtmlFragment(s); page.Paragraphs.Add(heading_text); Aspose.Pdf.FloatingBox box = new Aspose.Pdf.FloatingBox(); //Add four columns in the section box.ColumnInfo.ColumnCount = 2; //Set the spacing between the columns box.ColumnInfo.ColumnSpacing = "5"; box.ColumnInfo.ColumnWidths = "105 105"; TextFragment text1 = new TextFragment("By A Googler (The Official Google Blog)"); text1.TextState.FontSize = 8; text1.TextState.LineSpacing = 2; box.Paragraphs.Add(text1); text1.TextState.FontSize = 10; text1.TextState.FontStyle = FontStyles.Italic; // Create a graphs object to draw a line Aspose.Pdf.Drawing.Graph graph2 = new Aspose.Pdf.Drawing.Graph(50, 10); // specify the coordinates for the line float[] posArr2 = new float[] { 1, 10, 100, 10 }; Aspose.Pdf.Drawing.Line l2 = new Aspose.Pdf.Drawing.Line(posArr2); graph2.Shapes.Add(l2); // Add the line to paragraphs collection of section object box.Paragraphs.Add(graph2); TextFragment text2 = new TextFragment(@"Sed augue tortor, sodales id, luctus et, pulvinar ut, eros. Suspendisse vel dolor. Sed quam. Curabitur ut massa vitae eros euismod aliquam. Pellentesque sit amet elit. Vestibulum interdum pellentesque augue. Cras mollis arcu sit amet purus. Donec augue. 
Nam mollis tortor a elit. Nulla viverra nisl vel mauris. Vivamus sapien. nascetur ridiculus mus. Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et,nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et, semper sed, enim nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Sed urna. . Duis convallis ultrices nisi. Maecenas non ligula. Nunc nibh est, tincidunt in, placerat sit amet, vestibulum a, nulla. Praesent porttitor turpis eleifend ante. Morbi sodales.nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Sed urna. . Duis convallis ultrices nisi. Maecenas non ligula. Nunc nibh est, tincidunt in, placerat sit amet, vestibulum a, nulla. Praesent porttitor turpis eleifend ante. 
Morbi sodales."); box.Paragraphs.Add(text2); page.Paragraphs.Add(box); string outFile = "c:/pdftest/Muli-Column.pdf"; //Save the Pdf doc.Save(outFile); ' Load the PDF file Dim doc As Document = New Document("source.pdf") ' Get the first link annotation from first page of document Dim linkAnnot As LinkAnnotation = CType(doc.Pages(1).Annotations(1), LinkAnnotation) ' Modification link: change link destination Dim goToAction As Aspose.Pdf.InteractiveFeatures.GoToAction = CType(linkAnnot.Action, Aspose.Pdf.InteractiveFeatures.GoToAction) ' Specify the destination for link object ' The first parameter is document object, second is destination page number. ' The 5ht argument is zoom factor when displaying the respective page. When using 2, the page will be displayed in 200% zoom goToAction.Destination = New Aspose.Pdf.InteractiveFeatures.XYZExplicitDestination(doc, 1, 1, 2, 2) ' Save the document with updated link doc.Save("PDFLINK_Modified_output.pdf") //VB.NET Code Sample Dim doc As Document = New Document() ' specify the left margin info for the PDF file doc.PageInfo.Margin.Left = 40 ' specify the Right margin info for the PDF file doc.PageInfo.Margin.Right = 40 Dim page As Page = doc.Pages.Add() Dim graph1 As Aspose.Pdf.Drawing.Graph = New Aspose.Pdf.Drawing.Graph(500, 2) ' Add the line to paragraphs collection of section object page.Paragraphs.Add(graph1) 'specify the coordinates for the line Dim posArr() As Single = {1, 2, 500, 2} Dim l1 As Aspose.Pdf.Drawing.Line = New Aspose.Pdf.Drawing.Line(posArr) graph1.Shapes.Add(l1) 'Create string variables with text containing html tags Dim s As String = " How to Steer Clear of money scams " 'Create text paragraphs containing HTML text Dim heading_text As HtmlFragment = New HtmlFragment(s) page.Paragraphs.Add(heading_text) Dim box As Aspose.Pdf.FloatingBox = New Aspose.Pdf.FloatingBox() 'Add four columns in the section box.ColumnInfo.ColumnCount = 2 'Set the spacing between the columns box.ColumnInfo.ColumnSpacing = "5" 
box.ColumnInfo.ColumnWidths = "105 105" Dim text1 As Aspose.Pdf.Text.TextFragment = New Aspose.Pdf.Text.TextFragment("By A Googler (The Official Google Blog)") text1.TextState.FontSize = 8 text1.TextState.LineSpacing = 2 box.Paragraphs.Add(text1) text1.TextState.FontSize = 10 text1.TextState.FontStyle = Aspose.Pdf.Text.FontStyles.Italic ' Create a graphs object to draw a line Dim graph2 As Aspose.Pdf.Drawing.Graph = New Aspose.Pdf.Drawing.Graph(50, 10) ' specify the coordinates for the line Dim posArr2() As Single = {1, 10, 100, 10} Dim l2 As Aspose.Pdf.Drawing.Line = New Aspose.Pdf.Drawing.Line(posArr2) graph2.Shapes.Add(l2) ' Add the line to paragraphs collection of section object box.Paragraphs.Add(graph2) Dim text2 As Aspose.Pdf.Text.TextFragment = New Aspose.Pdf.Text.TextFragment("Sed augue tortor, sodales id, luctus et, pulvinar ut, eros. Suspendisse vel dolor. Sed quam. Curabitur ut massa vitae eros euismod aliquam. Pellentesque sit amet elit. Vestibulum interdum pellentesque augue. Cras mollis arcu sit amet purus. Donec augue. Nam mollis tortor a elit. Nulla viverra nisl vel mauris. Vivamus sapien. nascetur ridiculus mus. Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et,nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et, semper sed, enim nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. 
Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Sed urna. . Duis convallis ultrices nisi. Maecenas non ligula. Nunc nibh est, tincidunt in, placerat sit amet, vestibulum a, nulla. Praesent porttitor turpis eleifend ante. Morbi sodales.nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Sed urna. . Duis convallis ultrices nisi. Maecenas non ligula. Nunc nibh est, tincidunt in, placerat sit amet, vestibulum a, nulla. Praesent porttitor turpis eleifend ante. Morbi sodales.") box.Paragraphs.Add(text2) page.Paragraphs.Add(box) Dim outFile As String = "c:/pdftest/Muli-Column.pdf" 'Save the Pdf doc.Save(outFile)
April 29, 2015
by David Zondray
· 1,444 Views
Diagnosing SST Errors with Percona XtraDB Cluster for MySQL
[This article was written by Stephane Combaudon] State Snapshot Transfer (SST) is used in Percona XtraDB Cluster (PXC) when a new node joins the cluster or to resync a failed node if Incremental State Transfer (IST) is no longer available. SST is triggered automatically but there is no magic: If it is not configured properly, it will not work and new nodes will never be able to join the cluster. Let’s have a look at a few classic issues. Port for SST is not open The donor and the joiner communicate on port 4444, and if the port is closed on one side, SST will always fail. You will see in the error log of the donor that SST is started: [...] 141223 16:08:48 [Note] WSREP: Node 2 (node1) requested state transfer from '*any*'. Selected 0 (node3)(SYNCED) as donor. 141223 16:08:48 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 6) 141223 16:08:48 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 141223 16:08:48 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6'' [...] But then nothing happens, and some time later you will see a bunch of errors: [...] 2014/12/23 16:09:52 socat[2965] E connect(3, AF=2 192.168.234.101:4444, 16): Connection timed out WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 0 1 (20141223 16:09:52.057) WSREP_SST: [ERROR] Cleanup after exit with status:32 (20141223 16:09:52.064) WSREP_SST: [INFO] Cleaning up temporary directories (20141223 16:09:52.068) 141223 16:09:52 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6' [...] 
On the joiner side, you will see a similar sequence: SST is started, then hangs and is finally aborted: [...] 141223 16:08:48 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6) 141223 16:08:48 [Note] WSREP: Requesting state transfer: success, donor: 0 141223 16:08:49 [Note] WSREP: (f9560d0d, 'tcp://0.0.0.0:4567') turning message relay requesting off 141223 16:09:52 [Warning] WSREP: 0 (node3): State transfer to 2 (node1) failed: -32 (Broken pipe) 141223 16:09:52 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort. The solution is of course to make sure that the ports are open on both sides. SST is not correctly configured Sometimes you will see an error like this on the donor: 141223 21:03:15 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.102:4444/xtrabackup_sst' --auth 'sstuser:s3cretzzz' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid 'e63f38f2-8ae6-11e4-a383-46557c71f368:0'' [...] WSREP_SST: [ERROR] innobackupex finished with error: 1. Check /var/lib/mysql//innobackup.backup.log (20141223 21:03:26.973) And if you look at innobackup.backup.log: 41223 21:03:26 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser' (using password: YES). innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2995 main::mysql_connect('abort_on_error', 1) called at /usr//bin/innobackupex line 1530 innobackupex: Error: Failed to connect to MySQL server: DBI connect(';mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock','sstuser',...) failed: Access denied for user 'sstuser'@'localhost' (using password: YES) at /usr//bin/innobackupex line 2979 What happened? 
In the log above, the donor passed 'sstuser:s3cretzzz' as credentials, and MySQL rejected them. The default SST method is xtrabackup-v2, and for it to work you need to specify a username and password in the my.cnf file:

[mysqld]
wsrep_sst_auth=sstuser:s3cret

You also need to create the corresponding MySQL user:

mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY 's3cret';

So you should check that the user has been correctly created in MySQL and that wsrep_sst_auth is set to the same credentials.

Galera versions do not match

Here is another set of errors you may see in the error log of the donor:

141223 21:14:27 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101
141223 21:14:30 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101
141223 21:14:33 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101

Here the issue is that you are trying to connect a node running Galera 2.x to a node running Galera 3.x. This can happen if you mix a PXC 5.5 node and a PXC 5.6 node. The right solution is probably to understand why you ended up with such inconsistent versions and make sure all nodes are using the same Percona XtraDB Cluster version and Galera version. But if you know what you are doing, you can also instruct the node using Galera 3.x that it will communicate with Galera 2.x nodes by specifying in the my.cnf file:

[mysqld]
wsrep_provider_options="socket.checksum=1"

Conclusion

SST can fail for several reasons, and the best way to diagnose the issue is to look at the error logs of both the donor and the joiner. Galera is in general quite verbose, so you can follow the progress of SST on both nodes and see where it fails. Then it is mostly about being able to interpret the error messages.
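The first failure mode above, a blocked SST port, can be checked before a joiner is started. The following is a minimal sketch in Java and is not part of PXC or Galera: the class name SstPortProbe is an illustrative assumption, and the default host and port 4444 are simply taken from the logs above. A timeout, like the socat error shown earlier, typically points to packets being dropped on the way. A refusal usually just means that nothing is listening at that moment, which may well be the case when no SST is in progress.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper: probes a TCP port and reports how the attempt ended.
final class SstPortProbe {

    enum Result { CONNECTED, REFUSED, TIMED_OUT, OTHER_ERROR }

    static Result probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Result.CONNECTED;
        } catch (SocketTimeoutException e) {
            // No answer within the timeout: often a firewall silently dropping packets.
            return Result.TIMED_OUT;
        } catch (ConnectException e) {
            // Most commonly "connection refused": host reachable, nothing listening.
            return Result.REFUSED;
        } catch (IOException e) {
            // Unknown host, no route to host, and similar problems.
            return Result.OTHER_ERROR;
        }
    }

    public static void main(String[] args) {
        // Default address taken from the example logs; pass your own node as the first argument.
        String host = args.length > 0 ? args[0] : "192.168.234.101";
        System.out.println(host + ":4444 -> " + probe(host, 4444, 3000));
    }
}
```

Running the probe from the donor against the joiner, and the other way round, gives a quick indication of which direction is blocked.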
April 27, 2015
by Peter Zaitsev
· 11,858 Views
Fixed Width Sortable Tables Row with jQueryUI
When you use the jQuery UI sortable function on a table, I've noticed that it collapses the width of the row you're dragging, which can lead to a strange user experience. In this tutorial we are going to see how you can use a helper function to set the width of the dragged row back to its original width. Have a look at the demo to see the difference.

jQuery Sortable is part of the jQuery UI library. To give a table sortable rows, all you have to do is apply the sortable method to the parent element of the rows, which would normally be the table itself or, ideally, the table body.

<table>
  <thead>
    <tr><th>Film</th><th>Date</th><th>Rating</th></tr>
  </thead>
  <tbody>
    <tr><td>The Shawshank Redemption</td><td>1994</td><td>9.2</td></tr>
  </tbody>
</table>

Then you can make the table body rows sortable by using the following jQuery code.

$('table tbody').sortable();

One of the options you can pass to the sortable method is helper, which accepts a function that receives the event and the element being sorted and returns the element to display while dragging. We therefore only need a function that pins each cell of the dragged row to its current width and returns the row.

$('table tbody').sortable({
    helper: fixWidthHelper
}).disableSelection();

function fixWidthHelper(e, ui) {
    // Set an explicit width on every cell so the row keeps its layout while dragged
    ui.children().each(function() {
        $(this).width($(this).width());
    });
    return ui;
}
April 27, 2015
by Paul Underwood
· 19,454 Views
Increasing Slow Query Performance with the Parallel Query Execution
[This article was written by Alexander Rubin]

MySQL and scaling up (using more powerful hardware) has always been a hot topic. Originally MySQL did not scale well with multiple CPUs; there were times when InnoDB performed worse with more CPU cores than with fewer. MySQL 5.6 scales significantly better; however, there is still one big limitation: a single SQL query will use only one CPU core (no parallelism). Here is what I mean by that: let’s say we have a complex query which needs to scan millions of rows and may need to create a temporary table; in this case MySQL will not be able to scan the table in multiple threads (even with partitioning), so the single query will not be faster on the more powerful server. On the contrary, a server with more but slower CPUs will show worse performance than a server with fewer but faster CPUs.

To address this issue we can use parallel query execution. Vadim wrote about the PHP asynchronous calls for MySQL. Another way to increase parallelism is to use a “sharding” approach, for example with Shard Query.

I’ve decided to test parallel (asynchronous) query execution with a relatively large table: I’ve used the US Flights Ontime performance database, which was originally used by Vadim in the old post Analyzing air traffic performance. Let’s see how this can help us increase the performance of complex query reports.
Parallel Query Example

To illustrate parallel query execution with MySQL I’ve created the following table:

CREATE TABLE `ontime` (
  `YearD` year(4) NOT NULL,
  `Quarter` tinyint(4) DEFAULT NULL,
  `MonthD` tinyint(4) DEFAULT NULL,
  `DayofMonth` tinyint(4) DEFAULT NULL,
  `DayOfWeek` tinyint(4) DEFAULT NULL,
  `FlightDate` date DEFAULT NULL,
  `UniqueCarrier` char(7) DEFAULT NULL,
  `AirlineID` int(11) DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `TailNum` varchar(50) DEFAULT NULL,
  `FlightNum` varchar(10) DEFAULT NULL,
  `OriginAirportID` int(11) DEFAULT NULL,
  `OriginAirportSeqID` int(11) DEFAULT NULL,
  `OriginCityMarketID` int(11) DEFAULT NULL,
  `Origin` char(5) DEFAULT NULL,
  `OriginCityName` varchar(100) DEFAULT NULL,
  `OriginState` char(2) DEFAULT NULL,
  `OriginStateFips` varchar(10) DEFAULT NULL,
  `OriginStateName` varchar(100) DEFAULT NULL,
  `OriginWac` int(11) DEFAULT NULL,
  `DestAirportID` int(11) DEFAULT NULL,
  `DestAirportSeqID` int(11) DEFAULT NULL,
  `DestCityMarketID` int(11) DEFAULT NULL,
  `Dest` char(5) DEFAULT NULL,
  -- ... (a number of fields removed)
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`),
  KEY `YearD` (`YearD`),
  KEY `Carrier` (`Carrier`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I then loaded 26 years of data into it. The table is 56G with ~152M rows.

Software: Percona 5.6.15-63.0.
Hardware: Supermicro; X8DTG-D; 48G of RAM; 24xIntel(R) Xeon(R) CPU L5639 @ 2.13GHz, 1xSSD drive (250G)

So we have 24 relatively slow CPUs.

Simple query

Now we can run some queries. The first query is very simple: count all flights per year (in the US):

select yeard, count(*) from ontime group by yeard

As we have an index on YearD, the query will use the index:

mysql> explain select yeard, count(*) from ontime group by yeard\G
*************************** 1. 
row *************************** id: 1 select_type: SIMPLE table: ontime type: index possible_keys: YearD,comb1 key: YearD key_len: 1 ref: NULL rows: 148046200 Extra: Using index 1 row in set (0.00 sec) The query is simple, however, it will have to scan 150M rows. Here is the results of the query (cached): mysql> select yeard, count(*) from ontime group by yeard; +-------+----------+ | yeard | count(*) | +-------+----------+ | 1988 | 5202096 | | 1989 | 5041200 | | 1990 | 5270893 | | 1991 | 5076925 | | 1992 | 5092157 | | 1993 | 5070501 | | 1994 | 5180048 | | 1995 | 5327435 | | 1996 | 5351983 | | 1997 | 5411843 | | 1998 | 5384721 | | 1999 | 5527884 | | 2000 | 5683047 | | 2001 | 5967780 | | 2002 | 5271359 | | 2003 | 6488540 | | 2004 | 7129270 | | 2005 | 7140596 | | 2006 | 7141922 | | 2007 | 7455458 | | 2008 | 7009726 | | 2009 | 6450285 | | 2010 | 6450117 | | 2011 | 6085281 | | 2012 | 6096762 | | 2013 | 5349447 | +-------+----------+ 26 rows in set (54.10 sec) The query took 54 seconds and utilized only 1 CPU core. However, this query is perfect for running in parallel. We can run 26 parallel queries, each will count its own year. 
I’ve used the following shell script to run the queries in background: #!/bin/bash date for y in {1988..2013} do sql="select yeard, count(*) from ontime where yeard=$y" mysql -vvv ontime -e "$sql" &>par_sql1/$y.log & done wait date Here are the results: par_sql1/1988.log:1 row in set (3.70 sec) par_sql1/1989.log:1 row in set (4.08 sec) par_sql1/1990.log:1 row in set (4.59 sec) par_sql1/1991.log:1 row in set (4.26 sec) par_sql1/1992.log:1 row in set (4.54 sec) par_sql1/1993.log:1 row in set (2.78 sec) par_sql1/1994.log:1 row in set (3.41 sec) par_sql1/1995.log:1 row in set (4.87 sec) par_sql1/1996.log:1 row in set (4.41 sec) par_sql1/1997.log:1 row in set (3.69 sec) par_sql1/1998.log:1 row in set (3.56 sec) par_sql1/1999.log:1 row in set (4.47 sec) par_sql1/2000.log:1 row in set (4.71 sec) par_sql1/2001.log:1 row in set (4.81 sec) par_sql1/2002.log:1 row in set (4.19 sec) par_sql1/2003.log:1 row in set (4.04 sec) par_sql1/2004.log:1 row in set (5.12 sec) par_sql1/2005.log:1 row in set (5.10 sec) par_sql1/2006.log:1 row in set (4.93 sec) par_sql1/2007.log:1 row in set (5.29 sec) par_sql1/2008.log:1 row in set (5.59 sec) par_sql1/2009.log:1 row in set (4.44 sec) par_sql1/2010.log:1 row in set (4.91 sec) par_sql1/2011.log:1 row in set (5.08 sec) par_sql1/2012.log:1 row in set (4.85 sec) par_sql1/2013.log:1 row in set (4.56 sec) Complex Query Now we can try more complex query. Lets imagine we want to find out which airlines have maximum delays for the flights inside continental US during the business days from 1988 to 2009 (I was trying to come up with the complex query with multiple conditions in the where clause). 
select min(yeard), max(yeard), Carrier, count(*) as cnt, sum(ArrDelayMinutes>30) as flights_delayed, round(sum(ArrDelayMinutes>30)/count(*),2) as rate FROM ontime WHERE DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') and DestState not in ('AK', 'HI', 'PR', 'VI') and flightdate < '2010-01-01' GROUP by carrier HAVING cnt > 100000 and max(yeard) > 1990 ORDER by rate DESC As the query has “group by” and “order by” plus multiple ranges in the where clause it will have to create a temporary table: id: 1 select_type: SIMPLE table: ontime type: index possible_keys: comb1 key: comb1 key_len: 9 ref: NULL rows: 148046200 Extra: Using where; Using temporary; Using filesort (for this query I’ve created the combined index: KEY comb1 (Carrier,YearD,ArrDelayMinutes) to increase performance) The query runs in ~15 minutes: +------------+------------+---------+----------+-----------------+------+ | min(yeard) | max(yeard) | Carrier | cnt | flights_delayed | rate | +------------+------------+---------+----------+-----------------+------+ | 2003 | 2009 | EV | 1454777 | 237698 | 0.16 | | 2006 | 2009 | XE | 1016010 | 152431 | 0.15 | | 2006 | 2009 | YV | 740608 | 110389 | 0.15 | | 2003 | 2009 | B6 | 683874 | 103677 | 0.15 | | 2003 | 2009 | FL | 1082489 | 158748 | 0.15 | | 2003 | 2005 | DH | 501056 | 69833 | 0.14 | | 2001 | 2009 | MQ | 3238137 | 448037 | 0.14 | | 2003 | 2006 | RU | 1007248 | 126733 | 0.13 | | 2004 | 2009 | OH | 1195868 | 160071 | 0.13 | | 2003 | 2006 | TZ | 136735 | 16496 | 0.12 | | 1988 | 2009 | UA | 9593284 | 1197053 | 0.12 | | 1988 | 2009 | AA | 10600509 | 1185343 | 0.11 | | 1988 | 2001 | TW | 2659963 | 280741 | 0.11 | | 1988 | 2009 | CO | 6029149 | 673863 | 0.11 | | 2007 | 2009 | 9E | 577244 | 59440 | 0.10 | | 1988 | 2009 | DL | 11869471 | 1156267 | 0.10 | | 1988 | 2009 | NW | 7601727 | 725460 | 0.10 | | 1988 | 2009 | AS | 1506003 | 146920 | 0.10 | | 2003 | 2009 | OO | 2654259 | 257069 | 0.10 | | 1988 | 2009 | US | 10276941 | 991016 | 0.10 | | 
1988 | 1991 | PA | 206841 | 19465 | 0.09 | | 1988 | 2005 | HP | 2607603 | 235675 | 0.09 | | 1988 | 2009 | WN | 12722174 | 1107840 | 0.09 | | 2005 | 2009 | F9 | 307569 | 28679 | 0.09 | +------------+------------+---------+----------+-----------------+------+ 24 rows in set (15 min 56.40 sec) Now we can split this query and run the 31 queries (=31 distinct airlines in this table) in parallel. I have used the following script: date for c in '9E' 'AA' 'AL' 'AQ' 'AS' 'B6' 'CO' 'DH' 'DL' 'EA' 'EV' 'F9' 'FL' 'HA' 'HP' 'ML' 'MQ' 'NW' 'OH' 'OO' 'PA' 'PI' 'PS' 'RU' 'TW' 'TZ' 'UA' 'US' 'WN' 'XE' 'YV' do sql=" select min(yeard), max(yeard), Carrier, count(*) as cnt, sum(ArrDelayMinutes>30) as flights_delayed, round(sum(ArrDelayMinutes>30)/count(*),2) as rate FROM ontime WHERE DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') and DestState not in ('AK', 'HI', 'PR', 'VI') and flightdate < '2010-01-01' and carrier = '$c'" mysql -uroot -vvv ontime -e "$sql" &>par_sql_complex/$c.log & done wait date In this case we will also avoid creating temporary table (as we have an index which starts with carrier). 
Results: total time is 5 min 47 seconds (3x faster) Start: 15:41:02 EST 2013 End: 15:46:49 EST 2013 Per query statistics: par_sql_complex/9E.log:1 row in set (44.47 sec) par_sql_complex/AA.log:1 row in set (5 min 41.13 sec) par_sql_complex/AL.log:1 row in set (15.81 sec) par_sql_complex/AQ.log:1 row in set (14.52 sec) par_sql_complex/AS.log:1 row in set (2 min 43.01 sec) par_sql_complex/B6.log:1 row in set (1 min 26.06 sec) par_sql_complex/CO.log:1 row in set (3 min 58.07 sec) par_sql_complex/DH.log:1 row in set (31.30 sec) par_sql_complex/DL.log:1 row in set (5 min 47.07 sec) par_sql_complex/EA.log:1 row in set (28.58 sec) par_sql_complex/EV.log:1 row in set (2 min 6.87 sec) par_sql_complex/F9.log:1 row in set (46.18 sec) par_sql_complex/FL.log:1 row in set (1 min 30.83 sec) par_sql_complex/HA.log:1 row in set (39.42 sec) par_sql_complex/HP.log:1 row in set (2 min 45.57 sec) par_sql_complex/ML.log:1 row in set (4.64 sec) par_sql_complex/MQ.log:1 row in set (2 min 22.55 sec) par_sql_complex/NW.log:1 row in set (4 min 26.67 sec) par_sql_complex/OH.log:1 row in set (1 min 9.67 sec) par_sql_complex/OO.log:1 row in set (2 min 14.97 sec) par_sql_complex/PA.log:1 row in set (17.62 sec) par_sql_complex/PI.log:1 row in set (14.52 sec) par_sql_complex/PS.log:1 row in set (3.46 sec) par_sql_complex/RU.log:1 row in set (40.14 sec) par_sql_complex/TW.log:1 row in set (2 min 32.32 sec) par_sql_complex/TZ.log:1 row in set (14.16 sec) par_sql_complex/UA.log:1 row in set (4 min 55.18 sec) par_sql_complex/US.log:1 row in set (4 min 38.08 sec) par_sql_complex/WN.log:1 row in set (4 min 56.12 sec) par_sql_complex/XE.log:1 row in set (24.21 sec) par_sql_complex/YV.log:1 row in set (20.82 sec) As we can see there are large airlines (like AA, UA, US, DL, etc) which took most of the time. In this case the load will not be distributed evenly as in the previous example; however, by running the query in parallel we have got 3x times better response time on this server. 
CPU utilization: Cpu3 : 22.0%us, 1.2%sy, 0.0%ni, 74.4%id, 2.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 16.0%us, 0.0%sy, 0.0%ni, 84.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 39.0%us, 1.2%sy, 0.0%ni, 56.1%id, 3.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 33.3%us, 0.0%sy, 0.0%ni, 51.9%id, 13.6%wa, 0.0%hi, 1.2%si, 0.0%st Cpu7 : 33.3%us, 1.2%sy, 0.0%ni, 48.8%id, 16.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu8 : 24.7%us, 0.0%sy, 0.0%ni, 60.5%id, 14.8%wa, 0.0%hi, 0.0%si, 0.0%st Cpu9 : 24.4%us, 0.0%sy, 0.0%ni, 56.1%id, 19.5%wa, 0.0%hi, 0.0%si, 0.0%st Cpu10 : 40.7%us, 0.0%sy, 0.0%ni, 56.8%id, 2.5%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 19.5%us, 1.2%sy, 0.0%ni, 65.9%id, 12.2%wa, 0.0%hi, 1.2%si, 0.0%st Cpu12 : 40.2%us, 1.2%sy, 0.0%ni, 56.1%id, 2.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 82.7%us, 0.0%sy, 0.0%ni, 17.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 55.4%us, 0.0%sy, 0.0%ni, 43.4%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 86.6%us, 0.0%sy, 0.0%ni, 13.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu16 : 61.0%us, 1.2%sy, 0.0%ni, 37.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu17 : 29.3%us, 1.2%sy, 0.0%ni, 69.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu18 : 18.8%us, 0.0%sy, 0.0%ni, 52.5%id, 28.8%wa, 0.0%hi, 0.0%si, 0.0%st Cpu19 : 14.3%us, 1.2%sy, 0.0%ni, 57.1%id, 27.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu20 : 12.3%us, 0.0%sy, 0.0%ni, 59.3%id, 28.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu21 : 10.7%us, 0.0%sy, 0.0%ni, 76.2%id, 11.9%wa, 0.0%hi, 1.2%si, 0.0%st Cpu22 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu23 : 10.8%us, 2.4%sy, 0.0%ni, 71.1%id, 15.7%wa, 0.0%hi, 0.0%si, 0.0%st Note that in case of “order by” we will need to manually sort the results, however, sorting 10-100 rows will be fast. Conclusion Splitting a complex report into multiple queries and running it in parallel (asynchronously) can increase performance (3x to 10x in the above example) and will better utilize modern hardware. It is also possible to split the queries between multiple MySQL servers (i.e. 
MySQL slave servers) to further increase scalability (will require more coding).
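Because the per-carrier queries drop the GROUP BY, HAVING and ORDER BY clauses of the original report, those steps have to be applied on the client once all partial results are in. Here is a minimal sketch of that final step in Java, assuming the one-row results have already been parsed into objects. The class CarrierReport, its Row type and the 'ZZ' row are hypothetical, while the other three sample rows are taken from the result table above. Note that it sorts on the unrounded rate, whereas the original query sorts on the value rounded to two decimals.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical client-side "reduce" step for the per-carrier queries.
final class CarrierReport {

    /** One row returned by a single per-carrier query. */
    static final class Row {
        final String carrier;
        final int minYear;
        final int maxYear;
        final long flights;
        final long delayed;

        Row(String carrier, int minYear, int maxYear, long flights, long delayed) {
            this.carrier = carrier;
            this.minYear = minYear;
            this.maxYear = maxYear;
            this.flights = flights;
            this.delayed = delayed;
        }

        double rate() {
            return flights == 0 ? 0.0 : (double) delayed / flights;
        }
    }

    /** Applies HAVING cnt > 100000 AND max(yeard) > 1990, then ORDER BY rate DESC. */
    static List<Row> finish(List<Row> partials) {
        return partials.stream()
                .filter(r -> r.flights > 100_000 && r.maxYear > 1990)
                .sorted(Comparator.comparingDouble(Row::rate).reversed())
                .collect(Collectors.toList());
    }

    static List<Row> sample() {
        return Arrays.asList(
                new Row("WN", 1988, 2009, 12_722_174L, 1_107_840L),
                new Row("ZZ", 1988, 1989, 50_000L, 4_000L), // hypothetical row, filtered out by HAVING
                new Row("EV", 2003, 2009, 1_454_777L, 237_698L),
                new Row("PA", 1988, 1991, 206_841L, 19_465L));
    }

    public static void main(String[] args) {
        for (Row r : finish(sample())) {
            System.out.printf("%s %d-%d cnt=%d delayed=%d rate=%.2f%n",
                    r.carrier, r.minYear, r.maxYear, r.flights, r.delayed, r.rate());
        }
    }
}
```

With only a few dozen rows to filter and sort, this step costs next to nothing compared with the queries themselves.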
April 25, 2015
by Peter Zaitsev
· 13,187 Views
Agrona Event Counters
Efficient open source event counters from the Agrona library.

Agrona

The Agrona library is an open source Java library of utility code. Unlike libraries such as Google Guava or Apache Commons, which are general purpose Java utility libraries, Agrona is targeted at providing high performance code. It initially consists of code from the open source Aeron messaging library.

Event Counters

One of the features of the Agrona library is the event counters framework. One of the design goals of Aeron was to be easy to monitor. We wanted to make sure that people could easily check up on what Aeron is doing with services such as Nagios, internally written monitoring software or just from the command line. Writing integrations with many services is a herculean task in and of itself, but we definitely wanted to be able to expose an API. We also didn't want to incorporate large third-party dependencies, any allocation-heavy code or anything whose performance we couldn't control. This meant that we were going to have to write our own event counters rather than using something like the Coda Hale metrics framework. Our requirements for monitoring were very simple though:

  • Update or increment the counter's value.
  • Read or write the counter value from a different thread or process.
  • No garbage creation after initial setup.
  • Labels should be associated with each counter value for readability's sake.

Design

A thread-safe update of a long value is a very simple operation and is already supported in Java through the AtomicLong class. The problem with using an AtomicLong as your event counter is that an external program, running in a different process, can't read the value from your Java heap. Consequently Agrona's event counters are allocated in an off-heap buffer. This can be placed on a memory-mapped file, which means it can be shared between two different processes. In order to give our counters names we store a table of name and counter id entries on another buffer. This can be placed on the same memory-mapped file for convenience.

API

The CountersManager is responsible for allocating event counters. It needs to be instantiated with the buffers upon which to store the event counters and their labels. Here is an example of how to instantiate a counter from the counters manager.

AtomicCounter conductorProxyFails = countersManager.newCounter("Failed offers to DriverConductorProxy");

You can also iterate over the current counter names and their ids. Here is some code that uses that to print a table of the event counter values:

countersManager.forEach((id, label) -> {
    final int offset = CountersManager.counterOffset(id);
    final long value = valuesBuffer.getLongVolatile(offset);
    System.out.format("%3d: %,20d - %s\n", id, value, label);
});

Each instance of AtomicCounter represents one event counter. Here are some examples of using the atomic counter in code.

// Increment the counter in a thread-safe manner
conductorProxyFails.increment();
// Increment the counter if you're only writing from a single thread
conductorProxyFails.orderedIncrement();
// Atomically add 5 to the counter value
conductorProxyFails.add(5);
// Reset the counter
conductorProxyFails.set(0);

Conclusions

I've just gone through a few simple examples of how to use the event counters from Agrona, which hopefully you've found useful. This isn't the only code in Agrona though: there are utilities for agents and for executing timing events, as well as collections such as queues, ring buffers and hash maps. We're also expanding the library, which is already on Maven Central. Currently documentation is a little bit thin on the ground, but contributions are always welcome. Thanks to Martin Thompson and Chris West for feedback on this blog post.
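The design idea described above, counters that live in a memory-mapped file so that another process can read them, can be illustrated with the JDK alone. This is a sketch of the general technique and not Agrona's implementation or API: the class MappedCounters and its methods are hypothetical, it needs Java 9 or later for VarHandle, and it stores only values, not labels.

```java
import java.io.IOException;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration: long counters stored off-heap in a memory-mapped file.
final class MappedCounters {

    // Gives atomic and volatile access to 8-byte slots of a ByteBuffer.
    private static final VarHandle LONG =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    private final ByteBuffer values;

    private MappedCounters(ByteBuffer values) {
        this.values = values;
    }

    /** Maps (and creates if necessary) a file holding counterCount 8-byte counters. */
    static MappedCounters map(Path file, int counterCount) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            ByteBuffer buffer = channel.map(
                    FileChannel.MapMode.READ_WRITE, 0, (long) counterCount * Long.BYTES);
            return new MappedCounters(buffer);
        }
    }

    /** Atomically adds one to the counter and returns the new value. */
    long increment(int id) {
        return (long) LONG.getAndAdd(values, id * Long.BYTES, 1L) + 1;
    }

    /** Reads the counter with volatile semantics. */
    long get(int id) {
        return (long) LONG.getVolatile(values, id * Long.BYTES);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "counters-demo.dat");
        MappedCounters counters = map(file, 8);
        System.out.println("counter 0 is now " + counters.increment(0));
    }
}
```

A monitoring tool in another process could map the same file and call get without touching the writer's heap, and no objects are allocated per update once the mapping exists.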
April 23, 2015
by Richard Warburton
· 5,851 Views
AutoCompleteTextBox in C# Windows Form Application
In this article, we will learn how to create an AutoCompleteTextBox using a C# Windows Forms application. In my previous article, we learned How to Search Records in DataGridView Using C#.

Let's begin. Create a new Windows Forms application. Drop a Label and a TextBox control from the ToolBox. Now go to the code-behind file (.cs) and add the following code:

using System;
using System.Windows.Forms;

namespace AutoCompleteTextBoxDemo
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // AutoCompleteData method
        private void autoCompleteData()
        {
            // Set AutoCompleteSource property of txt_StateName as CustomSource
            txt_StateName.AutoCompleteSource = AutoCompleteSource.CustomSource;
            // Set AutoCompleteMode property of txt_StateName as SuggestAppend.
            // SuggestAppend applies both Suggest and Append
            txt_StateName.AutoCompleteMode = AutoCompleteMode.SuggestAppend;
            txt_StateName.AutoCompleteCustomSource.AddRange(new string[] { "Maharashtra", "Andhra Pradesh", "Assam", "Punjab", "Arunachal Pradesh", "Bihar", "Goa", "Gujarat", "Haryana" });
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            autoCompleteData();
        }
    }
}

In the preceding code, we set the AutoCompleteSource, AutoCompleteMode and AutoCompleteCustomSource properties of the TextBox named txt_StateName so that it automatically completes the input string.

AutoComplete TextBox using a Database:

In this example, we will suggest/append the data in the TextBox (txt_StateName) from a database. For demonstration, I have created a database named Sample. Add a table, tbl_State. The following is the table schema for creating tbl_State.
Add the following lines of code:

using System;
using System.Windows.Forms;
using System.Data.SqlClient;

namespace AutoCompleteTextBoxDemo
{
    public partial class Form2 : Form
    {
        public Form2()
        {
            InitializeComponent();
        }

        private void autoCompleteData()
        {
            // AutoCompleteStringCollection contains a collection of strings to use for the
            // auto-complete feature on certain Windows Forms controls.
            AutoCompleteStringCollection autoCompleteCollection = new AutoCompleteStringCollection();

            // The using blocks close the connection, command and reader even if an exception is thrown
            using (SqlConnection con = new SqlConnection("Data Source=.;Initial Catalog=Sample;Integrated Security=true;"))
            using (SqlCommand com = new SqlCommand("Select State from tbl_State", con))
            {
                con.Open();
                using (SqlDataReader rdr = com.ExecuteReader())
                {
                    while (rdr.Read())
                    {
                        autoCompleteCollection.Add(rdr.GetString(0));
                    }
                }
            }

            // Set AutoCompleteSource property of txt_StateName as CustomSource
            txt_StateName.AutoCompleteSource = AutoCompleteSource.CustomSource;
            // Set AutoCompleteMode property of txt_StateName as SuggestAppend.
            // SuggestAppend applies both Suggest and Append
            txt_StateName.AutoCompleteMode = AutoCompleteMode.SuggestAppend;
            txt_StateName.AutoCompleteCustomSource = autoCompleteCollection;
        }

        // Form2_Load event
        private void Form2_Load(object sender, EventArgs e)
        {
            autoCompleteData();
        }
    }
}

Hope you like it. Thanks.
April 23, 2015
by Anoop Kumar Sharma
· 7,586 Views
How to Work with Merged Cells in Word Documents Table inside Android Apps
This technical tip shows how developers can work with merged cells in Word documents inside Android applications. Several cells in a table can be merged together into a single cell. This is useful when certain rows require a title or large blocks of text which span the width of the table. This can only be achieved by merging cells in the table into a single cell. Aspose.Words supports merged cells when working with all input formats, including when importing HTML content.

In Aspose.Words, merged cells are represented by CellFormat.HorizontalMerge and CellFormat.VerticalMerge. The CellFormat.HorizontalMerge property describes whether the cell is part of a horizontal merge of cells. Likewise, the CellFormat.VerticalMerge property describes whether the cell is part of a vertical merge of cells. The values of these properties define the merge behavior of cells. The first cell in a sequence of merged cells has CellMerge.FIRST. Any subsequent merged cells have CellMerge.PREVIOUS. A cell which is not merged has CellMerge.NONE.

Sometimes when you load an existing document, cells in a table will appear merged. However, these can in fact be one long cell. Microsoft Word is known to export merged cells in this way at times. This can cause confusion when attempting to work with individual cells. There appears to be no particular pattern as to when this happens.

//Checking if a Cell is Merged
// Prints the horizontal and vertical merge type of a cell.
public void checkCellsMerged() throws Exception {
    Document doc = new Document(getMyDir() + "Table.MergedCells.doc");
    // Retrieve the first table in the document.
Table table = (Table)doc.getChild(NodeType.TABLE, 0, true); for (Row row : table.getRows()) { for (Cell cell : row.getCells()) { System.out.println(printCellMergeType(cell)); } } } public String printCellMergeType(Cell cell) { boolean isHorizontallyMerged = cell.getCellFormat().getHorizontalMerge() != CellMerge.NONE; boolean isVerticallyMerged = cell.getCellFormat().getVerticalMerge() != CellMerge.NONE; String cellLocation = MessageFormat.format("R{0}, C{1}", cell.getParentRow().getParentTable().indexOf(cell.getParentRow()) + 1, cell.getParentRow().indexOf(cell) + 1); if (isHorizontallyMerged && isVerticallyMerged) return MessageFormat.format("The cell at {0} is both horizontally and vertically merged", cellLocation); else if (isHorizontallyMerged) return MessageFormat.format("The cell at {0} is horizontally merged.", cellLocation); else if (isVerticallyMerged) return MessageFormat.format("The cell at {0} is vertically merged", cellLocation); else return MessageFormat.format("The cell at {0} is not merged", cellLocation); } //Merging Cells in a Table //Creates a table with two rows with cells in the first row horizontally merged. Document doc = new Document(); DocumentBuilder builder = new DocumentBuilder(doc); builder.insertCell(); builder.getCellFormat().setHorizontalMerge(CellMerge.FIRST); builder.write("Text in merged cells."); builder.insertCell(); // This cell is merged to the previous and should be empty. builder.getCellFormat().setHorizontalMerge(CellMerge.PREVIOUS); builder.endRow(); builder.insertCell(); builder.getCellFormat().setHorizontalMerge(CellMerge.NONE); builder.write("Text in one cell."); builder.insertCell(); builder.write("Text in another cell."); builder.endRow(); builder.endTable(); //Example: Merging Cells Vertically //Creates a table with two columns with cells merged vertically in the first column. 
Document doc = new Document(); DocumentBuilder builder = new DocumentBuilder(doc); builder.insertCell(); builder.getCellFormat().setVerticalMerge(CellMerge.FIRST); builder.write("Text in merged cells."); builder.insertCell(); builder.getCellFormat().setVerticalMerge(CellMerge.NONE); builder.write("Text in one cell"); builder.endRow(); builder.insertCell(); // This cell is vertically merged to the cell above and should be empty. builder.getCellFormat().setVerticalMerge(CellMerge.PREVIOUS); builder.insertCell(); builder.getCellFormat().setVerticalMerge(CellMerge.NONE); builder.write("Text in another cell"); builder.endRow(); builder.endTable(); //Merging all Cells in a Range //A method which merges all cells of a table in the specified range of cells /** * Merges the range of cells found between the two specified cells both horizontally and vertically. Can span over multiple rows. */ public static void mergeCells(Cell startCell, Cell endCell) { Table parentTable = startCell.getParentRow().getParentTable(); // Find the row and cell indices for the start and end cell. Point startCellPos = new Point(startCell.getParentRow().indexOf(startCell), parentTable.indexOf(startCell.getParentRow())); Point endCellPos = new Point(endCell.getParentRow().indexOf(endCell), parentTable.indexOf(endCell.getParentRow())); // Create the range of cells to be merged based off these indices. Inverse each index if the end cell if before the start cell. Rectangle mergeRange = new Rectangle(Math.min(startCellPos.x, endCellPos.x), Math.min(startCellPos.y, endCellPos.y), Math.abs(endCellPos.x - startCellPos.x) + 1, Math.abs(endCellPos.y - startCellPos.y) + 1); for (Row row : parentTable.getRows()) { for(Cell cell : row.getCells()) { Point currentPos = new Point(row.indexOf(cell), parentTable.indexOf(row)); // Check if the current cell is inside our merge range then merge it. 
if (mergeRange.contains(currentPos)) { if (currentPos.x == mergeRange.x) cell.getCellFormat().setHorizontalMerge(CellMerge.FIRST); else cell.getCellFormat().setHorizontalMerge(CellMerge.PREVIOUS); if (currentPos.y == mergeRange.y) cell.getCellFormat().setVerticalMerge(CellMerge.FIRST); else cell.getCellFormat().setVerticalMerge(CellMerge.PREVIOUS); } } } } //Merging Cells between Two Cells // Merges the range of cells between the two specified cells // We want to merge the range of cells found inbetween these two cells. Cell cellStartRange = table.getRows().get(2).getCells().get(2); Cell cellEndRange = table.getRows().get(3).getCells().get(3); // Merge all the cells between the two specified cells into one. mergeCells(cellStartRange, cellEndRange);
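
The index arithmetic in mergeCells can be tried out without Aspose.Words. What follows is a minimal sketch in plain Java, not the library's API: the MergeRangeSketch class, the Merge enum and the flagsForRange method are hypothetical names introduced only for illustration, and plain integer bounds stand in for the Point and Rectangle objects. It computes which horizontal and vertical flag each cell of a fresh grid would receive for a given start and end cell, assuming every cell outside the range keeps the value NONE.

```java
class MergeRangeSketch {
    // Hypothetical stand-in for the CellMerge values; not the Aspose.Words type.
    enum Merge { NONE, FIRST, PREVIOUS }

    /**
     * Returns two grids of flags for a table of the given size:
     * index 0 holds the horizontal flags, index 1 the vertical flags.
     * Row and column indices are zero-based, as in the mergeCells method.
     */
    static Merge[][][] flagsForRange(int rows, int cols,
                                     int startRow, int startCol,
                                     int endRow, int endCol) {
        // Normalise the bounds so the start and end cell can be given in either order.
        int top = Math.min(startRow, endRow);
        int bottom = Math.max(startRow, endRow);
        int left = Math.min(startCol, endCol);
        int right = Math.max(startCol, endCol);

        Merge[][] horizontal = new Merge[rows][cols];
        Merge[][] vertical = new Merge[rows][cols];

        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                boolean inside = r >= top && r <= bottom && c >= left && c <= right;
                // The leftmost column of the range starts each horizontal merge.
                horizontal[r][c] = !inside ? Merge.NONE
                        : (c == left ? Merge.FIRST : Merge.PREVIOUS);
                // The top row of the range starts each vertical merge.
                vertical[r][c] = !inside ? Merge.NONE
                        : (r == top ? Merge.FIRST : Merge.PREVIOUS);
            }
        }
        return new Merge[][][] { horizontal, vertical };
    }

    public static void main(String[] args) {
        // Same indices as the example above: rows 2 to 3, cells 2 to 3.
        Merge[][][] flags = flagsForRange(5, 5, 2, 2, 3, 3);
        for (int r = 0; r < 5; r++) {
            StringBuilder line = new StringBuilder();
            for (int c = 0; c < 5; c++) {
                line.append(flags[0][r][c]).append('/').append(flags[1][r][c]).append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

Each printed entry is the horizontal flag followed by the vertical flag of one cell. Only the top-left cell of the range is FIRST in both directions, which matches the rule that the first cell in a sequence of merged cells carries the content.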
April 22, 2015
by David Zondray
· 6,385 Views