Databases Resources

The Latest Databases Topics

Efficient Cassandra Write Pattern for Micro-Batching

The best way to write to a Cassandra cluster are concurrent asynchronous writes. In cases where data exhibits strong temporal locality, speed can be improved.

May 20, 2015

by John Georgiadis

· 35,106 Views · 1 Like

Data Locality w/ Cassandra : How to Scan the Local Token Range of a Table...

I'm working on a mechanism that will allow HPCC to access data stored in Cassandra with data locality, leveraging the Java streaming capabilities from HPCC.

May 18, 2015

by Brian O' Neill

· 14,404 Views · 1 Like

Integrating External APIs into your Meteor.js Application

Meteor itself does not rely on REST APIs, but it can easily access data from other services. This article is an excerpt from the book Meteor in Action and explains how you can integrate third-party data into your applications by accessing RESTful URLs from the server-side. Many applications rely on external APIs to retrieve data. Getting information regarding your friends from Facebook, looking up the current weather in your area, or simply retrieving an avatar image from another website – there are endless uses for integrating additional data. They all share a common challenge: APIs must be called from the server, but an API usually takes longer than executing the method itself. You need to ensure that the result gets back to the client – even if it takes a couple of seconds. Let’s talk about how to integrate an external API via HTTP. Based on the IP address of a visitor, you can tell various information about their current location, e.g., coordinates, city or timezone. There is a simple API that takes an IPv4 address and returns all these tidbits as a JSON object. The API is called Telize. Making RESTful calls with the http package In order to communicate with RESTful external APIs such as Telize, you need to add the http package: meteor add http While the http package allows you to make HTTP calls from both client and server, the API call in this example will be performed from the server only. Many APIs require you to provide an ID as well as a secret key to identify the application that makes an API request. In those cases you should always run your requests from the server. That way you never have to share secret keys with clients. Let's look at a graphic to explain the basic concept. A user requests location information for an IP address (step 1). The client application calls a server method called geoJsonforIp (step 2) that makes an (asynchronous) call to the external API using the HTTP.get() method (step 3). The response (step 4) is a JSON object with information regarding the geographic location associated with an IP address, which gets sent back to the client via a callback (step 5). Using a synchronous method to query an API Let’s add a method that queries telize.com for a given IP address as shown in the following listing. This includes only the bare essentials for querying an API for now. Remember: This code belongs in a server-side only file or inside a if (Meteor.isServer) {} block. Meteor.methods({ // The method expects a valid IPv4 address 'geoJsonForIp': function (ip) { console.log('Method.geoJsonForIp for', ip); // Construct the API URL var apiUrl = 'http://www.telize.com/geoip/' + ip; // query the API var response = HTTP.get(apiUrl).data; return response; } }); Once the method is available on the server, querying the location of an IP works simply by calling the method with a callback from the client: Meteor.call('geoJsonForIp', '8.8.8.8', function(err,res){ console.log(res); }); While this solution appears to be working fine there are two major flaws to this approach: If the API is slow to respond requests will start queuing up. Should the API return an error there is no way to return it back to the UI. To address the issue of queuing, you can add an unblock() statement to the method: this.unblock(); Calling an external API should always be done asynchronously. That way you can also return possible error values back to the browser, which will solve the second issue. Let’s create a dedicated function for calling the API asynchronously to keep the method itself clean. Using an asynchronous method to call an API The listing below shows how to issue an HTTP.get call and return the result via a callback. It also includes error handling that can be shown on the client. var apiCall = function (apiUrl, callback) { // try…catch allows you to handle errors try { var response = HTTP.get(apiUrl).data; // A successful API call returns no error // but the contents from the JSON response callback(null, response); } catch (error) { // If the API responded with an error message and a payload if (error.response) { var errorCode = error.response.data.code; var errorMessage = error.response.data.message; // Otherwise use a generic error message } else { var errorCode = 500; var errorMessage = 'Cannot access the API'; } // Create an Error object and return it via callback var myError = new Meteor.Error(errorCode, errorMessage); callback(myError, null); } } Inside a try…catch block, you can differentiate between a successful API call (the try block) and an error case (the catch block). A successful call may return null for the error object of the callback, an error will return only an error object and null for the actual response. There are different types of errors and you want to differentiate between a problem with accessing the API and an API call that got an error inside the returned response. This is what the if statement checks for – in case the error object has a response property both code and message for the error should be taken from it; otherwise you can display a generic error 500 that the API could not be accessed. Each case, success and failure, returns a callback that can be passed back to the UI. In order to make the API call asynchronous you need to update the method as shown in the next code snippet. The improved code unblocks the method and wraps the API call in a wrapAsync function. Meteor.methods({ 'geoJsonForIp': function (ip) { // avoid blocking other method calls from the same client this.unblock(); var apiUrl = 'http://www.telize.com/geoip/' + ip; // asynchronous call to the dedicated API calling function var response = Meteor.wrapAsync(apiCall)(apiUrl); return response; } }); Finally, to allow requests from the browser and show error messages you should add a template similar to the following code. Query the location data for an IP Look up location {{#with location} {{#if error} There was an error: {{error.errorType} {{error.message}! {{else} The IP address {{location.ip} is in {{location.city} ({{location.country}). {{/if} {{/with} A Session variable called location is used to store the results from the API call. Clicking the button takes the content of the input box and sends it as a parameter to the geoJsonForIp method. The Session variable is set to the value of the callback. This is the required JavaScript code for connecting the template with the method call: Template.telize.helpers({ location: function () { return Session.get('location'); } }); Template.telize.events({ 'click button': function (evt, tpl) { var ip = tpl.find('input#ipv4').value; Meteor.call('geoJsonForIp', ip, function (err, res) { // The method call sets the Session variable to the callback value if (err) { Session.set('location', {error: err}); } else { Session.set('location', res); return res; } }); } }); As a result you will be able to make API calls from the browser just like in this figure: And that’show to integrate an external API via HTTP!

May 15, 2015

by Stephan Hochhaus

· 40,203 Views

Log Collection With Graylog on AWS

Log collection is essential to properly analyzing issues in production. An interface to search and be notified about exceptions on all your servers is a must. Well, if you have one server, you can easily ssh to it and check the logs, of course, but for larger deployments, collecting logs centrally is way more preferable than logging to 10 machines in order to find “what happened”. There are many options to do that, roughly separated in two groups – 3rd party services and software to be installed by you. 3rd party (or “cloud-based” if you want) log collection services include Splunk,Loggly, Papertrail, Sumologic. They are very easy to setup and you pay for what you use. Basically, you send each message (e.g. via a custom logback appender) to a provider’s endpoint, and then use the dashboard to analyze the data. In many cases that would be the preferred way to go. In other cases, however, company policy may frown upon using 3rd party services to store company-specific data, or additional costs may be undesired. In these cases extra effort needs to be put into installing and managing an internal log collection software. They work in a similar way, but implementation details may differ (e.g. instead of sending messages with an appender to a target endpoint, the software, using some sort of an agent, collects local logs and aggregates them). Open-source options include Graylog, FluentD, Flume, Logstash. After a very quick research, I considered graylog to fit our needs best, so below is a description of the installation procedure on AWS (though the first part applies regardless of the infrastructure). The first thing to look at are the ready-to-use images provided by graylog, including docker, openstack, vagrant and AWS. Unfortunately, the AWS version has two drawbacks – it’s using Ubuntu, rather than the Amazon AMI. That’s not a huge issue, although some generic scripts you use in your stack may have to be rewritten. The other was the dealbreaker – when you start it, it doesn’t run a web interface, although it claims it should. Only mongodb, elasticsearch and graylog-server are started. Having 2 instances – one web, and one for the rest would complicate things, so I opted for manual installation. Graylog has two components – the server, which handles the input, indexing and searching, and the web interface, which is a nice UI that communicates with the server. The web interface uses mongodb for metadata, and the server uses elasticsearch to store the incoming logs. Below is a bash script (CentOS) that handles the installation. Note that there is no “sudo”, because initialization scripts are executed as root on AWS. #!/bin/bash # install pwgen for password-generation yum upgrade ca-certificates --enablerepo=epel yum --enablerepo=epel -y install pwgen # mongodb cat >/etc/yum.repos.d/mongodb-org.repo <<'EOT' [mongodb-org] name=MongoDB Repository baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/ gpgcheck=0 enabled=1 EOT yum -y install mongodb-org chkconfig mongod on service mongod start # elasticsearch rpm --import https://packages.elasticsearch.org/GPG-KEY-elasticsearch cat >/etc/yum.repos.d/elasticsearch.repo <<'EOT' [elasticsearch-1.4] name=Elasticsearch repository for 1.4.x packages baseurl=http://packages.elasticsearch.org/elasticsearch/1.4/centos gpgcheck=1 gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch enabled=1 EOT yum -y install elasticsearch chkconfig --add elasticsearch # configure elasticsearch sed -i -- 's/#cluster.name: elasticsearch/cluster.name: graylog2/g' /etc/elasticsearch/elasticsearch.yml sed -i -- 's/#network.bind_host: localhost/network.bind_host: localhost/g' /etc/elasticsearch/elasticsearch.yml service elasticsearch stop service elasticsearch start # java yum -y update yum -y install java-1.7.0-openjdk update-alternatives --set java /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java # graylog wget https://packages.graylog2.org/releases/graylog2-server/graylog-1.0.1.tgz tar xvzf graylog-1.0.1.tgz -C /opt/ mv /opt/graylog-1.0.1/ /opt/graylog/ cp /opt/graylog/bin/graylogctl /etc/init.d/graylog sed -i -e 's/GRAYLOG2_SERVER_JAR=\${GRAYLOG2_SERVER_JAR:=graylog.jar}/GRAYLOG2_SERVER_JAR=\${GRAYLOG2_SERVER_JAR:=\/opt\/graylog\/graylog.jar}/' /etc/init.d/graylog sed -i -e 's/LOG_FILE=\${LOG_FILE:=log\/graylog-server.log}/LOG_FILE=\${LOG_FILE:=\/var\/log\/graylog-server.log}/' /etc/init.d/graylog cat >/etc/init.d/graylog <<'EOT' #!/bin/bash # chkconfig: 345 90 60 # description: graylog control sh /opt/graylog/bin/graylogctl $1 EOT chkconfig --add graylog chkconfig graylog on chmod +x /etc/init.d/graylog # graylog web wget https://packages.graylog2.org/releases/graylog2-web-interface/graylog-web-interface-1.0.1.tgz tar xvzf graylog-web-interface-1.0.1.tgz -C /opt/ mv /opt/graylog-web-interface-1.0.1/ /opt/graylog-web/ cat >/etc/init.d/graylog-web <<'EOT' #!/bin/bash # chkconfig: 345 91 61 # description: graylog web interface sh /opt/graylog-web/bin/graylog-web-interface > /dev/null 2>&1 & EOT chkconfig --add graylog-web chkconfig graylog-web on chmod +x /etc/init.d/graylog-web #configure mkdir --parents /etc/graylog/server/ cp /opt/graylog/graylog.conf.example /etc/graylog/server/server.conf sed -i -e 's/password_secret =.*/password_secret = '$(pwgen -s 96 1)'/' /etc/graylog/server/server.conf sed -i -e 's/root_password_sha2 =.*/root_password_sha2 = '$(echo -n password | shasum -a 256 | awk '{print $1}')'/' /etc/graylog/server/server.conf sed -i -e 's/application.secret=""/application.secret="'$(pwgen -s 96 1)'"/g' /opt/graylog-web/conf/graylog-web-interface.conf sed -i -e 's/graylog2-server.uris=""/graylog2-server.uris="http:\/\/127.0.0.1:12900\/"/g' /opt/graylog-web/conf/graylog-web-interface.conf service graylog start sleep 30 service graylog-web start You may also want to set a TTL (auto-expiration) for messages, so that you don’t store old logs forever. Here’s how # wait for the index to be created INDEXES=$(curl --silent "http://localhost:9200/_cat/indices") until [[ "$INDEXES" =~ "graylog2_0" ]]; do sleep 5 echo "Index not yet created. Indexes: $INDEXES" INDEXES=$(curl --silent "http://localhost:9200/_cat/indices") done # set each indexed message auto-expiration (ttl) curl -XPUT "http://localhost:9200/graylog2_0/message/_mapping" -d'{"message": {"_ttl" : { "enabled" : true, "default" : "15d" }}' Now you have everything running on the instance. Then you have to do some AWS-specific things (if using CloudFormation, that would include a pile of JSON). Here’s the list: you can either have an auto-scaling group with one instance, or a single instance. I prefer the ASG, though the other one is a bit simpler. The ASG gives you auto-respawn if the instance dies. set the above script to be invoked in the UserData of the launch configuration of the instance/asg (e.g. by getting it from s3 first) allow UDP port 12201 (the default logging port). That should happen for the instance/asg security group (inbound), for the application nodes security group (outbound), and also as a network ACL of your VPC. Test the UDP connection to make sure it really goes through. Keep the access restricted for all sources, except for your instances. you need to pass the private IP address of your graylog server instance to all the application nodes. That’s tricky on AWS, as private IP addresses change. That’s why you need something stable. You can’t use an ELB (load balancer), because it doesn’t support UDP. There are two options: Associate an Elastic IP with the node on startup. Pass that IP to the application nodes. But there’s a catch – if they connect to the elastic IP, that would go via NAT (if you have such), and you may have to open your instance “to the world”. So, you must turn the elastic IP into its corresponding public DNS. The DNS then will be resolved to the private IP. You can do that by manually and hacky: 1 GRAYLOG_ADDRESS="ec2-$GRAYLOG_ADDRESS//./-}.us-west-1.compute.amazonaws.com" or you can use the AWS EC2 CLI to obtain the instance details of the instance that the elastic IP is associated with, and then with another call obtain its Public DNS. Instead of using an Elastic IP, which limits you to a single instance, you can use Route53 (the AWS DNS manager). That way, when a graylog server instance starts, it can append itself to a route53 record, that way allowing for a round-robin DNS of multiple graylog instances that are in a cluster. Manipulating the Route53 records is again done via the AWS CLI. Then you just pass the domain name to applications nodes, so that they can send messages. alternatively, you can install graylog-server on all the nodes (as an agent), and point them to an elasticsearch cluster. But that’s more complicated and probably not the intended way to do it configure your logging framework to send messages to graylog. There are standard GELF (the greylog format) appenders, e.g. this one, and the only thing you have to do is use the Public DNS environment variable in the logback.xml (which supports environment variable resolution). You should make the web interface accessible outside the network, so you can use an ELB for that, or the round-robin DNS mentioned above. Just make sure the security rules are tight and not allowing external tampering with your log data. If you are not running a graylog cluster (which I won’t cover), then the single instance can potentially fail. That isn’t a great loss, as log messages can be obtained from the instances, and they are short-lived anyway. But the metadata of the web interface is important – dashboards, alerts, etc. So it’s good to do regular backups (e.g. with mongodump). Using an EBS volume is also an option. Even though you send your log messages to the centralized log collector, it’s a good idea to also keep local logs, with the proper log rotation and cleanup. It’s not a trivial process, but it’s essential to have log collection, so I hope the guide has been helpful.

May 14, 2015

by Bozhidar Bozhanov

· 20,031 Views

Collecting Transaction Per Minute from SQL Server and HammerDB

SQL Server script file can be created to run in a loop collecting for a given amount of time at a specified interval.

May 11, 2015

by Greg Schulz

· 10,335 Views

8 Questions You Need to Ask About Microservices, Containers & Docker in 2015

In containers and microservices, we’re facing the greatest potential change in how we deliver and run software services since the arrival of virtual machines.

May 9, 2015

by Andrew Phillips

· 15,066 Views · 1 Like

Quick Notes: What is CAP Theorem?

CAP theorem states that any database system can only attain two out of following states which is Consistency, Availability and Partition Tolerance.

May 5, 2015

by Ajitesh Kumar

· 26,380 Views · 3 Likes

Why Run Your Microservices on a PaaS

[This article by Chris Haddad comes to you from the DZone Guide to Cloud Development - 2015 Edition. For more information—including in-depth articles from industry experts, best solutions for PaaS, iPaaS, IaaS, and MBaaS, and more—click the link below to download your free copy of the guide.] Microservices can be understood from two angles. First, the differential: teams that take a microservice design approach divide business solutions into distinct, full-stack business services owned by autonomous teams. Second, the integral: microservice-based applications weave multiple atomic microservices into holistic user experiences. Unfortunately, traditional application delivery models and traditional middleware infrastructure do not address microservice-specific demands for on-demand provisioning, dynamic composition, and service level management. On the other hand, the Platform-as-a-Service (PaaS) model addresses these demands perfectly. Running microservices on a PaaS fabric decreases solution fragility, reduces operational burden, and enhances developer productivity. To understand why, we’ll first review how microservices separate concerns from both business and object-oriented design perspectives. Second, we’ll consider how microservice-based design can complicate deployment as applications scale dynamically. Third, we’ll focus on how a PaaS environment helps to solve many of the problems both addressed and introduced by microservices-based architectures — in other words, why PaaS and microservices are a match made in heaven. Microservices: Separating Concerns By Business Solution A microservice approach decomposes monolithic applications according to the single responsibility pattern. In a microservice solution, each microservice interface delivers discrete business capabilities (e.g. customer profile, product catalogue, inventory, order, billing, fulfillment) within a well-defined, bounded context. The atomic microservice interfaces reside on separate and distinct full-stack application platforms that contain separate database storage, integration flows, and web application hosting. By separating concerns onto separate full-stack platforms and not sharing database instances or web application hosts across services, every team is free to choose different runtime languages and frameworks for its own microservice. Also, every team is free to evolve its data schemas, application frameworks, and business logic without impacting other teams. Because microservices are a relatively new design approach, many development teams may have the misconception that creating a microservice-based solution requires simply deploying small web services in containers. But this doesn’t cut quite deep enough. The correct approach is to evolve your monolithic design by applying service-oriented principles (i.e. encapsulation, loose coupling, separation of concerns) in conjunction with domain-driven design techniques and dynamic runtime application composition. For example, in a typical ecommerce scenario, a development team applies the bounded context pattern and single responsibility pattern to refactor a monolithic application into units distinguished by business capability (see Figure 2). By creating a user experience from loosely coupled services instead of tightly coupled native-language business objects, teams have more independence to develop, evolve, and deploy each business capability separately. Obviously, the microservice design approach works best for (a) greenfield projects or (b) modernization efforts where teams focus on refactoring monolithic application assets. The Microservice Execution Trap Although a microservice approach decouples development dependencies and speeds up development iterations, microservices also create a challenging environment for high-performance scaling and reliable runtime execution. More complex, loosely coupled, and dynamic environments distribute business capabilities over the entire network. Even a task as simple as responding to a single web application page request may spread out across several microservice instances residing on a distributed network topology. Martin Fowler and Stefan Tilkov (both microservice proponents) warn teams that successfully implementing a microservice approach requires choosing platforms that decrease solution fragility and reduce operational burdens. What Platform-as-a-Service Offers Platform-as-a-Service environments reduce microservice operational burdens when infrastructure-as-code and declarative policies are used to eliminate all manual actions and increase runtime quality of service (i.e. reliability, availability, scalability, and performance). The appropriate PaaS environment will automatically deploy, provision, and link full-stack microservices. In a microservice architecture, teams want to rapidly release new versions and perform A/B testing across versions. When teams define instance dependencies, scaling properties, and security policies as PaaS metadata or code scripts, the runtime fabric can reduce manual effort and increase release confidence. With a DevOps- friendly PaaS, the team can experiment with new service versions and safely rollback to a prior stable release if a problem arises. Because microservices are full-stack silos *1* that can be composed of multiple server instances (e.g. web server, database, load balancer, integration server), a PaaS can reduce deployment complexity by automatically spinning up and linking all instances. Linking may require discovering instance locations, dynamically initializing network routes, and auto-configuring connection strings based on service version or tenant. A traditional application will compose business functions and user experience by statically linking class files and shared object libraries. In contrast, microservice- based applications use service composition to connect available microservices endpoints and realize a fully functional application. While many microservice proponents promote microservice-based interactions by “smart endpoints through dumb pipes, ‘ effective service composition requires smart infrastructure building blocks to bootstrap and maintain connections between services and consumers. The right PaaS solves these problems. Infrastructure building blocks will register service endpoint locations, associate metadata and policies, connect clients, circuit break around failures, correlate inter-service calls, and load balance traffic. A microservice-friendly PaaS will provide service registries, metadata services, discovery services, and service virtualization gateways. In the pipe, circuit breakers will automatically route traffic on failover or overload. Smart endpoint code will dynamically connect with microservices based on discovery service responses and negotiated quality of service parameters. Rather than being hard-coded to a specific service hostname and URI, endpoint code will query for microservice location based on security assurances, performance guarantees, traffic load, service version, client tenancy, or business domain. When services are unavailable or underperform, smart endpoints will follow the tolerant reader pattern and gracefully degrade experience or proactively recover. A few recovery options include reading from local caches or circuit tripping to backup service endpoints. In conjunction with smart endpoint actions, a smart PaaS will spin up new microservice endpoints and full-stack instances based on service level management metrics. By following microservice architecture best practices, teams create anti-fragile applications that not only withstand a shock, but also improve performance and quality of service when stressed or experiencing failures. To drive this non-intuitive behavior, the underlying platform environment must be ready to scale, repair, and reconnect services. PaaS service level management components will create more resilient and anti-fragile microservices by monitoring performance, elastically provisioning instances, and dynamically re-routing traffic. Scaling an anti-fragile microservice is more difficult than scaling a web application. The PaaS should distribute microservice instances across multiple availability zones and dynamically adjust traffic to reduce latency and response time. Because transient microservice instances will rapidly start, stop, and change location, the service management layer must be completely automated and integrated with routing services. A PaaS environment will deliver the service level management, dynamic service composition, circuit breakers, and on-demand provisioning functions required to overcome the complexity inherent within a distributed microservice-based application architecture. Running microservices on a PaaS fabric will decrease solution fragility, reduce operational burden, and enhance developer productivity. If you are pursuing a microservice design approach, make sure you choose a microservice- friendly PaaS. DOWNLOAD YOUR FREE COPY TODAY

May 5, 2015

by Chris Haddad

· 12,097 Views · 2 Likes

A Look at Nanomsg and Scalability Protocols (Why ZeroMQ Shouldn’t Be Your First Choice)

Earlier this month, I explored ZeroMQ and how it proves to be a promising solution for building fast, high-throughput, and scalable distributed systems. Despite lending itself quite well to these types of problems, ZeroMQ is not without its flaws. Its creators have attempted to rectify many of these shortcomings through spiritual successors Crossroads I/O and nanomsg. The now-defunct Crossroads I/O is a proper fork of ZeroMQ with the true intention being to build a viable commercial ecosystem around it. Nanomsg, however, is a reimagining of ZeroMQ—a complete rewrite in C1. It builds upon ZeroMQ’s rock-solid performance characteristics while providing several vital improvements, both internal and external. It also attempts to address many of the strange behaviors that ZeroMQ can often exhibit. Today, I’ll take a look at what differentiates nanomsg from its predecessor and implement a use case for it in the form of service discovery. Nanomsg vs. ZeroMQ A common gripe people have with ZeroMQ is that it doesn’t provide an API for new transport protocols, which essentially limits you to TCP, PGM, IPC, and ITC. Nanomsg addresses this problem by providing a pluggable interface for transports and messaging protocols. This means support for new transports (e.g. WebSockets) and new messaging patterns beyond the standard set of PUB/SUB, REQ/REP, etc. Nanomsg is also fully POSIX-compliant, giving it a cleaner API and better compatibility. No longer are sockets represented as void pointers and tied to a context—simply initialize a new socket and begin using it in one step. With ZeroMQ, the context internally acts as a storage mechanism for global state and, to the user, as a pool of I/O threads. This concept has been completely removed from nanomsg. In addition to POSIX compliance, nanomsg is hoping to be interoperable at the API and protocol levels, which would allow it to be a drop-in replacement for, or otherwise interoperate with, ZeroMQ and other libraries which implement ZMTP/1.0 and ZMTP/2.0. It has yet to reach full parity, however. ZeroMQ has a fundamental flaw in its architecture. Its sockets are not thread-safe. In and of itself, this is not problematic and, in fact, is beneficial in some cases. By isolating each object in its own thread, the need for semaphores and mutexes is removed. Threads don’t touch each other and, instead, concurrency is achieved with message passing. This pattern works well for objects managed by worker threads but breaks down when objects are managed in user threads. If the thread is executing another task, the object is blocked. Nanomsg does away with the one-to-one relationship between objects and threads. Rather than relying on message passing, interactions are modeled as sets of state machines. Consequently, nanomsg sockets are thread-safe. Nanomsg has a number of other internal optimizations aimed at improving memory and CPU efficiency. ZeroMQ uses a simple trie structure to store and match PUB/SUB subscriptions, which performs nicely for sub-10,000 subscriptions but quickly becomes unreasonable for anything beyond that number. Nanomsg uses a space-optimized trie called a radix tree to store subscriptions. Unlike its predecessor, the library also offers a true zero-copy API which greatly improves performance by allowing memory to be copied from machine to machine while completely bypassing the CPU. ZeroMQ implements load balancing using a round-robin algorithm. While it provides equal distribution of work, it has its limitations. Suppose you have two datacenters, one in New York and one in London, and each site hosts instances of “foo” services. Ideally, a request made for foo from New York shouldn’t get routed to the London datacenter and vice versa. With ZeroMQ’s round-robin balancing, this is entirely possible unfortunately. One of the new user-facing features that nanomsg offers is priority routing for outbound traffic. We avoid this latency problem by assigning priority one to foo services hosted in New York for applications also hosted there. Priority two is then assigned to foo services hosted in London, giving us a failover in the event that foos in New York are unavailable. Additionally, nanomsg offers a command-line tool for interfacing with the system called nanocat. This tool lets you send and receive data via nanomsg sockets, which is useful for debugging and health checks. Scalability Protocols Perhaps most interesting is nanomsg’s philosophical departure from ZeroMQ. Instead of acting as a generic networking library, nanomsg intends to provide the “Lego bricks” for building scalable and performant distributed systems by implementing what it refers to as “scalability protocols.” These scalability protocols are communication patterns which are an abstraction on top of the network stack’s transport layer. The protocols are fully separated from each other such that each can embody a well-defined distributed algorithm. The intention, as stated by nanomsg’s author Martin Sustrik, is to have the protocol specifications standardized through the IETF. Nanomsg currently defines six different scalability protocols: PAIR, REQREP, PIPELINE, BUS, PUBSUB, and SURVEY. PAIR (Bidirectional Communication) PAIR implements simple one-to-one, bidirectional communication between two endpoints. Two nodes can send messages back and forth to each other. REQREP (Client Requests, Server Replies) The REQREP protocol defines a pattern for building stateless services to process user requests. A client sends a request, the server receives the request, does some processing, and returns a response. PIPELINE (One-Way Dataflow) PIPELINE provides unidirectional dataflow which is useful for creating load-balanced processing pipelines. A producer node submits work that is distributed among consumer nodes. BUS (Many-to-Many Communication) BUS allows messages sent from each peer to be delivered to every other peer in the group. PUBSUB (Topic Broadcasting) PUBSUB allows publishers to multicast messages to zero or more subscribers. Subscribers, which can connect to multiple publishers, can subscribe to specific topics, allowing them to receive only messages that are relevant to them. SURVEY (Ask Group a Question) The last scalability protocol, and the one in which I will further examine by implementing a use case with, is SURVEY. The SURVEY pattern is similar to PUBSUB in that a message from one node is broadcasted to the entire group, but where it differs is that each node in the group responds to the message. This opens up a wide variety of applications because it allows you to quickly and easily query the state of a large number of systems in one go. The survey respondents must respond within a time window configured by the surveyor. Implementing Service Discovery As I pointed out, the SURVEY protocol has a lot of interesting applications. For example: What data do you have for this record? What price will you offer for this item? Who can handle this request? To continue exploring it, I will implement a basic service-discovery pattern. Service discovery is a pretty simple question that’s well-suited for SURVEY: what services are out there? Our solution will work by periodically submitting the question. As services spin up, they will connect with our service discovery system so they can identify themselves. We can tweak parameters like how often we survey the group to ensure we have an accurate list of services and how long services have to respond. This is great because 1) the discovery system doesn’t need to be aware of what services there are—it just blindly submits the survey—and 2) when a service spins up, it will be discovered and if it dies, it will be “undiscovered.” Here is the ServiceDiscovery class: from collections import defaultdict import random from nanomsg import NanoMsgAPIError from nanomsg import Socket from nanomsg import SURVEYOR from nanomsg import SURVEYOR_DEADLINE class ServiceDiscovery(object): def __init__(self, port, deadline=5000): self.socket = Socket(SURVEYOR) self.port = port self.deadline = deadline self.services = defaultdict(set) def bind(self): self.socket.bind('tcp://*:%s' % self.port) self.socket.set_int_option(SURVEYOR, SURVEYOR_DEADLINE, self.deadline) def discover(self): if not self.socket.is_open(): return self.services self.services = defaultdict(set) self.socket.send('service query') while True: try: response = self.socket.recv() except NanoMsgAPIError: break service, address = response.split('|') self.services[service].add(address) return self.services def resolve(self, service): providers = self.services[service] if not providers: return None return random.choice(tuple(providers)) def close(self): self.socket.close() The discover method submits the survey and then collects the responses. Notice we construct a SURVEYOR socket and set the SURVEYOR_DEADLINE option on it. This deadline is the number of milliseconds from when a survey is submitted to when a response must be received—adjust it accordingly based on your network topology. Once the survey deadline has been reached, a NanoMsgAPIError is raised and we break the loop. The resolve method will take the name of a service and randomly select an available provider from our discovered services. We can then wrap ServiceDiscovery with a daemon that will periodically run discover. import os import time from service_discovery import ServiceDiscovery DEFAULT_PORT = 5555 DEFAULT_DEADLINE = 5000 DEFAULT_INTERVAL = 2000 def start_discovery(port, deadline, interval): discovery = ServiceDiscovery(port, deadline=deadline) discovery.bind() print 'Starting service discovery [port: %s, deadline: %s, interval: %s]' \ % (port, deadline, interval) while True: print discovery.discover() time.sleep(interval / 1000) if __name__ == '__main__': port = int(os.environ.get('PORT', DEFAULT_PORT)) deadline = int(os.environ.get('DEADLINE', DEFAULT_DEADLINE)) interval = int(os.environ.get('INTERVAL', DEFAULT_INTERVAL)) start_discovery(port, deadline, interval) The discovery parameters are configured through environment variables which I inject into a Docker container. Services must connect to the discovery system when they start up. When they receive a survey, they should respond by identifying what service they provide and where the service is located. One such service might look like the following: import os from threading import Thread from nanomsg import REP from nanomsg import RESPONDENT from nanomsg import Socket DEFAULT_DISCOVERY_HOST = 'localhost' DEFAULT_DISCOVERY_PORT = 5555 DEFAULT_SERVICE_NAME = 'foo' DEFAULT_SERVICE_PROTOCOL = 'tcp' DEFAULT_SERVICE_HOST = 'localhost' DEFAULT_SERVICE_PORT = 9000 def register_service(service_name, service_address, discovery_host, discovery_port): socket = Socket(RESPONDENT) socket.connect('tcp://%s:%s' % (discovery_host, discovery_port)) print 'Starting service registration [service: %s %s, discovery: %s:%s]' \ % (service_name, service_address, discovery_host, discovery_port) while True: message = socket.recv() if message == 'service query': socket.send('%s|%s' % (service_name, service_address)) def start_service(service_name, service_protocol, service_port): socket = Socket(REP) socket.bind('%s://*:%s' % (service_protocol, service_port)) print 'Starting service %s' % service_name while True: request = socket.recv() print 'Request: %s' % request socket.send('The answer is 42') if __name__ == '__main__': discovery_host = os.environ.get('DISCOVERY_HOST', DEFAULT_DISCOVERY_HOST) discovery_port = os.environ.get('DISCOVERY_PORT', DEFAULT_DISCOVERY_PORT) service_name = os.environ.get('SERVICE_NAME', DEFAULT_SERVICE_NAME) service_host = os.environ.get('SERVICE_HOST', DEFAULT_SERVICE_HOST) service_port = os.environ.get('SERVICE_PORT', DEFAULT_SERVICE_PORT) service_protocol = os.environ.get('SERVICE_PROTOCOL', DEFAULT_SERVICE_PROTOCOL) service_address = '%s://%s:%s' % (service_protocol, service_host, service_port) Thread(target=register_service, args=(service_name, service_address, discovery_host, discovery_port)).start() start_service(service_name, service_protocol, service_port) Once again, we configure parameters through environment variables set on a container. Note that we connect to the discovery system with a RESPONDENT socket which then responds to service queries with the service name and address. The service itself uses a REP socket that simply responds to any requests with “The answer is 42,” but it could take any number of forms such as HTTP, raw socket, etc. The full code for this example, including Dockerfiles, can be found on GitHub. Nanomsg or ZeroMQ? Based on all the improvements that nanomsg makes on top of ZeroMQ, you might be wondering why you would use the latter at all. Nanomsg is still relatively young. Although it has numerous language bindings, it hasn’t reached the maturity of ZeroMQ which has a thriving development community. ZeroMQ has extensive documentation and other resources to help developers make use of the library, while nanomsg has very little. Doing a quick Google search will give you an idea of the difference (about 500,000 results for ZeroMQ to nanomsg’s 13,500). That said, nanomsg’s improvements and, in particular, its scalability protocols make it very appealing. A lot of the strange behaviors that ZeroMQ exposes have been resolved completely or at least mitigated. It’s actively being developed and is quickly gaining more and more traction. Technically, nanomsg has been in beta since March, but it’s starting to look production-ready if it’s not there already.

May 4, 2015

by Tyler Treat

· 16,102 Views · 1 Like

How To: Neo4j Data Import - Minimal Example

The easiest way to import data from relational or legacy systems, like plain CSV files without headers, into Neo4j with Cypher and a graph model.

April 30, 2015

by Michael Hunger

· 7,793 Views

How to Create Multi-Column PDF Document inside .NET Applications

This technical tip shows how .NET developers create multi-column PDF document using Aspose.Pdf for .NET. In magazines and newspapers, we mostly see that news are displayed in multiple columns on the single pages instead of the books where text paragraphs are mostly printed on the whole pages from left to right position. Many document processing applications like Microsoft Word and Adobe Acrobat Writer allow users to create multiple columns on a single page and then add data to them. Aspose.Pdf for .NET also offers the feature to create multiple columns inside the pages of PDF documents. In order to create multi-column PDF file, we can make use of Aspose.Pdf.FloatingBox class as it provides ColumnInfo.ColumnCount property to specify the number of columns inside FloatingBox and we can also specify the spacing between columns and columns widths using ColumnInfo.ColumnSpacing and ColumnInfo.ColumnWidths properties accordingly. Please note that FloatingBox is an element inside Document Object Model and it can have obsolete positioning as compared to relative positioning (i.e. Text, Graph, Image etc). Column spacing means the space between the columns and the default spacing between the columns is 1.25cm. If the column width is not specified, then Aspose.Pdf for .NET calculates width for each column automatically according to the page size and column spacing. The code example is given below to demonstrate the creation of two columns with Graphs objects (Line) and they are added to paragraphs collection of FloatingBox, which is then added paragraphs collection of Page instance. //C# Code Sample Document doc = new Document(); // specify the left margin info for the PDF file doc.PageInfo.Margin.Left = 40; // specify the Right margin info for the PDF file doc.PageInfo.Margin.Right = 40; Page page = doc.Pages.Add(); Aspose.Pdf.Drawing.Graph graph1 = new Aspose.Pdf.Drawing.Graph(500, 2); // Add the line to paraphraphs collection of section object page.Paragraphs.Add(graph1); //specify the coordinates for the line float[] posArr = new float[] { 1, 2, 500, 2 }; Aspose.Pdf.Drawing.Line l1 = new Aspose.Pdf.Drawing.Line(posArr); graph1.Shapes.Add(l1); //Create string variables with text containing html tags string s = "" + " How to Steer Clear of money scams " + ""; //Create text paragraphs containing HTML text HtmlFragment heading_text = new HtmlFragment(s); page.Paragraphs.Add(heading_text); Aspose.Pdf.FloatingBox box = new Aspose.Pdf.FloatingBox(); //Add four columns in the section box.ColumnInfo.ColumnCount = 2; //Set the spacing between the columns box.ColumnInfo.ColumnSpacing = "5"; box.ColumnInfo.ColumnWidths = "105 105"; TextFragment text1 = new TextFragment("By A Googler (The Official Google Blog)"); text1.TextState.FontSize = 8; text1.TextState.LineSpacing = 2; box.Paragraphs.Add(text1); text1.TextState.FontSize = 10; text1.TextState.FontStyle = FontStyles.Italic; // Create a graphs object to draw a line Aspose.Pdf.Drawing.Graph graph2 = new Aspose.Pdf.Drawing.Graph(50, 10); // specify the coordinates for the line float[] posArr2 = new float[] { 1, 10, 100, 10 }; Aspose.Pdf.Drawing.Line l2 = new Aspose.Pdf.Drawing.Line(posArr2); graph2.Shapes.Add(l2); // Add the line to paragraphs collection of section object box.Paragraphs.Add(graph2); TextFragment text2 = new TextFragment(@"Sed augue tortor, sodales id, luctus et, pulvinar ut, eros. Suspendisse vel dolor. Sed quam. Curabitur ut massa vitae eros euismod aliquam. Pellentesque sit amet elit. Vestibulum interdum pellentesque augue. Cras mollis arcu sit amet purus. Donec augue. Nam mollis tortor a elit. Nulla viverra nisl vel mauris. Vivamus sapien. nascetur ridiculus mus. Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et,nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et, semper sed, enim nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Sed urna. . Duis convallis ultrices nisi. Maecenas non ligula. Nunc nibh est, tincidunt in, placerat sit amet, vestibulum a, nulla. Praesent porttitor turpis eleifend ante. Morbi sodales.nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Sed urna. . Duis convallis ultrices nisi. Maecenas non ligula. Nunc nibh est, tincidunt in, placerat sit amet, vestibulum a, nulla. Praesent porttitor turpis eleifend ante. Morbi sodales."); box.Paragraphs.Add(text2); page.Paragraphs.Add(box); string outFile = "c:/pdftest/Muli-Column.pdf"; //Save the Pdf doc.Save(outFile); ' Load the PDF file Dim doc As Document = New Document("source.pdf") ' Get the first link annotation from first page of document Dim linkAnnot As LinkAnnotation = CType(doc.Pages(1).Annotations(1), LinkAnnotation) ' Modification link: change link destination Dim goToAction As Aspose.Pdf.InteractiveFeatures.GoToAction = CType(linkAnnot.Action, Aspose.Pdf.InteractiveFeatures.GoToAction) ' Specify the destination for link object ' The first parameter is document object, second is destination page number. ' The 5ht argument is zoom factor when displaying the respective page. When using 2, the page will be displayed in 200% zoom goToAction.Destination = New Aspose.Pdf.InteractiveFeatures.XYZExplicitDestination(doc, 1, 1, 2, 2) ' Save the document with updated link doc.Save("PDFLINK_Modified_output.pdf") //VB.NET Code Sample Dim doc As Document = New Document() ' specify the left margin info for the PDF file doc.PageInfo.Margin.Left = 40 ' specify the Right margin info for the PDF file doc.PageInfo.Margin.Right = 40 Dim page As Page = doc.Pages.Add() Dim graph1 As Aspose.Pdf.Drawing.Graph = New Aspose.Pdf.Drawing.Graph(500, 2) ' Add the line to paragraphs collection of section object page.Paragraphs.Add(graph1) 'specify the coordinates for the line Dim posArr() As Single = {1, 2, 500, 2} Dim l1 As Aspose.Pdf.Drawing.Line = New Aspose.Pdf.Drawing.Line(posArr) graph1.Shapes.Add(l1) 'Create string variables with text containing html tags Dim s As String = " How to Steer Clear of money scams " 'Create text paragraphs containing HTML text Dim heading_text As HtmlFragment = New HtmlFragment(s) page.Paragraphs.Add(heading_text) Dim box As Aspose.Pdf.FloatingBox = New Aspose.Pdf.FloatingBox() 'Add four columns in the section box.ColumnInfo.ColumnCount = 2 'Set the spacing between the columns box.ColumnInfo.ColumnSpacing = "5" box.ColumnInfo.ColumnWidths = "105 105" Dim text1 As Aspose.Pdf.Text.TextFragment = New Aspose.Pdf.Text.TextFragment("By A Googler (The Official Google Blog)") text1.TextState.FontSize = 8 text1.TextState.LineSpacing = 2 box.Paragraphs.Add(text1) text1.TextState.FontSize = 10 text1.TextState.FontStyle = Aspose.Pdf.Text.FontStyles.Italic ' Create a graphs object to draw a line Dim graph2 As Aspose.Pdf.Drawing.Graph = New Aspose.Pdf.Drawing.Graph(50, 10) ' specify the coordinates for the line Dim posArr2() As Single = {1, 10, 100, 10} Dim l2 As Aspose.Pdf.Drawing.Line = New Aspose.Pdf.Drawing.Line(posArr2) graph2.Shapes.Add(l2) ' Add the line to paragraphs collection of section object box.Paragraphs.Add(graph2) Dim text2 As Aspose.Pdf.Text.TextFragment = New Aspose.Pdf.Text.TextFragment("Sed augue tortor, sodales id, luctus et, pulvinar ut, eros. Suspendisse vel dolor. Sed quam. Curabitur ut massa vitae eros euismod aliquam. Pellentesque sit amet elit. Vestibulum interdum pellentesque augue. Cras mollis arcu sit amet purus. Donec augue. Nam mollis tortor a elit. Nulla viverra nisl vel mauris. Vivamus sapien. nascetur ridiculus mus. Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et,nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et, semper sed, enim Nam justo lorem, aliquam luctus, sodales et, semper sed, enim nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Sed urna. . Duis convallis ultrices nisi. Maecenas non ligula. Nunc nibh est, tincidunt in, placerat sit amet, vestibulum a, nulla. Praesent porttitor turpis eleifend ante. Morbi sodales.nAenean posuere ante ut neque. Morbi sollicitudin congue felis. Praesent turpis diam, iaculis sed, pharetra non, mollis ac, mauris. Phasellus nisi ipsum, pretium vitae, tempor sed, molestie eu, dui. Duis lacus purus, tristique ut, iaculis cursus, tincidunt vitae, risus. Sed commodo. *** sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Sed urna. . Duis convallis ultrices nisi. Maecenas non ligula. Nunc nibh est, tincidunt in, placerat sit amet, vestibulum a, nulla. Praesent porttitor turpis eleifend ante. Morbi sodales.") box.Paragraphs.Add(text2) page.Paragraphs.Add(box) Dim outFile As String = "c:/pdftest/Muli-Column.pdf" 'Save the Pdf doc.Save(outFile)

April 29, 2015

by David Zondray

· 1,466 Views

Diagnosing SST Errors with Percona XtraDB Cluster for MySQL

[This article was written by Stephane Combaudon] State Snapshot Transfer (SST) is used in Percona XtraDB Cluster (PXC) when a new node joins the cluster or to resync a failed node if Incremental State Transfer (IST) is no longer available. SST is triggered automatically but there is no magic: If it is not configured properly, it will not work and new nodes will never be able to join the cluster. Let’s have a look at a few classic issues. Port for SST is not open The donor and the joiner communicate on port 4444, and if the port is closed on one side, SST will always fail. You will see in the error log of the donor that SST is started: [...] 141223 16:08:48 [Note] WSREP: Node 2 (node1) requested state transfer from '*any*'. Selected 0 (node3)(SYNCED) as donor. 141223 16:08:48 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 6) 141223 16:08:48 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 141223 16:08:48 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6'' [...] But then nothing happens, and some time later you will see a bunch of errors: [...] 2014/12/23 16:09:52 socat[2965] E connect(3, AF=2 192.168.234.101:4444, 16): Connection timed out WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 0 1 (20141223 16:09:52.057) WSREP_SST: [ERROR] Cleanup after exit with status:32 (20141223 16:09:52.064) WSREP_SST: [INFO] Cleaning up temporary directories (20141223 16:09:52.068) 141223 16:09:52 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.101:4444/xtrabackup_sst' --auth 'sstuser:s3cret' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '04c085a1-89ca-11e4-b1b6-6b692803109b:6' [...] On the joiner side, you will see a similar sequence: SST is started, then hangs and is finally aborted: [...] 141223 16:08:48 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 6) 141223 16:08:48 [Note] WSREP: Requesting state transfer: success, donor: 0 141223 16:08:49 [Note] WSREP: (f9560d0d, 'tcp://0.0.0.0:4567') turning message relay requesting off 141223 16:09:52 [Warning] WSREP: 0 (node3): State transfer to 2 (node1) failed: -32 (Broken pipe) 141223 16:09:52 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort. The solution is of course to make sure that the ports are open on both sides. SST is not correctly configured Sometimes you will see an error like this on the donor: 141223 21:03:15 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '192.168.234.102:4444/xtrabackup_sst' --auth 'sstuser:s3cretzzz' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid 'e63f38f2-8ae6-11e4-a383-46557c71f368:0'' [...] WSREP_SST: [ERROR] innobackupex finished with error: 1. Check /var/lib/mysql//innobackup.backup.log (20141223 21:03:26.973) And if you look at innobackup.backup.log: 41223 21:03:26 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser' (using password: YES). innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2995 main::mysql_connect('abort_on_error', 1) called at /usr//bin/innobackupex line 1530 innobackupex: Error: Failed to connect to MySQL server: DBI connect(';mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock','sstuser',...) failed: Access denied for user 'sstuser'@'localhost' (using password: YES) at /usr//bin/innobackupex line 2979 What happened? The default SST method is xtrabackup-v2 and for it to work, you need to specify a username/password in the my.cnf file: [mysqld] wsrep_sst_auth=sstuser:s3cret And you also need to create the corresponding MySQL user: mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY 's3cret'; So you should check that the user has been correctly created in MySQL and that wsrep_sst_auth is correctly set. Galera versions do not match Here is another set of errors you may see in the error log of the donor: 141223 21:14:27 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101 141223 21:14:30 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101 141223 21:14:33 [Warning] WSREP: unserialize error invalid flags 2: 71 (Protocol error) at gcomm/src/gcomm/datagram.hpp:unserialize():101 Here the issue is that you try to connect a node using Galera 2.x and a node running Galera 3.x. This can happen if you try to use a PXC 5.5 node and a PXC 5.6 node. The right solution is probably to understand why you ended up with such inconsistent versions and make sure all nodes are using the same Percona XtraDB Cluster version and Galera version. But if you know what you are doing, you can also instruct the node using Galera 3.x that it will communicate with Galera 2.x nodes by specifying in the my.cnf file: [mysqld] wsrep_provider_options="socket.checksum=1" Conclusion SST errors can have multiple reasons for occurring, and the best way to diagnose the issue is to have a look at the error log of the donor and the joiner. Galera is in general quite verbose so you can follow the progress of SST on both nodes and see where it fails. Then it is mostly about being able to interpret the error messages.

April 27, 2015

by Peter Zaitsev

· 11,878 Views

Fixed Width Sortable Tables Row with jQueryUI

When you use jQuery UI sortable function on a table I've noticed that it will collapse the width of the row you're dragging which can lead to a strange user experience. In this tutorial we are going to see how you can use a helper function to change the width of dragging rows back to the original width. Have a look at the demo to see the difference. Demo jQuery Sortable is part of the jQuery UI library which can be found below. jQuery Sortable To define a table to have sortable rows all you have to do is apply the sortable method to the parent element of the row, which normal would be the table itself or ideally the table body. FilmDateRatingThe Shawshank Redemption19949.2 Then you can make the table body rows sortable by using the following jQuery code. $('table tbody').sortable(); One of the options you can use on the sortable method is helper property where you can define a function to run when dragging the display. Therefore we simply need to create a function that will reset the width of the table row by simply using the function below. $('table tbody').sortable({ helper: fixWidthHelper }).disableSelection(); function fixWidthHelper(e, ui) { ui.children().each(function() { $(this).width($(this).width()); }); return ui; } Demo

April 27, 2015

by Paul Underwood

· 19,481 Views

Increasing Slow Query Performance with the Parallel Query Execution

[This article was written by Alexander Rubin] MySQL and Scaling-up (using more powerful hardware) was always a hot topic. Originally MySQL did not scale well with multiple CPUs; there were times when InnoDB performed poorer with more CPU cores than with less CPU cores. MySQL 5.6 can scale significantly better; however there is still 1 big limitation: 1 SQL query will eventually use only 1 CPU core (no parallelism). Here is what I mean by that: let’s say we have a complex query which will need to scan million of rows and may need to create a temporary table; in this case MySQL will not be able to scan the table in multiple threads (even with partitioning) so the single query will not be faster on the more powerful server. On the contrary, a server with more slower CPUs will show worse performance than the server with less (but faster) CPUs. To address this issue we can use a parallel query execution. Vadim wrote about the PHP asynchronous calls for MySQL. Another way to increase the parallelism will be to use “sharding” approach, for example with Shard Query. I’ve decided to test out the parallel (asynchronous) query execution with relatively large table: I’ve used the US Flights Ontime performance database, which was originally used by Vadim in the old post Analyzing air traffic performance. Let’s see how this can help us increase performance of the complex query reports. Parallel Query Example To illustrate the parallel query execution with MySQL I’ve created the following table: CREATE TABLE `ontime` ( `YearD` year(4) NOT NULL, `Quarter` tinyint(4) DEFAULT NULL, `MonthD` tinyint(4) DEFAULT NULL, `DayofMonth` tinyint(4) DEFAULT NULL, `DayOfWeek` tinyint(4) DEFAULT NULL, `FlightDate` date DEFAULT NULL, `UniqueCarrier` char(7) DEFAULT NULL, `AirlineID` int(11) DEFAULT NULL, `Carrier` char(2) DEFAULT NULL, `TailNum` varchar(50) DEFAULT NULL, `FlightNum` varchar(10) DEFAULT NULL, `OriginAirportID` int(11) DEFAULT NULL, `OriginAirportSeqID` int(11) DEFAULT NULL, `OriginCityMarketID` int(11) DEFAULT NULL, `Origin` char(5) DEFAULT NULL, `OriginCityName` varchar(100) DEFAULT NULL, `OriginState` char(2) DEFAULT NULL, `OriginStateFips` varchar(10) DEFAULT NULL, `OriginStateName` varchar(100) DEFAULT NULL, `OriginWac` int(11) DEFAULT NULL, `DestAirportID` int(11) DEFAULT NULL, `DestAirportSeqID` int(11) DEFAULT NULL, `DestCityMarketID` int(11) DEFAULT NULL, `Dest` char(5) DEFAULT NULL, -- ... (removed number of fields) `id` int(11) NOT NULL AUTO_INCREMENT, PRIMARY KEY (`id`), KEY `YearD` (`YearD`), KEY `Carrier` (`Carrier`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; And loaded 26 years of data into it. The table is 56G with ~152M rows. Software: Percona 5.6.15-63.0. Hardware: Supermicro; X8DTG-D; 48G of RAM; 24xIntel(R) Xeon(R) CPU L5639 @ 2.13GHz, 1xSSD drive (250G) So we have 24 relatively slow CPUs Simple query Now we can run some queries. The first query is very simple: find all flights per year (in the US): select yeard, count(*) from ontime group by yeard As we have the index on YearD, the query will use the index: mysql> explain select yeard, count(*) from ontime group by yeardG *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ontime type: index possible_keys: YearD,comb1 key: YearD key_len: 1 ref: NULL rows: 148046200 Extra: Using index 1 row in set (0.00 sec) The query is simple, however, it will have to scan 150M rows. Here is the results of the query (cached): mysql> select yeard, count(*) from ontime group by yeard; +-------+----------+ | yeard | count(*) | +-------+----------+ | 1988 | 5202096 | | 1989 | 5041200 | | 1990 | 5270893 | | 1991 | 5076925 | | 1992 | 5092157 | | 1993 | 5070501 | | 1994 | 5180048 | | 1995 | 5327435 | | 1996 | 5351983 | | 1997 | 5411843 | | 1998 | 5384721 | | 1999 | 5527884 | | 2000 | 5683047 | | 2001 | 5967780 | | 2002 | 5271359 | | 2003 | 6488540 | | 2004 | 7129270 | | 2005 | 7140596 | | 2006 | 7141922 | | 2007 | 7455458 | | 2008 | 7009726 | | 2009 | 6450285 | | 2010 | 6450117 | | 2011 | 6085281 | | 2012 | 6096762 | | 2013 | 5349447 | +-------+----------+ 26 rows in set (54.10 sec) The query took 54 seconds and utilized only 1 CPU core. However, this query is perfect for running in parallel. We can run 26 parallel queries, each will count its own year. I’ve used the following shell script to run the queries in background: #!/bin/bash date for y in {1988..2013} do sql="select yeard, count(*) from ontime where yeard=$y" mysql -vvv ontime -e "$sql" &>par_sql1/$y.log & done wait date Here are the results: par_sql1/1988.log:1 row in set (3.70 sec) par_sql1/1989.log:1 row in set (4.08 sec) par_sql1/1990.log:1 row in set (4.59 sec) par_sql1/1991.log:1 row in set (4.26 sec) par_sql1/1992.log:1 row in set (4.54 sec) par_sql1/1993.log:1 row in set (2.78 sec) par_sql1/1994.log:1 row in set (3.41 sec) par_sql1/1995.log:1 row in set (4.87 sec) par_sql1/1996.log:1 row in set (4.41 sec) par_sql1/1997.log:1 row in set (3.69 sec) par_sql1/1998.log:1 row in set (3.56 sec) par_sql1/1999.log:1 row in set (4.47 sec) par_sql1/2000.log:1 row in set (4.71 sec) par_sql1/2001.log:1 row in set (4.81 sec) par_sql1/2002.log:1 row in set (4.19 sec) par_sql1/2003.log:1 row in set (4.04 sec) par_sql1/2004.log:1 row in set (5.12 sec) par_sql1/2005.log:1 row in set (5.10 sec) par_sql1/2006.log:1 row in set (4.93 sec) par_sql1/2007.log:1 row in set (5.29 sec) par_sql1/2008.log:1 row in set (5.59 sec) par_sql1/2009.log:1 row in set (4.44 sec) par_sql1/2010.log:1 row in set (4.91 sec) par_sql1/2011.log:1 row in set (5.08 sec) par_sql1/2012.log:1 row in set (4.85 sec) par_sql1/2013.log:1 row in set (4.56 sec) Complex Query Now we can try more complex query. Lets imagine we want to find out which airlines have maximum delays for the flights inside continental US during the business days from 1988 to 2009 (I was trying to come up with the complex query with multiple conditions in the where clause). select min(yeard), max(yeard), Carrier, count(*) as cnt, sum(ArrDelayMinutes>30) as flights_delayed, round(sum(ArrDelayMinutes>30)/count(*),2) as rate FROM ontime WHERE DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') and DestState not in ('AK', 'HI', 'PR', 'VI') and flightdate < '2010-01-01' GROUP by carrier HAVING cnt > 100000 and max(yeard) > 1990 ORDER by rate DESC As the query has “group by” and “order by” plus multiple ranges in the where clause it will have to create a temporary table: id: 1 select_type: SIMPLE table: ontime type: index possible_keys: comb1 key: comb1 key_len: 9 ref: NULL rows: 148046200 Extra: Using where; Using temporary; Using filesort (for this query I’ve created the combined index: KEY comb1 (Carrier,YearD,ArrDelayMinutes) to increase performance) The query runs in ~15 minutes: +------------+------------+---------+----------+-----------------+------+ | min(yeard) | max(yeard) | Carrier | cnt | flights_delayed | rate | +------------+------------+---------+----------+-----------------+------+ | 2003 | 2009 | EV | 1454777 | 237698 | 0.16 | | 2006 | 2009 | XE | 1016010 | 152431 | 0.15 | | 2006 | 2009 | YV | 740608 | 110389 | 0.15 | | 2003 | 2009 | B6 | 683874 | 103677 | 0.15 | | 2003 | 2009 | FL | 1082489 | 158748 | 0.15 | | 2003 | 2005 | DH | 501056 | 69833 | 0.14 | | 2001 | 2009 | MQ | 3238137 | 448037 | 0.14 | | 2003 | 2006 | RU | 1007248 | 126733 | 0.13 | | 2004 | 2009 | OH | 1195868 | 160071 | 0.13 | | 2003 | 2006 | TZ | 136735 | 16496 | 0.12 | | 1988 | 2009 | UA | 9593284 | 1197053 | 0.12 | | 1988 | 2009 | AA | 10600509 | 1185343 | 0.11 | | 1988 | 2001 | TW | 2659963 | 280741 | 0.11 | | 1988 | 2009 | CO | 6029149 | 673863 | 0.11 | | 2007 | 2009 | 9E | 577244 | 59440 | 0.10 | | 1988 | 2009 | DL | 11869471 | 1156267 | 0.10 | | 1988 | 2009 | NW | 7601727 | 725460 | 0.10 | | 1988 | 2009 | AS | 1506003 | 146920 | 0.10 | | 2003 | 2009 | OO | 2654259 | 257069 | 0.10 | | 1988 | 2009 | US | 10276941 | 991016 | 0.10 | | 1988 | 1991 | PA | 206841 | 19465 | 0.09 | | 1988 | 2005 | HP | 2607603 | 235675 | 0.09 | | 1988 | 2009 | WN | 12722174 | 1107840 | 0.09 | | 2005 | 2009 | F9 | 307569 | 28679 | 0.09 | +------------+------------+---------+----------+-----------------+------+ 24 rows in set (15 min 56.40 sec) Now we can split this query and run the 31 queries (=31 distinct airlines in this table) in parallel. I have used the following script: date for c in '9E' 'AA' 'AL' 'AQ' 'AS' 'B6' 'CO' 'DH' 'DL' 'EA' 'EV' 'F9' 'FL' 'HA' 'HP' 'ML' 'MQ' 'NW' 'OH' 'OO' 'PA' 'PI' 'PS' 'RU' 'TW' 'TZ' 'UA' 'US' 'WN' 'XE' 'YV' do sql=" select min(yeard), max(yeard), Carrier, count(*) as cnt, sum(ArrDelayMinutes>30) as flights_delayed, round(sum(ArrDelayMinutes>30)/count(*),2) as rate FROM ontime WHERE DayOfWeek not in (6,7) and OriginState not in ('AK', 'HI', 'PR', 'VI') and DestState not in ('AK', 'HI', 'PR', 'VI') and flightdate < '2010-01-01' and carrier = '$c'" mysql -uroot -vvv ontime -e "$sql" &>par_sql_complex/$c.log & done wait date In this case we will also avoid creating temporary table (as we have an index which starts with carrier). Results: total time is 5 min 47 seconds (3x faster) Start: 15:41:02 EST 2013 End: 15:46:49 EST 2013 Per query statistics: par_sql_complex/9E.log:1 row in set (44.47 sec) par_sql_complex/AA.log:1 row in set (5 min 41.13 sec) par_sql_complex/AL.log:1 row in set (15.81 sec) par_sql_complex/AQ.log:1 row in set (14.52 sec) par_sql_complex/AS.log:1 row in set (2 min 43.01 sec) par_sql_complex/B6.log:1 row in set (1 min 26.06 sec) par_sql_complex/CO.log:1 row in set (3 min 58.07 sec) par_sql_complex/DH.log:1 row in set (31.30 sec) par_sql_complex/DL.log:1 row in set (5 min 47.07 sec) par_sql_complex/EA.log:1 row in set (28.58 sec) par_sql_complex/EV.log:1 row in set (2 min 6.87 sec) par_sql_complex/F9.log:1 row in set (46.18 sec) par_sql_complex/FL.log:1 row in set (1 min 30.83 sec) par_sql_complex/HA.log:1 row in set (39.42 sec) par_sql_complex/HP.log:1 row in set (2 min 45.57 sec) par_sql_complex/ML.log:1 row in set (4.64 sec) par_sql_complex/MQ.log:1 row in set (2 min 22.55 sec) par_sql_complex/NW.log:1 row in set (4 min 26.67 sec) par_sql_complex/OH.log:1 row in set (1 min 9.67 sec) par_sql_complex/OO.log:1 row in set (2 min 14.97 sec) par_sql_complex/PA.log:1 row in set (17.62 sec) par_sql_complex/PI.log:1 row in set (14.52 sec) par_sql_complex/PS.log:1 row in set (3.46 sec) par_sql_complex/RU.log:1 row in set (40.14 sec) par_sql_complex/TW.log:1 row in set (2 min 32.32 sec) par_sql_complex/TZ.log:1 row in set (14.16 sec) par_sql_complex/UA.log:1 row in set (4 min 55.18 sec) par_sql_complex/US.log:1 row in set (4 min 38.08 sec) par_sql_complex/WN.log:1 row in set (4 min 56.12 sec) par_sql_complex/XE.log:1 row in set (24.21 sec) par_sql_complex/YV.log:1 row in set (20.82 sec) As we can see there are large airlines (like AA, UA, US, DL, etc) which took most of the time. In this case the load will not be distributed evenly as in the previous example; however, by running the query in parallel we have got 3x times better response time on this server. CPU utilization: Cpu3 : 22.0%us, 1.2%sy, 0.0%ni, 74.4%id, 2.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 16.0%us, 0.0%sy, 0.0%ni, 84.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 39.0%us, 1.2%sy, 0.0%ni, 56.1%id, 3.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 33.3%us, 0.0%sy, 0.0%ni, 51.9%id, 13.6%wa, 0.0%hi, 1.2%si, 0.0%st Cpu7 : 33.3%us, 1.2%sy, 0.0%ni, 48.8%id, 16.7%wa, 0.0%hi, 0.0%si, 0.0%st Cpu8 : 24.7%us, 0.0%sy, 0.0%ni, 60.5%id, 14.8%wa, 0.0%hi, 0.0%si, 0.0%st Cpu9 : 24.4%us, 0.0%sy, 0.0%ni, 56.1%id, 19.5%wa, 0.0%hi, 0.0%si, 0.0%st Cpu10 : 40.7%us, 0.0%sy, 0.0%ni, 56.8%id, 2.5%wa, 0.0%hi, 0.0%si, 0.0%st Cpu11 : 19.5%us, 1.2%sy, 0.0%ni, 65.9%id, 12.2%wa, 0.0%hi, 1.2%si, 0.0%st Cpu12 : 40.2%us, 1.2%sy, 0.0%ni, 56.1%id, 2.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu13 : 82.7%us, 0.0%sy, 0.0%ni, 17.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu14 : 55.4%us, 0.0%sy, 0.0%ni, 43.4%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st Cpu15 : 86.6%us, 0.0%sy, 0.0%ni, 13.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu16 : 61.0%us, 1.2%sy, 0.0%ni, 37.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu17 : 29.3%us, 1.2%sy, 0.0%ni, 69.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu18 : 18.8%us, 0.0%sy, 0.0%ni, 52.5%id, 28.8%wa, 0.0%hi, 0.0%si, 0.0%st Cpu19 : 14.3%us, 1.2%sy, 0.0%ni, 57.1%id, 27.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu20 : 12.3%us, 0.0%sy, 0.0%ni, 59.3%id, 28.4%wa, 0.0%hi, 0.0%si, 0.0%st Cpu21 : 10.7%us, 0.0%sy, 0.0%ni, 76.2%id, 11.9%wa, 0.0%hi, 1.2%si, 0.0%st Cpu22 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu23 : 10.8%us, 2.4%sy, 0.0%ni, 71.1%id, 15.7%wa, 0.0%hi, 0.0%si, 0.0%st Note that in case of “order by” we will need to manually sort the results, however, sorting 10-100 rows will be fast. Conclusion Splitting a complex report into multiple queries and running it in parallel (asynchronously) can increase performance (3x to 10x in the above example) and will better utilize modern hardware. It is also possible to split the queries between multiple MySQL servers (i.e. MySQL slave servers) to further increase scalability (will require more coding).

April 25, 2015

by Peter Zaitsev

· 13,204 Views

Agrona Event Counters

Efficient open source event counters from the Agrona library. Agrona The Agrona library is an open source Java library of utility code. Unlike libraries such as Google Guava or Apache Commons which are general purpose Java utility libraries, Agrona is targeted at providing high performance code. It initially consists of code from the open source Aeron messaging library. Event Counters One of the features of the Agrona library is the event counters framework. One of the design goals of Aeron was to be easy to monitor. We wanted to make sure that people could easily check up on what Aeron is doing with services such as Nagios, internally written monitoring software or just from the commandline. Writing integrations with many services is a herculean task in and of itself, but we definitely wanted to be able to expose an API. We also didn't want to incorporate large 3rd party external dependencies, any allocation heavy code or things we couldn't control the performance of. This meant that we were going to have to write our own event counters rather than using something like the Coda Hale metrics framework. Our requirements for monitoring were very simple though. Update or increment the counter's value. Read or write to/from the counter value from a different thread or process. No garbage creation after initial setup. Labels should be associated with each counter value for readability's sake. Design Threadsafe updates of a long value is a very simple operation and already supported in Java through the AtomicLong class. The problem with using an AtomicLong as your event counter is that an external program, running in a different process, can't read from the value from your Java heap. Consequently Agrona's event counters are allocated in an off-heap buffer. This can be placed on a memory-mapped file which means it can be shared between two different processes. In order to give our counters names we store a table of name and counter id entries on another buffer. This can be placed on the same memory mapped file for convenience. API The CountersManager is responsible for allocating event counters. It needs to be instantiated with the buffers upon which to store the event counters and their labels. Here is an example of how to instantiate a counter from the counters manager. AtomicCounter conductorProxyFails = countersManager.newCounter("Failed offers to DriverConductorProxy"); You can also iterate over the current counter names and their ids. Here is some code that uses that to print a table of the event counter values: countersManager.forEach((id, label)->{finalint offset =CountersManager.counterOffset(id);finallong value = valuesBuffer.getLongVolatile(offset);System.out.format("%3d: %,20d - %s\n", id, value, label);}); Each instance of AtomicCounter represents one event counter. Here are some examples of using the atomic counter in code. // Increment the counter in a thread-safe manner conductorProxyFails.increment();// Increment the counter if you're only writing from a single thread conductorProxyFails.orderedIncrement();// atomically add 5 to the counter value conductorProxyFails.add(5);// Reset the counter conductorProxyFails.set(0); Conclusions I've just gone through a few simple examples of how to use the event counters from Agrona, which hopefully you've found useful. This isn't the only code in Agrona though - there are utilities for agents, and executing timing events as well as collections such as queues, ringbuffers and hashmaps. We're also expanding the library which is already on maven central. Currently documentation is a little bit thin on the ground, but contributions are always welcome. Thanks to Martin Thompson and Chris West for feedback on this blog post.

April 23, 2015

by Richard Warburton

· 5,871 Views

AutoCompleteTextBox in C# Windows Form Application

In this Article, We will learn how to create AutoCompleteTextBox using C# Windows Form Application. In my previous article, we learned How to Search Records in DataGridView Using C#. Let's Begin. Create a new Windows Form Application. Drop a Label and TextBox Control from the ToolBox. Now go to Code behind file(.cs code) and add the following Code: using System; using System.Windows.Forms; namespace AutoCompleteTextBoxDemo { public partial class Form1 : Form { public Form1() { InitializeComponent(); } //AutoCompleteData Method private void autoCompleteData() { //Set AutoCompleteSource property of txt_StateName as CustomSource txt_StateName.AutoCompleteSource = AutoCompleteSource.CustomSource; //Set AutoCompleteMode property of txt_StateName as SuggestAppend. SuggestAppend Applies both Suggest and Append txt_StateName.AutoCompleteMode = AutoCompleteMode.SuggestAppend; txt_StateName.AutoCompleteCustomSource.AddRange(new string[]{"Maharastra","Andhra Pradesh","Assam","Punjab","Arunachal Pradesh","Bihar","Goa","Gujarat","Haryana"}); } private void Form1_Load(object sender, EventArgs e) { autoCompleteData(); } } } In the preeceding code, We set the AutoCompleteSource, AutoCompleteMode and AutoCompleteCustomSource properties of Textbox named as txt_StateName so that it automatically completes the input string. Preview: AutoComplete TextBox using a Database: In this example, We will Suggest/Append the data in TextBox(txt_StateName) from the Database. For Demonstration, I have created a Database (named Sample). Add a Table, tbl_State. The following is the table schema for creating tbl_State. Add the following lines of code: using System; using System.Windows.Forms; using System.Data.SqlClient; namespace AutoCompleteTextBoxDemo { public partial class Form2 : Form { public Form2() { InitializeComponent(); } private void autoCompleteData() { SqlConnection con = new SqlConnection("Data Source=.;Initial Catalog=Sample;Integrated Security=true;"); SqlCommand com = new SqlCommand("Select State from tbl_State", con); con.Open(); SqlDataReader rdr = com.ExecuteReader(); //AutoCompleteStringCollection Contains a collection of strings to use for the auto-complete feature on certain Windows Forms controls. AutoCompleteStringCollection autoCompleteCollection = new AutoCompleteStringCollection(); while (rdr.Read()) { autoCompleteCollection.Add(rdr.GetString(0)); } //Set AutoCompleteSource property of txt_StateName as CustomSource txt_StateName.AutoCompleteSource = AutoCompleteSource.CustomSource; //Set AutoCompleteMode property of txt_StateName as SuggestAppend. SuggestAppend Applies both Suggest and Append txt_StateName.AutoCompleteMode = AutoCompleteMode.SuggestAppend; txt_StateName.AutoCompleteCustomSource = autoCompleteCollection; con.Close(); } //Form2_Load Event private void Form2_Load(object sender, EventArgs e) { autoCompleteData(); } } } Preview: Hope you like it. Thanks.

April 23, 2015

by Anoop Kumar Sharma

· 7,606 Views

How to Work with Merged Cells in Word Documents Table inside Android Apps

This technical tip shows how developers can work with merged cells in a Word documents inside Android applications. Several cells in a table can be merged together into a single cell. This is useful when crows require a title or large blocks of text which span across the width of the table. This can only be achieved by merging cells in the table into a single cell. Aspose.Words supports merged cells when working with all input formats including when importing HTML content. In Aspose.Words, merged cells are represented by CellFormat.HorizontalMerge and CellFormat.VerticalMerge. The CellFormat.HorizontalMerge property describes if the cell is part of a horizontal merge of cells. Likewise the CellFormat.VerticalMerge property describes if the cell is a part of a vertical merge of cells. The values of these properties are what define the merge behavior of cells. The first cell in a sequence of merged cells will have CellMerge.First. Any subsequent merged cells has CellMerge.Previous. A cell which is not merged has CellMerge.None. Sometimes when you load an existing document cells in a table will appear merged. However these can be in fact one long cell. Microsoft Word at times is known to export merged cells in this way. This can cause confusion when attempting to work with individual cells. There appears to be no particular pattern as to when this happens. //Checking if a Cell is Merged // Prints the horizontal and vertical merge type of a cell. public void checkCellsMerged() throws Exception { Document doc = new Document(getMyDir() + "Table.MergedCells.doc"); // Retrieve the first table in the document. Table table = (Table)doc.getChild(NodeType.TABLE, 0, true); for (Row row : table.getRows()) { for (Cell cell : row.getCells()) { System.out.println(printCellMergeType(cell)); } } } public String printCellMergeType(Cell cell) { boolean isHorizontallyMerged = cell.getCellFormat().getHorizontalMerge() != CellMerge.NONE; boolean isVerticallyMerged = cell.getCellFormat().getVerticalMerge() != CellMerge.NONE; String cellLocation = MessageFormat.format("R{0}, C{1}", cell.getParentRow().getParentTable().indexOf(cell.getParentRow()) + 1, cell.getParentRow().indexOf(cell) + 1); if (isHorizontallyMerged && isVerticallyMerged) return MessageFormat.format("The cell at {0} is both horizontally and vertically merged", cellLocation); else if (isHorizontallyMerged) return MessageFormat.format("The cell at {0} is horizontally merged.", cellLocation); else if (isVerticallyMerged) return MessageFormat.format("The cell at {0} is vertically merged", cellLocation); else return MessageFormat.format("The cell at {0} is not merged", cellLocation); } //Merging Cells in a Table //Creates a table with two rows with cells in the first row horizontally merged. Document doc = new Document(); DocumentBuilder builder = new DocumentBuilder(doc); builder.insertCell(); builder.getCellFormat().setHorizontalMerge(CellMerge.FIRST); builder.write("Text in merged cells."); builder.insertCell(); // This cell is merged to the previous and should be empty. builder.getCellFormat().setHorizontalMerge(CellMerge.PREVIOUS); builder.endRow(); builder.insertCell(); builder.getCellFormat().setHorizontalMerge(CellMerge.NONE); builder.write("Text in one cell."); builder.insertCell(); builder.write("Text in another cell."); builder.endRow(); builder.endTable(); //Example: Merging Cells Vertically //Creates a table with two columns with cells merged vertically in the first column. Document doc = new Document(); DocumentBuilder builder = new DocumentBuilder(doc); builder.insertCell(); builder.getCellFormat().setVerticalMerge(CellMerge.FIRST); builder.write("Text in merged cells."); builder.insertCell(); builder.getCellFormat().setVerticalMerge(CellMerge.NONE); builder.write("Text in one cell"); builder.endRow(); builder.insertCell(); // This cell is vertically merged to the cell above and should be empty. builder.getCellFormat().setVerticalMerge(CellMerge.PREVIOUS); builder.insertCell(); builder.getCellFormat().setVerticalMerge(CellMerge.NONE); builder.write("Text in another cell"); builder.endRow(); builder.endTable(); //Merging all Cells in a Range //A method which merges all cells of a table in the specified range of cells /** * Merges the range of cells found between the two specified cells both horizontally and vertically. Can span over multiple rows. */ public static void mergeCells(Cell startCell, Cell endCell) { Table parentTable = startCell.getParentRow().getParentTable(); // Find the row and cell indices for the start and end cell. Point startCellPos = new Point(startCell.getParentRow().indexOf(startCell), parentTable.indexOf(startCell.getParentRow())); Point endCellPos = new Point(endCell.getParentRow().indexOf(endCell), parentTable.indexOf(endCell.getParentRow())); // Create the range of cells to be merged based off these indices. Inverse each index if the end cell if before the start cell. Rectangle mergeRange = new Rectangle(Math.min(startCellPos.x, endCellPos.x), Math.min(startCellPos.y, endCellPos.y), Math.abs(endCellPos.x - startCellPos.x) + 1, Math.abs(endCellPos.y - startCellPos.y) + 1); for (Row row : parentTable.getRows()) { for(Cell cell : row.getCells()) { Point currentPos = new Point(row.indexOf(cell), parentTable.indexOf(row)); // Check if the current cell is inside our merge range then merge it. if (mergeRange.contains(currentPos)) { if (currentPos.x == mergeRange.x) cell.getCellFormat().setHorizontalMerge(CellMerge.FIRST); else cell.getCellFormat().setHorizontalMerge(CellMerge.PREVIOUS); if (currentPos.y == mergeRange.y) cell.getCellFormat().setVerticalMerge(CellMerge.FIRST); else cell.getCellFormat().setVerticalMerge(CellMerge.PREVIOUS); } } } } //Merging Cells between Two Cells // Merges the range of cells between the two specified cells // We want to merge the range of cells found inbetween these two cells. Cell cellStartRange = table.getRows().get(2).getCells().get(2); Cell cellEndRange = table.getRows().get(3).getCells().get(3); // Merge all the cells between the two specified cells into one. mergeCells(cellStartRange, cellEndRange);

April 22, 2015

by David Zondray

· 6,413 Views

Why Elasticsearch is Suitable for Application Log Analytics

Handling Application Logs Enterprise application development using Web technologies has been around for a long time. In recent years we have seen a sharp increase in the deployment of such applications. This is partly due to the proliferation of ecommerce sites, social media sites, mobile application supporting sites, as well as the desire of enterprises to have their applications available 24x7. In most cases, such applications cater to huge load and are deployed on cloud infrastructure. Monitoring deployed applications is increasingly becoming a crucial task, as deployed applications are bound to fail, irrespective of the robust techniques used during development. Whenever an application fails, the most common resolution method starts by examining the application log. If the application has implemented logging properly, the logs can reveal the cause of application failure. Examination of log files is usually done by viewing the file using tools like vi, less, more, tail or grep. Another method is to download the file to a Windows system and viewing it using an editor like Notepad++. Engineers usually scan the log information to look for clues that point to the reasons for failure. Once the cause of failure is identified, suitable action is taken for restoring the application and/or service. The Key to Application Log Analytics This process, of logging onto a remote system and viewing logs is tedious. Additionally, many of the tools do not provide support to make the task of issue identification any simpler. Even when using tools like grep (if we know the pattern), we still need to view the logs in order to go through other information that has been logged, such as the log information that precedes the failure point. While it has always been possible to develop applications to parse application logs, the recent renewed interest in application log analytics is due to the acceptance of NoSQL-like technologies and the availability of standard tools to parse application logs. Though relational databases (RDBMS) have for many years provided the facility to store structured data, they are not well-suited for handling log data, as in many cases, the structure of the logged information is not the same across the file. This does not fit well in the rigidly defined world of an RDBMS. In comparison, NoSQL allows document flexibility and documents with different schemas can be stored in the same database / index / store. The ability to convert log data into a well-defined structure, as well as the ability to search, are key to implement a modern log analytics solution. In this document, we cover how Elasticsearch. Elasticsearch can store documents, giving us the benefit of structured storage without the overheads of a database system. The Suitability of Elasticsearch In the following subsections, we share our views as to why Elasticsearch is a suitable data store for an application log analytics solution. Elasticsearch is part of a popular trio of tools, commonly known as ELK. Of these, L stands for Logstash, the log parser; E stands for Elasticsearch, the document store; and K stands for Kibana, the visualization tool. Storing Documents Logstash can be used to parse plain text data into structured text. Once data has some structure, it becomes easy to find information by enabling search on it. While parsing application logs is not a challenge, the challenge has been in storing the data and enabling search on it. Most prior solutions have used an RDBMS for storage, but the varying structure and textual nature of application logs makes it difficult to use an RDBMS table structure to store data. RDBMSs are not geared toward ‘search’. They are geared for maintaining a ‘single value of truth’ for the data, defining relations between the data, ensuring their consistency and so on. Search is also not a strong point for RDBMSs as they use exact matches for values, while Elasticsearch supports exact matches as well as partial matches. It also supports document scoring, which attaches a confidence factor to the documents located. Elasticsearch supports documents in JSON format and uses the NoSQL philosophy for document storage. This has the advantage of allowing a flexible schema for the data. Unlike an RDBMS, Elasticsearch is a search engine at heart and hence is built for the same. Though Elasticsearch uses NoSQL for storing documents, it does not provide robust methods to update stored data. Not supporting updates is a serious disadvantage in most cases. In the case of application logs, not supporting updates actually works in favour of Elasticsearch. In case of machine logs, updates are not really required. Application logs are generated from a debugging perspective – having data handy for debugging purposes in the event of application crash or incorrect execution. They usually record important events from application execution and provide additional information to allow application developers to identify the reasons for failure. Additionally, existing information in application logs is rarely, if ever, updated. New information is continually being written to the logs, with no need to refer to old information. This plays to Elasticsearch’s strength, which is able to ingest and index new information very quickly. Search One of the easiest ways of locating information from large volumes of logs is to perform a search. Elasticsearch is well suited not only to handle search, it also supports huge volume of data, using distributed computing (implemented using Shards). While Kibana is one of the commonly used tools to display and visualize information stored in Elasticsearch, it is more suited to display standard charts like bar chart, column chart and pie chart. If the features provided by Kibana are not enough, we can always use Elasticsearch’s REST API support and it’s Query DSL (Domain-Specific Language), to search for required information. The Query DSL and the result of the query are in JSON format. Though this format makes it easy for applications to parse and process, users would need a friendly user interface to interact with the data. Handling Voluminous Data Elasticsearch supports distributed search out of the box – using the concept of ‘shards’. A shard is a single Lucene instance and is managed by Elasticsearch. Two types of shards, namely ‘primary shard’ and ‘replica shard’ are supported. By default, a document is first indexed on the primary shard and then on the replica shards. The number of primary shards can be specified, to cater to the expected volume. By default, Elasticsearch creates five shards for an index. But, once the number of primary shards is decided, it cannot be changed. A replica shards are copies the primary shard. They are used to handle fail-over and the increase performance. While performance across voluminous data can be handled by sharding, it is important to note that shards, once created for an index, cannot be changed. Thus, the sharding strategy of the data has to be decided in advance, after an assessment of the data and an estimation of its growth. In the case of application logs, the sharding strategy can be based on the application name, the business unit ID, the application OD or the application’s geolocation, just to name a few. Analytics By storing data in a structure, analytics can be enabled on the data. Not only can application perform a simple search, it is also possible to restrict the search for specific terms or over a specified time period. Structured storage also makes it easier to develop reports with well-defined visualizations, which in turn makes it easy to understand the current state of applications. It is also possible to perform various analytics operations like time series analysis using the timestamp and identification of patterns from the data using machine learning techniques (assuming, we have the right kind of data in the logs). Though Elasticsearch does not provide built-in support for analytics, applications can benefit from its fast search capability and also from its ability to handle voluminous data sets. In Closing One of the main hurdles for application logs has been the ability to search for information from the huge volume of data. By parsing application log files using Logstash, we can convert a flat file into structured data. Structured data, once stored in Elasticsearch, is easier to search and locate. Visualizations and business logic for generating alerts and tickets is easier to develop on structured data. Elasticsearch, which stores and searches documents, along with its ability to scale over huge volume of data, is a good candidate for inclusion in an application log analytics solution.

April 22, 2015

by Bipin Patwardhan

· 11,766 Views · 2 Likes

On Neo4j Indexes, Match and Merge

Neo4j uses schema indexes, constraints, merges, matches, and a variety of other indexing features to help with your NoSQL needs.

April 20, 2015

by Michael Hunger

· 11,866 Views

Profiling MySQL Queries from Performance Schema

[This article was written by Jarvin Real] When optimizing queries and investigating performance issues, MySQL comes with built in support for profiling queries aka SET profiling=1; . This is already awesome and simple to use, but why the PERFORMANCE_SCHEMA alternative? Because profiling will be removed soon (already deprecated on MySQL 5.6 ad 5.7); the built-in profiling capability can only be enabled per session. This means that you cannot capture profiling information for queries running from other connections. If you are using Percona Server, the profiling option forlog_slow_verbosity is a nice alternative, unfortunately, not everyone is using Percona Server. Now, for a quick demo: I execute a simple query and profile it below. Note that all of these commands are executed from a single session to my test instance. mysql> SHOW PROFILES; +----------+------------+----------------------------------------+ | Query_ID | Duration | Query | +----------+------------+----------------------------------------+ | 1 | 0.00011150 | SELECT * FROM sysbench.sbtest1 LIMIT 1 | +----------+------------+----------------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> SHOW PROFILE SOURCE FOR QUERY 1; +----------------------+----------+-----------------------+------------------+-------------+ | Status | Duration | Source_function | Source_file | Source_line | +----------------------+----------+-----------------------+------------------+-------------+ | starting | 0.000017 | NULL | NULL | NULL | | checking permissions | 0.000003 | check_access | sql_parse.cc | 5797 | | Opening tables | 0.000021 | open_tables | sql_base.cc | 5156 | | init | 0.000009 | mysql_prepare_select | sql_select.cc | 1050 | | System lock | 0.000005 | mysql_lock_tables | lock.cc | 306 | | optimizing | 0.000002 | optimize | sql_optimizer.cc | 138 | | statistics | 0.000006 | optimize | sql_optimizer.cc | 381 | | preparing | 0.000005 | optimize | sql_optimizer.cc | 504 | | executing | 0.000001 | exec | sql_executor.cc | 110 | | Sending data | 0.000025 | exec | sql_executor.cc | 190 | | end | 0.000002 | mysql_execute_select | sql_select.cc | 1105 | | query end | 0.000003 | mysql_execute_command | sql_parse.cc | 5465 | | closing tables | 0.000004 | mysql_execute_command | sql_parse.cc | 5544 | | freeing items | 0.000005 | mysql_parse | sql_parse.cc | 6969 | | cleaning up | 0.000006 | dispatch_command | sql_parse.cc | 1874 | +----------------------+----------+-----------------------+------------------+-------------+ 15 rows in set, 1 warning (0.00 sec) To demonstrate how we can achieve the same with Performance Schema, we first identify our current connection id. In the real world, you might want to get the connection/processlist id of the thread you want to watch i.e. from SHOW PROCESSLIST . mysql> SELECT THREAD_ID INTO @my_thread_id -> FROM threads WHERE PROCESSLIST_ID = CONNECTION_ID(); Query OK, 1 row affected (0.00 sec) Next, we identify the bounding EVENT_IDs for the statement stages. We will look for the statement we wanted to profile using the query below from the events_statements_history_long table. Your LIMIT clause may vary depending on how much queries the server might be getting. mysql> SELECT THREAD_ID, EVENT_ID, END_EVENT_ID, SQL_TEXT, NESTING_EVENT_ID -> FROM events_statements_history_long -> WHERE THREAD_ID = @my_thread_id -> AND EVENT_NAME = 'statement/sql/select' -> ORDER BY EVENT_ID DESC LIMIT 3 G *************************** 1. row *************************** THREAD_ID: 13848 EVENT_ID: 419 END_EVENT_ID: 434 SQL_TEXT: SELECT THREAD_ID INTO @my_thread_id FROM threads WHERE PROCESSLIST_ID = CONNECTION_ID() NESTING_EVENT_ID: NULL *************************** 2. row *************************** THREAD_ID: 13848 EVENT_ID: 374 END_EVENT_ID: 392 SQL_TEXT: SELECT * FROM sysbench.sbtest1 LIMIT 1 NESTING_EVENT_ID: NULL *************************** 3. row *************************** THREAD_ID: 13848 EVENT_ID: 353 END_EVENT_ID: 364 SQL_TEXT: select @@version_comment limit 1 NESTING_EVENT_ID: NULL 3 rows in set (0.02 sec) From the results above, we are mostly interested with the EVENT_ID and END_EVENT_ID values from the second row, this will give us the stage events of this particular query from the events_stages_history_long table. mysql> SELECT EVENT_NAME, SOURCE, (TIMER_END-TIMER_START)/1000000000 as 'DURATION (ms)' -> FROM events_stages_history_long -> WHERE THREAD_ID = @my_thread_id AND EVENT_ID BETWEEN 374 AND 392; +--------------------------------+----------------------+---------------+ | EVENT_NAME | SOURCE | DURATION (ms) | +--------------------------------+----------------------+---------------+ | stage/sql/init | mysqld.cc:998 | 0.0214 | | stage/sql/checking permissions | sql_parse.cc:5797 | 0.0023 | | stage/sql/Opening tables | sql_base.cc:5156 | 0.0205 | | stage/sql/init | sql_select.cc:1050 | 0.0089 | | stage/sql/System lock | lock.cc:306 | 0.0047 | | stage/sql/optimizing | sql_optimizer.cc:138 | 0.0016 | | stage/sql/statistics | sql_optimizer.cc:381 | 0.0058 | | stage/sql/preparing | sql_optimizer.cc:504 | 0.0044 | | stage/sql/executing | sql_executor.cc:110 | 0.0008 | | stage/sql/Sending data | sql_executor.cc:190 | 0.0251 | | stage/sql/end | sql_select.cc:1105 | 0.0017 | | stage/sql/query end | sql_parse.cc:5465 | 0.0031 | | stage/sql/closing tables | sql_parse.cc:5544 | 0.0037 | | stage/sql/freeing items | sql_parse.cc:6969 | 0.0056 | | stage/sql/cleaning up | sql_parse.cc:1874 | 0.0006 | +--------------------------------+----------------------+---------------+ 15 rows in set (0.01 sec) As you can see the results are pretty close, not exactly the same but close. SHOW PROFILE shows Duration in seconds, while the results above is in milliseconds. Some limitations to this method though: As we’ve seen it takes a few hoops to dish out the information we need. Because we have to identify the statement we have to profile manually, this procedure may not be easy to port into tools like the sys schema or pstop. Only possible if Performance Schema is enabled (by default its enabled since MySQL 5.6.6, yay!) Does not cover all metrics compared to the native profiling i.e. CONTEXT SWITCHES, BLOCK IO, SWAPS Depending on how busy the server you are running the tests, the sizes of the history tables may be too small, as such you either have to increase or loose the history to early i.e.performance_schema_events_stages_history_long_size variable. Using ps_history might help in this case though with a little modification to the queries. The resulting Duration per event may vary, I would think this may be due to the additional as described on performance_timers table. In any case we hope to get this cleared up as result whenthis bug is fixed.

April 18, 2015

by Peter Zaitsev

· 8,109 Views