DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Microservices Topics

article thumbnail
Apache Camel in Space—Erh, in Docker and Kubernetes and Fancy Fabric8 Web Console
I have just recorded a 5 minute video that demonstrates running an out of stock example from Apache Camel release, the camel-example-servlet packaged as a docker container and running on a kubernetes platform, such as openshift 3. camel-servlet-example scaled up to 3 running containers (pods) which is easy with kubernetes and fabric8 In this video I have already deployed the example and then demonstrates how we can use the fabric8 web console to manage our application. And also connect to the running container and see inside, such as the Camel routes visually as shown above. Then I run a simple bash script from my laptop that sends a HTTP GET to the Camel example and prints the response. The script runs in a endless loop and demonstrates how kubernetes can easily scale up and down multiple Camel containers and load balance across the running containers. And at the end its even self healing when I force killing docker containers. So I suggest to grab a fresh cup of tea or coffee and sit back and play the 5 minutes video. The video is hosted on vimeo and can be seen from this link.
June 29, 2015
by Claus Ibsen
· 2,148 Views · 1 Like
article thumbnail
Docker Events and Docker Metrics Monitoring
Docker deployments can be very dynamic with containers being started and stopped, moved around the YARN or Mesos-managed clusters, having very short life spans (the so-called pets) or long uptimes (aka cattle). Getting insight into the current and historical state of such clusters goes beyond collecting container performance metrics and sending alert notifications. If a container dies or gets paused, for example, you may want to know about it, right? Or maybe you’d want to be able to see that a container went belly up in retrospect when troubleshooting, wouldn’t you? Just two weeks ago we added Docker Monitoring (docker image is right here for your pulling pleasure) to SPM. We didn’t stop there — we’ve now expanded SPM’s Docker support by adding Docker Event collection, charting, and correlation. Every time a container is created or destroyed, started, stopped, or when it dies, spm-agent-docker captures the appropriate event so you can later see what happened where and when, correlate it with metrics, alerts, anomalies — all of which are captured in SPM — or with any other information you have at your disposal. The functionality and the value this brings should be pretty obvious from the annotated screenshot below. Like this post? Please tweet about Docker Events and Docker Metrics Monitoring Know somebody who’d find this post useful? Please let them know… Here’s the list of Docker events SPM Docker monitoring agent currently captures: Version Information on Startup: server-info – created by spm-agent framework with node.js and OS version info on startup docker-info – Docker Version, API Version, Kernel Version on startup Docker Status Events: Container Lifecycle Events like create, exec_create, destroy, export Container Runtime Events like die, exec_start, kill, oom, pause, restart, start, stop, unpause Every time a Docker container emits one of these events spm-agent-docker will capture it in real-time, ship it over to SPM, and you’ll be able to see it as shown in the above screenshot. Oh, and if you’re running CoreOS, you may also want to see how to index CoreOS logs into ELK/Logsene. Why? Because then you can have not only metrics and container events in one place, but also all container and application logs, too! If you’re using Docker, we hope you find this useful! Anything else you’d like us to add to SPM (for Docker or anyother integration)? Leave a comment, ping @sematext, or send us email – tell us what you’d like to get for early Christmas!
June 27, 2015
by Stefan Thies
· 3,177 Views
article thumbnail
Generating CSV-files on .NET
I have project where I need to output some reports as CSV-files. I found a good library called CsvHelper from NuGet and it works perfect for me. After some playing with it I was able to generate CSV-files that were shown correctly in Excel. Here is some sample code and also extensions that make it easier to work with DataTables. Simple report Here’s the simple fragment of code that illustrates how to use CsvHelper. using (var writer = new StreamWriter(Response.OutputStream)) using (var csvWriter = new CsvWriter(writer)) { csvWriter.Configuration.Delimiter = ";"; csvWriter.WriteField("Task No"); csvWriter.WriteField("Customer"); csvWriter.WriteField("Title"); csvWriter.WriteField("Manager"); csvWriter.NextRecord(); foreach (var project in data) { csvWriter.WriteField(project.Code); csvWriter.WriteField(project.CustomerName); csvWriter.WriteField(project.Name); csvWriter.WriteField(project.ProjectManagerName); csvWriter.NextRecord(); } } Of course, you can use other methods to output whole object or object list with one shot. I just needed here custom headers that doesn’t match property names 1:1. Generic helper for DataTable Some of my projects come from service layer as DataTable. I don’t want to add new models or Data Transfer Objects (DTO) with no good reason and DataTable is actually flexible enough if you need to add new fields to report and you want to do it fast. As DataTables are not supported by default (yet?), I wrote simple extension methods that work on DataTable views. When called on DataTable it selects default view automatically. The idea is – you can set filter on default data view and leave out the rows you don’t need. If you just want to show DataTable to screen as table then check out my posting Simple view to display contents of DataTable. public static class CsvHelperExtensions { public static void WriteDataTable(this CsvWriter csvWriter, DataTable table) { WriteDataView(csvWriter, table.DefaultView); } public static void WriteDataView(this CsvWriter csvWriter, DataView view) { foreach (DataColumn col in view.Table.Columns) { csvWriter.WriteField(col.ColumnName); } csvWriter.NextRecord(); foreach (DataRowView row in view) { foreach (DataColumn col in view.Table.Columns) { csvWriter.WriteField(row[col.ColumnName]); } csvWriter.NextRecord(); } } } And here is simple MVC controller action that gets data as DataTable and returns it as CSV-file. The result is CSV-file that opens correctly in Excel. [HttpPost] public void ExportIncomesReport() { var data = // Get DataTable here Response.ContentType = "text/csv"; Response.AddHeader("Content-disposition", "attachment;filename=IncomesReport.csv"); var preamble = Encoding.UTF8.GetPreamble(); Response.OutputStream.Write(preamble, 0, preamble.Length); using (var writer = new StreamWriter(Response.OutputStream)) using (var csvWriter = new CsvWriter(writer)) { csvWriter.Configuration.Delimiter = ";"; csvWriter.WriteDataTable(data); } } One thing to notice – with CsvHelper we have full control over a stream where we write data and this way we can write more performant code. Related Posts .Net Framework 4.0: string.IsNullOrWhiteSpace() method Exporting GridView Data to Excel Code Contracts: Hiding ContractException How to dump object properties My object to object mapper source released The post Generating CSV-files on .NET appeared first on Gunnar Peipman - Programming Blog.
June 26, 2015
by Gunnar Peipman
· 4,710 Views · 1 Like
article thumbnail
7 Things I Didn’t Expect to Hear at Gartner’s IT Ops Summit
Last week’s Gartner IT Operations Strategies & Solutions Summit in Orlando, Fla., was exactly what you’d expect—a place to talk about the IT operations issues impacting some of the largest companies in the world. Even so, there were a few interesting surprises. Among them: 1. Bi-modal is big. Not everyone will succeed. Gartner continued to tell its customers to employ two modes of IT—a traditional, slower moving capability for older, typically internal systems of record; and a high-speed, experimental one for new, typically customer-facing Web and mobile apps. “This is a time of experimentation and innovation,” said Gartner VP and distinguished analyst Chris Howard in his opening keynote. Organizations can’t ignore that there are multiple speeds and they should participate in all. Gartner managing VPRonni Colville added that by 2017, 75% of IT orgs will have this “bi-modal” IT capability. See also: Bi-Modal IT: Gartner Endorses Both Disruptive and Conservative Approaches to Technology However, “50% will make a mess of it,” Colville said. Why? Not necessarily because of technology failings, but more often because of a lack of people skills. 2. IT success is all about people. Donna Scott, also a Gartner VP and distinguished analyst, told her keynote audience that “you will be judged on agility, speed, and innovation.” However, the biggest problems Gartner sees for infrastructure and operations team engagement and innovation are lack of time, company culture that’s not conducive to these approaches, and a lack of business skills in IT. More than half of the people responding to an in-room poll said “people” are the part of IT ops that must change first. Not technology. Gartner research director George Spafford underscored similar issues in large organizations trying to use DevOps at scale: people and “human factors” are the biggest concerns from his in-room poll. All these probably contributed to hiring best-selling author Daniel Pink as a keynote speaker on the opening day of the conference. His focus? Not IT or architecture. Instead, he pounded home the importance of influencing people and selling internally. 3. Big orgs are trying DevOps. But the issues are different at scale. In numerous sessions I saw many hands go up when analysts asked, “Who here is trying DevOps?” Clearly, the approach is getting traction in large companies. But there’s lots of learning still to do. In fact, that was Spafford’s biggest bit of advice. “Always be learning,” he said, “trying to see what works and what breaks, especially at scale.” And, even once you’ve had some initial success, keep learning. “If you’ve done ‪DevOps, stay humble,” he advised. 4. Looking to innovative organizations for ideas … analytics on the rise. Many sessions addressed how large organizations are taking on ideas fostered by smaller, more risk-tolerant companies, and offered advice for doing so successfully. In addition to multiple discussions of DevOps, an entire session was devoted to establishing your own “Genius Bar®—a “walk-up IT support center” as explained in this CIO article. As at previous conferences, Gartner research VP Cameron Haight ran several sessions on lessons learned from firms running massive, Web-scale IT systems. “You need lots of data … and access to it inexpensively,” he said. Some commercial monitoring companies (New Relic included!) got a shout out for taking the lessons of Web scale IT to heart in their offerings. In addition, Haight said, “Analytics are increasingly important for application performance monitoring given the huge amount of data now available.” 5. Cloud: Enterprises want it, but aren’t very good at it yet. Gartner research director Dennis Smith talked through the enterprise’s interest in cloud computing. A huge majority of his in-room poll wanted some mix of both public and private cloud, while only 9% wanted to use only a private cloud environment and a measly 4% were looking to move entirely to the public cloud. The most popular choice (41%) was an 80/20 split between private and public cloud infrastructure. “Enterprises don’t make the dean’s list,” for cloud usage, Smith said, earning no more than a C average in his opinion. Large organizations are doing well at visibility, governance, and delivering standardized stacks, he said, but are less skilled at optimizing for these new environments. Still, Smith said the trends point toward enterprises improving on all fronts. 6. Cloud security can be better than yours. Importantly, Gartner VP and distinguished analyst Neil MacDonald gave the cloud a vote of confidence: noting that, for a variety of reasons, “Well-managed public cloud can be more secure than your own data center.” For example, on-premise software can pose serious security risks, he said, because of “deployment lag” where customers are stuck using software releases with unpatched security vulnerabilities. With a cloud-based Software-as-a-Service (SaaS), security updates can be more quickly rolled out to all customers. But cloud security can be different, requiring a shift to information-level security from OS-level security. Best practices include doing away with a huge pool of all-powerful sysadmins in favor of JEA, or “just enough administration,” where sysadmins have just enough privileges to do their job, and no more. An analogous security practice for compute resources is “least privilege,” where apps and microservices can’t talk to each other unless they specifically need to do so. Audience polling supported MacDonald’s optimistic view of cloud security, which suggests that large enterprises may struggle less with their cloud policies moving forward. 7. Containers: Try ’em! Ahead of this week’s DockerCon in San Francisco, Gartner devoted significant airtime to educating the audience on containers and microservices. My summary of ‪Gartner VP and distinguished analyst Tom Bittman’s advice on containers was simple: Try ’em. Now. Complement them with VMs. ‪And Docker (the company) is important, but not the be-all and end-all in this space. Bittman (copping to some deja vu from Gartner presentations he made on server virtualization 13 years ago) noted that while virtualization has been focused on admin and ops functions, containers are focused on value for developers. But because containers are well suited for driving up VM utilization for workloads that share the same OS, we can expect to see more combinations of containers and server virtualization. Finally, Bittman underscored that Gartner doesn’t see containers having much impact on premise, but making a huge difference in the cloud. That doesn’t necessarily fit with what’s been shown in other research, such as this 2015 State of Containers Survey sponsored by VMblog.com and StackEngine, so we’ll want to watch how this plays out. This is all a lot to digest. The Gartner IT Operations Strategies & Solutions Summitacknowledges the importance of dealing with existing IT systems and practices as well as promising new technologies and thinking, and tries to point a way forward. In fact, Haight had a very good quote about microservices that I thought also served to wrap up the entire event: “If you want to run with the big dogs, you need to rethink application architecture,” he said. That can be very difficult for an enterprise to fully implement … but also very appealing. Note: Al Sargent contributed to this post. All product and company names herein may be trademarks of their registered owners. Server, tortoise and hare, business team, and cloud security images courtesy ofShutterstock.com.
June 24, 2015
by Fredric Paul
· 1,820 Views
article thumbnail
Perforce and Go2Group Integrate Helix SCM Platform with ConnectALL ALM Router
New Integration Provides Seamless Connections Between Perforce Helix and Leading Application Lifecycle Management Systems WOKINGHAM, UK. (June 24, 2015) – Perforce Software, the leader in software configuration management (SCM) and collaboration, and Go2Group, an Atlassian Platinum and Enterprise Expert, today announced the Perforce ConnectALL Adapter. The new adapter for Go2Group’s ConnectALL ALM Router connects Perforce Helix to Application Lifecycle Management (ALM) systems supported by ConnectALL. The companies also announced that they have expanded their partnership, which first began in 2002. “Very few SCMs can handle binary data, and no other SCM solution supports large file formats that scale across globally distributed enterprises like Helix,” said Brett Taylor, president of Go2Group. “Our customers demand future-proof solutions, and with Perforce we know they don’t have to worry about outgrowing their systems—it will serve them well whether they’re a team of 50 or 50,000.” With the Perforce adapter, ConnectALL automatically synchronises data and workflow with other ALM systems and integrates ALM systems components within minutes. “We’re excited to be a part of the ConnectALL ecosystem of adapters and to enable companies to more easily design, configure, synchronise, manage, and monitor their integrations with Perforce,” said Dave Robertson, vice president of Channels at Perforce. “We’re glad to extend our partnership with Go2Group to new technologies and markets.” Go2Group is part of Perforce’s network of sales partners across Europe, the Middle East, Africa, Asia Pacific and India. Perforce partners serve customers in more than 100 countries worldwide. The Perforce ConnectALL Adapter is available for purchase from the Go2Group website.
June 24, 2015
by Fran Cator
· 955 Views
article thumbnail
Craving for reactive awesomeness? Vert.x 3 is now live!
We're pleased to announce the release of Vert.x 3! This is the culmination of over a year's work to bring you what we hope is the most compelling way to write scalable modern applications and services on the JVM using Java, Groovy, Ruby or JavaScript. Explore the new web-site to find out more. Please see the official release announcement for full details.
June 24, 2015
by Tim Fox
· 922 Views
article thumbnail
Hazelcast Cluster Quorum
Originally written by David Brimley. The Death Spiral. A new feature in the 3.5 release of Hazelcast is the Cluster Quorum. In this instance we’re not talking about a Quorum in its traditional distributed systems sense, think of a Cluster Quorum as a kind of gatekeeper, protecting your cluster during times of unexpected member loss. You can use Cluster Quorums to restrict operations on Maps or indeed the entire cluster based upon environmental criteria. This sounds great you say, but I’m still not sure how this can help me? OK. Let's take a look at a scenario… Imagine a cluster that has a very high number of writes to a certain map. We also have other maps that are not updated quite so frequently and all the while we have hundreds of clients all reading from the cluster but not at the same frequency as the data that is entering the system. In normal circumstances if a machine or a number of machines were to die in the cluster we may still have enough memory available to store our data, but the amount of threads available to process requests would be reduced. We now have less cores available and the partition threads in the cluster could quickly become overwhelmed by the one map that is updated rapidly. This could mean other clients becoming starved of threads, unable to service requests. It’s also possible that the remaining members would become so consumed that they’re unable to respond to membership pings, the knock on effect could result in the member being forced out of the cluster on the assumption that it is dead. To protect the rest of the cluster in the event of member loss we need a way to stop the writes to the high frequency map whilst allowing operations to the other data structures. We can then continue to provide a good service to our other users whilst the crashed machines are restored to the cluster. Bring on the Quorum! As of Hazelcast 3.5 we now have the ability to restrict operations on distinct data structures. We do this via a Quorum configuration. We observed that other IMDG products provide Quorums that have protection at a cluster level,we decided to go one step further and provide Quorum protection around data structures as well. In the example below we create a very simple Quorum on the default map. The ‘default’ map in Hazelcast is the configuration used if no other match is found. In this instance no operations will be allowed unless the cluster has a minimum of 3 members. You’ll also note that the Quorum configuration is separate from the Map. This means that you can have multiple Quorums in a cluster attached to many different structures. If the Quorum thresholds are not satisfied then a QuorumException is thrown when we try to interact with the default map in any way. Be it from a client or another member. 3 quorumRuleWithThreeNodes Quorum Functions It’s simple to set up a Quorum check based on cluster size as we’ve seen above, but if you want to make a slightly more complex check you can do this by applying a Quorum Function. QuorumConfig quorumConfig = new QuorumConfig(); quorumConfig.setName("MyQuorum"); quorumConfig.setEnabled(true); quorumConfig.setType(QuorumType.WRITE); quorumConfig.setQuorumFunctionImplementation(new QuorumFunction() { @Override public boolean apply(Collection members) { return (members.size() >= 3) && (someOtherExternalClusterState); } }); In the example above we use Configuration API to set-up the Quorum to disallowwrites if the boolean returned from the QuorumFunction is false. In the function we test if the size of the cluster is greater than 3 and also if a variable namedsomeOtherExternalClusterState is equal true. You now get the idea that by using a function you can test for other state and not just cluster member. Listen In. Another nice feature of Quorums is the ability to listen in to Quorum Events. You can register a new callback interface called not surprisingly a QuorumListener. Quorum listeners are local to the node that they are registered, so they receive only events occurred on that local node. 3 com.company.quorum.ThreeNodeQuorumListener quorumRuleWithThreeNodes The QuorumListener has just one method that is called passing you aQuorumEvent. package com.hazelcast.quorum; import java.util.EventListener; /** * Listener to get notified when a quorum state is changed */ public interface QuorumListener extends EventListener { /** * Called when quorum presence state is changed. * * @param quorumEvent provides information about quorum presence and current member list. */ void onChange(QuorumEvent quorumEvent); } The QuorumEvent itself allows you to determine if a Quorum has been established or if it has been lost via its isPresent() method call. Additionally it provides the required cluster members to form a quorum and also the current membership list. Query the Quorums. Above we saw how we could receive callbacks, but in some cases we may just wish to make an immediate check to see if the Quorum is established or not. We can do this via the QuorumService. HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance(config); QuorumService quorumService = hazelcastInstance.getQuorumService(); Quorum quorum = quorumService.getQuorum(quorumName); boolean quorumPresence = quorum.isPresent(); In Conclusion The Cluster Quorum feature is another important tool for you to manage your cluster. In future versions of Hazelcast there are plans to add other data structures, for example you’ll be able to protect operations against Topics or Queues.
June 24, 2015
by Andrea Echstenkamper
· 2,834 Views · 2 Likes
article thumbnail
New Relic’s Docker Monitoring Now Generally Available
[This article was written by Andrew Marshall] We’ve been talking a lot about Docker over the past few weeks—with good reason. Docker’s explosive growth in popularity within the enterprise has enabled new distributed application architectures and with it a need for app-centric monitoring of your Docker containers within the context of the rest of your infrastructure. We’re thrilled to announce today that New Relic’s Docker monitoring is now generally available to New Relic customers, just in time for DockerCon 2015! (And as we noted last week, New Relic’s Docker monitoring solution has been selected by Docker for its Ecosystem Technology Partner program as a proven container monitoring solution.) Why app-centric monitoring? If you’re a software business using Docker containers, chances are you’ve done so to gain efficiencies from your system resources or portability across environments to shorten the cycle between writing and running code. Either way, adding Docker containers to your app development meant a new tier of infrastructure to monitor, which equated to a “black box” in your data—one that you had no visibility into from a monitoring perspective, Docker monitoring with New Relic is designed to “fix” this lack of monitoring visibility by adding an app-centric view of Docker containers to the existing New Relic Servers interface you already use. Now, instead of having a gap between the application and server monitoring views, we’ve added the ability to see containers with the same “first-class“ experience as you would with virtual machines and servers. You can now drill down from the application (which is really what you care about) to the individual Docker container, and then to the physical server. No more blind spots! As we strive to do with all of our products, we took the approach of “important” over “impressive” when it comes to the container information we provide to users. Based on direct feedback from customers, we’ve tried to take the mystery out of finding the right container to help you get back to developing your applications. As the way people use containers changes over time, we plan to continue to listen to our customers to help shape how we approach Docker container monitoring. Restoring 360-degree view of your application environment One example of how app-centric monitoring can impact a team moving to microservices or distributed application environments is Motus, a mobile workforce management company. Motus has been a New Relic customer for more than four years and recently has been shifting to a microservices architecture with approximately 95% of its production workload now running in Docker containers. While Docker helpd Motus gain speed and agility while reducing infrastructure complexity, the link between the application and what was happening with the container it was running on was broken. During the trial of New Relic’s Docker monitoring, Motus was able to more easily identify which container an app was running on, all the way down to the node. That was a big help when they needed to investigate an issue and determine if a new container was required.. During the beta alone, Motus estimates that using New Relic helped them to reduce the time to investigate and fix problems with its Docker containers by 30%! Motus isn’t just using New Relic to diagnose when a problem occurs. Docker monitoring with New Relic has helped Motus analyze and “right size” its containers for the application to better allocate resources for performance and budget. Get started with New Relic’s Docker monitoring today, for more information, please stop by our booth at DockerCon, June 22-23 in San Francisco! Resources: Motus Docker Monitoring Case Study Docker Monitoring with New Relic Enabling Docker Monitoring with New Relic Docker in the New Relic Community Forum
June 24, 2015
by Fredric Paul
· 1,019 Views
article thumbnail
Percona XtraDB Cluster (PXC): How Many Nodes Do You Need?
Written by Stephane Combaudon. A question I often hear when customers want to set up a production PXC cluster is: “How many nodes should we use?” Three nodes is the most common deployment, but when are more nodes needed? They also ask: “Do we always need to use an even number of nodes?” This is what we’ll clarify in this post. This is all about quorum I explained in a previous post that a quorum vote is held each time one node becomes unreachable. With this vote, the remaining nodes will estimate whether it is safe to keep on serving queries. If quorum is not reached, all remaining nodes will set themselves in a state where they cannot process any query (even reads). To get the right size for you cluster, the only question you should answer is: how many nodes can simultaneously fail while leaving the cluster operational? If the answer is 1 node, then you need 3 nodes: when 1 node fails, the two remaining nodes have quorum. If the answer is 2 nodes, then you need 5 nodes. If the answer is 3 nodes, then you need 7 nodes. And so on and so forth. Remember that group communication is not free, so the more nodes in the cluster, the more expensive group communication will be. That’s why it would be a bad idea to have a cluster with 15 nodes for instance. In general we recommend that you talk to us if you think you need more than 10 nodes. What about an even number of nodes? The recommendation above always specifies odd number of nodes, so is there anything bad with an even number of nodes? Let’s take a 4-node cluster and see what happens if nodes fail: If 1 node fails, 3 nodes are remaining: they have quorum. If 2 nodes fail, 2 nodes are remaining: they no longer have quorum (remember 50% is NOT quorum). Conclusion: availability of a 4-node cluster is no better than the availability of a 3-node cluster, so why bother with a 4th node? The next question is: is a 4-node cluster less available than a 3-node cluster? Many people think so, specifically after reading this sentence from the manual: Clusters that have an even number of nodes risk split-brain conditions. Many people read this as “as soon as one node fails, this is a split-brain condition and the whole cluster stop working”. This is not correct! In a 4-node cluster, you can lose 1 node without any problem, exactly like in a 3-node cluster. This is not better but not worse. By the way the manual is not wrong! The sentence makes sense with its context. There could actually reasons why you might want to have an even number of nodes, but we will discuss that topic in the next section. Quorum with multiple data centers To provide more availability, spreading nodes in several datacenters is a common practice: if power fails in one DC, nodes are available elsewhere. The typical implementation is 3 nodes in 2 DCs: Notice that while this setup can handle any single node failure, it can’t handle all single DC failures: if we lose DC1, 2 nodes leave the cluster and the remaining node has not quorum. You can try with 4, 5 or any number of nodes and it will be easy to convince yourself that in all cases, losing one DC can make the whole cluster stop operating. If you want to be resilient to a single DC failure, you must have 3 DCs, for instance like this: Other considerations Sometimes other factors will make you choose a higher number of nodes. For instance, look at these requirements: All traffic is directed to a single node. The application should be able to fail over to another node in the same datacenter if possible. The cluster must keep operating even if one datacenter fails. The following architecture is an option (and yes, it has an even number of nodes!): Conclusion Regarding availability, it is easy to estimate the number of nodes you need for your PXC cluster. But node failures are not the only aspect to consider: Resilience to a datacenter failure can, for instance, influence the number of nodes you will be using.
June 24, 2015
by Peter Zaitsev
· 1,390 Views
article thumbnail
It's Time to Start Programming (for) Adults
This week we're in Boston at DevNation, an awesome, young (second ever), and relatively intimate (~500 attendees) conference on anything and everything hard-core, cool-and-hot (DevOps, big data, Angular, IoT, you name it), and of course -- since the conference is organized by Red Hat -- totally open-source. So far I've had in-depth conversations with five super-amazing engineers, attended several inspiring keynotes, and chatted with one skilled developer after another. We'll transcribe the deeper interviews shortly, including some on topics totally unrelated to this post. But meanwhile I'd like to offer some thoughts inspired by the first day of the event. The general theme is: we're just beginning to get serious about separation of concerns. The metaphor that keeps popping into my head comes from the first keynote: machines have finally grown up. Imperatives: telling really unintelligent agents what to do (and then they sort of do whatever they please) It is trivial to observe that computers are incredibly stupid. Turing's fundamental paper is about how to figure out whether a theoretical computer will keep calculating the values of a function until the heat death of the universe (okay that's a slight oversimplification). The fact that Edsger Dijkstra felt the need to rail gently against all goto statements in any higher-level language than machine code suggests that, in 1968, far too many computers needed instructions about how to read the instructions that tell them what to do in the first place. Richard Feynman's famous lecture on computer heuristics is the condescension of the man who conceived quantum computing to the level of functional composition (hmmm) and file systems (double sigh). Stupid agents need to be told exactly what to do. Then they need to be told to pay attention to the exact part of the command that tells them exactly what they have been told to do (dude, just goto line 1343 already and shut up). Then they don't do what you told them (optimistically we call this an 'exception'), and then you send them into time out / set a break point and try to figure out where the idiot state muted off the rails. They stare blankly at the wall / variable / register and either do nothing or repeat another unintelligibly wrong result until you notice that your increment is (apparently meaninglessly to you) one bracket too deep. You sigh and tell them what to do again, and after a while they hit age thirty (life-years/debug-hours) and maybe do something useful with their (process-)lives. Well, maybe I'm straining the metaphor a little here, but you get the point because it cuts too close to home. We spend far too much time fixing stupid mistakes that we didn't even know we were making because -- like all actual human beings -- we assumed that the agent we commanded will use their common sense to iron out those few whiffs of, admit it, frank nonsense that our step-by-step instructions will probably always contain. So, at least, goes the imperative programming paradigm. The machine does what you tell it to; and the universe collapses onto itself before the last real number is computed. Functions: reliable, predictable adults Time to give credit where it's due: I'm really just riffing on the metaphor Venkat Subramanian offered in his highly enjoyable keynote on The Joy of Functional Programming yesterday morning His not-so-smart agents -- the 'programmed' of imperative programming -- were toddlers. Since I don't have any kids, I can't presume to understand this experience fully (although I did grow up with three younger brothers..). But the general idea is: imperative programming is tricky because, when you spell everything out super literally, it's very hard to tell exactly why what you thought should happen didn't. Venkat's talk was a whirlwind of functional concepts, from the thrill of immutability to the self-evident utility of memoization. For random (Myers-Briggs?) reasons, the object-oriented paradigm never seemed very intuitive to me -- I've gravitated towards functional style even when the problem domain wasn't actually modeled very well by functions -- but Venkat's side-by-side implementations of simple calculations in OO and functional Java showed the readability delta very clearly. Functional code is beautiful because it looks like its purpose. It tells you flat-out: here is what I do; and then it does it. But immutable functions are also beautiful because they do exactly the same thing every time. I couldn't count on my two year old brother very much at all because given a certain input I had pretty much no idea what would come out. But we all count on our grown-up collaborators to output exactly what they should, given a definite input, predictably and reliably every time. Of course, people also do more than expected -- every intervention of intelligence is an injection of creativity, not generated by the definition of the function -- but at least they do what you need them to do and no less. Containers: grown-ups with good boundaries I'm picking out just one aspect of the resurgent 'joy' of functional programming because the renaissance of containerization (another 'old' technology that is just now really taking off) is, I think, a part of the same shift toward, let's say, treating computers as adults. If functions are reliable agents, then applications in well-defined containers are self-sufficient agents who know exactly what they need from others and neither require nor demand anything more. If apps on dedicated VMs are teenagers negotiating personal boundaries by waking/booting up independently (and taking far too long -- and far too many resources -- to do so, given their meager output) -- or bubble boys, isolated in ways that are unfortunate in order to isolate in ways that are absolutely necessary -- then containerized applications are subway-riders who jam into the train without offending anyone or campers who can live anywhere with just a backpack of just the stuff they need. Of course, subway-riders and campers do more than just not-mess-up. But what's kind of neat about containers is that -- like an adult with good boundaries -- clearly defined bounds and interfaces free up the application / mind to do whatever world-changing thing the developer / human has cooked up. I'll come back to this metaphor in a later article. (Mesh networks, SDN, and ad-hoc computing are all part of the same picture, I think. Kubernetes probably is too, along with event-driven and reactive programming, the actor model, dreams of Smalltalk, and of course REST, at least of the HATEOAS flavor.) But maybe this isn't a good way to think about some of these recent sparks in devworld within a single paradigm -- and maybe my perpetual discomfort with OO is influencing me too much. What do you think?
June 23, 2015
by John Esposito
· 1,935 Views · 1 Like
article thumbnail
AppFabric Coming Apart? 5 Reasons to Move to Redis Labs
Written by Leena Joshi Microsoft recently announced that Microsoft AppFabric 1.1 for Windows Server will be at the end of support on April 2, 2016. Less than a year away! Don’t panic yet, there is another, better solution. Redis is the product of choice for thousands of developers worldwide who want to accelerate their applications. Redis is the fastest growing NoSQL datastore, that runs in-memory and delivers millions of transactions at sub millisecond latencies. Redis Labs provides enterprise class Redis: as a downloadable product Redis Labs Enterprise Cluster (RLEC) or as a seamlessly scalable, highly available service, Redis Cloud. Here are the top 5 reasons to move applications using Microsoft AppFabric to Redis Cloud or Redis Labs Enterprise Cluster : Performance: One of the biggest reasons to move is the blazing fast performance and versatility of Redis. While AppFabric performs simple GET and SET commands, Redis comes with a variety of data structures (strings, lists, hashes, sets, sorted sets) and a sophisticated set of commands, embedded Lua scripting and bit operations that helps you address more than high speed caching – it lets you implement high speed transactions, real time analytics, in-app social functionality, messaging, job and queue management and much more. Scalability: The current AppFabric replacement offered by MS runs on Azure, however it is not a service that scales seamlessly like Redis Cloud. Redis Cloud with 4900+ customers to date is proven to be highly scalable and available and provides all the functionality of Redis with none of the deployment overhead. Also, if you don’t really want a cloud solution, RLEC (Redis Labs Enterprise Cluster) runs on-premises or wherever you are deployed and relieves you of any provisioning, configuring, scaling, clustering, monitoring – it fully automates all those tasks and makes it super-easy to deploy Redis clusters. Portability: AppFabric is Windows and .Net specific while Redis supports many environments and languages.Thanks to our community, Redis supports many languages including Python, Ruby, Java, PHP, Node, C, C# . So, if you have a mixed environment, now you can even extend your use of in-memory, high speed technologies. Built-in Monitoring: Both Redis Cloud and RLEC come with a dashboard to view different operational metrics as well as built-in alerts to receive notification on important events. This is very much more cumbersome with AppFabric and you would have had to use external tools to get the same level of manageability. Ease of use: Redis is simple yet very sophisticated and used by thousands of developers worldwide. It is #3 among NoSQL database adoption (source:DBengines) and #12 among all tools used by developers (source:Stackshare). This means that a talent pool of Redis users and a worldwide community is always there to help! RLEC is free to download and Redis Cloud has a free tier as well – so it is really easy to try out both those products. If you have any questions about migrating from AppFabric to Redis Labs, our solutions consultants are always here to help. Email us at [email protected] and we can set up some time to walk you through the migration! - See more at: https://redislabs.com/blog/appfabric-coming-apart-top-5-reasons-to-move-to-redis#.VYSGzucYGL4
June 22, 2015
by Itamar Haber
· 973 Views
article thumbnail
Devnation Keynote 6/22 #2: The Future of Development with Kubernetes and Docker
From the DevNation Agenda site: You've probably heard a lot about Linux containers and the exciting potential they hold. In this presentation, Matt Hicks will cover how Docker and Kubernetes have evolved to fundamentally change how you will approach development and operations. If you are looking for an understanding of the technology and how it relates to the common roles in IT today, this is the talk to watch. Speaker: Matt Hicks -- Vice President of engineering, Red Hat Matt Hicks is a founding member of the OpenShift by Red Hat team. He has spent more than a decade in software engineering, with a variety of roles in development, operations, architecture, and management. His real expertise is in bridging the gap between developing code and actually running it in production. An expert in IT and cloud-based architectures, he spends his time these days evolving OpenShift to use the power of cloud and make developers more productive.
June 22, 2015
by N A
· 1,088 Views · 2 Likes
article thumbnail
Long-Term Log Analysis with AWS Redshift
You will aggregate a lot of logs over the lifetime of your product and codebase, so it’s important to be able to search through them. In the rare case of a security issue, not having that capability is incredibly painful. You might be able to use services that allow you to search through the logs of the last two weeks quickly. But what if you want to search through the last six months, a year, or even further? That availability can be rather expensive or not even an option at all with existing services. Many hosted log services provide S3 archival support which we can use to build a long-term log analysis infrastructure with AWS Redshift. Recently I’ve set up scripts to be able to create that infrastructure whenever we need it at Codeship. AWS Redshift AWS Redshift is a data warehousing solution by AWS. It has an easy clustering and ingestion mechanism ideal for loading large log files and then searching through them with SQL. As it automatically balances your log files across several machines, you can easily scale up if you need more speed. As I said earlier, looking through large amounts of log files is a relatively rare occasion; you don’t need this infrastructure to be around all the time, which makes it a perfect use case for AWS. Setting Up Your Log Analysis Let’s walk through the scripts that drive our long-term log analysis infrastructure. You can check them out in the flomotlik/redshift-logging GitHub repository. I’ll take you step by step through configuring the whole setup of the environment variables needed, as well as starting the creation of the cluster and searching the logs. But first, let’s get a high-level overview of what the setup script is doing before going into all the different options that you can set: Creates an AWS Redshift cluster. You can configure the number of servers and which server type should be used. Waits for the cluster to become ready. Creates a SQL table inside the Redshift cluster to load the log files into. Ingests all log files into the Redshift cluster from AWS S3. Cleans up the database and prints the psql access command to connect into the cluster. Be sure to check out the script on GitHub before we go into all the different options that you can set through the .env file. Options to set The following is a list of all the options available to you. You can simply copy the .env.template file to .env and then fill in all the options to get picked up. AWS_ACCESS_KEY_ID AWS key of the account that should run the Redshift cluster. AWS_SECRET_ACCESS_KEY AWS secret key of the account that should run the Redshift cluster. AWS_REGION=us-east-1 AWS region the cluster should run in, default us-east-1. Make sure to use the same region that is used for archiving your logs to S3 to have them close. REDSHIFT_USERNAME Username to connect with psql into the cluster. REDSHIFT_PASSWORD Password to connect with psql into the cluster. S3_AWS_ACCESS_KEY_ID AWS key that has access to the S3 bucket you want to pull your logs from. We run the log analysis cluster in our AWS Sandbox account but pull the logs from our production AWS account so the Redshift cluster doesn’t impact production in any way. S3_AWS_SECRET_ACCESS_KEY AWS secret key that has access to the S3 bucket you want to pull your logs from. PORT=5439 Port to connect to with psql. CLUSTER_TYPE=single-node The cluster type can be single-node or multi-node. Multi-node clusters get auto-balanced which gives you more speed at a higher cost. NODE_TYPE Instance type that’s used for the nodes of the cluster. Check out the Redshift Documentation for details on the instance types and their differences. NUMBER_OF_NODES=10 Number of nodes when running in multi-mode. CLUSTER_IDENTIFIER=log-analysis DB_NAME=log-analysis S3_PATH=s3://your_s3_bucket/papertrail/logs/862693/dt=2015 Database format and failed loads When ingesting log statements into the cluster, make sure to check the amount of failed loads that are happening. You might have to edit the database format to fit to your specific log output style. You can debug this easily by creating a single-node cluster first that only loads a small subset of your logs and is very fast as a result. Make sure to have none or nearly no failed loads before you extend to the whole cluster. In case there are issues, check out the documentation of the copy command which loads your logs into the database and the parameters in the setup script for that. Example and benchmarks It’s a quick thing to set up the whole cluster and run example queries against it. For example, I’ll load all of our logs of the last nine months into a Redshift cluster and run several queries against it. I haven’t spent any time on optimizing the table, but you could definitely gain some more speed out of the whole system if necessary. It’s just fast enough already for us out of the box. As you can see here, loading all logs of May — more than 600 million log lines — took only 12 minutes on a cluster of 10 machines. We could easily load more than one month into that 10-machine cluster since there’s more than enough storage available, but for this post, one month is enough. After that, we’re able to search through the history of all of our applications and past servers through SQL. We connect with our psql client and send of SQL queries against the “events’ database. For example, what if we want to know how many build servers reported logs in May: loganalysis=# select count(distinct(source_name)) from events where source_name LIKE 'i-%'; count ------- 801 (1 row) So in May, we had 801 EC2 build servers running for our customers. That query took ~3 seconds to finish. Or let’s say we want to know how many people accessed the configuration page of our main repository (the project ID is hidden with XXXX): loganalysis=# select count(*) from events where source_name = 'mothership' and program LIKE 'app/web%' and message LIKE 'method=GET path=/projects/XXXX/configure_tests%'; count ------- 15 (1 row) So now we know that there were 15 accesses on that configuration page throughout May. We can also get all the details, including who accessed it when through our logs. This could help in case of any security issues we’d need to look into. The query took about 40 seconds to go though all of our logs, but it could be optimized on Redshift even more. Those are just some of the queries you could use to look through your logs, gaining more insight into your customers’ use of your system. And you et all of that with a setup that costs $2.50 an hour, can be shut down immediately, and recreated any time you need access to that data again. Conclusions Being able to search through and learn from your history is incredibly important for building a large infrastructure. You need to be able to look into your history easily, especially when it comes to security issues. With AWS Redshift, you have a great tool in hand that allows you to start an ad hoc analytics infrastructure that’s fast and cheap for short-term reviews. Of course, Redshift can do a lot more as well. Let us know what your processes and tools around logging, storage, and search are in the comments.
June 21, 2015
by Florian Motlik
· 1,449 Views
article thumbnail
Ode to a Workstation
Every now and then I get work done in the home office. I’ve written previously about my setup, but after churning out some solution design today, I sat back and really took some time to appreciate the workspace. I’m really pleased with the configuration, it’s probably the best setup I’ve had in years. The desk is a former QLD police desk from the 1940s, so it wasn’t built for modern computers – not a problem, the cables run down the back which is just a minor annoyance. The keyboard and mouse are gaming varieties so that they perform well – the old Sennheiser (RF) wireless headset has been with me since 2006 and still works very well. The wooden clock (recently reviewed) acts as external speakers, a Bluetooth receiver and has a built in microphone so it can be used as a hands-free option for conference calls. It also features Qi wireless charging capability and also features a thermostat. Under the second monitor is a HDD caddy which supports USB3, and features 4 bays which can be used in parallel. I try to keep the desk reasonably neat, and there’s plenty of space so it doesn’t get too cluttered. I have a nice view out the window to a small courtyard which gets early morning sun.
June 21, 2015
by Rob Sanders
· 1,071 Views
article thumbnail
Diff'ing Software Architecture Diagrams
robert annett wrote a post titled diagrams for system evolution where he describes a simple approach to showing how to visually describe changes to a software architecture. in essence, in order to show how a system is to change, he'll draw different versions of the same diagram and use colour-coding to highlight the elements/relationships that will be added, removed or modified. i've typically used a similar approach for describing as-is and to-be architectures in the past too. it's a technique that works well. although you can version control diagrams, it's still tricky to diff them using a tool. one solution that addresses this problem is to not create diagrams, but instead create a textual description of your software architecture model that is then subsequently rendered with some tooling. you could do this with an architecture description language (such as darwin ) although i would much rather use my regular programming language instead. creating a software architecture model as code this is exactly what structurizr is designed to do. i've recreated robert's diagrams with structurizr as follows. and since the diagrams were created by a model described as java code , that description can be diff'ed using your regular toolchain. code provides opportunities this perhaps isn't as obvious as robert's visual approach, and i would likely still highlight the actual differences on diagrams using notation as robert did too. creating a textual description of a software architecture model does provide some interesting opportunities though.
June 19, 2015
by Simon Brown
· 1,354 Views
article thumbnail
Building Microservices: Using an API Gateway
Learn about using the microservice architecture pattern to build microservices and API gateways--compared to the usage of monolithic application architecture.
June 16, 2015
by Patrick Nommensen
· 121,121 Views · 40 Likes
article thumbnail
Why 12 Factor Application Patterns, Microservices and CloudFoundry Matter (Part 2)
Learn why 12 Factor Application Patterns, Microservices and CloudFoundry matter when trying to change the way your product is produced.
June 12, 2015
by Tim Spann DZone Core CORE
· 15,647 Views · 4 Likes
article thumbnail
Top 80 Thread- Java Interview Questions and Answers (Part 2)
PART 1 > THREADS - Top 80 interview questions and answers (detailed explanation with programs) Question 61. class MyRunnable implements Runnable{ public void run(){ for(int i=0;i<3;i++){ System.out.println("i="+i+" ,ThreadName="+Thread.currentThread().getName()); } } } public class MyClass { public static void main(String...args){ MyRunnable runnable=new MyRunnable(); System.out.println("start main() method"); Thread thread1=new Thread(runnable); Thread thread2=new Thread(runnable); thread1.start(); thread2.start(); System.out.println("end main() method"); } } Answer. Thread behaviour is unpredictable because execution of Threads depends on Thread scheduler, start main() method will be the printed first, but after that we cannot guarantee the order of thread1, thread2 and main thread they might run simultaneously or sequentially, so order of end main() method will not be guaranteed. /*OUTPUT start main() method end main() method i=0 ,ThreadName=Thread-0 i=0 ,ThreadName=Thread-1 i=1 ,ThreadName=Thread-0 i=2 ,ThreadName=Thread-0 i=1 ,ThreadName=Thread-1 i=2 ,ThreadName=Thread-1 */ Question 62. class MyRunnable implements Runnable{ public void run(){ for(int i=0;i<3;i++){ System.out.println("i="+i+" ,ThreadName="+Thread.currentThread().getName()); } } } public class MyClass { public static void main(String...args) throws InterruptedException{ System.out.println("In main() method"); MyRunnable runnable=new MyRunnable(); Thread thread1=new Thread(runnable); Thread thread2=new Thread(runnable); thread1.start(); thread1.join(); thread2.start(); thread2.join(); System.out.println("end main() method"); } } Answer. We use join() methodto ensure all threads that started from main must end in order in which they started and also main should end in last. In other words join() method waited for this thread to die. /*OUTPUT In main() method i=0 ,ThreadName=Thread-0 i=1 ,ThreadName=Thread-0 i=2 ,ThreadName=Thread-0 i=0 ,ThreadName=Thread-1 i=1 ,ThreadName=Thread-1 i=2 ,ThreadName=Thread-1 end main() method */ Question 63. class MyRunnable implements Runnable { public void run() { try { while (!Thread.currentThread().isInterrupted()) { Thread.sleep(1000); System.out.println("x"); } } catch (InterruptedException e) { System.out.println(Thread.currentThread().getName() + " ENDED"); } } } public class MyClass { public static void main(String args[]) throws Exception { MyRunnable obj = new MyRunnable(); Thread t = new Thread(obj, "Thread-1"); t.start(); System.out.println("press enter"); System.in.read(); t.interrupt(); } } Answer. "press enter" will be printed first then thread1 will keep on printing x until enter is pressed, once enter is pressed "Thread-1 ENDED" will be printed. System.in.read() causes main thread to go from running to waiting state (thread waits for user input) /* OUTPUT press enter x x x x Thread-1 ENDED */ Question 64. class MyRunnable implements Runnable{ public void run(){ synchronized (this) { System.out.println("1 "); try { this.wait(); System.out.println("2 "); } catch (InterruptedException e) { e.printStackTrace(); } } } } public class MyClass { public static void main(String[] args) { MyRunnable myRunnable=new MyRunnable(); Thread thread1=new Thread(myRunnable,"Thread-1"); thread1.start(); } } Answer. Thread acquires lock on myRunnable object so 1 was printed but notify wasn't called so 2 will never be printed, this is called frozen process. Deadlock is formed, These type of deadlocksare called Frozen processes. /*OUTPUT 1 */ Question 65. import java.util.ArrayList; /* Producer is producing, Producer will allow consumer to * consume only when 10 products have been produced (i.e. when production is over). */ class Producer implements Runnable{ ArrayList sharedQueue; Producer(){ sharedQueue=new ArrayList(); } @Override public void run(){ synchronized (this) { for(int i=1;i<=3;i++){ //Producer will produce 10 products sharedQueue.add(i); System.out.println("Producer is still Producing, Produced : "+i); try{ Thread.sleep(1000); }catch(InterruptedException e){e.printStackTrace();} } System.out.println("Production is over, consumer can consume."); this.notify(); } } } class Consumer extends Thread{ Producer prod; Consumer(Producer obj){ prod=obj; } public void run(){ synchronized (this.prod) { System.out.println("Consumer waiting for production to get over."); try{ this.prod.wait(); }catch(InterruptedException e){e.printStackTrace();} } int productSize=this.prod.sharedQueue.size(); for(int i=0;i Q61- Q80
June 6, 2015
by Ankit Mittal
· 13,693 Views · 3 Likes
article thumbnail
Mounting an EBS Volume to Docker on AWS Elastic Beanstalk
Mounting an EBS volume to a Docker instance running on Amazon Elastic Beanstalk (EB) is surprisingly tricky. The good news is that it is possible. I will describe how to automatically create and mount a new EBS volume (optionally based on a snapshot). If you would prefer to mount a specific, existing EBS volume, you should check out leg100’s docker-ebs-attach (using AWS API to mount the volume) that you can use either in a multi-container setup or just include the relevant parts in your own Dockerfile. The problem with EBS volumes is that, if I am correct, a volume can only be mounted to a single EC2 instance – and thus doesn’t play well with EB’s autoscaling. That is why EB supports only creating and mounting a fresh volume for each instance. Why would you want to use an auto-created EBS volume? You can already use a docker VOLUME to mount a directory on the host system’s ephemeral storage to make data persistent across docker restarts/redeploys. The only advantage of EBS is that it survives restarts of the EC2 instance but that is something that, I suppose, happens rarely. I suspect that in most cases EB actually creates a new EC2 instance and then destroys the old one. One possible benefit of an EBS volume is that you can take a snapshot of it and use that to launch future instances. I’m now inclined to believe that a better solution in most cases is to set up automatic backup to and restore from S3, f.ex. using duplicity with its S3 backend (as I do for my NAS). Anyway, here is how I got EBS volume mounting working. There are 4 parts to the solution: Configure EB to create an EBS mount for your instances Add custom EB commands to format and mount the volume upon first use Restart the Docker daemon after the volume is mounted so that it will see it (see this discussion) Configure Docker to mount the (mounted) volume inside the container 1-3.: .ebextensions/01-ebs.config: # .ebextensions/01-ebs.config commands: 01format-volume: command: mkfs -t ext3 /dev/sdh test: file -sL /dev/sdh | grep -v 'ext3 filesystem' # ^ prints '/dev/sdh: data' if not formatted 02attach-volume: ### Note: The volume may be renamed by the Kernel, e.g. sdh -> xvdh but # /dev/ will then contain a symlink from the old to the new name command: | mkdir /media/ebs_volume mount /dev/sdh /media/ebs_volume service docker restart # We must restart Docker daemon or it wont' see the new mount test: sh -c "! grep -qs '/media/ebs_volume' /proc/mounts" option_settings: # Tell EB to create a 100GB volume and mount it to /dev/sdh - namespace: aws:autoscaling:launchconfiguration option_name: BlockDeviceMappings value: /dev/sdh=:100 4.: Dockerrun.aws.json and Dockerfile: Dockerrun.aws.json: mount the host’s /media/ebs_volume as /var/easydeploy/share inside the container: { "AWSEBDockerrunVersion": "1", "Volumes": [ { "HostDirectory": "/media/ebs_volume", "ContainerDirectory": "/var/easydeploy/share" } ] } Dockerfile: Tell Docker to use a directory on the host system as /var/easydeploy/share – either a randomly generated one or the one given via the -m mount option to docker run: ... VOLUME ["/var/easydeploy/share"] ...
June 3, 2015
by Jakub Holý
· 14,766 Views
article thumbnail
Ecosystem of Hadoop Animal Zoo
hadoop is best known for map reduce and it's distributed file system (hdfs). recently other productivity tools developed on top of these will form a complete ecosystem of hadoop. most of the projects are hosted under apache software foundation . hadoop ecosystem projects are listed below. hadoop common a set of components and interfaces for distributed file system and i/o (serialization, java rpc, persistent data structures) http://hadoop.apache.org/ hadoop ecosystem hdfs a distributed file system that runs on large clusters of commodity hardware. hadoop distributed file system, hdfs renamed form ndfs. scalable data store that stores semi-structured, un-structured and structured data. http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfsuserguide.html http://wiki.apache.org/hadoop/hdfs map reduce map reduce is the distributed, parallel computing programming model for hadoop. inspired from google map reduce research paper . hadoop includes implementation of map reduce programming model. in map reduce there are two phases, not surprisingly map and reduce. to be precise in between map and reduce phase, there is another phase called sort and shuffle. job tracker in name node machine manages other cluster nodes. map reduce programming can be written in java. if you like sql or other non- java languages, you are still in luck. you can use utility called hadoop streaming. http://wiki.apache.org/hadoop/hadoopmapreduce hadoop streaming a utility to enable map reduce code in many languages like c, perl, python, c++, bash etc., examples include a python mapper and awk reducer. http://hadoop.apache.org/docs/r1.2.1/streaming.html avro a serialization system for efficient, cross-language rpc and persistent data storage. avro is a framework for performing remote procedure calls and data serialization. in the context of hadoop, it can be used to pass data from one program or language to another, e.g. from c to pig. it is particularly suited for use with scripting languages such as pig, because data is always stored with its schema in avro. http://avro.apache.org/ apache thrift apache thrift allows you to define data types and service interfaces in a simple definition file. taking that file as input, the compiler generates code to be used to easily build rpc clients and servers that communicate seamlessly across programming languages. instead of writing a load of boilerplate code to serialize and transport your objects and invoke remote methods, you can get right down to business. http://thrift.apache.org/ hive and hue if you like sql, you would be delighted to hear that you can write sql and hive convert it to a map reduce job. but, you don't get a full ansi-sql environment. hue gives you a browser based graphical interface to do your hive work. hue features a file browser for hdfs, a job browser for map reduce/yarn, an hbase browser, query editors for hive, pig, cloudera impala and sqoop2.it also ships with an oozie application for creating and monitoring workflows, a zookeeper browser and an sdk. pig a high-level programming data flow language and execution environment to do map reduce coding the pig language is called pig latin. you may find naming conventions some what un-conventional, but you get incredible price-performance and high availability. https://pig.apache.org/ jaql jaql is a functional, declarative programming language designed especially for working with large volumes of structured, semi-structured and unstructured data. as its name implies, a primary use of jaql is to handle data stored as json documents, but jaql can work on various types of data. for example, it can support xml, comma-separated values (csv) data and flat files. a "sql within jaql" capability lets programmers work with structured sql data while employing a json data model that's less restrictive than its structured query language counterparts. 1. jaql in google code 2. what is jaql? by ibm sqoop sqoop provides a bi-directional data transfer between hadoop -hdfs and your favorite relational database. for example you might be storing your app data in relational store such as oracle, now you want to scale your application with hadoop so you can migrate oracle database data to hadoop hdfs using sqoop. http://sqoop.apache.org/ oozie manages hadoop workflow. this doesn't replace your scheduler or BPM tooling, but it will provide if-then-else branching and control with hadoop jobs. https://oozie.apache.org/ zookeeper a distributed, highly available coordination service. zookeeper provides primitives such as distributed locks that can be used for building the highly scalable applications. it is used to manage synchronization for cluster. http://zookeeper.apache.org/ hbase based on google's bigtable , hbase "is an open-source, distributed, version, column-oriented store" that sits on top of hdfs. a super scalable key-value store. it works very much like a persistent hash-map (for python developers think like a dictionary). it is not a conventional relational database. it is a distributed, column oriented database. hbase uses hdfs for it's underlying. supports both batch-style computations using map reduce and point queries for random reads. https://hbase.apache.org/ cassandra a column oriented nosql data store which offers scalability, high availability with out compromising on performance. it perfect platform for commodity hardware and cloud infrastructure.cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for de-normalization and materialized views , and powerful built-in caching. http://cassandra.apache.org/ flume a real time loader for streaming your data into hadoop. it stores data in hdfs and hbase.flume "channels" data between "sources" and "sinks" and its data harvesting can either be scheduled or event-driven. possible sources for flume include avro, files, and system logs, and possible sinks include hdfs and hbase. http://flume.apache.org/ mahout machine learning for hadoop, used for predictive analytics and other advanced analysis. there are currently four main groups of algorithms in mahout: recommendations, a.k.a. collective filtering classification, a.k.a categorization clustering frequent item set mining, a.k.a parallel frequent pattern mining mahout is not simply a collection of pre-existing algorithms; many machine learning algorithms are intrinsically non-scalable; that is, given the types of operations they perform, they cannot be executed as a set of parallel processes. algorithms in the mahout library belong to the subset that can be executed in a distributed fashion. http://en.wikipedia.org/wiki/list_of_machine_learning_algorithms https://www.coursera.org/course/machlearning https://mahout.apache.org/ fuse makes the hdfs system to look like a regular file system so that you can use ls, rm, cd etc., directly on hdfs data. whirr apache whirr is a set of libraries for running cloud services. whirr provides a cloud-neutral way to run services. you don't have to worry about the idiosyncrasies of each provider.a common service api. the details of provisioning are particular to the service. smart defaults for services. you can get a properly configured system running quickly, while still being able to override settings as needed. you can also use whirr as a command line tool for deploying clusters. https://whirr.apache.org/ giraph an open source graph processing api like pregel from google https://giraph.apache.org/ chukwa chukwa, an incubator project on apache, is a data collection and analysis system built on top of hdfs and map reduce. tailored for collecting logs and other data from distributed monitoring systems, chukwa provides a workflow that allows for incremental data collection, processing and storage in hadoop. it is included in the apache hadoop distribution as an independent module. https://chukwa.apache.org/ drill apache drill, an incubator project on apache, is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. drill is the open source version of google's dremel system which is available as an iaas service called google big query. one explicitly stated design goal is that drill is able to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds. http://incubator.apache.org/drill/ impala (cloudera) released by cloudera, impala is an open-source project which, like apache drill, was inspired by google's paper on dremel; the purpose of both is to facilitate real-time querying of data in hdfs or hbase. impala uses an sql-like language that, though similar to hiveql, is currently more limited than hiveql. because impala relies on the hive meta store, hive must be installed on a cluster in order for impala to work. the secret behind impala's speed is that it "circumvents map reduce to directly access the data through a specialized distributed query engine that is very similar to those found in commercial parallel rdbmss." (source: cloudera) http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html http://training.cloudera.com/elearning/impala/
June 3, 2015
by Umashankar Ankuri
· 23,879 Views · 3 Likes
  • Previous
  • ...
  • 267
  • 268
  • 269
  • 270
  • 271
  • 272
  • 273
  • 274
  • 275
  • 276
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×