DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Tools Topics

article thumbnail
Apache Camel in Space—Erh, in Docker and Kubernetes and Fancy Fabric8 Web Console
I have just recorded a 5 minute video that demonstrates running an out of stock example from Apache Camel release, the camel-example-servlet packaged as a docker container and running on a kubernetes platform, such as openshift 3. camel-servlet-example scaled up to 3 running containers (pods) which is easy with kubernetes and fabric8 In this video I have already deployed the example and then demonstrates how we can use the fabric8 web console to manage our application. And also connect to the running container and see inside, such as the Camel routes visually as shown above. Then I run a simple bash script from my laptop that sends a HTTP GET to the Camel example and prints the response. The script runs in a endless loop and demonstrates how kubernetes can easily scale up and down multiple Camel containers and load balance across the running containers. And at the end its even self healing when I force killing docker containers. So I suggest to grab a fresh cup of tea or coffee and sit back and play the 5 minutes video. The video is hosted on vimeo and can be seen from this link.
June 29, 2015
by Claus Ibsen
· 2,148 Views · 1 Like
article thumbnail
Stackato on the Microsoft Azure Cloud
The growth of Azure has been outstanding--more than 90,000 new subscriptions every month. And the innovation is exponential with over 500 new features and services being added to the platform in the last 12 months. We're very excited to be part of this growth. As we announced yesterday, you can now access Stackato through Azure. We think it's a great way for Azure customers to get access to a Cloud Foundry and Docker based PaaS. With Azure, Microsoft provides an easy path to the cloud for their customers. All applications can be run on one cloud. Microsoft wants to dominate the cloud the same as it has with on-premise software and rarely does a day go by without reading an article about Azure. Whether it's their recent announcement to help encourage start-up's use of Azure by providing $120,000 worth of credits per year or their commitment to open source. Azure gives its customers a growing collection of integrated services that make it easier to build and manage enterprise, mobile, web and Internet of Things (IoT) apps faster. Enterprises face real complexities when building their cloud solution. Having a solid infrastructure is really just the first step in the process--companies also need the right platform to support the deployment and management of their cloud-native applications. The platform should give their developers the freedom to use the language best suited to build the application. In addition, enterprises are on more than one cloud. They need to have the versatility to scale out or move their applications to whatever cloud is appropriate in order to meet end user demand without any downtime. With Stackato, we help remove these complexities. We provide enterprises with a polyglot PaaS that supports the development of applications in virtually any language. We like to refer to Stackato as being "infrastructure-agnostic" and allow companies to deploy their applications to any cloud--private, public or hybrid--without the need to run new scripts or re-package the application in order for it to work in the new environment. The combination of Stackato on Azure gives enterprises the technology they need to streamline application delivery, drive innovation and meet the demands of their customers.
June 29, 2015
by Kathy Thomas
· 926 Views
article thumbnail
R: Scraping the Release Dates of Github Projects
Continuing on from my blog post about scraping Neo4j’s release dates I thought it’d be even more interesting to chart the release dates of some github projects. In theory the release dates should be accessible through the github API but the few that I looked at weren’t returning any data so I scraped the data together. We’ll be using rvest again and I first wrote the following function to extract the release versions and dates from a single page: library(dplyr) library(rvest) process_page = function(releases, session) { rows = session %>% html_nodes("ul.release-timeline-tags li") for(row in rows) { date = row %>% html_node("span.date") version = row %>% html_node("div.tag-info a") if(!is.null(version) && !is.null(date)) { date = date %>% html_text() %>% str_trim() version = version %>% html_text() %>% str_trim() releases = rbind(releases, data.frame(date = date, version = version)) } } return(releases) } Let’s try it out on the Cassandra release page and see what it comes back with: > r = process_page(data.frame(), html_session("https://github.com/apache/cassandra/releases")) > r date version 1 Jun 22, 2015 cassandra-2.1.7 2 Jun 22, 2015 cassandra-2.0.16 3 Jun 8, 2015 cassandra-2.1.6 4 Jun 8, 2015 cassandra-2.2.0-rc1 5 May 19, 2015 cassandra-2.2.0-beta1 6 May 18, 2015 cassandra-2.0.15 7 Apr 29, 2015 cassandra-2.1.5 8 Apr 1, 2015 cassandra-2.0.14 9 Apr 1, 2015 cassandra-2.1.4 10 Mar 16, 2015 cassandra-2.0.13 That works pretty well but it’s only one page! To get all the pages we can use the follow_link function to follow the ‘Next’ link until there aren’t anymore pages to process. We end up with the following function to do this: find_all_releases = function(starting_page) { s = html_session(starting_page) releases = data.frame() next_page = TRUE while(next_page) { possibleError = tryCatch({ releases = process_page(releases, s) s = s %>% follow_link("Next") }, error = function(e) { e }) if(inherits(possibleError, "error")){ next_page = FALSE } } return(releases) } Let’s try it out starting from the Cassandra page: > cassandra = find_all_releases("https://github.com/apache/cassandra/releases") Navigating to https://github.com/apache/cassandra/releases?after=cassandra-2.0.13 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-2.0.10 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-2.0.8 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.2.13 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-2.0.0-rc1 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.2.3 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.2.0-beta2 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.0.10 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.0.6 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.0.0-rc2 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.7.7 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.7.4 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.7.0-rc3 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.6.4 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.5.0-rc3 Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.4.0-final > cassandra %>% sample_n(10) date version 151 Mar 13, 2010 cassandra-0.5.0-rc2 25 Jul 3, 2014 cassandra-1.2.18 51 Jul 27, 2013 cassandra-1.2.8 21 Aug 19, 2014 cassandra-2.1.0-rc6 73 Sep 24, 2012 cassandra-1.2.0-beta1 158 Mar 13, 2010 cassandra-0.4.0-rc2 113 May 20, 2011 cassandra-0.7.6-2 15 Oct 24, 2014 cassandra-2.1.1 103 Sep 15, 2011 cassandra-1.0.0-beta1 93 Nov 29, 2011 cassandra-1.0.4 I want to plot when the different releases happened in time and in order to do that we need to create an extra column containing the ‘release series’ which we can do with the following transformation: series = function(version) { parts = strsplit(as.character(version), "\\.") return(unlist(lapply(parts, function(p) paste(p %>% unlist %>% head(2), collapse = ".")))) } bySeries = cassandra %>% mutate(date2 = mdy(date), series = series(version), short_version = gsub("cassandra-", "", version), short_series = series(short_version)) > bySeries %>% sample_n(10) date version date2 series short_version short_series 3 Jun 8, 2015 cassandra-2.1.6 2015-06-08 cassandra-2.1 2.1.6 2.1 161 Mar 13, 2010 cassandra-0.4.0-beta1 2010-03-13 cassandra-0.4 0.4.0-beta1 0.4 62 Feb 15, 2013 cassandra-1.1.10 2013-02-15 cassandra-1.1 1.1.10 1.1 153 Mar 13, 2010 cassandra-0.5.0-beta2 2010-03-13 cassandra-0.5 0.5.0-beta2 0.5 37 Feb 7, 2014 cassandra-2.0.5 2014-02-07 cassandra-2.0 2.0.5 2.0 36 Feb 7, 2014 cassandra-1.2.15 2014-02-07 cassandra-1.2 1.2.15 1.2 29 Jun 2, 2014 cassandra-2.1.0-rc1 2014-06-02 cassandra-2.1 2.1.0-rc1 2.1 21 Aug 19, 2014 cassandra-2.1.0-rc6 2014-08-19 cassandra-2.1 2.1.0-rc6 2.1 123 Feb 16, 2011 cassandra-0.7.2 2011-02-16 cassandra-0.7 0.7.2 0.7 135 Nov 1, 2010 cassandra-0.7.0-beta3 2010-11-01 cassandra-0.7 0.7.0-beta3 0.7 Now let’s plot those releases and see what we get: ggplot(aes(x = date2, y = short_series), data = bySeries %>% filter(!grepl("beta|rc", short_version))) + geom_text(aes(label=short_version),hjust=0.5, vjust=0.5, size = 4, angle = 90) + theme_bw() An interesting thing we can see from this visualisation is what overlap the various series of versions have. Most of the time there are only two series of versions overlapping but the 1.2, 2.0 and 2.1 series all overlap which is unusual. In this chart we excluded all beta and RC versions. Let’s bring those back in and just show the last 3 versions: ggplot(aes(x = date2, y = short_series), data = bySeries %>% filter(grepl("2\\.[012]\\.|1\\.2\\.", short_version))) + geom_text(aes(label=short_version),hjust=0.5, vjust=0.5, size = 4, angle = 90) + theme_bw() From this chart it’s clearer that the 2.0 and 2.1 series have recent releases so there will probably be three overlapping versions when the 2.2 series is released as well. The chart is still a bit cluttered although less than before. I’m not sure of a better way of visualising this type of data so if you have any ideas do let me know!
June 29, 2015
by Mark Needham
· 1,414 Views
article thumbnail
Scraping Github Pull Requests and Their Code Review Comments
Github stores its pull-request and code review data in MySql. I’d much prefer a git reperentation for both (JSON, commits, audit trail, etc). Kinda the way Github Wiki pages are stored. That’s an aside though, this article is about storing code-review comments long term. The problem I’m trying to solve is one of deletion of users thich causes their pull requenst commentary to also get deleted. Sure the commits make it back to the origin/master (in the pull request is processed), but many things are left assoctaed with the fork. If the user gets deleted such info is gone forever :( I want a permanent copy, so the interim answer is to scrape the data I fear losing, while it still exists. Hence a scrape-pull-requests.sh bash script (for Mac and maybe Linux). Github’s portal is written in Ruby on Rails. It is extremely fast which helps scraping generally. There’s not a lot of JavaScript and that means that Wget is a viable extraction tool. Anyway the script runs quickly, and leaves a decent HTML interface for easy access later. I’ve tested it, but won’t leave up a scraped set of pull-requests as our GH overlords might object on copyright grounds. They can’t object for your own GithubEnterprise instance of course. Github could change the structure of their HTML, and the script might stop work so well.If that happens I’m happy to accept back pull-requests via the usual mechanism :)
June 29, 2015
by Paul Hammant
· 1,993 Views
article thumbnail
Spark Grows Up and Scales Out
Written by Craig Wentworth. To understand the furor that’s greeted recent vendor announcements around open source analytics computing engine Spark, and some commentary seemingly setting up a Spark versus Hadoop battle, it’s worth taking a moment to recap on what each actually is (and is not). As I covered in last year’s MWD report on Hadoop and its family of tools, when people talk about Apache Hadoop they’re often referring to a whole framework of tools designed to facilitate distributed parallel processing of large datasets. That processing was traditionally confined to MapReduce batch jobs in Hadoop’s early days, though Hadoop 2 brought the YARN resource scheduler and opened up Hadoop to streaming, real-time querying and a wider array of analytical programming applications (beyond MapReduce). Spark has been designed to run on top of Hadoop’s Distributed File System (amongst other data platforms) as an alternative to MapReduce – tuned for real-time streaming data processing and fast interactive queries, and with multi-genre analytics applicability (machine learning, time series, graph, SQL, streaming out-of-the-box). It gets that speed advantage by caching in-memory (rather than writing interim results to disk, as MapReduce does), but with that approach comes a need for higher-spec physical machines (compared with MapReduce’s tolerance for commodity hardware). So, Spark isn’t about to replace Hadoop -- but it may well supplant MapReduce (especially in growing real-time use cases). Those “Spark vs Hadoop” headlines are about as meaningful as one proclaiming “mushrooms vs pizza." Yes, mushroom might be a more suitable topping than, say, pepperoni (especially in a vegetarian use case), but it’ll still be deployed on the same dough and tomato sauce pizza platform. Nobody’s about to suggest the mushroom should go it alone! But what’s behind the headlines and the hype is a story of enterprise adoption – or at least vendors anticipating that adoption and investing in ‘the weaponization of Spark’ as it faces the more exacting standards of security, scaling performance, consistency, etc. which come with mainstream enterprise deployment. Big names like IBM, Databricks (the company formed by the originators of Spark), and MapR made commitments in and around the Spark Summit earlier this month. MapR has announced three new Quick Start Solutions for its Hadoop distribution to help customers get started with Spark in real-time security log analytics, genome sequencing, and time series analytics; Databricks’ cloud-hosted Spark platform (formerly known as Databricks Cloud) has become generally available; and IBM announced a raft of measures designed to give Spark a significant shot in the arm – it’s open sourcing its SystemML technology to bolster Spark’s machine learning capabilities, integrating Spark into its own analytics platforms, investing in Spark training and education, committing 3,500 of its researchers and developers to work on Spark-related projects, and offering Spark as a service on its Bluemix developer cloud. Given the overlap with Databricks’ business model (of offering development, certification, and support for Spark), IBM’s intentions are likely to tread on some toes before long – but for now, at least, both companies are content to focus on the combined push benefiting the Spark community and its enterprise aspirations overall (though clearly IBM’s betting on all this investment buying it some influence over where Spark goes next). It’s worth bearing in mind that not all its supporters champion Spark wholesale and all the interested parties tend to be interested in particular bits of Spark (as wide-ranging as it is) because of overlaps with their own preferred toolsets. For instance, although Spark supports many analytics genres, Cloudera focuses on its machine learning capabilities (as it has its own SQL-on-Hadoop tool in Impala), and MapR and Hortonworks also promote Drill and Hive as their favoured source of SQL-on-Hadoop. IBM’s support is focused on Spark’s machine learning and in-memory capabilities (hence the SystemML open sourcing news). In the face of such strong vendor preferences, how long before some of Spark’s current features fall away (or at least start to show the effects of being starved of as much care and feeding as is bestowed upon vendors’ favourite Spark components)? The Spark community is at much the same place the Hadoop one was at a while back – it’s showing great promise and suitability in key growth workloads (in Spark’s case, such as real-time IoT applications). However, the product as it stands is too immature for many enterprise tastes. Cue enterprise software vendors stepping up to help grow Spark up fast. Their challenge though is to smooth out the edges without smothering what made it so interesting in the first place.
June 28, 2015
by Angela Ashenden
· 2,343 Views
article thumbnail
Docker Events and Docker Metrics Monitoring
Docker deployments can be very dynamic with containers being started and stopped, moved around the YARN or Mesos-managed clusters, having very short life spans (the so-called pets) or long uptimes (aka cattle). Getting insight into the current and historical state of such clusters goes beyond collecting container performance metrics and sending alert notifications. If a container dies or gets paused, for example, you may want to know about it, right? Or maybe you’d want to be able to see that a container went belly up in retrospect when troubleshooting, wouldn’t you? Just two weeks ago we added Docker Monitoring (docker image is right here for your pulling pleasure) to SPM. We didn’t stop there — we’ve now expanded SPM’s Docker support by adding Docker Event collection, charting, and correlation. Every time a container is created or destroyed, started, stopped, or when it dies, spm-agent-docker captures the appropriate event so you can later see what happened where and when, correlate it with metrics, alerts, anomalies — all of which are captured in SPM — or with any other information you have at your disposal. The functionality and the value this brings should be pretty obvious from the annotated screenshot below. Like this post? Please tweet about Docker Events and Docker Metrics Monitoring Know somebody who’d find this post useful? Please let them know… Here’s the list of Docker events SPM Docker monitoring agent currently captures: Version Information on Startup: server-info – created by spm-agent framework with node.js and OS version info on startup docker-info – Docker Version, API Version, Kernel Version on startup Docker Status Events: Container Lifecycle Events like create, exec_create, destroy, export Container Runtime Events like die, exec_start, kill, oom, pause, restart, start, stop, unpause Every time a Docker container emits one of these events spm-agent-docker will capture it in real-time, ship it over to SPM, and you’ll be able to see it as shown in the above screenshot. Oh, and if you’re running CoreOS, you may also want to see how to index CoreOS logs into ELK/Logsene. Why? Because then you can have not only metrics and container events in one place, but also all container and application logs, too! If you’re using Docker, we hope you find this useful! Anything else you’d like us to add to SPM (for Docker or anyother integration)? Leave a comment, ping @sematext, or send us email – tell us what you’d like to get for early Christmas!
June 27, 2015
by Stefan Thies
· 3,169 Views
article thumbnail
Geek Reading Week of June 26, 2015
Leading today, VentureBeat reports on the quiet launch of Google’s Cloud Source Repositories. This seems like something we should have heard more about, but I don’t remember seeing anything about it. Amazon AWS announces the availability of all things Alexa, the Skills Kit, the Voice Service and a Fund. Last but not least, we have AJ Kohn, from Blind Five Year Old, talking about click-through rate being a ranking signal on Google’s search results. I don’t talk about SEO much, but reading AJ’s work is always fascinating. As always, enjoy today’s items, and please participate in the discussions on these sites. Top Stories Google has quietly launched a GitHub competitor, Cloud Source Repositories | VentureBeat Alexa Skills Kit, Alexa Voice Service, Alexa Fund | AWS Official Blog Startups, Career and Process Why offices are where work goes to die | Swizec Teller Unleashing the power of small teams | Andreas Papathanasis What Is A Tester? | Developsense Blog What happens when you stop relying on resumes | Aline Lerner Design and Development Swift 2: SIMD | Russ Bishop Why is Git better than Mercurial? | Javalobby Create a Maven archetype | Javalobby pip -t: A simple and transparent alternative to virtualenv | Zoomer Analytics Killing Off Wasabi – Part 1 | Fog Creek Blog WebAssembly- Explained | Modus Create Generating JSON Schema from XSD with JAXB and Jackson | Inspired by Actual Events AI, Machine Learning, Research and Advanced Algorithms Applying Machine Learning to Text Mining with Amazon S3 and RapidMiner | Amazon AWS Big Data, Visualization, SQL and NoSQL Is Click Through Rate A Ranking Signal? | Blind Five Year Old Cache-friendly binary search | Bannalia Discovering the Computer Science Behind Postgres Indexes | Java Code Geeks How an open-source competitive benchmark helped to improve databases | ArangoDB Security, Encryption and Cryptography Cracking JXcore… Again | Mark Haase Link Collections The Daily Six Pack: June 25, 2015 | Dirk Strauss Double Shot #1517 | A Fresh Cup Dew Drop – June 25, 2015 (#2042) | Morning Dew The Daily Six Pack: June 26, 2015 | Dirk Strauss
June 27, 2015
by Robert Diana
· 989 Views
article thumbnail
CryTek's CryEngine 3.8.1 Released: Updates Include Linux and VR Support
CryTek just released its CryEngine 3.8.1, an update packed with features. The CryEngine website dubs this the heftiest upgrade since debuting their Engine-as-a-Service in May. Among the many new features is virtual reality (VR) support, OpenGL compatibility, and Linux support. One of the biggest trends in gaming is VR, and CryEngine 3.8.1 adds VR support. Initially, it’s limited to the Oculus Rift, but chances are as more headsets emerge and see adoption among both developers and gamers, compatibility will expand to support these as well. Epic Games’ Unreal Engine also added VR support, along with Unity3D’s most recent release. There’s a neat VR demo on the CryTek website, which developers will surely want to check out. Another significant change is the addition of Linux support. While Wine and Playonlinux have both helped many games to run on a variety of Linux-based operating systems (OSes). However, native Linux support means easier use for developers. As more games add Linux compatibility, spearheaded by Steam’s SteamOS, the CryEngine itself can now be run on Linux. This latest update means that the CryEngine will join Unity3D, the Unreal Engine, and Source as a powerful game development engine with Linux and VR support. Virtual reality is seeing widespread adoption among the developer community, and Linux compatibility in gaming is a huge trend. While CryEngine 3.8.1’s ability to run on Linux won’t necessarily mean games developed will be compatible on the popular open source OS, it certainly makes it easier to ensure Linux support. There’s also OpenGL support, which will also aid cross-platform development. The CryEngine has been used to create many games known for their gorgeous eye-candy. Notably, the CryTek’s aptly names Crysis series is built with the CryEngine, as is Ryse: Son of Rome, State of Decay, and Kingdom Come: Deliverance.
June 26, 2015
by Moe Long DZone Core CORE
· 1,277 Views
article thumbnail
Spring Integration Kafka 1.2 is Available, With 0.8.2 Support and Performance Enhancements
Spring Integration Kafka 1.2 is out with a major performance overhaul.
June 25, 2015
by Pieter Humphrey
· 2,960 Views
article thumbnail
7 Things I Didn’t Expect to Hear at Gartner’s IT Ops Summit
Last week’s Gartner IT Operations Strategies & Solutions Summit in Orlando, Fla., was exactly what you’d expect—a place to talk about the IT operations issues impacting some of the largest companies in the world. Even so, there were a few interesting surprises. Among them: 1. Bi-modal is big. Not everyone will succeed. Gartner continued to tell its customers to employ two modes of IT—a traditional, slower moving capability for older, typically internal systems of record; and a high-speed, experimental one for new, typically customer-facing Web and mobile apps. “This is a time of experimentation and innovation,” said Gartner VP and distinguished analyst Chris Howard in his opening keynote. Organizations can’t ignore that there are multiple speeds and they should participate in all. Gartner managing VPRonni Colville added that by 2017, 75% of IT orgs will have this “bi-modal” IT capability. See also: Bi-Modal IT: Gartner Endorses Both Disruptive and Conservative Approaches to Technology However, “50% will make a mess of it,” Colville said. Why? Not necessarily because of technology failings, but more often because of a lack of people skills. 2. IT success is all about people. Donna Scott, also a Gartner VP and distinguished analyst, told her keynote audience that “you will be judged on agility, speed, and innovation.” However, the biggest problems Gartner sees for infrastructure and operations team engagement and innovation are lack of time, company culture that’s not conducive to these approaches, and a lack of business skills in IT. More than half of the people responding to an in-room poll said “people” are the part of IT ops that must change first. Not technology. Gartner research director George Spafford underscored similar issues in large organizations trying to use DevOps at scale: people and “human factors” are the biggest concerns from his in-room poll. All these probably contributed to hiring best-selling author Daniel Pink as a keynote speaker on the opening day of the conference. His focus? Not IT or architecture. Instead, he pounded home the importance of influencing people and selling internally. 3. Big orgs are trying DevOps. But the issues are different at scale. In numerous sessions I saw many hands go up when analysts asked, “Who here is trying DevOps?” Clearly, the approach is getting traction in large companies. But there’s lots of learning still to do. In fact, that was Spafford’s biggest bit of advice. “Always be learning,” he said, “trying to see what works and what breaks, especially at scale.” And, even once you’ve had some initial success, keep learning. “If you’ve done ‪DevOps, stay humble,” he advised. 4. Looking to innovative organizations for ideas … analytics on the rise. Many sessions addressed how large organizations are taking on ideas fostered by smaller, more risk-tolerant companies, and offered advice for doing so successfully. In addition to multiple discussions of DevOps, an entire session was devoted to establishing your own “Genius Bar®—a “walk-up IT support center” as explained in this CIO article. As at previous conferences, Gartner research VP Cameron Haight ran several sessions on lessons learned from firms running massive, Web-scale IT systems. “You need lots of data … and access to it inexpensively,” he said. Some commercial monitoring companies (New Relic included!) got a shout out for taking the lessons of Web scale IT to heart in their offerings. In addition, Haight said, “Analytics are increasingly important for application performance monitoring given the huge amount of data now available.” 5. Cloud: Enterprises want it, but aren’t very good at it yet. Gartner research director Dennis Smith talked through the enterprise’s interest in cloud computing. A huge majority of his in-room poll wanted some mix of both public and private cloud, while only 9% wanted to use only a private cloud environment and a measly 4% were looking to move entirely to the public cloud. The most popular choice (41%) was an 80/20 split between private and public cloud infrastructure. “Enterprises don’t make the dean’s list,” for cloud usage, Smith said, earning no more than a C average in his opinion. Large organizations are doing well at visibility, governance, and delivering standardized stacks, he said, but are less skilled at optimizing for these new environments. Still, Smith said the trends point toward enterprises improving on all fronts. 6. Cloud security can be better than yours. Importantly, Gartner VP and distinguished analyst Neil MacDonald gave the cloud a vote of confidence: noting that, for a variety of reasons, “Well-managed public cloud can be more secure than your own data center.” For example, on-premise software can pose serious security risks, he said, because of “deployment lag” where customers are stuck using software releases with unpatched security vulnerabilities. With a cloud-based Software-as-a-Service (SaaS), security updates can be more quickly rolled out to all customers. But cloud security can be different, requiring a shift to information-level security from OS-level security. Best practices include doing away with a huge pool of all-powerful sysadmins in favor of JEA, or “just enough administration,” where sysadmins have just enough privileges to do their job, and no more. An analogous security practice for compute resources is “least privilege,” where apps and microservices can’t talk to each other unless they specifically need to do so. Audience polling supported MacDonald’s optimistic view of cloud security, which suggests that large enterprises may struggle less with their cloud policies moving forward. 7. Containers: Try ’em! Ahead of this week’s DockerCon in San Francisco, Gartner devoted significant airtime to educating the audience on containers and microservices. My summary of ‪Gartner VP and distinguished analyst Tom Bittman’s advice on containers was simple: Try ’em. Now. Complement them with VMs. ‪And Docker (the company) is important, but not the be-all and end-all in this space. Bittman (copping to some deja vu from Gartner presentations he made on server virtualization 13 years ago) noted that while virtualization has been focused on admin and ops functions, containers are focused on value for developers. But because containers are well suited for driving up VM utilization for workloads that share the same OS, we can expect to see more combinations of containers and server virtualization. Finally, Bittman underscored that Gartner doesn’t see containers having much impact on premise, but making a huge difference in the cloud. That doesn’t necessarily fit with what’s been shown in other research, such as this 2015 State of Containers Survey sponsored by VMblog.com and StackEngine, so we’ll want to watch how this plays out. This is all a lot to digest. The Gartner IT Operations Strategies & Solutions Summitacknowledges the importance of dealing with existing IT systems and practices as well as promising new technologies and thinking, and tries to point a way forward. In fact, Haight had a very good quote about microservices that I thought also served to wrap up the entire event: “If you want to run with the big dogs, you need to rethink application architecture,” he said. That can be very difficult for an enterprise to fully implement … but also very appealing. Note: Al Sargent contributed to this post. All product and company names herein may be trademarks of their registered owners. Server, tortoise and hare, business team, and cloud security images courtesy ofShutterstock.com.
June 24, 2015
by Fredric Paul
· 1,820 Views
article thumbnail
Literate Programming and GitHub
I remain captivated by the ideals of Literate Programming. My fork of PyLit (https://github.com/slott56/PyLit-3) coupled with Sphinx seems to handle LP programming in a very elegant way. It works like this. Write RST files describing the problem and the solution. This includes the actual implementation code. And everything else that's relevant. Run PyLit3 to build final Python code from the RST documentation. This should include the setup.py so that it can be installed properly. Run Sphinx to build pretty HTML pages (and LaTeX) from the RST documentation. I often include the unit tests along with the sphinx build so that I'm sure that things are working. The challenge is final presentation of the whole package. The HTML can be easy to publish, but it can't (trivially) be used to recover the code. We have to upload two separate and distinct things. (We could use BeautifulSoup to recover RST from HTML and then PyLit to rebuild the code. But that sounds crazy.) The RST is easy to publish, but hard to read and it requires a pass with PyLit to emit the code and then another pass with Sphinx to produce the HTML. A single upload doesn't work well. If we publish only the Python code we've defeated the point of literate programming. Even if we focus on the Python, we need to do a separate upload of HTML to providing the supporting documentation. After working with this for a while, I've found that it's simplest to have one source and several targets. I use RST ⇒ (.py, .html, .tex). This encourages me to write documentation first. I often fail, and have blocks of code with tiny summaries and non-existent explanations. PyLit allows one to use .py ⇒ .rst ⇒ .html, .tex. I've messed with this a bit and don't like it as much. Code first leaves the documentation as a kind of afterthought. How can we publish simply and cleanly: without separate uploads? Enter GitHub and gh-pages. See the "sphinxdoc-test" project for an example. Also thishttps://github.com/daler/sphinxdoc-test. The bulk of this is useful advice on a way to create the gh-pages branch from your RST source via Sphinx and some GitHub commands. Following this line of thinking, we almost have the case for three branches in a LP project. The "master" branch with the RST source. And nothing more. The "code" branch with the generated Python code created by PyLit. The "gh-pages" branch with the generated HTML created by Sphinx. I think I like this. We need three top-level directories. One has RST source. A build script would run PyLit to populate the (separate) directory for the code branch. The build script would also run Sphinx to populate a third top-level directory for the gh-pages branch. The downside of this shows up when you need to create a branch for a separate effort. You have a "some-major-change" branch to master. Where's the code? Where's the doco? You don't want to commit either of those derived work products until you merge the "some-major-change" back into master. GitHub Literate Programming There are many LP projects on GitHub. There are perhaps a dozen which focus on publishing with the Github-flavored Markdown as the source language. Because Markdown is about as easy to parse as RST, the tooling is simple. Because Markdown lacks semantic richness, I'm not switching. I've found that semantically rich markup is essential. This is a key feature of RST. It's carried forward by Sphinx to create very sophisticated markup. Think:code:`sample` vs. :py:func:`sample` vs. :py:mod:`sample` vs.:py:exc:`sample`. The final typesetting may be similar, but they are clearly semantically distinct and create separate index entries. A focus on Markdown seems to be a limitation. It's encouraging to see folks experiment with literate programming using Markdown and GitHub. Perhaps other folks will look at more sophisticated markup languages like RST. Previous Exercises See https://sourceforge.net/projects/stingrayreader/ for a seriously large literate programming effort. The HTML is also hosted at SourceForge: http://stingrayreader.sourceforge.net/index.html. This project is awkward because -- well -- I have to do a separate FTP upload of the finished pages after a change. It's done with a script, not a simple "git push." SourceForge has a GitHub repository. https://sourceforge.net/p/stingrayreader/code/ci/master/tree/. But. SourceForge doesn't use GitHub.com's UI, so it's not clear if it supports the gh-pages feature. I assume it doesn't, but, maybe it does. (I can't even login to SourceForge with Safari... I should really stop using SourceForge and switch to GitHub.) See https://github.com/slott56/HamCalc-2.1 for another complex, LP effort. This predates my dim understanding of the gh-pages branch, so it's got HTML (in doc/build/html), but it doesn't show it elegantly. I'm still not sure this three-branch Literate Programming approach is sensible. My first step should probably be to rearrange the PyLit3 project into this three-branch structure.
June 24, 2015
by Steven Lott
· 1,963 Views
article thumbnail
New Relic’s Docker Monitoring Now Generally Available
[This article was written by Andrew Marshall] We’ve been talking a lot about Docker over the past few weeks—with good reason. Docker’s explosive growth in popularity within the enterprise has enabled new distributed application architectures and with it a need for app-centric monitoring of your Docker containers within the context of the rest of your infrastructure. We’re thrilled to announce today that New Relic’s Docker monitoring is now generally available to New Relic customers, just in time for DockerCon 2015! (And as we noted last week, New Relic’s Docker monitoring solution has been selected by Docker for its Ecosystem Technology Partner program as a proven container monitoring solution.) Why app-centric monitoring? If you’re a software business using Docker containers, chances are you’ve done so to gain efficiencies from your system resources or portability across environments to shorten the cycle between writing and running code. Either way, adding Docker containers to your app development meant a new tier of infrastructure to monitor, which equated to a “black box” in your data—one that you had no visibility into from a monitoring perspective, Docker monitoring with New Relic is designed to “fix” this lack of monitoring visibility by adding an app-centric view of Docker containers to the existing New Relic Servers interface you already use. Now, instead of having a gap between the application and server monitoring views, we’ve added the ability to see containers with the same “first-class“ experience as you would with virtual machines and servers. You can now drill down from the application (which is really what you care about) to the individual Docker container, and then to the physical server. No more blind spots! As we strive to do with all of our products, we took the approach of “important” over “impressive” when it comes to the container information we provide to users. Based on direct feedback from customers, we’ve tried to take the mystery out of finding the right container to help you get back to developing your applications. As the way people use containers changes over time, we plan to continue to listen to our customers to help shape how we approach Docker container monitoring. Restoring 360-degree view of your application environment One example of how app-centric monitoring can impact a team moving to microservices or distributed application environments is Motus, a mobile workforce management company. Motus has been a New Relic customer for more than four years and recently has been shifting to a microservices architecture with approximately 95% of its production workload now running in Docker containers. While Docker helpd Motus gain speed and agility while reducing infrastructure complexity, the link between the application and what was happening with the container it was running on was broken. During the trial of New Relic’s Docker monitoring, Motus was able to more easily identify which container an app was running on, all the way down to the node. That was a big help when they needed to investigate an issue and determine if a new container was required.. During the beta alone, Motus estimates that using New Relic helped them to reduce the time to investigate and fix problems with its Docker containers by 30%! Motus isn’t just using New Relic to diagnose when a problem occurs. Docker monitoring with New Relic has helped Motus analyze and “right size” its containers for the application to better allocate resources for performance and budget. Get started with New Relic’s Docker monitoring today, for more information, please stop by our booth at DockerCon, June 22-23 in San Francisco! Resources: Motus Docker Monitoring Case Study Docker Monitoring with New Relic Enabling Docker Monitoring with New Relic Docker in the New Relic Community Forum
June 24, 2015
by Fredric Paul
· 1,019 Views
article thumbnail
Git for Windows, Getting Invalid Username or Password with Wincred
if you use https to communicate with your git repository, es, github or visualstudioonline, you usually setup credential manager to avoid entering credential for each command that contact the server. with latest versions of git you can configure wincred with this simple command. git config --global credential.helper wincred this morning i start getting error while i’m trying to push some commits to github. $ git push remote: invalid username or password. fatal: authentication failed for 'https://github.com/proximosrl/jarvis.documents tore.git/' if i remove credential helper (git config –global credential.helper unset) everything works, git ask me for user name and password and i’m able to do everything, but as soon as i re-enable credential helper, the error returned. this problem is probably originated by some corruption of stored credentials, and usually you can simply clear stored credentials and at the next operation you will be prompted for credentials and everything starts worked again. the question is, where are stored credential for wincred? if you use wincred for credential.helper, git is storing your credentials in standard windows credential manager you can simply open credential manager on your computer, figure 1: credential manager in your control panel settings opening credential manager you can manage windows and web credentials. now simply have a look to both web credentials and windows credentials, and delete everything related to github or the server you are using. the next time you issue a git command that requires authentication, you will be prompted for credentials again and the credentials will be stored again in the store. gian maria.
June 23, 2015
by Ricci Gian Maria
· 20,951 Views
article thumbnail
You're Invited: MuleSoft Forum in Gurgaon, Mumbai & Bangalore!!!
Don’t miss the opportunity to engage in an interactive discussion with your peers around digital transformation at MuleSoft Forum, and learn why an API-led connectivity has become IT's secret weapon. Register a seat for yourself using the following links: Gurgaon - 9th July Mumbai - 14th July Bangalore - 16th July What's in store at MuleSoft Forum: Whether you're the CIO, an Enterprise Architect, or Developer, MuleSoft Forum will provide you with a number of innovative ways to improve and transform your business. At MuleSoft Forum, you'll have the opportunity to: Learn how Mule ESB Enterprise Edition offers reliability, performance, scalability, security – all out-of-the box. See MuleSoft's connectivity platform in action in a short live demo Discover how API-Led Connectivity enables major business initiatives Attend the exclusive MuleSoft ecosystem networking reception
June 23, 2015
by Selvin Raj
· 1,355 Views · 1 Like
article thumbnail
It's Time to Start Programming (for) Adults
This week we're in Boston at DevNation, an awesome, young (second ever), and relatively intimate (~500 attendees) conference on anything and everything hard-core, cool-and-hot (DevOps, big data, Angular, IoT, you name it), and of course -- since the conference is organized by Red Hat -- totally open-source. So far I've had in-depth conversations with five super-amazing engineers, attended several inspiring keynotes, and chatted with one skilled developer after another. We'll transcribe the deeper interviews shortly, including some on topics totally unrelated to this post. But meanwhile I'd like to offer some thoughts inspired by the first day of the event. The general theme is: we're just beginning to get serious about separation of concerns. The metaphor that keeps popping into my head comes from the first keynote: machines have finally grown up. Imperatives: telling really unintelligent agents what to do (and then they sort of do whatever they please) It is trivial to observe that computers are incredibly stupid. Turing's fundamental paper is about how to figure out whether a theoretical computer will keep calculating the values of a function until the heat death of the universe (okay that's a slight oversimplification). The fact that Edsger Dijkstra felt the need to rail gently against all goto statements in any higher-level language than machine code suggests that, in 1968, far too many computers needed instructions about how to read the instructions that tell them what to do in the first place. Richard Feynman's famous lecture on computer heuristics is the condescension of the man who conceived quantum computing to the level of functional composition (hmmm) and file systems (double sigh). Stupid agents need to be told exactly what to do. Then they need to be told to pay attention to the exact part of the command that tells them exactly what they have been told to do (dude, just goto line 1343 already and shut up). Then they don't do what you told them (optimistically we call this an 'exception'), and then you send them into time out / set a break point and try to figure out where the idiot state muted off the rails. They stare blankly at the wall / variable / register and either do nothing or repeat another unintelligibly wrong result until you notice that your increment is (apparently meaninglessly to you) one bracket too deep. You sigh and tell them what to do again, and after a while they hit age thirty (life-years/debug-hours) and maybe do something useful with their (process-)lives. Well, maybe I'm straining the metaphor a little here, but you get the point because it cuts too close to home. We spend far too much time fixing stupid mistakes that we didn't even know we were making because -- like all actual human beings -- we assumed that the agent we commanded will use their common sense to iron out those few whiffs of, admit it, frank nonsense that our step-by-step instructions will probably always contain. So, at least, goes the imperative programming paradigm. The machine does what you tell it to; and the universe collapses onto itself before the last real number is computed. Functions: reliable, predictable adults Time to give credit where it's due: I'm really just riffing on the metaphor Venkat Subramanian offered in his highly enjoyable keynote on The Joy of Functional Programming yesterday morning His not-so-smart agents -- the 'programmed' of imperative programming -- were toddlers. Since I don't have any kids, I can't presume to understand this experience fully (although I did grow up with three younger brothers..). But the general idea is: imperative programming is tricky because, when you spell everything out super literally, it's very hard to tell exactly why what you thought should happen didn't. Venkat's talk was a whirlwind of functional concepts, from the thrill of immutability to the self-evident utility of memoization. For random (Myers-Briggs?) reasons, the object-oriented paradigm never seemed very intuitive to me -- I've gravitated towards functional style even when the problem domain wasn't actually modeled very well by functions -- but Venkat's side-by-side implementations of simple calculations in OO and functional Java showed the readability delta very clearly. Functional code is beautiful because it looks like its purpose. It tells you flat-out: here is what I do; and then it does it. But immutable functions are also beautiful because they do exactly the same thing every time. I couldn't count on my two year old brother very much at all because given a certain input I had pretty much no idea what would come out. But we all count on our grown-up collaborators to output exactly what they should, given a definite input, predictably and reliably every time. Of course, people also do more than expected -- every intervention of intelligence is an injection of creativity, not generated by the definition of the function -- but at least they do what you need them to do and no less. Containers: grown-ups with good boundaries I'm picking out just one aspect of the resurgent 'joy' of functional programming because the renaissance of containerization (another 'old' technology that is just now really taking off) is, I think, a part of the same shift toward, let's say, treating computers as adults. If functions are reliable agents, then applications in well-defined containers are self-sufficient agents who know exactly what they need from others and neither require nor demand anything more. If apps on dedicated VMs are teenagers negotiating personal boundaries by waking/booting up independently (and taking far too long -- and far too many resources -- to do so, given their meager output) -- or bubble boys, isolated in ways that are unfortunate in order to isolate in ways that are absolutely necessary -- then containerized applications are subway-riders who jam into the train without offending anyone or campers who can live anywhere with just a backpack of just the stuff they need. Of course, subway-riders and campers do more than just not-mess-up. But what's kind of neat about containers is that -- like an adult with good boundaries -- clearly defined bounds and interfaces free up the application / mind to do whatever world-changing thing the developer / human has cooked up. I'll come back to this metaphor in a later article. (Mesh networks, SDN, and ad-hoc computing are all part of the same picture, I think. Kubernetes probably is too, along with event-driven and reactive programming, the actor model, dreams of Smalltalk, and of course REST, at least of the HATEOAS flavor.) But maybe this isn't a good way to think about some of these recent sparks in devworld within a single paradigm -- and maybe my perpetual discomfort with OO is influencing me too much. What do you think?
June 23, 2015
by John Esposito
· 1,930 Views · 1 Like
article thumbnail
Neo4j: The Foul Revenge Graph
Last week I was showing the foul graph to my colleague Alistair who came up with the idea of running a ‘foul revenge’ query to find out which players gained revenge for a foul with one of their own later in them match. Queries like this are very path centric and therefore work well in a graph. To recap, this is what the foul graph looks like: The first thing that we need to do is connect the fouls in a linked list based on time so that we can query their order more easily. We can do this with the following query: MATCH (foul:Foul)-[:COMMITTED_IN_MATCH]->(match) WITH foul,match ORDER BY match.id, foul.sortableTime WITH match, COLLECT(foul) AS fouls FOREACH(i in range(0, length(fouls) -2) | FOREACH(foul1 in [fouls[i]] | FOREACH (foul2 in [fouls[i+1]] | MERGE (foul1)-[:NEXT]->(foul2) ))); This query collects fouls grouped by match and then adds a ‘NEXT’ relationship between adjacent fouls. The graph now looks like this: Now let’s find the revenge foulers in the Bayern Munich vs Barcelona match. We’re looking for the following pattern: This translates to the following cypher query: match (foul1:Foul)-[:COMMITTED_AGAINST]->(app1)-[:COMMITTED_FOUL]->(foul2)-[:COMMITTED_AGAINST]->(app2)-[:COMMITTED_FOUL]->(foul1), (player1)-[:MADE_APPEARANCE]->(app1), (player2)-[:MADE_APPEARANCE]->(app2), (foul1)-[:COMMITTED_IN_MATCH]->(match:Match {id: "32683310"})<-[:COMMITTED_IN_MATCH]-(foul2) WHERE (foul1)-[:NEXT*]->(foul2) RETURN player2.name AS firstFouler, player1.name AS revengeFouler, foul1.time, foul1.location, foul2.time, foul2.location I’ve added in a few extra parts to the pattern to pull out the players involved and to find the revenge foulers in a specific match – the Bayern Munich vs Barcelona Semi Final 2nd leg. We end up with the following revenge fouls: We can see here that Dani Alves actually gains revenge on Bastian Schweinsteiger twice for a foul he made in the 10th minute. If we tweak the query to the following we can get a visual representation of the revenge fouls as well: match (foul1:Foul)-[:COMMITTED_AGAINST]->(app1)-[:COMMITTED_FOUL]->(foul2)-[:COMMITTED_AGAINST]->(app2)-[:COMMITTED_FOUL]->(foul1), (player1)-[:MADE_APPEARANCE]->(app1), (player2)-[:MADE_APPEARANCE]->(app2), (foul1)-[:COMMITTED_IN_MATCH]->(match:Match {id: "32683310"})<-[:COMMITTED_IN_MATCH]-(foul2), (foul1)-[:NEXT*]->(foul2) RETURN * At the moment I’ve restricted the revenge concept to single matches but I wonder whether it’d be more interesting to create a linked list of fouls which crosses matches between teams in the same season. The code for all of this is on github – the README is a bit sketchy at the moment but I’ll be fixing that up soon.
June 23, 2015
by Mark Needham
· 1,010 Views
article thumbnail
Devnation Keynote 6/22 #2: The Future of Development with Kubernetes and Docker
From the DevNation Agenda site: You've probably heard a lot about Linux containers and the exciting potential they hold. In this presentation, Matt Hicks will cover how Docker and Kubernetes have evolved to fundamentally change how you will approach development and operations. If you are looking for an understanding of the technology and how it relates to the common roles in IT today, this is the talk to watch. Speaker: Matt Hicks -- Vice President of engineering, Red Hat Matt Hicks is a founding member of the OpenShift by Red Hat team. He has spent more than a decade in software engineering, with a variety of roles in development, operations, architecture, and management. His real expertise is in bridging the gap between developing code and actually running it in production. An expert in IT and cloud-based architectures, he spends his time these days evolving OpenShift to use the power of cloud and make developers more productive.
June 22, 2015
by N A
· 1,087 Views · 2 Likes
article thumbnail
Long-Term Log Analysis with AWS Redshift
You will aggregate a lot of logs over the lifetime of your product and codebase, so it’s important to be able to search through them. In the rare case of a security issue, not having that capability is incredibly painful. You might be able to use services that allow you to search through the logs of the last two weeks quickly. But what if you want to search through the last six months, a year, or even further? That availability can be rather expensive or not even an option at all with existing services. Many hosted log services provide S3 archival support which we can use to build a long-term log analysis infrastructure with AWS Redshift. Recently I’ve set up scripts to be able to create that infrastructure whenever we need it at Codeship. AWS Redshift AWS Redshift is a data warehousing solution by AWS. It has an easy clustering and ingestion mechanism ideal for loading large log files and then searching through them with SQL. As it automatically balances your log files across several machines, you can easily scale up if you need more speed. As I said earlier, looking through large amounts of log files is a relatively rare occasion; you don’t need this infrastructure to be around all the time, which makes it a perfect use case for AWS. Setting Up Your Log Analysis Let’s walk through the scripts that drive our long-term log analysis infrastructure. You can check them out in the flomotlik/redshift-logging GitHub repository. I’ll take you step by step through configuring the whole setup of the environment variables needed, as well as starting the creation of the cluster and searching the logs. But first, let’s get a high-level overview of what the setup script is doing before going into all the different options that you can set: Creates an AWS Redshift cluster. You can configure the number of servers and which server type should be used. Waits for the cluster to become ready. Creates a SQL table inside the Redshift cluster to load the log files into. Ingests all log files into the Redshift cluster from AWS S3. Cleans up the database and prints the psql access command to connect into the cluster. Be sure to check out the script on GitHub before we go into all the different options that you can set through the .env file. Options to set The following is a list of all the options available to you. You can simply copy the .env.template file to .env and then fill in all the options to get picked up. AWS_ACCESS_KEY_ID AWS key of the account that should run the Redshift cluster. AWS_SECRET_ACCESS_KEY AWS secret key of the account that should run the Redshift cluster. AWS_REGION=us-east-1 AWS region the cluster should run in, default us-east-1. Make sure to use the same region that is used for archiving your logs to S3 to have them close. REDSHIFT_USERNAME Username to connect with psql into the cluster. REDSHIFT_PASSWORD Password to connect with psql into the cluster. S3_AWS_ACCESS_KEY_ID AWS key that has access to the S3 bucket you want to pull your logs from. We run the log analysis cluster in our AWS Sandbox account but pull the logs from our production AWS account so the Redshift cluster doesn’t impact production in any way. S3_AWS_SECRET_ACCESS_KEY AWS secret key that has access to the S3 bucket you want to pull your logs from. PORT=5439 Port to connect to with psql. CLUSTER_TYPE=single-node The cluster type can be single-node or multi-node. Multi-node clusters get auto-balanced which gives you more speed at a higher cost. NODE_TYPE Instance type that’s used for the nodes of the cluster. Check out the Redshift Documentation for details on the instance types and their differences. NUMBER_OF_NODES=10 Number of nodes when running in multi-mode. CLUSTER_IDENTIFIER=log-analysis DB_NAME=log-analysis S3_PATH=s3://your_s3_bucket/papertrail/logs/862693/dt=2015 Database format and failed loads When ingesting log statements into the cluster, make sure to check the amount of failed loads that are happening. You might have to edit the database format to fit to your specific log output style. You can debug this easily by creating a single-node cluster first that only loads a small subset of your logs and is very fast as a result. Make sure to have none or nearly no failed loads before you extend to the whole cluster. In case there are issues, check out the documentation of the copy command which loads your logs into the database and the parameters in the setup script for that. Example and benchmarks It’s a quick thing to set up the whole cluster and run example queries against it. For example, I’ll load all of our logs of the last nine months into a Redshift cluster and run several queries against it. I haven’t spent any time on optimizing the table, but you could definitely gain some more speed out of the whole system if necessary. It’s just fast enough already for us out of the box. As you can see here, loading all logs of May — more than 600 million log lines — took only 12 minutes on a cluster of 10 machines. We could easily load more than one month into that 10-machine cluster since there’s more than enough storage available, but for this post, one month is enough. After that, we’re able to search through the history of all of our applications and past servers through SQL. We connect with our psql client and send of SQL queries against the “events’ database. For example, what if we want to know how many build servers reported logs in May: loganalysis=# select count(distinct(source_name)) from events where source_name LIKE 'i-%'; count ------- 801 (1 row) So in May, we had 801 EC2 build servers running for our customers. That query took ~3 seconds to finish. Or let’s say we want to know how many people accessed the configuration page of our main repository (the project ID is hidden with XXXX): loganalysis=# select count(*) from events where source_name = 'mothership' and program LIKE 'app/web%' and message LIKE 'method=GET path=/projects/XXXX/configure_tests%'; count ------- 15 (1 row) So now we know that there were 15 accesses on that configuration page throughout May. We can also get all the details, including who accessed it when through our logs. This could help in case of any security issues we’d need to look into. The query took about 40 seconds to go though all of our logs, but it could be optimized on Redshift even more. Those are just some of the queries you could use to look through your logs, gaining more insight into your customers’ use of your system. And you et all of that with a setup that costs $2.50 an hour, can be shut down immediately, and recreated any time you need access to that data again. Conclusions Being able to search through and learn from your history is incredibly important for building a large infrastructure. You need to be able to look into your history easily, especially when it comes to security issues. With AWS Redshift, you have a great tool in hand that allows you to start an ad hoc analytics infrastructure that’s fast and cheap for short-term reviews. Of course, Redshift can do a lot more as well. Let us know what your processes and tools around logging, storage, and search are in the comments.
June 21, 2015
by Florian Motlik
· 1,446 Views
article thumbnail
How to to Backup Linux with Snapshots
While working on different web projects I have accumulated a large pool of tools and services to facilitate the work of developers, system administrators and DevOps One of the first challenges, that every developer faces at the end of each project is backup configuration and maintenance of media files, UGC, databases, application and servers' data (e.g. configuration files). Nowadays, there are a lot of solutions to make a snapshot backup of the entire server, and I decided to make a list of most convinient and really useful tools and services. rsync - http://linux.die.net/man/1/rsync Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It's a build in Linux tool. Real hardcore =) rsnapshot - http://rsnapshot.org/ rsnapshot is a filesystem snapshot utility for making backups of local and remote systems. Using rsync and hard links, it is possible to keep multiple, full backups instantly available. The disk space required is just a little more than the space of one full backup, plus incrementals. Depending on your configuration, it is quite possible to set up in just a few minutes. Files can be restored by the users who own them, without the root user getting involved. Stackoverflow users recommended it to me couple of years ago and I thinks that is a really good solutions, which unites the best from rsync and Linux filesystem. Snapper - http://snapper.io/ Snapper is a tool for Linux filesystem snapshot management. Apart from the obvious creation and deletion of snapshots, it can compare snapshots and revert differences between snapshots. In simple terms, this allows root and non-root users to view older versions of files and revert changes. Allows you to configure schedule for the backups, automatically deletes old snapshots. The only sad thing is that Snapper has no updates since 2014. backup2l - http://backup2l.sourceforge.net/ backup2l is a lightweight command line tool for generating, maintaining and restoring backups on a mountable file system (e. g. hard disk). The main design goals are are low maintenance effort, efficiency, transparency and robustness. In a default installation, backups are created autonomously by a cron script. The script, that I've found on sourceforge and even used on couple of my projects 5 years ago. But it is not being updated since 2009. FlyBack - https://www.linuxlinks.com/flyback/ FlyBack is software for system backup and restore, which offers similar functionality to the Mac OS X Leopard's Time Machine. Linux has almost all of the required technology already built in to recreate it. FlyBack is a snapshot-based backup tool based on rsync. It creates successive backup directories mirroring the files users want to backup, but hard-links unchanged files to the previous backup. Plenty of settings, mostly build for desktop computers, simple UI. TimeVault - https://wiki.ubuntu.com/TimeVault TimeVault monitors files for changes and takes snapshots after some user-specified delay. It is a simple front-end for making snapshots of a set of directories. Snapshots are a copy of a directory structure or file at a certain point in time. They use very little space for files which have not changed since the last snapshot was made, as they use hard links that point to existing backups. TimeVault makes all the work silently in background and is fully automated solution. Currently it gets no updates but it was a good solution when it just released. Box Backup - http://www.boxbackup.org/ Box Backup is an open source, completely automatic, on-line backup system. A backup daemon runs on systems to be backed up, and copies encrypted data to the server when it notices changes - so backups are continuous and up-to-date (although traditional snapshot backups are possible too). All backed up data is stored on the server in files on a filesystem - no tape, archive or other special devices are required. BitCalm - https://bitcalm.com BitCalm makes it easy for web developers to set up backup of applications on Linux servers just in one minute. It is SaaS for server backups. After installing python client user can manage backups for files and even databases in web-interface. Service provides Amazon S3 as a storage and allows users to connect their own storage for backups. All backups are incremental. Service is built for servers and supports all popular Linux based OS: Ubuntu, Debian, CentOS, ArchLinux. To let user be calm, service sends daily reports and notifications. BitCalm allows to manage multiple backups in a single account and user can restore the backup to any server added to service.
June 11, 2015
by Tom Cooper
· 52,186 Views · 1 Like
article thumbnail
Mounting an EBS Volume to Docker on AWS Elastic Beanstalk
Mounting an EBS volume to a Docker instance running on Amazon Elastic Beanstalk (EB) is surprisingly tricky. The good news is that it is possible. I will describe how to automatically create and mount a new EBS volume (optionally based on a snapshot). If you would prefer to mount a specific, existing EBS volume, you should check out leg100’s docker-ebs-attach (using AWS API to mount the volume) that you can use either in a multi-container setup or just include the relevant parts in your own Dockerfile. The problem with EBS volumes is that, if I am correct, a volume can only be mounted to a single EC2 instance – and thus doesn’t play well with EB’s autoscaling. That is why EB supports only creating and mounting a fresh volume for each instance. Why would you want to use an auto-created EBS volume? You can already use a docker VOLUME to mount a directory on the host system’s ephemeral storage to make data persistent across docker restarts/redeploys. The only advantage of EBS is that it survives restarts of the EC2 instance but that is something that, I suppose, happens rarely. I suspect that in most cases EB actually creates a new EC2 instance and then destroys the old one. One possible benefit of an EBS volume is that you can take a snapshot of it and use that to launch future instances. I’m now inclined to believe that a better solution in most cases is to set up automatic backup to and restore from S3, f.ex. using duplicity with its S3 backend (as I do for my NAS). Anyway, here is how I got EBS volume mounting working. There are 4 parts to the solution: Configure EB to create an EBS mount for your instances Add custom EB commands to format and mount the volume upon first use Restart the Docker daemon after the volume is mounted so that it will see it (see this discussion) Configure Docker to mount the (mounted) volume inside the container 1-3.: .ebextensions/01-ebs.config: # .ebextensions/01-ebs.config commands: 01format-volume: command: mkfs -t ext3 /dev/sdh test: file -sL /dev/sdh | grep -v 'ext3 filesystem' # ^ prints '/dev/sdh: data' if not formatted 02attach-volume: ### Note: The volume may be renamed by the Kernel, e.g. sdh -> xvdh but # /dev/ will then contain a symlink from the old to the new name command: | mkdir /media/ebs_volume mount /dev/sdh /media/ebs_volume service docker restart # We must restart Docker daemon or it wont' see the new mount test: sh -c "! grep -qs '/media/ebs_volume' /proc/mounts" option_settings: # Tell EB to create a 100GB volume and mount it to /dev/sdh - namespace: aws:autoscaling:launchconfiguration option_name: BlockDeviceMappings value: /dev/sdh=:100 4.: Dockerrun.aws.json and Dockerfile: Dockerrun.aws.json: mount the host’s /media/ebs_volume as /var/easydeploy/share inside the container: { "AWSEBDockerrunVersion": "1", "Volumes": [ { "HostDirectory": "/media/ebs_volume", "ContainerDirectory": "/var/easydeploy/share" } ] } Dockerfile: Tell Docker to use a directory on the host system as /var/easydeploy/share – either a randomly generated one or the one given via the -m mount option to docker run: ... VOLUME ["/var/easydeploy/share"] ...
June 3, 2015
by Jakub Holý
· 14,757 Views
  • Previous
  • ...
  • 296
  • 297
  • 298
  • 299
  • 300
  • 301
  • 302
  • 303
  • 304
  • 305
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×