Deployment Resources

The Latest Deployment Topics

New Relic’s Docker Monitoring Now Generally Available

[This article was written by Andrew Marshall]

We’ve been talking a lot about Docker over the past few weeks, with good reason. Docker’s explosive growth in popularity within the enterprise has enabled new distributed application architectures, and with them a need for app-centric monitoring of your Docker containers within the context of the rest of your infrastructure.

We’re thrilled to announce today that New Relic’s Docker monitoring is now generally available to New Relic customers, just in time for DockerCon 2015! (And as we noted last week, New Relic’s Docker monitoring solution has been selected by Docker for its Ecosystem Technology Partner program as a proven container monitoring solution.)

Why app-centric monitoring?

If you’re a software business using Docker containers, chances are you adopted them to use system resources more efficiently or to gain portability across environments, shortening the cycle between writing and running code. Either way, adding Docker containers to your app development meant a new tier of infrastructure to monitor, which equated to a “black box” in your data: one that you had no visibility into from a monitoring perspective.

Docker monitoring with New Relic is designed to “fix” this lack of monitoring visibility by adding an app-centric view of Docker containers to the existing New Relic Servers interface you already use. Now, instead of having a gap between the application and server monitoring views, we’ve added the ability to see containers with the same “first-class” experience as you would with virtual machines and servers. You can now drill down from the application (which is really what you care about) to the individual Docker container, and then to the physical server. No more blind spots!

As we strive to do with all of our products, we took the approach of “important” over “impressive” when it comes to the container information we provide to users. Based on direct feedback from customers, we’ve tried to take the mystery out of finding the right container to help you get back to developing your applications. As the way people use containers changes over time, we plan to continue to listen to our customers to help shape how we approach Docker container monitoring.

Restoring a 360-degree view of your application environment

One example of how app-centric monitoring can impact a team moving to microservices or distributed application environments is Motus, a mobile workforce management company. Motus has been a New Relic customer for more than four years and has recently been shifting to a microservices architecture, with approximately 95% of its production workload now running in Docker containers. While Docker helped Motus gain speed and agility while reducing infrastructure complexity, it broke the link between the application and what was happening with the container it was running in.

During the trial of New Relic’s Docker monitoring, Motus was able to more easily identify which container an app was running on, all the way down to the node. That was a big help when they needed to investigate an issue and determine if a new container was required. During the beta alone, Motus estimates that using New Relic helped reduce the time to investigate and fix problems with its Docker containers by 30%!

Motus isn’t just using New Relic to diagnose problems when they occur. Docker monitoring with New Relic has also helped Motus analyze and “right size” its containers for the application to better allocate resources for performance and budget.

Get started with New Relic’s Docker monitoring today. For more information, please stop by our booth at DockerCon, June 22-23 in San Francisco!

Resources:
Motus Docker Monitoring Case Study
Docker Monitoring with New Relic
Enabling Docker Monitoring with New Relic
Docker in the New Relic Community Forum

June 24, 2015

by Fredric Paul

· 1,019 Views

Display Android Device Screen on Fedora for Feedhenry Application

I wanted to display my Android screen on Fedora during a Summit presentation that includes Feedhenry. I found an easy way to mirror my Android screen on Fedora so I can show it through the presentation device (TV, projector, etc.), and compiled the steps below from several references.

Download the latest Android SDK from Google: Android SDK

Extract the TGZ file to your /home/YOUR-USERNAME directory.

To get ADB, you need to install the SDK: Installing the SDK

Run chmod on the android script in the tools directory so that it is executable.

Run android under tools and then install the Android SDK Tools.

On your phone, open Settings > Developer Options and make sure USB Debugging is turned on.

If you are running 64-bit Fedora, you will have to install the 32-bit libraries before adb will run:

# yum install glibc.i686
# yum install zlib.i686 libstdc++.i686 ncurses-libs.i686 libgcc.i686

You need to add a udev rules file that contains a USB configuration for each type of device you want to use for development. In the rules file, each device manufacturer is identified by a unique vendor ID, as specified by the ATTR{idVendor} property. For a list of vendor IDs, see USB Vendor IDs.

To set up device detection on Linux, log in as root and create this file: /etc/udev/rules.d/51-android.rules. Use this format to add each vendor to the file:

SUBSYSTEM=="usb", ATTR{idVendor}=="xxxx", MODE="0666"

[summit2015@localhost tools]$ cat /etc/udev/rules.d/51-android.rules
SUBSYSTEM=="usb", ATTR{idVendor}=="22b8", MODE="0666"
[summit2015@localhost tools]$

Note: The rule syntax may vary slightly depending on your environment. Consult the udev documentation for your system as needed. For an overview of rule syntax, see this guide to writing udev rules.

Now execute:

chmod a+r /etc/udev/rules.d/51-android.rules

When plugged in over USB, you can verify that your device is connected by executing adb devices from your SDK platform-tools/ directory. If connected, you'll see the device name listed as a "device."

[summit2015@localhost platform-tools]$ ./adb devices
List of devices attached
0A3D267016016004 device
[summit2015@localhost platform-tools]$

NOTE: I ran android update adb and adb start-server to test prior to the above command, but these shouldn't be required.

Next I downloaded Droid@Screen and then ran:

java -jar droidAtScreen-1.1.jar

That's all that is required!
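If you develop against devices from several manufacturers, the rules file described above gets one line per vendor ID. The following is only a minimal sketch of how those lines could be generated; the function names `udev_rule` and `rules_file` are hypothetical helpers, not part of the Android SDK or udev, and the only vendor ID taken from the article is 22b8. The script prints the text and leaves writing it to /etc/udev/rules.d/51-android.rules (as root) to you.

```python
from typing import Iterable

HEX_DIGITS = set("0123456789abcdef")


def udev_rule(vendor_id: str, mode: str = "0666") -> str:
    """Return one rule line in the format shown in the article.

    Hypothetical helper: accepts a 4-digit hex vendor ID, with or
    without a leading "0x", and normalizes it to lowercase.
    """
    vid = vendor_id.strip().lower()
    if vid.startswith("0x"):
        vid = vid[2:]
    if len(vid) != 4 or not set(vid) <= HEX_DIGITS:
        raise ValueError(f"expected a 4-digit hex vendor ID, got {vendor_id!r}")
    # Doubled braces produce the literal ATTR{idVendor} in the output.
    return f'SUBSYSTEM=="usb", ATTR{{idVendor}}=="{vid}", MODE="{mode}"'


def rules_file(vendor_ids: Iterable[str]) -> str:
    """Join one rule per vendor ID into the text of a rules file."""
    return "\n".join(udev_rule(v) for v in vendor_ids) + "\n"


if __name__ == "__main__":
    # 22b8 is the vendor ID used in the article's example file.
    print(rules_file(["22b8"]), end="")
```

Validating the ID before formatting it is a deliberate choice here: a malformed line in a rules file is easy to overlook, so the sketch fails early instead of emitting it.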

June 24, 2015

by Kenneth Peeples

· 1,778 Views

Percona XtraDB Cluster (PXC): How Many Nodes Do You Need?

Written by Stephane Combaudon.

A question I often hear when customers want to set up a production PXC cluster is: “How many nodes should we use?” Three nodes is the most common deployment, but when are more nodes needed? They also ask: “Do we always need to use an odd number of nodes?” This is what we’ll clarify in this post.

This is all about quorum

I explained in a previous post that a quorum vote is held each time one node becomes unreachable. With this vote, the remaining nodes determine whether it is safe to keep on serving queries. If quorum is not reached, all remaining nodes will set themselves in a state where they cannot process any query (even reads).

To get the right size for your cluster, the only question you should answer is: how many nodes can simultaneously fail while leaving the cluster operational?

If the answer is 1 node, then you need 3 nodes: when 1 node fails, the two remaining nodes have quorum.
If the answer is 2 nodes, then you need 5 nodes.
If the answer is 3 nodes, then you need 7 nodes.
And so on and so forth.

Remember that group communication is not free, so the more nodes in the cluster, the more expensive group communication will be. That’s why it would be a bad idea to have a cluster with 15 nodes, for instance. In general we recommend that you talk to us if you think you need more than 10 nodes.

What about an even number of nodes?

The recommendation above always specifies an odd number of nodes, so is there anything bad about an even number of nodes? Let’s take a 4-node cluster and see what happens if nodes fail:

If 1 node fails, 3 nodes are remaining: they have quorum.
If 2 nodes fail, 2 nodes are remaining: they no longer have quorum (remember 50% is NOT quorum).

Conclusion: availability of a 4-node cluster is no better than the availability of a 3-node cluster, so why bother with a 4th node?

The next question is: is a 4-node cluster less available than a 3-node cluster? Many people think so, specifically after reading this sentence from the manual:

Clusters that have an even number of nodes risk split-brain conditions.

Many people read this as “as soon as one node fails, this is a split-brain condition and the whole cluster stops working”. This is not correct! In a 4-node cluster, you can lose 1 node without any problem, exactly like in a 3-node cluster. This is not better but not worse.

By the way, the manual is not wrong! The sentence makes sense in its context.

There could actually be reasons why you might want to have an even number of nodes, but we will discuss that topic later in this post.

Quorum with multiple data centers

To provide more availability, spreading nodes across several datacenters is a common practice: if power fails in one DC, nodes are available elsewhere. The typical implementation is 3 nodes in 2 DCs, with 2 nodes in DC1 and 1 node in DC2.

Notice that while this setup can handle any single node failure, it can’t handle all single DC failures: if we lose DC1, 2 nodes leave the cluster and the remaining node does not have quorum.

You can try with 4, 5 or any number of nodes and it will be easy to convince yourself that in all cases, losing one DC can make the whole cluster stop operating.

If you want to be resilient to a single DC failure, you must have 3 DCs.

Other considerations

Sometimes other factors will make you choose a higher number of nodes. For instance, look at these requirements:

All traffic is directed to a single node.
The application should be able to fail over to another node in the same datacenter if possible.
The cluster must keep operating even if one datacenter fails.

An architecture with an even number of nodes is one option that meets them (and yes, an even number!).

Conclusion

Regarding availability, it is easy to estimate the number of nodes you need for your PXC cluster. But node failures are not the only aspect to consider: resilience to a datacenter failure can, for instance, influence the number of nodes you will be using.
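The sizing rules in this post reduce to a little arithmetic, which can be sketched as follows. This is an illustration only and makes simplifying assumptions that the post does not spell out: every node carries an equal vote, and all failures happen at the same time. The function names and the datacenter layouts at the bottom are hypothetical; in particular, the 2+2+2 layout is just one even-sized arrangement that happens to satisfy the requirements listed under "Other considerations", not necessarily the one the author had in mind.

```python
from typing import Dict


def nodes_needed(tolerated_failures: int) -> int:
    """Smallest cluster that keeps quorum after this many simultaneous failures."""
    return 2 * tolerated_failures + 1


def has_quorum(remaining: int, total: int) -> bool:
    """Quorum requires strictly more than half of the nodes (50% is NOT quorum)."""
    return 2 * remaining > total


def survives_any_dc_failure(nodes_per_dc: Dict[str, int]) -> bool:
    """True if losing any single datacenter still leaves the survivors with quorum."""
    total = sum(nodes_per_dc.values())
    return all(has_quorum(total - lost, total) for lost in nodes_per_dc.values())


if __name__ == "__main__":
    for failures in (1, 2, 3):
        print(f"tolerate {failures} failure(s): {nodes_needed(failures)} nodes")

    # 4-node cluster: one failure is fine, two failures lose quorum.
    print(has_quorum(3, 4), has_quorum(2, 4))

    # Hypothetical layouts, named after the scenarios discussed in the post.
    print(survives_any_dc_failure({"DC1": 2, "DC2": 1}))
    print(survives_any_dc_failure({"DC1": 1, "DC2": 1, "DC3": 1}))
    print(survives_any_dc_failure({"DC1": 2, "DC2": 2, "DC3": 2}))
```

With only two datacenters, at least one of them always holds half or more of the nodes, so `survives_any_dc_failure` returns False for every two-DC layout, which matches the post's claim that three DCs are required.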

June 24, 2015

by Peter Zaitsev

· 1,394 Views

The Best Mail Clients for Android: A Quick Review Report

Recently I struggled to find a friendly email client for my Android device, and every client I tried had its drawbacks. Let's look at the most popular email clients for Android and try to find the best. After browsing Google Play, I selected the six most popular mail clients that are, in my opinion, the most successful.

1. Blue Mail

Blue Mail is a popular email client with support for a large number of mail services. It has a very user-friendly interface and needs minimal setup: you only need to enter your mailbox password. After setup, the client immediately begins syncing, and a few moments later you will see your folders and messages. Subfolders are marked as nested, which is great, but unfortunately they cannot be collapsed into their parent folder, so all folders are displayed as a single list. The application is well adapted for touch screens, and managing mail with it is very convenient. You can set a reminder time for a specific message, customize the appearance and gestures, and add a large number of accounts (different mail services, with Exchange support).

Pros: Free; no limit on the number of accounts (there may be one, of course, but it is high); not tied to a specific mail service; convenient controls (gesture control); additional features (reminders, a filter for displaying messages in the application, appearance settings).

Cons: Subfolders cannot be collapsed into the parent folder (with many subfolders, the result is a long list that is inconvenient to work with); the interface is not smooth: scrolling and opening the sidebars sometimes lag.

2. myMail

myMail is an excellent mail client aimed at an international audience. It offers quality service with minimal setup (you do not need to enter mail server details). It is completely free, is not tied to a specific mail service, and has no limit on the number of connected accounts. The controls are friendly and comfortable, but there is no gesture control as such. In the folder list, nested folders are indented to the right relative to the others; although they cannot be collapsed, the list is still quite convenient to use (unless, of course, there are too many subfolders). Even with many messages and many accounts added, the app runs very smoothly. Unfortunately, there are no appearance settings, and the interface is rather distinctive (to many it may seem too "flashy"). The application has filters for sorting mail.

Pros: Free; no limit on the number of accounts; not tied to a specific mail service; friendly visual interface, with subfolders indented relative to the others; smooth operation.

Cons: Subfolders cannot be collapsed into the parent folder; very bright interface with no customization; when you reply to or forward a message, the original formatting is lost.

3. Mailbox

Mailbox is a popular mail client for Google and iCloud accounts. After you log in, the interface greets you with soothing, pleasant colors. Messages are shown in chronological order, and swiping a message performs different actions depending on the direction and length of the swipe: for example, a short swipe to the right marks the message as done, a longer swipe moves it to the Trash, and swipes to the left have their own actions. The controls are convenient but take some getting used to: at first, for example, I kept moving messages to the Trash by mistake. You can create reminders for a time or date, and there is an interesting sorting mode that analyzes the user's actions and starts sorting mail based on them (you can, of course, also create categories for sorting). The application works smoothly; I noticed slight lag when scrolling only while synchronization was in progress. The app is counted among the best email apps for Android.

Pros: Free; no limit on the number of accounts; simple operation; an interesting smart mail sorting feature and other options; smooth and fast.

Cons: Works only with Google and iCloud accounts; takes time to get used to; messages in a category cannot be broken across more than one line.

4. MailDroid

MailDroid is a popular ad-supported app with a large number of settings. On the first settings pages, the quality of the translation is immediately striking. Navigation is intuitive but, in my opinion, not very convenient, and the app does not run particularly smoothly. MailDroid supports most popular email services, so configuration problems should not arise. The interface uses soft blue and white, and there are additional appearance settings. By default the "Inbox" folder opens; to get to other folders, you need to go to a separate menu and select the folder you want. The application has a large number of settings, you can buy additional extensions (for example, a spam plug-in), and you can archive emails, export and import settings, and much more.

Pros: Free; no limit on the number of accounts; friendly controls; many additional options and settings.

Cons: Advertising; extensions must be purchased separately; despite the easy controls, the interface seemed not very logical to me; it does not work smoothly.

5. Aqua Mail

Aqua Mail is a very convenient shareware mail client for Android (free with limitations). The free version is limited: you can link only two email accounts, and a signature with a link to the program's official website is added to every message you send. The full version can be unlocked for $4.95 (the price at the beginning of 2015). The application greets you with a nice interface somewhat reminiscent of Google's Material Design, and when you first launch it, tooltips at each step help you understand its basic functions and features. The interface is very smooth, the controls are well optimized for touch screens, and it is very convenient for working with a large number of messages. Folders work as follows: by default the application displays only the "Inbox" folder, and the others can be added manually (choose only those you need on your phone). There is an interesting feature, "Smart Folder", which collects messages that require attention according to criteria you configure (for example, messages received in the last X days).

Pros: Convenient for working with a large number of messages; a variety of settings; the handy "Smart Folder" feature; easy navigation; smooth operation.

Cons: The free version has some limitations.

6. CloudMagic

CloudMagic is a simple email client with password protection, a good alternative to the standard Android mail client. The application supports most international mail services. The controls are very simple, the interface is intuitive, and it works smoothly. All folders are displayed at once; nested folders are marked as nested, but they do not collapse into the parent folder. Settings are minimal: apart from the ability to enable password protection, there are no special settings.

Pros: Free; no limit on the number of accounts; simple and clear controls; smooth and fast operation; password protection.

Cons: Few settings and additional options.

June 24, 2015

by Deepak Raghav

· 1,237 Views

Git for Windows, Getting Invalid Username or Password with Wincred

If you use HTTPS to communicate with your Git repository, e.g., GitHub or Visual Studio Online, you usually set up a credential manager to avoid entering credentials for each command that contacts the server. With the latest versions of Git you can configure wincred with this simple command:

git config --global credential.helper wincred

This morning I started getting an error while trying to push some commits to GitHub.

$ git push
remote: Invalid username or password.
fatal: Authentication failed for 'https://github.com/proximosrl/jarvis.documentstore.git/'

If I remove the credential helper (git config --global --unset credential.helper) everything works: Git asks me for a user name and password and I'm able to do everything. But as soon as I re-enable the credential helper, the error returns.

This problem is probably caused by some corruption of the stored credentials. Usually you can simply clear the stored credentials; at the next operation you will be prompted for credentials, and everything starts working again. The question is: where does wincred store the credentials?

If you use wincred for credential.helper, Git stores your credentials in the standard Windows Credential Manager. You can simply open Credential Manager on your computer.

Figure 1: Credential Manager in your Control Panel settings

In Credential Manager you can manage Windows and web credentials. Now simply have a look at both Web Credentials and Windows Credentials, and delete everything related to GitHub or the server you are using. The next time you issue a Git command that requires authentication, you will be prompted for credentials again, and the new credentials will be saved in the store.

Gian Maria.
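The two moving parts of this fix, toggling the helper and picking out the stale entries, can be sketched in a few lines. This is a rough illustration under stated assumptions: `helper_command` and `entries_to_delete` are hypothetical helper names, the sample entry names are invented for the example, and the script only builds the command lines and filters a list you supply. It does not run Git or touch Windows Credential Manager. The unset form used here is the `--unset` option that `git config` accepts.

```python
from typing import List


def helper_command(enable: bool) -> List[str]:
    """Build the git config command that enables or removes the wincred helper."""
    base = ["git", "config", "--global"]
    if enable:
        return base + ["credential.helper", "wincred"]
    return base + ["--unset", "credential.helper"]


def entries_to_delete(entry_names: List[str], host: str) -> List[str]:
    """Pick the stored entries that mention the given server (case-insensitive)."""
    needle = host.lower()
    return [name for name in entry_names if needle in name.lower()]


if __name__ == "__main__":
    print(" ".join(helper_command(enable=False)))
    print(" ".join(helper_command(enable=True)))

    # Hypothetical entry names, for illustration only; look in Credential
    # Manager to see what is actually stored on your machine.
    stored = ["git:https://github.com", "example-vpn", "GitHub - https://api.github.com"]
    print(entries_to_delete(stored, "github.com"))
```

Keeping the deletion itself manual is intentional: the article's advice is to review both the Web Credentials and Windows Credentials lists yourself, so the sketch only narrows down what to look for.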

June 23, 2015

by Ricci Gian Maria

· 20,969 Views

It's Time to Start Programming (for) Adults

This week we're in Boston at DevNation, an awesome, young (second ever), and relatively intimate (~500 attendees) conference on anything and everything hard-core, cool-and-hot (DevOps, big data, Angular, IoT, you name it), and of course -- since the conference is organized by Red Hat -- totally open-source. So far I've had in-depth conversations with five super-amazing engineers, attended several inspiring keynotes, and chatted with one skilled developer after another. We'll transcribe the deeper interviews shortly, including some on topics totally unrelated to this post. But meanwhile I'd like to offer some thoughts inspired by the first day of the event. The general theme is: we're just beginning to get serious about separation of concerns. The metaphor that keeps popping into my head comes from the first keynote: machines have finally grown up. Imperatives: telling really unintelligent agents what to do (and then they sort of do whatever they please) It is trivial to observe that computers are incredibly stupid. Turing's fundamental paper is about how to figure out whether a theoretical computer will keep calculating the values of a function until the heat death of the universe (okay that's a slight oversimplification). The fact that Edsger Dijkstra felt the need to rail gently against all goto statements in any higher-level language than machine code suggests that, in 1968, far too many computers needed instructions about how to read the instructions that tell them what to do in the first place. Richard Feynman's famous lecture on computer heuristics is the condescension of the man who conceived quantum computing to the level of functional composition (hmmm) and file systems (double sigh). Stupid agents need to be told exactly what to do. Then they need to be told to pay attention to the exact part of the command that tells them exactly what they have been told to do (dude, just goto line 1343 already and shut up). 
Then they don't do what you told them (optimistically we call this an 'exception'), and then you send them into time out / set a break point and try to figure out where the idiot state mutated off the rails. They stare blankly at the wall / variable / register and either do nothing or repeat another unintelligibly wrong result until you notice that your increment is (apparently meaninglessly to you) one bracket too deep. You sigh and tell them what to do again, and after a while they hit age thirty (life-years/debug-hours) and maybe do something useful with their (process-)lives. Well, maybe I'm straining the metaphor a little here, but you get the point because it cuts too close to home. We spend far too much time fixing stupid mistakes that we didn't even know we were making because -- like all actual human beings -- we assumed that the agent we commanded would use its common sense to iron out those few whiffs of, admit it, frank nonsense that our step-by-step instructions will probably always contain. So, at least, goes the imperative programming paradigm. The machine does what you tell it to; and the universe collapses onto itself before the last real number is computed. Functions: reliable, predictable adults Time to give credit where it's due: I'm really just riffing on the metaphor Venkat Subramaniam offered in his highly enjoyable keynote on The Joy of Functional Programming yesterday morning. His not-so-smart agents -- the 'programmed' of imperative programming -- were toddlers. Since I don't have any kids, I can't presume to understand this experience fully (although I did grow up with three younger brothers...). But the general idea is: imperative programming is tricky because, when you spell everything out super literally, it's very hard to tell exactly why what you thought should happen didn't. Venkat's talk was a whirlwind of functional concepts, from the thrill of immutability to the self-evident utility of memoization. For random (Myers-Briggs?) 
reasons, the object-oriented paradigm never seemed very intuitive to me -- I've gravitated towards functional style even when the problem domain wasn't actually modeled very well by functions -- but Venkat's side-by-side implementations of simple calculations in OO and functional Java showed the readability delta very clearly. Functional code is beautiful because it looks like its purpose. It tells you flat-out: here is what I do; and then it does it. But immutable functions are also beautiful because they do exactly the same thing every time. I couldn't count on my two year old brother very much at all because given a certain input I had pretty much no idea what would come out. But we all count on our grown-up collaborators to output exactly what they should, given a definite input, predictably and reliably every time. Of course, people also do more than expected -- every intervention of intelligence is an injection of creativity, not generated by the definition of the function -- but at least they do what you need them to do and no less. Containers: grown-ups with good boundaries I'm picking out just one aspect of the resurgent 'joy' of functional programming because the renaissance of containerization (another 'old' technology that is just now really taking off) is, I think, a part of the same shift toward, let's say, treating computers as adults. If functions are reliable agents, then applications in well-defined containers are self-sufficient agents who know exactly what they need from others and neither require nor demand anything more. 
If apps on dedicated VMs are teenagers negotiating personal boundaries by waking/booting up independently (and taking far too long -- and far too many resources -- to do so, given their meager output) -- or bubble boys, isolated in ways that are unfortunate in order to isolate in ways that are absolutely necessary -- then containerized applications are subway-riders who jam into the train without offending anyone or campers who can live anywhere with just a backpack of just the stuff they need. Of course, subway-riders and campers do more than just not-mess-up. But what's kind of neat about containers is that -- like an adult with good boundaries -- clearly defined bounds and interfaces free up the application / mind to do whatever world-changing thing the developer / human has cooked up. I'll come back to this metaphor in a later article. (Mesh networks, SDN, and ad-hoc computing are all part of the same picture, I think. Kubernetes probably is too, along with event-driven and reactive programming, the actor model, dreams of Smalltalk, and of course REST, at least of the HATEOAS flavor.) But maybe this isn't a good way to think about some of these recent sparks in devworld within a single paradigm -- and maybe my perpetual discomfort with OO is influencing me too much. What do you think?

June 23, 2015

by John Esposito

· 1,942 Views · 1 Like

This Week In Modern Software: Inside Obama’s Geek Squad

[This article was written by Kevin Casey] Welcome to This Week in Modern Software, orTWiMS, New Relic’s weekly roundup of the need-to-know news, stories, and events of interest surrounding software analytics, cloud computing, application monitoring, development methodologies, programming languages, and the myriad of other issues that influence modern software. This week, our top story goes inside President Obama’s secret team of tech geeks, 140 of them and counting: TWiMS Top Story: Inside Obama’s Stealth Startup—Fast Company What it’s about:If the President of the United States walked into the room and personally recruited you to rebuild the country’s technology infrastructure, could you turn him down? He’s serious, and that room is theRoosevelt Room in the West Wing of the White House, by the way. AsLisa Gelobtersays: “What are you going to say that?” Gelobter’s answer was “Yes”—she’s now chief digital officer for the US Department of Education, part of a 140-person-and-counting tech team that’s functioning something like an elite startup embedded inside the federal government. Its business? Only modernizing the technical infrastructure, applications, and processes of just about every federal agency. Why you should care:What was once something of a tech desert—the federal government—is beginning to draw top private-sector talent inside the Beltway. The team, led by Mikey Dickerson (who helped lead the team that rescuedHealthcare.gov) andformer US CTO Todd Park, also includes the likes of former Googler Matthew Weaver, and it hopes to hit 500 people by the end 2016, shortly before President Obama will leave office. 
Its challenges are immense, from tackling government bureaucracy (to test just how entrenched the suits were, Weaver requested the official title “Rogue Leader”—and he got it) to the fact that its recruiting pitch includes the phrase: “You’ll have to take a pay cut.” But its mission is both noble and necessary, and the appeal of working on major problems with enormous public impacts appears to be working. Recommended reading. Further reading: Mikey Dickerson’s 10 Tips for Dealing with Bureaucracy—New Relic Blog [Video] Airbnb Open Sources Software to Lure Talent Amid ‘Insane’ Competition—CIO Journal What it’s about:Airbnb added three new apps to its open source portfolio earlier this month, but the motivation wasn’t just trying to give employees the best business tools or contribute to the software community at large. Sure, that might have been part of the equation, but the rental booking site hopes open-sourcing some of its toolkit will help recruit the best software talent in the face of what director of engineeringMike Curtiscalls “insane” competition in the Silicon Valley labor market. Why you should care:In the software arms race, any little edge counts. Curtis tellsCIO Journalthat Airbnb will keep the proprietary stuff closely guarded, of course. But it will open source “generic” tools with wider industry use cases, such as its recently releasedAerosolvemachine-learning package and itsAirpalcloud-based data querying tool. The latter, which works with Facebook’s open sourcePrestoDB, aims to simplify SQL queries to the point where you don’t need to be a big data wonk or business intelligence guru to run it. Indeed, one in three Airbnb employees have run a query on it in the year since it launched. Airbnb has contributed a dozen open source tools on its aptly namedNerds site(gotta love that!) to date, something the company hopes both contributes to greater good but also advertises its software innovation to potential hires. 
Google Is Wielding Its Own Secret Weapon in the Cloud—The New York Times What it’s about:In thecutthroat competitionfor public cloud business, Google may be its own best customer testimonial. In advance of this week’sOpen Network Summit, theTimes’Bits bloglooked at Google’s plan to not only unveil cloud customers such as HTC but reveal much more than ever before about its own infrastructure. Google did just that on Wednesday, offering a look inside itsdata center networking, including its massive-capacity, lightning-fast Jupiter network. Why you should care:As major cloud players continue to zap prices with their shrink-rays, it’s increasingly clear that features and underlying platforms will distinguish one from the other when enterprise users make their pick. Google is taking a big step toward writing its own story in this regard, and the synopsis might read something like: “We’re pretty good at this stuff.” Its Jupiter fabrics deliver 1 petabit per second of bisection bandwidth, according to Google, or “enough for 100,000 servers to exchange information at 10Gb/s each, enough to read the entire scanned contents of the Library of Congress in less than 1/10th of a second.” If it sounds like a bit of bragging, well, yeah—it is. But it’s bragging with a purpose: Attracting devs who want access to the same technology without having to build it themselves.Google’s Amin Vahdat connected the dots in a blog post: “The same networks that power all of Google’s internal infrastructure and services also power Google Cloud Platform.” Move Over, Meeker: Byron Deeter’s State of the Cloud Report—Bessemer Venture Partners What it’s about:With a nod to Mary Meeker’s classicState of the Internet report,Bessemer Venture Partners’Byron Deeterchecks in with his 2015 State of the Cloud Report. Given cloud computing’s relative youth and rampant ascension, it’s no surprise the stats are staggering. 
Here’s one to start: Cloud revenues have increased tenfold in the last six years, from a scant $5.6 billion in 2008 to more than $56 billion in 2014. BVP projects they will more than double again in the next four years, to $127.5 billion in 2018.

Why you should care: Deeter’s full presentation is worth a weekend watch or read, but it’s the forward-looking slides that may be most compelling for software pros. Deeter notes both the immense risks and opportunities in cloud security, unveiling a 10-point security plan for cloud startups on slide 37. To underscore the security landscape, Deeter quotes an unnamed cloud CEO who says a DDoS attack that took down the firm’s API caused more customer churn in one day than in the rest of its history. Wow. He also addresses the exploding market for cloud services built specifically for developers including, yes, New Relic. And for mobile developers, slide 44 underscores something we’ve talked about before in this space: the real money’s in enterprise apps, and it’s still a largely untapped market. Click through the full slide deck here or watch video of Deeter’s presentation here.

Bandwidth: The Next Frontier of Cloud Computing—ZDNet

What it’s about: Is networking the next big thing in the everything-as-a-service age? It just might be, as firms like Pacnet vie to deliver networking capacity on a pay-for-what-you-use model that some industry folks say better suits cloud environments facing significant but uneven networking needs.

Why you should care: As author Drew Turney notes, there’s a common blind spot when it comes to cloud computing’s many shapes and sizes: Moving all that data from points A to Z, and everywhere in between, which can cause both performance problems and undue financial pressures.
The promise of Networking-as-a-Service (NaaS), industry execs tell Turney, is that it can provide more efficient, scalable networking for short-term usage bursts such as customer traffic spikes or large cloud backup-and-storage jobs, enabling companies to later dial down their capacity as needed. Combined with Software-Defined Networking (SDN), NaaS makes it possible to build intelligent applications that manage their own networking needs, which might be the most significant enterprise potential of NaaS, says Nuage Networks architect Marten Hauville.

Page Bloat: Average Web Page Now More Than 2MB—The Performance Beacon (SOASTA)

What it’s about: Do you need to put your website on a diet? Apparently so: The average Web page topped 2 MB as of May 2015, according to ongoing tracking at The Performance Beacon. That’s double the average page weight from just three years ago. The site projects average page weight will exceed 3 MB in late 2017.

Why you should care: Performance, performance, performance: Slow speeds are a killer in the modern software era. While author and SOASTA UX evangelist Tammy Everts rightly notes that page weight is not the only factor in Web optimization, we’re simply not paying it enough attention when designing and building Web pages. Images are the big culprit in the Web’s expanding waistline: they comprise nearly two-thirds of the average page’s weight, and video is a growing part of our Web diet, too. But other factors such as custom fonts play a role, adding weight even as the Web sheds previous performance hogs like Flash. The ideal weight? 1 MB, she says, which will save crucial seconds in load times. Sounds like it’s time to hit the virtual treadmill.

June 23, 2015

by Fredric Paul

· 1,075 Views

Devnation Keynote 6/22 #2: The Future of Development with Kubernetes and Docker

From the DevNation Agenda site: You've probably heard a lot about Linux containers and the exciting potential they hold. In this presentation, Matt Hicks will cover how Docker and Kubernetes have evolved to fundamentally change how you will approach development and operations. If you are looking for an understanding of the technology and how it relates to the common roles in IT today, this is the talk to watch. Speaker: Matt Hicks -- Vice President of engineering, Red Hat Matt Hicks is a founding member of the OpenShift by Red Hat team. He has spent more than a decade in software engineering, with a variety of roles in development, operations, architecture, and management. His real expertise is in bridging the gap between developing code and actually running it in production. An expert in IT and cloud-based architectures, he spends his time these days evolving OpenShift to use the power of cloud and make developers more productive.

June 22, 2015

by N A

· 1,090 Views · 2 Likes

Techfor.us

Welcome to Useful PC Guide. We cover the latest technology news across many topics: computing, mobile, programming, computer games, mobile games, Apple iOS, and Android apps, as well as online tutorials, guides and how-to articles. The UsefulPCGuide.com website also regularly publishes new Windows OS tips and tricks to resolve your problems, as well as iOS and Android issues. You can read an example tutorial from us about how to fix the "your connection is not private" error in Google Chrome on Windows. This guide will help you learn more about the causes of this error and appropriate ways to troubleshoot it in your Chrome browser. Most of our tips and tricks include images, and the instructions are easy to read and follow. Visit usefulguide.com for more good news and tutorials.

June 21, 2015

by Alize Camp

· 1,087 Views

Long-Term Log Analysis with AWS Redshift

You will aggregate a lot of logs over the lifetime of your product and codebase, so it’s important to be able to search through them. In the rare case of a security issue, not having that capability is incredibly painful. You might be able to use services that allow you to search through the logs of the last two weeks quickly. But what if you want to search through the last six months, a year, or even further? That availability can be rather expensive or not even an option at all with existing services. Many hosted log services provide S3 archival support, which we can use to build a long-term log analysis infrastructure with AWS Redshift. Recently I’ve set up scripts to be able to create that infrastructure whenever we need it at Codeship.

AWS Redshift

AWS Redshift is a data warehousing solution by AWS. It has an easy clustering and ingestion mechanism ideal for loading large log files and then searching through them with SQL. As it automatically balances your log files across several machines, you can easily scale up if you need more speed. As I said earlier, looking through large amounts of log files is a relatively rare occasion; you don’t need this infrastructure to be around all the time, which makes it a perfect use case for AWS.

Setting Up Your Log Analysis

Let’s walk through the scripts that drive our long-term log analysis infrastructure. You can check them out in the flomotlik/redshift-logging GitHub repository. I’ll take you step by step through configuring the whole setup of the environment variables needed, as well as starting the creation of the cluster and searching the logs. But first, let’s get a high-level overview of what the setup script is doing before going into all the different options that you can set:

* Creates an AWS Redshift cluster. You can configure the number of servers and which server type should be used.
* Waits for the cluster to become ready.
* Creates a SQL table inside the Redshift cluster to load the log files into.
* Ingests all log files into the Redshift cluster from AWS S3.
* Cleans up the database and prints the psql access command to connect into the cluster.

Be sure to check out the script on GitHub before we go into all the different options that you can set through the .env file.

Options to set

The following is a list of all the options available to you. You can simply copy the .env.template file to .env and then fill in all the options to get picked up.

* AWS_ACCESS_KEY_ID: AWS key of the account that should run the Redshift cluster.
* AWS_SECRET_ACCESS_KEY: AWS secret key of the account that should run the Redshift cluster.
* AWS_REGION=us-east-1: AWS region the cluster should run in, default us-east-1. Make sure to use the same region that is used for archiving your logs to S3 to have them close.
* REDSHIFT_USERNAME: Username to connect with psql into the cluster.
* REDSHIFT_PASSWORD: Password to connect with psql into the cluster.
* S3_AWS_ACCESS_KEY_ID: AWS key that has access to the S3 bucket you want to pull your logs from. We run the log analysis cluster in our AWS Sandbox account but pull the logs from our production AWS account so the Redshift cluster doesn’t impact production in any way.
* S3_AWS_SECRET_ACCESS_KEY: AWS secret key that has access to the S3 bucket you want to pull your logs from.
* PORT=5439: Port to connect to with psql.
* CLUSTER_TYPE=single-node: The cluster type can be single-node or multi-node. Multi-node clusters get auto-balanced, which gives you more speed at a higher cost.
* NODE_TYPE: Instance type that’s used for the nodes of the cluster. Check out the Redshift Documentation for details on the instance types and their differences.
* NUMBER_OF_NODES=10: Number of nodes when running in multi-node mode.
* CLUSTER_IDENTIFIER=log-analysis
* DB_NAME=log-analysis
* S3_PATH=s3://your_s3_bucket/papertrail/logs/862693/dt=2015

Database format and failed loads

When ingesting log statements into the cluster, make sure to check the number of failed loads that are happening.
You might have to edit the database format to fit your specific log output style. You can debug this easily by first creating a single-node cluster that loads only a small subset of your logs and is very fast as a result. Make sure to have no, or nearly no, failed loads before you extend to the whole cluster. In case there are issues, check out the documentation of the COPY command, which loads your logs into the database, and the parameters the setup script passes to it.

Example and benchmarks

It’s quick to set up the whole cluster and run example queries against it. For example, I’ll load all of our logs of the last nine months into a Redshift cluster and run several queries against it. I haven’t spent any time on optimizing the table, but you could definitely gain some more speed out of the whole system if necessary. It’s already fast enough for us out of the box.

Loading all logs of May (more than 600 million log lines) took only 12 minutes on a cluster of 10 machines. We could easily load more than one month into that 10-machine cluster since there’s more than enough storage available, but for this post, one month is enough.

After that, we’re able to search through the history of all of our applications and past servers through SQL. We connect with our psql client and send off SQL queries against the “events” table. For example, what if we want to know how many build servers reported logs in May:

    loganalysis=# select count(distinct(source_name)) from events where source_name LIKE 'i-%';
     count
    -------
       801
    (1 row)

So in May, we had 801 EC2 build servers running for our customers. That query took ~3 seconds to finish.
Or let’s say we want to know how many people accessed the configuration page of our main repository (the project ID is hidden with XXXX):

    loganalysis=# select count(*) from events where source_name = 'mothership' and program LIKE 'app/web%' and message LIKE 'method=GET path=/projects/XXXX/configure_tests%';
     count
    -------
        15
    (1 row)

So now we know that there were 15 accesses on that configuration page throughout May. We can also get all the details from our logs, including who accessed it and when. This could help in case of any security issues we’d need to look into. The query took about 40 seconds to go through all of our logs, but it could be optimized on Redshift even more.

Those are just some of the queries you could use to look through your logs, gaining more insight into your customers’ use of your system. And you get all of that with a setup that costs $2.50 an hour, can be shut down immediately, and can be recreated any time you need access to that data again.

Conclusions

Being able to search through and learn from your history is incredibly important for building a large infrastructure. You need to be able to look into your history easily, especially when it comes to security issues. With AWS Redshift, you have a great tool in hand that allows you to start an ad hoc analytics infrastructure that’s fast and cheap for short-term reviews. Of course, Redshift can do a lot more as well. Let us know what your processes and tools around logging, storage, and search are in the comments.
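To put the benchmark figures quoted above in perspective, here is a rough back-of-the-envelope sketch. It only uses the numbers from the article (about 600 million lines, 12 minutes, $2.50 an hour) and assumes, which the article does not state explicitly, that the hourly price applies to the same cluster that performed the load. The class and method names are made up for illustration and are not part of the redshift-logging scripts.

```java
// Illustrative arithmetic only; names are hypothetical and not part of the redshift-logging scripts.
final class RedshiftLoadEstimate {

    // Average ingest rate in log lines per second (integer division, rounds down).
    static long linesPerSecond(long totalLines, long minutes) {
        return totalLines / (minutes * 60);
    }

    // Cost in cents of keeping a cluster running for the given number of minutes.
    static long costInCents(long centsPerHour, long minutes) {
        return centsPerHour * minutes / 60;
    }

    public static void main(String[] args) {
        // "More than 600 million" lines, so 600 million is a lower bound.
        long rate = linesPerSecond(600_000_000L, 12);
        // Assumption: the quoted $2.50 per hour is the price of the cluster that ran the load.
        long loadCost = costInCents(250, 12);
        System.out.println(rate + " lines/s, about " + loadCost + " cents for the load");
    }
}
```

Under those assumptions the load averages a little over 830,000 lines per second, and the 12-minute ingest itself costs roughly half a dollar, which is consistent with the article's point that the cluster is cheap to spin up only when needed.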

June 21, 2015

by Florian Motlik

· 1,451 Views

Spring XD 1.2 GA, Spring XD 1.1.3 and Flo for Spring XD Beta Released

Written by Mark Pollack. Today, we are pleased to announce the general availability of Spring XD 1.2, Spring XD 1.1.3 and the release of Flo for Spring XD Beta. 1.2.0.GA: zip; 1.1.3.RELEASE: zip; Flo for Spring XD Beta. You can also install XD 1.2 using brew and rpm. The 1.2 release includes a wide range of new features and improvements. The release journey was an eventful one, mainly due to Spring XD’s popularity with so many different groups, each with their respective request priorities. However, the Spring XD team rose to the challenge and it is rewarding to look back and review the amount of innovation delivered to meet our commitments toward simplifying big data complexity. Here is a summary of what we have been busy with for the last 3 months and the value created for the community and our customers. Flo for Spring XD and UI improvements Flo for Spring XD is an HTML5 canvas application that runs on top of the Spring XD runtime, offering a graphical interface for creating, managing and monitoring streaming data pipelines. Here is a short screencast showing you how to build an advanced stream definition. You can browse the documentation for additional information and links to additional screencasts of Flo in action. The XD admin screen also includes a new Analytics section that allows you to easily view gauges, counters, field-value counters and aggregate counters. Performance Improvements Anticipating increased high-throughput and low-latency IoT requirements, we’ve made several performance optimizations within the underlying message-bus implementation to deliver several million messages per second transported between Spring XD containers using Kafka as a transport. With these optimizations, we are now on par with the performance from Kafka’s own testing tools. However, we are using the more feature-rich Spring Integration Kafka client instead of Kafka’s high-level consumer library.
For anyone who is interested in reproducing these numbers, please refer to the XD benchmarking blog, which describes the tests performed and infrastructure used in detail. Apache Ambari and Pivotal HD To help automate the deployment of Spring XD on an Apache HadoopⓇ cluster, we added an Apache AmbariⓇ plugin for Spring XD. The plugin is supported on both Pivotal HD 3.0 and Hortonworks HDP 2.2 distributions. We also added support in Spring XD for Pivotal HD 3.0, bringing the total number of Hadoop versions supported to five. New Sources, Processors, Sinks, and Batch Jobs One of Spring XD’s biggest value propositions is its complete set of out-of-the-box data connectivity adapters that can be used to create real-time and batch-based data pipelines, and these require little to no user-code for common use-cases. With the help of community contributions, we now have MongoDB, VideCap, and FTP as source modules, an XSLT-transformer processor, and FTP sink module. The XD team also developed a Cassandra sink and a language-detection processor. Recognizing the important role in the Pivotal Big Data portfolio, we have also added native integration with Pivotal Greenplum Database and Pivotal HAWQ through gpfdist sink for real-time streaming and also support for gpload based batch jobs. Adding to our developer productivity theme and the use of Spring XD in production for high-volume data ingest use-cases, we are delighted to recognize Simon Tao and Yu Cao (EMC² Office of The CTO & Labs China), who have been operationalizing Spring XD data pipelines in production since 2014 and also for the VideCap source module contribution. Their use-case and implementation specifics (in their own words) are below. “There are significant demands to extract insights from large magnitude of unstructured video streams for the video surveillance industry. Prior to being analyzed by data scientists, the video surveillance data needs to be ingested in the first place. 
To tackle this challenge, we built a highly scalable and extensible video-data ingestion platform using Spring XD. This platform is operationally ready to ingest different kinds of video sources into a centralized Big Data Lake. Given the out-of-the-box features within Spring XD, the platform is designed to allow rich video content processing capabilities such as video transcoding and object detection, etc. The platform also supports various types of video sources—data processors and data exporting destinations (e.g. HDFS, Gemfire XD and Spark)—which are built as custom modules in Spring XD and are highly reusable and composable. With a declarative DSL, a video ingestion stream will be handled by a video ingestion pipeline defined as Directed Acyclic Graph of modules. The pipeline is designed to be deployed in a clustered environment with upstream modules transferring data to downstream ones efficiently via the message bus. The Spring-XD distributed runtime allows each module in the pipeline to have multiple instances that run in parallel on different nodes. By scaling out horizontally, our system is capable of supporting large scale video surveillance deployment with high volume of video data and complex data processing workloads.” Custom Module Registry and HA Support Though we have had the flexibility to configure shared network location for distributed availability of custom modules (via: xd.customModule.home), we also recognized the importance of having the module-registry resilient under failure scenarios—hence, we have an HDFS backed module registry. Having this setup for production deployment provides consistent availability of custom module bits and the flexibility of choices, as needed by the business requirements. 
Pivotal Cloud Foundry Integration Furthering the Pivotal Cloud Foundry integration efforts, we have made several foundation-level changes to the Spring XD runtime, so we are able to run Spring XD modules as cloud-native Apps in Lattice and Diego. We have aggressive roadmap plans to launch Spring XD on Diego proper. While studying Diego’s Receptor API (written in Go!), we created a Java Receptor API, which is now proposed to Cloud Foundry for incubation. Next Steps We have some very interesting developments on the horizon. Perhaps most important, we will be launching new projects that focus on message-driven and batch-oriented “data microservices”. These will be built directly on Spring Boot as well as Spring Integration and Spring Batch, respectively. Our main goal is to provide the simplest possible developer experience for creating cloud-native, data-centric microservice apps. In turn, Spring XD 2.0 will be refactored as a layer above those projects, to support the composition of those data microservices into streams and jobs as well as all of the “as a service” aspects that it provides today, but it will have a major focus on deployment to Cloud Foundry and Lattice. We will be posting more on these new projects soon, so stay tuned! Feedback is very important, so please get in touch with questions and comments via
* StackOverflow spring-xd tag
* Spring JIRA or GitHub Issues
Editor’s Note: ©2015 Pivotal Software, Inc. All rights reserved. Pivotal, Pivotal HD, Pivotal Greenplum Database, Pivotal Gemfire and Pivotal Cloud Foundry are trademarks and/or registered trademarks of Pivotal Software, Inc. in the United States and/or other countries. Apache, Apache Hadoop, Hadoop and Apache Ambari are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

June 21, 2015

by Pieter Humphrey

· 3,701 Views

Why We Need Continuous Integration

Introduction Continuous integration is a practice that helps developers deliver better software in a more reliable and predictable manner. This article deals with the problems developers face while writing, testing and delivering software to end users. Through exploring continuous integration, we will cover how we can overcome these issues. The Problem First, we will take a look at the source of the problem, which lies in the software development cycle. Next, we will cover some of the change conflicts that can take place during that process, and finally we will explore the main factors that can make these problems escalate, followed by an explanation of how continuous integration solves these issues. The Source of the Problem Let's take a look at what a traditional software development cycle looks like. Each developer gets a copy of the code from the central repository. The starting point is usually the latest stable version of the application. All developers begin at the same starting point, and work on adding a new feature or fixing a bug. Each developer makes progress by working on their own or in a team. They add or change classes, methods and functions, shaping the code to meet their needs, and eventually they complete the task they were assigned to do. Meanwhile, the other developers and teams continue working on their own tasks, changing the code or adding new code, solving the problems they have been assigned. If we take a step back and look at the big picture, i.e. the entire project, we can see that all developers working on a project are changing the context for the other developers as they are working on the source code. As teams finish their tasks, they copy their code to the central repository. There are two scenarios that can take place at this point. The code in the central repository is unchanged The code is the same as the initial copy. If this is the case, things are simple, because the system is unchanged. 
All the ideas we had about the system still stand. This is always the case if you are the only developer working on the application and if you have finished your work before the other members of your team. Either way, things are looking good for you. The system you have created and tested can be delivered to users without additional changes. The code in the central repository has changed The second scenario is that the application you have been working on has changed, and you discover this at the point when you try to copy your code over to the central repository. Changes in the code may or may not be in conflict with the ones you've made. If there are conflicts, you need to resolve them in order to be able to successfully deliver your code to the users. In this case, things could get complicated. Next, we'll explore the types of conflicts that can happen and what you may need to do to resolve them. Change Conflicts There are several types of change conflicts that can occur when integrating code. Here are some of the most common ones. We'll start with the simplest scenarios, and gradually explore the more complex ones. The implementation details have changed - You refactored a method, but so did the developer that has already integrated their code into the central repository. The behavior of the method is the same in all three implementations. You will need to pick the version that will stay, and remove the other implementations. You can even come up with a fourth implementation. This is a simple type of conflict, which you can usually resolve within a few minutes. The APIs you have been relying on have changed - For instance, the behavior of a certain method has changed. This could affect your code in a number of ways — from minor changes that you might need to make, to major structural changes. There is no silver bullet in such cases. You will need to carefully study the changes and make all the fixes. 
An entire subsystem of the application behaves in a different way - In such cases you will almost certainly be facing a partial, if not a full, rewrite of your solution. If this is the case, you will probably need to speak with all the developers working on the application, because such a significant change should not happen without letting the rest of the team know about it. These and a number of other issues could come up, caused by various factors. Different versions of frameworks, libraries, and databases are another potential source of conflicts. Once you have updated your code so it can be compiled or interpreted, you also need to remember to repeat all the tests that you have previously run. These examples show that the amount of work needed to solve a problem that was initially assigned to a developer can easily double. Escalating Factors Here are some of the main factors that can make these problems escalate. The size of the team working on the project. The number of changes that are being pushed back into the main repository is proportional to the number of people on the project. This makes the process of integrating code into the main repository significantly harder. The amount of time passed since the developer got the latest version of the code from the central repository. As time passes, other people working on the same project are integrating more and more of their work, and changing the context in which your code needs to run. Sometimes the changes in the main repository are so big that it's easier to do a complete rewrite of your solution. A large number of changes in the system makes integration events more complex and can have a huge effect on the productivity of the team. Such situations are even referred to as "integration hell". This process has a number of other negative consequences for your business. Testing and fixing bugs can take forever. Your releases are running late.
Teams are stressed out because of long and unpredictable release cycles, and morale deteriorates. Solution: Integrate Continuously The solution to the problem of managing a large number of changes in big integration events is conceptually simple. We need to split these big integration events into much smaller integration events. This way, developers need to deal with a much smaller number of changes, which are easier to understand and manage. To keep integration events small and easily manageable, we need them to happen often. A couple of times a day is ideal. The practice of doing small integrations often is called Continuous Integration. The idea is simple, but at the same time it often appears to be impossible to implement in practice. This is because changing the process requires us to change some of our own habits, and changing habits is difficult. The Practice of Continuous Integration In order to avoid the previously described issues, developers need to integrate their partially complete work back into the main repository on a daily basis, or even a couple of times a day. To accomplish this, they first need to pull in all the changes added to the main repository while they were working on the code. They also must make sure that their code will work once it is integrated into the main repository. The only way to ensure this is to test every feature of the application. What first comes into mind when we start considering continuous integration is that the developers would need to spend half of their time every day testing the code in order not to break the code in the main repository for everyone else. This is why the prerequisite for continuous integration is having an automated test suite. Automated tests take away the burden of the manual, repetitive, and error-prone testing process from the developers. They also make the entire testing process much quicker. A computer can replace hours of manual testing with just minutes of automated testing. 
Behavior-driven and test-driven development are techniques that help developers write clean, maintainable code while writing tests at the same time. Testing techniques are out of the scope of this article, and you can read more about them in other articles on Semaphore Community. Tests make sense only if they are executed every time the source code changes, without exception. A continuous integration service such as Semaphore CI is a tool which can automate this process by monitoring the central code repository and running tests on every change in the source code. Apart from running tests, such services also collect test results and communicate those results to the entire team working on the project. The result of continuous integration is so important that many teams have a rule to stop working on their current task if the version in the central repository is broken. They join the team which is working on fixing the code until tests are passing again. The role of a continuous integration service is to improve the communication between developers by communicating the status of a project's source code. How to Adopt Continuous Integration Continuous integration as a practice makes a big contribution to improving the development process, but also calls for essential changes in the everyday development routine. Adopting it comes with challenges that are easy to overcome if the process is introduced gradually. One of the biggest challenges teams face is the lack of an automated testing suite. A good recipe for overcoming this situation is to start adding automated tests for all new features as they are being developed. At the same time, the developer working on a bug fix should also work to cover the related code with tests. Whenever a bug is reported, the team should first write a failing test to demonstrate the existence of the bug. Once the fix is created, the test should pass.
Over time, the automated test suite gradually becomes more comprehensive, and the developers begin relying on it more and more. Adopting a continuous integration service to communicate the status of the tests to the entire team in the early stages of a project is also important, because it raises awareness of the project status among team members. Conclusion Introducing continuous integration and automated testing into the development process changes the way software is developed from the ground up. It requires effort from all team members, and a cultural shift in the organization. Big changes in the workflow are not easy to pull off quickly. Changes have to be introduced gradually, and all team members and stakeholders need to be on board with the idea. Educating team members about the practice of continuous integration and building the automated test suite need to be done systematically. Once the first steps have been taken, the process usually continues on its own, as both developers and stakeholders begin seeing the benefits of automated testing suites and the peace of mind that this practice brings to the entire team. Article originally posted on the Semaphore Community.
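The advice above, to capture every reported bug as a failing test before fixing it, can be sketched as follows. This is a minimal, hypothetical example and not taken from the article: the `PriceFormatter` class, the bug it describes (amounts below ten cents losing their zero padding), and the hand-rolled test runner are all assumptions chosen to keep the sketch free of external test libraries.

```java
import java.util.Locale;

// Hypothetical production class. The imagined bug report: 5 cents was rendered as "$0.5".
final class PriceFormatter {
    // Fixed version: the cents part is always zero-padded to two digits.
    static String format(long cents) {
        return String.format(Locale.ROOT, "$%d.%02d", cents / 100, cents % 100);
    }
}

// Regression test written first, while the bug still existed, so that it failed;
// it passes once the fix above is in place and then guards against the bug returning.
final class PriceFormatterRegressionTest {
    static void formatsAmountsBelowTenCents() {
        String actual = PriceFormatter.format(5);
        if (!"$0.05".equals(actual)) {
            throw new AssertionError("expected $0.05 but got " + actual);
        }
    }

    public static void main(String[] args) {
        formatsAmountsBelowTenCents();
        System.out.println("all regression tests passed");
    }
}
```

In a real project the same test would normally live in the team's test framework and run on the continuous integration service on every change, so a reintroduced bug is reported to the whole team right away.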

June 20, 2015

by Darko Fabijan

· 1,197 Views

Building Microservices: Using an API Gateway

Learn about using the microservice architecture pattern to build microservices and API gateways--compared to the usage of monolithic application architecture.

June 16, 2015

by Patrick Nommensen

· 121,125 Views · 40 Likes

Why 12 Factor Application Patterns, Microservices and CloudFoundry Matter (Part 2)

Learn why 12 Factor Application Patterns, Microservices and CloudFoundry matter when trying to change the way your product is produced.

June 12, 2015

by Tim Spann

CORE

· 15,656 Views · 4 Likes

Spring Integration Tests with MongoDB Rulez

Spring integration tests allow you to test functionality against a running application. This article shows proper database set- and clean-up with MongoDB.

June 10, 2015

by Ralf Stuckert

· 21,470 Views · 2 Likes

Easy SQLite on Android with RxJava

Whenever I consider using an ORM library on my Android projects, I always end up abandoning the idea and rolling my own layer instead, for a few reasons:

- My database models have never reached the level of complexity that ORMs help with.
- Every ounce of performance counts on Android, and I can’t help but fear that the generated SQL will not be as optimized as it should be.

Recently, I started using a pretty simple design pattern that relies on RxJava to offer what I think is a fairly simple way of managing database access.

Easy reads

One of the important design principles on Android is to never perform I/O on the main thread, and this obviously applies to database access. RxJava turns out to be a great fit for this problem. I usually create one Java class per table, and these tables are then managed by my SQLiteOpenHelper. With this new approach, I decided to extend my use of the helper and make it the only point of access to anything that needs to read or write to my SQL tables. Let’s consider a simple example: a USERS table managed by the UserTable class:

    // UserTable.java
    List<User> getUsers(SQLiteDatabase db, String userId) {
      // Query the USERS table and return the rows matching userId
    }

The problem with this method is that if you’re not careful, you will call it on the main thread, so it’s up to the caller to make sure they always invoke this method on a background thread (and then post their UI update back on the main thread, if they are updating the UI). Instead of managing yet another thread pool or, worse, using AsyncTask, we are going to rely on RxJava to take care of the threading model for us. Let’s rewrite this method to return a Callable instead:

    // UserTable.java
    Callable<List<User>> getUsers(final SQLiteDatabase db, final String userId) {
      return new Callable<List<User>>() {
        @Override
        public List<User> call() {
          // Query the USERS table and return the rows matching userId
        }
      };
    }

In effect, we simply refactored our method to return a lazy result, which makes it possible for the database helper to turn this result into an Observable:

    // MySqliteOpenHelper.java
    Observable<List<User>> getUsers(String userId) {
      return makeObservable(mUserTable.getUsers(getReadableDatabase(), userId))
          .subscribeOn(Schedulers.io());
    }

Notice that on top of turning the lazy result into an Observable, the helper forces the subscription to happen on a background thread (the IO scheduler here, since we’re accessing the database). This guarantees that callers never have to worry about blocking the main thread. Finally, the makeObservable method is pretty straightforward (and completely generic):

    // MySqliteOpenHelper.java
    private static <T> Observable<T> makeObservable(final Callable<T> func) {
      return Observable.create(
          new Observable.OnSubscribe<T>() {
            @Override
            public void call(Subscriber<? super T> subscriber) {
              try {
                subscriber.onNext(func.call());
                // Signal that no more items will be emitted
                subscriber.onCompleted();
              } catch (Exception ex) {
                Log.e(TAG, "Error reading from the database", ex);
                // Propagate the failure instead of leaving the subscriber waiting
                subscriber.onError(ex);
              }
            }
          });
    }

At this point, all our database reads have become observables that guarantee the queries run on a background thread. Accessing the database is now pretty standard Rx code:

    // DisplayUsersFragment.java
    @Inject
    MySqliteOpenHelper mDbHelper;

    // ...

    mDbHelper.getUsers(userId)
        .observeOn(AndroidSchedulers.mainThread())
        .subscribe(new Action1<List<User>>() {
          @Override
          public void call(List<User> users) {
            // Update our UI with the users
          }
        });

And if you don’t need to update your UI with the results, just observe on a background thread.
Since your database layer is now returning observables, it’s trivial to compose and transform these results as they come in. For example, you might decide that your ContactTable is a low-level class that should not know anything about your model (the User class) and that it should only return low-level objects instead (maybe a Cursor or ContentValues). Then you can use Rx to map these low-level values into your model classes for an even cleaner separation of layers.

Two additional remarks:

- Your Table Java classes should contain no public methods: only package-private methods (which are accessed exclusively by your Helper, located in the same package) and private methods. No other classes should ever access these Table classes directly.
- This approach is extremely compatible with dependency injection: it’s trivial to have both your database helper and your individual tables injected (additional bonus: with Dagger 2, your tables can have their own component since the database helper is the only reference needed to instantiate them).

This is a very simple design pattern that has scaled remarkably well for our projects while fully enabling the power of RxJava. I also started extending this layer to provide a flexible update notification mechanism for list view adapters (not unlike what SQLBrite offers), but this will be for a future post. This is still a work in progress, so feedback is welcome!
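The layering described above (a table class that returns a lazy query over low-level values, a helper that runs it off the calling thread, and a mapping step that produces model objects) can be sketched without Android or RxJava. This is only an illustration under stated assumptions: it swaps RxJava for java.util.concurrent.CompletableFuture so that it runs on a plain JVM (Java 9 or later), and every name in it (LazyReadSketch, getUserRows, runInBackground, loadUsers, the fields of User) is hypothetical rather than taken from the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LazyReadSketch {
    /** Hypothetical model class; the article only names it User. */
    public static final class User {
        public final String id;
        public final String name;

        public User(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Stand-in for a table class: returns a lazy query that yields low-level rows. */
    static Callable<List<Map<String, String>>> getUserRows(final String userId) {
        return () -> {
            // A real table class would run its SQL query here; this fakes one row.
            List<Map<String, String>> rows = new ArrayList<>();
            rows.add(Map.of("_id", userId, "name", "user-" + userId));
            return rows;
        };
    }

    /** Stand-in for the helper: evaluates the lazy query on a background executor. */
    static <T> CompletableFuture<T> runInBackground(Callable<T> query, ExecutorService io) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return query.call();
            } catch (Exception ex) {
                // Surface the failure to whoever consumes the future.
                throw new CompletionException(ex);
            }
        }, io);
    }

    /** Maps low-level rows to model objects, so the table layer stays model-free. */
    public static List<User> loadUsers(String userId) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            return runInBackground(getUserRows(userId), io)
                .thenApply(rows -> {
                    List<User> users = new ArrayList<>();
                    for (Map<String, String> row : rows) {
                        users.add(new User(row.get("_id"), row.get("name")));
                    }
                    return users;
                })
                .join(); // Blocks for the demo only; UI code would attach a callback instead.
        } finally {
            io.shutdown();
        }
    }

    public static void main(String[] args) {
        for (User u : loadUsers("42")) {
            System.out.println(u.id + " " + u.name); // prints "42 user-42"
        }
    }
}
```

With RxJava, the thenApply step would correspond to a map operator applied to the Observable returned by the helper.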

June 4, 2015

by Cedric Beust

· 16,230 Views

Mounting an EBS Volume to Docker on AWS Elastic Beanstalk

Mounting an EBS volume to a Docker instance running on Amazon Elastic Beanstalk (EB) is surprisingly tricky. The good news is that it is possible. I will describe how to automatically create and mount a new EBS volume (optionally based on a snapshot). If you would prefer to mount a specific, existing EBS volume, you should check out leg100’s docker-ebs-attach (which uses the AWS API to mount the volume); you can use it either in a multi-container setup or by including the relevant parts in your own Dockerfile.

The problem with EBS volumes is that, if I am correct, a volume can only be mounted to a single EC2 instance, and thus doesn’t play well with EB’s autoscaling. That is why EB supports only creating and mounting a fresh volume for each instance.

Why would you want to use an auto-created EBS volume? You can already use a Docker VOLUME to mount a directory on the host system’s ephemeral storage to make data persistent across Docker restarts and redeploys. The only advantage of EBS is that it survives restarts of the EC2 instance, but that is something that, I suppose, happens rarely. I suspect that in most cases EB actually creates a new EC2 instance and then destroys the old one. One possible benefit of an EBS volume is that you can take a snapshot of it and use that to launch future instances. I’m now inclined to believe that a better solution in most cases is to set up automatic backup to and restore from S3, for example using duplicity with its S3 backend (as I do for my NAS).

Anyway, here is how I got EBS volume mounting working.
There are 4 parts to the solution:

1. Configure EB to create an EBS mount for your instances
2. Add custom EB commands to format and mount the volume upon first use
3. Restart the Docker daemon after the volume is mounted so that it will see it (see this discussion)
4. Configure Docker to mount the (mounted) volume inside the container

1-3.: .ebextensions/01-ebs.config:

    # .ebextensions/01-ebs.config
    commands:
      01format-volume:
        command: mkfs -t ext3 /dev/sdh
        test: file -sL /dev/sdh | grep -v 'ext3 filesystem'
        # ^ prints '/dev/sdh: data' if not formatted
      02attach-volume:
        ### Note: The volume may be renamed by the Kernel, e.g. sdh -> xvdh but
        # /dev/ will then contain a symlink from the old to the new name
        command: |
          mkdir /media/ebs_volume
          mount /dev/sdh /media/ebs_volume
          service docker restart # We must restart Docker daemon or it won't see the new mount
        test: sh -c "! grep -qs '/media/ebs_volume' /proc/mounts"
    option_settings:
      # Tell EB to create a 100GB volume and mount it to /dev/sdh
      - namespace: aws:autoscaling:launchconfiguration
        option_name: BlockDeviceMappings
        value: /dev/sdh=:100

4.: Dockerrun.aws.json and Dockerfile:

Dockerrun.aws.json: mount the host’s /media/ebs_volume as /var/easydeploy/share inside the container:

    {
      "AWSEBDockerrunVersion": "1",
      "Volumes": [
        {
          "HostDirectory": "/media/ebs_volume",
          "ContainerDirectory": "/var/easydeploy/share"
        }
      ]
    }

Dockerfile: tell Docker to use a directory on the host system as /var/easydeploy/share, either a randomly generated one or the one given via the -v volume option to docker run:

    ...
    VOLUME ["/var/easydeploy/share"]
    ...

June 3, 2015

by Jakub Holý

· 14,771 Views

Ecosystem of Hadoop Animal Zoo

Hadoop is best known for MapReduce and its distributed file system (HDFS). More recently, other productivity tools developed on top of these have formed a complete Hadoop ecosystem. Most of the projects are hosted under the Apache Software Foundation. The Hadoop ecosystem projects are listed below.

Hadoop Common
A set of components and interfaces for distributed file systems and I/O (serialization, Java RPC, persistent data structures).
http://hadoop.apache.org/

HDFS
A distributed file system that runs on large clusters of commodity hardware. The Hadoop Distributed File System, HDFS, was renamed from NDFS. It is a scalable data store for semi-structured, unstructured, and structured data.
http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfsuserguide.html
http://wiki.apache.org/hadoop/hdfs

MapReduce
MapReduce is the distributed, parallel programming model for Hadoop, inspired by Google's MapReduce research paper. Hadoop includes an implementation of the MapReduce programming model. In MapReduce there are two phases, not surprisingly map and reduce. To be precise, between the map and reduce phases there is another phase called sort and shuffle. The JobTracker on the NameNode machine manages the other cluster nodes. MapReduce programs can be written in Java. If you like SQL or other non-Java languages, you are still in luck: you can use a utility called Hadoop Streaming.
http://wiki.apache.org/hadoop/hadoopmapreduce

Hadoop Streaming
A utility that enables MapReduce code in many languages, such as C, Perl, Python, C++, and Bash. Examples include a Python mapper and an AWK reducer.
http://hadoop.apache.org/docs/r1.2.1/streaming.html

Avro
A serialization system for efficient, cross-language RPC and persistent data storage. Avro is a framework for performing remote procedure calls and data serialization. In the context of Hadoop, it can be used to pass data from one program or language to another, e.g. from C to Pig. It is particularly suited for use with scripting languages such as Pig, because data is always stored with its schema in Avro.
http://avro.apache.org/

Apache Thrift
Apache Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code that can be used to build RPC clients and servers that communicate seamlessly across programming languages. Instead of writing a load of boilerplate code to serialize and transport your objects and invoke remote methods, you can get right down to business.
http://thrift.apache.org/

Hive and Hue
If you like SQL, you will be delighted to hear that you can write SQL and Hive converts it to a MapReduce job. You don't get a full ANSI SQL environment, however. Hue gives you a browser-based graphical interface for your Hive work. Hue features a file browser for HDFS, a job browser for MapReduce/YARN, an HBase browser, and query editors for Hive, Pig, Cloudera Impala, and Sqoop2. It also ships with an Oozie application for creating and monitoring workflows, a ZooKeeper browser, and an SDK.

Pig
A high-level data flow programming language and execution environment for MapReduce coding. The Pig language is called Pig Latin. You may find the naming conventions somewhat unconventional, but you get incredible price-performance and high availability.
https://pig.apache.org/

Jaql
Jaql is a functional, declarative programming language designed especially for working with large volumes of structured, semi-structured, and unstructured data. As its name implies, a primary use of Jaql is to handle data stored as JSON documents, but Jaql can work on various types of data. For example, it can support XML, comma-separated values (CSV) data, and flat files. A "SQL within Jaql" capability lets programmers work with structured SQL data while employing a JSON data model that's less restrictive than its Structured Query Language counterparts.
1. Jaql in Google Code
2. What is Jaql? by IBM

Sqoop
Sqoop provides bi-directional data transfer between Hadoop (HDFS) and your favorite relational database. For example, you might be storing your app data in a relational store such as Oracle, and now you want to scale your application with Hadoop; you can migrate the Oracle database data to Hadoop HDFS using Sqoop.
http://sqoop.apache.org/

Oozie
Manages Hadoop workflows. It doesn't replace your scheduler or BPM tooling, but it provides if-then-else branching and control for Hadoop jobs.
https://oozie.apache.org/

ZooKeeper
A distributed, highly available coordination service. ZooKeeper provides primitives such as distributed locks that can be used for building highly scalable applications. It is used to manage synchronization for the cluster.
http://zookeeper.apache.org/

HBase
Based on Google's Bigtable, HBase "is an open-source, distributed, versioned, column-oriented store" that sits on top of HDFS. It is a super-scalable key-value store that works very much like a persistent hash map (Python developers can think of it as a dictionary). It is not a conventional relational database; it is a distributed, column-oriented database. HBase uses HDFS for its underlying storage. It supports both batch-style computations using MapReduce and point queries for random reads.
https://hbase.apache.org/

Cassandra
A column-oriented NoSQL data store that offers scalability and high availability without compromising on performance. It is a perfect platform for commodity hardware and cloud infrastructure. Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.
http://cassandra.apache.org/

Flume
A real-time loader for streaming your data into Hadoop. It stores data in HDFS and HBase. Flume "channels" data between "sources" and "sinks", and its data harvesting can be either scheduled or event-driven. Possible sources for Flume include Avro, files, and system logs, and possible sinks include HDFS and HBase.
http://flume.apache.org/

Mahout
Machine learning for Hadoop, used for predictive analytics and other advanced analysis. There are currently four main groups of algorithms in Mahout:
- recommendations, a.k.a. collaborative filtering
- classification, a.k.a. categorization
- clustering
- frequent item set mining, a.k.a. parallel frequent pattern mining
Mahout is not simply a collection of pre-existing algorithms. Many machine learning algorithms are intrinsically non-scalable; that is, given the types of operations they perform, they cannot be executed as a set of parallel processes. Algorithms in the Mahout library belong to the subset that can be executed in a distributed fashion.
http://en.wikipedia.org/wiki/list_of_machine_learning_algorithms
https://www.coursera.org/course/machlearning
https://mahout.apache.org/

FUSE
Makes HDFS look like a regular file system, so that you can use ls, rm, cd, etc. directly on HDFS data.

Whirr
Apache Whirr is a set of libraries for running cloud services. Whirr provides:
- A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of each provider.
- A common service API. The details of provisioning are particular to the service.
- Smart defaults for services. You can get a properly configured system running quickly, while still being able to override settings as needed.
You can also use Whirr as a command line tool for deploying clusters.
https://whirr.apache.org/

Giraph
An open source graph processing API, like Pregel from Google.
https://giraph.apache.org/

Chukwa
Chukwa, an incubator project on Apache, is a data collection and analysis system built on top of HDFS and MapReduce. Tailored for collecting logs and other data from distributed monitoring systems, Chukwa provides a workflow that allows for incremental data collection, processing, and storage in Hadoop. It is included in the Apache Hadoop distribution as an independent module.
https://chukwa.apache.org/

Drill
Apache Drill, an incubator project on Apache, is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open source version of Google's Dremel system, which is available as an IaaS service called Google BigQuery. One explicitly stated design goal is that Drill be able to scale to 10,000 servers or more and to process petabytes of data and trillions of records in seconds.
http://incubator.apache.org/drill/

Impala (Cloudera)
Released by Cloudera, Impala is an open-source project which, like Apache Drill, was inspired by Google's paper on Dremel; the purpose of both is to facilitate real-time querying of data in HDFS or HBase. Impala uses an SQL-like language that, though similar to HiveQL, is currently more limited than HiveQL. Because Impala relies on the Hive metastore, Hive must be installed on a cluster in order for Impala to work. The secret behind Impala's speed is that it "circumvents map reduce to directly access the data through a specialized distributed query engine that is very similar to those found in commercial parallel RDBMSs." (source: Cloudera)
http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html
http://training.cloudera.com/elearning/impala/
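To make the three phases from the map reduce entry above concrete (map, then sort and shuffle, then reduce), here is a small word-count sketch in plain Java. It does not use the Hadoop API and runs in a single process; the class and method names (MapReducePhasesSketch, map, shuffle, reduce, wordCount) are hypothetical and only mirror the roles of the phases. It assumes Java 9 or later.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReducePhasesSketch {
    /** Map phase: emit a (word, 1) pair for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Sort and shuffle: group all values by key, with keys kept in sorted order. */
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: collapse the grouped values for one key into a single sum. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Runs the three phases in order over the given input lines. */
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hbase=1, hive=2, pig=2}
        System.out.println(wordCount(List.of("pig hive pig", "hive hbase")));
    }
}
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data across the network; the sketch only shows how data flows between the phases.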

June 3, 2015

by Umashankar Ankuri

· 23,883 Views · 3 Likes

Efficient Cassandra Write Pattern for Micro-Batching

The best way to write to a Cassandra cluster is to use concurrent asynchronous writes. In cases where the data exhibits strong temporal locality, write speed can be improved further.

May 20, 2015

by John Georgiadis

· 35,038 Views · 1 Like

Why Android Studio Is Better For Android Developers Instead Of Eclipse

Besides Android Studio, developers also use Eclipse to develop Android applications, but many have always thought of Eclipse as a "student-project IDE".

May 20, 2015

by Mehul Rajput

· 68,317 Views · 1 Like