We often have our heads down looking at the projects we regularly work on (Apache CloudStack and Xen Project) and don't always pay attention to the other cool things going on in the open source world. So once in a while it's good to poke your head up out of the clouds and take a look at some of the awesome projects being developed in the open source community. These projects are very promising and especially useful for cloud computing.
Hybrid Cloud => Segregated Workloads
I am not convinced of the hybrid cloud scenario as espoused by many cloud pundits. I think it's more theoretical than commonplace. What I do think happens is that organizations use the public cloud and the private cloud simultaneously, with different applications in each, and will continue to do so. That's why I like some of these tools that help users manage multiple clouds (hopefully one of them will be Apache CloudStack ;) ) from a single tool.
One of my favorite projects is Scalr, which gives users an easy-to-use menu-driven interface (See screenshot to the right) that enables them to deploy applications on multiple clouds. I have seen Scalr in use on a number of CloudStack clouds as well as being used to manage Amazon Web Services. Their template system makes cloud deployments a point-and-click proposition.
jclouds is an open source library that allows you to abstract API calls to a single interface and make those abstractions portable across clouds. jclouds tests support for over 30 cloud providers and cloud software stacks, including Amazon, Azure, GoGrid, Ninefold, OpenStack, Rackspace, and vCloud. jclouds also recently joined the Apache Software Foundation's Incubator, which I suspect will draw even more developer interest. The bottom line is that jclouds gives users a common interface to manage multiple clouds; some people refer to jclouds as a cloud controller. jclouds is a Java library, and you can use it from Java or Clojure.
Just as jclouds abstracts cloud interfaces into a Java library, Libcloud does so with Python. Libcloud allows you to manage four types of cloud resources: Servers (e.g. Amazon EC2), Storage (e.g. Amazon S3), Load Balancers, and DNS, all as a service.
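To make the "common interface" idea concrete, here is a minimal Python sketch of the pattern that libraries like jclouds and Libcloud are built around. It is not the actual API of either library: ComputeDriver, InMemoryDriver, Node, and deploy_everywhere are hypothetical names invented for this illustration, and the drivers only keep state in memory instead of calling a real cloud.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Node:
    """A provider-neutral description of a server (hypothetical type)."""
    name: str
    provider: str
    state: str = "running"


class ComputeDriver(ABC):
    """The common interface every (hypothetical) provider driver implements."""

    @abstractmethod
    def create_node(self, name: str) -> Node: ...

    @abstractmethod
    def list_nodes(self) -> list[Node]: ...


class InMemoryDriver(ComputeDriver):
    """Stand-in driver; a real one would translate these calls to a cloud's API."""

    def __init__(self, provider: str) -> None:
        self.provider = provider
        self._nodes: list[Node] = []

    def create_node(self, name: str) -> Node:
        node = Node(name=name, provider=self.provider)
        self._nodes.append(node)
        return node

    def list_nodes(self) -> list[Node]:
        # Return a copy so callers cannot mutate the driver's internal list.
        return list(self._nodes)


def deploy_everywhere(drivers: list[ComputeDriver], name: str) -> list[Node]:
    """Run the same provider-agnostic code against every configured cloud."""
    return [driver.create_node(name) for driver in drivers]


clouds = [InMemoryDriver("public-cloud"), InMemoryDriver("private-cloud")]
nodes = deploy_everywhere(clouds, "web-1")
for node in nodes:
    print(f"{node.name} on {node.provider}: {node.state}")
```

The point of the design is that deploy_everywhere never needs to know which cloud it is talking to; adding another provider means adding another driver, not changing the calling code.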
Virtualization Still Evolving
Forrester analyst Dave Bartoletti notes in this ZDNet post that virtualization has become mainstream and that the majority of data center managers will be using virtualization by the end of 2013 (it's a nice post with his take on where the industry is going).
We’re now more than a decade into the era of widespread x86 server virtualization. Hypervisors are certainly a mature (if not peaceful) technology category, and the consolidation benefits of virtualization are now incontestable. 77% of you will be using virtualization by the end of this year, and you’re running as many as 6 out of 10 workloads in virtual machines. With such strong penetration, what’s left? In our view: plenty. It’s time to ask your virtual infrastructure, “What have you done for me lately?”
The question is what's next in virtualization. Here are a few of the open source projects I think are evolving the application of virtualization.
While the team here at Open@Citrix is very supportive of the Xen Project and XenServer, we do see a lot of value in the ideas behind lightweight containers for Linux apps. The folks at Gilt.com actually showed us how they used LXC and Apache CloudStack to run Lots of Small Applications (LOSA) in their cloud. The thing that makes LXC interesting to me is that the operating system is something users already understand how to manage (Linux), and the payloads can be easily ported from Linux server to Linux server. If I were giving young Benjamin (Dustin Hoffman in The Graduate) advice today, the one word would be "containers".
Docker is an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere. Many users are using Docker to help build a lightweight Platform-as-a-Service. While it's not considered production ready yet, Docker seems to be a solution the industry is hungry for: a way to deliver application payloads to multiple clouds in a consistent format. I highly recommend reading the Docker story here.
While Docker is an excellent way to package payloads in containers, CoreOS is an operating system that includes basically just a Linux kernel and systemd to run containers. I could go into the CoreOS story in greater detail, but I think Wired does a better job. Also note that these guys are pals of noted kernel hacker GregKH (@gregkh), so I am giving them the brilliant-by-association designation too. Although CoreOS is still in alpha, they have the ability to create a very interesting platform for scaling Linux applications.
Xen on ARM
Xen has been around for the x86 architecture for over 10 years, but as the ARM processor starts to creep into the data center, the need for virtualization on this new architecture is becoming evident. As ARM gains ground not only as a mobile platform but also as a server platform, it's going to be important to have the same virtualization capabilities as we have on x86. Last year, when Linus pulled the Xen on ARM patches into the Linux kernel, Xen became the first hypervisor supported by Linux on the ARM platform. Sang-bum Suh (VP of the Samsung Software Platform Team in Software R&D Center) is leading the Xen ARM project, and I believe we'll see many of the cloud platforms start to adopt management of ARM-based servers over time (here's a demo of OpenNebula managing ARM servers on YouTube).
While we have seen storage and compute virtualization become pervasive, until recently the holdout has been networking. The hypervisor developers have already seen the need for networking to become as fluid and configurable as virtual machines, and they have adopted Open vSwitch as a softswitch for their platforms. The industry stood up and took notice when VMware acquired SDN controller developer Nicira for $1.26 billion last year. The final leg of the cloud tripod is flexibility in networking that matches that of storage and compute. That is why I believe software-defined networking (SDN) is one of the most important advances for the promise of cloud computing to become a reality.
OpenDaylight is a community-led Software-Defined Networking (SDN) platform. The idea is to create an industry-standard, pluggable networking control platform that allows the entire networking stack (layers 3-7) to be managed by a commonly used control plane. Vendors and other open source projects can then add their integrations to help expand the capabilities of the controller. OpenDaylight was launched earlier this year with the support of some of the biggest IT vendors in the industry: Red Hat, Microsoft, IBM, VMware, Cisco, Juniper, Brocade, Citrix and many more. The OpenDaylight project will be hosting a Mini-Summit in New Orleans later this month, which will probably bring a lot more information.
- utility /yo͞oˈtilətē/ - Used, serving, or working in several capacities as needed, especially capable of playing as a substitute in any of several positions
When it comes to storage there are varying degrees of utility. It's not unlike the old parable about the blind men and the elephant: storage means something different to everyone depending on their perspective. Some applications need high-performance throughput; others need massive amounts of storage but don't require high availability. Some require high availability but not consistency (see CAP Theorem).
In an interview, former Reddit Reliability Architect Jeremy Edberg (@jedberg) noted that SSDs in their application were like "cheap RAM": the disks were 4x more expensive than the cheaper spinning platters but yielded a 16x performance increase. John Bates on the TwinStrata blog notes that fast and efficient transfer of large amounts of data isn't usually listed as a benefit. However, geographically redundant data storage usually is a requirement for cloud apps. Here are some open source technologies that are ideally adapted to the cloud.
Ceph is an open source distributed object store and file system that allows users to build robust storage applications on commodity hardware. This allows the user to create massive amounts of storage while still controlling costs to match the level of utility needed. As a file system, Ceph provides a traditional POSIX-style file system interface alongside its object store.
Since the release of Gluster 3.4 I have heard a lot of praise for their distributed file system, which scales to 72 brontobytes. GlusterFS clusters storage building blocks over InfiniBand or TCP/IP and then aggregates disk and memory resources into a single namespace.
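The "single namespace" idea can be sketched in a few lines of Python. This is only an illustration of aggregating several storage building blocks behind one view, not GlusterFS's actual implementation or API: the Brick and Volume classes are hypothetical, data lives in memory, and the simple hash-modulo placement is an assumption chosen to keep the example short.

```python
from __future__ import annotations

import hashlib


class Brick:
    """A hypothetical storage building block, e.g. one directory on one server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: dict[str, bytes] = {}


class Volume:
    """Presents several bricks to the client as one namespace."""

    def __init__(self, bricks: list[Brick]) -> None:
        self.bricks = bricks

    def _brick_for(self, path: str) -> Brick:
        # Hash the path so placement is deterministic and needs no central index.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        return self.bricks[int.from_bytes(digest[:8], "big") % len(self.bricks)]

    def write(self, path: str, data: bytes) -> None:
        self._brick_for(path).files[path] = data

    def read(self, path: str) -> bytes:
        return self._brick_for(path).files[path]

    def listdir(self) -> list[str]:
        # The client sees one merged listing, whichever brick holds each file.
        return sorted(p for brick in self.bricks for p in brick.files)


volume = Volume([Brick("server1"), Brick("server2"), Brick("server3")])
for i in range(6):
    volume.write(f"/logs/app-{i}.log", f"entry {i}".encode("utf-8"))

print(volume.listdir())
print(volume.read("/logs/app-3.log"))
```

Callers only ever talk to the volume; which server ends up holding a given file is an internal detail, which is what lets capacity grow by adding more building blocks.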
While there are tons of notable open source projects ongoing (Hadoop, OpenStack and CloudFoundry come to mind), I really think these lesser-known projects are worth mentioning. The expansive number of open source infrastructure projects available provides the building blocks for the cloud to be an unmatched platform for innovation. Given the nearly unlimited number of ways to combine these technologies, a solution can be customized for virtually any organization's needs.