There seemed to be a big crowd at the Cloud Foundry Community Advisory Board meeting this month. The main focus of this meeting was the Foundation, for which it was suggested attendees bring their questions. But first, the technical...
James Bayer from Pivotal started the updates with information on the recent vulnerabilities, such as ShellShock and POODLE. He said that "we're all patched up" and that v191 of cf-release has the POODLE fixes for the client. Greg Oehman from Pivotal confirmed this and said that the BOSH stemcell v2748, and subsequent releases, includes the POODLE fixes.
On the 28th October Greg sent out a Final Resolution of the SSLv3 POODLE vulnerabilities to the vcap-dev mailing-list, which gave resolution steps for BOSH, Cloud Foundry and HAProxy users. Those using HAProxy alternatives for load-balancing should take equivalent appropriate steps.
James said that going forward they will continue to communicate vulnerability notices and resolutions over the relevant mailing lists, such as the vcap-dev and the BOSH mailing lists. He said their goal is to get down to 2 full business days for resolution of a vulnerability.
Garden and libcontainer
Glyn Normington from Pivotal gave an update on the progress made for merging Garden features into Docker's container implementation, libcontainer. This is an effort to switch to using libcontainer for Cloud Foundry containerization.
Glyn said that they have worked with the people developing libcontainer for 2-3 months, defining an API that would allow them to re-use libcontainer within Garden. He said that they did make some progress, but the process model for libcontainer is quite different from Garden's. Specifically, the first user process initiated in a libcontainer container becomes the init process for the container and defines the life-cycle of that container. Any additional user processes that need to run in the same container would need to be managed by this primary process.
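The process model Glyn described can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Container` class and its methods are hypothetical names for illustration and are not part of libcontainer or Garden.

```python
class Container:
    """Hypothetical model: the first user process acts as init for the container."""

    def __init__(self, init_command):
        self.init = init_command   # first user process, defines the life-cycle
        self.children = []         # additional processes managed by init
        self.running = True

    def spawn(self, command):
        # Extra processes can only be started while the init process is alive.
        if not self.running:
            raise RuntimeError("container has stopped")
        self.children.append(command)

    def exit(self, command):
        if command == self.init:
            # When init exits, the whole container goes away with it.
            self.children.clear()
            self.running = False
        else:
            self.children.remove(command)


container = Container("app-server")
container.spawn("log-forwarder")
container.exit("app-server")
```

After the init process exits, the model reports the container as stopped and the additional process is gone, which is why any helper functionality would have to live inside, or be managed by, that first process.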
Glyn's team was considering retrofitting the Garden functionality into the first user process, but there are missing features. They raised several feature requests against libcontainer, but Glyn says that there is little incentive for the Docker and Google developers, working on libcontainer, to include these features.
The decision was made not to adopt libcontainer wholesale, but rather to integrate selected packages from libcontainer into the Garden code-base. An example of this is the netlink package, which encapsulates a lot of specialist knowledge that the Garden developers would otherwise have to build up for themselves.
James asked whether this was related to wshd. wshd is a daemon that runs within the Garden container and performs tasks such as making sure the container is set up correctly before starting the application process and hooking the application process into Loggregator. James mentioned that these things were difficult, or possibly hit race conditions, when trying to do them with Docker. Glyn said the blockers were related to this kind of thing and also mentioned the secret socket that wshd establishes back to the Warden server. He said that the way the secret socket is established is currently not possible with a normal Docker user-process.
Glyn reminded everyone that Garden now supports mounting Docker images directly, which is independent of libcontainer. This uses Docker's graph driver and deals with the Docker packaging and file-system side of things, rather than actually running Docker containers.
Glyn recently wrote a blog post giving an overview of Garden.
Greg Oehman from Pivotal gave an update on the Cloud Foundry CLI, saying they have released v6.6.2. He said that this release includes a lot of functionality for an MVP (minimum viable product) of CLI-Plugins, although it is not quite there yet; 23 of 27 stories are finished. He said this should be completed in the next week, at which time they will work with the Documentation team to document it. For Plugin authors, the documentation will be found on GitHub and will include example plugins. For Plugin users, the documentation will be found on the usual docs.cloudfoundry.org site.
Greg has posted a video demonstration of the currently implemented CLI Plugins functionality.
Greg said his team have spent time migrating their continuous integration to GoCD. He said that this does not directly benefit the community, but does allow his team to move faster.
CLI improvements this past month are related to Security Groups and Space Quotas. This includes improvements to help command output and error handling. An unhelpful message encountered during login was also addressed to give more useful feedback.
CLI bugs that have been addressed include a Windows GoCD timeout issue.
Metrics and Logging
Alex Jackson from Pivotal gave an update on Metrics and Logging. The fire-hose was completed this month. This is a single stream of all of the data that is collected, in protocol buffer format, for logging and metrics.
The fire-hose is currently in the develop branch of cf-release and it is targeted to be included in the next v192 release, at which point it will be merged into the master branch.
An additional scope has been added to UAA to control access to the fire-hose. End-users requiring access to the fire-hose will need to have this scope added to their profile.
Mike Youngstrom from LDS Church, asked in the chat, "Is there an adapter for the Collector to the fire-hose?" Alex said that there was not. Currently, they are actually pumping the data from fire-hose into the Collector (into varz). They want to re-factor the Collector code so that some of the Historians, rather than being embedded in the Collector, will be consuming from the fire-hose and will be plugged in at the end of the chain. This will make it easier for others to add additional Historians, in different formats, at the end rather than having to get into the bowels of the Collector code. He said that this model would be more extensible.
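The re-factoring Alex described, with Historians plugged in at the end of the chain as consumers of the fire-hose, could look roughly like the following. This is a minimal sketch: `Firehose` and `ListHistorian` are hypothetical in-memory stand-ins, and the event fields and output formats are assumptions made for illustration, not the real Collector or fire-hose APIs.

```python
class Firehose:
    """Hypothetical in-memory stand-in for the fire-hose stream."""

    def __init__(self):
        self.consumers = []

    def subscribe(self, consumer):
        # Historians register themselves at the end of the chain.
        self.consumers.append(consumer)

    def emit(self, event):
        # Every subscribed consumer receives every event.
        for consumer in self.consumers:
            consumer(event)


class ListHistorian:
    """Hypothetical Historian that renders each event in its own format."""

    def __init__(self, fmt):
        self.fmt = fmt
        self.records = []

    def __call__(self, event):
        self.records.append(self.fmt.format(**event))


firehose = Firehose()
plain = ListHistorian("{origin} {name}={value}")
dotted = ListHistorian("{origin}.{name} {value}")
firehose.subscribe(plain)
firehose.subscribe(dotted)
firehose.emit({"origin": "router", "name": "requests", "value": 42})
```

Adding another output format here means subscribing one more consumer, without touching the code that produces the events, which is the extensibility argument made in the meeting.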
Other work included improvements to the example producer library, dropsonde, and the example consumer library, noaa. The READMEs of these two repositories give more information on the fire-hose and protocol format used.
Note: varz is a REST interface exposing all metrics of a Cloud Foundry component.
Alex said his team has been working on making the logging and metrics system more reliable when NATS fails. They found some bugs related to NATS going away and coming back again, such as components needing to re-subscribe or re-send heartbeat messages.
The syslog templates have been consolidated into the Metron job, rather than being spread out across the system. Metron is the agent that handles logging and metrics on the VM, but also manages the syslog configuration.
Dr. Max (Michael Maximilien) from IBM asked if there are cases where a subscriber to the fire-hose might lose events. Alex said it's a best-effort. He said that if you slow down as a subscriber then you will not keep up and the system will start to lose messages. In cases of extreme load, messages may be intentionally dropped instead of blocking other components. I confirmed with Alex in the chat as to whether end-users would be aware that log messages had been dropped and he responded, "If they are dropped due to a slow consumer, we add a warning message to the stream."
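The best-effort behaviour Alex described can be sketched as a bounded buffer that never blocks the producer and tells the subscriber when it has missed messages. This is an illustrative model only: the `BestEffortStream` class, its capacity and the wording of the warning are assumptions, not the actual Loggregator implementation.

```python
from collections import deque


class BestEffortStream:
    """Hypothetical bounded per-subscriber buffer that drops the oldest messages."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()
        self.dropped = 0  # messages dropped since the last read

    def publish(self, message):
        # Never block the producer: discard the oldest entry when the buffer is full.
        if len(self.buffer) >= self.capacity:
            self.buffer.popleft()
            self.dropped += 1
        self.buffer.append(message)

    def read_all(self):
        # Prepend a warning so a slow subscriber can see that it missed messages.
        out = []
        if self.dropped:
            out.append(f"WARNING: dropped {self.dropped} messages (slow consumer)")
            self.dropped = 0
        out.extend(self.buffer)
        self.buffer.clear()
        return out


stream = BestEffortStream(capacity=3)
for i in range(5):
    stream.publish(f"log {i}")
messages = stream.read_all()
```

With a capacity of three and five published messages, the subscriber in this sketch receives a warning followed by the three most recent messages.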
Alex addressed a question from the mailing-list on whether syslog is feeding into the fire-hose. He said it is not; syslog is still independent of the fire-hose, although they would like it to feed in and that is an upcoming change. There is a demo video of the Logging and Metrics Firehose.
Dr. Nic from Stark & Wayne told us how he had "discovered a magical way", thanks to Dan Higham, to gain access to a running container. He then made a project called cf-ssh. Originally cf-ssh was a bash script, but this limited where it could be run. Since then he has rewritten it in Go, so it will run on any operating system, such as Windows, and he hopes that one day it will become a CLI Plugin.
Shannon Coen from Pivotal gave an update on the work his team has been doing on the open-source highly-available MySQL service. Over the past few weeks they have found it hard to handle failure cases with HAProxy. This is particularly true for severing existing connections when nodes become un-writable. There are a few states where the Galera database is considered "healthy", but is actually un-writable.
Due to the issues they have been experiencing with HAProxy, Shannon's team have begun writing their own thin TCP proxy. This is not MySQL aware and is simply intended to bridge the gap with their fail-over requirements, for example to ensure that clients do not have to wait for a five minute timeout before they can reconnect to a healthy node. He said that this work is progressing quickly.
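The fail-over requirement can be sketched without any real networking: route new connections to the first writable node, and sever existing connections as soon as a node becomes un-writable so that clients reconnect immediately instead of waiting for a timeout. The `Backend` and `FailoverRouter` names below are hypothetical and this is not the code Shannon's team is writing.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    writable: bool = True
    connections: set = field(default_factory=set)


class FailoverRouter:
    """Hypothetical routing logic for a thin, protocol-unaware TCP proxy."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self):
        # Route to the first backend that is currently writable.
        for backend in self.backends:
            if backend.writable:
                return backend
        return None

    def connect(self, client_id):
        backend = self.pick()
        if backend is None:
            raise RuntimeError("no writable backend available")
        backend.connections.add(client_id)
        return backend.name

    def mark_unwritable(self, name):
        # Sever existing connections so clients can reconnect straight away.
        severed = set()
        for backend in self.backends:
            if backend.name == name:
                backend.writable = False
                severed = set(backend.connections)
                backend.connections.clear()
        return severed


router = FailoverRouter([Backend("node-a"), Backend("node-b")])
first = router.connect("client-1")
severed = router.mark_unwritable("node-a")
second = router.connect("client-1")
```

In this sketch the client is moved from `node-a` to `node-b` as soon as `node-a` is marked un-writable, regardless of whether the database on `node-a` still reports itself as "healthy".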
As mentioned in last month's CAB call, Shannon's team are continuing to keep an eye on the MaxScale project, which is a MySQL aware "smart proxy". Shannon said the project has a very active community which is quick to respond to requests. The current blocker to adopting MaxScale is an issue with duplicate connections occurring when an unhealthy node returns to being healthy. This has the potential to cause deadlocks if a row-lock is requested from multiple nodes at the same time. They have submitted a fix request for this and the MaxScale community are looking at it. MaxScale is hoping to have a general availability release in December, so the in-house TCP proxy Shannon's team are developing may be retired after a few months.
In other news, Shannon said that the work on the Update Service Instance feature is progressing. With Cloud Foundry, end-users have always had the ability to create, bind, delete and rename services, but not to make changes to the provisioned services themselves. Most use-cases relate to upgrading and downgrading quotas, but may also relate to adding additional features. For example, if you have a database that is becoming full, you probably require a bigger one. This is being implemented in a generic way that allows you to change the service plan for an already provisioned service. It is up to individual service brokers to manage the details of what is included in each plan. Shannon expects this feature to be available soon.
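As a rough illustration of changing the plan of an already provisioned service, the sketch below models a broker-side check in memory. The plan names, the `storage_mb` quota and the `update_plan` function are all hypothetical; the real feature leaves these details to each service broker.

```python
# Hypothetical plan catalogue; real brokers define their own plans and limits.
PLANS = {"small": {"storage_mb": 100}, "large": {"storage_mb": 1000}}


class ServiceInstance:
    def __init__(self, name, plan):
        self.name = name
        self.plan = plan
        self.bindings = []


def update_plan(instance, new_plan, used_mb):
    """Change the plan in place, keeping the instance and its bindings."""
    if new_plan not in PLANS:
        raise ValueError(f"unknown plan: {new_plan}")
    # Assumed broker policy: refuse a downgrade that current usage would not fit.
    if used_mb > PLANS[new_plan]["storage_mb"]:
        raise ValueError("current usage exceeds the quota of the requested plan")
    instance.plan = new_plan
    return instance


db = ServiceInstance("orders-db", "small")
db.bindings.append("orders-app")
update_plan(db, "large", used_mb=95)
```

The point of the sketch is that the instance and its bindings survive the change; only the plan, and whatever the broker associates with it, is different afterwards.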
IBM has been helping with both the MySQL work and the Update Service Instance feature.
Riak CS service work has involved getting it to a point where it could be used as a blob-store internally by Cloud Foundry components. This involves seeding the buckets via cf-release so that components that need a blob store do not have to deal with generating credentials on-demand.
Reliable garbage collection for Riak CS is something they have been working on with Basho, who are the creators of Riak CS. This helps with reclaiming disk usage when objects are deleted. Shannon said that Riak CS has a complex way of doing garbage collection - it is asynchronous and highly-tunable. Thanks to their efforts, they are now seeing predictable disk usage reclaimed.
Dmitry Kalinin from Pivotal said his team has been working on OpenStack related BOSH work. Last month he discussed that they have been improving how the BOSH agent deals with OpenStack disk allocation, specifically ephemeral disk partitions on the root disk. This work has been finished and pushed out.
They have also introduced features related to config-drive. This is one option for sending meta-data to the BOSH agent. This spans many different OpenStack implementations and has been implemented in two places; the BOSH CPI and the BOSH agent.
The OpenStack root partition has been shrunk to 3GB, which Dmitry says will be a surprise to those relying on a 10GB root partition. He asked that questions be sent to the mailing-list, so that he can discuss what the usage patterns are.
The vSphere CPI has been improved. There were some race conditions around creating different folders. Now it places the VMs into those folders, which should make the vSphere CPI more resilient when you are spinning up multiple VMs at the same time.
The OpenStack stemcell is now using an older version of the Qcow disk image file format, which allows greater usage across more versions of OpenStack. HVM stemcells are now being published in response to requests from the Cloud Foundry community. These are only lite stemcells, which are being used within Pivotal's development environment. Dmitry expects more demand for HVM stemcells as Amazon Web Services starts to use HVM more.
A long-lived bug in BOSH Director has been fixed, which was around system dependencies for some of the packages. Existing releases that were not able to upload should now work. This was found by the team working on the Riak CS service, when trying to upload the BOSH release for that.
Some race conditions have been removed from BOSH Director NATS communication, which has improved their integration tests.
There have been a few pull requests from the Cloud Foundry community. For example, the BOSH Errands command will now list all Errands on the Director.
Security fixes for POODLE and ShellShock were pushed out to all the stemcells.
Work on external CPIs, discussed last month, continues; this is the effort to pull all of the CPIs out of the BOSH repositories. It is now possible to run a new BOSH micro CLI and deploy a VM, telling that VM to start acting as a MicroBOSH with the new CPI.
Dr. Nic asked Dmitry if the BOSH micro CLI will support all the CPIs. Dmitry said yes, it will support CPIs as BOSH releases. He said the plan is to release all CPIs as BOSH releases. The CLI will use the CPI releases transparently, so there will be no need for external changes to a CLI for additional CPI releases. Dr. Max recommended looking at bosh-warden-cpi-release or bosh-softlayer-cpi-release.
Dr. Nic also asked about the potential future of a "Micro-Cloud Foundry" VM. Dmitry said that they have created a "CF-Lite" team of two developers in Toronto that is tasked with getting bosh-lite back on track. Once that is finished they will continue the work they have already started on building CF-Lite Vagrant boxes. There is currently an issue with the size of cf-release and how big the final package will be. One obvious place this can be reduced is with offline buildpacks. Currently, with all the offline buildpacks installed, the size of a VM is 11GB.
James Bayer of Pivotal said that their Runtime team has been working on fixing some Health Manager issues they found with NATS' queuing functionality. Sometimes this loses a message or fails to remove a message from the queue. Instead, they are moving to HTTP point-to-point when using this particular design pattern. NATS usage will generally be limited to just pub-sub.
James discussed a new API for the Cloud Controller, which will copy the source files from one application to another. The use-case he described involved having a "development" space, a "testing" space and "production" space. If you have permissions to all those spaces you would be able to call this new API and copy the application source code from an application in one space to an application in the other spaces. You are still required to re-stage the application that receives the source code. He noted that this only deals with promoting the source code and does not deal with environment variables, bound services or other application settings.
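The promotion flow James described can be modelled in a few lines. This is a sketch under stated assumptions: the `App` fields and the `copy_source` function are hypothetical and do not reflect the actual Cloud Controller API, which was not detailed in the meeting.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    name: str
    space: str
    source: str = ""
    env: dict = field(default_factory=dict)
    services: list = field(default_factory=list)
    needs_restage: bool = False


def copy_source(src, dst, user_spaces):
    """Copy only the source from src to dst; other settings are left untouched."""
    # The caller needs permissions in both the source and the target space.
    if src.space not in user_spaces or dst.space not in user_spaces:
        raise PermissionError("user needs access to both spaces")
    dst.source = src.source
    # The receiving application still has to be re-staged afterwards.
    dst.needs_restage = True


dev = App("shop", "development", source="build-42", env={"DEBUG": "1"}, services=["dev-db"])
prod = App("shop", "production", source="build-41", env={"DEBUG": "0"}, services=["prod-db"])
copy_source(dev, prod, user_spaces={"development", "testing", "production"})
```

After the copy, the production application in this model carries the new source and is flagged for re-staging, while its environment variables and bound services are unchanged.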
James said that Diego is now staging applications in Pivotal's production environment in an opt-in style. This means that when you stage your application you can set an environment variable which explicitly tells Cloud Foundry to stage it with Diego. He said this is not in cf-release, but is an additional BOSH release, called diego-release, that can be installed as an add-on to cf-release. There is not yet documentation on how others can do this, but James said Onsi Fakhouri of Pivotal is planning to send out instructions when they are sure it is working correctly. Work is also being done to get a Vagrant release of Diego, so people can more easily try it out.
Mike Maxey from Pivotal gave an update on the Cloud Foundry Foundation and said that a large chunk of the governance documents have been posted to the cloudfoundry.org website for review. This includes the bylaws, the membership agreements and CLAs. Mike said that this is the work of a big team, with a lot of experience, over the past 9 months. He said this is one of the first formal steps we go through before we stand up the Foundation and make it official in early December. When this happens all the intellectual property and processes will flip into the Foundation working model.
Dr. Nic asked about the connection between the Foundation's Board of Directors and the Community Advisory Board. Specifically, will the future of the Cloud Foundry project be directed only by companies that can afford Gold or Platinum membership? Mike said that in the documents he thinks they have done a good job of separating the technical community, and their decisions, from the Foundation's board. He said the board manages IP and dollars for marketing. The connection between the CAB and the board itself has been formalized. In the bylaws a provision has been added for two working groups that the board will start with - although they have the ability to create more. The first of these two groups will be the User Advisory Group, which will allow end-users to give feedback into the project. The second, which is being called The Strategy Council, is for vendor members of the foundation, which includes silver, gold and platinum. The Strategy Council will focus on business issues, and will advise the board on things such as what the Summit should look like and how the community is going.
Bart Copeland from ActiveState asked if the PMC (Project Management Committee) Council Documents will drive the technical aspects of how groups like the Community Advisory Board are governed and asked when those documents would be ready. He said he thinks those documents will be more relevant to the Community Advisory Board. Mike said that the technical side is governed by the technical communities and the documents mention projects, PMCs and a PMC Council. These are all self-governing, so they choose the members that join, the backlogs of work and the state of the projects when they are promoted from incubation. He said that all the technical decisions "live in that side of the house". There is a plan to create an Operations Document to help the technical community understand what is meant by those terms and the working models. Mike said they are committed to have the Operations Document done in the next 4-6 weeks, before the Foundation is set up.
Mike said that documents such as the Operations Document and the Development Governance Policy are intended not to over-define things or put in a lot of rules. Instead, they hope to leverage the processes that are currently working, so that the result is more of a quick-start guide for people joining the community.
Chris Ferris from IBM added that they are trying to keep a separation between what the board is worrying about and what the technical side of the house is concerned about. He said the board "doesn't get to meddle too much" in what is going on in the technical side. He clarified what Mike had said, in that the board is concerned with legal, financial, marketing and operational aspects, such as hiring and firing of employees and so forth. Chris said the technical side is concerned with running the open-source initiative. The board does not approve things that are done on the technical side. The PMCs have complete autonomy. Pull requests are just as welcome and will be processed in the same way as is done today. Chris said that we now have the addition of a PMC Council, that we didn't have before, and this needs to be put together. If the PMCs or PMC Council find they need to adapt their processes, then they can do that. The PMC Council has an audit by the board once a year where they state what their technical governance model is.
Dr. Max asked what happens around intellectual property, to which Chris answered that it is part of the legal framework and is controlled by the board. He said that things like whether there is a CLA (Contributor License Agreement) and who has to sign the CLA is decided by the board. The Foundation's operations would keep track of who has signed it. Chris noted that currently James Bayer keeps a list of who has signed the CLA. Also, which software licenses can be used in the project will be decided by the board.
"The board is not going to get in there and say - hey, you can't do that to BOSH!", said Chris.
Mark Atwood from HP re-iterated Chris' points on what the board is concerned with and said the obvious things are around licensing, trademarks and those sorts of things. He added that he fully expects that technical contributors will "feel free to have a full and frank exchange of views" with people on the board if they think they are going even the slightest bit off the rails.
Bart Copeland of ActiveState said he wanted to thank all the people involved, as he knows what a huge amount of work was involved in working with all the constituents. This was later echoed by Renat Khasanshyn from Altoros and Colin Humphreys from CloudCredo.
Bart said that there are two types of certification proposed for Cloud Foundry, but he was particularly interested in how the PaaS certification will work. Mike Maxey said that the documents define that these certifications will exist, with the process still to be determined. He said the idea here is to satisfy users and operators of Cloud Foundry that when a PaaS offering is certified as a Cloud Foundry PaaS it will work with other certified PaaS offerings. It removes that overhead and gives it the Foundation stamp-of-approval. Mike said that to be certified you ship bits and those bits will be the same across all offerings. IBM, Pivotal, HP and ActiveState will all have the same core bits. He said that brings comfort to end-users to know that if for some reason they don't like one vendor and want to go to another, the core is the same. The process for testing and validating is something that they are still working through. It is not a paid certification. It will be free, it will use open-source tools and it will be as easy as possible, but the technical teams are still working through the details. Mark Atwood said that this is "one of those interesting interfaces between legal and technology" and he sees that for this in particular the board will solicit the technical community for proposals, advice, suggestions and reviews of its process.
Renat Khasanshyn from Altoros asked what controls will be in place within the bylaws for upcoming projects and features to prevent things like incubation from getting out of control. He suggested this might be caused by the competing interests of a growing number of vendors. Chris Ferris replied that you get influence from the input you put into a project. Key influencers of specific projects will decide if new changes are taking the project in the direction where they think it needs to go. Pull requests will be judged on quality and on whether they meet the criteria set out for the project. He said that free-and-frank dialog on the mailing-list and IRC is essentially how open-source is done and we would expect that to continue. Also, there would be oversight by the PMCs and PMC Council. Chris stressed that we are not looking to have voting and that is the last thing he would like to see.
Mark Atwood said the absence of the Operations Document at this stage was intentional. They decided not to have the details of the technical governance process baked into the bylaws, because they knew that they wouldn't get it right the first, second or third time.
Colin Humphreys from CloudCredo asked how the community can communicate to the board any changes they want to make to the documents. Mike Maxey said the Operations Document is almost out of the control of the board and would be managed by the community, PMCs and PMC Council via the mailing-list and IRC. He said that feedback on the other documents could be sent directly to himself, Chris or the other team members via email. He added, though, that these documents have gone through 9 months of heated discussions, back-and-forth and many hours of attorneys staring at them, so unless something is glaringly wrong it is unlikely they will change. But feedback is always welcome. Mike also said that once the foundation is up and running, we will have an executive director, a real board and a chairman of the board, so there will be lots of vehicles to provide feedback.
Foundation Governance Documents
- Community Code of Conduct: Describes the guiding principles of the Cloud Foundry Foundation
- Bylaws: Describes the Foundation Board of Directors, membership and rules
- Development Governance Policy: Outlines the structure of the Development community including projects, PMCs and the PMC Council
- Membership Agreement: Required in order to join the Foundation
- Individual CLA: Required in order for an individual to submit code for a Cloud Foundry project
- Corporate CLA: Required in order for a company to submit code for a Cloud Foundry project
- Intellectual Property Policy: Describes the IP policies and licenses for Cloud Foundry projects
- Antitrust Guidelines: Outlines the conduct for the organization and members to promote fair competition
The next meeting is Wednesday 26th November at 8am PST.
James Bayer has posted a Calendar to make it easier for others to remember when the meeting is. It is generally the last Wednesday of every month, but as James notes, "We have some holidays coming up in Nov/Dec, so we may adjust a bit those months, more information on that as it's available."