[This article was written by Shay Naeh.]
As we get more heavily involved with NFV, I would like to share my experience taking a few VNFs (virtual network functions) and SFCs (service function chains) and implementing them on public and private clouds.
I thought implementing an IMS system on a public cloud would take forever, but I was surprised to discover that there are several tools that can expedite the process. I chose to install Clearwater IMS from MetaSwitch on OpenStack.
Why Open Source NFVs?
- First, it's free and available. Not everything that is free is worth something :-) but here it is worth evaluating.
- Second, open source NFV provides the economics of over-the-top (OTT) services while maintaining telco standards, which is quite unique in this domain.
- Third, it's highly available and scales linearly.
- Fourth, it supports resource allocation & optimization quite nicely.
I chose to implement Clearwater, the open source IMS system from MetaSwitch, in its distributed version, and to run it on OpenStack for the following reasons:
- OpenStack is maturing and gaining telco-grade capabilities.
- It's the open source cloud with the largest community support and contribution.
- It offers high availability, e.g. even if Nova fails, your application VM continues to function.
- With the support of the right tools you can make it available across cloud zones and regions of a public cloud, and even support multiple clouds. You can run one component in a VMware cloud and the second on an OpenStack cloud, or fail over to an OpenStack cloud.
Which VNFs and SFCs?
Basically, you can install single VNFs such as a vRouter (virtual router), vFW (virtual firewall) or vLB (virtual load balancer), or whole subsystems such as IMS (IP Multimedia Subsystem), which together with a vSBC (session border controller), the core IMS and P-CF (policy and charging functions) creates an SFC (service function chain) to serve mobile devices connected through an APN (access point name). Another SFC could be an EPC (evolved packet core) that has a vFW, charging and policy functions included in the chain.
The Four Must-Have Components:
An NFV cloud must have the following four components:
- A hardware component, which is based on COTS compute servers, network and storage.
- An IaaS system that manages the hardware resources and allocates them optimally, for example OpenStack.
- The VNF application component that you virtualize as described above, e.g. an open source IMS system.
- An orchestration component which is responsible for automatic application installation & deployment including dependencies and relationships between components, KPI monitoring, analytics and feedback actions like auto-scaling and self-healing. I wrote about these features in a previous post.
For the orchestration component I used Cloudify, which provides complete lifecycle management of applications, from deployment to monitoring and analytics.
[Figure: The four must-have components]
Clearwater IMS from MetaSwitch
I will now describe my experience installing Clearwater IMS with the Cloudify orchestrator and Chef. Clearwater can be installed either all-in-one or in a scalable, distributed way. The distributed deployment can scale out linearly. Additional Sprout servers can be added based on the number of incoming connections, which is an application-level scaling trigger, or based on CPU utilization, which is more infrastructure related. The Clearwater architecture is depicted below.
The Deployment and Orchestration Process
The process of deployment and monitoring was done automatically. I described the application components and their relationships in a simple TOSCA-like YAML file. I created VMs for Bono, Sprout, Ellis, Homer, Homestead and Ralf, as seen in the architecture diagram above. I then deployed the application components using the Cloudify Chef configuration management plugin, which automates the Chef process. The Chef client is installed automatically on each node and calls the relevant Chef roles and recipes. The whole process is automated, including the installation of a DNS server and the relationships between the nodes.
[Figure: TOSCA-like YAML configuration description]
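To make the Chef step described above a bit more concrete, here is a minimal Python sketch of the kind of per-node input a Chef client run typically takes: a JSON attributes file whose `run_list` names the role to apply. The role names are hypothetical (they simply reuse the node names), and this is not a description of how the Cloudify Chef plugin works internally.

```python
import json

# Hypothetical mapping of Clearwater VMs to Chef role names.
# The real role names depend on the Chef repository in use.
NODE_ROLES = {
    "bono": "bono",
    "sprout": "sprout",
    "ellis": "ellis",
    "homer": "homer",
    "homestead": "homestead",
    "ralf": "ralf",
}


def first_boot_json(role):
    """Return JSON attributes for a Chef client run that applies one role."""
    return json.dumps({"run_list": [f"role[{role}]"]}, indent=2)


if __name__ == "__main__":
    for node, role in NODE_ROLES.items():
        print(f"# {node}")
        print(first_boot_json(role))
```

A file with this content can be handed to `chef-client` through its `-j` option, which is roughly what an orchestrator would automate on each VM.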
Node Orchestration and Dependencies Map
We can see in the chart above a dependency map, which the Cloudify orchestrator uses to bring up the right components in the right order and tie them together with adjacent components. TOSCA describes a lifecycle which can be applied to a node or a relationship. For example, if the DNS depends on all nodes, then once all nodes are created a function can be executed to populate the DNS zone with the IPs of the nodes. Cloudify has a set of plugins and workflow management which help to easily create this topology map and run day-to-day operations on it using custom and built-in workflows, but that is a topic for a different blog post.
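The ordering idea can be sketched in a few lines of Python. This is only an illustration of dependency-ordered creation, not Cloudify's implementation: the dependency map contains just the "DNS depends on all nodes" example from the text, and the zone name and IP addresses are made up.

```python
from graphlib import TopologicalSorter  # Python 3.9+

NODES = ["bono", "sprout", "ellis", "homer", "homestead", "ralf"]

# Map each component to the set of components it depends on.
# Assumption: the six Clearwater nodes have no dependencies on each other here.
DEPENDS_ON = {name: set() for name in NODES}
DEPENDS_ON["dns"] = set(NODES)


def creation_order(depends_on):
    """Return the components in an order where dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


def dns_zone_records(node_ips, zone="example.com"):
    """Build one A record line per node (hypothetical zone name)."""
    return [f"{name}.{zone}. IN A {ip}" for name, ip in sorted(node_ips.items())]


if __name__ == "__main__":
    print(creation_order(DEPENDS_ON))
    # Made-up addresses, only to show the function that runs once all nodes exist.
    ips = {name: f"10.0.0.{i}" for i, name in enumerate(NODES, start=10)}
    print("\n".join(dns_zone_records(ips)))
```

Because `dns` depends on every other node, it always comes last in the order, which is the point at which the zone can be populated.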
Public Cloud and Private Cloud
I installed this on HP Cloud, which is a public OpenStack cloud, as well as on our internal OpenStack lab environment. The process was similar in both: creating the VMs, attaching them to the right network, configuring the DNS, and bringing up all the nodes: Bono, Ellis, Sprout, Ralf, Homer and Homestead. I had some issues with the DNS configuration, which I tracked down using tcpdump and Wireshark for packet analysis and protocol decoding of the communication between the six Clearwater nodes. After fixing it, everything ran smoothly.
[Figure: Clearwater servers on HP OpenStack Cloud]
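Packet captures were what I used, but a quick scripted check can narrow down this kind of DNS problem before reaching for them. The following is a minimal Python sketch under stated assumptions: the host names are hypothetical, and the resolver is injectable so the check can be tried without a live DNS server.

```python
import socket

# Hypothetical host names for the six Clearwater nodes.
CLEARWATER_HOSTS = [
    f"{name}.example.com"
    for name in ("bono", "sprout", "ellis", "homer", "homestead", "ralf")
]


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the host names that the resolver could not look up."""
    failed = []
    for host in hostnames:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed
```

Running `unresolved_hosts(CLEARWATER_HOSTS)` from each VM shows which records are missing from that node's point of view.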
Monitoring KPIs, Auto-Scaling and Self-Healing
To monitor system activity and application status I used the collectors that come with Cloudify, which let me hook in data from multiple sources such as CPU and system collectors, SNMP, JMX, RabbitMQ and many others. The collection is based on the open source Diamond project. The metrics are kept in a metrics DB, and I can apply various analytics to them.
[Figure: Monitoring of application and system KPIs]
You can see above a chart of SNMP and CPU statistics collected from Bono. At around 15:02 you can see a sharp CPU system time drop from 30% to almost zero.
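A drop like this is easy to flag automatically once the metrics are stored. Here is a minimal Python sketch, assuming the metrics are available as a list of (timestamp, value) pairs. The sample values are made up to mirror the shape of the chart, and the 20-point threshold is an arbitrary choice.

```python
# Illustrative samples only: (time, CPU system time in percent).
SAMPLES = [("15:00", 30.0), ("15:01", 29.5), ("15:02", 1.0), ("15:03", 0.5)]


def find_sharp_drops(samples, min_drop=20.0):
    """Return timestamps where the value fell by at least min_drop
    percentage points compared with the previous sample."""
    drops = []
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        if previous - current >= min_drop:
            drops.append(timestamp)
    return drops


if __name__ == "__main__":
    print(find_sharp_drops(SAMPLES))
```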
Several charts can be grouped together into a dashboard. In this way we can create a dashboard for an application role that shows application statistics, such as how many calls are coming in and how many are rejected, and another dashboard for the technical personnel who are responsible for triage and call tracing.
Auto-Scaling & Self-Healing
Based on a monitoring indicator or a group of indicators, I can decide to add Sprout servers to support more connections, or to shut down Sprout servers at times of lower activity. This is elastic scaling: scaling out under load and back in when activity drops.
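The decision itself can be sketched as a simple threshold rule. This is a minimal Python sketch, not Cloudify's policy engine: the function name, the thresholds and the node limits are all assumptions chosen for illustration.

```python
def scaling_decision(current_nodes, connections, high=1000, low=300,
                     min_nodes=1, max_nodes=8):
    """Return the desired number of Sprout servers.

    Scale out by one when connections per node exceed `high`,
    scale in by one when they fall below `low`, otherwise hold.
    All thresholds are illustrative assumptions.
    """
    if current_nodes < 1:
        raise ValueError("current_nodes must be at least 1")
    per_node = connections / current_nodes
    if per_node > high and current_nodes < max_nodes:
        return current_nodes + 1
    if per_node < low and current_nodes > min_nodes:
        return current_nodes - 1
    return current_nodes
```

Keeping a gap between the `low` and `high` thresholds avoids adding and removing servers repeatedly when the load hovers around a single cut-off value.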
By monitoring an availability KPI, I can decide that a node is not responsive, bring up a new VM, provision the new VM with the failed node's capabilities, and add the new node in place of the failed one. This is self-healing.
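The self-healing steps can be sketched as follows. This is a minimal Python sketch under stated assumptions: `HealthTracker`, `replace_node` and the three callbacks are hypothetical names, and requiring several consecutive failed probes before replacing a node is a design choice of the sketch, not something the orchestrator prescribes.

```python
class HealthTracker:
    """Count consecutive failed availability probes per node."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self._failures = {}

    def record(self, node, healthy):
        """Record one probe result; return True if the node should be replaced."""
        if healthy:
            self._failures[node] = 0
            return False
        self._failures[node] = self._failures.get(node, 0) + 1
        return self._failures[node] >= self.failure_threshold


def replace_node(failed_node, role, create_vm, provision, swap_in):
    """Bring up a new VM, give it the failed node's role, and swap it in.

    The three callbacks stand in for whatever the cloud and the
    configuration management tool actually provide.
    """
    new_vm = create_vm(role)
    provision(new_vm, role)
    swap_in(failed_node, new_vm)
    return new_vm
```

Requiring consecutive failures keeps a single lost probe from triggering a replacement.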
OK, so that was Part I. I am now in the middle of these stages and will report on progress in my next post, so stay tuned. It should be interesting.