AWS Overlay IP in SAP Landscapes
In this article, we will understand why Overlay IP is needed, technical details on how this works, and design patterns in SAP that utilize it.
Overlay IP is a special case of networking in AWS that is critically important in SAP environments requiring a High Availability setup.
What Is Overlay IP
In AWS networking, the private IP address within a VPC binds itself to a network interface of an EC2 instance for providing network connectivity. As the binding happens in a static manner and the instance lies within a zone, this creates a challenge for the high availability scenarios that are set up across zones. In such scenarios, the client applications will expect a single IP address for application connectivity; however, the same cannot be achieved with a static IP address.
The Overlay IP, sometimes also referred to as a Floating IP, is a concept that overcomes this limitation. Note that the term Floating IP is also used in other contexts. The Overlay IP is an IP address that lies outside the CIDR block of the VPC to which the EC2 instance belongs. For example, if the VPC uses a CIDR block of 10.2.0.0/16, then the Overlay IP can be any IP outside the range 10.2.0.0 to 10.2.255.255. It can, for example, be 10.1.0.1, provided that no other VPC or peered on-premises network uses that address either.
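The range rule can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `is_valid_overlay_ip` and the idea of passing a list of "reserved" CIDR blocks (the VPC plus any other networks that must not overlap) are illustrative assumptions, not part of any AWS tooling.

```python
import ipaddress


def is_valid_overlay_ip(candidate: str, reserved_cidrs: list[str]) -> bool:
    """Return True if candidate lies outside every CIDR block in reserved_cidrs."""
    ip = ipaddress.ip_address(candidate)
    return not any(ip in ipaddress.ip_network(cidr) for cidr in reserved_cidrs)


vpc_cidr = "10.2.0.0/16"  # VPC CIDR from the example in the text

print(is_valid_overlay_ip("10.1.0.1", [vpc_cidr]))   # True: outside the VPC range
print(is_valid_overlay_ip("10.2.5.10", [vpc_cidr]))  # False: inside the VPC range
```

In practice the list would also contain the CIDR blocks of other VPCs and peered on-premises networks, so that a candidate already in use elsewhere is rejected as well.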
How Does It Work?
The first thing to understand is that Overlay IP by itself is not an AWS resource. It is simply an IP address chosen from the network, and the only requirement is that it is routable. The main question, therefore, is how to make the address routable, and the simple answer is a route table in AWS. As per the AWS definition, a route table is a set of rules that determines where network traffic from a customer's subnet or gateway is directed. Each route has a destination and a target, which can be confusing because the two words usually mean the same thing. In a route table, however, the destination is the range of IP addresses "where" the traffic should go, and the target is the gateway, network interface, or connection "through" which that traffic is sent.
For an overlay IP address, a route is created with the overlay IP address as the destination and, as the target, the elastic network interface of the instance that should use the overlay IP address. This can be understood with the diagram below, which shows the following sequence:
- The user/client application makes a request using the Overlay IP address
- This is first received by VPC/subnet route table to determine the next course of action
- The route table indicates that the corresponding overlay IP has the target set to the elastic network interface of the instance
- The traffic is routed to the elastic network interface of the instance, which sees the overlay IP as the destination of the packet
- At this point, the overlay IP effectively acts as a local IP address of the instance, which then processes the request
- In case of a cluster failover, the route table is updated, usually by the cluster software, which changes the target to the elastic network interface of the other instance
This basically means that the Overlay IP itself can "float" between both instances based on the currently active instance from a cluster perspective. Hence, from an end-user/application point of view, this ensures a seamless failover with no changes for them.
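The sequence above can be sketched as a toy model. This is not AWS code: the `RouteTable` class, the longest-prefix-match lookup, and the ENI names `eni-node-a` and `eni-node-b` are assumptions made purely to illustrate how changing one route entry moves the overlay IP from one node to the other.

```python
import ipaddress


class RouteTable:
    """Toy model of a route table: destination CIDR -> target."""

    def __init__(self):
        self.routes = {}

    def replace_route(self, destination: str, target: str) -> None:
        # Create the route, or overwrite its target if it already exists.
        self.routes[ipaddress.ip_network(destination)] = target

    def lookup(self, address: str) -> str:
        # Pick the most specific (longest prefix) route that matches.
        ip = ipaddress.ip_address(address)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            raise LookupError(f"no route for {address}")
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


OVERLAY_IP = "10.1.0.1"

table = RouteTable()
table.replace_route("10.2.0.0/16", "local")            # normal VPC traffic
table.replace_route(f"{OVERLAY_IP}/32", "eni-node-a")  # overlay IP -> active node

before = table.lookup(OVERLAY_IP)

# Failover: the cluster software repoints the same destination at the other node.
table.replace_route(f"{OVERLAY_IP}/32", "eni-node-b")
after = table.lookup(OVERLAY_IP)

print(before, "->", after)  # eni-node-a -> eni-node-b
```

The client keeps using 10.1.0.1 throughout; only the target behind that destination changes.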
An important aspect of this setup is the source/destination check property of the EC2 instance. By default, an AWS EC2 instance is expected to be either the source or the destination of any traffic that it sends or receives. This is a security setting (AWS re:Invent 2017: Another Day, Another Billion Flows (NET405) - YouTube) that creates a problem in the Overlay IP scenario (as well as for NAT instances, which are not elaborated here). Because the Overlay IP is an entity external to the EC2 instance and its network interface, an instance with this check enabled will block packets whose destination is the Overlay IP. The check therefore has to be disabled on the cluster nodes.
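The effect of the check can be illustrated with a deliberately simplified model. The function `packet_delivered` and the addresses below are hypothetical; the real check is enforced by AWS and covers outbound traffic too, so this sketch only mirrors the inbound case discussed here.

```python
def packet_delivered(dst_ip: str, eni_private_ips: set[str], source_dest_check: bool) -> bool:
    """Toy model: with the check on, only packets addressed to the ENI's own IPs pass."""
    if source_dest_check:
        return dst_ip in eni_private_ips
    return True


eni_ips = {"10.2.1.15"}  # assumed private IP of the instance inside the VPC
overlay_ip = "10.1.0.1"  # overlay IP from the earlier example

print(packet_delivered(overlay_ip, eni_ips, source_dest_check=True))   # False: dropped
print(packet_delivered(overlay_ip, eni_ips, source_dest_check=False))  # True: delivered
```

On a real instance the attribute is changed through the console, CLI, or SDK, for example with `aws ec2 modify-instance-attribute --no-source-dest-check` for the instance in question.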
Design Patterns in SAP That Leverage Overlay IP
The primary usage of Overlay IP in SAP landscapes is in application clusters (ASCS/ERS cluster) and the database cluster (primarily the SAP HANA cluster). SAP applications have a design pattern in which the SAP ASCS (ABAP SAP Central Services) and the database are single points of failure. To ensure resiliency, it is recommended to make them redundant, and to improve the availability of the application, the failover to the secondary/standby node has to be performed automatically and transparently. To achieve this, the clusters are usually made up of three resources:
1. The application component itself, or in other words, the service that should run and should be monitored
2. The filesystem that can be mounted to run these services
3. A network address, which in the AWS case is the Overlay IP, that can be assigned to the active cluster node
While cluster design is a detailed discussion in its own right, our focus for now is on how Overlay IP is used in clusters. The cluster software determines which node is active by first checking the state of the SAP or database service that it monitors and then ensuring that the Overlay IP points to the EC2 instance on which the active service is running. When a failure is detected, the Overlay IP moves to the other EC2 node along with the services, ensuring that all application and end-user traffic continues to work seamlessly with the active node. To do this, the cluster updates the route tables through the Overlay IP agent. I will cover the Overlay IP agent itself in a separate blog post.
As per AWS recommendations, Overlay IP routing can be implemented in two different ways.
1. Transit Gateway: A virtual router, or a cloud router, as AWS calls it. This component is designed to provide routing between several Amazon VPCs and also for customers' on-premises environments that are connected to the AWS network over VPN or Direct Connect. In this case, the Transit Gateway route table should have a route with the Overlay IP as the destination and, as the target, the VPC in which the EC2 instances of the cluster run.
2. Network Load Balancer: A Network Load Balancer (NLB) is designed for TCP load balancing. Although SAP clusters are active-passive setups and not really "load-balanced" in the true sense, the NLB can forward traffic to a target in its target group, which in this case is the Overlay IP.
Here is the resource snapshot that is created within the Pacemaker cluster for quick reference:
primitive rsc_IP_HA1_ASCS00 ocf:suse:aws-vpc-move-ip \
    params ip=x.x.x.x \
    routing_table=rtb-xxxxxxxxx,rtb-yyyyyyyyy,rtb-zzzzzzzzz \
    interface=eth0 profile=cluster \
    op start interval=0 timeout=180 \
    op stop interval=0 timeout=180 \
    op monitor interval=60 timeout=60
The ip parameter here is the Overlay IP, and the routing_table parameter lists the route tables to update: either the ones attached to the VPC in the case of the NLB setup, or the Transit Gateway route table if Transit Gateway is being used.
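Conceptually, moving the Overlay IP amounts to one route replacement per listed route table. The sketch below is my own illustration, not the agent's source code. The helper `move_overlay_ip` is hypothetical and expects an object with a `replace_route` method that takes the same keyword arguments as the boto3 EC2 client; a small recording stand-in is used so the example runs without AWS access, and the route table and ENI IDs are placeholders.

```python
def move_overlay_ip(ec2_client, overlay_ip, route_table_ids, target_eni_id):
    """Point the overlay IP's /32 route at target_eni_id in every listed route table."""
    for rtb_id in route_table_ids:
        ec2_client.replace_route(
            RouteTableId=rtb_id,
            DestinationCidrBlock=f"{overlay_ip}/32",
            NetworkInterfaceId=target_eni_id,
        )


class RecordingClient:
    """Stand-in for a boto3 EC2 client; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def replace_route(self, **kwargs):
        self.calls.append(kwargs)


client = RecordingClient()
move_overlay_ip(client, "10.1.0.1", ["rtb-aaaa", "rtb-bbbb"], "eni-node-b")

for call in client.calls:
    print(call)
```

Against a real account, `boto3.client("ec2")` could be passed in place of `RecordingClient()`, provided the instance profile carries the required permissions.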
The cluster manages the IP through a colocation constraint, which keeps the IP resource co-located with the current HANA primary database. The documentation from AWS can be used for the actual configuration; this blog is intended to provide technical details and background. It is important to keep in mind that the Overlay IP agent replaces route table entries so that routing is redirected to the active cluster node, and the necessary AWS IAM permissions for this have to be available in the IAM profile of the corresponding AWS EC2 instance.
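As a rough illustration of those permissions, the following sketch builds a policy document that allows replacing routes in specific route tables and describing route tables. The function name `overlay_ip_policy`, the region, the account ID, and the route table ID are placeholders I have assumed; the authoritative policy should be taken from the AWS documentation for your setup.

```python
import json


def overlay_ip_policy(region: str, account_id: str, route_table_ids: list[str]) -> dict:
    """Build an IAM policy document for updating overlay IP routes (illustrative)."""
    route_table_arns = [
        f"arn:aws:ec2:{region}:{account_id}:route-table/{rtb_id}"
        for rtb_id in route_table_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow changing routes only in the cluster's route tables.
                "Effect": "Allow",
                "Action": "ec2:ReplaceRoute",
                "Resource": route_table_arns,
            },
            {
                # Allow reading route tables to find the current target.
                "Effect": "Allow",
                "Action": "ec2:DescribeRouteTables",
                "Resource": "*",
            },
        ],
    }


policy = overlay_ip_policy("eu-central-1", "123456789012", ["rtb-xxxxxxxxx"])
print(json.dumps(policy, indent=2))
```

Scoping `ec2:ReplaceRoute` to the listed route tables keeps the cluster nodes from modifying routing anywhere else in the account.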
On a side note, don't rely on ChatGPT for an answer on this topic. It can only analyze the textual data available to it, and because there is limited input data on Overlay IP, the information it provides is currently inaccurate and misleading.