Serverless Network Log Shipping, Enrichment, and Transformation
What Are VCN Flow Logs?
Virtual Cloud Network (VCN) Flow Logs is a feature of Oracle Cloud Infrastructure that lets customers capture information about the IP network traffic flowing through a subnet, as observed from a virtual NIC (vNIC) in that subnet. Flow logs are useful for a number of tasks, such as:
- Diagnosing overly permissive or overly restrictive firewall rules in Security Lists and Network Security Groups that allow or restrict the flow of packets
- Monitoring network traffic reaching and originating from your instances (VMs, databases, bare metal servers)
- Measuring the traffic volumes processed by a given virtual NIC or subnet
- Forensic analysis, when the logs are exported to a store that acts as a non-repudiable source of truth, such as a SIEM
What Is This Article About?
This article outlines the architecture and deployment methodology for a serverless, cloud-native, scalable, low-cost, zero-maintenance way to process and enrich Virtual Cloud Network (VCN) flow logs and populate them into a SIEM such as Splunk. The same method can, however, be used for
The architecture uses standard formats, such as flat JSON published to a standard HTTP Event Collector (HEC) endpoint, which is supported by default in most SIEMs, log aggregation services, and event collector services.
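As a minimal sketch of that publishing step, the Python snippet below wraps flat JSON flow-log records in Splunk HEC event envelopes and batches them into a single POST request using only the standard library. The endpoint host, token, and `sourcetype` value are placeholder assumptions; the `/services/collector/event` path and the `Authorization: Splunk <token>` header follow Splunk's HEC conventions, but check them against your own Splunk deployment.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own Splunk HEC endpoint and token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"


def build_hec_request(events, sourcetype="oci:vcn:flowlog"):
    """Wrap flat JSON flow-log dicts in HEC envelopes and batch them into one POST.

    HEC accepts several event envelopes concatenated in a single request body.
    The sourcetype value is an illustrative assumption, not a Splunk default.
    """
    body = "\n".join(
        json.dumps({"event": event, "sourcetype": sourcetype}) for event in events
    )
    return urllib.request.Request(
        HEC_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Splunk {HEC_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Batching several envelopes per request keeps the number of HTTPS calls well below the number of flow-log events; actually sending the request is left to `urllib.request.urlopen(req)`.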
Oracle Cloud Infrastructure announced the Cloud Native Limited Availability program for customers and developers building on Oracle Cloud Infrastructure in late 2019. The most recent addition to Oracle's cloud-native suite was VCN Flow Logs. Further information is available on how to sign up for the Limited Availability (LA) program and whitelist your tenancy for Logging.
Deployment scale:
- Data centers/regions in the tenancy: 4
- Compartments in each region: 96
- VCNs in the tenancy: 119
- Subnets per VCN: 2
- Average unique flow log events processed per day: 160 million+
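To put these figures in perspective, here is a back-of-the-envelope calculation, assuming events are spread evenly over the day and across subnets (real traffic is burstier, so peak rates will be higher). Because 160 million is a lower bound, the averages are lower bounds too:

```latex
\begin{aligned}
\text{subnets} &= 119\ \text{VCNs} \times 2\ \text{subnets/VCN} = 238 \\
\text{average event rate} &\geq \frac{160 \times 10^{6}\ \text{events/day}}{86\,400\ \text{s/day}} \approx 1\,852\ \text{events/s} \\
\text{events per subnet per day} &\geq \frac{160 \times 10^{6}\ \text{events/day}}{238\ \text{subnets}} \approx 672\,269
\end{aligned}
```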
In addition to transporting network flow logs at scale, this architecture enriches them with context as they are populated.
For the Impatient
Link to the entire source code and deployment tutorial.
The Utility of VCN Flow Logs
What Questions Do Plain Flow Logs Help Answer?
- Which Source?
- Which Destination?
- Which Protocol?
- Did it pass through the firewall?
Raw VCN Flow Log
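A raw record is a single space-delimited line. The sketch below splits one into a flat dictionary that answers the four questions above. The field order is an assumption modeled on common flow-log layouts (version, addresses, ports, protocol number, packet and byte counters, start/end epoch times, action, status), not a confirmed OCI specification, so verify it against the records your tenancy actually produces.

```python
# Assumed field order for one space-delimited flow log record (verify before use).
FIELDS = (
    "version", "srcaddr", "dstaddr", "srcport", "dstport", "protocol",
    "packets", "bytes", "start_time", "end_time", "action", "status",
)
# Fields converted to integers when they contain only digits.
INT_FIELDS = {"srcport", "dstport", "protocol", "packets", "bytes", "start_time", "end_time"}


def parse_flow_record(line: str) -> dict:
    """Split one raw record into a flat dict: source, destination, protocol, verdict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}")
    record = {}
    for name, value in zip(FIELDS, values):
        # Non-numeric placeholders (e.g. "-") are kept as strings.
        record[name] = int(value) if name in INT_FIELDS and value.isdigit() else value
    return record


# Made-up sample record for illustration only.
record = parse_flow_record("2 10.0.0.5 10.0.1.7 44321 443 6 10 840 1580000000 1580000060 ACCEPT OK")
# record["dstport"] == 443, record["protocol"] == 6 (TCP), record["action"] == "ACCEPT"
```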
What Questions Do Enhanced Flow Logs Help Answer?
- Which virtual NIC (vNIC) did this packet originate from?
- Which subnet is the virtual NIC part of?
- Which VCN is the subnet part of?
- Which Security List or Network Security Group allowed or rejected this packet flow?
- Which compartment do all these resources belong to?
Enriched JSON Flow Log
The additional metadata improves readability, adds context, and makes debugging easier.
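Enrichment amounts to merging each parsed record with context about its vNIC, subnet, VCN, security rules, and compartment. In this sketch, `lookup_vnic_context` is a hypothetical callable standing in for the OCI API queries, and the `oci_` key prefix is an illustrative choice. Memoizing the lookup with `functools.lru_cache` is an added assumption, not something the original design states, but since many records share a vNIC it keeps API calls proportional to distinct vNICs rather than to events.

```python
from functools import lru_cache
from typing import Callable, Dict


def make_cached_lookup(fetch: Callable[[str], Dict], maxsize: int = 1024) -> Callable[[str], Dict]:
    # Memoize per vNIC OCID so repeated records for the same vNIC reuse one API result.
    return lru_cache(maxsize=maxsize)(fetch)


def enrich_record(record: Dict, vnic_ocid: str, lookup_vnic_context: Callable[[str], Dict]) -> Dict:
    """Return a new flat dict: the raw flow fields plus prefixed network context."""
    context = lookup_vnic_context(vnic_ocid)
    enriched = dict(record)  # copy, so neither the input nor the cached context is mutated
    enriched["oci_vnic_id"] = vnic_ocid
    # Prefix context keys so they cannot overwrite raw flow-log fields.
    enriched.update({f"oci_{key}": value for key, value in context.items()})
    return enriched
```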
The Oracle Cloud Infrastructure Logging service is a cloud-native, fully managed service that can collect, index, search, and aggregate logs from multiple log sources (OCI services). Here, Logging is used to extract VCN flow log information from each subnet in the tenancy.
Oracle Cloud Infrastructure Object Storage is a low-cost, highly scalable, zero-management target for all VCN flow log files. Automated object lifecycle policies set on the bucket simplify garbage collection of the log files that the Logging service pushes, once the serverless data pipeline has processed them.
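A lifecycle rule that deletes processed log files after one day can be expressed as a JSON policy body. This sketch builds it in Python; the `items`, `action`, `target`, `timeAmount`, `timeUnit`, `isEnabled`, and `objectNameFilter` keys are believed to follow the Object Storage lifecycle rule format, while the rule name and prefix are hypothetical examples.

```python
import json


def one_day_delete_policy(prefix=None):
    """Build a lifecycle policy body that deletes objects one day after creation."""
    rule = {
        "name": "delete-processed-flow-logs",  # hypothetical rule name
        "action": "DELETE",
        "target": "objects",
        "timeAmount": 1,
        "timeUnit": "DAYS",
        "isEnabled": True,
    }
    if prefix:
        # Optionally limit the rule to objects under one (hypothetical) prefix.
        rule["objectNameFilter"] = {"inclusionPrefixes": [prefix]}
    return {"items": [rule]}


print(json.dumps(one_day_delete_policy(), indent=2))
```

The resulting `items` list can be applied to the bucket, for example with the OCI CLI's `oci os object-lifecycle-policy put` command; check its `--help` output for the exact flags.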
The Events service is the glue of this event-driven architecture: it generates events when log files are created in Object Storage, based on a predefined set of conditions. The Events service can drive highly automated, zero-ops workflows. In this project, it uses attribute filters to trigger targeted events specifically on flow log creation.
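The rule condition and the event the function receives might look like the following sketch. `com.oraclecloud.objectstorage.createobject` is the Object Storage create-object event type; the bucket name is hypothetical, and the payload paths (`data.resourceName`, `data.additionalDetails.namespace`, `data.additionalDetails.bucketName`) are assumptions to confirm against a sample event from your tenancy. Object Storage emits these events only for buckets that have object events enabled.

```python
import json

# Hypothetical bucket that receives the flow log files.
FLOW_LOG_BUCKET = "vcn-flow-logs"

# Rule condition: match only object creation (event type) in the flow-log bucket (attribute filter).
RULE_CONDITION = {
    "eventType": ["com.oraclecloud.objectstorage.createobject"],
    "data": {"additionalDetails": {"bucketName": [FLOW_LOG_BUCKET]}},
}


def object_location(event_body: bytes) -> tuple:
    """Pull (namespace, bucket, object name) out of a create-object event payload."""
    event = json.loads(event_body)
    data = event["data"]
    details = data["additionalDetails"]
    return details["namespace"], details["bucketName"], data["resourceName"]
```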
Oracle Functions is built on the popular open-source Fn Project, which is backed by Oracle. Oracle Cloud Infrastructure integrates seamlessly with the Fn Project and simplifies deploying code written in popular languages such as Python and Java. In this architecture, the enrich-flow-log function does the following:
- Reads the flow log object that was created
- Extracts the object metadata and uses it to populate the network metadata by querying the OCI API
- Parses the log file into a JSON document
- Publishes the document to Splunk through the Splunk HEC interface
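These steps can be sketched as a single function with its I/O injected as callables, which keeps the control flow testable without OCI or Splunk credentials. This is an illustrative sketch, not the author's published source code: every callable is a hypothetical stand-in that, in a deployed function, would wrap the Object Storage read, the OCI metadata queries, and the HEC POST, with the whole thing called from the Fn Python FDK's `handler(ctx, data)` entry point.

```python
from typing import Callable, Dict, List


def process_flow_log_object(
    read_object: Callable[[], str],
    parse_line: Callable[[str], Dict],
    lookup_context: Callable[[Dict], Dict],
    publish: Callable[[List[Dict]], None],
) -> int:
    """Read, parse, enrich, and publish one flow log object; return the events sent."""
    # Read the flow log object whose creation triggered the function.
    raw_text = read_object()
    events = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Parse the record into a flat dict, then add network metadata for it.
        event = parse_line(line)
        event.update(lookup_context(event))
        events.append(event)
    # Publish the whole batch at once (e.g. one HEC POST); skip if nothing was parsed.
    if events:
        publish(events)
    return len(events)
```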
Ease of Use
- Zero patching
- No need to size
- Deploy-and-forget model
Scalability
- Functions are massively parallel
- The entire architecture is event-driven
Security
- The function is deployed on a private subnet
- Instance principals and dynamic groups enforce least privilege
- No ingress traffic is allowed
- Object Storage is accessed through a Service Gateway only
- Egress to Splunk goes through a NAT Gateway
- Egress is allowed only to specific Splunk ports and URLs
Cost
- Events are based on invocations only
- An Object Storage lifecycle policy retains objects for only 1 day
- Functions are billed only when invoked
Multi-Region Deployment Architecture
The diagram below shows how this setup can be extended to multiple regions, all populating data to the same Splunk instance.
I hope this blog helped you learn more about the ease of developing cloud-native functionality with the serverless, event-driven, fully managed, pay-per-use components that Oracle Cloud Infrastructure offers. You can follow me on Medium at https://medium.com/@vamsiramakrishnan
Published at DZone with permission of Vamsi Ramakrishnan. See the original article here.