Enabling Data Location Awareness for Optimized Performance and Lower Cost With Alluxio Tiered Locality

Data location awareness — learn how Alluxio tiered locality can provide optimized performance and lower costs.

by Andrew Audibert · Mar. 18, 19 · Tutorial

Caching frequently used data in memory is not a new computing technique; however, Alluxio takes the concept a step further by aggregating data from multiple storage systems into a unified pool of memory. Alluxio also intelligently manages the data within that virtual data layer. Tiered locality uses awareness of network topology and configurable policies to manage data placement for performance and cost optimization. This feature is particularly useful for cloud deployments that span multiple availability zones. It can also save cost in environments where cross-zone or cross-location traffic is more expensive than intra-zone traffic.

Here is a simple scenario, with Alluxio workers in two different AWS Availability Zones, where Alluxio uses network topology information to prefer more local reads and writes.

[Figure: an application and Alluxio workers spread across two AWS Availability Zones]

Using this setup with m5.xlarge EC2 instances, the application gets different read performance depending on which worker the data is read from.

[Figure: read performance of the application by worker]

Unsurprisingly, performance is fastest when reading from the local Alluxio worker and slower when reading from a non-local worker. The performance difference between worker 2 and worker 3 is due to the difference in bandwidth within and across Availability Zones (AZs). Worker 2 is in the same AZ as the application, with about 10 gigabits per second of bandwidth. Reading from worker 3 is slower because the bandwidth across AZs is only about 5 gigabits per second.
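
To put those bandwidth figures in perspective, here is a back-of-the-envelope estimate in Python. The 10 GB dataset size is a hypothetical value chosen for illustration, and the calculation assumes the network link is the only bottleneck, which real reads will not quite achieve.

```python
def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Idealized time to move size_gb gigabytes over a bandwidth_gbps link."""
    size_gigabits = size_gb * 8  # 1 byte = 8 bits
    return size_gigabits / bandwidth_gbps


DATASET_GB = 10  # hypothetical dataset size, not taken from the benchmark

same_az = transfer_seconds(DATASET_GB, 10)  # worker 2: about 10 Gbit/s within the AZ
cross_az = transfer_seconds(DATASET_GB, 5)  # worker 3: about 5 Gbit/s across AZs

print(f"same AZ:  {same_az:.1f} s")
print(f"cross AZ: {cross_az:.1f} s")
```

Under these assumptions, halving the available bandwidth doubles the time the application spends waiting on the network for the same data.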

Without tiered locality, the application is just as likely to read from worker 2 as worker 3 if both workers have the data cached. Configuring tiered locality gives a preference for worker 2 and faster performance. The situation is similar for writes. When applications write data through Alluxio, they will prefer to write to more-local workers for improved performance.

Configuration

To enable tiered locality, and the associated performance benefit, every actor (clients and workers) must be configured to know its tiered identity. A tiered identity is a mapping from each locality tier (e.g. az, for Availability Zone) to that actor's value for the tier (e.g. us-east-1a). For the above cluster setup example, the tiered identities would be:

[Table: tiered identity of each actor in the example cluster]
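
As a rough illustration of what a tiered identity looks like as data, the sketch below models each actor as a mapping from tier to value and finds the most local tier that a worker shares with the application. The host names are hypothetical placeholders, and placing worker 1 on the application's host is an assumption based on the description of the "local" worker. This is a simplified model, not Alluxio's implementation.

```python
from typing import Dict, List, Optional

TieredIdentity = Dict[str, str]

# Tiers from most local to least local, as in alluxio.locality.order=node,az
LOCALITY_ORDER: List[str] = ["node", "az"]

# Host names are hypothetical; the az values match the example cluster.
IDENTITIES: Dict[str, TieredIdentity] = {
    "application": {"node": "host-1", "az": "us-east-1a"},
    "worker-1": {"node": "host-1", "az": "us-east-1a"},  # assumed co-located
    "worker-2": {"node": "host-2", "az": "us-east-1a"},
    "worker-3": {"node": "host-3", "az": "us-east-1b"},
    "worker-4": {"node": "host-4", "az": "us-east-1b"},
}


def closest_shared_tier(
    a: TieredIdentity, b: TieredIdentity, order: List[str]
) -> Optional[str]:
    """Return the earliest tier in `order` where both identities agree."""
    for tier in order:
        if tier in a and a[tier] == b.get(tier):
            return tier
    return None  # no tier in common


app = IDENTITIES["application"]
for name in ("worker-1", "worker-2", "worker-3", "worker-4"):
    print(name, closest_shared_tier(app, IDENTITIES[name], LOCALITY_ORDER))
```

In this model, worker 1 matches the application at the node tier, worker 2 at the az tier, and workers 3 and 4 match at no tier.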

Configure With alluxio-site.properties

The simplest way to configure tiered locality is to set it in alluxio-site.properties like any other property. Please refer to the Alluxio configuration settings page for details.

Here are the properties for Application, Worker 1, and Worker 2:

alluxio-site.properties

alluxio.locality.az=us-east-1a
# custom locality hierarchy
alluxio.locality.order=node,az


And here are the properties for Worker 3 and Worker 4:

alluxio-site.properties

alluxio.locality.az=us-east-1b
# custom locality hierarchy
alluxio.locality.order=node,az


We set alluxio.locality.order to introduce the az locality tier and to define its position in the locality hierarchy. By default, the locality tiers are node,rack. Note that we don't need to explicitly configure the node identity because it is determined automatically via localhost lookup.
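
The way these properties combine into an identity can be sketched as follows. This is a simplified model rather than Alluxio's code: the function name resolve_identity is hypothetical, tiers without a configured value are simply skipped, and socket.gethostname() stands in for the localhost lookup.

```python
import socket
from typing import Callable, Dict, List, Tuple

DEFAULT_ORDER = ["node", "rack"]  # default tiers mentioned in the article


def resolve_identity(
    props: Dict[str, str],
    hostname_lookup: Callable[[], str] = socket.gethostname,
) -> List[Tuple[str, str]]:
    """Build an ordered list of (tier, value) pairs from site properties."""
    order_value = props.get("alluxio.locality.order")
    if order_value:
        order = [tier.strip() for tier in order_value.split(",")]
    else:
        order = list(DEFAULT_ORDER)

    identity: List[Tuple[str, str]] = []
    for tier in order:
        value = props.get(f"alluxio.locality.{tier}")
        if value is None and tier == "node":
            value = hostname_lookup()  # node is filled in automatically
        if value is not None:
            identity.append((tier, value))  # unconfigured tiers are skipped
    return identity


site_properties = {
    "alluxio.locality.order": "node,az",
    "alluxio.locality.az": "us-east-1a",
}
print(resolve_identity(site_properties))
```

Passing a different hostname_lookup makes the sketch easy to try out without depending on the machine's real host name.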

Configure With alluxio-locality.sh

When the cluster is set up automatically, or there are many workers, it can be convenient to set the locality information via script instead of using a static value in alluxio-site.properties. If a script exists at ${ALLUXIO_HOME}/conf/alluxio-locality.sh, it will be executed to determine tiered identity.

alluxio-locality.sh

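#!/bin/bash
# Print this instance's Availability Zone, read from the EC2 instance metadata service, as the az tier.
echo "az=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)"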
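
To make the script's output format concrete, here is a small Python sketch of how such output could be turned into a tiered identity. The function name parse_locality_output is hypothetical, and the handling of several comma-separated tier=value pairs is an assumption; the script above only prints a single pair.

```python
from typing import Dict


def parse_locality_output(output: str) -> Dict[str, str]:
    """Parse 'tier=value[,tier=value...]' text into a tier -> value mapping."""
    identity: Dict[str, str] = {}
    for pair in output.strip().split(","):
        pair = pair.strip()
        if not pair:
            continue  # ignore empty output or stray commas
        tier, separator, value = pair.partition("=")
        if not separator or not tier.strip() or not value.strip():
            # e.g. "az=" when the metadata request returned nothing
            raise ValueError(f"expected tier=value, got {pair!r}")
        identity[tier.strip()] = value.strip()
    return identity


print(parse_locality_output("az=us-east-1a\n"))
```

A check like this is a cheap way to notice a script that silently prints an empty value, for example when the metadata endpoint is unreachable.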


Custom Tiers

In this example, we used the tiers node and az, but the tier configuration is fully customizable, so you can use whatever tiers make sense for your deployment, e.g. rack, zone, region, etc. Just make sure each tier is contained within the next one in alluxio.locality.order, since locality decisions prefer to match in the earliest tier possible.
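
One way to sanity-check the containment requirement is to look for a tier value that appears under more than one value of the next tier. The sketch below does that for a hypothetical three-tier hierarchy; the function name, host names, and rack names are all made up for illustration, and nothing here is part of Alluxio itself.

```python
from typing import Dict, List, Set, Tuple

Identity = Dict[str, str]
Violation = Tuple[str, str, str, List[str]]


def find_containment_violations(
    identities: Dict[str, Identity], order: List[str]
) -> List[Violation]:
    """Report inner-tier values that map to several values of the next tier."""
    violations: List[Violation] = []
    for inner, outer in zip(order, order[1:]):
        outers_by_inner: Dict[str, Set[str]] = {}
        for identity in identities.values():
            if inner in identity and outer in identity:
                outers_by_inner.setdefault(identity[inner], set()).add(identity[outer])
        for inner_value, outer_values in sorted(outers_by_inner.items()):
            if len(outer_values) > 1:
                violations.append((inner, inner_value, outer, sorted(outer_values)))
    return violations


ORDER = ["node", "rack", "az"]  # hypothetical custom hierarchy

CONSISTENT = {
    "worker-1": {"node": "host-1", "rack": "rack-1", "az": "us-east-1a"},
    "worker-2": {"node": "host-2", "rack": "rack-1", "az": "us-east-1a"},
    "worker-3": {"node": "host-3", "rack": "rack-2", "az": "us-east-1b"},
}

# Same cluster, but worker-3 claims rack-1 while sitting in another AZ.
INCONSISTENT = dict(CONSISTENT)
INCONSISTENT["worker-3"] = {"node": "host-3", "rack": "rack-1", "az": "us-east-1b"}

print(find_containment_violations(CONSISTENT, ORDER))
print(find_containment_violations(INCONSISTENT, ORDER))
```

In the inconsistent case, a match on the rack tier would treat workers in different Availability Zones as equally local, which is exactly what the ordering rule is meant to prevent.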

Interaction With Location Policies

In addition to tiered locality, Alluxio has a concept of using a location policy during reads and writes to help select the worker to use. The policies interact with tiered locality in different ways.

  • LocalFirstPolicy: Chooses a worker that matches in the most local tier possible.
  • LocalFirstAvoidEvictionPolicy: Like LocalFirstPolicy, but avoiding eviction is given a higher priority than locality.
  • MostAvailableFirstPolicy: Unaffected by locality information.
  • RoundRobinPolicy: Unaffected by locality information.
  • SpecificHostPolicy: Unaffected by locality information.
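
The difference between locality-aware and locality-agnostic policies can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Alluxio's policy classes: the Worker type, its free_bytes field, the function names, and the tie-breaking (first worker in the list wins) are all hypothetical, and "avoiding eviction" is approximated as "has enough free space for the block".

```python
from dataclasses import dataclass
from itertools import cycle, islice
from typing import Dict, Iterator, List


@dataclass
class Worker:
    name: str
    identity: Dict[str, str]
    free_bytes: int


def locality_rank(client: Dict[str, str], worker: Worker, order: List[str]) -> int:
    """Index of the earliest shared tier; len(order) if no tier matches."""
    for index, tier in enumerate(order):
        if tier in client and client[tier] == worker.identity.get(tier):
            return index
    return len(order)


def local_first(client: Dict[str, str], workers: List[Worker], order: List[str]) -> Worker:
    """Pick the worker that matches the client in the most local tier."""
    return min(workers, key=lambda w: locality_rank(client, w, order))


def local_first_avoid_eviction(
    client: Dict[str, str], workers: List[Worker], order: List[str], block_bytes: int
) -> Worker:
    """Prefer workers with room for the block, then the most local among them."""
    roomy = [w for w in workers if w.free_bytes >= block_bytes]
    return local_first(client, roomy or workers, order)


def round_robin(workers: List[Worker]) -> Iterator[Worker]:
    """Cycle through workers in order, ignoring locality entirely."""
    return cycle(workers)


ORDER = ["node", "az"]
CLIENT = {"node": "host-1", "az": "us-east-1a"}
WORKERS = [
    Worker("worker-1", {"node": "host-1", "az": "us-east-1a"}, free_bytes=0),
    Worker("worker-2", {"node": "host-2", "az": "us-east-1a"}, free_bytes=2**30),
    Worker("worker-3", {"node": "host-3", "az": "us-east-1b"}, free_bytes=2**30),
]

print(local_first(CLIENT, WORKERS, ORDER).name)
print(local_first_avoid_eviction(CLIENT, WORKERS, ORDER, 128 * 2**20).name)
print([w.name for w in islice(round_robin(WORKERS), 4)])
```

Here the local-first choice is the co-located worker even though it is full, the eviction-avoiding variant falls back to the same-AZ worker with free space, and round robin visits every worker regardless of where it sits.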

Conclusion

When clusters have non-uniform networking capabilities, it makes sense to configure Alluxio tiered locality to improve performance. It can also save on cost in environments where cross-zone network transfer is more expensive than intra-zone data transfer.
