A recent article by Bernard Golden
argues that network constraints (bandwidth and latency), together with the
associated bandwidth usage cost, continue to be one of the main obstacles
in cloud computing.
There are two concerns here. One is failing to
meet the application's performance goals (throughput and response
time). The other is the cost of running in the cloud (receiving a
large "phone bill" from your cloud provider).
The goal is to reduce the total amount of data transfer. A number of cloud app design patterns can be used ...
How do you bring the code and the data together before processing can start?
Try to be as stateless as possible
There is no data to transfer if your component is
stateless by nature. The following techniques assume that some
unavoidable stateful components are involved.
Move your data creation process into the cloud first
Instead of uploading a huge volume of data from your data center into the cloud
before processing can start, can you move the data creation process
into the cloud? Of course, you need to carefully evaluate the security implications.
Distribute the architecture of your data creation
If the subsequent processing is based on a parallel execution
architecture, why not distribute the data creation process as well?
This saves a data repartition step.
Move the code to the data
Code usually has a much smaller footprint than the data it processes.
It is therefore more economical to move the processing logic to the data
than to download the data for processing. Of course, we need to
make sure the machine hosting the data has enough CPU power to
execute the processing logic.
Do as much as possible along current partition
A typical parallel processing architecture partitions data along some
dimensions, conducts the processing in parallel, then repartitions the
data along other dimensions, conducts the next stage of processing, and
so on.
See if you can rearrange the order of processing so
that you do as much as possible within the current partition. The
goal is to minimize the number of repartitions, since each one requires
a lot of data transfer.
Minimize data redistribution at grow/shrink
How do you redistribute data to a newly joined VM so that the overall data
transfer is minimized? For example, a "consistent hashing" algorithm
can be used so that data redistribution only happens between the newly
joined VM and its neighbors rather than across every existing VM.
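The consistent hashing idea can be sketched as follows. This is a minimal illustration rather than a production implementation: the `HashRing` class, the VM names, the record keys, and the choice of 64 virtual points per VM are all assumptions made for the example. With virtual points, the new VM takes a slice from each of its ring neighbors, but no key ever moves between two existing VMs.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring: a key belongs to the next point clockwise."""

    def __init__(self, replicas: int = 64):
        self.replicas = replicas   # virtual points per VM (assumed value)
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> VM name

    def add(self, vm: str) -> None:
        # Place several virtual points for the VM to even out the load.
        for i in range(self.replicas):
            point = _hash(f"{vm}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = vm

    def lookup(self, key: str) -> str:
        # First ring point after the key's hash, wrapping around at the end.
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing()
for vm in ("vm-a", "vm-b", "vm-c"):
    ring.add(vm)

keys = [f"record-{n}" for n in range(10_000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("vm-d")                   # a new VM joins the cluster
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to vm-d")
```

Only the keys that now fall on the new VM's arcs are transferred; with a plain `hash(key) % number_of_vms` scheme, most keys would change owner when the VM count changes.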
Conduct data redistribution in the background
Data redistribution may affect performance, but it must not affect accuracy.
In other words, newly joined VMs should be able to serve requests immediately
while data redistribution proceeds in the background. The data
redistribution algorithm (which may take a long time to finish) also
needs to adapt to VMs that keep joining. Seen this way, data
redistribution is simply an ongoing performance improvement process in
a highly dynamic workload environment.
Place component with bandwidth cost in mind
Other than the amount of data being transferred (which should be minimized
anyway), it is equally important to look at bandwidth cost. Typically
the cloud provider charges a substantial amount for bandwidth usage
across the cloud boundary. It is therefore important to place the
components so that if data transfer does need to occur, it occurs
within the cloud rather than across the cloud boundary. This requires a
careful analysis of the communication pattern among application
components, and grouping frequently communicating components so they are
deployed within the same cloud.
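As a rough sketch of that analysis, the snippet below totals the traffic that crosses the cloud boundary for two candidate placements. The component names, the monthly traffic figures, and the `cross_boundary_gb` helper are hypothetical values chosen for illustration, not measurements.

```python
from typing import Dict, Tuple

# Hypothetical monthly traffic between component pairs, in GB.
traffic_gb: Dict[Tuple[str, str], float] = {
    ("web", "app"): 500,
    ("app", "db"): 2000,
    ("app", "reports"): 50,
}


def cross_boundary_gb(placement: Dict[str, str],
                      traffic: Dict[Tuple[str, str], float]) -> float:
    """Sum the traffic between components placed on different sides of the boundary."""
    return sum(gb for (a, b), gb in traffic.items()
               if placement[a] != placement[b])


# Placement 1: the database stays in the data center.
split_db = {"web": "cloud", "app": "cloud", "db": "onprem", "reports": "onprem"}
# Placement 2: the chatty app/db pair is kept together in the cloud.
grouped = {"web": "cloud", "app": "cloud", "db": "cloud", "reports": "onprem"}

print(cross_boundary_gb(split_db, traffic_gb))  # 2050
print(cross_boundary_gb(grouped, traffic_gb))   # 50
```

Multiplying each total by the provider's per-GB rate for boundary traffic gives a cost figure to compare placements with.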
Migrate data as communication pattern changes
The communication pattern may change after the system is deployed. It is important to
continuously monitor the actual communication patterns and determine whether a
migration is needed to minimize the bandwidth cost. When deciding, weigh
the gain against the cost of migration. Gain is estimated by
multiplying the communication frequency by the time that the new
communication pattern is expected to persist. Cost is estimated by the
total amount of data redistribution traffic caused by component
migration. Migration should take place only when its cost is smaller
than the gain.
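The gain-versus-cost rule can be written down as a small check. This is a sketch under one added assumption: to compare gain and cost in the same unit (bytes), the communication frequency is multiplied by an average message size, which the text above does not specify. The function name and all numbers are hypothetical.

```python
def should_migrate(msgs_per_hour: float,
                   avg_msg_bytes: float,
                   expected_hours: float,
                   migration_bytes: float) -> bool:
    """Return True when the one-off migration traffic is smaller than the traffic saved.

    gain = communication frequency x expected duration of the new pattern
           (scaled by an assumed average message size to get bytes)
    cost = data redistribution traffic caused by the migration
    """
    gain_bytes = msgs_per_hour * avg_msg_bytes * expected_hours
    return migration_bytes < gain_bytes


# Chatty pair expected to stay chatty for a day: 48 MB saved vs 10 MB moved.
print(should_migrate(1_000, 2_000, 24, 10_000_000))  # True
# Occasional traffic: 0.48 MB saved vs 10 MB moved.
print(should_migrate(10, 2_000, 24, 10_000_000))     # False
```

The hardest input to obtain in practice is `expected_hours`, since it is a prediction about how long the observed pattern will last.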
Use a local cache to reduce the need for remote data access, especially if the data is relatively static.
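A minimal sketch of such a cache is shown below, assuming a read-through design with a fixed time-to-live. The `TTLCache` class, the `fetch_from_remote` function, and the 300-second lifetime are illustrative assumptions; relatively static data can tolerate a long lifetime, which is where the bandwidth saving comes from.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical read-through cache: serve local copies until they expire."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # function that performs the remote read
        self._ttl = ttl_seconds      # how long a local copy is trusted
        self._clock = clock          # injectable so the cache can be tested
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]          # fresh local copy, no network transfer
        value = self._fetch(key)     # missing or stale: go to the remote store
        self._entries[key] = (value, now)
        return value


remote_calls = []


def fetch_from_remote(key):
    """Stand-in for a remote read; records each call for the demo."""
    remote_calls.append(key)
    return f"value-of-{key}"


cache = TTLCache(fetch_from_remote, ttl_seconds=300)
cache.get("country-codes")
cache.get("country-codes")           # served locally
print(len(remote_calls))             # 1
```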
Allow direct access to data
This is against the philosophy of SOA, where internal state should be
encapsulated behind an API. In that model, when a client wants
to extract data, it first needs to make a request to the owning
application, which then makes a request to the DB, gets the data, encodes
it into the web service response, and passes the result back to the
client. If network bandwidth is costly, it is much more efficient
for the client to have direct access to the DB.
Expose latency information to the application
Provide a latency map so that applications can dynamically adjust which partners they communicate with.
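One way an application might consume such a map is sketched below. The map format (partner name to measured latency in milliseconds), the replica names, and the `pick_partner` helper are assumptions for illustration; how the platform would actually expose latency information is not specified here.

```python
from typing import Dict, Iterable


def pick_partner(latency_ms: Dict[str, float], candidates: Iterable[str]) -> str:
    """Choose the candidate with the lowest currently reported latency."""
    measured = [c for c in candidates if c in latency_ms]
    if not measured:
        raise ValueError("no candidate has a known latency")
    return min(measured, key=lambda c: latency_ms[c])


# Hypothetical latency map, refreshed periodically by the platform.
latency_ms = {"replica-east": 12.0, "replica-west": 85.0, "replica-eu": 140.0}
replicas = ["replica-east", "replica-west", "replica-eu"]

print(pick_partner(latency_ms, replicas))   # replica-east

latency_ms["replica-east"] = 200.0          # conditions change
print(pick_partner(latency_ms, replicas))   # replica-west
```

Because the choice is recomputed from the current map on every call, the application follows latency changes without being redeployed.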