Transferring Large Amounts of Data Over the Network: SCP, TAR over SSH, and TAR over NC Compared
Scp is slow, that's a known fact. Known, and so annoying that someone tried to fix it by producing the hpn-ssh patch:
SCP and the underlying SSH2 protocol implementation in OpenSSH is network performance limited by statically defined internal flow control buffers. These buffers often end up acting as a bottleneck for network throughput of SCP, especially on long and high bandwidth network links.
Nonetheless, especially for small transfers, scp is straightforward, and so that's what I use. But transferring 100 GB of data between two machines on the same LAN proved to be such a pain that I decided to opt for one of the alternatives, the two most common being tar over ssh and tar over netcat. The whole thing got me curious, so I decided to do some testing/benchmarking.
This is no scientific test. There was background noise, the OSes of the two boxes were different, and more. But it's good enough as a real-life test between two boxes on the same LAN.
Test bed
Two boxes, referred to as hostA and hostB from now on, with the same specs:
vendor_id : AuthenticAMD
model name : AMD Sempron(tm) Processor 2800+
cpu MHz : 1600.010
MemTotal : 2009992 kB
SATA disks: Timing cached reads: 1243.04 MB/sec
Timing buffered disk reads: 57.97 MB/sec
Network : VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
Switch : Netgear 10/100 Mb/s
Boxes were connected via a 10/100 Mb/s switch, living on the same LAN/subnet. Given the above setup it's safe to assume that the network is the bottleneck, with its theoretical peak transfer rate of roughly 12 MB/s (100 Mb/s ÷ 8 bits per byte = 12.5 MB/s, before protocol overhead).
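As an aside, the figures above are the sort of output you get from /proc and hdparm; a minimal sketch for collecting them on a Linux box (the device name /dev/sda is an assumption and may differ on your machine):

# CPU vendor, model and clock speed
grep -E 'vendor_id|model name|cpu MHz' /proc/cpuinfo
# Total memory
grep MemTotal /proc/meminfo
# Cached vs. buffered disk read timings (needs root)
hdparm -tT /dev/sda
# Network adapter
lspci | grep -i ethernet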
Test cases and data set
I've created two directories, one containing 2000 100 KB files, and the other 200 10 MB files. All files have been created using dd if=/dev/urandom of=file (a generation sketch follows the command list below). These are the commands I've compared:
hostA: scp -r dir user@hostB:/tmp/
hostA: tar cf - dir | ssh user@hostB tar xf - -C /tmp/
on hostB (started first): nc -l -p 6969 | tar xf - -C /tmp/
hostA: tar cf - dir | nc -w1 hostB 6969
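For completeness, here is a rough sketch of how a data set like this can be generated and how each transfer can be timed; the directory names are illustrative, not necessarily the exact ones used in the tests:

# small file set: 2000 files of 100 KB each
mkdir -p small && for i in $(seq 1 2000); do
    dd if=/dev/urandom of=small/file$i bs=100K count=1 2>/dev/null
done
# large file set: 200 files of 10 MB each
mkdir -p large && for i in $(seq 1 200); do
    dd if=/dev/urandom of=large/file$i bs=1M count=10 2>/dev/null
done
# wrap any of the transfer commands in time(1) to get figures like the ones below, e.g.:
time sh -c 'tar cf - small | ssh user@hostB tar xf - -C /tmp/'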
I've also run a set of tests using ssh compression and tar gzip compression (see the sketch below for what those variants look like). Note that bzip2 compression is too CPU-expensive to be generally worth it.
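The compressed variants aren't spelled out above; presumably they look something like this, with -C enabling ssh-level compression and tar's z flag handling gzip:

# ssh-level compression (-C)
hostA: scp -rC dir user@hostB:/tmp/
hostA: tar cf - dir | ssh -C user@hostB tar xf - -C /tmp/
# gzip compression done by tar (z flag), over ssh and over nc
hostA: tar czf - dir | ssh user@hostB tar xzf - -C /tmp/
on hostB (started first): nc -l -p 6969 | tar xzf - -C /tmp/
hostA: tar czf - dir | nc -w1 hostB 6969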
Results
Command      | Compression    | Fileset | Time
scp          | No             | Small   | 0:01:53
scp          | No             | Large   | 0:10:10
scp          | Yes (ssh)      | Small   | 0:02:46
scp          | Yes (ssh)      | Large   | 0:14:11
tar over ssh | No             | Small   | 0:00:24
tar over ssh | No             | Large   | 0:03:18
tar over ssh | Yes (ssh)      | Small   | 0:01:09
tar over ssh | Yes (ssh)      | Large   | 0:11:33
tar over ssh | Yes (tar gzip) | Small   | 0:00:18
tar over ssh | Yes (tar gzip) | Large   | 0:01:57
tar over nc  | No             | Small   | 0:00:21
tar over nc  | No             | Large   | 0:03:24
tar over nc  | Yes (tar gzip) | Small   | 0:00:20
tar over nc  | Yes (tar gzip) | Large   | 0:01:16
This is a summary with totals for the entire data set transfer, times in seconds:
Command      | Compression    | Time (s)
scp          | No             | 723
scp          | Yes (ssh)      | 1017
tar over ssh | No             | 222
tar over ssh | Yes (ssh)      | 762
tar over ssh | Yes (tar gzip) | 135
tar over nc  | No             | 225
tar over nc  | Yes (tar gzip) | 96
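These totals are just the small and large fileset times from the first table converted to seconds and added together; for plain scp, for example, 0:01:53 is 113 s and 0:10:10 is 610 s, giving 723 s.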
Conclusions
Scp is by far the slowest transfer method, 623% slower than the fastest case scenario. Contrary to the common conception that it's ssh's encryption layer that slows down the transfer, it is really scp itself being slow, as tar over ssh performs as well as tar over nc. The other two things to consider are the disastrous impact of ssh's traffic compression (-C), which surprisingly slows down the transfer by roughly 42% in the scp case and by as much as 270% in the tar over ssh test, and tar's gzip compression, which results in transfers being 87% faster over ssh and 134.38% faster over nc.
Published at DZone with permission of Spike Morelli, DZone MVB.