Case Study: 10x File Copy Performance with Robocopy


Source data:

  • ~500,000 folders (court cases)
  • ~2.5-3 million documents
  • Source drives are replicated x2 with RAID
  • Copying to a NAS over gigabit Ethernet
  • The initial untuned copy was on track to take ~2 weeks (and that was after switching to Robocopy; before that, it was painful just to do an ls)
  • Final copy took ~24 hours
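
To put those figures in perspective, here is a rough back-of-the-envelope calculation of the average per-file copy rate before and after tuning. The document count of 2.75 million is an assumption (the midpoint of the ~2.5-3 million estimate above), and "~2 weeks" is taken as exactly 14 days.

```python
# Rough throughput estimate from the figures in the list above.
# DOCUMENTS is an assumed midpoint of the ~2.5-3 million range.
DOCUMENTS = 2_750_000


def files_per_second(total_files: int, hours: float) -> float:
    """Average number of files copied per second over the whole run."""
    return total_files / (hours * 3600)


untuned = files_per_second(DOCUMENTS, hours=14 * 24)  # projected ~2 weeks
tuned = files_per_second(DOCUMENTS, hours=24)         # final ~24 hours

print(f"untuned: {untuned:.1f} files/s")
print(f"tuned:   {tuned:.1f} files/s")
print(f"speedup: {tuned / untuned:.0f}x")
```

Under those assumptions the untuned copy averages only a couple of files per second, which suggests per-file overhead rather than raw bandwidth was the limiting factor.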


  • Initially I saw 20-40 Kbps of traffic in DD-WRT, which is clearly too low. After some changes it is still generally low, but with spikes up to 650 Kbps.
  • CPU use – 4 of 8 cores in use, even with more than 8 threads assigned to Robocopy
  • In Computer Management -> Performance monitoring, the disk being copied is reading as fast as it can (pegged at 100 all the time)
  • The counter called “Split IO / second” is very high much of the time. Research indicates this could be improved by defragmenting (though that might take me months to complete).

Filesystem Lessons:

  • NTFS can hold a folder with large numbers of files but takes forever to enumerate
  • When you enumerate a directory in NTFS (e.g. by opening it in Windows Explorer), Windows appears to lock the folder(!), which pauses any copy/ls operations running against it
  • The copy does not appear to be CPU bound – even with Robocopy set to use many threads, only 4 of 8 cores are in use, at 5-15% each.
  • ext4 (destination system) supports only about 64,000 subdirectories per folder unless the dir_nlink feature is enabled; any more and you get an error.
  • I split all 500k items into 256*256 buckets, effectively at random (for instance, one might open \36\0f and see a half dozen items). The bucket is chosen by taking the md5 of the folder name – basically this uses the filesystem as a tree map.
  • One nice consequence of this is that you can estimate how far along the process is by looking at how many top-level folders have been copied (85/256 -> 33%, etc)
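
The md5 bucketing and the progress estimate can be sketched as follows. This is a minimal illustration, not the original script: it assumes the two directory levels come from the first two bytes of the hex digest and that folder names are hashed as UTF-8, neither of which the notes above spell out. The folder name in the usage lines is hypothetical.

```python
import hashlib
from pathlib import PureWindowsPath


def bucket_path(folder_name: str) -> PureWindowsPath:
    """Map a folder name to a two-level bucket such as 36\\0f\\<name>.

    Assumption: the levels are the first two bytes of the md5 hex digest,
    giving 256 * 256 = 65,536 possible buckets.
    """
    digest = hashlib.md5(folder_name.encode("utf-8")).hexdigest()
    return PureWindowsPath(digest[0:2], digest[2:4], folder_name)


def progress(top_level_done: int, top_level_total: int = 256) -> float:
    """Estimate completion from the number of top-level buckets copied.

    This works because md5 spreads the folders roughly evenly.
    """
    return top_level_done / top_level_total


# Hypothetical folder name, used only for illustration.
print(bucket_path("case-0001"))
print(f"{progress(85):.0%}")  # the 85/256 example from the list above
```

With ~500,000 folders spread over 65,536 buckets, each bucket holds about 7-8 items on average, which matches the "half dozen" seen in practice and stays far below any per-directory limit.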

Robocopy Options:

  • Robocopy lets you move logging off the console and into a file with /LOG:output.txt
  • Robocopy lets you set the number of threads it uses with /MT:n. The default is 8. It seemed to run faster with more than 8, but only the first few threads made any difference.
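
As a sketch of how these options fit together, the helper below assembles a Robocopy command line from Python. The paths, the thread count of 16, and the choice to add /E and /NP are my assumptions for illustration; only /LOG and the thread setting come from the notes above.

```python
def robocopy_command(source: str, destination: str,
                     threads: int = 16,
                     log_file: str = "output.txt") -> list[str]:
    """Build (but do not run) a Robocopy argument list."""
    if not 1 <= threads <= 128:
        raise ValueError("Robocopy accepts /MT values from 1 to 128")
    return [
        "robocopy", source, destination,
        "/E",                # assumption: copy subdirectories, including empty ones
        f"/MT:{threads}",    # multithreaded copy (Robocopy's default is 8)
        f"/LOG:{log_file}",  # send status output to a file instead of the console
        "/NP",               # assumption: skip the per-file progress percentage
    ]


# Hypothetical source and destination paths.
cmd = robocopy_command(r"D:\cases", r"\\nas\cases")
print(" ".join(cmd))

# To actually run it on Windows:
#   import subprocess
#   result = subprocess.run(cmd)
# Robocopy exit codes below 8 indicate success, so avoid check=True.
```

Building the argument list separately from running it makes it easy to try different thread counts and compare the resulting logs.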

To investigate:

  • Ways of using virtual filesystems – it’d be nice to continue using wget to download, but split up large folders into batches for scraping.
  • One possibility is to use wget through VirtualBox, since there are more Linux-based virtual filesystems – not sure about the performance overhead


Published at DZone with permission of

