Over a million developers have joined DZone.

Which HDD to Use for a Content Application – Large and Small Files: Part 5

File processing are common content applications tasks, some being small, others large or mixed, as well as reads and writes.

· Performance Zone

Evolve your approach to Application Performance Monitoring by adopting five best practices that are outlined and explored in this e-book, brought to you in partnership with BMC.

This is the fifth in a multi-part series (read part four here) based on a white paper hands-on lab report I did—compliments of Servers Direct and Seagate—that you can read in PDF form here. The focus is looking at the Servers Direct (www.serversdirect.com) converged Content Solution platforms with Seagate Enterprise Hard Disk Drive (HDD‘s). In this post, the focus looks at large and small file I/O processing.

File Performance Activity

Tip, Content solutions use files in various ways. Use the following to gain perspective how various HDD‘s handle workloads similar to your specific needs.

Two separate file processing workloads were run (12), one with a relative small number of large files, and another with a large number of small files. For the large file processing (table-3), 5 GByte sized files were created and then accessed via 128 Kbyte (128KB) sized I/O over a 10 hour period with 90% read using 64 threads (workers). Large file workload simulates what might be seen with higher definition video, image or other content streaming.

(Note 12) File processing workloads were run using Vdbench 5.04 and file anchors with sample script configuration below. Instead of vdbench you could also use other tools such as sysbench or fio among others.

VdbenchFSBigTest.txt
 # Sample script for big files testing
 fsd=fsd1,anchor=H:,depth=1,width=5,files=20,size=5G
 fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=128k,fileselect=random,fileio=random,threads=64
 rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30

vdbench -f VdbenchFSBigTest.txt -m 16 -o Results_FSbig_H_060615

VdbenchFSSmallTest.txt
 # Sample script for big files testing
 fsd=fsd1,anchor=H:,depth=1,width=64,files=25600,size=16k
 fwd=fwd1,fsd=fsd1,rdpct=90,xfersize=1k,fileselect=random,fileio=random,threads=64
 rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=10h,interval=30

vdbench -f VdbenchFSSmallTest.txt -m 16 -o Results_FSsmall_H_060615

The 10% writes are intended to reflect some update activity for new content or other changes to content. Note that 128KB per second translates to roughly 1 Gbps streaming content such as higher definition video. However 4K video (not optimized) would require a higher speed as well as resulting in larger file sizes. Table-3 shows the performance during the large file access period showing average read /write rates and response time, bandwidth (MBps), average open and close rates with response time.


Avg. File Read Rate

Avg. Read Resp. Time
 Sec.

Avg. File Write Rate

Avg. Write Resp. Time
 Sec.

Avg.
 CPU %
 Total

Avg. CPU % System

Avg. MBps
 Read

Avg. MBps
 Write

ENT 15K R1

580.7

107.9

64.5

19.7

52.2

35.5

72.6

8.1

ENT 10K R1

455.4

135.5

50.6

44.6

34.0

22.7

56.9

6.3

ENT CAP R1

285.5

221.9

31.8

19.0

43.9

28.3

37.7

4.0

ENT 10K R10

690.9

87.21

76.8

48.6

35.0

21.8

86.4

9.6

Table-3 Performance summary for large file access operations (90% read)

Table-3 shows that for two-drive RAID 1, the Enterprise 15K are the fastest performance, however using a RAID 10 with four 10K HDD’s with enhanced cache features provide a good price, performance and space capacity option. Software RAID was used in this workload test.

Figure-4 shows the relative performance of various HDD options handling large files, keep in mind that for the response line lower is better, while for the activity rate higher is better.

large file processing
 Figure-4 Large file processing 90% read, 10% write rate and response time

In figure-4, you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with number of file read operations per second (reads solid blue column bar, writes green column bar). Reminder that lower response time, and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four x 10K HDD’s).

Results in figure-4 above and table-4 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-4 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.

Table-4 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example RAID 10 using four 10K drives provides good performance and protection along with large usable space, however, that also comes at a budget cost (e.g. price).


Avg.
 File Reads Per Sec. (RPS)

Single Drive Cost per RPS

Multi-Drive Cost per RPS

Single Drive Cost / Per GB Capacity

Cost / Per GB Usable (Protected) Cap.

Drive Cost (Multiple Drives)

Protection Overhead (Space Capacity for RAID)

Cost per usable GB per RPS

Avg. File Read Resp. (Sec.)

ENT 15K R1

580.7

$1.02

$2.05

$ 0.99

$0.99

$1,190

100%

$2.1

107.9

ENT 10K R1

455.5

1.92

3.84

0.49

0.49

1,750

100%

3.8

135.5

ENT CAP R1

285.5

1.40

2.80

0.20

0.20

798

100%

2.8

271.9

ENT 10K R10

690.9

1.27

5.07

0.49

0.97

3,500

100%

5.1

87.2

Table-4 Performance, capacity and cost analysis for big file processing

Small File Size Processing

To simulate a general file sharing environment, or content streaming with many smaller objects, 1,638,464 16KB sized files were created on each device being tested (table-5). These files were spread across 64 directories (25,600 files each) and accessed via 64 threads (workers) doing 90% reads with a 1KB I/O size over a ten hour time frame. Like the large file test, and database activity, all workloads were run at the same time (e.g. test devices were concurrently busy).


Avg. File Read Rate

Avg. Read Resp. Time
 Sec.

Avg. File Write Rate

Avg. Write Resp. Time
 Sec.

Avg.
 CPU %
 Total

Avg. CPU % System

Avg. MBps
 Read

Avg. MBps
 Write

ENT 15K R1

3,415.7

1.5

379.4

132.2

24.9

19.5

3.3

0.4

ENT 10K R1

2,203.4

2.9

244.7

172.8

24.7

19.3

2.2

0.2

ENT CAP R1

1,063.1

12.7

118.1

303.3

24.6

19.2

1.1

0.1

ENT 10K R10

4,590.5

0.7

509.9

101.7

27.7

22.1

4.5

0.5

Table-5 Performance summary for small-sized (16KB) file access operations (90% read)

Figure-5 shows the relative performance of various HDD options handling large files, keep in mind that for the response line lower is better, while for the activity rate higher is better.

small file processing
 Figure-5 Small file processing 90% read, 10% write rate and response time

In figure-5, you can see the performance in terms of response time (reads larger dashed line, writes smaller dotted line) along with number of file read operations per second (reads solid blue column bar, writes green column bar). Reminder that lower response time, and higher activity rates are better. Performance declines moving from left to right, from 15K to 10K Enterprise Performance with enhanced cache feature to Enterprise Capacity (7.2K RPM), all of which were hardware RAID 1. Also shown is a hardware RAID 10 (four x 10K RPM HDD’s) that has higher performance and capacity along with costs (table-5).

Results in figure-5 above and table-5 below show how various drives can be configured to balance their performance, capacity and costs to meet different needs. Table-6 below shows an analysis looking at average file reads per second (RPS) performance vs. HDD costs, usable capacity and protection level.

Table-6 is an example of looking at multiple metrics to make informed decisions as to which HDD would be best suited to your specific needs. For example RAID 10 using four 10K drives provides good performance and protection along with large usable space, however, that also comes at a budget cost (e.g. price).


Avg.
 File Reads Per Sec. (RPS)

Single Drive Cost per RPS

Multi-Drive Cost per RPS

Single Drive Cost / Per GB Capacity

Cost / Per GB Usable (Protected) Cap.

Drive Cost (Multiple Drives)

Protection Overhead (Space Capacity for RAID)

Cost per usable GB per RPS

Avg. File Read Resp. (Sec.)

ENT 15K R1

3,415.7

$0.17

$0.35

$0.99

$0.99

$1,190

100%

$0.35

1.51

ENT 10K R1

2,203.4

0.40

0.79

0.49

0.49

1,750

100%

0.79

2.90

ENT CAP R1

1,063.1

0.38

0.75

0.20

0.20

798

100%

0.75

12.70

ENT 10K R10

4,590.5

0.19

0.76

0.49

0.97

3,500

100%

0.76

0.70

Table-6 Performance, capacity, and cost analysis for small file processing

Looking at the small file processing analysis in table-5 shows that the 15K HDD’s on an apples to apples basis (e.g. same RAID level and number of drives) provide the best performance. However when also factoring in space capacity, performance, different RAID level or other protection schemes along with cost, there are other considerations. On the other hand the Enterprise Capacity 2TB HDD’s have a low cost per capacity, however do not have the performance of other options, assuming your applications need more performance.

Thus the right HDD for one application may not be the best one for a different scenario as well as multiple metrics as shown in table-5 need to be included in an informed storage decision making process.

What This All Means

Even if your content environment is using object storage, chances are unless it is a new application or a gateway exists, you may be using NAS or file based access. Thus the importance of if your applications are doing file-based processing, either run your own applications or use tools that can simulate as close as possible to what your environment is doing.

Continue reading part six in this multi-part series here where the focus is around general I/O including 8KB and 128KB sized IOPs along with associated metrics.

Learn tips and best practices for optimizing your capacity management strategy with the Market Guide for Capacity Management, brought to you in partnership with BMC.

Topics:
performance ,response time ,processing ,hdd ,handling ,time

Published at DZone with permission of Greg Schulz, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}