XCOPY Chunk Sizes – Revisited (and data reduction as well)

As we get very close to the XtremIO X2 GA, I wanted to compare two important metrics, XCOPY performance and data reduction ratio (DRR), between the X1 and X2 platforms.

Let's start with DRR, which is straightforward to compare. I took a Windows 10 VM and cloned it to an X1 array.

On X1, that VM consumed 8.54 GB of physical capacity and 11.89 GB of logical capacity, for a total DRR of 1.4:1.

I then cloned it to X2. On X2, that VM consumed 6.99 GB of physical capacity and 11.97 GB of logical capacity, for a total DRR of 1.7:1. That's better data reduction efficiency right there!
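If you want to sanity-check those ratios yourself, here is a minimal sketch. It assumes DRR is simply logical capacity divided by physical capacity; the function name is my own for illustration and is not an XtremIO API.

```python
def data_reduction_ratio(logical_gb: float, physical_gb: float) -> float:
    """Return logical capacity divided by physical capacity.

    Assumption: DRR = logical / physical, using the capacities quoted above.
    """
    if physical_gb <= 0:
        raise ValueError("physical capacity must be positive")
    return logical_gb / physical_gb


# Capacities reported for the same Windows 10 VM clone on each array.
x1_drr = data_reduction_ratio(logical_gb=11.89, physical_gb=8.54)
x2_drr = data_reduction_ratio(logical_gb=11.97, physical_gb=6.99)

print(f"X1 DRR: {x1_drr:.1f}:1")
print(f"X2 DRR: {x2_drr:.1f}:1")
```

Rounded to one decimal place, the results line up with the 1.4:1 and 1.7:1 figures above.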

Now, let’s compare XCOPY speed and potentially look to optimize it even further.

By default, the XCOPY chunk size is 4 MB. In the past we recommended changing it to 256 KB for X1, as that turned out to be the sweet spot between performance, time and latency. See a blog post I wrote here.

So on X1, let's change the XCOPY chunk size to 256 KB.
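As a rough sketch of what that change looks like from the ESXi side: this assumes the chunk size in question is the host's /DataMover/MaxHWTransferSize advanced setting (value in KB), set through esxcli. The helper below is hypothetical and only builds the command line so you can review it before running it on a host.

```python
import shlex
from typing import List

# Assumption: the XCOPY chunk size maps to this ESXi advanced setting (in KB).
XCOPY_OPTION = "/DataMover/MaxHWTransferSize"


def build_set_chunk_size_command(chunk_kb: int) -> List[str]:
    """Build (but do not run) the esxcli command that sets the chunk size."""
    if chunk_kb <= 0:
        raise ValueError("chunk size must be a positive number of KB")
    return [
        "esxcli", "system", "settings", "advanced", "set",
        "--int-value", str(chunk_kb),
        "--option", XCOPY_OPTION,
    ]


cmd_256 = build_set_chunk_size_command(256)    # the 256 KB setting
cmd_4mb = build_set_chunk_size_command(4096)   # the 4 MB default
print(shlex.join(cmd_256))
print(shlex.join(cmd_4mb))
```

The same helper covers both values used in this post, 256 for the 256 KB runs and 4096 for the 4 MB run.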

Then run the XCOPY operation with the 256 KB chunk size. The operation started at 8:55:05 and took 180 seconds.

Because we used a 256 KB chunk size for the operation, we can see that the “blocks” reporting highlights this as larger than 1 MB BW.

Latency peaked at roughly 0.1 ms.

And the array (single X1) CPU utilization peaked at 81% during the operation.

Now, on X2, let's set the XCOPY chunk size to 256 KB (or confirm it is already set).

Then re-run the XCOPY operation with the 256 KB chunk size. The operation started at 7:58:30 and concluded at 8:00:20, that's 80 seconds.

Because we used a 256 KB chunk size for the operation, we can see that the “blocks” reporting highlights this as larger than 1 MB BW.

Latency peaked at roughly 0.023 ms.

And the array (single X2-S) CPU utilization peaked at 70% during the operation.

Lastly, still on X2, let's change the XCOPY chunk size to 4 MB.

Then re-run the XCOPY operation with the 4 MB chunk size. The operation started at 7:29:05 and concluded at 7:30:20, which is 75 seconds.
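The durations in this post are just the difference between the start and end timestamps. Here is a small sketch of that arithmetic, assuming both timestamps fall on the same day; the helper name is hypothetical.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> int:
    """Return whole seconds between two H:MM:SS wall-clock timestamps.

    Assumption: both timestamps are on the same day (no midnight rollover).
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Start and end times of the 4 MB run on X2, as quoted above.
run_4mb_seconds = elapsed_seconds("7:29:05", "7:30:20")
print(run_4mb_seconds)
```

For the 4 MB run this gives the 75 seconds quoted above.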

Because we used a 4 MB chunk size for the operation, we can see that the “blocks” reporting highlights this as larger than 1 MB BW.

Latency peaked at roughly 0.5 ms.

And the array (single X2-S) CPU utilization peaked at 55% during the operation.

You can see a video I recorded showing it all here.

So, to conclude, X2 cloning was 2.25 times faster than X1 (80 seconds versus 180 seconds at the same 256 KB chunk size)! And by using the 4 MB XCOPY chunk size on X2, you could save an extra 5 seconds on that 100-VM cloning job. Obviously, the more VMs you clone, the larger the time gap will be!

XtremIO model and XCOPY chunk size | Time | Peak latency
X1, 256 KB | 180 seconds | 0.1 ms
X2, 256 KB | 80 seconds | 0.023 ms
X2, 4096 KB (4 MB) | 75 seconds | 0.46 ms
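For anyone who wants to redo the summary math, here is a minimal sketch using only the run times from the table; the dictionary keys are labels I made up for readability.

```python
# XCOPY run times in seconds, taken from the summary table.
run_times = {
    "X1, 256 KB": 180,
    "X2, 256 KB": 80,
    "X2, 4 MB": 75,
}

# Speedup of X2 over X1 at the same 256 KB chunk size.
speedup_same_chunk = run_times["X1, 256 KB"] / run_times["X2, 256 KB"]

# Extra time saved on X2 by moving from 256 KB to 4 MB chunks.
seconds_saved = run_times["X2, 256 KB"] - run_times["X2, 4 MB"]
percent_saved = 100 * seconds_saved / run_times["X2, 256 KB"]

print(f"X2 vs X1 at 256 KB: {speedup_same_chunk:.2f}x faster")
print(f"4 MB vs 256 KB on X2: {seconds_saved} s saved ({percent_saved:.2f}%)")
```

The gain from the 4 MB chunk size is small in percentage terms (about 6%), but it comes with lower CPU utilization on the array (55% versus 70%), at the cost of higher peak latency.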
