  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2104

NetApp aggregate 110GB free but 99% full - will this cause very slow performance copying between volumes on the same aggregate?

Please help - the Unix guys are just doing a copy from one volume to another on the same filer and aggregate.

The copy performance is very poor/slow. There are no LAN/duplex issues; the same switch is carrying the NFS traffic.

Is this wholly due to the aggregate being 99% full, even though there is 110GB free?
philb19 Asked:
1 Solution
 
Paul SolovyovskyCommented:
This is normal behavior; it all depends on what you are doing with it.

Example: you have a 2TB aggregate and you create a 1TB volume with nothing in it. The aggregate will show 50% full even though there is nothing in the volume.
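You can sanity check this straight from the filer. A rough sketch, assuming 7-Mode CLI syntax ("aggr0" and "vol1" are just placeholder names for your environment):

    df -Ag aggr0              # aggregate totals: used vs. available, in GB
    aggr show_space -g aggr0  # per volume: space allocated (reserved) vs. actually used
    vol options vol1          # check the guarantee setting (volume vs. none)

A volume with guarantee=volume reserves its full size out of the aggregate up front, which is why the "empty" 1TB volume above still makes the aggregate look 50% full.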

Other issues can also cause your volumes to grow in usage, such as orphaned snapshots, not running space reclamation on the LUNs, not sizing your volumes correctly, etc.
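To check the snapshot side of it (same 7-Mode syntax, placeholder volume name):

    snap list vol1    # snapshot list with cumulative %used - watch for old or orphaned copies
    df -g vol1        # the /vol/vol1/.snapshot line shows whether snapshots have overflowed their reserve
    snap delta vol1   # rate of change between snapshots, handy for sizing the reserve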


Generally you don't want your aggregate to go above 90%, otherwise NVRAM can't flush its writes to disk properly. I just had such a scenario with a customer and was able to go from 99% on the aggregate to 75% in a few hours' worth of a health check/remediation.

If you have a relationship with a NetApp consultant, have them take a look, but also do a sanity check yourself: figure out what is actually being used, whether your snapshots are taking up too much space, and whether your volumes are correctly sized.

Hope this helps
 
philb19Author Commented:
Great, thanks for answering, paulsolov - very helpful. Have you actually seen poor performance with a 99% full aggregate?
 
Paul SolovyovskyCommented:
Yes, this is very common, depending on how much data you have in use. Think about it: if all of the space is in use and the blocks held by snapshots are never released, new writes have nowhere to go because space is never freed up.

Can you post a screenshot of your volumes from NetApp System Manager?

You should also run a sysstat command from an SSH session on the filer and see what it's actually doing. Are you running Data Fabric Manager/Operations Manager?
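Something like this while the copy is running (the 1 is just a one-second interval):

    sysstat -x 1

Watch the "Disk util" and "CP ty" columns - disk utilization sitting in the 90s and back-to-back consistency points (B/b in CP ty) point at the disks and free space rather than the network as the bottleneck.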

If you can't find a consultant and can't figure out what is going on, I do provide consulting services.

Paul
 
philb19Author Commented:
Thanks, I'll send the screenshots when I'm logged on. I'm just a bit thrown by there being 110GB free (even though the aggregate is 99% full) - 110GB is still a substantial amount, I would have thought. I guess it depends on how much overhead the filesystem needs.
 
robocatCommented:

>I'm just a bit thrown by there being 110GB free (even though the aggregate is 99% full) - 110GB is still a substantial amount, I would have thought

This is due to the way NetApp's WAFL filesystem works. It gets very good write performance because it can write data sequentially across all disks at once in full stripes, even when the incoming I/O is random.
To do this, it needs to find enough free blocks at the same location on each disk. Once the aggregate is over about 90% full, it can rarely do so, and write performance degrades severely.
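To put rough numbers on it: 110GB free at 99% full means the aggregate is somewhere around 11TB usable, and that free space is scattered across every disk in the aggregate rather than sitting in one clean chunk. Assuming 20-odd data disks (an illustrative figure only), that is just a few GB of fragmented free blocks per disk, so WAFL can rarely lay down a full stripe and has to hunt for free blocks instead.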

Read up on how WAFL works if you want to understand this in detail. Otherwise just make sure that you keep at least 10% free space.
 
Paul SolovyovskyCommented:
Check this URL, specifically the chart that shows performance vs. space used on the aggregate:

http://media.netapp.com/documents/wp_3356.pdf
 
Paul SolovyovskyCommented:
How are you making out?
 
philb19Author Commented:
Thanks very much for helping. We did get the aggregate down to around 85%, and we did have a NetApp consultant in. It turns out the key reason for the slowness was the difference in spindles between our disk shelves - we have a mix of FC drives and older SATA drives, and the data transfer between the two shelves was the main cause. He sent me a report with greater detail; basically the disks were being thrashed:

    When SnapMirror transfers a volume to an aggregate with a different geometry, the relationship between logical and physical addresses is changed. The destination aggregate lays out the volume into different physical blocks. Immediately after the SnapMirror transfer, reads to the destination volume will use a different process to translate logical addresses into physical addresses. This method is less efficient and may require additional disk IO to access a block of data. Once a SnapMirror update finishes, the destination storage controller will start a deswizzle scan to recreate the metadata required for fast disk access. Once the deswizzle finishes, reads to the SnapMirror destination will use the normal fast-path. This is never occurring for the "sm_prodapp" volume, as deswizzle scans are not finishing between snapmirror updates.
(excerpt)
As you can see above, the issues exacerbate each other. The deswizzle scans are not completing between SnapMirror updates, which causes consistently high disk utilisation on saggr1 and inefficient disk access for NFS operations on the sm_prodapp volume.


The recommendation was to reduce the impact of deswizzle scans and increase available disk I/O capacity by moving the "sm_prodapp" volume on LCFILER from "saggr1" (SATA) to "aggr0" (FC).
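In case it helps anyone else hitting this, you can see whether the deswizzle scans are actually finishing with something along these lines (7-Mode; wafl scan status needs advanced privilege, so be careful in there; "sm_prodapp" is just our volume name):

    priv set advanced
    wafl scan status                  # look for "volume deswizzling" scans and whether they complete
    priv set admin
    snapmirror status -l sm_prodapp   # lag and transfer history, i.e. how often updates are landing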
 
Paul SolovyovskyCommented:
Good to know. I have seen some strange issues with mixing different flavors of drives; I try to keep RAID groups on the same drive type to avoid such issues.