# SOFS logical configuration reference


Scale-Out File Server (SOFS) is a great technology that debuted in Windows Server 2012 and is also available in Windows Server 2012 R2. We have endorsed SOFS with or without Storage Spaces (see Aidan’s blog). In either case SOFS is a great NAS solution based on SMBv3 that can be leveraged by Windows (file sharing, Hyper-V VHDX, SQL Server data files) and Linux (starting with kernel 3.12).

This post is not about hardware optimization or benchmarking (see the references for a complete list of articles on the subject). Rather, it is all about configuring and monitoring a SOFS cluster regardless of the underlying hardware (obviously, the better the components you choose, the faster your storage access will be).

As usual this will be a journey, and Windows 10 will change a few things. I’ll try to keep this post up to date.

This reference configuration takes into account the following aspects:

  • SOFS cluster network configuration
  • Storage Deduplication, redirected IO and network planning
  • CSV tuning and monitoring

The reference configuration has the following goals:

  • All SOFS traffic must be isolated on a dedicated network
  • Client side configuration should be minimized
  • CSV redirected traffic must not impact management and should use a different network from the SOFS one
  • Cluster management traffic must not use in any circumstances the SOFS network
  • SOFS accessibility and performance should be monitored


While these requirements can seem pretty obvious, the default configuration doesn’t achieve them.

The reference configuration needs 3 different networks defined on the SOFS cluster. These networks do not need to be routable between each other, and our advice is not to make them routable:

  • Cluster (used for intra-cluster traffic, including CSV redirected access) – cluster only (role=1)
  • Management (used for management purposes, monitoring, domain join, etc.) – cluster and client (role=3)
  • Storage (used for SOFS / SMBv3 traffic only) – cluster and client (role=3)
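As a minimal sketch, assuming the three cluster networks have already been renamed to Cluster, Management and Storage (the names are an assumption of this example, use whatever names your cluster shows), the roles listed above could be assigned with the FailoverClusters PowerShell module:

```powershell
# Assumed network names: 'Cluster', 'Management', 'Storage'.
Import-Module FailoverClusters

# Role 1 = cluster only, Role 3 = cluster and client
(Get-ClusterNetwork -Name 'Cluster').Role    = 1
(Get-ClusterNetwork -Name 'Management').Role = 3
(Get-ClusterNetwork -Name 'Storage').Role    = 3

# Review the resulting roles and subnets
Get-ClusterNetwork | Format-Table Name, Role, Address, AddressMask
```

Run it on one of the cluster nodes; the final table is just a quick way to confirm each network ended up with the intended role.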


Logical configurations for the stated goals

“All SOFS traffic must be isolated on a dedicated network”

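There are at least a couple of options here: SMB Multichannel constraints and network exclusion. Since the first one is a client-side option, we opted for network exclusion. On the SOFS cluster we exclude the Management network from the Distributed Network Name (DNN) resource, which leaves the Storage network as the only client network it can use: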

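# Find the Distributed Network Name (DNN) resource of the SOFS role
$dnn = Get-ClusterResource | Where-Object { $_.Name -eq 'SM2-SFS-vSFS1' -and $_.ResourceType -eq 'Distributed Network Name' }

# Get the cluster network to exclude
$mgmtnet = Get-ClusterNetwork -Name 'Management'

# Exclude the Management network from the DNN
$dnn | Set-ClusterParameter -Name ExcludedNetworks -Value $mgmtnet.Id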

This setting makes the DNN register itself in DNS only with the IPs of the Storage network (the only non-excluded network with the client use flag set). With this setting there is no need for client-side constraints, hence we achieve the second goal, “Client side configuration should be minimized”. Anyway, if SOFS is accessed by a Hyper-V host we recommend setting “Do not allow cluster network communication on this network” for the storage network (see http://technet.microsoft.com/it-it/library/dn550728.aspx).
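A quick way to check the result, sketched here under the assumption that the DNN is still called SM2-SFS-vSFS1 and that the storage cluster network is named Storage (the latter name is an assumption of this example), is to compare what DNS returns with the Storage subnet:

```powershell
# Assumptions: DNN name 'SM2-SFS-vSFS1', cluster network named 'Storage'.
$dnn = Get-ClusterResource |
    Where-Object { $_.Name -eq 'SM2-SFS-vSFS1' -and $_.ResourceType -eq 'Distributed Network Name' }

# Show the networks currently excluded from the DNN
$dnn | Get-ClusterParameter -Name ExcludedNetworks

# List the A records DNS returns for the DNN
Resolve-DnsName -Name 'SM2-SFS-vSFS1' -Type A | Select-Object Name, IPAddress

# Show the Storage subnet to compare the addresses against
Get-ClusterNetwork -Name 'Storage' | Select-Object Name, Address, AddressMask
```

Every address returned by DNS should belong to the Storage subnet. Records registered before the change may linger in DNS for a while, so a stale Management address right after the change is not necessarily a sign of misconfiguration.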

“CSV redirected traffic must not impact management and should use a different network from the SOFS one”

This requirement needs an explanation. First of all, redirected CSV traffic is something that needs to be taken seriously. In most cases the impact is negligible, but if you choose to:

  • Use dynamic VHDx
  • Use data deduplication

Then redirected traffic becomes something to plan for. In both those cases a fair amount of IO can be serviced via redirected file access (see Understanding the state of your Cluster Shared Volumes in Windows Server 2012 R2). That’s why we prefer redirected IO to use its own network rather than consume bandwidth on the Storage channel. What we want in this case is to use the Cluster network for redirected IO; obviously the Cluster network should have enough bandwidth to process the required IO. We don’t want redirected IO to use the Management network either, because under heavy IO we could lose cluster connectivity. When dedup and dynamic VHDX are used we try to size the Cluster network at 50% of the Storage network. If we’re using HP blade systems with Flex-10 connectivity we set Storage at 6 Gbps x 2, Management at 1 Gbps x 2 and Cluster at 3 Gbps x 2, where x 2 means we’re forming a team of two NICs.

If you are in a situation where it’s not easy to have separate networks, you can try to use SMB bandwidth control (see Preventing Live Migration Over SMB Starving CSV Traffic in Windows Server 2012 R2 with Set-SmbBandwidthLimit).

So how can we be sure the Cluster network is used for CSV redirected IO? Pretty easy: since it’s the only network with the “Cluster Only” role it will be used by default:
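As a rough sketch of that fallback, the SMB Bandwidth Limit feature can cap one category of SMB traffic so it does not starve the others. The 750 MB/s figure below is purely illustrative and is an assumption of this example, not a recommendation:

```powershell
# The cmdlets below require the SMB Bandwidth Limit feature
Install-WindowsFeature -Name FS-SMBBW

# Cap live migration traffic carried over SMB (illustrative value)
Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 750MB

# Review the limits currently in place
Get-SmbBandwidthLimit
```

The limit applies per node, so it has to be set on every host that shares the link.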

  • Rule 1 – SMB Multichannel takes precedence over the Network Priorities of NetFT to decide what subnets to use for the CSV redirected traffic
  • Rule 2 – The cluster will only use Internal Cluster Networks by default for SMB Multichannel. This behavior can be changed to also use the External Networks by modifying the UseClientAccessNetworksForSharedVolumes cluster parameter
  • Rule 3 – SMB Multichannel requires identical link speed and features (RSS and/or RDMA) to stream the CSV redirected traffic over different subnets simultaneously
  • Rule 4 – If adapters are not identical, SMB Multichannel will use only the faster adapter/s to stream the CSV redirected traffic
  • Rule 5 – Failover Cluster will fall back to NetFT for the decision of what subnet to use only if SMB Multichannel is not available or disabled. Then the lowest metric logic will apply and the CSV redirected traffic will be sent over the lowest metric subnet

In other terms CSV traffic:

  • Prefers to use networks configured for “Allow cluster communication on this network” with “Allow clients to connect through this network” *not* selected.
  • Will use networks configured for “Allow cluster communication on this network” with “Allow clients to connect through this network” selected if no other networks available.
  • Does not use networks that are configured for “Do not allow cluster network communication on this network”

For more info see “Windows Server 2012 SMB Multichannel and CSV Redirected traffic caveats”
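To see how these rules play out on a given cluster, a minimal inspection sketch (read-only, run on a cluster node) could look like this:

```powershell
Import-Module FailoverClusters

# Roles and metrics: the lowest metric wins when NetFT makes the choice (Rule 5)
Get-ClusterNetwork | Sort-Object Metric |
    Format-Table Name, Role, Metric, AutoMetric, Address

# 0 means only internal (cluster only) networks are used for CSV traffic (Rule 2)
(Get-Cluster).UseClientAccessNetworksForSharedVolumes

# Per-node CSV state and, when redirected, the reason for the redirection
Get-ClusterSharedVolumeState |
    Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason
```

Get-ClusterSharedVolumeState is available starting with Windows Server 2012 R2.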

“Cluster management traffic must not use in any circumstances the SOFS network”

What I mean is that the cluster nodes must register only the management IP address and not the SOFS addresses. By default, since the SOFS network is marked as “Cluster and Client”, the nodes will register a couple of IP addresses: the IP address on the management network and the IP address on the SOFS network. This is a bad situation in general, and very bad if the management and SOFS networks are not routed. To get rid of this, simply uncheck “Register this connection’s address in DNS” in the TCP/IP advanced dialog of the SOFS network adapter.
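The same checkbox can be cleared from PowerShell. This is a minimal sketch that assumes the SOFS-facing adapter (or team) on each node is named Storage; that interface alias is an assumption of this example:

```powershell
# Assumed interface alias: 'Storage' (the adapter or team on the SOFS network)
Set-DnsClient -InterfaceAlias 'Storage' -RegisterThisConnectionsAddress $false

# Confirm the setting
Get-DnsClient -InterfaceAlias 'Storage' |
    Format-Table InterfaceAlias, RegisterThisConnectionsAddress

# Refresh the node's DNS registration
Register-DnsClient
```

Repeat it on every node. Host records that were already registered with the SOFS address may need to be removed from the DNS zone by hand.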


“SOFS accessibility and performance should be monitored”

The pretty obvious statement is: SOFS must be monitored exactly like all your storage fabric components. We need to be proactive in managing the storage subsystem and try to anticipate and avoid all the problems that we can. An issue with the storage subsystem often means several services with problems.

In a Microsoft ecosystem we use System Center Operations Manager for monitoring purposes. Alas, there’s no specific management pack for SOFS, but if we get down to the bone, SOFS is made of three technologies:

  • The failover cluster
  • Cluster Shared Volumes
  • SMBv3 / Server service

So we can monitor SOFS using the management packs for the Operating System, Failover Cluster and File Services. The CSV monitoring is a little basic; for this reason I developed a specific MP you can find on the TechNet Gallery. I must add it is not as comprehensive as it should be, but it’s a good start.

CSV Tuning

This section is really a work in progress. Right now we have the following guidelines:

  • Very slow disks and dedup don’t go well together
  • The average CSV size we use is between 4 TB and 10 TB, often we use 8 TB CSV
  • We don’t see any performance benefit pushing CSV cache over 16 GB
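For the cache guideline, here is a minimal sketch assuming Windows Server 2012 R2, where, to my understanding, the CSV cache is sized through the BlockCacheSize cluster property expressed in MB. The variable name and the value are assumptions of this example:

```powershell
Import-Module FailoverClusters

# Current CSV cache size (in MB on Windows Server 2012 R2)
(Get-Cluster).BlockCacheSize

# Hypothetical target: 16 GB, the point past which we saw no further benefit
$cacheSizeMB = 16384
(Get-Cluster).BlockCacheSize = $cacheSizeMB
```

The cache is carved out of each node's RAM, so size it against the memory actually available on the nodes.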

Known issues

Error 0x80090322 when adding a new share to the SOFS cluster: the SOFS SPNs are not registered properly. Re-register the SPNs with:

Setspn -A http/<<SOFS DNN fqdn>> <<SOFS DNN>>

References

– Daniele

This posting is provided “AS IS” with no warranties, and confers no rights.
