Non-uniform PCIe Access - are any OSs "NUPA" aware?

andyalder asked:
Wondering which PCIe slots to use on dual Intel E5 CPU based servers, and whether any operating systems know how to load balance so as to keep PCIe access local to the CPU.

It's well known that modern OSs are NUMA aware and will try to allocate RAM/cores so that memory access is local rather than across the HyperTransport/QuickPath (QPI) links, because modern CPUs have built-in memory controllers.
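As an aside, on a Linux host you can see the topology the scheduler works from straight out of sysfs. Here's a minimal sketch (assumptions: a Linux box with sysfs mounted at /sys; it only reads the standard per-node files and is illustrative, not a supported tool):

```python
#!/usr/bin/env python3
"""List each NUMA node with its local CPUs and memory (Linux sysfs sketch)."""
import glob
import os
import re

for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    node = os.path.basename(node_dir)
    # cpulist holds a range string such as "0-7,16-23"
    with open(os.path.join(node_dir, "cpulist")) as f:
        cpus = f.read().strip()
    # the first line of meminfo carries the node's MemTotal in kB
    with open(os.path.join(node_dir, "meminfo")) as f:
        first_line = f.readline()
    m = re.search(r"MemTotal:\s+(\d+)\s*kB", first_line)
    mem_gib = int(m.group(1)) / (1024 ** 2) if m else 0.0
    print(f"{node}: CPUs {cpus}, memory {mem_gib:.1f} GiB")
```

On a two-socket E5 box this prints two nodes, one per socket, each with its own bank of RAM.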

What about PCIe cards, though, now that the PCIe controllers are built into the CPUs rather than into the chipset? Say for example I have an HP DL380 Gen8 or Dell R720 with two CPUs, and I put dual-port NICs in slots 1 and 4 and team them together for 4*1Gb: would the OS be clever enough to know that slot 1 is on CPU1 and slot 4 is on CPU2 and direct the outbound packets to the local NIC, or would half of the traffic go over the QPI link in a random manner? Is there any benefit to spreading NICs and FC HBAs over both CPUs' built-in PCIe controllers?
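For what it's worth, on Linux you can at least check which socket a given slot's card hangs off, because each PCI device exposes a numa_node attribute in sysfs (-1 means the firmware didn't report locality). A minimal sketch, assuming a Linux host; it walks the network interfaces, but the same attribute exists for HBAs under /sys/bus/pci/devices:

```python
#!/usr/bin/env python3
"""Show which NUMA node (i.e. which CPU socket) each NIC is attached to."""
import os

for iface in sorted(os.listdir("/sys/class/net")):
    numa_path = f"/sys/class/net/{iface}/device/numa_node"
    if not os.path.exists(numa_path):
        continue  # virtual interfaces (lo, bridges, ...) have no PCI device behind them
    with open(numa_path) as f:
        node = int(f.read().strip())
    # resolve the PCI bus:device.function the interface sits on
    bdf = os.path.basename(os.path.realpath(f"/sys/class/net/{iface}/device"))
    locality = f"NUMA node {node}" if node >= 0 else "unknown (firmware reported -1)"
    print(f"{iface} ({bdf}): {locality}")
```

That tells you where the card is; whether the teaming driver actually uses that information when it picks an outbound member is the open question.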

I'm going to sleep on it and see if anyone knows for sure.
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Commented:
Andy, it's an excellent question, and we've been asking ourselves the same thing while testing recent large cluster deployments of Dell R720s: which PCIe slots to use for the quad-port Broadcom network cards we team under Windows Server 2012 Hyper-V.

At present, after much testing under high network I/O with performance monitoring, it's difficult to ascertain anything or draw any conclusions, and we have raised a Service Request with our Technical Specialist at Dell!
andyalder (Author), Top Expert 2014
Commented:
Did Dell come back with anything? It's really a question for the likes of VMware, MS and the NIC/HBA manufacturers to address rather than the server makers. I think we have to assume there's nothing that uses locality in load balancing at the moment, so we might as well just put the cards on any CPU and accept an average of 50% local traffic.
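One thing you can do by hand in the meantime, at least on Linux, is keep the interrupt handling local even if the teaming driver won't: read the card's numa_node, look up that node's CPU list, and pin the card's MSI-X interrupts to those cores. A rough sketch (assumptions: Linux, run as root, "eth2" is a placeholder for whichever team member you want to pin, and irqbalance isn't running or has been told to leave these IRQs alone):

```python
#!/usr/bin/env python3
"""Pin a NIC's MSI-X interrupts to the CPUs on its local NUMA node (sketch)."""
import glob
import os
import sys

IFACE = "eth2"  # placeholder -- substitute the team member you want to pin
dev = f"/sys/class/net/{IFACE}/device"

with open(f"{dev}/numa_node") as f:
    node = int(f.read().strip())
if node < 0:
    sys.exit(f"{IFACE}: platform reports no NUMA locality for this device")

# CPUs local to that node, e.g. "8-15,24-31"
with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
    local_cpus = f.read().strip()

# each entry under msi_irqs/ is named after one of the device's IRQ numbers
for irq_path in glob.glob(f"{dev}/msi_irqs/*"):
    irq = os.path.basename(irq_path)
    with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as f:
        f.write(local_cpus)
    print(f"IRQ {irq} -> CPUs {local_cpus} (node {node})")
```

It doesn't change which NIC the outbound packets leave on, but it does stop the receive/completion interrupts bouncing across the QPI link.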

I'm not sure if there is a heat/turbo gain from spreading the cards over both CPUs. If the processors and airflow were identical, then the additional heat from the PCIe controller part of the chip might come into play and slow down the CPU on that die, but that would be delving into the 1-2% speed improvements you can get by picking the fastest example of a particular chip part number.
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Commented:
Dell is still in sleepy mode over Chrimbo!
andyalder (Author), Top Expert 2014
Commented:
EE is chasing this as abandoned, so I'll close it. I guess it doesn't matter much which CPU the peripherals are on at the moment; otherwise the manufacturers would be shouting about the slight speed improvements to be had from clever drivers.
Andrew Hancock (VMware vExpert / EE Fellow), VMware and Virtualization Consultant
Commented:
It's still with the Dell Escalation Team. I've bookmarked the question, so I will post back here, or catch me up through my profile!
