Non-uniform PCIe Access - are any OSs "NUPA" aware?

Wondering which PCIe slots to use on dual-CPU Intel E5-based servers, and whether any operating systems know how to load-balance to keep PCIe access local to the CPU.

It's well known that modern OSs are NUMA aware and will try to allocate RAM/cores so that memory access is local rather than across the HyperTransport/QuickPath buses, because modern CPUs have built-in memory controllers.
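
(As an aside, on a Linux box you can see that topology directly in sysfs; the snippet below is just a minimal sketch of reading it, assuming the standard /sys/devices/system/node layout.)

    # Sketch: list which CPUs and how much RAM belong to each NUMA node,
    # read from the standard Linux sysfs layout under /sys/devices/system/node.
    from pathlib import Path

    for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        cpus = (node / "cpulist").read_text().strip()
        # first line of meminfo reads "Node <n> MemTotal: <kB> kB"
        mem_kb = int((node / "meminfo").read_text().splitlines()[0].split()[-2])
        print(f"{node.name}: CPUs {cpus}, {mem_kb // 1024} MiB local RAM")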

What about PCIe cards, though, now that the PCIe controllers are built into the CPUs rather than into the chipset? Say, for example, I have an HP DL380 Gen8 or Dell R720 with two CPUs and I put dual-port NICs in slots 1 and 4 and team them together for 4×1Gb. Would the OS be clever enough to know that slot 1 was on CPU1 and slot 4 was on CPU2 and direct the outbound packets to the local NIC, or would half of the traffic go over the QPI link in a random manner? Is there any benefit to spreading NICs and FC HBAs over both CPUs' inbuilt PCIe controllers?
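
(For what it's worth, on Linux the kernel does expose which node a card sits behind via each device's numa_node attribute in sysfs, so you can at least check the slot-to-CPU mapping by hand; a minimal sketch below, with interface names only as examples.)

    # Sketch: report which NUMA node (i.e. which CPU socket's PCIe root complex)
    # each network interface is attached to. A value of -1 means the
    # platform/firmware didn't report any locality for the device.
    from pathlib import Path

    for iface in sorted(Path("/sys/class/net").iterdir()):
        numa_file = iface / "device" / "numa_node"
        if numa_file.exists():   # virtual interfaces have no backing PCIe device
            print(f"{iface.name}: NUMA node {numa_file.read_text().strip()}")
        else:
            print(f"{iface.name}: no PCIe device (virtual interface)")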

I'm going to sleep on it and see if anyone knows for sure.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Andy, it's an excellent question, and one we've been asking ourselves and testing during recent large cluster deployments of Dell R720s: which PCIe slots to use for quad-port Broadcom network cards that are teamed under Windows Server 2012 Hyper-V.

At present, after much testing under high network I/O with performance monitoring, it's difficult to ascertain anything or draw any conclusions, so we have raised a Service Request with our Technical Specialist at Dell!
andyalder (Author) commented:
Did Dell come back with anything? It's really a question for the likes of VMware, MS and the NIC/HBA manufacturers to address rather than the server makers. I think we have to assume there's nothing that uses locality in load balancing at the moment, so we might as well just put the cards on any CPU and get an average of 50% local traffic.
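
(About the closest you can get by hand today, at least on Linux, is steering each NIC's interrupts onto the CPUs of its own node; a rough sketch below, assuming root and using "eth0" purely as an example name. Note that irqbalance may rewrite the affinities afterwards.)

    # Sketch: pin a NIC's MSI/MSI-X interrupts to the CPUs of its local NUMA node.
    # Assumes Linux and root privileges; "eth0" is only an example interface name.
    from pathlib import Path

    IFACE = "eth0"
    dev = Path(f"/sys/class/net/{IFACE}/device")
    node = (dev / "numa_node").read_text().strip()

    if node != "-1":
        # CPUs local to the NIC's node, e.g. "0-7,16-23"
        local_cpus = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text().strip()
        for irq in (dev / "msi_irqs").iterdir():   # one entry per MSI/MSI-X vector
            Path(f"/proc/irq/{irq.name}/smp_affinity_list").write_text(local_cpus)
            print(f"IRQ {irq.name} -> CPUs {local_cpus}")
    else:
        print("Firmware did not report a NUMA node for this device")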

Not sure if there is a heat/turbo gain to spreading the cards over both CPUs. If the processors and airflow were identical, then maybe the additional heat from the PCIe controller part of the chip would come into play and slow the CPU on that die down, but that's delving into the 1-2% speed improvements you can get by picking the fastest example of a particular chip part number.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Dell is still in sleepy mode over Chrimbo!
andyalder (Author) commented:
EE is chasing this as abandoned, so I'll close it. I guess it doesn't matter much which CPU the peripherals are on at the moment, otherwise the manufacturers would be shouting about the slight speed improvements from clever drivers.
Andrew Hancock (VMware vExpert / EE MVE^2), VMware and Virtualization Consultant, commented:
Still with the Dell Escalation Team; I've bookmarked the question, so I'll post back here, or catch me up through my profile!