FireBall
asked on
Same-IP traffic going to the same core while IRQ balancing is active
Hello ,
We were surprised by one test result. When a single IP address sends 1 million pps directly to a server, that server, which normally distributes IRQs perfectly, takes all of that traffic on only one core.
Why does this happen, and how can it be overcome?
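A toy model of what the NIC's receive-side scaling (RSS) does may make this concrete. This is only a sketch: CRC32 stands in for the hardware's Toeplitz hash, and the queue count and addresses are made up. The point is that the RX queue is a pure function of the flow tuple, so a single-source flood can never spread across cores.

```python
# Sketch of why one flow pins one core: RSS hashes the packet's
# 5-tuple, and identical tuples always land in the same RX queue.
# CRC32 here is a stand-in for the NIC's Toeplitz hash.
import zlib

NUM_QUEUES = 24  # hypothetical number of RX queues / cores

def rx_queue(src_ip, dst_ip, src_port, dst_port):
    """Deterministic queue selection from the flow tuple."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % NUM_QUEUES

# A flood from one source IP/port: every packet hashes identically,
# so all of the traffic hits a single queue (and its pinned core).
flood = {rx_queue("1.1.1.1", "10.0.0.1", 4444, 80) for _ in range(1000)}
print(len(flood))   # 1 -- only one queue is ever selected

# Normal traffic from many sources spreads across queues.
mixed = {rx_queue(f"1.1.1.{i}", "10.0.0.1", 4444, 80) for i in range(1000)}
print(len(mixed))
```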
ASKER
Actually it is not pinned to one CPU; a single-IP flood is pinned to only one core.
For example, if you budget 500k pps per CPU core, and you have DPDK / PF_RING, you might decide you can handle 12M pps in total across 24 cores/rings. But:
If somebody floods 750k pps from the same source IP, that one core gets stuck. I tried and tested this on a Juniper SRX 3600, Ubuntu, FreeBSD, Debian, and CentOS; they all behave the same, and in every case the core gets stuck.
I do not even need to flood from my own IP. If I flood with the spoofed source IP 1.1.1.1 from 4 machines, I could take down a datacenter easily.
ASKER
Please do not take this topic toward routers. Ntuple filters (interface APIs) can easily block the source IP at millions of pps, but my aim is to understand why the kernel acts like this.
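To illustrate the ntuple-filter point: such filters match and drop on the source IP before any per-flow hashing or kernel work happens, which is why they keep up at millions of pps. Below is a minimal software analogue; the blocklist and packet shape are made up for illustration, and real ntuple rules are programmed into NIC hardware (typically via `ethtool -N`), not run in Python.

```python
# Software analogue of an ntuple drop rule: reject packets from a
# blocked source before any queue dispatch or protocol processing.
# Real ntuple filters do this match in NIC hardware.
BLOCKED_SRC = {"1.1.1.1"}  # hypothetical blocklist

def accept(packet):
    """Drop packets from blocked sources before further processing."""
    return packet["src_ip"] not in BLOCKED_SRC

pkts = [{"src_ip": "1.1.1.1"}, {"src_ip": "8.8.8.8"}]
passed = [p for p in pkts if accept(p)]
print(len(passed))  # 1 -- the flood source is filtered out
```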
Per my original post, I believe this is by design, as trying to spread one flow's packets across many CPUs would likely take down a machine (CPU thrash) under high traffic loads.
This will work the same on every distro, because the kernel networking code is common across all distros.
The best place to receive a definitive answer is likely the Debian developer forum for networking.
The actual developers working on this code will likely be the best people to ask.
You may also have to fall back to asking your question on one of the https://www.kernel.org/ related IRC channels or forums.
Or just read the actual kernel code and see how it is implemented.
ASKER
Actually it is already locking the cores, forcing a kernel/server restart :) or completely losing all the packets.
Do you know the mailing list of the kernel developers? I do not think the Ubuntu developers would know about this issue, because it lives inside the kernel; it is the same in all distros.
ASKER CERTIFIED SOLUTION
ASKER
Thank you
After talking with my brother about this, the solution seems to be this...
1) Use a card from http://dpdk.org/doc/nics.
2) Put the card in DPDK mode, so all traffic steering occurs on the card rather than in the kernel.
3) Create some number of queues on the card (how many depends on the traffic pattern and the card's CPUs).
4) All interrupt handling and buffer reassembly now occurs on the card, rather than in the kernel.
5) Scaling further requires routers which understand bonding; then bond multiple cards together.
Expensive, complex, but doable, if you simply must handle high-bandwidth traffic.
https://fasterdata.es.net/host-tuning/100g-tuning/ also suggests verifying that you have specifically enabled IRQ balancing, and that the setting has actually taken effect on the card.
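One way to check from software whether IRQ distribution has taken effect is to compare per-CPU interrupt counts for the NIC's RX queues in /proc/interrupts. The sketch below parses a fabricated sample; on a real box you would read the file itself, and the IRQ numbers and queue names will differ.

```python
# Parse RX-queue interrupt counts from /proc/interrupts-style text.
# The SAMPLE below is fabricated for illustration; read the real
# /proc/interrupts on a live Linux machine instead.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  44:   9000000        120        130         90   PCI-MSI  eth0-rx-0
  45:       110    8800000        140        100   PCI-MSI  eth0-rx-1
"""

def rx_counts(text):
    """Return {queue_name: [per-CPU interrupt counts]}."""
    counts = {}
    for line in text.splitlines()[1:]:  # skip the CPU header row
        parts = line.split()
        name = parts[-1]                # e.g. "eth0-rx-0"
        counts[name] = [int(x) for x in parts[1:5]]
    return counts

counts = rx_counts(SAMPLE)
# Each queue's interrupts land heavily on one CPU -- i.e. per-queue
# affinity is pinned, and spreading only happens across queues.
print(counts["eth0-rx-0"])  # [9000000, 120, 130, 90]
```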
ASKER
Thank you for the answer. I already use PF_RING for the rings, but it does not solve this problem; I will try DPDK in a test lab.
If multiple CPUs were used, then buffers would have to be reassembled after packets arrived and were processed by many CPUs.
This would likely be a performance killer, because packets would have to move across many memory buses/lanes to be reassembled.
That said, there are bus architectures specifically designed for high packet flow.
These architectures switch between multiple i/o cards based on connections: once a connection begins, only one i/o card handles it, and as new connections arrive they are multiplexed across whatever cards are free. If all cards are busy, connections begin to queue, based on a best guess at which card has the most CPU cycles available.
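The multiplexing scheme described above can be sketched as a least-loaded dispatcher. This is a toy model; the card count and the load metric (active connection count) are assumptions, not details of any specific product.

```python
# Toy model of connection multiplexing across i/o cards: a new
# connection is pinned to one card for its lifetime, and new
# connections go to the card with the fewest active connections.
import heapq

class Dispatcher:
    def __init__(self, num_cards):
        # Min-heap of (active_connections, card_id) as a load estimate.
        self.heap = [(0, c) for c in range(num_cards)]
        self.assignment = {}

    def open(self, conn_id):
        """Assign a new connection to the least-loaded card."""
        load, card = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (load + 1, card))
        self.assignment[conn_id] = card  # card owns this conn for life
        return card

d = Dispatcher(4)
cards = [d.open(i) for i in range(8)]
print(sorted(cards))  # [0, 0, 1, 1, 2, 2, 3, 3] -- evenly spread
```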
One of my brothers works with a company which uses this type of i/o card for their business.
If this is what you're looking for, I can reach out to him + ask him to update this question.
Keep in mind, this type of architecture is extremely expensive. Each i/o card may run many thousands of dollars.