Netdev Archive

* Optimizing instruction-cache, more packets at each stage
@ 13:22 Jesper Dangaard Brouer
  13:32 ` Hannes Frederic Sowa
  20:47   ` David Miller

* Optimizing instruction-cache, more packets at each stage
From: Jesper Dangaard Brouer
Date: 13:22 UTC
Cc: brouer, David Miller, Alexander Duyck, Alexei Starovoitov,
    Daniel Borkmann, Marek Majkowski, Hannes Frederic Sowa,
    Florian Westphal, Paolo Abeni, John Fastabend
Subject: Optimizing instruction-cache, more packets at each stage

Given net-next is closed, we have time to discuss controversial core
changes.

I want to do some instruction-cache level optimizations.

The kernel network stack code path (that a packet travels) is obviously
larger than the instruction-cache (icache).  Thus, packets travel
individually through the network stack, experiencing the exact same
icache misses (as the previous packet).

I imagine that we could process several packets at each stage in the
stack.

Today, we already allow NAPI net_rx_action() to process many
(e.g. up-to 64) packets in the driver RX-poll routine.  But the driver
then calls the "full" stack for every single packet
(e.g. napi_gro_receive()) in its processing loop.

I have a proof-of-concept patch for ixgbe, which gives me 10% speedup.
(This patch also delays when I touch the packet data, thus it also
optimizes data-cache misses.)  The basic idea is that I delay calling
ixgbe_rx_skb/napi_gro_receive, and allow the RX loop (in
ixgbe_clean_rx_irq()) to run more iterations before "flushing" the
icache (by calling the stack).

Changing / adjusting the stack to support processing in "stages" might
be more controversial.  Maybe we could simply pass an skb-list?

Maybe we should view the packets stuck/avail in the RX ring as packets
that all arrived at the same "time", and thus process them at the same
time.  By letting the "bulking" depend on the avail packets in the RX
ring, we automatically amortize processing cost in a scalable manner.

One challenge with icache optimizations is that they are hard to
profile.  But hopefully the new Skylake CPUs can profile this.
Because, as I always say: if you cannot measure it, you cannot improve
it.

I'm doing a Network Performance BoF at NetDev 1.1, where this and many
more subjects will be brought up face-to-face.

--
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn:

* Re: Optimizing instruction-cache, more packets at each stage
From: Hannes Frederic Sowa
Date: 13:32 UTC
Cc: David Miller, Alexander Duyck, Alexei Starovoitov,
    Daniel Borkmann, Marek Majkowski, Florian Westphal, Paolo Abeni,
Subject: Re: Optimizing instruction-cache, more packets at each stage

> Given net-next is closed, we have time to discuss controversial core
> changes.
>
> I want to do some instruction-cache level optimizations.
>
> The kernel network stack code path (that a packet travels) is obviously
> larger than the instruction-cache (icache).  Thus, packets travel
> individually through the network stack, experiencing the exact same
> icache misses (as the previous packet).
>
> I imagine that we could process several packets at each stage in the
> stack.
>
> Today, we already allow NAPI net_rx_action() to process many
> (e.g. up-to 64) packets in the driver RX-poll routine.  But the driver
> then calls the "full" stack for every single packet
> (e.g. napi_gro_receive()) in its processing loop.
>
> I have a proof-of-concept patch for ixgbe, which gives me 10% speedup.
> (This patch also delays when I touch the packet data, thus it also
> optimizes data-cache misses.)  The basic idea is that I delay calling
> ixgbe_rx_skb/napi_gro_receive, and allow the RX loop (in
> ixgbe_clean_rx_irq()) to run more iterations before "flushing" the
> icache (by calling the stack).
>
> Changing / adjusting the stack to support processing in "stages" might
> be more controversial.

I once tried this up till the vlan layer, and error handling got so
complex and complicated that I stopped there.

This needs redesign of a lot of stuff, and while doing so I would
switch from a more stack-based approach to building the stack, to try
out a more ...
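The bulking idea from the first message can be modelled in user space:
the RX-poll loop only fills a small bundle, and the per-packet call
into the stack is delayed until after the loop, so the stack's code is
entered back-to-back for the whole bundle.  This is only a sketch of
the concept; the names (pkt, stack_receive, rx_poll_bulked) are made
up for illustration and this is not the ixgbe patch referenced in the
thread.

/*
 * User-space model of the bulking idea: instead of calling the full
 * stack for every packet inside the RX poll loop, collect the packets
 * first and "flush" them to the stack afterwards, so the stack's
 * instructions stay hot in the icache for the whole bundle.
 */
#include <stdio.h>

#define RX_BUDGET 64            /* NAPI-style per-poll budget */

struct pkt { int id; };         /* stand-in for an skb */

/* Stand-in for the "full" network stack entry point. */
static void stack_receive(struct pkt *p)
{
        printf("stack: packet %d\n", p->id);
}

/* Per-packet variant: the stack is entered once per RX descriptor. */
static void rx_poll_per_packet(struct pkt *ring, int avail)
{
        for (int i = 0; i < avail && i < RX_BUDGET; i++)
                stack_receive(&ring[i]);
}

/* Bulked variant: the RX loop only fills a bundle; the stack is
 * entered after the loop, once per packet but back-to-back. */
static void rx_poll_bulked(struct pkt *ring, int avail)
{
        struct pkt *bundle[RX_BUDGET];
        int n = 0;

        for (int i = 0; i < avail && i < RX_BUDGET; i++)
                bundle[n++] = &ring[i];   /* delay touching packet data */

        for (int i = 0; i < n; i++)       /* "flush" the bundle */
                stack_receive(bundle[i]);
}

int main(void)
{
        struct pkt ring[8];

        for (int i = 0; i < 8; i++)
                ring[i].id = i;

        rx_poll_per_packet(ring, 8);
        rx_poll_bulked(ring, 8);
        return 0;
}

The only structural change relative to the per-packet variant is that
the stack call moves out of the fill loop; in a real driver the bundle
size is naturally bounded by the NAPI budget and by how many packets
are waiting in the RX ring, which is the "scalable amortization" point
made in the thread.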
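The "maybe we could simply pass an skb-list" / processing-in-stages
idea can be sketched the same way: one list of packets is handed to
the stack, and each stage walks the whole list before the next stage
runs, so each stage's instructions stay icache-hot across the bundle.
The stage names and list layout below are invented for illustration
and are not taken from the kernel.

/*
 * User-space model of "process several packets at each stage": the
 * packet list is passed through the stack stage by stage, one call
 * per bundle rather than one call per packet.
 */
#include <stdio.h>
#include <stddef.h>

struct pkt {
        int id;
        struct pkt *next;       /* simple skb-list stand-in */
};

/* Each "stage" walks the whole list before handing it on. */
static void stage_eth(struct pkt *list)
{
        for (struct pkt *p = list; p; p = p->next)
                printf("eth:  packet %d\n", p->id);
}

static void stage_ip(struct pkt *list)
{
        for (struct pkt *p = list; p; p = p->next)
                printf("ip:   packet %d\n", p->id);
}

static void stage_deliver(struct pkt *list)
{
        for (struct pkt *p = list; p; p = p->next)
                printf("dlvr: packet %d\n", p->id);
}

/* List-based stack entry point: one call per bundle. */
static void stack_receive_list(struct pkt *list)
{
        stage_eth(list);
        stage_ip(list);
        stage_deliver(list);
}

int main(void)
{
        struct pkt pkts[4];

        for (int i = 0; i < 4; i++) {
                pkts[i].id = i;
                pkts[i].next = (i < 3) ? &pkts[i + 1] : NULL;
        }
        stack_receive_list(&pkts[0]);
        return 0;
}

Per-stage error handling (dropping or diverting individual packets
mid-list, the vlan-layer complexity mentioned in the reply) is exactly
what such a staged design has to get right.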