1. 25 Apr 2014, 2 commits
  2. 23 Apr 2014, 1 commit
  3. 12 Apr 2014, 1 commit
    • net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      Committed by David S. Miller
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->sk_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But the moment we place the SKB onto the socket receive queue, it
      can be consumed and freed up. So this skb->len access is
      potentially a read of freed-up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback uses the
      length argument. And since nobody cared about its value, many
      call sites pass in arbitrary values such as '0' or even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
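      
      For reference, a minimal sketch of the resulting callback change
      (the old prototype carried the byte count the call sites above
      passed along):
      
      	/* before: length argument, potentially read from a freed skb */
      	void (*sk_data_ready)(struct sock *sk, int bytes);
      	sk->sk_data_ready(sk, skb->len);
      
      	/* after: no length argument, nothing to compute after queueing */
      	void (*sk_data_ready)(struct sock *sk);
      	sk->sk_data_ready(sk);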
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 04 Apr 2014, 2 commits
  5. 29 Mar 2014, 1 commit
    • packet: respect devices with LLTX flag in direct xmit · 43279500
      Committed by Daniel Borkmann
      Quite often it can be useful to test with dummy or similar
      devices as a blackhole sink for skbs. Such devices are only
      equipped with a single txq, but marked as NETIF_F_LLTX as
      they do not require locking their internal queues on xmit
      (or implement locking themselves). Therefore, use the
      HARD_TX_{UN,}LOCK API instead, so that NETIF_F_LLTX is respected.
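      
      A condensed sketch of the resulting direct-xmit sequence
      (paraphrased; ops stands for dev->netdev_ops and txq for the
      selected tx queue):
      
      	local_bh_disable();
      	/* HARD_TX_LOCK skips the txq lock for NETIF_F_LLTX devices */
      	HARD_TX_LOCK(dev, txq, smp_processor_id());
      	if (!netif_xmit_frozen_or_stopped(txq)) {
      		ret = ops->ndo_start_xmit(skb, dev);
      		if (ret == NETDEV_TX_OK)
      			txq_trans_update(txq);
      	}
      	HARD_TX_UNLOCK(dev, txq);
      	local_bh_enable();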
      
      trafgen mmap/TX_RING example against dummy device with config
      foo: { fill(0xff, 64) } results in the following performance
      improvements for such scenarios on an ordinary Core i7/2.80GHz:
      
      Before:
      
       Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):
      
         160,975,944,159 instructions:k            #    0.55  insns per cycle          ( +-  0.09% )
         293,319,390,278 cycles:k                  #    0.000 GHz                      ( +-  0.35% )
             192,501,104 branch-misses:k                                               ( +-  1.63% )
                     831 context-switches:k                                            ( +-  9.18% )
                       7 cpu-migrations:k                                              ( +-  7.40% )
                  69,382 cache-misses:k            #    0.010 % of all cache refs      ( +-  2.18% )
             671,552,021 cache-references:k                                            ( +-  1.29% )
      
            22.856401569 seconds time elapsed                                          ( +-  0.33% )
      
      After:
      
       Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):
      
         133,788,739,692 instructions:k            #    0.92  insns per cycle          ( +-  0.06% )
         145,853,213,256 cycles:k                  #    0.000 GHz                      ( +-  0.17% )
              59,867,100 branch-misses:k                                               ( +-  4.72% )
                     384 context-switches:k                                            ( +-  3.76% )
                       6 cpu-migrations:k                                              ( +-  6.28% )
                  70,304 cache-misses:k            #    0.077 % of all cache refs      ( +-  1.73% )
              90,879,408 cache-references:k                                            ( +-  1.35% )
      
            11.719372413 seconds time elapsed                                          ( +-  0.24% )
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 27 Mar 2014, 1 commit
  7. 01 Mar 2014, 1 commit
    • packet: allow to transmit +4 byte in TX_RING slot for VLAN case · 52f1454f
      Committed by Daniel Borkmann
      Commit 57f89bfa ("network: Allow af_packet to transmit +4 bytes
      for VLAN packets.") added the possibility for non-mmaped frames
      to carry an extra 4 bytes for the VLAN header, so the MTU
      increases from 1500 to 1504 bytes, for example.
      
      Commit cbd89acb ("af_packet: fix for sending VLAN frames via
      packet_mmap") attempted to fix that for the mmap part but was
      reverted as it caused regressions while using eth_type_trans()
      on output path.
      
      Let's just act analogously to 57f89bfa and add similar logic to
      TX_RING. We presume size_max as overcharged by +4 bytes and
      later on, after the skb has been built by tpacket_fill_skb(),
      check for an ETH_P_8021Q header on packets larger than the
      normal MTU. This can easily be reproduced with a slightly
      modified trafgen in mmap(2) mode, test cases:
      
       { fill(0xff, 12) const16(0x8100) fill(0xff, <1504|1505>) }
       { fill(0xff, 12) const16(0x0806) fill(0xff, <1500|1501>) }
      
      Note that we need to do the test right after tpacket_fill_skb(),
      as sockets can have PACKET_LOSS set, in which case we would not
      fail but instead just continue to traverse the ring.
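      
      A condensed sketch of the post-fill check (paraphrased from the
      patch):
      
      	tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
      				  addr, hlen);
      	if (likely(tp_len >= 0) &&
      	    tp_len > dev->mtu + dev->hard_header_len) {
      		struct ethhdr *ehdr;
      
      		/* only the +4 byte VLAN overshoot is tolerated */
      		skb_reset_mac_header(skb);
      		ehdr = eth_hdr(skb);
      		if (ehdr->h_proto != htons(ETH_P_8021Q))
      			tp_len = -EMSGSIZE;
      	}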
      Reported-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Ben Greear <greearb@candelatech.com>
      Cc: Phil Sutter <phil@nwl.cc>
      Tested-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 19 Feb 2014, 1 commit
  9. 17 Feb 2014, 1 commit
    • packet: check for ndo_select_queue during queue selection · 0fd5d57b
      Committed by Daniel Borkmann
      Mathias reported that on an AMD Geode LX embedded board (ALiX)
      with ath9k driver PACKET_QDISC_BYPASS, introduced in commit
      d346a3fa ("packet: introduce PACKET_QDISC_BYPASS socket
      option"), triggers a WARN_ON() coming from the driver itself
      via 066dae93 ("ath9k: rework tx queue selection and fix
      queue stopping/waking").
      
      The reason this happened is that the ndo_select_queue() callback
      is not invoked from the direct xmit path, i.e. for the ieee80211
      subsystem, which sets the queue and TID (similar to the 802.1d
      tag) that is put into the frame through 802.11e (WMM, QoS). If
      that is not set, the pending frame counter for e.g. ath9k can
      get messed up.
      
      So the WARN_ON() in ath9k is absolutely legitimate. Generally,
      the hw queue selection in ieee80211 depends on the type of
      traffic, and priorities are set according to ieee80211_ac_numbers
      mapping; working in a similar way as DiffServ only on a lower
      layer, so that the AP can favour frames that have "real-time"
      requirements like voice or video data frames.
      
      Therefore, check for the presence of ndo_select_queue() in the
      netdev ops and, if available, invoke it with a fallback handler
      to __packet_pick_tx_queue(), so that drivers such as bnx2x,
      ixgbe, or mlx4 can still select a hw queue for transmission in
      relation to the current CPU while e.g. the ieee80211 subsystem
      can make its own choices.
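      
      A minimal sketch of the resulting selection logic (close to the
      patch; netdev_cap_txqueue() clamps an out-of-range driver
      answer):
      
      	static u16 packet_pick_tx_queue(struct net_device *dev,
      					struct sk_buff *skb)
      	{
      		const struct net_device_ops *ops = dev->netdev_ops;
      		u16 queue_index;
      
      		if (ops->ndo_select_queue) {
      			queue_index = ops->ndo_select_queue(dev, skb, NULL,
      							    __packet_pick_tx_queue);
      			queue_index = netdev_cap_txqueue(dev, queue_index);
      		} else {
      			queue_index = __packet_pick_tx_queue(dev, skb);
      		}
      
      		return queue_index;
      	}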
      Reported-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 23 Jan 2014, 1 commit
  11. 22 Jan 2014, 3 commits
  12. 19 Jan 2014, 1 commit
  13. 17 Jan 2014, 3 commits
    • packet: use percpu mmap tx frame pending refcount · b0138408
      Committed by Daniel Borkmann
      In PF_PACKET's packet mmap(), we can avoid using one atomic_inc()
      and one atomic_dec() call in skb destructor and use a percpu
      reference count instead in order to determine if packets are
      still pending to be sent out. Micro-benchmark with [1], slightly
      modified (that is, protocol = 0 in socket(2) and bind(2));
      example on a rather crappy testing machine. I expect it to scale
      and have even better results on bigger machines:
      
      ./packet_mm_tx -s7000 -m7200 -z700000 em1, avg over 2500 runs:
      
      With patch:    4,022,015 cyc
      Without patch: 4,812,994 cyc
      
      time ./packet_mm_tx -s64 -c10000000 em1 > /dev/null, stable:
      
      With patch:
        real         1m32.241s
        user         0m0.287s
        sys          1m29.316s
      
      Without patch:
        real         1m38.386s
        user         0m0.265s
        sys          1m35.572s
      
      In function tpacket_snd(), it is okay to use packet_read_pending()
      since in fast-path we short-circuit the condition already with
      ph != NULL, as we have further frames to process. In case we have
      MSG_DONTWAIT, we also do not execute this path as need_wait is
      false here anyway, and in case of _no_ MSG_DONTWAIT flag, it is
      okay to call packet_read_pending(), because whenever we reach
      that path, we're done processing outgoing frames anyway and only
      look if there are skbs still outstanding to be orphaned. We can
      stay lockless in this percpu counter since it's acceptable when we
      reach this path for the sum to be imprecise first, but we'll level
      out at 0 after all pending frames have reached the skb destructor
      eventually through tx reclaim. When people pin a tx process to
      particular CPUs, we expect overflows to happen in the reference
      counter, as one CPU sees heavy increments while the decrements
      are distributed through ksoftirqd across all CPUs, for example. As
      David Laight points out, since the C language doesn't define the
      result of signed int overflow (i.e. rather than wrap, it is
      allowed to saturate as a possible outcome), we have to use
      unsigned int as reference count. The sum over all CPUs when tx
      is complete will result in 0 again.
      
      The BUG_ON() in tpacket_destruct_skb() can be removed as well. It
      can _only_ be set from inside the tpacket_snd() path and we made
      sure to increase tx_ring.pending in any case before we called po->xmit(skb).
      So testing for tx_ring.pending == 0 is not too useful. Instead, it
      would rather have been useful to test if lower layers didn't orphan
      the skb so that we're missing ring slots being put back to
      TP_STATUS_AVAILABLE. But such a bug will be caught in user space
      already as we end up realizing that we do not have any
      TP_STATUS_AVAILABLE slots left anymore. Therefore, we're all set.
      
      Btw, in case of RX_RING path, we do not make use of the pending
      member, therefore we also don't need to use up any percpu memory
      here. Also note that __alloc_percpu() already returns a zero-filled
      percpu area, so initialization is done already.
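      
      A sketch of the summation helper (essentially as described
      above):
      
      	static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
      	{
      		unsigned int refcnt = 0;
      		int cpu;
      
      		/* per-CPU deltas may wrap individually; the unsigned
      		 * sum levels out at 0 once tx reclaim is complete
      		 */
      		for_each_possible_cpu(cpu)
      			refcnt += *per_cpu_ptr(rb->pending_refcnt, cpu);
      
      		return refcnt;
      	}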
      
        [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap
      
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: don't unconditionally schedule() in case of MSG_DONTWAIT · 87a2fd28
      Committed by Daniel Borkmann
      In tpacket_snd(), when we discover the first frame that is not
      in status TP_STATUS_SEND_REQUEST and get back a NULL buffer, we
      exit the send routine in the MSG_DONTWAIT case, since we have
      finished traversing the mmaped send ring buffer and don't care
      about pending frames.
      
      While doing so, we still unconditionally call an expensive
      schedule() in the packet_current_frame() "error" path, which
      is unnecessary in this case since it's enough to just quit
      the function.
      
      Also, in case MSG_DONTWAIT is not set, we should rather test
      for need_resched() first and do schedule() only if necessary
      since in the meantime pending frames could already have finished
      processing and called the skb destructor.
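      
      The change boils down to the following guard in tpacket_snd()
      (paraphrased):
      
      	if (unlikely(ph == NULL)) {
      		/* with !need_wait the loop condition ends the send;
      		 * otherwise only yield the CPU when actually required
      		 */
      		if (need_wait && need_resched())
      			schedule();
      		continue;
      	}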
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: improve socket create/bind latency in some cases · 902fefb8
      Committed by Daniel Borkmann
      Most people acquire PF_PACKET sockets with a protocol argument in
      the socket call, e.g. libpcap does so with htons(ETH_P_ALL) for
      all its sockets. Most likely, at some point in time a subsequent
      bind() call will follow, e.g. in libpcap with ...
      
        memset(&sll, 0, sizeof(sll));
        sll.sll_family          = AF_PACKET;
        sll.sll_ifindex         = ifindex;
        sll.sll_protocol        = htons(ETH_P_ALL);
      
      ... as arguments. What happens in the kernel is that already in
      the socket() syscall, we install a proto hook via
      register_prot_hook() if our protocol argument is != 0. Yet, in
      bind() we do almost the same work again: an unregister_prot_hook()
      with an expensive synchronize_net() call (in case the proto was
      != 0 during socket()), plus a follow-up register_prot_hook(),
      this time with a device bound to it in order to limit the
      traffic we get.
      
      When the protocol and the user-supplied device index (== 0) do
      not change from socket() to bind(), we can spare ourselves doing
      the same work twice. Similarly for re-binding to the same device
      and protocol. For these scenarios, this patch decreases
      create/bind latency from ~7447us (sock-bind-2 case) to ~89us
      (sock-bind-1 case).
      
      Alternatively, for the first case, if people care, they should
      simply create their sockets with proto == 0 argument and define
      the protocol during bind() as this saves a call to synchronize_net()
      as well (sock-bind-3 case).
      
      In all other cases, we're tied to user space behaviour we must not
      change, also since a bind() is not strictly required. Thus, we need
      the synchronize_net() to make sure no asynchronous packet processing
      paths still refer to the previous elements of po->prot_hook.
      
      In case of mmap()ed sockets, the workflow that includes bind() is
      socket() -> setsockopt(<ring>) -> bind(). In that case, a pair of
      {__unregister, register}_prot_hook is being called from setsockopt()
      in order to install the new protocol receive handler. Thus, when
      we call bind and can skip a re-hook, we have already previously
      installed the new handler. For fanout, this is handled entirely
      differently, so we should be good.
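      
      A condensed, paraphrased sketch of the rehook avoidance in
      packet_do_bind() (locking and error handling omitted; not the
      literal diff):
      
      	if (po->running &&
      	    po->prot_hook.type == proto &&
      	    po->prot_hook.dev == dev)
      		goto out;			/* nothing changed, skip rehook */
      
      	__unregister_prot_hook(sk, true);	/* implies synchronize_net() */
      	po->num = proto;
      	po->prot_hook.type = proto;
      	po->prot_hook.dev = dev;
      	register_prot_hook(sk);			/* rearm with new proto/device */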
      
      Timings on an i7-3520M machine:
      
        * sock-bind-1:   89 us
        * sock-bind-2: 7447 us
        * sock-bind-3:   75 us
      
      sock-bind-1:
        socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3
        bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=all(0),
                 pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
      
      sock-bind-2:
        socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3
        bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1),
                 pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
      
      sock-bind-3:
        socket(PF_PACKET, SOCK_RAW, 0) = 3
        bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1),
                 pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 01 Jan 2014, 1 commit
  15. 18 Dec 2013, 4 commits
  16. 14 Dec 2013, 1 commit
    • packet: fix using smp_processor_id() in preemptible code · 1cbac010
      Committed by Li Zhong
      This patch fixes the following warning by replacing
      smp_processor_id() with raw_smp_processor_id():
      
      [   11.120893] BUG: using smp_processor_id() in preemptible [00000000] code: arping/3510
      [   11.120913] caller is .packet_sendmsg+0xc14/0xe68
      [   11.120920] CPU: 13 PID: 3510 Comm: arping Not tainted 3.13.0-rc3-next-20131211-dirty #1
      [   11.120926] Call Trace:
      [   11.120932] [c0000001f803f6f0] [c0000000000138dc] .show_stack+0x110/0x25c (unreliable)
      [   11.120942] [c0000001f803f7e0] [c00000000083dd24] .dump_stack+0xa0/0x37c
      [   11.120951] [c0000001f803f870] [c000000000493fd4] .debug_smp_processor_id+0xfc/0x12c
      [   11.120959] [c0000001f803f900] [c0000000007eba78] .packet_sendmsg+0xc14/0xe68
      [   11.120968] [c0000001f803fa80] [c000000000700968] .sock_sendmsg+0xa0/0xe0
      [   11.120975] [c0000001f803fbf0] [c0000000007014d8] .SyS_sendto+0x100/0x148
      [   11.120983] [c0000001f803fd60] [c0000000006fff10] .SyS_socketcall+0x1c4/0x2e8
      [   11.120990] [c0000001f803fe30] [c00000000000a1e4] syscall_exit+0x0/0x9c
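      
      The fix itself is a one-liner in the queue pick helper; since
      the result is only a hint, a stale CPU id is harmless (sketch):
      
      	static u16 packet_pick_tx_queue(struct net_device *dev,
      					struct sk_buff *skb)
      	{
      		/* raw_ variant is legal in preemptible context */
      		return (u16) raw_smp_processor_id() % dev->real_num_tx_queues;
      	}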
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 10 Dec 2013, 2 commits
    • packet: introduce PACKET_QDISC_BYPASS socket option · d346a3fa
      Committed by Daniel Borkmann
      This patch introduces a PACKET_QDISC_BYPASS socket option, that
      allows for using a similar xmit() function as in pktgen instead
      of taking the dev_queue_xmit() path. This can be very useful when
      PF_PACKET applications need to be used in a pktgen-like
      scenario, but with a full, flexible packet payload that needs to
      be provided, for example.
      
      By default, nothing changes in behaviour for normal PF_PACKET TX
      users, so everything stays as is for applications. New users,
      however, can now set PACKET_QDISC_BYPASS if needed, to i)
      prevent their own packets from reentering packet_rcv() and ii)
      push the frame directly to the driver.
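      
      From user space, enabling the bypass is a single setsockopt(2)
      call:
      
      	int one = 1;
      
      	/* opt out of the qdisc layer for this PF_PACKET socket */
      	setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS,
      		   &one, sizeof(one));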
      
      In doing so we can increase pps (here 64 byte packets) for
      PF_PACKET a bit:
      
        # CPUs -- QDISC_BYPASS   -- qdisc path -- qdisc path[**]
        1 CPU  ==  1,509,628 pps --  1,208,708 --  1,247,436
        2 CPUs ==  3,198,659 pps --  2,536,012 --  1,605,779
        3 CPUs ==  4,787,992 pps --  3,788,740 --  1,735,610
        4 CPUs ==  6,173,956 pps --  4,907,799 --  1,909,114
        5 CPUs ==  7,495,676 pps --  5,956,499 --  2,014,422
        6 CPUs ==  9,001,496 pps --  7,145,064 --  2,155,261
        7 CPUs == 10,229,776 pps --  8,190,596 --  2,220,619
        8 CPUs == 11,040,732 pps --  9,188,544 --  2,241,879
        9 CPUs == 12,009,076 pps -- 10,275,936 --  2,068,447
       10 CPUs == 11,380,052 pps -- 11,265,337 --  1,578,689
       11 CPUs == 11,672,676 pps -- 11,845,344 --  1,297,412
       [...]
       20 CPUs == 11,363,192 pps -- 11,014,933 --  1,245,081
      
       [**]: qdisc path with packet_rcv(), which is probably how most
             people use it (hopefully not anymore if not needed)
      
      The test was done using a modified trafgen, sending a simple
      static 64 bytes packet, on all CPUs.  The trick in the fast
      "qdisc path" case, is to avoid reentering packet_rcv() by
      setting the RAW socket protocol to zero, like:
      socket(PF_PACKET, SOCK_RAW, 0);
      
      Tradeoffs are documented as well in this patch, clearly, if
      queues are busy, we will drop more packets, tc disciplines are
      ignored, and these packets are not visible to taps anymore. For
      a pktgen like scenario, we argue that this is acceptable.
      
      The pointer to the xmit function has been placed in a packet
      socket structure hole between cached_dev and prot_hook, which is
      hot anyway as we're working on cached_dev in each send path.
      
      Done in joint work together with Jesper Dangaard Brouer.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: fix send path when running with proto == 0 · 66e56cd4
      Committed by Daniel Borkmann
      Commit e40526cb introduced a cached dev pointer, that gets
      hooked into register_prot_hook(), __unregister_prot_hook() to
      update the device used for the send path.
      
      We need to fix this up, as otherwise this will not work with
      sockets created with protocol = 0 and with sll_protocol = 0
      passed via sockaddr_ll when doing the bind.
      
      So instead, assign the pointer directly. The compiler can inline
      these helper functions automagically.
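      
      For illustration, the helpers in question reduce to plain RCU
      pointer stores (sketch along the lines of the patch):
      
      	static void packet_cached_dev_assign(struct packet_sock *po,
      					     struct net_device *dev)
      	{
      		rcu_assign_pointer(po->cached_dev, dev);
      	}
      
      	static void packet_cached_dev_reset(struct packet_sock *po)
      	{
      		RCU_INIT_POINTER(po->cached_dev, NULL);
      	}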
      
      While at it, also mark the cached dev fast-path as likely(), and
      document this variant of socket creation, as it seems it is not
      widely used (it seems not even the author of TX_RING was aware
      of it in his reference example [1]). Tested with the reproducer
      from e40526cb.
      
       [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap#Example
      
      Fixes: e40526cb ("packet: fix use after free race in send path when dev is released")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Tested-by: Salam Noureddine <noureddine@aristanetworks.com>
      Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 07 Dec 2013, 1 commit
  19. 30 Nov 2013, 1 commit
  20. 22 Nov 2013, 1 commit
    • packet: fix use after free race in send path when dev is released · e40526cb
      Committed by Daniel Borkmann
      Salam reported a use after free bug in PF_PACKET that occurs when
      we're sending out frames on a socket bound device and suddenly the
      net device is being unregistered. It appears that commit 827d9780
      introduced a possible race condition between {t,}packet_snd() and
      packet_notifier(). In the case of a bound socket, packet_notifier()
      can drop the last reference to the net_device and {t,}packet_snd()
      might end up suddenly sending a packet over a freed net_device.
      
      To avoid reverting 827d9780 and thus introducing a performance
      regression compared to the current state of things, we decided to
      hold a cached RCU protected pointer to the net device and maintain
      it on write side via bind spin_lock protected register_prot_hook()
      and __unregister_prot_hook() calls.
      
      In the {t,}packet_snd() path, we access this pointer under
      rcu_read_lock through packet_cached_dev_get(), which takes a
      reference to the device to prevent it from being freed through
      packet_notifier() while we're in the send path. This is okay to
      do as dev_put()/dev_hold() are per-cpu counters, so this should
      not be a performance issue. Also, the code simplifies a bit as
      we don't need need_rls_dev anymore.
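      
      A sketch of the read-side helper described above:
      
      	static struct net_device *packet_cached_dev_get(struct packet_sock *po)
      	{
      		struct net_device *dev;
      
      		rcu_read_lock();
      		dev = rcu_dereference(po->cached_dev);
      		if (likely(dev))
      			dev_hold(dev);	/* pin the device for the send path */
      		rcu_read_unlock();
      
      		return dev;
      	}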
      
      Fixes: 827d9780 ("af-packet: Use existing netdev reference for bound sockets.")
      Reported-by: Salam Noureddine <noureddine@aristanetworks.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
      Cc: Ben Greear <greearb@candelatech.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 21 Nov 2013, 1 commit
    • net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426
      Committed by Hannes Frederic Sowa
      This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
      set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
      to return msg_name to the user.
      
      This prevents numerous uninitialized memory leaks we had in the
      recvmsg handlers and makes it harder for new code to accidentally leak
      uninitialized memory.
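      
      Sketched, the contract for a handler that returns an address now
      looks like this (hypothetical PF_PACKET-style handler; the
      sockaddr type varies per family):
      
      	/* handler fills msg_name only when it has an address */
      	if (msg->msg_name) {
      		DECLARE_SOCKADDR(struct sockaddr_ll *, sll, msg->msg_name);
      
      		/* ... fill *sll with the peer address ... */
      		msg->msg_namelen = sizeof(*sll);
      	}
      	/* otherwise msg_namelen stays 0, as preset by the core */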
      
      Optimize for the case recvfrom is called with NULL as address. We don't
      need to copy the address at all, so set it to NULL before invoking the
      recvmsg handler. We can do so because all the recvmsg handlers
      must cope with the case that a plain read() is called on them;
      read() also sets msg_name to NULL.
      
      Also document these changes in include/linux/net.h as suggested by David
      Miller.
      
      Changes since RFC:
      
      Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      affect sendto as it would bail out earlier while trying to copy-in the
      address. It also more naturally reflects the logic by the callers of
      verify_iovec.
      
      With this change in place I could remove "
      if (!uaddr || msg_sys->msg_namelen == 0)
      	msg->msg_name = NULL
      ".
      
      This change does not alter the user visible error logic as we ignore
      msg_namelen as long as msg_name is NULL.
      
      Also remove two unnecessary curly brackets in ___sys_recvmsg and change
      comments to netdev style.
      
      Cc: David Miller <davem@davemloft.net>
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 30 Aug 2013, 2 commits
  23. 21 Aug 2013, 1 commit
  24. 10 Aug 2013, 1 commit
    • net: attempt high order allocations in sock_alloc_send_pskb() · 28d64271
      Committed by Eric Dumazet
      Adding paged frags skbs to af_unix sockets introduced a performance
      regression on large sends because of additional page allocations, even
      if each skb could carry at least 100% more payload than before.
      
      We can instruct sock_alloc_send_pskb() to attempt high order
      allocations.
      
      Most of the time, it does a single page allocation instead of 8.
      
      I added an additional parameter to sock_alloc_send_pskb() to let
      other users opt in to this new feature in follow-up patches.
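      
      The resulting signature gains a max_page_order argument (sketch;
      callers that do not care simply pass 0):
      
      	struct sk_buff *sock_alloc_send_pskb(struct sock *sk,
      					     unsigned long header_len,
      					     unsigned long data_len,
      					     int noblock, int *errcode,
      					     int max_page_order);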
      
      Tested:
      
      Before patch :
      
      $ netperf -t STREAM_STREAM
      STREAM STREAM TEST
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       2304  212992  212992    10.00    46861.15
      
      After patch :
      
      $ netperf -t STREAM_STREAM
      STREAM STREAM TEST
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       2304  212992  212992    10.00    57981.11
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 08 Aug 2013, 1 commit
  26. 03 Aug 2013, 3 commits
  27. 23 Jul 2013, 1 commit