1. 20 9月, 2012 2 次提交
  2. 09 9月, 2012 1 次提交
    • C
      net: small bug on rxhash calculation · 68622342
      Chema Gonzalez 提交于
      In the current rxhash calculation function, while the
      sorting of the ports/addrs is coherent (you get the
      same rxhash for packets sharing the same 4-tuple, in
      both directions), ports and addrs are sorted
      independently. This implies packets from a connection
      between the same addresses but crossed ports hash to
      the same rxhash.
      
      For example, traffic between A=S:l and B=L:s is hashed
      (in both directions) from {L, S, {s, l}}. The same
      rxhash is obtained for packets between C=S:s and D=L:l.
      
      This patch ensures that you either swap both addrs and ports,
      or you swap none. Traffic between A and B, and traffic
      between C and D, get their rxhash from different sources
      ({L, S, {l, s}} for A<->B, and {L, S, {s, l}} for C<->D)
      
      The patch is co-written with Eric Dumazet <edumazet@google.com>
      Signed-off-by: NChema Gonzalez <chema@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68622342
  3. 01 9月, 2012 1 次提交
    • R
      net: fix documentation of skb_needs_linearize(). · d1a53dfd
      Rami Rosen 提交于
      skb_needs_linearize() does not check highmem DMA as it does not call
      illegal_highdma() anymore, so there is no need to mention highmem DMA here.
      
      (Indeed, ~NETIF_F_SG flag, which is checked in skb_needs_linearize(), can
      be set when illegal_highdma() returns true, and we are assured that
      illegal_highdma() is invoked prior to skb_needs_linearize() as
      skb_needs_linearize() is a static method called only once.
      But ~NETIF_F_SG can be set not only there in this same invocation path.
      It can also be set when can_checksum_protocol() returns false).
      
      see commit 02932ce9,
      Convert skb_need_linearize() to use precomputed features.
      Signed-off-by: NRami Rosen <rosenr@marvell.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1a53dfd
  4. 31 8月, 2012 1 次提交
    • G
      net: dev: fix the incorrect hold of net namespace's lo device · 6549dd43
      Gao feng 提交于
      When moving a net device from one net namespace to another
      net namespace,dev_change_net_namespace calls NETDEV_DOWN
      event,so the original net namespace's dst entries which
      beloned to this net device will be put into dst_garbage
      list.
      
      then dev_change_net_namespace will set this net device's
      net to the new net namespace.
      
      If we unregister this net device's driver, this will trigger
      the NETDEV_UNREGISTER_FINAL event, dst_ifdown will be called,
      and get this net device's dst entries from dst_garbage list,
      put these entries' dev to the new net namespace's lo device.
      
      It's not what we want,actually we need these dst entries hold
      the original net namespace's lo device,this incorrect device
      holding will trigger emg message like below.
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      so we should call NETDEV_UNREGISTER_FINAL event in
      dev_change_net_namespace too,in order to make sure dst entries
      already in the dst_garbage list, we need rcu_barrier before we
      call NETDEV_UNREGISTER_FINAL event.
      
      With help form Eric Dumazet.
      Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6549dd43
  5. 25 8月, 2012 1 次提交
    • B
      net: Set device operstate at registration time · 8f4cccbb
      Ben Hutchings 提交于
      The operstate of a device is initially IF_OPER_UNKNOWN and is updated
      asynchronously by linkwatch after each change of carrier state
      reported by the driver.  The default carrier state of a net device is
      on, and this will never be changed on drivers that do not support
      carrier detection, thus the operstate remains IF_OPER_UNKNOWN.
      
      For devices that do support carrier detection, the driver must set the
      carrier state to off initially, then poll the hardware state when the
      device is opened.  However, we must not activate linkwatch for a
      unregistered device, and commit b4730016 ('net: Do not fire linkwatch
      events until the device is registered.') ensured that we don't.  But
      this means that the operstate for many devices that support carrier
      detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.
      
      The same issue exists with the dormant state.
      
      The proper initialisation sequence, avoiding a race with opening of
      the device, is:
      
              rtnl_lock();
              rc = register_netdevice(dev);
              if (rc)
                      goto out_unlock;
              netif_carrier_off(dev); /* or netif_dormant_on(dev) */
              rtnl_unlock();
      
      but it seems silly that this should have to be repeated in so many
      drivers.  Further, the operstate seen immediately after opening the
      device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
      linkwatch.
      
      Commit 22604c86 ('net: Fix for initial link state in 2.6.28') attempted
      to fix this by setting the operstate synchronously, but it was
      reverted as it could lead to deadlock.
      
      This initialises the operstate synchronously at registration time
      only.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f4cccbb
  6. 24 8月, 2012 1 次提交
  7. 23 8月, 2012 1 次提交
    • E
      net: remove delay at device dismantle · 0115e8e3
      Eric Dumazet 提交于
      I noticed extra one second delay in device dismantle, tracked down to
      a call to dst_dev_event() while some call_rcu() are still in RCU queues.
      
      These call_rcu() were posted by rt_free(struct rtable *rt) calls.
      
      We then wait a little (but one second) in netdev_wait_allrefs() before
      kicking again NETDEV_UNREGISTER.
      
      As the call_rcu() are now completed, dst_dev_event() can do the needed
      device swap on busy dst.
      
      To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
      after a rcu_barrier(), but outside of RTNL lock.
      
      Use NETDEV_UNREGISTER_FINAL with care !
      
      Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL
      
      Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
      IP cache removal.
      
      With help from Gao feng
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0115e8e3
  8. 20 8月, 2012 2 次提交
  9. 15 8月, 2012 3 次提交
  10. 10 8月, 2012 2 次提交
  11. 09 8月, 2012 1 次提交
  12. 02 8月, 2012 1 次提交
    • B
      net: Allow driver to limit number of GSO segments per skb · 30b678d8
      Ben Hutchings 提交于
      A peer (or local user) may cause TCP to use a nominal MSS of as little
      as 88 (actual MSS of 76 with timestamps).  Given that we have a
      sufficiently prodigious local sender and the peer ACKs quickly enough,
      it is nevertheless possible to grow the window for such a connection
      to the point that we will try to send just under 64K at once.  This
      results in a single skb that expands to 861 segments.
      
      In some drivers with TSO support, such an skb will require hundreds of
      DMA descriptors; a substantial fraction of a TX ring or even more than
      a full ring.  The TX queue selected for the skb may stall and trigger
      the TX watchdog repeatedly (since the problem skb will be retried
      after the TX reset).  This particularly affects sfc, for which the
      issue is designated as CVE-2012-3412.
      
      Therefore:
      1. Add the field net_device::gso_max_segs holding the device-specific
         limit.
      2. In netif_skb_features(), if the number of segments is too high then
         mask out GSO features to force fall back to software GSO.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30b678d8
  13. 01 8月, 2012 1 次提交
    • M
      netvm: set PF_MEMALLOC as appropriate during SKB processing · b4b9e355
      Mel Gorman 提交于
      In order to make sure pfmemalloc packets receive all memory needed to
      proceed, ensure processing of pfmemalloc SKBs happens under PF_MEMALLOC.
      This is limited to a subset of protocols that are expected to be used for
      writing to swap.  Taps are not allowed to use PF_MEMALLOC as these are
      expected to communicate with userspace processes which could be paged out.
      
      [a.p.zijlstra@chello.nl: Ideas taken from various patches]
      [jslaby@suse.cz: Lock imbalance fix]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4b9e355
  14. 24 7月, 2012 1 次提交
  15. 23 7月, 2012 1 次提交
  16. 19 7月, 2012 1 次提交
  17. 15 7月, 2012 1 次提交
  18. 11 7月, 2012 2 次提交
  19. 10 7月, 2012 1 次提交
  20. 05 7月, 2012 1 次提交
  21. 29 6月, 2012 1 次提交
  22. 16 6月, 2012 1 次提交
  23. 13 6月, 2012 1 次提交
    • M
      net-next: add dev_loopback_xmit() to avoid duplicate code · 95603e22
      Michel Machado 提交于
      Add dev_loopback_xmit() in order to deduplicate functions
      ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and
      ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c).
      
      I was about to reinvent the wheel when I noticed that
      ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what I
      need and are not IP-only functions, but they were not available to reuse
      elsewhere.
      
      ip6_dev_loopback_xmit() does not have line "skb_dst_force(skb);", but I
      understand that this is harmless, and should be in dev_loopback_xmit().
      Signed-off-by: NMichel Machado <michel@digirati.com.br>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jpirko@redhat.com>
      CC: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95603e22
  24. 19 5月, 2012 1 次提交
  25. 16 5月, 2012 2 次提交
  26. 11 5月, 2012 1 次提交
  27. 01 5月, 2012 1 次提交
    • E
      net: make GRO aware of skb->head_frag · d7e8883c
      Eric Dumazet 提交于
      GRO can check if skb to be merged has its skb->head mapped to a page
      fragment, instead of a kmalloc() area.
      
      We 'upgrade' skb->head as a fragment in itself
      
      This avoids the frag_list fallback, and permits to build true GRO skb
      (one sk_buff and up to 16 fragments), using less memory.
      
      This reduces number of cache misses when user makes its copy, since a
      single sk_buff is fetched.
      
      This is a followup of patch "net: allow skb->head to be a page fragment"
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Matt Carlson <mcarlson@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7e8883c
  28. 20 4月, 2012 1 次提交
  29. 16 4月, 2012 1 次提交
  30. 13 4月, 2012 1 次提交
  31. 04 4月, 2012 1 次提交
  32. 29 3月, 2012 1 次提交
  33. 28 3月, 2012 1 次提交
    • B
      net/core: dev_forward_skb() should clear skb_iif · 3b9785c6
      Benjamin LaHaise 提交于
      While investigating another bug, I found that the code on the incoming path
      in __netif_receive_skb will only set skb->skb_iif if it is already 0.  When
      dev_forward_skb() is used in the case of interfaces like veth, skb_iif may
      already have been set.  Making dev_forward_skb() cause the packet to look
      like a newly received packet would seem to the the correct behaviour here,
      as otherwise the wrong incoming interface can be reported for such a packet.
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b9785c6