1. 02 8月, 2013 1 次提交
  2. 11 7月, 2013 1 次提交
  3. 27 6月, 2013 1 次提交
    • N
      net: fix kernel deadlock with interface rename and netdev name retrieval. · 5dbe7c17
      Nicolas Schichan 提交于
      When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
      rename of a network interface, it can end up waiting for a workqueue
      to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
      SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
      the fact that read_secklock_begin() will spin forever waiting for the
      writer process (the one doing the interface rename) to update the
      devnet_rename_seq sequence.
      
      This patch fixes the problem by adding a helper (netdev_get_name())
      and using it in the code handling the SIOCGIFNAME ioctl and
      SO_BINDTODEVICE setsockopt.
      
      The netdev_get_name() helper uses raw_seqcount_begin() to avoid
      spinning forever, waiting for devnet_rename_seq->sequence to become
      even. cond_resched() is used in the contended case, before retrying
      the access to give the writer process a chance to finish.
      
      The use of raw_seqcount_begin() will incur some unneeded work in the
      reader process in the contended case, but this is better than
      deadlocking the system.
      Signed-off-by: NNicolas Schichan <nschichan@freebox.fr>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5dbe7c17
  4. 14 6月, 2013 2 次提交
    • R
      net/core: Add VF link state control · 1d8faf48
      Rony Efraim 提交于
      Add netlink directives and ndo entry to allow for controling
      VF link, which can be in one of three states:
      
      Auto - VF link state reflects the PF link state (default)
      
      Up - VF link state is up, traffic from VF to VF works even if
      the actual PF link is down
      
      Down - VF link state is down, no traffic from/to this VF, can be of
      use while configuring the VF
      Signed-off-by: NRony Efraim <ronye@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d8faf48
    • W
      net-rps: fixes for rps flow limit · 5f121b9a
      Willem de Bruijn 提交于
      Caught by sparse:
      - __rcu: missing annotation to sd->flow_limit
      - __user: direct access in cpumask_scnprintf
      
      Also
      - add endline character when printing bitmap if room in buffer
      - avoid bucket overflow by reducing FLOW_LIMIT_HISTORY
      
      The last item warrants some explanation. The hashtable buckets are
      subject to overflow if FLOW_LIMIT_HISTORY is larger than or equal
      to bucket size, since all packets may end up in a single bucket. The
      current (rather arbitrary) history value of 256 happens to match the
      buffer size (u8).
      
      As a result, with a single flow, the first 128 packets are accepted
      (correct), the second 128 packets dropped (correct) and then the
      history[] array has filled, so that each subsequent new packet
      causes an increment in the bucket for new_flow plus a decrement
      for old_flow: a steady state.
      
      This is fine if packets are dropped, as the steady state goes away
      as soon as a mix of traffic reappears. But, because the 256th packet
      overflowed the bucket to 0: no packets are dropped.
      
      Instead of explicitly adding an overflow check, this patch changes
      FLOW_LIMIT_HISTORY to never be able to overflow a single bucket.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      (first item)
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f121b9a
  5. 12 6月, 2013 1 次提交
  6. 11 6月, 2013 2 次提交
  7. 29 5月, 2013 3 次提交
  8. 28 5月, 2013 1 次提交
    • S
      MPLS: Add limited GSO support · 0d89d203
      Simon Horman 提交于
      In the case where a non-MPLS packet is received and an MPLS stack is
      added it may well be the case that the original skb is GSO but the
      NIC used for transmit does not support GSO of MPLS packets.
      
      The aim of this code is to provide GSO in software for MPLS packets
      whose skbs are GSO.
      
      SKB Usage:
      
      When an implementation adds an MPLS stack to a non-MPLS packet it should do
      the following to skb metadata:
      
      * Set skb->inner_protocol to the old non-MPLS ethertype of the packet.
        skb->inner_protocol is added by this patch.
      
      * Set skb->protocol to the new MPLS ethertype of the packet.
      
      * Set skb->network_header to correspond to the
        end of the L3 header, including the MPLS label stack.
      
      I have posted a patch, "[PATCH v3.29] datapath: Add basic MPLS support to
      kernel" which adds MPLS support to the kernel datapath of Open vSwtich.
      That patch sets the above requirements in datapath/actions.c:push_mpls()
      and was used to exercise this code.  The datapath patch is against the Open
      vSwtich tree but it is intended that it be added to the Open vSwtich code
      present in the mainline Linux kernel at some point.
      
      Features:
      
      I believe that the approach that I have taken is at least partially
      consistent with the handling of other protocols.  Jesse, I understand that
      you have some ideas here.  I am more than happy to change my implementation.
      
      This patch adds dev->mpls_features which may be used by devices
      to advertise features supported for MPLS packets.
      
      A new NETIF_F_MPLS_GSO feature is added for devices which support
      hardware MPLS GSO offload.  Currently no devices support this
      and MPLS GSO always falls back to software.
      
      Alternate Implementation:
      
      One possible alternate implementation is to teach netif_skb_features()
      and skb_network_protocol() about MPLS, in a similar way to their
      understanding of VLANs. I believe this would avoid the need
      for net/mpls/mpls_gso.c and in particular the calls to
      __skb_push() and __skb_push() in mpls_gso_segment().
      
      I have decided on the implementation in this patch as it should
      not introduce any overhead in the case where mpls_gso is not compiled
      into the kernel or inserted as a module.
      
      MPLS GSO suggested by Jesse Gross.
      Based in part on "v4 GRE: Add TCP segmentation offload for GRE"
      by Pravin B Shelar.
      
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d89d203
  9. 26 5月, 2013 1 次提交
  10. 21 5月, 2013 1 次提交
    • W
      rps: selective flow shedding during softnet overflow · 99bbc707
      Willem de Bruijn 提交于
      A cpu executing the network receive path sheds packets when its input
      queue grows to netdev_max_backlog. A single high rate flow (such as a
      spoofed source DoS) can exceed a single cpu processing rate and will
      degrade throughput of other flows hashed onto the same cpu.
      
      This patch adds a more fine grained hashtable. If the netdev backlog
      is above a threshold, IRQ cpus track the ratio of total traffic of
      each flow (using 4096 buckets, configurable). The ratio is measured
      by counting the number of packets per flow over the last 256 packets
      from the source cpu. Any flow that occupies a large fraction of this
      (set at 50%) will see packet drop while above the threshold.
      
      Tested:
      Setup is a muli-threaded UDP echo server with network rx IRQ on cpu0,
      kernel receive (RPS) on cpu0 and application threads on cpus 2--7
      each handling 20k req/s. Throughput halves when hit with a 400 kpps
      antagonist storm. With this patch applied, antagonist overload is
      dropped and the server processes its complete load.
      
      The patch is effective when kernel receive processing is the
      bottleneck. The above RPS scenario is a extreme, but the same is
      reached with RFS and sufficient kernel processing (iptables, packet
      socket tap, ..).
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      99bbc707
  11. 17 5月, 2013 1 次提交
    • E
      bonding: allow TSO being set on bonding master · b0ce3508
      Eric Dumazet 提交于
      In some situations, we need to disable TSO on bonding slaves.
      
      bonding device automatically unset TSO in bond_fix_features(), and
      performance is not good because :
      
      1) We consume more cpu cycles.
      
      2) GSO segmentation has some bugs leading to out of order TCP packets
      if this segmentation is done before virtual device. This particular
      problem will be addressed in a separate patch.
      
      This patch allows TSO being set/unset on the bonding master,
      so that GSO segmentation is done after bonding layer.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Michał Mirosław <mirqus@gmail.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0ce3508
  12. 06 5月, 2013 1 次提交
  13. 20 4月, 2013 2 次提交
  14. 16 4月, 2013 1 次提交
    • V
      net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api · 4cd729b0
      Vlad Yasevich 提交于
      The current implementation of dev_uc_sync/unsync() assumes that there is
      a strict 1-to-1 relationship between the source and destination of the sync.
      In other words, once an address has been synced to a destination device, it
      will not be synced to any other device through the sync API.
      However, there are some virtual devices that aggreate a number of lower
      devices and need to sync addresses to all of them.  The current
      API falls short there.
      
      This patch introduces a new dev_uc_sync_multiple() api that can be called
      in the above circumstances and allows sync to work for every invocation.
      
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cd729b0
  15. 05 4月, 2013 1 次提交
    • V
      net: count hw_addr syncs so that unsync works properly. · 4543fbef
      Vlad Yasevich 提交于
      A few drivers use dev_uc_sync/unsync to synchronize the
      address lists from master down to slave/lower devices.  In
      some cases (bond/team) a single address list is synched down
      to multiple devices.  At the time of unsync, we have a leak
      in these lower devices, because "synced" is treated as a
      boolean and the address will not be unsynced for anything after
      the first device/call.
      
      Treat "synced" as a count (same as refcount) and allow all
      unsync calls to work.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4543fbef
  16. 31 3月, 2013 1 次提交
    • E
      net: reorder some fields of net_device · 4c3d5e7b
      Eric Dumazet 提交于
      As time passed, some fields were added in net_device, and not
      at sensible offsets.
      
      Lets reorder some fields to reduce number of cache lines in RX path.
      Fields not used in data path should be moved out of this critical cache
      line.
      
      In particular, move broadcast[] to the end of the rx section,
      as it is less used, and ethernet uses only the beginning of the 32bytes
      field.
      
      Before patch :
      
      offsetof(struct net_device,dev_addr)=0x258
      offsetof(struct net_device,rx_handler)=0x2b8
      offsetof(struct net_device,ingress_queue)=0x2c8
      offsetof(struct net_device,broadcast)=0x278
      
      After :
      
      offsetof(struct net_device,dev_addr)=0x280
      offsetof(struct net_device,rx_handler)=0x298
      offsetof(struct net_device,ingress_queue)=0x2a8
      offsetof(struct net_device,broadcast)=0x2b0
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c3d5e7b
  17. 28 3月, 2013 2 次提交
  18. 15 3月, 2013 2 次提交
  19. 10 3月, 2013 1 次提交
  20. 06 3月, 2013 1 次提交
  21. 20 2月, 2013 1 次提交
  22. 19 2月, 2013 1 次提交
  23. 16 2月, 2013 1 次提交
  24. 14 2月, 2013 3 次提交
  25. 11 2月, 2013 1 次提交
  26. 07 2月, 2013 1 次提交
    • C
      net: adjust skb_gso_segment() for calling in rx path · 12b0004d
      Cong Wang 提交于
      skb_gso_segment() is almost always called in tx path,
      except for openvswitch. It calls this function when
      it receives the packet and tries to queue it to user-space.
      In this special case, the ->ip_summed check inside
      skb_gso_segment() is no longer true, as ->ip_summed value
      has different meanings on rx path.
      
      This patch adjusts skb_gso_segment() so that we can at least
      avoid such warnings on checksum.
      
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12b0004d
  27. 29 1月, 2013 1 次提交
    • C
      netpoll: add RCU annotation to npinfo field · 5fbee843
      Cong Wang 提交于
      dev->npinfo is protected by RCU.
      
      This fixes the following sparse warnings:
      
      net/core/netpoll.c:177:48: error: incompatible types in comparison expression (different address spaces)
      net/core/netpoll.c:200:35: error: incompatible types in comparison expression (different address spaces)
      net/core/netpoll.c:221:35: error: incompatible types in comparison expression (different address spaces)
      net/core/netpoll.c:327:18: error: incompatible types in comparison expression (different address spaces)
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fbee843
  28. 12 1月, 2013 1 次提交
  29. 11 1月, 2013 3 次提交