1. 29 May 2013, 1 commit
  2. 28 May 2013, 2 commits
    • netpoll: remove return value from netpoll_rx_disable() · da6e378b
      dingtianhong committed
      netpoll_rx_disable() will always return 0; the return value serves no
      purpose, so remove it and drop the error handling at its call sites in
      __dev_open() and __dev_close().
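
      A hedged sketch of the resulting interface change (call-site details
      are illustrative, not copied from the patch):

          /* before: callers checked a value that was always 0 */
          int  netpoll_rx_disable(struct net_device *dev);

          /* after: void, so __dev_open()/__dev_close() simply call */
          netpoll_rx_disable(dev);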
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MPLS: Add limited GSO support · 0d89d203
      Simon Horman committed
      In the case where a non-MPLS packet is received and an MPLS stack is
      added, it may well be the case that the original skb is GSO but the
      NIC used for transmit does not support GSO of MPLS packets.
      
      The aim of this code is to provide GSO in software for MPLS packets
      whose skbs are GSO.
      
      SKB Usage:
      
      When an implementation adds an MPLS stack to a non-MPLS packet it should
      do the following to skb metadata (a sketch follows the list):
      
      * Set skb->inner_protocol to the old non-MPLS ethertype of the packet.
        skb->inner_protocol is added by this patch.
      
      * Set skb->protocol to the new MPLS ethertype of the packet.
      
      * Set skb->network_header to correspond to the
        end of the L3 header, including the MPLS label stack.
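
      A hedged sketch of these updates (illustrative, not the author's code):

          skb->inner_protocol = skb->protocol;   /* old non-MPLS ethertype */
          skb->protocol = htons(ETH_P_MPLS_UC);  /* new MPLS ethertype     */
          /* push the label stack, then set skb->network_header per the
           * third rule above, e.g. via skb_set_network_header() */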
      
      I have posted a patch, "[PATCH v3.29] datapath: Add basic MPLS support to
      kernel" which adds MPLS support to the kernel datapath of Open vSwitch.
      That patch sets the above requirements in datapath/actions.c:push_mpls()
      and was used to exercise this code.  The datapath patch is against the Open
      vSwitch tree but it is intended that it be added to the Open vSwitch code
      present in the mainline Linux kernel at some point.
      
      Features:
      
      I believe that the approach that I have taken is at least partially
      consistent with the handling of other protocols.  Jesse, I understand that
      you have some ideas here.  I am more than happy to change my implementation.
      
      This patch adds dev->mpls_features which may be used by devices
      to advertise features supported for MPLS packets.
      
      A new NETIF_F_MPLS_GSO feature is added for devices which support
      hardware MPLS GSO offload.  Currently no devices support this
      and MPLS GSO always falls back to software.
      
      Alternate Implementation:
      
      One possible alternate implementation is to teach netif_skb_features()
      and skb_network_protocol() about MPLS, in a similar way to their
      understanding of VLANs. I believe this would avoid the need
      for net/mpls/mpls_gso.c and in particular the calls to
      __skb_push() and __skb_pull() in mpls_gso_segment().
      
      I have decided on the implementation in this patch as it should
      not introduce any overhead in the case where mpls_gso is not compiled
      into the kernel or inserted as a module.
      
      MPLS GSO suggested by Jesse Gross.
      Based in part on "v4 GRE: Add TCP segmentation offload for GRE"
      by Pravin B Shelar.
      
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 26 May 2013, 1 commit
  4. 23 May 2013, 1 commit
    • net: Loosen constraints for recalculating checksum in skb_segment() · 1cdbcb79
      Simon Horman committed
      This is a generic solution to resolve a specific problem that I have observed.
      
      If the encapsulation of an skb changes then the ability to offload checksums
      may also change. In particular it may be necessary to perform checksumming
      in software.
      
      An example of such a case is where a non-GRE packet is received but
      is to be encapsulated and transmitted as GRE.
      
      Another example relates to my proposed support for packets
      that are non-MPLS when received but MPLS when transmitted.
      
      The cost of this change is that the value of the csum variable may be
      checked when it previously was not. In the case where the csum variable is
      true this is pure overhead. In the case where the csum variable is false it
      leads to software checksumming, which I believe also leads to correct
      checksums in transmitted packets for the cases described above.
      
      Further analysis:
      
      This patch relies on the return value of can_checksum_protocol()
      being correct and in turn the return value of skb_network_protocol(),
      used to provide the protocol parameter of can_checksum_protocol(),
      being correct. It also relies on the features passed to skb_segment()
      and in turn to can_checksum_protocol() being correct.
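
      In code terms, the loosened condition amounts to something like the
      following hedged sketch (3.10-era helper signatures assumed):

          __be16 proto = skb_network_protocol(skb);

          if (!can_checksum_protocol(features, proto)) {
                  /* device cannot checksum the (possibly new) encapsulation:
                   * fall back to computing the checksum in software */
                  nskb->ip_summed = CHECKSUM_NONE;
                  nskb->csum = skb_checksum(nskb, 0, nskb->len, 0);
          }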
      
      I believe that this problem has not been observed for VLANs because it
      appears that almost all drivers, the exception being xgbe, set
      vlan_features such that the checksum offload support for VLAN packets
      is greater than or equal to that of non-VLAN packets.
      
      I wonder if the code in xgbe may be an oversight and the hardware does
      support checksumming of VLAN packets.  If so it may be worth updating the
      vlan_features of the driver as this patch will force such checksums to be
      performed in software rather than hardware.
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 21 May 2013, 1 commit
    • rps: selective flow shedding during softnet overflow · 99bbc707
      Willem de Bruijn committed
      A cpu executing the network receive path sheds packets when its input
      queue grows to netdev_max_backlog. A single high rate flow (such as a
      spoofed source DoS) can exceed a single cpu's processing rate and will
      degrade throughput of other flows hashed onto the same cpu.
      
      This patch adds a finer-grained hashtable. If the netdev backlog
      is above a threshold, IRQ cpus track the ratio of total traffic of
      each flow (using 4096 buckets, configurable). The ratio is measured
      by counting the number of packets per flow over the last 256 packets
      from the source cpu. Any flow that occupies a large fraction of this
      (set at 50%) will see packet drop while above the threshold.
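
      A hedged sketch of that sliding-window accounting (identifiers are
      illustrative, not necessarily the patch's):

          #define FLOW_HISTORY 256           /* window of recent packets   */
          #define FLOW_BUCKETS 4096          /* flow buckets, configurable */

          u32 new = skb_get_rxhash(skb) & (FLOW_BUCKETS - 1);
          u32 old = history[head];

          history[head] = new;               /* overwrite oldest sample */
          head = (head + 1) & (FLOW_HISTORY - 1);
          if (buckets[old])
                  buckets[old]--;            /* expire the oldest packet */

          /* shed if this flow exceeds half of the recent window */
          if (buckets[new]++ > FLOW_HISTORY / 2)
                  return true;               /* drop this packet */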
      
      Tested:
      Setup is a multi-threaded UDP echo server with network rx IRQ on cpu0,
      kernel receive (RPS) on cpu0 and application threads on cpus 2--7
      each handling 20k req/s. Throughput halves when hit with a 400 kpps
      antagonist storm. With this patch applied, antagonist overload is
      dropped and the server processes its complete load.
      
      The patch is effective when kernel receive processing is the
      bottleneck. The above RPS scenario is an extreme case, but the same point
      is reached with RFS and sufficient kernel processing (iptables, packet
      socket tap, ...).
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 20 May 2013, 1 commit
  7. 18 May 2013, 1 commit
  8. 12 May 2013, 1 commit
    • ipv6: do not clear pinet6 field · f77d6021
      Eric Dumazet committed
      We have seen multiple NULL dereferences in __inet6_lookup_established()
      
      After analysis, I found that inet6_sk() could be NULL while the
      check for sk_family == AF_INET6 was true.
      
      The bug was added in linux-2.6.29, when RCU lookups were introduced in the
      UDP and TCP stacks.
      
      Once an IPv6 socket using SLAB_DESTROY_BY_RCU is inserted in a hash
      table, we can no longer clear the pinet6 field.
      
      This patch extends the logic used in commit fcbdf09d
      ("net: fix nulls list corruptions in sk_prot_alloc").
      
      The TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method
      to make sure we do not clear the pinet6 field.
      
      At socket clone time, we do not really care, as cloning the parent's (non-NULL)
      pinet6 does not add a fatal race.
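
      A hedged sketch of such a .clear_sk() method for the TCP case, zeroing
      everything except pinet6:

          static void tcp_v6_clear_sk(struct sock *sk, int size)
          {
                  struct inet_sock *inet = inet_sk(sk);

                  /* leave pinet6 intact for concurrent RCU lookups */
                  sk_prot_clear_nulls(sk, offsetof(struct inet_sock, pinet6));

                  size -= offsetof(struct inet_sock, pinet6) + sizeof(inet->pinet6);
                  memset(&inet->pinet6 + 1, 0, size);
          }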
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 09 May 2013, 1 commit
  10. 06 May 2013, 2 commits
  11. 03 May 2013, 2 commits
  12. 02 May 2013, 3 commits
    • proc: Supply a function to remove a proc entry by PDE · a8ca16ea
      David Howells committed
      Supply a function (proc_remove()) to remove a proc entry (and any subtree
      rooted there) by proc_dir_entry pointer rather than by name and (optionally)
      root dir entry pointer.  This allows us to eliminate all remaining pde->name
      accesses outside of procfs.
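
      A hedged usage sketch ("example" and example_fops are illustrative
      names, not from the patch):

          struct proc_dir_entry *pde;

          pde = proc_create("example", 0444, NULL, &example_fops);
          /* ... later, with no name or parent pointer needed: */
          proc_remove(pde);   /* removes the entry and any subtree under it */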
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Grant Likely <grant.likely@linaro.org>
      cc: linux-acpi@vger.kernel.org
      cc: openipmi-developer@lists.sourceforge.net
      cc: devicetree-discuss@lists.ozlabs.org
      cc: linux-pci@vger.kernel.org
      cc: netdev@vger.kernel.org
      cc: netfilter-devel@vger.kernel.org
      cc: alsa-devel@alsa-project.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • proc: Split the namespace stuff out into linux/proc_ns.h · 0bb80f24
      David Howells committed
      Split the proc namespace stuff out into linux/proc_ns.h.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: netdev@vger.kernel.org
      cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • netpoll: convert mutex into a semaphore · bd7c4b60
      Neil Horman committed
      Bart Van Assche recently reported a warning to me:
      
      <IRQ>  [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0
      [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20
      [<ffffffff814761dd>] mutex_trylock+0x16d/0x180
      [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30
      [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0
      [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410
      [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0
      [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole]
      [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole]
      [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0
      [<ffffffff8103fb76>] console_unlock+0x2d6/0x450
      [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510
      [<ffffffff8146f9f6>] printk+0x4d/0x4f
      [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod]
      
      This resulted from my commit ca99ca14, which introduced a mutex_trylock
      operation in a path that could execute in interrupt context.  When mutex
      debugging is enabled, the above warns the user when we are in fact
      executing in interrupt context.
      
      After some discussion, it seems that a semaphore is the proper mechanism to
      use here.  While mutexes are defined to be unusable in interrupt context, no
      such restriction exists for semaphores (save for the fact that the
      non-blocking API calls, like up() and down_trylock(), must be used when in
      IRQ context).
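
      A hedged sketch of the resulting pattern (lock name illustrative):

          static DEFINE_SEMAPHORE(dev_lock);  /* binary semaphore, was a mutex */

          if (down_trylock(&dev_lock))        /* safe in IRQ context */
                  return;                     /* contended: skip this poll */
          /* ... poll the device ... */
          up(&dev_lock);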
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      CC: Bart Van Assche <bvanassche@acm.org>
      CC: David Miller <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 30 April 2013, 7 commits
  14. 25 April 2013, 3 commits
  15. 23 April 2013, 1 commit
  16. 20 April 2013, 6 commits
  17. 17 April 2013, 1 commit
  18. 16 April 2013, 1 commit
    • net: add dev_uc_sync_multiple() and dev_mc_sync_multiple() api · 4cd729b0
      Vlad Yasevich committed
      The current implementation of dev_uc_sync/unsync() assumes that there is
      a strict 1-to-1 relationship between the source and destination of the sync.
      In other words, once an address has been synced to a destination device, it
      will not be synced to any other device through the sync API.
      However, there are some virtual devices that aggregate a number of lower
      devices and need to sync addresses to all of them.  The current
      API falls short there.
      
      This patch introduces a new dev_uc_sync_multiple() api that can be called
      in the above circumstances and allows sync to work for every invocation.
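
      A hedged usage sketch from an aggregating device's ndo_set_rx_mode()
      (upper_dev/lower_dev are illustrative names):

          /* unlike dev_uc_sync(), this may be called for each lower device
           * that the same upper device's address list is synced onto */
          dev_uc_sync_multiple(lower_dev, upper_dev);
          dev_mc_sync_multiple(lower_dev, upper_dev);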
      
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 12 April 2013, 1 commit
  20. 11 April 2013, 1 commit
  21. 10 April 2013, 2 commits