1. 15 7月, 2011 2 次提交
  2. 06 7月, 2011 1 次提交
  3. 02 7月, 2011 1 次提交
    • T
      rtnl: provide link dump consistency info · 4e985ada
      Thomas Graf 提交于
      This patch adds a change sequence counter to each net namespace
      which is bumped whenever a netdevice is added or removed from
      the list. If such a change occurred while a link dump took place,
      the dump will have the NLM_F_DUMP_INTR flag set in the first
      message which has been interrupted and in all subsequent messages
      of the same dump.
      
      Note that links may still be modified or renamed while a dump is
      taking place but we can guarantee for userspace to receive a
      complete list of links and not miss any.
      
      Testing:
      I have added 500 VLAN netdevices to make sure the dump is split
      over multiple messages. Then while continuously dumping links in
      one process I also continuously deleted and re-added a dummy
      netdevice in another process. Multiple dumps per seconds have
      had the NLM_F_DUMP_INTR flag set.
      
      I guess we can wait for Johannes patch to hit net-next via the
      wireless tree.  I just wanted to give this some testing right away.
      Signed-off-by: NThomas Graf <tgraf@infradead.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e985ada
  4. 22 6月, 2011 1 次提交
  5. 12 6月, 2011 1 次提交
    • J
      vlan: Fix the ingress VLAN_FLAG_REORDER_HDR check · 0b5c9db1
      Jiri Pirko 提交于
      Testing of VLAN_FLAG_REORDER_HDR does not belong in vlan_untag
      but rather in vlan_do_receive.  Otherwise the vlan header
      will not be properly put on the packet in the case of
      vlan header accelleration.
      
      As we remove the check from vlan_check_reorder_header
      rename it vlan_reorder_header to keep the naming clean.
      
      Fix up the skb->pkt_type early so we don't look at the packet
      after adding the vlan tag, which guarantees we don't goof
      and look at the wrong field.
      
      Use a simple if statement instead of a complicated switch
      statement to decided that we need to increment rx_stats
      for a multicast packet.
      
      Hopefully at somepoint we will just declare the case where
      VLAN_FLAG_REORDER_HDR is cleared as unsupported and remove
      the code.  Until then this keeps it working correctly.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NJiri Pirko <jpirko@redhat.com>
      Acked-by: NChangli Gao <xiaosuo@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b5c9db1
  6. 09 6月, 2011 1 次提交
    • A
      v2 ethtool: remove support for ETHTOOL_GRXNTUPLE · bff55273
      Alexander Duyck 提交于
      This change is meant to remove all support for displaying an ntuple as
      strings via ETHTOOL_GRXNTUPLE.  The reason for this change is due to the
      fact that multiple issues have been found including:
       - Multiple buffer overruns for strings being displayed.
       - Incorrect filters displayed, cleared filters with ring of -2 are displayed
       - Setting get_rx_ntuple displays no rules if defined.
       - Endianess wrong on displayed values.
       - Hard limit of 1024 filters makes display functionality extremely limited
      
      The only driver that had supported this interface was ixgbe.  Since it no
      longer uses the interface and due to the issues mentioned above I am
      submitting this patch to remove it.
      
      v2:
      Updated based on comments from Ben Hutchings
       - Left ETH_SS_NTUPLE_FILTERS in code but commented on it being deprecated
       - Removed ethtool_rx_ntuple_list and ethtool_rx_ntuple_flow_spec_container
       - Left ETHTOOL_GRXNTUPLE but commented it as deprecated
      
      Also cleaned up set_rx_ntuple since there is no flow spec container to
      maintain we can drop all the code for the alloc and free of it and just
      return ops->set_rx_ntuple().
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bff55273
  7. 07 6月, 2011 2 次提交
  8. 03 6月, 2011 1 次提交
    • K
      net: tracepoint of net_dev_xmit sees freed skb and causes panic · ec764bf0
      Koki Sanagi 提交于
      Because there is a possibility that skb is kfree_skb()ed and zero cleared
      after ndo_start_xmit, we should not see the contents of skb like skb->len and
      skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that
      and causes panic by NULL pointer dereference.
      This patch fixes trace_net_dev_xmit not to see the contents of skb directly.
      
      If you want to reproduce this panic,
      
      1. Get tracepoint of net_dev_xmit on
      2. Create 2 guests on KVM
      2. Make 2 guests use virtio_net
      4. Execute netperf from one to another for a long time as a network burden
      5. host will panic(It takes about 30 minutes)
      Signed-off-by: NKoki Sanagi <sanagi.koki@jp.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec764bf0
  9. 26 5月, 2011 1 次提交
  10. 25 5月, 2011 1 次提交
    • E
      net: use synchronize_rcu_expedited() · be3fc413
      Eric Dumazet 提交于
      synchronize_rcu() is very slow in various situations (HZ=100,
      CONFIG_NO_HZ=y, CONFIG_PREEMPT=n)
      
      Extract from my (mostly idle) 8 core machine :
      
       synchronize_rcu() in 99985 us
       synchronize_rcu() in 79982 us
       synchronize_rcu() in 87612 us
       synchronize_rcu() in 79827 us
       synchronize_rcu() in 109860 us
       synchronize_rcu() in 98039 us
       synchronize_rcu() in 89841 us
       synchronize_rcu() in 79842 us
       synchronize_rcu() in 80151 us
       synchronize_rcu() in 119833 us
       synchronize_rcu() in 99858 us
       synchronize_rcu() in 73999 us
       synchronize_rcu() in 79855 us
       synchronize_rcu() in 79853 us
      
      When we hold RTNL mutex, we would like to spend some cpu cycles but not
      block too long other processes waiting for this mutex.
      
      We also want to setup/dismantle network features as fast as possible at
      boot/shutdown time.
      
      This patch makes synchronize_net() call the expedited version if RTNL is
      locked.
      
      synchronize_rcu_expedited() typical delay is about 20 us on my machine.
      
       synchronize_rcu_expedited() in 18 us
       synchronize_rcu_expedited() in 18 us
       synchronize_rcu_expedited() in 18 us
       synchronize_rcu_expedited() in 18 us
       synchronize_rcu_expedited() in 20 us
       synchronize_rcu_expedited() in 16 us
       synchronize_rcu_expedited() in 20 us
       synchronize_rcu_expedited() in 18 us
       synchronize_rcu_expedited() in 18 us
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Ben Greear <greearb@candelatech.com>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be3fc413
  11. 23 5月, 2011 1 次提交
    • E
      net: remove synchronize_net() from netdev_set_master() · 6df427fe
      Eric Dumazet 提交于
      In the old days, we used to access dev->master in __netif_receive_skb()
      in a rcu_read_lock section.
      
      So one synchronize_net() call was needed in netdev_set_master() to make
      sure another cpu could not use old master while/after we release it.
      
      We now use netdev_rx_handler infrastructure and added one
      synchronize_net() call in bond_release()/bond_release_all()
      
      Remove the obsolete synchronize_net() from netdev_set_master() and add
      one in bridge del_nbp() after its netdev_rx_handler_unregister() call.
      
      This makes enslave -d a bit faster.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Jiri Pirko <jpirko@redhat.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6df427fe
  12. 20 5月, 2011 1 次提交
  13. 18 5月, 2011 2 次提交
  14. 17 5月, 2011 1 次提交
  15. 14 5月, 2011 1 次提交
    • P
      net:set valid name before calling ndo_init() · 0696c3a8
      Peter Pan(潘卫平) 提交于
      In commit 1c5cae81 (net: call dev_alloc_name from register_netdevice),
      a bug of bonding was involved, see example 1 and 2.
      
      In register_netdevice(), the name of net_device is not valid until
      dev_get_valid_name() is called. But dev->netdev_ops->ndo_init(that is
      bond_init) is called before dev_get_valid_name(),
      and it uses the invalid name of net_device.
      
      I think register_netdevice() should make sure that the name of net_device is
      valid before calling ndo_init().
      
      example 1:
      modprobe bonding
      ls  /proc/net/bonding/bond%d
      
      ps -eLf
      root      3398     2  3398  0    1 21:34 ?        00:00:00 [bond%d]
      
      example 2:
      modprobe bonding max_bonds=3
      
      [  170.100292] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      [  170.101090] bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
      [  170.102469] ------------[ cut here ]------------
      [  170.103150] WARNING: at /home/pwp/net-next-2.6/fs/proc/generic.c:586 proc_register+0x126/0x157()
      [  170.104075] Hardware name: VirtualBox
      [  170.105065] proc_dir_entry 'bonding/bond%d' already registered
      [  170.105613] Modules linked in: bonding(+) sunrpc ipv6 uinput microcode ppdev parport_pc parport joydev e1000 pcspkr i2c_piix4 i2c_core [last unloaded: bonding]
      [  170.108397] Pid: 3457, comm: modprobe Not tainted 2.6.39-rc2+ #14
      [  170.108935] Call Trace:
      [  170.109382]  [<c0438f3b>] warn_slowpath_common+0x6a/0x7f
      [  170.109911]  [<c051a42a>] ? proc_register+0x126/0x157
      [  170.110329]  [<c0438fc3>] warn_slowpath_fmt+0x2b/0x2f
      [  170.110846]  [<c051a42a>] proc_register+0x126/0x157
      [  170.111870]  [<c051a4dd>] proc_create_data+0x82/0x98
      [  170.112335]  [<f94e6af6>] bond_create_proc_entry+0x3f/0x73 [bonding]
      [  170.112905]  [<f94dd806>] bond_init+0x77/0xa5 [bonding]
      [  170.113319]  [<c0721ac6>] register_netdevice+0x8c/0x1d3
      [  170.113848]  [<f94e0e30>] bond_create+0x6c/0x90 [bonding]
      [  170.114322]  [<f94f4763>] bonding_init+0x763/0x7b1 [bonding]
      [  170.114879]  [<c0401240>] do_one_initcall+0x76/0x122
      [  170.115317]  [<f94f4000>] ? 0xf94f3fff
      [  170.115799]  [<c0463f1e>] sys_init_module+0x1286/0x140d
      [  170.116879]  [<c07c6d9f>] sysenter_do_call+0x12/0x28
      [  170.117404] ---[ end trace 64e4fac3ae5fff1a ]---
      [  170.117924] bond%d: Warning: failed to register to debugfs
      [  170.128728] ------------[ cut here ]------------
      [  170.129360] WARNING: at /home/pwp/net-next-2.6/fs/proc/generic.c:586 proc_register+0x126/0x157()
      [  170.130323] Hardware name: VirtualBox
      [  170.130797] proc_dir_entry 'bonding/bond%d' already registered
      [  170.131315] Modules linked in: bonding(+) sunrpc ipv6 uinput microcode ppdev parport_pc parport joydev e1000 pcspkr i2c_piix4 i2c_core [last unloaded: bonding]
      [  170.133731] Pid: 3457, comm: modprobe Tainted: G        W   2.6.39-rc2+ #14
      [  170.134308] Call Trace:
      [  170.134743]  [<c0438f3b>] warn_slowpath_common+0x6a/0x7f
      [  170.135305]  [<c051a42a>] ? proc_register+0x126/0x157
      [  170.135820]  [<c0438fc3>] warn_slowpath_fmt+0x2b/0x2f
      [  170.137168]  [<c051a42a>] proc_register+0x126/0x157
      [  170.137700]  [<c051a4dd>] proc_create_data+0x82/0x98
      [  170.138174]  [<f94e6af6>] bond_create_proc_entry+0x3f/0x73 [bonding]
      [  170.138745]  [<f94dd806>] bond_init+0x77/0xa5 [bonding]
      [  170.139278]  [<c0721ac6>] register_netdevice+0x8c/0x1d3
      [  170.139828]  [<f94e0e30>] bond_create+0x6c/0x90 [bonding]
      [  170.140361]  [<f94f4763>] bonding_init+0x763/0x7b1 [bonding]
      [  170.140927]  [<c0401240>] do_one_initcall+0x76/0x122
      [  170.141494]  [<f94f4000>] ? 0xf94f3fff
      [  170.141975]  [<c0463f1e>] sys_init_module+0x1286/0x140d
      [  170.142463]  [<c07c6d9f>] sysenter_do_call+0x12/0x28
      [  170.142974] ---[ end trace 64e4fac3ae5fff1b ]---
      [  170.144949] bond%d: Warning: failed to register to debugfs
      Signed-off-by: NWeiping Pan <panweiping3@gmail.com>
      Reviewed-by: NJiri Pirko <jpirko@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0696c3a8
  16. 13 5月, 2011 1 次提交
  17. 11 5月, 2011 1 次提交
  18. 06 5月, 2011 1 次提交
  19. 03 5月, 2011 1 次提交
  20. 30 4月, 2011 1 次提交
    • D
      ethtool: Call ethtool's get/set_settings callbacks with cleaned data · 8ae6daca
      David Decotigny 提交于
      This makes sure that when a driver calls the ethtool's
      get/set_settings() callback of another driver, the data passed to it
      is clean. This guarantees that speed_hi will be zeroed correctly if
      the called callback doesn't explicitely set it: we are sure we don't
      get a corrupted speed from the underlying driver. We also take care of
      setting the cmd field appropriately (ETHTOOL_GSET/SSET).
      
      This applies to dev_ethtool_get_settings(), which now makes sure it
      sets up that ethtool command parameter correctly before passing it to
      drivers. This also means that whoever calls dev_ethtool_get_settings()
      does not have to clean the ethtool command parameter. This function
      also becomes an exported symbol instead of an inline.
      
      All drivers visible to make allyesconfig under x86_64 have been
      updated.
      Signed-off-by: NDavid Decotigny <decot@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ae6daca
  21. 29 4月, 2011 1 次提交
  22. 26 4月, 2011 2 次提交
  23. 23 4月, 2011 1 次提交
  24. 13 4月, 2011 4 次提交
  25. 05 4月, 2011 1 次提交
    • T
      net: Allow no-cache copy from user on transmit · c6e1a0d1
      Tom Herbert 提交于
      This patch uses __copy_from_user_nocache on transmit to bypass data
      cache for a performance improvement.  skb_add_data_nocache and
      skb_copy_to_page_nocache can be called by sendmsg functions to use
      this feature, initial support is in tcp_sendmsg.  This functionality is
      configurable per device using ethtool.
      
      Presumably, this feature would only be useful when the driver does
      not touch the data.  The feature is turned on by default if a device
      indicates that it does some form of checksum offload; it is off by
      default for devices that do no checksum offload or indicate no checksum
      is necessary.  For the former case copy-checksum is probably done
      anyway, in the latter case the device is likely loopback in which case
      the no cache copy is probably not beneficial.
      
      This patch was tested using 200 instances of netperf TCP_RR with
      1400 byte request and one byte reply.  Platform is 16 core AMD x86.
      
      No-cache copy disabled:
         672703 tps, 97.13% utilization
         50/90/99% latency:244.31 484.205 1028.41
      
      No-cache copy enabled:
         702113 tps, 96.16% utilization,
         50/90/99% latency 238.56 467.56 956.955
      
      Using 14000 byte request and response sizes demonstrate the
      effects more dramatically:
      
      No-cache copy disabled:
         79571 tps, 34.34 %utlization
         50/90/95% latency 1584.46 2319.59 5001.76
      
      No-cache copy enabled:
         83856 tps, 34.81% utilization
         50/90/95% latency 2508.42 2622.62 2735.88
      
      Note especially the effect on latency tail (95th percentile).
      
      This seems to provide a nice performance improvement and is
      consistent in the tests I ran.  Presumably, this would provide
      the greatest benfits in the presence of an application workload
      stressing the cache and a lot of transmit data happening.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c6e1a0d1
  26. 03 4月, 2011 1 次提交
  27. 31 3月, 2011 1 次提交
  28. 30 3月, 2011 1 次提交
  29. 28 3月, 2011 2 次提交
  30. 22 3月, 2011 1 次提交
  31. 17 3月, 2011 1 次提交
  32. 10 3月, 2011 1 次提交
    • V
      net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules · 8909c9ad
      Vasiliy Kulikov 提交于
      Since a8f80e8f any process with
      CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
      that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
      limited to /lib/modules/**.  However, CAP_NET_ADMIN capability shouldn't
      allow anybody load any module not related to networking.
      
      This patch restricts an ability of autoloading modules to netdev modules
      with explicit aliases.  This fixes CVE-2011-1019.
      
      Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior
      of loading netdev modules by name (without any prefix) for processes
      with CAP_SYS_MODULE to maintain the compatibility with network scripts
      that use autoloading netdev modules by aliases like "eth0", "wlan0".
      
      Currently there are only three users of the feature in the upstream
      kernel: ipip, ip_gre and sit.
      
          root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
          root@albatros:~# grep Cap /proc/$$/status
          CapInh:	0000000000000000
          CapPrm:	fffffff800001000
          CapEff:	fffffff800001000
          CapBnd:	fffffff800001000
          root@albatros:~# modprobe xfs
          FATAL: Error inserting xfs
          (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
          root@albatros:~# lsmod | grep xfs
          root@albatros:~# ifconfig xfs
          xfs: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep xfs
          root@albatros:~# lsmod | grep sit
          root@albatros:~# ifconfig sit
          sit: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep sit
          root@albatros:~# ifconfig sit0
          sit0      Link encap:IPv6-in-IPv4
      	      NOARP  MTU:1480  Metric:1
      
          root@albatros:~# lsmod | grep sit
          sit                    10457  0
          tunnel4                 2957  1 sit
      
      For CAP_SYS_MODULE module loading is still relaxed:
      
          root@albatros:~# grep Cap /proc/$$/status
          CapInh:	0000000000000000
          CapPrm:	ffffffffffffffff
          CapEff:	ffffffffffffffff
          CapBnd:	ffffffffffffffff
          root@albatros:~# ifconfig xfs
          xfs: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep xfs
          xfs                   745319  0
      
      Reference: https://lkml.org/lkml/2011/2/24/203Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: NMichael Tokarev <mjt@tls.msk.ru>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NKees Cook <kees.cook@canonical.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      8909c9ad