1. 11 3月, 2011 1 次提交
  2. 10 3月, 2011 1 次提交
    • V
      net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules · 8909c9ad
      Vasiliy Kulikov 提交于
      Since a8f80e8f any process with
      CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
      that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
      limited to /lib/modules/**.  However, CAP_NET_ADMIN capability shouldn't
      allow anybody load any module not related to networking.
      
      This patch restricts an ability of autoloading modules to netdev modules
      with explicit aliases.  This fixes CVE-2011-1019.
      
      Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior
      of loading netdev modules by name (without any prefix) for processes
      with CAP_SYS_MODULE to maintain the compatibility with network scripts
      that use autoloading netdev modules by aliases like "eth0", "wlan0".
      
      Currently there are only three users of the feature in the upstream
      kernel: ipip, ip_gre and sit.
      
          root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
          root@albatros:~# grep Cap /proc/$$/status
          CapInh:	0000000000000000
          CapPrm:	fffffff800001000
          CapEff:	fffffff800001000
          CapBnd:	fffffff800001000
          root@albatros:~# modprobe xfs
          FATAL: Error inserting xfs
          (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
          root@albatros:~# lsmod | grep xfs
          root@albatros:~# ifconfig xfs
          xfs: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep xfs
          root@albatros:~# lsmod | grep sit
          root@albatros:~# ifconfig sit
          sit: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep sit
          root@albatros:~# ifconfig sit0
          sit0      Link encap:IPv6-in-IPv4
      	      NOARP  MTU:1480  Metric:1
      
          root@albatros:~# lsmod | grep sit
          sit                    10457  0
          tunnel4                 2957  1 sit
      
      For CAP_SYS_MODULE module loading is still relaxed:
      
          root@albatros:~# grep Cap /proc/$$/status
          CapInh:	0000000000000000
          CapPrm:	ffffffffffffffff
          CapEff:	ffffffffffffffff
          CapBnd:	ffffffffffffffff
          root@albatros:~# ifconfig xfs
          xfs: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep xfs
          xfs                   745319  0
      
      Reference: https://lkml.org/lkml/2011/2/24/203Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: NMichael Tokarev <mjt@tls.msk.ru>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NKees Cook <kees.cook@canonical.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      8909c9ad
  3. 08 3月, 2011 2 次提交
  4. 03 3月, 2011 1 次提交
  5. 18 2月, 2011 4 次提交
  6. 14 2月, 2011 3 次提交
  7. 28 1月, 2011 1 次提交
    • E
      net: fix dev_seq_next() · ccf43438
      Eric Dumazet 提交于
      Commit c6d14c84 (net: Introduce for_each_netdev_rcu() iterator)
      added a race in dev_seq_next().
      
      The rcu_dereference() call should be done _before_ testing the end of
      list, or we might return a wrong net_device if a concurrent thread
      changes net_device list under us.
      
      Note : discovered thanks to a sparse warning :
      
      net/core/dev.c:3919:9: error: incompatible types in comparison expression
      (different address spaces)
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccf43438
  8. 25 1月, 2011 3 次提交
  9. 20 1月, 2011 2 次提交
    • J
      net: implement mechanism for HW based QOS · 4f57c087
      John Fastabend 提交于
      This patch provides a mechanism for lower layer devices to
      steer traffic using skb->priority to tx queues. This allows
      for hardware based QOS schemes to use the default qdisc without
      incurring the penalties related to global state and the qdisc
      lock. While reliably receiving skbs on the correct tx ring
      to avoid head of line blocking resulting from shuffling in
      the LLD. Finally, all the goodness from txq caching and xps/rps
      can still be leveraged.
      
      Many drivers and hardware exist with the ability to implement
      QOS schemes in the hardware but currently these drivers tend
      to rely on firmware to reroute specific traffic, a driver
      specific select_queue or the queue_mapping action in the
      qdisc.
      
      By using select_queue for this drivers need to be updated for
      each and every traffic type and we lose the goodness of much
      of the upstream work. Firmware solutions are inherently
      inflexible. And finally if admins are expected to build a
      qdisc and filter rules to steer traffic this requires knowledge
      of how the hardware is currently configured. The number of tx
      queues and the queue offsets may change depending on resources.
      Also this approach incurs all the overhead of a qdisc with filters.
      
      With the mechanism in this patch users can set skb priority using
      expected methods ie setsockopt() or the stack can set the priority
      directly. Then the skb will be steered to the correct tx queues
      aligned with hardware QOS traffic classes. In the normal case with
      single traffic class and all queues in this class everything
      works as is until the LLD enables multiple tcs.
      
      To steer the skb we mask out the lower 4 bits of the priority
      and allow the hardware to configure upto 15 distinct classes
      of traffic. This is expected to be sufficient for most applications
      at any rate it is more then the 8021Q spec designates and is
      equal to the number of prio bands currently implemented in
      the default qdisc.
      
      This in conjunction with a userspace application such as
      lldpad can be used to implement 8021Q transmission selection
      algorithms one of these algorithms being the extended transmission
      selection algorithm currently being used for DCB.
      Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f57c087
    • V
      net_device: add support for network device groups · cbda10fa
      Vlad Dogaru 提交于
      Net devices can now be grouped, enabling simpler manipulation from
      userspace. This patch adds a group field to the net_device structure, as
      well as rtnetlink support to query and modify it.
      Signed-off-by: NVlad Dogaru <ddvlad@rosedu.org>
      Acked-by: NJamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cbda10fa
  10. 14 1月, 2011 1 次提交
    • E
      net: remove dev_txq_stats_fold() · 1ac9ad13
      Eric Dumazet 提交于
      After recent changes, (percpu stats on vlan/tunnels...), we dont need
      anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.
      
      Only remaining users are ixgbe, sch_teql, gianfar & macvlan :
      
      1) ixgbe can be converted to use existing tx_ring counters.
      
      2) macvlan incremented txq->tx_dropped, it can use the
      dev->stats.tx_dropped counter.
      
      3) sch_teql : almost revert ab35cd4b (Use net_device internal stats)
          Now we have ndo_get_stats64(), use it, even for "unsigned long"
      fields (No need to bring back a struct net_device_stats)
      
      4) gianfar adds a stats structure per tx queue to hold
      tx_bytes/tx_packets
      
      This removes a lockdep warning (and possible lockup) in rndis gadget,
      calling dev_get_stats() from hard IRQ context.
      
      Ref: http://www.spinics.net/lists/netdev/msg149202.htmlReported-by: NNeil Jones <neiljay@gmail.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: Sandeep Gopalpet <sandeep.kumar@freescale.com>
      CC: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ac9ad13
  11. 11 1月, 2011 1 次提交
  12. 10 1月, 2011 2 次提交
  13. 04 1月, 2011 1 次提交
  14. 17 12月, 2010 2 次提交
  15. 09 12月, 2010 1 次提交
    • E
      net: RCU conversion of dev_getbyhwaddr() and arp_ioctl() · 941666c2
      Eric Dumazet 提交于
      Le dimanche 05 décembre 2010 à 09:19 +0100, Eric Dumazet a écrit :
      
      > Hmm..
      >
      > If somebody can explain why RTNL is held in arp_ioctl() (and therefore
      > in arp_req_delete()), we might first remove RTNL use in arp_ioctl() so
      > that your patch can be applied.
      >
      > Right now it is not good, because RTNL wont be necessarly held when you
      > are going to call arp_invalidate() ?
      
      While doing this analysis, I found a refcount bug in llc, I'll send a
      patch for net-2.6
      
      Meanwhile, here is the patch for net-next-2.6
      
      Your patch then can be applied after mine.
      
      Thanks
      
      [PATCH] net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()
      
      dev_getbyhwaddr() was called under RTNL.
      
      Rename it to dev_getbyhwaddr_rcu() and change all its caller to now use
      RCU locking instead of RTNL.
      
      Change arp_ioctl() to use RCU instead of RTNL locking.
      
      Note: this fix a dev refcount bug in llc
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      941666c2
  16. 02 12月, 2010 1 次提交
  17. 30 11月, 2010 1 次提交
  18. 29 11月, 2010 2 次提交
  19. 25 11月, 2010 1 次提交
    • T
      xps: Transmit Packet Steering · 1d24eb48
      Tom Herbert 提交于
      This patch implements transmit packet steering (XPS) for multiqueue
      devices.  XPS selects a transmit queue during packet transmission based
      on configuration.  This is done by mapping the CPU transmitting the
      packet to a queue.  This is the transmit side analogue to RPS-- where
      RPS is selecting a CPU based on receive queue, XPS selects a queue
      based on the CPU (previously there was an XPS patch from Eric
      Dumazet, but that might more appropriately be called transmit completion
      steering).
      
      Each transmit queue can be associated with a number of CPUs which will
      use the queue to send packets.  This is configured as a CPU mask on a
      per queue basis in:
      
      /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus
      
      The mappings are stored per device in an inverted data structure that
      maps CPUs to queues.  In the netdevice structure this is an array of
      num_possible_cpu structures where each structure holds and array of
      queue_indexes for queues which that CPU can use.
      
      The benefits of XPS are improved locality in the per queue data
      structures.  Also, transmit completions are more likely to be done
      nearer to the sending thread, so this should promote locality back
      to the socket on free (e.g. UDP).  The benefits of XPS are dependent on
      cache hierarchy, application load, and other factors.  XPS would
      nominally be configured so that a queue would only be shared by CPUs
      which are sharing a cache, the degenerative configuration woud be that
      each CPU has it's own queue.
      
      Below are some benchmark results which show the potential benfit of
      this patch.  The netperf test has 500 instances of netperf TCP_RR test
      with 1 byte req. and resp.
      
      bnx2x on 16 core AMD
         XPS (16 queues, 1 TX queue per CPU)  1234K at 100% CPU
         No XPS (16 queues)                   996K at 100% CPU
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d24eb48
  20. 16 11月, 2010 3 次提交
  21. 09 11月, 2010 2 次提交
  22. 26 10月, 2010 4 次提交