1. 03 Dec, 2006 3 commits
  2. 02 Dec, 2006 1 commit
  3. 17 Oct, 2006 1 commit
  4. 29 Sep, 2006 1 commit
    • [NET_SCHED]: Fix fallout from dev->qdisc RCU change · 85670cc1
      Patrick McHardy committed
      The move of qdisc destruction to a rcu callback broke locking in the
      entire qdisc layer by invalidating previously valid assumptions about
      the context in which changes to the qdisc tree occur.
      
      The two assumptions were:
      
      - since changes only happen in process context, read_lock doesn't need
        bottom half protection. Now invalid since destruction of inner qdiscs,
        classifiers, actions and estimators happens in the RCU callback unless
        they're manually deleted, resulting in deadlocks when read_lock in
        process context is interrupted by write_lock_bh in bottom half context.
      
      - since changes only happen under the RTNL, no additional locking is
        necessary for data not used during packet processing (e.g. u32_list).
        Again, since destruction now happens in the RCU callback, this assumption
        is not valid anymore, causing races while using this data, which can
        result in corruption or use-after-free.
      
      Instead of "fixing" this by disabling bottom halves everywhere and adding
      new locks/refcounting, this patch makes these assumptions valid again by
      moving destruction back to process context. Since only the dev->qdisc
      pointer is protected by RCU, but ->enqueue and the qdisc tree are still
      protected by dev->qdisc_lock, destruction of the tree can be performed
      immediately and only the final free needs to happen in the rcu callback
      to make sure dev_queue_xmit doesn't access already freed memory.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 23 Sep, 2006 2 commits
  6. 18 Aug, 2006 2 commits
  7. 03 Aug, 2006 3 commits
  8. 09 Jul, 2006 1 commit
  9. 08 Jul, 2006 1 commit
  10. 01 Jul, 2006 1 commit
  11. 30 Jun, 2006 2 commits
    • [NET]: Make illegal_highdma more anal · 3d3a8533
      Herbert Xu committed
      Rather than having illegal_highdma as a macro when HIGHMEM is off, we
      can turn it into an inline function that returns zero.  This will catch
      callers that give it bad arguments.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
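The difference between the macro and the inline function described in this commit can be sketched in user-space C. The struct definitions and the `_macro`/`_inline` suffixes below are hypothetical stand-ins for illustration, not the kernel's definitions; the point is only that a macro expanding to zero discards its arguments unchecked, while an inline function returning zero still has them type-checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct net_device { unsigned long features; };
struct sk_buff { size_t len; };

/* Macro form: the arguments vanish during preprocessing, so a caller
 * that passes the wrong types (or swaps them) still compiles. */
#define illegal_highdma_macro(dev, skb) (0)

/* Inline form: still returns zero, but the compiler now checks the
 * argument types at every call site. */
static inline int illegal_highdma_inline(struct net_device *dev,
                                         struct sk_buff *skb)
{
    (void)dev;
    (void)skb;
    return 0;
}

/* Returns 0; only the inline call below is type-checked. */
static int check_both(struct net_device *dev, struct sk_buff *skb)
{
    /* Swapped arguments: silently accepted by the macro. */
    int from_macro = illegal_highdma_macro(skb, dev);
    /* The same swap passed to illegal_highdma_inline() would draw an
     * incompatible-pointer diagnostic from the compiler. */
    int from_inline = illegal_highdma_inline(dev, skb);
    return from_macro + from_inline;
}
```

Both forms should compile to nothing once optimised, so the inline version costs nothing at run time while catching bad callers at build time.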
    • [NET]: Added GSO header verification · 576a30eb
      Herbert Xu committed
      When GSO packets come from an untrusted source (e.g., a Xen guest domain),
      we need to verify the header integrity before passing it to the hardware.
      
      Since the first step in GSO is to verify the header, we can reuse that
      code by adding a new bit to gso_type: SKB_GSO_DODGY.  Packets with this
      bit set can only be fed directly to devices with the corresponding bit
      NETIF_F_GSO_ROBUST.  If the device doesn't have that bit, then the skb
      is fed to the GSO engine which will allow the packet to be sent to the
      hardware if it passes the header check.
      
      This patch changes the sg flag to a full features flag.  The same method
      can be used to implement TSO ECN support.  We simply have to mark packets
      with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
      NETIF_F_TSO_ECN can accept them.  The GSO engine can either fully segment
      the packet, or segment the first MTU and pass the rest to the hardware for
      further segmentation.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
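The matching of gso_type bits against device feature bits described above can be sketched as follows. The bit positions, the `GSO_SHIFT` constant and the function name `needs_software_gso` are assumptions made for this illustration; they are not copied from the kernel headers. The sketch only assumes that each gso_type bit has a corresponding feature bit at a fixed offset.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real kernel constants may differ. */
#define GSO_SHIFT          16
#define SKB_GSO_TCPV4      (1u << 0)
#define SKB_GSO_DODGY      (1u << 1)
#define SKB_GSO_ECN        (1u << 2)
#define NETIF_F_TSO        (SKB_GSO_TCPV4 << GSO_SHIFT)
#define NETIF_F_GSO_ROBUST (SKB_GSO_DODGY << GSO_SHIFT)
#define NETIF_F_TSO_ECN    (SKB_GSO_ECN << GSO_SHIFT)

/* True when a GSO packet must pass through the software GSO engine:
 * at least one gso_type bit has no matching feature bit on the device. */
static bool needs_software_gso(unsigned int gso_type, unsigned int features)
{
    unsigned int required = gso_type << GSO_SHIFT;

    return gso_type != 0 && (required & ~features) != 0;
}
```

Under these assumptions, a packet marked SKB_GSO_DODGY goes straight to the hardware only if the device advertises NETIF_F_GSO_ROBUST, and the same test covers SKB_GSO_ECN against NETIF_F_TSO_ECN without any extra code.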
  12. 26 Jun, 2006 2 commits
  13. 23 Jun, 2006 4 commits
    • [PATCH] list: use list_replace_init() instead of list_splice_init() · 626ab0e6
      Oleg Nesterov committed
      list_splice_init(list, head) does unnecessary work if it is known that
      list_empty(head) == 1.  We can use list_replace_init() instead.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
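A minimal user-space sketch of the two operations, assuming a circular doubly linked list in the style of the kernel's list.h. The function bodies are simplified re-implementations written for this example, not the kernel source. When the destination head is known to be empty, replacing it needs no emptiness test and no reference to the destination's old neighbours.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Splice: must test for emptiness and stitch 'list' in front of
 * whatever 'head' already contains. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
    if (!list_empty(list)) {
        struct list_head *first = list->next;
        struct list_head *last = list->prev;
        struct list_head *at = head->next;

        first->prev = head;
        head->next = first;
        last->next = at;
        at->prev = last;
        INIT_LIST_HEAD(list);
    }
}

/* Replace: 'new_head' takes over the entries of 'old'; any previous
 * contents of 'new_head' are overwritten, so it must be empty. */
static void list_replace_init(struct list_head *old, struct list_head *new_head)
{
    new_head->next = old->next;
    new_head->next->prev = new_head;
    new_head->prev = old->prev;
    new_head->prev->next = new_head;
    INIT_LIST_HEAD(old);
}
```

Both calls leave the source list empty and the destination holding the entries in their original order; the replace variant simply does it with four unconditional pointer stores.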
    • [NET]: fix net-core kernel-doc · f4b8ea78
      Randy Dunlap committed
      Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie'
      Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early'
      Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan'
      Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event'
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Add generic segmentation offload · f6a78bfc
      Herbert Xu committed
      This patch adds the infrastructure for generic segmentation offload.
      The idea is to tap into the potential savings of TSO without hardware
      support by postponing the allocation of segmented skb's until just
      before the entry point into the NIC driver.
      
      The same structure can be used to support software IPv6 TSO, as well as
      UFO and segmentation offload for other relevant protocols, e.g., DCCP.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
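The core idea, carrying one large packet through the stack and cutting it into MSS-sized pieces only at the last moment, can be illustrated with a small splitting routine. This is a sketch of the arithmetic only: `struct segment` and `gso_segment` are hypothetical names, and the real code also has to replicate and fix up protocol headers for each segment, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* One output segment: where it starts in the payload and how long it is. */
struct segment {
    size_t offset;
    size_t len;
};

/* Split payload_len bytes into chunks of at most mss bytes.
 * Fills up to max_segs descriptors in out and returns the number of
 * segments required (which may exceed max_segs). Returns 0 if mss is 0. */
static size_t gso_segment(size_t payload_len, size_t mss,
                          struct segment *out, size_t max_segs)
{
    size_t n = 0;
    size_t off = 0;

    if (mss == 0)
        return 0;

    while (off < payload_len) {
        size_t len = payload_len - off;

        if (len > mss)
            len = mss;
        if (n < max_segs) {
            out[n].offset = off;
            out[n].len = len;
        }
        n++;
        off += len;
    }
    return n;
}
```

For a 4000-byte payload and an MSS of 1460, this yields three segments, the last one shorter than the others.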
    • [NET]: Prevent transmission after dev_deactivate · d4828d85
      Herbert Xu committed
      The dev_deactivate function has bit-rotted since the introduction of
      lockless drivers.  In particular, the spin_unlock_wait call at the end
      has no effect on the xmit routine of lockless drivers.
      
      With a little bit of work, we can make it much more useful by providing
      the guarantee that when it returns, no more calls to the xmit routine
      of the underlying driver will be made.
      
      The idea is simple.  There are two entry points into the xmit routine.
      The first comes from dev_queue_xmit.  That one is easily stopped by
      using synchronize_rcu.  This works because we set the qdisc to noop_qdisc
      before the synchronize_rcu call.  That in turn causes all subsequent
      packets sent to dev_queue_xmit to be dropped.  The synchronize_rcu call
      also ensures all outstanding calls leave their critical section.
      
      The other entry point is from qdisc_run.  Since we now have a bit that
      indicates whether it's running, all we have to do is to wait until the
      bit is off.
      
      I've removed the loop to wait for __LINK_STATE_SCHED to clear.  This is
      useless because netif_wake_queue can cause it to be set again.  It is
      also harmless because we've disarmed qdisc_run.
      
      I've also removed the spin_unlock_wait on xmit_lock because its only
      purpose, making sure that all outstanding xmit_lock holders have
      exited, is also served by dev_watchdog_down.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 18 Jun, 2006 4 commits
    • [NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM · 8648b305
      Herbert Xu committed
      The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM
      identically so we test for them in quite a few places.  For the sake
      of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two.  We
      also test the disjunction of NETIF_F_IP_CSUM and the other two in various
      places; for that purpose I've added NETIF_F_ALL_CSUM.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
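The two combined masks follow directly from the description. In the sketch below the numeric values of the three base flags are assumptions chosen for illustration, and the two helper functions are hypothetical; only the way the masks are composed is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Base flag values are illustrative assumptions, not copied from netdevice.h. */
#define NETIF_F_IP_CSUM  (1u << 1)
#define NETIF_F_NO_CSUM  (1u << 2)
#define NETIF_F_HW_CSUM  (1u << 3)

/* The two flags the stack treats identically. */
#define NETIF_F_GEN_CSUM (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
/* Any form of checksum capability at all. */
#define NETIF_F_ALL_CSUM (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)

/* Hypothetical helper: device handles checksums for any protocol. */
static bool has_generic_csum(unsigned int features)
{
    return (features & NETIF_F_GEN_CSUM) != 0;
}

/* Hypothetical helper: device has some checksum capability. */
static bool has_any_csum(unsigned int features)
{
    return (features & NETIF_F_ALL_CSUM) != 0;
}
```

A call site that used to spell out `features & (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)` can then test against a single name.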
    • [NET]: Clean up skb_linearize · 364c6bad
      Herbert Xu committed
      The linearisation operation doesn't need to be super-optimised.  So we can
      replace __skb_linearize with __pskb_pull_tail which does the same thing but
      is more general.
      
      Also, most users of skb_linearize end up testing whether the skb is linear
      or not so it helps to make skb_linearize do just that.
      
      Some callers of skb_linearize also use it to copy cloned data, so it's
      useful to have a new function skb_linearize_cow to copy the data if it's
      either non-linear or cloned.
      
      Last but not least, I've removed the gfp argument since nobody uses it
      anymore.  If it's ever needed we can easily add it back.
      
      Misc bugs fixed by this patch:
      
      * via-velocity error handling (also, no SG => no frags)
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Add netif_tx_lock · 932ff279
      Herbert Xu committed
      Various drivers use xmit_lock internally to synchronise with their
      transmission routines.  They do so without setting xmit_lock_owner.
      This is fine as long as netpoll is not in use.
      
      With netpoll it is possible for deadlocks to occur if xmit_lock_owner
      isn't set.  This is because a printk that occurs while xmit_lock is
      held and xmit_lock_owner is not set can cause netpoll to attempt to
      take xmit_lock recursively.
      
      While it is possible to resolve this by getting netpoll to use
      trylock, it is suboptimal because netpoll's sole objective is to
      maximise the chance of getting the printk out on the wire.  So
      delaying or dropping the message is to be avoided as much as possible.
      
      So the only alternative is to always set xmit_lock_owner.  The
      following patch does this by introducing the netif_tx_lock family of
      functions that take care of setting/unsetting xmit_lock_owner.
      
      I renamed xmit_lock to _xmit_lock to indicate that it should not be
      used directly.  I didn't provide irq versions of the netif_tx_lock
      functions since xmit_lock is meant to be a BH-disabling lock.
      
      This is pretty much a straight text substitution except for a small
      bug fix in winbond.  It currently uses
      netif_stop_queue/spin_unlock_wait to stop transmission.  This is
      unsafe as an IRQ can potentially wake up the queue.  So it is safer to
      use netif_tx_disable.
      
      The hamradio bits used spin_lock_irq but it is unnecessary as
      xmit_lock must never be taken in an IRQ handler.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
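Why the owner field matters can be shown with a single-threaded toy model. Everything here (`struct toy_dev`, the `toy_` functions, the use of a plain boolean in place of a spinlock) is a hypothetical simplification for illustration and not the kernel implementation. The wrapper records who holds the lock, so a netpoll-style caller on the same CPU can detect that taking the lock again would recurse.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
    bool locked;          /* stands in for _xmit_lock */
    int  xmit_lock_owner; /* CPU id of the holder, -1 when unlocked */
};

/* Lock wrapper: always records the owner, as the commit requires. */
static void toy_tx_lock(struct toy_dev *dev, int cpu)
{
    assert(!dev->locked); /* a real spinlock would spin here instead */
    dev->locked = true;
    dev->xmit_lock_owner = cpu;
}

static void toy_tx_unlock(struct toy_dev *dev)
{
    dev->xmit_lock_owner = -1;
    dev->locked = false;
}

/* netpoll-style check: would taking the lock from 'cpu' recurse? */
static bool toy_would_recurse(const struct toy_dev *dev, int cpu)
{
    return dev->locked && dev->xmit_lock_owner == cpu;
}
```

If a driver set `locked` directly without recording the owner, `toy_would_recurse` would report false for the holding CPU, and a caller relying on it would try to take the lock a second time, which is the deadlock the commit describes.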
    • [I/OAT]: Setup the networking subsystem as a DMA client · db217334
      Chris Leech committed
      Attempts to allocate per-CPU DMA channels
      Signed-off-by: Chris Leech <christopher.leech@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 27 May, 2006 1 commit
  16. 11 May, 2006 1 commit
  17. 10 May, 2006 1 commit
  18. 07 May, 2006 1 commit
  19. 20 Apr, 2006 1 commit
  20. 11 Apr, 2006 1 commit
  21. 10 Apr, 2006 2 commits
  22. 30 Mar, 2006 1 commit
    • [NET]: Deinline some larger functions from netdevice.h · 56079431
      Denis Vlasenko committed
      On an allyesconfig'ured kernel:
      
      Size  Uses Wasted Name and definition
      ===== ==== ====== ================================================
         95  162  12075 netif_wake_queue      include/linux/netdevice.h
        129   86   9265 dev_kfree_skb_any     include/linux/netdevice.h
        127   56   5885 netif_device_attach   include/linux/netdevice.h
         73   86   4505 dev_kfree_skb_irq     include/linux/netdevice.h
         46   60   1534 netif_device_detach   include/linux/netdevice.h
        119   16   1485 __netif_rx_schedule   include/linux/netdevice.h
        143    5    492 netif_rx_schedule     include/linux/netdevice.h
         81    7    366 netif_schedule        include/linux/netdevice.h
      
      netif_wake_queue is big because __netif_schedule is a big inline:
      
      static inline void __netif_schedule(struct net_device *dev)
      {
              if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
                      unsigned long flags;
                      struct softnet_data *sd;
      
                      local_irq_save(flags);
                      sd = &__get_cpu_var(softnet_data);
                      dev->next_sched = sd->output_queue;
                      sd->output_queue = dev;
                      raise_softirq_irqoff(NET_TX_SOFTIRQ);
                      local_irq_restore(flags);
              }
      }
      
      static inline void netif_wake_queue(struct net_device *dev)
      {
      #ifdef CONFIG_NETPOLL_TRAP
              if (netpoll_trap())
                      return;
      #endif
              if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
                      __netif_schedule(dev);
      }
      
      By de-inlining __netif_schedule we are saving a lot of text
      at each callsite of netif_wake_queue and netif_schedule.
      __netif_rx_schedule is also big, and it makes more sense to keep
      both of them out of line.
      
      Patch also deinlines dev_kfree_skb_any. We can deinline dev_kfree_skb_irq
      instead... oh well.
      
      netif_device_attach/detach are not hot paths, we can deinline them too.
      Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
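The commit does not say how the Wasted column was computed, but every row of the table is consistent with (Size - 20) * (Uses - 1): one copy of the body is kept, and each remaining use is charged the body size minus roughly 20 bytes for an out-of-line call. The 20-byte figure is inferred from the numbers, not stated in the source, and `wasted_bytes` is a hypothetical helper written for this check.

```c
#include <assert.h>

/* Inferred from the table, not stated in the commit message. */
#define ASSUMED_CALL_COST 20

/* Bytes saved by keeping one out-of-line copy instead of inlining
 * a body of 'size' bytes at each of 'uses' call sites. */
static long wasted_bytes(long size, long uses)
{
    assert(uses >= 1);
    return (size - ASSUMED_CALL_COST) * (uses - 1);
}
```

For netif_wake_queue this gives (95 - 20) * (162 - 1) = 12075, matching the first row.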
  23. 28 Mar, 2006 1 commit
    • [PATCH] Notifier chain update: API changes · e041c683
      Alan Stern committed
      The kernel's implementation of notifier chains is unsafe.  There is no
      protection against entries being added to or removed from a chain while the
      chain is in use.  The issues were discussed in this thread:
      
          http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
      
      We noticed that notifier chains in the kernel fall into two basic usage
      classes:
      
      	"Blocking" chains are always called from a process context
      	and the callout routines are allowed to sleep;
      
      	"Atomic" chains can be called from an atomic context and
      	the callout routines are not allowed to sleep.
      
      We decided to codify this distinction and make it part of the API.  Therefore
      this set of patches introduces three new, parallel APIs: one for blocking
      notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
      really just the old API under a new name).  New kinds of data structures are
      used for the heads of the chains, and new routines are defined for
      registration, unregistration, and calling a chain.  The three APIs are
      explained in include/linux/notifier.h and their implementation is in
      kernel/sys.c.
      
      With atomic and blocking chains, the implementation guarantees that the chain
      links will not be corrupted and that chain callers will not get messed up by
      entries being added or removed.  For raw chains the implementation provides no
      guarantees at all; users of this API must provide their own protections.  (The
      idea was that situations may come up where the assumptions of the atomic and
      blocking APIs are not appropriate, so it should be possible for users to
      handle these things in their own way.)
      
      There are some limitations, which should not be too hard to live with.  For
      atomic/blocking chains, registration and unregistration must always be done in
      a process context since the chain is protected by a mutex/rwsem.  Also, a
      callout routine for a non-raw chain must not try to register or unregister
      entries on its own chain.  (This did happen in a couple of places and the code
      had to be changed to avoid it.)
      
      Since atomic chains may be called from within an NMI handler, they cannot use
      spinlocks for synchronization.  Instead we use RCU.  The overhead falls almost
      entirely in the unregister routine, which is okay since unregistration is much
      less frequent than calling a chain.
      
      Here is the list of chains that we adjusted and their classifications.  None
      of them use the raw API, so for the moment it is only a placeholder.
      
        ATOMIC CHAINS
        -------------
      arch/i386/kernel/traps.c:		i386die_chain
      arch/ia64/kernel/traps.c:		ia64die_chain
      arch/powerpc/kernel/traps.c:		powerpc_die_chain
      arch/sparc64/kernel/traps.c:		sparc64die_chain
      arch/x86_64/kernel/traps.c:		die_chain
      drivers/char/ipmi/ipmi_si_intf.c:	xaction_notifier_list
      kernel/panic.c:				panic_notifier_list
      kernel/profile.c:			task_free_notifier
      net/bluetooth/hci_core.c:		hci_notifier
      net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_chain
      net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_expect_chain
      net/ipv6/addrconf.c:			inet6addr_chain
      net/netfilter/nf_conntrack_core.c:	nf_conntrack_chain
      net/netfilter/nf_conntrack_core.c:	nf_conntrack_expect_chain
      net/netlink/af_netlink.c:		netlink_chain
      
        BLOCKING CHAINS
        ---------------
      arch/powerpc/platforms/pseries/reconfig.c:	pSeries_reconfig_chain
      arch/s390/kernel/process.c:		idle_chain
      arch/x86_64/kernel/process.c		idle_notifier
      drivers/base/memory.c:			memory_chain
      drivers/cpufreq/cpufreq.c		cpufreq_policy_notifier_list
      drivers/cpufreq/cpufreq.c		cpufreq_transition_notifier_list
      drivers/macintosh/adb.c:		adb_client_list
      drivers/macintosh/via-pmu.c		sleep_notifier_list
      drivers/macintosh/via-pmu68k.c		sleep_notifier_list
      drivers/macintosh/windfarm_core.c	wf_client_list
      drivers/usb/core/notify.c		usb_notifier_list
      drivers/video/fbmem.c			fb_notifier_list
      kernel/cpu.c				cpu_chain
      kernel/module.c				module_notify_list
      kernel/profile.c			munmap_notifier
      kernel/profile.c			task_exit_notifier
      kernel/sys.c				reboot_notifier_list
      net/core/dev.c				netdev_chain
      net/decnet/dn_dev.c:			dnaddr_chain
      net/ipv4/devinet.c:			inetaddr_chain
      
      It's possible that some of these classifications are wrong.  If they are,
      please let us know or submit a patch to fix them.  Note that any chain that
      gets called very frequently should be atomic, because the rwsem read-locking
      used for blocking chains is very likely to incur cache misses on SMP systems.
      (However, if the chain's callout routines may sleep then the chain cannot be
      atomic.)
      
      The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
      material written by Keith Owens and suggestions from Paul McKenney and Andrew
      Morton.
      
      [jes@sgi.com: restructure the notifier chain initialization macros]
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: Jes Sorensen <jes@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
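A minimal user-space sketch of a "raw" style chain, the variant that provides no protection of its own. The `chain_` function names and the numeric NOTIFY_ values are assumptions for this illustration rather than the kernel's definitions. The atomic and blocking variants described above would wrap the same list walk in RCU or an rwsem respectively; none of that locking is modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes; numeric values are illustrative, not the kernel's. */
#define NOTIFY_DONE 0
#define NOTIFY_OK   1
#define NOTIFY_STOP 2 /* stop walking the chain */

struct notifier_block {
    int (*notifier_call)(struct notifier_block *nb,
                         unsigned long event, void *data);
    struct notifier_block *next;
    int priority;
};

struct raw_notifier_head { struct notifier_block *head; };

/* Insert n, keeping the chain sorted by descending priority. */
static void chain_register(struct raw_notifier_head *nh,
                           struct notifier_block *n)
{
    struct notifier_block **pp = &nh->head;

    while (*pp && (*pp)->priority >= n->priority)
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
}

/* Remove n; returns 0 on success, -1 if it was not on the chain. */
static int chain_unregister(struct raw_notifier_head *nh,
                            struct notifier_block *n)
{
    struct notifier_block **pp;

    for (pp = &nh->head; *pp; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            return 0;
        }
    }
    return -1;
}

/* Call every entry in order until one returns NOTIFY_STOP. */
static int chain_call(struct raw_notifier_head *nh,
                      unsigned long event, void *data)
{
    int ret = NOTIFY_DONE;
    struct notifier_block *nb = nh->head;

    while (nb) {
        struct notifier_block *next = nb->next;

        ret = nb->notifier_call(nb, event, data);
        if (ret == NOTIFY_STOP)
            break;
        nb = next;
    }
    return ret;
}

/* Example callback: adds the event number to an int passed as data. */
static int count_event(struct notifier_block *nb,
                       unsigned long event, void *data)
{
    (void)nb;
    *(int *)data += (int)event;
    return NOTIFY_OK;
}
```

Because nothing here guards the list, a caller walking the chain while another thread unregisters an entry could follow a stale pointer, which is the hazard the atomic and blocking APIs are meant to remove.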
  24. 25 Mar, 2006 1 commit
    • [NET]: Take RTNL when unregistering notifier · 9f514950
      Herbert Xu committed
      The netdev notifier call chain is currently unregistered without taking
      any locks outside the notifier system.  Because the notifier system itself
      does not synchronise unregistration with respect to the calling of the
      chain, we as its user need to do our own locking.
      
      We are supposed to take the RTNL for all calls to netdev notifiers, so
      taking the RTNL should be sufficient to protect it.
      
      The registration path in dev.c already takes the RTNL so it's OK.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 21 Mar, 2006 1 commit