1. 26 5月, 2009 1 次提交
    • E
      net: txq_trans_update() helper · 08baf561
      Eric Dumazet 提交于
      We would like to get rid of netdev->trans_start = jiffies; that about all net
      drivers have to use in their start_xmit() function, and use txq->trans_start
      instead.
      
      This can be done generically in core network, as suggested by David.
      
      Some devices, (particularly loopback) dont need trans_start update, because
      they dont have transmit watchdog. We could add a new device flag, or rely
      on fact that txq->tran_start can be updated is txq->xmit_lock_owner is
      different than -1. Use a helper function to hide our choice.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      08baf561
  2. 22 5月, 2009 2 次提交
  3. 18 5月, 2009 1 次提交
  4. 29 3月, 2009 1 次提交
  5. 10 12月, 2008 1 次提交
    • N
      netpoll: fix race on poll_list resulting in garbage entry · 7b363e44
      Neil Horman 提交于
      	A few months back a race was discused between the netpoll napi service
      path, and the fast path through net_rx_action:
      http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470
      
      A patch was submitted for that bug, but I think we missed a case.
      
      Consider the following scenario:
      
      INITIAL STATE
      CPU0 has one napi_struct A on its poll_list
      CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
      napi_struct A that CPU0 has on its list
      
      
      
      CPU0						CPU1
      net_rx_action					poll_napi
      !list_empty (returns true)			locks poll_lock for A
      						 poll_one_napi
      						  napi->poll
      						   netif_rx_complete
      						    __napi_complete
      						    (removes A from poll_list)
      list_entry(list->next)
      
      
      In the above scenario, net_rx_action assumes that the per-cpu poll_list is
      exclusive to that cpu.  netpoll of course violates that, and because the netpoll
      path can dequeue from the poll list, its possible for CPU0 to detect a non-empty
      list at the top of the while loop in net_rx_action, but have it become empty by
      the time it calls list_entry.  Since the poll_list isn't surrounded by any other
      structure, the returned data from that list_entry call in this situation is
      garbage, and any number of crashes can result based on what exactly that garbage
      is.
      
      Given that its not fasible for performance reasons to place exclusive locks
      arround each cpus poll list to provide that mutal exclusion, I think the best
      solution is modify the netpoll path in such a way that we continue to guarantee
      that the poll_list for a cpu is in fact exclusive to that cpu.  To do this I've
      implemented the patch below.  It adds an additional bit to the state field in
      the napi_struct.  When executing napi->poll from the netpoll_path, this bit will
      be set. When a driver calls netif_rx_complete, if that bit is set, it will not
      remove the napi_struct from the poll_list.  That work will be saved for the next
      iteration of net_rx_action.
      
      I've tested this and it seems to work well.  About the biggest drawback I can
      see to it is the fact that it might result in an extra loop through
      net_rx_action in the event that the device is actually contended for (i.e. the
      netpoll path actually preforms all the needed work no the device, and the call
      to net_rx_action winds up doing nothing, except removing the napi_struct from
      the poll_list.  However I think this is probably a small price to pay, given
      that the alternative is a crash.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b363e44
  6. 21 11月, 2008 1 次提交
  7. 20 11月, 2008 2 次提交
    • S
      netdev: network device operations infrastructure · d314774c
      Stephen Hemminger 提交于
      This patch changes the network device internal API to move adminstrative
      operations out of the network device structure and into a separate structure.
      
      This patch involves some hackery to maintain compatablity between the
      new and old model, so all 300+ drivers don't have to be changed at once.
      For drivers that aren't converted yet, the netdevice_ops virt function list
      still resides in the net_device structure. For old protocols, the new
      net_device_ops are copied out to the old net_device pointers.
      
      After the transistion is completed the nag message can be changed to
      an WARN_ON, and the compatiablity code can be made configurable.
      
      Some function pointers aren't moved:
      * destructor can't be in net_device_ops because
        it may need to be referenced after the module is unloaded.
      * neighbor setup is manipulated in a couple of places that need special
        consideration
      * hard_start_xmit is in the fast path for transmit.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d314774c
    • J
      include/net net/ - csum_partial - remove unnecessary casts · 07f0757a
      Joe Perches 提交于
      The first argument to csum_partial is const void *
      casts to char/u8 * are not necessary
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07f0757a
  8. 28 10月, 2008 1 次提交
  9. 01 8月, 2008 1 次提交
  10. 18 7月, 2008 1 次提交
    • D
      net: Use queue aware tests throughout. · fd2ea0a7
      David S. Miller 提交于
      This effectively "flips the switch" by making the core networking
      and multiqueue-aware drivers use the new TX multiqueue structures.
      
      Non-multiqueue drivers need no changes.  The interfaces they use such
      as netif_stop_queue() degenerate into an operation on TX queue zero.
      So everything "just works" for them.
      
      Code that really wants to do "X" to all TX queues now invokes a
      routine that does so, such as netif_tx_wake_all_queues(),
      netif_tx_stop_all_queues(), etc.
      
      pktgen and netpoll required a little bit more surgery than the others.
      
      In particular the pktgen changes, whilst functional, could be largely
      improved.  The initial check in pktgen_xmit() will sometimes check the
      wrong queue, which is mostly harmless.  The thing to do is probably to
      invoke fill_packet() earlier.
      
      The bulk of the netpoll changes is to make the code operate solely on
      the TX queue indicated by by the SKB queue mapping.
      
      Setting of the SKB queue mapping is entirely confined inside of
      net/core/dev.c:dev_pick_tx().  If we end up needing any kind of
      special semantics (drops, for example) it will be implemented here.
      
      Finally, we now have a "real_num_tx_queues" which is where the driver
      indicates how many TX queues are actually active.
      
      With IGB changes from Jeff Kirsher.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd2ea0a7
  11. 13 5月, 2008 1 次提交
  12. 21 3月, 2008 1 次提交
  13. 05 3月, 2008 1 次提交
  14. 04 3月, 2008 1 次提交
  15. 29 1月, 2008 6 次提交
  16. 30 10月, 2007 1 次提交
    • D
      [NET]: Fix race between poll_napi() and net_rx_action() · 0a7606c1
      David S. Miller 提交于
      netpoll_poll_lock() synchronizes the ->poll() invocation
      code paths, but once we have the lock we have to make
      sure that NAPI_STATE_SCHED is still set.  Otherwise we
      get:
      
      	cpu 0			cpu 1
      
      	net_rx_action()		poll_napi()
      	netpoll_poll_lock()	... spin on ->poll_lock
      	->poll()
      	  netif_rx_complete
      	netpoll_poll_unlock()	acquire ->poll_lock()
      				->poll()
      				 netif_rx_complete()
      				 CRASH
      
      Based upon a bug report from Tina Yang.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a7606c1
  17. 22 10月, 2007 1 次提交
  18. 11 10月, 2007 5 次提交
    • S
      [NET]: Wrap netdevice hardware header creation. · 0c4e8581
      Stephen Hemminger 提交于
      Add inline for common usage of hardware header creation, and
      fix bug in IPV6 mcast where the assumption about negative return is
      an errno. Negative return from hard_header means not enough space
      was available,(ie -N bytes).
      Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c4e8581
    • J
      [NET]: Introduce and use print_mac() and DECLARE_MAC_BUF() · 0795af57
      Joe Perches 提交于
      This is nicer than the MAC_FMT stuff.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0795af57
    • E
      [NET]: Make the device list and device lookups per namespace. · 881d966b
      Eric W. Biederman 提交于
      This patch makes most of the generic device layer network
      namespace safe.  This patch makes dev_base_head a
      network namespace variable, and then it picks up
      a few associated variables.  The functions:
      dev_getbyhwaddr
      dev_getfirsthwbytype
      dev_get_by_flags
      dev_get_by_name
      __dev_get_by_name
      dev_get_by_index
      __dev_get_by_index
      dev_ioctl
      dev_ethtool
      dev_load
      wireless_process_ioctl
      
      were modified to take a network namespace argument, and
      deal with it.
      
      vlan_ioctl_set and brioctl_set were modified so their
      hooks will receive a network namespace argument.
      
      So basically anthing in the core of the network stack that was
      affected to by the change of dev_base was modified to handle
      multiple network namespaces.  The rest of the network stack was
      simply modified to explicitly use &init_net the initial network
      namespace.  This can be fixed when those components of the network
      stack are modified to handle multiple network namespaces.
      
      For now the ifindex generator is left global.
      
      Fundametally ifindex numbers are per namespace, or else
      we will have corner case problems with migration when
      we get that far.
      
      At the same time there are assumptions in the network stack
      that the ifindex of a network device won't change.  Making
      the ifindex number global seems a good compromise until
      the network stack can cope with ifindex changes when
      you change namespaces, and the like.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      881d966b
    • S
      [NET] netconsole: Support dynamic reconfiguration using configfs · 0bcc1816
      Satyam Sharma 提交于
      Based upon initial work by Keiichi Kii <k-keiichi@bx.jp.nec.com>.
      
      This patch introduces support for dynamic reconfiguration (adding, removing
      and/or modifying parameters of netconsole targets at runtime) using a
      userspace interface exported via configfs.  Documentation is also updated
      accordingly.
      
      Issues and brief design overview:
      
      (1) Kernel-initiated creation / destruction of kernel objects is not
          possible with configfs -- the lifetimes of the "config items" is managed
          exclusively from userspace.  But netconsole must support boot/module
          params too, and these are parsed in kernel and hence netpolls must be
          setup from the kernel.  Joel Becker suggested to separately manage the
          lifetimes of the two kinds of netconsole_target objects -- those created
          via configfs mkdir(2) from userspace and those specified from the
          boot/module option string.  This adds complexity and some redundancy here
          and also means that boot/module param-created targets are not exposed
          through the configfs namespace (and hence cannot be updated / destroyed
          dynamically).  However, this saves us from locking / refcounting
          complexities that would need to be introduced in configfs to support
          kernel-initiated item creation / destroy there.
      
      (2) In configfs, item creation takes place in the call chain of the
          mkdir(2) syscall in the driver subsystem.  If we used an ioctl(2) to
          create / destroy objects from userspace, the special userspace program is
          able to fill out the structure to be passed into the ioctl and hence
          specify attributes such as local interface that are required at the time
          we set up the netpoll.  For configfs, this information is not available at
          the time of mkdir(2).  So, we keep all newly-created targets (via
          configfs) disabled by default.  The user is expected to set various
          attributes appropriately (including the local network interface if
          required) and then write(2) "1" to the "enabled" attribute.  Thus,
          netpoll_setup() is then called on the set parameters in the context of
          _this_ write(2) on the "enabled" attribute itself.  This design enables
          the user to reconfigure existing netconsole targets at runtime to be
          attached to newly-come-up interfaces that may not have existed when
          netconsole was loaded or when the targets were actually created.  All this
          effectively enables us to get rid of custom ioctls.
      
      (3) Ultra-paranoid configfs attribute show() and store() operations, with
          sanity and input range checking, using only safe string primitives, and
          compliant with the recommendations in Documentation/filesystems/sysfs.txt.
      
      (4) A new function netpoll_print_options() is created in the netpoll API,
          that just prints out the configured parameters for a netpoll structure.
          netpoll_parse_options() is modified to use that and it is also exported to
          be used from netconsole.
      Signed-off-by: NSatyam Sharma <satyam@infradead.org>
      Acked-by: NKeiichi Kii <k-keiichi@bx.jp.nec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0bcc1816
    • S
      [NET]: Make NAPI polling independent of struct net_device objects. · bea3348e
      Stephen Hemminger 提交于
      Several devices have multiple independant RX queues per net
      device, and some have a single interrupt doorbell for several
      queues.
      
      In either case, it's easier to support layouts like that if the
      structure representing the poll is independant from the net
      device itself.
      
      The signature of the ->poll() call back goes from:
      
      	int foo_poll(struct net_device *dev, int *budget)
      
      to
      
      	int foo_poll(struct napi_struct *napi, int budget)
      
      The caller is returned the number of RX packets processed (or
      the number of "NAPI credits" consumed if you want to get
      abstract).  The callee no longer messes around bumping
      dev->quota, *budget, etc. because that is all handled in the
      caller upon return.
      
      The napi_struct is to be embedded in the device driver private data
      structures.
      
      Furthermore, it is the driver's responsibility to disable all NAPI
      instances in it's ->stop() device close handler.  Since the
      napi_struct is privatized into the driver's private data structures,
      only the driver knows how to get at all of the napi_struct instances
      it may have per-device.
      
      With lots of help and suggestions from Rusty Russell, Roland Dreier,
      Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
      
      Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
      Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
      
      [ Ported to current tree and all drivers converted.  Integrated
        Stephen's follow-on kerneldoc additions, and restored poll_list
        handling to the old style to fix mutual exclusion issues.  -DaveM ]
      Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bea3348e
  19. 17 7月, 2007 1 次提交
  20. 12 7月, 2007 1 次提交
  21. 11 7月, 2007 2 次提交
  22. 06 7月, 2007 1 次提交
    • J
      [NETPOLL]: Fixups for 'fix soft lockup when removing module' · 25442caf
      Jarek Poplawski 提交于
      >From my recent patch:
      
      > >    #1
      > >    Until kernel ver. 2.6.21 (including) cancel_rearming_delayed_work()
      > >    required a work function should always (unconditionally) rearm with
      > >    delay > 0 - otherwise it would endlessly loop. This patch replaces
      > >    this function with cancel_delayed_work(). Later kernel versions don't
      > >    require this, so here it's only for uniformity.
      
      But Oleg Nesterov <oleg@tv-sign.ru> found:
      
      > But 2.6.22 doesn't need this change, why it was merged?
      > 
      > In fact, I suspect this change adds a race,
      ...
      
      His description was right (thanks), so this patch reverts #1.
      Signed-off-by: NJarek Poplawski <jarkao2@o2.pl>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25442caf
  23. 29 6月, 2007 1 次提交
  24. 27 6月, 2007 1 次提交
  25. 09 5月, 2007 1 次提交
  26. 26 4月, 2007 3 次提交