1. 30 Nov, 2013 2 commits
    • e1000: fix lockdep warning in e1000_reset_task · b2f963bf
      Vladimir Davydov authored
      The patch fixes the following lockdep warning, which is 100%
      reproducible on network restart:
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.12.0+ #47 Tainted: GF
      -------------------------------------------------------
      kworker/1:1/27 is trying to acquire lock:
       ((&(&adapter->watchdog_task)->work)){+.+...}, at: [<ffffffff8108a5b0>] flush_work+0x0/0x70
      
      but task is already holding lock:
       (&adapter->mutex){+.+...}, at: [<ffffffffa0177c0a>] e1000_reset_task+0x4a/0xa0 [e1000]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&adapter->mutex){+.+...}:
             [<ffffffff810bdb5d>] lock_acquire+0x9d/0x120
             [<ffffffff816b8cbc>] mutex_lock_nested+0x4c/0x390
             [<ffffffffa017233d>] e1000_watchdog+0x7d/0x5b0 [e1000]
             [<ffffffff8108b972>] process_one_work+0x1d2/0x510
             [<ffffffff8108ca80>] worker_thread+0x120/0x3a0
             [<ffffffff81092c1e>] kthread+0xee/0x110
             [<ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0
      
      -> #0 ((&(&adapter->watchdog_task)->work)){+.+...}:
             [<ffffffff810bd9c0>] __lock_acquire+0x1710/0x1810
             [<ffffffff810bdb5d>] lock_acquire+0x9d/0x120
             [<ffffffff8108a5eb>] flush_work+0x3b/0x70
             [<ffffffff8108b5d8>] __cancel_work_timer+0x98/0x140
             [<ffffffff8108b693>] cancel_delayed_work_sync+0x13/0x20
             [<ffffffffa0170cec>] e1000_down_and_stop+0x3c/0x60 [e1000]
             [<ffffffffa01775b1>] e1000_down+0x131/0x220 [e1000]
             [<ffffffffa0177c12>] e1000_reset_task+0x52/0xa0 [e1000]
             [<ffffffff8108b972>] process_one_work+0x1d2/0x510
             [<ffffffff8108ca80>] worker_thread+0x120/0x3a0
             [<ffffffff81092c1e>] kthread+0xee/0x110
             [<ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&adapter->mutex);
                                     lock((&(&adapter->watchdog_task)->work));
                                     lock(&adapter->mutex);
        lock((&(&adapter->watchdog_task)->work));
      
       *** DEADLOCK ***
      
      3 locks held by kworker/1:1/27:
       #0:  (events){.+.+.+}, at: [<ffffffff8108b906>] process_one_work+0x166/0x510
       #1:  ((&adapter->reset_task)){+.+...}, at: [<ffffffff8108b906>] process_one_work+0x166/0x510
       #2:  (&adapter->mutex){+.+...}, at: [<ffffffffa0177c0a>] e1000_reset_task+0x4a/0xa0 [e1000]
      
      stack backtrace:
      CPU: 1 PID: 27 Comm: kworker/1:1 Tainted: GF            3.12.0+ #47
      Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0501    05/31/2007
      Workqueue: events e1000_reset_task [e1000]
       ffffffff820f6000 ffff88007b9dba98 ffffffff816b54a2 0000000000000002
       ffffffff820f5e50 ffff88007b9dbae8 ffffffff810ba936 ffff88007b9dbac8
       ffff88007b9dbb48 ffff88007b9d8f00 ffff88007b9d8780 ffff88007b9d8f00
      Call Trace:
       [<ffffffff816b54a2>] dump_stack+0x49/0x5f
       [<ffffffff810ba936>] print_circular_bug+0x216/0x310
       [<ffffffff810bd9c0>] __lock_acquire+0x1710/0x1810
       [<ffffffff8108a5b0>] ? __flush_work+0x250/0x250
       [<ffffffff810bdb5d>] lock_acquire+0x9d/0x120
       [<ffffffff8108a5b0>] ? __flush_work+0x250/0x250
       [<ffffffff8108a5eb>] flush_work+0x3b/0x70
       [<ffffffff8108a5b0>] ? __flush_work+0x250/0x250
       [<ffffffff8108b5d8>] __cancel_work_timer+0x98/0x140
       [<ffffffff8108b693>] cancel_delayed_work_sync+0x13/0x20
       [<ffffffffa0170cec>] e1000_down_and_stop+0x3c/0x60 [e1000]
       [<ffffffffa01775b1>] e1000_down+0x131/0x220 [e1000]
       [<ffffffffa0177c12>] e1000_reset_task+0x52/0xa0 [e1000]
       [<ffffffff8108b972>] process_one_work+0x1d2/0x510
       [<ffffffff8108b906>] ? process_one_work+0x166/0x510
       [<ffffffff8108ca80>] worker_thread+0x120/0x3a0
       [<ffffffff8108c960>] ? manage_workers+0x2c0/0x2c0
       [<ffffffff81092c1e>] kthread+0xee/0x110
       [<ffffffff81092b30>] ? __init_kthread_worker+0x70/0x70
       [<ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0
       [<ffffffff81092b30>] ? __init_kthread_worker+0x70/0x70
      
      == The issue background ==
      
      The problem occurs because e1000_down(), which is called under
      adapter->mutex by e1000_reset_task(), tries to synchronously cancel
      e1000 auxiliary works (reset_task, watchdog_task, phy_info_task,
      fifo_stall_task), which take adapter->mutex in their handlers. So the
      question is: what does adapter->mutex protect there?
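
The cycle in the report can be reproduced with a toy dependency graph. This is a userspace model, not kernel code: the lock classes and the functions `dep_add()` and `dep_reaches()` are hypothetical names made up for illustration, and real lockdep tracks far more state than this.

```c
#include <stdbool.h>

/* Hypothetical lock classes standing in for the two locks in the report. */
enum lock_class { ADAPTER_MUTEX, WATCHDOG_WORK, NR_CLASSES };

/* dep[a][b] is true once "b was acquired while a was held" has been seen. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can `to` be reached from `from` via recorded edges? */
bool dep_reaches(enum lock_class from, enum lock_class to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_CLASSES; next++) {
        if (dep[from][next] && !seen[next] &&
            dep_reaches((enum lock_class)next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record "`acquired` was taken while `held` was held".
 * Returns false (and records nothing) if the new edge would close a cycle.
 */
bool dep_add(enum lock_class held, enum lock_class acquired)
{
    bool seen[NR_CLASSES] = { false };

    if (dep_reaches(acquired, held, seen))
        return false;   /* `acquired` already leads back to `held` */
    dep[held][acquired] = true;
    return true;
}
```

In this model, `dep_add(WATCHDOG_WORK, ADAPTER_MUTEX)` corresponds to chain entry #1 (the watchdog handler taking the mutex), and a later `dep_add(ADAPTER_MUTEX, WATCHDOG_WORK)` corresponds to entry #0 (flushing the work under the mutex), which is rejected as circular.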
      
      The adapter->mutex was introduced by commit 0ef4ee ("e1000: convert to
      private mutex from rtnl") as a replacement for rtnl_lock() taken in the
      asynchronous handlers. It was aimed at fixing a similar lockdep warning
      issued when e1000_down() was called under rtnl_lock(), and it fixed it,
      but unfortunately it introduced the lockdep warning described above.
      That said, the source of this bug is that the asynchronous works
      were made to take rtnl_lock() some time ago, so let's look deeper and
      find out why it was added there.
      
      The rtnl_lock() was added to asynchronous handlers by commit 338c15
      ("e1000: fix occasional panic on unload") in order to prevent
      asynchronous handlers from executing after the module is unloaded
      (e1000_down() is called), as follows from the commit message:
      
      > Net drivers in general have an issue where timers fired
      > by mod_timer or work threads with schedule_work are running
      > outside of the rtnl_lock.
      >
      > With no other lock protection these routines are vulnerable
      > to races with driver unload or reset paths.
      >
      > The longer term solution to this might be a redesign with
      > safer locks being taken in the driver to guarantee no
      > reentrance, but for now a safe and effective fix is
      > to take the rtnl_lock in these routines.
      
      I'm not sure if this locking scheme fixed the problem or just made it
      unlikely, although I lean toward the latter. Anyway, this was a long time
      ago, when e1000 auxiliary works were implemented as timers scheduling
      real work handlers in their routines. The e1000_down() function only
      canceled the timers, but left the real handlers running if they were
      running, which could result in work execution after module unload.
      Today, the e1000 driver uses sane delayed works instead of the
      timer+work pair to implement its delayed asynchronous handlers, and
      e1000_down() synchronously cancels all the works, so the problem
      that commit 338c15 tried to cope with has disappeared, and we don't need
      any locks in the handlers any more. Moreover, any locking there can
      potentially result in a deadlock.
      
      So, this patch reverts commits 0ef4ee and 338c15.
      
      Fixes: 0ef4eedc ("e1000: convert to private mutex from rtnl")
      Fixes: 338c15e4 ("e1000: fix occasional panic on unload")
      Cc: Tushar Dave <tushar.n.dave@intel.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • e1000: prevent oops when adapter is being closed and reset simultaneously · 6a7d64e3
      yzhu1 authored
      This change is based on a similar change made to e1000e support in
      commit bb9e44d0 ("e1000e: prevent oops when adapter is being closed
      and reset simultaneously").  The same issue has also been observed
      on the older e1000 cards.
      
      Here, we have increased the RESET_COUNT value to 50 because stress
      tests generate too many accesses to the e1000 NIC for a RESET_COUNT
      of 25 to be enough. Experimentation has shown that a RESET_COUNT of
      50 is sufficient.
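
A bounded wait of this kind might look like the sketch below. It is a userspace model with hypothetical names (`wait_for_reset`, `countdown_in_progress`, `reset_finishes_within`); the driver's actual loop, the flag it tests, and the sleep between polls are not shown in this commit message, so only the RESET_COUNT values 25 and 50 come from the text.

```c
#include <stdbool.h>

#define RESET_COUNT 50  /* value chosen by the commit; 25 was too small */

/* Hypothetical callback: returns true while a reset is still in progress. */
typedef bool (*reset_in_progress_fn)(void *ctx);

/*
 * Poll until the reset finishes or max_polls attempts are used up.
 * Returns true if the reset was seen to finish, false if we gave up.
 * A real driver would sleep between polls; that is omitted here.
 */
bool wait_for_reset(reset_in_progress_fn in_progress, void *ctx, int max_polls)
{
    while (max_polls-- > 0) {
        if (!in_progress(ctx))
            return true;
    }
    return false;
}

/* Model of a reset that stays busy for the first *ctx polls. */
bool countdown_in_progress(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return true;
    }
    return false;
}

/* Convenience wrapper: does a reset lasting busy_polls finish in time? */
bool reset_finishes_within(int busy_polls, int max_polls)
{
    int remaining = busy_polls;

    return wait_for_reset(countdown_in_progress, &remaining, max_polls);
}
```

With a simulated reset that stays busy for 30 polls, a limit of 25 gives up, while a limit of `RESET_COUNT` waits long enough.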
      Signed-off-by: yzhu1 <yanjun.zhu@windriver.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  2. 25 Sep, 2013 1 commit
    • intel: Remove extern from function prototypes · 5ccc921a
      Joe Perches authored
      There is a mix of function prototypes with and without extern
      in the kernel sources.  Standardize on not using extern for
      function prototypes.
      
      Function prototypes don't need to be written with extern.
      extern is assumed by the compiler.  Its use is as unnecessary as
      using auto to declare automatic/local variables in a block.
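
As a quick illustration (the function name `demo_add` is made up for this example), the two declarations below mean the same thing to the compiler and can coexist in one translation unit:

```c
/* Both lines declare the same function with external linkage. */
extern int demo_add(int a, int b);  /* old style: explicit extern */
int demo_add(int a, int b);         /* preferred: extern is implied */

/* The single definition satisfies both declarations. */
int demo_add(int a, int b)
{
    return a + b;
}
```

This applies to function prototypes only; for object declarations at file scope, `extern` still distinguishes a declaration from a tentative definition.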
      Signed-off-by: Joe Perches <joe@perches.com>
  3. 16 Feb, 2013 1 commit
  4. 07 Feb, 2012 1 commit
  5. 07 Oct, 2011 2 commits
  6. 27 Aug, 2011 1 commit
    • e1000: save skb counts in TX to avoid cache misses · 31c15a2f
      Dean Nelson authored
      Virtual Machines with emulated e1000 network adapter running on Parallels'
      server were seeing kernel panics due to the e1000 driver dereferencing an
      unexpected NULL pointer retrieved from buffer_info->skb.
      
      The problem has been addressed for the e1000e driver, but not for the e1000.
      Since the two drivers share similar code in the affected area, a port of the
      following e1000e driver commit solves the issue for the e1000 driver:
      
      commit 9ed318d5
      Author: Tom Herbert <therbert@google.com>
      Date:   Wed May 5 14:02:27 2010 +0000
      
          e1000e: save skb counts in TX to avoid cache misses
      
          In e1000_tx_map, precompute number of segments and bytecounts which
          are derived from fields in skb; these are stored in buffer_info.  When
          cleaning tx in e1000_clean_tx_irq use the values in the associated
          buffer_info for statistics counting, this eliminates cache misses
          on skb fields.
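
The idea can be sketched in userspace as follows. All `demo_` names are hypothetical stand-ins, not the driver's structures, and the byte-count formula (one extra copy of the headers per additional segment) is an assumption about how segmented packets are accounted for rather than something stated in the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the skb fields the statistics depend on. */
struct demo_skb {
    unsigned int len;        /* total packet length in bytes */
    unsigned int headlen;    /* length of the header portion */
    unsigned short gso_segs; /* segment count, 0 if not segmented */
};

/* Hypothetical per-descriptor bookkeeping, modeled on buffer_info. */
struct demo_buffer_info {
    struct demo_skb *skb;
    unsigned int segs;
    unsigned int bytecount;
};

struct demo_stats {
    unsigned long packets;
    unsigned long bytes;
};

/* Map time: the skb is being touched anyway, so derive the counts now. */
void demo_tx_map(struct demo_buffer_info *bi, struct demo_skb *skb)
{
    unsigned int segs = skb->gso_segs ? skb->gso_segs : 1;

    bi->skb = skb;
    bi->segs = segs;
    /* Assumed accounting: each extra segment repeats the headers. */
    bi->bytecount = (segs - 1) * skb->headlen + skb->len;
}

/* Clean time: read only buffer_info, never dereference bi->skb. */
void demo_clean_tx(struct demo_buffer_info *bi, struct demo_stats *st)
{
    st->packets += bi->segs;
    st->bytes += bi->bytecount;
    bi->skb = NULL;
}

/* Convenience wrapper: map and clean one packet, return bytes counted. */
unsigned long demo_bytes_for_packet(unsigned int len, unsigned int headlen,
                                    unsigned short gso_segs)
{
    struct demo_skb skb = { len, headlen, gso_segs };
    struct demo_buffer_info bi = { NULL, 0, 0 };
    struct demo_stats st = { 0, 0 };

    demo_tx_map(&bi, &skb);
    demo_clean_tx(&bi, &st);
    return st.bytes;
}
```

Because the clean path reads only the cached counts, it no longer depends on the skb pointer being valid when statistics are updated.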
      Signed-off-by: Dean Nelson <dnelson@redhat.com>
      Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 11 Aug, 2011 1 commit
  8. 22 Jul, 2011 1 commit
  9. 07 May, 2011 1 commit
  10. 30 Apr, 2011 1 commit
  11. 24 Sep, 2010 1 commit
  12. 27 Jul, 2010 1 commit
  13. 20 Jul, 2010 1 commit
  14. 28 Apr, 2010 1 commit
  15. 27 Mar, 2010 1 commit
  16. 04 Feb, 2010 1 commit
  17. 21 Jan, 2010 1 commit
  18. 03 Dec, 2009 1 commit
  19. 08 Oct, 2009 1 commit
  20. 27 Sep, 2009 1 commit
  21. 07 Jul, 2009 2 commits
  22. 22 Jan, 2009 1 commit
  23. 04 Dec, 2008 1 commit
  24. 25 Sep, 2008 1 commit
  25. 23 Jul, 2008 5 commits
  26. 17 Apr, 2008 1 commit
  27. 26 Mar, 2008 2 commits
  28. 31 Oct, 2007 1 commit
  29. 11 Oct, 2007 1 commit
    • [NET]: Make NAPI polling independent of struct net_device objects. · bea3348e
      Stephen Hemminger authored
      Several devices have multiple independent RX queues per net
      device, and some have a single interrupt doorbell for several
      queues.
      
      In either case, it's easier to support layouts like that if the
      structure representing the poll is independent from the net
      device itself.
      
      The signature of the ->poll() call back goes from:
      
      	int foo_poll(struct net_device *dev, int *budget)
      
      to
      
      	int foo_poll(struct napi_struct *napi, int budget)
      
      The caller is returned the number of RX packets processed (or
      the number of "NAPI credits" consumed if you want to get
      abstract).  The callee no longer messes around bumping
      dev->quota, *budget, etc. because that is all handled in the
      caller upon return.
      
      The napi_struct is to be embedded in the device driver private data
      structures.
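
The new calling convention and the embedding can be modeled in userspace as below. Every name prefixed with `demo_` is hypothetical, and `demo_container_of` is a simplified stand-in for the kernel's `container_of()`; a real driver would use `struct napi_struct` and the kernel's NAPI helpers instead.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical poll context, playing the role of struct napi_struct. */
struct demo_napi {
    int (*poll)(struct demo_napi *napi, int budget);
};

/* Driver-private data embeds the poll context. */
struct demo_adapter {
    int rx_pending;          /* packets waiting in the RX ring */
    struct demo_napi napi;
};

/* New-style poll: takes the context and a budget, returns work done. */
int demo_poll(struct demo_napi *napi, int budget)
{
    /* Recover the enclosing private structure from the embedded member. */
    struct demo_adapter *adapter =
        demo_container_of(napi, struct demo_adapter, napi);
    int done = 0;

    while (done < budget && adapter->rx_pending > 0) {
        adapter->rx_pending--;   /* "process" one packet */
        done++;
    }
    return done;                 /* the caller does the quota accounting */
}

/* Convenience wrapper: poll once with the given backlog and budget. */
int demo_poll_once(int rx_pending, int budget)
{
    struct demo_adapter adapter = { .rx_pending = rx_pending };

    adapter.napi.poll = demo_poll;
    return adapter.napi.poll(&adapter.napi, budget);
}
```

The poll routine never sees a net device here: it gets from the poll context to its own private data, which is what lets one device carry several independent poll contexts.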
      
      Furthermore, it is the driver's responsibility to disable all NAPI
      instances in its ->stop() device close handler.  Since the
      napi_struct is privatized into the driver's private data structures,
      only the driver knows how to get at all of the napi_struct instances
      it may have per-device.
      
      With lots of help and suggestions from Rusty Russell, Roland Dreier,
      Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
      
      Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
      Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
      
      [ Ported to current tree and all drivers converted.  Integrated
        Stephen's follow-on kerneldoc additions, and restored poll_list
        handling to the old style to fix mutual exclusion issues.  -DaveM ]
      Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. 18 May, 2007 1 commit
  31. 28 Apr, 2007 1 commit
  32. 18 Feb, 2007 1 commit