1. 25 Mar, 2009 1 commit
  2. 28 Jan, 2009 1 commit
  3. 07 Jan, 2009 5 commits
  4. 24 Dec, 2008 1 commit
    • H
      edac: fix edac core deadlock when removing a device · d519c8d9
      Harry Ciao committed
      When deleting an edac device, we have to wait for its edac_dev.work to
      complete before deleting the whole edac_dev structure.  Since we have no
      idea which work in the current edac_poller's workqueue is the one we are
      concerned about, we wait for all work in the edac_poller's workqueue to be
      processed.  This is done via flush_cpu_workqueue(), which inserts a
      wq_barrier at the tail of the workqueue and then sleeps on the
      completion of this wq_barrier.  The edac_poller wakes up the sleepers
      when it reaches the wq_barrier.
      
      EDAC core creates only one kernel worker thread, edac_poller, to run the
      work items of all current edac devices.  They share the same callback
      function, edac_device_workq_function(), which grabs device_ctls_mutex
      before it checks the device.  This is exactly where edac_poller and
      rmmod have a good chance of deadlocking.
      
      In the following call chain:

      rmmod > ... >
      edac_device_del_device >
      edac_device_workq_teardown > flush_workqueue > flush_cpu_workqueue

      device_ctls_mutex has already been grabbed by
      edac_device_del_device().  So, on one hand, rmmod sleeps on the
      completion of a wq_barrier while holding device_ctls_mutex; on the other
      hand, edac_poller blocks on the same mutex when it runs any of the works
      of the existing edac devices (note that this edac_dev.work is likely to
      be totally unrelated to the one being removed right now) and never gets
      a chance to run the work of the above wq_barrier to wake rmmod up.
      
      edac_device_workq_teardown() should not be called within the critical
      region of device_ctls_mutex.  This matches what is done in
      edac_pci_del_device() and edac_mc_del_mc(), where
      edac_pci_workq_teardown() and edac_mc_workq_teardown() are called after
      the related mutexes are released.
      
      Moreover, an edac_dev.work should first check whether its device is being
      removed.  If so, it should bail out immediately.  Since not all existing
      edac devices are being removed, this "shutting down" flag should be
      confined to the edac device being removed.  The existing
      edac_dev.op_state can serve this purpose.
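
The locking rule described above can be illustrated with a small user-space sketch. This is not the kernel patch: it assumes POSIX threads as a stand-in for the workqueue, and the names fake_dev, ctls_mutex, poll_work, fake_dev_add, fake_dev_del and run_remove_demo are hypothetical. pthread_join() plays the role of flush_workqueue(), and the op_state field mirrors the idea of using edac_dev.op_state as the per-device "being removed" flag.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for edac_dev.op_state. */
enum op_state { OP_RUNNING, OP_OFFLINE };

struct fake_dev {
    enum op_state op_state;
    int polls;          /* how many times the work really polled */
    pthread_t poller;   /* stands in for the edac_poller thread */
};

/* Stands in for device_ctls_mutex. */
static pthread_mutex_t ctls_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Work function: take the mutex first, then bail out if the device is going away. */
static void *poll_work(void *arg)
{
    struct fake_dev *dev = arg;

    for (;;) {
        pthread_mutex_lock(&ctls_mutex);
        if (dev->op_state == OP_OFFLINE) {
            pthread_mutex_unlock(&ctls_mutex);
            return NULL;
        }
        dev->polls++;
        pthread_mutex_unlock(&ctls_mutex);
        sched_yield();
    }
}

static int fake_dev_add(struct fake_dev *dev)
{
    dev->op_state = OP_RUNNING;
    dev->polls = 0;
    return pthread_create(&dev->poller, NULL, poll_work, dev);
}

static int fake_dev_del(struct fake_dev *dev)
{
    /* Mark the device as being removed inside the critical region. */
    pthread_mutex_lock(&ctls_mutex);
    dev->op_state = OP_OFFLINE;
    pthread_mutex_unlock(&ctls_mutex);

    /*
     * Wait for the worker only after the mutex is dropped. Joining while
     * still holding ctls_mutex would block forever, because the worker
     * needs the same mutex before it can notice OP_OFFLINE and exit.
     */
    return pthread_join(dev->poller, NULL);
}

/* Returns 0 if a device can be added and removed without hanging. */
int run_remove_demo(void)
{
    struct fake_dev dev;

    if (fake_dev_add(&dev) != 0)
        return -1;
    if (fake_dev_del(&dev) != 0)
        return -1;
    assert(dev.op_state == OP_OFFLINE);
    return 0;
}
```

Calling run_remove_demo() from a main() built with -pthread should return 0. Moving the pthread_join() call above the pthread_mutex_unlock() in fake_dev_del() reproduces the shape of the hang described in this commit message.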
      
      The original deadlock and the fix have been observed and tested on
      actual hardware.  Without the fix, running rmmod on an edac driver
      results in the deadlock below:
      
      root@localhost:/root> rmmod mv64x60_edac
      EDAC DEBUG: mv64x60_dma_err_remove()
      EDAC DEBUG: edac_device_del_device()
      EDAC DEBUG: find_edac_device_by_dev()
      
      (hang for a moment)
      
      INFO: task edac-poller:2030 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      edac-poller   D 00000000     0  2030      2
      Call Trace:
      [df159dc0] [c0071e3c] free_hot_cold_page+0x17c/0x304 (unreliable)
      [df159e80] [c000a024] __switch_to+0x6c/0xa0
      [df159ea0] [c03587d8] schedule+0x2f4/0x4d8
      [df159f00] [c03598a8] __mutex_lock_slowpath+0xa0/0x174
      [df159f40] [e1030434] edac_device_workq_function+0x28/0xd8 [edac_core]
      [df159f60] [c003beb4] run_workqueue+0x114/0x218
      [df159f90] [c003c674] worker_thread+0x5c/0xc8
      [df159fd0] [c004106c] kthread+0x5c/0xa0
      [df159ff0] [c0013538] original_kernel_thread+0x44/0x60
      INFO: task rmmod:2062 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      rmmod         D 0ff2c9fc     0  2062   1839
      Call Trace:
      [df119c00] [c0437a74] 0xc0437a74 (unreliable)
      [df119cc0] [c000a024] __switch_to+0x6c/0xa0
      [df119ce0] [c03587d8] schedule+0x2f4/0x4d8
      [df119d40] [c03591dc] schedule_timeout+0xb0/0xf4
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 23 Dec, 2008 1 commit
    • B
      powerpc/cell: add QPACE as a separate Cell platform · def434c2
      Benjamin Krill committed
      Since the QPACE (Quantum Chromodynamics Parallel Computing on the
      Cell Broadband Engine) platform does not use an IOMMU and has
      neither PCI devices nor an MPIC, much less setup and
      configuration is needed. So far, all devices are detected
      as OF devices. A notifier function is used to set the dma_ops
      for the of_platform bus. Furthermore, this patch splits
      PPC_CELL_NATIVE into PPC_CELL_COMMON, which holds the parts
      shared with the QPACE platform, and the rest.
      Signed-off-by: Benjamin Krill <ben@codiert.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  6. 02 Dec, 2008 2 commits
    • J
      i82875p_edac: fix module remove · 09a81269
      Jarkko Lavinen committed
      Fix module removal bugs in i82875p_edac.  The i82975x_edac code also
      seems to have the same module removal bugs as i82875p_edac.
      
      The problems were:
      
      1. On module removal, i82875p_remove_one() is never called.

         Variable i82875p_registered is never changed from 1, which
         guarantees i82875p_remove_one() is not called (and even if it were
         called, it would be called in the wrong order).

         As a result, the edac_mc workqueue is not stopped and keeps probing.
         If kernel debugging options are not enabled, the user may not notice
         anything going wrong.

         If debugging options are enabled and I do "rmmod i82875p_edac", I
         get:
      
            edac debug: edac_pci_workq_function() checking
            BUG: unable to handle kernel paging request at f882d16f
            ...
            call trace:
             [<f8834df3>] ? edac_mc_workq_function+0x55/0x7e [edac_core]
             [<c0233974>] ? run_workqueue+0xd7/0x1a5
             [<c023392f>] ? run_workqueue+0x92/0x1a5
             [<f8834d9e>] ? edac_mc_workq_function+0x0/0x7e [edac_core]
             [<c0233af9>] ? worker_thread+0xb7/0xc3
             [<c0236a7b>] ? autoremove_wake_function+0x0/0x33
             [<c0233a42>] ? worker_thread+0x0/0xc3
             [<c0236809>] ? kthread+0x3b/0x61
             [<c02367ce>] ? kthread+0x0/0x61
             [<c0204587>] ? kernel_thread_helper+0x7/0x10
      
         The fix for this is to get rid of the needless variable
         i82875p_registered altogether and run i82875p_remove_one() *before*
         pci_unregister_driver().
      
      2. edac_mc_del_mc() uses mci after freeing mci

         edac_mc_del_mc() calls edac_remove_sysfs_mci_device().  The
         kobject refcount of mci drops to 0 and mci is freed.  After this,
         mci is accessed via a debug print, and i82875p_remove_one() still
         uses mci->pvt and tries to free mci again with edac_mc_free().

         The fix for this is to add kobject_get(&mci->edac_mci_kobj) after
         edac_mc_alloc().  Then mci is still available after returning
         from edac_mc_del_mc() with refcount 1, and mci->pvt is still
         available.  When i82875p_remove_one() finally calls edac_mc_free(),
         this triggers kobject_put() and mci is released properly.
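
The reference-counting fix in item 2 can be sketched in plain user-space C. This is an illustration under stated assumptions, not the driver code: fake_mci, fake_get, fake_put, fake_mc_alloc, fake_mc_del_mc, fake_mc_free and remove_one_demo are hypothetical names, and a simple integer counter stands in for the kobject refcount. The `released` pointer exists only so the demo can observe when the object is freed without touching freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mem_ctl_info with an embedded kobject. */
struct fake_mci {
    int refcount;
    int *released;   /* set to 1 when the object is freed */
    int pvt;         /* stands in for mci->pvt */
};

/* Allocation starts with one reference, as a freshly initialised kobject would. */
static struct fake_mci *fake_mc_alloc(int *released)
{
    struct fake_mci *mci = calloc(1, sizeof(*mci));

    if (!mci)
        return NULL;
    mci->refcount = 1;
    mci->released = released;
    *released = 0;
    return mci;
}

static void fake_get(struct fake_mci *mci)
{
    mci->refcount++;
}

static void fake_put(struct fake_mci *mci)
{
    assert(mci->refcount > 0);
    if (--mci->refcount == 0) {
        *mci->released = 1;
        free(mci);
    }
}

/* Stands in for edac_mc_del_mc(): removing the sysfs entry drops one reference. */
static void fake_mc_del_mc(struct fake_mci *mci)
{
    fake_put(mci);
}

/* Stands in for edac_mc_free(): drops the driver's own reference. */
static void fake_mc_free(struct fake_mci *mci)
{
    fake_put(mci);
}

/* Returns 0 if mci survives the del step and is freed exactly once at the end. */
int remove_one_demo(void)
{
    int released;
    int pvt;
    struct fake_mci *mci = fake_mc_alloc(&released);

    if (!mci)
        return -1;
    fake_get(mci);          /* the extra reference taken right after allocation */
    mci->pvt = 42;

    fake_mc_del_mc(mci);    /* refcount 2 -> 1, object still alive */
    if (released)
        return -1;
    pvt = mci->pvt;         /* safe: the extra reference keeps mci valid */

    fake_mc_free(mci);      /* refcount 1 -> 0, object released here */
    return (released && pvt == 42) ? 0 : -1;
}
```

Without the fake_get() call, fake_mc_del_mc() would already free the object, and the later read of mci->pvt and the second put would be the use-after-free and double free that item 2 describes.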
      Signed-off-by: Jarkko Lavinen <jlavi@iki.fi>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      i82875p_edac: fix overflow device resource setup · 307d1144
      Jarkko Lavinen committed
      When I do "modprobe i82875p_edac" on my Asus P4C800 MB on kernels 2.6.26
      or later, the module load fails due to BAR 0 collision.  On 2.6.25 the
      module loads just fine.
      
      The overflow device on the MB seems to be hidden and its resources are not
      allocated at normal PCI bus init.  Log shows the missing resource problem:
      
        EDAC DEBUG: i82875p_probe1()
        PCI: 0000:00:06.0 reg 10 32bit mmio: [fecf0000, fecf0fff]
        pci 0000:00:06.0: device not available because of BAR 0 [0xfecf0000-0xfecf0fff] collisions
        EDAC i82875p: i82875p_setup_overfl_dev(): Failed to enable overflow device
      
      The patch below fixes this by calling pci_bus_assign_resources() after
      the overflow device is revealed and added to the bus. With this patch
      I am again able to load and use the module.
      Signed-off-by: Jarkko Lavinen <jlavi@iki.fi>
      Cc: Doug Thompson <norsk5@yahoo.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 13 Nov, 2008 1 commit
  8. 31 Oct, 2008 2 commits
  9. 20 Oct, 2008 1 commit
  10. 17 Oct, 2008 4 commits
  11. 24 Aug, 2008 1 commit
  12. 26 Jul, 2008 14 commits
  13. 22 Jul, 2008 1 commit
  14. 25 May, 2008 1 commit
  15. 06 May, 2008 1 commit
  16. 30 Apr, 2008 1 commit
  17. 29 Apr, 2008 2 commits