1. 29 6月, 2006 2 次提交
    • B
      [SCSI] scsi: Device scanning oops for offlined devices (resend) · 309bd271
      Brian King 提交于
      If a device gets offlined as a result of the Inquiry sent
      during scanning, the following oops can occur. After the
      disk gets put into the SDEV_OFFLINE state, the error handler
      sends back the failed inquiry, which wakes the thread doing
      the scan. This starts a race between the scanning thread
      freeing the scsi device and the error handler calling
      scsi_run_host_queues to restart the host. Since the disk
      is in the SDEV_OFFLINE state, scsi_device_get will still
      work, which results in __scsi_iterate_devices getting
      a reference to the scsi disk when it shouldn't.
      
      The following execution thread causes the oops:
      
      CPU 0 (scan)				CPU 1 (eh)
      
      ---------------------------------------------------------
      scsi_probe_and_add_lun
                              ....
                                              scsi_eh_offline_sdevs
                                              scsi_eh_flush_done_q
      scsi_destroy_sdev
      scsi_device_dev_release
                                              scsi_restart_operations
                                               scsi_run_host_queues
                                                __scsi_iterate_devices
                                                 get_device
      scsi_device_dev_release_usercontext
                                                scsi_run_queue
                                                  <---OOPS--->
      
      The patch fixes this by changing the state of the sdev to SDEV_DEL
      before doing the final put_device, which should prevent the race
      from occurring.
      
      Original oops follows:
      
      Badness in kref_get at lib/kref.c:32
      Call Trace:
      [C00000002F4476D0] [C00000000000EE20] .show_stack+0x68/0x1b0 (unreliable)
      [C00000002F447770] [C00000000037515C] .program_check_exception+0x1cc/0x5a8
      [C00000002F447840] [C00000000000446C] program_check_common+0xec/0x100
       Exception: 700 at .kref_get+0x10/0x28
          LR = .kobject_get+0x20/0x3c
      [C00000002F447B30] [C00000002F447BC0] 0xc00000002f447bc0 (unreliable)
      [C00000002F447BB0] [C000000000254BDC] .get_device+0x20/0x3c
      [C00000002F447C30] [D000000000063188] .scsi_device_get+0x34/0xdc [scsi_mod]
      [C00000002F447CC0] [D0000000000633EC] .__scsi_iterate_devices+0x50/0xbc [scsi_mod]
      [C00000002F447D60] [D00000000006A910] .scsi_run_host_queues+0x34/0x5c [scsi_mod]
      [C00000002F447DF0] [D000000000069054] .scsi_error_handler+0xdb4/0xe44 [scsi_mod]
      [C00000002F447EE0] [C00000000007B4E0] .kthread+0x128/0x178
      [C00000002F447F90] [C000000000025E84] .kernel_thread+0x4c/0x68
      Unable to handle kernel paging request for <7>PCI: Enabling device: (0002:41:01.1), cmd 143
      data at address 0x000001b8
      Faulting instruction address: 0xd0000000000698e4
      sym1: <1010-66> rev 0x1 at pci 0002:41:01.1 irq 216
      sym1: No NVRAM, ID 7, Fast-80, LVD, parity checking
      sym1: SCSI BUS has been reset.
      scsi2 : sym-2.2.2
      cpu 0x0: Vector: 300 (Data Access) at [c00000002f447a30]
          pc: d0000000000698e4: .scsi_run_queue+0x2c/0x218 [scsi_mod]
          lr: d00000000006a904: .scsi_run_host_queues+0x28/0x5c [scsi_mod]
          sp: c00000002f447cb0
         msr: 9000000000009032
         dar: 1b8
       dsisr: 40000000
        current = 0xc0000000045fecd0
        paca    = 0xc00000000048ee80
          pid   = 1123, comm = scsi_eh_1
      enter ? for help
      [c00000002f447d60] d00000000006a904 .scsi_run_host_queues+0x28/0x5c [scsi_mod]
      [c00000002f447df0] d000000000069054 .scsi_error_handler+0xdb4/0xe44 [scsi_mod]
      [c00000002f447ee0] c00000000007b4e0 .kthread+0x128/0x178
      [c00000002f447f90] c000000000025e84 .kernel_thread+0x4c/0x68
      Signed-off-by: NBrian King <brking@us.ibm.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@SteelEye.com>
      309bd271
    • B
      [SCSI] scsi: Add allow_restart sysfs class attribute · a144c5ae
      Brian King 提交于
      This is a resend of a patch I generated in response to an email sent
      by Ruben Faelens <parasietje@gmail.com>. His original email to
      linux-scsi requested a method in which he could spin down a scsi disk
      when not in use and have the kernel automatically spin it back up when
      an I/O was generated to the disk. The infrastructure to automatically
      spin a disk up has been in the scsi error handler for some time now,
      but it is not enabled by default. This patch adds an sd sysfs attribute
      which allows userspace to enable this behavior.
      Signed-off-by: NBrian King <brking@us.ibm.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@SteelEye.com>
      a144c5ae
  2. 28 6月, 2006 1 次提交
    • J
      [SCSI] fc transport: bug fix: correct references · 3bdad7bd
      James Smart 提交于
      Original post was incorrect as it didn't realize that we already had
      a self-referenc due to device_initialize(), and we were really only
      missing the put on our own reference. This was hidden by the other bug
      which had the midlayer reusing stargets after they were already free,
      which was doing too many puts on our rport.
      
      Updating FC transport for:
      - Add put in fc_rport_final_delete(), to release the rport.
        Prior, we were leaving the rport with a reference, thus the shost
        with references, etc. If the driver was unloaded, shosts and rports
        remained, along with work threads, etc
      - Fix fc_rport_create failure path - too many put's on parent
      - Add commenting to easily track ref taking.
      Signed-off-by: NJames Smart <james.smart@emulex.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@SteelEye.com>
      3bdad7bd
  3. 27 6月, 2006 22 次提交
  4. 26 6月, 2006 15 次提交