1. 08 Jul, 2012 1 commit
  2. 23 Apr, 2012 8 commits
    • [SCSI] Revert "[SCSI] libsas: fix sas port naming" · b4698d88
      Committed by Dan Williams
      This reverts commit a692b0ee.
      
      Tom reports:
      
      [    8.741033] ------------[ cut here ]------------
      [    8.741038] WARNING: at fs/sysfs/dir.c:508 sysfs_add_one+0xc1/0xf0()
      [    8.741040] Hardware name: To Be Filled By O.E.M.
      [    8.741041] sysfs: cannot create duplicate filename
      
      ...and missing 2 out of 4 drives connected to mvsas.  Commit a692b0ee
      made the assumption that all the phy ids an lldd registers to libsas are
      unique.  However, in the "multi-chip" case mvsas does a rather annoying
      duplication of phy ids in the array passed to libsas.  So, for example,
      chip0 has phy0-3 at ha phy index 0-3 and chip1 has its phy0-3 at ha phy
      index 4-7.  The more natural model would be to create a scsi_host (and
      sas_ha) per chip (controller), but for now revert the naming fix which
      unfortunately means dealing with unpredictable end-device names for a
      bit longer.
      
      Cc: Xiangliang Yu <yuxiangl@marvell.com>
      Cc: Patrick Thomson <patrick.s.thomson@intel.com>
      Reported-by: Tom Rini <trini@ti.com>
      Tested-by: Tom Rini <trini@ti.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      b4698d88
    • [SCSI] libsas: fix false positive 'device attached' conditions · 7d1d8651
      Committed by Dan Williams
      Normalize phy->attached_sas_addr to return a zero-address in the case
      when device-type == NO_DEVICE or the linkrate is invalid to handle
      expanders that put non-zero sas addresses in the discovery response:
      
       sas: ex 5001b4da000f903f phy02:U:0 attached: 0100000000000000 (no device)
       sas: ex 5001b4da000f903f phy01:U:0 attached: 0100000000000000 (no device)
       sas: ex 5001b4da000f903f phy03:U:0 attached: 0100000000000000 (no device)
       sas: ex 5001b4da000f903f phy00:U:0 attached: 0100000000000000 (no device)
      Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      7d1d8651
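
The normalization described in commit 7d1d8651 can be sketched in user-space C as follows. This is only an illustration under stated assumptions: the struct, the enum values, and the rule that a link rate below 1.5 Gbps counts as invalid are hypothetical stand-ins, not the libsas definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SAS_ADDR_SIZE 8

/* Hypothetical stand-ins for fields of an expander discovery response. */
enum dev_type_sketch { DEV_NONE = 0, DEV_END = 1, DEV_EXPANDER = 2 };
enum linkrate_sketch { RATE_UNKNOWN = 0, RATE_1_5_GBPS = 8, RATE_3_0_GBPS = 9 };

struct ex_phy_sketch {
    enum dev_type_sketch attached_dev_type;
    int linkrate;
    uint8_t attached_sas_addr[SAS_ADDR_SIZE];
};

/*
 * Store the reported address only when a device is really attached.
 * Assumption: any link rate below RATE_1_5_GBPS is treated as invalid.
 */
static void set_attached_addr(struct ex_phy_sketch *phy,
                              const uint8_t reported[SAS_ADDR_SIZE])
{
    assert(phy != NULL);
    if (phy->attached_dev_type == DEV_NONE || phy->linkrate < RATE_1_5_GBPS)
        memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
    else
        memcpy(phy->attached_sas_addr, reported, SAS_ADDR_SIZE);
}

/* Return 1 if every byte of the address is zero. */
static int addr_is_zero(const uint8_t addr[SAS_ADDR_SIZE])
{
    int i;

    for (i = 0; i < SAS_ADDR_SIZE; i++)
        if (addr[i] != 0)
            return 0;
    return 1;
}
```

With this shape, an expander that reports 0100000000000000 for an empty phy no longer looks like an attached device, because later code only has to test for the zero address.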
    • [SCSI] libsas, libata: fix start of life for a sas ata_port · b2024459
      Committed by Dan Williams
      This changes the ordering of initialization and probing events from:
        1/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
        2/ allocate ata_port and schedule port probe in DISCE_PROBE
      ...to:
        1/ allocate ata_port in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
        2/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
        3/ schedule port probe in DISCE_PROBE
      
      This ordering prevents PHYE_SIGNAL_LOSS_EVENTS from sneaking in to
      destroy ata devices before they have been fully initialized:
      
        BUG: unable to handle kernel paging request at 0000000000003b10
        IP: [<ffffffffa0053d7e>] sas_ata_end_eh+0x12/0x5e [libsas]
        ...
        [<ffffffffa004d1af>] sas_unregister_common_dev+0x78/0xc9 [libsas]
        [<ffffffffa004d4d4>] sas_unregister_dev+0x4f/0xad [libsas]
        [<ffffffffa004d5b1>] sas_unregister_domain_devices+0x7f/0xbf [libsas]
        [<ffffffffa004c487>] sas_deform_port+0x61/0x1b8 [libsas]
        [<ffffffffa004bed0>] sas_phye_loss_of_signal+0x29/0x2b [libsas]
      
      ...and kills the awkward "sata domain_device briefly existing in the
      domain without an ata_port" state.
      Reported-by: Michal Kosciowski <michal.kosciowski@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      b2024459
    • [SCSI] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready · 0f3fce5c
      Committed by Dan Williams
      The check_ready implementation in the expander-attached ata device case
      polls on sas_ex_phy_discover().  The effect is that the ex_phy fields
      (critically ->attached_sas_addr) can change.  When ata_eh ends and
      libsas comes along to revalidate the domain,
      sas_unregister_devs_sas_addr() can fail to look up devices to remove, or
      fail to re-add an ata device that ata_eh marked as disabled.  So change
      the code to skip the sas_address and change count updates when ata_eh is
      active.
      
      Cc: Jack Wang <jack_wang@usish.com>
      Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
      Tested-by: Bartek Nowakowski <bartek.nowakowski@intel.com>
      Tested-by: Jacek Danecki <jacek.danecki@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      0f3fce5c
    • [SCSI] libsas: unify domain_device sas_rphy lifetimes · 9487669f
      Committed by Dan Williams
      Since the domain_device can outlive the scsi_target, we need the rphy to
      follow suit; otherwise we run into issues like:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
        IP: [<ffffffffa011561b>] sas_ata_printk+0x43/0x6f [libsas]
        PGD 0
        Oops: 0000 [#1] SMP
        CPU 1
        Modules linked in: ses enclosure isci libsas scsi_transport_sas fuse sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf microcode pcspkr igb joydev iTCO_wdt ioatdma iTCO_vendor_support i2c_i801 i2c_core dca wmi hed ipv6 pata_acpi ata_generic [last unloaded: scsi_wait_scan]
      
        Pid: 129, comm: kworker/u:3 Not tainted 3.3.0-rc5-isci+ #1 Intel Corporation SandyBridge Platform/To be filled by O.E.M.
        RIP: 0010:[<ffffffffa011561b>] [<ffffffffa011561b>] sas_ata_printk+0x43/0x6f [libsas]
        RSP: 0018:ffff88042232dd70 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: ffff8804283165b8 RCX: ffff88042232dda0
        RDX: ffff88042232dd78 RSI: ffff8804283165b8 RDI: ffffffffa01188d7
        RBP: ffff88042232ddd0 R08: ffff880388454000 R09: ffff8803edfde1f8
        R10: ffff8803edfde1f8 R11: ffff8803edfde1f8 R12: ffff880428316750
        R13: ffff880388454000 R14: ffff8803f88b31d0 R15: ffff8803f8b21d50
        FS: 0000000000000000(0000) GS:ffff88042ee20000(0000) knlGS:0000000000000000
        CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000050 CR3: 0000000001a05000 CR4: 00000000000406e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process kworker/u:3 (pid: 129, threadinfo ffff88042232c000, task ffff88042230c920)
        Stack:
        0000000000000000 ffff880400000018 ffff88042232dde0 ffff88042232dda0
        ffffffffa01188c4 ffff88042ee93af0 ffff88042232ddb0 ffffffff8100e047
        ffff88042232de10 ffff880420e5a2c8 ffff8803f8b21d50 ffff8803edfde1f8
        Call Trace:
        [<ffffffff8100e047>] ? load_TLS+0xb/0xf
        [<ffffffffa01156ad>] async_sas_ata_eh+0x66/0x95 [libsas]
        [<ffffffff810655e1>] async_run_entry_fn+0x9e/0x131
      Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
      Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      9487669f
    • [SCSI] libsas: fix sas_get_port_device regression · ec236e52
      Committed by Dan Williams
      Commit 899fcf40 "[SCSI] libsas: set attached device type and target
      protocols for local phys" set up 'phy' to be dereferenced after
      list_for_each_entry(phy, &port->phy_list, port_phy_el) (i.e. phy ==
      &port->phy_list), resulting in reports like:
      
        BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
        IP: [<ffffffffa00ce948>] sas_discover_domain+0x29e/0x4fb [libsas]
      
      ...fix by deferring sas_phy_set_target() to the end of
      sas_get_port_device().
      Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
      Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      ec236e52
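
The hazard behind commit ec236e52 is that a list cursor is no longer a valid entry once a full traversal has finished. A minimal user-space sketch of the safer pattern follows, assuming a hand-rolled circular list rather than the kernel's list.h; `phy_sketch`, `phy_of()`, and `first_enabled_phy()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive circular list, loosely modeled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical stand-in for a phy that sits on a port's phy_list. */
struct phy_sketch {
    int id;
    int enabled;
    struct list_head port_phy_el;
};

/* Map an embedded list node back to its containing phy. */
static struct phy_sketch *phy_of(struct list_head *node)
{
    return (struct phy_sketch *)((char *)node -
                                 offsetof(struct phy_sketch, port_phy_el));
}

/*
 * Return the first enabled phy, or NULL if there is none.  The result is
 * carried in 'found', which is only assigned inside the loop body, so the
 * loop cursor is never dereferenced after the traversal ends.
 */
static struct phy_sketch *first_enabled_phy(struct list_head *phy_list)
{
    struct phy_sketch *found = NULL;
    struct list_head *pos;

    assert(phy_list != NULL);
    for (pos = phy_list->next; pos != phy_list; pos = pos->next) {
        struct phy_sketch *phy = phy_of(pos);

        if (phy->enabled) {
            found = phy;
            break;
        }
    }
    return found;
}
```

The commit itself takes a different route (deferring sas_phy_set_target() to the end of sas_get_port_device()); the sketch only shows why the cursor cannot be trusted after the loop.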
    • [SCSI] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys · 1699490d
      Committed by Thomas Jackson
      If an expander reports 'PHY VACANT' for a phy index prior to the one
      that generated a BCN, libsas fails rediscovery.  Since a vacant phy is
      defined as a valid phy index that will never have an attached device,
      just continue the search.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Thomas Jackson <thomas.p.jackson@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      1699490d
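
The search behavior described in commit 1699490d might look roughly like the sketch below. It assumes a simplified model in which each phy yields a status and a change count; the status enum, the struct, and the return codes are hypothetical and do not mirror the SMP response codes used by libsas.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-phy result of asking the expander for a change count. */
enum phy_status_sketch { PHY_OK = 0, PHY_VACANT = 1, PHY_ERROR = 2 };

struct phy_report_sketch {
    enum phy_status_sketch status;
    int change_count;
};

/*
 * Return the index of the first phy whose change count differs from the
 * cached value, -1 if no phy changed, or -2 on a real error.  A vacant phy
 * is a valid index that will never have a device, so it is skipped instead
 * of ending the search.
 */
static int find_bcast_phy_sketch(const struct phy_report_sketch *reports,
                                 const int *cached, int num_phys)
{
    int i;

    assert(reports != NULL && cached != NULL);
    for (i = 0; i < num_phys; i++) {
        if (reports[i].status == PHY_VACANT)
            continue;               /* keep searching past vacant phys */
        if (reports[i].status != PHY_OK)
            return -2;              /* genuine failure: give up */
        if (reports[i].change_count != cached[i])
            return i;               /* this phy generated the broadcast */
    }
    return -1;
}
```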
    • [SCSI] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work · 22b9153f
      Committed by Dan Williams
      When requeuing work to a draining workqueue the last work instance may
      not be idle, so sas_queue_work() must not touch work->entry.  Introduce
      sas_work with a drain_node list_head to have a private list for
      collecting work deferred due to drain collision.
      
      Fixes reports like:
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff810410d4>] process_one_work+0x2e/0x338
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      22b9153f
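
The idea in commit 22b9153f, a private list link used only for work deferred during a drain, can be sketched in user-space C as below. Everything here is an assumption made for illustration: the structs, the `queued` and `deferred` flags, and the function names are hypothetical, and a plain singly linked list stands in for the kernel's list_head and workqueue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work item: 'drain_next' plays the role of drain_node. */
struct sas_work_sketch {
    int id;
    int queued;                         /* 1 once handed to the "workqueue" */
    int deferred;                       /* 1 while parked on the defer list */
    struct sas_work_sketch *drain_next; /* private link, never the queue's own */
};

struct ha_sketch {
    int draining;
    struct sas_work_sketch *defer_head;
    struct sas_work_sketch *defer_tail;
};

/* Queue work, or park it on the private defer list while a drain is running. */
static void queue_work_sketch(struct ha_sketch *ha, struct sas_work_sketch *sw)
{
    assert(ha != NULL && sw != NULL);
    if (!ha->draining) {
        sw->queued = 1;
        return;
    }
    if (sw->deferred)
        return;                         /* already parked: do not add twice */
    sw->deferred = 1;
    sw->drain_next = NULL;
    if (ha->defer_tail)
        ha->defer_tail->drain_next = sw;
    else
        ha->defer_head = sw;
    ha->defer_tail = sw;
}

/* End the drain and requeue everything that was deferred; return the count. */
static int finish_drain_sketch(struct ha_sketch *ha)
{
    struct sas_work_sketch *sw;
    int requeued = 0;

    assert(ha != NULL);
    sw = ha->defer_head;
    ha->draining = 0;
    ha->defer_head = ha->defer_tail = NULL;
    while (sw) {
        struct sas_work_sketch *next = sw->drain_next;

        sw->deferred = 0;
        sw->drain_next = NULL;
        queue_work_sketch(ha, sw);
        requeued++;
        sw = next;
    }
    return requeued;
}
```

The point of the separate link is that the queueing path never touches state owned by the workqueue while a previous instance of the work may still be running.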
  3. 20 Mar, 2012 1 commit
  4. 01 Mar, 2012 24 commits
    • [SCSI] libsas: don't recover end devices attached to disabled phys · 26a2e68f
      Committed by Dan Williams
      If userspace has decided to disable a phy, the kernel should honor that
      and not inadvertently re-enable the phy via error recovery.  This is
      more straightforward in the sata case where link recovery (via
      libata-eh) is separate from sas_task cancelling in libsas-eh.  Teach
      libsas to accept -ENODEV as a successful response from I_T_nexus_reset
      ('successful' in terms of not escalating further).
      
      This is a more comprehensive fix than "libsas: don't recover 'gone'
      devices in sas_ata_hard_reset()", as it is no longer sata-specific.
      
      aic94xx does check the return value from sas_phy_reset() so if the phy
      is disabled we proceed with clearing the I_T_nexus.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      26a2e68f
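
The escalation rule from commit 26a2e68f, treating -ENODEV from an I_T nexus reset as "do not escalate further", can be sketched as a small predicate. The response-code macros below are hypothetical stand-ins defined locally for the example, and `reset_needs_escalation()` is an invented name, not a libsas function.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for task-management response codes. */
#define RESP_FUNC_COMPLETE 0x00
#define RESP_FUNC_FAILED   0x05

/*
 * Decide whether error handling should escalate after an I_T nexus reset.
 * Both a completed reset and -ENODEV (device gone or phy disabled) count
 * as "successful" in the sense that no further escalation is attempted.
 */
static int reset_needs_escalation(int res)
{
    if (res == RESP_FUNC_COMPLETE || res == -ENODEV)
        return 0;
    return 1;
}
```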
    • [SCSI] libsas: fixup target_port_protocols for expanders that don't report sata · 77c309f3
      Committed by Dan Williams
      If discovery returns 0 for target_port_protocols but shows an attached
      sata device, just report SAS_PROTOCOL_SATA in the identify data so
      userspace can reliably search for sata devices in the domain.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      77c309f3
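
The fixup described in commit 77c309f3 amounts to a single conditional. A minimal sketch follows, assuming locally defined protocol bits; `PROTO_NONE`, `PROTO_SATA`, `PROTO_SSP`, and `fixup_target_protocols()` are hypothetical names chosen for the example.

```c
#include <assert.h>

/* Hypothetical protocol bitmask values for the sketch. */
#define PROTO_NONE 0x0u
#define PROTO_SATA 0x1u
#define PROTO_SSP  0x8u

/*
 * If discovery reported no target protocols but an attached sata device
 * was seen, report the sata bit so userspace can find the device.
 * Any non-zero reported mask is passed through unchanged.
 */
static unsigned int fixup_target_protocols(unsigned int reported,
                                           int attached_sata_dev)
{
    if (reported == PROTO_NONE && attached_sata_dev)
        return PROTO_SATA;
    return reported;
}
```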
    • [SCSI] libsas: set attached device type and target protocols for local phys · 899fcf40
      Committed by Dan Williams
      Before:
      $ cat /sys/class/sas_phy/phy-6\:3/device_type
      none
      $ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
      none
      
      After:
      $ cat /sys/class/sas_phy/phy-6\:3/device_type
      end device
      $ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
      sata
      
      Also downgrade the phy_list_lock to _irq instead of _irqsave since
      libsas will never call sas_get_port_device with interrupts disabled.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      899fcf40
    • [SCSI] libsas: revert ata srst · 9a10b33c
      Committed by Dan Williams
      libata issues follow up srsts when the controller has a hard time
      recording the signature-fis after a reset, or if the link supports port
      multipliers.  libsas does not support port multipliers and no current
      libsas lldds appear to need help retrieving the signature fis.  Revert
      it for now to remove confusion.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      9a10b33c
    • [SCSI] libsas: fix lifetime of SAS_HA_FROZEN · 84023474
      Committed by Dan Williams
      Until all sas_tasks are known to no longer be in-flight this flag gates late
      completions from colliding with error handling.  However, it must be cleared
      prior to the submission of scsi_send_eh_cmnd() requests, otherwise those
      commands will never be completed correctly.
      
      This was spotted by slub debug:
       =============================================================================
       BUG sas_task: Objects remaining on kmem_cache_close()
       -----------------------------------------------------------------------------
      
       INFO: Slab 0xffffea001f0eba00 objects=34 used=1 fp=0xffff8807c3aecb00 flags=0x8000000000004080
       Pid: 22919, comm: modprobe Not tainted 3.2.0-isci+ #2
       Call Trace:
        [<ffffffff810fcdcd>] slab_err+0xb0/0xd2
        [<ffffffff810e1c50>] ? free_percpu+0x31/0x117
        [<ffffffff81100122>] ? kzalloc+0x14/0x16
        [<ffffffff81100122>] ? kzalloc+0x14/0x16
        [<ffffffff81100486>] kmem_cache_destroy+0x11d/0x270
        [<ffffffffa0112bdc>] sas_class_exit+0x10/0x12 [libsas]
        [<ffffffff81078fba>] sys_delete_module+0x1c4/0x23c
        [<ffffffff814797ba>] ? sysret_check+0x2e/0x69
        [<ffffffff8126479e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
        [<ffffffff81479782>] system_call_fastpath+0x16/0x1b
       INFO: Object 0xffff8807c3aed280 @offset=21120
       INFO: Allocated in sas_alloc_task+0x22/0x90 [libsas] age=4615311 cpu=2 pid=12966
        __slab_alloc.clone.3+0x1d1/0x234
        kmem_cache_alloc+0x52/0x10d
        sas_alloc_task+0x22/0x90 [libsas]
        sas_queuecommand+0x20e/0x230 [libsas]
        scsi_send_eh_cmnd+0xd1/0x30c
        scsi_eh_try_stu+0x4f/0x6b
        scsi_eh_ready_devs+0xba/0x6ef
        sas_scsi_recover_host+0xa35/0xab1 [libsas]
        scsi_error_handler+0x14b/0x5fa
        kthread+0x9d/0xa5
        kernel_thread_helper+0x4/0x10
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      84023474
    • [SCSI] libsas: async ata scanning · 9508a66f
      Committed by Dan Williams
      libsas ata error handling is already async but this does not help the
      scan case.  Move initial link recovery out from under host->scan_mutex,
      and delay synchronization with eh until after all port probe/recovery
      work has been queued.
      
      Device ordering is maintained with scan order by still calling
      sas_rphy_add() in order of domain discovery.
      
      Since we now scan the domain list when invoking libata-eh we need to be
      careful to check for fully initialized ata ports.
      Acked-by: Jack Wang <jack_wang@usish.com>
      Acked-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      9508a66f
    • [SCSI] libsas: restore scan order · 92625f9b
      Committed by Dan Williams
      ata devices are always scanned after ssp.  Prior to the ata error
      handling reworks libsas would tend to scan devices in ascending expander
      phy order.  Restore this ordering by deferring ssp discovery to a
      DISCE_PROBE event, and keep the probe order consistent with the
      discovery order, not the placement of sata devices.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      92625f9b
    • [SCSI] libsas: delete device on sas address changed · c666aae6
      Committed by Dan Williams
      If the phy is attached to a new sas address, unregister the first address
      before processing the new attachment.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      c666aae6
    • [SCSI] libsas: let libata recover links that fail to transmit initial sig-fis · 354cf829
      Committed by Dan Williams
      Dan Williams 提交于
      libsas fails to discover all sata devices in the domain.  If a device fails
      negotiation and does not transmit a signature fis the link needs recovery.
      libata already understands how to manage slow to come up links, so treat these
      conditions as ata device attach events for the purposes of creating an
      ata_port.  This allows libata to manage retrying link bring up.
      
      Rediscovery is modified to be careful about checking changes in dev_type.  It
      looks like libsas leaks old devices if the sas address changes, but that's a
      fix for another patch.
      Acked-by: Jack Wang <jack_wang@usish.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      354cf829
    • [SCSI] libsas: fix sas port naming · a692b0ee
      Committed by Dan Williams
      Make sas-port naming consistent with the expander-attached case whereby
      the phy-id is the last digit in the port name.  Otherwise we get the
      random behavior of the allocation order.
      Reported-by: Patrick Thomson <patrick.s.thomson@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      a692b0ee
    • [SCSI] libsas: improve debug statements · d214d81e
      Committed by Dan Williams
      It's difficult to determine which domain_device is triggering error recovery,
      so convert messages like:
      
        sas: ex 5001b4da000e703f phy08:T attached: 5001b4da000e7028
        sas: ex 5001b4da000e703f phy09:T attached: 5001b4da000e7029
        ...
        ata7: sas eh calling libata port error handler
        ata8: sas eh calling libata port error handler
      
      ...into:
      
        sas: ex 5001517e85cfefff phy05:T:9 attached: 5001517e85cfefe5 (stp)
        sas: ex 5001517e3b0af0bf phy11:T:8 attached: 5001517e3b0af0ab (stp)
        ...
        sas: ata7: end_device-21:1: dev error handler
        sas: ata8: end_device-20:0:5: dev error handler
      
      which shows attached link rate, device type, and associates a
      domain_device with its ata_port id to correlate messages emitted from
      libata-eh.
      
      As Doug notes, we can also take the opportunity to clarify expander phy
      routing capabilities.
      
      [dgilbert@interlog.com: clarify table2table with 'U']
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      d214d81e
    • [SCSI] libsas: kill spurious sas_put_device · fdfd9d1b
      Committed by Maciej Trela
      Holdover from a patch rework: prior to the addition of SAS_DEV_DESTROY
      we were holding a reference while the destruct was pending in case the
      domain was torn down before the destruct event ran.  That case is
      covered by SAS_DEV_DESTROY, and the sas_put_device() just corrupts freed
      memory, or worse, frees the memory while another agent holds a reference.
      Signed-off-by: Maciej Trela <maciej.trela@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      fdfd9d1b
    • [SCSI] libsas: fix sas_unregister_ports vs sas_drain_work · 5d7f6d10
      Committed by Dan Williams
      We need to hold drain_mutex across the unregistration as port down events
      queue device removal as chained events, so we need to make sure no other
      drainers are active.
      
      [ 1118.673968] WARNING: at kernel/workqueue.c:996 __queue_work+0x11a/0x326()
      [ 1118.681982] Hardware name: S2600CP
      [ 1118.686193] Modules linked in: isci(-) libsas scsi_transport_sas nls_utf8
      ipv6 uinput sg iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ioatdma dca
      sd_mod sr_mod cdrom ahci libahci libata [last unloaded: scsi_transport_sas]
      [ 1118.709893] Pid: 6831, comm: rmmod Not tainted 3.2.0-isci+ #1
      [ 1118.716727] Call Trace:
      [ 1118.719867]  [<ffffffff8103e9f5>] warn_slowpath_common+0x85/0x9d
      [ 1118.727000]  [<ffffffff8103ea27>] warn_slowpath_null+0x1a/0x1c
      [ 1118.733942]  [<ffffffff81056d44>] __queue_work+0x11a/0x326
      [ 1118.740481]  [<ffffffff81056f99>] queue_work_on+0x1b/0x22
      [ 1118.746925]  [<ffffffff81057106>] queue_work+0x37/0x3e
      [ 1118.753105]  [<ffffffffa0120e05>] ? sas_discover_event+0x55/0x82 [libsas]
      [ 1118.761094]  [<ffffffff813217c3>] scsi_queue_work+0x42/0x44
      [ 1118.767717]  [<ffffffffa0120e19>] sas_discover_event+0x69/0x82 [libsas]
      [ 1118.775509]  [<ffffffffa0120f5b>] sas_unregister_dev+0xc3/0xcc [libsas]
      [ 1118.783319]  [<ffffffffa0120fae>] sas_unregister_domain_devices+0x4a/0xc8 [libsas]
      [ 1118.792731]  [<ffffffffa0120071>] sas_deform_port+0x60/0x1a6 [libsas]
      [ 1118.800339]  [<ffffffffa01201ea>] sas_unregister_ports+0x33/0x44 [libsas]
      [ 1118.808342]  [<ffffffffa011f7e5>] sas_unregister_ha+0x41/0x6b [libsas]
      [ 1118.816055]  [<ffffffffa0134055>] isci_unregister+0x22/0x4d [isci]
      [ 1118.823384]  [<ffffffffa0143040>] isci_pci_remove+0x2e/0x60 [isci]
      Reported-by: Jacek Danecki <jacek.danecki@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      5d7f6d10
    • [SCSI] libsas: route local link resets through ata-eh · ab526633
      Committed by Dan Williams
      Similar to the conversion of the transport-class reset we want bsg
      initiated resets to be managed by libata.
      Reported-by: Jacek Danecki <jacek.danecki@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      ab526633
    • [SCSI] libsas: fix mixed topology recovery · d230ce69
      Committed by Dan Williams
      If we have a domain with sas and sata devices there may still be sas
      recovery actions to take after peeling off the commands to send to
      libata.
      Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      d230ce69
    • [SCSI] libsas: close scsi_remove_target() vs libata-eh race · 8abda4d2
      Committed by Dan Williams
      ata_port lifetime in libata follows the host.  In libsas it follows the
      scsi_target.  Once scsi_remove_device() has caused all commands to be
      completed it allows scsi_remove_target() to immediately proceed to
      freeing the ata_port causing bug reports like:
      
      [  848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107
      [  848.400262] general protection fault: 0000 [#1] SMP
      [  848.406244] CPU 4
      [  848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
      [  848.432060]
      [  848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ #8 Intel Corporation S2600CP/S2600CP
      [  848.445310] RIP: 0010:[<ffffffff8126a68c>]  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
      [  848.454787] RSP: 0018:ffff8807f868dca0  EFLAGS: 00010002
      [  848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0
      [  848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002
      [  848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b
      [  848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b
      [  848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e
      [  848.503067] FS:  0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
      [  848.512899] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0
      [  848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000)
      [  848.555327] Stack:
      [  848.557959]  ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0
      [  848.567072]  ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703
      [  848.576190]  ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb
      [  848.585281] Call Trace:
      [  848.588409]  [<ffffffff8126a6e0>] spin_bug+0x26/0x28
      [  848.594357]  [<ffffffff8126a703>] do_raw_spin_unlock+0x21/0x88
      [  848.601283]  [<ffffffff814af1bb>] _raw_spin_unlock_irqrestore+0x2c/0x65
      [  848.609089]  [<ffffffffa001c103>] ata_scsi_port_error_handler+0x548/0x557 [libata]
      [  848.618331]  [<ffffffff81061813>] ? async_schedule+0x17/0x17
      [  848.625060]  [<ffffffffa004f30f>] async_sas_ata_eh+0x45/0x69 [libsas]
      [  848.632655]  [<ffffffff810618aa>] async_run_entry_fn+0x97/0x125
      [  848.639670]  [<ffffffff81057439>] process_one_work+0x207/0x38d
      [  848.646577]  [<ffffffff8105738c>] ? process_one_work+0x15a/0x38d
      [  848.653681]  [<ffffffff810576f7>] worker_thread+0x138/0x21c
      [  848.660305]  [<ffffffff810575bf>] ? process_one_work+0x38d/0x38d
      [  848.667493]  [<ffffffff8105b098>] kthread+0x9d/0xa5
      [  848.673382]  [<ffffffff8106e1bd>] ? trace_hardirqs_on_caller+0x12f/0x166
      [  848.681304]  [<ffffffff814b7704>] kernel_thread_helper+0x4/0x10
      [  848.688324]  [<ffffffff814af534>] ? retint_restore_args+0x13/0x13
      [  848.695530]  [<ffffffff8105affb>] ? __init_kthread_worker+0x5b/0x5b
      [  848.702929]  [<ffffffff814b7700>] ? gs_change+0x13/0x13
      [  848.709155] Code: 00 00 48 8d 88 38 04 00 00 44 8b 80 84 02 00 00 31 c0 e8 cf 1b 24 00 41 83 c8 ff 44 8b 4b 08 48 c7 c1 e0 0b 7d 81 4d 85 e4 74 10 <45> 8b 84 24 84 02 00 00 49 8d 8c 24 38 04 00 00 8b 53 04 48 89
      [  848.732467] RIP  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
      [  848.738905]  RSP <ffff8807f868dca0>
      [  848.743743] ---[ end trace 143161646eee8caa ]---
      
      ...so arrange for the ata_port to have the same end of life as the domain
      device.
      Reported-by: Marcin Tomczak <marcin.tomczak@intel.com>
      Acked-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      8abda4d2
    • [SCSI] libsas: mark all domain devices gone if root port disappears · 7d05919a
      Committed by Dan Williams
      If the top level expander is hot removed, mark all child devices as gone
      before unregistration to short circuit futile recovery.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      7d05919a
    • [SCSI] libsas: pre-clean commands that won the eh vs completion race · 45c73b65
      Committed by Dan Williams
      When scrolling forward through the eh list (in a clear_q scenario) it is
      possible to encounter commands that won the completion vs eh race.  Rather
      than sprinkle more "if (!task)" throughout the handler just make a pass
      through the list and delete the race winners before handling the rest.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      45c73b65
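
The pre-clean pass from commit 45c73b65 can be pictured as a filter that runs before the main handler. In the sketch below an array stands in for the eh list and a NULL `task` pointer marks a command that already completed; `cmd_sketch` and `preclean_race_winners()` are hypothetical names, and the real code works on a kernel list instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical eh-list entry: 'task' is NULL once the command completed. */
struct cmd_sketch {
    int id;
    void *task;
};

/*
 * Compact the array in place, dropping commands that won the completion
 * vs eh race (task == NULL).  Returns the number of commands left for the
 * main handler, which can then assume every remaining entry has a task.
 */
static int preclean_race_winners(struct cmd_sketch *cmds, int count)
{
    int kept = 0;
    int i;

    assert(cmds != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (cmds[i].task == NULL)
            continue;           /* race winner: nothing left to recover */
        cmds[kept++] = cmds[i];
    }
    return kept;
}
```

Doing the filtering once up front is what lets the rest of the handler avoid repeated "if (!task)" checks.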
    • [SCSI] isci: stop interpreting ->lldd_lu_reset() as an ata soft-reset · 43a5ab15
      Committed by Dan Williams
      Driving resets from libsas-eh is premature as libata will make a
      decision about performing a softreset.  Currently libata determines
      whether to perform a softreset based on ata_eh_followup_srst_needed(),
      and none of those conditions apply to isci.
      
      Remove the srst implementation and translate ->lldd_lu_reset() for ata
      devices as a request to drive a reset via libata-eh.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      43a5ab15
    • [SCSI] libsas: don't recover 'gone' devices in sas_ata_hard_reset() · cb48d672
      Committed by Dan Williams
      The commands that time out when a disk is forcibly removed may trigger
      libata to attempt recovery of the device.  If libsas has decided to
      remove the device don't permit ata to continue to issue resets to its
      last known phy.
      
      The primary motivation for this patch is hotplug testing by writing 0 to
      /sys/class/sas_phy/phyX/enable.  Without this check this test leads to
      libata issuing a reset and re-enabling the device that wants to be torn
      down.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      cb48d672
    • [SCSI] libsas: fix sas_find_local_phy(), take phy references · f41a0c44
      Committed by Dan Williams
      In the direct-attached case this routine returns the phy on which this
      device was first discovered, which is broken if we want to support
      wide-targets, as this phy reference can become stale even though the
      port is still active.
      
      In the expander-attached case this routine tries to look up the phy by
      scanning the attached sas addresses of the parent expander, and BUG_ONs
      if it can't find it.  However since eh and the libsas workqueue run
      independently we can still be attempting device recovery via eh after
      libsas has recorded the device as detached.  This is even easier to hit
      now that eh is blocked while device domain rediscovery takes place, and
      that libata is fed more timed out commands increasing the chances that
      it will try to recover the ata device.
      
      Arrange for dev->phy to always point to a last known good phy; it may be
      stale after the port is torn down, but it will catch up for wide port
      reconfigurations, and never be NULL.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      f41a0c44
    • [SCSI] libsas: check for 'gone' expanders in smp_execute_task() · 3a9c5560
      Committed by Dan Williams
      No sense in issuing or retrying commands to an expander that has been
      removed.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      3a9c5560
    • [SCSI] libsas: don't mark expanders as gone when a child device is removed · 0508c2f3
      Committed by Dan Williams
      Commit 56dd2c06 "[SCSI] libsas: Don't issue commands to devices that
      have been hot-removed" marked the parent device of an end-device as gone
      when all the phys to the end device have been deleted.
      
      The expander device is still present until its parent is removed.  This
      is a benign change until the smp_execute_task() path is taught to check
      ->gone.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      0508c2f3
    • [SCSI] libsas: poll for ata device readiness after reset · 36a39947
      Committed by Dan Williams
      Use ata_wait_after_reset() to poll for link recovery after a reset.
      This combined with sas_ha->eh_mutex prevents expander rediscovery from
      probing phys in an intermediate state.  Local discovery does not have a
      mechanism to filter link status changes during this timeout, so it
      remains the responsibility of lldds to prevent premature port teardown,
      although once all lldds support ->lldd_ata_check_ready() that could be
      used as a gate to local port teardown.
      
      The signature fis is re-transmitted when the link comes back so we
      should be revalidating the ata device class, but that is left to a future
      patch.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      36a39947
  5. 20 Feb, 2012 6 commits