提交 · 9702c67c6066f583b629cf037d2056245bb7a8e6 · openeuler / raspberrypi-kernel

20 3月, 2017 1 次提交

scsi: libsas: fix ata xfer length · 9702c67c

由 John Garry 提交于 3月 16, 2017

The total ata xfer length may not be calculated properly, in that we do
not use the proper method to get an sg element dma length.

According to the code comment, sg_dma_len() should be used after
dma_map_sg() is called.

This issue was found by turning on the SMMUv3 in front of the hisi_sas
controller in hip07. Multiple sg elements were being combined into a
single element, but the original first element length was being use as
the total xfer length.

Cc: <stable@vger.kernel.org>
Fixes: ff2aeb1e ("libata: convert to chained sg")
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9702c67c

21 7月, 2016 1 次提交

scsi:libsas: fix oops caused by assigning a freed task to ->lldd_task · 354a086d

由 Wei Fang 提交于 7月 06, 2016

A freed task has been assigned to ->lldd_task when lldd_execute_task()
failed in sas_ata_qc_issue(), and access of ->lldd_task will cause an
oops:

Call trace:
[<ffffffc000641f64>] sas_ata_post_internal+0x6c/0x150
[<ffffffc0006c0d64>] ata_exec_internal_sg+0x32c/0x588
[<ffffffc0006c1048>] ata_exec_internal+0x88/0xe8
[<ffffffc0006c13b4>] ata_dev_read_id+0x204/0x5e0
[<ffffffc0006c17f0>] ata_dev_reread_id+0x60/0xc8
[<ffffffc0006c3098>] ata_dev_revalidate+0x88/0x1e0
[<ffffffc0006cf828>] ata_eh_recover+0xcf8/0x13a8
[<ffffffc0006d075c>] ata_do_eh+0x5c/0xe0
[<ffffffc0006d0828>] ata_std_error_handler+0x48/0x98
[<ffffffc0006d042c>] ata_scsi_port_error_handler+0x474/0x658
[<ffffffc000641b78>] async_sas_ata_eh+0x50/0x80
[<ffffffc0000ca664>] async_run_entry_fn+0x64/0x180
[<ffffffc0000c085c>] process_one_work+0x164/0x438
[<ffffffc0000c0c74>] worker_thread+0x144/0x4b0
[<ffffffc0000c70fc>] kthread+0xfc/0x110

Fix this by reassigning NULL to ->lldd_task in error path.
Signed-off-by: NWei Fang <fangwei1@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

354a086d

14 7月, 2016 1 次提交

libsas: use ata_is_ncq() and ata_has_dma() accessors · b38d4d85

由 Hannes Reinecke 提交于 7月 14, 2016

Use accessors instead of the raw protocol value.
Signed-off-by: NHannes Reinecke <hare@suse.com>
[hch: trivial cleanup of the ata_task assignments]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NTejun Heo <tj@kernel.org>

b38d4d85

10 5月, 2016 2 次提交

libata/libsas: Define ATA_CMD_NCQ_NON_DATA · 661ce1f0

由 Hannes Reinecke 提交于 4月 25, 2016

Define the NCQ NON DATA command and update libsas to handle it
correctly.
Signed-off-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

661ce1f0

libsas: enable FPDMA SEND/RECEIVE · ef026b18

由 Hannes Reinecke 提交于 4月 25, 2016

Update libsas and dependent drivers to handle FPDMA
SEND/RECEIVE correctly.
Signed-off-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

ef026b18

20 3月, 2015 1 次提交

ata: Add a new flag to destinguish sas controller · 5067c046

由 Shaohua Li 提交于 3月 12, 2015

SAS controller has its own tag allocation, which doesn't directly match to ATA
tag, so SAS and SATA have different code path for ata tags. Originally we use
port->scsi_host (98bd4be1) to destinguish SAS controller, but libsas set
->scsi_host too, so we can't use it for the destinguish, we add a new flag for
this purpose.

Without this patch, the following oops can happen because scsi-mq uses
a host-wide tag map shared among all devices with some integer tag
values >= ATA_MAX_QUEUE.  These unexpectedly high tag values cause
__ata_qc_from_tag() to return NULL, which is then dereferenced in
ata_qc_new_init().

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  IP: [<ffffffff804fd46e>] ata_qc_new_init+0x3e/0x120
  PGD 32adf0067 PUD 32adf1067 PMD 0
  Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
  Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi igb
  i2c_algo_bit ptp pps_core pm80xx libsas scsi_transport_sas sg coretemp
  eeprom w83795 i2c_i801
  CPU: 4 PID: 1450 Comm: cydiskbench Not tainted 4.0.0-rc3 #1
  Hardware name: Supermicro X8DTH-i/6/iF/6F/X8DTH, BIOS 2.1b       05/04/12
  task: ffff8800ba86d500 ti: ffff88032a064000 task.ti: ffff88032a064000
  RIP: 0010:[<ffffffff804fd46e>]  [<ffffffff804fd46e>] ata_qc_new_init+0x3e/0x120
  RSP: 0018:ffff88032a067858  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff8800ba0d2230 RCX: 000000000000002a
  RDX: ffffffff80505ae0 RSI: 0000000000000020 RDI: ffff8800ba0d2230
  RBP: ffff88032a067868 R08: 0000000000000201 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800ba0d0000
  R13: ffff8800ba0d2230 R14: ffffffff80505ae0 R15: ffff8800ba0d0000
  FS:  0000000041223950(0063) GS:ffff88033e480000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000058 CR3: 000000032a0a3000 CR4: 00000000000006e0
  Stack:
   ffff880329eee758 ffff880329eee758 ffff88032a0678a8 ffffffff80502dad
   ffff8800ba167978 ffff880329eee758 ffff88032bf9c520 ffff8800ba167978
   ffff88032bf9c520 ffff88032bf9a290 ffff88032a0678b8 ffffffff80506909
  Call Trace:
   [<ffffffff80502dad>] ata_scsi_translate+0x3d/0x1b0
   [<ffffffff80506909>] ata_sas_queuecmd+0x149/0x2a0
   [<ffffffffa0046650>] sas_queuecommand+0xa0/0x1f0 [libsas]
   [<ffffffff804ea544>] scsi_dispatch_cmd+0xd4/0x1a0
   [<ffffffff804eb50f>] scsi_queue_rq+0x66f/0x7f0
   [<ffffffff803e5098>] __blk_mq_run_hw_queue+0x208/0x3f0
   [<ffffffff803e54b8>] blk_mq_run_hw_queue+0x88/0xc0
   [<ffffffff803e5c74>] blk_mq_insert_request+0xc4/0x130
   [<ffffffff803e0b63>] blk_execute_rq_nowait+0x73/0x160
   [<ffffffffa0023fca>] sg_common_write+0x3da/0x720 [sg]
   [<ffffffffa0025100>] sg_new_write+0x250/0x360 [sg]
   [<ffffffffa0025feb>] sg_write+0x13b/0x450 [sg]
   [<ffffffff8032ec91>] vfs_write+0xd1/0x1b0
   [<ffffffff8032ee54>] SyS_write+0x54/0xc0
   [<ffffffff80689932>] system_call_fastpath+0x12/0x17

tj: updated description.

Fixes: 12cb5ce1 ("libata: use blk taging")
Reported-and-tested-by: NTony Battersby <tonyb@cybernetics.com>
Signed-off-by: NShaohua Li <shli@fb.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

5067c046

27 11月, 2014 1 次提交

libsas: remove task_collector mode · 79855d17

由 Christoph Hellwig 提交于 11月 05, 2014

The task_collector mode (or "latency_injector", (C) Dan Willians) is an
optional I/O path in libsas that queues up scsi commands instead of
directly sending it to the hardware.  It generall increases latencies
to in the optiomal case slightly reduce mmio traffic to the hardware.

Only the obsolete aic94xx driver and the mvsas driver allowed to use
it without recompiling the kernel, and most drivers didn't support it
at all.

Remove the giant blob of code to allow better optimizations for scsi-mq
in the future.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Acked-by: NDan Williams <dan.j.williams@intel.com>

79855d17

06 11月, 2014 1 次提交

libsas: use ata_dev_classify() · 1cbd772d

由 Hannes Reinecke 提交于 11月 05, 2014

Use the ata device class from libata in libsas instead of checking
the supported command set and switch to using ata_dev_classify()
instead of our own method.

Cc: Tejun Heo <tj@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NTejun Heo <tj@kernel.org>

1cbd772d

19 3月, 2014 1 次提交

libata, libsas: kill pm_result and related cleanup · bc6e7c4b

由 Dan Williams 提交于 3月 14, 2014

Tejun says:
  "At least for libata, worrying about suspend/resume failures don't make
   whole lot of sense.  If suspend failed, just proceed with suspend.  If
   the device can't be woken up afterwards, that's that.  There isn't
   anything we could have done differently anyway.  The same for resume, if
   spinup fails, the device is dud and the following commands will invoke
   EH actions and will eventually fail.  Again, there really isn't any
   *choice* to make.  Just making sure the errors are handled gracefully
   (ie. don't crash) and the following commands are handled correctly
   should be enough."

The only libata user that actually cares about the result from a suspend
operation is libsas.  However, it only cares about whether queuing a new
operation collides with an in-flight one.  All libsas does with the
error is retry, but we can just let libata wait for the previous
operation before continuing.

Other cleanups include:
1/ Unifying all ata port pm operations on an ata_port_pm_ prefix
2/ Marking all ata port pm helper routines as returning void, only
   ata_port_pm_ entry points need to fake a 0 return value.
3/ Killing ata_port_{suspend|resume}_common() in favor of calling
   ata_port_request_pm() directly
4/ Killing the wrappers that just do a to_ata_port() conversion
5/ Clearly marking the entry points that do async operations with an
  _async suffix.

Reference: http://marc.info/?l=linux-scsi&m=138995409532286&w=2

Cc: Phillip Susi <psusi@ubuntu.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NTodd Brandt <todd.e.brandt@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

bc6e7c4b

27 11月, 2013 1 次提交

[SCSI] libsas: fix usage of ata_tf_to_fis · ae5fbae0

由 Dan Williams 提交于 10月 22, 2013

Since commit 110dd8f1 "[SCSI] libsas: fix scr_read/write users and
update the libata documentation" we have been passing pmp=1 and is_cmd=0
to ata_tf_to_fis().  Praveen reports that eSATA attached drives do not
discover correctly.  His investigation found that the BIOS was passing
pmp=0 while Linux was passing pmp=1 and failing to discover the drives.
Update libsas to follow the libata example of pulling the pmp setting
from the ata_link and correct is_cmd to be 1 since all tf's submitted
through ->qc_issue are commands.  Presumably libsas lldds do not care
about is_cmd as they have sideband mechanisms to perform link
management.

http://marc.info/?l=linux-scsi&m=138179681726990

[jejb: checkpatch fix]
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Reported-by: NPraveen Murali <pmurali@logicube.com>
Tested-by: NPraveen Murali <pmurali@logicube.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

ae5fbae0

10 5月, 2013 1 次提交

[SCSI] sas: unify the pointlessly separated enums sas_dev_type and sas_device_type · aa9f8328

由 James Bottomley 提交于 5月 07, 2013

These enums have been separate since the dawn of SAS, mainly because the
latter is a procotol only enum and the former includes additional state
for libsas. The dichotomy causes endless confusion about which one you
should use where and leads to pointless warnings like this:

drivers/scsi/mvsas/mv_sas.c: In function 'mvs_update_phyinfo':
drivers/scsi/mvsas/mv_sas.c:1162:34: warning: comparison between 'enum sas_device_type' and 'enum sas_dev_type' [-Wenum-compare]

Fix by eliminating one of them. The one kept is effectively the sas.h
one, but call it sas_device_type and make sure the enums are all
properly namespaced with the SAS_ prefix.
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

aa9f8328

24 8月, 2012 2 次提交

[SCSI] libsas, ipr: cleanup ata_host flags initialization via ata_host_init · 8d8e7d13

由 Dan Williams 提交于 7月 09, 2012

libsas and ipr pass flags to ata_host_init that are meant for the port.

ata_host flags:
	ATA_HOST_SIMPLEX	= (1 << 0),	/* Host is simplex, one DMA channel per host only */
	ATA_HOST_STARTED	= (1 << 1),	/* Host started */
	ATA_HOST_PARALLEL_SCAN	= (1 << 2),	/* Ports on this host can be scanned in parallel */
	ATA_HOST_IGNORE_ATA	= (1 << 3),	/* Ignore ATA devices on this host. */

flags passed by libsas:
	ATA_FLAG_SATA		= (1 << 1),
	ATA_FLAG_PIO_DMA	= (1 << 7), /* PIO cmds via DMA */
	ATA_FLAG_NCQ		= (1 << 10), /* host supports NCQ */

The only one that aliases is ATA_HOST_STARTED which is a 'don't care' in
the libsas and ipr cases since ata_hosts from these sources are not
registered with libata.
Reported-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Acked-by: NBrian King <brking@us.ibm.com>
Acked-by: NJeff Garzik <jgarzik@redhat.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

8d8e7d13

[SCSI] libsas: suspend / resume support · 303694ee

由 Dan Williams 提交于 6月 21, 2012

libsas power management routines to suspend and recover the sas domain
based on a model where the lldd is allowed and expected to be
"forgetful".

sas_suspend_ha - disable event processing allowing the lldd to take down
                 links without concern for causing hotplug events.
                 Regardless of whether the lldd actually posts link down
                 messages libsas notifies the lldd that all
                 domain_devices are gone.

sas_prep_resume_ha - on the way back up before the lldd starts link
                     training clean out any spurious events that were
                     generated on the way down, and re-enable event
                     processing

sas_resume_ha - after the lldd has started and decided that all phys
		have posted link-up events this routine is called to let
		libsas start it's own timeout of any phys that did not
		resume.  After the timeout an lldd can cancel the
                phy teardown by posting a link-up event.

Storage for ex_change_count (u16) and phy_change_count (u8) are changed
to int so they can be set to -1 to indicate 'invalidated'.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Reviewed-by: NJacek Danecki <jacek.danecki@intel.com>
Tested-by: NMaciej Patelczyk <maciej.patelczyk@intel.com>
Acked-by: NAlan Stern <stern@rowland.harvard.edu>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

303694ee

20 7月, 2012 3 次提交

[SCSI] async: introduce 'async_domain' type · 2955b47d

由 Dan Williams 提交于 7月 09, 2012

This is in preparation for teaching async_synchronize_full() to sync all
pending async work, and not just on the async_running domain.  This
conversion is functionally equivalent, just embedding the existing list
in a new async_domain type.

The .registered attribute is used in a later patch to distinguish
between domains that want to be flushed by async_synchronize_full()
versus those that only expect async_synchronize_{full|cookie}_domain to
be used for flushing.

[jejb: add async.h to scsi_priv.h for struct async_domain]
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Acked-by: NArjan van de Ven <arjan@linux.intel.com>
Acked-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
Tested-by: NEldad Zack <eldad@fogrefinery.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

2955b47d

[SCSI] libsas: cleanup spurious calls to scsi_schedule_eh · 36fed498

由 Maciej Trela 提交于 6月 21, 2012

eh is woken up automatically by the presence of failed commands,
scsi_schedule_eh is reserved for cases where there are no failed
commands.  This guarantees that host_eh_sceduled is only incremented
when an explicit eh request is made.
Reviewed-by: NJacek Danecki <jacek.danecki@intel.com>
Signed-off-by: NMaciej Trela <maciej.trela@intel.com>
[fixed spurious delete of sas_ata_task_abort]
Signed-off-by: NArtur Wojcik <artur.wojcik@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

36fed498

[SCSI] libata, libsas: introduce sched_eh and end_eh port ops · e4a9c373

由 Dan Williams 提交于 6月 21, 2012

When managing shost->host_eh_scheduled libata assumes that there is a
1:1 shost-to-ata_port relationship.  libsas creates a 1:N relationship
so it needs to manage host_eh_scheduled cumulatively at the host level.
The sched_eh and end_eh port port ops allow libsas to track when domain
devices enter/leave the "eh-pending" state under ha->lock (previously
named ha->state_lock, but it is no longer just a lock for ha->state
changes).

Since host_eh_scheduled indicates eh without backing commands pinning
the device it can be deallocated at any time.  Move the taking of the
domain_device reference under the port_lock to guarantee that the
ata_port stays around for the duration of eh.
Reviewed-by: NJacek Danecki <jacek.danecki@intel.com>
Acked-by: NJeff Garzik <jgarzik@redhat.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

e4a9c373

08 7月, 2012 1 次提交

[SCSI] libsas: fix taskfile corruption in sas_ata_qc_fill_rtf · 6ef1b512

由 Dan Williams 提交于 6月 22, 2012

fill_result_tf() grabs the taskfile flags from the originating qc which
sas_ata_qc_fill_rtf() promptly overwrites.  The presence of an
ata_taskfile in the sata_device makes it tempting to just copy the full
contents in sas_ata_qc_fill_rtf().  However, libata really only wants
the fis contents and expects the other portions of the taskfile to not
be touched by ->qc_fill_rtf.  To that end store a fis buffer in the
sata_device and use ata_tf_from_fis() like every other ->qc_fill_rtf()
implementation.

Cc: <stable@vger.kernel.org>
Reported-by: NPraveen Murali <pmurali@logicube.com>
Tested-by: NPraveen Murali <pmurali@logicube.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

6ef1b512

23 4月, 2012 1 次提交

[SCSI] libsas, libata: fix start of life for a sas ata_port · b2024459

由 Dan Williams 提交于 3月 21, 2012

This changes the ordering of initialization and probing events from:
  1/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  2/ allocate ata_port and schedule port probe in DISCE_PROBE
...to:
  1/ allocate ata_port in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  2/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  3/ schedule port probe in DISCE_PROBE

This ordering prevents PHYE_SIGNAL_LOSS_EVENTS from sneaking in to
destrory ata devices before they have been fully initialized:

  BUG: unable to handle kernel paging request at 0000000000003b10
  IP: [<ffffffffa0053d7e>] sas_ata_end_eh+0x12/0x5e [libsas]
  ...
  [<ffffffffa004d1af>] sas_unregister_common_dev+0x78/0xc9 [libsas]
  [<ffffffffa004d4d4>] sas_unregister_dev+0x4f/0xad [libsas]
  [<ffffffffa004d5b1>] sas_unregister_domain_devices+0x7f/0xbf [libsas]
  [<ffffffffa004c487>] sas_deform_port+0x61/0x1b8 [libsas]
  [<ffffffffa004bed0>] sas_phye_loss_of_signal+0x29/0x2b [libsas]

...and kills the awkward "sata domain_device briefly existing in the
domain without an ata_port" state.
Reported-by: NMichal Kosciowski <michal.kosciowski@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Acked-by: NJeff Garzik <jgarzik@redhat.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

b2024459

01 3月, 2012 12 次提交

[SCSI] libsas: don't recover end devices attached to disabled phys · 26a2e68f

由 Dan Williams 提交于 1月 30, 2012

If userspace has decided to disable a phy the kernel should honor that
and not inadvertantly re-enable the phy via error recovery.  This is
more straightforward in the sata case where link recovery (via
libata-eh) is separate from sas_task cancelling in libsas-eh.  Teach
libsas to accept -ENODEV as a successful response from I_T_nexus_reset
('successful' in terms of not escalating further).

This is a more comprehensive fix then "libsas: don't recover 'gone'
devices in sas_ata_hard_reset()", as it is no longer sata-specific.

aic94xx does check the return value from sas_phy_reset() so if the phy
is disabled we proceed with clearing the I_T_nexus.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

26a2e68f

[SCSI] libsas: revert ata srst · 9a10b33c

由 Dan Williams 提交于 1月 20, 2012

libata issues follow up srsts when the controller has a hard time
recording the signature-fis after a reset, or if the link supports port
multipliers.  libsas does not support port multipliers and no current
libsas lldds appear to need help retrieving the signature fis.  Revert
it for now to remove confusion.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

9a10b33c

[SCSI] libsas: async ata scanning · 9508a66f

由 Dan Williams 提交于 1月 18, 2012

libsas ata error handling is already async but this does not help the
scan case.  Move initial link recovery out from under host->scan_mutex,
and delay synchronization with eh until after all port probe/recovery
work has been queued.

Device ordering is maintained with scan order by still calling
sas_rphy_add() in order of domain discovery.

Since we now scan the domain list when invoking libata-eh we need to be
careful to check for fully initialized ata ports.
Acked-by: NJack Wang <jack_wang@usish.com>
Acked-by: NJeff Garzik <jgarzik@redhat.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

9508a66f

[SCSI] libsas: restore scan order · 92625f9b

由 Dan Williams 提交于 1月 18, 2012

ata devices are always scanned after ssp. Prior to the ata error
handling reworks libsas would tend to scan devices in ascending expander
phy order. Restore this ordering by deferring ssp discovery to a
DISCE_PROBE event, and keep the probe order consistent with the
discovery order, not the placement of sata devices.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

92625f9b

[SCSI] libsas: let libata recover links that fail to transmit initial sig-fis · 354cf829

由 Dan Williams 提交于 1月 12, 2012

libsas fails to discover all sata devices in the domain. If a device fails
negotiation and does not transmit a signature fis the link needs recovery.
libata already understands how to manage slow to come up links, so treat these
conditions as ata device attach events for the purposes of creating an
ata_port. This allows libata to manage retrying link bring up.

Rediscovery is modified to be careful about checking changes in dev_type. It
looks like libsas leaks old devices if the sas address changes, but that's a
fix for another patch.
Acked-by: NJack Wang <jack_wang@usish.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

354cf829

[SCSI] libsas: improve debug statements · d214d81e

由 Dan Williams 提交于 1月 16, 2012

It's difficult to determine which domain_device is triggering error recovery,
so convert messages like:

  sas: ex 5001b4da000e703f phy08:T attached: 5001b4da000e7028
  sas: ex 5001b4da000e703f phy09:T attached: 5001b4da000e7029
  ...
  ata7: sas eh calling libata port error handler
  ata8: sas eh calling libata port error handler

...into:

  sas: ex 5001517e85cfefff phy05:T:9 attached: 5001517e85cfefe5 (stp)
  sas: ex 5001517e3b0af0bf phy11:T:8 attached: 5001517e3b0af0ab (stp)
  ...
  sas: ata7: end_device-21:1: dev error handler
  sas: ata8: end_device-20:0:5: dev error handler

which shows attached link rate, device type, and associates a
domain_device with its ata_port id to correlate messages emitted from
libata-eh.

As Doug notes, we can also take the opportunity to clarify expander phy
routing capabilities.

[dgilbert@interlog.com: clarify table2table with 'U']
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

d214d81e

[SCSI] libsas: fix mixed topology recovery · d230ce69

由 Dan Williams 提交于 1月 11, 2012

If we have a domain with sas and sata devices there may still be sas
recovery actions to take after peeling off the commands to send to
libata.
Reported-by: NAndrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

d230ce69

[SCSI] libsas: close scsi_remove_target() vs libata-eh race · 8abda4d2

由 Dan Williams 提交于 1月 10, 2012

ata_port lifetime in libata follows the host.  In libsas it follows the
scsi_target.  Once scsi_remove_device() has caused all commands to be
completed it allows scsi_remove_target() to immediately proceed to
freeing the ata_port causing bug reports like:

[  848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107
[  848.400262] general protection fault: 0000 [#1] SMP
[  848.406244] CPU 4
[  848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
[  848.432060]
[  848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ #8 Intel Corporation S2600CP/S2600CP
[  848.445310] RIP: 0010:[<ffffffff8126a68c>]  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.454787] RSP: 0018:ffff8807f868dca0  EFLAGS: 00010002
[  848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0
[  848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002
[  848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b
[  848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b
[  848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e
[  848.503067] FS:  0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
[  848.512899] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0
[  848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000)
[  848.555327] Stack:
[  848.557959]  ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0
[  848.567072]  ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703
[  848.576190]  ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb
[  848.585281] Call Trace:
[  848.588409]  [<ffffffff8126a6e0>] spin_bug+0x26/0x28
[  848.594357]  [<ffffffff8126a703>] do_raw_spin_unlock+0x21/0x88
[  848.601283]  [<ffffffff814af1bb>] _raw_spin_unlock_irqrestore+0x2c/0x65
[  848.609089]  [<ffffffffa001c103>] ata_scsi_port_error_handler+0x548/0x557 [libata]
[  848.618331]  [<ffffffff81061813>] ? async_schedule+0x17/0x17
[  848.625060]  [<ffffffffa004f30f>] async_sas_ata_eh+0x45/0x69 [libsas]
[  848.632655]  [<ffffffff810618aa>] async_run_entry_fn+0x97/0x125
[  848.639670]  [<ffffffff81057439>] process_one_work+0x207/0x38d
[  848.646577]  [<ffffffff8105738c>] ? process_one_work+0x15a/0x38d
[  848.653681]  [<ffffffff810576f7>] worker_thread+0x138/0x21c
[  848.660305]  [<ffffffff810575bf>] ? process_one_work+0x38d/0x38d
[  848.667493]  [<ffffffff8105b098>] kthread+0x9d/0xa5
[  848.673382]  [<ffffffff8106e1bd>] ? trace_hardirqs_on_caller+0x12f/0x166
[  848.681304]  [<ffffffff814b7704>] kernel_thread_helper+0x4/0x10
[  848.688324]  [<ffffffff814af534>] ? retint_restore_args+0x13/0x13
[  848.695530]  [<ffffffff8105affb>] ? __init_kthread_worker+0x5b/0x5b
[  848.702929]  [<ffffffff814b7700>] ? gs_change+0x13/0x13
[  848.709155] Code: 00 00 48 8d 88 38 04 00 00 44 8b 80 84 02 00 00 31 c0 e8 cf 1b 24 00 41 83 c8 ff 44 8b 4b 08 48 c7 c1 e0 0b 7d 81 4d 85 e4 74 10 <45> 8b 84 24 84 02 00 00 49 8d 8c 24 38 04 00 00 8b 53 04 48 89
[  848.732467] RIP  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.738905]  RSP <ffff8807f868dca0>
[  848.743743] ---[ end trace 143161646eee8caa ]---

...so arrange for the ata_port to have the same end of life as the domain
device.
Reported-by: NMarcin Tomczak <marcin.tomczak@intel.com>
Acked-by: NJeff Garzik <jgarzik@redhat.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

8abda4d2

[SCSI] isci: stop interpreting ->lldd_lu_reset() as an ata soft-reset · 43a5ab15

由 Dan Williams 提交于 12月 08, 2011

Driving resets from libsas-eh is pre-mature as libata will make a
decision about performing a softreset.  Currently libata determines
whether to perform a softreset based on ata_eh_followup_srst_needed(),
and none of those conditions apply to isci.

Remove the srst implementation and translate ->lldd_lu_reset() for ata
devices as a request to drive a reset via libata-eh.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

43a5ab15

[SCSI] libsas: don't recover 'gone' devices in sas_ata_hard_reset() · cb48d672

由 Dan Williams 提交于 12月 22, 2011

The commands that timeout when a disk is forcibly removed may trigger
libata to attempt recovery of the device.  If libsas has decided to
remove the device don't permit ata to continue to issue resets to its
last known phy.

The primary motivation for this patch is hotplug testing by writing 0 to
/sys/class/sas_phy/phyX/enable.  Without this check this test leads to
libata issuing a reset and re-enabling the device that wants to be torn
down.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

cb48d672

[SCSI] libsas: fix sas_find_local_phy(), take phy references · f41a0c44

由 Dan Williams 提交于 12月 21, 2011

In the direct-attached case this routine returns the phy on which this
device was first discovered. Which is broken if we want to support
wide-targets, as this phy reference can become stale even though the
port is still active.

In the expander-attached case this routine tries to lookup the phy by
scanning the attached sas addresses of the parent expander, and BUG_ONs
if it can't find it. However since eh and the libsas workqueue run
independently we can still be attempting device recovery via eh after
libsas has recorded the device as detached. This is even easier to hit
now that eh is blocked while device domain rediscovery takes place, and
that libata is fed more timed out commands increasing the chances that
it will try to recover the ata device.

Arrange for dev->phy to always point to a last known good phy, it may be
stale after the port is torn down, but it will catch up for wide port
reconfigurations, and never be NULL.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

f41a0c44

[SCSI] libsas: poll for ata device readiness after reset · 36a39947

由 Dan Williams 提交于 11月 17, 2011

Use ata_wait_after_reset() to poll for link recovery after a reset.
This combined with sas_ha->eh_mutex prevents expander rediscovery from
probing phys in an intermediate state.  Local discovery does not have a
mechanism to filter link status changes during this timeout, so it
remains the responsibility of lldds to prevent premature port teardown.
Although once all lldd's support ->lldd_ata_check_ready() that could be
used as a gate to local port teardown.

The signature fis is re-transmitted when the link comes back so we
should be revalidating the ata device class, but that is left to a future
patch.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

36a39947

20 2月, 2012 10 次提交

[SCSI] libsas: async ata-eh · 50824d6c

由 Dan Williams 提交于 12月 04, 2011

Once sas_ata_hard_reset() starts honoring the 'deadline' parameter a
pathological configuration could take 25 seconds per ata device
(serialized) to recover.  Run per-port recoveries in parallel.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

50824d6c

[SCSI] libsas: execute transport link resets with libata-eh via host workqueue · 81c757bc

由 Dan Williams 提交于 12月 02, 2011

Link resets leave ata affiliations intact, so arrange for libsas to make
an effort to avoid dropping the device due to a slow-to-recover link.
Towards this end carry out reset in the host workqueue so that it can
check for ata devices and kick the reset request to libata.  Hard
resets, in contrast, bypass libata since they are meant for associating
an ata device with another initiator in the domain (tears down
affiliations).

Need to add a new transport_sas_phy_reset() since the current
sas_phy_reset() is a utility function to libsas lldds.  They are not
prepared for it to loop back into eh.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

81c757bc

[SCSI] libsas: use libata-eh-reset for sata rediscovery fis transmit failures · b52df417

由 Dan Williams 提交于 11月 30, 2011

Since sata devices can take several seconds to recover the link on reset
the 0.5 seconds that libsas currently waits may not be enough.  Instead
if we are rediscovering a phy that was previously attached to a sata
device let libata handle any resets to encourage the device to transmit
the initial fis.

Once sas_ata_hard_reset() and lldds learn how to honor 'deadline' libsas
should stop encountering phys in an intermediate state, until then this
will loop until the fis is transmitted or ->attached_sas_addr gets
cleared, but in the more likely initial discovery case we keep existing
behavior.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

b52df417

[SCSI] libsas: defer SAS_TASK_NEED_DEV_RESET commands to libata · 3a2cdf39

由 Dan Williams 提交于 11月 29, 2011

lldds use the SAS_TASK_NEED_DEV_RESET interface to request that eh
perform a reset.  In the sata device case defer the commands that
triggered the reset to libata-eh context so it can perform its pre and
post reset management.

In the sas_ata_post_internal() case the reset request is falling on deaf
ears as the sas_task is immediately destroyed without any reset action.
Since it is currently a nop, and likely superfluous given the conversion
to new-style libata-eh, just drop the request.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

3a2cdf39

[SCSI] libsas: fix timeout vs completion race · 9095a64a

由 Dan Williams 提交于 11月 28, 2011

Until we have told the lldd to forget a task a timed out operation can
return from the hardware at any time.  Since completion frees the task
we need to make sure that no tasks run their normal completion handler
once eh has decided to manage the task.  Similar to
ata_scsi_cmd_error_handler() freeze completions to let eh judge the
outcome of the race.

Task collector mode is problematic because it presents a situation where
a task can be timed out and aborted before the lldd has even seen it.
For this case we need to guarantee that a task that an lldd has been
told to forget does not get queued after the lldd says "never seen it".
With sas_scsi_timed_out we achieve this with the ->task_queue_flush
mutex, rather than adding more time.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

9095a64a

[SCSI] libsas: close error handling vs sas_ata_task_done() race · 3dff5721

由 Dan Williams 提交于 11月 28, 2011

Since sas_ata does not implement ->freeze(), completions for scmds and
internal commands can still arrive concurrent with
ata_scsi_cmd_error_handler() and sas_ata_post_internal() respectively.
By the time either of those is called libata has committed to completing
the qc, and the ATA_PFLAG_FROZEN flag tells sas_ata_task_done() it has
lost the race.

In the sas_ata_post_internal() case we take on the additional
responsibility of freeing the sas_task to close the race with
sas_ata_task_done() freeing the the task while sas_ata_post_internal()
is in the process of invoking ->lldd_abort_task().
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

3dff5721

[SCSI] libsas: kill invocation of scsi_eh_finish_cmd from sas_ata_task_done · e500a34b

由 Dan Williams 提交于 11月 28, 2011

Prior to the conversion to the new-style libata-eh sas_ata_task_done()
may have been the last opportunity to clean up the scmd, but now
libata-eh explicitly handles this case.  It also races against sas-eh.
If a lldd completes a task after SAS_TASK_STATE_ABORTED is set it could
trigger a spurious decrement of shost->host_failed.  Current lldds have
the band-aid of checking SAS_TASK_STATE_ABORTED before calling
->task_done(), but better to just let the scmds escalate to libata for
race free cleanup.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

e500a34b

[SCSI] libsas: use ->set_dmamode to notify lldds of NCQ parameters · b91bb296

由 Dan Williams 提交于 11月 17, 2011

sas_discover_sata() notifies lldds of sata devices twice.  Once to allow
the 'identify' to be sent, and a second time to allow aic94xx (the only
libsas driver that cares about sata_dev.identify) to setup NCQ
parameters before the device becomes known to the midlayer.  Replace
this double notification and intervening 'identify' with an explicit
->lldd_ata_set_dmamode notification.  With this change all ata internal
commands are issued by libata, so we no longer need sas_issue_ata_cmd().

The data from the identify command only needs to be cached in one
location so ata_device.id replaces domain_device.sata_dev.identify.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

b91bb296

[SCSI] libsas: prevent domain rediscovery competing with ata error handling · 87c8331f

由 Dan Williams 提交于 11月 17, 2011

libata error handling provides for a timeout for link recovery.  libsas
must not rescan for previously known devices in this interval otherwise
it may remove a device that is simply waiting for its link to recover.
Let libata-eh make the determination of when the link is stable and
prevent libsas (host workqueue) from taking action while this
determination is pending.

Using a mutex (ha->disco_mutex) to flush and disable revalidation while
eh is running requires any discovery action that may block on eh be
moved to its own context outside the lock.  Probing ATA devices
explicitly waits on ata-eh and the cache-flush-io issued during device
removal may also pend awaiting eh completion.  Essentially any rphy
add/remove activity needs to run outside the lock.

This adds two new cleanup states for sas_unregister_domain_devices()
'allocated-but-not-probed', and 'flagged-for-destruction'.  In the
'allocated-but-not-probed' state  dev->rphy points to a rphy that is
known to have not been through a sas_rphy_add() event.  At domain
teardown check if this device is still pending probe and cleanup
accordingly.  Similarly if a device has already been queued for removal
then sas_unregister_domain_devices has nothing to do.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

87c8331f

[SCSI] libsas: convert dev->gone to flags · e139942d

由 Dan Williams 提交于 1月 07, 2012

In preparation for adding tracking of another device state "destroy".
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>

e139942d