Commits · c2b89ae594ba6f5583bc20359b3adcfefd17e82b · openeuler / raspberrypi-kernel

27 Dec, 2019 (4 commits)

scsi: fix ata_port_wait_eh() hang caused by missing to wake up eh thread · c2b89ae5

Authored by zhengbin on May 10, 2019

hulk inclusion
category: bugfix
bugzilla: 12843
CVE: NA

---------------------------

When I test the kernel with fio using the following steps:
1. The SAS controller has a mix of SAS and SATA disks
2. Run fio against all disks
3. Simultaneously enable/disable/link_reset/hard_reset the PHYs

it hangs in ata_port_wait_eh.
Call trace:
 __switch_to+0xb4/0x1b8
 __schedule+0x1e8/0x718
 schedule+0x38/0x90
 ata_port_wait_eh+0x70/0xf8
 sas_ata_wait_eh+0x24/0x30 [libsas]
 transport_sas_phy_reset.isra.3+0x128/0x160 [libsas]
 phy_reset_work+0x20/0x30 [libsas]
 process_one_work+0x1e4/0x460
 worker_thread+0x40/0x450
 kthread+0x12c/0x130
 ret_from_fork+0x10/0x18

The key code paths look like this:
scsi_dec_host_busy
	atomic_dec(&shost->host_busy);
	if (unlikely(scsi_host_in_recovery(shost))) {
		spin_lock_irqsave(shost->host_lock, flags);
		...
		scsi_eh_wakeup(shost)
		...
	}

scsi_schedule_eh
	spin_lock_irqsave(shost->host_lock, flags);
	if (scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
	    scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0) {
		...
		scsi_eh_wakeup(shost);
	}

scsi_eh_wakeup
	if (scsi_host_busy(shost) == shost->host_failed)
		wake_up_process(shost->ehandler);

In scsi_dec_host_busy, host_busy and shost_state are not accessed under
the spinlock. With the following timing, neither function wakes up the
SCSI error handler:

CPU 0(call scsi_dec_host_busy)    CPU 1(call scsi_schedule_eh)
LOAD shost_state(!=recovery)
                                  scsi_host_set_state(SHOST_RECOVERY)
                                  scsi_eh_wakeup(host_busy != host_failed)
atomic_dec(&shost->host_busy);
if (scsi_host_in_recovery(shost))

Add an smp_mb() between the host_busy update and the shost_state check.
Signed-off-by: zhengbin <zhengbin13@huawei.com>
[yan: backport from 5.0]
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
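
The timing in this commit message is a store-buffering pattern: each side writes one variable and then reads the other. The following is a minimal user-space sketch with C11 atomics, not the actual patch. The names `struct host_sketch`, `dec_busy_side` and `schedule_eh_side` are hypothetical, `atomic_thread_fence(memory_order_seq_cst)` stands in for the kernel's smp_mb(), and the fence on the scheduling side is an assumption made for symmetry (in the kernel that side runs under the host lock).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the Scsi_Host fields involved in the race. */
struct host_sketch {
	atomic_int host_busy;    /* commands in flight */
	atomic_int in_recovery;  /* 1 once SHOST_RECOVERY has been set */
	int host_failed;         /* failed commands waiting for EH */
};

/*
 * Completion side: drop host_busy, then look at the host state.
 * Returns true if this side has to wake the error handler.
 */
static bool dec_busy_side(struct host_sketch *h)
{
	atomic_fetch_sub_explicit(&h->host_busy, 1, memory_order_relaxed);
	/* Stand-in for smp_mb(): the decrement is ordered before the load. */
	atomic_thread_fence(memory_order_seq_cst);
	if (!atomic_load_explicit(&h->in_recovery, memory_order_relaxed))
		return false;
	return atomic_load_explicit(&h->host_busy, memory_order_relaxed) ==
	       h->host_failed;
}

/*
 * EH-scheduling side: publish the recovery state, then look at host_busy.
 * Returns true if this side has to wake the error handler.
 */
static bool schedule_eh_side(struct host_sketch *h)
{
	atomic_store_explicit(&h->in_recovery, 1, memory_order_relaxed);
	/* Assumed pairing fence for this sketch. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&h->host_busy, memory_order_relaxed) ==
	       h->host_failed;
}
```

With both fences in place, at least one side observes the other's write, so at least one of the two functions reports that the handler must be woken.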

ahci: prevent freezing port when EH is running · cb7fc545

Authored by Jason Yan on Apr 03, 2019

euler inclusion
category: bugfix
bugzilla: NA
CVE: NA

---------------------------

Trinity reports a warning for this patch:
WARNING: CPU: 1 PID: 118 at ../drivers/ata/libata-eh.c:4016
ata_eh_finish+0x15a/0x170

Fixing the race condition between EH and the interrupt by making the EH
thread re-enter is overkill, and IO can get through after
scsi_run_host_queues() and before SHOST_RECOVERY is set again
in scsi_restart_operations().

If the EH thread is already running, there is no need to freeze the port
and schedule EH again.

Fixes: a7d2fef75b83 ("scsi: ata: Fix a race condition between scsi error handler and ahci interrupt")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

scsi: ata: Fix a race condition between scsi error handler and ahci interrupt · 02e500a0

Authored by Jason Yan on Mar 28, 2019

euler inclusion
category: bugfix
bugzilla: NA
CVE: NA

---------------------------

   interrupt                                          scsi_eh

ahci_error_intr
  =>ata_port_freeze
    =>__ata_port_freeze
      =>ahci_freeze (turn IRQ off)
    =>ata_port_abort
      =>ata_port_schedule_eh
        =>shost->host_eh_scheduled++;
	host_eh_scheduled = 1
                                                 scsi_error_handler
						   =>ata_scsi_error
						     =>ata_scsi_port_error_handler
						       =>ahci_error_handler
						       . =>sata_pmp_error_handler
						       .   =>ata_eh_thaw_port
						       .     =>ahci_thaw (turn IRQ on)
ahci_error_intr                                        .
  =>ata_port_freeze                                    .
    =>__ata_port_freeze                                .
      =>ahci_freeze (turn IRQ off)                     .
    =>ata_port_abort                                   .
      =>ata_port_schedule_eh                           .
        =>shost->host_eh_scheduled++;                  .
	host_eh_scheduled = 2                          .
						       =>ata_std_end_eh
						         =>host->host_eh_scheduled = 0;

host_eh_scheduled is now 0, so the SCSI EH thread will not be scheduled
again; the ATA port remains frozen and will never be re-enabled.
Reported-by: luojian <luojian5@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
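
The interleaving in the diagram can be replayed deterministically in user space. This is only an illustrative sketch: `struct eh_sketch` and the `sketch_*` helpers are hypothetical names that model the counter and the frozen flag, not libata code.

```c
#include <stdbool.h>

/* Hypothetical model of the two pieces of state in the diagram. */
struct eh_sketch {
	int host_eh_scheduled;  /* pending EH requests */
	bool port_frozen;       /* true while the port IRQ is off */
};

/* Interrupt path: freeze the port and request an EH run. */
static void sketch_error_intr(struct eh_sketch *s)
{
	s->port_frozen = true;
	s->host_eh_scheduled++;
}

/* EH path: thaw the port (IRQ back on). */
static void sketch_eh_thaw(struct eh_sketch *s)
{
	s->port_frozen = false;
}

/* EH path: unconditionally clear the request counter at the end. */
static void sketch_eh_end(struct eh_sketch *s)
{
	s->host_eh_scheduled = 0;
}

/* Replay the sequence from the diagram; true means the port is stuck. */
static bool sketch_replay_race(void)
{
	struct eh_sketch s = { 0, false };

	sketch_error_intr(&s);  /* first error: count becomes 1 */
	sketch_eh_thaw(&s);     /* EH thaws the port */
	sketch_error_intr(&s);  /* second error during EH: count becomes 2 */
	sketch_eh_end(&s);      /* EH finishes and wipes both requests */

	/* Frozen port with no pending EH request: nothing will thaw it. */
	return s.port_frozen && s.host_eh_scheduled == 0;
}
```

In this model `sketch_replay_race()` ends with a frozen port and no pending request, which is the stuck state described above.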

scsi: core: Remove scsi_block_when_processing_errors: message · 347b450f

Authored by Laurence Oberman on Mar 27, 2019

mainline inclusion
from mainline-4.20-rc1
commit 37208bee6a75574f66b28ae6bb536d9f9b6f22bf
category: bugfix
bugzilla: 10010
CVE: NA

---------------------------

This message floods the log when enabling mask 0x7 for
/proc/sys/dev/scsi/logging_level:

 xxxxxxxx kernel: scsi_block_when_processing_errors: rtn: 1

It's not needed and makes tracing just scsi_eh* messages way too
verbose so get rid of it.

[mkp: mangled patch, applied by hand]
Signed-off-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Chad Dupuis <chad.dupuis@cavium.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

25 Jul, 2018 (1 commit)

scsi: set timed out out mq requests to complete · 065990bd

Authored by Keith Busch on Jul 23, 2018

The scsi block layer requires requests claimed by the error handling be
completed by the error handler. A previous commit allowed completions
to proceed for blk-mq, breaking that assumption.

This patch prevents completions that may race with the timeout handler
by marking the state to complete, restoring the previous behavior.

Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

27 Jun, 2018 (1 commit)

scsi: read host_busy via scsi_host_busy() · c84b023a

Authored by Ming Lei on Jun 24, 2018

No functional change.

Just introduce scsi_host_busy() and replace the direct read of
scsi_host->host_busy with this new API.

Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

29 May, 2018 (1 commit)

block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE · 6600593c

Authored by Christoph Hellwig on May 29, 2018

BLK_EH_NOT_HANDLED implies that nothing happened, but very often that
is not the case - instead the driver has already completed the
command.  Fix the symbolic name to reflect that a little better.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

14 May, 2018 (2 commits)

block: pass an explicit gfp_t to get_request · 4accf5fc

Authored by Christoph Hellwig on May 09, 2018

blk_old_get_request already has it at hand, and in blk_queue_bio, which
is the fast path, it is constant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: sanitize blk_get_request calling conventions · ff005a06

Authored by Christoph Hellwig on May 09, 2018

Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21 Apr, 2018 (2 commits)

scsi: devinfo: BLIST_RETRY_ASC_C1 for Fujitsu ETERNUS · c3606520

Authored by Martin Wilck on Apr 18, 2018

On Fujitsu ETERNUS systems, sense code ABORTED COMMAND with ASC/Q C1/01
is used to indicate a temporary condition where the storage-internal path
to a target is switched from one controller to another. SCSI commands
that return with this error code must be retried unconditionally
(i.e. without the "maybe_retry" logic in scsi_decide_disposition);
otherwise dm-multipath might initiate a failover from a healthy path
e.g. for REQ_FAILFAST_DEV commands.

Introduce a new blist flag for this case.

[mkp: applied by hand]
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
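
The rule described in this commit message can be sketched as a small decision function. This is a hedged illustration, not the code in scsi_decide_disposition: the `SKETCH_*` constants, `struct sketch_sense` and `sketch_decide()` are hypothetical, and only the sense key value 0x0b (ABORTED COMMAND) and the ASC/ASCQ pair C1/01 come from the SCSI standard and the message above.

```c
#include <stdint.h>

/* Hypothetical flag bit; the real BLIST_* value is not reproduced here. */
#define SKETCH_BLIST_RETRY_ASC_C1 (1u << 0)
/* SCSI sense key ABORTED COMMAND. */
#define SKETCH_ABORTED_COMMAND    0x0b

enum sketch_disposition {
	SKETCH_MAYBE_RETRY,   /* subject to failfast checks */
	SKETCH_ALWAYS_RETRY   /* retried unconditionally */
};

struct sketch_sense {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Decide how an ABORTED COMMAND should be treated for a given device. */
static enum sketch_disposition
sketch_decide(unsigned int dev_flags, const struct sketch_sense *s)
{
	if (s->sense_key == SKETCH_ABORTED_COMMAND &&
	    s->asc == 0xc1 && s->ascq == 0x01 &&
	    (dev_flags & SKETCH_BLIST_RETRY_ASC_C1))
		return SKETCH_ALWAYS_RETRY;

	return SKETCH_MAYBE_RETRY;
}
```

Keying the behaviour on a per-device flag keeps the unconditional retry limited to arrays that are known to use C1/01 this way.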

scsi: devinfo: add BLIST_RETRY_ITF for EMC Symmetrix · 29cfc2ab

Authored by Martin Wilck on Apr 18, 2018

EMC Symmetrix returns 'internal target error' for a variety of
conditions, most of which will be transient.  So we should always retry
it, even with failfast set.  Otherwise we'd get spurious path flaps with
multipath.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

02 Mar, 2018 (1 commit)

scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops · 3be8828f

Authored by Bart Van Assche on Feb 22, 2018

Avoid that the recently introduced call_rcu() call in the SCSI core
triggers a double call_rcu() call.
Reported-by: Natanael Copa <ncopa@alpinelinux.org>
Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=198861
Fixes: 3bd6f43f ("scsi: core: Ensure that the SCSI error handler gets woken up")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Natanael Copa <ncopa@alpinelinux.org>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Alexandre Oliva <oliva@gnu.org>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

28 Feb, 2018 (1 commit)

scsi: core: fix two wrong indentation cases · 8ef7fe4b

Authored by Jianchao Wang on Feb 26, 2018

No functional changes. Just fix two wrong indentation cases in
scsi_finish_command and scsi_decide_disposition.
Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

14 Feb, 2018 (1 commit)

scsi: core: scmd_eh_abort_handler(): Add a comment · 923f46f9

Authored by Bart Van Assche on Feb 12, 2018

After the patch that introduced this function was posted on the
linux-scsi mailing list an explanation was posted why this patch is
correct. Since that explanation contains important information, add a
summary of it above the code that explanation applies to.  See also
http://www.spinics.net/lists/linux-scsi/msg106326.html.

References: e494f6a7 ("[SCSI] improved eh timeout handler")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

08 Dec, 2017 (2 commits)

scsi: core: Convert a source code comment into a runtime check · f0317e88

Authored by Bart Van Assche on Dec 04, 2017

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: core: Ensure that the SCSI error handler gets woken up · 3bd6f43f

Authored by Bart Van Assche on Dec 04, 2017

If scsi_eh_scmd_add() is called concurrently with
scsi_host_queue_ready() while shost->host_blocked > 0 then it can
happen that neither function wakes up the SCSI error handler. Fix
this by making every function that decreases the host_busy counter
wake up the error handler if necessary and by protecting the
host_failed checks with the SCSI host lock.
Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
References: https://marc.info/?l=linux-kernel&m=150461610630736
Fixes: commit 74665016 ("scsi: convert host_busy to atomic_t")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

03 Nov, 2017 (1 commit)

scsi: scsi_error: DID_SOFT_ERROR comment clean up · ad95028a

Authored by Petros Koutoupis on Oct 30, 2017

Updated comment. We are keeping track of maximum number of retries per
command via retries/allowed in struct scsi_cmnd. Corrected comment
positioning.

[mkp: applied by hand]
Signed-off-by: Petros Koutoupis <petros@petroskoutoupis.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

19 Oct, 2017 (2 commits)

scsi: scsi_error: Handle power-on reset unit attention · cf3431bb

Authored by Hannes Reinecke on Oct 17, 2017

As per SAM there is a status precedence, with any sense code 29/XX
taking second place just after an ACA ACTIVE status. Additionally, each
target might prefer to not queue any unit attention conditions, but just
report one. Due to the above, this will be that one with the highest
precedence. This results in the sense code 29/XX effectively
overwriting any other unit attention. Hence we should report the
power-on reset to userland so that it can take appropriate action.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: scsi_error: Do not retry illegal function error · a8bbb2ab

Authored by Hannes Reinecke on Oct 17, 2017

Hitachi USP-V returns 'ILLEGAL FUNCTION' when the internal staging
mechanism encountered an error. These errors should not be retried on
another path.

[mkp: s/invalid/illegal/]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

28 Sep, 2017 (1 commit)

scsi: ILLEGAL REQUEST + ASC==27 => target failure · d0b7a909

Authored by Martin Wilck on Sep 27, 2017

ASC 0x27 is "WRITE PROTECTED". This error code is returned e.g.  by
Fujitsu ETERNUS systems under certain conditions for WRITE SAME 16
commands with UNMAP bit set. It should not be treated as a path
error. In general, it makes sense to assume that being write protected
is a target rather than a path property.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Acked-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
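
As a rough illustration of the classification described above, the check reduces to a sense key plus ASC comparison. The enum and function names below are hypothetical, and every other sense combination that the kernel handles is deliberately left out of this sketch; only the ILLEGAL REQUEST sense key value (0x05) and ASC 0x27 are taken from the standard and the message.

```c
#include <stdint.h>

/* SCSI sense key ILLEGAL REQUEST. */
#define SKETCH_ILLEGAL_REQUEST 0x05
/* ASC 0x27: WRITE PROTECTED. */
#define SKETCH_ASC_WRITE_PROTECTED 0x27

enum sketch_failure_kind {
	SKETCH_FAILURE_UNCLASSIFIED,  /* not covered by this sketch */
	SKETCH_FAILURE_TARGET         /* property of the target, not the path */
};

/* Classify a failed command from its sense key and ASC. */
static enum sketch_failure_kind
sketch_classify(uint8_t sense_key, uint8_t asc)
{
	if (sense_key == SKETCH_ILLEGAL_REQUEST &&
	    asc == SKETCH_ASC_WRITE_PROTECTED)
		return SKETCH_FAILURE_TARGET;

	return SKETCH_FAILURE_UNCLASSIFIED;
}
```

Reporting a target failure rather than a path failure is what keeps a multipath layer from switching paths for a condition that every path would hit.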

26 Aug, 2017 (2 commits)

scsi: Use blk_mq_rq_to_pdu() to convert a request to a SCSI command pointer · bed2213d

Authored by Bart Van Assche on Aug 25, 2017

Since commit e9c787e6 ("scsi: allocate scsi_cmnd structures as
part of struct request") struct request and struct scsi_cmnd are
adjacent. This means that there is now an alternative to reading
req->special to convert a pointer to a prepared request into a
SCSI command pointer, namely by using blk_mq_rq_to_pdu(). Make
this change where appropriate. Although this patch does not
change any functionality, it slightly improves performance and
slightly improves readability.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
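
Because the command is allocated directly behind the request, the conversion is plain pointer arithmetic. The following user-space sketch shows the idea with hypothetical stand-in types (`struct sketch_request`, `struct sketch_cmnd`) and helpers; it is not the block layer implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-ins for the two structures. */
struct sketch_request { int tag; };
struct sketch_cmnd { int result; };

/* Allocate a request with cmd_size bytes of driver data right behind it. */
static struct sketch_request *sketch_alloc_request(size_t cmd_size)
{
	return calloc(1, sizeof(struct sketch_request) + cmd_size);
}

/* The driver data starts at the first byte after the request. */
static void *sketch_rq_to_pdu(struct sketch_request *rq)
{
	return rq + 1;
}

/* The inverse: step back over one request to find its header. */
static struct sketch_request *sketch_rq_from_pdu(void *pdu)
{
	return (struct sketch_request *)pdu - 1;
}
```

No pointer has to be loaded from the request to find the command, which is the small performance gain the message refers to.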

scsi: Suppress gcc 7 fall-through warnings reported with W=1 · 3bf2ff67

Authored by Bart Van Assche on Aug 25, 2017

The conclusion of a recent discussion about the new warnings
reported by gcc 7 is that the new warnings reported when building
with W=1 should be suppressed. However, gcc 7 still warns about
fall-through in switch statements when building with W=1. Suppress
these warnings by annotating the SCSI core properly.

See also Linus Torvalds, Lots of new warnings with gcc-7.1.1, 11
July 2017 (https://www.mail-archive.com/linux-media@vger.kernel.org/msg115428.html).

References: commit bd664f6b ("disable new gcc-7.1.1 warnings for now")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
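
For readers unfamiliar with this kind of annotation, here is a minimal sketch of a commented fall-through. It assumes that gcc's -Wimplicit-fallthrough, at its default level, accepts a `/* fall through */` comment placed directly before the next case label; the function and its cases are hypothetical and unrelated to the SCSI core.

```c
/* Hypothetical example: level 2 does everything level 1 does, plus more. */
static int sketch_cost(int level)
{
	int cost = 0;

	switch (level) {
	case 2:
		cost += 10;
		/* fall through */
	case 1:
		cost += 1;
		break;
	default:
		break;
	}

	return cost;
}
```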

21 Jun, 2017 (1 commit)

block: Make most scsi_req_init() calls implicit · ca18d6f7

Authored by Bart Van Assche on Jun 20, 2017

Instead of explicitly calling scsi_req_init() after blk_get_request(),
call that function from inside blk_get_request(). Add an
.initialize_rq_fn() callback function to the block drivers that need
it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn()
because it is too small to keep it as a separate function. Keep the
scsi_req_init() call in ide_prep_sense() because it follows a
blk_rq_init() call.

References: commit 82ed4db4 ("block: split scsi_request out of struct request")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

13 Jun, 2017 (1 commit)

scsi: Protect SCSI device state changes with a mutex · 0db6ca8a

Authored by Bart Van Assche on Jun 02, 2017

Serializing SCSI device state changes avoids that two state changes can
occur concurrently, e.g. the state changes in scsi_target_block() and
__scsi_remove_device(). This serialization is essential to make patch
"Make __scsi_remove_device go straight from BLOCKED to DEL" work
reliably.

Enable this mechanism for all scsi_target_*block() callers but not for
the scsi_internal_device_unblock() calls from the mpt3sas driver because
that driver can call scsi_internal_device_unblock() from atomic context.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

09 Jun, 2017 (1 commit)

block: introduce new block status code type · 2a842aca

Authored by Christoph Hellwig on Jun 03, 2017

Currently we use normal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings. This patch
instead introduces a new blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning. Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have an
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return something that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.

For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruit to improve it.

blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
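
The idea of a dedicated status type with explicit conversion helpers can be sketched in portable C. This sketch assumes a struct wrapper to get compiler type checking, since the sparse `__bitwise` annotation used by the patch is not available to an ordinary compiler; `sketch_status_t`, the `SKETCH_STS_*` values and both helpers are hypothetical and do not reproduce the kernel's status codes.

```c
#include <errno.h>

/* Wrapping the value in a struct stops it mixing silently with ints. */
typedef struct { unsigned char v; } sketch_status_t;

enum { SKETCH_OK_V = 0, SKETCH_NOSPC_V = 1, SKETCH_IOERR_V = 2 };

#define SKETCH_STS_OK    ((sketch_status_t){ SKETCH_OK_V })
#define SKETCH_STS_NOSPC ((sketch_status_t){ SKETCH_NOSPC_V })
#define SKETCH_STS_IOERR ((sketch_status_t){ SKETCH_IOERR_V })

/* Convert a status code to a negative errno for callers outside the layer. */
static int sketch_status_to_errno(sketch_status_t s)
{
	switch (s.v) {
	case SKETCH_OK_V:
		return 0;
	case SKETCH_NOSPC_V:
		return -ENOSPC;
	default:
		return -EIO;
	}
}

/* Convert an incoming errno; anything unknown becomes a generic I/O error. */
static sketch_status_t sketch_errno_to_status(int err)
{
	switch (err) {
	case 0:
		return SKETCH_STS_OK;
	case -ENOSPC:
		return SKETCH_STS_NOSPC;
	default:
		return SKETCH_STS_IOERR;
	}
}
```

Passing a raw errno where `sketch_status_t` is expected fails to compile here, which is the same class of mistake the sparse annotation is meant to catch.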

26 Apr, 2017 (1 commit)

scsi: Improve scsi_get_sense_info_fld · 2908769c

Authored by Damien Le Moal on Apr 24, 2017

Use get_unaligned_be32 and get_unaligned_be64 to obtain values from the
sense buffer instead of open coding the operations.  Also change the
function return value to a bool and fix the function signature
declaration to remove spaces triggering checkpatch warnings.

No functional change is introduced by this patch.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
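
For reference, this is roughly what a big-endian unaligned read does. The helpers below are a portable user-space sketch with hypothetical names (`sketch_get_be32`, `sketch_get_be64`); the kernel's get_unaligned_be32() and get_unaligned_be64() may be implemented differently per architecture.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes, most significant byte first. */
static uint32_t sketch_get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assemble a 64-bit value from two consecutive big-endian 32-bit halves. */
static uint64_t sketch_get_be64(const uint8_t *p)
{
	return ((uint64_t)sketch_get_be32(p) << 32) | sketch_get_be32(p + 4);
}
```

Reading byte by byte works at any buffer offset, so a field that starts at an odd position in the sense buffer needs no special handling.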

07 Apr, 2017 (5 commits)

scsi: make asynchronous aborts mandatory · a0658632

Authored by Hannes Reinecke on Apr 06, 2017

There haven't been any reports of HBAs where asynchronous abort
would not work, so we should make it mandatory and remove
the fallback.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: make scsi_eh_scmd_add() always succeed · 2171b6d0

Authored by Hannes Reinecke on Apr 06, 2017

scsi_eh_scmd_add() currently will only fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.

But if we're encountering an invalid state transition,
chances are we cannot fix things up with the error handler.
So better add a WARN_ON for illegal host states and
make scsi_eh_scmd_add() a void function.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: make eh_eflags persistent · 8e8c9d01

Authored by Hannes Reinecke on Apr 06, 2017

If a failed command is retried and fails again we need
to enter SCSI EH, otherwise we will never be able to
recover the command.
To detect this situation we must not clear scmd->eh_eflags
when EH finishes but rather make it persistent throughout
the lifetime of the command.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: always send command aborts · 1bcb9304

Authored by Hannes Reinecke on Apr 06, 2017

When a command has timed out we always should be sending an
abort; with the previous code a failed abort might signal
SCSI EH to start, and all other timed out commands will
never be aborted, even though they might belong to a
different ITL nexus.

Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

scsi: scsi_error: count medium access timeout only once per EH run · 7a38dc0b

Authored by Hannes Reinecke on Apr 06, 2017

The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single device failure and
the following kernel message is displayed:

sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk!

Fix this by counting the timeout per EH run, i.e. the counter will
only be increased once per device and EH run.

Fixes: 18a4d0a2 ("[SCSI] Handle disk devices which can not process medium access commands")
Cc: Ewan Milne <emilne@redhat.com>
Cc: Lawrence Obermann <loberman@redhat.com>
Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Cc: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
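
The counting rule can be illustrated with a small model. This is a sketch under stated assumptions, not the mechanism the patch uses: the structures, the per-disk `counted_this_run` marker and `sketch_eh_run()` are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-disk bookkeeping. */
struct sketch_disk {
	unsigned int medium_access_timed_out;  /* compared against a limit */
	bool counted_this_run;                 /* set once per EH run */
};

/* Hypothetical failed command, pointing at the disk it was sent to. */
struct sketch_failed_cmd {
	struct sketch_disk *disk;
};

/* Process one EH run: bump each affected disk's counter at most once. */
static void sketch_eh_run(struct sketch_failed_cmd *cmds, size_t n)
{
	size_t i;

	/* Start of the run: forget the markers from the previous run. */
	for (i = 0; i < n; i++)
		cmds[i].disk->counted_this_run = false;

	for (i = 0; i < n; i++) {
		struct sketch_disk *d = cmds[i].disk;

		if (!d->counted_this_run) {
			d->medium_access_timed_out++;
			d->counted_this_run = true;
		}
	}
}
```

With this rule, many failed commands on one disk in a single run raise its counter by one, so the limit is only reached after repeated failing runs.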

06 Apr, 2017 (1 commit)

block, scsi: move the retries field to struct scsi_request · 64c7f1d1

Authored by Christoph Hellwig on Apr 05, 2017

Instead of bloating the generic struct request with it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>

07 Feb, 2017 (1 commit)

scsi: remove eh_timed_out methods in the transport template · b6a05c82

Authored by Christoph Hellwig on Jan 30, 2017

Instead define the timeout behavior purely based on the host_template
eh_timed_out method and wire up the existing transport implementations
in the host templates.  This also clears up the confusion that the
transport template method overrides the host template one, so some
drivers have to re-override the transport template one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

01 Feb, 2017 (2 commits)

block: fold cmd_type into the REQ_OP_ space · aebf526b

Authored by Christoph Hellwig on Jan 31, 2017

Instead of keeping two levels of indirection for request types, fold it
all into the operations.  The little caveat here is that previously
cmd_type only applied to struct request, while the request and bio op
fields were set to plain REQ_OP_READ/WRITE even for passthrough
operations.

Instead this patch adds new REQ_OP_* for SCSI passthrough and driver
private requests, although it has to add two for each so that we
can communicate the data in/out nature of the request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

block: introduce blk_rq_is_passthrough · 57292b58

Authored by Christoph Hellwig on Jan 31, 2017

This can be used to check for fs vs non-fs requests and basically
removes all BLOCK_PC-specific knowledge from the block layer,
as well as preparing for removing the cmd_type field in struct request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

28 Jan, 2017 (2 commits)

block: split scsi_request out of struct request · 82ed4db4

Authored by Christoph Hellwig on Jan 27, 2017

And require all drivers that want to support BLOCK_PC to allocate it
as the first thing of their private data.  To support this the legacy
IDE and BSG code is switched to set cmd_size on their queues to let
the block layer allocate the additional space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

scsi: allocate scsi_cmnd structures as part of struct request · e9c787e6

Authored by Christoph Hellwig on Jan 02, 2017

Rely on the new block layer functionality to allocate additional driver
specific data behind struct request instead of implementing it in SCSI
itself.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

28 Oct, 2016 (1 commit)

block: split out request-only flags into a new namespace · e8064021

Authored by Christoph Hellwig on Oct 20, 2016

A lot of the REQ_* flags are only used on struct requests, and only of
use to the block layer and a few drivers that dig into struct request
internals.

This patch adds a new req_flags_t rq_flags field to struct request for
them, and thus dramatically shrinks the number of common requests.  It
also removes the unfortunate situation where we have to fit the fields
from the same enum into 32 bits for struct bio and 64 bits for
struct request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

09 Jun, 2016 (1 commit)

scsi: fix race between simultaneous decrements of ->host_failed · 72d8c36e

Authored by Wei Fang on Jun 07, 2016

sas_ata_strategy_handler() adds the works of the ata error handler to
system_unbound_wq. This workqueue asynchronously runs work items, so the
ata error handler will be performed concurrently on different CPUs. In
this case, ->host_failed will be decreased simultaneously in
scsi_eh_finish_cmd() on different CPUs, and become abnormal.

This leads to a permanent inequality between ->host_failed and
->host_busy, so the SCSI error handler thread won't start running. IO
errors after that won't be handled.

Since all scmds must have been handled in the strategy handler, just
remove the decrement in scsi_eh_finish_cmd() and zero ->host_failed after
the strategy handler to fix this race.

Fixes: 50824d6c ("[SCSI] libsas: async ata-eh")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

05 Apr, 2016 (1 commit)

libata: evaluate SCSI sense code · 3852e373

Authored by Hannes Reinecke on Apr 04, 2016

Whenever a sense code is set it would need to be evaluated to
update the error mask.

tj: Cosmetic formatting updates.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>