1. 27 Dec, 2019: 4 commits
    • scsi: fix ata_port_wait_eh() hang caused by missing to wake up eh thread · c2b89ae5
      zhengbin committed
      hulk inclusion
      category: bugfix
      bugzilla: 12843
      CVE: NA
      
      ---------------------------
      
      When testing the kernel with fio using the following steps:
      1. Attach a mix of SAS and SATA disks to the SAS controller
      2. Run fio against all disks
      3. Simultaneously enable/disable/link_reset/hard_reset the PHYs

      the kernel hangs in ata_port_wait_eh().
      Call trace:
       __switch_to+0xb4/0x1b8
       __schedule+0x1e8/0x718
       schedule+0x38/0x90
       ata_port_wait_eh+0x70/0xf8
       sas_ata_wait_eh+0x24/0x30 [libsas]
       transport_sas_phy_reset.isra.3+0x128/0x160 [libsas]
       phy_reset_work+0x20/0x30 [libsas]
       process_one_work+0x1e4/0x460
       worker_thread+0x40/0x450
       kthread+0x12c/0x130
       ret_from_fork+0x10/0x18
      
      The key code process is like this:
      scsi_dec_host_busy
      	atomic_dec(&shost->host_busy);
      	if (unlikely(scsi_host_in_recovery(shost))) {
      		spin_lock_irqsave(shost->host_lock, flags);
      		...
      		scsi_eh_wakeup(shost);
      		...
      	}
      
      scsi_schedule_eh
      	spin_lock_irqsave(shost->host_lock, flags);
      	if (scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
      	    scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0) {
      		...
      		scsi_eh_wakeup(shost);
      	}
      
      scsi_eh_wakeup
      	if (scsi_host_busy(shost) == shost->host_failed)
      		wake_up_process(shost->ehandler);
      
      In scsi_dec_host_busy(), host_busy and shost_state are not accessed under
      host_lock, and atomic_dec() implies no memory barrier. With the following
      timing, neither function wakes up the SCSI error handler:
      
      CPU 0(call scsi_dec_host_busy)    CPU 1(call scsi_schedule_eh)
      LOAD shost_state(!=recovery)
                                        scsi_host_set_state(SHOST_RECOVERY)
                                        scsi_eh_wakeup(host_busy != host_failed)
      atomic_dec(&shost->host_busy);
      if (scsi_host_in_recovery(shost))
      
      Add an smp_mb() between the host_busy decrement and the shost_state check.

      Signed-off-by: zhengbin <zhengbin13@huawei.com>
      [yan: backport from 5.0]
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Miao Xie <miaoxie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
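      The timing above is an instance of the store-buffering pattern. Each CPU
      stores to one variable and then loads the other. Unless both sides have a
      full barrier between the two accesses, both loads may see the old values.
      The sketch below is a hypothetical user-space model, not kernel code:
      struct host_model, dec_host_busy() and schedule_eh() only mimic the paths
      described above. Relaxed C11 atomics stand in for atomic_dec() (which
      implies no ordering), and atomic_thread_fence(memory_order_seq_cst) stands
      in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified model of a SCSI host (not struct Scsi_Host). */
enum host_state { HOST_RUNNING, HOST_RECOVERY };

struct host_model {
    atomic_int host_busy;   /* commands outstanding, failed ones included */
    atomic_int host_failed; /* commands handed over to the error handler */
    atomic_int state;       /* an enum host_state value */
    atomic_int eh_woken;    /* how many times the EH thread was woken */
};

/* Mirrors scsi_eh_wakeup(): wake EH only once every busy command has failed. */
static void eh_wakeup(struct host_model *h)
{
    if (atomic_load_explicit(&h->host_busy, memory_order_relaxed) ==
        atomic_load_explicit(&h->host_failed, memory_order_relaxed))
        atomic_fetch_add_explicit(&h->eh_woken, 1, memory_order_relaxed);
}

/* CPU 0: completion path, modelled on scsi_dec_host_busy(). */
static void dec_host_busy(struct host_model *h)
{
    /* Relaxed, like atomic_dec(): on its own it orders nothing. */
    atomic_fetch_sub_explicit(&h->host_busy, 1, memory_order_relaxed);
    /* The added barrier: the decrement is ordered before the state load. */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&h->state, memory_order_relaxed) == HOST_RECOVERY)
        eh_wakeup(h);
}

/* CPU 1: modelled on scsi_schedule_eh(). */
static void schedule_eh(struct host_model *h)
{
    atomic_store_explicit(&h->state, HOST_RECOVERY, memory_order_relaxed);
    /* Matching fence: the state store is ordered before the host_busy load. */
    atomic_thread_fence(memory_order_seq_cst);
    eh_wakeup(h);
}

/* Initial state for the timeline: two commands outstanding, one failed. */
static void host_model_init(struct host_model *h)
{
    atomic_init(&h->host_busy, 2);
    atomic_init(&h->host_failed, 1);
    atomic_init(&h->state, HOST_RUNNING);
    atomic_init(&h->eh_woken, 0);
}
```

      When the two paths run one after the other, in either order, exactly one
      of them wakes EH. With a fence on both sides, a concurrent run may wake EH
      once or twice, but never zero times. If either fence is dropped, zero
      becomes a possible outcome.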
    • ahci: prevent freezing port when EH is running · cb7fc545
      Jason Yan committed
      euler inclusion
      category: bugfix
      bugzilla: NA
      CVE: NA
      
      ---------------------------
      
      Trinity reported the following warning against the commit named in the
      Fixes: tag:
      WARNING: CPU: 1 PID: 118 at ../drivers/ata/libata-eh.c:4016
      ata_eh_finish+0x15a/0x170
      
      Fixing the race condition between EH and the interrupt handler by making
      the EH thread re-enter is overkill. I/O can still get through after
      scsi_run_host_queues() and before SHOST_RECOVERY is set again in
      scsi_restart_operations().
      
      If the EH thread is already running, there is no need to freeze the port
      and schedule EH again.
      
      Fixes: a7d2fef75b83 ("scsi: ata: Fix a race condition between scsi error handler and ahci interrupt")
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: zhengbin <zhengbin13@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
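      The rule in the last paragraph of this message works like a guard at the
      top of the error-interrupt path. The sketch below is a hypothetical
      user-space model: struct port_model, its eh_running flag and error_intr()
      are invented for illustration. They are not the libata/ahci API, and they
      do not reproduce the actual change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an ATA port; not libata's struct ata_port. */
struct port_model {
    bool eh_running;     /* an EH run is in progress for this port */
    bool frozen;         /* port IRQs masked, as ahci_freeze() would do */
    int  eh_scheduled;   /* pending EH requests (host_eh_scheduled) */
    int  pending_errors; /* errors recorded for EH to look at */
};

/* Error interrupt: always record the error, but freeze the port and
 * schedule EH only when no EH run is already in progress. */
static void error_intr(struct port_model *p)
{
    p->pending_errors++;
    if (p->eh_running)
        return; /* in this model, the running EH is assumed to see the error */
    p->frozen = true;
    p->eh_scheduled++;
}
```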
    • scsi: ata: Fix a race condition between scsi error handler and ahci interrupt · 02e500a0
      Jason Yan committed
      euler inclusion
      category: bugfix
      bugzilla: NA
      CVE: NA
      
      ---------------------------
      
         interrupt                                scsi_eh

      ahci_error_intr
        =>ata_port_freeze
          =>__ata_port_freeze
            =>ahci_freeze (turn IRQ off)
          =>ata_port_abort
            =>ata_port_schedule_eh
              =>shost->host_eh_scheduled++;
                host_eh_scheduled = 1
                                                  scsi_error_handler
                                                    =>ata_scsi_error
                                                      =>ata_scsi_port_error_handler
                                                        =>ahci_error_handler
                                                        . =>sata_pmp_error_handler
                                                        .   =>ata_eh_thaw_port
                                                        .     =>ahci_thaw (turn IRQ on)
      ahci_error_intr                                   .
        =>ata_port_freeze                               .
          =>__ata_port_freeze                           .
            =>ahci_freeze (turn IRQ off)                .
          =>ata_port_abort                              .
            =>ata_port_schedule_eh                      .
              =>shost->host_eh_scheduled++;             .
                host_eh_scheduled = 2                   .
                                                        =>ata_std_end_eh
                                                          =>host->host_eh_scheduled = 0;
      
      host_eh_scheduled is now 0, so the SCSI EH thread will not be scheduled
      again, and the ATA port stays frozen and is never re-enabled.

      Reported-by: luojian <luojian5@huawei.com>
      Signed-off-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: zhengbin <zhengbin13@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
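      The lost wake-up in the diagram can be replayed deterministically. The
      model below is hypothetical: struct eh_model and its helpers are not
      kernel code. Each error interrupt freezes the port and increments the
      counter. EH thaws the port, and the end-of-EH step clears the counter
      unconditionally, as ata_std_end_eh() does in the diagram. Replaying the
      interleaving leaves the port frozen with no EH run pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state shared by the interrupt and EH paths. */
struct eh_model {
    bool frozen;            /* IRQs masked by ahci_freeze() */
    int  host_eh_scheduled; /* pending EH requests */
};

/* ahci_error_intr -> ata_port_freeze -> ata_port_schedule_eh */
static void error_intr(struct eh_model *m)
{
    m->frozen = true;
    m->host_eh_scheduled++;
}

/* ata_eh_thaw_port -> ahci_thaw */
static void eh_thaw(struct eh_model *m)
{
    m->frozen = false;
}

/* ata_std_end_eh: clears the counter without checking for new requests */
static void eh_end(struct eh_model *m)
{
    m->host_eh_scheduled = 0;
}

/* Replays the interleaving from the diagram. Returns true if the port is
 * left frozen with no EH run pending (the hang). */
static bool replay_race(void)
{
    struct eh_model m = { false, 0 };
    error_intr(&m); /* host_eh_scheduled = 1, EH thread starts */
    eh_thaw(&m);    /* EH turns the IRQ back on */
    error_intr(&m); /* second error: frozen again, host_eh_scheduled = 2 */
    eh_end(&m);     /* EH finishes and resets the counter to 0 */
    return m.frozen && m.host_eh_scheduled == 0;
}
```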
    • scsi: core: Remove scsi_block_when_processing_errors: message · 347b450f
      Laurence Oberman committed
      mainline inclusion
      from mainline-4.20-rc1
      commit 37208bee6a75574f66b28ae6bb536d9f9b6f22bf
      category: bugfix
      bugzilla: 10010
      CVE: NA
      
      ---------------------------
      
      This message floods the log when enabling mask 0x7 for
      /proc/sys/dev/scsi/logging_level:
      
       xxxxxxxx kernel: scsi_block_when_processing_errors: rtn: 1
      
      It's not needed and makes tracing just scsi_eh* messages way too
      verbose so get rid of it.
      
      [mkp: mangled patch, applied by hand]
      Signed-off-by: Laurence Oberman <loberman@redhat.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Chad Dupuis <chad.dupuis@cavium.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: zhengbin <zhengbin13@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  2. 25 Jul, 2018: 1 commit
  3. 27 Jun, 2018: 1 commit
    • scsi: read host_busy via scsi_host_busy() · c84b023a
      Ming Lei committed
      No functional change.
      
      Just introduce scsi_host_busy() and replace the direct read of
      scsi_host->host_busy with this new API.
      
      Cc: Omar Sandoval <osandov@fb.com>,
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
      Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
      Cc: Christoph Hellwig <hch@lst.de>,
      Cc: Don Brace <don.brace@microsemi.com>
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
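      The pattern here is a plain accessor. Callers stop reading the counter
      field directly, so its representation can later change in one place. A
      minimal sketch follows, using a hypothetical struct host_model. At the
      time of this commit the kernel counter was an atomic_t, which the model
      mimics with C11 atomic_int.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical host structure; only the busy counter is modelled. */
struct host_model {
    atomic_int host_busy; /* commands currently dispatched to the LLD */
};

/* Accessor in the spirit of scsi_host_busy(): the only reader that knows
 * how the busy count is stored. */
static int host_busy_count(struct host_model *h)
{
    return atomic_load(&h->host_busy);
}

static void host_busy_inc(struct host_model *h) { atomic_fetch_add(&h->host_busy, 1); }
static void host_busy_dec(struct host_model *h) { atomic_fetch_sub(&h->host_busy, 1); }
```

      Because every reader goes through host_busy_count(), replacing the
      counter later (for example, with a value computed from other state) only
      touches the accessor.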
  4. 29 May, 2018: 1 commit
  5. 14 May, 2018: 2 commits
  6. 21 Apr, 2018: 2 commits
  7. 02 Mar, 2018: 1 commit
  8. 28 Feb, 2018: 1 commit
  9. 14 Feb, 2018: 1 commit
  10. 08 Dec, 2017: 2 commits
  11. 03 Nov, 2017: 1 commit
  12. 19 Oct, 2017: 2 commits
  13. 28 Sep, 2017: 1 commit
  14. 26 Aug, 2017: 2 commits
  15. 21 Jun, 2017: 1 commit
    • block: Make most scsi_req_init() calls implicit · ca18d6f7
      Bart Van Assche committed
      Instead of explicitly calling scsi_req_init() after blk_get_request(),
      call that function from inside blk_get_request(). Add an
      .initialize_rq_fn() callback function to the block drivers that need
      it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn()
      because it is too small to keep it as a separate function. Keep the
      scsi_req_init() call in ide_prep_sense() because it follows a
      blk_rq_init() call.
      
      References: commit 82ed4db4 ("block: split scsi_request out of struct request")
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
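      The idea is that the allocator, rather than every caller, runs an optional
      per-queue initialization hook. The sketch below is hypothetical: struct
      queue_model, get_request() and the initialize_rq_fn member only imitate
      the shape described in the commit. They are not the block layer's real
      types or signatures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request carrying a driver-specific payload. */
struct request_model {
    int cmd_len;
    unsigned char cmd[16];
};

/* Hypothetical queue with an optional per-driver init hook. */
struct queue_model {
    void (*initialize_rq_fn)(struct request_model *rq);
};

/* Allocation runs the hook, so callers no longer initialize by hand. */
static struct request_model *get_request(const struct queue_model *q)
{
    struct request_model *rq = malloc(sizeof(*rq));
    if (!rq)
        return NULL;
    memset(rq, 0xff, sizeof(*rq)); /* simulate uninitialized memory */
    if (q->initialize_rq_fn)
        q->initialize_rq_fn(rq);
    return rq;
}

/* A driver hook in the spirit of scsi_req_init(): reset the command. */
static void scsi_like_init(struct request_model *rq)
{
    rq->cmd_len = 0;
    memset(rq->cmd, 0, sizeof(rq->cmd));
}
```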
  16. 13 Jun, 2017: 1 commit
    • scsi: Protect SCSI device state changes with a mutex · 0db6ca8a
      Bart Van Assche committed
      Serializing SCSI device state changes prevents two state changes from
      occurring concurrently, e.g. the state changes in scsi_target_block() and
      __scsi_remove_device(). This serialization is essential to make the patch
      "Make __scsi_remove_device go straight from BLOCKED to DEL" work
      reliably.
      
      Enable this mechanism for all scsi_target_*block() callers but not for
      the scsi_internal_device_unblock() calls from the mpt3sas driver because
      that driver can call scsi_internal_device_unblock() from atomic context.

      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
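      Serialization here means every state transition is checked and applied
      while holding one per-device lock, so two concurrent callers cannot both
      act on the same old state. The sketch uses a POSIX mutex with a
      hypothetical struct sdev_model and a deliberately reduced state set. It
      is not the SCSI midlayer's scsi_device_set_state(). Because a mutex is a
      sleeping lock, callers in atomic context (such as the mpt3sas path
      mentioned above) cannot take it. Older toolchains may need -pthread to
      link.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Reduced, hypothetical set of device states. */
enum sdev_state { SDEV_MODEL_RUNNING, SDEV_MODEL_BLOCKED, SDEV_MODEL_DEL };

struct sdev_model {
    pthread_mutex_t state_mutex; /* serializes every state change */
    enum sdev_state state;
};

/* Returns true if the transition was allowed and applied. */
static bool sdev_set_state(struct sdev_model *s, enum sdev_state next)
{
    bool ok;

    pthread_mutex_lock(&s->state_mutex);
    /* Check and update under one lock: no other caller can change
     * s->state between the test and the assignment. */
    ok = s->state != SDEV_MODEL_DEL; /* DEL is terminal in this model */
    if (ok)
        s->state = next;
    pthread_mutex_unlock(&s->state_mutex);
    return ok;
}
```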
  17. 09 Jun, 2017: 1 commit
    • block: introduce new block status code type · 2a842aca
      Christoph Hellwig committed
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error, a few have overloaded magic meanings. This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning. Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run: those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low-hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.

      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
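      A reduced model of the scheme follows. It defines a distinct status type
      annotated for sparse, plus table-driven helpers that convert to and from
      errno values. The demo_status_t type and the DEMO_STS_* constants are
      hypothetical, and the mapping is a small illustrative subset, not the
      kernel's full blk_status_t table. The __CHECKER__ guard hides the sparse
      annotations from the regular compiler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Under sparse (__CHECKER__) these become the bitwise/force annotations,
 * like the kernel's __bitwise/__force; a normal compiler sees nothing. */
#ifdef __CHECKER__
#define BITWISE __attribute__((bitwise))
#define FORCE   __attribute__((force))
#else
#define BITWISE
#define FORCE
#endif

/* Hypothetical status type: sparse flags mixing it with plain integers. */
typedef unsigned char BITWISE demo_status_t;

#define DEMO_STS_OK      ((FORCE demo_status_t)0)
#define DEMO_STS_NOTSUPP ((FORCE demo_status_t)1)
#define DEMO_STS_TIMEOUT ((FORCE demo_status_t)2)
#define DEMO_STS_NOSPC   ((FORCE demo_status_t)3)
#define DEMO_STS_IOERR   ((FORCE demo_status_t)4)

/* One table drives both conversion helpers. */
static const struct {
    demo_status_t status;
    int errno_val;
} demo_table[] = {
    { DEMO_STS_OK,      0 },
    { DEMO_STS_NOTSUPP, -EOPNOTSUPP },
    { DEMO_STS_TIMEOUT, -ETIMEDOUT },
    { DEMO_STS_NOSPC,   -ENOSPC },
    { DEMO_STS_IOERR,   -EIO },
};

#define DEMO_TABLE_LEN (sizeof(demo_table) / sizeof(demo_table[0]))

static int demo_status_to_errno(demo_status_t status)
{
    for (size_t i = 0; i < DEMO_TABLE_LEN; i++)
        if (demo_table[i].status == status)
            return demo_table[i].errno_val;
    return -EIO; /* unknown status: generic I/O error */
}

static demo_status_t errno_to_demo_status(int err)
{
    for (size_t i = 0; i < DEMO_TABLE_LEN; i++)
        if (demo_table[i].errno_val == err)
            return demo_table[i].status;
    return DEMO_STS_IOERR; /* unknown errno: generic I/O error */
}
```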
  18. 26 Apr, 2017: 1 commit
  19. 07 Apr, 2017: 5 commits
  20. 06 Apr, 2017: 1 commit
  21. 07 Feb, 2017: 1 commit
  22. 01 Feb, 2017: 2 commits
  23. 28 Jan, 2017: 2 commits
  24. 28 Oct, 2016: 1 commit
  25. 09 Jun, 2016: 1 commit
    • scsi: fix race between simultaneous decrements of ->host_failed · 72d8c36e
      Wei Fang committed
      sas_ata_strategy_handler() queues the ata error handler work items on
      system_unbound_wq. This workqueue runs work items asynchronously, so the
      ata error handler runs concurrently on different CPUs. In this case,
      ->host_failed is decremented simultaneously in scsi_eh_finish_cmd() on
      different CPUs, and its value ends up wrong.
      
      This leads to a permanent mismatch between ->host_failed and
      ->host_busy, so the scsi error handler thread never starts running and
      subsequent IO errors are not handled.
      
      Since all scmds must have been handled in the strategy handler, just
      remove the decrement in scsi_eh_finish_cmd() and zero ->host_failed after
      the strategy handler to fix this race.
      
      Fixes: 50824d6c ("[SCSI] libsas: async ata-eh")
      Cc: stable@vger.kernel.org
      Signed-off-by: Wei Fang <fangwei1@huawei.com>
      Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
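      The race is a lost update on a plain counter. The sketch below replays,
      without threads, the interleaving two CPUs can produce when both
      decrement host_failed with a non-atomic read-modify-write. It contrasts
      that with the approach the commit takes: no per-command decrement, and a
      single reset once the strategy handler has returned. struct eh_counters
      and both helpers are hypothetical.

```c
#include <assert.h>

/* Hypothetical holder for the counter the EH wake-up check relies on. */
struct eh_counters {
    int host_failed;
};

/* Buggy scheme: each finished command does host_failed-- as a load,
 * subtract and store. Both CPUs load before either stores, so one
 * decrement is lost. */
static void finish_two_cmds_racy(struct eh_counters *c)
{
    int cpu0 = c->host_failed; /* CPU 0 loads */
    int cpu1 = c->host_failed; /* CPU 1 loads the same value */
    c->host_failed = cpu0 - 1; /* CPU 0 stores */
    c->host_failed = cpu1 - 1; /* CPU 1 overwrites CPU 0's store */
}

/* Fixed scheme: finishing a command leaves host_failed alone; the counter
 * is cleared once, after the strategy handler has handled every failed
 * command. */
static void finish_two_cmds_fixed(struct eh_counters *c)
{
    /* ...commands are finished on any CPU, with no shared decrement... */
    c->host_failed = 0; /* single reset after the handler returns */
}
```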
  26. 05 Apr, 2016: 1 commit