1. 07 Oct, 2020 1 commit
    • scsi: hisi_sas: Filter out new PHY up events during suspend · b14a37e0
      Xiang Chen committed
      Currently sas_resume_ha() is called while resuming the controller to wait
      for all suspended PHYs to come up and all the libsas events to be
      completed.
      
      There is a scenario that causes a hung task: in a direct-attach setup
      with two disks connected to two PHYs, disable phy0 before suspending the
      disk on phy1 and the controller, then enable phy0 and resume the
      controller. The task hang occurs as follows:
      
      [  591.901463] hisi_sas_v3_hw 0000:b4:02.0: resuming from operating state [D0]
      [  593.113525] hisi_sas_v3_hw 0000:b4:02.0: neither _PS0 nor _PR0 is defined
      [  593.120301] hisi_sas_v3_hw 0000:b4:02.0: waiting up to 25 seconds for 1 phy to resume
      [  593.120836] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy0 link_rate=10(sata)
      [  593.134680] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy1 link_rate=10(sata)
      [  593.134733] sas: phy-2:0 added to port-2:0, phy_mask:0x1 (5000000000000200)
      [  593.148350] sas: DOING DISCOVERY on port 0, pid:948
      [  593.153227] hisi_sas_v3_hw 0000:b4:02.0: dev[3:5] found
      [  593.159840] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
      [  593.165663] sas: ata7: end_device-2:0: dev error handler
      [  593.165730] sas: ata2: end_device-2:1: dev error handler
      [  593.172532] hisi_sas_v3_hw 0000:b4:02.0: phydown: phy0 phy_state=0x2
      [  593.182570] hisi_sas_v3_hw 0000:b4:02.0: ignore flutter phy0 down
      [  593.331277] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy0 link_rate=10(sata)
      [  593.498956] ata7.00: ATA-11: SAMSUNG MZ7LH960HAJR-00005, HXT7404Q, max UDMA/133
      [  593.506235] ata7.00: 1875385008 sectors, multi 16: LBA48 NCQ (depth 32)
      [  593.514295] ata7.00: configured for UDMA/133
      [  593.518557] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
      [  593.528613] sas: ata7: end_device-2:0: model:SAMSUNG MZ7LH960HAJR-00005 serial:S45NNA0M712225
      [  593.537520] device_link_add 316: dev=2:0:2:0 supplier:2 consumer:0
      [  593.543674] device_link_add 324
      [  593.546801] device_link_add 352
      [  593.549930] device_link_add 406
      [  593.553058] device_link_add 440: dev=2:0:2:0 supplier:2 consumer:0
      [  593.559208] device_link_add 444
      [  593.562335] device_link_add 455
      [  593.565517] scsi 2:0:2:0: Direct-Access     ATA      SAMSUNG MZ7LH960 404Q PQ: 0 ANSI: 5
      [  620.057464]  phy-2:1: resume timeout
      [  738.841445] INFO: task kworker/u256:0:8 blocked for more than 120 seconds.
      [  738.848295]       Not tainted 5.8.0-rc1-76154-g0d52b59-dirty #744
      [  738.854361] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  738.862155] kworker/u256:0  D    0     8      2 0x00000028
      [  738.867626] Workqueue: 0000:b4:02.0_event_q sas_port_event_worker
      [  738.873693] Call trace:
      [  738.876133]  __switch_to+0xf4/0x148
      [  738.879613]  __schedule+0x270/0x5d8
      [  738.883091]  schedule+0x78/0x110
      [  738.886307]  schedule_timeout+0x1ac/0x280
      [  738.890299]  wait_for_completion+0x94/0x138
      [  738.894472]  flush_workqueue+0x114/0x438
      [  738.898377]  sas_porte_bytes_dmaed+0x400/0x500
      [  738.902801]  sas_port_event_worker+0x28/0x40
      [  738.907053]  process_one_work+0x1e8/0x360
      [  738.911046]  worker_thread+0x44/0x478
      [  738.914698]  kthread+0x150/0x158
      [  738.917915]  ret_from_fork+0x10/0x1c
      [  738.921534] INFO: task kworker/u256:1:948 blocked for more than 120 seconds.
      [  738.928550]       Not tainted 5.8.0-rc1-76154-g0d52b59-dirty #744
      [  738.934614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  738.942408] kworker/u256:1  D    0   948      2 0x00000028
      [  738.947873] Workqueue: 0000:b4:02.0_disco_q sas_discover_domain
      [  738.953766] Call trace:
      [  738.956203]  __switch_to+0xf4/0x148
      [  738.959678]  __schedule+0x270/0x5d8
      [  738.963152]  schedule+0x78/0x110
      [  738.966368]  rpm_resume+0xcc/0x550
      [  738.969757]  __pm_runtime_resume+0x3c/0x88
      [  738.973836]  rpm_get_suppliers+0x50/0x148
      [  738.977829]  __pm_runtime_set_status+0x124/0x2f0
      [  738.982427]  scsi_sysfs_add_sdev+0x1a0/0x2a8
      [  738.986679]  scsi_probe_and_add_lun+0x888/0xab0
      [  738.991190]  __scsi_scan_target+0xec/0x520
      [  738.995268]  scsi_scan_target+0x11c/0x128
      [  738.999261]  sas_rphy_add+0x15c/0x1e8
      [  739.002907]  sas_probe_devices+0xe4/0x150
      [  739.006899]  sas_discover_domain+0x33c/0x588
      [  739.011150]  process_one_work+0x1e8/0x360
      [  739.015143]  worker_thread+0x44/0x478
      [  739.018789]  kthread+0x150/0x158
      [  739.022003]  ret_from_fork+0x10/0x1c
      ...
      
      If an extra phy0 up event occurs while the SAS controller is resuming, it
      emits new libsas events (PORTE_BYTES_DMAED and DISCE_DISCOVER_DOMAIN).
      The DISCE_DISCOVER_DOMAIN handler calls scsi_sysfs_add_sdev(), which
      calls __pm_runtime_set_status() to resume the supplier (the host
      controller). In the runtime PM core, if a device is already in the
      resuming state, a later resume request for that device waits
      synchronously for the earlier request to complete. At that point the
      controller is still resuming because it is waiting for all libsas events
      to complete, while the DISCE_DISCOVER_DOMAIN event is blocked because the
      controller is still resuming. The result is a deadlock.
      
      To avoid the issue, filter out new PHY up events while the controller is
      suspended.
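
      The filtering idea can be sketched as a small user-space model. This is
      not the driver code: the types, field names, and the function
      model_handle_phy_up() are hypothetical, and it assumes that a "new" PHY
      up event means one for a PHY that was not up when the controller was
      suspended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define N_PHYS 2

/* Hypothetical, simplified stand-ins for driver state (not the real hisi_sas structures). */
struct model_phy {
    bool suspended;          /* assumption: true if this PHY was up when the controller suspended */
};

struct model_hba {
    bool pm_in_progress;     /* assumption: set while the controller is suspended or resuming */
    struct model_phy phy[N_PHYS];
    int events_forwarded;    /* number of PHY up events passed on to the upper layer */
};

/* Returns true if the PHY up event is forwarded, false if it is filtered out. */
bool model_handle_phy_up(struct model_hba *hba, int phy_no)
{
    assert(phy_no >= 0 && phy_no < N_PHYS);

    /* A PHY that was not up at suspend time would create new events during resume: drop it. */
    if (hba->pm_in_progress && !hba->phy[phy_no].suspended) {
        printf("phy%d up during suspend filtered out\n", phy_no);
        return false;
    }

    hba->events_forwarded++;
    return true;
}
```

      In the scenario above, phy0 was disabled before suspend and phy1 was up,
      so in this model a phy0 up event during resume is dropped while the
      expected phy1 up event is still forwarded.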
      
      Link: https://lore.kernel.org/r/1601649038-25534-7-git-send-email-john.garry@huawei.com
      Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
      Signed-off-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  2. 03 Sep, 2020 5 commits
  3. 20 May, 2020 1 commit
  4. 21 Jan, 2020 4 commits
  5. 13 Nov, 2019 2 commits
  6. 25 Oct, 2019 17 commits
  7. 01 Oct, 2019 1 commit
  8. 24 Sep, 2019 1 commit
  9. 11 Sep, 2019 8 commits