1. 04 May, 2019 1 commit
    • S
      scsi: mpt3sas: Fix kernel panic during expander reset · 2c8c8ef8
      Sreekanth Reddy committed
      [ Upstream commit c2fe742ff6e77c5b4fe4ad273191ddf28fdea25e ]
      
      During expander reset handling, the driver invokes the kernel function
      scsi_host_find_tag() to obtain outstanding requests associated with the
      scsi host managed by the driver. The driver loops from tag value zero to
      the hba queue depth to obtain the outstanding scmds. But when blk-mq is
      enabled, the block layer may return a stale entry for one or more
      requests. This may lead to a kernel panic if the returned value is
      inaccessible or the memory pointed to by the returned value is reused.
      
      Reference of upstream discussion:
      
      	https://patchwork.kernel.org/patch/10734933/
      
      Instead of calling the scsi_host_find_tag() API for each and every smid
      (smid is tag + 1) from one to shost->can_queue, the driver now calls this
      API (to obtain the outstanding scmd) only for those smids which are
      outstanding at the driver level.
      
      The driver determines whether an smid is outstanding at the driver level
      by looking into its corresponding MPI request frame. If the MPI request
      frame is empty, the smid is free and there is no need to call
      scsi_host_find_tag() for it.  By doing this, the driver invokes
      scsi_host_find_tag() only for those tags which are outstanding at the
      driver level.
      
      The driver checks whether a particular MPI request frame is empty by
      looking into the "DevHandle" field. If this field is zero, the MPI
      request is empty. For an active MPI request, DevHandle must be non-zero.
      
      Also, the driver will memset the MPI request frame once the corresponding
      scmd is processed (i.e. just before calling the scmd->done function).
      
      Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin (Microsoft) <sashal@kernel.org>
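The lookup strategy described in this commit message can be sketched in plain C. This is a simplified user-space model, not the driver's code: `struct req_frame`, `lookup_outstanding()`, `complete_smid()`, `QUEUE_DEPTH`, and the `tags` array (standing in for whatever `scsi_host_find_tag()` would return) are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_DEPTH 8   /* hypothetical stand-in for shost->can_queue */

/* Hypothetical, heavily reduced model of an MPI request frame. */
struct req_frame {
    uint16_t dev_handle;    /* zero means the frame is empty */
    uint8_t  payload[30];
};

static struct req_frame frames[QUEUE_DEPTH];    /* indexed by smid - 1 */
static void *tags[QUEUE_DEPTH];                 /* models the block-layer tag table */

/* Return the command for smid only if its request frame is in use. */
static void *lookup_outstanding(uint16_t smid)
{
    assert(smid >= 1 && smid <= QUEUE_DEPTH);
    if (frames[smid - 1].dev_handle == 0)
        return NULL;            /* free smid: never consult the tag table */
    return tags[smid - 1];      /* tag is smid - 1 */
}

/* Clear the frame when the command completes, so a later scan skips it. */
static void complete_smid(uint16_t smid)
{
    assert(smid >= 1 && smid <= QUEUE_DEPTH);
    memset(&frames[smid - 1], 0, sizeof(frames[smid - 1]));
}
```

In this model a stale pointer left in `tags` is harmless as long as the matching frame has been cleared, which is the property the commit message relies on.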
  2. 13 Feb, 2019 1 commit
  3. 09 Aug, 2018 1 commit
    • S
      scsi: mpt3sas: Fix calltrace observed while running IO & reset · e7018314
      Sreekanth Reddy committed
      The kernel BUG below was observed while running IOs with a host reset
      (issued from an application):
      
      mpt3sas_cm0: diag reset: SUCCESS
      ------------[ cut here ]------------
      WARNING: CPU: 12 PID: 4336 at drivers/scsi/mpt3sas/mpt3sas_base.c:3282 mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
      Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support
       dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
      CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G        W      ------------   3.10.0-875.el7.brdc.x86_64 #1
      Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
      Call Trace:
       [<ffffffff9cf16583>] dump_stack+0x19/0x1b
       [<ffffffff9c891698>] __warn+0xd8/0x100
       [<ffffffff9c8917dd>] warn_slowpath_null+0x1d/0x20
       [<ffffffffc04f3f4d>] mpt3sas_base_clear_st+0x3d/0x40 [mpt3sas]
       [<ffffffffc05047d2>] _scsih_flush_running_cmds+0x92/0xe0 [mpt3sas]
       [<ffffffffc05095db>] mpt3sas_scsih_reset_handler+0x43b/0xaf0 [mpt3sas]
       [<ffffffff9c894829>] ? vprintk_default+0x29/0x40
       [<ffffffff9cf10531>] ? printk+0x60/0x77
       [<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas]
       [<ffffffffc04f794d>] mpt3sas_base_hard_reset_handler+0x1ad/0x420 [mpt3sas]
       [<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
       [<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
       [<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0
       [<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas]
       [<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560
       [<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60
       [<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0
       [<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146
       [<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21
       [<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146
      ---[ end trace 5dac5b98d89aaa3c ]---
      ------------[ cut here ]------------
      kernel BUG at block/blk-core.c:1476!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag binfmt_misc fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support
       dcdbas pcspkr joydev ipmi_ssif ses enclosure sg ipmi_devintf acpi_pad ipmi_msghandler acpi_power_meter mei_me lpc_ich wmi mei shpchp ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix mpt3sas libata crct10dif_pclmul crct10dif_common tg3 crc32c_intel i2c_core raid_class ptp scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
      CPU: 12 PID: 4336 Comm: python Kdump: loaded Tainted: G        W      ------------   3.10.0-875.el7.brdc.x86_64 #1
      Hardware name: Dell Inc. PowerEdge R820/0YWR73, BIOS 1.5.0 03/08/2013
      task: ffff903fc96e0fd0 ti: ffff903fb1eec000 task.ti: ffff903fb1eec000
      RIP: 0010:[<ffffffff9cb19ec0>]  [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0
      RSP: 0018:ffff903c6b783dc0  EFLAGS: 00010087
      RAX: ffff903bb67026d0 RBX: ffff903b7d6a6140 RCX: dead000000000200
      RDX: ffff903bb67026d0 RSI: ffff903bb6702580 RDI: ffff903bb67026d0
      RBP: ffff903c6b783dd8 R08: ffff903bb67026d0 R09: ffffd97e80000000
      R10: ffff903c658bac00 R11: 0000000000000000 R12: ffff903bb6702580
      R13: ffff903fa9a292f0 R14: 0000000000000246 R15: 0000000000001057
      FS:  00007f7026f5b740(0000) GS:ffff903c6b780000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f298877c004 CR3: 00000000caf36000 CR4: 00000000000607e0
      Call Trace:
       <IRQ>
       [<ffffffff9cca68ff>] __scsi_queue_insert+0xbf/0x110
       [<ffffffff9cca79ca>] scsi_io_completion+0x5da/0x6a0
       [<ffffffff9cc9ca3c>] scsi_finish_command+0xdc/0x140
       [<ffffffff9cca6aa2>] scsi_softirq_done+0x132/0x160
       [<ffffffff9cb240c6>] blk_done_softirq+0x96/0xc0
       [<ffffffff9c89a905>] __do_softirq+0xf5/0x280
       [<ffffffff9cf2bd2c>] call_softirq+0x1c/0x30
       [<ffffffff9c82d625>] do_softirq+0x65/0xa0
       [<ffffffff9c89ac85>] irq_exit+0x105/0x110
       [<ffffffff9cf2d0a8>] smp_apic_timer_interrupt+0x48/0x60
       [<ffffffff9cf297f2>] apic_timer_interrupt+0x162/0x170
       <EOI>
       [<ffffffff9cca5f41>] ? scsi_done+0x21/0x60
       [<ffffffff9cb5ac18>] ? delay_tsc+0x38/0x60
       [<ffffffff9cb5ab5d>] __const_udelay+0x2d/0x30
       [<ffffffffc04effde>] _base_handshake_req_reply_wait+0x8e/0x4a0 [mpt3sas]
       [<ffffffffc04f0b13>] _base_get_ioc_facts+0x123/0x590 [mpt3sas]
       [<ffffffffc04f06c8>] ? _base_diag_reset+0x238/0x340 [mpt3sas]
       [<ffffffffc04f7993>] mpt3sas_base_hard_reset_handler+0x1f3/0x420 [mpt3sas]
       [<ffffffffc05132b9>] _ctl_ioctl_main.isra.12+0x11b9/0x1200 [mpt3sas]
       [<ffffffffc068d585>] ? xfs_file_aio_write+0x155/0x1b0 [xfs]
       [<ffffffff9ca1a4e3>] ? do_sync_write+0x93/0xe0
       [<ffffffffc051337a>] _ctl_ioctl+0x1a/0x20 [mpt3sas]
       [<ffffffff9ca2fe90>] do_vfs_ioctl+0x350/0x560
       [<ffffffff9ca1dec1>] ? __sb_end_write+0x31/0x60
       [<ffffffff9ca30141>] SyS_ioctl+0xa1/0xc0
       [<ffffffff9cf28715>] ? system_call_after_swapgs+0xa2/0x146
       [<ffffffff9cf287d5>] system_call_fastpath+0x1c/0x21
       [<ffffffff9cf28721>] ? system_call_after_swapgs+0xae/0x146
      Code: 83 c3 10 4c 89 e2 4c 89 ee e8 8d 21 04 00 48 8b 03 48 85 c0 75 e5 41 f6 44 24 4a 10 74 ad 4c 89 e6 4c 89 ef e8 b2 42 00 00 eb a0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
      RIP  [<ffffffff9cb19ec0>] blk_requeue_request+0x90/0xa0
       RSP <ffff903c6b783dc0>
      
      As a part of the host reset operation, the driver flushes out all IOs
      outstanding at the driver level with a "DID_RESET" result.  To find which
      commands are outstanding at the driver level, the driver loops with smid
      from one to the HBA queue depth and calls mpt3sas_scsih_scsi_lookup_get()
      to get the scmd, as shown below:
      
       for (smid = 1; smid <= ioc->scsiio_depth; smid++) {
               scmd = mpt3sas_scsih_scsi_lookup_get(ioc, smid);
               if (!scmd)
                       continue;
      
      But mpt3sas_scsih_scsi_lookup_get() returns some scsi cmnds which are not
      outstanding at the driver level. (Possibly the request is constructed at
      the block layer because QUEUE_FLAG_QUIESCED is not set. Even if the
      driver uses scsi_block_requests and scsi_unblock_requests, the issue
      still persists, as they only block further IO from the scsi layer and not
      from the block layer.) These commands are flushed with the DID_RESET host
      byte, which results in the above kernel BUG.
      
      This issue got introduced by commit dbec4c90 ("scsi: mpt3sas: lockless
      command submission").
      
      To fix this issue, we have modified mpt3sas_scsih_scsi_lookup_get() to
      check whether smid is zero before it returns the scmd pointer to the
      caller. (Note: whenever a scsi cmnd is being processed at the driver
      level, its smid is non-zero; smids always start from one.) If smid is
      zero, the function returns NULL as the scmd pointer, so the driver does
      not flush out those scsi cmnds with the DID_RESET host byte and this
      issue is no longer observed.
      
      [mkp: amended with updated fix from Sreekanth]
      Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
      Fixes: dbec4c90 ("scsi: mpt3sas: lockless command submission")
      Cc: stable@vger.kernel.org # v4.16+
      Reviewed-by: Tomas Henzl <thenzl@redhat.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
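The smid check described in this commit message can be illustrated with a small stand-alone C model. It is only a sketch under stated assumptions: `struct tracker`, `struct cmd`, `tag_table`, `lookup_get()`, and `flush_running()` are hypothetical names, and the real driver keeps its per-command state elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEPTH 4     /* hypothetical stand-in for ioc->scsiio_depth */

/* Hypothetical per-command driver state: smid 0 means "not owned by the driver". */
struct tracker { uint16_t smid; };
struct cmd { struct tracker st; int result; };

static struct cmd *tag_table[DEPTH];    /* models what a tag lookup may return */

/* Return the command for smid only if the driver really owns it. */
static struct cmd *lookup_get(uint16_t smid)
{
    struct cmd *c;

    if (smid == 0 || smid > DEPTH)
        return NULL;
    c = tag_table[smid - 1];
    if (c != NULL && c->st.smid == 0)
        return NULL;    /* built by an upper layer, never issued by the driver */
    return c;
}

/* Flush every driver-owned command with the given result; return the count. */
static int flush_running(int reset_result)
{
    int flushed = 0;

    for (uint16_t smid = 1; smid <= DEPTH; smid++) {
        struct cmd *c = lookup_get(smid);

        if (c == NULL)
            continue;
        assert(c->st.smid == smid);
        c->result = reset_result;
        c->st.smid = 0;     /* the command is no longer outstanding */
        flushed++;
    }
    return flushed;
}
```

A command that sits in the tag table but whose tracker still has smid zero is skipped, so it is never completed twice.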
  4. 30 Jul, 2018 1 commit
  5. 20 Jun, 2018 5 commits
  6. 19 Jun, 2018 4 commits
  7. 08 May, 2018 6 commits
  8. 09 Mar, 2018 1 commit
  9. 07 Mar, 2018 1 commit
  10. 28 Feb, 2018 3 commits
  11. 22 Feb, 2018 2 commits
    • S
      scsi: mpt3sas: wait for and flush running commands on shutdown/unload · c666d3be
      Sreekanth Reddy committed
      This patch finishes all outstanding SCSI IO commands (but not other commands,
      e.g., task management) in the shutdown and unload paths.
      
      It first waits for the commands to complete (this is done after setting
      'ioc->remove_host = 1', which prevents new commands from being queued), then
      it flushes commands that might still be running.
      
      This avoids triggering error handling (e.g., abort command) for all commands
      possibly completed by the adapter after interrupts are disabled.
      
      [mauricfo: introduced something in commit message.]
      Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
      Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
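The wait-then-flush ordering described in this commit message can be modelled roughly as follows. This is a hedged user-space sketch, not the driver's implementation: `struct ioc_model`, `queue_cmd()`, `poll_completion()`, and `shutdown_model()` are hypothetical, and a simple poll budget stands in for whatever timeout the driver actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the adapter state relevant to shutdown. */
struct ioc_model {
    bool remove_host;   /* once set, no new commands are accepted */
    int  pending;       /* SCSI IO commands still outstanding */
};

/* Queue a command unless shutdown has begun. */
static bool queue_cmd(struct ioc_model *ioc)
{
    assert(ioc != NULL);
    if (ioc->remove_host)
        return false;
    ioc->pending++;
    return true;
}

/* Models the adapter completing at most one command per poll. */
static void poll_completion(struct ioc_model *ioc)
{
    if (ioc->pending > 0)
        ioc->pending--;
}

/*
 * Block new commands, wait for up to max_polls completions, then flush
 * whatever is still running. Returns the number of flushed commands.
 */
static int shutdown_model(struct ioc_model *ioc, int max_polls)
{
    int flushed;

    assert(ioc != NULL);
    ioc->remove_host = true;
    for (int i = 0; i < max_polls && ioc->pending > 0; i++)
        poll_completion(ioc);
    flushed = ioc->pending;
    ioc->pending = 0;
    return flushed;
}
```

Setting the flag before waiting matters in this model: otherwise new commands could keep arriving and the wait might never drain the queue.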
    • M
      scsi: mpt3sas: fix oops in error handlers after shutdown/unload · 9ff549ff
      Mauricio Faria de Oliveira committed
      This patch adds checks for 'ioc->remove_host' in the SCSI error handlers, so
      as not to access pointers/resources potentially freed in the PCI
      shutdown/module unload path.  The error handlers may be invoked after
      shutdown/unload, depending on other components.
      
      This problem was observed with kexec on a system with an mpt3sas based adapter
      and an infiniband adapter which takes long enough to shut down:
      
      The mpt3sas driver finished shutting down / disabled interrupt handling, thus
      some commands did not finish and timed out.
      
      Since the system was still running (waiting for the infiniband adapter to
      shutdown), the scsi error handler for task abort of mpt3sas was invoked, and
      hit an oops -- either in scsih_abort() because 'ioc->scsi_lookup' was NULL
      without commit dbec4c90 ("scsi: mpt3sas: lockless command submission"), or
      later up in scsih_host_reset() (with or without that commit), because it
      eventually called mpt3sas_base_get_iocstate().
      
      After the above commit, the oops in scsih_abort() does not occur anymore
      (_scsih_scsi_lookup_find_by_scmd() is no longer called), but that commit is
      too big and out of the scope of linux-stable, where this patch might help, so
      still go for the changes.
      
      Also, this might help to prevent similar errors in the future, in case code
      changes and possibly tries to access freed stuff.
      
      Note the fix in scsih_host_reset() is still important anyway.
      
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
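The guard described in this commit message amounts to testing the removal flag before touching anything the shutdown path may have freed. A minimal sketch, assuming a hypothetical `struct adapter` and handler name; the `EH_SUCCESS`/`EH_FAILED` codes are illustrative stand-ins, and the value the real handlers return in this case is not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative result codes for an error handler. */
enum eh_result { EH_SUCCESS, EH_FAILED };

/* Hypothetical adapter model: state is released (NULL) on shutdown/unload. */
struct adapter {
    bool remove_host;
    int *state;
};

/* Error handler that refuses to run once shutdown/unload has started. */
static enum eh_result host_reset_handler(struct adapter *a)
{
    assert(a != NULL);
    if (a->remove_host)
        return EH_FAILED;   /* bail out before dereferencing freed state */
    return *a->state == 0 ? EH_SUCCESS : EH_FAILED;
}
```

Without the early return, calling the handler after `state` has been set to NULL would dereference a null pointer, which mirrors the oops described above.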
  12. 11 Jan, 2018 6 commits
  13. 04 Jan, 2018 1 commit
    • C
      scsi: mpt3sas: Proper handling of set/clear of "ATA command pending" flag. · f49d4aed
      Chaitra P B committed
      1. In the IO path, the "ATA command pending" flag is set early, before
         the checks for device removal, invalid device handle, etc. This
         causes any new commands to always be returned with SAM_STAT_BUSY.
         When the driver removes the drive, the SML issues a SYNC Cache
         command, which is also always returned with SAM_STAT_BUSY and is
         therefore requeued.
      
      2. If the driver gets an ATA PT command for a SATA drive, it sets the
         "ATA command pending" flag in the device-specific data structure so
         that no further commands are allowed until the ATA PT command is
         completed. However, if the driver then decides to return the command
         to the upper layers without actually issuing it to the firmware
         (i.e., it returns through one of the qcmd failure return paths), the
         flag is not cleared, and this prevents the driver from sending any
         new commands to the drive.
      
      This patch fixes the above two issues by setting the "ATA command
      pending" flag only after checking whether the device is deleted, the
      device handle is invalid, or the device is busy with task management,
      and by clearing the "ATA command pending" flag in all of the qcmd
      failure return paths that follow the point where the flag is set.
      
      Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
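The ordering described in this commit message (validate the device first, set the flag second, clear it on every failure path) might look like this in a reduced C model. Everything here is an assumption for illustration: `struct device_model`, `qcmd()`, `complete_ata_pt()`, and the `QCMD_*` codes are hypothetical and do not mirror the driver's actual structures or return values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative outcomes of a queuecommand attempt. */
enum qcmd_status { QCMD_OK, QCMD_NO_CONNECT, QCMD_BUSY, QCMD_SEND_FAILED };

/* Hypothetical per-device state. */
struct device_model {
    bool deleted;
    bool handle_valid;
    bool tm_busy;           /* task management in progress */
    bool ata_cmd_pending;   /* an ATA pass-through command is in flight */
};

/* send_ok models whether handing the command to the firmware succeeds. */
static enum qcmd_status qcmd(struct device_model *d, bool is_ata_pt, bool send_ok)
{
    assert(d != NULL);

    /* 1. Device checks come first, before the flag is touched. */
    if (d->deleted || !d->handle_valid)
        return QCMD_NO_CONNECT;
    if (d->tm_busy)
        return QCMD_BUSY;

    /* 2. Only now consult and set the pending flag. */
    if (d->ata_cmd_pending)
        return QCMD_BUSY;
    if (is_ata_pt)
        d->ata_cmd_pending = true;

    /* 3. Any failure after the flag was set must clear it again. */
    if (!send_ok) {
        if (is_ata_pt)
            d->ata_cmd_pending = false;
        return QCMD_SEND_FAILED;
    }
    return QCMD_OK;
}

/* Completion of the ATA pass-through command releases the device. */
static void complete_ata_pt(struct device_model *d)
{
    assert(d != NULL);
    d->ata_cmd_pending = false;
}
```

In this model a deleted device reports `QCMD_NO_CONNECT` even while an ATA pass-through command is pending, so a later command such as SYNC Cache is not bounced with a busy status indefinitely.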
  14. 05 Dec, 2017 1 commit
  15. 09 Nov, 2017 2 commits
  16. 04 Nov, 2017 4 commits