1. 08 Aug, 2019 7 commits
    • scsi: mpt3sas: Support MEMORY MOVE Tool box command · ba630ea0
      Suganath Prabu committed
      The host uses the Memory Move Tool to copy data between any
      source/destination combination of system memory and IOC memory.
      
      The Memory Move Tool box request contains two SGE fields. The first SGE
      field must contain the source buffer details, described by an MPI Simple
      SGE. The second SGE field must contain the destination buffer details,
      also described by an MPI Simple SGE.
      
       Source   ->   Destination
      
      1. IOC    ->   IOC    (Both SGEs are filled by the application.)
      
      2. HOST   ->   HOST   (Both SGEs are filled by the host; the
                     application should set sgl_offset to the first SGE
                     offset.)
      
      3. IOC    ->   HOST   (The application fills the first SGE and sets
                     sgl_offset to the second SGE, so the driver fills
                     the second SGE.)
      
      4. HOST   ->   IOC    (The application fills the IOC buffer information
                     into the first SGE and sets sgl_offset to the second SGE.
                     The driver then fills the second SGE with the host buffer
                     information and, just before posting the command to the
                     firmware, swaps the two SGEs so that the first SGE
                     contains the host buffer information and the second SGE
                     contains the IOC information.)
      
      The driver only has to take care of the 4th case; the other three cases
      are already supported by the current driver design.
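The swap in the 4th case could be sketched roughly as below. This is an illustration only: `struct simple_sge`, `struct mem_move_request` and `mem_move_swap_sges()` are hypothetical stand-ins, not the MPI header types or the driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an MPI Simple SGE (not the real MPI layout). */
struct simple_sge {
	uint32_t flags_length;
	uint64_t address;
};

/* Hypothetical request: only the two SGE slots matter for this sketch. */
struct mem_move_request {
	struct simple_sge sge[2];	/* [0] = source, [1] = destination */
};

/*
 * HOST -> IOC: the application put the IOC buffer in sge[0] and the driver
 * put the host buffer in sge[1]. Exchange them before posting so that
 * sge[0] describes the host (source) and sge[1] the IOC (destination).
 */
static void mem_move_swap_sges(struct mem_move_request *req)
{
	assert(req != NULL);

	struct simple_sge tmp = req->sge[0];

	req->sge[0] = req->sge[1];
	req->sge[1] = tmp;
}
```

The other three cases need no such step, because the SGEs are already in source/destination order when the request reaches the driver.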
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Allow ioctls to blocked access status NVMe · 3c090ce3
      Suganath Prabu committed
      If driver sees the NVMe drive with "DEVICE_BLOCKED" AccessStatus in its
      PCIe Device Page0, then driver removes the drive from its internal list and
      does not allow any IOCTL commands to be sent to the drive and will return
      the IOCTLs with "-ENODEV" status.
      
      The driver will now allow NVMe Encapsulated IOCTL issued to the NVMe device
      with an access status of DEVICE_BLOCKED. This change allows the user to
      flash new drive firmware online and revive the drive.
      
      Add the NVMe device only to the driver's internal list, even though the
      device is in the blocked state, so that the device will be visible to
      Apps. This way Apps can send NVMe Encapsulated IOCTLs to this drive and
      bring the drive online. An NVMe drive with DEVICE_BLOCKED access status
      won't be added to the SML; it will be added only to the driver's internal
      list.
      
      [mkp: clarified desc]
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Enumerate SES of a managed PCIe switch · 5bb309db
      Suganath Prabu committed
      The SES device of a managed PCIe switch is enumerated the same way as NVMe
      drives.
      
      The device info type for this SES device is
      
              MPI26_PCIE_DEVINFO_SCSI (0x4),
      
      whereas the device info type for NVMe drives is
      
              MPI26_PCIE_DEVINFO_NVME (0x3).
      
      Based on this device info type driver determines whether the device is NVMe
      drive or a SES device of a managed PCIe switch.
      
      This SES device doesn't have the PCIe device page 2 information like NVMe
      drives, so driver won't read PCIe device page 2 information for SES device.
      
      This SES device uses only IEEE SGLs, so the driver builds IEEE SGLs
      whenever it receives any SCSI command for this SES device.
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Update MPI headers to 2.6.8 spec · 635ee6c7
      Suganath Prabu committed
      Updated MPI to 2.6.8 specification and header files to 2.00.54.
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Gracefully handle online firmware update · ffedeae1
      Suganath Prabu committed
      Issue:
      
      During online firmware upgrade operations, the MaxDevHandles value
      reported in IOCFacts may change with the new FW. When that happens, kernel
      panics may be observed when the driver tries to access the pd_handles or
      blocking_handles buffers at an offset greater than the old firmware's
      MaxDevHandle value.
      
      Fix:
      
      _base_check_ioc_facts_changes() looks for increase/decrease in IOCFacts
      attributes during online firmware upgrade and increases the pd_handles,
      blocking_handles, etc buffer sizes to new firmware's MaxDevHandle value if
      this new firmware's MaxDevHandle value is greater than the old firmware's
      MaxDevHandle value.
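A user-space sketch of the grow-only resize described in the fix is shown below. The `struct handle_bitmap` type, the helper names and the one-bit-per-handle sizing formula are assumptions made for illustration; the driver's real buffers and allocation calls differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-handle bitmap such as pd_handles. */
struct handle_bitmap {
	unsigned char *bits;
	size_t size;		/* bytes; assumed one bit per device handle */
};

/* Bytes needed to cover handles 0..max_dev_handle (assumed formula). */
static size_t bitmap_bytes(unsigned int max_dev_handle)
{
	return (size_t)max_dev_handle / 8 + 1;
}

/*
 * Grow the bitmap when the new firmware reports a larger MaxDevHandle.
 * The buffer is never shrunk. Returns 0 on success, -1 if allocation fails
 * (in which case the old buffer is left untouched).
 */
static int handle_bitmap_resize(struct handle_bitmap *bm,
				unsigned int new_max_dev_handle)
{
	size_t new_size = bitmap_bytes(new_max_dev_handle);
	unsigned char *p;

	assert(bm != NULL);
	if (new_size <= bm->size)
		return 0;

	p = realloc(bm->bits, new_size);
	if (!p)
		return -1;

	/* Zero only the newly added tail; existing bits are preserved. */
	memset(p + bm->size, 0, new_size - bm->size);
	bm->bits = p;
	bm->size = new_size;
	return 0;
}
```

Zeroing the new tail matters: a stale bit there would look like a set handle the first time the larger range is indexed.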
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: memset request frame before reusing · e224e03b
      Suganath Prabu committed
      The driver gets a request frame from the free pool of DMA-able request
      frames, fills in the required information and passes the address of the
      frame to the IOC/FW, which pulls the complete request frame. In certain
      places the driver used a request frame allocated from the free pool
      without completely clearing the previous data stored in it. The request
      contents were cleared only for the size of the new request to be issued,
      which left stale data in the unused part of the frame. Though the IOC/FW
      is not expected to access the request beyond the specified size, it is
      good practice to clear the complete request message frame.
      
      So reinitialize the complete request message frame with 0s before using
      it.
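The difference between clearing only the request size and clearing the whole frame can be sketched as follows. `REQUEST_FRAME_SIZE` and `prepare_request_frame()` are hypothetical names and values chosen for the example, not taken from the driver.

```c
#include <assert.h>
#include <string.h>

#define REQUEST_FRAME_SIZE 128	/* hypothetical frame size in bytes */

/*
 * Zero the complete frame first, then copy in the new request. Clearing
 * only request_len bytes would leave stale data in the unused tail.
 */
static void prepare_request_frame(unsigned char *frame,
				  const void *request, size_t request_len)
{
	assert(frame != NULL && request != NULL);
	assert(request_len <= REQUEST_FRAME_SIZE);

	memset(frame, 0, REQUEST_FRAME_SIZE);
	memcpy(frame, request, request_len);
}
```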
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Add support for PCIe Lane margin · f23ca2cb
      Suganath Prabu committed
      The PCIe Lane margin tool box request requires IEEE SGLs, so the driver
      fills the SGL field with IEEE SGLs when issuing the PCIe Lane margin
      ioctl request to the HBA firmware.
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  2. 31 Jul, 2019 2 commits
    • scsi: mpt3sas: support target smid for [abort|query] task · 8f55c307
      Minwoo Im committed
      We can request task management IOCTL command(MPI2_FUNCTION_SCSI_TASK_MGMT)
      to /dev/mpt3ctl.  If the given task_type is either abort task or query
      task, it may need a field named "Initiator Port Transfer Tag to Manage" in
      the IU.
      
      The current code does not check the target IPTT tag from the tm_request.
      This patch checks the TaskMID given from userspace and uses it as the
      target tag.  There is a fixed relationship between
      (struct request *req->tag) and smid in mpt3sas_base.c:
      
      u16
      mpt3sas_base_get_smid_scsiio(struct MPT3SAS_ADAPTER *ioc, u8 cb_idx,
              struct scsi_cmnd *scmd)
      {
              struct scsiio_tracker *request = scsi_cmd_priv(scmd);
              unsigned int tag = scmd->request->tag;
              u16 smid;
      
              smid = tag + 1;
      
      So if we want to abort a request tagged #X, we can pass (X + 1) to this
      IOCTL handler.  Otherwise, user space can pass a TaskMID of 0 to abort
      the first outstanding smid, which is the legacy behaviour.
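For a user-space tool, the tag-to-TaskMID rule above amounts to the small helper below. `tag_to_task_mid()` is a hypothetical name used only for this sketch; it is not part of the driver or of any existing tool.

```c
#include <assert.h>
#include <stdint.h>

/*
 * TaskMID to pass for a request with block-layer tag `tag`, following the
 * smid = tag + 1 rule quoted above. A TaskMID of 0 is reserved for the
 * legacy "first outstanding smid" behaviour, so it is never returned here.
 */
static uint16_t tag_to_task_mid(unsigned int tag)
{
	assert(tag < UINT16_MAX);	/* tag + 1 must fit in 16 bits */
	return (uint16_t)(tag + 1);
}
```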
      
      Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
      Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
      Cc: Sathya Prakash <sathya.prakash@broadcom.com>
      Cc: James E.J. Bottomley <jejb@linux.ibm.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: MPT-FusionLinux.pdl@broadcom.com
      Signed-off-by: Minwoo Im <minwoo.im@samsung.com>
      Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
      Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: clean up a couple sizeof() uses · 1de540a9
      Dan Carpenter committed
      There is a copy and paste bug here.  It uses EVENT_TRIGGERS size instead of
      SCSI_TRIGGERS size but fortunately both sizes are 84 bytes so it doesn't
      affect runtime.
      
      These days the preferred style is to just say sizeof(object) instead of
      sizeof(type) so I have updated the function to the latest style as well.
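The style point can be illustrated with two same-sized types. The structs below are hypothetical placeholders that only mirror the 84-byte size mentioned above; they are not the driver's trigger structures.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical placeholders; both happen to be 84 bytes, as in the commit. */
struct event_triggers { unsigned char data[84]; };
struct scsi_triggers  { unsigned char data[84]; };

static void clear_scsi_triggers(struct scsi_triggers *t)
{
	assert(t != NULL);
	/*
	 * sizeof(*t) always matches the object being cleared. Writing
	 * sizeof(struct event_triggers) here would compile and, by luck,
	 * behave the same today, but would break if the sizes diverged.
	 */
	memset(t, 0, sizeof(*t));
}
```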
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  3. 17 Jul, 2019 1 commit
  4. 27 Jun, 2019 4 commits
  5. 21 Jun, 2019 3 commits
  6. 19 Jun, 2019 11 commits
    • scsi: mpt3sas: Update driver version to 29.100.00.00 · 895d8860
      Suganath Prabu S committed
      Update driver version from 28.100.00.00 to 29.100.00.00.
      This is equivalent to the Phase 10 OOB driver.
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Introduce perf_mode module parameter · ca7e1e9d
      Suganath Prabu S committed
      1. Introduce module parameter perf_mode only for Aero/Sea generation HBAs.
      
      2. Update IOC page1 fields according to performance mode.
      
      Below are the performance modes that can be enabled with module parameter
      perf_mode:
      
       0: Balanced - Few high iops reply queues will be enabled.  Interrupt
          coalescing will be enabled only for these high iops reply descriptor
          queues.
      
       1: Iops - Interrupt coalescing will be enabled on all reply queues.
          Coalescing timeout is set to 0x20.  This is the default value for Aero.
      
       2: Latency - Interrupt coalescing will be enabled on all reply queues.
          Coalescing timeout is set to 0xA.  This is a legacy behavior similar to
          Ventura & Invader HBA series.
      
      Default perf mode set by driver will be balanced mode if the following
      conditions are met:
      
       - CPU vendor = Intel;
       - Aero controller working in 16GT/s pcie speed
      
      Performance mode will be set to latency mode for all other cases.
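The default selection described above can be sketched as a small decision function. The enum names, the `-1` "not set by the user" sentinel and the two boolean inputs are assumptions made for the example; how the driver actually detects the CPU vendor and link speed is not shown.

```c
#include <assert.h>
#include <stdbool.h>

enum perf_mode {
	PERF_MODE_DEFAULT  = -1,	/* assumed sentinel: user gave no value */
	PERF_MODE_BALANCED = 0,
	PERF_MODE_IOPS     = 1,
	PERF_MODE_LATENCY  = 2,
};

/*
 * An explicit perf_mode wins. Otherwise pick balanced mode only when both
 * conditions from the commit message hold, and latency mode in all other
 * cases.
 */
static enum perf_mode resolve_perf_mode(int requested, bool intel_cpu,
					bool aero_at_16gts)
{
	assert(requested >= PERF_MODE_DEFAULT && requested <= PERF_MODE_LATENCY);

	if (requested != PERF_MODE_DEFAULT)
		return (enum perf_mode)requested;

	return (intel_cpu && aero_at_16gts) ? PERF_MODE_BALANCED
					    : PERF_MODE_LATENCY;
}
```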
      
      4k Random Read IO performance numbers on 24 SAS SSD drives for the above
      three performance modes. Performance data is from Intel Skylake and HGST
      SS300 (drive model SDLL1DLR400GCCA1).
      
      IOPs:
        -----------------------------------------------------------------------
        |perf_mode    | qd = 1 | qd = 64 |   note                             |
        |-------------|--------|---------|------------------------------------|
        |balanced     |  259K  |  3061K  | Provides max performance numbers   |
        |             |        |         | both on lower QD workload &        |
        |             |        |         | also on higher QD workload         |
        |-------------|--------|---------|------------------------------------|
        |iops         |  220K  |  3100K  | Provides max performance numbers   |
        |             |        |         | only on higher QD workload.        |
        |-------------|--------|---------|------------------------------------|
        |latency      |  246K  |  2226K  | Provides good performance numbers  |
        |             |        |         | only on lower QD workload.         |
        -----------------------------------------------------------------------
      
      Average Latency:
        -----------------------------------------------------
        |perf_mode    |  qd = 1      |    qd = 64           |
        |-------------|--------------|----------------------|
        |balanced     |  92.05 usec  |    501.12 usec       |
        |-------------|--------------|----------------------|
        |iops         |  108.40 usec |    498.10 usec       |
        |-------------|--------------|----------------------|
        |latency      |  97.10 usec  |    689.26 usec       |
        -----------------------------------------------------
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Enable interrupt coalescing on high iops · 2426f209
      Suganath Prabu S committed
      Enable interrupt coalescing only on high iops queues.
      
      In ioc config page 1, offset 0x14 (ProductSpecific field) is used to
      determine interrupt coalescing enabled/disabled on per reply descriptor
      post queue group(8) basis.  If 31st bit is zero, then interrupt coalescing
      is enabled for all reply descriptor post queues. If 31st bit is set to one,
      then user can enable/disable interrupt coalescing on per reply descriptor
      post queue group(8) basis. So to enable interrupt coalescing only on first
      reply descriptor post queue group (i.e. on high iops queues), set bit 0 and
      31.
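The bit layout described above can be sketched as follows. The macro and function names are hypothetical; only the meaning of bit 31 and of the per-group bits comes from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 31: per-group control is in effect (hypothetical macro name). */
#define COALESCE_PER_GROUP_CTRL (1u << 31)

/*
 * Build the ProductSpecific value that enables interrupt coalescing only
 * on the reply queue groups selected in group_mask (bit N = group N).
 * With bit 31 clear, coalescing would apply to all queues instead.
 */
static uint32_t coalesce_value_for_groups(uint32_t group_mask)
{
	assert((group_mask & COALESCE_PER_GROUP_CTRL) == 0);
	return COALESCE_PER_GROUP_CTRL | group_mask;
}
```

Enabling coalescing on the first group only (the high iops queues) therefore gives `coalesce_value_for_groups(1u << 0)`, that is, bits 0 and 31 set.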
      
      This configuration should be reset to the default settings during driver
      unload or shutdown. For this, the driver takes a copy of the default ioc
      page 1 and writes back this default (unmodified) ioc page 1 during unload
      and shutdown. This means that on the next driver load (e.g. if an older
      driver version is loaded by the user), the current modifications to ioc
      page 1 won't take effect.
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Affinity high iops queues IRQs to local node · 728bbc6c
      Suganath Prabu S committed
      High iops queues are mapped to non-managed irqs. Set affinity of
      non-managed irqs to local numa node.  Low latency queues are mapped to
      managed irqs.
      
      Driver reserves some reply queues for max iops (through
      pci_alloc_irq_vectors_affinity and .pre_vectors interface). The rest of
      queues are for low latency.
      
      Based on io workload in io submission path, driver will decide which group
      of reply queues (either high iops queues or low latency queues) to be
      used. High iops queues will be mapped to local numa node of controller and
      low latency queues will be mapped to cpus across numa nodes. In general,
      high iops and low latency queues should fit into 128 reply queues
      which is the max number of reply queues supported by Aero/Sea.
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: save and use MSI-X index for posting RD · 998c3001
      Suganath Prabu S committed
      In the IO submission path _base_get_msix_index is called twice. Initially
      while getting the smid and subsequently while posting the request
      descriptor (RD).
      
      Refactor code to query msix index only while posting the request
      descriptor. Save determined msix index in msix_io field.
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Use high iops queues under some circumstances · 5dd48a55
      Suganath Prabu S committed
      The driver will use round-robin method for io submission in batches within
      the high iops queues when the number of in-flight ios on the target device
      is larger than 8. Otherwise the driver will use low latency reply queues.
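A sketch of this selection, under stated assumptions, is given below. The threshold of 8 comes from the text; the batch size, the number of high iops queues and all names are parameters or placeholders invented for the example, and the plain counter stands in for whatever atomic counter a real driver would need.

```c
#include <assert.h>
#include <stdbool.h>

#define INFLIGHT_THRESHOLD 8u	/* from the commit message */

/* Hypothetical round-robin state; a real driver would use an atomic. */
struct high_iops_rr {
	unsigned long submitted;	/* I/Os steered to high iops queues */
};

/* High iops queues are used only above the in-flight threshold. */
static bool wants_high_iops(unsigned int inflight_on_target)
{
	return inflight_on_target > INFLIGHT_THRESHOLD;
}

/*
 * Round-robin in batches: `batch` consecutive submissions share one queue
 * before the selection moves on to the next queue, wrapping at nr_queues.
 */
static unsigned int pick_high_iops_queue(struct high_iops_rr *rr,
					 unsigned int nr_queues,
					 unsigned int batch)
{
	assert(rr && nr_queues > 0 && batch > 0);
	return (unsigned int)((rr->submitted++ / batch) % nr_queues);
}
```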
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: change _base_get_msix_index prototype · 02136516
      Suganath Prabu S committed
      Code refactoring.
      
      In function _base_get_msix_index, add scmd as second argument. This change
      is made in preparation for the next patch where we introduce a new function
      to get the MSI-X index for high iops queues.
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Add flag high_iops_queues · 18fd3d8c
      Suganath Prabu S committed
      Aero controllers support balanced performance mode through the ability to
      configure queues with different properties.
      
      Reply queues with interrupt coalescing enabled are called "high iops reply
      queues" and reply queues with interrupt coalescing disabled are called "low
      latency reply queues".
      
      The driver configures a combination of high iops and low latency reply
      queues if:
      
       - HBA is an AERO controller;
      
       - MSI-X vectors supported by the HBA is 128;
      
       - Total CPU count in the system more than high iops queue count;
      
       - Driver is loaded with default max_msix_vectors module parameter; and
      
       - System booted in non-kdump mode.
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Add Atomic RequestDescriptor support on Aero · 79c74d03
      Suganath Prabu S committed
      If the Aero HBA supports Atomic Request Descriptors, it sets the Atomic
      Request Descriptor Capable bit in the IOCCapabilities field of the IOCFacts
      Reply message. Driver uses an Atomic Request Descriptor as an alternative
      method for posting an entry onto a request queue.
      
      The posting of an Atomic Request Descriptor is an atomic operation,
      providing a safe mechanism for multiple processors on the host to post
      requests without synchronization. This Atomic Request Descriptor format is
      identical to first 32 bits of Default Request Descriptor and uses only 32
      bits.
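The relationship between the two descriptor formats can be sketched as below. The field names and layout are assumptions made for illustration and should be checked against the MPI headers; the only property taken from the text is that the atomic form equals the first 32 bits of the default form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit default request descriptor (assumed field layout). */
struct default_request_descriptor {
	uint8_t  request_flags;
	uint8_t  msix_index;
	uint16_t smid;
	uint16_t lmid;
	uint16_t type_dependent;
};

/* Hypothetical atomic form: just the first 32 bits of the default form. */
struct atomic_request_descriptor {
	uint8_t  request_flags;
	uint8_t  msix_index;
	uint16_t smid;
};

/* Derive the atomic descriptor by keeping only the leading 32 bits. */
static struct atomic_request_descriptor
to_atomic_descriptor(const struct default_request_descriptor *d)
{
	struct atomic_request_descriptor a;

	assert(d != NULL);
	a.request_flags = d->request_flags;
	a.msix_index = d->msix_index;
	a.smid = d->smid;
	return a;
}
```

Because the atomic form fits in 32 bits, it can be posted with a single 32-bit write, which is what makes lock-free posting from several CPUs safe.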
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: function pointers of request descriptor · 078a4cc1
      Suganath Prabu S committed
      This code refactoring introduces function pointers.
      
      The host uses Request Descriptors of different types for posting an entry
      onto a request queue. Based on controller type and capabilities, the host
      can also use atomic descriptors instead of normal descriptors.  Using
      function pointers avoids if-else statements in the posting path.
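The idea can be sketched as a dispatch pointer chosen once at initialisation. `struct adapter`, the `put_smid_*` functions and the counters are hypothetical and exist only to make the example observable; they are not the driver's names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct adapter;
typedef void (*put_smid_fn)(struct adapter *ioc, uint16_t smid);

/* Hypothetical adapter state for the sketch. */
struct adapter {
	put_smid_fn put_smid;		/* chosen once at init time */
	uint16_t last_smid;
	unsigned int default_posts;
	unsigned int atomic_posts;
};

static void put_smid_default(struct adapter *ioc, uint16_t smid)
{
	ioc->last_smid = smid;
	ioc->default_posts++;
}

static void put_smid_atomic(struct adapter *ioc, uint16_t smid)
{
	ioc->last_smid = smid;
	ioc->atomic_posts++;
}

/* Decide once, based on capability, instead of branching on every post. */
static void adapter_select_descriptors(struct adapter *ioc, bool atomic_capable)
{
	assert(ioc);
	ioc->put_smid = atomic_capable ? put_smid_atomic : put_smid_default;
}
```

Callers then post with `ioc->put_smid(ioc, smid)` and never test the capability themselves.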
      Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas_ctl: fix double-fetch bug in _ctl_ioctl_main() · f9e3ebee
      Gen Zhang committed
      In _ctl_ioctl_main(), 'ioctl_header' is fetched the first time from
      userspace. 'ioctl_header.ioc_number' is then checked. The legal result is
      saved to 'ioc'. Then, in condition MPT3COMMAND, the whole struct is fetched
      again from the userspace. Then _ctl_do_mpt_command() is called, 'ioc' and
      'karg' as inputs.
      
      However, a malicious user can change the 'ioc_number' between the two
      fetches, which is a potential security issue.  In particular, a malicious
      user can provide a valid 'ioc_number' to pass the check after the first
      fetch, and then modify it before the second fetch.
      
      To fix this, we need to recheck the 'ioc_number' in the second fetch.
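The recheck can be sketched in user space as below. `fetch_from_user()` is a stand-in for copy_from_user() built on memcpy(), and the struct and function names are hypothetical; only the pattern (compare the second fetch against the value validated after the first) is taken from the text.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical, trimmed-down versions of the ioctl structures. */
struct ioctl_header { unsigned int ioc_number; };
struct mpt_command  { struct ioctl_header hdr; unsigned int payload; };

/* Stand-in for copy_from_user(); returns 0 on success. */
static int fetch_from_user(void *dst, const void *user_src, size_t len)
{
	memcpy(dst, user_src, len);
	return 0;
}

/*
 * Second fetch of the whole command. The ioc_number seen now must match
 * the one validated after the first fetch; otherwise user space changed
 * it in between and the request is rejected.
 */
static int fetch_command_checked(const struct mpt_command *uarg,
				 unsigned int validated_ioc_number,
				 struct mpt_command *karg)
{
	assert(uarg != NULL && karg != NULL);

	if (fetch_from_user(karg, uarg, sizeof(*karg)))
		return -EFAULT;
	if (karg->hdr.ioc_number != validated_ioc_number)
		return -EINVAL;
	return 0;
}
```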
      Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
      Acked-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  7. 08 Apr, 2019 1 commit
    • drivers: Remove explicit invocations of mmiowb() · fb24ea52
      Will Deacon committed
      mmiowb() is now implied by spin_unlock() on architectures that require
      it, so there is no reason to call it from driver code. This patch was
      generated using coccinelle:
      
      	@mmiowb@
      	@@
      	- mmiowb();
      
      and invoked as:
      
      $ for d in drivers include/linux/qed sound; do \
      spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done
      
      NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
      spin_unlock(). However, pairing each mmiowb() removal in this patch with
      the corresponding call to spin_unlock() is not at all trivial, so there
      is a small chance that this change may regress any drivers incorrectly
      relying on mmiowb() to order MMIO writes between CPUs using lock-free
      synchronisation. If you've ended up bisecting to this commit, you can
      reintroduce the mmiowb() calls using wmb() instead, which should restore
      the old behaviour on all architectures other than some esoteric ia64
      systems.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  8. 28 Mar, 2019 1 commit
  9. 26 Mar, 2019 1 commit
    • scsi: mpt3sas: Fix kernel panic during expander reset · c2fe742f
      Sreekanth Reddy committed
      During expander reset handling, the driver invokes kernel function
      scsi_host_find_tag() to obtain outstanding requests associated with the
      scsi host managed by the driver. Driver loops from tag value zero to hba
      queue depth to obtain the outstanding scmds. But when blk-mq is enabled,
      the block layer may return stale entry for one or more requests. This may
      lead to kernel panic if the returned value is inaccessible or the memory
      pointed by the returned value is reused.
      
      Reference of upstream discussion:
      
      	https://patchwork.kernel.org/patch/10734933/
      
      Instead of calling scsi_host_find_tag() API for each and every smid (smid
      is tag +1) from one to shost->can_queue, now driver will call this API (to
      obtain the outstanding scmd) only for those smid's which are outstanding at
      the driver level.
      
      The driver determines whether an smid is outstanding at driver level by
      looking at its corresponding MPI request frame. If the MPI request frame
      is empty, the smid is free and there is no need to call
      scsi_host_find_tag() for it.  By doing this, the driver invokes
      scsi_host_find_tag() only for those tags which are outstanding at the
      driver level.
      
      Driver will check whether particular MPI request frame is empty or not by
      looking into the "DevHandle" field. If this field is zero then it means
      that this MPI request is empty. For active MPI request DevHandle must be
      non-zero.
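The filtering step can be sketched as a scan over the request frames that skips empty ones. `struct mpi_request_frame` and `collect_outstanding_smids()` are hypothetical simplifications; a real frame carries far more than a DevHandle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal view of an MPI request frame. */
struct mpi_request_frame {
	uint16_t dev_handle;	/* zero means the frame is empty (smid free) */
};

/*
 * Walk smids 1..can_queue (frame index = smid - 1) and record only those
 * whose frame has a non-zero DevHandle. Only these would then be looked
 * up with scsi_host_find_tag(). Returns the number of smids written.
 */
static size_t collect_outstanding_smids(const struct mpi_request_frame *frames,
					size_t can_queue, uint16_t *out)
{
	size_t n = 0;

	assert(frames != NULL && out != NULL);
	for (size_t smid = 1; smid <= can_queue; smid++) {
		if (frames[smid - 1].dev_handle == 0)
			continue;	/* free smid: no tag lookup needed */
		out[n++] = (uint16_t)smid;
	}
	return n;
}
```

This only works if frames are cleared when a command completes, which is why the driver also zeroes the frame once the corresponding scmd has been processed.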
      
      The driver also memsets the MPI request frame once the corresponding scmd
      is processed (i.e. just before calling the scmd->done function).
      Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  10. 19 Mar, 2019 6 commits
    • scsi: mpt3sas: Update mpt3sas driver version to 28.100.00.00 · 4bcb298e
      Suganath Prabu committed
      Updated driver version to 28.100.00.00, which is equivalent to OOB Phase 9.
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Improve the threshold value and introduce module param · 288addd6
      Suganath Prabu committed
      * Reduce the threshold value to 1/4 of the queue depth.
      
      * With this FW can find enough entries to post the Reply Descriptors in the
        reply descriptor post queue.
      
      * With the module param, the user can tune the threshold value. The same
        irqpoll_weight is used as the budget when processing reply descriptor
        post queues in _base_process_reply_queue.
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Load balance to improve performance and avoid soft lockups · 51e3b2ad
      Suganath Prabu committed
      Driver uses "reply descriptor post queues" in round robin fashion so that
      IO's are distributed to all the available reply descriptor post queues
      equally.  With this each reply descriptor post queue load is balanced.
      
      This is enabled only if the CPU count to MSI-X vector count ratio is X:1
      (where X > 1). This improves performance and also fixes soft lockups.
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Irq poll to avoid CPU hard lockups · 320e77ac
      Suganath Prabu committed
      Issue Description:
      CPU lockups have been reported from the field on systems with a large
      logical CPU count (more than 96). The SAS3.0 controller (Invader series)
      supports at most 96 MSI-x vectors and the SAS3.5 product (Ventura)
      supports at most 128 MSI-x vectors.
      
      This may be a generic issue (if a PCI device supports completion on
      multiple reply queues). It is explained here with respect to the hardware
      supported by mpt3sas, just to simplify the problem and the possible
      changes to handle such issues. The IT HBA (mpt3sas) supports multiple
      reply queues in the completion path. The driver creates MSI-x vectors for
      the controller as "min of (FW supported Reply queue, Logical CPUs)". If
      the submitter is not interrupted via completion on the same CPU, there is
      a loop in the IO path. This behavior can cause hard/soft CPU lockups, IO
      timeouts, system sluggishness, etc.
      
      Example - one CPU (e.g. CPU A) is busy submitting the IOs and another CPU
      (e.g. CPU B) is busy with processing the corresponding IO's reply
      descriptors from reply descriptor queue upon receiving the interrupts from
      HBA.  If the CPU A is continuously pumping the IOs then always CPU B (which
      is executing the ISR) will see the valid reply descriptors in the reply
      descriptor queue and it will be continuously processing those reply
      descriptor in a loop without quitting the ISR handler.
      
      Mpt3sas driver will exit ISR handler if it finds unused reply descriptor in
      the reply descriptor queue. Since CPU A will be continuously sending the
      IOs, CPU B may always see a valid reply descriptor (posted by HBA Firmware
      after processing the IO) in the reply descriptor queue. In worst case,
      driver will not quit from this loop in the ISR handler. Eventually, CPU
      lockup will be detected by watchdog.
      
      Above mentioned behavior is not common if "rq_affinity" set to 2 or
      affinity_hint is honored by irqbalance as "exact". If rq_affinity is set
      to 2, submitter will be always interrupted via completion on same CPU.  If
      irqbalance is using "exact" policy, interrupt will be delivered to
      submitter CPU.
      
      If CPU counts to MSI-X vectors (reply descriptor Queues) count ratio is not
      1:1, we still have exposure of issue explained above and for that we don't
      have any solution.
      
      Exposure of soft/hard lockup if CPU count is more than MSI-x supported by
      device.
      
      If CPUs count to MSI-x vectors count ratio is not 1:1, (Other way, if CPU
      counts to MSI-x vector count ratio is something like X:1, where X > 1) then
      'exact' irqbalance policy OR rq_affinity = 2 won't help to avoid CPU
      hard/soft lockups. There won't be any one to one mapping between CPU to
      MSI-x vector instead one MSI-x interrupt (or reply descriptor queue) is
      shared with group/set of CPUs and there is a possibility of having a loop
      in the IO path within that CPU group and may observe lockups.
      
      For example: Consider a system having two NUMA nodes and each node having
      four logical CPUs and also consider that number of MSI-x vectors enabled on
      the HBA is two, then CPUs count to MSI-x vector count ratio as 4:1.  e.g.
      MSIx vector 0 is affinity to CPU 0, CPU 1, CPU 2 & CPU 3 of NUMA node 0 and
      MSI-x vector 1 is affinity to CPU 4, CPU 5, CPU 6 & CPU 7 of NUMA node 1.
      
      numactl --hardware
      available: 2 nodes (0-1)
      node 0 cpus: 0 1 2 3                 --> MSI-x 0
      node 0 size: 65536 MB
      node 0 free: 63176 MB
      node 1 cpus: 4 5 6 7                 --> MSI-x 1
      node 1 size: 65536 MB
      node 1 free: 63176 MB
      
      Assume that the user started an application which uses all the CPUs of
      NUMA node 0 for issuing IOs.  Only one CPU from the affinity list, say
      CPU 0 (it can be any CPU since this behavior depends upon irqbalance),
      will receive the interrupts from MSI-x vector 0 for all the IOs. Over
      time, the IO submission percentage on CPU 0 decreases and its ISR
      processing percentage increases, as it is kept busy processing the
      interrupts.  Gradually the IO submission percentage on CPU 0 drops to
      zero and its ISR processing percentage reaches 100 percent, as an IO loop
      has formed within NUMA node 0: CPU 1, CPU 2 & CPU 3 are continuously busy
      submitting heavy IO and only CPU 0 is busy in the ISR path, as it always
      finds a valid reply descriptor in the reply descriptor queue. Eventually,
      a hard lockup is observed.
      
      The chance of hard/soft lockups occurring is directly proportional to the
      value of X. If the value of X is high, then the chance of observing CPU
      lockups is high.
      
      Solution: Use the IRQ poll interface defined in "irq_poll.c".  The mpt3sas
      driver will execute the ISR routine in softirq context and it will always
      quit the loop based on the budget provided in the IRQ poll interface.
      
      In these scenarios (i.e. where CPUs count to MSI-X vectors count ratio is
      X:1 (where X > 1)), IRQ poll interface will avoid CPU hard lockups due to
      voluntary exit from the reply queue processing based on budget.  Note -
      Only one MSI-x vector is busy doing processing.
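The budget-based exit can be sketched in isolation as below. This is a user-space model only: `struct reply_queue` and `process_reply_queue()` are hypothetical, and the real irq_poll interface (scheduling, completion and re-arming of the interrupt) is not reproduced here.

```c
#include <assert.h>

/* Hypothetical model of a reply descriptor queue: just a pending count. */
struct reply_queue {
	unsigned int pending;	/* valid reply descriptors waiting */
};

/*
 * Process at most `budget` reply descriptors and return how many were
 * handled. Unlike a loop that runs until the queue is empty, this always
 * terminates, even if the submitter keeps the queue permanently non-empty.
 */
static int process_reply_queue(struct reply_queue *q, int budget)
{
	int done = 0;

	assert(q && budget > 0);
	while (done < budget && q->pending > 0) {
		q->pending--;	/* stands in for handling one descriptor */
		done++;
	}
	return done;
}
```

A return value equal to the budget signals that more work may remain, so the poller is called again later instead of spinning in one invocation.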
      
      Irqstat output:
      
      IRQs / 1 second(s)
      IRQ#   TOTAL   NODE0   NODE1   NODE2   NODE3  NAME
        44  122871  122871       0       0       0  IR-PCI-MSI-edge mpt3sas0-msix0
        45       0       0       0       0       0  IR-PCI-MSI-edge mpt3sas0-msix1
      
      We use this approach only if the CPU count is greater than the number of
      MSI-x vectors supported by the FW.
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: simplify interrupt handler · 233af108
      Suganath Prabu committed
      Separate out processing of the reply descriptor post queue from
      _base_interrupt into _base_process_reply_queue.
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: mpt3sas: Fix typo in request_desript_type · 2c063507
      Suganath Prabu committed
      Fixed typo in request_desript_type.
      request_desript_type --> request_descript_type.
      Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  11. 27 Feb, 2019 1 commit
    • scsi: mpt3sas: Add missing breaks in switch statements · 7850b51b
      Gustavo A. R. Silva committed
      Fix the following warnings by adding the proper missing breaks:
      
      drivers/scsi/mpt3sas/mpt3sas_base.c: In function '_base_display_OEMs_branding':
      drivers/scsi/mpt3sas/mpt3sas_base.c:3548:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
          switch (ioc->pdev->subsystem_device) {
          ^~~~~~
      drivers/scsi/mpt3sas/mpt3sas_base.c:3566:3: note: here
         case MPI2_MFGPAGE_DEVID_SAS2308_2:
         ^~~~
      drivers/scsi/mpt3sas/mpt3sas_base.c:3567:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
          switch (ioc->pdev->subsystem_device) {
          ^~~~~~
      drivers/scsi/mpt3sas/mpt3sas_base.c:3601:3: note: here
         case MPI25_MFGPAGE_DEVID_SAS3008:
         ^~~~
      drivers/scsi/mpt3sas/mpt3sas_base.c:3735:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
          switch (ioc->pdev->subsystem_device) {
          ^~~~~~
      drivers/scsi/mpt3sas/mpt3sas_base.c:3745:3: note: here
         case MPI2_MFGPAGE_DEVID_SAS2308_2:
         ^~~~
      drivers/scsi/mpt3sas/mpt3sas_base.c:3746:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
          switch (ioc->pdev->subsystem_device) {
          ^~~~~~
      drivers/scsi/mpt3sas/mpt3sas_base.c:3768:3: note: here
         default:
         ^~~~~~~
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable
      -Wimplicit-fallthrough.
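The warnings above all follow the same pattern: an inner switch ends an outer case without a break, so control falls into the next outer case. A reduced sketch with made-up device and subsystem IDs (none of them from the driver) looks like this:

```c
#include <assert.h>

/* Made-up IDs for illustration only. */
enum { DEV_A = 1, DEV_B = 2 };
enum { BRAND_UNKNOWN = 0, BRAND_A_SPECIAL, BRAND_A_GENERIC, BRAND_B };

static int branding(int device, int subsystem)
{
	int brand = BRAND_UNKNOWN;

	assert(subsystem >= 0);	/* IDs in this sketch are non-negative */
	switch (device) {
	case DEV_A:
		switch (subsystem) {
		case 0x10:
			brand = BRAND_A_SPECIAL;
			break;
		default:
			brand = BRAND_A_GENERIC;
			break;
		}
		break;	/* without this break, DEV_A would fall into DEV_B */
	case DEV_B:
		brand = BRAND_B;
		break;
	default:
		break;
	}
	return brand;
}
```

The inner `break` statements only leave the inner switch; the outer `break` after it is the one the compiler was asking for.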
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  12. 05 Feb, 2019 2 commits