1. 27 12月, 2019 1 次提交
  2. 16 7月, 2018 1 次提交
  3. 02 7月, 2018 4 次提交
  4. 01 6月, 2018 1 次提交
  5. 15 5月, 2018 2 次提交
  6. 14 3月, 2018 1 次提交
  7. 13 3月, 2018 4 次提交
  8. 24 1月, 2018 1 次提交
  9. 23 11月, 2017 1 次提交
  10. 06 11月, 2017 1 次提交
    • V
      cxl: Rework the implementation of cxl_stop_trace_psl9() · cbb55eeb
      Vaibhav Jain 提交于
      Presently the PSL9 specific cxl_stop_trace_psl9() only stops the RX0
      traces on the CXL adapter when a PSL error irq is triggered. The patch
      updates the function to stop all the traces arrays and move them to
      the FIN state. The implementation issues the mmio to TRACECFG register
      to stop the trace array iff it already not in FIN state. This prevents
      the issue of trace data being reset in case of multiple stop mmio
      issued for a single trace array.
      
      Also the patch does some refactoring of existing cxl_stop_trace_psl9()
      and cxl_stop_trace_psl8() functions by moving them to 'pci.c' from
      'debugfs.c' file and marking them as static.
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cbb55eeb
  11. 13 10月, 2017 1 次提交
  12. 06 10月, 2017 1 次提交
  13. 14 9月, 2017 1 次提交
    • M
      mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4
      Michal Hocko 提交于
      GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
      and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
      primary motivation was to allow users to tell that an allocation is
      short lived and so the allocator can try to place such allocations close
      together and prevent long term fragmentation.  As much as this sounds
      like a reasonable semantic it becomes much less clear when to use the
      highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
      context holding that memory sleep? Can it take locks? It seems there is
      no good answer for those questions.
      
      The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
      __GFP_RECLAIMABLE which in itself is tricky because basically none of
      the existing caller provide a way to reclaim the allocated memory.  So
      this is rather misleading and hard to evaluate for any benefits.
      
      I have checked some random users and none of them has added the flag
      with a specific justification.  I suspect most of them just copied from
      other existing users and others just thought it might be a good idea to
      use without any measuring.  This suggests that GFP_TEMPORARY just
      motivates for cargo cult usage without any reasoning.
      
      I believe that our gfp flags are quite complex already and especially
      those with highlevel semantic should be clearly defined to prevent from
      confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
      replace all existing users to simply use GFP_KERNEL.  Please note that
      SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
      so they will be placed properly for memory fragmentation prevention.
      
      I can see reasons we might want some gfp flag to reflect shorterm
      allocations but I propose starting from a clear semantic definition and
      only then add users with proper justification.
      
      This was been brought up before LSF this year by Matthew [1] and it
      turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
      seems to be a heuristic without any measured advantage for most (if not
      all) its current users.  The follow up discussion has revealed that
      opinions on what might be temporary allocation differ a lot between
      developers.  So rather than trying to tweak existing users into a
      semantic which they haven't expected I propose to simply remove the flag
      and start from scratch if we really need a semantic for short term
      allocations.
      
      [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
      
      [akpm@linux-foundation.org: fix typo]
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: drm/i915: fix up]
        Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ee931c4
  14. 03 7月, 2017 1 次提交
  15. 23 6月, 2017 1 次提交
  16. 02 5月, 2017 2 次提交
    • V
      cxl: Route eeh events to all drivers in cxl_pci_error_detected() · 4f58f0bf
      Vaibhav Jain 提交于
      Fix a boundary condition where in some cases an eeh event that results
      in card reset isn't passed on to a driver attached to the virtual PCI
      device associated with a slice. This will happen in case when a slice
      attached device driver returns a value other than
      PCI_ERS_RESULT_NEED_RESET from the eeh error_detected() callback. This
      would result in an early return from cxl_pci_error_detected() and
      other drivers attached to other AFUs on the card wont be notified.
      
      The patch fixes this by making sure that all slice attached
      device-drivers are notified and the return values from
      error_detected() callback are aggregated in a scheme where request for
      'disconnect' trumps all and 'none' trumps 'need_reset'.
      
      Fixes: 9e8df8a2 ("cxl: EEH support")
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4f58f0bf
    • V
      cxl: Force context lock during EEH flow · ea9a26d1
      Vaibhav Jain 提交于
      During an eeh event when the cxl card is fenced and card sysfs attr
      perst_reloads_same_image is set following warning message is seen in the
      kernel logs:
      
        Adapter context unlocked with 0 active contexts
        ------------[ cut here ]------------
        WARNING: CPU: 12 PID: 627 at
        ../drivers/misc/cxl/main.c:325 cxl_adapter_context_unlock+0x60/0x80 [cxl]
      
      Even though this warning is harmless, it clutters the kernel log
      during an eeh event. This warning is triggered as the EEH callback
      cxl_pci_error_detected doesn't obtain a context-lock before forcibly
      detaching all active context and when context-lock is released during
      call to cxl_configure_adapter from cxl_pci_slot_reset, a warning in
      cxl_adapter_context_unlock is triggered.
      
      To fix this warning, we acquire the adapter context-lock via
      cxl_adapter_context_lock() in the eeh callback
      cxl_pci_error_detected() once all the virtual AFU PHBs are notified
      and their contexts detached. The context-lock is released in
      cxl_pci_slot_reset() after the adapter is successfully reconfigured
      and before the we call the slot_reset callback on slice attached
      device-drivers.
      
      Fixes: 70b565bb ("cxl: Prevent adapter reset if an active context exists")
      Cc: stable@vger.kernel.org # v4.9+
      Reported-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Tested-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ea9a26d1
  17. 19 4月, 2017 1 次提交
  18. 13 4月, 2017 5 次提交
  19. 20 3月, 2017 1 次提交
  20. 21 2月, 2017 1 次提交
    • A
      cxl: fix nested locking hang during EEH hotplug · 171ed0fc
      Andrew Donnellan 提交于
      Commit 14a3ae34 ("cxl: Prevent read/write to AFU config space while AFU
      not configured") introduced a rwsem to fix an invalid memory access that
      occurred when someone attempts to access the config space of an AFU on a
      vPHB whilst the AFU is deconfigured, such as during EEH recovery.
      
      It turns out that it's possible to run into a nested locking issue when EEH
      recovery fails and a full device hotplug is required.
      cxl_pci_error_detected() deconfigures the AFU, taking a writer lock on
      configured_rwsem. When EEH recovery fails, the EEH code calls
      pci_hp_remove_devices() to remove the device, which in turn calls
      cxl_remove() -> cxl_pci_remove_afu() -> pci_deconfigure_afu(), which tries
      to grab the writer lock that's already held.
      
      Standard rwsem semantics don't express what we really want to do here and
      don't allow for nested locking. Fix this by replacing the rwsem with an
      atomic_t which we can control more finely. Allow the AFU to be locked
      multiple times so long as there are no readers.
      
      Fixes: 14a3ae34 ("cxl: Prevent read/write to AFU config space while AFU not configured")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Acked-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      171ed0fc
  21. 25 1月, 2017 2 次提交
  22. 18 11月, 2016 1 次提交
  23. 19 10月, 2016 1 次提交
    • V
      cxl: Prevent adapter reset if an active context exists · 70b565bb
      Vaibhav Jain 提交于
      This patch prevents resetting the cxl adapter via sysfs in presence of
      one or more active cxl_context on it. This protects against an
      unrecoverable error caused by PSL owning a dirty cache line even after
      reset and host tries to touch the same cache line. In case a force reset
      of the card is required irrespective of any active contexts, the int
      value -1 can be stored in the 'reset' sysfs attribute of the card.
      
      The patch introduces a new atomic_t member named contexts_num inside
      struct cxl that holds the number of active context attached to the card
      , which is checked against '0' before proceeding with the reset. To
      prevent against a race condition where a context is activated just after
      reset check is performed, the contexts_num is atomically set to '-1'
      after reset-check to indicate that no more contexts can be activated on
      the card anymore.
      
      Before activating a context we atomically test if contexts_num is
      non-negative and if so, increment its value by one. In case the value of
      contexts_num is negative then it indicates that the card is about to be
      reset and context activation is error-ed out at that point.
      
      Fixes: 62fa19d4 ("cxl: Add ability to reset the card")
      Cc: stable@vger.kernel.org # v4.0+
      Acked-by: NFrederic Barrat <fbarrat@linux.vnet.ibm.com>
      Reviewed-by: NAndrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      70b565bb
  24. 04 10月, 2016 1 次提交
  25. 13 9月, 2016 1 次提交
  26. 10 8月, 2016 1 次提交
  27. 09 8月, 2016 1 次提交