1. 28 May, 2019 13 commits
  2. 22 May, 2019 16 commits
  3. 15 May, 2019 6 commits
    • J
      mm/mmu_notifier: convert user range->blockable to helper function · dfcd6660
      Jérôme Glisse committed
      Use the mmu_notifier_range_blockable() helper function instead of directly
      dereferencing the range->blockable field.  This is done to make it easier
      to change the mmu_notifier range field.
      
      This patch is the outcome of the following coccinelle patch:
      
      %<-------------------------------------------------------------------
      @@
      identifier I1, FN;
      @@
      FN(..., struct mmu_notifier_range *I1, ...) {
      <...
      -I1->blockable
      +mmu_notifier_range_blockable(I1)
      ...>
      }
      ------------------------------------------------------------------->%
      
      spatch --in-place --sp-file blockable.spatch --dir .
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-3-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dfcd6660
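The conversion described in this commit can be sketched in user-space C. This is a minimal illustration, not the kernel code: the struct is a simplified stand-in for `struct mmu_notifier_range`, `example_invalidate_range_start()` is a hypothetical callback, and the `-1` return value stands in for whatever error the real callback would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct mmu_notifier_range. */
struct mmu_notifier_range {
	unsigned long start;
	unsigned long end;
	bool blockable;
};

/* Accessor: callers ask the helper instead of reading the field directly,
 * so the field can later change without touching every caller. */
static inline bool
mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
{
	assert(range != NULL);
	return range->blockable;
}

/* Hypothetical callback: refuses to proceed when it is not allowed to
 * block. Before the conversion this test would read range->blockable. */
int example_invalidate_range_start(const struct mmu_notifier_range *range)
{
	if (!mmu_notifier_range_blockable(range))
		return -1; /* placeholder error code for this sketch */
	return 0;
}
```

Routing every read through the helper is what lets the coccinelle rule above be applied mechanically across the tree.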
    • I
      IB/mthca: use the new FOLL_LONGTERM flag to get_user_pages_fast() · f3b4fdb1
      Ira Weiny committed
      Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS
      DAX pages being mapped.
      
      Link: http://lkml.kernel.org/r/20190328084422.29911-8-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-8-ira.weiny@intel.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f3b4fdb1
    • I
      IB/qib: use the new FOLL_LONGTERM flag to get_user_pages_fast() · 664b21e7
      Ira Weiny committed
      Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS
      DAX pages being mapped.
      
      Link: http://lkml.kernel.org/r/20190328084422.29911-7-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-7-ira.weiny@intel.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      664b21e7
    • I
      IB/hfi1: use the new FOLL_LONGTERM flag to get_user_pages_fast() · 9fdf4aa1
      Ira Weiny committed
      Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS
      DAX pages being mapped.
      
      [ira.weiny@intel.com: v3]
        Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-6-ira.weiny@intel.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9fdf4aa1
    • I
      mm/gup: change GUP fast to use flags rather than a write 'bool' · 73b0140b
      Ira Weiny committed
      To facilitate additional options to get_user_pages_fast() change the
      singular write parameter to be gup_flags.
      
      This patch does not change any functionality.  New functionality will
      follow in subsequent patches.
      
      Some of the get_user_pages_fast() call sites were unchanged because they
      already passed FOLL_WRITE or 0 for the write parameter.
      
      NOTE: It was suggested to change the ordering of the get_user_pages_fast()
      arguments to ensure that callers were converted.  This breaks the current
      GUP call site convention of having the returned pages be the final
      parameter.  So the suggestion was rejected.
      
      Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Mike Marshall <hubcap@omnibond.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      73b0140b
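The parameter change in this commit, from a single `write` boolean to a `gup_flags` bitmask, can be illustrated with a small user-space sketch. All `sketch_*` names are hypothetical, the flag value is illustrative, and the mock function only counts pages; it is not the kernel's get_user_pages_fast().

```c
#include <assert.h>
#include <stdbool.h>

#define FOLL_WRITE 0x01u /* illustrative value for this sketch */

/* How an old-style boolean maps onto the new flags argument. */
unsigned int sketch_flags_from_write(bool write)
{
	return write ? FOLL_WRITE : 0u;
}

/* How the callee recovers the old meaning from the flags. */
bool sketch_wants_write(unsigned int gup_flags)
{
	return (gup_flags & FOLL_WRITE) != 0;
}

/* Mock of a flags-based "fast GUP" entry point: it pins nothing and
 * simply reports nr_pages as pinned. */
int sketch_gup_fast(unsigned long start, int nr_pages, unsigned int gup_flags)
{
	(void)start;
	(void)gup_flags;
	assert(nr_pages >= 0);
	return nr_pages;
}
```

A call site that used to pass `1` for write would pass `FOLL_WRITE` instead. A bitmask leaves room for further options without another signature change, which is the motivation the commit message gives.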
    • I
      mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM · 932f4a63
      Ira Weiny committed
      Patch series "Add FOLL_LONGTERM to GUP fast and use it".
      
      HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
      advantages.  These pages can be held for a significant time.  But
      get_user_pages_fast() does not protect against mapping FS DAX pages.
      
      Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
      retains the performance while also adding the FS DAX checks.  XDP has also
      shown interest in using this functionality.[1]
      
      In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
      and remove the specialized get_user_pages_longterm call.
      
      [1] https://lkml.org/lkml/2019/3/19/939
      
      "longterm" is a relative thing and at this point is probably a misnomer.
      This is really flagging a pin which is going to be given to hardware and
      can't move.  I've thought of a couple of alternative names but I think we
      have to settle on if we are going to use FL_LAYOUT or something else to
      solve the "longterm" problem.  Then I think we can change the flag to a
      better name.
      
      Secondly, it depends on how often you are registering memory.  I have
      spoken with some RDMA users who consider MR in the performance path...
      For the overall application performance.  I don't have the numbers as the
      tests for HFI1 were done a long time ago.  But there was a significant
      advantage.  Some of which is probably due to the fact that you don't have
      to hold mmap_sem.
      
      Finally, architecturally I think it would be good for everyone to use
      *_fast.  There are patches submitted to the RDMA list which would allow
      the use of *_fast (they rework the use of mmap_sem) and as soon as they
      are accepted I'll submit a patch to convert the RDMA core as well.  Also
      to this point others are looking to use *_fast.
      
      As an aside, Jason pointed out in my previous submission that *_fast and
      *_unlocked look very much the same.  I agree and I think further cleanup
      will be coming.  But I'm focused on getting the final solution for DAX at
      the moment.
      
      This patch (of 7):
      
      This patch starts a series which aims to support FOLL_LONGTERM in
      get_user_pages_fast().  Some callers would like to do a longterm (user
      controlled) pin of pages with the fast variant of GUP for performance
      purposes.
      
      Rather than have a separate get_user_pages_longterm() call, introduce
      FOLL_LONGTERM and change the longterm callers to use it.
      
      This patch does not change any functionality.  In the short term
      "longterm" or user controlled pins are unsafe for Filesystems and FS DAX
      in particular has been blocked.  However, callers of get_user_pages_fast()
      were not "protected".
      
      FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
      requires vmas to determine if DAX is in use.
      
      NOTE: In merging with the CMA changes we opt to change the
      get_user_pages() call in check_and_migrate_cma_pages() to a call of
      __get_user_pages_locked() on the newly migrated pages.  This makes the
      code read better in that we are calling __get_user_pages_locked() on the
      pages before and after a potential migration.
      
      As a side effect some of the interfaces are cleaned up but this is not the
      primary purpose of the series.
      
      In review[1] it was asked:
      
      <quote>
      > This I don't get - if you do lock down long term mappings performance
      > of the actual get_user_pages call shouldn't matter to start with.
      >
      > What do I miss?
      
      A couple of points.
      
      First "longterm" is a relative thing and at this point is probably a
      misnomer.  This is really flagging a pin which is going to be given to
      hardware and can't move.  I've thought of a couple of alternative names
      but I think we have to settle on if we are going to use FL_LAYOUT or
      something else to solve the "longterm" problem.  Then I think we can
      change the flag to a better name.
      
      Second, It depends on how often you are registering memory.  I have spoken
      with some RDMA users who consider MR in the performance path...  For the
      overall application performance.  I don't have the numbers as the tests
      for HFI1 were done a long time ago.  But there was a significant
      advantage.  Some of which is probably due to the fact that you don't have
      to hold mmap_sem.
      
      Finally, architecturally I think it would be good for everyone to use
      *_fast.  There are patches submitted to the RDMA list which would allow
      the use of *_fast (they rework the use of mmap_sem) and as soon as they
      are accepted I'll submit a patch to convert the RDMA core as well.  Also
      to this point others are looking to use *_fast.
      
      As an aside, Jason pointed out in my previous submission that *_fast and
      *_unlocked look very much the same.  I agree and I think further cleanup
      will be coming.  But I'm focused on getting the final solution for DAX at
      the moment.
      
      </quote>
      
      [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965
      
      [ira.weiny@intel.com: v3]
        Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      932f4a63
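The idea of folding the separate longterm call into a flag can be sketched as follows. This is a user-space mock under stated assumptions: `struct sketch_vma` and `sketch_get_user_pages()` are hypothetical, the flag values are illustrative, and returning `-EOPNOTSUPP` for an FS DAX mapping is an assumption of this sketch rather than something stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define FOLL_WRITE    0x01u    /* illustrative values for this sketch */
#define FOLL_LONGTERM 0x10000u

/* Hypothetical stand-in for a VMA: only records whether it is FS DAX. */
struct sketch_vma {
	bool is_fsdax;
};

/* Mock GUP: one "page" per VMA. With FOLL_LONGTERM set, any FS DAX
 * mapping makes the whole request fail; without it, no check is done. */
long sketch_get_user_pages(const struct sketch_vma *vmas, size_t nr_vmas,
			   unsigned int gup_flags)
{
	assert(vmas != NULL || nr_vmas == 0);

	if (gup_flags & FOLL_LONGTERM) {
		for (size_t i = 0; i < nr_vmas; i++) {
			if (vmas[i].is_fsdax)
				return -EOPNOTSUPP; /* assumed error code */
		}
	}
	return (long)nr_vmas;
}
```

A former longterm caller would then pass `gup_flags | FOLL_LONGTERM` to the ordinary entry point rather than calling a dedicated function.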
  4. 14 May, 2019 3 commits
  5. 08 May, 2019 2 commits
    • L
      RDMA/ipoib: Allow user space differentiate between valid dev_port · b79656ed
      Leon Romanovsky committed
      Systemd triggers the following warning during IPoIB device load:
      
       mlx5_core 0000:00:0c.0 ib0: "systemd-udevd" wants to know my dev_id.
              Should it look at dev_port instead?
              See Documentation/ABI/testing/sysfs-class-net for more info.
      
      This happens because user space tries to differentiate old systems
      without dev_port from new systems with dev_port.  If dev_port is zero,
      systemd falls back to reading dev_id.
      
      There is no need to print a warning in this case: it is a valid
      situation, and it is needed to keep systemd compatible with old
      kernels.
      
      Link: https://github.com/systemd/systemd/blob/master/src/udev/udev-builtin-net_id.c#L358
      Cc: <stable@vger.kernel.org> # 4.19
      Fixes: f6350da4 ("IB/ipoib: Log sysfs 'dev_id' accesses from userspace")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      b79656ed
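The user-space fallback that triggers the warning can be sketched roughly as below. This is not systemd's code: `sketch_choose_port()` is a hypothetical helper that takes the already-read contents of the two sysfs attributes, and it assumes dev_port is printed in decimal and dev_id in hexadecimal.

```c
#include <assert.h>
#include <stdlib.h>

/* Pick a port number from the contents of the dev_port and dev_id sysfs
 * attributes. dev_port_attr must be non-NULL; dev_id_attr may be NULL
 * when the attribute could not be read. */
unsigned long sketch_choose_port(const char *dev_port_attr,
				 const char *dev_id_attr)
{
	unsigned long port;

	assert(dev_port_attr != NULL);
	port = strtoul(dev_port_attr, NULL, 10);

	/* Zero may mean "old kernel without dev_port": consult dev_id.
	 * This read is what used to provoke the kernel warning. */
	if (port == 0 && dev_id_attr != NULL)
		port = strtoul(dev_id_attr, NULL, 16);

	return port;
}
```

Because zero is also a legitimate dev_port value, the dev_id read happens on perfectly healthy systems, which is why the commit treats it as valid.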
    • D
      IB/core, ipoib: Do not overreact to SM LID change event · ba7d8117
      Dennis Dalessandro committed
      When IPoIB receives an SM LID change event, it reacts by flushing its
      path record cache and rejoining multicast groups. This is the same
      behavior it performs when it receives a reregistration event. This
      behavior is unnecessary as an SM may have database backup or
      synchronization mechanisms which permit the SM location or LID to change
      without loss of multicast membership and without impact to path records.
      
      Both opensm and the OPA FM issue reregistration events if a new SM is
      started (or restarted with a new config) or an SM event occurs which
      results in loss of multicast membership records by the SM (such as
      opensm failover) or the SM encounters new nodes with Active ports (such
      as after joining 2 fabrics by connecting switches via ISLs). Hence this
      event can be depended on as the trigger for IPoIB cache and multicast
      flushing.
      
      It appears that some drivers, such as qib and hfi1, issue the
      IB_EVENT_SM_CHANGE but other drivers such as mlx4 and mlx5 do not.
      Empirical testing on Mellanox EDR using ibv_asyncwatch has confirmed
      that Mellanox EDR HCAs do not generate SM change events and that opensm
      does generate reregistration.
      
      An SM LID change event is generated by the mentioned drivers to reflect
      that sm_lid and/or sm_sl in the local port info has changed. The intent
      of this event is to permit applications and ULPs which have a local copy
      of this information (or an address handle using it) to update their
      information.
      
      The intent is that the reregistration event (caused by the SM via a bit
      in Set(PortInfo)) be used to inform nodes that they need to rejoin
      multicast groups, resubscribe for notices and potentially update path
      records.
      
      When an SM migrates or fails over, an SM LID change event can occur. In
      response IPoIB discards path records and multicast membership and loses
      connectivity until these records are restored via SA requests. In very
      large fabrics, it may take minutes for the SM to be ready and for the SA
      responses to be supplied.  This can result in undesirable and
      unnecessary IPoIB connectivity impacts. It also can result in an
      unnecessary storm of SA queries from all nodes in a cluster potentially
      followed by yet another storm if the SM issues the reregistration
      request.
      
      The fact that the Mellanox HCAs do not even generate this event is further
      evidence that on modern IB fabrics there will be no ill side effects
      from the proposed changes below to reduce the reaction by 3 kernel
      components to this event. So these changes should be benign for Mellanox
      IB fabrics and will benefit OPA fabrics while also making ib_core and
      ULP behavior "correct" as intended by the IBTA spec and kernel RDMA event
      APIs.
      
      Address these issues by removing IB_EVENT_SM_CHANGE handling from ipoib.
      IPoIB does not locally store sm_lid nor sm_sl, so it does not need to do
      anything on SM LID change. IPoIB makes use of other ib_core components
      to issue SA requests for it and those components correctly track SM LID
      and SM LID changes.
      
      Also in ib_core multicast handling, remove the test for
      IB_EVENT_SM_CHANGE. This code is moving all multicast groups to the
      error state, which will trigger rejoins. This code is used by IPoIB as
      well as the connection manager and other clients of multicast groups.
      This kernel module centralizes group membership status and joins since a
      node can only join a given group once but multiple ULPs or applications
      may want to join the same group. It makes use of the sa_query.c
      component in ib_core, which correctly tracks SM LID and SL. This
      component does not track SM LID nor SL itself and hence need not react
      to their changes.
      
      Similarly, in the ib_core cache code, remove the handling for
      IB_EVENT_SM_CHANGE. The ib_cache_update function
      which is ultimately called is updating local copies of the pkey table,
      gid table and lmc. It does not update nor retain sm_lid nor sm_sl. As
      such it does not need to be called on an SM LID change. It technically
      also does not need to be called on a reregistration. The LID_CHANGE,
      PKEY_CHANGE, GID_CHANGE and port state change events (PORT_ERR,
      PORT_ACTIVE) should be sufficient triggers.
      
      It is worth noting that the alternative of simply having the hfi1 and
      qib drivers not generate the SM LID change event was explored. While
      this would duplicate what Mellanox drivers do now, it is not the correct
      behavior and removes the ability for an SM to migrate without requiring
      reregistration. Since both opensm and OPA SM have mechanisms to backup
      or synchronize registration information, it is desirable to let them
      perform SM migrations (with LID or SL changes) without requiring
      reregistration when they deem it appropriate.
      
      Suggested-by: Todd Rimmer <todd.rimmer@intel.com>
      Tested-by: Michael Brooks <michael.brooks@intel.com>
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Todd Rimmer <todd.rimmer@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      ba7d8117
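The behavioral change for IPoIB described in this commit can be summarized in a small sketch. It models only the two events discussed in the message; the enum and function names are hypothetical and do not correspond to the kernel's `enum ib_event_type` values or to the actual ipoib event handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Only the two events relevant to this commit are modeled. */
enum sketch_event {
	SKETCH_EVENT_SM_CHANGE,
	SKETCH_EVENT_CLIENT_REREGISTER,
};

/* Does IPoIB flush its path record cache and rejoin multicast groups?
 * before_patch selects the old behavior, where an SM LID change was
 * treated the same as a reregistration. */
bool sketch_ipoib_flushes(enum sketch_event event, bool before_patch)
{
	switch (event) {
	case SKETCH_EVENT_CLIENT_REREGISTER:
		return true;          /* unchanged: still the flush trigger */
	case SKETCH_EVENT_SM_CHANGE:
		return before_patch;  /* ignored once the patch is applied */
	}
	assert(!"unknown event in sketch");
	return false;
}
```

After the patch, reregistration remains the only one of the two events that causes a flush, matching the intent that the SM decides when rejoining is necessary.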