1. 09 Sep 2017, 3 commits
    • MAINTAINERS: clarify kmod is just a kernel module loader · 00653d3a
      Authored by Luis R. Rodriguez
      This should make it clearer what the kmod code is now that
      the umh code is split out separately.
      
      Link: http://lkml.kernel.org/r/20170810180618.22457-3-mcgrof@kernel.org
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      00653d3a
    • kmod: split out umh code into its own file · 23558693
      Authored by Luis R. Rodriguez
      Patch series "kmod: few code cleanups to split out umh code"
      
      The usermode helper has its provenance in the old USB code, which was the
      first to require a usermode helper.  Eventually this was shoved into
      kmod.c, and the kernel's modprobe calls were converted over to share the
      same code.  Over time the list of usermode helpers in the kernel has
      grown -- so kmod is just one user of the API.
      
      This series is a simple logical cleanup which acknowledges the code
      evolution of the usermode helper and shoves the UMH API into its own
      dedicated file.  This way users of the API can later just include umh.h
      instead of kmod.h.
      
      Note that despite the diffstat, the first patch really is just a code
      move; no functional changes are done there.  I did use git format-patch
      -M to generate the patch, but in the end the split was not enough for
      git to consider it a rename, hence the large diffstat.
      
      I've put this through 0-day and it gives me its build blessings, with
      all tests reported as OK.
      
      This patch (of 4):
      
      There's a slew of usermode helper users, and kmod is just one of them.
      Split out the usermode helper code into its own file to keep the logic
      separate and the focus clear.
      
      This change provides no functional changes.
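
      As a rough illustration of the API being moved (a minimal sketch, not
      part of this patch; the example_run_helper() function and the helper
      path are hypothetical), kernel code drives a usermode helper through
      call_usermodehelper(), which after this split lives behind umh.h:

          /* Sketch: invoking a usermode helper from kernel code. */
          #include <linux/umh.h>    /* <linux/kmod.h> still pulls this in */

          static int example_run_helper(void)
          {
              char *argv[] = { "/sbin/example-helper", NULL };  /* hypothetical */
              char *envp[] = { "HOME=/", "PATH=/sbin:/bin", NULL };

              /* UMH_WAIT_PROC: block until the helper process exits. */
              return call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
          }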
      
      Link: http://lkml.kernel.org/r/20170810180618.22457-2-mcgrof@kernel.org
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      23558693
    • hmm: heterogeneous memory management documentation · bffc33ec
      Authored by Jérôme Glisse
      Patch series "HMM (Heterogeneous Memory Management)", v25.
      
      Heterogeneous Memory Management (HMM) (description and justification)
      
      Today device drivers expose a dedicated memory allocation API through
      their device file, often relying on a combination of IOCTL and mmap
      calls.  The device can only access and use memory allocated through this
      API.  This effectively splits the program address space into objects
      allocated for and usable by the device, and other regular memory
      (malloc, mmap of a file, shared memory, ...) accessible only by the CPU
      (or, in a very limited way, by a device through pinned memory).
      
      Allowing different isolated components of a program to use a device thus
      requires duplicating the input data structures using the device memory
      allocator.  This is reasonable for simple data structures (arrays,
      grids, images, ...) but gets extremely complex with advanced data
      structures (lists, trees, graphs, ...) that rely on a web of memory
      pointers.  This is becoming a serious limitation on the kinds of
      workloads that can be offloaded to devices like GPUs.
      
      New industry standards like C++, OpenCL and CUDA are pushing to remove
      this barrier.  This requires a shared address space between the GPU and
      the CPU, so that the GPU can access any memory of a process (while still
      obeying memory protections such as read-only).  This kind of feature is
      also appearing in various other operating systems.
      
      HMM is a set of helpers to facilitate several aspects of address space
      sharing and device memory management.  Unlike existing sharing
      mechanisms that rely on pinning the pages used by a device, HMM relies
      on mmu_notifier to propagate CPU page table updates to the device page
      table.
      
      Duplicating the CPU page table is only one aspect necessary for
      efficiently using a device like a GPU.  GPU local memory has bandwidth
      in the terabytes-per-second range, but it is connected to main memory
      through a system bus like PCIe, which is limited to 32 GB/s (PCIe 4.0
      x16).  Thus it is necessary to allow migration of process memory from
      main system memory to device memory.  The issue is that on platforms
      which only have PCIe, the device memory is not accessible by the CPU
      with the same properties as main memory (cache coherency, atomic
      operations, ...).
      
      To allow migration from main memory to device memory, HMM provides a set
      of helpers to hotplug device memory as a new type of ZONE_DEVICE memory
      which is un-addressable by the CPU but still has struct pages
      representing it.  This allows most of the core kernel logic that deals
      with process memory to stay oblivious to the peculiarities of device
      memory.
      
      When the page backing an address of a process is migrated to device
      memory, the CPU page table entry is set to a new specific swap entry.
      CPU access to such an address triggers a migration back to system
      memory, just as if the page had been swapped out to disk.  HMM also
      blocks anyone from pinning a ZONE_DEVICE page so that it can always be
      migrated back to system memory if the CPU accesses it.  Conversely, HMM
      does not migrate to device memory any page that is pinned in system
      memory.
      
      To allow efficient migration between device memory and main memory, a
      new migrate_vma() helper is added with this patchset.  It allows the
      device's DMA engine to be leveraged to perform the copy operation.
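
      For a feel of the interface (a non-authoritative sketch based on this
      series; the example_* names and empty callback bodies are placeholders),
      a driver supplies a pair of callbacks and lets migrate_vma() drive the
      collect, copy and remap steps:

          /* Sketch: driver-side use of the migrate_vma() helper. */
          #include <linux/migrate.h>

          static void example_alloc_and_copy(struct vm_area_struct *vma,
                                             const unsigned long *src,
                                             unsigned long *dst,
                                             unsigned long start,
                                             unsigned long end, void *private)
          {
              /* Allocate device pages and kick off DMA copies src -> dst. */
          }

          static void example_finalize_and_map(struct vm_area_struct *vma,
                                               const unsigned long *src,
                                               const unsigned long *dst,
                                               unsigned long start,
                                               unsigned long end, void *private)
          {
              /* Wait for DMA to finish; migrate_vma() then remaps the pages. */
          }

          static const struct migrate_vma_ops example_ops = {
              .alloc_and_copy   = example_alloc_and_copy,
              .finalize_and_map = example_finalize_and_map,
          };

          /* Migrate one page of a process VMA toward device memory. */
          static int example_migrate(struct vm_area_struct *vma,
                                     unsigned long addr)
          {
              unsigned long src = 0, dst = 0;

              return migrate_vma(&example_ops, vma, addr, addr + PAGE_SIZE,
                                 &src, &dst, NULL);
          }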
      
      This feature will be used by upstream drivers like nouveau and mlx5, and
      probably others in the future (amdgpu is the next suspect in line).  We
      are actively working on nouveau and mlx5 support.  To test this patchset
      we also worked with NVidia's closed-source driver team; they have more
      resources than us to test this kind of infrastructure, as well as a
      bigger and better userspace ecosystem with various real industry
      workloads they can use to test and profile HMM.
      
      The expected workload is a program that builds a data set on the CPU
      (from disk, from network, from sensors, ...).  The program uses a GPU
      API (OpenCL, CUDA, ...) to give hints on memory placement for the input
      data and also for the output buffer.  The program then calls the GPU API
      to schedule a GPU job; this happens through a device-driver-specific
      ioctl.  All of this is hidden from the programmer's point of view in the
      case of a C++ compiler that transparently offloads parts of a program to
      the GPU.  The program can keep doing other work on the CPU while the GPU
      is crunching numbers.
      
      It is expected that the CPU will not access the same data set as the GPU
      while the GPU is working on it, but this is not mandatory.  In fact we
      expect some small memory objects to be actively accessed by both GPU and
      CPU concurrently, as synchronization channels and/or for monitoring
      purposes.  Such objects will stay in system memory and should not be
      bottlenecked by system bus bandwidth (rare read and write accesses from
      both CPU and GPU).
      
      As we are relying on the device driver API, HMM does not introduce any
      new syscall, nor does it modify any existing ones.  It does not change
      any POSIX semantics or behaviors.  For instance, the child after a fork
      of a process that is using HMM will not be impacted in any way, nor is
      there any data hazard between child COW or parent COW of memory that was
      migrated to the device prior to the fork.
      
      HMM assumes a number of hardware features.  The device must allow its
      page table to be updated at any time (i.e. device jobs must be
      preemptible).  The device page table must provide memory protection such
      as read-only.  The device must track write access (dirty bit).  The
      device must have a minimum granularity that matches PAGE_SIZE (i.e. 4k).
      
      Reviewer hints:
      Patch 1  HMM documentation
      Patch 2  introduce core infrastructure and definition of HMM, a pretty
               small patch that is easy to review
      Patch 3  introduce the mirror functionality of HMM; it relies on
               mmu_notifier, so someone familiar with that part would be in a
               better position to review
      Patch 4  is a helper to snapshot the CPU page table while synchronizing
               with concurrent page table updates. Understanding mmu_notifier
               makes review easier.
      Patch 5  is mostly a wrapper around handle_mm_fault()
      Patch 6  add a new add_pages() helper to avoid modifying each arch's
               memory hotplug function
      Patch 7  add a new memory type for ZONE_DEVICE and also add all the
               logic in various core mm paths to support this new type. Dan
               Williams and any core mm contributor are the best people to
               review each half of this patchset
      Patch 8  special-case HMM ZONE_DEVICE pages inside put_page(). Kirill
               and Dan Williams are the best people to review this
      Patch 9  allow uncharging a page from a memory cgroup without using the
               lru list field of struct page (best reviewer: Johannes Weiner
               or Vladimir Davydov or Michal Hocko)
      Patch 10 add support to uncharge ZONE_DEVICE pages from a memory cgroup
               (best reviewer: Johannes Weiner or Vladimir Davydov or Michal
               Hocko)
      Patch 11 add a helper to hotplug un-addressable device memory as a new
               type of ZONE_DEVICE memory (the new type introduced in patch 3
               of this series). This is boilerplate code around memory
               hotplug, and it also picks a free range of physical addresses
               for the device memory. Note that the physical addresses do not
               point to anything (at least as far as the kernel knows).
      Patch 12 introduce a new hmm_device class as a helper for device drivers
               that want to expose multiple device memories under a common
               fake device driver. This is useful for multi-GPU
               configurations. Anyone familiar with device driver
               infrastructure can review this. Boilerplate code, really.
      Patch 13 add a new migrate mode. Anyone familiar with page migration is
               welcome to review.
      Patch 14 introduce a new migration helper (migrate_vma()) that allows
               migrating a range of virtual addresses of a process, using the
               device DMA engine to perform the copy. It is not limited to
               copies from and to the device but can also copy between any
               kinds of source and destination memory. Again, anyone familiar
               with the migration code should be able to verify the logic.
      Patch 15 optimize the new migrate_vma() by unmapping pages while we are
               collecting them. This can be reviewed by any mm folks.
      Patch 16 add unaddressable memory migration to the helper introduced in
               patch 7; this can be reviewed by anyone familiar with the
               migration code
      Patch 17 add a feature that allows a device to allocate non-present
               pages on the GPU when migrating a range of addresses to device
               memory. This is a helper for device drivers, to avoid having to
               first allocate system memory before migrating to device memory
      Patch 18 add a new kind of ZONE_DEVICE memory for cache-coherent device
               memory (CDM)
      Patch 19 add a helper to hotplug CDM memory
      
      Previous patchset postings:
      v1 http://lwn.net/Articles/597289/
      v2 https://lkml.org/lkml/2014/6/12/559
      v3 https://lkml.org/lkml/2014/6/13/633
      v4 https://lkml.org/lkml/2014/8/29/423
      v5 https://lkml.org/lkml/2014/11/3/759
      v6 http://lwn.net/Articles/619737/
      v7 http://lwn.net/Articles/627316/
      v8 https://lwn.net/Articles/645515/
      v9 https://lwn.net/Articles/651553/
      v10 https://lwn.net/Articles/654430/
      v11 http://www.gossamer-threads.com/lists/linux/kernel/2286424
      v12 http://www.kernelhub.org/?msg=972982&p=2
      v13 https://lwn.net/Articles/706856/
      v14 https://lkml.org/lkml/2016/12/8/344
      v15 http://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1304107.html
      v16 http://www.spinics.net/lists/linux-mm/msg119814.html
      v17 https://lkml.org/lkml/2017/1/27/847
      v18 https://lkml.org/lkml/2017/3/16/596
      v19 https://lkml.org/lkml/2017/4/5/831
      v20 https://lwn.net/Articles/720715/
      v21 https://lkml.org/lkml/2017/4/24/747
      v22 http://lkml.iu.edu/hypermail/linux/kernel/1705.2/05176.html
      v23 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1404788.html
      v24 https://lwn.net/Articles/726691/
      
      This patch (of 19):
      
      This adds documentation for HMM (Heterogeneous Memory Management).  It
      presents the motivation behind it, the features necessary for it to be
      useful, and gives an overview of how this is implemented.
      
      Link: http://lkml.kernel.org/r/20170817000548.32038-2-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mark Hairgrove <mhairgrove@nvidia.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Sherry Cheung <SCheung@nvidia.com>
      Cc: Subhash Gutti <sgutti@nvidia.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Bob Liu <liubo95@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bffc33ec
  2. 05 Sep 2017, 2 commits
  3. 31 Aug 2017, 4 commits
  4. 30 Aug 2017, 1 commit
  5. 29 Aug 2017, 1 commit
    • hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK) · ae0078fc
      Authored by Dexuan Cui
      Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
      mechanism between the host and the guest.  It uses the VMBus ring buffer
      as the transport layer.
      
      With hv_sock, applications between the host (Windows 10, Windows Server
      2016 or newer) and the guest can talk with each other using the traditional
      socket APIs.
      
      More info about Hyper-V Sockets is available here:
      
      "Make your own integration services":
      https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service
      
      The patch implements the necessary support in the Linux guest by
      introducing a new vsock transport for AF_VSOCK.
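
      For illustration, a minimal guest-side client using the generic AF_VSOCK
      socket API looks like the sketch below (not from this patch; the service
      port 5000 is an arbitrary example, and on Hyper-V the host side maps
      services to ports via GUIDs rather than picking a raw number like this):

          /* Sketch: guest-side AF_VSOCK stream client talking to the host. */
          #include <stdio.h>
          #include <unistd.h>
          #include <sys/socket.h>
          #include <linux/vm_sockets.h>

          int main(void)
          {
              struct sockaddr_vm addr = {
                  .svm_family = AF_VSOCK,
                  .svm_cid    = VMADDR_CID_HOST,  /* CID 2: the host */
                  .svm_port   = 5000,             /* example service port */
              };
              int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

              if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                  perror("vsock");
                  return 1;
              }
              write(fd, "hello host\n", 11);  /* byte-stream semantics, like TCP */
              close(fd);
              return 0;
          }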
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Andy King <acking@vmware.com>
      Cc: Dmitry Torokhov <dtor@vmware.com>
      Cc: George Zhang <georgezhang@vmware.com>
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Cc: Reilly Grant <grantr@vmware.com>
      Cc: Asias He <asias@redhat.com>
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
      Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae0078fc
  6. 28 Aug 2017, 1 commit
  7. 27 Aug 2017, 2 commits
  8. 25 Aug 2017, 1 commit
  9. 24 Aug 2017, 1 commit
  10. 23 Aug 2017, 2 commits
  11. 22 Aug 2017, 2 commits
  12. 20 Aug 2017, 6 commits
  13. 18 Aug 2017, 4 commits
  14. 17 Aug 2017, 2 commits
    • membarrier: Provide expedited private command · 22e4ebb9
      Authored by Mathieu Desnoyers
      Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs, using a cpumask
      built from all runqueues for which the current thread's mm is the same
      as that of the thread calling sys_membarrier.  It executes faster than
      the non-expedited variant (no blocking).  It also works on NOHZ_FULL
      configurations.
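
      From userspace this is a direct syscall (a sketch; there is no glibc
      wrapper for membarrier, and note that later kernels additionally require
      a REGISTER step before using the private expedited command):

          /* Sketch: issuing an expedited private membarrier from userspace. */
          #include <stdio.h>
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <linux/membarrier.h>

          int main(void)
          {
              /* CMD_QUERY returns a bitmask of commands this kernel supports. */
              long mask = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);

              if (mask < 0 || !(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED)) {
                  fprintf(stderr, "private expedited membarrier unsupported\n");
                  return 1;
              }
              /* IPIs only the CPUs currently running threads of this process. */
              return syscall(__NR_membarrier,
                             MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) ? 1 : 0;
          }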
      
      Scheduler-wise, it requires a memory barrier before and after context
      switching between processes (which have different mm). The memory
      barrier before context switch is already present. For the barrier after
      context switch:
      
      * Our TSO archs can do RELEASE without being a full barrier. Look at
        x86 spin_unlock() being a regular STORE for example.  But for those
        archs, all atomics imply smp_mb and all of them have atomic ops in
        switch_mm() for mm_cpumask(), and on x86 the CR3 load acts as a full
        barrier.
      
      * Of all weakly ordered machines, only ARM64 and PPC can do RELEASE;
        the rest do indeed do smp_mb(), so there the spin_unlock() is a full
        barrier and we're good.
      
      * ARM64 has a very heavy barrier in switch_to(), which suffices.
      
      * PPC just removed its barrier from switch_to(), but appears to be
        talking about adding something to switch_mm(). So add a
        smp_mb__after_unlock_lock() for now, until this is settled on the PPC
        side.
      
      Changes since v3:
      - Properly document the memory barriers provided by each architecture.
      
      Changes since v2:
      - Address comments from Peter Zijlstra,
      - Add smp_mb__after_unlock_lock() after finish_lock_switch() in
        finish_task_switch() to add the memory barrier we need after storing
        to rq->curr. This is much simpler than the previous approach relying
        on atomic_dec_and_test() in mmdrop(), which actually added a memory
        barrier in the common case of switching between userspace processes.
      - Return -EINVAL when MEMBARRIER_CMD_SHARED is used on a nohz_full
        kernel, rather than having the whole membarrier system call return
        -ENOSYS. Indeed, CMD_PRIVATE_EXPEDITED is compatible with nohz_full.
        Adapt the CMD_QUERY mask accordingly.
      
      Changes since v1:
      - move membarrier code under kernel/sched/ because it uses the
        scheduler runqueue,
      - only add the barrier when we switch from a kernel thread. The case
        where we switch from a user-space thread is already handled by
        the atomic_dec_and_test() in mmdrop().
      - add a comment to mmdrop() documenting the requirement on the implicit
        memory barrier.
      
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Boqun Feng <boqun.feng@gmail.com>
      CC: Andrew Hunter <ahh@google.com>
      CC: Maged Michael <maged.michael@gmail.com>
      CC: gromer@google.com
      CC: Avi Kivity <avi@scylladb.com>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Dave Watson <davejwatson@fb.com>
      22e4ebb9
  15. 16 Aug 2017, 1 commit
    • MAINTAINERS: Update entries for notification subsystem · 67427715
      Authored by Jan Kara
      The entries for inotify, dnotify, and fanotify in MAINTAINERS are stale.
      Neither Eric nor Robert nor John cares about these subsystems anymore.
      These days it is mostly me or Amir who takes care of bugs in them.  So
      update MAINTAINERS to reflect the current state.
      
      CC: Alexander Viro <viro@zeniv.linux.org.uk>
      CC: Eric Paris <eparis@redhat.com>
      CC: Robert Love <rlove@rlove.org>
      CC: John McCutchan <john@johnmccutchan.com>
      CC: Amir Goldstein <amir73il@gmail.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      67427715
  16. 14 Aug 2017, 1 commit
  17. 12 Aug 2017, 1 commit
  18. 11 Aug 2017, 3 commits
  19. 09 Aug 2017, 2 commits