1. 02 May 2017, 1 commit
  2. 27 Apr 2017, 5 commits
  3. 21 Apr 2017, 1 commit
    • kvm: better MWAIT emulation for guests · 668fffa3
      Committed by Michael S. Tsirkin
      Guests that are heavy on futexes end up IPI'ing each other a lot. That
      can lead to significant slowdowns and latency increase for those guests
      when running within KVM.
      
      If only a single guest is needed on a host, we have a lot of spare host
      CPU time we can throw at the problem. Modern CPUs implement a feature
      called "MWAIT" which allows guests to wake up sleeping remote CPUs without
      an IPI - thus without an exit - at the expense of never going out of guest
      context.
      
      The decision whether this is something sensible to use should be up to the
      VM admin, i.e. to user space. We can, however, allow MWAIT execution on
      systems that properly support it in hardware.
      
      This patch adds a CAP to user space and a KVM cpuid leaf to indicate
      availability of native MWAIT execution. With that enabled, the worst a
      guest can do is waste as many cycles as a "jmp ." would do, so it's not
      a privilege problem.
      
      We consciously do *not* expose the feature in our CPUID bitmap, as most
      people will want to benefit from sleeping vCPUs to allow for overcommit.
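      Because the bit is left out of KVM's CPUID bitmap, exposing MWAIT is an
      explicit user-space decision. The sketch below is a minimal, hypothetical
      illustration of that opt-in from a VMM's point of view: the helper name
      guest_cpuid1_ecx() and its boolean inputs are assumptions, not KVM or QEMU
      API. The one hardware fact it relies on is that CPUID leaf 1 reports
      MONITOR/MWAIT support in ECX bit 3.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 3 advertises MONITOR/MWAIT. */
#define CPUID_1_ECX_MWAIT (UINT32_C(1) << 3)

/*
 * Hypothetical VMM helper: compute the ECX value of CPUID leaf 1 that the
 * guest will see. 'kernel_allows_mwait' stands for the new KVM capability
 * being reported; 'admin_opted_in' stands for the VM admin's policy choice.
 */
static uint32_t guest_cpuid1_ecx(uint32_t ecx_from_kvm,
                                 bool kernel_allows_mwait,
                                 bool admin_opted_in)
{
    uint32_t ecx = ecx_from_kvm & ~CPUID_1_ECX_MWAIT; /* default: hide MWAIT */

    if (kernel_allows_mwait && admin_opted_in)
        ecx |= CPUID_1_ECX_MWAIT;                     /* expose native MWAIT */
    return ecx;
}
```

      Keeping the default "off" matches the rationale above: hosts that
      overcommit vCPUs keep the benefit of sleeping vCPUs unless the admin
      deliberately trades that away.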
      Reported-by: "Gabriel L. Somlo" <gsomlo@gmail.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      [agraf: fix amd, change commit message]
      Signed-off-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 20 Apr 2017, 2 commits
    • KVM: PPC: VFIO: Add in-kernel acceleration for VFIO · 121f80ba
      Committed by Alexey Kardashevskiy
      This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
      and H_STUFF_TCE requests targeted at an IOMMU TCE table used for VFIO
      without passing them to user space, which saves the time spent switching
      to user space and back.
      
      This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
      KVM tries to handle a TCE request in real mode; if that fails, it
      passes the request to virtual mode to complete the operation. If the
      virtual mode handler fails, the request is passed to user space; this
      is not expected to happen, though.
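      The escalation order described above (real mode, then virtual mode, then
      user space) can be sketched as a small dispatcher. This is a hypothetical
      model, not the KVM code: the enums, the handler type and
      handle_h_put_tce() are invented for illustration, and TCE_TOO_HARD only
      stands in for the "cannot finish here" result (in KVM PPC that role is
      played by H_TOO_HARD).

```c
/* Stand-in results for this sketch (the kernel uses hcall return codes). */
enum tce_result { TCE_DONE, TCE_TOO_HARD };

/* Where the request was finally completed. */
enum tce_level { LEVEL_REAL_MODE, LEVEL_VIRTUAL_MODE, LEVEL_USER_SPACE };

typedef enum tce_result (*tce_handler)(unsigned long ioba, unsigned long tce);

/* Example handlers used to exercise the dispatcher. */
static enum tce_result always_done(unsigned long ioba, unsigned long tce)
{
    (void)ioba; (void)tce;
    return TCE_DONE;
}

static enum tce_result always_too_hard(unsigned long ioba, unsigned long tce)
{
    (void)ioba; (void)tce;
    return TCE_TOO_HARD;
}

/*
 * Try the cheap real-mode handler first, fall back to the virtual-mode
 * handler, and only exit to user space if both give up.
 */
static enum tce_level handle_h_put_tce(tce_handler real_mode,
                                       tce_handler virt_mode,
                                       unsigned long ioba, unsigned long tce)
{
    if (real_mode(ioba, tce) == TCE_DONE)
        return LEVEL_REAL_MODE;
    if (virt_mode(ioba, tce) == TCE_DONE)
        return LEVEL_VIRTUAL_MODE;
    return LEVEL_USER_SPACE;  /* not expected to happen in practice */
}
```

      Each level that succeeds avoids the more expensive switch behind it,
      which is where the throughput gain reported below comes from.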
      
      To avoid dealing with page use counters (which is tricky in real mode),
      this only accelerates SPAPR TCE IOMMU v2 clients which are required
      to pre-register the userspace memory. The very first TCE request will
      be handled in the VFIO SPAPR TCE driver anyway as the userspace view
      of the TCE table (iommu_table::it_userspace) is not allocated till
      the very first mapping happens and we cannot call vmalloc in real mode.
      
      If we fail to update a hardware IOMMU table for an unexpected reason, we
      just clear it and move on, as there is nothing we can really do about
      it. For example, if we hot plug a VFIO device to a guest, existing TCE
      tables will be mirrored automatically to the hardware and there is no
      interface to report possible failures to the guest.
      
      This adds a new attribute, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE, to
      the VFIO KVM device. It takes a VFIO group fd and an SPAPR TCE table fd
      and associates a physical IOMMU table with the SPAPR TCE table (which
      is a guest view of the hardware IOMMU table). The iommu_table object
      is cached and referenced so we do not have to look it up in real mode.
      
      This does not implement the UNSET counterpart as there is no use for it:
      once the acceleration is enabled, the existing userspace won't
      disable it unless a VFIO container is destroyed; this adds the necessary
      cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.
      
      This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to user
      space.
      
      This adds a real mode version of WARN_ON_ONCE() as the generic version
      causes problems with rcu_sched. Since we are testing what vmalloc_to_phys()
      returns in the code, this also adds a check for the already existing
      vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().
      
      This finally makes use of vfio_external_user_iommu_id() which was
      introduced quite some time ago and was considered for removal.
      
      Tests show that this patch increases transmission speed from 220MB/s
      to 750-1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb Ethernet card).
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number · 4898d3f4
      Committed by Alexey Kardashevskiy
      This adds a capability number for in-kernel support for VFIO on
      the SPAPR platform.
      
      The capability will tell user space whether in-kernel handlers of
      H_PUT_TCE can handle VFIO-targeted requests or not. If not, user space
      must not attempt to allocate a TCE table in the host kernel via
      the KVM_CREATE_SPAPR_TCE KVM ioctl, because TCE requests would then be
      handled in the kernel instead of being passed to user space, and passing
      them to user space is the desired action in that situation.
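      The rule above reduces to a small decision that user space makes per TCE
      table. A minimal sketch, assuming the VMM already knows whether the table
      backs a VFIO device and whether KVM_CAP_SPAPR_TCE_VFIO was reported; the
      function name is hypothetical and the capability probe itself is left out.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space policy: may this TCE table be allocated in the
 * host kernel via KVM_CREATE_SPAPR_TCE?
 */
static bool may_create_in_kernel_tce_table(bool table_backs_vfio_device,
                                           bool has_cap_spapr_tce_vfio)
{
    if (!table_backs_vfio_device)
        return true;               /* non-VFIO tables: not affected by this cap */
    return has_cap_spapr_tce_vfio; /* VFIO tables need in-kernel VFIO support */
}
```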
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  5. 13 Apr 2017, 1 commit
  6. 09 Apr 2017, 6 commits
  7. 07 Apr 2017, 3 commits
  8. 28 Mar 2017, 2 commits
    • KVM: MIPS: Add 64BIT capability · 578fd61d
      Committed by James Hogan
      Add a new KVM_CAP_MIPS_64BIT capability to indicate that 64-bit MIPS
      guests are available and supported. In this case it should still be
      possible to run 32-bit guest code. If it is not available, it won't be
      possible to run 64-bit guest code, and the instructions may not be
      available or the kernel may not support full context switching of 64-bit
      registers.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
    • KVM: MIPS: Add VZ & TE capabilities · a8a3c426
      Committed by James Hogan
      Add new KVM_CAP_MIPS_VZ and KVM_CAP_MIPS_TE capabilities, and in order
      to allow MIPS KVM to support VZ without confusing old users (which
      expect the trap & emulate implementation), define and start checking
      KVM_CREATE_VM type codes.
      
      The codes available are:
      
       - KVM_VM_MIPS_TE = 0
      
         This is the current value expected from the user, and will create a
         VM using trap & emulate in user mode, confined to the user mode
         address space. This may in future become unavailable if the kernel is
         only configured to support VZ, in which case the EINVAL error will be
         returned and KVM_CAP_MIPS_TE won't be available even though
         KVM_CAP_MIPS_VZ is.
      
       - KVM_VM_MIPS_VZ = 1
      
         This can be provided when the KVM_CAP_MIPS_VZ capability is available
         to create a VM using VZ, with a fully virtualized guest virtual
         address space. If VZ support is unavailable in the kernel, the EINVAL
         error will be returned (although old kernels without the
         KVM_CAP_MIPS_VZ capability may well succeed and create a trap &
         emulate VM).
      
      This is designed to allow the desired implementation (T&E vs VZ) to be
      potentially chosen at runtime rather than being fixed in the kernel
      configuration.
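      The type codes and error behaviour listed above can be modelled as a
      single check. This is a hypothetical sketch, not the MIPS KVM source:
      mips_check_vm_type() and its two "kernel supports" flags are invented for
      illustration, the two constants use the values given above, and rejecting
      any other type code is an assumption consistent with "start checking
      KVM_CREATE_VM type codes".

```c
#include <errno.h>
#include <stdbool.h>

/* Values as listed in the commit message. */
#define KVM_VM_MIPS_TE 0
#define KVM_VM_MIPS_VZ 1

/*
 * Hypothetical model of the KVM_CREATE_VM type check on a new kernel:
 * returns 0 if the requested implementation can be created, -EINVAL otherwise.
 */
static int mips_check_vm_type(unsigned long type, bool kernel_has_te,
                              bool kernel_has_vz)
{
    switch (type) {
    case KVM_VM_MIPS_TE:
        return kernel_has_te ? 0 : -EINVAL;  /* e.g. VZ-only kernel */
    case KVM_VM_MIPS_VZ:
        return kernel_has_vz ? 0 : -EINVAL;
    default:
        return -EINVAL;                      /* unknown type code */
    }
}
```

      Old user space keeps passing 0 and therefore keeps getting trap &
      emulate, which is what lets VZ be added without confusing it.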
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
  9. 24 Mar 2017, 1 commit
  10. 23 Mar 2017, 1 commit
  11. 22 Mar 2017, 1 commit
    • s390: add a system call for guarded storage · 916cda1a
      Committed by Martin Schwidefsky
      This adds a new system call to enable the use of guarded storage for
      user space processes. The system call takes two arguments, a command
      and a pointer to a guarded storage control block:
      
          s390_guarded_storage(int command, struct gs_cb *gs_cb);
      
      The second argument is relevant only for the GS_SET_BC_CB command.
      
      The commands in detail:
      
      0 - GS_ENABLE
          Enable the guarded storage facility for the current task. The
          initial content of the guarded storage control block will be
          all zeros. After enablement, user space code can use the
          load-guarded-storage-controls instruction (LGSC) to load an
          arbitrary control block. While a task is enabled, the kernel
          will save and restore the current content of the guarded
          storage registers on context switch.
      1 - GS_DISABLE
          Disables the use of the guarded storage facility for the current
          task. The kernel will cease to save and restore the content of
          the guarded storage registers; the task-specific content of
          these registers is lost.
      2 - GS_SET_BC_CB
          Set a broadcast guarded storage control block. This is called
          per thread and stores a specific guarded storage control block
          in the task struct of the current task. This control block will
          be used for the broadcast event GS_BROADCAST.
      3 - GS_CLEAR_BC_CB
          Clears the broadcast guarded storage control block. The guarded-
          storage control block is removed from the task struct that was
          established by GS_SET_BC_CB.
      4 - GS_BROADCAST
          Sends a broadcast to all thread siblings of the current task.
          Every sibling that has established a broadcast guarded storage
          control block will load this control block and will be enabled
          for guarded storage. The broadcast guarded storage control block
          is used up: a second broadcast without a refresh of the stored
          control block with GS_SET_BC_CB will not have any effect.
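      The command semantics above behave like a small per-task state machine.
      The sketch below is a hypothetical user-level simulation, not the kernel
      implementation: struct task_model, struct gs_cb_model, gs_cmd() and
      gs_broadcast() are invented for illustration, and only the command numbers
      come from the list above. One assumption goes beyond the text: repeating
      GS_ENABLE on an already enabled task is treated as a no-op.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Command numbers as listed in the commit message. */
enum { GS_ENABLE, GS_DISABLE, GS_SET_BC_CB, GS_CLEAR_BC_CB, GS_BROADCAST };

/* Stand-in for struct gs_cb; the payload layout here is arbitrary. */
struct gs_cb_model { unsigned long words[4]; };

/* Hypothetical per-task state; not the kernel's task_struct fields. */
struct task_model {
    bool gs_enabled;          /* kernel saves/restores GS registers */
    struct gs_cb_model gs;    /* live guarded storage controls */
    bool has_bc;              /* broadcast block stored? */
    struct gs_cb_model bc;    /* block stored by GS_SET_BC_CB */
};

/* Per-task commands. GS_BROADCAST is modelled by gs_broadcast() below. */
static int gs_cmd(struct task_model *t, int cmd, const struct gs_cb_model *cb)
{
    switch (cmd) {
    case GS_ENABLE:
        if (!t->gs_enabled) {                 /* assumption: re-enable is a no-op */
            memset(&t->gs, 0, sizeof(t->gs)); /* initial content: all zeros */
            t->gs_enabled = true;
        }
        return 0;
    case GS_DISABLE:
        t->gs_enabled = false;
        memset(&t->gs, 0, sizeof(t->gs));     /* register content is lost */
        return 0;
    case GS_SET_BC_CB:
        if (cb == NULL)
            return -EINVAL;
        t->bc = *cb;
        t->has_bc = true;
        return 0;
    case GS_CLEAR_BC_CB:
        t->has_bc = false;
        return 0;
    default:
        return -EINVAL;
    }
}

/* Deliver a broadcast to the sibling threads of the sender. */
static void gs_broadcast(struct task_model *siblings, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!siblings[i].has_bc)
            continue;                         /* nothing stored: no effect */
        siblings[i].gs = siblings[i].bc;      /* load the stored block */
        siblings[i].gs_enabled = true;
        siblings[i].has_bc = false;           /* used up until set again */
    }
}
```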
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  12. 20 Mar 2017, 2 commits
    • x86/arch_prctl: Add ARCH_[GET|SET]_CPUID · e9ea1e7f
      Committed by Kyle Huey
      Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
      When enabled, the processor will fault on attempts to execute the CPUID
      instruction with CPL>0. Exposing this feature to userspace will allow a
      ptracer to trap and emulate the CPUID instruction.
      
      When supported, this feature is controlled by toggling bit 0 of
      MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
      https://bugzilla.kernel.org/attachment.cgi?id=243991
      
      Implement a new pair of arch_prctls, available on both x86-32 and x86-64.
      
      ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting
          is enabled (and thus the CPUID instruction is not available) or 1 if
          CPUID faulting is not enabled.
      
      ARCH_SET_CPUID: Set the CPUID state to the second argument. If
          cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will
          be deactivated. Returns ENODEV if CPUID faulting is not supported on
          this system.
      
      The state of the CPUID faulting flag is propagated across forks, but reset
      upon exec.
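      The per-task semantics of the two arch_prctls can be summarised in a small
      model. This is a hypothetical sketch, not the x86 kernel code (which keeps
      a thread flag and programs the MSR on context switch): every model_*
      function and struct task_cpuid_state are invented names. Only the return
      values, the bit-0 MSR control, the ENODEV case and the fork/exec rules
      come from the text above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of MSR_MISC_FEATURES_ENABLES turns CPUID faulting on. */
#define CPUID_FAULT_ENABLE_BIT (UINT64_C(1) << 0)

/* Hypothetical per-task view of the setting. */
struct task_cpuid_state {
    bool cpuid_faulting;  /* true: CPUID at CPL>0 faults */
};

/* Model of ARCH_GET_CPUID: 1 if CPUID is usable, 0 if it faults. */
static int model_arch_get_cpuid(const struct task_cpuid_state *t)
{
    return t->cpuid_faulting ? 0 : 1;
}

/* Model of ARCH_SET_CPUID: cpuid_enabled == 0 turns faulting on. */
static int model_arch_set_cpuid(struct task_cpuid_state *t,
                                unsigned long cpuid_enabled,
                                bool cpu_supports_faulting)
{
    if (!cpu_supports_faulting)
        return -ENODEV;
    t->cpuid_faulting = (cpuid_enabled == 0);
    return 0;
}

/* MSR value to program when this task runs: only bit 0 is touched. */
static uint64_t model_msr_value(uint64_t msr, const struct task_cpuid_state *t)
{
    return t->cpuid_faulting ? (msr | CPUID_FAULT_ENABLE_BIT)
                             : (msr & ~CPUID_FAULT_ENABLE_BIT);
}

/* fork() propagates the setting; exec() resets it. */
static struct task_cpuid_state model_fork(const struct task_cpuid_state *p)
{
    return *p;
}

static void model_exec(struct task_cpuid_state *t)
{
    t->cpuid_faulting = false;
}
```

      A ptracer would typically enable faulting in the tracee and then emulate
      each trapped CPUID, as described at the top of this message.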
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: linux-kselftest@vger.kernel.org
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Robert O'Callahan <robert@ocallahan.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: user-mode-linux-user@lists.sourceforge.net
      Cc: David Matlack <dmatlack@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/syscalls/32: Wire up arch_prctl on x86-32 · 79170fda
      Committed by Kyle Huey
      Hook up arch_prctl to call do_arch_prctl() on x86-32, and in 32-bit compat
      mode on x86-64. This makes it possible to have arch_prctls that are not
      specific to 64 bits.
      
      On UML, simply stub out this syscall.
      Signed-off-by: Kyle Huey <khuey@kylehuey.com>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: linux-kselftest@vger.kernel.org
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Robert O'Callahan <robert@ocallahan.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: user-mode-linux-user@lists.sourceforge.net
      Cc: David Matlack <dmatlack@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: http://lkml.kernel.org/r/20170320081628.18952-7-khuey@kylehuey.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  13. 19 Mar 2017, 2 commits
    • target: fix ALUA transition timeout handling · d7175373
      Committed by Mike Christie
      The implicit transition time tells initiators the minimum time
      to wait before timing out a transition. We currently schedule
      the transition to occur in tg_pt_gp_implicit_trans_secs
      seconds, so there is no room for delays. If
      core_alua_do_transition_tg_pt_work->core_alua_update_tpg_primary_metadata
      needs to write out info to a remote file, then the initiator can
      easily time out the operation.
      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • target: allow ALUA setup for some passthrough backends · 530c6891
      Committed by Mike Christie
      This patch allows passthrough backends to use the core/base LIO
      ALUA setup and state checks, but still handle the execution of
      commands.
      
      This will allow the target_core_user module to execute STPG and RTPG
      in userspace, and not have to duplicate the ALUA state checks, path
      information (needed so we can check if a command is executable on
      specific paths) and setup (rtslib sets/updates the configfs ALUA
      interface like it does for iblock or file).
      
      For STPG, the target_core_user userspace daemon, tcmu-runner, will
      still execute the STPG, and to update the core/base LIO state it
      will use the existing configfs interface. For RTPG, tcmu-runner
      will loop over configfs and/or cache the state.
      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
  14. 17 Mar 2017, 2 commits
  15. 14 Mar 2017, 1 commit
  16. 13 Mar 2017, 2 commits
    • uapi: fix drm/omap_drm.h userspace compilation errors · 337ba7fb
      Committed by Dmitry V. Levin
      Consistently use types from linux/types.h like in other uapi drm/*_drm.h
      header files to fix the following drm/omap_drm.h userspace compilation
      errors:
      
      /usr/include/drm/omap_drm.h:36:2: error: unknown type name 'uint64_t'
        uint64_t param;   /* in */
      /usr/include/drm/omap_drm.h:37:2: error: unknown type name 'uint64_t'
        uint64_t value;   /* in (set_param), out (get_param) */
      /usr/include/drm/omap_drm.h:56:2: error: unknown type name 'uint32_t'
        uint32_t bytes;  /* (for non-tiled formats) */
      /usr/include/drm/omap_drm.h:58:3: error: unknown type name 'uint16_t'
         uint16_t width;
      /usr/include/drm/omap_drm.h:59:3: error: unknown type name 'uint16_t'
         uint16_t height;
      /usr/include/drm/omap_drm.h:65:2: error: unknown type name 'uint32_t'
        uint32_t flags;   /* in */
      /usr/include/drm/omap_drm.h:66:2: error: unknown type name 'uint32_t'
        uint32_t handle;  /* out */
      /usr/include/drm/omap_drm.h:67:2: error: unknown type name 'uint32_t'
        uint32_t __pad;
      /usr/include/drm/omap_drm.h:77:2: error: unknown type name 'uint32_t'
        uint32_t handle;  /* buffer handle (in) */
      /usr/include/drm/omap_drm.h:78:2: error: unknown type name 'uint32_t'
        uint32_t op;   /* mask of omap_gem_op (in) */
      /usr/include/drm/omap_drm.h:82:2: error: unknown type name 'uint32_t'
        uint32_t handle;  /* buffer handle (in) */
      /usr/include/drm/omap_drm.h:83:2: error: unknown type name 'uint32_t'
        uint32_t op;   /* mask of omap_gem_op (in) */
      /usr/include/drm/omap_drm.h:88:2: error: unknown type name 'uint32_t'
        uint32_t nregions;
      /usr/include/drm/omap_drm.h:89:2: error: unknown type name 'uint32_t'
        uint32_t __pad;
      /usr/include/drm/omap_drm.h:93:2: error: unknown type name 'uint32_t'
        uint32_t handle;  /* buffer handle (in) */
      /usr/include/drm/omap_drm.h:94:2: error: unknown type name 'uint32_t'
        uint32_t pad;
      /usr/include/drm/omap_drm.h:95:2: error: unknown type name 'uint64_t'
        uint64_t offset;  /* mmap offset (out) */
      /usr/include/drm/omap_drm.h:102:2: error: unknown type name 'uint32_t'
        uint32_t size;   /* virtual size for mmap'ing (out) */
      /usr/include/drm/omap_drm.h:103:2: error: unknown type name 'uint32_t'
        uint32_t __pad;
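      The errors occur because uint64_t and friends are only declared if the
      including program has already pulled in <stdint.h>, whereas the other uapi
      drm headers rely on the __u* types from <linux/types.h>. A minimal sketch
      of that convention, using a hypothetical struct modelled on the param/value
      pair reported at omap_drm.h:36-37 (it is not a copy of the patched header):

```c
#include <linux/types.h>  /* __u64: available to user space without <stdint.h> */

/* Hypothetical struct in uapi style, modelled on lines 36-37 above. */
struct example_omap_param {
    __u64 param;  /* in */
    __u64 value;  /* in (set_param), out (get_param) */
};
```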
      
      Fixes: ef6503e8 ("drm: Kbuild: add omap_drm.h to the installed headers")
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
    • bpf: improve read-only handling · 65869a47
      Committed by Daniel Borkmann
      Improve bpf_{prog,jit_binary}_{un,}lock_ro() by throwing a
      one-time warning in case of an error when the image couldn't
      be set read-only, and also mark struct bpf_prog as locked when
      bpf_prog_lock_ro() was called.
      
      The reason for the latter is that bpf_prog_unlock_ro() is called from
      various places, including error paths, and we shouldn't mess with
      page attributes when it is not really needed.
      
      For bpf_jit_binary_unlock_ro() this is not needed, as the jited flag
      implicitly indicates this; thus for archs with ARCH_HAS_SET_MEMORY
      we're guaranteed to have a previously locked image. Overall, this
      should also help us identify any further potential issues with
      set_memory_*() helpers.
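      The "locked" marker turns unlock into a no-op for programs that were never
      locked, for example on an early error path. A minimal sketch of that idea,
      assuming a hypothetical struct prog_model in place of struct bpf_prog and
      a counter in place of the real set_memory_ro()/set_memory_rw() calls; the
      one-time warning on failure is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct bpf_prog: only what the sketch needs. */
struct prog_model {
    bool locked;       /* set once prog_lock_ro() has been called */
    int attr_changes;  /* counts simulated page-attribute changes */
};

static void prog_lock_ro(struct prog_model *p)
{
    p->locked = true;
    p->attr_changes++;  /* stands for set_memory_ro() */
}

static void prog_unlock_ro(struct prog_model *p)
{
    if (!p->locked)     /* never locked (e.g. error path): leave pages alone */
        return;
    p->attr_changes++;  /* stands for set_memory_rw() */
}
```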
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 11 Mar 2017, 3 commits
  18. 10 Mar 2017, 4 commits
    • net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      Committed by David Howells
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set is
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_rcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
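      Points (1) and (2) amount to choosing the lock class from one of two
      per-family key tables, based on a flag stored at allocation time. The
      sketch below is a hypothetical model of that selection, not net/core code:
      MODEL_AF_MAX, the *_model types and functions are invented, and only the
      idea of two key sets, sk_kern_sock and clone inheritance comes from the
      text. Lockdep identifies a class by the key's address, so two distinct
      table entries mean two distinct classes.

```c
#include <stdbool.h>

#define MODEL_AF_MAX 16  /* stand-in for AF_MAX in this sketch */

/* Stand-in for struct lock_class_key: only its address matters. */
struct lock_key_model { char unused; };

static struct lock_key_model user_keys[MODEL_AF_MAX]; /* user-created sockets */
static struct lock_key_model kern_keys[MODEL_AF_MAX]; /* kernel-created sockets */

struct sock_model {
    int family;
    bool kern_sock;  /* mirrors sk_kern_sock, from sk_alloc()'s 'kern' */
};

static struct sock_model sock_alloc_model(int family, bool kern)
{
    struct sock_model sk = { family, kern };
    return sk;
}

/* Pick the lock class key for this socket. */
static const struct lock_key_model *sock_lock_key(const struct sock_model *sk)
{
    return sk->kern_sock ? &kern_keys[sk->family] : &user_keys[sk->family];
}

/* A clone inherits the parent's kern setting, hence its key set. */
static struct sock_model sock_clone_model(const struct sock_model *parent)
{
    return *parent;
}
```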
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED · 70ccb92f
      Committed by Andrea Arcangeli
      userfaultfd_remove() has to be executed before zapping the pagetables, or
      UFFDIO_COPY could keep filling pages after zap_page_range returned,
      which would result in non-zero data after a MADV_DONTNEED.
      
      However userfaultfd_remove() may have to release the mmap_sem.  This was
      handled correctly in MADV_REMOVE, but MADV_DONTNEED accessed a
      potentially stale vma (the very vma passed to zap_page_range(vma, ...)).
      
      The fix consists of revalidating the vma in case userfaultfd_remove()
      had to release the mmap_sem.
      
      This also optimizes away an unnecessary down_read/up_read in the
      MADV_REMOVE case if UFFD_EVENT_FORK had to be delivered.
      
      All of this has zero runtime cost when CONFIG_USERFAULTFD=n, as
      userfaultfd_remove() will be defined as "true" at build time.
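      The revalidation pattern can be illustrated on a toy address space. This
      is a hypothetical model, not mm/madvise.c: the *_model types and
      functions are invented, uffd_remove_model() simulates a concurrent unmap
      while the lock is dropped, and returning -ENOMEM when the range is no
      longer mapped (or clamping the end to what remains) is an assumption of
      the sketch. The point it shows is that the old vma pointer is never used
      after the lock may have been released.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified address space: a small table of VMAs. */
struct vma_model { unsigned long start, end; bool valid; };
struct mm_model  { struct vma_model vmas[4]; size_t n; };

/* Like find_vma(): first valid VMA with end > addr, or NULL. */
static struct vma_model *find_vma_model(struct mm_model *mm, unsigned long addr)
{
    for (size_t i = 0; i < mm->n; i++)
        if (mm->vmas[i].valid && addr < mm->vmas[i].end)
            return &mm->vmas[i];
    return NULL;
}

/* Stand-in for userfaultfd_remove(): false means the lock was dropped.
 * 'concurrent_unmap' simulates another thread unmapping the first VMA. */
static bool uffd_remove_model(struct mm_model *mm, bool concurrent_unmap)
{
    if (!concurrent_unmap)
        return true;
    mm->vmas[0].valid = false;
    return false;
}

/* Returns the number of bytes that would be zapped, or -ENOMEM. */
static long dontneed_model(struct mm_model *mm, struct vma_model *vma,
                           unsigned long start, unsigned long end,
                           bool concurrent_unmap)
{
    if (!uffd_remove_model(mm, concurrent_unmap)) {
        vma = find_vma_model(mm, start);   /* old pointer may be stale */
        if (vma == NULL || start < vma->start)
            return -ENOMEM;                /* range no longer mapped */
        if (end > vma->end)
            end = vma->end;                /* clamp to what is left */
    }
    /* zap_page_range(vma, start, end - start) would run here */
    return (long)(end - start);
}
```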
      
      Link: http://lkml.kernel.org/r/20170302173738.18994-3-aarcange@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/vmstats: add thp_split_pud event for clarity · ce9311cf
      Committed by Yisheng Xie
      We added support for PUD-sized transparent hugepages; however, we count
      the "thp split pud" event as part of the thp_split_pmd event.
      
      To separate the event count of thp split pud from pmd, add a new event
      named thp_split_pud.
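      Once the event is counted separately, a monitoring tool can read it from
      /proc/vmstat, where each line is a counter name followed by its value. A
      minimal sketch of that lookup; vmstat_lookup() is a hypothetical helper,
      and it takes the file contents as a string so it can be exercised without
      a live system.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in text formatted like /proc/vmstat
 * ("name value" per line). Returns true and stores the value if found.
 */
static bool vmstat_lookup(const char *text, const char *name,
                          unsigned long long *value)
{
    size_t len = strlen(name);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Require a space after the name so prefixes do not match. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return sscanf(line + len, "%llu", value) == 1;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;  /* skip to the start of the next line */
    }
    return false;
}
```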
      
      Link: http://lkml.kernel.org/r/1488282380-5076-1-git-send-email-xieyisheng1@huawei.com
      Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/fs.h: fix unsigned enum warning with gcc-4.2 · cbfd0c10
      Committed by Arnd Bergmann
      With arm-linux-gcc-4.2, almost every file we build in the kernel ends up
      with this warning:
      
        include/linux/fs.h:2648: warning: comparison of unsigned expression < 0 is always false
      
      Later versions don't have this problem, but it's easy enough to work
      around.
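      A common way around this class of warning is to fold the lower-bound
      check into one unsigned comparison: a negative value cast to unsigned
      becomes very large and fails the upper-bound test. The sketch below shows
      that general pattern on a hypothetical enum and string table; it is not
      the actual fs.h diff, which this log does not include.

```c
/* Hypothetical enum and string table in the style of fs.h helpers. */
enum read_id { READ_UNKNOWN, READ_FIRMWARE, READ_MODULE, READ_MAX_ID };

static const char *const read_id_str[READ_MAX_ID] = {
    "unknown", "firmware", "module",
};

static const char *read_id_name(enum read_id id)
{
    /*
     * "id < 0 || id >= READ_MAX_ID" can warn when the compiler gives this
     * enum an unsigned type. The cast makes one comparison cover both bounds.
     */
    if ((unsigned int)id >= READ_MAX_ID)
        return read_id_str[READ_UNKNOWN];
    return read_id_str[id];
}
```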
      
      Link: http://lkml.kernel.org/r/20161216105634.235457-12-arnd@arndb.de
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>