1. 01 10月, 2019 5 次提交
    • I
      perf llvm: Don't access out-of-scope array · 7d4c85b7
      Ian Rogers 提交于
      The 'test_dir' variable is assigned to the 'release' array which is
      out-of-scope 3 lines later.
      
      Extend the scope of the 'release' array so that an out-of-scope array
      isn't accessed.
      
      Bug detected by clang's address sanitizer.
      
      Fixes: 07bc5c69 ("perf tools: Make fetch_kernel_version() publicly available")
      Cc: stable@vger.kernel.org # v4.4+
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lore.kernel.org/lkml/20190926220018.25402-1-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7d4c85b7
    • A
      tools headers kvm: Sync kvm headers with the kernel sources · b7ad6108
      Arnaldo Carvalho de Melo 提交于
      To pick the changes in:
      
        200824f5 ("KVM: s390: Disallow invalid bits in kvm_valid_regs and kvm_dirty_regs")
        4a53d99d ("KVM: VMX: Introduce exit reason for receiving INIT signal on guest-mode")
        7396d337 ("KVM: x86: Return to userspace with internal error on unexpected exit reason")
        92f35b75 ("KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE")
      
      None of them trigger any changes in tooling, this time this is just to silence
      these perf build warnings:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
        diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/vmx.h' differs from latest version at 'arch/x86/include/uapi/asm/vmx.h'
        diff -u tools/arch/x86/include/uapi/asm/vmx.h arch/x86/include/uapi/asm/vmx.h
        Warning: Kernel ABI header at 'tools/arch/s390/include/uapi/asm/kvm.h' differs from latest version at 'arch/s390/include/uapi/asm/kvm.h'
        diff -u tools/arch/s390/include/uapi/asm/kvm.h arch/s390/include/uapi/asm/kvm.h
        Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h'
        diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h
        Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
        diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Janosch Frank <frankja@linux.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Huth <thuth@redhat.com>
      Link: https://lkml.kernel.org/n/tip-akuugvvjxte26kzv23zp5d2z@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b7ad6108
    • A
      tools headers uapi: Sync linux/fs.h with the kernel sources · 0ae40612
      Arnaldo Carvalho de Melo 提交于
      To pick the changes from:
      
        78a1b96b ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY_ALL_USERS ioctl")
        23c688b5 ("fscrypt: allow unprivileged users to add/remove keys for v2 policies")
        5dae460c ("fscrypt: v2 encryption policy support")
        5a7e2992 ("fscrypt: add FS_IOC_GET_ENCRYPTION_KEY_STATUS ioctl")
        b1c0ec35 ("fscrypt: add FS_IOC_REMOVE_ENCRYPTION_KEY ioctl")
        22d94f49 ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl")
        3b6df59b ("fscrypt: use FSCRYPT_* definitions, not FS_*")
        2336d0de ("fscrypt: use FSCRYPT_ prefix for uapi constants")
        7af0ab0d ("fs, fscrypt: move uapi definitions to new header <linux/fscrypt.h>")
      
      That don't trigger any changes in tooling, as it so far is used only
      for:
      
        $ grep -l 'fs\.h' tools/perf/trace/beauty/*.sh | xargs grep regex=
        tools/perf/trace/beauty/rename_flags.sh:regex='^[[:space:]]*#[[:space:]]*define[[:space:]]+RENAME_([[:alnum:]_]+)[[:space:]]+\(1[[:space:]]*<<[[:space:]]*([[:xdigit:]]+)[[:space:]]*\)[[:space:]]*.*'
        tools/perf/trace/beauty/sync_file_range.sh:regex='^[[:space:]]*#[[:space:]]*define[[:space:]]+SYNC_FILE_RANGE_([[:alnum:]_]+)[[:space:]]+([[:xdigit:]]+)[[:space:]]*.*'
        tools/perf/trace/beauty/usbdevfs_ioctl.sh:regex="^#[[:space:]]*define[[:space:]]+USBDEVFS_(\w+)(\(\w+\))?[[:space:]]+_IO[CWR]{0,2}\([[:space:]]*(_IOC_\w+,[[:space:]]*)?'U'[[:space:]]*,[[:space:]]*([[:digit:]]+).*"
        tools/perf/trace/beauty/usbdevfs_ioctl.sh:regex="^#[[:space:]]*define[[:space:]]+USBDEVFS_(\w+)[[:space:]]+_IO[WR]{0,2}\([[:space:]]*'U'[[:space:]]*,[[:space:]]*([[:digit:]]+).*"
        $
      
      This silences this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
        diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-44g48exl9br9ba0t64chqb4i@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0ae40612
    • A
      tools headers uapi: Sync linux/usbdevice_fs.h with the kernel sources · 05f371f8
      Arnaldo Carvalho de Melo 提交于
      To pick up the changes from:
      
        4ed33505 ("USB: usbfs: Add a capability flag for runtime suspend")
        7794f486 ("usbfs: Add ioctls for runtime power management")
      
      This triggers these changes in the kernel sources, automagically
      supporting these new ioctls in the 'perf trace' beautifiers.
      
      Soon this will be used in things like filter expressions for tracepoints
      in 'perf record', 'perf trace', 'perf top', i.e. filter expressions will
      do a lookup to turn things like USBDEVFS_WAIT_FOR_RESUME into _IO('U',
      35) before associating the tracepoint expression to tracepoint perf
      event.
      
        $ tools/perf/trace/beauty/usbdevfs_ioctl.sh  > before
        $ cp include/uapi/linux/usbdevice_fs.h tools/include/uapi/linux/usbdevice_fs.h
        $ git diff
        diff --git a/tools/include/uapi/linux/usbdevice_fs.h b/tools/include/uapi/linux/usbdevice_fs.h
        index 78efe870c2b7..cf525cddeb94 100644
        --- a/tools/include/uapi/linux/usbdevice_fs.h
        +++ b/tools/include/uapi/linux/usbdevice_fs.h
        @@ -158,6 +158,7 @@ struct usbdevfs_hub_portinfo {
         #define USBDEVFS_CAP_MMAP                      0x20
         #define USBDEVFS_CAP_DROP_PRIVILEGES           0x40
         #define USBDEVFS_CAP_CONNINFO_EX               0x80
        +#define USBDEVFS_CAP_SUSPEND                   0x100
      
         /* USBDEVFS_DISCONNECT_CLAIM flags & struct */
      
        @@ -223,5 +224,8 @@ struct usbdevfs_streams {
          * extending size of the data returned.
          */
         #define USBDEVFS_CONNINFO_EX(len)  _IOC(_IOC_READ, 'U', 32, len)
        +#define USBDEVFS_FORBID_SUSPEND    _IO('U', 33)
        +#define USBDEVFS_ALLOW_SUSPEND     _IO('U', 34)
        +#define USBDEVFS_WAIT_FOR_RESUME   _IO('U', 35)
      
         #endif /* _UAPI_LINUX_USBDEVICE_FS_H */
        $ tools/perf/trace/beauty/usbdevfs_ioctl.sh  > after
        $ diff -u before after
        --- before	2019-09-27 11:41:50.634867620 -0300
        +++ after	2019-09-27 11:42:07.453102978 -0300
        @@ -24,6 +24,9 @@
         	[30] = "DROP_PRIVILEGES",
         	[31] = "GET_SPEED",
         	[32] = "CONNINFO_EX",
        +	[33] = "FORBID_SUSPEND",
        +	[34] = "ALLOW_SUSPEND",
        +	[35] = "WAIT_FOR_RESUME",
         	[3] = "RESETEP",
         	[4] = "SETINTERFACE",
         	[5] = "SETCONFIGURATION",
        $
      
      This addresses the following perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/usbdevice_fs.h' differs from latest version at 'include/uapi/linux/usbdevice_fs.h'
        diff -u tools/include/uapi/linux/usbdevice_fs.h include/uapi/linux/usbdevice_fs.h
      
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-x1rb109b9nfi7pukota82xhj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      05f371f8
    • A
      tools headers uapi: Sync asm-generic/mman-common.h with the kernel · b1ba55cf
      Arnaldo Carvalho de Melo 提交于
      To pick the changes from:
      
        1a4e58cc ("mm: introduce MADV_PAGEOUT")
        9c276cc6 ("mm: introduce MADV_COLD")
      
      That result in these changes in the tools:
      
        $ tools/perf/trace/beauty/madvise_behavior.sh > before
        $ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
        $ git diff
        diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h
        index 63b1f506ea67..c160a5354eb6 100644
        --- a/tools/include/uapi/asm-generic/mman-common.h
        +++ b/tools/include/uapi/asm-generic/mman-common.h
        @@ -67,6 +67,9 @@
         #define MADV_WIPEONFORK 18             /* Zero memory on fork, child only */
         #define MADV_KEEPONFORK 19             /* Undo MADV_WIPEONFORK */
      
        +#define MADV_COLD      20              /* deactivate these pages */
        +#define MADV_PAGEOUT   21              /* reclaim these pages */
        +
         /* compatibility flags */
         #define MAP_FILE       0
      
        $ tools/perf/trace/beauty/madvise_behavior.sh > after
        $ diff -u before after
        --- before	2019-09-27 11:29:43.346320100 -0300
        +++ after	2019-09-27 11:30:03.838570439 -0300
        @@ -16,6 +16,8 @@
         	[17] = "DODUMP",
         	[18] = "WIPEONFORK",
         	[19] = "KEEPONFORK",
        +	[20] = "COLD",
        +	[21] = "PAGEOUT",
         	[100] = "HWPOISON",
         	[101] = "SOFT_OFFLINE",
         };
        $
      
      I.e. now when madvise gets those behaviours as args, it will be able to
      translate from the number to a human readable string.
      
      This addresses the following perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
        diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-n40y6c4sa49p29q6sl8w3ufx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b1ba55cf
  2. 27 9月, 2019 25 次提交
  3. 26 9月, 2019 10 次提交
    • D
      jffs2: Fix mounting under new mount API · a3bc18a4
      David Howells 提交于
      The mounting of jffs2 is broken due to the changes from the new mount API
      because it specifies a "source" operation, but then doesn't actually
      process it.  But because it specified it, it doesn't return -ENOPARAM and
      the caller doesn't process it either and the source gets lost.
      
      Fix this by simply removing the source parameter from jffs2 and letting the
      VFS deal with it in the default manner.
      
      To test it, enable CONFIG_MTD_MTDRAM and allow the default size and erase
      block size parameters, then try and mount the /dev/mtdblock<N> file that
      that creates as jffs2.  No need to initialise it.
      
      Fixes: ec10a24f ("vfs: Convert jffs2 to use the new mount API")
      Reported-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: David Woodhouse <dwmw2@infradead.org>
      cc: Richard Weinberger <richard@nod.at>
      cc: linux-mtd@lists.infradead.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a3bc18a4
    • I
      Merge tag 'perf-core-for-mingo-5.5-20190925' of... · b11f7244
      Ingo Molnar 提交于
      Merge tag 'perf-core-for-mingo-5.5-20190925' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      perf record:
      
        Stephane Eranian:
      
        - Fix priv level with branch sampling for paranoid=2, i.e. the kernel checks
          if perf_event_attr_attr.exclude_hv is set in addition to .exclude_kernel,
          so reset both to zero.
      
        Arnaldo Carvalho de Melo:
      
        - Don't warn about not being able to read kernel maps (kallsyms, etc) when
          kernel samples aren't being collected.
      
      perf list:
      
        Kim Phillips:
      
        - Allow plurals for metric, metricgroup., i.e.:
      
          $ perf list metrics
      
          was showing nothing, which is very confusing, make it work like:
      
          $ perf stat metric
      
      perf stat:
      
        Andi Kleen:
      
        - Free memory access/leaks detected via valgrind, related to metrics.
      
      Libraries:
      
      libperf:
      
        Jiri Olsa:
      
        - Move more stuff from tools/perf, this time a first stab at moving perf_mmap
          methods.
      
      libtracevent:
      
        Steven Rostedt (VMware):
      
        - Round up in tep_print_event() time precision.
      
        Tzvetomir Stoyanov (VMware):
      
        - Man pages for event print and related and plugins APIs.
      
        - Move traceevent plugins in its own subdirectory.
      
      Feature detection:
      
        Thomas Richter:
      
        - Add detection of java-11-openjdk-devel package, in addition to the older
          versions supported.
      
      Architecture specific:
      
      S/390:
      
        Thomas Richter (2):
      
        - Include JVMTI support for s390
      
      Vendor events:
      
      AMD:
      
        Kim Phillips:
      
        - Add L3 cache events for Family 17h.
      
        - Remove redundant '['.
      
      PowerPC:
      
        Mamatha Inamdar:
      
        - Remove P8 HW events which are not supported.
      
      Cleanups:
      
        Arnaldo Carvalho de Melo:
      
        - Remove needless headers, add needed ones, move things around to reduce the
          headers dependency tree, speeding up builds by not doing needless compiles
          when unrelated stuff gets changed.
      
        - Ditch unused code that was dragging headers.
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b11f7244
    • D
      checkpatch: check for nested (un)?likely() calls · de3f186f
      Denis Efremov 提交于
      IS_ERR(), IS_ERR_OR_NULL(), IS_ERR_VALUE() and WARN*() already contain
      unlikely() optimization internally.  Thus, there is no point in calling
      these functions and defines under likely()/unlikely().
      
      This check is based on the coccinelle rule developed by Enrico Weigelt
      https://lore.kernel.org/lkml/1559767582-11081-1-git-send-email-info@metux.net/
      
      Link: http://lkml.kernel.org/r/20190829165025.15750-1-efremov@linux.comSigned-off-by: NDenis Efremov <efremov@linux.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Boris Pismenny <borisp@mellanox.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Denis Efremov <efremov@linux.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Sean Paul <sean@poorly.run>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de3f186f
    • M
      hexagon: drop empty and unused free_initrd_mem · c7cc8d77
      Mike Rapoport 提交于
      hexagon never reserves or initializes initrd and the only mention of it is
      the empty free_initrd_mem() function.
      
      As we have a generic implementation of free_initrd_mem(), there is no need
      to define an empty stub for the hexagon implementation and it can be
      dropped.
      
      Link: http://lkml.kernel.org/r/1565858133-25852-1-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c7cc8d77
    • M
      mm: factor out common parts between MADV_COLD and MADV_PAGEOUT · d616d512
      Minchan Kim 提交于
      There are many common parts between MADV_COLD and MADV_PAGEOUT.
      This patch factor them out to save code duplication.
      
      Link: http://lkml.kernel.org/r/20190726023435.214162-6-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Suggested-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: kbuild test robot <lkp@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleksandr Natalenko <oleksandr@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d616d512
    • M
      mm: introduce MADV_PAGEOUT · 1a4e58cc
      Minchan Kim 提交于
      When a process expects no accesses to a certain memory range for a long
      time, it could hint kernel that the pages can be reclaimed instantly but
      data should be preserved for future use.  This could reduce workingset
      eviction so it ends up increasing performance.
      
      This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall.
      MADV_PAGEOUT can be used by a process to mark a memory range as not
      expected to be used for a long time so that kernel reclaims *any LRU*
      pages instantly.  The hint can help kernel in deciding which pages to
      evict proactively.
      
      A note: It doesn't apply SWAP_CLUSTER_MAX LRU page isolation limit
      intentionally because it's automatically bounded by PMD size.  If PMD
      size(e.g., 256) makes some trouble, we could fix it later by limit it to
      SWAP_CLUSTER_MAX[1].
      
      - man-page material
      
      MADV_PAGEOUT (since Linux x.x)
      
      Do not expect access in the near future so pages in the specified
      regions could be reclaimed instantly regardless of memory pressure.
      Thus, access in the range after successful operation could cause
      major page fault but never lose the up-to-date contents unlike
      MADV_DONTNEED. Pages belonging to a shared mapping are only processed
      if a write access is allowed for the calling process.
      
      MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
      VM_PFNMAP pages.
      
      [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/
      
      [minchan@kernel.org: clear PG_active on MADV_PAGEOUT]
        Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com
      [akpm@linux-foundation.org: resolve conflicts with hmm.git]
      Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleksandr Natalenko <oleksandr@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1a4e58cc
    • M
      mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM · 8940b34a
      Minchan Kim 提交于
      The local variable references in shrink_page_list is PAGEREF_RECLAIM_CLEAN
      as default.  It is for preventing to reclaim dirty pages when CMA try to
      migrate pages.  Strictly speaking, we don't need it because CMA didn't
      allow to write out by .may_writepage = 0 in reclaim_clean_pages_from_list.
      
      Moreover, it has a problem to prevent anonymous pages's swap out even
      though force_reclaim = true in shrink_page_list on upcoming patch.  So
      this patch makes references's default value to PAGEREF_RECLAIM and rename
      force_reclaim with ignore_references to make it more clear.
      
      This is a preparatory work for next patch.
      
      Link: http://lkml.kernel.org/r/20190726023435.214162-3-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: kbuild test robot <lkp@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleksandr Natalenko <oleksandr@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8940b34a
    • M
      mm: introduce MADV_COLD · 9c276cc6
      Minchan Kim 提交于
      Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.
      
      - Background
      
      The Android terminology used for forking a new process and starting an app
      from scratch is a cold start, while resuming an existing app is a hot
      start.  While we continually try to improve the performance of cold
      starts, hot starts will always be significantly less power hungry as well
      as faster so we are trying to make hot start more likely than cold start.
      
      To increase hot start, Android userspace manages the order that apps
      should be killed in a process called ActivityManagerService.
      ActivityManagerService tracks every Android app or service that the user
      could be interacting with at any time and translates that into a ranked
      list for lmkd(low memory killer daemon).  They are likely to be killed by
      lmkd if the system has to reclaim memory.  In that sense they are similar
      to entries in any other cache.  Those apps are kept alive for
      opportunistic performance improvements but those performance improvements
      will vary based on the memory requirements of individual workloads.
      
      - Problem
      
      Naturally, cached apps were dominant consumers of memory on the system.
      However, they were not significant consumers of swap even though they are
      good candidate for swap.  Under investigation, swapping out only begins
      once the low zone watermark is hit and kswapd wakes up, but the overall
      allocation rate in the system might trip lmkd thresholds and cause a
      cached process to be killed(we measured performance swapping out vs.
      zapping the memory by killing a process.  Unsurprisingly, zapping is 10x
      times faster even though we use zram which is much faster than real
      storage) so kill from lmkd will often satisfy the high zone watermark,
      resulting in very few pages actually being moved to swap.
      
      - Approach
      
      The approach we chose was to use a new interface to allow userspace to
      proactively reclaim entire processes by leveraging platform information.
      This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
      that are known to be cold from userspace and to avoid races with lmkd by
      reclaiming apps as soon as they entered the cached state.  Additionally,
      it could provide many chances for platform to use much information to
      optimize memory efficiency.
      
      To achieve the goal, the patchset introduce two new options for madvise.
      One is MADV_COLD which will deactivate activated pages and the other is
      MADV_PAGEOUT which will reclaim private pages instantly.  These new
      options complement MADV_DONTNEED and MADV_FREE by adding non-destructive
      ways to gain some free memory space.  MADV_PAGEOUT is similar to
      MADV_DONTNEED in a way that it hints the kernel that memory region is not
      currently needed and should be reclaimed immediately; MADV_COLD is similar
      to MADV_FREE in a way that it hints the kernel that memory region is not
      currently needed and should be reclaimed when memory pressure rises.
      
      This patch (of 5):
      
      When a process expects no accesses to a certain memory range, it could
      give a hint to kernel that the pages can be reclaimed when memory pressure
      happens but data should be preserved for future use.  This could reduce
      workingset eviction so it ends up increasing performance.
      
      This patch introduces the new MADV_COLD hint to madvise(2) syscall.
      MADV_COLD can be used by a process to mark a memory range as not expected
      to be used in the near future.  The hint can help kernel in deciding which
      pages to evict early during memory pressure.
      
      It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves
      
      	active file page -> inactive file LRU
      	active anon page -> inacdtive anon LRU
      
      Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file
      LRU's head because MADV_COLD is a little bit different symantic.
      MADV_FREE means it's okay to discard when the memory pressure because the
      content of the page is *garbage* so freeing such pages is almost zero
      overhead since we don't need to swap out and access afterward causes just
      minor fault.  Thus, it would make sense to put those freeable pages in
      inactive file LRU to compete other used-once pages.  It makes sense for
      implmentaion point of view, too because it's not swapbacked memory any
      longer until it would be re-dirtied.  Even, it could give a bonus to make
      them be reclaimed on swapless system.  However, MADV_COLD doesn't mean
      garbage so reclaiming them requires swap-out/in in the end so it's bigger
      cost.  Since we have designed VM LRU aging based on cost-model, anonymous
      cold pages would be better to position inactive anon's LRU list, not file
      LRU.  Furthermore, it would help to avoid unnecessary scanning if system
      doesn't have a swap device.  Let's start simpler way without adding
      complexity at this moment.  However, keep in mind, too that it's a caveat
      that workloads with a lot of pages cache are likely to ignore MADV_COLD on
      anonymous memory because we rarely age anonymous LRU lists.
      
      * man-page material
      
      MADV_COLD (since Linux x.x)
      
      Pages in the specified regions will be treated as less-recently-accessed
      compared to pages in the system with similar access frequencies.  In
      contrast to MADV_FREE, the contents of the region are preserved regardless
      of subsequent writes to pages.
      
      MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
      pages.
      
      [akpm@linux-foundation.org: resolve conflicts with hmm.git]
      Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Daniel Colascione <dancol@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Oleksandr Natalenko <oleksandr@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tim Murray <timmurray@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c276cc6
    • C
      mm: untag user pointers in mmap/munmap/mremap/brk · ce18d171
      Catalin Marinas 提交于
      There isn't a good reason to differentiate between the user address space
      layout modification syscalls and the other memory permission/attributes
      ones (e.g.  mprotect, madvise) w.r.t.  the tagged address ABI.  Untag the
      user addresses on entry to these functions.
      
      Link: http://lkml.kernel.org/r/20190821164730.47450-2-catalin.marinas@arm.comSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NWill Deacon <will@kernel.org>
      Acked-by: NAndrey Konovalov <andreyknvl@google.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: Dave P Martin <Dave.Martin@arm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce18d171
    • A
      vfio/type1: untag user pointers in vaddr_get_pfn · 6cf5354c
      Andrey Konovalov 提交于
      This patch is a part of a series that extends kernel ABI to allow to pass
      tagged user pointers (with the top byte set to something else other than
      0x00) as syscall arguments.
      
      vaddr_get_pfn() uses provided user pointers for vma lookups, which can
      only by done with untagged pointers.
      
      Untag user pointers in this function.
      
      Link: http://lkml.kernel.org/r/87422b4d72116a975896f2b19b00f38acbd28f33.1563904656.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Reviewed-by: NVincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jens Wiklander <jens.wiklander@linaro.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6cf5354c