1. 19 Oct, 2016 1 commit
  2. 18 Aug, 2016 2 commits
    •
      uprobes: Rename the "struct page *" args of __replace_page() · bdfaa2ee
      Oleg Nesterov committed
      Purely cosmetic, no changes in the compiled code.
      
      Perhaps it is just me, but I can hardly read __replace_page() because I
      can't distinguish "page" from "kpage", and because I need to look at the
      caller to ensure that, say, kpage is really the new page and the code is
      correct. Rename them to old_page and new_page; this matches the caller.
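      For reference, a hedged sketch of the renamed signature (the function
      body is unchanged; only the parameter names now match the caller):

      	static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
      				  struct page *old_page, struct page *new_page);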
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Brenden Blanco <bblanco@plumgrid.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Link: http://lkml.kernel.org/r/20160817153704.GC29724@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    •
      uprobes: Fix the memcg accounting · 6c4687cc
      Oleg Nesterov committed
      __replace_page() wrongly calls mem_cgroup_cancel_charge() in the "success"
      path; it should only do this if page_check_address() fails.
      
      This means that every enable/disable leads to an unbalanced
      mem_cgroup_uncharge() from put_page(old_page); it is trivial to underflow
      page_counter->count and trigger an OOM.
      Reported-and-tested-by: Brenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: stable@vger.kernel.org # 3.17+
      Fixes: 00501b53 ("mm: memcontrol: rewrite charge API")
      Link: http://lkml.kernel.org/r/20160817153629.GB29724@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 24 May, 2016 1 commit
  4. 23 May, 2016 1 commit
    •
      x86: remove more uaccess_32.h complexity · bd28b145
      Linus Torvalds committed
      I'm looking at trying to possibly merge the 32-bit and 64-bit versions
      of the x86 uaccess.h implementation, but first this needs to be cleaned
      up.
      
      For example, the 32-bit version of "__copy_from_user_inatomic()" is
      mostly the special cases for the constant size, and it's actually almost
      never relevant.  Most users aren't actually using a constant size
      anyway, and the few cases that do small constant copies are better off
      just using __get_user() instead.
      
      So get rid of the unnecessary complexity.
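      As a hedged illustration (not taken from the patch), a small
      constant-size copy rewritten with __get_user(); the helper name is
      hypothetical:

      	/* Hypothetical caller: fetch one 32-bit value from user space. */
      	static int read_user_u32(const u32 __user *uaddr, u32 *val)
      	{
      		/* __get_user() returns 0 on success, -EFAULT on failure */
      		return __get_user(*val, uaddr);
      	}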
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 05 Apr, 2016 1 commit
    •
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to implement
      the page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized. And it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE. And it's a constant source of confusion whether the
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below. For some reason, coccinelle doesn't patch header
      files; I've called spatch on them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach. I'll
      fix them manually in a separate patch. Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
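      As a hedged illustration of the conversion's effect at a call site
      (hypothetical filesystem code, not part of the patch):

      	/* Hypothetical call site, before and after the conversion. */
      	static struct page *lookup_and_hold(struct address_space *mapping, loff_t pos)
      	{
      		/* before: pgoff_t index = pos >> PAGE_CACHE_SHIFT; */
      		pgoff_t index = pos >> PAGE_SHIFT;

      		/*
      		 * find_get_page() takes a reference; it was previously
      		 * dropped with page_cache_release(), now with put_page().
      		 */
      		return find_get_page(mapping, index);
      	}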
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 29 Feb, 2016 1 commit
  7. 16 Feb, 2016 1 commit
    •
      mm/gup: Introduce get_user_pages_remote() · 1e987790
      Dave Hansen committed
      For protection keys, we need to understand whether protections
      should be enforced in software or not.  In general, we enforce
      protections when working on our own task, but not when working on
      others'.  We call these "current" and "remote" operations.
      
      This patch introduces a new get_user_pages() variant:
      
              get_user_pages_remote()
      
      which is a replacement for when get_user_pages() is called on a
      non-current tsk/mm.
      
      We also introduce a new gup flag: FOLL_REMOTE which can be used
      for the "__" gup variants to get this new behavior.
      
      The uprobes is_trap_at_addr() location holds mmap_sem and
      calls get_user_pages(current->mm) on an instruction address.  This
      makes it a pretty unique gup caller.  Being an instruction access
      and also really originating from the kernel (vs. the app), I opted
      to consider this a 'remote' access where protection keys will not
      be enforced.
      
      Without protection keys, this patch should not change any behavior.
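      A hedged sketch of a caller; the signature of get_user_pages_remote()
      has changed across kernel versions, and this reflects only its rough
      shape at introduction (tsk/mm plus write/force arguments), with a
      hypothetical helper name:

      	/* Hypothetical helper: pin one page of another process's memory. */
      	static int pin_remote_page(struct task_struct *tsk, struct mm_struct *mm,
      				   unsigned long addr, struct page **page)
      	{
      		long ret;

      		down_read(&mm->mmap_sem);	/* gup needs mmap_sem held */
      		ret = get_user_pages_remote(tsk, mm, addr, 1 /* nr_pages */,
      					    0 /* write */, 0 /* force */,
      					    page, NULL);
      		up_read(&mm->mmap_sem);

      		return ret == 1 ? 0 : -EFAULT;
      	}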
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: jack@suse.cz
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 16 Jan, 2016 2 commits
  9. 15 Jan, 2016 1 commit
  10. 23 Nov, 2015 1 commit
    •
      treewide: Remove old email address · 90eec103
      Peter Zijlstra committed
      There were still a number of references to my old Red Hat email
      address in the kernel source. Remove these while keeping the
      Red Hat copyright notices intact.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 31 Jul, 2015 14 commits
  12. 14 Dec, 2014 3 commits
  13. 24 Nov, 2014 1 commit
  14. 09 Aug, 2014 1 commit
    •
      mm: memcontrol: rewrite charge API · 00501b53
      Johannes Weiner committed
      These patches rework memcg charge lifetime to integrate more naturally
      with the lifetime of user pages.  This drastically simplifies the code and
      reduces charging and uncharging overhead.  The most expensive part of
      charging and uncharging is the page_cgroup bit spinlock, which is removed
      entirely after this series.
      
      Here are the top-10 profile entries of a stress test that reads a 128G
      sparse file on a freshly booted box, without even a dedicated cgroup
      (i.e. executing in the root memcg).  Before:
      
          15.36%              cat  [kernel.kallsyms]   [k] copy_user_generic_string
          13.31%              cat  [kernel.kallsyms]   [k] memset
          11.48%              cat  [kernel.kallsyms]   [k] do_mpage_readpage
           4.23%              cat  [kernel.kallsyms]   [k] get_page_from_freelist
           2.38%              cat  [kernel.kallsyms]   [k] put_page
           2.32%              cat  [kernel.kallsyms]   [k] __mem_cgroup_commit_charge
           2.18%          kswapd0  [kernel.kallsyms]   [k] __mem_cgroup_uncharge_common
           1.92%          kswapd0  [kernel.kallsyms]   [k] shrink_page_list
           1.86%              cat  [kernel.kallsyms]   [k] __radix_tree_lookup
           1.62%              cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
      
      After:
      
          15.67%           cat  [kernel.kallsyms]   [k] copy_user_generic_string
          13.48%           cat  [kernel.kallsyms]   [k] memset
          11.42%           cat  [kernel.kallsyms]   [k] do_mpage_readpage
           3.98%           cat  [kernel.kallsyms]   [k] get_page_from_freelist
           2.46%           cat  [kernel.kallsyms]   [k] put_page
           2.13%       kswapd0  [kernel.kallsyms]   [k] shrink_page_list
           1.88%           cat  [kernel.kallsyms]   [k] __radix_tree_lookup
           1.67%           cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
           1.39%       kswapd0  [kernel.kallsyms]   [k] free_pcppages_bulk
           1.30%           cat  [kernel.kallsyms]   [k] kfree
      
      As you can see, the memcg footprint has shrunk quite a bit.
      
         text    data     bss     dec     hex filename
        37970    9892     400   48262    bc86 mm/memcontrol.o.old
        35239    9892     400   45531    b1db mm/memcontrol.o
      
      This patch (of 4):
      
      The memcg charge API charges pages before they are rmapped - i.e.  have an
      actual "type" - and so every callsite needs its own set of charge and
      uncharge functions to know what type is being operated on.  Worse,
      uncharge has to happen from a context that is still type-specific, rather
      than at the end of the page's lifetime with exclusive access, and so
      requires a lot of synchronization.
      
      Rewrite the charge API to provide a generic set of try_charge(),
      commit_charge() and cancel_charge() transaction operations, much like
      what's currently done for swap-in:
      
        mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
        pages from the memcg if necessary.
      
        mem_cgroup_commit_charge() commits the page to the charge once it
        has a valid page->mapping and PageAnon() reliably tells the type.
      
        mem_cgroup_cancel_charge() aborts the transaction.
      
      This reduces the charge API and enables subsequent patches to
      drastically simplify uncharging.
      
      As pages need to be committed after rmap is established but before they
      are added to the LRU, page_add_new_anon_rmap() must stop doing LRU
      additions again.  Revive lru_cache_add_active_or_unevictable().
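      A hedged sketch of the transaction for a new anonymous page, modeled on
      the API described above (the helper is illustrative,
      do_anonymous_page()-style usage):

      	static int charge_new_anon_page(struct page *page, struct mm_struct *mm,
      					struct vm_area_struct *vma,
      					unsigned long address)
      	{
      		struct mem_cgroup *memcg;

      		if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
      			return -ENOMEM;

      		/* rmap first, so the page has a type, then commit */
      		page_add_new_anon_rmap(page, vma, address);
      		mem_cgroup_commit_charge(page, memcg, false);
      		lru_cache_add_active_or_unevictable(page, vma);
      		return 0;
      	}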
      
      [hughd@google.com: fix shmem_unuse]
      [hughd@google.com: Add comments on the private use of -EAGAIN]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 01 Jul, 2014 1 commit
  16. 05 Jun, 2014 2 commits
  17. 02 Jun, 2014 2 commits
  18. 26 May, 2014 1 commit
    •
      ARM: 8043/1: uprobes need icache flush after xol write · 72e6ae28
      Victor Kamensky committed
      After an instruction write into the xol area, on the ARMv7
      architecture the code needs to flush the dcache and icache to sync
      them up for the given set of addresses. Having just a
      'flush_dcache_page(page)' call is not enough - it is
      possible to have a stale instruction sitting in the icache
      for the given xol area slot address.
      
      Introduce a weak arch_uprobe_copy_ixol() function that by default
      calls uprobes' copy_to_page() and then flush_dcache_page(), and on
      ARM define a new one that handles the xol slot copy in an
      ARM-specific way.
      
      The flush_uprobe_xol_access() function shares its implementation
      with flush_ptrace_access() and takes care of writing an instruction
      into the userland address space across the variety of different
      cache types on ARM CPUs. Because flush_uprobe_xol_access() does not
      have a vma around, flush_ptrace_access() was split into two parts:
      one that retrieves the set of conditions from the vma, and a common
      one that receives those conditions as flags.
      
      Note that the ARM cache flush functions need the kernel address
      through which the instruction write happened, so instead of using
      uprobes' copy_to_page(), the code was changed to explicitly map the
      page and do the memcpy.
      
      Note that arch_uprobe_copy_ixol(), in a similar way to
      copy_to_user_page(), uses preempt_disable()/preempt_enable().
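      A hedged sketch of the weak default described above (ARM overrides it
      to also maintain the icache):

      	void __weak arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
      					  void *src, unsigned long len)
      	{
      		/* default: copy the instruction, then flush the dcache */
      		copy_to_page(page, vaddr, src, len);
      		flush_dcache_page(page);
      	}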
      Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: David A. Long <dave.long@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  19. 14 May, 2014 2 commits
    •
      uprobes/x86: Fix the wrong ->si_addr when xol triggers a trap · b02ef20a
      Oleg Nesterov committed
      If the probed insn triggers a trap, ->si_addr = regs->ip is technically
      correct, but this is not what the signal handler wants; we need to pass
      the address of the probed insn, not the address of the xol slot.
      
      Add the new arch-agnostic helper, uprobe_get_trap_addr(), and change
      fill_trap_info() and math_error() to use it. The !CONFIG_UPROBES case in
      uprobes.h uses a macro to avoid include hell and to ensure that it can be
      compiled even if an architecture doesn't define instruction_pointer().
      
      Test-case:
      
      	#include <signal.h>
      	#include <stdio.h>
      	#include <unistd.h>
      
      	extern void probe_div(void);
      
      	void sigh(int sig, siginfo_t *info, void *c)
      	{
      		int passed = (info->si_addr == probe_div);
      		printf(passed ? "PASS\n" : "FAIL\n");
      		_exit(!passed);
      	}
      
      	int main(void)
      	{
      		struct sigaction sa = {
      			.sa_sigaction	= sigh,
      			.sa_flags	= SA_SIGINFO,
      		};
      
      		sigaction(SIGFPE, &sa, NULL);
      
      		asm (
      			"xor %ecx,%ecx\n"
      			".globl probe_div; probe_div:\n"
      			"idiv %ecx\n"
      		);
      
      		return 0;
      	}
      
      Without this fix, it fails if probe_div() is probed.
      
      Note: show_unhandled_signals users should probably use this helper too,
      but we need to clean them up first.
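      A hedged sketch of the helper's logic: while single-stepping out of
      line, report the original probed address rather than the xol slot
      (field names follow the uprobes core):

      	unsigned long uprobe_get_trap_addr(struct pt_regs *regs)
      	{
      		struct uprobe_task *utask = current->utask;

      		/* stepping on an xol slot: report the probed insn's address */
      		if (unlikely(utask && utask->active_uprobe))
      			return utask->vaddr;

      		return instruction_pointer(regs);
      	}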
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
    •
      uprobes: Add mem_cgroup_charge_anon() into uprobe_write_opcode() · 29dedee0
      Oleg Nesterov committed
      Hugh says:
      
          The one I noticed was that it forgets all about memcg (because
          it was copied from KSM, and there the replacement page has already
          been charged to a memcg). See how mm/memory.c do_anonymous_page()
          does a mem_cgroup_charge_anon().
      
      Hopefully not a big problem; uprobes is a system-wide thing and only
      root can insert the probes. But I agree, it should be fixed anyway.
      
      Add mem_cgroup_{un,}charge_anon() into uprobe_write_opcode(). To simplify
      the error handling (and avoid the new "uncharge" label) the patch also
      moves anon_vma_prepare() up before we alloc/charge the new page.
      
      While at it, fix the comment about ->mmap_sem: it is held for write.
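      A hedged sketch of the alloc-then-charge ordering described above (the
      helper is hypothetical; mem_cgroup_charge_anon() is the pre-00501b53
      API):

      	static struct page *alloc_charged_page(struct vm_area_struct *vma,
      					       struct mm_struct *mm,
      					       unsigned long vaddr)
      	{
      		struct page *new_page;

      		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);
      		if (!new_page)
      			return NULL;

      		/* charge before installing; uncharge on the failure paths */
      		if (mem_cgroup_charge_anon(new_page, mm, GFP_KERNEL)) {
      			put_page(new_page);
      			return NULL;
      		}
      		return new_page;
      	}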
      Suggested-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
  20. 01 May, 2014 1 commit
    •
      uprobes: Refuse to insert a probe into MAP_SHARED vma · 13f59c5e
      Oleg Nesterov committed
      valid_vma() rejects VM_SHARED vmas, but this still allows inserting a
      probe into a vma that is MAP_SHARED but not VM_MAYWRITE.
      
      Currently this is fine: such a mapping doesn't really differ from a
      private read-only mmap except that mprotect(PROT_WRITE) won't work.
      However, get_user_pages(FOLL_WRITE | FOLL_FORCE) doesn't allow COW in
      this case, and it would be safer to follow the same conventions as mm
      even if currently this happens to work.
      
      After the recent cda540ac "mm: get_user_pages(write,force) refuse
      to COW in shared areas", only uprobes can insert an anon page into a
      shared file-backed area. Let's stop this and change valid_vma() to
      check VM_MAYSHARE instead.
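      A hedged sketch of the check after this change; the details are
      illustrative, but the point is testing VM_MAYSHARE rather than
      VM_SHARED:

      	static bool valid_vma(struct vm_area_struct *vma, bool is_register)
      	{
      		vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_MAYSHARE;

      		if (is_register)
      			flags |= VM_WRITE;

      		/* executable, file-backed, and not even potentially shared */
      		return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC;
      	}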
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>