1. 11 Mar 2014, 1 commit
  2. 04 Mar 2014, 1 commit
    • mm: close PageTail race · 668f9abb
      Authored by David Rientjes
      Commit bf6bddf1 ("mm: introduce compaction and migration for
      ballooned pages") introduces page_count(page) into memory compaction
      which dereferences page->first_page if PageTail(page).
      
      This results in a very rare NULL pointer dereference on the
      aforementioned page_count(page).  Indeed, anything that does
      compound_head(), including page_count(), is susceptible to racing with
      prep_compound_page() and seeing a NULL or dangling page->first_page
      pointer.
      
      This patch uses Andrea's implementation of compound_trans_head() that
      deals with such a race and makes it the default compound_head()
      implementation.  This includes a read memory barrier that ensures that
      if PageTail(head) is true that we return a head page that is neither
      NULL nor dangling.  The patch then adds a store memory barrier to
      prep_compound_page() to ensure page->first_page is set.
      
      This is the safest way to ensure we see the head page that we are
      expecting; PageTail(page) is already in the unlikely() path and the
      memory barriers are unfortunately required.
      
      Hugetlbfs is the exception: we don't enforce a store memory barrier
      during init since no race is possible.
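
      A minimal sketch of the barrier pairing described above, simplified
      from the mm code of that era (illustrative, not the literal diff):

      	/* reader side: racy PageTail() check made safe by smp_rmb() */
      	static inline struct page *compound_head(struct page *page)
      	{
      		if (unlikely(PageTail(page))) {
      			struct page *head = page->first_page;

      			/*
      			 * Pairs with the smp_wmb() below: if the page is
      			 * still a tail page, first_page has been published.
      			 */
      			smp_rmb();
      			if (likely(PageTail(page)))
      				return head;
      		}
      		return page;
      	}

      	/* writer side, in prep_compound_page(), for each tail page p */
      	p->first_page = page;
      	smp_wmb();	/* publish first_page before setting PageTail */
      	__SetPageTail(p);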
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      668f9abb
  3. 11 Feb 2014, 1 commit
    • vmcore: prevent PT_NOTE p_memsz overflow during header update · 38dfac84
      Authored by Greg Pearson
      Currently, update_note_header_size_elf64() and
      update_note_header_size_elf32() will add the size of a PT_NOTE entry to
      real_sz even if that causes real_sz to exceed max_sz.  This patch
      corrects the while loop logic in those routines to ensure that does not
      happen and prints a warning if a PT_NOTE entry is dropped.  If zero
      PT_NOTE entries are found or this condition is encountered because the
      only entry was dropped, a warning is printed and an error is returned.
      
      One possible negative side effect of exceeding the max_sz limit is an
      allocation failure in merge_note_headers_elf64() or
      merge_note_headers_elf32() which would produce console output such as
      the following while booting the crash kernel.
      
        vmalloc: allocation failure: 14076997632 bytes
        swapper/0: page allocation failure: order:0, mode:0x80d2
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-gbp1 #7
        Call Trace:
          dump_stack+0x19/0x1b
          warn_alloc_failed+0xf0/0x160
          __vmalloc_node_range+0x19e/0x250
          vmalloc_user+0x4c/0x70
          merge_note_headers_elf64.constprop.9+0x116/0x24a
          vmcore_init+0x2d4/0x76c
          do_one_initcall+0xe2/0x190
          kernel_init_freeable+0x17c/0x207
          kernel_init+0xe/0x180
          ret_from_fork+0x7c/0xb0
      
        Kdump: vmcore not initialized
      
        kdump: dump target is /dev/sda4
        kdump: saving to /sysroot//var/crash/127.0.0.1-2014.01.28-13:58:52/
        kdump: saving vmcore-dmesg.txt
        Cannot open /proc/vmcore: No such file or directory
        kdump: saving vmcore-dmesg.txt failed
        kdump: saving vmcore
        kdump: saving vmcore failed
      
      This type of failure has been seen on a four socket prototype system
      with certain memory configurations.  Most PT_NOTE sections have a single
      entry similar to:
      
        n_namesz = 0x5
        n_descsz = 0x150
        n_type   = 0x1
      
      Occasionally, a second entry is encountered with very large n_namesz and
      n_descsz sizes:
      
        n_namesz = 0x80000008
        n_descsz = 0x510ae163
        n_type   = 0x80000008
      
      The source of these extra entries is not yet known; they seem bogus,
      but they shouldn't cause the crash dump to fail.
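
      The corrected accumulation loop has roughly the following shape
      (a simplified sketch of update_note_header_size_elf64(); the 4-byte
      alignment of the name/descriptor sizes is shown inline):

      	while (nhdr_ptr->n_namesz != 0) {
      		u64 sz = sizeof(Elf64_Nhdr) +
      			 (((u64)nhdr_ptr->n_namesz + 3) & ~3) +
      			 (((u64)nhdr_ptr->n_descsz + 3) & ~3);

      		if (real_sz + sz > max_sz) {
      			pr_warn("Exceeded p_memsz, dropping PT_NOTE entry\n");
      			break;		/* never let real_sz exceed max_sz */
      		}
      		real_sz += sz;
      		nhdr_ptr = (Elf64_Nhdr *)((char *)nhdr_ptr + sz);
      	}
      	if (real_sz == 0) {
      		pr_warn("Zero PT_NOTE entries found\n");
      		return -EINVAL;
      	}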
      Signed-off-by: Greg Pearson <greg.pearson@hp.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      38dfac84
  4. 24 Jan 2014, 10 commits
  5. 22 Jan 2014, 1 commit
    • /proc/meminfo: provide estimated available memory · 34e431b0
      Authored by Rik van Riel
      Many load balancing and workload placing programs check /proc/meminfo to
      estimate how much free memory is available.  They generally do this by
      adding up "free" and "cached", which was fine ten years ago, but is
      pretty much guaranteed to be wrong today.
      
      It is wrong because Cached includes memory that is not freeable as page
      cache, for example shared memory segments, tmpfs, and ramfs, and it does
      not include reclaimable slab memory, which can take up a large fraction
      of system memory on mostly idle systems with lots of files.
      
      Currently, the amount of memory that is available for a new workload,
      without pushing the system into swap, can be estimated from MemFree,
      Active(file), Inactive(file), and SReclaimable, as well as the "low"
      watermarks from /proc/zoneinfo.
      
      However, this may change in the future, and user space really should not
      be expected to know kernel internals to come up with an estimate for the
      amount of free memory.
      
      It is more convenient to provide such an estimate in /proc/meminfo.  If
      things change in the future, we only have to change it in one place.
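
      The estimate itself boils down to roughly the following arithmetic
      (a simplified sketch; the real meminfo code works in pages and uses
      the per-zone low watermarks and global_page_state()):

      	/* start with free memory minus what the kernel keeps in reserve */
      	available = free - wmark_low;

      	/* at most half of the file page cache counts as reclaimable */
      	pagecache = active_file + inactive_file;
      	available += pagecache - min(pagecache / 2, wmark_low);

      	/* likewise, count only part of the reclaimable slab */
      	available += slab_reclaimable - min(slab_reclaimable / 2, wmark_low);

      	if (available < 0)
      		available = 0;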
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reported-by: Erik Mouw <erik.mouw_2@nxp.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34e431b0
  6. 13 Dec 2013, 1 commit
    • procfs: also fix proc_reg_get_unmapped_area() for !MMU case · ae5758a1
      Authored by Jan Beulich
      Commit fad1a86e ("procfs: call default get_unmapped_area on
      MMU-present architectures"), as its title says, took care of only the
      MMU case, leaving the !MMU side still in the regressed state (returning
      -EIO in all cases where pde->proc_fops->get_unmapped_area is NULL).
      
      From the fad1a86e changelog:
      
       "Commit c4fe2448 ("sparc: fix PCI device proc file mmap(2)") added
        proc_reg_get_unmapped_area in proc_reg_file_ops and
        proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
        get_unmapped_area method is not defined for the target procfs file, which
        causes regression of mmap on /proc/vmcore.
      
        To address this issue, like get_unmapped_area(), call default
        current->mm->get_unmapped_area on MMU-present architectures if
        pde->proc_fops->get_unmapped_area, i.e.  the one in actual file operation
        in the procfs file, is not defined"
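
      After the fix, the fallback logic in proc_reg_get_unmapped_area()
      looks roughly like this (simplified fragment; pde reference counting
      and declarations are omitted):

      	get_area = pde->proc_fops->get_unmapped_area;
      #ifdef CONFIG_MMU
      	if (!get_area)
      		get_area = current->mm->get_unmapped_area;
      #endif
      	if (get_area)
      		rv = get_area(file, orig_addr, len, pgoff, flags);
      	else
      		rv = orig_addr;	/* !MMU without a hook: no longer -EIO */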
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>	[3.12.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ae5758a1
  7. 16 Nov 2013, 1 commit
  8. 15 Nov 2013, 4 commits
    • seq_file: remove "%n" usage from seq_file users · 652586df
      Authored by Tetsuo Handa
      All seq_printf() users that use "%n" to calculate the padding size are
      converted to use the seq_setwidth() / seq_pad() pair instead.
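
      The conversion pattern is roughly as follows (illustrative only; the
      field width is made up here):

      	/* before: measure what was printed via "%n", then pad by hand */
      	seq_printf(m, "%s%n", name, &len);
      	seq_printf(m, "%*s\n", 63 - len, "");

      	/* after: declare the field width up front, pad at the end */
      	seq_setwidth(m, 63);
      	seq_printf(m, "%s", name);
      	seq_pad(m, '\n');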
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      652586df
    • mm, hugetlb: convert hugetlbfs to use split pmd lock · cb900f41
      Authored by Kirill A. Shutemov
      Hugetlb supports multiple page sizes.  We use the split lock only for
      the PMD level, but not for PUD.
      
      [akpm@linux-foundation.org: coding-style fixes]
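
      The resulting lock selection is essentially the following (a sketch of
      the helper this series adds; exact placement may differ):

      	static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
      					struct mm_struct *mm, pte_t *pte)
      	{
      		if (huge_page_size(h) == PMD_SIZE)
      			return pmd_lockptr(mm, (pmd_t *) pte);	/* split lock */
      		VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
      		return &mm->page_table_lock;	/* PUD level and larger */
      	}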
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb900f41
    • mm, thp: change pmd_trans_huge_lock() to return taken lock · bf929152
      Authored by Kirill A. Shutemov
      With split ptlock it's important to know which lock
      pmd_trans_huge_lock() took.  This patch adds one more parameter to the
      function to return the lock.
      
      In most places migration to the new API is trivial.  The exception is
      move_huge_pmd(): we need to take two locks if the pmd tables are different.
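
      Callers then follow this pattern (an illustrative sketch of the new
      calling convention, not taken from a specific call site):

      	spinlock_t *ptl;

      	if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
      		/* pmd is huge and stable; ptl is the lock that was taken */
      		spin_unlock(ptl);
      	}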
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bf929152
    • mm: convert mm->nr_ptes to atomic_long_t · e1f56c89
      Authored by Kirill A. Shutemov
      With split page table lock for PMD level we can't hold mm->page_table_lock
      while updating nr_ptes.
      
      Let's convert it to atomic_long_t to avoid races.
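
      The shape of the conversion (illustrative, not the full diff):

      	/* in struct mm_struct */
      	atomic_long_t nr_ptes;			/* was: unsigned long nr_ptes */

      	/* updates no longer need mm->page_table_lock */
      	atomic_long_inc(&mm->nr_ptes);		/* page table allocated */
      	atomic_long_dec(&mm->nr_ptes);		/* page table freed */
      	atomic_long_read(&mm->nr_ptes);		/* readers, e.g. check_mm() */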
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e1f56c89
  9. 13 Nov 2013, 6 commits
  10. 06 Nov 2013, 1 commit
  11. 25 Oct 2013, 1 commit
  12. 17 Oct 2013, 3 commits
  13. 10 Oct 2013, 1 commit
    • of: remove HAVE_ARCH_DEVTREE_FIXUPS · 32df8dca
      Authored by Rob Herring
      HAVE_ARCH_DEVTREE_FIXUPS appears to always be needed except for sparc,
      but it is only used for /proc/device-tree and sparc does not enable
      /proc/device-tree. So this option is redundant. Remove the option and
      always enable it. This has the side effect of fixing /proc/device-tree
      on arches such as arm64 which failed to define this option.
      Signed-off-by: Rob Herring <rob.herring@calxeda.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Acked-by: Grant Likely <grant.likely@linaro.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      32df8dca
  14. 09 Oct 2013, 1 commit
  15. 13 Sep 2013, 1 commit
  16. 12 Sep 2013, 6 commits
    • vmcore: enable /proc/vmcore mmap for s390 · 11e376a3
      Authored by Michael Holzheu
      The patch "s390/vmcore: Implement remap_oldmem_pfn_range for s390" allows
      now to use mmap also on s390.
      
      So enable mmap for s390 again.
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Jan Willeke <willeke@de.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      11e376a3
    • vmcore: introduce remap_oldmem_pfn_range() · 9cb21813
      Authored by Michael Holzheu
      For zfcpdump we can't map the HSA storage because it is only available via
      a read interface.  Therefore, for the new vmcore mmap feature we have to
      introduce a new mechanism to create mappings on demand.
      
      This patch introduces a new architecture function remap_oldmem_pfn_range()
      that should be used to create mappings with remap_pfn_range() for oldmem
      areas that can be directly mapped.  For zfcpdump this is everything
      apart from the HSA memory.  For the areas that are not mapped by
      remap_oldmem_pfn_range(), a new generic vmcore fault handler,
      mmap_vmcore_fault(), is called (a sketch follows the step list below).
      
      This handler works as follows:
      
      * Get already available or new page from page cache (find_or_create_page)
      * Check if /proc/vmcore page is filled with data (PageUptodate)
      * If yes:
        Return that page
      * If no:
        Fill page using __vmcore_read(), set PageUptodate, and return page
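
      A simplified sketch of that fault handler (error paths trimmed; the
      read helper name follows the description above and the exact signature
      may differ):

      	static int mmap_vmcore_fault(struct vm_area_struct *vma,
      				     struct vm_fault *vmf)
      	{
      		struct address_space *mapping = vma->vm_file->f_mapping;
      		pgoff_t index = vmf->pgoff;
      		struct page *page;

      		page = find_or_create_page(mapping, index, GFP_KERNEL);
      		if (!page)
      			return VM_FAULT_OOM;
      		if (!PageUptodate(page)) {
      			loff_t src = (loff_t)index << PAGE_CACHE_SHIFT;
      			char *buf = __va(page_to_pfn(page) << PAGE_SHIFT);

      			if (__vmcore_read(buf, PAGE_SIZE, &src, 0) < 0) {
      				unlock_page(page);
      				page_cache_release(page);
      				return VM_FAULT_SIGBUS;
      			}
      			SetPageUptodate(page);
      		}
      		unlock_page(page);
      		vmf->page = page;
      		return 0;
      	}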
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Jan Willeke <willeke@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9cb21813
    • vmcore: introduce ELF header in new memory feature · be8a8d06
      Authored by Michael Holzheu
      For s390 we want to use /proc/vmcore for our SCSI stand-alone dump
      (zfcpdump).  We have support where the first HSA_SIZE bytes are saved into
      a hypervisor owned memory area (HSA) before the kdump kernel is booted.
      When the kdump kernel starts, it is restricted to use only HSA_SIZE bytes.
      
      The advantages of this mechanism are:
      
       * No crashkernel memory has to be defined in the old kernel.
       * Early boot problems (before kexec_load has been done) can be dumped.
       * Non-Linux systems can be dumped.
      
      We modify the s390 copy_oldmem_page() function to read from the HSA memory
      if memory below HSA_SIZE bytes is requested.
      
      Since we cannot use the kexec tool to load the kernel in this scenario,
      we have to build the ELF header in the 2nd (kdump/new) kernel.
      
      So with the following patch set we would like to introduce a new
      mechanism so that the ELF header for /proc/vmcore can be created in
      2nd (kdump) kernel memory.
      
      The following steps are done during zfcpdump execution:
      
      1.  Production system crashes
      2.  User boots a SCSI disk that has been prepared with the zfcpdump tool
      3.  Hypervisor saves CPU state of boot CPU and HSA_SIZE bytes of memory into HSA
      4.  Boot loader loads kernel into low memory area
      5.  Kernel boots and uses only HSA_SIZE bytes of memory
      6.  Kernel saves registers of non-boot CPUs
      7.  Kernel does memory detection for dump memory map
      8.  Kernel creates ELF header for /proc/vmcore
      9.  /proc/vmcore uses this header for initialization
      10. The zfcpdump user space reads /proc/vmcore to write dump to SCSI disk
          - copy_oldmem_page() copies from HSA for memory below HSA_SIZE
          - copy_oldmem_page() copies from real memory for memory above HSA_SIZE
      
      Currently for s390 we create the ELF core header in the 2nd kernel with a
      small trick.  We relocate the addresses in the ELF header in a way that
      for the /proc/vmcore code it seems to be in the 1st kernel (old) memory
      and the read_from_oldmem() returns the correct data.  This allows the
      /proc/vmcore code to use the ELF header in the 2nd kernel.
      
      This patch:
      
      Exchange the old mechanism with the new and much cleaner function call
      override feature that now officially allows creating the ELF core header
      in the 2nd kernel.
      
      To use the new feature, the following functions have to be defined by
      the architecture backend code to read from new memory (a sketch of the
      weak defaults follows the list):
      
       * elfcorehdr_alloc: Allocate ELF header
       * elfcorehdr_free: Free the memory of the ELF header
       * elfcorehdr_read: Read from ELF header
       * elfcorehdr_read_notes: Read from ELF notes
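
      The generic code provides weak defaults that keep the old behaviour,
      which an architecture can override, roughly like this (sketch; the
      s390 variant simply copies from the header built in 2nd kernel memory):

      	/* generic (weak) default: keep reading from oldmem */
      	ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
      	{
      		return read_from_oldmem(buf, count, ppos, 0);
      	}

      	/* architecture override, e.g. on s390 */
      	ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
      	{
      		void *src = (void *)(unsigned long)*ppos;

      		memcpy(buf, src, count);	/* header is in new memory */
      		*ppos += count;
      		return count;
      	}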
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Jan Willeke <willeke@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      be8a8d06
    • proc: make proc_fd_permission() thread-friendly · 96d0df79
      Authored by Oleg Nesterov
      proc_fd_permission() says "process can still access /proc/self/fd after it
      has executed a setuid()", but the "task_pid() == proc_pid()" check only
      helps if the task is the group leader, since /proc/self points to
      /proc/<leader-pid>.
      
      Change this check to use task_tgid() so that the whole thread group can
      access its /proc/self/fd or /proc/<tid-of-sub-thread>/fd.
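
      The resulting check is essentially the following (sketch of the updated
      helper):

      	static int proc_fd_permission(struct inode *inode, int mask)
      	{
      		int rv = generic_permission(inode, mask);

      		if (rv == 0)
      			return 0;
      		if (task_tgid(current) == proc_pid(inode))	/* was task_pid() */
      			rv = 0;
      		return rv;
      	}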
      
      Notes:
      	- CLONE_THREAD does not require CLONE_FILES so task->files
      	  can differ, but I don't think this can lead to any security
      	  problem. And this matches same_thread_group() in
      	  __ptrace_may_access().
      
      	- /proc/self should probably point to /proc/<thread-tid>, but
      	  it is too late to change the rules. Perhaps it makes sense
      	  to add /proc/thread though.
      
      Test-case:
      
      	#include <assert.h>
      	#include <dirent.h>
      	#include <pthread.h>
      	#include <stddef.h>

      	void *tfunc(void *arg)
      	{
      		assert(opendir("/proc/self/fd"));
      		return NULL;
      	}
      
      	int main(void)
      	{
      		pthread_t t;
      		pthread_create(&t, NULL, tfunc, NULL);
      		pthread_join(t, NULL);
      		return 0;
      	}
      
      fails if, say, this executable is not readable and suid_dumpable = 0.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96d0df79
    • fs/proc/task_mmu.c: check the return value of mpol_to_str() · a3c03992
      Authored by Chen Gang
      mpol_to_str() may fail and not fill the buffer (e.g. return -EINVAL), so
      we need to check its return value; otherwise the buffer may not be
      zero-terminated and the next seq_printf() will cause an issue.
      
      The failure return needs to come after mpol_cond_put() to match
      get_vma_policy().
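
      The fix boils down to roughly this (simplified sketch of the
      show_numa_map() path):

      	n = mpol_to_str(buffer, sizeof(buffer), pol);
      	mpol_cond_put(pol);		/* matches get_vma_policy() */
      	if (n < 0)
      		return n;		/* don't print an unfilled buffer */

      	seq_printf(m, "%08lx %s", vma->vm_start, buffer);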
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a3c03992
    • mm: track vma changes with VM_SOFTDIRTY bit · d9104d1c
      Authored by Cyrill Gorcunov
      Pavel reported that in case a vma area gets unmapped and then mapped (or
      expanded) in-place, the soft dirty tracker won't be able to recognize this
      situation since it works on the pte level and ptes get zapped on unmap,
      losing the soft dirty bit of course.
      
      So to resolve this situation we need to track actions on the vma level,
      which is where the VM_SOFTDIRTY flag comes in.  When a new vma area is
      created (or an old one is expanded) we set this bit, and keep it there
      until the application asks for the soft dirty bit to be cleared.
      
      Thus when a user space application tracks memory changes it can now
      detect whether a vma area has been renewed.
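
      Illustrative sketch of the vma-level tracking (the helper name below is
      made up for illustration; the real reporting code open-codes the check):

      	/* set when a vma is created or expanded, e.g. in mmap_region() */
      	vma->vm_flags |= VM_SOFTDIRTY;

      	/* cleared when the user writes to clear_refs */
      	vma->vm_flags &= ~VM_SOFTDIRTY;

      	/* a page is reported soft-dirty if either level says so */
      	static inline int page_is_soft_dirty(struct vm_area_struct *vma, pte_t pte)
      	{
      		return pte_soft_dirty(pte) || (vma->vm_flags & VM_SOFTDIRTY);
      	}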
      Reported-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Rob Landley <rob@landley.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d9104d1c