1. 24 10月, 2012 2 次提交
    • V
      perf/x86: Fix P6 FP_ASSIST event constraint · 7991c9ca
      Vince Weaver 提交于
      According to Intel SDM Volume 3B, FP_ASSIST is limited to Counter 1 only,
      not Counter 0.
      
      Tested on a Pentium II.
      Signed-off-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1210191728570.14552@vincent-weaver-1.um.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7991c9ca
    • A
      x86/perf: Fix virtualization sanity check · bffd5fc2
      Andre Przywara 提交于
      In check_hw_exists() we try to detect non-emulated MSR accesses
      by writing an arbitrary value into one of the PMU registers
      and check if it's value after a readout is still the same.
      This algorithm silently assumes that the register does not contain
      the magic value already, which is wrong in at least one situation.
      
      Fix the algorithm to really do a read-modify-write cycle. This fixes
      a warning under Xen under some circumstances on AMD family 10h CPUs.
      
      The reasons in more details actually sound like a story from
      Believe It or Not!:
      
      First you need an AMD family 10h/12h CPU. These do not reset the
      PERF_CTR registers on a reboot.
      Now you boot bare metal Linux, which goes successfully through this
      check, but leaves the magic value of 0xabcd in the register. You
      don't use the performance counters, but do a reboot (warm reset).
      Then you choose to boot Xen. The check will be triggered with a
      recent Linux kernel as Dom0 again, trying to write 0xabcd into the
      MSR. Xen silently drops the write (expected), but the subsequent read
      will return the value in the register, which just happens to be the
      expected magic value. Thus the test misleadingly succeeds, leaving
      the kernel in the belief that the PMU is available. This will trigger
      the following message:
      
      [    0.020294] ------------[ cut here ]------------
      [    0.020311] WARNING: at arch/x86/xen/enlighten.c:730 xen_apic_write+0x15/0x17()
      [    0.020318] Hardware name: empty
      [    0.020323] Modules linked in:
      [    0.020334] Pid: 1, comm: swapper/0 Not tainted 3.3.8 #7
      [    0.020340] Call Trace:
      [    0.020354]  [<ffffffff81050379>] warn_slowpath_common+0x80/0x98
      [    0.020369]  [<ffffffff810503a6>] warn_slowpath_null+0x15/0x17
      [    0.020378]  [<ffffffff810034df>] xen_apic_write+0x15/0x17
      [    0.020392]  [<ffffffff8101cb2b>] perf_events_lapic_init+0x2e/0x30
      [    0.020410]  [<ffffffff81ee4dd0>] init_hw_perf_events+0x250/0x407
      [    0.020419]  [<ffffffff81ee4b80>] ? check_bugs+0x2d/0x2d
      [    0.020430]  [<ffffffff81002181>] do_one_initcall+0x7a/0x131
      [    0.020444]  [<ffffffff81edbbf9>] kernel_init+0x91/0x15d
      [    0.020456]  [<ffffffff817caaa4>] kernel_thread_helper+0x4/0x10
      [    0.020471]  [<ffffffff817c347c>] ? retint_restore_args+0x5/0x6
      [    0.020481]  [<ffffffff817caaa0>] ? gs_change+0x13/0x13
      [    0.020500] ---[ end trace a7919e7f17c0a725 ]---
      
      The new code will change every of the 16 low bits read from the
      register and tries to write and read-back that modified number
      from the MSR.
      Signed-off-by: NAndre Przywara <andre.przywara@amd.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Avi Kivity <avi@redhat.com>
      Link: http://lkml.kernel.org/r/1349797115-28346-2-git-send-email-andre.przywara@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bffd5fc2
  2. 23 10月, 2012 3 次提交
    • S
      KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT · c5e015d4
      Sasha Levin 提交于
      KVM_PV_REASON_PAGE_NOT_PRESENT kicks cpu out of idleness, but we haven't
      marked that spot as an exit from idleness.
      
      Not doing so can cause RCU warnings such as:
      
      [  732.788386] ===============================
      [  732.789803] [ INFO: suspicious RCU usage. ]
      [  732.790032] 3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63 Tainted: G        W
      [  732.790032] -------------------------------
      [  732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
      [  732.790032]
      [  732.790032] other info that might help us debug this:
      [  732.790032]
      [  732.790032]
      [  732.790032] RCU used illegally from idle CPU!
      [  732.790032] rcu_scheduler_active = 1, debug_locks = 1
      [  732.790032] RCU used illegally from extended quiescent state!
      [  732.790032] 2 locks held by trinity-child31/8252:
      [  732.790032]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff83a67528>] __schedule+0x178/0x8f0
      [  732.790032]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81152bde>] cpuacct_charge+0xe/0x200
      [  732.790032]
      [  732.790032] stack backtrace:
      [  732.790032] Pid: 8252, comm: trinity-child31 Tainted: G        W    3.7.0-rc1-next-20121019-sasha-00002-g6d8d02d-dirty #63
      [  732.790032] Call Trace:
      [  732.790032]  [<ffffffff8118266b>] lockdep_rcu_suspicious+0x10b/0x120
      [  732.790032]  [<ffffffff81152c60>] cpuacct_charge+0x90/0x200
      [  732.790032]  [<ffffffff81152bde>] ? cpuacct_charge+0xe/0x200
      [  732.790032]  [<ffffffff81158093>] update_curr+0x1a3/0x270
      [  732.790032]  [<ffffffff81158a6a>] dequeue_entity+0x2a/0x210
      [  732.790032]  [<ffffffff81158ea5>] dequeue_task_fair+0x45/0x130
      [  732.790032]  [<ffffffff8114ae29>] dequeue_task+0x89/0xa0
      [  732.790032]  [<ffffffff8114bb9e>] deactivate_task+0x1e/0x20
      [  732.790032]  [<ffffffff83a67c29>] __schedule+0x879/0x8f0
      [  732.790032]  [<ffffffff8117e20d>] ? trace_hardirqs_off+0xd/0x10
      [  732.790032]  [<ffffffff810a37a5>] ? kvm_async_pf_task_wait+0x1d5/0x2b0
      [  732.790032]  [<ffffffff83a67cf5>] schedule+0x55/0x60
      [  732.790032]  [<ffffffff810a37c4>] kvm_async_pf_task_wait+0x1f4/0x2b0
      [  732.790032]  [<ffffffff81139e50>] ? abort_exclusive_wait+0xb0/0xb0
      [  732.790032]  [<ffffffff81139c25>] ? prepare_to_wait+0x25/0x90
      [  732.790032]  [<ffffffff810a3a66>] do_async_page_fault+0x56/0xa0
      [  732.790032]  [<ffffffff83a6a6e8>] async_page_fault+0x28/0x30
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Acked-by: NGleb Natapov <gleb@redhat.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      c5e015d4
    • G
      7f46ddbd
    • X
      KVM: MMU: fix release noslot pfn · f3ac1a4b
      Xiao Guangrong 提交于
      We can not directly call kvm_release_pfn_clean to release the pfn
      since we can meet noslot pfn which is used to cache mmio info into
      spte
      Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      f3ac1a4b
  3. 20 10月, 2012 5 次提交
  4. 19 10月, 2012 2 次提交
  5. 18 10月, 2012 2 次提交
  6. 16 10月, 2012 2 次提交
  7. 15 10月, 2012 2 次提交
  8. 13 10月, 2012 1 次提交
  9. 12 10月, 2012 3 次提交
    • K
      xen/bootup: allow {read|write}_cr8 pvops call. · 1a7bbda5
      Konrad Rzeszutek Wilk 提交于
      We actually do not do anything about it. Just return a default
      value of zero and if the kernel tries to write anything but 0
      we BUG_ON.
      
      This fixes the case when an user tries to suspend the machine
      and it blows up in save_processor_state b/c 'read_cr8' is set
      to NULL and we get:
      
      kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
      invalid opcode: 0000 [#1] SMP
      Pid: 2687, comm: init.late Tainted: G           O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
      RIP: e030:[<ffffffff814d5f42>]  [<ffffffff814d5f42>] save_processor_state+0x212/0x270
      
      .. snip..
      Call Trace:
       [<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac
       [<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150
       [<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      1a7bbda5
    • K
      xen/bootup: allow read_tscp call for Xen PV guests. · cd0608e7
      Konrad Rzeszutek Wilk 提交于
      The hypervisor will trap it. However without this patch,
      we would crash as the .read_tscp is set to NULL. This patch
      fixes it and sets it to the native_read_tscp call.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      cd0608e7
    • J
      kgdb,x86: fix warning about unused variable · 42c12213
      Jason Wessel 提交于
      When compiling without CONFIG_DEBUG_RODATA the following
      compiler warning is generated:
      
      arch/x86/kernel/kgdb.c: In function 'kgdb_arch_set_breakpoint':
      arch/x86/kernel/kgdb.c:749: warning: unused variable 'opc'
      
      The variable instantiation needs to be inside the #ifdef to
      make the warning go away.
      Reported-by: NThiago Rafael Becker <trbecker@trbecker.org>
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      42c12213
  10. 10 10月, 2012 4 次提交
  11. 09 10月, 2012 14 次提交
    • D
      mm: Add and use update_mmu_cache_pmd() in transparent huge page code. · b113da65
      David Miller 提交于
      The transparent huge page code passes a PMD pointer in as the third
      argument of update_mmu_cache(), which expects a PTE pointer.
      
      This never got noticed because X86 implements update_mmu_cache() as a
      macro and thus we don't get any type checking, and X86 is the only
      architecture which supports transparent huge pages currently.
      
      Before other architectures can support transparent huge pages properly we
      need to add a new interface which will take a PMD pointer as the third
      argument rather than a PTE pointer.
      
      [akpm@linux-foundation.org: implement update_mm_cache_pmd() for s390]
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b113da65
    • A
      mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP · 027ef6c8
      Andrea Arcangeli 提交于
      In many places !pmd_present has been converted to pmd_none.  For pmds
      that's equivalent and pmd_none is quicker so using pmd_none is better.
      
      However (unless we delete pmd_present) we should provide an accurate
      pmd_present too.  This will avoid the risk of code thinking the pmd is non
      present because it's under __split_huge_page_map, see the pmd_mknotpresent
      there and the comment above it.
      
      If the page has been mprotected as PROT_NONE, it would also lead to a
      pmd_present false negative in the same way as the race with
      split_huge_page.
      
      Because the PSE bit stays on at all times (both during split_huge_page and
      when the _PAGE_PROTNONE bit get set), we could only check for the PSE bit,
      but checking the PROTNONE bit too is still good to remember pmd_present
      must always keep PROT_NONE into account.
      
      This explains a not reproducible BUG_ON that was seldom reported on the
      lists.
      
      The same issue is in pmd_large, it would go wrong with both PROT_NONE and
      if it races with split_huge_page.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      027ef6c8
    • S
      readahead: fault retry breaks mmap file read random detection · 45cac65b
      Shaohua Li 提交于
      .fault now can retry.  The retry can break state machine of .fault.  In
      filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
      try, since the page is in page cache now, ra->mmap_miss is decreased.  And
      these are done in one fault, so we can't detect random mmap file access.
      
      Add a new flag to indicate .fault is tried once.  In the second try, skip
      ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.
      
      I only tested x86, didn't test other archs, but looks the change for other
      archs is obvious, but who knows :)
      Signed-off-by: NShaohua Li <shaohua.li@fusionio.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      45cac65b
    • S
      atomic: implement generic atomic_dec_if_positive() · e79bee24
      Shaohua Li 提交于
      The x86 implementation of atomic_dec_if_positive is quite generic, so make
      it available to all architectures.
      
      This is needed for "swap: add a simple detector for inappropriate swapin
      readahead".
      
      [akpm@linux-foundation.org: do the "#define foo foo" trick in the conventional manner]
      Signed-off-by: NShaohua Li <shli@fusionio.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e79bee24
    • M
      rbtree: move augmented rbtree functionality to rbtree_augmented.h · 9c079add
      Michel Lespinasse 提交于
      Provide rb_insert_augmented() and rb_erase_augmented() through a new
      rbtree_augmented.h include file.  rb_erase_augmented() is defined there as
      an __always_inline function, in order to allow inlining of augmented
      rbtree callbacks into it.  Since this generates a relatively large
      function, each augmented rbtree user should make sure to have a single
      call site.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c079add
    • M
      mm: replace vma prio_tree with an interval tree · 6b2dbba8
      Michel Lespinasse 提交于
      Implement an interval tree as a replacement for the VMA prio_tree.  The
      algorithms are similar to lib/interval_tree.c; however that code can't be
      directly reused as the interval endpoints are not explicitly stored in the
      VMA.  So instead, the common algorithm is moved into a template and the
      details (node type, how to get interval endpoints from the node, etc) are
      filled in using the C preprocessor.
      
      Once the interval tree functions are available, using them as a
      replacement to the VMA prio tree is a relatively simple, mechanical job.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b2dbba8
    • M
      rbtree: add RB_DECLARE_CALLBACKS() macro · 3908836a
      Michel Lespinasse 提交于
      As proposed by Peter Zijlstra, this makes it easier to define the augmented
      rbtree callbacks.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3908836a
    • M
      rbtree: remove prior augmented rbtree implementation · 9d9e6f97
      Michel Lespinasse 提交于
      convert arch/x86/mm/pat_rbtree.c to the proposed augmented rbtree api
      and remove the old augmented rbtree implementation.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d9e6f97
    • G
      thp, x86: introduce HAVE_ARCH_TRANSPARENT_HUGEPAGE · 15626062
      Gerald Schaefer 提交于
      Cleanup patch in preparation for transparent hugepage support on s390.
      Adding new architectures to the TRANSPARENT_HUGEPAGE config option can
      make the "depends" line rather ugly, like "depends on (X86 || (S390 &&
      64BIT)) && MMU".
      
      This patch adds a HAVE_ARCH_TRANSPARENT_HUGEPAGE instead.  x86 already has
      MMU "def_bool y", so the MMU check is superfluous there and
      HAVE_ARCH_TRANSPARENT_HUGEPAGE can be selected in arch/x86/Kconfig.
      Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      15626062
    • W
      mm: hugetlb: add arch hook for clearing page flags before entering pool · 5d3a551c
      Will Deacon 提交于
      The core page allocator ensures that page flags are zeroed when freeing
      pages via free_pages_check.  A number of architectures (ARM, PPC, MIPS)
      rely on this property to treat new pages as dirty with respect to the data
      cache and perform the appropriate flushing before mapping the pages into
      userspace.
      
      This can lead to cache synchronisation problems when using hugepages,
      since the allocator keeps its own pool of pages above the usual page
      allocator and does not reset the page flags when freeing a page into the
      pool.
      
      This patch adds a new architecture hook, arch_clear_hugepage_flags, so
      that architectures which rely on the page flags being in a particular
      state for fresh allocations can adjust the flags accordingly when a page
      is freed into the pool.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5d3a551c
    • K
      mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9
      Konstantin Khlebnikov 提交于
      A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
      currently it lost original meaning but still has some effects:
      
       | effect                 | alternative flags
      -+------------------------+---------------------------------------------
      1| account as reserved_vm | VM_IO
      2| skip in core dump      | VM_IO, VM_DONTDUMP
      3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      
      This patch removes reserved_vm counter from mm_struct.  Seems like nobody
      cares about it, it does not exported into userspace directly, it only
      reduces total_vm showed in proc.
      
      Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
      
      remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
      remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
      
      [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      314e51b9
    • K
      mm, x86, pat: rework linear pfn-mmap tracking · b3b9c293
      Konstantin Khlebnikov 提交于
      Replace the generic vma-flag VM_PFN_AT_MMAP with x86-only VM_PAT.
      
      We can toss mapping address from remap_pfn_range() into
      track_pfn_vma_new(), and collect all PAT-related logic together in
      arch/x86/.
      
      This patch also restores orignal frustration-free is_cow_mapping() check
      in remap_pfn_range(), as it was before commit v2.6.28-rc8-88-g3c8bb73a
      ("x86: PAT: store vm_pgoff for all linear_over_vma_region mappings - v3")
      
      is_linear_pfn_mapping() checks can be removed from mm/huge_memory.c,
      because it already handled by VM_PFNMAP in VM_NO_THP bit-mask.
      
      [suresh.b.siddha@intel.com: Reset the VM_PAT flag as part of untrack_pfn_vma()]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3b9c293
    • S
      x86, pat: separate the pfn attribute tracking for remap_pfn_range and vm_insert_pfn · 5180da41
      Suresh Siddha 提交于
      With PAT enabled, vm_insert_pfn() looks up the existing pfn memory
      attribute and uses it.  Expectation is that the driver reserves the
      memory attributes for the pfn before calling vm_insert_pfn().
      
      remap_pfn_range() (when called for the whole vma) will setup a new
      attribute (based on the prot argument) for the specified pfn range.
      This addresses the legacy usage which typically calls remap_pfn_range()
      with a desired memory attribute.  For ranges smaller than the vma size
      (which is typically not the case), remap_pfn_range() will use the
      existing memory attribute for the pfn range.
      
      Expose two different API's for these different behaviors.
      track_pfn_insert() for tracking the pfn attribute set by vm_insert_pfn()
      and track_pfn_remap() for the remap_pfn_range().
      
      This cleanup also prepares the ground for the track/untrack pfn vma
      routines to take over the ownership of setting PAT specific vm_flag in
      the 'vma'.
      
      [khlebnikov@openvz.org: Clear checks in track_pfn_remap()]
      [akpm@linux-foundation.org: tweak a few comments]
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5180da41
    • S
      x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines · b1a86e15
      Suresh Siddha 提交于
      'pfn' argument for track_pfn_vma_new() can be used for reserving the
      attribute for the pfn range.  No need to depend on 'vm_pgoff'
      
      Similarly, untrack_pfn_vma() can depend on the 'pfn' argument if it is
      non-zero or can use follow_phys() to get the starting value of the pfn
      range.
      
      Also the non zero 'size' argument can be used instead of recomputing it
      from vma.
      
      This cleanup also prepares the ground for the track/untrack pfn vma
      routines to take over the ownership of setting PAT specific vm_flag in the
      'vma'.
      
      [khlebnikov@openvz.org: Clear pfn to paddr conversion]
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b1a86e15