1. 17 7月, 2018 1 次提交
    • R
      x86/mm/tlb: Leave lazy TLB mode at page table free time · 2ff6ddf1
      Rik van Riel 提交于
      Andy discovered that speculative memory accesses while in lazy
      TLB mode can crash a system, when a CPU tries to dereference a
      speculative access using memory contents that used to be valid
      page table memory, but have since been reused for something else
      and point into la-la land.
      
      The latter problem can be prevented in two ways. The first is to
      always send a TLB shootdown IPI to CPUs in lazy TLB mode, while
      the second one is to only send the TLB shootdown at page table
      freeing time.
      
      The second should result in fewer IPIs, since operationgs like
      mprotect and madvise are very common with some workloads, but
      do not involve page table freeing. Also, on munmap, batching
      of page table freeing covers much larger ranges of virtual
      memory than the batching of unmapped user pages.
      Tested-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NRik van Riel <riel@surriel.com>
      Acked-by: NDave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: efault@gmx.de
      Cc: kernel-team@fb.com
      Cc: luto@kernel.org
      Link: http://lkml.kernel.org/r/20180716190337.26133-3-riel@surriel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2ff6ddf1
  2. 15 7月, 2018 1 次提交
  3. 11 8月, 2017 2 次提交
    • M
      mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem · 99baac21
      Minchan Kim 提交于
      Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
      problem and Mel fixed it[1] and found same problem on MADV_FREE[2].
      
      Quote from Mel Gorman:
       "The race in question is CPU 0 running madv_free and updating some PTEs
        while CPU 1 is also running madv_free and looking at the same PTEs.
        CPU 1 may have writable TLB entries for a page but fail the pte_dirty
        check (because CPU 0 has updated it already) and potentially fail to
        flush.
      
        Hence, when madv_free on CPU 1 returns, there are still potentially
        writable TLB entries and the underlying PTE is still present so that a
        subsequent write does not necessarily propagate the dirty bit to the
        underlying PTE any more. Reclaim at some unknown time at the future
        may then see that the PTE is still clean and discard the page even
        though a write has happened in the meantime. I think this is possible
        but I could have missed some protection in madv_free that prevents it
        happening."
      
      This patch aims for solving both problems all at once and is ready for
      other problem with KSM, MADV_FREE and soft-dirty story[3].
      
      TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending
      and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can
      catch there are parallel threads going on.  In that case, forcefully,
      flush TLB to prevent for user to access memory via stale TLB entry
      although it fail to gather page table entry.
      
      I confirmed this patch works with [4] test program Nadav gave so this
      patch supersedes "mm: Always flush VMA ranges affected by zap_page_range
      v2" in current mmotm.
      
      NOTE:
      
      This patch modifies arch-specific TLB gathering interface(x86, ia64,
      s390, sh, um).  It seems most of architecture are straightforward but
      s390 need to be careful because tlb_flush_mmu works only if
      mm->context.flush_mm is set to non-zero which happens only a pte entry
      really is cleared by ptep_get_and_clear and friends.  However, this
      problem never changes the pte entries but need to flush to prevent
      memory access from stale tlb.
      
      [1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
      [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
      [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
      [4] https://patchwork.kernel.org/patch/9861621/
      
      [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
        Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
      Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.comSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NNadav Amit <namit@vmware.com>
      Reported-by: NNadav Amit <namit@vmware.com>
      Reported-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99baac21
    • M
      mm: refactor TLB gathering API · 56236a59
      Minchan Kim 提交于
      This patch is a preparatory patch for solving race problems caused by
      TLB batch.  For that, we will increase/decrease TLB flush pending count
      of mm_struct whenever tlb_[gather|finish]_mmu is called.
      
      Before making it simple, this patch separates architecture specific part
      and rename it to arch_tlb_[gather|finish]_mmu and generic part just
      calls it.
      
      It shouldn't change any behavior.
      
      Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.comSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NNadav Amit <namit@vmware.com>
      Acked-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      56236a59
  4. 10 3月, 2017 1 次提交
  5. 25 2月, 2017 1 次提交
  6. 13 12月, 2016 4 次提交
  7. 27 7月, 2016 2 次提交
  8. 23 11月, 2015 1 次提交
    • P
      treewide: Remove old email address · 90eec103
      Peter Zijlstra 提交于
      There were still a number of references to my old Red Hat email
      address in the kernel source. Remove these while keeping the
      Red Hat copyright notices intact.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      90eec103
  9. 13 1月, 2015 1 次提交
    • W
      mm: mmu_gather: use tlb->end != 0 only for TLB invalidation · 721c21c1
      Will Deacon 提交于
      When batching up address ranges for TLB invalidation, we check tlb->end
      != 0 to indicate that some pages have actually been unmapped.
      
      As of commit f045bbb9 ("mmu_gather: fix over-eager
      tlb_flush_mmu_free() calling"), we use the same check for freeing these
      pages in order to avoid a performance regression where we call
      free_pages_and_swap_cache even when no pages are actually queued up.
      
      Unfortunately, the range could have been reset (tlb->end = 0) by
      tlb_end_vma, which has been shown to cause memory leaks on arm64.
      Furthermore, investigation into these leaks revealed that the fullmm
      case on task exit no longer invalidates the TLB, by virtue of tlb->end
       == 0 (in 3.18, need_flush would have been set).
      
      This patch resolves the problem by reverting commit f045bbb9, using
      instead tlb->local.nr as the predicate for page freeing in
      tlb_flush_mmu_free and ensuring that tlb->end is initialised to a
      non-zero value in the fullmm case.
      Tested-by: NMark Langsdorf <mlangsdo@redhat.com>
      Tested-by: NDave Hansen <dave@sr71.net>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      721c21c1
  10. 17 11月, 2014 1 次提交
    • W
      mmu_gather: move minimal range calculations into generic code · fb7332a9
      Will Deacon 提交于
      On architectures with hardware broadcasting of TLB invalidation messages
      , it makes sense to reduce the range of the mmu_gather structure when
      unmapping page ranges based on the dirty address information passed to
      tlb_remove_tlb_entry.
      
      arm64 already does this by directly manipulating the start/end fields
      of the gather structure, but this confuses the generic code which
      does not expect these fields to change and can end up calculating
      invalid, negative ranges when forcing a flush in zap_pte_range.
      
      This patch moves the minimal range calculation out of the arm64 code
      and into the generic implementation, simplifying zap_pte_range in the
      process (which no longer needs to care about start/end, since they will
      point to the appropriate ranges already). With the range being tracked
      by core code, the need_flush flag is dropped in favour of checking that
      the end of the range has actually been set.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      fb7332a9
  11. 16 8月, 2013 1 次提交
    • L
      Fix TLB gather virtual address range invalidation corner cases · 2b047252
      Linus Torvalds 提交于
      Ben Tebulin reported:
      
       "Since v3.7.2 on two independent machines a very specific Git
        repository fails in 9/10 cases on git-fsck due to an SHA1/memory
        failures.  This only occurs on a very specific repository and can be
        reproduced stably on two independent laptops.  Git mailing list ran
        out of ideas and for me this looks like some very exotic kernel issue"
      
      and bisected the failure to the backport of commit 53a59fc6 ("mm:
      limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").
      
      That commit itself is not actually buggy, but what it does is to make it
      much more likely to hit the partial TLB invalidation case, since it
      introduces a new case in tlb_next_batch() that previously only ever
      happened when running out of memory.
      
      The real bug is that the TLB gather virtual memory range setup is subtly
      buggered.  It was introduced in commit 597e1c35 ("mm/mmu_gather:
      enable tlb flush range in generic mmu_gather"), and the range handling
      was already fixed at least once in commit e6c495a9 ("mm: fix the TLB
      range flushed when __tlb_remove_page() runs out of slots"), but that fix
      was not complete.
      
      The problem with the TLB gather virtual address range is that it isn't
      set up by the initial tlb_gather_mmu() initialization (which didn't get
      the TLB range information), but it is set up ad-hoc later by the
      functions that actually flush the TLB.  And so any such case that forgot
      to update the TLB range entries would potentially miss TLB invalidates.
      
      Rather than try to figure out exactly which particular ad-hoc range
      setup was missing (I personally suspect it's the hugetlb case in
      zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
      did), this patch just gets rid of the problem at the source: make the
      TLB range information available to tlb_gather_mmu(), and initialize it
      when initializing all the other tlb gather fields.
      
      This makes the patch larger, but conceptually much simpler.  And the end
      result is much more understandable; even if you want to play games with
      partial ranges when invalidating the TLB contents in chunks, now the
      range information is always there, and anybody who doesn't want to
      bother with it won't introduce subtle bugs.
      
      Ben verified that this fixes his problem.
      Reported-bisected-and-tested-by: NBen Tebulin <tebulin@googlemail.com>
      Build-testing-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Build-testing-by: NRichard Weinberger <richard.weinberger@gmail.com>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2b047252
  12. 06 6月, 2013 1 次提交
    • P
      arch, mm: Remove tlb_fast_mode() · 29eb7782
      Peter Zijlstra 提交于
      Since the introduction of preemptible mmu_gather TLB fast mode has been
      broken. TLB fast mode relies on there being absolutely no concurrency;
      it frees pages first and invalidates TLBs later.
      
      However now we can get concurrency and stuff goes *bang*.
      
      This patch removes all tlb_fast_mode() code; it was found the better
      option vs trying to patch the hole by entangling tlb invalidation with
      the scheduler.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Reported-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29eb7782
  13. 13 4月, 2013 1 次提交
    • D
      x86-32: Fix possible incomplete TLB invalidate with PAE pagetables · 1de14c3c
      Dave Hansen 提交于
      This patch attempts to fix:
      
      	https://bugzilla.kernel.org/show_bug.cgi?id=56461
      
      The symptom is a crash and messages like this:
      
      	chrome: Corrupted page table at address 34a03000
      	*pdpt = 0000000000000000 *pde = 0000000000000000
      	Bad pagetable: 000f [#1] PREEMPT SMP
      
      Ingo guesses this got introduced by commit 611ae8e3 ("x86/tlb:
      enable tlb flush range support for x86") since that code started to free
      unused pagetables.
      
      On x86-32 PAE kernels, that new code has the potential to free an entire
      PMD page and will clear one of the four page-directory-pointer-table
      (aka pgd_t entries).
      
      The hardware aggressively "caches" these top-level entries and invlpg
      does not actually affect the CPU's copy.  If we clear one we *HAVE* to
      do a full TLB flush, otherwise we might continue using a freed pmd page.
      (note, we do this properly on the population side in pud_populate()).
      
      This patch tracks whenever we clear one of these entries in the 'struct
      mmu_gather', and ensures that we follow up with a full tlb flush.
      
      BTW, I disassembled and checked that:
      
      	if (tlb->fullmm == 0)
      and
      	if (!tlb->fullmm && !tlb->need_flush_all)
      
      generate essentially the same code, so there should be zero impact there
      to the !PAE case.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Artem S Tashkinov <t.artem@mailcity.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1de14c3c
  14. 05 1月, 2013 1 次提交
    • M
      mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT · 53a59fc6
      Michal Hocko 提交于
      Since commit e303297e ("mm: extended batches for generic
      mmu_gather") we are batching pages to be freed until either
      tlb_next_batch cannot allocate a new batch or we are done.
      
      This works just fine most of the time but we can get in troubles with
      non-preemptible kernel (CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY)
      on large machines where too aggressive batching might lead to soft
      lockups during process exit path (exit_mmap) because there are no
      scheduling points down the free_pages_and_swap_cache path and so the
      freeing can take long enough to trigger the soft lockup.
      
      The lockup is harmless except when the system is setup to panic on
      softlockup which is not that unusual.
      
      The simplest way to work around this issue is to limit the maximum
      number of batches in a single mmu_gather.  10k of collected pages should
      be safe to prevent from soft lockups (we would have 2ms for one) even if
      they are all freed without an explicit scheduling point.
      
      This patch doesn't add any new explicit scheduling points because it
      relies on zap_pmd_range during page tables zapping which calls
      cond_resched per PMD.
      
      The following lockup has been reported for 3.0 kernel with a huge
      process (in order of hundreds gigs but I do know any more details).
      
        BUG: soft lockup - CPU#56 stuck for 22s! [kernel:31053]
        Modules linked in: af_packet nfs lockd fscache auth_rpcgss nfs_acl sunrpc mptctl mptbase autofs4 binfmt_misc dm_round_robin dm_multipath bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave pcc_cpufreq mperf microcode fuse loop osst sg sd_mod crc_t10dif st qla2xxx scsi_transport_fc scsi_tgt netxen_nic i7core_edac iTCO_wdt joydev e1000e serio_raw pcspkr edac_core iTCO_vendor_support acpi_power_meter rtc_cmos hpwdt hpilo button container usbhid hid dm_mirror dm_region_hash dm_log linear uhci_hcd ehci_hcd usbcore usb_common scsi_dh_emc scsi_dh_alua scsi_dh_hp_sw scsi_dh_rdac scsi_dh dm_snapshot pcnet32 mii edd dm_mod raid1 ext3 mbcache jbd fan thermal processor thermal_sys hwmon cciss scsi_mod
        Supported: Yes
        CPU 56
        Pid: 31053, comm: kernel Not tainted 3.0.31-0.9-default #1 HP ProLiant DL580 G7
        RIP: 0010:  _raw_spin_unlock_irqrestore+0x8/0x10
        RSP: 0018:ffff883ec1037af0  EFLAGS: 00000206
        RAX: 0000000000000e00 RBX: ffffea01a0817e28 RCX: ffff88803ffd9e80
        RDX: 0000000000000200 RSI: 0000000000000206 RDI: 0000000000000206
        RBP: 0000000000000002 R08: 0000000000000001 R09: ffff887ec724a400
        R10: 0000000000000000 R11: dead000000200200 R12: ffffffff8144c26e
        R13: 0000000000000030 R14: 0000000000000297 R15: 000000000000000e
        FS:  00007ed834282700(0000) GS:ffff88c03f200000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 000000000068b240 CR3: 0000003ec13c5000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process kernel (pid: 31053, threadinfo ffff883ec1036000, task ffff883ebd5d4100)
        Call Trace:
          release_pages+0xc5/0x260
          free_pages_and_swap_cache+0x9d/0xc0
          tlb_flush_mmu+0x5c/0x80
          tlb_finish_mmu+0xe/0x50
          exit_mmap+0xbd/0x120
          mmput+0x49/0x120
          exit_mm+0x122/0x160
          do_exit+0x17a/0x430
          do_group_exit+0x3d/0xb0
          get_signal_to_deliver+0x247/0x480
          do_signal+0x71/0x1b0
          do_notify_resume+0x98/0xb0
          int_signal+0x12/0x17
        DWARF2 unwinder stuck at int_signal+0x12/0x17
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Cc: <stable@vger.kernel.org>	[3.0+]
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53a59fc6
  15. 28 6月, 2012 2 次提交
  16. 13 1月, 2012 1 次提交
    • S
      thp: add tlb_remove_pmd_tlb_entry · f21760b1
      Shaohua Li 提交于
      We have tlb_remove_tlb_entry to indicate a pte tlb flush entry should be
      flushed, but not a corresponding API for pmd entry.  This isn't a
      problem so far because THP is only for x86 currently and tlb_flush()
      under x86 will flush entire TLB.  But this is confusion and could be
      missed if thp is ported to other arch.
      
      Also convert tlb->need_flush = 1 to a VM_BUG_ON(!tlb->need_flush) in
      __tlb_remove_page() as suggested by Andrea Arcangeli.  The
      __tlb_remove_page() function is supposed to be called after
      tlb_remove_xxx_tlb_entry() and we can catch any misuse.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f21760b1
  17. 25 5月, 2011 4 次提交
    • P
      mm: uninline large generic tlb.h functions · 9547d01b
      Peter Zijlstra 提交于
      Some of these functions have grown beyond inline sanity, move them
      out-of-line.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Requested-by: NAndrew Morton <akpm@linux-foundation.org>
      Requested-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9547d01b
    • P
      mm: extended batches for generic mmu_gather · e303297e
      Peter Zijlstra 提交于
      Instead of using a single batch (the small on-stack, or an allocated
      page), try and extend the batch every time it runs out and only flush once
      either the extend fails or we're done.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Requested-by: NNick Piggin <npiggin@kernel.dk>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e303297e
    • P
      mm, powerpc: move the RCU page-table freeing into generic code · 26723911
      Peter Zijlstra 提交于
      In case other architectures require RCU freed page-tables to implement
      gup_fast() and software filled hashes and similar things, provide the
      means to do so by moving the logic into generic code.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Requested-by: NDavid Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26723911
    • P
      mm: mmu_gather rework · d16dfc55
      Peter Zijlstra 提交于
      Rework the existing mmu_gather infrastructure.
      
      The direct purpose of these patches was to allow preemptible mmu_gather,
      but even without that I think these patches provide an improvement to the
      status quo.
      
      The first 9 patches rework the mmu_gather infrastructure.  For review
      purpose I've split them into generic and per-arch patches with the last of
      those a generic cleanup.
      
      The next patch provides generic RCU page-table freeing, and the followup
      is a patch converting s390 to use this.  I've also got 4 patches from
      DaveM lined up (not included in this series) that uses this to implement
      gup_fast() for sparc64.
      
      Then there is one patch that extends the generic mmu_gather batching.
      
      After that follow the mm preemptibility patches, these make part of the mm
      a lot more preemptible.  It converts i_mmap_lock and anon_vma->lock to
      mutexes which together with the mmu_gather rework makes mmu_gather
      preemptible as well.
      
      Making i_mmap_lock a mutex also enables a clean-up of the truncate code.
      
      This also allows for preemptible mmu_notifiers, something that XPMEM I
      think wants.
      
      Furthermore, it removes the new and universially detested unmap_mutex.
      
      This patch:
      
      Remove the first obstacle towards a fully preemptible mmu_gather.
      
      The current scheme assumes mmu_gather is always done with preemption
      disabled and uses per-cpu storage for the page batches.  Change this to
      try and allocate a page for batching and in case of failure, use a small
      on-stack array to make some progress.
      
      Preemptible mmu_gather is desired in general and usable once i_mmap_lock
      becomes a mutex.  Doing it before the mutex conversion saves us from
      having to rework the code by moving the mmu_gather bits inside the
      pte_lock.
      
      Also avoid flushing the tlb batches from under the pte lock, this is
      useful even without the i_mmap_lock conversion as it significantly reduces
      pte lock hold times.
      
      [akpm@linux-foundation.org: fix comment tpyo]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Tony Luck <tony.luck@intel.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d16dfc55
  18. 28 7月, 2009 1 次提交
    • B
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() · 9e1b32ca
      Benjamin Herrenschmidt 提交于
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
      
      Upcoming paches to support the new 64-bit "BookE" powerpc architecture
      will need to have the virtual address corresponding to PTE page when
      freeing it, due to the way the HW table walker works.
      
      Basically, the TLB can be loaded with "large" pages that cover the whole
      virtual space (well, sort-of, half of it actually) represented by a PTE
      page, and which contain an "indirect" bit indicating that this TLB entry
      RPN points to an array of PTEs from which the TLB can then create direct
      entries. Thus, in order to invalidate those when PTE pages are deleted,
      we need the virtual address to pass to tlbilx or tlbivax instructions.
      
      The old trick of sticking it somewhere in the PTE page struct page sucks
      too much, the address is almost readily available in all call sites and
      almost everybody implemets these as macros, so we may as well add the
      argument everywhere. I added it to the pmd and pud variants for consistency.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
      Acked-by: NNick Piggin <npiggin@suse.de>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e1b32ca
  19. 04 2月, 2008 1 次提交
  20. 01 2月, 2008 1 次提交
    • I
      asm-generic/tlb.h: build fix · 62152d0e
      Ingo Molnar 提交于
      bring back the avr32, blackfin, sh, sparc architectures into working order,
      by reverting the effects of this change that came in via the x86 tree:
      
         commit a5a19c63
         Author: Jeremy Fitzhardinge <jeremy@goop.org>
         Date:   Wed Jan 30 13:33:39 2008 +0100
      
             x86: demacro asm-x86/pgalloc_32.h
      
      Sorry about that!
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      62152d0e
  21. 30 1月, 2008 1 次提交
  22. 27 12月, 2007 1 次提交
  23. 18 12月, 2007 1 次提交
  24. 04 10月, 2006 1 次提交
  25. 26 4月, 2006 1 次提交
  26. 30 10月, 2005 3 次提交
    • H
      [PATCH] mm: tlb_finish_mmu forget rss · fc2acab3
      Hugh Dickins 提交于
      zap_pte_range has been counting the pages it frees in tlb->freed, then
      tlb_finish_mmu has used that to update the mm's rss.  That got stranger when I
      added anon_rss, yet updated it by a different route; and stranger when rss and
      anon_rss became mm_counters with special access macros.  And it would no
      longer be viable if we're relying on page_table_lock to stabilize the
      mm_counter, but calling tlb_finish_mmu outside that lock.
      
      Remove the mmu_gather's freed field, let tlb_finish_mmu stick to its own
      business, just decrement the rss mm_counter in zap_pte_range (yes, there was
      some point to batching the update, and a subsequent patch restores that).  And
      forget the anal paranoia of first reading the counter to avoid going negative
      - if rss does go negative, just fix that bug.
      
      Remove the mmu_gather's flushes and avoided_flushes from arm and arm26: no use
      was being made of them.  But arm26 alone was actually using the freed, in the
      way some others use need_flush: give it a need_flush.  arm26 seems to prefer
      spaces to tabs here: respect that.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fc2acab3
    • H
      [PATCH] mm: tlb_is_full_mm was obscure · 4d6ddfa9
      Hugh Dickins 提交于
      tlb_is_full_mm?  What does that mean?  The TLB is full?  No, it means that the
      mm's last user has gone and the whole mm is being torn down.  And it's an
      inline function because sparc64 uses a different (slightly better)
      "tlb_frozen" name for the flag others call "fullmm".
      
      And now the ptep_get_and_clear_full macro used in zap_pte_range refers
      directly to tlb->fullmm, which would be wrong for sparc64.  Rather than
      correct that, I'd prefer to scrap tlb_is_full_mm altogether, and change
      sparc64 to just use the same poor name as everyone else - is that okay?
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4d6ddfa9
    • H
      [PATCH] mm: tlb_gather_mmu get_cpu_var · 15a23ffa
      Hugh Dickins 提交于
      tlb_gather_mmu dates from before kernel preemption was allowed, and uses
      smp_processor_id or __get_cpu_var to find its per-cpu mmu_gather.  That works
      because it's currently only called after getting page_table_lock, which is not
      dropped until after the matching tlb_finish_mmu.  But don't rely on that, it
      will soon change: now disable preemption internally by proper get_cpu_var in
      tlb_gather_mmu, put_cpu_var in tlb_finish_mmu.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      15a23ffa
  27. 13 9月, 2005 1 次提交
    • A
      [PATCH] x86-64: Increase TLB flush array size · 2b4a0815
      Andi Kleen 提交于
      The generic TLB flush functions kept upto 506 pages per
      CPU to avoid too frequent IPIs.
      
      This value was done for the L1 cache of older x86 CPUs,
      but with modern CPUs it does not make much sense anymore.
      TLB flushing is slow enough that using the L2 cache is fine.
      
      This patch increases the flush array on x86-64 to cache
      5350 pages. That is roughly 20MB with 4K pages. It speeds
      up large munmaps in multithreaded processes on SMP considerably.
      
      The cost is roughly 42k of memory per CPU, which is reasonable.
      
      I only increased it on x86-64 for now, but it would probably
      make sense to increase it everywhere. Embedded architectures
      with SMP may keep it smaller to save some memory per CPU.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2b4a0815
  28. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4