1. 26 4月, 2014 1 次提交
    • L
      mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts · 1cf35d47
      Linus Torvalds 提交于
      The mmu-gather operation 'tlb_flush_mmu()' has done two things: the
      actual tlb flush operation, and the batched freeing of the pages that
      the TLB entries pointed at.
      
      This splits the operation into separate phases, so that the forced
      batched flushing done by zap_pte_range() can now do the actual TLB flush
      while still holding the page table lock, but delay the batched freeing
      of all the pages to after the lock has been dropped.
      
      This in turn allows us to avoid a race condition between
      set_page_dirty() (as called by zap_pte_range() when it finds a dirty
      shared memory pte) and page_mkclean(): because we now flush all the
      dirty page data from the TLB's while holding the pte lock,
      page_mkclean() will be held up walking the (recently cleaned) page
      tables until after the TLB entries have been flushed from all CPU's.
      Reported-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: NDave Hansen <dave.hansen@intel.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1cf35d47
  2. 16 8月, 2013 1 次提交
    • L
      Fix TLB gather virtual address range invalidation corner cases · 2b047252
      Linus Torvalds 提交于
      Ben Tebulin reported:
      
       "Since v3.7.2 on two independent machines a very specific Git
        repository fails in 9/10 cases on git-fsck due to an SHA1/memory
        failures.  This only occurs on a very specific repository and can be
        reproduced stably on two independent laptops.  Git mailing list ran
        out of ideas and for me this looks like some very exotic kernel issue"
      
      and bisected the failure to the backport of commit 53a59fc6 ("mm:
      limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").
      
      That commit itself is not actually buggy, but what it does is to make it
      much more likely to hit the partial TLB invalidation case, since it
      introduces a new case in tlb_next_batch() that previously only ever
      happened when running out of memory.
      
      The real bug is that the TLB gather virtual memory range setup is subtly
      buggered.  It was introduced in commit 597e1c35 ("mm/mmu_gather:
      enable tlb flush range in generic mmu_gather"), and the range handling
      was already fixed at least once in commit e6c495a9 ("mm: fix the TLB
      range flushed when __tlb_remove_page() runs out of slots"), but that fix
      was not complete.
      
      The problem with the TLB gather virtual address range is that it isn't
      set up by the initial tlb_gather_mmu() initialization (which didn't get
      the TLB range information), but it is set up ad-hoc later by the
      functions that actually flush the TLB.  And so any such case that forgot
      to update the TLB range entries would potentially miss TLB invalidates.
      
      Rather than try to figure out exactly which particular ad-hoc range
      setup was missing (I personally suspect it's the hugetlb case in
      zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
      did), this patch just gets rid of the problem at the source: make the
      TLB range information available to tlb_gather_mmu(), and initialize it
      when initializing all the other tlb gather fields.
      
      This makes the patch larger, but conceptually much simpler.  And the end
      result is much more understandable; even if you want to play games with
      partial ranges when invalidating the TLB contents in chunks, now the
      range information is always there, and anybody who doesn't want to
      bother with it won't introduce subtle bugs.
      
      Ben verified that this fixes his problem.
      Reported-bisected-and-tested-by: NBen Tebulin <tebulin@googlemail.com>
      Build-testing-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Build-testing-by: NRichard Weinberger <richard.weinberger@gmail.com>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2b047252
  3. 06 6月, 2013 1 次提交
    • P
      arch, mm: Remove tlb_fast_mode() · 29eb7782
      Peter Zijlstra 提交于
      Since the introduction of preemptible mmu_gather TLB fast mode has been
      broken. TLB fast mode relies on there being absolutely no concurrency;
      it frees pages first and invalidates TLBs later.
      
      However now we can get concurrency and stuff goes *bang*.
      
      This patch removes all tlb_fast_mode() code; it was found the better
      option vs trying to patch the hole by entangling tlb invalidation with
      the scheduler.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Reported-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29eb7782
  4. 04 6月, 2013 1 次提交
  5. 25 8月, 2012 1 次提交
  6. 03 2月, 2012 1 次提交
  7. 08 12月, 2011 2 次提交
  8. 25 5月, 2011 1 次提交
    • P
      arm: mmu_gather rework · 9e14f674
      Peter Zijlstra 提交于
      Fix up the arm mmu_gather code to conform to the new API.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e14f674
  9. 22 2月, 2011 2 次提交
  10. 28 7月, 2009 1 次提交
    • B
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() · 9e1b32ca
      Benjamin Herrenschmidt 提交于
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
      
      Upcoming paches to support the new 64-bit "BookE" powerpc architecture
      will need to have the virtual address corresponding to PTE page when
      freeing it, due to the way the HW table walker works.
      
      Basically, the TLB can be loaded with "large" pages that cover the whole
      virtual space (well, sort-of, half of it actually) represented by a PTE
      page, and which contain an "indirect" bit indicating that this TLB entry
      RPN points to an array of PTEs from which the TLB can then create direct
      entries. Thus, in order to invalidate those when PTE pages are deleted,
      we need the virtual address to pass to tlbilx or tlbivax instructions.
      
      The old trick of sticking it somewhere in the PTE page struct page sucks
      too much, the address is almost readily available in all call sites and
      almost everybody implemets these as macros, so we may as well add the
      argument everywhere. I added it to the pmd and pud variants for consistency.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
      Acked-by: NNick Piggin <npiggin@suse.de>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e1b32ca
  11. 15 4月, 2009 1 次提交
    • A
      [ARM] 5450/1: Flush only the needed range when unmapping a VMA · 7fccfc00
      Aaro Koskinen 提交于
      When unmapping N pages (e.g. shared memory) the amount of TLB flushes
      done can be (N*PAGE_SIZE/ZAP_BLOCK_SIZE)*N although it should be N at
      maximum. With PREEMPT kernel ZAP_BLOCK_SIZE is 8 pages, so there is a
      noticeable performance penalty when unmapping a large VMA and the system
      is spending its time in flush_tlb_range().
      
      The problem is that tlb_end_vma() is always flushing the full VMA
      range. The subrange that needs to be flushed can be calculated by
      tlb_remove_tlb_entry(). This approach was suggested by Hugh Dickins,
      and is also used by other arches.
      
      The speed increase is roughly 3x for 8M mappings and for larger mappings
      even more.
      Signed-off-by: NAaro Koskinen <Aaro.Koskinen@nokia.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      7fccfc00
  12. 03 8月, 2008 1 次提交
  13. 06 2月, 2008 1 次提交
  14. 22 3月, 2006 1 次提交
  15. 30 10月, 2005 3 次提交
    • H
      [PATCH] mm: tlb_finish_mmu forget rss · fc2acab3
      Hugh Dickins 提交于
      zap_pte_range has been counting the pages it frees in tlb->freed, then
      tlb_finish_mmu has used that to update the mm's rss.  That got stranger when I
      added anon_rss, yet updated it by a different route; and stranger when rss and
      anon_rss became mm_counters with special access macros.  And it would no
      longer be viable if we're relying on page_table_lock to stabilize the
      mm_counter, but calling tlb_finish_mmu outside that lock.
      
      Remove the mmu_gather's freed field, let tlb_finish_mmu stick to its own
      business, just decrement the rss mm_counter in zap_pte_range (yes, there was
      some point to batching the update, and a subsequent patch restores that).  And
      forget the anal paranoia of first reading the counter to avoid going negative
      - if rss does go negative, just fix that bug.
      
      Remove the mmu_gather's flushes and avoided_flushes from arm and arm26: no use
      was being made of them.  But arm26 alone was actually using the freed, in the
      way some others use need_flush: give it a need_flush.  arm26 seems to prefer
      spaces to tabs here: respect that.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fc2acab3
    • H
      [PATCH] mm: tlb_is_full_mm was obscure · 4d6ddfa9
      Hugh Dickins 提交于
      tlb_is_full_mm?  What does that mean?  The TLB is full?  No, it means that the
      mm's last user has gone and the whole mm is being torn down.  And it's an
      inline function because sparc64 uses a different (slightly better)
      "tlb_frozen" name for the flag others call "fullmm".
      
      And now the ptep_get_and_clear_full macro used in zap_pte_range refers
      directly to tlb->fullmm, which would be wrong for sparc64.  Rather than
      correct that, I'd prefer to scrap tlb_is_full_mm altogether, and change
      sparc64 to just use the same poor name as everyone else - is that okay?
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4d6ddfa9
    • H
      [PATCH] mm: tlb_gather_mmu get_cpu_var · 15a23ffa
      Hugh Dickins 提交于
      tlb_gather_mmu dates from before kernel preemption was allowed, and uses
      smp_processor_id or __get_cpu_var to find its per-cpu mmu_gather.  That works
      because it's currently only called after getting page_table_lock, which is not
      dropped until after the matching tlb_finish_mmu.  But don't rely on that, it
      will soon change: now disable preemption internally by proper get_cpu_var in
      tlb_gather_mmu, put_cpu_var in tlb_finish_mmu.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      15a23ffa
  16. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4