1. 09 Aug 2014 (2 commits)
    • mm: memcontrol: rewrite uncharge API · 0a31bc97
      Committed by Johannes Weiner
      The memcg uncharging code that is involved towards the end of a page's
      lifetime - truncation, reclaim, swapout, migration - is impressively
      complicated and fragile.
      
      Because anonymous and file pages were always charged before they had their
      page->mapping established, uncharges had to happen when the page type
      could still be known from the context; as in unmap for anonymous, page
      cache removal for file and shmem pages, and swap cache truncation for swap
      pages.  However, these operations happen well before the page is actually
      freed, and so a lot of synchronization is necessary:
      
      - Charging, uncharging, page migration, and charge migration all need
        to take a per-page bit spinlock as they could race with uncharging.
      
      - Swap cache truncation happens during both swap-in and swap-out, and
        possibly repeatedly before the page is actually freed.  This means
        that the memcg swapout code is called from many contexts that make
        no sense and it has to figure out the direction from page state to
        make sure memory and memory+swap are always correctly charged.
      
      - On page migration, the old page might be unmapped but then reused,
        so memcg code has to prevent untimely uncharging in that case.
        Because this code - which should be a simple charge transfer - is so
        special-cased, it is not reusable for replace_page_cache().
      
      But now that charged pages always have a page->mapping, introduce
      mem_cgroup_uncharge(), which is called after the final put_page(), when we
      know for sure that nobody is looking at the page anymore.
      
      For page migration, introduce mem_cgroup_migrate(), which is called after
      the migration is successful and the new page is fully rmapped.  Because
      the old page is no longer uncharged after migration, prevent double
      charges by decoupling the page's memcg association (PCG_USED and
      pc->mem_cgroup) from the page holding an actual charge.  The new bits
      PCG_MEM and PCG_MEMSW represent the respective charges and are transferred
      to the new page during migration.
      
      mem_cgroup_migrate() is suitable for replace_page_cache() as well,
      which gets rid of mem_cgroup_replace_page_cache().  However, care
      needs to be taken because both the source and the target page can
      already be charged and on the LRU when fuse is splicing: grab the page
      lock on the charge moving side to prevent changing pc->mem_cgroup of a
      page under migration.  Also, the lruvecs of both pages change as we
      uncharge the old and charge the new during migration, and putback may
      race with us, so grab the lru lock and isolate the pages iff on LRU to
      prevent races and ensure the pages are on the right lruvec afterward.
      
      Swap accounting is massively simplified: because the page is no longer
      uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can
      transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry
      before the final put_page() in page reclaim.
      
      Finally, page_cgroup changes are now protected by whatever protection the
      page itself offers: anonymous pages are charged under the page table lock,
      whereas page cache insertions, swapin, and migration hold the page lock.
      Uncharging happens under full exclusion with no outstanding references.
      Charging and uncharging also ensure that the page is off-LRU, which
      serializes against charge migration.  Remove the very costly page_cgroup
      lock and set pc->flags non-atomically.
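
      As a rough sketch of where the new entry points sit (call sites and
      parameter lists here are illustrative assumptions, not the exact
      upstream signatures):

        /* Final free: the last reference is gone, nobody sees the page. */
        static void example_release(struct page *page)
        {
                mem_cgroup_uncharge(page);      /* replaces the type-specific
                                                 * uncharges scattered through
                                                 * unmap/truncate/swap code */
                __free_page(page);
        }

        /* Migration: the charge moves only once the new page is rmapped. */
        static void example_migrate_finish(struct page *oldpage,
                                           struct page *newpage)
        {
                mem_cgroup_migrate(oldpage, newpage, false); /* PCG_MEM/MEMSW */
        }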
      
      [mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
      [vdavydov@parallels.com: fix flags definition]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Tested-by: Jet Chen <jet.chen@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Tested-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: rewrite charge API · 00501b53
      Committed by Johannes Weiner
      These patches rework memcg charge lifetime to integrate more naturally
      with the lifetime of user pages.  This drastically simplifies the code and
      reduces charging and uncharging overhead.  The most expensive part of
      charging and uncharging is the page_cgroup bit spinlock, which is removed
      entirely after this series.
      
      Here are the top-10 profile entries of a stress test that reads a 128G
      sparse file on a freshly booted box, without even a dedicated cgroup
      (i.e. executing in the root memcg).  Before:
      
          15.36%              cat  [kernel.kallsyms]   [k] copy_user_generic_string
          13.31%              cat  [kernel.kallsyms]   [k] memset
          11.48%              cat  [kernel.kallsyms]   [k] do_mpage_readpage
           4.23%              cat  [kernel.kallsyms]   [k] get_page_from_freelist
           2.38%              cat  [kernel.kallsyms]   [k] put_page
           2.32%              cat  [kernel.kallsyms]   [k] __mem_cgroup_commit_charge
           2.18%          kswapd0  [kernel.kallsyms]   [k] __mem_cgroup_uncharge_common
           1.92%          kswapd0  [kernel.kallsyms]   [k] shrink_page_list
           1.86%              cat  [kernel.kallsyms]   [k] __radix_tree_lookup
           1.62%              cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
      
      After:
      
          15.67%           cat  [kernel.kallsyms]   [k] copy_user_generic_string
          13.48%           cat  [kernel.kallsyms]   [k] memset
          11.42%           cat  [kernel.kallsyms]   [k] do_mpage_readpage
           3.98%           cat  [kernel.kallsyms]   [k] get_page_from_freelist
           2.46%           cat  [kernel.kallsyms]   [k] put_page
           2.13%       kswapd0  [kernel.kallsyms]   [k] shrink_page_list
           1.88%           cat  [kernel.kallsyms]   [k] __radix_tree_lookup
           1.67%           cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
           1.39%       kswapd0  [kernel.kallsyms]   [k] free_pcppages_bulk
           1.30%           cat  [kernel.kallsyms]   [k] kfree
      
      As you can see, the memcg footprint has shrunk quite a bit.
      
         text    data     bss     dec     hex filename
        37970    9892     400   48262    bc86 mm/memcontrol.o.old
        35239    9892     400   45531    b1db mm/memcontrol.o
      
      This patch (of 4):
      
      The memcg charge API charges pages before they are rmapped - i.e.  have an
      actual "type" - and so every callsite needs its own set of charge and
      uncharge functions to know what type is being operated on.  Worse,
      uncharge has to happen from a context that is still type-specific, rather
      than at the end of the page's lifetime with exclusive access, and so
      requires a lot of synchronization.
      
      Rewrite the charge API to provide a generic set of try_charge(),
      commit_charge() and cancel_charge() transaction operations, much like
      what's currently done for swap-in:
      
        mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
        pages from the memcg if necessary.
      
        mem_cgroup_commit_charge() commits the page to the charge once it
        has a valid page->mapping and PageAnon() reliably tells the type.
      
        mem_cgroup_cancel_charge() aborts the transaction.
      
      This reduces the charge API and enables subsequent patches to
      drastically simplify uncharging.
      
      As pages need to be committed after rmap is established but before they
      are added to the LRU, page_add_new_anon_rmap() must once again stop
      doing the LRU addition itself.  Revive lru_cache_add_active_or_unevictable()
      to handle it.
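
      A rough sketch of how a caller strings these together for a new
      anonymous page (parameter lists are assumed from the description above
      rather than copied from the patch):

        static int example_charge_new_anon(struct page *page,
                                           struct mm_struct *mm,
                                           struct vm_area_struct *vma,
                                           unsigned long addr, pte_t *pte)
        {
                struct mem_cgroup *memcg;

                if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
                        return -ENOMEM;         /* reserve, may reclaim */

                if (!pte_none(*pte)) {          /* lost a race: roll back */
                        mem_cgroup_cancel_charge(page, memcg);
                        return -EBUSY;
                }

                page_add_new_anon_rmap(page, vma, addr);  /* page gets a type */
                mem_cgroup_commit_charge(page, memcg, false); /* bind charge  */
                lru_cache_add_active_or_unevictable(page, vma); /* LRU last   */
                return 0;
        }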
      
      [hughd@google.com: fix shmem_unuse]
      [hughd@google.com: Add comments on the private use of -EAGAIN]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 07 Aug 2014 (7 commits)
  3. 31 Jul 2014 (1 commit)
  4. 24 Jul 2014 (1 commit)
  5. 05 Jun 2014 (5 commits)
  6. 07 May 2014 (1 commit)
  7. 26 Apr 2014 (1 commit)
    • mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts · 1cf35d47
      Committed by Linus Torvalds
      The mmu-gather operation 'tlb_flush_mmu()' has done two things: the
      actual tlb flush operation, and the batched freeing of the pages that
      the TLB entries pointed at.
      
      This splits the operation into separate phases, so that the forced
      batched flushing done by zap_pte_range() can now do the actual TLB flush
      while still holding the page table lock, but delay the batched freeing
      of all the pages to after the lock has been dropped.
      
      This in turn allows us to avoid a race condition between
      set_page_dirty() (as called by zap_pte_range() when it finds a dirty
      shared memory pte) and page_mkclean(): because we now flush all the
      dirty page data from the TLB's while holding the pte lock,
      page_mkclean() will be held up walking the (recently cleaned) page
      tables until after the TLB entries have been flushed from all CPU's.
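
      A minimal sketch of the resulting structure, assuming the two halves
      are named tlb_flush_mmu_tlbonly() and tlb_flush_mmu_free() (signatures
      are assumptions):

        void tlb_flush_mmu(struct mmu_gather *tlb)
        {
                tlb_flush_mmu_tlbonly(tlb);  /* invalidate stale TLB entries */
                tlb_flush_mmu_free(tlb);     /* then free the batched pages  */
        }

        /*
         * zap_pte_range() can now force-flush while still holding the page
         * table lock and defer the freeing until after it drops the lock:
         *
         *      if (force_flush)
         *              tlb_flush_mmu_tlbonly(tlb);
         *      pte_unmap_unlock(start_pte, ptl);
         *      if (force_flush)
         *              tlb_flush_mmu_free(tlb);
         */
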
      Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: Dave Hansen <dave.hansen@intel.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 23 Apr 2014 (1 commit)
    • mm: make fixup_user_fault() check the vma access rights too · 1b17844b
      Committed by Linus Torvalds
      fixup_user_fault() is used by the futex code when the direct user access
      fails, and the futex code wants it to either map in the page in a usable
      form or return an error.  It relied on handle_mm_fault() to map the
      page, and correctly checked the error return from that, but while that
      does map the page, it doesn't actually guarantee that the page will be
      mapped with sufficient permissions to be then accessed.
      
      So do the appropriate tests of the vma access rights by hand.
      
      [ Side note: arguably handle_mm_fault() could just do that itself, but
        we have traditionally done it in the caller, because some callers -
        notably get_user_pages() - have been able to access pages even when
        they are mapped with PROT_NONE.  Maybe we should re-visit that design
        decision, but in the meantime this is the minimal patch. ]
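
      A sketch of the by-hand check, under the assumption that the required
      access is conveyed via FAULT_FLAG_WRITE (illustrative, not the literal
      diff):

        static int example_fixup_user_fault(struct mm_struct *mm,
                                            unsigned long address,
                                            unsigned int fault_flags)
        {
                struct vm_area_struct *vma;
                vm_flags_t vm_flags;
                int ret;

                vma = find_vma(mm, address);
                if (!vma || address < vma->vm_start)
                        return -EFAULT;

                /* handle_mm_fault() maps the page but does not guarantee
                 * the access we need; check the vma rights by hand first. */
                vm_flags = (fault_flags & FAULT_FLAG_WRITE) ? VM_WRITE : VM_READ;
                if (!(vm_flags & vma->vm_flags))
                        return -EFAULT;

                ret = handle_mm_fault(mm, vma, address, fault_flags);
                if (ret & VM_FAULT_ERROR)
                        return -EFAULT;
                return 0;
        }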
      
      Found by Dave Jones running his trinity tool.
      Reported-by: Dave Jones <davej@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 08 Apr 2014 (5 commits)
    • mm: remove unused arg of set_page_dirty_balance() · ed6d7c8e
      Committed by Miklos Szeredi
      There's only one caller of set_page_dirty_balance() and that will call it
      with page_mkwrite == 0.
      
      The page_mkwrite argument was unused since commit b827e496 "mm: close
      page_mkwrite races".
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: rename high level charging functions · d715ae08
      Committed by Michal Hocko
      mem_cgroup_newpage_charge is used only for charging anonymous memory so
      it is better to rename it to mem_cgroup_charge_anon.
      
      mem_cgroup_cache_charge is used for file backed memory so rename it to
      mem_cgroup_charge_file.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add debugfs tunable for fault_around_order · 1592eef0
      Committed by Kirill A. Shutemov
      Let's allow people to tweak faultaround at runtime.
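
      A hedged sketch of how such a tunable is typically wired up through
      debugfs; the file name, default value, and fops below are assumptions,
      not necessarily the exact ones in the patch:

        #include <linux/debugfs.h>
        #include <linux/fs.h>
        #include <linux/init.h>

        static unsigned long fault_around_order = 4;    /* assumed default */

        static int fault_around_order_get(void *data, u64 *val)
        {
                *val = fault_around_order;
                return 0;
        }

        static int fault_around_order_set(void *data, u64 val)
        {
                fault_around_order = val;
                return 0;
        }

        DEFINE_SIMPLE_ATTRIBUTE(fault_around_order_fops, fault_around_order_get,
                                fault_around_order_set, "%llu\n");

        static int __init fault_around_debugfs(void)
        {
                /* exposes /sys/kernel/debug/fault_around_order at runtime */
                debugfs_create_file("fault_around_order", 0644, NULL, NULL,
                                    &fault_around_order_fops);
                return 0;
        }
        late_initcall(fault_around_debugfs);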
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ning Qu <quning@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce vm_ops->map_pages() · 8c6e50b0
      Committed by Kirill A. Shutemov
      Here's a new version of the faultaround patchset.  It took a while to tune it
      and collect performance data.
      
      First patch adds new callback ->map_pages to vm_operations_struct.
      
      ->map_pages() is called when the VM asks to map easily accessible pages.
      The filesystem should find and map pages associated with offsets from
      "pgoff" up to "max_pgoff".  ->map_pages() is called with the page table
      locked and must not block.  If a page cannot be reached without
      blocking, the filesystem should skip it.  The filesystem should use
      do_set_pte() to set up the page table entry.  A pointer to the entry
      associated with offset "pgoff" is passed in the "pte" field of the
      vm_fault structure.  Pointers to entries for other offsets should be
      calculated relative to "pte".

      Currently the VM uses ->map_pages() only on the read page fault path.
      We try to map FAULT_AROUND_PAGES at a time; FAULT_AROUND_PAGES is 16 for
      now.
      Performance data for different FAULT_AROUND_ORDER is below.
      
      TODO:
       - implement ->map_pages() for shmem/tmpfs;
       - modify get_user_pages() to be able to use ->map_pages() and implement
         mmap(MAP_POPULATE|MAP_NONBLOCK) on top.
      
      =========================================================================
      Tested on 4-socket machine (120 threads) with 128GiB of RAM.
      
      A few real-world workloads. The sweet spot for FAULT_AROUND_ORDER here is
      somewhere between 3 and 5. Let's say 4 :)
      
      Linux build (make -j60)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		283,301,572	247,151,987	212,215,789	204,772,882	199,568,944	194,703,779	193,381,485
      	time, seconds		151.227629483	153.920996480	151.356125472	150.863792049	150.879207877	151.150764954	151.450962358
      Linux rebuild (make -j60)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		5,396,854	4,148,444	2,855,286	2,577,282	2,361,957	2,169,573	2,112,643
      	time, seconds		27.404543757	27.559725591	27.030057426	26.855045126	26.678618635	26.974523490	26.761320095
      Git test suite (make -j60 test)
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
      	minor-faults		129,591,823	99,200,751	66,106,718	57,606,410	51,510,808	45,776,813	44,085,515
      	time, seconds		66.087215026	64.784546905	64.401156567	65.282708668	66.034016829	66.793780811	67.237810413
      
      Two synthetic tests: access every word in the file in sequential/random order.
      It doesn't improve much after FAULT_AROUND_ORDER == 4.
      
      Sequential access 16GiB file
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
       1 thread
      	minor-faults		4,195,437	2,098,275	525,068		262,251		131,170		32,856		8,282
      	time, seconds		7.250461742	6.461711074	5.493859139	5.488488147	5.707213983	5.898510832	5.109232856
       8 threads
      	minor-faults		33,557,540	16,892,728	4,515,848	2,366,999	1,423,382	442,732		142,339
      	time, seconds		16.649304881	9.312555263	6.612490639	6.394316732	6.669827501	6.75078944	6.371900528
       32 threads
      	minor-faults		134,228,222	67,526,810	17,725,386	9,716,537	4,763,731	1,668,921	537,200
      	time, seconds		49.164430543	29.712060103	12.938649729	10.175151004	11.840094583	9.594081325	9.928461797
       60 threads
      	minor-faults		251,687,988	126,146,952	32,919,406	18,208,804	10,458,947	2,733,907	928,217
      	time, seconds		86.260656897	49.626551828	22.335007632	17.608243696	16.523119035	16.339489186	16.326390902
       120 threads
      	minor-faults		503,352,863	252,939,677	67,039,168	35,191,827	19,170,091	4,688,357	1,471,862
      	time, seconds		124.589206333	79.757867787	39.508707872	32.167281632	29.972989292	28.729834575	28.042251622
      Random access 1GiB file
       1 thread
      	minor-faults		262,636		132,743		34,369		17,299		8,527		3,451		1,222
      	time, seconds		15.351890914	16.613802482	16.569227308	15.179220992	16.557356122	16.578247824	15.365266994
       8 threads
      	minor-faults		2,098,948	1,061,871	273,690		154,501		87,110		25,663		7,384
      	time, seconds		15.040026343	15.096933500	14.474757288	14.289129964	14.411537468	14.296316837	14.395635804
       32 threads
      	minor-faults		8,390,734	4,231,023	1,054,432	528,847		269,242		97,746		26,881
      	time, seconds		20.430433109	21.585235358	22.115062928	14.872878951	14.880856305	14.883370649	14.821261690
       60 threads
      	minor-faults		15,733,258	7,892,809	1,973,393	988,266		594,789		164,994		51,691
      	time, seconds		26.577302548	25.692397770	18.728863715	20.153026398	21.619101933	17.745086260	17.613215273
       120 threads
      	minor-faults		31,471,111	15,816,616	3,959,209	1,978,685	1,008,299	264,635		96,010
      	time, seconds		41.835322703	40.459786095	36.085306105	35.313894834	35.814445675	36.552633793	34.289210594
      
      Touch only one page in page table in 16GiB file
      FAULT_AROUND_ORDER		Baseline	1		3		4		5		7		9
       1 thread
      	minor-faults		8,372		8,324		8,270		8,260		8,249		8,239		8,237
      	time, seconds		0.039892712	0.045369149	0.051846126	0.063681685	0.079095975	0.17652406	0.541213386
       8 threads
      	minor-faults		65,731		65,681		65,628		65,620		65,608		65,599		65,596
      	time, seconds		0.124159196	0.488600638	0.156854426	0.191901957	0.242631486	0.543569456	1.677303984
       32 threads
      	minor-faults		262,388		262,341		262,285		262,276		262,266		262,257		263,183
      	time, seconds		0.452421421	0.488600638	0.565020946	0.648229739	0.789850823	1.651584361	5.000361559
       60 threads
      	minor-faults		491,822		491,792		491,723		491,711		491,701		491,691		491,825
      	time, seconds		0.763288616	0.869620515	0.980727360	1.161732354	1.466915814	3.04041448	9.308612938
       120 threads
      	minor-faults		983,466		983,655		983,366		983,372		983,363		984,083		984,164
      	time, seconds		1.595846553	1.667902182	2.008959376	2.425380942	2.941368804	5.977807890	18.401846125
      
      This patch (of 2):
      
      Introduce a new vm_ops callback, ->map_pages(), and use it to map easily
      accessible pages around the fault address.

      On a read page fault, if the filesystem provides ->map_pages(), we try
      to map up to FAULT_AROUND_PAGES pages around the faulting address in the
      hope of reducing the number of minor page faults.

      We call ->map_pages() first and fall back to ->fault() if the page at a
      given offset is not ready to be mapped (cold page cache or something).
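
      A sketch of how a filesystem opts in, assuming it can reuse the generic
      page-cache helper (filemap_map_pages(), introduced for the page cache
      elsewhere in this series); only the .map_pages line is new:

        static const struct vm_operations_struct examplefs_file_vm_ops = {
                .fault          = filemap_fault,      /* fallback, may block  */
                .map_pages      = filemap_map_pages,  /* new faultaround hook */
        };
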
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ning Qu <quning@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory.c: update comment in unmap_single_vma() · 7aa6b4ad
      Committed by Davidlohr Bueso
      The described issue now occurs inside mmap_region(), and unfortunately
      it is still valid.
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 05 Apr 2014 (1 commit)
    • mm: get_user_pages(write,force) refuse to COW in shared areas · cda540ac
      Committed by Hugh Dickins
      get_user_pages(write=1, force=1) has always had odd behaviour on write-
      protected shared mappings: although it demands FMODE_WRITE-access to the
      underlying object (do_mmap_pgoff sets neither VM_SHARED nor VM_MAYWRITE
      without that), it ends up with do_wp_page substituting private anonymous
      Copied-On-Write pages for the shared file pages in the area.
      
      That was long ago intentional, as a safety measure to prevent ptrace
      setting a breakpoint (or POKETEXT or POKEDATA) from inadvertently
      corrupting the underlying executable.  Yet exec and dynamic loaders open
      the file read-only, and use MAP_PRIVATE rather than MAP_SHARED.
      
      The traditional odd behaviour still causes surprises and bugs in mm, and
      is probably not what any caller wants - even the comment on the flag
      says "You do not want this" (although it's undoubtedly necessary for
      overriding userspace protections in some contexts, and good when !write).
      
      Let's stop doing that.  But it would be dangerous to remove the long-
      standing safety at this stage, so just make get_user_pages(write,force)
      fail with EFAULT when applied to a write-protected shared area.
      Infiniband may in future want to force write through to underlying
      object: we can add another FOLL_flag later to enable that if required.
      
      Odd though the old behaviour was, there is no doubt that we may turn out
      to break userspace with this change, and have to revert it quickly.
      Issue a WARN_ON_ONCE to help debug the changed case (easily triggered by
      userspace, so only once to prevent spamming the logs); and delay a few
      associated cleanups until this change is proved.
      
      get_user_pages callers who might see trouble from this change:
        ptrace poking, or writing to /proc/<pid>/mem
        drivers/infiniband/
        drivers/media/v4l2-core/
        drivers/gpu/drm/exynos/exynos_drm_gem.c
        drivers/staging/tidspbridge/core/tiomap3430.c
      if they ever apply get_user_pages to write-protected shared mappings
      of an object which was opened for writing.
      
      I went to apply the same change to mm/nommu.c, but retreated.  NOMMU has
      no place for COW, and its VM_flags conventions are not the same: I'd be
      more likely to screw up NOMMU than make an improvement there.
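
      A sketch of the new policy only (flag handling in the real
      __get_user_pages() differs in detail):

        static int example_check_write_force(struct vm_area_struct *vma,
                                             unsigned int gup_flags)
        {
                if ((gup_flags & FOLL_WRITE) && !(vma->vm_flags & VM_WRITE)) {
                        if (!(gup_flags & FOLL_FORCE))
                                return -EFAULT;
                        /* Forced writes may still COW private mappings
                         * (ptrace breakpoints), but a write-protected shared
                         * mapping is now refused instead of quietly COWed. */
                        if (vma->vm_flags & VM_SHARED) {
                                WARN_ON_ONCE(1); /* easily triggered: warn once */
                                return -EFAULT;
                        }
                }
                return 0;
        }
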
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 04 Apr 2014 (8 commits)
  12. 26 Feb 2014 (2 commits)
  13. 24 Jan 2014 (2 commits)
    • mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE · 309381fe
      Committed by Sasha Levin
      Most of the VM_BUG_ON assertions are performed on a page.  Usually, when
      one of these assertions fails we'll get a BUG_ON with a call stack and
      the registers.
      
      Based on recent requests to add a small piece of code that dumps the
      page at various VM_BUG_ON sites, I've noticed that the page dump is
      quite useful to people debugging issues in mm.
      
      This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
      VM_BUG_ON() does, also dumps the page before executing the actual
      BUG_ON.
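
      Roughly, the new macro behaves like the sketch below (the exact
      expansion in include/linux/mmdebug.h may differ slightly):

        #ifdef CONFIG_DEBUG_VM
        #define VM_BUG_ON_PAGE(cond, page)                               \
                do {                                                     \
                        if (unlikely(cond)) {                            \
                                dump_page(page, NULL);  /* dump first */ \
                                BUG();                                   \
                        }                                                \
                } while (0)
        #else
        #define VM_BUG_ON_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond)
        #endif

      Call sites then read, for example, VM_BUG_ON_PAGE(PageTail(page), page)
      instead of VM_BUG_ON(PageTail(page)).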
      
      [akpm@linux-foundation.org: fix up includes]
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: print more details for bad_page() · f0b791a3
      Committed by Dave Hansen
      bad_page() is cool in that it prints out a bunch of data about the page.
      But, I can never remember which page flags are good and which are bad,
      or whether ->index or ->mapping is required to be NULL.
      
      This patch allows bad/dump_page() callers to specify a string about why
      they are dumping the page and adds explanation strings to a number of
      places.  It also adds a 'bad_flags' argument to bad_page(), which it
      then dumps out separately from the flags which are actually set.
      
      This way, the messages will show specifically why the page was bad and,
      if a bad page-flag combination was the problem, exactly which flags it
      is complaining about.
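
      Illustrative call sites for the extended interface (the reason strings
      and flag masks here are examples, not necessarily the ones added by
      this patch):

        /* free path: name the specific problem and the offending flags */
        if (unlikely(page_mapcount(page)))
                bad_page(page, "nonzero mapcount", 0);
        if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_FREE))
                bad_page(page, "PAGE_FLAGS_CHECK_AT_FREE flag(s) set",
                         PAGE_FLAGS_CHECK_AT_FREE);

        /* elsewhere, dump_page() callers now say why they are dumping */
        dump_page(page, "bad pte");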
      
      [akpm@linux-foundation.org: switch to pr_alert]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 22 Jan 2014 (2 commits)
    • mm: create a separate slab for page->ptl allocation · b35f1819
      Committed by Kirill A. Shutemov
      If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on x86_64
      is 72 bytes.  For page->ptl they will be allocated from the kmalloc-96
      slab, so we lose 24 bytes on each.  An average system can easily allocate
      a few tens of thousands of page->ptl locks, so the overhead is significant.
      
      Let's create a separate slab for page->ptl allocation to solve this.
      
      To make sure that it really works this time, some numbers from my test
      machine (just booted, no load):
      
      Before:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        kmalloc-96         31987  32190    128   30    1 : tunables  120   60    8 : slabdata   1073   1073     92
      After:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        page->ptl          27516  28143     72   53    1 : tunables  120   60    8 : slabdata    531    531      9
        kmalloc-96          3853   5280    128   30    1 : tunables  120   60    8 : slabdata    176    176      0
      
      Note that the patch is useful not only for debug case, but also for
      PREEMPT_RT, where spinlock_t is always bloated.
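
      A sketch of the dedicated slab, close in spirit to what the patch does
      (names and flags are assumptions rather than the verbatim diff):

        static struct kmem_cache *page_ptl_cachep;

        void __init ptlock_cache_init(void)
        {
                page_ptl_cachep = kmem_cache_create("page->ptl",
                                                    sizeof(spinlock_t), 0,
                                                    SLAB_PANIC, NULL);
        }

        bool ptlock_alloc(struct page *page)
        {
                spinlock_t *ptl;

                /* sized for spinlock_t instead of rounding up to kmalloc-96 */
                ptl = kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL);
                if (!ptl)
                        return false;
                page->ptl = ptl;
                return true;
        }

        void ptlock_free(struct page *page)
        {
                kmem_cache_free(page_ptl_cachep, page->ptl);
        }
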
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-debug: introduce debug_dma_assert_idle() · 0abdd7a8
      Committed by Dan Williams
      Record actively mapped pages and provide an API for asserting that a
      given page is DMA-inactive before execution proceeds.  Placing
      debug_dma_assert_idle() in cow_user_page() flagged the violation of the
      dma-api in the NET_DMA implementation (see commit 77873803 "net_dma:
      mark broken").
      
      The implementation includes the capability to count, in a limited way,
      repeat mappings of the same page that occur without an intervening
      unmap.  This 'overlap' counter is limited to the few bits of tag space
      in a radix tree.  This mechanism is added to mitigate false negative
      cases where, for example, a page is dma mapped twice and
      debug_dma_assert_idle() is called after the page is un-mapped once.
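
      A sketch of the cow_user_page() call site described above (abbreviated;
      not the literal kernel code):

        static inline void example_cow_user_page(struct page *dst,
                                                 struct page *src,
                                                 unsigned long va,
                                                 struct vm_area_struct *vma)
        {
                /* src must not be the target of an in-flight DMA mapping,
                 * or the copy races with the device writing the old page */
                debug_dma_assert_idle(src);

                copy_user_highpage(dst, src, va, vma);
        }
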
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 21 Dec 2013 (1 commit)
    • mm: fix build of split ptlock code · 40b64acd
      Committed by Olof Johansson
      Commit 597d795a ('mm: do not allocate page->ptl dynamically, if
      spinlock_t fits to long') restructured some allocators that are compiled
      even when USE_SPLIT_PTLOCKS isn't used.  This results in a compilation
      failure:
      
        mm/memory.c:4282:6: error: 'struct page' has no member named 'ptl'
        mm/memory.c:4288:12: error: 'struct page' has no member named 'ptl'
      
      Add in the missing ifdef.
      
      Fixes: 597d795a ('mm: do not allocate page->ptl dynamically, if spinlock_t fits to long')
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>