1. 18 March 2016 · 3 commits
  2. 16 March 2016 · 29 commits
    • autofs4: fix string.h include in auto_dev-ioctl.h · 63c06227
      By Ian Kent
      Since including linux/string.h now does the right thing, remove the
      conditional check.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      63c06227
    • autofs4: fix some white space errors · 0266725a
      By Ian Kent
      Fix some whitespace formatting errors.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0266725a
    • autofs4: coding style fixes · e9a7c2f1
      By Ian Kent
      Try to make the coding style completely consistent throughout the
      autofs module and in line with kernel coding style recommendations.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9a7c2f1
    • mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous · 7cf91a98
      By Joonsoo Kim
      There is a report of a performance drop caused by hugepage allocation,
      in which half of the CPU time is spent in pageblock_pfn_to_page()
      during compaction [1].

      In that workload, compaction is triggered to make hugepages, but most
      pageblocks are unavailable for compaction due to their pageblock type
      and skip bit, so compaction usually fails.  The most costly operation
      in this case is finding a valid pageblock while scanning the whole
      zone range.  To check whether a pageblock is valid to compact, a
      valid pfn within the pageblock is required, and we can obtain it by
      calling pageblock_pfn_to_page().  This function checks whether the
      pageblock lies within a single zone and returns a valid pfn if
      possible.  The problem is that we need to repeat this check every
      time a pageblock is scanned, even when re-visiting it, and that turns
      out to be very expensive in this workload.

      Although there is no way to skip this pageblock check on systems
      where memory holes can exist at arbitrary positions, we can cache the
      zone's contiguity and just do pfn_to_page() on systems without holes.
      This optimization considerably speeds up the above workload.
      
      Before vs After
        Max: 1096 MB/s vs 1325 MB/s
        Min:  635 MB/s vs 1015 MB/s
        Avg:  899 MB/s vs 1194 MB/s
      
      Avg is improved by roughly 30% [2].
      
      [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
      [2]: https://lkml.org/lkml/2015/12/9/23
      
      [akpm@linux-foundation.org: don't forget to restore zone->contiguous on error path, per Vlastimil]
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-by: Aaron Lu <aaron.lu@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Aaron Lu <aaron.lu@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7cf91a98
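
      A minimal userspace sketch of the idea above (the names, the toy
      pfn_valid(), and checking only the pageblock's first and last pfn are
      simplifying assumptions, not the kernel implementation): compute the
      zone's contiguity once, then let every later pageblock lookup take a
      fast path.

        #include <stdbool.h>
        #include <stdio.h>

        /* Toy model of a zone spanning a pfn range, possibly with holes. */
        struct zone {
                unsigned long start_pfn;
                unsigned long end_pfn;
                bool contiguous;        /* cached: no holes in the range */
        };

        /* Stand-ins for pfn_valid()/pfn_to_page(). */
        static bool pfn_valid(unsigned long pfn) { return pfn % 1024 != 7; }
        static void *pfn_to_page(unsigned long pfn) { return (void *)(pfn + 1); }

        static void *pageblock_pfn_to_page(unsigned long start,
                                           unsigned long end, struct zone *z)
        {
                if (z->contiguous)              /* fast path: cached result */
                        return pfn_to_page(start);
                if (!pfn_valid(start) || !pfn_valid(end - 1))
                        return NULL;            /* first or last pfn invalid */
                return pfn_to_page(start);
        }

        /* Done once (init or memory hotplug), not on every pageblock scan. */
        static void set_zone_contiguous(struct zone *z)
        {
                for (unsigned long pfn = z->start_pfn; pfn < z->end_pfn; pfn++)
                        if (!pfn_valid(pfn))
                                return;         /* leave ->contiguous false */
                z->contiguous = true;
        }

        int main(void)
        {
                struct zone z = { .start_pfn = 2048, .end_pfn = 4096 };

                set_zone_contiguous(&z);
                printf("contiguous=%d page=%p\n", z.contiguous,
                       pageblock_pfn_to_page(2048, 2048 + 512, &z));
                return 0;
        }
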
    • mm: remove unnecessary uses of lock_page_memcg() · fdf1cdb9
      By Johannes Weiner
      Several users nest lock_page_memcg() inside lock_page() to prevent
      page->mem_cgroup from changing.  But the page lock already prevents
      pages from moving between cgroups, so that is unnecessary overhead.

      Remove lock_page_memcg() from contexts where the page is already
      locked, and fix the debug code in the page stat functions to accept
      the page lock as sufficient protection.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fdf1cdb9
    • mm: simplify lock_page_memcg() · 62cccb8c
      By Johannes Weiner
      Now that migration no longer clears page->mem_cgroup of live pages,
      it is safe to make lock_page_memcg() and the memcg stat functions take
      pages, sparing the callers from having to deal with memcg objects.
      
      [akpm@linux-foundation.org: fix warnings]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      62cccb8c
    • mm: migrate: do not touch page->mem_cgroup of live pages · 6a93ca8f
      By Johannes Weiner
      Changing a page's memcg association complicates dealing with the page,
      so we want to limit this as much as possible.  Page migration, for
      example, does not have to do it.  Just like page cache replacement, it
      can forcibly charge a replacement page and then uncharge the old page
      when it gets freed.  Temporarily overcharging the cgroup by a single
      page is not an issue in practice, and charging is so cheap nowadays
      that this is much preferable to the headache of messing with live
      pages.
      
      The only place that still changes the page->mem_cgroup binding of live
      pages is when pages move along with a task to another cgroup.  But that
      path isolates the page from the LRU, takes the page lock, and the move
      lock (lock_page_memcg()).  That means page->mem_cgroup is always stable
      in callers that have the page isolated from the LRU or locked.  Lighter
      unlocked paths, like writeback accounting, can use lock_page_memcg().
      
      [akpm@linux-foundation.org: fix build]
      [vdavydov@virtuozzo.com: fix lockdep splat]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a93ca8f
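
      A toy model of the charging scheme described above (the counter and
      function names are illustrative only): the replacement page is charged
      up front and the old page only gives its charge back when it is freed,
      so the cgroup is briefly overcharged by a single page.

        #include <stdio.h>

        static long memcg_charged_pages;

        static void charge(void)   { memcg_charged_pages++; }
        static void uncharge(void) { memcg_charged_pages--; }

        static void migrate_page_model(void)
        {
                charge();       /* forcibly charge the new page first */
                printf("during migration: %ld pages charged (overcharge)\n",
                       memcg_charged_pages);
                /* ... copy contents, switch mappings, free the old page ... */
                uncharge();     /* old page uncharged when it gets freed */
        }

        int main(void)
        {
                charge();       /* the original, live page */
                migrate_page_model();
                printf("after migration: %ld page charged\n",
                       memcg_charged_pages);
                return 0;
        }
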
    • mm: workingset: per-cgroup cache thrash detection · 23047a96
      By Johannes Weiner
      Cache thrash detection (see a528910e "mm: thrash detection-based
      file cache sizing" for details) currently only works on the system
      level, not inside cgroups.  Worse, since refaults are compared to the
      global number of active cache pages, cgroups might wrongly get all of
      their refaults activated when their pages are hotter than those of
      others.
      
      Move the refault machinery from the zone to the lruvec, and then tag
      eviction entries with the memcg ID.  This makes the thrash detection
      work correctly inside cgroups.
      
      [sergey.senozhatsky@gmail.com: do not return from workingset_activation() with locked rcu and page]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      23047a96
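
      A self-contained sketch of the tagging idea above (the bit widths and
      helper names are assumptions for illustration; the kernel's shadow
      entries are packed differently): store the evicting memcg's ID next to
      the eviction counter so a later refault can be compared against that
      cgroup's own lruvec instead of the global state.

        #include <stdint.h>
        #include <stdio.h>

        #define MEMCG_ID_BITS 16
        #define EVICTION_BITS (64 - MEMCG_ID_BITS)
        #define EVICTION_MASK ((1ULL << EVICTION_BITS) - 1)

        /* Pack (memcg id, eviction counter) into one shadow-entry word. */
        static uint64_t pack_shadow(uint16_t memcg_id, uint64_t eviction)
        {
                return ((uint64_t)memcg_id << EVICTION_BITS) |
                       (eviction & EVICTION_MASK);
        }

        static void unpack_shadow(uint64_t shadow, uint16_t *memcg_id,
                                  uint64_t *eviction)
        {
                *memcg_id = shadow >> EVICTION_BITS;
                *eviction = shadow & EVICTION_MASK;
        }

        int main(void)
        {
                uint16_t id;
                uint64_t ev;

                unpack_shadow(pack_shadow(42, 123456), &id, &ev);
                /* On refault, the distance is now measured against memcg 42's
                 * lruvec, not the zone-wide counters. */
                printf("memcg=%u eviction=%llu\n", id,
                       (unsigned long long)ev);
                return 0;
        }
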
    • mm: memcontrol: generalize locking for the page->mem_cgroup binding · 81f8c3a4
      By Johannes Weiner
      These patches tag the page cache radix tree eviction entries with the
      memcg an evicted page belonged to, thus making per-cgroup LRU reclaim
      work properly and be as adaptive to new cache workingsets as global
      reclaim already is.
      
      This should have been part of the original thrash detection patch
      series, but was deferred due to the complexity of those patches.
      
      This patch (of 5):
      
      So far the only sites that needed to exclude charge migration to
      stabilize page->mem_cgroup have been per-cgroup page statistics, hence
      the name mem_cgroup_begin_page_stat().  But per-cgroup thrash detection
      will add another site that needs to ensure page->mem_cgroup lifetime.
      
      Rename these locking functions to the more generic lock_page_memcg()
      and unlock_page_memcg().  Since charge migration is a cgroup1 feature
      only, we might be able to delete it at some point, and these now
      easy-to-identify locking sites along with it.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      81f8c3a4
    • memory-hotplug: add automatic onlining policy for the newly added memory · 31bc3858
      By Vitaly Kuznetsov
      Currently, all newly added memory blocks remain in the 'offline' state
      unless someone onlines them; some Linux distributions carry special
      udev rules like:
      
        SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
      
      to make this happen automatically.  This is not a great solution for
      virtual machines, where memory hotplug is used to address high memory
      pressure, because such onlining is slow and the userspace process
      doing it (udev) risks being killed by the OOM killer, since it will
      probably need to allocate some memory itself.
      
      Introduce a default policy for newly added memory blocks via the
      /sys/devices/system/memory/auto_online_blocks file, with two possible
      values: "offline", which preserves the current behavior, and "online",
      which causes all newly added memory blocks to go online as soon as
      they're added.  The default is "offline".
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      31bc3858
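
      A small userspace example of using the new knob (assuming the kernel
      provides the sysfs file named above and the process has permission to
      write it):

        #include <stdio.h>

        int main(void)
        {
                const char *path =
                        "/sys/devices/system/memory/auto_online_blocks";
                FILE *f = fopen(path, "w");

                if (!f) {
                        perror(path);   /* kernel too old, or not root */
                        return 1;
                }
                /* Valid values: "offline" (default) or "online". */
                if (fputs("online\n", f) == EOF || fclose(f) == EOF) {
                        perror("write");
                        return 1;
                }
                puts("newly added memory blocks will now be onlined automatically");
                return 0;
        }
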
    • mm/page_poisoning.c: allow for zero poisoning · 1414c7f4
      By Laura Abbott
      By default, page poisoning uses a poison value (0xaa) on free.  If
      this is changed to 0, the page is not only sanitized but zeroing on
      alloc with __GFP_ZERO can be skipped as well.  The tradeoff is that
      corruption of a zero-poisoned page is harder to detect.  This feature
      also cannot be used with hibernation, since pages are not guaranteed
      to be zeroed after hibernation.
      
      Credit to Grsecurity/PaX team for inspiring this work
      Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
      Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jianyu Zhan <nasa4836@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1414c7f4
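
      A toy model of the tradeoff described above (constants and helpers are
      illustrative, not the kernel's): with a poison value of 0, a freed
      page is already zeroed, so a later __GFP_ZERO-style allocation can
      skip its memset.

        #include <stdbool.h>
        #include <stdio.h>
        #include <string.h>

        #define PAGE_SIZE   4096
        #define PAGE_POISON 0x00        /* 0xaa classic, 0x00 enables the skip */
        #define WANT_ZERO   0x1         /* illustrative __GFP_ZERO stand-in */

        static unsigned char page[PAGE_SIZE];

        static void free_page_model(void)
        {
                /* Sanitize on free; with a zero poison this is also pre-zeroing. */
                memset(page, PAGE_POISON, PAGE_SIZE);
        }

        static void alloc_page_model(unsigned int flags)
        {
                bool prezeroed = (PAGE_POISON == 0x00);

                if ((flags & WANT_ZERO) && !prezeroed)
                        memset(page, 0, PAGE_SIZE);
        }

        int main(void)
        {
                free_page_model();
                alloc_page_model(WANT_ZERO);    /* no memset needed here */
                printf("first byte after alloc: 0x%02x\n", page[0]);
                return 0;
        }
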
    • mm/page_poison.c: enable PAGE_POISONING as a separate option · 8823b1db
      By Laura Abbott
      Page poisoning is currently set up as a fallback feature for
      architectures that don't have architecture-level debug page_alloc
      support for unmapping pages.  It has uses apart from that, though:
      clearing pages on free provides an increase in security, as it helps
      to limit the risk of information leaks.  Allow page poisoning to be
      enabled as a separate option, independent of kernel_map_pages, since
      the two features do separate work.  Because of how hibernation is
      implemented, the checks on alloc cannot occur if hibernation is
      enabled.  The runtime alloc checks can, however, be enabled with an
      option when !HIBERNATION.
      
      Credit to Grsecurity/PaX team for inspiring this work
      Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Jianyu Zhan <nasa4836@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8823b1db
    • mm, debug: move bad flags printing to bad_page() · ff8e8116
      By Vlastimil Babka
      Since bad_page() is the only user of the badflags parameter of
      dump_page_badflags(), we can move the code to bad_page() and simplify a
      bit.
      
      The dump_page_badflags() function is renamed to __dump_page() and can
      still be called separately from dump_page() for temporary debug prints
      where page_owner info is not desired.
      
      The only user-visible change is that page->mem_cgroup is printed before
      the bad flags.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ff8e8116
    • mm, page_owner: dump page owner info from dump_page() · 4e462112
      By Vlastimil Babka
      The page_owner mechanism is useful for dealing with memory leaks.  By
      reading /sys/kernel/debug/page_owner one can determine the stack traces
      leading to allocations of all pages, and find e.g.  a buggy driver.
      
      This information might also be useful for other debugging, such as the
      VM_BUG_ON_PAGE() calls that lead to dump_page().  So let's print the
      stored info from dump_page().
      
      Example output:
      
        page:ffffea000292f1c0 count:1 mapcount:0 mapping:ffff8800b2f6cc18 index:0x91d
        flags: 0x1fffff8001002c(referenced|uptodate|lru|mappedtodisk)
        page dumped because: VM_BUG_ON_PAGE(1)
        page->mem_cgroup:ffff8801392c5000
        page allocated via order 0, migratetype Movable, gfp_mask 0x24213ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY)
         [<ffffffff811682c4>] __alloc_pages_nodemask+0x134/0x230
         [<ffffffff811b40c8>] alloc_pages_current+0x88/0x120
         [<ffffffff8115e386>] __page_cache_alloc+0xe6/0x120
         [<ffffffff8116ba6c>] __do_page_cache_readahead+0xdc/0x240
         [<ffffffff8116bd05>] ondemand_readahead+0x135/0x260
         [<ffffffff8116be9c>] page_cache_async_readahead+0x6c/0x70
         [<ffffffff811604c2>] generic_file_read_iter+0x3f2/0x760
         [<ffffffff811e0dc7>] __vfs_read+0xa7/0xd0
        page has been migrated, last migrate reason: compaction
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4e462112
    • mm, page_owner: track and print last migrate reason · 7cd12b4a
      By Vlastimil Babka
      During migration, page_owner info is now copied with the rest of the
      page, so the stacktrace leading to the allocation of the migration
      target page is overwritten.  For debugging purposes, it might however
      be useful to know that the page has been migrated since its initial
      allocation.  This might happen many times during the page's lifetime
      for different reasons, and fully tracking this, especially with
      stacktraces, would incur extra memory costs.  As a compromise, store
      and print the migrate_reason of the last migration that occurred to
      the page.  This is enough to distinguish compaction, numa balancing
      etc.
      
      Example page_owner entry after the patch:
      
        Page allocated via order 0, mask 0x24200ca(GFP_HIGHUSER_MOVABLE)
        PFN 628753 type Movable Block 1228 type Movable Flags 0x1fffff80040030(dirty|lru|swapbacked)
         [<ffffffff811682c4>] __alloc_pages_nodemask+0x134/0x230
         [<ffffffff811b6325>] alloc_pages_vma+0xb5/0x250
         [<ffffffff81177491>] shmem_alloc_page+0x61/0x90
         [<ffffffff8117a438>] shmem_getpage_gfp+0x678/0x960
         [<ffffffff8117c2b9>] shmem_fallocate+0x329/0x440
         [<ffffffff811de600>] vfs_fallocate+0x140/0x230
         [<ffffffff811df434>] SyS_fallocate+0x44/0x70
         [<ffffffff8158cc2e>] entry_SYSCALL_64_fastpath+0x12/0x71
        Page has been migrated, last migrate reason: compaction
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7cd12b4a
    • mm, page_owner: copy page owner info during migration · d435edca
      By Vlastimil Babka
      The page_owner mechanism stores the gfp_flags of an allocation and the
      stack trace that led to it.  During page migration, the original
      information is effectively replaced by that of the free page allocated
      as the migration target.  Arguably this is less useful and might lead
      to all the page_owner info for migratable pages gradually converging
      towards compaction or numa balancing migrations.  It has also led to
      inaccuracies such as the one fixed by commit e2cfc911 ("mm/page_owner:
      set correct gfp_mask on page_owner").
      
      This patch thus introduces copying the page_owner info during migration.
      However, since the fact that the page has been migrated from its
      original place might be useful for debugging, the next patch will
      introduce a way to track that information as well.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d435edca
    • mm, page_owner: convert page_owner_inited to static key · 7dd80b8a
      By Vlastimil Babka
      CONFIG_PAGE_OWNER attempts to impose negligible runtime overhead when
      it is compiled in but not actually enabled at runtime by the boot
      parameter page_owner=on.  This overhead can be further reduced using
      the static key mechanism, which is what this patch does.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7dd80b8a
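
      A kernel-style sketch of the mechanism named above (the wrapper
      function and its body are illustrative; DEFINE_STATIC_KEY_FALSE,
      static_branch_enable and static_branch_unlikely are the jump-label
      APIs): with the key disabled, the hot-path test compiles down to a
      patched no-op instead of a load and branch on a global flag.

        #include <linux/jump_label.h>
        #include <linux/mm.h>

        /* False by default: the branch below is skipped until enabled. */
        static DEFINE_STATIC_KEY_FALSE(page_owner_inited_sketch);

        /* Called once from early init when page_owner=on is seen. */
        static void init_page_owner_sketch(void)
        {
                static_branch_enable(&page_owner_inited_sketch);
        }

        /* Hot-path hook, illustrative only. */
        static inline void set_page_owner_sketch(struct page *page,
                                                 unsigned int order,
                                                 gfp_t gfp_mask)
        {
                if (static_branch_unlikely(&page_owner_inited_sketch))
                        pr_debug("would record owner of page %p, order %u\n",
                                 page, order);
        }
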
    • mm, page_owner: print migratetype of page and pageblock, symbolic flags · 60f30350
      By Vlastimil Babka
      The information in /sys/kernel/debug/page_owner includes the
      migratetype of the pageblock the page belongs to.  This is also
      checked against the page's migratetype (as declared by gfp_flags
      during its allocation), and the page is reported as Fallback if its
      migratetype differs from the pageblock's.  This is somewhat misleading
      because in fact fallback allocation is not the only reason why these
      two can differ.  It also doesn't directly provide the page's
      migratetype, although it's possible to derive that from the gfp_flags.
      
      It's arguably better to print both the page's and the pageblock's
      migratetype and leave the interpretation to the consumer than to
      suggest fallback allocation as the only possible reason.  While at it,
      we can print the migratetypes as strings the same way
      /proc/pagetypeinfo does, as some of the numeric values depend on
      kernel configuration.  For that, this patch moves the
      migratetype_names array from the #ifdef CONFIG_PROC_FS part of
      mm/vmstat.c to mm/page_alloc.c and exports it.
      
      With the new format strings for flags, we can now also provide symbolic
      page and gfp flags in the /sys/kernel/debug/page_owner file.  This
      replaces the positional printing of page flags as single letters, which
      might have looked nicer, but was limited to a subset of flags, and
      required the user to remember the letters.
      
      Example page_owner entry after the patch:
      
        Page allocated via order 0, mask 0x24213ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY)
        PFN 520 type Movable Block 1 type Movable Flags 0xfffff8001006c(referenced|uptodate|lru|active|mappedtodisk)
         [<ffffffff811682c4>] __alloc_pages_nodemask+0x134/0x230
         [<ffffffff811b4058>] alloc_pages_current+0x88/0x120
         [<ffffffff8115e386>] __page_cache_alloc+0xe6/0x120
         [<ffffffff8116ba6c>] __do_page_cache_readahead+0xdc/0x240
         [<ffffffff8116bd05>] ondemand_readahead+0x135/0x260
         [<ffffffff8116bfb1>] page_cache_sync_readahead+0x31/0x50
         [<ffffffff81160523>] generic_file_read_iter+0x453/0x760
         [<ffffffff811e0d57>] __vfs_read+0xa7/0xd0
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      60f30350
    • mm, tracing: unify mm flags handling in tracepoints and printk · 420adbe9
      By Vlastimil Babka
      In tracepoints, it's possible to print gfp flags in a human-friendly
      format through the macro show_gfp_flags(), which defines a translation
      array and passes it to __print_flags().  Since the following patch
      will introduce support for gfp flags printing in printk(), it would be
      nice to reuse the array.  This is not straightforward, since
      __print_flags() can't simply reference an array defined in a .c file
      such as mm/debug.c - it has to be a macro to allow the macro magic to
      communicate the format to userspace tools such as trace-cmd.
      
      The solution is to create a macro __def_gfpflag_names which is used both
      in show_gfp_flags(), and to define the gfpflag_names[] array in
      mm/debug.c.
      
      On the other hand, mm/debug.c also defines translation tables for page
      flags and vma flags, and desire was expressed (but not implemented in
      this series) to use these also from tracepoints.  Thus, this patch also
      renames the events/gfpflags.h file to events/mmflags.h and moves the
      table definitions there, using the same macro approach as for gfpflags.
      This allows translating all three kinds of mm-specific flags both in
      tracepoints and printk.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Michal Hocko <mhocko@suse.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      420adbe9
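
      A self-contained illustration of the macro-sharing technique described
      above (flag values and names here are invented; the kernel's real list
      lives in include/trace/events/mmflags.h): one macro expands both into
      a plain C array usable from printk-style code and, textually, into
      whatever a tracing header needs.

        #include <stddef.h>
        #include <stdio.h>

        struct trace_print_flags {
                unsigned long mask;
                const char *name;
        };

        /* Single source of truth for the flag -> name mapping. */
        #define __def_demoflag_names                    \
                { 0x1UL, "__FLAG_A" },                  \
                { 0x2UL, "__FLAG_B" },                  \
                { 0x4UL, "__FLAG_C" }

        /* Expansion 1: a real array, usable from ordinary C code. */
        static const struct trace_print_flags demoflag_names[] = {
                __def_demoflag_names
        };

        /* Expansion 2 (not shown): a tracing macro can expand the same list
         * textually, e.g. __print_flags(flags, "|", __def_demoflag_names). */

        int main(void)
        {
                unsigned long flags = 0x5;
                size_t i;

                for (i = 0; i < sizeof(demoflag_names) / sizeof(demoflag_names[0]); i++)
                        if (flags & demoflag_names[i].mask)
                                printf("%s|", demoflag_names[i].name);
                putchar('\n');
                return 0;
        }
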
    • tools, perf: make gfp_compact_table up to date · 14e0a214
      By Vlastimil Babka
      When updating tracing's show_gfp_flags() I noticed that perf's
      gfp_compact_table is also outdated.  Fill in the missing flags and
      place a note in gfp.h to increase the chance that future updates are
      kept in sync.  Convert the __GFP_X flags from "GFP_X" to "__GFP_X"
      strings, in line with the previous patch.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      14e0a214
    • mm, tracing: make show_gfp_flags() up to date · 1f7866b4
      By Vlastimil Babka
      The show_gfp_flags() macro provides human-friendly printing of gfp
      flags in tracepoints.  However, it is somewhat out of date and missing
      several flags.  This patch fills in the missing flags, and
      distinguishes properly between GFP_ATOMIC and __GFP_ATOMIC, which were
      both translated to "GFP_ATOMIC".  More generally, all __GFP_X flags
      which were previously printed as GFP_X are now printed as __GFP_X,
      since omitting the underscores results in output that doesn't actually
      match the source code and can only lead to confusion.  Where both
      variants are defined equal (e.g.  _DMA and _DMA32), the variant
      without underscores is preferred.
      
      Also add a note in gfp.h so hopefully future changes will be synced
      better.
      
      __GFP_MOVABLE is defined twice in include/linux/gfp.h with different
      comments.  Leave just the newer one, which was intended to replace the
      old one.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Michal Hocko <mhocko@suse.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f7866b4
    • tracepoints: move trace_print_flags definitions to tracepoint-defs.h · 20f6e03a
      By Vlastimil Babka
      The following patch will need to declare an array of struct
      trace_print_flags in a header.  To prevent this header from pulling in
      all of RCU through trace_events.h, move the struct
      trace_print_flags{_64} definitions to the new lightweight
      tracepoint-defs.h header.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      20f6e03a
    • mm/slub: support left redzone · d86bd1be
      By Joonsoo Kim
      SLUB already has a redzone debugging feature.  But it is only placed
      at the end of the object (aka the right redzone), so it cannot catch
      left out-of-bounds (OOB) accesses.  Although the current object's
      right redzone acts as the left redzone of the next object, the first
      object in a slab cannot take advantage of this effect.  This patch
      explicitly adds a left redzone to each object to detect left OOB more
      precisely.
      
      Background:
      
      Someone complained to me that a left OOB isn't caught even if KASAN,
      which does page allocation debugging, is enabled.  That page is out of
      our control, so it could be allocated when the left OOB happens and,
      in that case, we can't detect the OOB.  Moreover, the SLUB debugging
      feature can be enabled without page allocator debugging and, in that
      case, we would miss the OOB as well.

      Before trying to implement this, I expected the changes to be too
      complex, but it doesn't look that complex to me now.  Almost all of
      the changes are applied to debug-specific functions, so I feel okay
      about it.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d86bd1be
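
      A userspace model of the layout change described above (sizes and the
      poison byte are arbitrary): with a redzone on both sides of every
      object, an underflow by the first object in a slab is caught too, not
      only overflows into the next object's zone.

        #include <stdio.h>
        #include <string.h>

        #define REDZONE_SIZE 8
        #define OBJ_SIZE     32
        #define RED_BYTE     0xbb

        /* [left redzone][object][right redzone] */
        static unsigned char slot[REDZONE_SIZE + OBJ_SIZE + REDZONE_SIZE];
        static unsigned char *const obj = slot + REDZONE_SIZE;

        static void init_slot(void)
        {
                memset(slot, RED_BYTE, sizeof(slot));
                memset(obj, 0, OBJ_SIZE);
        }

        static int check_redzones(void)
        {
                int i;

                for (i = 0; i < REDZONE_SIZE; i++) {
                        if (slot[i] != RED_BYTE)
                                return -1;      /* left redzone damaged  */
                        if (slot[REDZONE_SIZE + OBJ_SIZE + i] != RED_BYTE)
                                return 1;       /* right redzone damaged */
                }
                return 0;
        }

        int main(void)
        {
                init_slot();
                obj[-1] = 0;    /* simulate a one-byte left out-of-bounds write */
                printf("redzone check: %d (negative = left OOB caught)\n",
                       check_redzones());
                return 0;
        }
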
    • slub: convert SLAB_DEBUG_FREE to SLAB_CONSISTENCY_CHECKS · becfda68
      By Laura Abbott
      SLAB_DEBUG_FREE allows expensive consistency checks at free to be turned
      on or off.  Expand its use to be able to turn off all consistency
      checks.  This gives a nice speed up if you only want features such as
      poisoning or tracing.
      
      Credit to Mathias Krause for the original work which inspired this
      series
      Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      becfda68
    • mm/slab: alternative implementation for DEBUG_SLAB_LEAK · d31676df
      By Joonsoo Kim
      DEBUG_SLAB_LEAK is a debug option.  Its current implementation
      requires a status buffer, so we need more memory to use it.  It also
      makes the kmem_cache initialization step more complex.

      To remove this extra memory usage and to simplify the initialization
      step, this patch implements the feature in another way.

      When the user requests slab object owner information, we mark that
      information gathering has started.  Then all free objects in the
      caches are flushed to their corresponding slab pages.  Now we can
      distinguish all freed objects, and therefore all allocated objects,
      too.  After collecting object owner information for the allocated
      objects, the mark is checked to make sure no free happened during the
      processing.  If none did, we can be sure the information is correct,
      and it is returned to the user.
      
      Although this approach is rather complex, it has the two important
      benefits mentioned above, so I think it is worth the change.

      One drawback is that it takes more time to get slab object owner
      information, but since this is just a debug option that doesn't
      matter much.

      To help review, this patch implements only the new way.  The
      following patch will remove the now-useless code.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d31676df
    • mm/slab: clean up DEBUG_PAGEALLOC processing code · 40b44137
      By Joonsoo Kim
      Currently, open-coded checks for a DEBUG_PAGEALLOC cache are spread
      across several sites, which makes the code unreadable and hard to
      change.

      This patch cleans up this code.  The following patch will change the
      criteria for a DEBUG_PAGEALLOC cache, so this clean-up will help it,
      too.
      
      [akpm@linux-foundation.org: fix build with CONFIG_DEBUG_PAGEALLOC=n]
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      40b44137
    • mm: fix some spelling · 9f706d68
      By Jesper Dangaard Brouer
      Fix up trivial spelling errors, noticed while reading the code.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f706d68
    • mm: new API kfree_bulk() for SLAB+SLUB allocators · ca257195
      By Jesper Dangaard Brouer
      This patch introduces a new API call, kfree_bulk(), for bulk freeing
      of memory objects not bound to a single kmem_cache.

      Christoph pointed out that it is possible to implement freeing of
      objects without knowing the kmem_cache pointer, as that information is
      available from the object's page->slab_cache, and proposed removing
      the kmem_cache argument from the bulk free API.

      Jesper demonstrated that these extra per-object steps come at a
      performance cost.  They are only done anyway when CONFIG_MEMCG_KMEM is
      compiled in and activated at runtime.  The extra cost is most visible
      for the SLAB allocator, because the SLUB allocator does the page
      lookup (virt_to_head_page()) anyhow.

      Thus, the conclusion was to keep the kmem_cache bulk free API with a
      kmem_cache pointer, but we can still implement a kfree_bulk() API
      fairly easily, simply by handling the case where kmem_cache_free_bulk()
      gets called with a NULL kmem_cache pointer.

      This does increase the code size a bit, but implementing a separate
      kfree_bulk() call would likely increase code size even more.

      The benchmarks below measure the cost of alloc+free (object size 256
      bytes) on a CPU i7-4790K @ 4.00GHz, with no PREEMPT and
      CONFIG_MEMCG_KMEM=y.
      
      Code size increase for SLAB:
      
       add/remove: 0/0 grow/shrink: 1/0 up/down: 74/0 (74)
       function                                     old     new   delta
       kmem_cache_free_bulk                         660     734     +74
      
      SLAB fastpath: 87 cycles(tsc) 21.814
        sz - fallback             - kmem_cache_free_bulk - kfree_bulk
         1 - 103 cycles 25.878 ns -  41 cycles 10.498 ns - 81 cycles 20.312 ns
         2 -  94 cycles 23.673 ns -  26 cycles  6.682 ns - 42 cycles 10.649 ns
         3 -  92 cycles 23.181 ns -  21 cycles  5.325 ns - 39 cycles 9.950 ns
         4 -  90 cycles 22.727 ns -  18 cycles  4.673 ns - 26 cycles 6.693 ns
         8 -  89 cycles 22.270 ns -  14 cycles  3.664 ns - 23 cycles 5.835 ns
        16 -  88 cycles 22.038 ns -  14 cycles  3.503 ns - 22 cycles 5.543 ns
        30 -  89 cycles 22.284 ns -  13 cycles  3.310 ns - 20 cycles 5.197 ns
        32 -  88 cycles 22.249 ns -  13 cycles  3.420 ns - 20 cycles 5.166 ns
        34 -  88 cycles 22.224 ns -  14 cycles  3.643 ns - 20 cycles 5.170 ns
        48 -  88 cycles 22.088 ns -  14 cycles  3.507 ns - 20 cycles 5.203 ns
        64 -  88 cycles 22.063 ns -  13 cycles  3.428 ns - 20 cycles 5.152 ns
       128 -  89 cycles 22.483 ns -  15 cycles  3.891 ns - 23 cycles 5.885 ns
       158 -  89 cycles 22.381 ns -  15 cycles  3.779 ns - 22 cycles 5.548 ns
       250 -  91 cycles 22.798 ns -  16 cycles  4.152 ns - 23 cycles 5.967 ns
      
      SLAB when enabling MEMCG_KMEM runtime:
       - kmemcg fastpath: 130 cycles(tsc) 32.684 ns (step:0)
       1 - 148 cycles 37.220 ns -  66 cycles 16.622 ns - 66 cycles 16.583 ns
       2 - 141 cycles 35.510 ns -  51 cycles 12.820 ns - 58 cycles 14.625 ns
       3 - 140 cycles 35.017 ns -  37 cycles 9.326 ns - 33 cycles 8.474 ns
       4 - 137 cycles 34.507 ns -  31 cycles 7.888 ns - 33 cycles 8.300 ns
       8 - 140 cycles 35.069 ns -  25 cycles 6.461 ns - 25 cycles 6.436 ns
       16 - 138 cycles 34.542 ns -  23 cycles 5.945 ns - 22 cycles 5.670 ns
       30 - 136 cycles 34.227 ns -  22 cycles 5.502 ns - 22 cycles 5.587 ns
       32 - 136 cycles 34.253 ns -  21 cycles 5.475 ns - 21 cycles 5.324 ns
       34 - 136 cycles 34.254 ns -  21 cycles 5.448 ns - 20 cycles 5.194 ns
       48 - 136 cycles 34.075 ns -  21 cycles 5.458 ns - 21 cycles 5.367 ns
       64 - 135 cycles 33.994 ns -  21 cycles 5.350 ns - 21 cycles 5.259 ns
       128 - 137 cycles 34.446 ns -  23 cycles 5.816 ns - 22 cycles 5.688 ns
       158 - 137 cycles 34.379 ns -  22 cycles 5.727 ns - 22 cycles 5.602 ns
       250 - 138 cycles 34.755 ns -  24 cycles 6.093 ns - 23 cycles 5.986 ns
      
      Code size increase for SLUB:
       function                                     old     new   delta
       kmem_cache_free_bulk                         717     799     +82
      
      SLUB benchmark:
       SLUB fastpath: 46 cycles(tsc) 11.691 ns (step:0)
        sz - fallback             - kmem_cache_free_bulk - kfree_bulk
         1 -  61 cycles 15.486 ns -  53 cycles 13.364 ns - 57 cycles 14.464 ns
         2 -  54 cycles 13.703 ns -  32 cycles  8.110 ns - 33 cycles 8.482 ns
         3 -  53 cycles 13.272 ns -  25 cycles  6.362 ns - 27 cycles 6.947 ns
         4 -  51 cycles 12.994 ns -  24 cycles  6.087 ns - 24 cycles 6.078 ns
         8 -  50 cycles 12.576 ns -  21 cycles  5.354 ns - 22 cycles 5.513 ns
        16 -  49 cycles 12.368 ns -  20 cycles  5.054 ns - 20 cycles 5.042 ns
        30 -  49 cycles 12.273 ns -  18 cycles  4.748 ns - 19 cycles 4.758 ns
        32 -  49 cycles 12.401 ns -  19 cycles  4.821 ns - 19 cycles 4.810 ns
        34 -  98 cycles 24.519 ns -  24 cycles  6.154 ns - 24 cycles 6.157 ns
        48 -  83 cycles 20.833 ns -  21 cycles  5.446 ns - 21 cycles 5.429 ns
        64 -  75 cycles 18.891 ns -  20 cycles  5.247 ns - 20 cycles 5.238 ns
       128 -  93 cycles 23.271 ns -  27 cycles  6.856 ns - 27 cycles 6.823 ns
       158 - 102 cycles 25.581 ns -  30 cycles  7.714 ns - 30 cycles 7.695 ns
       250 - 107 cycles 26.917 ns -  38 cycles  9.514 ns - 38 cycles 9.506 ns
      
      SLUB when enabling MEMCG_KMEM runtime:
       - kmemcg fastpath: 71 cycles(tsc) 17.897 ns (step:0)
       1 - 85 cycles 21.484 ns -  78 cycles 19.569 ns - 75 cycles 18.938 ns
       2 - 81 cycles 20.363 ns -  45 cycles 11.258 ns - 44 cycles 11.076 ns
       3 - 78 cycles 19.709 ns -  33 cycles 8.354 ns - 32 cycles 8.044 ns
       4 - 77 cycles 19.430 ns -  28 cycles 7.216 ns - 28 cycles 7.003 ns
       8 - 101 cycles 25.288 ns -  23 cycles 5.849 ns - 23 cycles 5.787 ns
       16 - 76 cycles 19.148 ns -  20 cycles 5.162 ns - 20 cycles 5.081 ns
       30 - 76 cycles 19.067 ns -  19 cycles 4.868 ns - 19 cycles 4.821 ns
       32 - 76 cycles 19.052 ns -  19 cycles 4.857 ns - 19 cycles 4.815 ns
       34 - 121 cycles 30.291 ns -  25 cycles 6.333 ns - 25 cycles 6.268 ns
       48 - 108 cycles 27.111 ns -  21 cycles 5.498 ns - 21 cycles 5.458 ns
       64 - 100 cycles 25.164 ns -  20 cycles 5.242 ns - 20 cycles 5.229 ns
       128 - 155 cycles 38.976 ns -  27 cycles 6.886 ns - 27 cycles 6.892 ns
       158 - 132 cycles 33.034 ns -  30 cycles 7.711 ns - 30 cycles 7.728 ns
       250 - 130 cycles 32.612 ns -  38 cycles 9.560 ns - 38 cycles 9.549 ns
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ca257195
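
      A hedged kernel-style usage sketch (the allocation pattern and the
      function name are illustrative; kfree_bulk() is the call added by this
      commit, a thin wrapper that passes a NULL kmem_cache to
      kmem_cache_free_bulk()):

        #include <linux/kernel.h>
        #include <linux/slab.h>

        /* Free a mixed batch of kmalloc'ed objects with one call instead of
         * one kfree() per object. */
        static void drop_batch_sketch(void)
        {
                void *objs[8];
                size_t i, n = 0;

                for (i = 0; i < ARRAY_SIZE(objs); i++) {
                        void *p = kmalloc(64 << (i % 3), GFP_KERNEL);

                        if (p)
                                objs[n++] = p;
                }

                /* Equivalent to kmem_cache_free_bulk(NULL, n, objs): each
                 * object's cache is derived from its page, so no kmem_cache
                 * pointer is needed. */
                kfree_bulk(n, objs);
        }
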
    • mm: fault-inject take over bootstrap kmem_cache check · fab9963a
      By Jesper Dangaard Brouer
      Remove the SLAB-specific function slab_should_failslab() by moving the
      fault-injection check for the bootstrap slab into the shared function
      should_failslab() (used by both SLAB and SLUB).

      This is a step towards sharing alloc_hooks between SLUB and SLAB.

      The bootstrap slab "kmem_cache" is used for allocating struct
      kmem_cache objects for the allocator itself.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fab9963a
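
      A sketch of the centralized check (simplified; __failslab_roll_dice()
      is a hypothetical stand-in for the real fault-injection bookkeeping):
      the shared hook simply refuses to fault-inject allocations from the
      bootstrap "kmem_cache" cache that backs the allocator's own metadata.

        #include <linux/slab.h>

        /* The bootstrap cache; normally provided by the allocator internals. */
        extern struct kmem_cache *kmem_cache;

        /* Hypothetical helper standing in for the failslab internals. */
        static bool __failslab_roll_dice(size_t size, gfp_t gfpflags)
        {
                return false;   /* placeholder: real code consults failslab state */
        }

        static bool should_failslab_sketch(struct kmem_cache *s, gfp_t gfpflags)
        {
                /* Never fail allocations of struct kmem_cache itself. */
                if (unlikely(s == kmem_cache))
                        return false;

                return __failslab_roll_dice(s->object_size, gfpflags);
        }
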
  3. 15 March 2016 · 3 commits
  4. 14 March 2016 · 3 commits
  5. 13 March 2016 · 1 commit
  6. 12 March 2016 · 1 commit