1. 11 Jan, 2017 4 commits
  2. 15 Dec, 2016 1 commit
    • A
      mm: add support for releasing multiple instances of a page · 44fdffd7
      Alexander Duyck authored
      Add a function that allows us to batch free a page that has multiple
      references outstanding.  Specifically this function can be used to drop
      a page being used in the page frag alloc cache.  With this drivers can
      make use of functionality similar to the page frag alloc cache without
      having to do any workarounds for the fact that there is no function that
      frees multiple references.
      
      Link: http://lkml.kernel.org/r/20161110113606.76501.70752.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Keguang Zhang <keguang.zhang@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Tobias Klauser <tklauser@distanz.ch>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44fdffd7
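The idea of releasing several references to a page in one call can be sketched in user-space C. This is only a model under stated assumptions: `struct page_model` and `page_model_drain` are hypothetical names, the page is reduced to a bare reference count, and none of this is the kernel's actual implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the reference count is modeled. */
struct page_model {
    atomic_int refcount;
    bool freed;
};

/*
 * Drop `count` references in a single atomic step. Returns true when this
 * call released the last reference and therefore "freed" the page.
 */
static bool page_model_drain(struct page_model *page, int count)
{
    assert(count > 0);

    /* atomic_fetch_sub returns the value held before the subtraction. */
    int old = atomic_fetch_sub(&page->refcount, count);

    assert(old >= count);       /* the caller must own what it drops */
    if (old == count) {
        page->freed = true;     /* a real allocator would free the page here */
        return true;
    }
    return false;
}
```

A caller that holds, say, 60 of 64 outstanding references can release them with one call instead of 60 single decrements; the page is only released once the remaining holders drop theirs.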
  3. 13 Dec, 2016 5 commits
    • M
      mm, page_alloc: keep pcp count and list contents in sync if struct page is corrupted · a6de734b
      Mel Gorman authored
      Vlastimil Babka pointed out that commit 479f854a ("mm, page_alloc:
      defer debugging checks of pages allocated from the PCP") will allow the
      per-cpu list counter to be out of sync with the per-cpu list contents if
      a struct page is corrupted.
      
      The consequence is an infinite loop if the per-cpu lists get fully
      drained by free_pcppages_bulk because all the lists are empty but the
      count is positive.  The infinite loop occurs here
      
                      do {
                              batch_free++;
                              if (++migratetype == MIGRATE_PCPTYPES)
                                      migratetype = 0;
                              list = &pcp->lists[migratetype];
                      } while (list_empty(list));
      
      What the user sees is a bad page warning followed by a soft lockup with
      interrupts disabled in free_pcppages_bulk().
      
      This patch keeps the accounting in sync.
      
      Fixes: 479f854a ("mm, page_alloc: defer debugging checks of pages allocated from the PCP")
      Link: http://lkml.kernel.org/r/20161202112951.23346-2-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>	[4.7+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6de734b
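The invariant behind this fix (the per-cpu counter must always equal the number of pages actually on the lists) can be illustrated with a small user-space model. The names `pcp_model`, `pcp_take` and `pcp_in_sync` are hypothetical, each list is reduced to a page count, and the sketch only assumes that a page failing the deferred check is discarded rather than handed out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LISTS 3   /* stands in for MIGRATE_PCPTYPES */

/* Hypothetical model: each list is reduced to the number of pages on it. */
struct pcp_model {
    int count;             /* total pages the counter claims are cached */
    int lists[NR_LISTS];   /* pages actually present on each list */
};

/*
 * Take one page from a list. `page_ok` is the result of the deferred
 * debugging check. The counter is decremented whenever a page leaves the
 * list, even if the page is then rejected as corrupted.
 */
static bool pcp_take(struct pcp_model *pcp, int mt, bool page_ok)
{
    assert(mt >= 0 && mt < NR_LISTS);
    if (pcp->lists[mt] == 0)
        return false;
    pcp->lists[mt]--;
    pcp->count--;          /* keep the counter in step with the list */
    return page_ok;        /* a bad page is dropped, not handed out */
}

/* Invariant the drain loop relies on: count equals the sum of the lists. */
static bool pcp_in_sync(const struct pcp_model *pcp)
{
    int total = 0;

    for (int i = 0; i < NR_LISTS; i++)
        total += pcp->lists[i];
    return total == pcp->count;
}
```

If the decrement were skipped for rejected pages, `count` would stay positive after every list became empty, which is exactly the condition under which the quoted `do { ... } while (list_empty(list))` loop never terminates.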
    • M
      mm: make unreserve highatomic functions reliable · 29fac03b
      Minchan Kim authored
      Currently, unreserve_highatomic_pageblock bails out as soon as it
      finds a highatomic pageblock, regardless of whether it actually moved
      any free pages out of it.  That can defeat the goal of the unreserve
      logic, which is to save a process from OOM.
      
      This patch makes the unreserve functions bail out only if they moved
      some pages to a !highatomic free list, to avoid such false positives.
      
      Another potential problem is that, because of a race between page
      freeing and the highatomic reserve function, pages can be on the
      highatomic free list even though the pageblock's migratetype is
      !highatomic.  In that case, unreserve_highatomic_pageblock can have no
      effect if the highatomic reserve count is less than
      pageblock_nr_pages.  We can solve this simply by draining all reserved
      pages before the OOM.  That acts as a safeguard: reserved pages are
      exhausted before the system converges on OOM.
      
      Link: http://lkml.kernel.org/r/1476259429-18279-5-git-send-email-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sangseok Lee <sangseok.lee@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      29fac03b
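The "bail out only if pages were actually moved" rule can be sketched as a scan over a simplified array of pageblocks. Everything here is a hypothetical model: `block_model` and `unreserve_until_moved` are invented names, and a pageblock is reduced to a flag plus a count of movable free pages.

```c
#include <assert.h>
#include <stdbool.h>

struct block_model {
    bool highatomic;   /* pageblock currently reserved as highatomic */
    int free_pages;    /* free pages that could be moved out of it */
};

/*
 * Walk the blocks and unreserve highatomic ones. Finding a reserved block
 * is not enough to stop: the scan ends early only once pages were moved.
 */
static bool unreserve_until_moved(struct block_model *blocks, int n, int *moved)
{
    assert(n >= 0);
    *moved = 0;

    for (int i = 0; i < n; i++) {
        if (!blocks[i].highatomic)
            continue;

        blocks[i].highatomic = false;      /* retype the block */
        *moved = blocks[i].free_pages;     /* pages handed to another list */
        blocks[i].free_pages = 0;
        if (*moved > 0)
            return true;                   /* real progress: stop here */
    }
    return false;                          /* nothing was freed up */
}
```

With blocks holding 0, 8 and 4 free pages, the scan skips past the empty first block, stops after moving 8 pages from the second, and leaves the third reserved.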
    • M
      mm: try to exhaust highatomic reserve before the OOM · 04c8716f
      Minchan Kim authored
      I got an OOM report from the production team with a v4.4 kernel.  It
      had enough free memory but failed to allocate a GFP_KERNEL order-0
      page and finally hit an OOM kill.  It occurred during a QA process
      that launches several apps, switches between them and so on.  It
      happened rarely.  IOW, in a normal situation it was not a problem, but
      if we are unlucky and several apps use peak memory at the same time,
      it can happen.  If we manage to get past that phase, the system can go
      on working well.
      
      I could reproduce it easily with my test (a memory spike).  Look below.
      
      The reason is that the free pages (19M) of the DMA32 zone are reserved
      for HIGHORDERATOMIC and are not unreserved before the OOM.
      
        balloon invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
        balloon cpuset=/ mems_allowed=0
        CPU: 1 PID: 8473 Comm: balloon Tainted: G        W  OE   4.8.0-rc7-00219-g3f74c9559583-dirty #3161
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
        Call Trace:
          dump_stack+0x63/0x90
          dump_header+0x5c/0x1ce
          oom_kill_process+0x22e/0x400
          out_of_memory+0x1ac/0x210
          __alloc_pages_nodemask+0x101e/0x1040
          handle_mm_fault+0xa0a/0xbf0
          __do_page_fault+0x1dd/0x4d0
          trace_do_page_fault+0x43/0x130
          do_async_page_fault+0x1a/0xa0
          async_page_fault+0x28/0x30
        Mem-Info:
        active_anon:383949 inactive_anon:106724 isolated_anon:0
         active_file:15 inactive_file:44 isolated_file:0
         unevictable:0 dirty:0 writeback:24 unstable:0
         slab_reclaimable:2483 slab_unreclaimable:3326
         mapped:0 shmem:0 pagetables:1906 bounce:0
         free:6898 free_pcp:291 free_cma:0
        Node 0 active_anon:1535796kB inactive_anon:426896kB active_file:60kB inactive_file:176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:96kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1418 all_unreclaimable? no
        DMA free:8188kB min:44kB low:56kB high:68kB active_anon:7648kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
        lowmem_reserve[]: 0 1952 1952 1952
        DMA32 free:19404kB min:5628kB low:7624kB high:9620kB active_anon:1528148kB inactive_anon:426896kB active_file:60kB inactive_file:420kB unevictable:0kB writepending:96kB present:2080640kB managed:2030092kB mlocked:0kB slab_reclaimable:9932kB slab_unreclaimable:13284kB kernel_stack:2496kB pagetables:7624kB bounce:0kB free_pcp:900kB local_pcp:112kB free_cma:0kB
        lowmem_reserve[]: 0 0 0 0
        DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB (H) = 8192kB
        DMA32: 7*4kB (H) 8*8kB (H) 30*16kB (H) 31*32kB (H) 14*64kB (H) 9*128kB (H) 2*256kB (H) 2*512kB (H) 4*1024kB (H) 5*2048kB (H) 0*4096kB = 19484kB
        51131 total pagecache pages
        50795 pages in swap cache
        Swap cache stats: add 3532405601, delete 3532354806, find 124289150/1822712228
        Free swap  = 8kB
        Total swap = 255996kB
        524158 pages RAM
        0 pages HighMem/MovableOnly
        12658 pages reserved
        0 pages cma reserved
        0 pages hwpoisoned
      
      Another example, where the limit was exceeded because of the race:
      
        in:imklog: page allocation failure: order:0, mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
        CPU: 0 PID: 476 Comm: in:imklog Tainted: G            E   4.8.0-rc7-00217-g266ef83c51e5-dirty #3135
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
        Call Trace:
          dump_stack+0x63/0x90
          warn_alloc_failed+0xdb/0x130
          __alloc_pages_nodemask+0x4d6/0xdb0
          new_slab+0x339/0x490
          ___slab_alloc.constprop.74+0x367/0x480
          __slab_alloc.constprop.73+0x20/0x40
          __kmalloc+0x1a4/0x1e0
          alloc_indirect.isra.14+0x1d/0x50
          virtqueue_add_sgs+0x1c4/0x470
          __virtblk_add_req+0xae/0x1f0
          virtio_queue_rq+0x12d/0x290
          __blk_mq_run_hw_queue+0x239/0x370
          blk_mq_run_hw_queue+0x8f/0xb0
          blk_mq_insert_requests+0x18c/0x1a0
          blk_mq_flush_plug_list+0x125/0x140
          blk_flush_plug_list+0xc7/0x220
          blk_finish_plug+0x2c/0x40
          __do_page_cache_readahead+0x196/0x230
          filemap_fault+0x448/0x4f0
          ext4_filemap_fault+0x36/0x50
          __do_fault+0x75/0x140
          handle_mm_fault+0x84d/0xbe0
          __do_page_fault+0x1dd/0x4d0
          trace_do_page_fault+0x43/0x130
          do_async_page_fault+0x1a/0xa0
          async_page_fault+0x28/0x30
        Mem-Info:
        active_anon:363826 inactive_anon:121283 isolated_anon:32
         active_file:65 inactive_file:152 isolated_file:0
         unevictable:0 dirty:0 writeback:46 unstable:0
         slab_reclaimable:2778 slab_unreclaimable:3070
         mapped:112 shmem:0 pagetables:1822 bounce:0
         free:9469 free_pcp:231 free_cma:0
        Node 0 active_anon:1455304kB inactive_anon:485132kB active_file:260kB inactive_file:608kB unevictable:0kB isolated(anon):128kB isolated(file):0kB mapped:448kB dirty:0kB writeback:184kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:13641 all_unreclaimable? no
        DMA free:7748kB min:44kB low:56kB high:68kB active_anon:7944kB inactive_anon:104kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:108kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
        lowmem_reserve[]: 0 1952 1952 1952
        DMA32 free:30128kB min:5628kB low:7624kB high:9620kB active_anon:1447360kB inactive_anon:485028kB active_file:260kB inactive_file:608kB unevictable:0kB writepending:184kB present:2080640kB managed:2030132kB mlocked:0kB slab_reclaimable:11112kB slab_unreclaimable:12172kB kernel_stack:2400kB pagetables:7284kB bounce:0kB free_pcp:924kB local_pcp:72kB free_cma:0kB
        lowmem_reserve[]: 0 0 0 0
        DMA: 7*4kB (UE) 3*8kB (UH) 1*16kB (M) 0*32kB 2*64kB (U) 1*128kB (M) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 1*4096kB (H) = 7748kB
        DMA32: 10*4kB (H) 3*8kB (H) 47*16kB (H) 38*32kB (H) 5*64kB (H) 1*128kB (H) 2*256kB (H) 3*512kB (H) 3*1024kB (H) 3*2048kB (H) 4*4096kB (H) = 30128kB
        2775 total pagecache pages
        2536 pages in swap cache
        Swap cache stats: add 206786828, delete 206784292, find 7323106/106686077
        Free swap  = 108744kB
        Total swap = 255996kB
        524158 pages RAM
        0 pages HighMem/MovableOnly
        12648 pages reserved
        0 pages cma reserved
        0 pages hwpoisoned
      
      It's weird that the zone shows enough free memory above the min
      watermark but OOMed on a 4K GFP_KERNEL allocation because of reserved
      highatomic pages.  As a last resort, try to unreserve highatomic pages
      again and, if that moved pages to a non-highatomic free list, retry
      reclaim once more.
      
      Link: http://lkml.kernel.org/r/1476259429-18279-4-git-send-email-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sangseok Lee <sangseok.lee@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04c8716f
    • M
      mm: prevent double decrease of nr_reserved_highatomic · 4855e4a7
      Minchan Kim authored
      There is a race between page freeing and unreserving highatomic
      pageblocks.
      
       CPU 0				    CPU 1
      
          free_hot_cold_page
            mt = get_pfnblock_migratetype
            set_pcppage_migratetype(page, mt)
          				    unreserve_highatomic_pageblock
          				    spin_lock_irqsave(&zone->lock)
          				    move_freepages_block
          				    set_pageblock_migratetype(page)
          				    spin_unlock_irqrestore(&zone->lock)
            free_pcppages_bulk
              __free_one_page(mt) <- mt is stale
      
      Because of the above race, a page on CPU 0 could go to a
      non-highorderatomic free list since the pageblock's type has changed.
      As a result, the highorderatomic unreserve logic can decrease the
      reserved count for the same pageblock several times, creating a
      mismatch between nr_reserved_highatomic and the number of reserved
      pageblocks.
      
      So, this patch verifies whether the pageblock is highatomic and
      decreases the count only if it is.
      
      Link: http://lkml.kernel.org/r/1476259429-18279-3-git-send-email-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sangseok Lee <sangseok.lee@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4855e4a7
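The guard described in this commit (decrease the reserve counter only when the pageblock is still typed highatomic) can be modeled in a few lines. This is a sketch under assumptions: `zone_model`, `pageblock_model` and `unreserve_block` are hypothetical names, and `PAGEBLOCK_NR_PAGES` is given an illustrative value of 512.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES 512UL   /* assumed value, for illustration only */

enum mt_model { MT_MOVABLE, MT_HIGHATOMIC };

struct zone_model {
    unsigned long nr_reserved_highatomic;
};

struct pageblock_model {
    enum mt_model migratetype;
};

/*
 * Unreserve one pageblock. The reserve counter is only reduced when the
 * block is still marked highatomic, so a page that landed on the wrong
 * free list through a stale migratetype cannot cause a second decrement.
 */
static bool unreserve_block(struct zone_model *zone, struct pageblock_model *pb)
{
    assert(zone && pb);
    if (pb->migratetype != MT_HIGHATOMIC)
        return false;

    unsigned long drop = PAGEBLOCK_NR_PAGES;

    if (drop > zone->nr_reserved_highatomic)
        drop = zone->nr_reserved_highatomic;   /* never underflow */
    zone->nr_reserved_highatomic -= drop;
    pb->migratetype = MT_MOVABLE;
    return true;
}
```

Calling `unreserve_block` twice on the same block only changes the counter once, which keeps `nr_reserved_highatomic` consistent with the number of blocks that are actually reserved.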
    • M
      mm: don't steal highatomic pageblock · 88ed365e
      Minchan Kim authored
      Patch series "use up highorder free pages before OOM", v3.
      
      I got an OOM report from the production team with a v4.4 kernel.  It
      had enough free memory but failed to allocate a GFP_KERNEL order-0
      page and finally hit an OOM kill.  It occurred during a QA process
      that launches several apps, switches between them and so on.  It
      happened rarely.  IOW, in a normal situation it was not a problem, but
      if we are unlucky and several apps use peak memory at the same time,
      it can happen.  If we manage to get past that phase, the system can go
      on working well.
      
      I could reproduce it easily with my test (a memory spike).  Look below.
      
      The reason is that the free pages (19M) of the DMA32 zone are reserved
      for HIGHORDERATOMIC and are not unreserved before the OOM.
      
        balloon invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
        balloon cpuset=/ mems_allowed=0
        CPU: 1 PID: 8473 Comm: balloon Tainted: G        W  OE   4.8.0-rc7-00219-g3f74c9559583-dirty #3161
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
        Call Trace:
          dump_stack+0x63/0x90
          dump_header+0x5c/0x1ce
          oom_kill_process+0x22e/0x400
          out_of_memory+0x1ac/0x210
          __alloc_pages_nodemask+0x101e/0x1040
          handle_mm_fault+0xa0a/0xbf0
          __do_page_fault+0x1dd/0x4d0
          trace_do_page_fault+0x43/0x130
          do_async_page_fault+0x1a/0xa0
          async_page_fault+0x28/0x30
        Mem-Info:
        active_anon:383949 inactive_anon:106724 isolated_anon:0
         active_file:15 inactive_file:44 isolated_file:0
         unevictable:0 dirty:0 writeback:24 unstable:0
         slab_reclaimable:2483 slab_unreclaimable:3326
         mapped:0 shmem:0 pagetables:1906 bounce:0
         free:6898 free_pcp:291 free_cma:0
        Node 0 active_anon:1535796kB inactive_anon:426896kB active_file:60kB inactive_file:176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:96kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1418 all_unreclaimable? no
        DMA free:8188kB min:44kB low:56kB high:68kB active_anon:7648kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
        lowmem_reserve[]: 0 1952 1952 1952
        DMA32 free:19404kB min:5628kB low:7624kB high:9620kB active_anon:1528148kB inactive_anon:426896kB active_file:60kB inactive_file:420kB unevictable:0kB writepending:96kB present:2080640kB managed:2030092kB mlocked:0kB slab_reclaimable:9932kB slab_unreclaimable:13284kB kernel_stack:2496kB pagetables:7624kB bounce:0kB free_pcp:900kB local_pcp:112kB free_cma:0kB
        lowmem_reserve[]: 0 0 0 0
        DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB (H) = 8192kB
        DMA32: 7*4kB (H) 8*8kB (H) 30*16kB (H) 31*32kB (H) 14*64kB (H) 9*128kB (H) 2*256kB (H) 2*512kB (H) 4*1024kB (H) 5*2048kB (H) 0*4096kB = 19484kB
        51131 total pagecache pages
        50795 pages in swap cache
        Swap cache stats: add 3532405601, delete 3532354806, find 124289150/1822712228
        Free swap  = 8kB
        Total swap = 255996kB
        524158 pages RAM
        0 pages HighMem/MovableOnly
        12658 pages reserved
        0 pages cma reserved
        0 pages hwpoisoned
      
      Another example, where the limit was exceeded because of the race:
      
        in:imklog: page allocation failure: order:0, mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
        CPU: 0 PID: 476 Comm: in:imklog Tainted: G            E   4.8.0-rc7-00217-g266ef83c51e5-dirty #3135
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
        Call Trace:
          dump_stack+0x63/0x90
          warn_alloc_failed+0xdb/0x130
          __alloc_pages_nodemask+0x4d6/0xdb0
          new_slab+0x339/0x490
          ___slab_alloc.constprop.74+0x367/0x480
          __slab_alloc.constprop.73+0x20/0x40
          __kmalloc+0x1a4/0x1e0
          alloc_indirect.isra.14+0x1d/0x50
          virtqueue_add_sgs+0x1c4/0x470
          __virtblk_add_req+0xae/0x1f0
          virtio_queue_rq+0x12d/0x290
          __blk_mq_run_hw_queue+0x239/0x370
          blk_mq_run_hw_queue+0x8f/0xb0
          blk_mq_insert_requests+0x18c/0x1a0
          blk_mq_flush_plug_list+0x125/0x140
          blk_flush_plug_list+0xc7/0x220
          blk_finish_plug+0x2c/0x40
          __do_page_cache_readahead+0x196/0x230
          filemap_fault+0x448/0x4f0
          ext4_filemap_fault+0x36/0x50
          __do_fault+0x75/0x140
          handle_mm_fault+0x84d/0xbe0
          __do_page_fault+0x1dd/0x4d0
          trace_do_page_fault+0x43/0x130
          do_async_page_fault+0x1a/0xa0
          async_page_fault+0x28/0x30
        Mem-Info:
        active_anon:363826 inactive_anon:121283 isolated_anon:32
         active_file:65 inactive_file:152 isolated_file:0
         unevictable:0 dirty:0 writeback:46 unstable:0
         slab_reclaimable:2778 slab_unreclaimable:3070
         mapped:112 shmem:0 pagetables:1822 bounce:0
         free:9469 free_pcp:231 free_cma:0
        Node 0 active_anon:1455304kB inactive_anon:485132kB active_file:260kB inactive_file:608kB unevictable:0kB isolated(anon):128kB isolated(file):0kB mapped:448kB dirty:0kB writeback:184kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:13641 all_unreclaimable? no
        DMA free:7748kB min:44kB low:56kB high:68kB active_anon:7944kB inactive_anon:104kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:108kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
        lowmem_reserve[]: 0 1952 1952 1952
        DMA32 free:30128kB min:5628kB low:7624kB high:9620kB active_anon:1447360kB inactive_anon:485028kB active_file:260kB inactive_file:608kB unevictable:0kB writepending:184kB present:2080640kB managed:2030132kB mlocked:0kB slab_reclaimable:11112kB slab_unreclaimable:12172kB kernel_stack:2400kB pagetables:7284kB bounce:0kB free_pcp:924kB local_pcp:72kB free_cma:0kB
        lowmem_reserve[]: 0 0 0 0
        DMA: 7*4kB (UE) 3*8kB (UH) 1*16kB (M) 0*32kB 2*64kB (U) 1*128kB (M) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 1*4096kB (H) = 7748kB
        DMA32: 10*4kB (H) 3*8kB (H) 47*16kB (H) 38*32kB (H) 5*64kB (H) 1*128kB (H) 2*256kB (H) 3*512kB (H) 3*1024kB (H) 3*2048kB (H) 4*4096kB (H) = 30128kB
        2775 total pagecache pages
        2536 pages in swap cache
        Swap cache stats: add 206786828, delete 206784292, find 7323106/106686077
        Free swap  = 108744kB
        Total swap = 255996kB
        524158 pages RAM
        0 pages HighMem/MovableOnly
        12648 pages reserved
        0 pages cma reserved
        0 pages hwpoisoned
      
      During the investigation, I found some problems with highatomic, so
      this patch series aims to solve them; the final goal is to unreserve
      every highatomic free page before the OOM kill.
      
      This patch (of 4):
      
      In the page freeing path, migratetype is racy, so a highorderatomic
      page could be freed into a non-highorderatomic free list.  If that
      page is allocated, the VM can change the pageblock from
      highorderatomic to something else.  In that case, highatomic pageblock
      accounting is broken, so it doesn't work (e.g., the VM cannot reserve
      highorderatomic pageblocks any more even though it hasn't reached the
      1% limit).
      
      So, this patch prohibits changing a pageblock from highatomic to
      another type.  That is not a problem because MIGRATE_HIGHATOMIC is not
      listed in the fallback array, so stealing only happens due to
      unexpected races, which are really rare.  Also, the prohibition keeps
      highatomic pageblocks around longer, which is better for
      highorderatomic page allocation.
      
      Link: http://lkml.kernel.org/r/1476259429-18279-2-git-send-email-minchan@kernel.org
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Sangseok Lee <sangseok.lee@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      88ed365e
  4. 12 Nov, 2016 1 commit
  5. 10 Nov, 2016 1 commit
  6. 01 Nov, 2016 1 commit
    • K
      latent_entropy: Fix wrong gcc code generation with 64 bit variables · 58bea414
      Kees Cook authored
      The stack frame size could grow too large when the plugin used long long
      on 32-bit architectures when the given function had too many basic blocks.
      
      The gcc warning was:
      
      drivers/pci/hotplug/ibmphp_ebda.c: In function 'ibmphp_access_ebda':
      drivers/pci/hotplug/ibmphp_ebda.c:409:1: warning: the frame size of 1108 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      This switches latent_entropy from u64 to unsigned long.
      
      Thanks to PaX Team and Emese Revfy for the patch.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      58bea414
  7. 28 Oct, 2016 2 commits
  8. 26 Oct, 2016 1 commit
    • J
      mm/page_alloc: Remove kernel address exposure in free_reserved_area() · adb1fe9a
      Josh Poimboeuf authored
      Linus suggested we try to remove some of the low-hanging fruit related
      to kernel address exposure in dmesg.  The only leaks I see on my local
      system are:
      
        Freeing SMP alternatives memory: 32K (ffffffff9e309000 - ffffffff9e311000)
        Freeing initrd memory: 10588K (ffffa0b736b42000 - ffffa0b737599000)
        Freeing unused kernel memory: 3592K (ffffffff9df87000 - ffffffff9e309000)
        Freeing unused kernel memory: 1352K (ffffa0b7288ae000 - ffffa0b728a00000)
        Freeing unused kernel memory: 632K (ffffa0b728d62000 - ffffa0b728e00000)
      
      Linus says:
      
        "I suspect we should just remove [the addresses in the 'Freeing'
         messages]. I'm sure they are useful in theory, but I suspect they
         were more useful back when the whole "free init memory" was
         originally done.
      
         These days, if we have a use-after-free, I suspect the init-mem
         situation is the easiest situation by far. Compared to all the dynamic
         allocations which are much more likely to show it anyway. So having
         debug output for that case is likely not all that productive."
      
      With this patch the freeing messages now look like this:
      
        Freeing SMP alternatives memory: 32K
        Freeing initrd memory: 10588K
        Freeing unused kernel memory: 3592K
        Freeing unused kernel memory: 1352K
        Freeing unused kernel memory: 632K
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/6836ff90c45b71d38e5d4405aec56fa9e5d1d4b2.1477405374.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      adb1fe9a
  9. 11 Oct, 2016 2 commits
    • E
      latent_entropy: Mark functions with __latent_entropy · 0766f788
      Emese Revfy authored
      The __latent_entropy gcc attribute can be used only on functions and
      variables.  If it is on a function then the plugin will instrument it for
      gathering control-flow entropy. If the attribute is on a variable then
      the plugin will initialize it with random contents.  The variable must
      be an integer, an integer array type or a structure with integer fields.
      
      These specific functions have been selected because they are init
      functions (to help gather boot-time entropy), are called at unpredictable
      times, or they have variable loops, each of which provides some level of
      latent entropy.
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      0766f788
    • E
      gcc-plugins: Add latent_entropy plugin · 38addce8
      Emese Revfy authored
      This adds a new gcc plugin named "latent_entropy". It is designed to
      extract as much uncertainty as possible from a running system at boot
      time, hoping to capitalize on any possible variation in CPU operation
      (due to runtime data differences, hardware differences, SMP ordering,
      thermal timing variation, cache behavior, etc).
      
      At the very least, this plugin is a much more comprehensive example for
      how to manipulate kernel code using the gcc plugin internals.
      
      The need for very-early boot entropy tends to be very architecture or
      system design specific, so this plugin is more suited for those sorts
      of special cases. The existing kernel RNG already attempts to extract
      entropy from reliable runtime variation, but this plugin takes the idea to
      a logical extreme by permuting a global variable based on any variation
      in code execution (e.g. a different value (and permutation function)
      is used to permute the global based on loop count, case statement,
      if/then/else branching, etc).
      
      To do this, the plugin starts by inserting a local variable in every
      marked function. The plugin then adds logic so that the value of this
      variable is modified by randomly chosen operations (add, xor and rol) and
      random values (gcc generates separate static values for each location at
      compile time and also injects the stack pointer at runtime). The resulting
      value depends on the control flow path (e.g., loops and branches taken).
      
      Before the function returns, the plugin mixes this local variable into
      the latent_entropy global variable. The value of this global variable
      is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
      though it does not credit any bytes of entropy to the pool; the contents
      of the global are just used to mix the pool.
      
      Additionally, the plugin can pre-initialize arrays with build-time
      random contents, so that two different kernel builds running on identical
      hardware will not have the same starting values.
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message and code comments]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      38addce8
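The instrumentation described in this commit message can be imitated by hand in plain C. The sketch below is an illustration, not plugin output: the constants are arbitrary stand-ins for the values chosen at compile time, the stack-pointer injection is left out, mixing into the global with XOR is an assumption, and `latent_entropy_model`, `rol64_model` and `count_bits_instrumented` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's latent_entropy global. */
static uint64_t latent_entropy_model;

/* Rotate left; `s` must be in 1..63. */
static uint64_t rol64_model(uint64_t v, unsigned int s)
{
    assert(s > 0 && s < 64);
    return (v << s) | (v >> (64 - s));
}

/*
 * Count set bits, instrumented by hand: a local accumulator is perturbed
 * in each basic block and mixed into the global before returning. The
 * final accumulator is also passed back so callers can inspect it.
 */
static unsigned int count_bits_instrumented(unsigned int x, uint64_t *path_value)
{
    uint64_t local = 0x9e3779b97f4a7c15ULL;      /* arbitrary start value */
    unsigned int n = 0;

    while (x) {
        local += 0x0123456789abcdefULL;          /* loop body block: add */
        if (x & 1) {
            local ^= 0xfedcba9876543210ULL;      /* "then" block: xor */
            n++;
        } else {
            local = rol64_model(local, 7);       /* "else" block: rol */
        }
        x >>= 1;
    }

    latent_entropy_model ^= local;               /* mix into the global */
    *path_value = local;
    return n;
}
```

Inputs that take different paths through the loop and branch leave different values in the accumulator, so the value mixed into the global depends on control flow rather than on the function's return value alone.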
  10. 08 Oct, 2016 15 commits
  11. 02 Sep, 2016 2 commits
  12. 12 Aug, 2016 1 commit
  13. 11 Aug, 2016 2 commits
  14. 10 Aug, 2016 1 commit
    • V
      mm: memcontrol: only mark charged pages with PageKmemcg · c4159a75
      Vladimir Davydov authored
      To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
      which sets page->_mapcount to -512.  Currently, we set/clear PageKmemcg
      in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
      with __GFP_ACCOUNT, including those that aren't actually charged to any
      cgroup, i.e. allocated from the root cgroup context.  To avoid overhead
      in case cgroups are not used, we only do that if memcg_kmem_enabled() is
      true.  The latter is set iff there are kmem-enabled memory cgroups
      (online or offline).  The root cgroup is not considered kmem-enabled.
      
      As a result, if a page is allocated with __GFP_ACCOUNT for the root
      cgroup when there are kmem-enabled memory cgroups and is freed after all
      kmem-enabled memory cgroups were removed, e.g.
      
        # no memory cgroups have been created yet, create one
        mkdir /sys/fs/cgroup/memory/test
        # run something allocating pages with __GFP_ACCOUNT, e.g.
        # a program using pipe
        dmesg | tail
        # remove the memory cgroup
        rmdir /sys/fs/cgroup/memory/test
      
      we'll get bad page state bug complaining about page->_mapcount != -1:
      
        BUG: Bad page state in process swapper/0  pfn:1fd945c
        page:ffffea007f651700 count:0 mapcount:-511 mapping:          (null) index:0x0
        flags: 0x1000000000000000()
      
      To avoid that, let's mark with PageKmemcg only those pages that are
      actually charged to and hence pin a non-root memory cgroup.
      
      Fixes: 4949148a ("mm: charge/uncharge kmemcg from generic page allocator paths")
      Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c4159a75
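The rule introduced here (mark a page only when it is really charged to a non-root cgroup, so that every marked page is also unmarked on free) can be sketched with a toy model. The names `page_model`, `model_alloc`, `model_free` and `reported_mapcount` are hypothetical; the only values taken from the text are the -512 marker and the expected -1 for a free page.

```c
#include <assert.h>
#include <stdbool.h>

#define MAPCOUNT_UNUSED  (-1)     /* value a free page is expected to hold */
#define MAPCOUNT_KMEMCG  (-512)   /* marker value quoted in the text */

struct page_model {
    int _mapcount;
};

/* Mark the page only when it was really charged to a non-root cgroup. */
static void model_alloc(struct page_model *page, bool charged_to_nonroot)
{
    assert(page);
    page->_mapcount = charged_to_nonroot ? MAPCOUNT_KMEMCG : MAPCOUNT_UNUSED;
}

/* Uncharge and unmark on free; returns true if the page state is sane. */
static bool model_free(struct page_model *page)
{
    assert(page);
    if (page->_mapcount == MAPCOUNT_KMEMCG)
        page->_mapcount = MAPCOUNT_UNUSED;   /* marked pages are always cleared */
    return page->_mapcount == MAPCOUNT_UNUSED;
}

/* A marker of -512 is displayed as -511, matching the report above. */
static int reported_mapcount(const struct page_model *page)
{
    return page->_mapcount + 1;
}
```

Because marking and clearing are now tied to the same condition, a page allocated from the root cgroup never carries the marker and cannot trip the `_mapcount != -1` check after the last kmem-enabled cgroup is removed.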
  15. 05 Aug, 2016 1 commit