1. 14 Nov, 2014 15 commits
    • T
      mem-hotplug: reset node managed pages when hot-adding a new pgdat · f784a3f1
      Tang Chen committed
      In free_area_init_core(), zone->managed_pages is set to an approximate
      value for lowmem, and will be adjusted when the bootmem allocator frees
      pages into the buddy system.
      
      But free_area_init_core() is also called by hotadd_new_pgdat() when
      hot-adding memory.  As a result, zone->managed_pages of the newly added
      node's pgdat is set to an approximate value in the very beginning.
      
      Even if the memory on that node has not been onlined,
      /sys/devices/system/node/nodeXXX/meminfo shows wrong values:
      
        hot-add node2 (memory not onlined)
        cat /sys/devices/system/node/node2/meminfo
        Node 2 MemTotal:       33554432 kB
        Node 2 MemFree:               0 kB
        Node 2 MemUsed:        33554432 kB
        Node 2 Active:                0 kB
      
      This patch fixes the problem by resetting the node's managed pages to 0
      after hot-adding a new node.
      
      1. Move reset_managed_pages_done from reset_node_managed_pages() to
         reset_all_zones_managed_pages()
      2. Make reset_node_managed_pages() non-static
      3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat
         is initialized
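
      The effect of step 3 can be sketched in userspace as follows.  This is
      a minimal toy model, not the kernel code: the toy_* types and
      functions and TOY_MAX_NR_ZONES are hypothetical stand-ins for
      pg_data_t, struct zone and the real helpers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NR_ZONES 4  /* hypothetical; the real value depends on config */

/* Toy stand-ins for the kernel's struct zone and pg_data_t. */
struct toy_zone {
    unsigned long managed_pages;
};

struct toy_pgdat {
    struct toy_zone node_zones[TOY_MAX_NR_ZONES];
};

/* Zero the managed page count of every zone on one node. */
void toy_reset_node_managed_pages(struct toy_pgdat *pgdat)
{
    assert(pgdat != NULL);
    for (size_t i = 0; i < TOY_MAX_NR_ZONES; i++)
        pgdat->node_zones[i].managed_pages = 0;
}

/* Sum of managed pages over the node, as a meminfo-style total would see it. */
unsigned long toy_node_managed_pages(const struct toy_pgdat *pgdat)
{
    unsigned long total = 0;

    for (size_t i = 0; i < TOY_MAX_NR_ZONES; i++)
        total += pgdat->node_zones[i].managed_pages;
    return total;
}

/* Hot-add path: init leaves an approximate value behind, then we reset it. */
void toy_hotadd_new_pgdat(struct toy_pgdat *pgdat, unsigned long approx_pages)
{
    for (size_t i = 0; i < TOY_MAX_NR_ZONES; i++)
        pgdat->node_zones[i].managed_pages = 0;
    pgdat->node_zones[0].managed_pages = approx_pages; /* approximate init value */
    toy_reset_node_managed_pages(pgdat);               /* reset after hot-add */
}
```

      In this model a freshly hot-added node reports zero managed pages
      until its memory is actually onlined.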
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>	[3.16+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f784a3f1
    • J
      mm/debug-pagealloc: correct freepage accounting and order resetting · 57cbc87e
      Joonsoo Kim committed
      This patch fixes two things.  The first is freepage accounting.  If we
      clear a guard page and link it onto the isolate buddy list, we should
      not increase the freepage count.  This patch adds a conditional branch
      to skip counting in this case.  Without this patch, overcounting
      happens frequently if the guard order is set and CMA is used.
      
      The second is the target of the order reset.  In __free_one_page(), we
      check whether the buddy page is a guard page.  If so, we should clear
      the guard attribute on the buddy page and reset its order to 0.  But
      the current code resets the original page's order rather than the
      buddy's.  This probably causes no problem, because the whole merged
      page's order will be re-assigned soon, but it is better to correct the
      code.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      57cbc87e
    • J
      fanotify: fix notification of groups with inode & mount marks · 8edc6e16
      Jan Kara committed
      fsnotify() needs to merge inode and mount marks lists when notifying
      groups about events so that ignore masks from inode marks are reflected
      in mount mark notifications and groups are notified in proper order
      (according to priorities).
      
      Currently the sorting of the lists done by fsnotify_add_inode_mark() /
      fsnotify_add_vfsmount_mark() and fsnotify() differs, which results in
      ignore masks not being used in some cases.
      
      Fix the problem by always using the same comparison function when
      sorting / merging the mark lists.
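
      The idea of one shared comparison function can be sketched as below.
      This is an illustrative toy, not the fsnotify code: struct toy_mark,
      its priority and group_id fields, and both functions are hypothetical,
      and the ordering rule (higher priority first, then group id) is an
      assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy mark: only the fields needed to show the ordering. */
struct toy_mark {
    int priority;  /* higher priority groups are notified first */
    int group_id;  /* tie-breaker so marks of the same group line up */
};

/*
 * The single comparison function used both when inserting into a list and
 * when merging two lists. Returns <0 if a sorts before b, >0 if after,
 * 0 if both marks belong to the same group.
 */
int toy_compare_marks(const struct toy_mark *a, const struct toy_mark *b)
{
    if (a->priority != b->priority)
        return a->priority > b->priority ? -1 : 1;
    if (a->group_id != b->group_id)
        return a->group_id < b->group_id ? -1 : 1;
    return 0;
}

/*
 * Merge-walk two lists that were sorted with toy_compare_marks().
 * Writes the notification order (group ids) to out, which must have room
 * for ni + nm entries, and returns the number of groups notified.
 */
size_t toy_merge_marks(const struct toy_mark *inode_marks, size_t ni,
                       const struct toy_mark *mount_marks, size_t nm,
                       int *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < ni || j < nm) {
        int cmp;

        if (i == ni)
            cmp = 1;            /* only mount marks left */
        else if (j == nm)
            cmp = -1;           /* only inode marks left */
        else
            cmp = toy_compare_marks(&inode_marks[i], &mount_marks[j]);

        if (cmp < 0) {
            out[n++] = inode_marks[i++].group_id;
        } else if (cmp > 0) {
            out[n++] = mount_marks[j++].group_id;
        } else {
            /* Same group on both lists: handle both marks together. */
            out[n++] = inode_marks[i].group_id;
            i++;
            j++;
        }
    }
    return n;
}
```

      Because both lists agree on the ordering, a group that has an inode
      mark and a mount mark is always seen as one pair, which is what lets
      the inode mark's ignore mask apply to the mount mark's event.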
      
      Thanks to Heinrich Schuchardt for improvements of my patch.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8edc6e16
    • V
      mm, compaction: prevent infinite loop in compact_zone · 1d5bfe1f
      Vlastimil Babka committed
      Several people have reported occasionally seeing processes stuck in
      compact_zone(), even triggering soft lockups, in 3.18-rc2+.
      
      Testing a revert of commit e14c720e ("mm, compaction: remember
      position within pageblock in free pages scanner") fixed the issue,
      although the stuck processes do not appear to involve the free scanner.
      
      Finally, by code inspection, the bug was found in isolate_migratepages()
      which uses a slightly different condition to detect if the migration and
      free scanners have met, than compact_finished().  That has not been a
      problem until commit e14c720e allowed the free scanner position
      between individual invocations to be in the middle of a pageblock.
      
      In a relatively rare case, the migration scanner position can end up at
      the beginning of a pageblock, with the free scanner position in the
      middle of the same pageblock.  If it's the migration scanner's turn,
      isolate_migratepages() exits immediately (without updating the
      position), while compact_finished() decides to continue compaction,
      resulting in a potentially infinite loop.  The system can recover only
      if another process creates enough high-order pages to make the watermark
      checks in compact_finished() pass.
      
      This patch fixes the immediate problem by bumping the migration
      scanner's position to meet the free scanner in isolate_migratepages(),
      when both are within the same pageblock.  This causes compact_finished()
      to terminate properly.  A more robust check in compact_finished() is
      planned as a cleanup for better future maintainability.
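
      The stuck state and the fix can be reproduced in a small userspace
      model.  This is a sketch under simplifying assumptions, not the
      compaction code: the toy_* names and TOY_PAGEBLOCK_NR_PAGES are
      hypothetical, and isolation itself is reduced to advancing a
      position.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGEBLOCK_NR_PAGES 512UL  /* hypothetical pageblock size in pages */

struct toy_compact_control {
    unsigned long migrate_pfn;  /* migration scanner position, moves up */
    unsigned long free_pfn;     /* free scanner position, moves down */
};

/* End (exclusive) of the pageblock that contains pfn. */
static unsigned long toy_block_end(unsigned long pfn)
{
    return (pfn / TOY_PAGEBLOCK_NR_PAGES + 1) * TOY_PAGEBLOCK_NR_PAGES;
}

/* compact_finished()-style check: have the scanners met? */
bool toy_scanners_met(const struct toy_compact_control *cc)
{
    return cc->free_pfn <= cc->migrate_pfn;
}

/*
 * isolate_migratepages()-style step. The migration scanner only handles
 * whole pageblocks that end at or below the free scanner. With apply_fix
 * set, a scanner sharing a pageblock with the free scanner is bumped to
 * free_pfn; without it, the position is left untouched.
 */
void toy_isolate_migratepages(struct toy_compact_control *cc, bool apply_fix)
{
    unsigned long end_pfn = toy_block_end(cc->migrate_pfn);

    if (end_pfn <= cc->free_pfn) {
        cc->migrate_pfn = end_pfn;       /* scanned one full pageblock */
        return;
    }
    if (apply_fix)
        cc->migrate_pfn = cc->free_pfn;  /* make the scanners fully meet */
}

/* Returns true if the scanners meet within max_steps iterations. */
bool toy_compaction_terminates(struct toy_compact_control cc, bool apply_fix,
                               int max_steps)
{
    assert(max_steps >= 0);
    for (int step = 0; step < max_steps; step++) {
        if (toy_scanners_met(&cc))
            return true;
        toy_isolate_migratepages(&cc, apply_fix);
    }
    return toy_scanners_met(&cc);
}
```

      With migrate_pfn at the start of a pageblock and free_pfn in the
      middle of the same one, the model never terminates without the bump
      and terminates after one step with it.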
      
      Fixes: e14c720e ("mm, compaction: remember position within pageblock in free pages scanner")
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: P. Christeas <xrg@linux.gr>
      Tested-by: P. Christeas <xrg@linux.gr>
      Link: http://marc.info/?l=linux-mm&m=141508604232522&w=2
      Reported-by: Norbert Preining <preining@logic.at>
      Tested-by: Norbert Preining <preining@logic.at>
      Link: https://lkml.org/lkml/2014/11/4/904
      Reported-by: Pavel Machek <pavel@ucw.cz>
      Link: https://lkml.org/lkml/2014/11/7/164
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1d5bfe1f
    • M
      mm: alloc_contig_range: demote pages busy message from warn to info · dae803e1
      Michal Nazarewicz committed
      Having test_pages_isolated failure message as a warning confuses users
      into thinking that it is more serious than it really is.  In reality, if
      called via CMA, allocation will be retried so a single
      test_pages_isolated failure does not prevent allocation from succeeding.
      
      Demote the warning message to an info message and reformat it such that
      the text "failed" does not appear and instead a less worrying "PFNS
      busy" is used.
      
      This message is trivially reproducible on a 10GB x86 machine on 3.16.y
      kernels configured with CONFIG_DMA_CMA.
      Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dae803e1
    • J
      mm/slab: fix unalignment problem on Malta with EVA due to slab merge · 95069ac8
      Joonsoo Kim committed
      Unlike in SLUB, in SLAB an object sometimes does not start at the
      beginning of the slab.  This causes an alignment problem now that slab
      merging is supported by commit 12220dea ("mm/slab: support slab merge").
      
      The following report from Markos shows a boot failure on Malta with EVA.
      
          Calibrating delay loop... 19.86 BogoMIPS (lpj=99328)
          pid_max: default: 32768 minimum: 301
          Mount-cache hash table entries: 4096 (order: 0, 16384 bytes)
          Mountpoint-cache hash table entries: 4096 (order: 0, 16384 bytes)
          Kernel bug detected[#1]:
          CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-05639-g12220dea #1631
          task: 1f04f5d8 ti: 1f050000 task.ti: 1f050000
          epc   : 80141190 alloc_unbound_pwq+0x234/0x304
              Not tainted
          ra    : 80141184 alloc_unbound_pwq+0x228/0x304
          Process swapper/0 (pid: 1, threadinfo=1f050000, task=1f04f5d8, tls=00000000)
          Call Trace:
            alloc_unbound_pwq+0x234/0x304
            apply_workqueue_attrs+0x11c/0x294
            __alloc_workqueue_key+0x23c/0x470
            init_workqueues+0x320/0x400
            do_one_initcall+0xe8/0x23c
            kernel_init_freeable+0x9c/0x224
            kernel_init+0x10/0x100
            ret_from_kernel_thread+0x14/0x1c
          [ end trace cb88537fdc8fa200 ]
          Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      
      alloc_unbound_pwq() allocates a slab object from pool_workqueue.  This
      kmem_cache requires 256-byte alignment, but the current merging code
      doesn't honor that and merges it with kmalloc-256.  kmalloc-256 requires
      only cacheline-size alignment, so the above failure occurs.  On x86,
      however, kmalloc-256 happens to be aligned to 256 bytes, so the problem
      didn't show up there.
      
      To fix this problem, this patch introduces an alignment mismatch check
      in find_mergeable().
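
      One way such an alignment mismatch check can look is sketched below.
      It is an assumption-laden illustration rather than the code in
      find_mergeable(): toy_alignment_compatible() and its parameters are
      hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a new cache that requests `align` bytes of alignment may
 * be merged into an existing cache whose objects are aligned to
 * `candidate_align` bytes. align == 0 means "no special requirement".
 */
bool toy_alignment_compatible(size_t align, size_t candidate_align)
{
    if (align == 0)
        return true;
    if (candidate_align == 0)
        return false;  /* candidate guarantees nothing; also avoids % 0 */

    /*
     * The candidate must be at least as strict as the request, and its
     * alignment must be a multiple of the requested one.
     */
    return align <= candidate_align && candidate_align % align == 0;
}
```

      In the failing case above, a cache that needs 256-byte alignment is
      compared against one that only guarantees cacheline alignment (for
      example 32 bytes), so the check rejects the merge.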
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-by: Markos Chandras <Markos.Chandras@imgtec.com>
      Tested-by: Markos Chandras <Markos.Chandras@imgtec.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      95069ac8
    • J
      mm/page_alloc: restrict max order of merging on isolated pageblock · 3c605096
      Joonsoo Kim committed
      The current pageblock isolation logic can isolate each pageblock
      individually.  This causes a freepage accounting problem if a freepage
      of pageblock order on an isolated pageblock is merged with another
      freepage on a normal pageblock.  We can prevent this by restricting the
      max order of merging to pageblock order if the freepage is on an
      isolated pageblock.
      
      A side effect of this change is that non-merged buddy freepages can
      remain even after pageblock isolation finishes, because undoing
      pageblock isolation just moves freepages from the isolate buddy list to
      the normal buddy list without considering merging.  So the patch also
      makes undoing pageblock isolation consider freepage merging.  On
      un-isolation, a freepage of more than pageblock order and its buddy are
      checked.  If they are on a normal pageblock, instead of just moving the
      freepage, we isolate it and free it so that it gets merged.
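
      The order restriction can be illustrated with a toy merge loop.  The
      constants and toy_* functions below are hypothetical, and the buddy
      lookup is reduced to a per-order flag, so this only shows where the
      loop stops, not how the allocator finds buddies.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ORDER       11  /* hypothetical; a common kernel default */
#define TOY_PAGEBLOCK_ORDER 9   /* hypothetical; 2 MB pageblocks, 4 KB pages */

/* Exclusive upper bound on the order that merging may produce. */
int toy_merge_max_order(bool on_isolated_pageblock)
{
    int limit = TOY_PAGEBLOCK_ORDER + 1;

    if (on_isolated_pageblock && limit < TOY_MAX_ORDER)
        return limit;   /* never merge beyond pageblock order */
    return TOY_MAX_ORDER;
}

/*
 * Merge a free page of the given order upward while a free buddy exists.
 * buddy_free[o] says whether the buddy at order o is free; the array must
 * have TOY_MAX_ORDER entries. Returns the final order of the merged page.
 */
int toy_merge(int order, const bool *buddy_free, bool on_isolated_pageblock)
{
    int max_order = toy_merge_max_order(on_isolated_pageblock);

    assert(order >= 0 && order < TOY_MAX_ORDER);
    while (order < max_order - 1 && buddy_free[order])
        order++;
    return order;
}
```

      With every buddy free, a page on a normal pageblock merges up to
      TOY_MAX_ORDER - 1, while a page on an isolated pageblock stops at
      TOY_PAGEBLOCK_ORDER and so never absorbs a freepage from a
      neighbouring pageblock.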
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Heesub Shin <heesub.shin@samsung.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Ritesh Harjani <ritesh.list@gmail.com>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3c605096
    • J
      mm/page_alloc: move freepage counting logic to __free_one_page() · 8f82b55d
      Joonsoo Kim committed
      All the callers of __free_one_page() have similar freepage counting
      logic, so we can move it into __free_one_page().  This reduces lines of
      code and helps future maintenance.
      
      This is also a preparation step for "mm/page_alloc: restrict max order
      of merging on isolated pageblock", which fixes the freepage counting
      problem on freepages of more than pageblock order.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Heesub Shin <heesub.shin@samsung.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Ritesh Harjani <ritesh.list@gmail.com>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f82b55d
    • J
      mm/page_alloc: add freepage on isolate pageblock to correct buddy list · 51bb1a40
      Joonsoo Kim committed
      In free_pcppages_bulk(), we use the cached migratetype of a freepage to
      determine the type of buddy list the freepage will be added to.  This
      information is stored when the freepage is added to the pcp list, so if
      isolation of this freepage's pageblock begins after it is stored, the
      cached information can be stale.  In other words, it holds the original
      migratetype rather than MIGRATE_ISOLATE.
      
      This stale information causes two problems.
      
      One is that we can't keep these freepages from being allocated.
      Although the pageblock is isolated, the freepage will be added to the
      normal buddy list, so it can be allocated without any restriction.  The
      other problem is incorrect freepage accounting.  Freepages on an
      isolated pageblock should not be counted in the number of freepages.
      
      The following is the code snippet in free_pcppages_bulk().
      
          /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
          __free_one_page(page, page_to_pfn(page), zone, 0, mt);
          trace_mm_page_pcpu_drain(page, 0, mt);
          if (likely(!is_migrate_isolate_page(page))) {
              __mod_zone_page_state(zone, NR_FREE_PAGES, 1);
              if (is_migrate_cma(mt))
                  __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
          }
      
      As the snippet above shows, the current code already handles the second
      problem, incorrect freepage accounting, by re-fetching the pageblock
      migratetype through is_migrate_isolate_page(page).
      
      But because this re-fetched information isn't used for
      __free_one_page(), the first problem is not solved.  This patch solves
      it by re-fetching the pageblock migratetype before __free_one_page()
      and using it for __free_one_page().
      
      In addition to moving the re-fetch up, this patch applies an
      optimization: it re-fetches the migratetype only if there is an
      isolated pageblock.  Pageblock isolation is a rare event, so this
      avoids re-fetching in the common case.
      
      This patch also corrects the migratetype in the tracepoint output.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Heesub Shin <heesub.shin@samsung.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Ritesh Harjani <ritesh.list@gmail.com>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      51bb1a40
    • J
      mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype · ad53f92e
      Joonsoo Kim committed
      Before describing the bugs themselves, I first explain the definition
      of freepage.
      
       1. pages on buddy list are counted as freepage.
       2. pages on isolate migratetype buddy list are *not* counted as freepage.
       3. pages on cma buddy list are counted as CMA freepage, too.
      
      Now, I describe the problems and the related patches.
      
      Patch 1: There are race conditions on getting the pageblock migratetype
      that result in misplacement of freepages on the buddy list, an
      incorrect freepage count and unavailability of freepages.
      
      Patch 2: Freepages on the pcp list can have stale cached information
      for determining the migratetype of the buddy list to go to.  This
      causes misplacement of freepages on the buddy list and an incorrect
      freepage count.
      
      Patch 4: Merging between freepages on pageblocks of different
      migratetypes causes a freepage accounting problem.  This patch fixes it.
      
      Without patchset [3], the above problems don't happen in my CMA
      allocation test, because CMA reserved pages aren't used at all.  So
      there is no chance for the above race.
      
      With patchset [3], I did a simple CMA allocation test and got the
      result below:
      
       - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
       - run kernel build (make -j16) in the background
       - 30 CMA allocation attempts (8MB * 30 = 240MB) at 5 sec intervals
       - Result: more than 5000 freepages are missing from the count
      
      With patchset [3] and this patchset, I found that no freepages are
      missing from the count, so I conclude that the problems are solved.
      
      These problems also occur in my simple memory offlining test.
      
      This patch (of 4):
      
      There are two paths to reach the core free function of the buddy
      allocator, __free_one_page(): one is free_one_page()->__free_one_page()
      and the other is
      free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().  Each
      path has a race condition causing serious problems.  This patch focuses
      on the first type of freepath; the following patch will solve the
      problem in the second type of freepath.
      
      In the first type of freepath, we get the migratetype of the page being
      freed without holding the zone lock, so it can be racy.  There are two
      cases of this race.
      
       1. pages are added to isolate buddy list after restoring original
          migratetype
      
          CPU1                                   CPU2
      
          get migratetype => return MIGRATE_ISOLATE
          call free_one_page() with MIGRATE_ISOLATE
      
                                      grab the zone lock
                                      unisolate pageblock
                                      release the zone lock
      
          grab the zone lock
          call __free_one_page() with MIGRATE_ISOLATE
          freepage go into isolate buddy list,
          although pageblock is already unisolated
      
      This may cause two problems.  One is that we can't use this page until
      the next isolation attempt on this pageblock, because the freepage is
      on the isolate buddy list.  The other is that freepage accounting can
      be wrong due to merging between different buddy lists.  Freepages on
      the isolate buddy list aren't counted as freepages, but ones on the
      normal buddy list are.  If a merge happens, a buddy freepage on the
      normal buddy list is inevitably moved to the isolate buddy list without
      any consideration of freepage accounting, so the count can be
      incorrect.
      
       2. pages are added to normal buddy list while pageblock is isolated.
          It is similar to the above case.
      
      This also may cause two problems.  One is that we can't keep these
      freepages from being allocated.  Although the pageblock is isolated,
      the freepage is added to the normal buddy list, so it can be allocated
      without any restriction.  The other problem is the same as in case 1,
      that is, incorrect freepage accounting.
      
      This race condition can be prevented by checking the migratetype again
      while holding the zone lock.  Because this is a somewhat heavy
      operation and isn't needed in the common case, we want to avoid
      rechecking as much as possible.  So this patch introduces a new
      variable, nr_isolate_pageblock, in struct zone to check whether there
      is an isolated pageblock.  With this, we can avoid re-checking the
      migratetype in the common case and do it only if there is an isolated
      pageblock or the migratetype is MIGRATE_ISOLATE.  This solves the
      problems mentioned above.
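
      The recheck condition can be modelled in a few lines of userspace C.
      This is a sketch only: the toy_* names, the two-value migratetype enum
      and the `locked` flag standing in for the zone lock are all
      hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_migratetype { TOY_MIGRATE_MOVABLE, TOY_MIGRATE_ISOLATE };

struct toy_zone {
    unsigned long nr_isolate_pageblock;  /* number of isolated pageblocks */
    bool locked;                         /* stands in for the zone lock */
};

/* True only when some pageblock in the zone is currently isolated. */
static bool toy_has_isolate_pageblock(const struct toy_zone *zone)
{
    return zone->nr_isolate_pageblock > 0;
}

/*
 * Pick the migratetype to free a page with. cached_mt was read without the
 * lock and may be stale; current_mt is what a fresh lookup under the lock
 * would return.
 */
enum toy_migratetype toy_free_migratetype(const struct toy_zone *zone,
                                          enum toy_migratetype cached_mt,
                                          enum toy_migratetype current_mt)
{
    assert(zone->locked);  /* the recheck only makes sense under the lock */

    if (toy_has_isolate_pageblock(zone) || cached_mt == TOY_MIGRATE_ISOLATE)
        return current_mt;  /* rare path: re-fetch the migratetype */
    return cached_mt;       /* common path: trust the cached value */
}
```

      Checking `cached_mt == TOY_MIGRATE_ISOLATE` in addition to the counter
      covers case 1: the pageblock may already be un-isolated, so the
      counter is zero, yet the value read earlier is stale.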
      
      Changes from v3:
      Add one more check in free_one_page() that checks whether the
      migratetype is MIGRATE_ISOLATE.  Without this, case 1 above could
      happen.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Heesub Shin <heesub.shin@samsung.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Ritesh Harjani <ritesh.list@gmail.com>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ad53f92e
    • J
      mm/compaction: skip the range until proper target pageblock is met · 58420016
      Joonsoo Kim committed
      Commit 7d49d886 ("mm, compaction: reduce zone checking frequency in
      the migration scanner") has a side-effect that changes the iteration
      range calculation.  Before the change, block_end_pfn is calculated using
      start_pfn, but now it blindly adds pageblock_nr_pages to the previous
      value.
      
      This causes the problem that isolation_start_pfn is larger than
      block_end_pfn when we isolate the page with more than pageblock order.
      In this case, isolation would fail due to an invalid range parameter.
      
      To prevent this, this patch implements skipping the range until a proper
      target pageblock is met.  Without this patch, CMA with more than
      pageblock order always fails but with this patch it will succeed.
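
      The range adjustment can be sketched as a small helper.  This is an
      illustration under assumptions, not the patch itself:
      toy_fix_block_end(), the other toy_* helpers and
      TOY_PAGEBLOCK_NR_PAGES are hypothetical names.

```c
#include <assert.h>

#define TOY_PAGEBLOCK_NR_PAGES 512UL  /* hypothetical pageblock size in pages */

static unsigned long toy_min(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Round x up to the next multiple of the pageblock size. */
static unsigned long toy_align_up(unsigned long x)
{
    return (x + TOY_PAGEBLOCK_NR_PAGES - 1) / TOY_PAGEBLOCK_NR_PAGES
           * TOY_PAGEBLOCK_NR_PAGES;
}

/*
 * pfn is the current scan position, block_end_pfn is the block end obtained
 * by blindly adding one pageblock to the previous value, and end_pfn is the
 * end of the whole range. Returns a block end that lies beyond pfn, clamped
 * to end_pfn.
 */
unsigned long toy_fix_block_end(unsigned long pfn, unsigned long block_end_pfn,
                                unsigned long end_pfn)
{
    assert(pfn < end_pfn);

    block_end_pfn = toy_min(block_end_pfn, end_pfn);
    if (pfn >= block_end_pfn) {
        /* pfn jumped past this block: skip ahead to the block holding pfn. */
        block_end_pfn = toy_min(toy_align_up(pfn + 1), end_pfn);
    }
    return block_end_pfn;
}
```

      For example, after isolating a 2048-page freepage that starts at pfn
      0, the scan position is 2048 while the blindly incremented block end
      is only 1024; the helper moves the block end to 2560 so the range is
      valid again.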
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      58420016
    • W
      zram: avoid kunmap_atomic() of a NULL pointer · c4065152
      Weijie Yang committed
      zram could kunmap_atomic() a NULL pointer in a rare situation: a zram
      page becomes a full-zeroed page after a partial write io.  The current
      code doesn't handle this case and performs kunmap_atomic() on a NULL
      pointer, which panics the kernel.
      
      This patch fixes this issue.
      Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Weijie Yang <weijie.yang.kh@gmail.com>
      Acked-by: Jerome Marchand <jmarchan@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c4065152
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 2c54396e
      Linus Torvalds committed
      Pull SELinux fixlet from James Morris:
       "WARN_ONCE() here will unnecessarily terrify users"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        selinux: convert WARN_ONCE() to printk() in selinux_nlmsg_perm()
      2c54396e
    • L
      Merge branch 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit · 91188375
      Linus Torvalds committed
      Pull audit fixes from Paul Moore:
       "After he sent the initial audit pull request for 3.18, Eric asked me
        to take over the management of the audit tree, hence this pull request
        to fix a couple of problems with audit.
      
        As you can see below, the changes are minimal: adding some whitespace
        to a string so userspace parses it correctly, and fixing a problem
        with audit's usage of fsnotify that was causing audit watch rules to
        be lost.  Neither of these patches were very controversial on the
        mailing lists and they fix real problems, getting them into 3.18 would
        be a good thing"
      
      * 'stable-3.18' of git://git.infradead.org/users/pcmoore/audit:
        audit: keep inode pinned
        audit: AUDIT_FEATURE_CHANGE message format missing delimiting space
      91188375
    • L
      Merge tag 'dm-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 5a7a662c
      Linus Torvalds committed
      Pull device mapper fixes from Mike Snitzer:
      
       - stable fix for dm-thin that avoids normal IO racing with discard
      
       - stable fix for a dm-cache related bug in dm-btree walking code that
         results from using very large fast device (eg 4T) with a very small
         cache blocksize (eg 32K) -- this is a very uncommon configuration
      
       - a couple fixes for dm-raid (one for stable and the other addresses a
         crash in 3.18-rc1 code)
      
       - stable fix for dm-thinp that addresses a very rare dm-bufio bug
         having to do with memory reclamation (via shrinker) when using
         dm-thinp on top of loopback devices
      
       - fix a leak in dm-stripe target constructor's error path
      
      * tag 'dm-3.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm btree: fix a recursion depth bug in btree walking code
        dm thin: grab a virtual cell before looking up the mapping
        dm raid: fix inaccessible superblocks causing oops in configure_discard_support
        dm raid: ensure superblock's size matches device's logical block size
        dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks
        dm stripe: fix potential for leak in stripe_ctr error path
      5a7a662c
  2. 13 Nov, 2014 11 commits
  3. 12 Nov, 2014 4 commits
  4. 11 Nov, 2014 4 commits
    • D
      param: fix crash on bad kernel arguments · 3438cf54
      Daniel Thompson committed
      Currently if the user passes an invalid value on the kernel command line
      then the kernel will crash during argument parsing. On most systems this
      is very hard to debug because the console hasn't been initialized yet.
      
      This is a regression due to commit 51e158c1 ("param: hand arguments
      after -- straight to init") which, in response to the systemd debug
      controversy, made it possible to explicitly pass arguments to init. To
      achieve this parse_args() was extended from simply returning an error
      code to returning a pointer. Regrettably the new init args logic does not
      perform a proper validity check on the pointer, resulting in a crash.
      
      This patch fixes the validity check. Should the check fail then no arguments
      will be passed to init. This is reasonable and matches how the kernel treats
      its own arguments (i.e. no error recovery).
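
      The kind of validity check involved can be sketched in userspace.  The
      helpers below imitate the kernel's error-pointer convention but are
      hypothetical re-implementations (toy_err_ptr(), toy_is_err(),
      toy_is_err_or_null(), toy_init_args()), not the kernel's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, error-pointer style. */
static inline void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True if ptr lies in the top TOY_MAX_ERRNO addresses, i.e. is an error. */
static inline bool toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

static inline bool toy_is_err_or_null(const void *ptr)
{
    return ptr == NULL || toy_is_err(ptr);
}

/*
 * after_dashes is the parser's result: NULL when no "--" was found, an
 * error pointer when an argument was invalid, otherwise the text after
 * "--". Returns the arguments to hand to init, or NULL for none.
 */
const char *toy_init_args(const char *after_dashes)
{
    if (toy_is_err_or_null(after_dashes))
        return NULL;  /* bad or absent: pass nothing to init */
    return after_dashes;
}
```

      A plain NULL test is not enough here, because an error pointer is
      non-NULL and would be dereferenced as if it were a string.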
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      3438cf54
    • R
      tracing: Do not risk busy looping in buffer splice · 07906da7
      Rabin Vincent committed
      If the read loop in tracing_buffers_splice_read() keeps failing due to
      memory allocation failures without reading even a single page then this
      function will keep busy looping.
      
      Remove the risk for that by exiting the function if memory allocation
      failures are seen.
      
      Link: http://lkml.kernel.org/r/1415309167-2373-2-git-send-email-rabin@rab.in
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      07906da7
    • R
      tracing: Do not busy wait in buffer splice · e30f53aa
      Rabin Vincent committed
      On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
      lockup:
      
       # trace-cmd record -e raw_syscalls:* -F false
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
       ...
       Call Trace:
        [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90
        [<ffffffff81092e25>] wait_on_pipe+0x35/0x40
        [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0
        [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0
        [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40
        [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290
        [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0
        [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
        [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60
        [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
        [<ffffffff8112415d>] do_splice_to+0x6d/0x90
        [<ffffffff81126971>] SyS_splice+0x7c1/0x800
        [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8
      
      The problem is this: tracing_buffers_splice_read() calls
      ring_buffer_wait() to wait for data in the ring buffers.  The buffers
      are not empty so ring_buffer_wait() returns immediately.  But
      tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
      meaning it only wants to read a full page.  When the full page is not
      available, tracing_buffers_splice_read() tries to wait again with
      ring_buffer_wait(), which again returns immediately, and so on.
      
      Fix this by adding a "full" argument to ring_buffer_wait() which will
      make ring_buffer_wait() wait until the writer has left the reader's
      page, i.e.  until full-page reads will succeed.
      
      Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in
      
      Cc: stable@vger.kernel.org # 3.16+
      Fixes: b1169cc6 ("tracing: Remove mock up poll wait function")
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e30f53aa
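
The loop described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `ToyRingBuffer`, `wait_ready` and `splice_read` are hypothetical names, the buffer is reduced to a single reader page, and "sleeping" is modelled as a return value rather than a real wait. It is not the kernel implementation.

```python
class ToyRingBuffer:
    """Hypothetical one-page model of a ring buffer (not the kernel's)."""

    def __init__(self, page_size):
        self.page_size = page_size
        self.used = 0  # bytes currently on the reader's page

    def write(self, nbytes):
        self.used = min(self.page_size, self.used + nbytes)

    def page_is_full(self):
        return self.used == self.page_size

    def wait_ready(self, full):
        """True if a waiter would return immediately instead of sleeping."""
        if full:
            return self.page_is_full()   # new behaviour: wait for a full page
        return self.used > 0             # old behaviour: any data is enough

    def read_page(self, full):
        """Return the page contents, or None if a full page was required."""
        if full and not self.page_is_full():
            return None
        data, self.used = self.used, 0
        return data


def splice_read(rb, wait_full, max_attempts=5):
    """Try full-page reads; report how the loop ends and how many reads failed."""
    failed_reads = 0
    while failed_reads < max_attempts:
        page = rb.read_page(full=True)
        if page is not None:
            return ("read", failed_reads)
        failed_reads += 1
        if not rb.wait_ready(full=wait_full):
            return ("sleeping", failed_reads)  # the waiter would block here
    return ("busy-loop", failed_reads)         # never slept, never read


rb_old = ToyRingBuffer(4096)
rb_old.write(100)                 # non-empty, but not a full page
print(splice_read(rb_old, wait_full=False))

rb_new = ToyRingBuffer(4096)
rb_new.write(100)
print(splice_read(rb_new, wait_full=True))
```

With a partially filled page, the waiter that only checks for "not empty" never sleeps and keeps retrying, while the waiter that checks for a full page blocks after the first failed read.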
    • J
      dm btree: fix a recursion depth bug in btree walking code · 9b460d36
      Joe Thornber committed
      The walk code was using a 'ro_spine' to hold its locked btree nodes.
      But this data structure is designed for the rolling lock scheme, and
      as such automatically unlocks blocks that are two steps up the call
      chain.  This is not suitable for the simple recursive walk algorithm,
      which retraces its steps.
      
      This code is only used by the persistent array code, which in turn is
      only used by dm-cache.  In order to trigger it you need to have a
      mapping tree that is more than 2 levels deep; which equates to 8-16
      million cache blocks.  For instance a 4T ssd with a very small block
      size of 32k only just triggers this bug.
      
      The fix just places the locked blocks on the stack, and stops using
      the ro_spine altogether.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      9b460d36
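
To see why a rolling lock scheme does not suit a recursive walk that retraces its steps, here is a toy model. It assumes a spine that holds at most two locked nodes and offers `step` and `pop` operations; the class and function names are hypothetical and the tree is a plain dictionary, so this is an illustration rather than the dm-btree code.

```python
class RollingSpine:
    """Toy rolling-lock spine: holds at most two locked nodes (assumption)."""

    def __init__(self):
        self.held = []       # nodes in lock order, oldest first
        self.locked = set()  # nodes currently locked

    def step(self, node):
        if len(self.held) == 2:
            # Stepping to a third node unlocks the one two steps up.
            self.locked.discard(self.held.pop(0))
        self.held.append(node)
        self.locked.add(node)

    def pop(self):
        if self.held:
            self.locked.discard(self.held.pop())


def walk_with_spine(tree, node, spine, stale):
    """Recursive walk; records every parent found unlocked on the way back."""
    spine.step(node)
    for child in tree.get(node, []):
        walk_with_spine(tree, child, spine, stale)
        if node not in spine.locked:
            stale.append(node)  # we retraced to a node that was unlocked
    spine.pop()
    return stale


def walk_with_stack(tree, node, locked, stale):
    """Same walk, but each call keeps its own node locked until it returns."""
    locked.add(node)
    for child in tree.get(node, []):
        walk_with_stack(tree, child, locked, stale)
        if node not in locked:
            stale.append(node)
    locked.discard(node)
    return stale


two_levels = {"root": ["a", "b"]}
three_levels = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}

print(walk_with_spine(two_levels, "root", RollingSpine(), []))
print(walk_with_spine(three_levels, "root", RollingSpine(), []))
print(walk_with_stack(three_levels, "root", set(), []))
```

In this model the two-level tree walks cleanly, while the three-level tree returns to a root that the spine has already unlocked. Keeping each node locked in its own stack frame avoids that.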
  5. 10 Nov, 2014 6 commits
    • T
      mfd: twl4030-power: Fix poweroff with PM configuration enabled · 481c7f86
      Tony Lindgren committed
      Commit e7cd1d1e ("mfd: twl4030-power: Add generic reset
      configuration") enabled configuring the PM features for twl4030.
      
      This caused the poweroff command to fail on devices that have the
      BCI charger on twl4030 wired, or have power wired for VBUS.
      Instead of powering off, the device reboots. This is because
      voltage is detected on charger or VBUS with the default bits
      enabled for the power transition registers.
      
      To fix the issue, let's just clear the VBUS and CHG bits, as we want
      the poweroff command to keep the system powered off.
      
      Fixes: e7cd1d1e ("mfd: twl4030-power: Add generic reset configuration")
      Cc: stable@vger.kernel.org # v3.16+
      Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      481c7f86
    • K
      mfd: max77693: Fix always masked MUIC interrupts · c0acb814
      Krzysztof Kozlowski committed
      All interrupts coming from MUIC were ignored because interrupt source
      register was masked.
      
      The Maxim 77693 has an "interrupt source": a separate register and interrupts
      which give information about the PMIC block triggering the individual
      interrupt (charger, topsys, MUIC, flash LED).
      
      By default the bootloader could initialize this register to a "mask all"
      value. In such a case (observed on the Trats2 board) MUIC interrupts won't be
      generated regardless of their mask status. The regmap irq chip was unmasking
      individual MUIC interrupts but the source was masked.
      
      Before introducing regmap irq chip this interrupt source was unmasked,
      read and acked. Reading and acking is not necessary but unmasking is.
      
      Fixes: 342d669c ("mfd: max77693: Handle IRQs using regmap")
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      c0acb814
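
The two-level masking described in this commit can be modelled in a few lines. This is a sketch only: the bit values `SRC_MUIC` and `MUIC_IRQ_ADC` are hypothetical placeholders, not the real register layout, and it assumes that a set mask bit means "masked".

```python
SRC_MUIC = 0x08      # hypothetical bit for the MUIC block in the source mask
MUIC_IRQ_ADC = 0x01  # hypothetical bit for one individual MUIC interrupt


def irq_delivered(source_mask, source_bit, irq_mask, irq_bit):
    """An interrupt is delivered only if both mask levels leave it unmasked."""
    source_unmasked = not (source_mask & source_bit)
    irq_unmasked = not (irq_mask & irq_bit)
    return source_unmasked and irq_unmasked


# Bootloader left "mask all" in the source register; the individual
# interrupt is unmasked, yet nothing is delivered.
print(irq_delivered(0xFF, SRC_MUIC, 0x00, MUIC_IRQ_ADC))

# After also unmasking the MUIC source bit the interrupt gets through.
print(irq_delivered(0xFF & ~SRC_MUIC, SRC_MUIC, 0x00, MUIC_IRQ_ADC))
```

The point of the model is that unmasking the individual interrupt is not enough on its own; the source level has to be unmasked as well.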
    • K
      mfd: max77693: Use proper regmap for handling MUIC interrupts · 43fc9396
      Krzysztof Kozlowski committed
      Interrupts coming from Maxim77693 MUIC block (MicroUSB Interface
      Controller) were not handled at all because wrong regmap was used for
      MUIC's regmap_irq_chip.
      
      The MUIC component of Maxim 77693 uses a different I2C address, so a second
      regmap is created and used by the max77693 extcon driver. The registers for
      MUIC interrupts are also in that block and should be handled by that
      second regmap.
      
      However the regmap irq chip for MUIC was configured with default regmap
      which could not read MUIC registers.
      
      Fixes: 342d669c ("mfd: max77693: Handle IRQs using regmap")
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      43fc9396
    • J
      mfd: viperboard: Fix platform-device id collision · b6684228
      Johan Hovold committed
      Allow more than one viperboard to be connected by registering with
      PLATFORM_DEVID_AUTO instead of PLATFORM_DEVID_NONE.
      
      The subdevices are currently registered with PLATFORM_DEVID_NONE, which
      will cause a name collision on the platform bus when a second viperboard
      is plugged in:
      
      viperboard 1-2.4:1.0: version 0.00 found at bus 001 address 004
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 181 at /home/johan/work/omicron/src/linux/fs/sysfs/dir.c:31 sysfs_warn_dup+0x74/0x84()
      sysfs: cannot create duplicate filename '/bus/platform/devices/viperboard-gpio'
      Modules linked in: i2c_viperboard viperboard netconsole [last unloaded: viperboard]
      CPU: 0 PID: 181 Comm: bash Tainted: G        W      3.17.0-rc6 #1
      [<c0016bf4>] (unwind_backtrace) from [<c0013860>] (show_stack+0x20/0x24)
      [<c0013860>] (show_stack) from [<c04305f8>] (dump_stack+0x24/0x28)
      [<c04305f8>] (dump_stack) from [<c0040fb4>] (warn_slowpath_common+0x80/0x98)
      [<c0040fb4>] (warn_slowpath_common) from [<c004100c>] (warn_slowpath_fmt+0x40/0x48)
      [<c004100c>] (warn_slowpath_fmt) from [<c016f1bc>] (sysfs_warn_dup+0x74/0x84)
      [<c016f1bc>] (sysfs_warn_dup) from [<c016f548>] (sysfs_do_create_link_sd.isra.2+0xcc/0xd0)
      [<c016f548>] (sysfs_do_create_link_sd.isra.2) from [<c016f588>] (sysfs_create_link+0x3c/0x48)
      [<c016f588>] (sysfs_create_link) from [<c02867ec>] (bus_add_device+0x12c/0x1e0)
      [<c02867ec>] (bus_add_device) from [<c0284820>] (device_add+0x410/0x584)
      [<c0284820>] (device_add) from [<c0289440>] (platform_device_add+0xd8/0x26c)
      [<c0289440>] (platform_device_add) from [<c02a5ae4>] (mfd_add_device+0x240/0x344)
      [<c02a5ae4>] (mfd_add_device) from [<c02a5ce0>] (mfd_add_devices+0xb8/0x110)
      [<c02a5ce0>] (mfd_add_devices) from [<bf00d1c8>] (vprbrd_probe+0x160/0x1b0 [viperboard])
      [<bf00d1c8>] (vprbrd_probe [viperboard]) from [<c030c000>] (usb_probe_interface+0x1bc/0x2a8)
      [<c030c000>] (usb_probe_interface) from [<c028768c>] (driver_probe_device+0x14c/0x3ac)
      [<c028768c>] (driver_probe_device) from [<c02879e4>] (__driver_attach+0xa4/0xa8)
      [<c02879e4>] (__driver_attach) from [<c0285698>] (bus_for_each_dev+0x70/0xa4)
      [<c0285698>] (bus_for_each_dev) from [<c0287030>] (driver_attach+0x2c/0x30)
      [<c0287030>] (driver_attach) from [<c030a288>] (usb_store_new_id+0x170/0x1ac)
      [<c030a288>] (usb_store_new_id) from [<c030a2f8>] (new_id_store+0x34/0x3c)
      [<c030a2f8>] (new_id_store) from [<c02853ec>] (drv_attr_store+0x30/0x3c)
      [<c02853ec>] (drv_attr_store) from [<c016eaa8>] (sysfs_kf_write+0x5c/0x60)
      [<c016eaa8>] (sysfs_kf_write) from [<c016dc68>] (kernfs_fop_write+0xd4/0x194)
      [<c016dc68>] (kernfs_fop_write) from [<c010fe40>] (vfs_write+0xb4/0x1c0)
      [<c010fe40>] (vfs_write) from [<c01104a8>] (SyS_write+0x4c/0xa0)
      [<c01104a8>] (SyS_write) from [<c000f900>] (ret_fast_syscall+0x0/0x48)
      ---[ end trace 98e8603c22d65817 ]---
      viperboard 1-2.4:1.0: Failed to add mfd devices to core.
      viperboard: probe of 1-2.4:1.0 failed with error -17
      Signed-off-by: Johan Hovold <johan@kernel.org>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      b6684228
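
The name collision above can be reproduced with a toy model of platform-bus naming. The sketch assumes the usual naming scheme (a bare name for PLATFORM_DEVID_NONE, "name.N.auto" for PLATFORM_DEVID_AUTO, "name.N" otherwise); `ToyPlatformBus` is a hypothetical stand-in and not kernel code.

```python
import errno

PLATFORM_DEVID_NONE = -1
PLATFORM_DEVID_AUTO = -2


class ToyPlatformBus:
    """Hypothetical model of device naming on the platform bus."""

    def __init__(self):
        self.names = set()
        self.next_auto_id = 0

    def add(self, name, dev_id):
        if dev_id == PLATFORM_DEVID_NONE:
            full_name = name                      # no instance suffix at all
        elif dev_id == PLATFORM_DEVID_AUTO:
            full_name = f"{name}.{self.next_auto_id}.auto"
            self.next_auto_id += 1                # each device gets a fresh id
        else:
            full_name = f"{name}.{dev_id}"
        if full_name in self.names:
            # Mirrors the "-17" (EEXIST) seen in the log above.
            raise FileExistsError(errno.EEXIST, f"duplicate name '{full_name}'")
        self.names.add(full_name)
        return full_name


bus = ToyPlatformBus()
print(bus.add("viperboard-gpio", PLATFORM_DEVID_AUTO))
print(bus.add("viperboard-gpio", PLATFORM_DEVID_AUTO))  # second board: no clash

bus = ToyPlatformBus()
bus.add("viperboard-gpio", PLATFORM_DEVID_NONE)
try:
    bus.add("viperboard-gpio", PLATFORM_DEVID_NONE)     # second board collides
except FileExistsError as exc:
    print("second board failed:", exc)
```

With automatic ids every subdevice instance gets a distinct name, so a second board can register its subdevices without colliding with the first.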
    • T
      mfd: rtsx: Fix build warnings for !PM · 451be648
      Thierry Reding committed
      rtsx_pci_power_off() is called only from rtsx_pci_suspend(), which isn't
      built when PM is disabled.
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      451be648
    • L
      mfd: stmpe: Fix STMPE24xx GPMR LSB · 871c3cf4
      Linus Walleij committed
      The least significant byte of the GPIO value read register
      on the STMPE24xx series is at address 0xA4, not 0xA5. Corrected
      against the datasheet and tested on the STMPE2401 hardware.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      871c3cf4