1. 04 5月, 2017 9 次提交
  2. 01 5月, 2017 2 次提交
    • H
      mm: remove AVR32 arch special handling in mm/Kconfig · 01e7bc24
      Hans-Christian Noren Egtvedt 提交于
      AVR32 architecture has been removed from the Linux kernel sources, hence
      clean up the special handling setting two quicklists by default in
      mm/Kconfig.
      Signed-off-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      01e7bc24
    • D
      mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash · 71389703
      Dan Williams 提交于
      The x86 conversion to the generic GUP code included a small change which causes
      crashes and data corruption in the pmem code - not good.
      
      The root cause is that the /dev/pmem driver code implicitly relies on the x86
      get_user_pages() implementation doing a get_page() on the page refcount, because
      get_page() does a get_zone_device_page() which properly refcounts pmem's separate
      page struct arrays that are not present in the regular page struct structures.
      (The pmem driver does this because it can cover huge memory areas.)
      
      But the x86 conversion to the generic GUP code changed the get_page() to
      page_cache_get_speculative() which is faster but doesn't do the
      get_zone_device_page() call the pmem code relies on.
      
      One way to solve the regression would be to change the generic GUP code to use
      get_page(), but that would slow things down a bit and punish other generic-GUP
      using architectures for an x86-ism they did not care about. (Arguably the pmem
      driver was probably not working reliably for them: but nvdimm is an Intel
      feature, so non-x86 exposure is probably still limited.)
      
      So restructure the pmem code's interface with the MM instead: get rid of the
      get/put_zone_device_page() distinction, integrate put_zone_device_page() into
      __put_page() and and restructure the pmem completion-wait and teardown machinery:
      
      Kirill points out that the calls to {get,put}_dev_pagemap() can be
      removed from the mm fast path if we take a single get_dev_pagemap()
      reference to signify that the page is alive and use the final put of the
      page to drop that reference.
      
      This does require some care to make sure that any waits for the
      percpu_ref to drop to zero occur *after* devm_memremap_page_release(),
      since it now maintains its own elevated reference.
      
      This speeds up things while also making the pmem refcounting more robust going
      forward.
      Suggested-by: NKirill Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NKirill Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/149339998297.24933.1129582806028305912.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      71389703
  3. 23 4月, 2017 1 次提交
    • I
      Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" · 6dd29b3d
      Ingo Molnar 提交于
      This reverts commit 2947ba05.
      
      Dan Williams reported dax-pmem kernel warnings with the following signature:
      
         WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
         percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic
      
      ... and bisected it to this commit, which suggests possible memory corruption
      caused by the x86 fast-GUP conversion.
      
      He also pointed out:
      
       "
        This is similar to the backtrace when we were not properly handling
        pud faults and was fixed with this commit: 220ced16 "mm: fix
        get_user_pages() vs device-dax pud mappings"
      
        I've found some missing _devmap checks in the generic
        get_user_pages_fast() path, but this does not fix the regression
        [...]
       "
      
      So given that there are known bugs, and a pretty robust looking bisection
      points to this commit suggesting that are unknown bugs in the conversion
      as well, revert it for the time being - we'll re-try in v4.13.
      Reported-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dann.frazier@canonical.com
      Cc: dave.hansen@intel.com
      Cc: steve.capper@linaro.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6dd29b3d
  4. 22 4月, 2017 2 次提交
  5. 21 4月, 2017 7 次提交
  6. 20 4月, 2017 1 次提交
    • M
      mm: make mm_percpu_wq non freezable · 80d136e1
      Michal Hocko 提交于
      Geert has reported a freeze during PM resume and some additional
      debugging has shown that the device_resume worker cannot make a forward
      progress because it waits for an event which is stuck waiting in
      drain_all_pages:
      
        INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
              Not tainted 4.11.0-rc7-koelsch-00029-g005882e5-dirty #3476
        "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        kworker/u4:0    D    0     5      2 0x00000000
        Workqueue: events_unbound async_run_entry_fn
          __schedule
          schedule
          schedule_timeout
          wait_for_common
          dpm_wait_for_superior
          device_resume
          async_resume
          async_run_entry_fn
          process_one_work
          worker_thread
          kthread
        [...]
        bash            D    0  1703   1694 0x00000000
          __schedule
          schedule
          schedule_timeout
          wait_for_common
          flush_work
          drain_all_pages
          start_isolate_page_range
          alloc_contig_range
          cma_alloc
          __alloc_from_contiguous
          cma_allocator_alloc
          __dma_alloc
          arm_dma_alloc
          sh_eth_ring_init
          sh_eth_open
          sh_eth_resume
          dpm_run_callback
          device_resume
          dpm_resume
          dpm_resume_end
          suspend_devices_and_enter
          pm_suspend
          state_store
          kernfs_fop_write
          __vfs_write
          vfs_write
          SyS_write
        [...]
        Showing busy workqueues and worker pools:
        [...]
        workqueue mm_percpu_wq: flags=0xc
          pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
            delayed: drain_local_pages_wq, vmstat_update
          pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
            delayed: drain_local_pages_wq BAR(1703), vmstat_update
      
      Tetsuo has properly noted that mm_percpu_wq is created as WQ_FREEZABLE
      so it is frozen this early during resume so we are effectively
      deadlocked.  Fix this by dropping WQ_FREEZABLE when creating
      mm_percpu_wq.  We really want to have it operational all the time.
      
      Fixes: ce612879 ("mm: move pcp and lru-pcp draining into single wq")
      Reported-and-tested-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Debugged-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      80d136e1
  7. 14 4月, 2017 5 次提交
  8. 09 4月, 2017 1 次提交
  9. 08 4月, 2017 5 次提交
  10. 06 4月, 2017 1 次提交
  11. 05 4月, 2017 1 次提交
  12. 03 4月, 2017 1 次提交
    • M
      kernel-api.rst: fix a series of errors when parsing C files · 0e056eb5
      mchehab@s-opensource.com 提交于
      ./lib/string.c:134: WARNING: Inline emphasis start-string without end-string.
      ./mm/filemap.c:522: WARNING: Inline interpreted text or phrase reference start-string without end-string.
      ./mm/filemap.c:1283: ERROR: Unexpected indentation.
      ./mm/filemap.c:3003: WARNING: Inline interpreted text or phrase reference start-string without end-string.
      ./mm/vmalloc.c:1544: WARNING: Inline emphasis start-string without end-string.
      ./mm/page_alloc.c:4245: ERROR: Unexpected indentation.
      ./ipc/util.c:676: ERROR: Unexpected indentation.
      ./drivers/pci/irq.c:35: WARNING: Block quote ends without a blank line; unexpected unindent.
      ./security/security.c:109: ERROR: Unexpected indentation.
      ./security/security.c:110: WARNING: Definition list ends without a blank line; unexpected unindent.
      ./block/genhd.c:275: WARNING: Inline strong start-string without end-string.
      ./block/genhd.c:283: WARNING: Inline strong start-string without end-string.
      ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string.
      ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string.
      ./ipc/util.c:477: ERROR: Unknown target name: "s".
      Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NJonathan Corbet <corbet@lwn.net>
      0e056eb5
  13. 01 4月, 2017 4 次提交
    • M
      mm/hugetlb.c: don't call region_abort if region_chg fails · ff8c0c53
      Mike Kravetz 提交于
      Changes to hugetlbfs reservation maps is a two step process.  The first
      step is a call to region_chg to determine what needs to be changed, and
      prepare that change.  This should be followed by a call to call to
      region_add to commit the change, or region_abort to abort the change.
      
      The error path in hugetlb_reserve_pages called region_abort after a
      failed call to region_chg.  As a result, the adds_in_progress counter in
      the reservation map is off by 1.  This is caught by a VM_BUG_ON in
      resv_map_release when the reservation map is freed.
      
      syzkaller fuzzer (when using an injected kmalloc failure) found this
      bug, that resulted in the following:
      
       kernel BUG at mm/hugetlb.c:742!
       Call Trace:
        hugetlbfs_evict_inode+0x7b/0xa0 fs/hugetlbfs/inode.c:493
        evict+0x481/0x920 fs/inode.c:553
        iput_final fs/inode.c:1515 [inline]
        iput+0x62b/0xa20 fs/inode.c:1542
        hugetlb_file_setup+0x593/0x9f0 fs/hugetlbfs/inode.c:1306
        newseg+0x422/0xd30 ipc/shm.c:575
        ipcget_new ipc/util.c:285 [inline]
        ipcget+0x21e/0x580 ipc/util.c:639
        SYSC_shmget ipc/shm.c:673 [inline]
        SyS_shmget+0x158/0x230 ipc/shm.c:657
        entry_SYSCALL_64_fastpath+0x1f/0xc2
       RIP: resv_map_release+0x265/0x330 mm/hugetlb.c:742
      
      Link: http://lkml.kernel.org/r/1490821682-23228-1-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ff8c0c53
    • M
      kasan: report only the first error by default · b0845ce5
      Mark Rutland 提交于
      Disable kasan after the first report.  There are several reasons for
      this:
      
       - Single bug quite often has multiple invalid memory accesses causing
         storm in the dmesg.
      
       - Write OOB access might corrupt metadata so the next report will print
         bogus alloc/free stacktraces.
      
       - Reports after the first easily could be not bugs by itself but just
         side effects of the first one.
      
      Given that multiple reports usually only do harm, it makes sense to
      disable kasan after the first one.  If user wants to see all the
      reports, the boot-time parameter kasan_multi_shot must be used.
      
      [aryabinin@virtuozzo.com: wrote changelog and doc, added missing include]
      Link: http://lkml.kernel.org/r/20170323154416.30257-1-aryabinin@virtuozzo.comSigned-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b0845ce5
    • K
      mm: fix section name for .data..ro_after_init · 906f2a51
      Kees Cook 提交于
      A section name for .data..ro_after_init was added by both:
      
          commit d07a980c ("s390: add proper __ro_after_init support")
      
      and
      
          commit d7c19b06 ("mm: kmemleak: scan .data.ro_after_init")
      
      The latter adds incorrect wrapping around the existing s390 section, and
      came later.  I'd prefer the s390 naming, so this moves the s390-specific
      name up to the asm-generic/sections.h and renames the section as used by
      kmemleak (and in the future, kernel/extable.c).
      
      Link: http://lkml.kernel.org/r/20170327192213.GA129375@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>	[s390 parts]
      Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Eddie Kovsky <ewk@edkovsky.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      906f2a51
    • N
      mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd() · c9d398fa
      Naoya Horiguchi 提交于
      I found the race condition which triggers the following bug when
      move_pages() and soft offline are called on a single hugetlb page
      concurrently.
      
          Soft offlining page 0x119400 at 0x700000000000
          BUG: unable to handle kernel paging request at ffffea0011943820
          IP: follow_huge_pmd+0x143/0x190
          PGD 7ffd2067
          PUD 7ffd1067
          PMD 0
              [61163.582052] Oops: 0000 [#1] SMP
          Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check]
          CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P           OE   4.11.0-rc2-mm1+ #2
          Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
          RIP: 0010:follow_huge_pmd+0x143/0x190
          RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202
          RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000
          RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80
          RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000
          R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800
          R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000
          FS:  00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0
          Call Trace:
           follow_page_mask+0x270/0x550
           SYSC_move_pages+0x4ea/0x8f0
           SyS_move_pages+0xe/0x10
           do_syscall_64+0x67/0x180
           entry_SYSCALL64_slow_path+0x25/0x25
          RIP: 0033:0x7fc976e03949
          RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117
          RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949
          RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827
          RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004
          R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650
          R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000
          Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a
          RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0
          CR2: ffffea0011943820
          ---[ end trace e4f81353a2d23232 ]---
          Kernel panic - not syncing: Fatal exception
          Kernel Offset: disabled
      
      This bug is triggered when pmd_present() returns true for non-present
      hugetlb, so fixing the present check in follow_huge_pmd() prevents it.
      Using pmd_present() to determine present/non-present for hugetlb is not
      correct, because pmd_present() checks multiple bits (not only
      _PAGE_PRESENT) for historical reason and it can misjudge hugetlb state.
      
      Fixes: e66f17ff ("mm/hugetlb: take page table lock in follow_huge_pmd()")
      Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.comSigned-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: <stable@vger.kernel.org>        [4.0+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c9d398fa