1. 03 3月, 2016 1 次提交
  2. 17 2月, 2016 1 次提交
  3. 15 2月, 2016 1 次提交
    • A
      powerpc/mm: Fix Multi hit ERAT cause by recent THP update · c777e2a8
      Aneesh Kumar K.V 提交于
      With ppc64 we use the deposited pgtable_t to store the hash pte slot
      information. We should not withdraw the deposited pgtable_t without
      marking the pmd none. This ensure that low level hash fault handling
      will skip this huge pte and we will handle them at upper levels.
      
      Recent change to pmd splitting changed the above in order to handle the
      race between pmd split and exit_mmap. The race is explained below.
      
      Consider following race:
      
      		CPU0				CPU1
      shrink_page_list()
        add_to_swap()
          split_huge_page_to_list()
            __split_huge_pmd_locked()
              pmdp_huge_clear_flush_notify()
      	// pmd_none() == true
      					exit_mmap()
      					  unmap_vmas()
      					    zap_pmd_range()
      					      // no action on pmd since pmd_none() == true
      	pmd_populate()
      
      As result the THP will not be freed. The leak is detected by check_mm():
      
      	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
      
      The above required us to not mark pmd none during a pmd split.
      
      The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
      level fault handling code skip this pte. At higher level we do take ptl
      lock. That should serialze us against the pmd split. Once the lock is
      acquired we do check the pmd again using pmd_same. That should always
      return false for us and hence we should retry the access. We do the
      pmd_same check in all case after taking plt with
      THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
      huge_pmd_set_accessed)
      
      Also make sure we wait for irq disable section in other cpus to finish
      before flipping a huge pte entry with a regular pmd entry. Code paths
      like find_linux_pte_or_hugepte depend on irq disable to get
      a stable pte_t pointer. A parallel thp split need to make sure we
      don't convert a pmd pte to a regular pmd entry without waiting for the
      irq disable section to finish.
      
      Fixes: eef1b3ba ("thp: implement split_huge_pmd()")
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c777e2a8
  4. 06 2月, 2016 3 次提交
  5. 05 2月, 2016 5 次提交
  6. 04 2月, 2016 6 次提交
    • H
      [media] vb2: fix nasty vb2_thread regression · fac710e4
      Hans Verkuil 提交于
      The vb2_thread implementation was made generic and was moved from
      videobuf2-v4l2.c to videobuf2-core.c in commit af3bac1a. Unfortunately
      that clearly was never tested since it broke read() causing NULL address
      references.
      
      The root cause was confused handling of vb2_buffer vs v4l2_buffer (the pb
      pointer in various core functions).
      
      The v4l2_buffer no longer exists after moving the code into the core and
      it is no longer needed. However, the vb2_thread code passed a pointer to
      a vb2_buffer to the core functions were a v4l2_buffer pointer was expected
      and vb2_thread expected that the vb2_buffer fields would be filled in
      correctly.
      
      This is obviously wrong since v4l2_buffer != vb2_buffer. Note that the
      pb pointer is a void pointer, so no type-checking took place.
      
      This patch fixes this problem:
      
      1) allow pb to be NULL for vb2_core_(d)qbuf. The vb2_thread code will use
         a NULL pointer here since they don't care about v4l2_buffer anyway.
      2) let vb2_core_dqbuf pass back the index of the received buffer. This is
         all vb2_thread needs: this index is the index into the q->bufs array
         and vb2_thread just gets the vb2_buffer from there.
      3) the fileio->b pointer (that originally contained a v4l2_buffer) is
         removed altogether since it is no longer needed.
      
      Tested with vivid and the cobalt driver.
      
      Cc: stable@vger.kernel.org # Kernel >= 4.3
      Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
      Reported-by: NMatthias Schwarzott <zzam@gentoo.org>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
      fac710e4
    • M
      radix-tree: fix race in gang lookup · 46437f9a
      Matthew Wilcox 提交于
      If the indirect_ptr bit is set on a slot, that indicates we need to redo
      the lookup.  Introduce a new function radix_tree_iter_retry() which
      forces the loop to retry the lookup by setting 'slot' to NULL and
      turning the iterator back to point at the problematic entry.
      
      This is a pretty rare problem to hit at the moment; the lookup has to
      race with a grow of the radix tree from a height of 0.  The consequences
      of hitting this race are that gang lookup could return a pointer to a
      radix_tree_node instead of a pointer to whatever the user had inserted
      in the tree.
      
      Fixes: cebbd29e ("radix-tree: rewrite gang lookup using iterator")
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ohad Ben-Cohen <ohad@wizery.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      46437f9a
    • K
      mm: polish virtual memory accounting · 30bdbb78
      Konstantin Khlebnikov 提交于
      * add VM_STACK as alias for VM_GROWSUP/DOWN depending on architecture
      * always account VMAs with flag VM_STACK as stack (as it was before)
      * cleanup classifying helpers
      * update comments and documentation
      Signed-off-by: NKonstantin Khlebnikov <koct9i@gmail.com>
      Tested-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30bdbb78
    • J
      mm: memcontrol: drop superfluous entry in the per-memcg stats array · c792e824
      Johannes Weiner 提交于
      MEM_CGROUP_STAT_NSTATS is just a delimiter for cgroup1 statistics, not
      an actual array entry.  Reuse it for the first cgroup2 stat entry, like
      in the event array.
      
      Fixes: b2807f07 ("mm: memcontrol: add "sock" to cgroup2 memory.stat")
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c792e824
    • J
      proc: revert /proc/<pid>/maps [stack:TID] annotation · 65376df5
      Johannes Weiner 提交于
      Commit b7643757 ("procfs: mark thread stack correctly in
      proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps.
      
      Finding the task of a stack VMA requires walking the entire thread list,
      turning this into quadratic behavior: a thousand threads means a
      thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a
      million combinations.
      
      The cost is not in proportion to the usefulness as described in the
      patch.
      
      Drop the [stack:TID] annotation to make /proc/<pid>/maps (and
      /proc/<pid>/numa_maps) usable again for higher thread counts.
      
      The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as
      identifying the stack VMA there is an O(1) operation.
      
      Siddesh said:
       "The end users needed a way to identify thread stacks programmatically and
        there wasn't a way to do that.  I'm afraid I no longer remember (or have
        access to the resources that would aid my memory since I changed
        employers) the details of their requirement.  However, I did do this on my
        own time because I thought it was an interesting project for me and nobody
        really gave any feedback then as to its utility, so as far as I am
        concerned you could roll back the main thread maps information since the
        information is available in the thread-specific files"
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
      Cc: Shaohua Li <shli@fb.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      65376df5
    • K
      thp: make split_queue per-node · a3d0a918
      Kirill A. Shutemov 提交于
      Andrea Arcangeli suggested to make split queue per-node to improve
      scalability.  Let's do it.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Suggested-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a3d0a918
  7. 03 2月, 2016 2 次提交
    • T
      ALSA: rawmidi: Make snd_rawmidi_transmit() race-free · 06ab3003
      Takashi Iwai 提交于
      A kernel WARNING in snd_rawmidi_transmit_ack() is triggered by
      syzkaller fuzzer:
        WARNING: CPU: 1 PID: 20739 at sound/core/rawmidi.c:1136
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
       [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
       [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
       [<ffffffff84f80bd5>] snd_rawmidi_transmit_ack+0x275/0x400 sound/core/rawmidi.c:1136
       [<ffffffff84fdb3c1>] snd_virmidi_output_trigger+0x4b1/0x5a0 sound/core/seq/seq_virmidi.c:163
       [<     inline     >] snd_rawmidi_output_trigger sound/core/rawmidi.c:150
       [<ffffffff84f87ed9>] snd_rawmidi_kernel_write1+0x549/0x780 sound/core/rawmidi.c:1223
       [<ffffffff84f89fd3>] snd_rawmidi_write+0x543/0xb30 sound/core/rawmidi.c:1273
       [<ffffffff817b0323>] __vfs_write+0x113/0x480 fs/read_write.c:528
       [<ffffffff817b1db7>] vfs_write+0x167/0x4a0 fs/read_write.c:577
       [<     inline     >] SYSC_write fs/read_write.c:624
       [<ffffffff817b50a1>] SyS_write+0x111/0x220 fs/read_write.c:616
       [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
      
      Also a similar warning is found but in another path:
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff82be2c0d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
       [<ffffffff81355139>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
       [<ffffffff81355369>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
       [<ffffffff8527e69a>] rawmidi_transmit_ack+0x24a/0x3b0 sound/core/rawmidi.c:1133
       [<ffffffff8527e851>] snd_rawmidi_transmit_ack+0x51/0x80 sound/core/rawmidi.c:1163
       [<ffffffff852d9046>] snd_virmidi_output_trigger+0x2b6/0x570 sound/core/seq/seq_virmidi.c:185
       [<     inline     >] snd_rawmidi_output_trigger sound/core/rawmidi.c:150
       [<ffffffff85285a0b>] snd_rawmidi_kernel_write1+0x4bb/0x760 sound/core/rawmidi.c:1252
       [<ffffffff85287b73>] snd_rawmidi_write+0x543/0xb30 sound/core/rawmidi.c:1302
       [<ffffffff817ba5f3>] __vfs_write+0x113/0x480 fs/read_write.c:528
       [<ffffffff817bc087>] vfs_write+0x167/0x4a0 fs/read_write.c:577
       [<     inline     >] SYSC_write fs/read_write.c:624
       [<ffffffff817bf371>] SyS_write+0x111/0x220 fs/read_write.c:616
       [<ffffffff86660276>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
      
      In the former case, the reason is that virmidi has an open code
      calling snd_rawmidi_transmit_ack() with the value calculated outside
      the spinlock.   We may use snd_rawmidi_transmit() in a loop just for
      consuming the input data, but even there, there is a race between
      snd_rawmidi_transmit_peek() and snd_rawmidi_tranmit_ack().
      
      Similarly in the latter case, it calls snd_rawmidi_transmit_peek() and
      snd_rawmidi_tranmit_ack() separately without protection, so they are
      racy as well.
      
      The patch tries to address these issues by the following ways:
      - Introduce the unlocked versions of snd_rawmidi_transmit_peek() and
        snd_rawmidi_transmit_ack() to be called inside the explicit lock.
      - Rewrite snd_rawmidi_transmit() to be race-free (the former case).
      - Make the split calls (the latter case) protected in the rawmidi spin
        lock.
      
      BugLink: http://lkml.kernel.org/r/CACT4Y+YPq1+cYLkadwjWa5XjzF1_Vki1eHnVn-Lm0hzhSpu5PA@mail.gmail.com
      BugLink: http://lkml.kernel.org/r/CACT4Y+acG4iyphdOZx47Nyq_VHGbpJQK-6xNpiqUjaZYqsXOGw@mail.gmail.comReported-by: NDmitry Vyukov <dvyukov@google.com>
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      06ab3003
    • T
      ACPI / CPPC: remove redundant mbox_send_message() declaration · 2db8f9a1
      Timur Tabi 提交于
      Remove a redundant function declaration in cppc_acpi.h for
      mbox_send_message().  That function is defined in mailbox_client.h,
      which is already included.
      Signed-off-by: NTimur Tabi <timur@codeaurora.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      2db8f9a1
  8. 02 2月, 2016 1 次提交
  9. 01 2月, 2016 1 次提交
  10. 31 1月, 2016 3 次提交
    • D
      block: use DAX for partition table reads · d1a5f2b4
      Dan Williams 提交于
      Avoid populating pagecache when the block device is in DAX mode.
      Otherwise these page cache entries collide with the fsync/msync
      implementation and break data durability guarantees.
      
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d1a5f2b4
    • D
      block: revert runtime dax control of the raw block device · 9f4736fe
      Dan Williams 提交于
      Dynamically enabling DAX requires that the page cache first be flushed
      and invalidated.  This must occur atomically with the change of DAX mode
      otherwise we confuse the fsync/msync tracking and violate data
      durability guarantees.  Eliminate the possibilty of DAX-disabled to
      DAX-enabled transitions for now and revisit this for the next cycle.
      
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      9f4736fe
    • D
      fs, block: force direct-I/O for dax-enabled block devices · 65f87ee7
      Dan Williams 提交于
      Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
      to a block-device special file.  Both regular files and device special
      files can use the common filp->f_mapping->host lookup to determing is
      DAX is enabled.
      
      Otherwise, we confuse the DAX code that does not expect to find live
      data in the page cache:
      
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
          __delete_from_page_cache+0x9f6/0xb60()
          Modules linked in:
          CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
           00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
           ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
           ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
          Call Trace:
           [<     inline     >] __dump_stack lib/dump_stack.c:15
           [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
           [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
           [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
           [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
           [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
           [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
           [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
           [<     inline     >] wp_pfn_shared mm/memory.c:2208
           [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
           [<     inline     >] handle_pte_fault mm/memory.c:3323
           [<     inline     >] __handle_mm_fault mm/memory.c:3417
           [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
           [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
           [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
           [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
           [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
           [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
          arch/x86/entry/entry_64.S:185
          ---[ end trace dae21e0f85f1f98c ]---
      
      Fixes: 5a023cdb ("block: enable dax for raw block devices")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NKirill A. Shutemov <kirill@shutemov.name>
      Suggested-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Suggested-by: NMatthew Wilcox <willy@linux.intel.com>
      Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      65f87ee7
  11. 30 1月, 2016 2 次提交
  12. 29 1月, 2016 6 次提交
  13. 27 1月, 2016 4 次提交
  14. 26 1月, 2016 2 次提交
  15. 25 1月, 2016 2 次提交
    • M
      of: drop symbols declared by _OF_DECLARE() from modules · 71f50c6d
      Masahiro Yamada 提交于
      The users of this macro (OF_EARLYCON_DECLARE, CLK_OF_DECLARE,
      IRQCHIP_DECLARE, etc.) are only parsed in the early boot stage.
      Such symbols contained in modules are never used.
      
      This commit fixes the link error introduced by commit b8d20e06
      ("serial: 8250_uniphier: add earlycon support"); the combination
      of CONFIG_SERIAL_8250_UNIPHIER=m and CONFIG_SERIAL_8250_CONSOLE=y
      fails to link:
      
      ERROR: "early_serial8250_setup" [drivers/tty/serial/8250/8250_uniphier.ko] undefined!
      
      Fixes: b8d20e06 ("serial: 8250_uniphier: add earlycon support")
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NRob Herring <robh@kernel.org>
      71f50c6d
    • A
      net: simplify napi_synchronize() to avoid warnings · facc432f
      Arnd Bergmann 提交于
      The napi_synchronize() function is defined twice: The definition
      for SMP builds waits for other CPUs to be done, while the uniprocessor
      variant just contains a barrier and ignores its argument.
      
      In the mvneta driver, this leads to a warning about an unused variable
      when we lookup the NAPI struct of another CPU and then don't use it:
      
      ethernet/marvell/mvneta.c: In function 'mvneta_percpu_notifier':
      ethernet/marvell/mvneta.c:2910:30: error: unused variable 'other_port' [-Werror=unused-variable]
      
      There are no other CPUs on a UP build, so that code never runs, but
      gcc does not know this.
      
      The nicest solution seems to be to turn the napi_synchronize() helper
      into an inline function for the UP case as well, as that leads gcc to
      not complain about the argument being unused. Once we do that, we can
      also combine the two cases into a single function definition and use
      if(IS_ENABLED()) rather than #ifdef to make it look a bit nicer.
      
      The warning first came up in linux-4.4, but I failed to catch it
      earlier.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Fixes: f8642885 ("net: mvneta: Statically assign queues to CPUs")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      facc432f