1. 17 2月, 2016 2 次提交
    • H
      net/mlx4_core: Set UAR page size to 4KB regardless of system page size · 85743f1e
      Huy Nguyen 提交于
      problem description:
      
      The current code sets UAR page size equal to system page size.
      The ConnectX-3 and ConnectX-3 Pro HWs require minimum 128 UAR pages.
      The mlx4 kernel drivers are not loaded if there is less than 128 UAR pages.
      
      solution:
      
      Always set UAR page to 4KB. This allows more UAR pages if the OS
      has PAGE_SIZE larger than 4KB. For example, PowerPC kernel use 64KB
      system page size, with 4MB uar region, there are 4MB/2/64KB = 32
      uars (half for uar, half for blueflame). This does not meet minimum 128
      UAR pages requirement. With 4KB UAR page, there are 4MB/2/4KB = 512 uars
      which meet the minimum requirement.
      
      Note that only codes in mlx4_core that deal with firmware know that uar
      page size is 4KB. Codes that deal with usr page in cq and qp context
      (mlx4_ib, mlx4_en and part of mlx4_core) still have the same assumption
      that uar page size equals to system page size.
      
      Note that with this implementation, on 64KB system page size kernel, there
      are 16 uars per system page but only one uars is used. The other 15
      uars are ignored because of the above assumption.
      
      Regarding SR-IOV, mlx4_core in hypervisor will set the uar page size
      to 4KB and mlx4_core code in virtual OS will obtain the uar page size from
      firmware.
      
      Regarding backward compatibility in SR-IOV, if hypervisor has this new code,
      the virtual OS must be updated. If hypervisor has old code, and the virtual
      OS has this new code, the new code will be backward compatible with the
      old code. If the uar size is big enough, this new code in VF continues to
      work with 64 KB uar page size (on PowerPc kernel). If the uar size does not
      meet 128 uars requirement, this new code not loaded in VF and print the same
      error message as the old code in Hypervisor.
      Signed-off-by: NHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85743f1e
    • M
      net/mlx5: Use offset based reserved field names in the IFC header file · b4ff3a36
      Matan Barak 提交于
      mlx5_ifc.h is a header file representing the API and ABI between
      the driver to the firmware and hardware. This file is used from
      both the mlx5_ib and mlx5_core drivers.
      
      Previously, this file used incrementing counter to indicate
      reserved fields, for example:
      
      struct mlx5_ifc_odp_per_transport_service_cap_bits {
              u8         send[0x1];
              u8         receive[0x1];
              u8         write[0x1];
              u8         read[0x1];
              u8         reserved_0[0x1];
              u8         srq_receive[0x1];
              u8         reserved_1[0x1a];
      };
      
      If one developer implements through net-next feature A that uses
      reserved_0, they replace it with featureA and renames reserved_1 to
      reserved_0. In the same kernel cycle, a 2nd developer could implement
      feature B through the rdma tree, that uses reserved_1 and split it to
      featureB and a smaller reserved_1 field. This will cause a conflict
      when the two trees are merged.
      
      The source of this conflict is that the 1st developer changed *all*
      reserved fields.
      
      As Linus suggested, we change the layout of structs to:
      
      struct mlx5_ifc_odp_per_transport_service_cap_bits {
      	u8         send[0x1];
      	u8         receive[0x1];
      	u8         write[0x1];
      	u8         read[0x1];
      	u8         reserved_at_4[0x1];
      	u8         srq_receive[0x1];
      	u8         reserved_at_6[0x1a];
      };
      
      This makes the conflicts much more rare and preserves the locality of
      changes.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NAlaa Hleihel <alaa@mellanox.com>
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4ff3a36
  2. 10 2月, 2016 1 次提交
    • D
      vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices · 7e059158
      David Wragg 提交于
      Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
      transmit vxlan packets of any size, constrained only by the ability to
      send out the resulting packets.  4.3 introduced netdevs corresponding
      to tunnel vports.  These netdevs have an MTU, which limits the size of
      a packet that can be successfully encapsulated.  The default MTU
      values are low (1500 or less), which is awkwardly small in the context
      of physical networks supporting jumbo frames, and leads to a
      conspicuous change in behaviour for userspace.
      
      Instead, set the MTU on openvswitch-created netdevs to be the relevant
      maximum (i.e. the maximum IP packet size minus any relevant overhead),
      effectively restoring the behaviour prior to 4.3.
      Signed-off-by: NDavid Wragg <david@weave.works>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e059158
  3. 09 2月, 2016 2 次提交
  4. 08 2月, 2016 1 次提交
  5. 06 2月, 2016 3 次提交
  6. 05 2月, 2016 5 次提交
  7. 04 2月, 2016 6 次提交
    • H
      [media] vb2: fix nasty vb2_thread regression · fac710e4
      Hans Verkuil 提交于
      The vb2_thread implementation was made generic and was moved from
      videobuf2-v4l2.c to videobuf2-core.c in commit af3bac1a. Unfortunately
      that clearly was never tested since it broke read() causing NULL address
      references.
      
      The root cause was confused handling of vb2_buffer vs v4l2_buffer (the pb
      pointer in various core functions).
      
      The v4l2_buffer no longer exists after moving the code into the core and
      it is no longer needed. However, the vb2_thread code passed a pointer to
      a vb2_buffer to the core functions were a v4l2_buffer pointer was expected
      and vb2_thread expected that the vb2_buffer fields would be filled in
      correctly.
      
      This is obviously wrong since v4l2_buffer != vb2_buffer. Note that the
      pb pointer is a void pointer, so no type-checking took place.
      
      This patch fixes this problem:
      
      1) allow pb to be NULL for vb2_core_(d)qbuf. The vb2_thread code will use
         a NULL pointer here since they don't care about v4l2_buffer anyway.
      2) let vb2_core_dqbuf pass back the index of the received buffer. This is
         all vb2_thread needs: this index is the index into the q->bufs array
         and vb2_thread just gets the vb2_buffer from there.
      3) the fileio->b pointer (that originally contained a v4l2_buffer) is
         removed altogether since it is no longer needed.
      
      Tested with vivid and the cobalt driver.
      
      Cc: stable@vger.kernel.org # Kernel >= 4.3
      Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
      Reported-by: NMatthias Schwarzott <zzam@gentoo.org>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>
      fac710e4
    • M
      radix-tree: fix race in gang lookup · 46437f9a
      Matthew Wilcox 提交于
      If the indirect_ptr bit is set on a slot, that indicates we need to redo
      the lookup.  Introduce a new function radix_tree_iter_retry() which
      forces the loop to retry the lookup by setting 'slot' to NULL and
      turning the iterator back to point at the problematic entry.
      
      This is a pretty rare problem to hit at the moment; the lookup has to
      race with a grow of the radix tree from a height of 0.  The consequences
      of hitting this race are that gang lookup could return a pointer to a
      radix_tree_node instead of a pointer to whatever the user had inserted
      in the tree.
      
      Fixes: cebbd29e ("radix-tree: rewrite gang lookup using iterator")
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ohad Ben-Cohen <ohad@wizery.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      46437f9a
    • K
      mm: polish virtual memory accounting · 30bdbb78
      Konstantin Khlebnikov 提交于
      * add VM_STACK as alias for VM_GROWSUP/DOWN depending on architecture
      * always account VMAs with flag VM_STACK as stack (as it was before)
      * cleanup classifying helpers
      * update comments and documentation
      Signed-off-by: NKonstantin Khlebnikov <koct9i@gmail.com>
      Tested-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30bdbb78
    • J
      mm: memcontrol: drop superfluous entry in the per-memcg stats array · c792e824
      Johannes Weiner 提交于
      MEM_CGROUP_STAT_NSTATS is just a delimiter for cgroup1 statistics, not
      an actual array entry.  Reuse it for the first cgroup2 stat entry, like
      in the event array.
      
      Fixes: b2807f07 ("mm: memcontrol: add "sock" to cgroup2 memory.stat")
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c792e824
    • J
      proc: revert /proc/<pid>/maps [stack:TID] annotation · 65376df5
      Johannes Weiner 提交于
      Commit b7643757 ("procfs: mark thread stack correctly in
      proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps.
      
      Finding the task of a stack VMA requires walking the entire thread list,
      turning this into quadratic behavior: a thousand threads means a
      thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a
      million combinations.
      
      The cost is not in proportion to the usefulness as described in the
      patch.
      
      Drop the [stack:TID] annotation to make /proc/<pid>/maps (and
      /proc/<pid>/numa_maps) usable again for higher thread counts.
      
      The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as
      identifying the stack VMA there is an O(1) operation.
      
      Siddesh said:
       "The end users needed a way to identify thread stacks programmatically and
        there wasn't a way to do that.  I'm afraid I no longer remember (or have
        access to the resources that would aid my memory since I changed
        employers) the details of their requirement.  However, I did do this on my
        own time because I thought it was an interesting project for me and nobody
        really gave any feedback then as to its utility, so as far as I am
        concerned you could roll back the main thread maps information since the
        information is available in the thread-specific files"
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
      Cc: Shaohua Li <shli@fb.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      65376df5
    • K
      thp: make split_queue per-node · a3d0a918
      Kirill A. Shutemov 提交于
      Andrea Arcangeli suggested to make split queue per-node to improve
      scalability.  Let's do it.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Suggested-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a3d0a918
  8. 03 2月, 2016 3 次提交
    • T
      ALSA: rawmidi: Make snd_rawmidi_transmit() race-free · 06ab3003
      Takashi Iwai 提交于
      A kernel WARNING in snd_rawmidi_transmit_ack() is triggered by
      syzkaller fuzzer:
        WARNING: CPU: 1 PID: 20739 at sound/core/rawmidi.c:1136
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
       [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
       [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
       [<ffffffff84f80bd5>] snd_rawmidi_transmit_ack+0x275/0x400 sound/core/rawmidi.c:1136
       [<ffffffff84fdb3c1>] snd_virmidi_output_trigger+0x4b1/0x5a0 sound/core/seq/seq_virmidi.c:163
       [<     inline     >] snd_rawmidi_output_trigger sound/core/rawmidi.c:150
       [<ffffffff84f87ed9>] snd_rawmidi_kernel_write1+0x549/0x780 sound/core/rawmidi.c:1223
       [<ffffffff84f89fd3>] snd_rawmidi_write+0x543/0xb30 sound/core/rawmidi.c:1273
       [<ffffffff817b0323>] __vfs_write+0x113/0x480 fs/read_write.c:528
       [<ffffffff817b1db7>] vfs_write+0x167/0x4a0 fs/read_write.c:577
       [<     inline     >] SYSC_write fs/read_write.c:624
       [<ffffffff817b50a1>] SyS_write+0x111/0x220 fs/read_write.c:616
       [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
      
      Also a similar warning is found but in another path:
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff82be2c0d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
       [<ffffffff81355139>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
       [<ffffffff81355369>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
       [<ffffffff8527e69a>] rawmidi_transmit_ack+0x24a/0x3b0 sound/core/rawmidi.c:1133
       [<ffffffff8527e851>] snd_rawmidi_transmit_ack+0x51/0x80 sound/core/rawmidi.c:1163
       [<ffffffff852d9046>] snd_virmidi_output_trigger+0x2b6/0x570 sound/core/seq/seq_virmidi.c:185
       [<     inline     >] snd_rawmidi_output_trigger sound/core/rawmidi.c:150
       [<ffffffff85285a0b>] snd_rawmidi_kernel_write1+0x4bb/0x760 sound/core/rawmidi.c:1252
       [<ffffffff85287b73>] snd_rawmidi_write+0x543/0xb30 sound/core/rawmidi.c:1302
       [<ffffffff817ba5f3>] __vfs_write+0x113/0x480 fs/read_write.c:528
       [<ffffffff817bc087>] vfs_write+0x167/0x4a0 fs/read_write.c:577
       [<     inline     >] SYSC_write fs/read_write.c:624
       [<ffffffff817bf371>] SyS_write+0x111/0x220 fs/read_write.c:616
       [<ffffffff86660276>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
      
      In the former case, the reason is that virmidi has an open code
      calling snd_rawmidi_transmit_ack() with the value calculated outside
      the spinlock.   We may use snd_rawmidi_transmit() in a loop just for
      consuming the input data, but even there, there is a race between
      snd_rawmidi_transmit_peek() and snd_rawmidi_tranmit_ack().
      
      Similarly in the latter case, it calls snd_rawmidi_transmit_peek() and
      snd_rawmidi_tranmit_ack() separately without protection, so they are
      racy as well.
      
      The patch tries to address these issues by the following ways:
      - Introduce the unlocked versions of snd_rawmidi_transmit_peek() and
        snd_rawmidi_transmit_ack() to be called inside the explicit lock.
      - Rewrite snd_rawmidi_transmit() to be race-free (the former case).
      - Make the split calls (the latter case) protected in the rawmidi spin
        lock.
      
      BugLink: http://lkml.kernel.org/r/CACT4Y+YPq1+cYLkadwjWa5XjzF1_Vki1eHnVn-Lm0hzhSpu5PA@mail.gmail.com
      BugLink: http://lkml.kernel.org/r/CACT4Y+acG4iyphdOZx47Nyq_VHGbpJQK-6xNpiqUjaZYqsXOGw@mail.gmail.comReported-by: NDmitry Vyukov <dvyukov@google.com>
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      06ab3003
    • R
      modules: fix longstanding /proc/kallsyms vs module insertion race. · 8244062e
      Rusty Russell 提交于
      For CONFIG_KALLSYMS, we keep two symbol tables and two string tables.
      There's one full copy, marked SHF_ALLOC and laid out at the end of the
      module's init section.  There's also a cut-down version that only
      contains core symbols and strings, and lives in the module's core
      section.
      
      After module init (and before we free the module memory), we switch
      the mod->symtab, mod->num_symtab and mod->strtab to point to the core
      versions.  We do this under the module_mutex.
      
      However, kallsyms doesn't take the module_mutex: it uses
      preempt_disable() and rcu tricks to walk through the modules, because
      it's used in the oops path.  It's also used in /proc/kallsyms.
      There's nothing atomic about the change of these variables, so we can
      get the old (larger!) num_symtab and the new symtab pointer; in fact
      this is what I saw when trying to reproduce.
      
      By grouping these variables together, we can use a
      carefully-dereferenced pointer to ensure we always get one or the
      other (the free of the module init section is already done in an RCU
      callback, so that's safe).  We allocate the init one at the end of the
      module init section, and keep the core one inside the struct module
      itself (it could also have been allocated at the end of the module
      core, but that's probably overkill).
      Reported-by: NWeilong Chen <chenweilong@huawei.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111541
      Cc: stable@kernel.org
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      8244062e
    • T
      ACPI / CPPC: remove redundant mbox_send_message() declaration · 2db8f9a1
      Timur Tabi 提交于
      Remove a redundant function declaration in cppc_acpi.h for
      mbox_send_message().  That function is defined in mailbox_client.h,
      which is already included.
      Signed-off-by: NTimur Tabi <timur@codeaurora.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      2db8f9a1
  9. 02 2月, 2016 1 次提交
  10. 01 2月, 2016 1 次提交
  11. 31 1月, 2016 3 次提交
    • D
      block: use DAX for partition table reads · d1a5f2b4
      Dan Williams 提交于
      Avoid populating pagecache when the block device is in DAX mode.
      Otherwise these page cache entries collide with the fsync/msync
      implementation and break data durability guarantees.
      
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d1a5f2b4
    • D
      block: revert runtime dax control of the raw block device · 9f4736fe
      Dan Williams 提交于
      Dynamically enabling DAX requires that the page cache first be flushed
      and invalidated.  This must occur atomically with the change of DAX mode
      otherwise we confuse the fsync/msync tracking and violate data
      durability guarantees.  Eliminate the possibilty of DAX-disabled to
      DAX-enabled transitions for now and revisit this for the next cycle.
      
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      9f4736fe
    • D
      fs, block: force direct-I/O for dax-enabled block devices · 65f87ee7
      Dan Williams 提交于
      Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
      to a block-device special file.  Both regular files and device special
      files can use the common filp->f_mapping->host lookup to determing is
      DAX is enabled.
      
      Otherwise, we confuse the DAX code that does not expect to find live
      data in the page cache:
      
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
          __delete_from_page_cache+0x9f6/0xb60()
          Modules linked in:
          CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
           00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
           ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
           ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
          Call Trace:
           [<     inline     >] __dump_stack lib/dump_stack.c:15
           [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
           [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
           [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
           [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
           [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
           [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
           [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
           [<     inline     >] wp_pfn_shared mm/memory.c:2208
           [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
           [<     inline     >] handle_pte_fault mm/memory.c:3323
           [<     inline     >] __handle_mm_fault mm/memory.c:3417
           [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
           [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
           [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
           [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
           [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
           [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
          arch/x86/entry/entry_64.S:185
          ---[ end trace dae21e0f85f1f98c ]---
      
      Fixes: 5a023cdb ("block: enable dax for raw block devices")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NKirill A. Shutemov <kirill@shutemov.name>
      Suggested-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Suggested-by: NMatthew Wilcox <willy@linux.intel.com>
      Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      65f87ee7
  12. 30 1月, 2016 3 次提交
    • P
      ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() · 6f21c96a
      Paolo Abeni 提交于
      The current implementation of ip6_dst_lookup_tail basically
      ignore the egress ifindex match: if the saddr is set,
      ip6_route_output() purposefully ignores flowi6_oif, due
      to the commit d46a9d67 ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
      flag if saddr set"), if the saddr is 'any' the first route lookup
      in ip6_dst_lookup_tail fails, but upon failure a second lookup will
      be performed with saddr set, thus ignoring the ifindex constraint.
      
      This commit adds an output route lookup function variant, which
      allows the caller to specify lookup flags, and modify
      ip6_dst_lookup_tail() to enforce the ifindex match on the second
      lookup via said helper.
      
      ip6_route_output() becames now a static inline function build on
      top of ip6_route_output_flags(); as a side effect, out-of-tree
      modules need now a GPL license to access the output route lookup
      functionality.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6f21c96a
    • J
      21603fc4
    • T
      workqueue: skip flush dependency checks for legacy workqueues · 23d11a58
      Tejun Heo 提交于
      fca839c0 ("workqueue: warn if memory reclaim tries to flush
      !WQ_MEM_RECLAIM workqueue") implemented flush dependency warning which
      triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue tries to
      flush a !WQ_MEM_RECLAIM workquee.
      
      This assumes that workqueues marked with WQ_MEM_RECLAIM sit in memory
      reclaim path and making it depend on something which may need more
      memory to make forward progress can lead to deadlocks.  Unfortunately,
      workqueues created with the legacy create*_workqueue() interface
      always have WQ_MEM_RECLAIM regardless of whether they are depended
      upon memory reclaim or not.  These spurious WQ_MEM_RECLAIM markings
      cause spurious triggering of the flush dependency checks.
      
        WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144()
        workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu
        ...
        Workqueue: deferwq deferred_probe_work_func
        [<c0017acc>] (unwind_backtrace) from [<c0013134>] (show_stack+0x10/0x14)
        [<c0013134>] (show_stack) from [<c0245f18>] (dump_stack+0x94/0xd4)
        [<c0245f18>] (dump_stack) from [<c0026f9c>] (warn_slowpath_common+0x80/0xb0)
        [<c0026f9c>] (warn_slowpath_common) from [<c0026ffc>] (warn_slowpath_fmt+0x30/0x40)
        [<c0026ffc>] (warn_slowpath_fmt) from [<c00390b8>] (check_flush_dependency+0x138/0x144)
        [<c00390b8>] (check_flush_dependency) from [<c0039ca0>] (flush_work+0x50/0x15c)
        [<c0039ca0>] (flush_work) from [<c00c51b0>] (lru_add_drain_all+0x130/0x180)
        [<c00c51b0>] (lru_add_drain_all) from [<c00f728c>] (migrate_prep+0x8/0x10)
        [<c00f728c>] (migrate_prep) from [<c00bfbc4>] (alloc_contig_range+0xd8/0x338)
        [<c00bfbc4>] (alloc_contig_range) from [<c00f8f18>] (cma_alloc+0xe0/0x1ac)
        [<c00f8f18>] (cma_alloc) from [<c001cac4>] (__alloc_from_contiguous+0x38/0xd8)
        [<c001cac4>] (__alloc_from_contiguous) from [<c001ceb4>] (__dma_alloc+0x240/0x278)
        [<c001ceb4>] (__dma_alloc) from [<c001cf78>] (arm_dma_alloc+0x54/0x5c)
        [<c001cf78>] (arm_dma_alloc) from [<c0355ea4>] (dmam_alloc_coherent+0xc0/0xec)
        [<c0355ea4>] (dmam_alloc_coherent) from [<c039cc4c>] (ahci_port_start+0x150/0x1dc)
        [<c039cc4c>] (ahci_port_start) from [<c0384734>] (ata_host_start.part.3+0xc8/0x1c8)
        [<c0384734>] (ata_host_start.part.3) from [<c03898dc>] (ata_host_activate+0x50/0x148)
        [<c03898dc>] (ata_host_activate) from [<c039d558>] (ahci_host_activate+0x44/0x114)
        [<c039d558>] (ahci_host_activate) from [<c039f05c>] (ahci_platform_init_host+0x1d8/0x3c8)
        [<c039f05c>] (ahci_platform_init_host) from [<c039e6bc>] (tegra_ahci_probe+0x448/0x4e8)
        [<c039e6bc>] (tegra_ahci_probe) from [<c0347058>] (platform_drv_probe+0x50/0xac)
        [<c0347058>] (platform_drv_probe) from [<c03458cc>] (driver_probe_device+0x214/0x2c0)
        [<c03458cc>] (driver_probe_device) from [<c0343cc0>] (bus_for_each_drv+0x60/0x94)
        [<c0343cc0>] (bus_for_each_drv) from [<c03455d8>] (__device_attach+0xb0/0x114)
        [<c03455d8>] (__device_attach) from [<c0344ab8>] (bus_probe_device+0x84/0x8c)
        [<c0344ab8>] (bus_probe_device) from [<c0344f48>] (deferred_probe_work_func+0x68/0x98)
        [<c0344f48>] (deferred_probe_work_func) from [<c003b738>] (process_one_work+0x120/0x3f8)
        [<c003b738>] (process_one_work) from [<c003ba48>] (worker_thread+0x38/0x55c)
        [<c003ba48>] (worker_thread) from [<c0040f14>] (kthread+0xdc/0xf4)
        [<c0040f14>] (kthread) from [<c000f778>] (ret_from_fork+0x14/0x3c)
      
      Fix it by marking workqueues created via create*_workqueue() with
      __WQ_LEGACY and disabling flush dependency checks on them.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-and-tested-by: NThierry Reding <thierry.reding@gmail.com>
      Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com
      Fixes: fca839c0 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
      23d11a58
  13. 29 1月, 2016 6 次提交
  14. 27 1月, 2016 3 次提交