1. 14 Aug, 2018 10 commits
    • Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · de5d1b39
      Linus Torvalds committed
      Pull locking/atomics update from Thomas Gleixner:
       "The locking, atomics and memory model brains delivered:
      
         - A larger update to the atomics code which reworks the ordering
           barriers, consolidates the atomic primitives, provides the new
           atomic64_fetch_add_unless() primitive and cleans up the include
           hell.
      
         - Simplify cmpxchg() instrumentation and add instrumentation for
           xchg() and cmpxchg_double().
      
         - Updates to the memory model and documentation"
      
      * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
        locking/atomics: Rework ordering barriers
        locking/atomics: Instrument cmpxchg_double*()
        locking/atomics: Instrument xchg()
        locking/atomics: Simplify cmpxchg() instrumentation
        locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation
        tools/memory-model: Rename litmus tests to comply to norm7
        tools/memory-model/Documentation: Fix typo, smb->smp
        sched/Documentation: Update wake_up() & co. memory-barrier guarantees
        locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock()
        sched/core: Use smp_mb() in wake_woken_function()
        tools/memory-model: Add informal LKMM documentation to MAINTAINERS
        locking/atomics/Documentation: Describe atomic_set() as a write operation
        tools/memory-model: Make scripts executable
        tools/memory-model: Remove ACCESS_ONCE() from model
        tools/memory-model: Remove ACCESS_ONCE() from recipes
        locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example
        MAINTAINERS: Add Daniel Lustig as an LKMM reviewer
        tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name
        tools/memory-model: Add litmus test for full multicopy atomicity
        locking/refcount: Always allow checked forms
        ...
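For readers unfamiliar with the new primitive named in this pull, the following is a minimal userspace sketch of "add unless" semantics. It assumes the commonly documented behaviour (add `a` unless the current value equals `u`, and return the value observed beforehand) and uses C11 atomics instead of the kernel's atomic64_t. The name sketch_fetch_add_unless() is illustrative and is not a kernel symbol.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Add 'a' to '*v' unless '*v' currently equals 'u'.
 * Always returns the value observed before the (possible) addition.
 * Hypothetical helper built on C11 atomics, not the kernel implementation.
 */
static int64_t sketch_fetch_add_unless(_Atomic int64_t *v, int64_t a, int64_t u)
{
	int64_t old = atomic_load(v);

	/* Retry until we either observe 'u' or win the compare-and-swap. */
	while (old != u &&
	       !atomic_compare_exchange_weak(v, &old, old + a))
		; /* a failed exchange refreshes 'old' with the current value */

	return old;
}
```

A caller can tell whether the addition happened by comparing the return value with `u`: if they differ, the add took place.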
    • Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1c594774
      Linus Torvalds committed
      Pull CPU hotplug update from Thomas Gleixner:
       "A trivial name fix for the hotplug state machine"
      
      * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        cpu/hotplug: Clarify CPU hotplug step name for timers
    • Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f7951c33
      Linus Torvalds committed
      Pull scheduler updates from Thomas Gleixner:
      
       - Cleanup and improvement of NUMA balancing
      
       - Refactoring and improvements to the PELT (Per Entity Load Tracking)
         code
      
       - Watchdog simplification and related cleanups
      
       - The usual pile of small incremental fixes and improvements
      
      * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
        watchdog: Reduce message verbosity
        stop_machine: Reflow cpu_stop_queue_two_works()
        sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
        sched/numa: Use group_weights to identify if migration degrades locality
        sched/numa: Update the scan period without holding the numa_group lock
        sched/numa: Remove numa_has_capacity()
        sched/numa: Modify migrate_swap() to accept additional parameters
        sched/numa: Remove unused task_capacity from 'struct numa_stats'
        sched/numa: Skip nodes that are at 'hoplimit'
        sched/debug: Reverse the order of printing faults
        sched/numa: Use task faults only if numa_group is not yet set up
        sched/numa: Set preferred_node based on best_cpu
        sched/numa: Simplify load_too_imbalanced()
        sched/numa: Evaluate move once per node
        sched/numa: Remove redundant field
        sched/debug: Show the sum wait time of a task group
        sched/fair: Remove #ifdefs from scale_rt_capacity()
        sched/core: Remove get_cpu() from sched_fork()
        sched/cpufreq: Clarify sugov_get_util()
        sched/sysctl: Remove unused sched_time_avg_ms sysctl
        ...
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2406fb8d
      Linus Torvalds committed
      Pull scheduler fix from Thomas Gleixner:
       "A single bugfix to prevent a pinned thread which queues stop machine
        work from being preempted by the stopper thread on its CPU, which
        causes a live lock as it is unable to wake the second CPU's stopper
        thread"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        stop_machine: Atomically queue and wake stopper threads
    • Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 37a16046
      Linus Torvalds committed
      Pull x86 RAS updates from Thomas Gleixner:
       "A small set of changes to the RAS core:
      
         - Rework of the MCE bank scanning code
      
         - Y2038 conversion"
      
      * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mce: Cleanup __mc_scan_banks()
        x86/mce: Carve out bank scanning code
        x86/mce: Remove !banks check
        x86/mce: Carve out the crashing_cpu check
        x86/mce: Always use 64-bit timestamps
    • Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b99cdfdf
      Linus Torvalds committed
      Pull RCU updates from Thomas Gleixner:
       "A large update to RCU:
      
        Preparatory work for consolidating the RCU flavors:
      
         - Introduce grace-period sequence numbers to the RCU-bh, RCU-preempt,
           and RCU-sched flavors, replacing the old ->gpnum and ->completed
           pair of fields.
      
           This change allows lockless code to obtain the complete
           grace-period state with a single READ_ONCE(), which is needed to
           maintain tolerable lock contention during the upcoming
           consolidation of the three RCU flavors.
      
           Note that grace-period sequence numbers are already used by
           rcu_barrier(), expedited RCU grace periods, and SRCU, and are thus
           already heavily used and well-tested. Joel Fernandes contributed a
           number of excellent fixes and improvements.
      
         - Clean up some grace-period-reporting loose ends, including
           improving the handling of quiescent states from offline CPUs and
           fixing some false-positive WARN_ON_ONCE() invocations.
      
           (Strictly speaking, the WARN_ON_ONCE() invocations were quite
           correct, but their invariants were (harmlessly) violated by the
           earlier sloppy handling of quiescent states from offline CPUs.)
      
           In addition, improve grace-period forward-progress guarantees so as
           to allow removal of fail-safe checks that required otherwise
           needless lock acquisitions. Finally, add more diagnostics to help
           debug the upcoming consolidation of the RCU-bh, RCU-preempt, and
           RCU-sched flavors.
      
        The rest:
      
         - SRCU updates
      
         - Updates to rcutorture and associated scripting.
      
         - The usual pile of miscellaneous fixes"
      
      * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits)
        rcutorture: Fix rcu_barrier successes counter
        rcutorture: Add support to detect if boost kthread prio is too low
        rcutorture: Use monotonic timestamp for stall detection
        rcutorture: Make boost test more robust
        rcutorture: Disable RT throttling for boost tests
        rcutorture: Emphasize testing of single reader protection type
        rcutorture: Handle extended read-side critical sections
        rcutorture: Make rcu_torture_timer() use rcu_torture_one_read()
        rcutorture: Use per-CPU random state for rcu_torture_timer()
        rcutorture: Use atomic increment for n_rcu_torture_timers
        rcutorture: Extract common code from rcu_torture_reader()
        rcuperf: Remove unused torturing_tasks() function
        rcu: Remove rcutorture test version and sequence number
        rcutorture: Change units of onoff_interval to jiffies
        rcu: Assign higher prio to RCU threads if rcutorture is built-in
        rculist: Improve documentation for list_for_each_entry_from_rcu()
        srcu: Add grace-period number to rcutorture statistics printout
        rcu: Print stall-warning NMI dyntick state in hexadecimal
        MAINTAINERS: Update RCU, SRCU, and TORTURE-TEST entries
        rcu: Make rcu_seq_diff() more exact
        ...
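The grace-period sequence numbers described in this pull message pack the whole grace-period state into one word, which is why a single read is enough. The sketch below is a simplified userspace model of that idea. It assumes the low two bits hold an "in progress" state and the upper bits count completed grace periods. All seq_*() names are illustrative, and the memory barriers and READ_ONCE()/WRITE_ONCE() annotations that real lockless code needs are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SEQ_CTR_SHIFT   2
#define SEQ_STATE_MASK  ((1UL << SEQ_CTR_SHIFT) - 1)

/* Number of completed grace periods encoded in the word. */
static unsigned long seq_ctr(unsigned long s)   { return s >> SEQ_CTR_SHIFT; }
/* Zero when idle, nonzero while a grace period is in progress. */
static unsigned long seq_state(unsigned long s) { return s & SEQ_STATE_MASK; }

/* Start a grace period: state goes from 0 (idle) to 1. */
static void seq_start(unsigned long *sp)
{
	assert(seq_state(*sp) == 0);
	*sp += 1;
}

/* End a grace period: clear the state bits and advance the counter. */
static void seq_end(unsigned long *sp)
{
	assert(seq_state(*sp) != 0);
	*sp = (*sp | SEQ_STATE_MASK) + 1;
}

/*
 * Value the word will have reached once a full grace period that
 * starts after 's' was sampled has ended.
 */
static unsigned long seq_snap(unsigned long s)
{
	return (s + 2 * SEQ_STATE_MASK + 1) & ~SEQ_STATE_MASK;
}

/* Wrap-safe "has the word reached the snapshot?" comparison. */
static bool seq_done(unsigned long cur, unsigned long snap)
{
	return cur - snap <= ULONG_MAX / 2;
}
```

In this model a snapshot taken while a grace period is already running is only satisfied by the end of the next one, because the running grace period may have started before the caller's update.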
    • Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d0daaeaf
      Linus Torvalds committed
      Pull genirq updates from Thomas Gleixner:
       "The irq department provides:
      
         - A synchronization fix for free_irq() to synchronize just the
           removed interrupt thread on shared interrupt lines.
      
         - Consolidate the multi low level interrupt entry handling and move
           it to the generic code instead of adding yet another copy for
           RISC-V
      
         - Refactoring of the ARM LPI allocator and LPI exposure to the
           hypervisor
      
         - Yet another interrupt chip driver for the JZ4725B SoC
      
         - Speed up for /proc/interrupts as people seem to love reading this
           file with high frequency
      
         - Miscellaneous fixes and updates"
      
      * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
        irqchip/gic-v3-its: Make its_lock a raw_spin_lock_t
        genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete
        openrisc: Use the new GENERIC_IRQ_MULTI_HANDLER
        arm64: Use the new GENERIC_IRQ_MULTI_HANDLER
        ARM: Convert to GENERIC_IRQ_MULTI_HANDLER
        irqchip: Port the ARM IRQ drivers to GENERIC_IRQ_MULTI_HANDLER
        irqchip/gic-v3-its: Reduce minimum LPI allocation to 1 for PCI devices
        dt-bindings: irqchip: renesas-irqc: Document r8a77980 support
        dt-bindings: irqchip: renesas-irqc: Document r8a77470 support
        irqchip/ingenic: Add support for the JZ4725B SoC
        irqchip/stm32: Add exti0 translation for stm32mp1
        genirq: Remove redundant NULL pointer check in __free_irq()
        irqchip/gic-v3-its: Honor hypervisor enforced LPI range
        irqchip/gic-v3: Expose GICD_TYPER in the rdist structure
        irqchip/gic-v3-its: Drop chunk allocation compatibility
        irqchip/gic-v3-its: Move minimum LPI requirements to individual busses
        irqchip/gic-v3-its: Use full range of LPIs
        irqchip/gic-v3-its: Refactor LPI allocator
        genirq: Synchronize only with single thread on free_irq()
        genirq: Update code comments wrt recycled thread_mask
        ...
    • Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 40043927
      Linus Torvalds committed
      Pull EFI updates from Thomas Gleixner:
       "The EFI pile:
      
         - Make mixed mode UEFI runtime service invocations mutually
           exclusive, as mandated by the UEFI spec
      
         - Perform UEFI runtime services calls from a work queue so the calls
           into the firmware occur from a kernel thread
      
         - Honor the UEFI memory map attributes for live memory regions
           configured by UEFI as a framebuffer. This works around a coherency
           problem with KVM guests running on ARM.
      
         - Cleanups, improvements and fixes all over the place"
      
      * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efivars: Call guid_parse() against guid_t type of variable
        efi/cper: Use consistent types for UUIDs
        efi/x86: Replace references to efi_early->is64 with efi_is_64bit()
        efi: Deduplicate efi_open_volume()
        efi/x86: Add missing NULL initialization in UGA draw protocol discovery
        efi/x86: Merge 32-bit and 64-bit UGA draw protocol setup routines
        efi/x86: Align efi_uga_draw_protocol typedef names to convention
        efi/x86: Merge the setup_efi_pci32() and setup_efi_pci64() routines
        efi/x86: Prevent reentrant firmware calls in mixed mode
        efi/esrt: Only call efi_mem_reserve() for boot services memory
        fbdev/efifb: Honour UEFI memory map attributes when mapping the FB
        efi: Drop type and attribute checks in efi_mem_desc_lookup()
        efi/libstub/arm: Add opt-in Kconfig option for the DTB loader
        efi: Remove the declaration of efi_late_init() as the function is unused
        efi/cper: Avoid using get_seconds()
        efi: Use a work queue to invoke EFI Runtime Services
        efi/x86: Use non-blocking SetVariable() for efi_delete_dummy_variable()
        efi/x86: Clean up the eboot code
    • Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0ad6b38a
      Linus Torvalds committed
      Pull debugobjects update from Thomas Gleixner:
       "Two simple updates for the debug objects code:
      
         - Make the stack check warning more informative by adding the object
           and the stack page address to the printout
      
         - Remove a redundant NULL pointer check"
      
      * 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        debugobjects: Remove redundant NULL pointer check
        debugobjects: Make stack check warning more informative
    • Merge tag 'm68k-for-v4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · 03e61914
      Linus Torvalds committed
      Pull m68k updates from Geert Uytterhoeven:
      
       - Enable mac_scsi PDMA on PowerBook 500
      
       - Generic dma_noncoherent_ops conversion
      
       - Time handling improvements
      
       - I/O accessor improvements
      
       - Conversion to MEMBLOCK and NO_BOOTMEM, to bring m68k in line with
         other mainstream architectures
      
       - Miscellaneous fixes and cleanups
      
       - Defconfig updates
      
      * tag 'm68k-for-v4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k/defconfig: Update defconfigs for v4.18-rc6
        m68k: switch to MEMBLOCK + NO_BOOTMEM
        m68k/page_no.h: force __va argument to be unsigned long
        m68k/bitops: convert __ffs to match generic declaration
        m68k/io: Switch mmu variant to <asm-generic/io.h>
        m68k/io: Move mem*io define guards to <asm/kmap.h>
        Input: hilkbd - Add casts to HP9000/300 I/O accessors
        net: mac8390: Use standard memcpy_{from,to}io()
        m68k/io: Add missing ioremap define guards, fix typo
        m68k: Remove unused set_clock_mmss() helpers
        m68k: mac: Use time64_t in RTC handling
        m68k: Use generic dma_noncoherent_ops
        nubus: Set default dma mask for nubus_board devices
        m68k/mac: Enable PDMA for PowerBook 500 series
  2. 13 Aug, 2018 4 commits
    • Linux 4.18 · 94710cac
      Linus Torvalds committed
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 921195d3
      Linus Torvalds committed
      Pull SCSI fixes from James Bottomley:
       "Eight fixes.
      
        The most important one is the mpt3sas fix which makes the driver work
        again on big endian systems. The rest are mostly minor error path or
        checker issues and the vmw_scsi one fixes a performance problem"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: vmw_pvscsi: Return DID_RESET for status SAM_STAT_COMMAND_TERMINATED
        scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
        scsi: mpt3sas: Swap I/O memory read value back to cpu endianness
        scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO
        scsi: fcoe: drop frames in ELS LOGO error path
        scsi: fcoe: fix use-after-free in fcoe_ctlr_els_send
        scsi: qedi: Fix a potential buffer overflow
        scsi: qla2xxx: Fix memory leak for allocating abort IOCB
    • init: rename and re-order boot_cpu_state_init() · b5b1404d
      Linus Torvalds committed
      This is purely a preparatory patch for upcoming changes during the 4.19
      merge window.
      
      We have a function called "boot_cpu_state_init()" that isn't really
      about the bootup cpu state: that is done much earlier by the similarly
      named "boot_cpu_init()" (note lack of "state" in name).
      
      This function initializes some hotplug CPU state, and needs to run after
      the percpu data has been properly initialized.  It even has a comment to
      that effect.
      
      Except it _doesn't_ actually run after the percpu data has been properly
      initialized.  On x86 it happens to do that, but on at least arm and
      arm64, the percpu base pointers are initialized by the arch-specific
      'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
      
      This had some unexpected results, and in particular we have a patch
      pending for the merge window that did the obvious cleanup of using
      'this_cpu_write()' in the cpu hotplug init code:
      
        -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
        +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
      
      which is obviously the right thing to do.  Except because of the
      ordering issue, it actually failed miserably and unexpectedly on arm64.
      
      So this just fixes the ordering, and changes the name of the function to
      be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
      hotplug state, because the core CPU state was supposed to have already
      been done earlier.
      
      Marked for stable, since the (not yet merged) patch that will show this
      problem is marked for stable.
      Reported-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
      Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
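The ordering problem in this commit message can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's percpu implementation: it assumes that per_cpu_ptr()-style access goes through an offset table that is already populated, while this_cpu_write()-style access goes through a separate "current CPU base" that only the arch hook sets. All *_sketch names and the slot layout are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS_SKETCH 2

enum { HP_OFFLINE = 0, HP_ONLINE = 1 };

/* Slot 0 models a static template copy; slots 1.. are the per-CPU copies. */
static int hp_state[NR_CPUS_SKETCH + 1];
static int offset_table[NR_CPUS_SKETCH]; /* used by per_cpu_ptr()-style access */
static int this_cpu_base;                /* used by this_cpu_*()-style access */

/* Populate the offset table for every CPU. */
static void setup_areas_sketch(void)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		offset_table[cpu] = cpu + 1;
}

/* Stand-in for the arch hook that sets the boot CPU's percpu base. */
static void prepare_boot_cpu_sketch(void)
{
	this_cpu_base = offset_table[0];
}

/* Stand-in for the hotplug init written in the this_cpu_write() form. */
static void hotplug_init_sketch(void)
{
	hp_state[this_cpu_base] = HP_ONLINE;
}

/* Returns the boot CPU's state as later code would read it via the table. */
static int boot_sketch(int hotplug_init_last)
{
	memset(hp_state, 0, sizeof(hp_state));
	memset(offset_table, 0, sizeof(offset_table));
	this_cpu_base = 0;

	setup_areas_sketch();
	if (hotplug_init_last) {
		prepare_boot_cpu_sketch();
		hotplug_init_sketch();
	} else {
		hotplug_init_sketch();   /* base not set yet: hits the template slot */
		prepare_boot_cpu_sketch();
	}
	return hp_state[offset_table[0]];
}
```

With the old order, boot_sketch(0), the write lands in the template slot and the boot CPU still looks offline. With the order this commit establishes, boot_sketch(1), the write reaches the boot CPU's own copy.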
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d6dd6431
      Linus Torvalds committed
      Pull vfs fixes from Al Viro:
       "A bunch of race fixes, mostly around lazy pathwalk.
      
        All of it is -stable fodder, a large part going back to 2013"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        make sure that __dentry_kill() always invalidates d_seq, unhashed or not
        fix __legitimize_mnt()/mntput() race
        fix mntput/mntput race
        root dentries need RCU-delayed freeing
  3. 12 Aug, 2018 5 commits
  4. 11 Aug, 2018 5 commits
  5. 10 Aug, 2018 10 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · e91e2189
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-08-10
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix cpumap and devmap on teardown as they're under RCU context
         and won't have same assumption as running under NAPI protection,
         from Jesper.
      
      2) Fix various sockmap bugs in bpf_tcp_sendmsg() code, e.g. we had
         a bug where socket error was not propagated correctly, from Daniel.
      
      3) Fix incompatible libbpf header license for BTF code and match it
         before it gets officially released with the rest of libbpf which
         is LGPL-2.1, from Martin.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • make sure that __dentry_kill() always invalidates d_seq, unhashed or not · 4c0d7cd5
      Al Viro committed
      RCU pathwalk relies upon the assumption that anything that changes
      ->d_inode of a dentry will invalidate its ->d_seq.  That's almost
      true - the one exception is that the final dput() of already unhashed
      dentry does *not* touch ->d_seq at all.  Unhashing does, though,
      so for anything we'd found by RCU dcache lookup we are fine.
      Unfortunately, we can *start* with an unhashed dentry or jump into
      it.
      
      We could try and be careful in the (few) places where that could
      happen.  Or we could just make the final dput() invalidate the damn
      thing, unhashed or not.  The latter is much simpler and easier to
      backport, so let's do it that way.
      Reported-by: "Dae R. Jeong" <threeearcat@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
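The reasoning in this commit message can be modelled with a toy sequence counter. The sketch below is single-threaded and hypothetical (struct dentry_sketch and its helpers are not kernel types): a lockless walker samples the counter, and later trusts what it read only if the counter is unchanged. The `always_invalidate` flag stands for the behaviour this commit introduces.

```c
#include <assert.h>
#include <stdbool.h>

struct dentry_sketch {
	unsigned int seq;   /* bumped whenever cached state becomes invalid */
	int inode;          /* 0 stands for "no inode" */
	bool hashed;
};

/* A lockless walker samples the counter before looking at the dentry. */
static unsigned int walk_begin(const struct dentry_sketch *d)
{
	return d->seq;
}

/* The walker must retry if the counter moved since it was sampled. */
static bool walk_must_retry(const struct dentry_sketch *d, unsigned int start)
{
	return d->seq != start;
}

/*
 * Final release of the dentry. Without 'always_invalidate', only the
 * unhashing step bumps the counter, so an already unhashed dentry loses
 * its inode without the walker being able to notice.
 */
static void final_dput_sketch(struct dentry_sketch *d, bool always_invalidate)
{
	bool was_hashed = d->hashed;

	d->hashed = false;
	if (was_hashed || always_invalidate)
		d->seq += 2;
	d->inode = 0;
}
```

Starting from an unhashed dentry, the walker's retry check fires only when `always_invalidate` is true, which is the case the message says could otherwise be missed.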
    • fix __legitimize_mnt()/mntput() race · 119e1ef8
      Al Viro committed
      __legitimize_mnt() has two problems - one is that in case of success
      the check of mount_lock is not ordered wrt preceding increment of
      refcount, making it possible to have successful __legitimize_mnt()
      on one CPU just before the otherwise final mntput() on another,
      with __legitimize_mnt() not seeing mntput() taking the lock and
      mntput() not seeing the increment done by __legitimize_mnt().
      Solved by a pair of barriers.
      
      Another is that failure of __legitimize_mnt() on the second
      read_seqretry() leaves us with reference that'll need to be
      dropped by caller; however, if that races with final mntput()
      we can end up with caller dropping rcu_read_lock() and doing
      mntput() to release that reference - with the first mntput()
      having freed the damn thing just as rcu_read_lock() had been
      dropped.  Solution: in "do mntput() yourself" failure case
      grab mount_lock, check if MNT_DOOMED has been set by racing
      final mntput() that has missed our increment and if it has -
      undo the increment and treat that as "failure, caller doesn't
      need to drop anything" case.
      
      It's not easy to hit - the final mntput() has to come right
      after the first read_seqretry() in __legitimize_mnt() *and*
      manage to miss the increment done by __legitimize_mnt() before
      the second read_seqretry() in there.  The things that are almost
      impossible to hit on bare hardware are not impossible on SMP
      KVM, though...
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Fixes: 48a066e7 ("RCU'd vfsmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fix mntput/mntput race · 9ea0a46c
      Al Viro committed
      mntput_no_expire() does the calculation of total refcount under mount_lock;
      unfortunately, the decrement (as well as all increments) are done outside
      of it, leading to false positives in the "are we dropping the last reference"
      test.  Consider the following situation:
      	* mnt is a lazy-umounted mount, kept alive by two opened files.  One
      of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
      mntput(mnt) (called from __fput()) drops one reference, decrementing component
      	* After it has looked at component #0, the process on CPU 0 does
      mntget(), incrementing component #0, gets preempted and gets to run again -
      on CPU 69.  There it does mntput(), which drops the reference (component #69)
      and proceeds to spin on mount_lock.
      	* On CPU 42 our first mntput() finishes counting.  It observes the
      decrement of component #69, but not the increment of component #0.  As the
      result, the total it gets is not 1 as it should've been - it's 0.  At which
      point we decide that vfsmount needs to be killed and proceed to free it and
      shut the filesystem down.  However, there's still another opened file
      on that filesystem, with reference to (now freed) vfsmount, etc. and we are
      screwed.
      
      It's not a wide race, but it can be reproduced with artificial slowdown of
      the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
      
      Fix consists of moving the refcount decrement under mount_lock; the tricky
      part is that we want (and can) keep the fast case (i.e. mount that still
      has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
      mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
      before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
      a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
      be dropped.
      Reported-by: Jann Horn <jannh@google.com>
      Tested-by: Jann Horn <jannh@google.com>
      Fixes: 48a066e7 ("RCU'd vfsmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
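The false "last reference" result described in this commit message can be replayed deterministically in userspace. The sketch below is a simplified model, not kernel code: it assumes a refcount split into per-CPU components that are summed in CPU order, and it forces the unlucky interleaving with a callback. The CPU numbers 0, 42 and 69 follow the message. Placing both initial references on component #0 is an assumption made only for the example.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU 70

static int component[NCPU];

/*
 * Sum the components in CPU order. If 'interloper' is set, it runs just
 * before component 'pause_before' is read, modelling another task that
 * gets scheduled in the middle of the summing loop.
 */
static int sum_components(int pause_before, void (*interloper)(void))
{
	int total = 0;

	for (int cpu = 0; cpu < NCPU; cpu++) {
		if (cpu == pause_before && interloper)
			interloper();
		total += component[cpu];
	}
	return total;
}

/* Second task: mntget() on CPU 0, then migration and mntput() on CPU 69. */
static void get_then_put_elsewhere(void)
{
	component[0]++;
	component[69]--;
}

/* Total seen by the first mntput() when the race hits. */
static int racy_count(void)
{
	for (int cpu = 0; cpu < NCPU; cpu++)
		component[cpu] = 0;
	component[0] = 2;   /* two open files keep the mount alive */
	component[42]--;    /* first mntput(): decrement done outside the lock */

	/* Component #0 has already been read when the second task runs. */
	return sum_components(1, get_then_put_elsewhere);
}

/* Total once nothing is racing with the summing loop. */
static int settled_count(void)
{
	return sum_components(-1, NULL);
}
```

Here racy_count() yields 0 while settled_count(), called afterwards, yields 1: the summing loop saw the decrement on component #69 but not the matching increment on component #0. Doing the decrement under the same lock as the sum, as the fix describes, removes the window in which the second task can run mid-sum.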
    • Merge branch 'bpf-fix-cpu-and-devmap-teardown' · 9c954201
      Daniel Borkmann committed
      Jesper Dangaard Brouer says:
      
      ====================
      Removing entries from cpumap and devmap goes through a number of
      synchronization steps to make sure no new xdp_frames can be enqueued.
      But there is a small chance that xdp_frames remain which have not
      been flushed/processed yet.  Flushing these during teardown happens
      from RCU context and not, as usual, under RX NAPI context.
      
      The optimization introduced in commit 389ab7f0 ("xdp: introduce
      xdp_return_frame_rx_napi") missed that the flush operation can also
      be called from RCU context.  Thus, we cannot always use the
      xdp_return_frame_rx_napi call, which takes advantage of the protection
      provided by XDP RX running under NAPI protection.
      
      The samples/bpf xdp_redirect_cpu has a --stress-mode, which is
      adjusted to make the issue easier to reproduce (verified by Red Hat QA).
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xdp: fix bug in devmap teardown code path · 1bf9116d
      Jesper Dangaard Brouer committed
      Like cpumap teardown, the devmap teardown code also flushes remaining
      xdp_frames, via bq_xmit_all(), in case a map entry is removed.  The code
      can call xdp_return_frame_rx_napi from the wrong context in case
      ndo_xdp_xmit() fails.
      
      Fixes: 389ab7f0 ("xdp: introduce xdp_return_frame_rx_napi")
      Fixes: 735fc405 ("xdp: change ndo_xdp_xmit API to support bulking")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier · 37d7ff25
      Jesper Dangaard Brouer committed
      The teardown race in cpumap is really hard to reproduce.  These changes
      make it easier to reproduce, for QA.
      
      The --stress-mode now has a case with a very small queue size of 8, which
      helps the teardown flush encounter a full queue, which results in calling
      the xdp_return_frame API in a non-NAPI protected context.
      
      Also increase MAX_CPUS, as my QA department has larger machines than me.
      Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xdp: fix bug in cpumap teardown code path · ad0ab027
      Jesper Dangaard Brouer committed
      When removing a cpumap entry, a number of synchronization steps happen.
      Eventually the teardown code __cpu_map_entry_free is invoked from/via
      call_rcu.
      
      The teardown code __cpu_map_entry_free() flushes remaining xdp_frames,
      by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi().
      The issue is that the teardown code is not running in the RX NAPI
      code path.  Thus, it is not allowed to invoke the NAPI variant of
      xdp_return_frame.
      
      This bug was found and triggered by using the --stress-mode option to
      the samples/bpf program xdp_redirect_cpu.  It is hard to trigger,
      because the ptr_ring has to be full, the cpumap bulk queue holds at
      most 8 packets, and a remote CPU is racing to empty the ptr_ring
      queue.
      
      Fixes: 389ab7f0 ("xdp: introduce xdp_return_frame_rx_napi")
      Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 112cbae2
      Linus Torvalds committed
      Pull crypto fix from Herbert Xu:
       "This fixes a performance regression in arm64 NEON crypto as well as a
        crash in x86 aegis/morus on unsupported CPUs"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: x86/aegis,morus - Fix and simplify CPUID checks
        crypto: arm64 - revert NEON yield for fast AEAD implementations
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6395ad85
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) The real fix for the ipv6 route metric leak Sabrina was seeing, from
          Cong Wang.
      
       2) Fix syzbot-triggered AF_PACKET v3 ring buffer insufficient-room
          conditions, from Willem de Bruijn.
      
       3) vsock can reinitialize active work struct, fix from Cong Wang.
      
       4) RXRPC keepalive generator can wedge a cpu, fix from David Howells.
      
       5) Fix locking in AF_SMC ioctl, from Ursula Braun.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        dsa: slave: eee: Allow ports to use phylink
        net/smc: move sock lock in smc_ioctl()
        net/smc: allow sysctl rmem and wmem defaults for servers
        net/smc: no shutdown in state SMC_LISTEN
        net: aquantia: Fix IFF_ALLMULTI flag functionality
        rxrpc: Fix the keepalive generator [ver #2]
        net/mlx5e: Cleanup of dcbnl related fields
        net/mlx5e: Properly check if hairpin is possible between two functions
        vhost: reset metadata cache when initializing new IOTLB
        llc: use refcount_inc_not_zero() for llc_sap_find()
        dccp: fix undefined behavior with 'cwnd' shift in ccid2_cwnd_restart()
        tipc: fix an interrupt unsafe locking scenario
        vsock: split dwork to avoid reinitializations
        net: thunderx: check for failed allocation lmac->dmacs
        cxgb4: mk_act_open_req() buggers ->{local, peer}_ip on big-endian hosts
        packet: refine ring v3 block size test to hold one frame
        ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit
        ipv6: fix double refcount of fib6_metrics
  6. 09 Aug, 2018 6 commits