1. 10 8月, 2018 4 次提交
    • D
      Merge branch 'bpf-fix-cpu-and-devmap-teardown' · 9c954201
      Daniel Borkmann 提交于
      Jesper Dangaard Brouer says:
      
      ====================
      Removing entries from cpumap and devmap, goes through a number of
      syncronization steps to make sure no new xdp_frames can be enqueued.
      But there is a small chance, that xdp_frames remains which have not
      been flushed/processed yet.  Flushing these during teardown, happens
      from RCU context and not as usual under RX NAPI context.
      
      The optimization introduced in commt 389ab7f0 ("xdp: introduce
      xdp_return_frame_rx_napi"), missed that the flush operation can also
      be called from RCU context.  Thus, we cannot always use the
      xdp_return_frame_rx_napi call, which take advantage of the protection
      provided by XDP RX running under NAPI protection.
      
      The samples/bpf xdp_redirect_cpu have a --stress-mode, that is
      adjusted to easier reproduce (verified by Red Hat QA).
      ====================
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      9c954201
    • J
      xdp: fix bug in devmap teardown code path · 1bf9116d
      Jesper Dangaard Brouer 提交于
      Like cpumap teardown, the devmap teardown code also flush remaining
      xdp_frames, via bq_xmit_all() in case map entry is removed.  The code
      can call xdp_return_frame_rx_napi, from the the wrong context, in-case
      ndo_xdp_xmit() fails.
      
      Fixes: 389ab7f0 ("xdp: introduce xdp_return_frame_rx_napi")
      Fixes: 735fc405 ("xdp: change ndo_xdp_xmit API to support bulking")
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1bf9116d
    • J
      samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier · 37d7ff25
      Jesper Dangaard Brouer 提交于
      The teardown race in cpumap is really hard to reproduce.  These changes
      makes it easier to reproduce, for QA.
      
      The --stress-mode now have a case of a very small queue size of 8, that helps
      to trigger teardown flush to encounter a full queue, which results in calling
      xdp_return_frame API, in a non-NAPI protect context.
      
      Also increase MAX_CPUS, as my QA department have larger machines than me.
      Tested-by: NJean-Tsung Hsiao <jhsiao@redhat.com>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      37d7ff25
    • J
      xdp: fix bug in cpumap teardown code path · ad0ab027
      Jesper Dangaard Brouer 提交于
      When removing a cpumap entry, a number of syncronization steps happen.
      Eventually the teardown code __cpu_map_entry_free is invoked from/via
      call_rcu.
      
      The teardown code __cpu_map_entry_free() flushes remaining xdp_frames,
      by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi().
      The issues is that the teardown code is not running in the RX NAPI
      code path.  Thus, it is not allowed to invoke the NAPI variant of
      xdp_return_frame.
      
      This bug was found and triggered by using the --stress-mode option to
      the samples/bpf program xdp_redirect_cpu.  It is hard to trigger,
      because the ptr_ring have to be full and cpumap bulk queue max
      contains 8 packets, and a remote CPU is racing to empty the ptr_ring
      queue.
      
      Fixes: 389ab7f0 ("xdp: introduce xdp_return_frame_rx_napi")
      Tested-by: NJean-Tsung Hsiao <jhsiao@redhat.com>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      ad0ab027
  2. 09 8月, 2018 4 次提交
  3. 06 8月, 2018 7 次提交
  4. 05 8月, 2018 8 次提交
    • L
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b9fb1fc7
      Linus Torvalds 提交于
      Pull irq fix from Thomas Gleixner:
       "A single bugfix for the irq core to prevent silent data corruption and
        malfunction of threaded interrupts under certain conditions"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Make force irq threading setup more robust
      b9fb1fc7
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 212dab05
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Handle frames in error situations properly in AF_XDP, from Jakub
          Kicinski.
      
       2) tcp_mmap test case only tests ipv6 due to a thinko, fix from
          Maninder Singh.
      
       3) Session refcnt fix in l2tp_ppp, from Guillaume Nault.
      
       4) Fix regression in netlink bind handling of multicast gruops, from
          Dmitry Safonov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        netlink: Don't shift on 64 for ngroups
        net/smc: no cursor update send in state SMC_INIT
        l2tp: fix missing refcount drop in pppol2tp_tunnel_ioctl()
        mlxsw: core_acl_flex_actions: Remove redundant mirror resource destruction
        mlxsw: core_acl_flex_actions: Remove redundant counter destruction
        mlxsw: core_acl_flex_actions: Remove redundant resource destruction
        mlxsw: core_acl_flex_actions: Return error for conflicting actions
        selftests/bpf: update test_lwt_seg6local.sh according to iproute2
        drivers: net: lmc: fix case value for target abort error
        selftest/net: fix protocol family to work for IPv4.
        net: xsk: don't return frames via the allocator on error
        tools/bpftool: fix a percpu_array map dump problem
      212dab05
    • L
      Merge tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 60f5a217
      Linus Torvalds 提交于
      Pull usercopy whitelisting fix from Kees Cook:
       "Bart Massey discovered that the usercopy whitelist for JFS was
        incomplete: the inline inode data may intentionally "overflow" into
        the neighboring "extended area", so the size of the whitelist needed
        to be raised to include the neighboring field"
      
      * tag 'usercopy-fix-v4.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        jfs: Fix usercopy whitelist for inline inode data
      60f5a217
    • L
      Merge tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f639bef5
      Linus Torvalds 提交于
      Pull xfs bugfix from Darrick Wong:
       "One more patch for 4.18 to fix a coding error in the iomap_bmap()
        function introduced in -rc1: fix incorrect shifting"
      
      * tag 'xfs-4.18-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        fs: fix iomap_bmap position calculation
      f639bef5
    • L
      Partially revert "block: fail op_is_write() requests to read-only partitions" · a32e236e
      Linus Torvalds 提交于
      It turns out that commit 721c7fc7 ("block: fail op_is_write()
      requests to read-only partitions"), while obviously correct, causes
      problems for some older lvm2 installations.
      
      The reason is that the lvm snapshotting will continue to write to the
      snapshow COW volume, even after the volume has been marked read-only.
      End result: snapshot failure.
      
      This has actually been fixed in newer version of the lvm2 tool, but the
      old tools still exist, and the breakage was reported both in the kernel
      bugzilla and in the Debian bugzilla:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=200439
        https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442
      
      The lvm2 fix is here
      
        https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3
      
      but until everybody has updated to recent versions, we'll have to weaken
      the "never write to read-only partitions" check.  It now allows the
      write to happen, but causes a warning, something like this:
      
        generic_make_request: Trying to write to read-only block-device dm-3 (partno X)
        Modules linked in: nf_tables xt_cgroup xt_owner kvm_intel iwlmvm kvm irqbypass iwlwifi
        CPU: 1 PID: 77 Comm: kworker/1:1 Not tainted 4.17.9-gentoo #3
        Hardware name: LENOVO 20B6A019RT/20B6A019RT, BIOS GJET91WW (2.41 ) 09/21/2016
        Workqueue: ksnaphd do_metadata
        RIP: 0010:generic_make_request_checks+0x4ac/0x600
        ...
        Call Trace:
         generic_make_request+0x64/0x400
         submit_bio+0x6c/0x140
         dispatch_io+0x287/0x430
         sync_io+0xc3/0x120
         dm_io+0x1f8/0x220
         do_metadata+0x1d/0x30
         process_one_work+0x1b9/0x3e0
         worker_thread+0x2b/0x3c0
         kthread+0x113/0x130
         ret_from_fork+0x35/0x40
      
      Note that this is a "revert" in behavior only.  I'm leaving alone the
      actual code cleanups in commit 721c7fc7, but letting the previously
      uncaught request go through with a warning instead of stopping it.
      
      Fixes: 721c7fc7 ("block: fail op_is_write() requests to read-only partitions")
      Reported-and-tested-by: NWGH <wgh@torlan.ru>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Zdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a32e236e
    • D
      netlink: Don't shift on 64 for ngroups · 91874ecf
      Dmitry Safonov 提交于
      It's legal to have 64 groups for netlink_sock.
      
      As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
      only to first 32 groups.
      
      The check for correctness of .bind() userspace supplied parameter
      is done by applying mask made from ngroups shift. Which broke Android
      as they have 64 groups and the shift for mask resulted in an overflow.
      
      Fixes: 61f4b237 ("netlink: Don't shift with UB on nlk->ngroups")
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: netdev@vger.kernel.org
      Cc: stable@vger.kernel.org
      Reported-and-Tested-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NDmitry Safonov <dima@arista.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91874ecf
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 5dbfb6ec
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-08-05
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix bpftool percpu_array dump by using correct roundup to next
         multiple of 8 for the value size, from Yonghong.
      
      2) Fix in AF_XDP's __xsk_rcv_zc() to not returning frames back to
         allocator since driver will recycle frame anyway in case of an
         error, from Jakub.
      
      3) Fix up BPF test_lwt_seg6local test cases to final iproute2
         syntax, from Mathieu.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5dbfb6ec
    • U
      net/smc: no cursor update send in state SMC_INIT · 5607016c
      Ursula Braun 提交于
      If a writer blocked condition is received without data, the current
      consumer cursor is immediately sent. Servers could already receive this
      condition in state SMC_INIT without finished tx-setup. This patch
      avoids sending a consumer cursor update in this case.
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5607016c
  5. 04 8月, 2018 14 次提交
  6. 03 8月, 2018 3 次提交
    • F
      nohz: Fix missing tick reprogram when interrupting an inline softirq · 0a0e0829
      Frederic Weisbecker 提交于
      The full nohz tick is reprogrammed in irq_exit() only if the exit is not in
      a nesting interrupt. This stands as an optimization: whether a hardirq or a
      softirq is interrupted, the tick is going to be reprogrammed when necessary
      at the end of the inner interrupt, with even potential new updates on the
      timer queue.
      
      When soft interrupts are interrupted, it's assumed that they are executing
      on the tail of an interrupt return. In that case tick_nohz_irq_exit() is
      called after softirq processing to take care of the tick reprogramming.
      
      But the assumption is wrong: softirqs can be processed inline as well, ie:
      outside of an interrupt, like in a call to local_bh_enable() or from
      ksoftirqd.
      
      Inline softirqs don't reprogram the tick once they are done, as opposed to
      interrupt tail softirq processing. So if a tick interrupts an inline
      softirq processing, the next timer will neither be reprogrammed from the
      interrupting tick's irq_exit() nor after the interrupted softirq
      processing. This situation may leave the tick unprogrammed while timers are
      armed.
      
      To fix this, simply keep reprogramming the tick even if a softirq has been
      interrupted. That can be optimized further, but for now correctness is more
      important.
      
      Note that new timers enqueued in nohz_full mode after a softirq gets
      interrupted will still be handled just fine through self-IPIs triggered by
      the timer code.
      Reported-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: NFrederic Weisbecker <frederic@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org # 4.14+
      Link: https://lkml.kernel.org/r/1533303094-15855-1-git-send-email-frederic@kernel.org
      0a0e0829
    • T
      genirq: Make force irq threading setup more robust · d1f0301b
      Thomas Gleixner 提交于
      The support of force threading interrupts which are set up with both a
      primary and a threaded handler wreckaged the setup of regular requested
      threaded interrupts (primary handler == NULL).
      
      The reason is that it does not check whether the primary handler is set to
      the default handler which wakes the handler thread. Instead it replaces the
      thread handler with the primary handler as it would do with force threaded
      interrupts which have been requested via request_irq(). So both the primary
      and the thread handler become the same which then triggers the warnon that
      the thread handler tries to wakeup a not configured secondary thread.
      
      Fortunately this only happens when the driver omits the IRQF_ONESHOT flag
      when requesting the threaded interrupt, which is normaly caught by the
      sanity checks when force irq threading is disabled.
      
      Fix it by skipping the force threading setup when a regular threaded
      interrupt is requested. As a consequence the interrupt request which lacks
      the IRQ_ONESHOT flag is rejected correctly instead of silently wreckaging
      it.
      
      Fixes: 2a1d3ab8 ("genirq: Handle force threading of irqs with primary and thread handler")
      Reported-by: NKurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NKurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Cc: stable@vger.kernel.org
      d1f0301b
    • M
      selftests/bpf: update test_lwt_seg6local.sh according to iproute2 · 8c85cbdf
      Mathieu Xhonneux 提交于
      The shell file for test_lwt_seg6local contains an early iproute2 syntax
      for installing a seg6local End.BPF route. iproute2 support for this
      feature has recently been upstreamed, but with an additional keyword
      required. This patch updates test_lwt_seg6local.sh to the definitive
      iproute2 syntax
      Signed-off-by: NMathieu Xhonneux <m.xhonneux@gmail.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      8c85cbdf