1. 03 Jul, 2018 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4e33d7d4
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Verify netlink attributes properly in nf_queue, from Eric Dumazet.
      
       2) Need to bump memory lock rlimit for test_sockmap bpf test, from
          Yonghong Song.
      
       3) Fix VLAN handling in lan78xx driver, from Dave Stevenson.
      
       4) Fix uninitialized read in nf_log, from Jann Horn.
      
       5) Fix raw command length parsing in mlx5, from Alex Vesker.
      
       6) Cleanup loopback RDS connections upon netns deletion, from Sowmini
          Varadhan.
      
       7) Fix regressions in FIB rule matching during create, from Jason A.
          Donenfeld and Roopa Prabhu.
      
       8) Fix mpls ether type detection in nfp, from Pieter Jansen van Vuuren.
      
       9) More bpfilter build fixes/adjustments from Masahiro Yamada.
      
      10) Fix XDP_{TX,REDIRECT} flushing in various drivers, from Jesper
          Dangaard Brouer.
      
      11) fib_tests.sh file permissions were broken, from Shuah Khan.
      
      12) Make sure BH/preemption is disabled in data path of mac80211, from
          Denis Kenzior.
      
      13) Don't ignore nla_parse_nested() return values in nl80211, from
          Johannes Berg.
      
      14) Properly account sock objects to kmemcg, from Shakeel Butt.
      
      15) Adjustments to setting bpf program permissions to read-only, from
          Daniel Borkmann.
      
      16) TCP Fast Open key endianness was broken, it always took on the host
          endianness. Whoops. Explicitly make it little endian. From Yuching
          Cheng.
      
      17) Fix prefix route setting for link local addresses in ipv6, from
          David Ahern.
      
      18) Potential Spectre v1 in zatm driver, from Gustavo A. R. Silva.
      
      19) Various bpf sockmap fixes, from John Fastabend.
      
      20) Use after free for GRO with ESP, from Sabrina Dubroca.
      
      21) Passing bogus flags to crypto_alloc_shash() in ipv6 SR code, from
          Eric Biggers.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
        qede: Adverstise software timestamp caps when PHC is not available.
        qed: Fix use of incorrect size in memcpy call.
        qed: Fix setting of incorrect eswitch mode.
        qed: Limit msix vectors in kdump kernel to the minimum required count.
        ipvlan: call dev_change_flags when ipvlan mode is reset
        ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
        net: fix use-after-free in GRO with ESP
        tcp: prevent bogus FRTO undos with non-SACK flows
        bpf: sockhash, add release routine
        bpf: sockhash fix omitted bucket lock in sock_close
        bpf: sockmap, fix smap_list_map_remove when psock is in many maps
        bpf: sockmap, fix crash when ipv6 sock is added
        net: fib_rules: bring back rule_exists to match rule during add
        hv_netvsc: split sub-channel setup into async and sync
        net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN
        atm: zatm: Fix potential Spectre v1
        s390/qeth: consistently re-enable device features
        s390/qeth: don't clobber buffer on async TX completion
        s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]
        s390/qeth: fix race when setting MAC address
        ...
  2. 02 Jul, 2018 15 commits
  3. 01 Jul, 2018 14 commits
    • tcp: prevent bogus FRTO undos with non-SACK flows · 1236f22f
      Ilpo Järvinen committed
      If SACK is not enabled and the first cumulative ACK after the RTO
      retransmission covers more than the retransmitted skb, a spurious
      FRTO undo will trigger (assuming FRTO is enabled for that RTO).
      The reason is that any non-retransmitted segment acknowledged will
      set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
      no indication that it would have been delivered for real (the
      scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
      case so the check for that bit won't help like it does with SACK).
      Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
      in tcp_process_loss.
      
      We need to use a stricter condition for the non-SACK case and check
      that none of the cumulatively ACKed segments were retransmitted
      to prove that progress is due to original transmissions. Only then
      keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
      the non-SACK case.
      
      (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
      to better indicate its purpose but to keep this change minimal, it
      will be done in another patch).
      
      Besides burstiness and congestion control violations, this problem
      can result in an RTO loop: when the loss recovery is prematurely
      undone, only new data will be transmitted (if available) and
      the next retransmission can occur only after a new RTO, which in case
      of multiple losses (that are not for consecutive packets) requires
      one RTO per loss to recover.

      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
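
      The stricter non-SACK condition described in the commit above can be
      sketched in plain C. This is a simplified userspace model, not the
      kernel change itself: `frto_filter_flag`, `struct acked_seg`, and the
      flag value are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this sketch; the kernel defines its own value. */
#define FLAG_ORIG_SACK_ACKED 0x200

/* Minimal stand-in for one cumulatively ACKed segment. */
struct acked_seg {
    bool retransmitted; /* was this segment ever retransmitted? */
};

/*
 * Apply the stricter rule for non-SACK flows: keep FLAG_ORIG_SACK_ACKED
 * only if none of the cumulatively ACKed segments was retransmitted.
 * With SACK enabled the flag is returned unchanged.
 */
int frto_filter_flag(int flag, bool sack_enabled,
                     const struct acked_seg *segs, size_t n)
{
    assert(segs != NULL || n == 0);

    if (sack_enabled)
        return flag;

    for (size_t i = 0; i < n; i++) {
        if (segs[i].retransmitted)
            return flag & ~FLAG_ORIG_SACK_ACKED; /* no proof of original progress */
    }
    return flag;
}
```

      In this model, an ACK that covers the retransmitted skb plus later
      segments clears the flag, so a caller would not take the FRTO undo path.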
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 271b955e
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-07-01
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) A bpf_fib_lookup() helper fix to change the API before freeze to
         return an encoding of the FIB lookup result and return the nexthop
         device index in the params struct (instead of device index as return
         code that we had before), from David.
      
      2) Various BPF JIT fixes to address syzkaller fallout, that is, do not
         reject progs when set_memory_*() fails since it could still be RO.
         Also arm32 JIT was not using bpf_jit_binary_lock_ro() API which was
         an issue, and a memory leak in s390 JIT found during review, from
         Daniel.
      
      3) Multiple fixes for sockmap/hash to address most of the syzkaller
         triggered bugs. Usage with IPv6 was crashing, a GPF in bpf_tcp_close(),
         a missing sock_map_release() routine to hook up to callbacks, and a
         fix for an omitted bucket lock in sock_close(), from John.
      
      4) Two bpftool fixes to remove duplicated error message on program load,
         and another one to close the libbpf object after program load. One
         additional fix for nfp driver's BPF offload to avoid stopping offload
         completely if replace of program failed, from Jakub.
      
      5) Couple of BPF selftest fixes that bail out in some of the test
         scripts if the user does not have the right privileges, from Jeffrin.
      
      6) Fixes in test_bpf for s390 when CONFIG_BPF_JIT_ALWAYS_ON is set
         where we need to set the flag that some of the test cases are expected
         to fail, from Kleber.
      
      7) Fix to detangle BPF_LIRC_MODE2 dependency from CONFIG_CGROUP_BPF
         since it has no relation to it and lirc2 users often have configs
         without cgroups enabled and thus would not be able to use it, from Sean.
      
      8) Fix a selftest failure in sockmap by removing a useless setrlimit()
         call that would set a too low limit where at the same time we are
         already including bpf_rlimit.h that does the job, from Yonghong.
      
      9) Fix BPF selftest config with missing NET_SCHED, from Anders.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-sockmap-fixes' · bf2b866a
      Daniel Borkmann committed
      John Fastabend says:
      
      ====================
      This addresses two syzbot issues that lead to identifying (by Eric and
      Wei) a class of bugs where we don't correctly check for IPv4/v6
      sockets and their associated state. The second issue was a locking
      omission in sockhash.
      
      The first patch addresses IPv6 socks and fixes an error where
      sockhash would overwrite the prot pointer with IPv4 prot. To fix
      this, build a solution similar to the TLS ULP. We continue to
      allow socks in all states, not just ESTABLISH, in this patch set
      because, as Martin points out, there should be no issue with this
      on the sockmap ULP because we don't use the ctx in this code. Once
      multiple ULPs coexist we may need to revisit this. However we
      can do this in *next trees.
      
      The other issue syzbot found was that the tcp_close() handler missed
      locking the hash bucket lock, which could result in corrupting the
      sockhash bucket list if delete and close ran at the same time.
      Also, the smap_list_remove() routine was not working correctly
      at all. This was not caught in my testing because in general my
      tests (to date at least; let's add some more robust selftests in
      bpf-next) do things in the "expected" order: create map, add socks,
      delete socks, then tear down maps. The tests we have that do the
      ops out of this order were only working on single maps, not
      multi-maps, so we never saw the issue. Thanks syzbot. The fix is to
      restructure the tcp_close() lock handling and fix the obvious
      bug in smap_list_remove().
      
      Finally, during review I noticed the release handler was omitted
      from the upstream code (patch 4) due to an incorrect merge conflict
      fix when I ported the code to latest bpf-next before submitting.
      This would leave references to the map around if the user never
      closes the map.
      
      v3: rework patches, dropping ESTABLISH check and adding rcu
          annotation along with the smap_list_remove fix
      
      v4: missed one more case where maps was being accessed without
          the sk_callback_lock, spotted by Martin as well.
      
      v5: changed to use a specific lock for maps and reduced callback
          lock so that it is only used to guard sk callbacks. I think
          this makes the logic a bit cleaner and avoids confusion
          over what each lock is doing.
      
      Also big thanks to Martin for the thorough review; he caught at least
      one case where I missed a rcu_call().
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: sockhash, add release routine · caac76a5
      John Fastabend committed
      Add map_release_uref pointer to hashmap ops. This was dropped when
      original sockhash code was ported into bpf-next before initial
      commit.
      
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: sockhash fix omitted bucket lock in sock_close · e9db4ef6
      John Fastabend committed
      First the sk_callback_lock() was being used to protect both the
      sock callback hooks and the psock->maps list. This got overly
      convoluted after the addition of sockhash (in sockmap it made
      some sense because maps and callbacks were tightly coupled) so
      let's split out a specific lock for maps and only use the callback
      lock for its intended purpose. This fixes a couple of cases where
      we missed using the maps lock when it was in fact needed. Also this
      makes it easier to follow the code because now we can put the
      locking closer to the actual code it's serializing.
      
      Next, in sock_hash_delete_elem() the pattern was as follows,
      
        sock_hash_delete_elem()
           [...]
           spin_lock(bucket_lock)
           l = lookup_elem_raw()
           if (l)
              hlist_del_rcu()
              write_lock(sk_callback_lock)
               .... destroy psock ...
              write_unlock(sk_callback_lock)
           spin_unlock(bucket_lock)
      
      The ordering is necessary because we only know the {p}sock after
      dereferencing the hash table which we can't do unless we have the
      bucket lock held. Once we have the bucket lock and the psock element
      it is deleted from the hashmap to ensure any other path doing a lookup
      will fail. Finally, the refcnt is decremented and if zero the psock
      is destroyed.
      
      In parallel with the above (or free'ing the map) a tcp close event
      may trigger tcp_close(), which at the moment omits the bucket lock
      altogether (oops!), where the flow looks like this,
      
        bpf_tcp_close()
           [...]
           write_lock(sk_callback_lock)
           for each psock->maps // list of maps this sock is part of
               hlist_del_rcu(ref_hash_node);
               .... destroy psock ...
           write_unlock(sk_callback_lock)
      
      Obviously, and demonstrated by syzbot, this is broken because
      we can have multiple threads deleting entries via hlist_del_rcu().
      
      To fix this we might be tempted to wrap the hlist operation in a
      bucket lock but that would create a lock inversion problem. In
      summary, to follow locking rules the psock's maps list needs the
      sk_callback_lock (after this patch maps_lock) but we need the bucket
      lock to do the hlist_del_rcu.
      
      To resolve the lock inversion problem, pop the head of the maps list
      repeatedly and remove the reference until no more are left. If a
      delete happens in parallel from the BPF API that is OK as well because
      it will do a similar action: look up the sock in the map/hash, delete
      it from the map/hash, and dec the refcnt. We check for this case
      before doing a destroy on the psock to ensure we don't have two
      threads tearing down a psock. The new logic is as follows,
      
        bpf_tcp_close()
        e = psock_map_pop(psock->maps) // done with map lock
        bucket_lock() // lock hash list bucket
        l = lookup_elem_raw(head, hash, key, key_size);
        if (l) {
           //only get here if elmnt was not already removed
           hlist_del_rcu()
           ... destroy psock...
        }
        bucket_unlock()
      
      And finally for all the above to work add missing locking around map
      operations per above. Then add RCU annotations and use
      rcu_dereference/rcu_assign_pointer to manage values relying on RCU so
      that the object is not free'd from sock_hash_free() while it is being
      referenced in bpf_tcp_close().
      
      Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
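
      The "pop the head, then take the bucket lock" pattern from the commit
      above can be sketched in userspace C with pthread mutexes. This is an
      illustrative model under simplifying assumptions: a single bucket
      lock, a plain integer refcount, and an `in_bucket` boolean standing in
      for hlist membership. All names here (`struct entry`, `psock_init`,
      `close_sock`) are hypothetical and do not mirror the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One reference from a psock into a hash bucket. */
struct entry {
    int key;
    bool in_bucket;          /* stands in for "still linked in the hlist" */
    struct entry *next_ref;  /* link in the per-psock maps list */
};

struct psock {
    pthread_mutex_t maps_lock; /* protects `maps` only */
    struct entry *maps;
    int refcnt;                /* plain int in this sketch */
};

/* Single bucket lock for the sketch; a real hash has one per bucket. */
static pthread_mutex_t bucket_lock = PTHREAD_MUTEX_INITIALIZER;

void psock_init(struct psock *p, struct entry *head, int refcnt)
{
    pthread_mutex_init(&p->maps_lock, NULL);
    p->maps = head;
    p->refcnt = refcnt;
}

/* Pop the head of the maps list while holding only the maps lock. */
struct entry *psock_map_pop(struct psock *p)
{
    pthread_mutex_lock(&p->maps_lock);
    struct entry *e = p->maps;
    if (e)
        p->maps = e->next_ref;
    pthread_mutex_unlock(&p->maps_lock);
    return e;
}

/*
 * Close path: the two locks are never held at the same time, so there is
 * no ordering between them to invert. Returns how many entries this path
 * unlinked itself.
 */
int close_sock(struct psock *p)
{
    int removed = 0;
    struct entry *e;

    assert(p != NULL);
    while ((e = psock_map_pop(p)) != NULL) {
        pthread_mutex_lock(&bucket_lock);
        if (e->in_bucket) {   /* skip entries a parallel delete already removed */
            e->in_bucket = false;
            p->refcnt--;
            removed++;
        }
        pthread_mutex_unlock(&bucket_lock);
    }
    return removed;
}
```

      The check under the bucket lock plays the role of the
      lookup_elem_raw() test in the commit message: only the path that still
      finds the element unlinks it and drops the reference.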
    • bpf: sockmap, fix smap_list_map_remove when psock is in many maps · 54fedb42
      John Fastabend committed
      If a hashmap is free'd with open socks it removes the reference to
      the hash entry from the psock. If that is the last reference to the
      psock then it will also be free'd by the reference counting logic.
      However the current logic that removes the hash reference from the
      list of references is broken. In smap_list_remove() we first check
      if the sockmap entry matches and then check if the hashmap entry
      matches. But the sockmap entry will always match because it's NULL in
      this case which causes the first entry to be removed from the list.
      If this is always the "right" entry (because the user adds/removes
      entries in order) then everything is OK but otherwise a subsequent
      bpf_tcp_close() may reference a free'd object.
      
      To fix this create two list handlers one for sockmap and one for
      sockhash.
      
      Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
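
      The failure mode described in the commit above, where a NULL sockmap
      entry "matches" every sockhash reference, can be reproduced with a
      small list model. The structure and function names below
      (`struct map_ref`, `remove_combined`, `remove_hash_ref`) are
      hypothetical and only illustrate the idea of one dedicated handler
      per map type.

```c
#include <assert.h>
#include <stddef.h>

/* One reference held by a psock; exactly one of the two fields is set. */
struct map_ref {
    const void *map_entry;  /* set for sockmap references, else NULL */
    const void *hash_link;  /* set for sockhash references, else NULL */
    struct map_ref *next;
};

/*
 * Combined matcher in the spirit of the bug: when called for a sockhash
 * reference, map_entry is NULL and compares equal to the NULL map_entry of
 * the first sockhash reference on the list, so the wrong node is unlinked.
 */
struct map_ref *remove_combined(struct map_ref **head,
                                const void *map_entry, const void *hash_link)
{
    for (struct map_ref **pp = head; *pp; pp = &(*pp)->next) {
        struct map_ref *e = *pp;
        if (e->map_entry == map_entry || e->hash_link == hash_link) {
            *pp = e->next;
            return e;
        }
    }
    return NULL;
}

/* Dedicated sockhash handler: compares the hash link and nothing else. */
struct map_ref *remove_hash_ref(struct map_ref **head, const void *hash_link)
{
    assert(hash_link != NULL);
    for (struct map_ref **pp = head; *pp; pp = &(*pp)->next) {
        struct map_ref *e = *pp;
        if (e->hash_link == hash_link) {
            *pp = e->next;
            return e;
        }
    }
    return NULL;
}
```

      A matching sockmap handler would compare only `map_entry` in the same
      way.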
    • bpf: sockmap, fix crash when ipv6 sock is added · 9901c5d7
      John Fastabend committed
      This fixes a crash where we assign tcp_prot to IPv6 sockets instead
      of tcpv6_prot.
      
      Previously we overwrote the sk->prot field with tcp_prot even in the
      AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
      are used.
      
      Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
      'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
      crashing case here.
      
      Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
      Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 883c9ab9
      Linus Torvalds committed
      Pull parisc fixes and cleanups from Helge Deller:
       "Nothing exciting in this patchset, just
      
         - small cleanups of header files
      
         - default to 4 CPUs when building a SMP kernel
      
         - mark 16kB and 64kB page sizes broken
      
         - addition of the new io_pgetevents syscall"
      
      * 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Build kernel without -ffunction-sections
        parisc: Reduce debug output in unwind code
        parisc: Wire up io_pgetevents syscall
        parisc: Default to 4 SMP CPUs
        parisc: Convert printk(KERN_LEVEL) to pr_lvl()
        parisc: Mark 16kB and 64kB page sizes BROKEN
        parisc: Drop struct sigaction from not exported header file
    • Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 08af78d7
      Linus Torvalds committed
      Pull ARM SoC fixes from Olof Johansson:
       "A smaller batch for the end of the week (let's see if I can keep the
        weekly cadence going for once).
      
        All medium-grade fixes here, nothing worrisome:
      
         - Fixes for some fairly old bugs around SD card write-protect
           detection and GPIO interrupt assignments on Davinci.
      
         - Wifi module suspend fix for Hikey.
      
         - Minor DT tweaks to fix inaccuracies for Amlogic platforms, one
           of which solves booting with third-party u-boot"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        arm64: dts: hikey960: Define wl1837 power capabilities
        arm64: dts: hikey: Define wl1835 power capabilities
        ARM64: dts: meson-gxl: fix Mali GPU compatible string
        ARM64: dts: meson-axg: fix ethernet stability issue
        ARM64: dts: meson-gx: fix ATF reserved memory region
        ARM64: dts: meson-gxl-s905x-p212: Add phy-supply for usb0
        ARM64: dts: meson: fix register ranges for SD/eMMC
        ARM64: dts: meson: disable sd-uhs modes on the libretech-cc
        ARM: dts: da850: Fix interrups property for gpio
        ARM: davinci: board-da850-evm: fix WP pin polarity for MMC/SD
    • Merge tag 'kbuild-fixes-v4.18' of... · 22d3e0c3
      Linus Torvalds committed
      Merge tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - introduce __diag_* macros and suppress -Wattribute-alias warnings
         from GCC 8
      
       - fix stack protector test script for x86_64
      
       - fix line number handling in Kconfig
      
       - document that '#' starts a comment in Kconfig
      
       - handle P_SYMBOL property in dump debugging of Kconfig
      
       - correct help message of LD_DEAD_CODE_DATA_ELIMINATION
      
       - fix occasional segmentation faults in Kconfig
      
      * tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kconfig: loop boundary condition fix
        kbuild: reword help of LD_DEAD_CODE_DATA_ELIMINATION
        kconfig: handle P_SYMBOL in print_symbol()
        kconfig: document Kconfig source file comments
        kconfig: fix line numbers for if-entries in menu tree
        stack-protector: Fix test with 32-bit userland and CONFIG_64BIT=y
        powerpc: Remove -Wattribute-alias pragmas
        disable -Wattribute-alias warning for SYSCALL_DEFINEx()
        kbuild: add macro for controlling warnings to linux/compiler.h
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0fbc4aea
      Linus Torvalds committed
      Pull x86 fixes from Ingo Molnar:
       "The biggest diffstat comes from self-test updates, plus there's entry
        code fixes, 5-level paging related fixes, console debug output fixes,
        and misc fixes"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Clean up the printk()s in show_fault_oops()
        x86/mm: Drop unneeded __always_inline for p4d page table helpers
        x86/efi: Fix efi_call_phys_epilog() with CONFIG_X86_5LEVEL=y
        selftests/x86/sigreturn: Do minor cleanups
        selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs
        x86/entry/64/compat: Fix "x86/entry/64/compat: Preserve r8-r11 in int $0x80"
        x86/mm: Don't free P4D table when it is folded at runtime
        x86/entry/32: Add explicit 'l' instruction suffix
        x86/mm: Get rid of KERN_CONT in show_fault_oops()
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d7d53886
      Linus Torvalds committed
      Pull perf fixes from Ingo Molnar:
       "Tooling fixes mostly, plus a build warning fix"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
        perf/core: Move inline keyword at the beginning of declaration
        tools/headers: Pick up latest kernel ABIs
        perf tools: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE]
        perf script: Fix crash because of missing evsel->priv
        perf script: Add missing output fields in a hint
        perf bench: Fix numa report output code
        perf stat: Remove duplicate event counting
        perf alias: Rebuild alias expression string to make it comparable
        perf alias: Remove trailing newline when reading sysfs files
        perf tools: Fix a clang 7.0 compilation error
        tools include uapi: Synchronize bpf.h with the kernel
        tools include uapi: Update if_link.h to pick IFLA_{BRPORT_ISOLATED,VXLAN_TTL_INHERIT}
        tools include powerpc: Update arch/powerpc/include/uapi/asm/unistd.h copy to get 'rseq' syscall
        perf tools: Update x86's syscall_64.tbl, adding 'io_pgetevents' and 'rseq'
        tools headers uapi: Synchronize drm/drm.h
        perf intel-pt: Fix packet decoding of CYC packets
        perf tests: Add valid callback for parse-events test
        perf tests: Add event parsing error handling to parse events test
        perf report powerpc: Fix crash if callchain is empty
        perf test session topology: Fix test on s390
        ...
    • Merge tag 'selinux-pr-20180629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 34a484d5
      Linus Torvalds committed
      Pull selinux fix from Paul Moore:
       "One fairly straightforward patch to fix a longstanding issue where a
        process could stall while accessing files in selinuxfs and block
        everyone else due to a held mutex.
      
        The patch passes all our tests and looks to apply cleanly to your
        current tree"
      
      * tag 'selinux-pr-20180629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: move user accesses in selinuxfs out of locked regions
    • Merge tag 'for-linus-20180629' of git://git.kernel.dk/linux-block · e6e5bec4
      Linus Torvalds committed
      Pull block fixes from Jens Axboe:
       "Small set of fixes for this series. Mostly just minor fixes, the only
        oddball in here is the sg change.
      
        The sg change came out of the stall fix for NVMe, where we added a
        mempool and limited us to a single page allocation. CONFIG_SG_DEBUG
        sort-of ruins that, since we'd need to account for that. That's
        actually a generic problem, since lots of drivers need to allocate SG
        lists. So this just removes support for CONFIG_SG_DEBUG, which I added
        back in 2007 and to my knowledge it was never useful.
      
        Anyway, outside of that, this pull contains:
      
         - clone of request with special payload fix (Bart)
      
         - drbd discard handling fix (Bart)
      
         - SATA blk-mq stall fix (me)
      
         - chunk size fix (Keith)
      
         - double free nvme rdma fix (Sagi)"
      
      * tag 'for-linus-20180629' of git://git.kernel.dk/linux-block:
        sg: remove ->sg_magic member
        drbd: Fix drbd_request_prepare() discard handling
        blk-mq: don't queue more if we get a busy return
        block: Fix cloning of requests with a special payload
        nvme-rdma: fix possible double free of controller async event buffer
        block: Fix transfer when chunk sectors exceeds max
  4. 30 Jun, 2018 10 commits
    • net: fib_rules: bring back rule_exists to match rule during add · 35e8c7ba
      Roopa Prabhu committed
      After commit f9d4b0c1 ("fib_rules: move common handling of newrule
      delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find
      for existing rule lookup in both the add and del paths. While this
      is good for the delete path and solves a few problems, it opens up
      a few invalid key matches in the add path.
      
      $ ip -4 rule add table main tos 10 fwmark 1
      $ ip -4 rule add table main tos 10
      RTNETLINK answers: File exists
      
      The problem here is that rule_find does not check if the key masks in
      the new and old rule are the same and hence ends up matching a more
      specific rule. Rule key masks cannot be easily compared today without
      an elaborate if-else block. It's best to introduce key masks for easier
      and accurate rule comparison in the future. Until then, due to fear of
      regressions, this patch re-introduces the older loose rule_exists during add.
      It also fixes both rule_exists and rule_find to cover missing attributes.
      
      Fixes: f9d4b0c1 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
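
      The key-mask comparison that the commit above suggests as a future
      direction might look roughly like the sketch below. It is not the
      kernel's rule_find or rule_exists: `struct rule_sketch`, the `KEY_*`
      bits, and both matcher functions are hypothetical, and only two
      selectors (tos and fwmark) are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bits recording which selectors a rule actually sets. */
enum { KEY_TOS = 1u << 0, KEY_FWMARK = 1u << 1 };

struct rule_sketch {
    uint32_t key_mask;
    uint8_t tos;
    uint32_t fwmark;
};

/*
 * Compares only the selectors present in the new rule. A more specific
 * existing rule (tos + fwmark) therefore "matches" a new rule that sets
 * tos alone, which is the false "File exists" case shown above.
 */
bool match_new_keys_only(const struct rule_sketch *old,
                         const struct rule_sketch *new_rule)
{
    assert(old != NULL && new_rule != NULL);
    if ((new_rule->key_mask & KEY_TOS) && old->tos != new_rule->tos)
        return false;
    if ((new_rule->key_mask & KEY_FWMARK) && old->fwmark != new_rule->fwmark)
        return false;
    return true;
}

/* Mask-aware comparison: both rules must set the same selectors too. */
bool match_with_masks(const struct rule_sketch *old,
                      const struct rule_sketch *new_rule)
{
    assert(old != NULL && new_rule != NULL);
    return old->key_mask == new_rule->key_mask &&
           match_new_keys_only(old, new_rule);
}
```

      With the two rules from the example, the first matcher reports a
      duplicate while the mask-aware one does not.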
    • hv_netvsc: split sub-channel setup into async and sync · 3ffe64f1
      Stephen Hemminger committed
      When doing device hotplug the sub channel must be async to avoid
      deadlock issues because device is discovered in softirq context.
      
      When doing changes to MTU and number of channels, the setup
      must be synchronous to avoid races such as when MTU and device
      settings are done in a single ip command.

      Reported-by: Thomas Walker <Thomas.Walker@twosigma.com>
      Fixes: 8195b139 ("hv_netvsc: fix deadlock on hotplug")
      Fixes: 732e4985 ("netvsc: fix race on sub channel creation")
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN · 3f76df19
      Cong Wang committed
      As noticed by Eric, we need to switch to the helper
      dev_change_tx_queue_len() for the SIOCSIFTXQLEN call path too,
      otherwise we still miss dev_qdisc_change_tx_queue_len().
      
      Fixes: 6a643ddb ("net: introduce helper dev_change_tx_queue_len()")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: zatm: Fix potential Spectre v1 · ced9e191
      Gustavo A. R. Silva committed
      pool can be indirectly controlled by user-space, hence leading to
      a potential exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      drivers/atm/zatm.c:1491 zatm_ioctl() warn: potential spectre issue
      'zatm_dev->pool_info' (local cap)
      
      Fix this by sanitizing pool before using it to index
      zatm_dev->pool_info
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
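
      The idea of sanitizing an index before using it, as in the commit
      above, can be illustrated with branch-free masking arithmetic. The
      sketch below shows only the arithmetic and is not a real Spectre
      mitigation on its own: a userspace compiler gives no guarantee about
      speculation, and the function names are hypothetical. It assumes
      that right-shifting a negative `long` is an arithmetic shift, which
      is implementation-defined in C but holds on common compilers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Return an all-ones mask when index < size and zero otherwise, without a
 * conditional branch. If index >= size, either index or (size - 1 - index)
 * has its top bit set, so the complement is non-negative and shifts to 0.
 */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index))
                           >> (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp an out-of-range index to 0; in-range values pass through unchanged. */
unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
    assert(size > 0 && size <= (unsigned long)LONG_MAX);
    return index & index_mask_nospec(index, size);
}
```

      A caller would still perform its normal bounds check first and then
      index the array with the clamped value.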
    • Merge branch 's390-qeth-fixes' · c7f653e0
      David S. Miller committed
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2018-06-29
      
      please apply a few qeth fixes for -net and your 4.17 stable queue.
      
      Patches 1-3 fix several issues wrt MAC address management that were
      introduced during the 4.17 cycle.
      Patch 4 tackles a long-standing issue with busy multi-connection workloads
      on devices in af_iucv mode.
      Patch 5 makes sure to re-enable all active HW offloads, after a card was
      previously set offline and thus lost its HW context.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: consistently re-enable device features · d025da9e
      Julian Wiedmann committed
      commit e830baa9 ("qeth: restore device features after recovery") and
      commit ce344356 ("s390/qeth: rely on kernel for feature recovery")
      made sure that the HW functions for device features get re-programmed
      after recovery.
      
      But we missed that the same handling is also required when a card is
      first set offline (destroying all HW context), and then online again.
      Fix this by moving the re-enable action out of the recovery-only path.

      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • s390/qeth: don't clobber buffer on async TX completion · ce28867f
      Julian Wiedmann committed
      If qeth_qdio_output_handler() detects that a transmit requires async
      completion, it replaces the pending buffer's metadata object
      (qeth_qdio_out_buffer) so that this queue buffer can be re-used while
      the data is pending completion.
      
      Later when the CQ indicates async completion of such a metadata object,
      qeth_qdio_cq_handler() tries to free any data associated with this
      object (since HW has now completed the transfer). By calling
      qeth_clear_output_buffer(), it erroneously operates on the queue buffer
      that _previously_ belonged to this transfer ... but which has
      potentially been re-used several times by now.
      This results in double frees of the buffer's data, and in failing
      transmits as the buffer descriptor is scrubbed in mid-air.
      
      The correct way of handling this situation is to
      1. scrub the queue buffer when it is prepared for re-use, and
      2. later obtain the data addresses from the async-completion notifier
         (i.e. the AOB), instead of the queue buffer.
      
      All this only affects qeth devices used for af_iucv HiperTransport.
      
      Fixes: 0da9581d ("qeth: exploit asynchronous delivery of storage blocks")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce28867f
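The two-step handling described in this commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `slot_model`, `aob_model`, `slot_defer_to_aob()` and `aob_complete()` are hypothetical names, not qeth structures or functions, and copying the addresses into the notifier at hand-over time is a modelling shortcut rather than a description of what the hardware does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_ELEMS 4

/* Hypothetical stand-in for the async-completion notifier (AOB). */
struct aob_model {
	void *data[MAX_ELEMS];	/* data addresses of the pending transfer */
	size_t count;
};

/* Hypothetical stand-in for one queue buffer slot. */
struct slot_model {
	void *data[MAX_ELEMS];
	size_t count;
};

/*
 * Step 1: the transfer needs async completion. Record its data addresses
 * in the notifier and scrub the slot so that it can be re-used right away.
 */
void slot_defer_to_aob(struct slot_model *slot, struct aob_model *aob)
{
	size_t i;

	assert(slot != NULL && aob != NULL && slot->count <= MAX_ELEMS);
	aob->count = slot->count;
	for (i = 0; i < slot->count; i++) {
		aob->data[i] = slot->data[i];
		slot->data[i] = NULL;
	}
	slot->count = 0;
}

/*
 * Step 2: async completion arrived. Free only what the notifier recorded;
 * the slot is not touched, since it may already carry a newer transfer.
 * Returns the number of data elements that were freed.
 */
size_t aob_complete(struct aob_model *aob)
{
	size_t i, freed = 0;

	assert(aob != NULL && aob->count <= MAX_ELEMS);
	for (i = 0; i < aob->count; i++) {
		free(aob->data[i]);
		aob->data[i] = NULL;
		freed++;
	}
	aob->count = 0;
	return freed;
}
```

Because completion never looks at the slot, a slot that has been re-used in the meantime keeps its new data intact, which is the property the commit message is after.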
    • V
      s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6] · 9d0a58fb
      Vasily Gorbik committed
      The *ether_addr*_64bits functions were introduced to optimize
      performance-critical paths: they access a 6-byte Ethernet address as a
      u64 value to get "nice" assembly. This otherwise harmless hack works on
      Ethernet addresses embedded in a structure or a larger buffer, but KASAN
      flags it on something like a plain (u8 *)[6].
      
      qeth_l2_set_mac_address calls qeth_l2_remove_mac passing
      u8 old_addr[ETH_ALEN] as an argument.
      
      Adding/removing MACs for an Ethernet adapter is not that performance
      critical. Moreover, is_multicast_ether_addr_64bits itself is not faster
      than is_multicast_ether_addr on s390:
      
      is_multicast_ether_addr(%r2) -> %r2
      llc	%r2,0(%r2)
      risbg	%r2,%r2,63,191,0
      
      is_multicast_ether_addr_64bits(%r2) -> %r2
      llgc	%r2,0(%r2)
      risbg	%r2,%r2,63,191,0
      
      So, let's just use is_multicast_ether_addr instead of
      is_multicast_ether_addr_64bits.
      
      Fixes: bcacfcbc ("s390/qeth: fix MAC address update sequence")
      Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d0a58fb
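The difference between the two helpers comes down to how many bytes they read. The following user-space sketch models that difference. `mc_check_bytewise()` and `mc_check_64bits()` are hypothetical names, not the kernel helpers, and the 64-bit model extracts the first byte in memory order instead of using the kernel's endian-specific shifts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Byte-wise check: reads only addr[0], so a plain 6-byte array is fine. */
bool mc_check_bytewise(const uint8_t addr[ETH_ALEN])
{
	assert(addr != NULL);
	return (addr[0] & 0x01) != 0;
}

/*
 * Model of the 64-bit variant: it loads 8 bytes starting at addr, so the
 * caller must guarantee 2 readable bytes after the 6-byte address.
 */
bool mc_check_64bits(const uint8_t addr[ETH_ALEN + 2])
{
	uint64_t word;
	uint8_t first;

	assert(addr != NULL);
	memcpy(&word, addr, sizeof(word));	/* reads addr[0..7] */
	memcpy(&first, &word, sizeof(first));	/* first byte in memory order */
	return (first & 0x01) != 0;
}
```

Both functions return the same result for any address. Only the 64-bit model touches the two bytes past the address, which is why it must not be handed a bare `u8 old_addr[ETH_ALEN]`.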
    • J
      s390/qeth: fix race when setting MAC address · 4789a218
      Julian Wiedmann committed
      When qeth_l2_set_mac_address() finds the card in a non-reachable state,
      it merely copies the new MAC address into dev->dev_addr so that
      __qeth_l2_set_online() can later register it with the HW.
      
      But __qeth_l2_set_online() may very well be running concurrently, so we
      can't trust the card state without appropriate locking:
      If the online sequence is past the point where it registers
      dev->dev_addr (but not yet in SOFTSETUP state), any address change needs
      to be properly programmed into the HW. Otherwise the netdevice ends up
      with a different MAC address than what's set in the HW, and inbound
      traffic is not forwarded as expected.
      
      This is most likely to occur for OSD in LPAR, where
      commit 21b1702a ("s390/qeth: improve fallback to random MAC address")
      now triggers e.g. systemd to immediately change the MAC when the netdevice
      is registered with a NET_ADDR_RANDOM address.
      
      Fixes: bcacfcbc ("s390/qeth: fix MAC address update sequence")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4789a218
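The race described in this commit message can be pictured with a small user-space model in which the online sequence and the MAC update take the same lock. This is a sketch under assumptions, not the driver's code: `card_model` and its functions are hypothetical, and a pthread mutex stands in for whatever lock the driver actually uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical model of the card state relevant to MAC handling. */
struct card_model {
	pthread_mutex_t lock;
	bool mac_registered;		/* online path already registered dev_addr */
	uint8_t dev_addr[ETH_ALEN];	/* address the netdevice reports */
	uint8_t hw_addr[ETH_ALEN];	/* address programmed into the HW */
};

void card_model_init(struct card_model *c)
{
	assert(c != NULL);
	pthread_mutex_init(&c->lock, NULL);
	c->mac_registered = false;
	memset(c->dev_addr, 0, ETH_ALEN);
	memset(c->hw_addr, 0, ETH_ALEN);
}

/* Online path: registers the current dev_addr with the HW. */
void card_model_set_online(struct card_model *c)
{
	assert(c != NULL);
	pthread_mutex_lock(&c->lock);
	memcpy(c->hw_addr, c->dev_addr, ETH_ALEN);
	c->mac_registered = true;
	pthread_mutex_unlock(&c->lock);
}

/*
 * MAC update: the state check happens under the same lock, so it cannot
 * observe a half-finished online sequence. If the address was already
 * registered, the new one is programmed into the HW as well.
 */
void card_model_set_mac(struct card_model *c, const uint8_t mac[ETH_ALEN])
{
	assert(c != NULL && mac != NULL);
	pthread_mutex_lock(&c->lock);
	if (c->mac_registered)
		memcpy(c->hw_addr, mac, ETH_ALEN);
	memcpy(c->dev_addr, mac, ETH_ALEN);
	pthread_mutex_unlock(&c->lock);
}
```

With both paths serialized, `dev_addr` and `hw_addr` agree once the card is online, regardless of whether the update arrived before or after registration.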
    • J
      Revert "s390/qeth: use Read device to query hypervisor for MAC" · 46646105
      Julian Wiedmann committed
      This reverts commit b7493e91.
      
      On its own, querying RDEV for a MAC address works fine. But when upgrading
      from a qeth that previously queried DDEV on a z/VM NIC (i.e. any kernel with
      commit ec61bd2f), the RDEV query now returns a _different_ MAC address
      than the DDEV query.
      
      If the NIC is configured with MACPROTECT, z/VM apparently requires us to
      use the MAC that was initially returned (on DDEV) and registered. So after
      upgrading to a kernel that uses RDEV, the SETVMAC registration cmd for the
      new MAC address fails and we end up with an inoperable interface.
      
      To avoid regressions on upgrade, switch back to using DDEV for the MAC
      address query. The downgrade path (first RDEV, later DDEV) is fine: in this
      case both queries return the same MAC address.
      
      Fixes: b7493e91 ("s390/qeth: use Read device to query hypervisor for MAC")
      Reported-by: Michal Kubecek <mkubecek@suse.com>
      Tested-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      46646105