1. 10 8月, 2017 27 次提交
    • F
      net: call newid/getid without rtnl mutex held · 165b9117
      Florian Westphal 提交于
      Both functions take nsid_lock and don't rely on rtnl lock.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      165b9117
    • F
      rtnetlink: add RTNL_FLAG_DOIT_UNLOCKED · 62256f98
      Florian Westphal 提交于
      Allow callers to tell rtnetlink core that its doit callback
      should be invoked without holding rtnl mutex.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62256f98
    • F
      rtnetlink: protect handler table with rcu · 6853dd48
      Florian Westphal 提交于
      Note that netlink dumps still acquire rtnl mutex via the netlink
      dump infrastructure.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6853dd48
    • F
      rtnetlink: small rtnl lock pushdown · 0cc09020
      Florian Westphal 提交于
      instead of rtnl lock/unload at the top level, push it down
      to the called function.
      
      This is just an intermediate step, next commit switches protection
      of the rtnl_link ops table to rcu, in which case (for dumps) the
      rtnl lock is acquired only by the netlink dumper infrastructure
      (current lock/unlock/dump/lock/unlock rtnl sequence becomes
       rcu lock/rcu unlock/dump).
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0cc09020
    • F
      rtnetlink: add reference counting to prevent module unload while dump is in progress · 019a3169
      Florian Westphal 提交于
      I don't see what prevents rmmod (unregister_all is called) while a dump
      is active.
      
      Even if we'd add rtnl lock/unlock pair to unregister_all (as done here),
      thats not enough either as rtnl_lock is released right before the dump
      process starts.
      
      So this adds a refcount:
       * acquire rtnl mutex
       * bump refcount
       * release mutex
       * start the dump
      
      ... and make unregister_all remove the callbacks (no new dumps possible)
      and then wait until refcount is 0.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      019a3169
    • F
      rtnetlink: make rtnl_register accept a flags parameter · b97bac64
      Florian Westphal 提交于
      This change allows us to later indicate to rtnetlink core that certain
      doit functions should be called without acquiring rtnl_mutex.
      
      This change should have no effect, we simply replace the last (now
      unused) calcit argument with the new flag.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b97bac64
    • F
      rtnetlink: call rtnl_calcit directly · e1fa6d21
      Florian Westphal 提交于
      There is only a single place in the kernel that regisers the "calcit"
      callback (to determine min allocation for dumps).
      
      This is in rtnetlink.c for PF_UNSPEC RTM_GETLINK.
      The function that checks for calcit presence at run time will first check
      the requested family (which will always fail for !PF_UNSPEC as no callsite
      registers this), then falls back to checking PF_UNSPEC.
      
      Therefore we can just check if type is RTM_GETLINK and then do a direct
      call.  Because of fallback to PF_UNSPEC all RTM_GETLINK types used this
      regardless of family.
      
      This has the advantage that we don't need to allocate space for
      the function pointer for all the other families.
      
      A followup patch will drop the calcit function pointer from the
      rtnl_link callback structure.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1fa6d21
    • D
      Merge branch 'bpf-new-branches' · 078295fb
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      bpf: Add BPF_J{LT,LE,SLT,SLE} instructions
      
      This set adds BPF_J{LT,LE,SLT,SLE} instructions to the BPF
      insn set, interpreter, JIT hardening code and all JITs are
      also updated to support the new instructions. Basic idea is
      to reduce register pressure by avoiding BPF_J{GT,GE,SGT,SGE}
      rewrites. Removing the workaround for the rewrites in LLVM,
      this can result in shorter BPF programs, less stack usage
      and less verification complexity. First patch provides some
      more details on rationale and integration.
      
      Thanks a lot!
      
      v1 -> v2:
        - Reworded commit msg in patch 1
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      078295fb
    • D
      bpf: add test cases for new BPF_J{LT, LE, SLT, SLE} instructions · 31e482bf
      Daniel Borkmann 提交于
      Add test cases to the verifier selftest suite in order to verify that
      i) direct packet access, and ii) dynamic map value access is working
      with the changes related to the new instructions.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31e482bf
    • D
      bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier · b4e432f1
      Daniel Borkmann 提交于
      Enable the newly added jump opcodes, main parts are in two
      different areas, namely direct packet access and dynamic map
      value access. For the direct packet access, we now allow for
      the following two new patterns to match in order to trigger
      markings with find_good_pkt_pointers():
      
      Variant 1 (access ok when taking the branch):
      
        0: (61) r2 = *(u32 *)(r1 +76)
        1: (61) r3 = *(u32 *)(r1 +80)
        2: (bf) r0 = r2
        3: (07) r0 += 8
        4: (ad) if r0 < r3 goto pc+2
        R0=pkt(id=0,off=8,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
        R3=pkt_end R10=fp
        5: (b7) r0 = 0
        6: (95) exit
      
        from 4 to 7: R0=pkt(id=0,off=8,r=8) R1=ctx
                     R2=pkt(id=0,off=0,r=8) R3=pkt_end R10=fp
        7: (71) r0 = *(u8 *)(r2 +0)
        8: (05) goto pc-4
        5: (b7) r0 = 0
        6: (95) exit
        processed 11 insns, stack depth 0
      
      Variant 2 (access ok on fall-through):
      
        0: (61) r2 = *(u32 *)(r1 +76)
        1: (61) r3 = *(u32 *)(r1 +80)
        2: (bf) r0 = r2
        3: (07) r0 += 8
        4: (bd) if r3 <= r0 goto pc+1
        R0=pkt(id=0,off=8,r=8) R1=ctx R2=pkt(id=0,off=0,r=8)
        R3=pkt_end R10=fp
        5: (71) r0 = *(u8 *)(r2 +0)
        6: (b7) r0 = 1
        7: (95) exit
      
        from 4 to 6: R0=pkt(id=0,off=8,r=0) R1=ctx
                     R2=pkt(id=0,off=0,r=0) R3=pkt_end R10=fp
        6: (b7) r0 = 1
        7: (95) exit
        processed 10 insns, stack depth 0
      
      The above two basically just swap the branches where we need
      to handle an exception and allow packet access compared to the
      two already existing variants for find_good_pkt_pointers().
      
      For the dynamic map value access, we add the new instructions
      to reg_set_min_max() and reg_set_min_max_inv() in order to
      learn bounds. Verifier test cases for both are added in a
      follow-up patch.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4e432f1
    • D
      bpf, nfp: implement jiting of BPF_J{LT,LE} · 5dd294d4
      Daniel Borkmann 提交于
      This work implements jiting of BPF_J{LT,LE} instructions with
      BPF_X/BPF_K variants for the nfp eBPF JIT. The two BPF_J{SLT,SLE}
      instructions have not been added yet given BPF_J{SGT,SGE} are
      not supported yet either.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5dd294d4
    • D
      bpf, ppc64: implement jiting of BPF_J{LT, LE, SLT, SLE} · 20dbf5cc
      Daniel Borkmann 提交于
      This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
      with BPF_X/BPF_K variants for the ppc64 eBPF JIT.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Tested-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20dbf5cc
    • D
      bpf, s390x: implement jiting of BPF_J{LT, LE, SLT, SLE} · 3b497806
      Daniel Borkmann 提交于
      This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
      with BPF_X/BPF_K variants for the s390x eBPF JIT.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b497806
    • D
      bpf, sparc64: implement jiting of BPF_J{LT, LE, SLT, SLE} · 18423550
      Daniel Borkmann 提交于
      This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
      with BPF_X/BPF_K variants for the sparc64 eBPF JIT.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18423550
    • D
      bpf, arm64: implement jiting of BPF_J{LT, LE, SLT, SLE} · c362b2f3
      Daniel Borkmann 提交于
      This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
      with BPF_X/BPF_K variants for the arm64 eBPF JIT.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c362b2f3
    • D
      bpf, x86: implement jiting of BPF_J{LT,LE,SLT,SLE} · 52afc51e
      Daniel Borkmann 提交于
      This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
      with BPF_X/BPF_K variants for the x86_64 eBPF JIT.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52afc51e
    • D
      bpf: add BPF_J{LT,LE,SLT,SLE} instructions · 92b31a9a
      Daniel Borkmann 提交于
      Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=),
      BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that
      particularly *JLT/*JLE counterparts involving immediates need
      to be rewritten from e.g. X < [IMM] by swapping arguments into
      [IMM] > X, meaning the immediate first is required to be loaded
      into a register Y := [IMM], such that then we can compare with
      Y > X. Note that the destination operand is always required to
      be a register.
      
      This has the downside of having unnecessarily increased register
      pressure, meaning complex program would need to spill other
      registers temporarily to stack in order to obtain an unused
      register for the [IMM]. Loading to registers will thus also
      affect state pruning since we need to account for that register
      use and potentially those registers that had to be spilled/filled
      again. As a consequence slightly more stack space might have
      been used due to spilling, and BPF programs are a bit longer
      due to extra code involving the register load and potentially
      required spill/fills.
      
      Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=)
      counterparts to the eBPF instruction set. Modifying LLVM to
      remove the NegateCC() workaround in a PoC patch at [1] and
      allowing it to also emit the new instructions resulted in
      cilium's BPF programs that are injected into the fast-path to
      have a reduced program length in the range of 2-3% (e.g.
      accumulated main and tail call sections from one of the object
      file reduced from 4864 to 4729 insns), reduced complexity in
      the range of 10-30% (e.g. accumulated sections reduced in one
      of the cases from 116432 to 88428 insns), and reduced stack
      usage in the range of 1-5% (e.g. accumulated sections from one
      of the object files reduced from 824 to 784b).
      
      The modification for LLVM will be incorporated in a backwards
      compatible way. Plan is for LLVM to have i) a target specific
      option to offer a possibility to explicitly enable the extension
      by the user (as we have with -m target specific extensions today
      for various CPU insns), and ii) have the kernel checked for
      presence of the extensions and enable them transparently when
      the user is selecting more aggressive options such as -march=native
      in a bpf target context. (Other frontends generating BPF byte
      code, e.g. ply can probe the kernel directly for its code
      generation.)
      
        [1] https://github.com/borkmann/llvm/tree/bpf-insnsSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92b31a9a
    • D
      Merge branch 'net-zerocopy-fixes' · 0bdf7101
      David S. Miller 提交于
      Willem de Bruijn says:
      
      ====================
      net: zerocopy fixes
      
      Fix two issues introduced in the msg_zerocopy patchset.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0bdf7101
    • W
      sock: fix zerocopy_success regression with msg_zerocopy · 0a4a060b
      Willem de Bruijn 提交于
      Do not use uarg->zerocopy outside msg_zerocopy. In other paths the
      field is not explicitly initialized and aliases another field.
      
      Those paths have only one reference so do not need this intermediate
      variable. Call uarg->callback directly.
      
      Fixes: 1f8b977a ("sock: enable MSG_ZEROCOPY")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a4a060b
    • W
      sock: fix zerocopy panic in mem accounting · ccaffff1
      Willem de Bruijn 提交于
      Only call mm_unaccount_pinned_pages when releasing a struct ubuf_info
      that has initialized its field uarg->mmp.
      
      Before this patch, a vhost-net with experimental_zcopytx can crash in
      
        mm_unaccount_pinned_pages
        sock_zerocopy_put
        skb_zcopy_clear
        skb_release_data
      
      Only sock_zerocopy_alloc initializes this field. Move the unaccount
      call from generic sock_zerocopy_put to its specific callback
      sock_zerocopy_callback.
      
      Fixes: a91dbff5 ("sock: ulimit on MSG_ZEROCOPY pages")
      Reported-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccaffff1
    • D
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · d5e7f827
      David S. Miller 提交于
      Jeff Kirsher says:
      
      ====================
      1GbE Intel Wired LAN Driver Updates 2017-08-08
      
      This series contains updates to e1000e and igb/igbvf.
      
      Gangfeng Huang fixes an issue with receive network flow classification,
      where igb_nfc_filter_exit() was not being called in igb_down() which
      would cause the filter tables to "fill up" if a user where to change
      the adapter settings (such as speed) which requires a reset of the
      adapter.
      
      Cliff Spradlin fixes a timestamping issue, where igb was allowing requests
      for hardware timestamping even if it was not configured for hardware
      transmit timestamping.
      
      Corinna Vinschen removes the error message that there was an "unexpected
      SYS WRAP", when it is actually expected.  So remove the message to not
      confuse users.
      
      Greg Edwards provides several patches for the mailbox interface between
      the PF and VF drivers.  Added a mailbox unlock method to be used to unlock
      the PF/VF mailbox by the PF.  Added a lock around the VF mailbox ops to
      prevent the VF from sending another message while the PF is still
      processing the previous message.  Fixed a "scheduling while atomic" issue
      by changing msleep() to mdelay().
      
      Sasha adds support for the next LOM generations i219 (v8 & v9) which
      will be available in the next Intel client platform IceLake.
      
      John Linville adds support for a Broadcom PHY to the igb driver, since
      there are designs out in the world which use the igb MAC and a third
      party PHY.  This allows the driver to load and function as expected on
      these designs.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5e7f827
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 3118e6e1
      David S. Miller 提交于
      The UDP offload conflict is dealt with by simply taking what is
      in net-next where we have removed all of the UFO handling code
      entirely.
      
      The TCP conflict was a case of local variables in a function
      being removed from both net and net-next.
      
      In netvsc we had an assignment right next to where a missing
      set of u64 stats sync object inits were added.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3118e6e1
    • M
      futex: Remove unnecessary warning from get_futex_key · 48fb6f4d
      Mel Gorman 提交于
      Commit 65d8fc77 ("futex: Remove requirement for lock_page() in
      get_futex_key()") removed an unnecessary lock_page() with the
      side-effect that page->mapping needed to be treated very carefully.
      
      Two defensive warnings were added in case any assumption was missed and
      the first warning assumed a correct application would not alter a
      mapping backing a futex key.  Since merging, it has not triggered for
      any unexpected case but Mark Rutland reported the following bug
      triggering due to the first warning.
      
        kernel BUG at kernel/futex.c:679!
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        Modules linked in:
        CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
        Hardware name: linux,dummy-virt (DT)
        task: ffff80001e271780 task.stack: ffff000010908000
        PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
        pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145
      
      The fact that it's a bug instead of a warning was due to an unrelated
      arm64 problem, but the warning itself triggered because the underlying
      mapping changed.
      
      This is an application issue but from a kernel perspective it's a
      recoverable situation and the warning is unnecessary so this patch
      removes the warning.  The warning may potentially be triggered with the
      following test program from Mark although it may be necessary to adjust
      NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
      system.
      
          #include <linux/futex.h>
          #include <pthread.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <sys/mman.h>
          #include <sys/syscall.h>
          #include <sys/time.h>
          #include <unistd.h>
      
          #define NR_FUTEX_THREADS 16
          pthread_t threads[NR_FUTEX_THREADS];
      
          void *mem;
      
          #define MEM_PROT  (PROT_READ | PROT_WRITE)
          #define MEM_SIZE  65536
      
          static int futex_wrapper(int *uaddr, int op, int val,
                                   const struct timespec *timeout,
                                   int *uaddr2, int val3)
          {
              syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
          }
      
          void *poll_futex(void *unused)
          {
              for (;;) {
                  futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
              }
          }
      
          int main(int argc, char *argv[])
          {
              int i;
      
              mem = mmap(NULL, MEM_SIZE, MEM_PROT,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      
              printf("Mapping @ %p\n", mem);
      
              printf("Creating futex threads...\n");
      
              for (i = 0; i < NR_FUTEX_THREADS; i++)
                  pthread_create(&threads[i], NULL, poll_futex, NULL);
      
              printf("Flipping mapping...\n");
              for (;;) {
                  mmap(mem, MEM_SIZE, MEM_PROT,
                       MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
              }
      
              return 0;
          }
      Reported-and-tested-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org # 4.7+
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      48fb6f4d
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 358f8c26
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "The main thing is to allow empty id_tables for ACPI to make some
        drivers get probed again. It looks a bit bigger than usual because it
        needs some internal renaming, too.
      
        Other than that, there is a fix for broken DSTDs, a super simple
        enablement for ARM MPS, and two documentation fixes which I'd like to
        see in v4.13 already"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: rephrase explanation of I2C_CLASS_DEPRECATED
        i2c: allow i2c-versatile for ARM MPS platforms
        i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz
        i2c: designware: Print clock freq on invalid clock freq error
        i2c: core: Allow empty id_table in ACPI case as well
        i2c: mux: pinctrl: mention correct module name in Kconfig help text
      358f8c26
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 31cf92f3
      Linus Torvalds 提交于
      Pull block fixes from Jens Axboe:
       "Three patches that should go into this release.
      
        Two of them are from Paolo and fix up some corner cases with BFQ, and
        the last patch is from Ming and fixes up a potential usage count
        imbalance regression due to the recent NOWAIT work"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed
        block, bfq: consider also in_service_entity to state whether an entity is active
        block, bfq: reset in_service_entity if it becomes idle
      31cf92f3
    • L
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · d555eb6b
      Linus Torvalds 提交于
      Pull crypto fixes from Herbert Xu:
       "Fix two regressions in the inside-secure driver with respect to
        hmac(sha1)"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: inside-secure - fix the sha state length in hmac_sha1_setkey
        crypto: inside-secure - fix invalidation check in hmac_sha1_setkey
      d555eb6b
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4530cca1
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
       "The pull requests are getting smaller, that's progress I suppose :-)
      
         1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi.
      
         2) Fix remote checksum handling in VXLAN and GUE tunneling drivers,
            from Koichiro Den.
      
         3) Missing u64_stats_init() calls in several drivers, from Florian
            Fainelli.
      
         4) TCP can set the congestion window to an invalid ssthresh value
            after congestion window reductions, from Yuchung Cheng.
      
         5) Fix BPF jit branch generation on s390, from Daniel Borkmann.
      
         6) Correct MIPS ebpf JIT merge, from David Daney.
      
         7) Correct byte order test in BPF test_verifier.c, from Daniel
            Borkmann.
      
         8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins.
      
         9) Handle SCTP checksums properly in mlx4 driver, from Davide
            Caratti.
      
        10) We can potentially enter tcp_connect() with a cached route
            already, due to fastopen, so we have to explicitly invalidate it.
      
        11) skb_warn_bad_offload() can bark in legitimate situations, fix from
            Willem de Bruijn"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
        net: avoid skb_warn_bad_offload false positives on UFO
        qmi_wwan: fix NULL deref on disconnect
        ppp: fix xmit recursion detection on ppp channels
        rds: Reintroduce statistics counting
        tcp: fastopen: tcp_connect() must refresh the route
        net: sched: set xt_tgchk_param par.net properly in ipt_init_target
        net: dsa: mediatek: add adjust link support for user ports
        net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
        qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()'
        hysdn: fix to a race condition in put_log_buffer
        s390/qeth: fix L3 next-hop in xmit qeth hdr
        asix: Fix small memory leak in ax88772_unbind()
        asix: Ensure asix_rx_fixup_info members are all reset
        asix: Add rx->ax_skb = NULL after usbnet_skb_return()
        bpf: fix selftest/bpf/test_pkt_md_access on s390x
        netvsc: fix race on sub channel creation
        bpf: fix byte order test in test_verifier
        xgene: Always get clk source, but ignore if it's missing for SGMII ports
        MIPS: Add missing file for eBPF JIT.
        bpf, s390: fix build for libbpf and selftest suite
        ...
      4530cca1
  2. 09 8月, 2017 13 次提交
    • V
      net: ipv6: avoid overhead when no custom FIB rules are installed · feca7d8c
      Vincent Bernat 提交于
      If the user hasn't installed any custom rules, don't go through the
      whole FIB rules layer. This is pretty similar to f4530fa5 (ipv4:
      Avoid overhead when no custom FIB rules are installed).
      
      Using a micro-benchmark module [1], timing ip6_route_output() with
      get_cycles(), with 40,000 routes in the main routing table, before this
      patch:
      
          min=606 max=12911 count=627 average=1959 95th=4903 90th=3747 50th=1602 mad=821
          table=254 avgdepth=21.8 maxdepth=39
          value │                         ┊                            count
            600 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒                                         199
            880 │▒▒▒░░░░░░░░░░░░░░░░                                      43
           1160 │▒▒▒░░░░░░░░░░░░░░░░░░░░                                  48
           1440 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░                               43
           1720 │▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░                          59
           2000 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                      50
           2280 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                    26
           2560 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                  31
           2840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░               28
           3120 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              17
           3400 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             17
           3680 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             8
           3960 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           11
           4240 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░            6
           4520 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           6
           4800 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           9
      
      After:
      
          min=544 max=11687 count=627 average=1776 95th=4546 90th=3585 50th=1227 mad=565
          table=254 avgdepth=21.8 maxdepth=39
          value │                         ┊                            count
            540 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒                                        201
            800 │▒▒▒▒▒░░░░░░░░░░░░░░░░                                    63
           1060 │▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░                               68
           1320 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░                            39
           1580 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                         32
           1840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                       32
           2100 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                    34
           2360 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                 33
           2620 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░               26
           2880 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              22
           3140 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              9
           3400 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             8
           3660 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             9
           3920 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░            8
           4180 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           8
           4440 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           8
      
      At the frequency of the host during the bench (~ 3.7 GHz), this is
      about a 100 ns difference on the median value.
      
      A next step would be to collapse local and main tables, as in
      0ddcf43d (ipv4: FIB Local/MAIN table collapse).
      
      [1]: https://github.com/vincentbernat/network-lab/blob/master/lab-routes-ipv6/kbench_mod.cSigned-off-by: NVincent Bernat <vincent@bernat.im>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feca7d8c
    • W
      net: avoid skb_warn_bad_offload false positives on UFO · 8d63bee6
      Willem de Bruijn 提交于
      skb_warn_bad_offload triggers a warning when an skb enters the GSO
      stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
      checksum offload set.
      
      Commit b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      observed that SKB_GSO_DODGY producers can trigger the check and
      that passing those packets through the GSO handlers will fix it
      up. But, the software UFO handler will set ip_summed to
      CHECKSUM_NONE.
      
      When __skb_gso_segment is called from the receive path, this
      triggers the warning again.
      
      Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
      Tx these two are equivalent. On Rx, this better matches the
      skb state (checksum computed), as CHECKSUM_NONE here means no
      checksum computed.
      
      See also this thread for context:
      http://patchwork.ozlabs.org/patch/799015/
      
      Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d63bee6
    • A
      isdn: hfcsusb: constify usb_device_id · f374771d
      Arvind Yadav 提交于
      usb_device_id are not supposed to change at runtime. All functions
      working with usb_device_id provided by <linux/usb.h> work with
      const usb_device_id. So mark the non-const structs as const.
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f374771d
    • A
      isdn: hisax: hfc_usb: constify usb_device_id · 585f46a8
      Arvind Yadav 提交于
      usb_device_id are not supposed to change at runtime. All functions
      working with usb_device_id provided by <linux/usb.h> work with
      const usb_device_id. So mark the non-const structs as const.
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      585f46a8
    • B
      qmi_wwan: fix NULL deref on disconnect · bbae08e5
      Bjørn Mork 提交于
      qmi_wwan_disconnect is called twice when disconnecting devices with
      separate control and data interfaces.  The first invocation will set
      the interface data to NULL for both interfaces to flag that the
      disconnect has been handled.  But the matching NULL check was left
      out when qmi_wwan_disconnect was added, resulting in this oops:
      
        usb 2-1.4: USB disconnect, device number 4
        qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister 'qmi_wwan' usb-0000:00:1d.0-1.4, WWAN/QMI device
        BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
        IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
        PGD 0
        P4D 0
        Oops: 0000 [#1] SMP
        Modules linked in: <stripped irrelevant module list>
        CPU: 2 PID: 33 Comm: kworker/2:1 Tainted: G            E   4.12.3-nr44-normandy-r1500619820+ #1
        Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000 4.6-810-g50522254fb 07/21/2017
        Workqueue: usb_hub_wq hub_event [usbcore]
        task: ffff8c882b716040 task.stack: ffffb8e800d84000
        RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
        RSP: 0018:ffffb8e800d87b38 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff8c8824f3f1d0 RDI: ffff8c8824ef6400
        RBP: ffff8c8824ef6400 R08: 0000000000000000 R09: 0000000000000000
        R10: ffffb8e800d87780 R11: 0000000000000011 R12: ffffffffc07ea0e8
        R13: ffff8c8824e2e000 R14: ffff8c8824e2e098 R15: 0000000000000000
        FS:  0000000000000000(0000) GS:ffff8c8835300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000e0 CR3: 0000000229ca5000 CR4: 00000000000406e0
        Call Trace:
         ? usb_unbind_interface+0x71/0x270 [usbcore]
         ? device_release_driver_internal+0x154/0x210
         ? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
         ? usbnet_disconnect+0x6c/0xf0 [usbnet]
         ? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
         ? usb_unbind_interface+0x71/0x270 [usbcore]
         ? device_release_driver_internal+0x154/0x210
      Reported-and-tested-by: NNathaniel Roach <nroach44@gmail.com>
      Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
      Cc: Daniele Palmas <dnlplm@gmail.com>
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbae08e5
    • B
      qmi_wwan: fix NULL deref on disconnect · 3df3ba2d
      Bjørn Mork 提交于
      qmi_wwan_disconnect is called twice when disconnecting devices with
      separate control and data interfaces.  The first invocation will set
      the interface data to NULL for both interfaces to flag that the
      disconnect has been handled.  But the matching NULL check was left
      out when qmi_wwan_disconnect was added, resulting in this oops:
      
        usb 2-1.4: USB disconnect, device number 4
        qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister 'qmi_wwan' usb-0000:00:1d.0-1.4, WWAN/QMI device
        BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
        IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
        PGD 0
        P4D 0
        Oops: 0000 [#1] SMP
        Modules linked in: <stripped irrelevant module list>
        CPU: 2 PID: 33 Comm: kworker/2:1 Tainted: G            E   4.12.3-nr44-normandy-r1500619820+ #1
        Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000 4.6-810-g50522254fb 07/21/2017
        Workqueue: usb_hub_wq hub_event [usbcore]
        task: ffff8c882b716040 task.stack: ffffb8e800d84000
        RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
        RSP: 0018:ffffb8e800d87b38 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff8c8824f3f1d0 RDI: ffff8c8824ef6400
        RBP: ffff8c8824ef6400 R08: 0000000000000000 R09: 0000000000000000
        R10: ffffb8e800d87780 R11: 0000000000000011 R12: ffffffffc07ea0e8
        R13: ffff8c8824e2e000 R14: ffff8c8824e2e098 R15: 0000000000000000
        FS:  0000000000000000(0000) GS:ffff8c8835300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000e0 CR3: 0000000229ca5000 CR4: 00000000000406e0
        Call Trace:
         ? usb_unbind_interface+0x71/0x270 [usbcore]
         ? device_release_driver_internal+0x154/0x210
         ? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
         ? usbnet_disconnect+0x6c/0xf0 [usbnet]
         ? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
         ? usb_unbind_interface+0x71/0x270 [usbcore]
         ? device_release_driver_internal+0x154/0x210
      Reported-and-tested-by: NNathaniel Roach <nroach44@gmail.com>
      Fixes: c6adf779 ("net: usb: qmi_wwan: add qmap mux protocol support")
      Cc: Daniele Palmas <dnlplm@gmail.com>
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3df3ba2d
    • C
      net: phy: mdio-bcm-unimac: fix unsigned wrap-around when decrementing timeout · 51ce3e21
      Colin Ian King 提交于
      Change post-decrement compare to pre-decrement to avoid an
      unsigned integer wrap-around on timeout. This leads to the following
      !timeout check to never to be true so -ETIMEDOUT is never returned.
      
      Detected by CoverityScan, CID#1452623 ("Logically dead code")
      
      Fixes: 69a60b05 ("net: phy: mdio-bcm-unimac: factor busy polling loop")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51ce3e21
    • G
      ppp: fix xmit recursion detection on ppp channels · 0a0e1a85
      Guillaume Nault 提交于
      Commit e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp
      devices") dropped the xmit_recursion counter incrementation in
      ppp_channel_push() and relied on ppp_xmit_process() for this task.
      But __ppp_channel_push() can also send packets directly (using the
      .start_xmit() channel callback), in which case the xmit_recursion
      counter isn't incremented anymore. If such packets get routed back to
      the parent ppp unit, ppp_xmit_process() won't notice the recursion and
      will call ppp_channel_push() on the same channel, effectively creating
      the deadlock situation that the xmit_recursion mechanism was supposed
      to prevent.
      
      This patch re-introduces the xmit_recursion counter incrementation in
      ppp_channel_push(). Since the xmit_recursion variable is now part of
      the parent ppp unit, incrementation is skipped if the channel doesn't
      have any. This is fine because only packets routed through the parent
      unit may enter the channel recursively.
      
      Finally, we have to ensure that pch->ppp is not going to be modified
      while executing ppp_channel_push(). Instead of taking this lock only
      while calling ppp_xmit_process(), we now have to hold it for the full
      ppp_channel_push() execution. This respects the ppp locks ordering
      which requires locking ->upl before ->downl.
      
      Fixes: e5dadc65 ("ppp: Fix false xmit recursion detect with two ppp devices")
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a0e1a85
    • H
      rds: Reintroduce statistics counting · 05bfd7db
      Håkon Bugge 提交于
      In commit 7e3f2952 ("rds: don't let RDS shutdown a connection
      while senders are present"), refilling the receive queue was removed
      from rds_ib_recv(), along with the increment of
      s_ib_rx_refill_from_thread.
      
      Commit 73ce4317 ("RDS: make sure we post recv buffers")
      re-introduces filling the receive queue from rds_ib_recv(), but does
      not add the statistics counter. rds_ib_recv() was later renamed to
      rds_ib_recv_path().
      
      This commit reintroduces the statistics counting of
      s_ib_rx_refill_from_thread and s_ib_rx_refill_from_cq.
      Signed-off-by: NHåkon Bugge <haakon.bugge@oracle.com>
      Reviewed-by: NKnut Omang <knut.omang@oracle.com>
      Reviewed-by: NWei Lin Guay <wei.lin.guay@oracle.com>
      Reviewed-by: NShamir Rabinovitch <shamir.rabinovitch@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05bfd7db
    • E
      tcp: fastopen: tcp_connect() must refresh the route · 8ba60924
      Eric Dumazet 提交于
      With new TCP_FASTOPEN_CONNECT socket option, there is a possibility
      to call tcp_connect() while socket sk_dst_cache is either NULL
      or invalid.
      
       +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
       +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
       +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
       +0 connect(4, ..., ...) = 0
      
      << sk->sk_dst_cache becomes obsolete, or even set to NULL >>
      
       +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000
      
      We need to refresh the route otherwise bad things can happen,
      especially when syzkaller is running on the host :/
      
      Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ba60924
    • X
      net: sched: set xt_tgchk_param par.net properly in ipt_init_target · ec0acb09
      Xin Long 提交于
      Now xt_tgchk_param par in ipt_init_target is a local varibale,
      par.net is not initialized there. Later when xt_check_target
      calls target's checkentry in which it may access par.net, it
      would cause kernel panic.
      
      Jaroslav found this panic when running:
      
        # ip link add TestIface type dummy
        # tc qd add dev TestIface ingress handle ffff:
        # tc filter add dev TestIface parent ffff: u32 match u32 0 0 \
          action xt -j CONNMARK --set-mark 4
      
      This patch is to pass net param into ipt_init_target and set
      par.net with it properly in there.
      
      v1->v2:
        As Wang Cong pointed, I missed ipt_net_id != xt_net_id, so fix
        it by also passing net_id to __tcf_ipt_init.
      v2->v3:
        Missed the fixes tag, so add it.
      
      Fixes: ecb2421b ("netfilter: add and use nf_ct_netns_get/put")
      Reported-by: NJaroslav Aster <jaster@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec0acb09
    • A
      cxgb4: Clear On FLASH config file after a FW upgrade · 4da18741
      Arjun Vynipadath 提交于
      Because Firmware and the Firmware Configuration File need to be
      in sync; clear out any On-FLASH Firmware Configuration File when new
      Firmware is loaded.  This will avoid difficult to diagnose and fix
      problems with a mis-matched Firmware Configuration File which prevents the
      adapter from being initialized.
      
      Original work by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: NArjun Vynipadath <arjun@chelsio.com>
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4da18741
    • W
      net_sched: get rid of some forward declarations · 7120371c
      WANG Cong 提交于
      If we move up tcf_fill_node() we can get rid of these
      forward declarations.
      
      Also, move down tfilter_notify_chain() to group them together.
      Reported-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7120371c