1. 09 Jun, 2017 6 commits
    • bpf, tests: fix endianness selection · 78a5a93c
      Daniel Borkmann committed
      I noticed that test_l4lb was failing in selftests:
      
        # ./test_progs
        test_pkt_access:PASS:ipv4 77 nsec
        test_pkt_access:PASS:ipv6 44 nsec
        test_xdp:PASS:ipv4 2933 nsec
        test_xdp:PASS:ipv6 1500 nsec
        test_l4lb:PASS:ipv4 377 nsec
        test_l4lb:PASS:ipv6 544 nsec
        test_l4lb:FAIL:stats 6297600000 200000
        test_tcp_estats:PASS: 0 nsec
        Summary: 7 PASSED, 1 FAILED
      
      Tracking down the issue actually revealed that endianness selection
      in bpf_endian.h is broken when compiled with clang with bpf target.
      test_pkt_access.c and test_l4lb.c are compiled with __BYTE_ORDER as
      __BIG_ENDIAN, test_xdp.c as __LITTLE_ENDIAN! test_l4lb noticeably
      fails because the test accounts bytes via bpf_ntohs(ip6h->payload_len)
      and bpf_ntohs(iph->tot_len) and compares them against a defined
      value; given a wrong endianness, the test outcome is different,
      of course.
      
      Turns out that there are actually two bugs: i) when we do __BYTE_ORDER
      comparison with __LITTLE_ENDIAN/__BIG_ENDIAN, then depending on the
      include order we see different outcomes. Reason is that __BYTE_ORDER
      is undefined due to missing endian.h include. Before we include the
      asm/byteorder.h (e.g. through linux/in.h), then __BYTE_ORDER equals
      __LITTLE_ENDIAN since both are undefined, after the include which
      correctly pulls in linux/byteorder/little_endian.h, __LITTLE_ENDIAN
      is defined, but given __BYTE_ORDER is still undefined, we match on
      __BYTE_ORDER equals to __BIG_ENDIAN since __BIG_ENDIAN is also
      undefined at that point, sigh. ii) But even that would be wrong,
      since when compiling the test cases with clang, one can select between
      bpfeb and bpfel targets for cross compilation. Hence, we can also not
      rely on what the system's endian.h provides, but we need to look at
      the compiler's defined endianness. The compiler defines __BYTE_ORDER__,
      and we can match __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__,
      which also reflects targets bpf (native), bpfel, bpfeb correctly,
      thus really only rely on that. After patch:
      
        # ./test_progs
        test_pkt_access:PASS:ipv4 74 nsec
        test_pkt_access:PASS:ipv6 42 nsec
        test_xdp:PASS:ipv4 2340 nsec
        test_xdp:PASS:ipv6 1461 nsec
        test_l4lb:PASS:ipv4 400 nsec
        test_l4lb:PASS:ipv6 530 nsec
        test_tcp_estats:PASS: 0 nsec
        Summary: 7 PASSED, 0 FAILED
      
      Fixes: 43bcf707 ("bpf: fix _htons occurences in test_progs")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
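The compiler-based selection described in 78a5a93c can be sketched roughly as follows. This is a minimal illustration, not the contents of bpf_endian.h: the `demo_` names are hypothetical, and it assumes a GCC- or clang-compatible compiler that predefines `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__` and `__ORDER_BIG_ENDIAN__` and provides `__builtin_bswap16()`.

```c
#include <stdint.h>
#include <string.h>

/*
 * An undefined macro evaluates to 0 inside #if, so comparing two undefined
 * names silently "matches". Refuse to build instead of guessing.
 */
#if !defined(__BYTE_ORDER__) || !defined(__ORDER_LITTLE_ENDIAN__) || \
    !defined(__ORDER_BIG_ENDIAN__)
# error "compiler does not predefine __BYTE_ORDER__"
#endif

/* Select on the compiler's target byte order, not on the host's <endian.h>. */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
# define demo_ntohs(x) ((uint16_t)__builtin_bswap16(x))
# define demo_htons(x) ((uint16_t)__builtin_bswap16(x))
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
# define demo_ntohs(x) ((uint16_t)(x))
# define demo_htons(x) ((uint16_t)(x))
#else
# error "unsupported byte order"
#endif

/* Hypothetical helper: read a 16-bit network-order field from raw bytes. */
static inline uint16_t demo_read_be16(const unsigned char *p)
{
	uint16_t raw;

	memcpy(&raw, p, sizeof(raw));
	return demo_ntohs(raw);
}
```

Because the choice depends only on compiler-defined macros, include order can no longer change the result, and a cross-compile for the other endianness picks the other branch.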
    • ethtool.h: remind to update 802.3ad when adding new speeds · 297fb414
      Nicolas Dichtel committed
      Each time a new speed is added, the bonding 802.3ad isn't updated. Add a
      comment to remind the developer to update this driver.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix 802.3ad support for 14G speed · 3fcd64cf
      Nicolas Dichtel committed
      This patch adds 14 Gbps enum definition, and fixes
      aggregated bandwidth calculation based on above slave links.
      
      Fixes: 0d7e2d21 ("IB/ipoib: add get_link_ksettings in ethtool")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fix 802.3ad support for 5G and 50G speeds · c7c55067
      Thibaut Collet committed
      This patch adds [5|50] Gbps enum definition, and fixes
      aggregated bandwidth calculation based on above slave links.
      
      Fixes: c9a70d43 ("net-next: ethtool: Added port speed macros.")
      Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
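The pattern that the three commits above keep in sync, a speed enum plus a matching case in the aggregate bandwidth calculation, might look like the sketch below. All names are hypothetical and the code is not taken from the bonding driver; it assumes all links of an aggregator run at the same speed and reports bandwidth in Mbps.

```c
#include <stdint.h>

/* Hypothetical link-speed keys; the real driver has its own enum. */
enum demo_speed {
	DEMO_SPEED_UNKNOWN = 0,
	DEMO_SPEED_5000MBPS,
	DEMO_SPEED_10000MBPS,
	DEMO_SPEED_14000MBPS,
	DEMO_SPEED_50000MBPS,
};

/*
 * Aggregate bandwidth in Mbps for `ports` active links of one speed.
 * In this sketch a speed without its own case yields 0, which is the kind
 * of gap a newly added speed leaves behind when this switch is forgotten.
 */
static uint32_t demo_agg_bandwidth(enum demo_speed speed, uint32_t ports)
{
	switch (speed) {
	case DEMO_SPEED_5000MBPS:
		return ports * 5000u;
	case DEMO_SPEED_10000MBPS:
		return ports * 10000u;
	case DEMO_SPEED_14000MBPS:
		return ports * 14000u;
	case DEMO_SPEED_50000MBPS:
		return ports * 50000u;
	default:
		return 0;
	}
}
```

For example, `demo_agg_bandwidth(DEMO_SPEED_14000MBPS, 2)` gives 28000, while an unrecognized speed contributes nothing regardless of port count.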
    • openvswitch: warn about missing first netlink attribute · daa6630a
      Nicolas Dichtel committed
      The first netlink attribute (value 0) must always be defined
      as none/unspec.
      
      Because we cannot change an existing UAPI, I add a comment to point out the
      mistake and avoid propagating it in a new ovs API in the future.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
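The convention the commit above refers to can be illustrated with a made-up attribute enum. The `DEMO_ATTR_*` names are hypothetical and do not correspond to any openvswitch UAPI definition.

```c
/*
 * Hypothetical netlink attribute types. Value 0 is reserved as
 * "unspecified" and never carries data, so real attributes start at 1.
 */
enum demo_attr {
	DEMO_ATTR_UNSPEC,	/* 0: none/unspec */
	DEMO_ATTR_NAME,		/* 1 */
	DEMO_ATTR_FLAGS,	/* 2 */
	DEMO_ATTR_CNT,		/* one past the last real attribute */
};

#define DEMO_ATTR_MAX (DEMO_ATTR_CNT - 1)
```

An enum that starts its first real attribute at 0 cannot be corrected later without breaking existing users, which is why the commit only adds a warning comment.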
    • ila_xlat: add missing hash secret initialization · 0db47e3d
      Arnd Bergmann committed
      While discussing the possible merits of clang warning about unused initialized
      functions, I found one function that was clearly meant to be called but
      never actually is.
      
      __ila_hash_secret_init() initializes the hash value for the ila locator,
      apparently this is intended to prevent hash collision attacks, but this ends
      up being a read-only zero constant since there is no caller. I could find
      no indication of why it was never called, the earliest patch submission
      for the module already was like this. If my interpretation is right, we
      certainly want to backport the patch to stable kernels as well.
      
      I considered adding it to the ila_xlat_init callback, but for best effect
      the random data is read as late as possible, just before it is first used.
      The underlying net_get_random_once() is already highly optimized to avoid
      overhead when called frequently.
      
      Fixes: 7f00feaf ("ila: Add generic ILA translation facility")
      Cc: stable@vger.kernel.org
      Link: https://www.spinics.net/lists/kernel/msg2527243.html
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
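The "initialize once, as late as possible" idea from the ila_xlat commit can be sketched in user space as below. This is only an analogue: the `demo_` names are hypothetical, `rand()` is a placeholder rather than a secure random source, and the plain flag is a single-threaded stand-in for what net_get_random_once() does in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static uint32_t demo_hash_secret;	/* stays 0 if the init is never called */
static bool demo_secret_ready;
static unsigned int demo_init_calls;	/* counts real initializations */

/* Fill the secret exactly once; later calls return immediately. */
static void demo_hash_secret_init(void)
{
	if (demo_secret_ready)
		return;
	/* Placeholder randomness; "| 1u" only keeps the demo value nonzero. */
	demo_hash_secret = (uint32_t)rand() | 1u;
	demo_secret_ready = true;
	demo_init_calls++;
}

/* Call the initializer right before the secret is first used. */
static uint32_t demo_hash(uint32_t key)
{
	demo_hash_secret_init();
	return (key ^ demo_hash_secret) * 2654435761u;
}
```

Without the call inside `demo_hash()`, the secret would remain a constant zero, which is the situation the commit message describes.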
  2. 08 Jun, 2017 21 commits
  3. 07 Jun, 2017 13 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b29794ec
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Made TCP congestion control documentation match current reality,
          from Anmol Sarma.
      
       2) Various build warning and failure fixes from Arnd Bergmann.
      
       3) Fix SKB list leak in ipv6_gso_segment().
      
       4) Use after free in ravb driver, from Eugeniu Rosca.
      
       5) Don't use udp_poll() in ping protocol driver, from Eric Dumazet.
      
       6) Don't crash in PCI error recovery of cxgb4 driver, from Guilherme
          Piccoli.
      
       7) _SRC_NAT_DONE_BIT needs to be cleared using atomics, from Liping
          Zhang.
      
       8) Use after free in vxlan deletion, from Mark Bloch.
      
       9) Fix ordering of NAPI poll enabled in ethoc driver, from Max
          Filippov.
      
      10) Fix stmmac hangs with TSO, from Niklas Cassel.
      
      11) Fix crash in CALIPSO ipv6, from Richard Haines.
      
      12) Clear nh_flags properly on mpls link up. From Roopa Prabhu.
      
      13) Fix regression in sk_err socket error queue handling, noticed by
          ping applications. From Soheil Hassas Yeganeh.
      
      14) Update mlx4/mlx5 MAINTAINERS information.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
        net: stmmac: fix a broken u32 less than zero check
        net: stmmac: fix completely hung TX when using TSO
        net: ethoc: enable NAPI before poll may be scheduled
        net: bridge: fix a null pointer dereference in br_afspec
        ravb: Fix use-after-free on `ifconfig eth0 down`
        net/ipv6: Fix CALIPSO causing GPF with datagram support
        net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
        Revert "sit: reload iphdr in ipip6_rcv"
        i40e/i40evf: proper update of the page_offset field
        i40e: Fix state flags for bit set and clean operations of PF
        iwlwifi: fix host command memory leaks
        iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265
        iwlwifi: mvm: clear new beacon command template struct
        iwlwifi: mvm: don't fail when removing a key from an inexisting sta
        iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3
        iwlwifi: mvm: fix firmware debug restart recording
        iwlwifi: tt: move ucode_loaded check under mutex
        iwlwifi: mvm: support ibss in dqa mode
        iwlwifi: mvm: Fix command queue number on d0i3 flow
        iwlwifi: mvm: rs: start using LQ command color
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · e87f327e
      Linus Torvalds committed
      Pull sparc fixes from David Miller:
      
       1) Fix TLB context wrap races, from Pavel Tatashin.
      
       2) Cure some gcc-7 build issues.
      
       3) Handle invalid setup_hugepagesz command line values properly, from
          Liam R Howlett.
      
       4) Copy TSB using the correct address shift for the huge TSB, from Mike
          Kravetz.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: delete old wrap code
        sparc64: new context wrap
        sparc64: add per-cpu mm of secondary contexts
        sparc64: redefine first version
        sparc64: combine activate_mm and switch_mm
        sparc64: reset mm cpumask after wrap
        sparc/mm/hugepages: Fix setup_hugepagesz for invalid values.
        sparc: Machine description indices can vary
        sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
        arch/sparc: support NR_CPUS = 4096
        sparc64: Add __multi3 for gcc 7.x and later.
        sparc64: Fix build warnings with gcc 7.
        arch/sparc: increase CONFIG_NODES_SHIFT on SPARC64 to 5
    • compiler, clang: suppress warning for unused static inline functions · abb2ea7d
      David Rientjes committed
      GCC explicitly does not warn for unused static inline functions for
      -Wunused-function.  The manual states:
      
      	Warn whenever a static function is declared but not defined or
      	a non-inline static function is unused.
      
      Clang does warn for static inline functions that are unused.
      
      It turns out that suppressing the warnings avoids potentially complex
      #ifdef directives, which also reduces LOC.
      
      Suppress the warning for clang.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
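One way to get the effect described in abb2ea7d is to attach the `unused` attribute to static inline helpers when building with clang. The macro name `demo_inline` is hypothetical, and this sketch does not claim to reproduce how the patch itself is written.

```c
/*
 * Hypothetical macro: mark static inline helpers as possibly unused so
 * clang's -Wunused-function stays quiet, matching GCC's behaviour.
 */
#if defined(__clang__)
# define demo_inline inline __attribute__((unused))
#else
# define demo_inline inline
#endif

/* Only needed in some configurations; nothing below calls it. */
static demo_inline int demo_only_with_feature(int x)
{
	return x + 1;
}

/* A helper that is actually used by callers. */
static demo_inline int demo_double(int x)
{
	return 2 * x;
}
```

The alternative would be to wrap `demo_only_with_feature()` in the same #ifdef as its callers, which is the kind of directive the commit message wants to avoid.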
    • Merge branch 'sparc64-context-wrap-fixes' · b3aefc2f
      David S. Miller committed
      Pavel Tatashin says:
      
      ====================
      sparc64: context wrap fixes
      
      This patch series contains fixes for context wrap: when we are out of
      context ids, and need to get a new version.
      
      It fixes memory corruption issues which happen when more processes than
      the number of context ids (currently set to 8K) are started
      simultaneously, and processes can get a wrong context.
      
      sparc64: new context wrap:
      - contains explanation of new wrap method, and also explanation of races
        that it solves
      sparc64: reset mm cpumask after wrap
      - explains the issue of not resetting the cpu mask on a wrap
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: delete old wrap code · 0197e41c
      Pavel Tatashin committed
      The old method that is using xcall and softint to get new context id is
      deleted, as it is replaced by a method of using per_cpu_secondary_mm
      without xcall to perform the context wrap.
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Bob Picco <bob.picco@oracle.com>
      Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: new context wrap · a0582f26
      Pavel Tatashin committed
      The current wrap implementation has a race issue: it is called outside of
      the ctx_alloc_lock, and also does not wait for all CPUs to complete the
      wrap.  This means that a thread can get a new context with a new version
      and another thread might still be running with the same context. The
      problem is especially severe on CPUs with shared TLBs, like sun4v. I used
      the following test to very quickly reproduce the problem:
      - start over 8K processes (must be more than context IDs)
      - write and read values at a memory location in every process.
      
      Very quickly memory corruptions start happening, and what we read back
      does not equal what we wrote.
      
      Several approaches were explored before settling on this one:
      
      Approach 1:
      Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
      every process to complete the wrap. (Note: every CPU must WAIT before
      leaving smp_new_mmu_context_version_client() until every one arrives).
      
      This approach ends up with deadlocks, as some threads own locks which other
      threads are waiting for, and they never receive softint until these threads
      exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
      deadlock happens.
      
      Approach 2:
      Handle wrap right during mondo interrupt. Use etrap/rtrap to enter
      into C code, and issue new versions to every CPU.
      This approach adds some overhead to runtime: in switch_mm() we must add
      some checks to make sure that versions have not changed due to wrap while
      we were loading the new secondary context. (This could be protected by
      PSTATE_IE, but that degrades performance on M7 and older CPUs, as it
      takes 50 cycles for each access.) Also, we still need a global per-cpu
      array of MMs to know where we need to load new contexts, otherwise we can
      change context to a thread that is going away (if we received mondo
      between switch_mm() and
      switch_to() time). Finally, there are some issues with window registers in
      rtrap() when context IDs are changed during CPU mondo time.
      
      The approach in this patch is the simplest and has almost no impact on
      runtime.  We use the array with mm's where last secondary contexts were
      loaded onto CPUs and bump their versions to the new generation without
      changing context IDs. If a new process comes in to get a context ID, it
      will go through get_new_mmu_context() because of version mismatch. But the
      running processes do not need to be interrupted. And wrap is quicker as we
      do not need to xcall and wait for everyone to receive and complete wrap.
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Bob Picco <bob.picco@oracle.com>
      Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
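The approach chosen in a0582f26, bumping the version of the contexts currently loaded on CPUs without changing their IDs, can be modelled in user space roughly as follows. Everything here is a simplified sketch: the `demo_` names, the tiny 8-entry ID space, the reserved ID 0 and the absence of locking are assumptions made for illustration and are not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NR_CPUS		4
#define DEMO_NR_CTX		8u			/* demo: 8 context IDs */
#define DEMO_ID_MASK		((uint64_t)DEMO_NR_CTX - 1)
#define DEMO_VERSION_STEP	((uint64_t)DEMO_NR_CTX)	/* version sits above the ID bits */

struct demo_mm {
	uint64_t ctx;	/* version | context ID */
};

/* mm whose secondary context was last loaded on each CPU (NULL if none). */
static struct demo_mm *demo_cpu_secondary_mm[DEMO_NR_CPUS];
static unsigned char demo_id_used[DEMO_NR_CTX];
static uint64_t demo_version = DEMO_VERSION_STEP;

/*
 * Wrap: open a new version, forget all IDs, then keep the IDs of the mm's
 * still loaded on a CPU by moving them to the new version in place.
 */
static void demo_wrap(void)
{
	uint64_t old_version = demo_version;

	demo_version += DEMO_VERSION_STEP;
	memset(demo_id_used, 0, sizeof(demo_id_used));
	demo_id_used[0] = 1;	/* assumption: ID 0 is reserved */

	for (size_t cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		struct demo_mm *mm = demo_cpu_secondary_mm[cpu];

		if (!mm || (mm->ctx & ~DEMO_ID_MASK) != old_version)
			continue;
		mm->ctx = demo_version | (mm->ctx & DEMO_ID_MASK);
		demo_id_used[mm->ctx & DEMO_ID_MASK] = 1;
	}
}

/* A context is valid only if its version matches the current one. */
static int demo_ctx_valid(const struct demo_mm *mm)
{
	return (mm->ctx & ~DEMO_ID_MASK) == demo_version;
}
```

In this model a running mm keeps its ID and stays valid across the wrap, while an mm that is not loaded on any CPU fails the version check and would have to request a new ID.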
    • sparc64: add per-cpu mm of secondary contexts · 7a5b4bbf
      Pavel Tatashin committed
      The new wrap is going to use information from this array to figure out
      mm's that currently have valid secondary contexts setup.
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Bob Picco <bob.picco@oracle.com>
      Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: redefine first version · c4415235
      Pavel Tatashin committed
      CTX_FIRST_VERSION defines the first context version, but it also defines
      the first context. This patch redefines it to only include the first
      context version.
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Bob Picco <bob.picco@oracle.com>
      Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: combine activate_mm and switch_mm · 14d0334c
      Pavel Tatashin committed
      The only difference between these two functions is that in activate_mm we
      unconditionally flush context. However, there is no need to keep this
      difference after fixing a bug where cpumask was not reset on a wrap. So, in
      this patch we combine these.
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Bob Picco <bob.picco@oracle.com>
      Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: reset mm cpumask after wrap · 58897485
      Pavel Tatashin committed
      After a wrap (getting a new context version) a process must get a new
      context id, which means that we would need to flush the context id from
      the TLB before running for the first time with this ID on every CPU. But,
      we use mm_cpumask to determine if this process has been running on this CPU
      before, and this mask is not reset after a wrap. So, there are two possible
      fixes for this issue:
      
      1. Clear mm cpumask whenever mm gets a new context id
      2. Unconditionally flush context every time process is running on a CPU
      
      This patch implements the first solution.
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Bob Picco <bob.picco@oracle.com>
      Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
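The first fix listed in 58897485, clearing the CPU mask whenever an mm receives a new context ID, can be modelled with a small sketch. The structure, the `demo_` function names and the flush counter are hypothetical and only stand in for the kernel's mm_cpumask handling and TLB flushes.

```c
#include <stdint.h>

struct demo_mask_mm {
	uint64_t ctx_id;
	uint64_t cpumask;	/* bit n set: already ran on CPU n with ctx_id */
};

static unsigned int demo_flushes;	/* counts simulated context flushes */

/* Fix 1: a new context ID invalidates the per-CPU history. */
static void demo_assign_ctx(struct demo_mask_mm *mm, uint64_t new_id)
{
	mm->ctx_id = new_id;
	mm->cpumask = 0;
}

/* Flush only the first time this mm runs on `cpu` with its current ID. */
static void demo_switch_in(struct demo_mask_mm *mm, unsigned int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;

	if (!(mm->cpumask & bit)) {
		demo_flushes++;
		mm->cpumask |= bit;
	}
}
```

If `demo_assign_ctx()` left the mask untouched, a process returning to a CPU after a wrap would skip the flush and could run with stale TLB entries for its new ID.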
    • sparc/mm/hugepages: Fix setup_hugepagesz for invalid values. · f322980b
      Liam R. Howlett committed
      hugetlb_bad_size needs to be called on invalid values.  Also change the
      pr_warn to a pr_err to better align with other platforms.
      Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc: Machine description indices can vary · c982aa9c
      James Clarke committed
      VIO devices were being looked up by their index in the machine
      description node block, but this often varies over time as devices are
      added and removed. Instead, store the ID and look up using the type,
      config handle and ID.
      Signed-off-by: James Clarke <jrtc27@jrtc27.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: mm: fix copy_tsb to correctly copy huge page TSBs · 654f4807
      Mike Kravetz committed
      When a TSB grows beyond its current capacity, a new TSB is allocated
      and copy_tsb is called to copy entries from the old TSB to the new.
      A hash shift based on page size is used to calculate the index of an
      entry in the TSB.  copy_tsb has hard coded PAGE_SHIFT in these
      calculations.  However, for huge page TSBs the value REAL_HPAGE_SHIFT
      should be used.  As a result, when copy_tsb is called for a huge page
      TSB the entries are placed at the incorrect index in the newly
      allocated TSB.  When doing hardware table walk, the MMU does not
      match these entries and we end up in the TSB miss handling code.
      This code will then create and write an entry to the correct index
      in the TSB.  We take a performance hit for the table walk miss and
      recreation of these entries.
      
      Pass a new parameter to copy_tsb that is the page size shift to be
      used when copying the TSB.
      Suggested-by: Anthony Yznaga <anthony.yznaga@oracle.com>
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
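The effect of using the wrong shift in the index calculation described in 654f4807 can be shown with a small worked example. The function and macro names are hypothetical, the shift values 13 and 22 are assumptions standing in for PAGE_SHIFT and REAL_HPAGE_SHIFT, and the real copy_tsb is not C code of this shape.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT		13	/* assumption: 8 KB base pages */
#define DEMO_REAL_HPAGE_SHIFT	22	/* assumption: 4 MB real huge pages */

/* Index of the TSB entry for `vaddr`; nentries must be a power of two. */
static uint64_t demo_tsb_index(uint64_t vaddr, unsigned int page_shift,
			       uint64_t nentries)
{
	return (vaddr >> page_shift) & (nentries - 1);
}
```

For a virtual address of 0xc00000 and a 512-entry TSB, the huge page shift gives index 3, while the base page shift gives index 0. An entry copied to index 0 is never found by a lookup that computes index 3, so the access falls through to the TSB miss handler as the commit message describes.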