1. 02 May, 2017 12 commits
  2. 01 May, 2017 28 commits
    • I
      mlxsw: spectrum_router: Simplify VRF enslavement · b1e45526
      Ido Schimmel committed
      When a netdev is enslaved to a VRF master, its router interface (RIF)
      needs to be destroyed (if exists) and a new one created using the
      corresponding virtual router (VR).
      
      From the driver's perspective, the above is equivalent to an inetaddr
      event sent for this netdev. Therefore, when a port netdev (or one of its
      uppers) is enslaved to a VRF master, call the same function that
      would've been called had a NETDEV_UP been sent for this netdev in the
      inetaddr notification chain.
      
      This patch also fixes a bug when a LAG netdev with an existing RIF is
      enslaved to a VRF. Before this patch, each LAG port would drop the
      reference on the RIF, but would re-join the same one (in the wrong VR)
      soon after. With this patch, the corresponding RIF is first destroyed
      and a new one is created using the correct VR.
      
      Fixes: 7179eb5a ("mlxsw: spectrum_router: Add support for VRFs")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge tag 'mlx5-updates-2017-04-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · cedf90c0
      David S. Miller committed
      mlx5-updates-2017-04-30
      
      Or says:
      ================
      mlx5 neigh update
      
      This series (whose code name is 'neigh update') from Hadar enhances the
      mlx5 TC IP tunnel offloads to deal with changes to tunnel destination
      neighbours used in offloaded flows which involve encapsulation.
      
      In order to keep track of the validity state of such neighbours, we register
      a netevent notifier callback and act on NEIGH_UPDATE events: if a neighbour
      becomes valid, offload the related flows to HW (the other way around when
      a neighbour becomes invalid), and similarly when a neighbour's MAC address changes.
      
      Since this traffic is offloaded from the host OS, the neighbour for the IP
      tunnel destination can mistakenly become STALE and be deleted by the kernel
      since its 'used' value wasn't changed. To address that, we proactively
      update the neighbour 'used' value every DELAY_PROBE_TIME seconds, using
      time stamps generated by the existing driver code for HW flow counters.
      We use the DELAY_PROBE_TIME_UPDATE event to adjust the frequency of the updates.
      
      Prior to the core of the series, there's a patch from Saeed that introduces an
      extendable vport representor implementation scheme. It provides a separation
      between the eswitch and the netdev-related aspects of the representors.
      
      We would like to thank Ido Schimmel and Ilya Lesokhin for their coaching && advice
      through the long design and review cycles while we struggled to understand and
      (hopefully correctly) implement the locking around the different driver flows(..) .
      
      - Or.
      =================
      
      Misc Updates:
      
      From Tariq:
      Some small performance and trivial code optimization for mlx5 netdev driver
      - Optimize poll ICOSQ completion queue
      - Use prefetchw when a write is to follow
      - Use u8 as ownership type in mlx5e_get_cqe()
      
      From Eran:
      - Disable LRO by default on specific setups
      
      From Eli:
      - Small cleanup for E-Switch to avoid redundant allocation
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      qed: Prevent warning without CONFIG_RFS_ACCEL · 07ff2ed0
      Mintz, Yuval committed
      After removing the PTP related initialization from slowpath start,
      the remaining PTT entry is required only in case CONFIG_RFS_ACCEL is set.
      Otherwise, it leads to a warning due to it being unused.
      
      Fixes: d179bd16 ("qed: Acquire/release ptt_ptp lock when enabling/disabling PTP")
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'qed-RoCE-fixes' · a6e8ab8e
      David S. Miller committed
      Yuval Mintz says:
      
      ====================
      qed: RoCE related pseudo-fixes
      
      This series contains multiple small corrections to the RoCE logic
      in qed, plus some debug information and an inter-module parameter
      meant to prevent issues further along.
      
       - #1, #6 Share information with protocol driver
         [either new or filling missing bits in existing API].
       - #2, #3 correct error flows in qed.
       - #4 add debug related information.
       - #5 fixes a minor issue in the HW configuration.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • R
      qed: output the DPM status and WID count · 20b1bd96
      Ram Amrani committed
      Output to the RDMA driver whether DPM mode is enabled or disabled in
      the HW and, if it is enabled, the number of WIDs it supports.
      Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • R
      qed: align DPI configuration to HW requirements · 107392b7
      Ram Amrani committed
      When calculating the doorbell BAR partitioning, round up the number of
      CPUs to the nearest power of 2 so the size of the DPI (per user
      section) configured in the hardware will be stored properly and
      not truncated.
      Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
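The rounding step described in the commit above can be illustrated with a small userspace sketch. The helper name `round_up_pow2` is hypothetical and this is not the qed driver code; it only shows what "round up to the nearest power of 2" means for a CPU count.

```c
#include <stdint.h>

/*
 * Round n up to the nearest power of two (hypothetical helper).
 * Returns 1 for n == 0 and saturates at 2^31 for larger inputs,
 * since the next power of two would not fit in 32 bits.
 */
static uint32_t round_up_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n && p < 0x80000000u)
		p <<= 1;
	return p;
}
```

With 6 CPUs, for example, the value used for partitioning would become 8, while a count that is already a power of two is left unchanged.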
    • R
      qed: verify RoCE resource bitmaps are released · e015d58b
      Ram Amrani committed
      Add a mechanism to verify RoCE resources are released prior to freeing the
      bitmaps. If this is not the case, print which resources were not released.
      Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
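The idea of checking a resource bitmap before freeing it can be sketched in userspace as follows. All names here (`struct res_bitmap`, `res_bitmap_init`, `res_bitmap_set`, `res_bitmap_free`) are assumptions made for illustration and do not come from the qed driver.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical resource bitmap: bit i is set while resource ID i is in use. */
struct res_bitmap {
	const char *name;
	unsigned long *words;
	size_t nbits;
};

/* Allocate a zeroed bitmap; returns 0 on success, -1 on allocation failure. */
static int res_bitmap_init(struct res_bitmap *bm, const char *name, size_t nbits)
{
	size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;

	bm->words = calloc(nwords ? nwords : 1, sizeof(unsigned long));
	if (!bm->words)
		return -1;
	bm->name = name;
	bm->nbits = nbits;
	return 0;
}

/* Mark resource ID id as in use. */
static void res_bitmap_set(struct res_bitmap *bm, size_t id)
{
	bm->words[id / BITS_PER_WORD] |= 1UL << (id % BITS_PER_WORD);
}

/*
 * Verify that every ID was released, report each leftover on stderr,
 * then free the bitmap. Returns the number of IDs that were still set.
 */
static size_t res_bitmap_free(struct res_bitmap *bm)
{
	size_t leaked = 0;

	for (size_t id = 0; id < bm->nbits; id++) {
		if (bm->words[id / BITS_PER_WORD] & (1UL << (id % BITS_PER_WORD))) {
			fprintf(stderr, "%s: id %zu was not released\n", bm->name, id);
			leaked++;
		}
	}
	free(bm->words);
	bm->words = NULL;
	bm->nbits = 0;
	return leaked;
}
```

Reporting the leftover IDs before the free keeps the teardown path unchanged for well-behaved callers while making leaks visible.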
    • R
      qed: add error handling flow to TID deregistration posting failure · 10536194
      Ram Amrani committed
      If the posting of the ramrod for the purpose of TID deregistration
      fails, abort the deregistration operation without using the FW's
      return code.
      Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • R
      qed: remove unused SQ error state · ba0154e9
      Ram Amrani committed
      The internal RoCE SQE QP state isn't being used. Instead, we mark the
      QP as being in the regular error state.
      Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • R
      793ea8a9
    • Y
      bpf: enhance verifier to understand stack pointer arithmetic · 332270fd
      Yonghong Song committed
      LLVM 4.0 and above generates code like the following:
      ....
      440: (b7) r1 = 15
      441: (05) goto pc+73
      515: (79) r6 = *(u64 *)(r10 -152)
      516: (bf) r7 = r10
      517: (07) r7 += -112
      518: (bf) r2 = r7
      519: (0f) r2 += r1
      520: (71) r1 = *(u8 *)(r8 +0)
      521: (73) *(u8 *)(r2 +45) = r1
      ....
      and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
      This is because the verifier marks register r2 as an unknown value after
      #519, where r2 is a stack pointer and r1 holds a constant value.
      
      Teach the verifier to recognize "stack_ptr + imm" and
      "stack_ptr + reg with const val" as a valid stack_ptr with a new offset.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
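The register tracking described in the commit above can be modelled with a toy userspace sketch. It is not the kernel verifier: the type names, `track_add_reg`, `track_add_imm` and `stack_access_ok` are hypothetical, and the only outside fact assumed is the 512-byte BPF stack size. The point is that adding a known constant to a stack pointer yields a stack pointer with an adjusted offset rather than an unknown value.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_STACK_SIZE 512	/* BPF stack size in bytes */

enum reg_type { REG_UNKNOWN, REG_CONST, REG_STACK_PTR };

/*
 * Toy register state: for REG_CONST, val is the constant;
 * for REG_STACK_PTR, val is the offset from the frame pointer (r10).
 */
struct reg_state {
	enum reg_type type;
	int64_t val;
};

/* Model "dst += src" for two registers. */
static void track_add_reg(struct reg_state *dst, const struct reg_state *src)
{
	if (src->type == REG_CONST &&
	    (dst->type == REG_STACK_PTR || dst->type == REG_CONST)) {
		/* stack_ptr + const stays a stack_ptr; const + const stays const */
		dst->val += src->val;
	} else {
		dst->type = REG_UNKNOWN;
		dst->val = 0;
	}
}

/* Model "dst += imm" by treating the immediate as a constant register. */
static void track_add_imm(struct reg_state *dst, int32_t imm)
{
	struct reg_state c = { REG_CONST, imm };

	track_add_reg(dst, &c);
}

/* A store or load through ptr at ptr+off of size bytes must stay in the stack. */
static bool stack_access_ok(const struct reg_state *ptr, int off, int size)
{
	int64_t start = ptr->val + off;

	return ptr->type == REG_STACK_PTR && size > 0 &&
	       start >= -TOY_STACK_SIZE && start + size <= 0;
}
```

Replaying instructions 516 to 521 from the listing: r7 starts as the frame pointer (offset 0), becomes offset -112, r2 copies it, and adding r1 = 15 gives offset -97, so the one-byte store at r2 + 45 lands at offset -52, inside the stack.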
    • K
      benet: Use time_before_eq for time comparison · 2faf2657
      Karim Eshapa committed
      Use time_before_eq() for the time comparison. It is safer, and it
      handles timer wrapping, which makes the code future-proof.
      Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
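Why a wrap-aware comparison matters can be shown with a short userspace sketch. The function `tick_before_eq` is a hypothetical stand-in operating on a 32-bit tick counter; it follows the same signed-difference idea as the kernel's time comparison macros, but it is not the kernel implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if tick a is before or equal to tick b, assuming the two
 * values are less than 2^31 ticks apart. The unsigned subtraction wraps
 * modulo 2^32, and the signed cast (two's complement assumed) turns
 * "slightly ahead" into a small non-negative number even across a wrap.
 */
static bool tick_before_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(b - a) >= 0;
}
```

For a deadline taken just before the counter wraps (0xFFFFFFF0) and a current time just after it (0x00000010), a plain `a <= b` test reports the wrong order, while the signed difference is 0x20 and the comparison stays correct.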
    • B
      flower: check unused bits in MPLS fields · 1a7fca63
      Benjamin LaHaise committed
      Since several of the netlink attributes used to configure the flower
      classifier's MPLS TC, BOS and Label fields have additional bits which are
      unused, check those bits to ensure that they are actually 0 as suggested
      by Jamal.
      Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Simon Horman <simon.horman@netronome.com>
      Cc: Jakub Kicinski <kubakici@wp.pl>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
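The check described above can be sketched in userspace. An MPLS label is 20 bits wide, the TC field 3 bits and the BOS flag 1 bit, while the values arrive in wider integers. The names `fits_in_bits` and `toy_mpls_validate` are hypothetical, and the choice of a 32-bit label and 8-bit TC and BOS carriers is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MPLS_LABEL_BITS 20
#define TOY_MPLS_TC_BITS     3
#define TOY_MPLS_BOS_BITS    1

/* True if value uses only its low `bits` bits (bits must be below 32). */
static bool fits_in_bits(uint32_t value, unsigned int bits)
{
	return (value >> bits) == 0;
}

/*
 * Reject any field whose unused high bits are set.
 * Returns 0 if all fields are valid and -1 otherwise
 * (a kernel implementation would typically return -EINVAL).
 */
static int toy_mpls_validate(uint32_t label, uint8_t tc, uint8_t bos)
{
	if (!fits_in_bits(label, TOY_MPLS_LABEL_BITS))
		return -1;
	if (!fits_in_bits(tc, TOY_MPLS_TC_BITS))
		return -1;
	if (!fits_in_bits(bos, TOY_MPLS_BOS_BITS))
		return -1;
	return 0;
}
```

Rejecting stray bits up front, rather than silently masking them, leaves room to assign meaning to those bits later without changing behaviour for existing users.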
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · a01aa920
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree. A large bunch of code cleanups, simplify the conntrack extension
      codebase, get rid of the fake conntrack object, speed up netns by
      selective synchronize_net() calls. More specifically, they are:
      
      1) Check for ct->status bit instead of using nfct_nat() from IPVS and
         Netfilter codebase, patch from Florian Westphal.
      
      2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao.
      
      3) Simplify FTP IPVS helper module registration path, from Arushi Singhal.
      
      4) Introduce nft_is_base_chain() helper function.
      
      5) Enforce expectation limit from userspace conntrack helper,
         from Gao Feng.
      
      6) Add nf_ct_remove_expect() helper function, from Gao Feng.
      
      7) NAT mangle helper function return boolean, from Gao Feng.
      
      8) ctnetlink_alloc_expect() should only work for conntrack with
         helpers, from Gao Feng.
      
      9) Add nfnl_msg_type() helper function to nfnetlink to build the
         netlink message type.
      
      10) Get rid of unnecessary cast on void, from simran singhal.
      
      11) Use seq_puts()/seq_putc() instead of seq_printf() where possible,
          also from simran singhal.
      
      12) Use list_prev_entry() from nf_tables, from simran singhal.
      
      13) Remove unnecessary & on pointer function in the Netfilter and IPVS
          code.
      
      14) Remove obsolete comment on set of rules per CPU in ip6_tables,
          no longer true. From Arushi Singhal.
      
      15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng.
      
      16) Remove unnecessary nested rcu_read_lock() in
          __nf_nat_decode_session(). Code running from hooks is already
          guaranteed to run under RCU read side.
      
      17) Remove deadcode in nf_tables_getobj(), from Aaron Conole.
      
      18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(),
          also from Aaron.
      
      19) Get rid of unused __ip_set_get_netlink(), from Aaron Conole.
      
      20) Don't propagate NF_DROP error to userspace via ctnetlink in
          __nf_nat_alloc_null_binding() function, from Gao Feng.
      
      21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks,
          from Gao Feng.
      
      22) Kill the fake untracked conntrack objects, use ctinfo instead to
          annotate that a conntrack object is untracked, from Florian Westphal.
      
      23) Remove nf_ct_is_untracked(), now obsolete since we have no
          conntrack template anymore, from Florian.
      
      24) Add event mask support to nft_ct, also from Florian.
      
      25) Move nf_conn_help structure to
          include/net/netfilter/nf_conntrack_helper.h.
      
      26) Add a fixed 32 bytes scratchpad area for conntrack helpers.
          Thus, we don't deal with variable conntrack extensions anymore.
          Make sure userspace conntrack helper doesn't go over that size.
          Remove variable size ct extension infrastructure now that this code
          has no more clients. From Florian Westphal.
      
      27) Restore offset and length of nf_ct_ext structure to 8 bytes now
          that wraparound is not possible any longer, also from Florian.
      
      28) Allow to get rid of unassured flows under stress in conntrack,
          this applies to DCCP, SCTP and TCP protocols, from Florian.
      
      29) Shrink size of nf_conntrack_ecache structure, from Florian.
      
      30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker,
          from Gao Feng.
      
      31) Register SYNPROXY hooks on demand, from Florian Westphal.
      
      32) Use pernet hook whenever possible, instead of global hook
          registration, from Florian Westphal.
      
      33) Pass hook structure to ebt_register_table() to consolidate some
          infrastructure code, from Florian Westphal.
      
      34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the
          SYNPROXY code, to make sure device stats are not fooled, patch
          from Gao Feng.
      
      35) Remove NF_CT_EXT_F_PREALLOC; this kills quite some code that we
          don't need anymore if we just select a fixed size instead of an
          expensive runtime calculation of this. From Florian.
      
      36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(),
          from Florian.
      
      37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from
          Florian.
      
      38) Attach NAT extension on-demand from masquerade and pptp helper
          path, from Florian.
      
      39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole.
      
      40) Speed up netns by selective calls of synchronize_net(), from
          Florian Westphal.
      
      41) Silence gcc stack size warning on 32-bit arches in the snmp helper,
          from Florian.
      
      42) Unconditionally call nf_ct_ext_destroy(), even if we have no
          extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from
          Liping Zhang.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'bpf-samples-skb_mode-bug-fixes' · edd7f4ef
      David S. Miller committed
      Jesper Dangaard Brouer says:
      
      ====================
      samples/bpf: two bug fixes to XDP_FLAGS_SKB_MODE attaching
      
      Two small bugfixes for:
       commit 3993f2cb ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      samples/bpf: fix XDP_FLAGS_SKB_MODE detach for xdp_tx_iptunnel · f76254a8
      Jesper Dangaard Brouer committed
      The xdp_tx_iptunnel program can be terminated in two ways: after
      N seconds or via Ctrl-C (SIGINT). The SIGINT code path does not
      handle detaching the correct XDP program in case the program
      was attached with XDP_FLAGS_SKB_MODE.
      
      Fix this by storing the XDP flags as a global variable, which is
      available for the SIGINT handler function.
      
      Fixes: 3993f2cb ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      samples/bpf: fix SKB_MODE flag to be a 32-bit unsigned int · 6387d011
      Jesper Dangaard Brouer committed
      The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink
      IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under
      samples/bpf/ should use the correct type.
      
      Fixes: 3993f2cb ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'xdp-netlink-ext-ack' · d74a32ac
      David S. Miller committed
      Jakub Kicinski says:
      
      ====================
      xdp: use netlink extended ACK reporting
      
      This series is an attempt to make XDP more user friendly by
      exploiting the recently added netlink extended ACK
      reporting to carry messages to user space.
      
      David Ahern's iproute2 ext ack patches for ip link are sufficient
      to show the errors like this:
      
      Error: nfp: MTU too large w/ XDP enabled
      
      Where the message is coming directly from the driver.  There could
      still be a bit of a leap for a complete novice from the message
      above to the right settings, but it's a big improvement over the
      standard "Invalid argument" message.
      
      v1/non-rfc:
       - add a separate macro in patch 1;
       - add KBUILD_MODNAME as part of the message (Daniel);
       - don't print the error to logs in patch 1.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      virtio_net: make use of extended ack message reporting · 9861ce03
      Jakub Kicinski committed
      Try to carry error messages to the user via the netlink extended
      ack message attribute.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      nfp: make use of extended ack message reporting · d957c0f7
      Jakub Kicinski committed
      Try to carry error messages to the user via the netlink extended
      ack message attribute.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      xdp: propagate extended ack to XDP setup · ddf9f970
      Jakub Kicinski committed
      Drivers usually have a number of restrictions for running XDP,
      the most common being buffer sizes, LRO and number of rings.
      Even though some drivers try to be helpful and print error
      messages, experience shows that users don't often consult
      kernel logs on netlink errors.  Try to use the new extended
      ack mechanism to carry the message back to user space.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      netlink: add NULL-friendly helper for setting extended ACK message · 45d9b378
      Jakub Kicinski committed
      As we propagate extended ack reporting throughout various paths in
      the kernel it may be that the same function is called with the
      extended ack parameter passed as NULL.  One place where that happens
      is in drivers which have a centralized reconfiguration function
      called both from ndos and from ethtool_ops.  Add a new helper for
      setting the error message in such conditions.
      
      The existing helper is left as is to encourage propagating the ext ack
      fully wherever possible.  It also makes it clear in the code which
      messages may be lost due to ext ack being NULL.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
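The difference between a strict and a NULL-friendly message helper can be sketched in userspace as follows. Everything here is a stand-in: `struct toy_ext_ack`, the two macros, `TOY_MODNAME` (playing the role of KBUILD_MODNAME) and `toy_set_mtu` with its 4096-byte limit are assumptions for illustration, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the extended ack structure. */
struct toy_ext_ack {
	const char *msg;
};

#define TOY_MODNAME "demo"	/* stand-in for KBUILD_MODNAME */

/* Strict helper: the caller must guarantee that ack is not NULL. */
#define TOY_SET_ERR_MSG(ack, text) \
	((ack)->msg = TOY_MODNAME ": " text)

/* NULL-friendly helper: silently drops the message when ack is NULL. */
#define TOY_TRY_SET_ERR_MSG(ack, text) do {		\
	struct toy_ext_ack *ack_ = (ack);		\
	if (ack_)					\
		ack_->msg = TOY_MODNAME ": " text;	\
} while (0)

/*
 * Shared reconfiguration path that may be reached with or without an
 * ext ack, so it uses the NULL-friendly variant.
 * Returns 0 on success and -1 if the (made-up) MTU limit is exceeded.
 */
static int toy_set_mtu(unsigned int mtu, int xdp_enabled, struct toy_ext_ack *ack)
{
	if (xdp_enabled && mtu > 4096) {
		TOY_TRY_SET_ERR_MSG(ack, "MTU too large w/ XDP enabled");
		return -1;
	}
	return 0;
}
```

Keeping two separate helpers makes the call sites self-documenting: wherever the "try" variant appears, a reader knows the message can be lost because no ext ack was passed down.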
    • L
      netfilter: nf_ct_ext: invoke destroy even when ext is not attached · 8eeef235
      Liping Zhang committed
      For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table,
      then remove it from the nat_bysource_table via nat_extend->destroy.
      
      But now, the nat extension is attached on demand, so if the nat extension
      is not attached, we will not be notified when the ct is destroyed, i.e.
      we may fail to remove ct from the nat_bysource_table.
      
      So keep it simple: even if the extension is not attached, we will
      still invoke the related ext->destroy. This also preserves
      flexibility for future extensions.
      
      Fixes: 9a08ecfe ("netfilter: don't attach a nat extension by default")
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • P
      Merge tag 'ipvs3-for-v4.12' of http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next · d1908ca8
      Pablo Neira Ayuso committed
      Simon Horman says:
      
      ====================
      Third Round of IPVS Updates for v4.12
      
      Please consider these enhancements to IPVS for v4.12.
      If it is too late for v4.12 then please consider them for v4.13.
      
      * Remove unused function
      * Correct comparison of unsigned value
      ====================
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • F
      netfilter: snmp: avoid stack size warning · 0e72f55f
      Florian Westphal committed
      net/ipv4/netfilter/nf_nat_snmp_basic.c:1158:1: warning: the frame size
      of 1160 bytes is larger than 1024 bytes
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • F
      netfilter: nf_queue: only call synchronize_net twice if nf_queue is active · 039b40ee
      Florian Westphal committed
      nf_unregister_net_hook(s) can avoid a second call to synchronize_net,
      provided there is no nfqueue active in that net namespace (which is
      the common case).
      
      This also gets rid of the extra arg to nf_queue_nf_hook_drop(). Normally
      this gets called during netns cleanup, so no packets should be queued.
      
      For the rare case of a base chain being unregistered or a module being
      removed while nfqueue is in use, the extra hiccup due to the packet drops
      isn't a big deal.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • F
      netfilter: nf_log: don't call synchronize_rcu in nf_log_unset · c83fa196
      Florian Westphal committed
      nf_log_unregister() (which is what gets called in the logger backends
      module exit paths) does a (required, module is removed) synchronize_rcu().
      
      But nf_log_unset() is only called from pernet exit handlers. It doesn't
      free any memory so there appears to be no need to call synchronize_rcu.
      
      v2: Liping Zhang points out that nf_log_unregister() needs to be called
      after pernet unregister, else rmmod would become unsafe.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • F
      netfilter: batch synchronize_net calls during hook unregister · 933bd83e
      Florian Westphal committed
      synchronize_net is expensive and slows down netns cleanup a lot.
      
      We have two APIs to unregister a hook:
      nf_unregister_net_hook (which calls synchronize_net())
      and
      nf_unregister_net_hooks (calls nf_unregister_net_hook in a loop)
      
      Make nf_unregister_net_hook a wrapper around a new helper,
      __nf_unregister_net_hook, which unlinks the hook but does not free it.
      
      Then, we can call that helper in nf_unregister_net_hooks and then
      call synchronize_net() only once.
      
      Andrey Konovalov reports that this change at least doubles syzkaller
      fuzzing speed.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
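The batching idea from the commit above can be modelled with a small userspace simulation. It is only a sketch: `wait_for_readers` is a counter standing in for an expensive synchronize_net()-style wait, and the `struct hook` type and function names are hypothetical rather than taken from the netfilter code.

```c
#include <stddef.h>

enum hook_state { HOOK_LINKED, HOOK_UNLINKED, HOOK_FREED };

struct hook {
	enum hook_state state;
};

/* Counts how many simulated grace periods were waited for. */
static unsigned int grace_periods;

/* Stand-in for an expensive wait until all readers are done. */
static void wait_for_readers(void)
{
	grace_periods++;
}

/* Unlink only: readers may still hold a reference, so no free yet. */
static void hook_unlink(struct hook *h)
{
	h->state = HOOK_UNLINKED;
}

static void hook_free(struct hook *h)
{
	h->state = HOOK_FREED;
}

/* Single-hook API: unlink, wait, free. */
static void unregister_one(struct hook *h)
{
	hook_unlink(h);
	wait_for_readers();
	hook_free(h);
}

/* Naive bulk API: one wait per hook. */
static void unregister_many_naive(struct hook *hooks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		unregister_one(&hooks[i]);
}

/* Batched bulk API: unlink everything, wait once, then free everything. */
static void unregister_many_batched(struct hook *hooks, size_t n)
{
	size_t i;

	if (n == 0)
		return;
	for (i = 0; i < n; i++)
		hook_unlink(&hooks[i]);
	wait_for_readers();
	for (i = 0; i < n; i++)
		hook_free(&hooks[i]);
}
```

For n hooks the naive loop waits n times and the batched version waits once, which is where the netns cleanup speedup comes from when the wait dominates the cost.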