1. 26 Jul, 2016, 40 commits
    • samples/bpf: Add test/example of using bpf_probe_write_user bpf helper · cf9b1199
      Sargun Dhillon committed
      This example shows using a kprobe to act as a dnat mechanism to divert
      traffic for arbitrary endpoints. It rewrites the arguments to a syscall
      while they're still in userspace, and before the syscall has a chance
      to copy the argument into kernel space.
      
      Although this is an example, it also acts as a test because the mapped
      address is 255.255.255.255:555 -> real address, and that's not a legal
      address to connect to. If the helper is broken, the example will fail
      on the intermediate steps, as well as the final step to verify the
      rewrite of userspace memory succeeded.
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf9b1199
    • bpf: Add bpf_probe_write_user BPF helper to be called in tracers · 96ae5227
      Sargun Dhillon committed
      This allows user memory to be written to during the course of a kprobe.
      It shouldn't be used to implement any kind of security mechanism
      because of TOC-TOU attacks, but rather to debug, divert, and
      manipulate execution of semi-cooperative processes.
      
      Although it uses probe_kernel_write, we limit the address space
      the probe can write into by checking the space with access_ok.
      We do this as opposed to calling copy_to_user directly, in order
      to avoid sleeping. In addition we ensure the thread's current fs
      / segment is USER_DS and the thread isn't exiting nor a kernel thread.
      
      Given that this feature is meant for experiments and risks crashing
      the system and running programs, we print a warning, along with the
      pid and process name, when a proglet that attempts to use this helper
      is installed.
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96ae5227
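The guard order described in the commit message above can be sketched as a small user-space model. This is only an illustration, not the kernel helper: `task_model`, `range_ok_model` and `probe_write_user_model` are hypothetical names, the byte array stands in for the address space, and returning `-EPERM` on every rejected write is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the task state the helper inspects. */
struct task_model {
	bool kernel_thread;  /* models "is a kernel thread" */
	bool exiting;        /* models "thread is exiting" */
	bool user_segment;   /* models "current fs / segment is USER_DS" */
	size_t user_limit;   /* bytes [0, user_limit) count as user space */
};

/* Models an access_ok-style range check, written to avoid overflow. */
static bool range_ok_model(const struct task_model *t, size_t addr, size_t size)
{
	return size <= t->user_limit && addr <= t->user_limit - size;
}

/* Returns 0 on success, -EPERM (assumed) when any guard rejects the write. */
static int probe_write_user_model(const struct task_model *t,
				  unsigned char *space, size_t addr,
				  const void *src, size_t size)
{
	assert(t && space && src);

	if (t->kernel_thread || t->exiting)
		return -EPERM;
	if (!t->user_segment)
		return -EPERM;
	if (!range_ok_model(t, addr, size))
		return -EPERM;

	/* Only now is the simulated user memory touched. */
	memcpy(space + addr, src, size);
	return 0;
}
```

All checks run before the write, so a rejected call leaves the simulated user memory unchanged.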
    • net/mlx4_core: Check device state before unregistering it · 9b022a6e
      Alex Vesker committed
      Verify that the device state is registered before unregistering it.
      This check is required to prevent an oops in flows that re-register
      the device when its previous state was unregistered.
      
      Fixes: 225c7b1f ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b022a6e
    • mlxsw: spectrum: Fix compilation error when CLS_ACT isn't set · 86cb13e4
      Ido Schimmel committed
      When CONFIG_NET_CLS_ACT isn't set 'struct tcf_exts' has no member named
      'actions' and we therefore must not access it. Otherwise compilation
      fails.
      
      Fix this by introducing a new macro similar to tc_no_actions(), which
      always returns 'false' if CONFIG_NET_CLS_ACT isn't set.
      
      Fixes: 763b4b70 ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86cb13e4
    • net: davinci_cpdma: remove excessive dump of register values to kernel log · 3568bdf0
      Uwe Kleine-König committed
      Such a big dump of register values is hardly useful on a production
      system.
      
      Another downside of the now removed functions is that calling
      emac_dump_regs resulted in at least 87 calls to dev_info while holding a
      spinlock and having irqs off which is a big source of latency.
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3568bdf0
    • gtp: #define _GTP_H_ and not #define _GTP_H · 9b8ac4f9
      Colin Ian King committed
      Fix clang build warning:
      
      ./include/net/gtp.h:1:9: warning: '_GTP_H_' is used as a header
      guard here, followed by #define of a different macro [-Wheader-guard]
      
      Fix by defining _GTP_H_ and not _GTP_H.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b8ac4f9
    • Merge branch 'mlx5-minimum-inline-header-mode' · 779d1436
      David S. Miller committed
      Saeed Mahameed says:
      
      ====================
      Mellanox 100G mlx5 minimum inline header mode
      
      This small series from Hadar adds the support for minimum inline header mode query
      in mlx5e NIC driver.
      
      Today on TX the driver copies to the HW descriptor only up to L2 header which is the default
      required mode and sufficient for today's needs.
      
      The header in the HW descriptor is used for HW loopback steering decision, without it packets
      will go directly to the wire with no questions asked.
      
      For TX loopback steering according to L2/L3/L4 headers, ConnectX-4 requires the
      corresponding headers to be copied into the send queue (SQ) WQE HW descriptor so it can
      decide whether to loop the packet back or forward it to the wire.
      
      For legacy E-Switch mode only L2 headers copy is required.
      For advanced steering (E-Switch offloads) more header layers may be required to be copied,
      the required mode will be advertised by FW to each VF and PF according to the corresponding
      E-Switch configuration.
      
      Changes V2:
       - Allocate query_nic_vport_context_out on the stack
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      779d1436
    • net/mlx5e: Query minimum required header copy during xmit · cff92d7c
      Hadar Hen Zion committed
      Add support for querying the minimum inline mode from the Firmware.
      It is required for correct TX steering according to L3/L4 packet
      headers.
      
      Each send queue (SQ) has an inline mode that defines the minimal required
      headers that need to be copied into the SQ WQE.
      The driver asks the Firmware for the wqe_inline_mode device capability
      value. If the device capability is defined as "vport context", the
      driver must check the reported min inline mode from the vport context
      before creating its SQs.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cff92d7c
    • net/mlx5e: Check the minimum inline header mode before xmit · ae76715d
      Hadar Hen Zion committed
      Each send queue (SQ) has inline mode that defines the minimal required
      inline headers in the SQ WQE.
      Before sending each packet check that the minimum required headers
      on the WQE are copied.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae76715d
    • net/sctp: terminate rhashtable walk correctly · 5fc382d8
      Vegard Nossum committed
      I was seeing a lot of these:
      
          BUG: sleeping function called from invalid context at mm/slab.h:388
          in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
          Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150
      
           [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
           [<ffffffff83295722>] _raw_spin_lock+0x12/0x40
           [<ffffffff811aac87>] console_unlock+0x2f7/0x930
           [<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520
           [<ffffffff811aba6a>] vprintk_default+0x1a/0x20
           [<ffffffff812c171a>] printk+0x94/0xb0
           [<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170
           [<ffffffff8115835e>] ___might_sleep+0x3be/0x460
           [<ffffffff81158490>] __might_sleep+0x90/0x1a0
           [<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0
           [<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0
           [<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60
           [<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150
           [<ffffffff8143a82b>] seq_read+0x27b/0x1180
           [<ffffffff814f97fc>] proc_reg_read+0xbc/0x180
           [<ffffffff813d471b>] __vfs_read+0xdb/0x610
           [<ffffffff813d4d3a>] vfs_read+0xea/0x2d0
           [<ffffffff813d615b>] SyS_pread64+0x11b/0x150
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a
           [<ffffffffffffffff>] 0xffffffffffffffff
      
      Apparently we always need to call rhashtable_walk_stop(), even when
      rhashtable_walk_start() fails:
      
       * rhashtable_walk_start - Start a hash table walk
       * @iter:       Hash table iterator
       *
       * Start a hash table walk.  Note that we take the RCU lock in all
       * cases including when we return an error.  So you must always call
       * rhashtable_walk_stop to clean up.
      
      otherwise we never call rcu_read_unlock() and we get the splat above.
      
      Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
      See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
      See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: stable@vger.kernel.org
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5fc382d8
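The locking rule quoted in the commit above (start takes the RCU lock even when it fails, so stop must always be called) can be modelled in user space. This is a sketch under stated assumptions: the `*_model` functions, the `rcu_depth` counter and the `-EINVAL` error are hypothetical stand-ins, not the rhashtable API or the actual sctp change.

```c
#include <assert.h>
#include <errno.h>

/* Models the RCU read-lock nesting depth; 0 means "unlocked". */
static int rcu_depth;

/* Hypothetical iterator; fail_start forces the start call to fail. */
struct walk_iter_model {
	int fail_start;
};

/* Like the documented start call: takes the lock in ALL cases. */
static int walk_start_model(struct walk_iter_model *it)
{
	rcu_depth++;
	return it->fail_start ? -EINVAL : 0;
}

/* Releases the lock taken by walk_start_model(). */
static void walk_stop_model(struct walk_iter_model *it)
{
	(void)it;
	assert(rcu_depth > 0);
	rcu_depth--;
}

/* Buggy pattern: the error path returns with the lock still held. */
static int transport_walk_start_buggy(struct walk_iter_model *it)
{
	int err = walk_start_model(it);

	if (err)
		return err;
	return 0;
}

/* Fixed pattern: the error path calls stop before returning. */
static int transport_walk_start_fixed(struct walk_iter_model *it)
{
	int err = walk_start_model(it);

	if (err) {
		walk_stop_model(it);
		return err;
	}
	return 0;
}
```

In the buggy variant a failed start leaves `rcu_depth` at 1, which corresponds to the missing rcu_read_unlock() behind the splat; the fixed variant returns the same error with the depth back at 0.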
    • Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 9bc4a1cc
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      10GbE Intel Wired LAN Driver Updates 2016-07-22
      
      This series contains updates to ixgbe and ixgbevf only.
      
      Emil fixes the NACK check in ixgbevf_set_uc_addr_vf() for instances where
      the index is not equal to zero.  Fixes an issue where mac->ops.setup_fc
      can be NULL for backplanes which can cause the driver to crash on load.
      
      Don fixes the second parameter of the LED functions, which is the index to
      the LED we are interested in affecting.  Fixed variable to store register
      reads to unsigned integer.  Adds support for the new x553 hardware into
      ixgbevf.  Fixed a missing rtnl lock around ixgbevf_reinit_locked().
      Fixed an issue where ixgbevf_reset_subtask() was not verifying that
      the port has been removed.  Cleans up the initial crosstalk fix, since
      the SFP that indicates the presence of a SFP+ module changes between
      hardware types.
      
      Babu Moger fixes typo in freeing IRQ, since the array subscript increments
      after the execution of the statement.
      
      Wei Yongjun adds the missing destroy_workqueue() before returning from
      ixgbe_init_module() in the error handling case.
      
      Tony adds range checking for setting the MTU from the VF, where the PF can
      return a NACK but this was not passed on to the VF, so propagate the
      results from the PF to the VF so errors can be reported.  Consolidates
      mailbox read and write functions, since the recent changes to
      ixgbevf_write_msg_read_ack(), other functions are performing the same
      operations done here.
      
      Colin Ian King removes a redundant check on ret_val, since ret_val has
      not changed since the previous check.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9bc4a1cc
    • net/irda: fix NULL pointer dereference on memory allocation failure · d3e6952c
      Vegard Nossum committed
      I ran into this:
      
          kasan: CONFIG_KASAN_INLINE enabled
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN
          CPU: 2 PID: 2012 Comm: trinity-c3 Not tainted 4.7.0-rc7+ #19
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          task: ffff8800b745f2c0 ti: ffff880111740000 task.ti: ffff880111740000
          RIP: 0010:[<ffffffff82bbf066>]  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
          RSP: 0018:ffff880111747bb8  EFLAGS: 00010286
          RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000069dd8358
          RDX: 0000000000000009 RSI: 0000000000000027 RDI: 0000000000000048
          RBP: ffff880111747c00 R08: 0000000000000000 R09: 0000000000000000
          R10: 0000000069dd8358 R11: 1ffffffff0759723 R12: 0000000000000000
          R13: ffff88011a7e4780 R14: 0000000000000027 R15: 0000000000000000
          FS:  00007fc738404700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fc737fdfb10 CR3: 0000000118087000 CR4: 00000000000006e0
          Stack:
           0000000000000200 ffff880111747bd8 ffffffff810ee611 ffff880119f1f220
           ffff880119f1f4f8 ffff880119f1f4f0 ffff88011a7e4780 ffff880119f1f232
           ffff880119f1f220 ffff880111747d58 ffffffff82bca542 0000000000000000
          Call Trace:
           [<ffffffff82bca542>] irda_connect+0x562/0x1190
           [<ffffffff825ae582>] SYSC_connect+0x202/0x2a0
           [<ffffffff825b4489>] SyS_connect+0x9/0x10
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
          Code: 41 89 ca 48 89 e5 41 57 41 56 41 55 41 54 41 89 d7 53 48 89 fb 48 83 c7 48 48 89 fa 41 89 f6 48 c1 ea 03 48 83 ec 20 4c 8b 65 10 <0f> b6 04 02 84 c0 74 08 84 c0 0f 8e 4c 04 00 00 80 7b 48 00 74
          RIP  [<ffffffff82bbf066>] irttp_connect_request+0x36/0x710
           RSP <ffff880111747bb8>
          ---[ end trace 4cda2588bc055b30 ]---
      
      The problem is that irda_open_tsap() can fail and leave self->tsap = NULL,
      and then irttp_connect_request() almost immediately dereferences it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3e6952c
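The failure mode described in the commit above (an open step that can fail and leave the pointer NULL, followed by an unconditional dereference) and the usual remedy of checking the return value can be sketched as follows. This is a user-space illustration only: `tsap_model`, `sock_model` and the three functions are hypothetical, and the actual kernel change may differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TSAP and the socket that owns it. */
struct tsap_model { int id; };
struct sock_model { struct tsap_model *tsap; };

/* Models an open step that can fail and leave self->tsap == NULL. */
static int open_tsap_model(struct sock_model *self, struct tsap_model *backing)
{
	if (!backing)
		return -ENOMEM;   /* allocation failed; tsap stays NULL */
	self->tsap = backing;
	return 0;
}

/* Models a connect step that dereferences the TSAP immediately. */
static int connect_request_model(const struct tsap_model *tsap)
{
	assert(tsap != NULL);
	return tsap->id;
}

/* Propagates the open error instead of continuing with a NULL tsap. */
static int connect_model(struct sock_model *self, struct tsap_model *backing)
{
	if (!self->tsap) {
		int err = open_tsap_model(self, backing);

		if (err)
			return err;
	}
	return connect_request_model(self->tsap);
}
```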
    • sctp: also point GSO head_skb to the sk when it's available · 52253db9
      Marcelo Ricardo Leitner committed
      The head skb for GSO packets won't travel through the inner depths of
      SCTP stack as it doesn't contain any chunks on it. That means skb->sk
      doesn't get set and then when sctp_recvmsg() calls
      sctp_inet6_skb_msgname() on the head_skb it panics, as the latter needs
      to check flags at the socket (sp->v4mapped).
      
      The fix is to initialize skb->sk for the head skb once we are able to do
      it, that is, when the first chunk is processed.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      52253db9
    • sctp: fix BH handling on socket backlog · eefc1b1d
      Marcelo Ricardo Leitner committed
      Now that the backlog processing is called with BH enabled, we have to
      disable BH before taking the socket lock via bh_lock_sock(), otherwise
      it may deadlock:
      
      sctp_backlog_rcv()
                      bh_lock_sock(sk);
      
                      if (sock_owned_by_user(sk)) {
                              if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
                                      sctp_chunk_free(chunk);
                              else
                                      backloged = 1;
                      } else
                              sctp_inq_push(inqueue, chunk);
      
                      bh_unlock_sock(sk);
      
      while sctp_inq_push() was disabling/enabling BH, but enabling BH
      triggers pending softirq, which then may try to re-lock the socket in
      sctp_rcv().
      
      [  219.187215]  <IRQ>
      [  219.187217]  [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30
      [  219.187223]  [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp]
      [  219.187225]  [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80
      [  219.187226]  [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0
      [  219.187228]  [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0
      [  219.187229]  [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0
      [  219.187230]  [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0
      [  219.187232]  [<ffffffff816f2122>] ip_rcv+0x282/0x3a0
      [  219.187233]  [<ffffffff810d8bb6>] ? update_curr+0x66/0x180
      [  219.187235]  [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90
      [  219.187236]  [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0
      [  219.187237]  [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70
      [  219.187239]  [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0
      [  219.187240]  [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60
      [  219.187242]  [<ffffffff816ad1ce>] process_backlog+0x9e/0x140
      [  219.187243]  [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370
      [  219.187245]  [<ffffffff817cd352>] __do_softirq+0x112/0x2e7
      [  219.187247]  [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30
      [  219.187247]  <EOI>
      [  219.187248]  [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40
      [  219.187249]  [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80
      [  219.187254]  [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp]
      [  219.187258]  [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp]
      [  219.187260]  [<ffffffff81692b07>] __release_sock+0x87/0xf0
      [  219.187261]  [<ffffffff81692ba0>] release_sock+0x30/0xa0
      [  219.187265]  [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp]
      [  219.187266]  [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0
      [  219.187268]  [<ffffffff8172d52c>] inet_accept+0x3c/0x130
      [  219.187269]  [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210
      [  219.187271]  [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20
      [  219.187272]  [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0
      [  219.187276]  [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp]
      [  219.187277]  [<ffffffff8168f2d0>] SyS_accept+0x10/0x20
      
      Fixes: 860fbbc3 ("sctp: prepare for socket backlog behavior change")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eefc1b1d
    • hv_netvsc: Fix VF register on bonding devices · e2b9f1f7
      Haiyang Zhang committed
      Added a condition to avoid bonding devices with same MAC registering
      as VF.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e2b9f1f7
    • kcm: remove redundant -ve error check and return path · 0a58f474
      Colin Ian King committed
      The check for a -ve error is redundant, remove it and just
      immediately return the return value from the call to
      seq_open_net.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a58f474
    • net: ipv6: Always leave anycast and multicast groups on link down · ea06f717
      Mike Manning committed
      Default kernel behavior is to delete IPv6 addresses on link
      down, which entails deletion of the multicast and the
      subnet-router anycast addresses. These deletions do not
      happen with sysctl setting to keep global IPv6 addresses on
      link down, so every link down/up causes an increment of the
      anycast and multicast refcounts. These bogus refcounts may
      stop these addrs from being removed on subsequent calls to
      delete them. The solution is to leave the groups for the
      multicast and subnet anycast on link down for the callflow
      when global IPv6 addresses are kept.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Signed-off-by: Mike Manning <mmanning@brocade.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ea06f717
    • Merge tag 'wireless-drivers-next-for-davem-2016-07-22' of... · d5b160d3
      David S. Miller committed
      Merge tag 'wireless-drivers-next-for-davem-2016-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      pull-request: wireless-drivers-next 2016-07-22
      
      I'm sick so I have to keep this short, but here's the last pull request
      to net-next. This time there's a trivial conflict with mtd tree:
      
      http://lkml.kernel.org/g/20160720123133.44dab209@canb.auug.org.au
      
      We concluded with Brian (CCed) that it's best that we ask Linus to fix
      this. The patches have been in linux-next for a couple of days. This
      time I haven't done any merge tests so I don't know if there are any
      other conflicts etc.
      
      Please let me know if there are any problems.
      
      wireless-drivers-next patches for 4.8
      
      Major changes:
      
      wl18xx
      
      * add initial mesh support
      
      bcma
      
      * serial flash support on non-MIPS SoCs
      
      ath10k
      
      * enable support for QCA9888
      * disable wake_tx_queue() mac80211 op for older devices to workaround
        throughput regression
      
      ath9k
      
      * implement temperature compensation support for AR9003+
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d5b160d3
    • libcxgb: remove unused including <linux/version.h> · 15657841
      Wei Yongjun committed
      Remove the include of <linux/version.h>, which is not needed.
      Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15657841
    • sctp: use inet_recvmsg to support sctp RFS well · fd2d180a
      Xin Long committed
      Commit 486bdee0 ("sctp: add support for RPS and RFS")
      saves skb->hash into sk->sk_rxhash so that the inet_* can
      record it to flow table.
      
      But sctp uses sock_common_recvmsg as .recvmsg instead
      of inet_recvmsg, and sock_common_recvmsg doesn't invoke
      sock_rps_record_flow to record the flow. As a result, the
      receiver may have no chance to record the flow if
      it doesn't send a message or poll the socket.
      
      So this patch fixes it by using inet_recvmsg as .recvmsg
      in sctp.
      
      Fixes: 486bdee0 ("sctp: add support for RPS and RFS")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd2d180a
    • Merge branch 'macsec-icv-fixes' · 07a01697
      David S. Miller committed
      Davide Caratti says:
      
      ====================
      macsec: fix configurable ICV length
      
      This series provides a fix for macsec configurable ICV length. The
      maximum length of ICV element has been made compliant to IEEE 802.1AE,
      and error reporting in case of cipher suite configuration failure has been
      improved. Finally, a test has been added to netlink verify() callback in
      order to avoid creation of macsec interfaces having user-provided ICV length
      values that are not supported by the cipher suite.
      ====================
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07a01697
    • macsec: validate ICV length on link creation · f04c392d
      Davide Caratti committed
      Test the cipher suite initialization in case ICV length has a value
      different than its default. If this test fails, creation of a new macsec
      link will also fail. This avoids situations where further security
      associations can't be added due to failures of crypto_aead_setauthsize(),
      caused by unsupported user-provided values of the ICV length.
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f04c392d
    • macsec: fix error codes when a SA is created · 34aedfee
      Davide Caratti committed
      Preserve the return value of AEAD functions that are called when a SA is
      created, to avoid inappropriate display of "RTNETLINK answers: Cannot
      allocate memory" message.
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34aedfee
    • macsec: limit ICV length to 16 octets · 2ccbe2cb
      Davide Caratti committed
      IEEE 802.1AE-2006 standard recommends that the ICV element in a MACsec
      frame should not exceed 16 octets: add MACSEC_STD_ICV_LEN in uapi
      definitions accordingly, and avoid accepting configurations where the ICV
      length exceeds the standard value. Leave definition of MACSEC_MAX_ICV_LEN
      unchanged for backwards compatibility with userspace programs.
      
      Fixes: dece8d2b ("uapi: add MACsec bits")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2ccbe2cb
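The length limit described in the commit above amounts to an upper-bound check at configuration time. The following is a minimal sketch, not the macsec driver code: `validate_icv_len_model` is a hypothetical name, the value 32 for `MACSEC_MAX_ICV_LEN` is an assumption about the unchanged legacy definition, and only the bound of 16 octets is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>

#define MACSEC_STD_ICV_LEN 16  /* limit named in the commit message */
#define MACSEC_MAX_ICV_LEN 32  /* assumed value of the legacy definition */

/* C11 compile-time check: the standard limit fits in the legacy maximum. */
static_assert(MACSEC_STD_ICV_LEN <= MACSEC_MAX_ICV_LEN,
	      "standard ICV length must fit within the legacy maximum");

/* Returns 0 if the length is acceptable, -EINVAL if it exceeds 16 octets. */
static int validate_icv_len_model(unsigned int icv_len)
{
	if (icv_len > MACSEC_STD_ICV_LEN)
		return -EINVAL;
	return 0;
}
```

Keeping the larger legacy constant while validating against the smaller one lets existing user-space programs compile unchanged while new configurations above 16 octets are rejected.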
    • bridge: Fix incorrect re-injection of LLDP packets · baedbe55
      Ido Schimmel committed
      Commit 8626c56c ("bridge: fix potential use-after-free when hook
      returns QUEUE or STOLEN verdict") caused LLDP packets arriving through a
      bridge port to be re-injected to the Rx path with skb->dev set to the
      bridge device, but this breaks the lldpad daemon.
      
      The lldpad daemon opens a packet socket with protocol set to ETH_P_LLDP
      for any valid device on the system, which does not include soft
      devices such as bridge and VLAN.
      
      Since packet sockets (ptype_base) are processed in the Rx path after the
      Rx handler, LLDP packets with skb->dev set to the bridge device never
      reach the lldpad daemon.
      
      Fix this by making the bridge's Rx handler re-inject LLDP packets with
      RX_HANDLER_PASS, which effectively restores the behaviour prior to the
      mentioned commit.
      
      This means netfilter will never receive LLDP packets coming through a
      bridge port, as I don't see a way in which we can have okfn() consume
      the packet without breaking existing behaviour. I've already carried out
      a similar fix for STP packets in commit 56fae404 ("bridge: Fix
      incorrect re-injection of STP packets").
      
      Fixes: 8626c56c ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      baedbe55
    • sctp: support ipv6 nonlocal bind · 9b974202
      Xin Long committed
      This patch makes sctp support ipv6 nonlocal bind by adding
      sp->inet.freebind and net->ipv6.sysctl.ip_nonlocal_bind
      checks in sctp_v6_available, as sctp did to support
      ipv4 nonlocal bind (commit cdac4e07).
      Reported-by: Shijoe George <spanjikk@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9b974202
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · da54bb13
      David S. Miller committed
      Conflicts:
      	drivers/net/ethernet/intel/i40e/i40e_main.c
      
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2016-07-22
      
      This series contains updates to i40e and i40evf.
      
      Heinrich Schuchardt found a possible null pointer being dereferenced in
      i40e_debug_aq(), fixed the issue by doing the variable assignment after
      we are sure the pointer is not null.
      
      Avinash fixed an issue where, when the link was down, we were not
      showing the correct advertised link modes.
      
      Mitch cleans up a useless initializer since the variable is assigned
      right away.  Refactors the receive filter handling to properly track
      filter adds and deletes so the driver will not lose filters during a
      reset and up/down cycles.  Also added a tracking mechanism so that the
      driver knows when to enter and leave promiscuous mode.
      
      Catherine removes a device id which is not needed (or used).  Moves
      a mutex lock since we need to lock the client list around the
      i40e_client_release() call to prevent the release from interrupting
      the client instances while they are being added.
      
      Joshua adds Hyper-V specific VF device ids.
      
      Amitoj Kaur Chawla cleans up a redundant memset() call before a memcpy().
      
      Stefan Assmann adds the missing link advertise for some x710 NICs.
      
      Tushar Dave fixes an issue found on SPARC, where a PF reset clears MAC
      filters and if a platform-specific MAC address is used, the driver has
      to explicitly write default MAC address to MAC filters otherwise all
      incoming traffic destined to the default MAC address will be dropped
      after reset.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da54bb13
    • bpf, events: fix offset in skb copy handler · aa7145c1
      Daniel Borkmann committed
      This patch fixes the __output_custom() routine we currently use with
      bpf_skb_copy(). I missed that when len is larger than the size of the
      current handle, we can issue multiple invocations of copy_func, and
      __output_custom() advances destination but also source buffer by the
      written amount of bytes. When we have __output_custom(), this is actually
      wrong since in that case the source buffer points to a non-linear object,
      in our case an skb, which the copy_func helper is supposed to walk.
      Therefore, since this is non-linear we thus need to pass the offset into
      the helper, so that copy_func can use it for extracting the data from
      the source object.
      
      Therefore, adjust the callback signatures properly and pass offset
      into the skb_header_pointer() invoked from bpf_skb_copy() callback. The
      __DEFINE_OUTPUT_COPY_BODY() is adjusted to accommodate for two things:
      i) to pass in whether we should advance source buffer or not; this is
      a compile-time constant condition, ii) to pass in the offset for
      __output_custom(), which we do with help of __VA_ARGS__, so everything
      can stay inlined as is currently. Both changes allow for adapting the
      __output_* fast-path helpers w/o extra overhead.
      
      Fixes: 555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
      Fixes: 7e3f977e ("perf, events: add non-linear data support for raw records")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa7145c1
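The offset handling described in the commit above can be sketched in plain userspace C. This is a minimal illustration, not the kernel code: `struct frag_obj`, `frag_copy()` and `output_custom()` are hypothetical names standing in for a non-linear skb, the copy callback, and the chunked output routine. The point it shows is that the source pointer stays fixed while a running offset is handed to the callback on every invocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a non-linear object such as an skb:
 * the payload is split across two separate fragments. */
struct frag_obj {
	const char *frag[2];
	size_t frag_len[2];
};

/* Callback type: copy len bytes starting at byte offset off of the object. */
typedef void (*copy_fn)(void *dst, const void *src, size_t off, size_t len);

/* Walk the fragments to locate byte off, then copy up to len bytes. */
static void frag_copy(void *dst, const void *src, size_t off, size_t len)
{
	const struct frag_obj *obj = src;
	char *out = dst;
	size_t i;

	for (i = 0; i < 2 && len > 0; i++) {
		size_t avail, n;

		if (off >= obj->frag_len[i]) {
			/* Requested offset lies beyond this fragment. */
			off -= obj->frag_len[i];
			continue;
		}
		avail = obj->frag_len[i] - off;
		n = len < avail ? len : avail;
		memcpy(out, obj->frag[i] + off, n);
		out += n;
		len -= n;
		off = 0; /* later fragments are read from their start */
	}
}

/* Emit len bytes in pieces of at most chunk bytes, as if each piece were a
 * separate output handle. Only dst and off advance; src is never moved,
 * because the callback needs the whole object to walk it. */
static void output_custom(char *dst, const void *src, size_t len,
			  size_t chunk, copy_fn copy)
{
	size_t off = 0;

	assert(chunk > 0);
	while (len > 0) {
		size_t n = len < chunk ? len : chunk;

		copy(dst, src, off, n);
		dst += n;
		off += n;
		len -= n;
	}
}
```

Advancing `src` by the bytes written, as one would for a flat buffer, would make the second invocation start from an address inside the object descriptor rather than from the right payload byte.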
    • A
      net/ncsi: avoid maybe-uninitialized warning · a1b43edd
      Arnd Bergmann committed
      gcc-4.9 and higher warn about the newly added NCSI code:
      
      net/ncsi/ncsi-manage.c: In function 'ncsi_process_next_channel':
      net/ncsi/ncsi-manage.c:1003:2: error: 'old_state' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      The warning is a false positive and therefore harmless, but it would be
      good to avoid it anyway. I have determined that the barrier in
      spin_unlock_irqrestore() is what confuses gcc to the point that it cannot
      track whether the variable was initialized or not.
      
      This rearranges the code in a way that makes it obvious to gcc that old_state
      is always initialized at the time of use. Functionally, this should not
      change anything.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1b43edd
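The kind of rearrangement described in the commit above can be sketched as follows. This is a hedged userspace illustration, not the actual ncsi-manage.c change: `struct channel`, `lock()`, `unlock()` and `take_channel()` are hypothetical names, and the lock helpers are empty stand-ins. The idea is to leave early on the "nothing to do" path, so the assignment and the later use of `old_state` sit on one straight-line path that the compiler can follow across the unlock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel with a single state field. */
struct channel {
	int state;
};

/* Empty stand-ins for the lock/unlock pair in the real code. */
static void lock(void) { }
static void unlock(void) { }

/* Swap in new_state and return the previous state, or -1 when there is no
 * channel. Because the NULL case returns before old_state is ever read,
 * every path that reaches the final return has assigned old_state. */
static int take_channel(struct channel *ch, int new_state)
{
	int old_state;

	assert(new_state >= 0); /* -1 is reserved for "no channel" */

	lock();
	if (!ch) {
		unlock();
		return -1;
	}
	old_state = ch->state;
	ch->state = new_state;
	unlock();

	return old_state;
}
```

The alternative shape, where `old_state` is assigned under `if (ch)` before the unlock and read under a second `if (ch)` after it, is equally correct but is the pattern that tends to trigger -Wmaybe-uninitialized.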
    • D
      Merge branch 'libcxgb' · 974b9963
      David S. Miller committed
      Varun Prakash says:
      
      ====================
      Common library for Chelsio drivers.
      
       This patch series adds a common library module (libcxgb.ko)
       for Chelsio drivers to remove duplicate code.
      
       This series moves the common iSCSI DDP Page Pod manager
       code from cxgb4.ko to libcxgb.ko. Earlier this code
       was used only by cxgbit.ko; now it is used by
       three Chelsio iSCSI drivers: cxgb3i, cxgb4i, cxgbit.
      
       In the future this module will have common connection
       management and hardware-specific code that can
       be shared by multiple Chelsio drivers (cxgb4,
       csiostor, iw_cxgb4, cxgb4i, cxgbit).
      
       Please review.
      
       Thanks
      
      -v3
      - removed unused module init and exit functions.
      
      -v2
      - updated CONFIG_CHELSIO_LIB to an invisible option
      - changed libcxgb.ko module license from GPL to Dual BSD/GPL
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      974b9963
    • V
      cxgb3i, cxgb4i: fix symbol not declared sparse warning · 4665bdd5
      Varun Prakash committed
      Fix the following sparse warnings:
      warning: symbol 'cxgb3i_ofld_init' was not declared. Should it be static?
      warning: symbol 'cxgb4i_cplhandlers' was not declared. Should it be static?
      warning: symbol 'cxgb4i_ofld_init' was not declared. Should it be static?
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4665bdd5
    • V
      libcxgb: export ppm release and tagmask set api · 9d5c44b7
      Varun Prakash committed
      Export cxgbi_ppm_release() to release the
      ppod manager and cxgbi_tagmask_set() to
      set the tag mask. Both are used by cxgb3i, cxgb4i
      and cxgbit.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d5c44b7
    • V
      cxgb3i: add iSCSI DDP support · b75113b1
      Varun Prakash committed
      Add iSCSI DDP support to the cxgb3i driver
      using the common iSCSI DDP Page Pod Manager.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b75113b1
    • V
      cxgb4i,libcxgbi: add iSCSI DDP support · 71f7a00b
      Varun Prakash committed
      Add iSCSI DDP support to the cxgb4i driver
      using the common iSCSI DDP Page Pod Manager.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      71f7a00b
    • V
      cxgb3i,cxgb4i,libcxgbi: remove iSCSI DDP support · 5999299f
      Varun Prakash committed
      Remove the old DDP code from cxgb3i, cxgb4i and libcxgbi.
      
      The next two commits add DDP support using
      the common iSCSI DDP Page Pod Manager.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5999299f
    • V
      libcxgb: add library module for Chelsio drivers · b8b9d81b
      Varun Prakash committed
      Add a common library module (libcxgb.ko) for
      Chelsio drivers to remove duplicate code.
      
      Code for the iSCSI DDP Page Pod Manager is moved
      from cxgb4.ko to libcxgb.ko. Earlier only cxgbit.ko
      was using this code; now cxgb3i and cxgb4i will
      also use the common Page Pod manager code.
      
      In the future this module will have common connection
      management and hardware-specific code that can be
      shared by multiple Chelsio drivers.
      Signed-off-by: Varun Prakash <varun@chelsio.com>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8b9d81b
    • V
      net: bridge: br_set_ageing_time takes a clock_t · 9e0b27fe
      Vivien Didelot committed
      Change the ageing_time type in br_set_ageing_time() from u32 to what it
      is expected to be, i.e. a clock_t.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e0b27fe
    • V
      net: bridge: fix br_stp_enable_bridge comment · dba479f3
      Vivien Didelot committed
      br_stp_enable_bridge() does take the br->lock spinlock. Fix its wrongly
      pasted comment and use the same as br_stp_disable_bridge().
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dba479f3
    • G
      cxgb4/cxgb4vf: Add link mode mask API to cxgb4 and cxgb4vf · eb97ad99
      Ganesh Goudar committed
      Based on original work by Casey Leedom <leedom@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb97ad99
    • M
      net/bonding: Enforce active-backup policy for IPoIB bonds · 1533e773
      Mark Bloch committed
      When using an IPoIB bond, currently only active-backup mode is a valid
      use case, and this commit enforces it.
      
      Since commit 2ab82852 ("net/bonding: Enable bonding to enslave
      netdevices not supporting set_mac_address()") was introduced and until
      4.7-rc1, IPoIB didn't support the set_mac_address ndo, and hence the
      fail over mac policy always applied to IPoIB bonds.
      
      With the introduction of commit 492a7e67 ("IB/IPoIB: Allow setting
      the device address"), that no longer holds, and in practice IPoIB bonds
      have been broken since then. To fix it, let's go to fail over mac if the
      device doesn't support the ndo OR this is an IPoIB device.
      
      As a by-product, this commit also prevents a stack corruption which
      occurred when trying to copy 20 bytes (IPoIB) device address
      to a sockaddr struct that has only 16 bytes of storage.
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1533e773
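The size mismatch behind the stack corruption mentioned in the commit above can be illustrated in userspace. This is only a sketch and not what the commit does (the commit avoids that code path for IPoIB devices): `set_hw_addr()` is a hypothetical helper, and the 14-byte `sa_data` capacity inside a 16-byte `struct sockaddr` is assumed from the common Linux layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Copy a hardware address into a sockaddr only when it fits in sa_data.
 * Returns 0 on success, -1 when the address would overflow the struct. */
static int set_hw_addr(struct sockaddr *sa, const unsigned char *addr,
		       size_t len)
{
	assert(sa != NULL && addr != NULL);

	if (len > sizeof(sa->sa_data))
		return -1; /* e.g. a 20-byte IPoIB address */

	memcpy(sa->sa_data, addr, len);
	return 0;
}
```

With this check, a 6-byte Ethernet address is accepted while a 20-byte IPoIB address is rejected instead of being written past the end of a `struct sockaddr` that lives on the stack.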