1. 13 8月, 2019 6 次提交
  2. 12 8月, 2019 8 次提交
    • N
      net: tc35815: Explicitly check NET_IP_ALIGN is not zero in tc35815_rx · 125b7e09
      Nathan Chancellor 提交于
      clang warns:
      
      drivers/net/ethernet/toshiba/tc35815.c:1507:30: warning: use of logical
      '&&' with constant operand [-Wconstant-logical-operand]
                              if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
                                                        ^  ~~~~~~~~~~~~
      drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: use '&' for a
      bitwise operation
                              if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
                                                        ^~
                                                        &
      drivers/net/ethernet/toshiba/tc35815.c:1507:30: note: remove constant to
      silence this warning
                              if (!HAVE_DMA_RXALIGN(lp) && NET_IP_ALIGN)
                                                       ~^~~~~~~~~~~~~~~
      1 warning generated.
      
      Explicitly check that NET_IP_ALIGN is not zero, which matches how this
      is checked in other parts of the tree. Because NET_IP_ALIGN is a build
      time constant, this check will be constant folded away during
      optimization.
      
      Fixes: 82a9928d ("tc35815: Enable StripCRC feature")
      Link: https://github.com/ClangBuiltLinux/linux/issues/608Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      125b7e09
    • C
      tipc: initialise addr_trail_end when setting node addresses · 8874ecae
      Chris Packham 提交于
      We set the field 'addr_trial_end' to 'jiffies', instead of the current
      value 0, at the moment the node address is initialized. This guarantees
      we don't inadvertently enter an address trial period when the node
      address is explicitly set by the user.
      Signed-off-by: NChris Packham <chris.packham@alliedtelesis.co.nz>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8874ecae
    • C
      net: dsa: Check existence of .port_mdb_add callback before calling it · 58799865
      Chen-Yu Tsai 提交于
      The dsa framework has optional .port_mdb_{prepare,add,del} callback fields
      for drivers to handle multicast database entries. When adding an entry, the
      framework goes through a prepare phase, then a commit phase. Drivers not
      providing these callbacks should be detected in the prepare phase.
      
      DSA core may still bypass the bridge layer and call the dsa_port_mdb_add
      function directly with no prepare phase or no switchdev trans object,
      and the framework ends up calling an undefined .port_mdb_add callback.
      This results in a NULL pointer dereference, as shown in the log below.
      
      The other functions seem to be properly guarded. Do the same for
      .port_mdb_add in dsa_switch_mdb_add_bitmap() as well.
      
          8<--- cut here ---
          Unable to handle kernel NULL pointer dereference at virtual address 00000000
          pgd = (ptrval)
          [00000000] *pgd=00000000
          Internal error: Oops: 80000005 [#1] SMP ARM
          Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211
          CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1
          Hardware name: Allwinner sun7i (A20) Family
          Workqueue: events switchdev_deferred_process_work
          PC is at 0x0
          LR is at dsa_switch_event+0x570/0x620
          pc : [<00000000>]    lr : [<c08533ec>]    psr: 80070013
          sp : ee871db8  ip : 00000000  fp : ee98d0a4
          r10: 0000000c  r9 : 00000008  r8 : ee89f710
          r7 : ee98d040  r6 : ee98d088  r5 : c0f04c48  r4 : ee98d04c
          r3 : 00000000  r2 : ee89f710  r1 : 00000008  r0 : ee98d040
          Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
          Control: 10c5387d  Table: 6deb406a  DAC: 00000051
          Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval))
          Stack: (0xee871db8 to 0xee872000)
          1da0:                                                       ee871e14 103ace2d
          1dc0: 00000000 ffffffff 00000000 ee871e14 00000005 00000000 c08524a0 00000000
          1de0: ffffe000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0
          1e00: 00000000 b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 00000000
          1e20: 00000008 103ace2d 00000000 c087e248 ee29c868 103ace2d 00000001 ffffffff
          1e40: 00000000 ee871e98 00000006 00000000 c0fb2a50 c087e2d0 ffffffff c08523c4
          1e60: ffffffff c014bdfc 00000006 c0fad2d0 ee871e98 ee89f710 00000000 c014c500
          1e80: 00000000 ee89f3c0 c0f04c48 00000000 ee9e5000 c087dfb4 ee9e5000 00000000
          1ea0: ee89f710 ee871ecb 00000001 103ace2d 00000000 c0f04c48 00000000 c087e0a8
          1ec0: 00000000 efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 00000122
          1ee0: 00000100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec
          1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 00000000 c087def8 c0fad2ec c01447dc
          1f20: ef315640 ef7a62c0 00000008 ee839580 ee839594 ef7a62c0 00000008 c0f03d00
          1f40: ef7a62d8 ef7a62c0 ffffe000 c0145b84 ffffe000 c0fb2420 c0bfaa8c 00000000
          1f60: ffffe000 ee84b600 ee84b5c0 00000000 ee870000 ee839580 c0145b40 ef0e5ea4
          1f80: ee84b61c c014a6f8 00000001 ee84b5c0 c014a5b0 00000000 00000000 00000000
          1fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
          1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
          [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          [<c014bdfc>] (notifier_call_chain) from [<c014bef0>] (raw_notifier_call_chain+0x18/0x20)
          [<c014bef0>] (raw_notifier_call_chain) from [<c08509a8>] (dsa_port_mdb_add+0x48/0x74)
          [<c08509a8>] (dsa_port_mdb_add) from [<c087e248>] (__switchdev_handle_port_obj_add+0x54/0xd4)
          [<c087e248>] (__switchdev_handle_port_obj_add) from [<c087e2d0>] (switchdev_handle_port_obj_add+0x8/0x14)
          [<c087e2d0>] (switchdev_handle_port_obj_add) from [<c08523c4>] (dsa_slave_switchdev_blocking_event+0x94/0xa4)
          [<c08523c4>] (dsa_slave_switchdev_blocking_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          [<c014bdfc>] (notifier_call_chain) from [<c014c500>] (blocking_notifier_call_chain+0x50/0x68)
          [<c014c500>] (blocking_notifier_call_chain) from [<c087dfb4>] (switchdev_port_obj_notify+0x44/0xa8)
          [<c087dfb4>] (switchdev_port_obj_notify) from [<c087e0a8>] (switchdev_port_obj_add_now+0x90/0x104)
          [<c087e0a8>] (switchdev_port_obj_add_now) from [<c087e130>] (switchdev_port_obj_add_deferred+0x14/0x5c)
          [<c087e130>] (switchdev_port_obj_add_deferred) from [<c087de4c>] (switchdev_deferred_process+0x64/0x104)
          [<c087de4c>] (switchdev_deferred_process) from [<c087def8>] (switchdev_deferred_process_work+0xc/0x14)
          [<c087def8>] (switchdev_deferred_process_work) from [<c01447dc>] (process_one_work+0x218/0x50c)
          [<c01447dc>] (process_one_work) from [<c0145b84>] (worker_thread+0x44/0x5bc)
          [<c0145b84>] (worker_thread) from [<c014a6f8>] (kthread+0x148/0x150)
          [<c014a6f8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
          Exception stack(0xee871fb0 to 0xee871ff8)
          1fa0:                                     00000000 00000000 00000000 00000000
          1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
          Code: bad PC value
          ---[ end trace 1292c61abd17b130 ]---
      
          [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          corresponds to
      
      	$ arm-linux-gnueabihf-addr2line -C -i -e vmlinux c08533ec
      
      	linux/net/dsa/switch.c:156
      	linux/net/dsa/switch.c:178
      	linux/net/dsa/switch.c:328
      
      Fixes: e6db98db ("net: dsa: add switch mdb bitmap functions")
      Signed-off-by: NChen-Yu Tsai <wens@csie.org>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58799865
    • P
      mlxsw: spectrum_ptp: Keep unmatched entries in a linked list · 8028ccda
      Petr Machata 提交于
      To identify timestamps for matching with their packets, Spectrum-1 uses a
      five-tuple of (port, direction, domain number, message type, sequence ID).
      If there are several clients from the same domain behind a single port
      sending Delay_Req's, the only thing differentiating these packets, as far
      as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between
      individual clients be similar, conflicts may arise. That is not a problem
      to hardware, which will simply deliver timestamps on a first comes, first
      served basis.
      
      However the driver uses a simple hash table to store the unmatched pieces.
      When a new conflicting piece arrives, it pushes out the previously stored
      one, which if it is a packet, is delivered without timestamp. Later on as
      the corresponding timestamps arrive, the first one is mismatched to the
      second packet, and the second one is never matched and eventually is GCd.
      
      To correct this issue, instead of using a simple rhashtable, use rhltable
      to keep the unmatched entries.
      
      Previously, a found unmatched entry would always be removed from the hash
      table. That is not the case anymore--an incompatible entry is left in the
      hash table. Therefore removal from the hash table cannot be used to confirm
      the validity of the looked-up pointer, instead the lookup would simply need
      to be redone. Therefore move it inside the critical section. This
      simplifies a lot of the code.
      
      Fixes: 87486427 ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
      Reported-by: NAlex Veber <alexve@mellanox.com>
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8028ccda
    • J
      net: nps_enet: Fix function names in doc comments · d81f4141
      Jonathan Neuschäfer 提交于
      Adjust the function names in two doc comments to match the corresponding
      functions.
      Signed-off-by: NJonathan Neuschäfer <j.neuschaefer@gmx.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d81f4141
    • D
      rxrpc: Fix local refcounting · 68553f1a
      David Howells 提交于
      Fix rxrpc_unuse_local() to handle a NULL local pointer as it can be called
      on an unbound socket on which rx->local is not yet set.
      
      The following reproduced (includes omitted):
      
      	int main(void)
      	{
      		socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
      		return 0;
      	}
      
      causes the following oops to occur:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000010
      	...
      	RIP: 0010:rxrpc_unuse_local+0x8/0x1b
      	...
      	Call Trace:
      	 rxrpc_release+0x2b5/0x338
      	 __sock_release+0x37/0xa1
      	 sock_close+0x14/0x17
      	 __fput+0x115/0x1e9
      	 task_work_run+0x72/0x98
      	 do_exit+0x51b/0xa7a
      	 ? __context_tracking_exit+0x4e/0x10e
      	 do_group_exit+0xab/0xab
      	 __x64_sys_exit_group+0x14/0x17
      	 do_syscall_64+0x89/0x1d4
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Reported-by: syzbot+20dee719a2e090427b5f@syzkaller.appspotmail.com
      Fixes: 730c5fd4 ("rxrpc: Fix local endpoint refcounting")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68553f1a
    • D
      netdevsim: Restore per-network namespace accounting for fib entries · 59c84b9f
      David Ahern 提交于
      Prior to the commit in the fixes tag, the resource controller in netdevsim
      tracked fib entries and rules per network namespace. Restore that behavior.
      
      Fixes: 5fc49422 ("netdevsim: create devlink instance per netdevsim instance")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59c84b9f
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 9481382b
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-08-11
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) x64 JIT code generation fix for backward-jumps to 1st insn, from Alexei.
      
      2) Fix buggy multi-closing of BTF file descriptor in libbpf, from Andrii.
      
      3) Fix libbpf_num_possible_cpus() to make it thread safe, from Takshak.
      
      4) Fix bpftool to dump an error if pinning fails, from Jakub.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9481382b
  3. 10 8月, 2019 8 次提交
    • J
      net/tls: swap sk_write_space on close · 57c722e9
      Jakub Kicinski 提交于
      Now that we swap the original proto and clear the ULP pointer
      on close we have to make sure no callback will try to access
      the freed state. sk_write_space is not part of sk_prot, remember
      to swap it.
      
      Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com
      Fixes: 95fa1454 ("bpf: sockmap/tls, close can race with map free")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57c722e9
    • D
      hv_netvsc: Fix a warning of suspicious RCU usage · 6d0d779d
      Dexuan Cui 提交于
      This fixes a warning of "suspicious rcu_dereference_check() usage"
      when nload runs.
      
      Fixes: 776e726b ("netvsc: fix RCU warning in get_stats")
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d0d779d
    • D
      Merge tag 'mlx5-fixes-2019-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 9566e650
      David S. Miller 提交于
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2019-08-08
      
      This series introduces some fixes to mlx5 driver.
      
      Highlights:
      1) From Tariq, Critical mlx5 kTLS fixes to better align with hw specs.
      2) From Aya, Fixes to mlx5 tx devlink health reporter.
      3) From Maxim, aRFs parsing to use flow dissector to avoid relying on
      invalid skb fields.
      
      Please pull and let me know if there is any problem.
      
      For -stable v4.3
       ('net/mlx5e: Only support tx/rx pause setting for port owner')
      For -stable v4.9
       ('net/mlx5e: Use flow keys dissector to parse packets for ARFS')
      For -stable v5.1
       ('net/mlx5e: Fix false negative indication on tx reporter CQE recovery')
       ('net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter')
       ('net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off')
      
      Note: when merged with net-next this minor conflict will pop up:
      ++<<<<<<< (net-next)
       +      if (is_eswitch_flow) {
       +              flow->esw_attr->match_level = match_level;
       +              flow->esw_attr->tunnel_match_level = tunnel_match_level;
      ++=======
      +       if (flow->flags & MLX5E_TC_FLOW_ESWITCH) {
      +               flow->esw_attr->inner_match_level = inner_match_level;
      +               flow->esw_attr->outer_match_level = outer_match_level;
      ++>>>>>>> (net)
      
      To resolve, use hunks from net (2nd) and replace:
      if (flow->flags & MLX5E_TC_FLOW_ESWITCH)
      with
      if (is_eswitch_flow)
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9566e650
    • T
      ixgbe: fix possible deadlock in ixgbe_service_task() · 8b638160
      Taehee Yoo 提交于
      ixgbe_service_task() calls unregister_netdev() under rtnl_lock().
      But unregister_netdev() internally calls rtnl_lock().
      So deadlock would occur.
      
      Fixes: 59dd45d5 ("ixgbe: firmware recovery mode")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b638160
    • D
      Merge branch 'Fix-collisions-in-socket-cookie-generation' · 703acf62
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      Fix collisions in socket cookie generation
      
      This change makes the socket cookie generator as a global counter
      instead of per netns in order to fix cookie collisions for BPF use
      cases we ran into. See main patch #1 for more details.
      
      Given the change is small/trivial and fixes an issue we're seeing
      my preference would be net tree (though it cleanly applies to
      net-next as well). Went for net tree instead of bpf tree here given
      the main change is in net/core/sock_diag.c, but either way would be
      fine with me.
      
      v1 -> v2:
        - Fix up commit description in patch #1, thanks Eric!
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      703acf62
    • D
      bpf: sync bpf.h to tools infrastructure · 609a2ca5
      Daniel Borkmann 提交于
      Pull in updates in BPF helper function description.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      609a2ca5
    • D
      sock: make cookie generation global instead of per netns · cd48bdda
      Daniel Borkmann 提交于
      Generating and retrieving socket cookies are a useful feature that is
      exposed to BPF for various program types through bpf_get_socket_cookie()
      helper.
      
      The fact that the cookie counter is per netns is quite a limitation
      for BPF in practice in particular for programs in host namespace that
      use socket cookies as part of a map lookup key since they will be
      causing socket cookie collisions e.g. when attached to BPF cgroup hooks
      or cls_bpf on tc egress in host namespace handling container traffic
      from veth or ipvlan devices with peer in different netns. Change the
      counter to be global instead.
      
      Socket cookie consumers must assume the value as opqaue in any case.
      Not every socket must have a cookie generated and knowledge of the
      counter value itself does not provide much value either way hence
      conversion to global is fine.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Martynas Pumputis <m@lambda.lt>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd48bdda
    • D
      Merge tag 'rxrpc-fixes-20190809' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 7bac762d
      David S. Miller 提交于
      David Howells says:
      
      ====================
      Here's a couple of fixes for rxrpc:
      
       (1) Fix refcounting of the local endpoint.
      
       (2) Don't calculate or report packet skew information.  This has been
           obsolete since AFS 3.1 and so is a waste of resources.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7bac762d
  4. 09 8月, 2019 18 次提交
    • D
      Merge branch 'bpf-bpftool-pinning-error-msg' · 4f7aafd7
      Daniel Borkmann 提交于
      Jakub Kicinski says:
      
      ====================
      First make sure we don't use "prog" in error messages because
      the pinning operation could be performed on a map. Second add
      back missing error message if pin syscall failed.
      ====================
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      4f7aafd7
    • J
      tools: bpftool: add error message on pin failure · 3c7be384
      Jakub Kicinski 提交于
      No error message is currently printed if the pin syscall
      itself fails. It got lost in the loadall refactoring.
      
      Fixes: 77380998 ("bpftool: add loadall command")
      Reported-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      3c7be384
    • J
      tools: bpftool: fix error message (prog -> object) · b3e78adc
      Jakub Kicinski 提交于
      Change an error message to work for any object being
      pinned not just programs.
      
      Fixes: 71bb428f ("tools: bpf: add bpftool")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      b3e78adc
    • D
      rxrpc: Don't bother generating maxSkew in the ACK packet · e8c3af6b
      David Howells 提交于
      Don't bother generating maxSkew in the ACK packet as it has been obsolete
      since AFS 3.1.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NJeffrey Altman <jaltman@auristor.com>
      e8c3af6b
    • D
      rxrpc: Fix local endpoint refcounting · 730c5fd4
      David Howells 提交于
      The object lifetime management on the rxrpc_local struct is broken in that
      the rxrpc_local_processor() function is expected to clean up and remove an
      object - but it may get requeued by packets coming in on the backing UDP
      socket once it starts running.
      
      This may result in the assertion in rxrpc_local_rcu() firing because the
      memory has been scheduled for RCU destruction whilst still queued:
      
      	rxrpc: Assertion failed
      	------------[ cut here ]------------
      	kernel BUG at net/rxrpc/local_object.c:468!
      
      Note that if the processor comes around before the RCU free function, it
      will just do nothing because ->dead is true.
      
      Fix this by adding a separate refcount to count active users of the
      endpoint that causes the endpoint to be destroyed when it reaches 0.
      
      The original refcount can then be used to refcount objects through the work
      processor and cause the memory to be rcu freed when that reaches 0.
      
      Fixes: 4f95dd78 ("rxrpc: Rework local endpoint management")
      Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      730c5fd4
    • F
      net: tundra: tsi108: use spin_lock_irqsave instead of spin_lock_irq in IRQ context · 8c25d088
      Fuqian Huang 提交于
      As spin_unlock_irq will enable interrupts.
      Function tsi108_stat_carry is called from interrupt handler tsi108_irq.
      Interrupts are enabled in interrupt handler.
      Use spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq
      in IRQ context to avoid this.
      Signed-off-by: NFuqian Huang <huangfq.daxian@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c25d088
    • Y
      team: Add vlan tx offload to hw_enc_features · 227f2f03
      YueHaibing 提交于
      We should also enable team's vlan tx offload in hw_enc_features,
      pass the vlan packets to the slave devices with vlan tci, let the
      slave handle vlan tunneling offload implementation.
      
      Fixes: 3268e5cb ("team: Advertise tunneling offload features")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      227f2f03
    • J
      net/tls: prevent skb_orphan() from leaking TLS plain text with offload · 41477662
      Jakub Kicinski 提交于
      sk_validate_xmit_skb() and drivers depend on the sk member of
      struct sk_buff to identify segments requiring encryption.
      Any operation which removes or does not preserve the original TLS
      socket such as skb_orphan() or skb_clone() will cause clear text
      leaks.
      
      Make the TCP socket underlying an offloaded TLS connection
      mark all skbs as decrypted, if TLS TX is in offload mode.
      Then in sk_validate_xmit_skb() catch skbs which have no socket
      (or a socket with no validation) and decrypted flag set.
      
      Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
      sk->sk_validate_xmit_skb are slightly interchangeable right now,
      they all imply TLS offload. The new checks are guarded by
      CONFIG_TLS_DEVICE because that's the option guarding the
      sk_buff->decrypted member.
      
      Second, smaller issue with orphaning is that it breaks
      the guarantee that packets will be delivered to device
      queues in-order. All TLS offload drivers depend on that
      scheduling property. This means skb_orphan_partial()'s
      trick of preserving partial socket references will cause
      issues in the drivers. We need a full orphan, and as a
      result netem delay/throttling will cause all TLS offload
      skbs to be dropped.
      
      Reusing the sk_buff->decrypted flag also protects from
      leaking clear text when incoming, decrypted skb is redirected
      (e.g. by TC).
      
      See commit 0608c69c ("bpf: sk_msg, sock{map|hash} redirect
      through ULP") for justification why the internal flag is safe.
      The only location which could leak the flag in is tcp_bpf_sendmsg(),
      which is taken care of by clearing the previously unused bit.
      
      v2:
       - remove superfluous decrypted mark copy (Willem);
       - remove the stale doc entry (Boris);
       - rely entirely on EOR marking to prevent coalescing (Boris);
       - use an internal sendpages flag instead of marking the socket
         (Boris).
      v3 (Willem):
       - reorganize the can_skb_orphan_partial() condition;
       - fix the flag leak-in through tcp_bpf_sendmsg.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41477662
    • D
      Merge branch 'skbedit-batch-fixes' · 0de94de1
      David S. Miller 提交于
      Roman Mashak says:
      
      ====================
      Fix batched event generation for skbedit action
      
      When adding or deleting a batch of entries, the kernel sends up to
      TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
      space. However it does not consider that the action sizes may vary and
      require different skb sizes.
      
      For example, consider the following script adding 32 entries with all
      supported skbedit parameters and cookie (in order to maximize netlink
      messages size):
      
      % cat tc-batch.sh
      TC="sudo /mnt/iproute2.git/tc/tc"
      
      $TC actions flush action skbedit
      for i in `seq 1 $1`;
      do
         cmd="action skbedit queue_mapping 2 priority 10 mark 7/0xaabbccdd \
                     ptype host inheritdsfield \
                     index $i cookie aabbccddeeff112233445566778800a1 "
         args=$args$cmd
      done
      $TC actions add $args
      %
      % ./tc-batch.sh 32
      Error: Failed to fill netlink attributes while adding TC action.
      We have an error talking to the kernel
      %
      
      patch 1 adds callback in tc_action_ops of skbedit action, which calculates
      the action size, and passes size to tcf_add_notify()/tcf_del_notify().
      
      patch 2 updates the TDC test suite with relevant skbedit test cases.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0de94de1
    • R
      tc-testing: updated skbedit action tests with batch create/delete · 7bc16184
      Roman Mashak 提交于
      Update TDC tests with cases varifying ability of TC to install or delete
      batches of skbedit actions.
      Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7bc16184
    • R
      net sched: update skbedit action for batched events operations · e1fea322
      Roman Mashak 提交于
      Add get_fill_size() routine used to calculate the action size
      when building a batch of events.
      
      Fixes: ca9b0e27 ("pkt_action: add new action skbedit")
      Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1fea322
    • Y
      net: dsa: sja1105: remove set but not used variables 'tx_vid' and 'rx_vid' · e3e3af9a
      YueHaibing 提交于
      Fixes gcc '-Wunused-but-set-variable' warning:
      
      drivers/net/dsa/sja1105/sja1105_main.c: In function sja1105_fdb_dump:
      drivers/net/dsa/sja1105/sja1105_main.c:1226:14: warning:
       variable tx_vid set but not used [-Wunused-but-set-variable]
      drivers/net/dsa/sja1105/sja1105_main.c:1226:6: warning:
       variable rx_vid set but not used [-Wunused-but-set-variable]
      
      They are not used since commit 6d7c7d94 ("net: dsa:
      sja1105: Fix broken learning with vlan_filtering disabled")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3e3af9a
    • Y
      bonding: Add vlan tx offload to hw_enc_features · d595b03d
      YueHaibing 提交于
      As commit 30d8177e ("bonding: Always enable vlan tx offload")
      said, we should always enable bonding's vlan tx offload, pass the
      vlan packets to the slave devices with vlan tci, let them to handle
      vlan implementation.
      
      Now if encapsulation protocols like VXLAN is used, skb->encapsulation
      may be set, then the packet is passed to vlan device which based on
      bonding device. However in netif_skb_features(), the check of
      hw_enc_features:
      
      	 if (skb->encapsulation)
                       features &= dev->hw_enc_features;
      
      clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results
      in same issue in commit 30d8177e like this:
      
      vlan_dev_hard_start_xmit
        -->dev_queue_xmit
          -->validate_xmit_skb
            -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared
            -->validate_xmit_vlan
              -->__vlan_hwaccel_push_inside //skb->tci is cleared
      ...
       --> bond_start_xmit
         --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34
           --> __skb_flow_dissect // nhoff point to IP header
              -->  case htons(ETH_P_8021Q)
                   // skb_vlan_tag_present is false, so
                   vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
                   //vlan point to ip header wrongly
      
      Fixes: b2a103e6 ("bonding: convert to ndo_fix_features")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Acked-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d595b03d
    • I
      net: sched: sch_taprio: fix memleak in error path for sched list parse · 51650d33
      Ivan Khoronzhuk 提交于
      In error case, all entries should be freed from the sched list
      before deleting it. For simplicity use rcu way.
      
      Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler")
      Acked-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51650d33
    • S
      net: docs: replace IPX in tuntap documentation · fe90689f
      Stephen Hemminger 提交于
      IPX is no longer supported, but the example in the documentation
      might useful. Replace it with IPv6.
      Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe90689f
    • S
      docs: admin-guide: remove references to IPX and token-ring · 7e7c076e
      Stephen Hemminger 提交于
      Both IPX and TR have not been supported for a while now.
      Remove them from the /proc/sys/net documentation.
      Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e7c076e
    • R
      xen/netback: Reset nr_frags before freeing skb · 3a0233dd
      Ross Lagerwall 提交于
      At this point nr_frags has been incremented but the frag does not yet
      have a page assigned so freeing the skb results in a crash. Reset
      nr_frags before freeing the skb to prevent this.
      Signed-off-by: NRoss Lagerwall <ross.lagerwall@citrix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a0233dd
    • G
      inet: frags: re-introduce skb coalescing for local delivery · 891584f4
      Guillaume Nault 提交于
      Before commit d4289fcc ("net: IP6 defrag: use rbtrees for IPv6
      defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
      generating many fragments) and running over an IPsec tunnel, reported
      more than 6Gbps throughput. After that patch, the same test gets only
      9Mbps when receiving on a be2net nic (driver can make a big difference
      here, for example, ixgbe doesn't seem to be affected).
      
      By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
      (IPv4 fragment coalescing was dropped by commit 14fe22e3 ("Revert
      "ipv4: use skb coalescing in defragmentation"")).
      
      Without fragment coalescing, be2net runs out of Rx ring entries and
      starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
      the netperf traffic is only composed of UDP fragments, any lost packet
      prevents reassembly of the full datagram. Therefore, fragments which
      have no possibility to ever get reassembled pile up in the reassembly
      queue, until the memory accounting exeeds the threshold. At that point
      no fragment is accepted anymore, which effectively discards all
      netperf traffic.
      
      When reassembly timeout expires, some stale fragments are removed from
      the reassembly queue, so a few packets can be received, reassembled
      and delivered to the netperf receiver. But the nic still drops frames
      and soon the reassembly queue gets filled again with stale fragments.
      These long time frames where no datagram can be received explain why
      the performance drop is so significant.
      
      Re-introducing fragment coalescing is enough to get the initial
      performances again (6.6Gbps with be2net): driver doesn't drop frames
      anymore (no more rx_drops_no_frags errors) and the reassembly engine
      works at full speed.
      
      This patch is quite conservative and only coalesces skbs for local
      IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
      forwarding). Coalescing could be extended in the future if need be, as
      more scenarios would probably benefit from it.
      
      [0]: Test configuration
      Sender:
      ip xfrm policy flush
      ip xfrm state flush
      ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
      ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
      ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
      ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
      netserver -D -L fc00:2::1
      
      Receiver:
      ip xfrm policy flush
      ip xfrm state flush
      ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
      ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
      ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
      ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
      netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      891584f4