1. 27 8月, 2019 4 次提交
  2. 19 8月, 2019 1 次提交
  3. 17 8月, 2019 2 次提交
    • M
      Bluetooth: Add debug setting for changing minimum encryption key size · 58a96fc3
      Marcel Holtmann 提交于
      For testing and qualification purposes it is useful to allow changing
      the minimum encryption key size value that the host stack is going to
      enforce. This adds a new debugfs setting min_encrypt_key_size to achieve
      this functionality.
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
      58a96fc3
    • T
      tipc: fix false detection of retransmit failures · 71204231
      Tuong Lien 提交于
      This commit eliminates the use of the link 'stale_limit' & 'prev_from'
      (besides the already removed - 'stale_cnt') variables in the detection
      of repeated retransmit failures as there is no proper way to initialize
      them to avoid a false detection, i.e. it is not really a retransmission
      failure but due to a garbage values in the variables.
      
      Instead, a jiffies variable will be added to individual skbs (like the
      way we restrict the skb retransmissions) in order to mark the first skb
      retransmit time. Later on, at the next retransmissions, the timestamp
      will be checked to see if the skb in the link transmq is "too stale",
      that is, the link tolerance time has passed, so that a link reset will
      be ordered. Note, just checking on the first skb in the queue is fine
      enough since it must be the oldest one.
      A counter is also added to keep track the actual skb retransmissions'
      number for later checking when the failure happens.
      
      The downside of this approach is that the skb->cb[] buffer is about to
      be exhausted, however it is always able to allocate another memory area
      and keep a reference to it when needed.
      
      Fixes: 77cf8edb ("tipc: simplify stale link failure criteria")
      Reported-by: NHoang Le <hoang.h.le@dektech.com.au>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71204231
  4. 16 8月, 2019 2 次提交
    • E
      net/packet: fix race in tpacket_snd() · 32d3182c
      Eric Dumazet 提交于
      packet_sendmsg() checks tx_ring.pg_vec to decide
      if it must call tpacket_snd().
      
      Problem is that the check is lockless, meaning another thread
      can issue a concurrent setsockopt(PACKET_TX_RING ) to flip
      tx_ring.pg_vec back to NULL.
      
      Given that tpacket_snd() grabs pg_vec_lock mutex, we can
      perform the check again to solve the race.
      
      syzbot reported :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 11429 Comm: syz-executor394 Not tainted 5.3.0-rc4+ #101
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:packet_lookup_frame+0x8d/0x270 net/packet/af_packet.c:474
      Code: c1 ee 03 f7 73 0c 80 3c 0e 00 0f 85 cb 01 00 00 48 8b 0b 89 c0 4c 8d 24 c1 48 b8 00 00 00 00 00 fc ff df 4c 89 e1 48 c1 e9 03 <80> 3c 01 00 0f 85 94 01 00 00 48 8d 7b 10 4d 8b 3c 24 48 b8 00 00
      RSP: 0018:ffff88809f82f7b8 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffff8880a45c7030 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 1ffff110148b8e06 RDI: ffff8880a45c703c
      RBP: ffff88809f82f7e8 R08: ffff888087aea200 R09: fffffbfff134ae50
      R10: fffffbfff134ae4f R11: ffffffff89a5727f R12: 0000000000000000
      R13: 0000000000000001 R14: ffff8880a45c6ac0 R15: 0000000000000000
      FS:  00007fa04716f700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa04716edb8 CR3: 0000000091eb4000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       packet_current_frame net/packet/af_packet.c:487 [inline]
       tpacket_snd net/packet/af_packet.c:2667 [inline]
       packet_sendmsg+0x590/0x6250 net/packet/af_packet.c:2975
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
       __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439
       do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32d3182c
    • J
      net: tls, fix sk_write_space NULL write when tx disabled · d85f0177
      John Fastabend 提交于
      The ctx->sk_write_space pointer is only set when TLS tx mode is enabled.
      When running without TX mode its a null pointer but we still set the
      sk sk_write_space pointer on close().
      
      Fix the close path to only overwrite sk->sk_write_space when the current
      pointer is to the tls_write_space function indicating the tls module should
      clean it up properly as well.
      Reported-by: NHillf Danton <hdanton@sina.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Fixes: 57c722e9 ("net/tls: swap sk_write_space on close")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d85f0177
  5. 14 8月, 2019 6 次提交
  6. 13 8月, 2019 1 次提交
  7. 12 8月, 2019 3 次提交
    • C
      tipc: initialise addr_trail_end when setting node addresses · 8874ecae
      Chris Packham 提交于
      We set the field 'addr_trial_end' to 'jiffies', instead of the current
      value 0, at the moment the node address is initialized. This guarantees
      we don't inadvertently enter an address trial period when the node
      address is explicitly set by the user.
      Signed-off-by: NChris Packham <chris.packham@alliedtelesis.co.nz>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8874ecae
    • C
      net: dsa: Check existence of .port_mdb_add callback before calling it · 58799865
      Chen-Yu Tsai 提交于
      The dsa framework has optional .port_mdb_{prepare,add,del} callback fields
      for drivers to handle multicast database entries. When adding an entry, the
      framework goes through a prepare phase, then a commit phase. Drivers not
      providing these callbacks should be detected in the prepare phase.
      
      DSA core may still bypass the bridge layer and call the dsa_port_mdb_add
      function directly with no prepare phase or no switchdev trans object,
      and the framework ends up calling an undefined .port_mdb_add callback.
      This results in a NULL pointer dereference, as shown in the log below.
      
      The other functions seem to be properly guarded. Do the same for
      .port_mdb_add in dsa_switch_mdb_add_bitmap() as well.
      
          8<--- cut here ---
          Unable to handle kernel NULL pointer dereference at virtual address 00000000
          pgd = (ptrval)
          [00000000] *pgd=00000000
          Internal error: Oops: 80000005 [#1] SMP ARM
          Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211
          CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1
          Hardware name: Allwinner sun7i (A20) Family
          Workqueue: events switchdev_deferred_process_work
          PC is at 0x0
          LR is at dsa_switch_event+0x570/0x620
          pc : [<00000000>]    lr : [<c08533ec>]    psr: 80070013
          sp : ee871db8  ip : 00000000  fp : ee98d0a4
          r10: 0000000c  r9 : 00000008  r8 : ee89f710
          r7 : ee98d040  r6 : ee98d088  r5 : c0f04c48  r4 : ee98d04c
          r3 : 00000000  r2 : ee89f710  r1 : 00000008  r0 : ee98d040
          Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
          Control: 10c5387d  Table: 6deb406a  DAC: 00000051
          Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval))
          Stack: (0xee871db8 to 0xee872000)
          1da0:                                                       ee871e14 103ace2d
          1dc0: 00000000 ffffffff 00000000 ee871e14 00000005 00000000 c08524a0 00000000
          1de0: ffffe000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0
          1e00: 00000000 b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 00000000
          1e20: 00000008 103ace2d 00000000 c087e248 ee29c868 103ace2d 00000001 ffffffff
          1e40: 00000000 ee871e98 00000006 00000000 c0fb2a50 c087e2d0 ffffffff c08523c4
          1e60: ffffffff c014bdfc 00000006 c0fad2d0 ee871e98 ee89f710 00000000 c014c500
          1e80: 00000000 ee89f3c0 c0f04c48 00000000 ee9e5000 c087dfb4 ee9e5000 00000000
          1ea0: ee89f710 ee871ecb 00000001 103ace2d 00000000 c0f04c48 00000000 c087e0a8
          1ec0: 00000000 efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 00000122
          1ee0: 00000100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec
          1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 00000000 c087def8 c0fad2ec c01447dc
          1f20: ef315640 ef7a62c0 00000008 ee839580 ee839594 ef7a62c0 00000008 c0f03d00
          1f40: ef7a62d8 ef7a62c0 ffffe000 c0145b84 ffffe000 c0fb2420 c0bfaa8c 00000000
          1f60: ffffe000 ee84b600 ee84b5c0 00000000 ee870000 ee839580 c0145b40 ef0e5ea4
          1f80: ee84b61c c014a6f8 00000001 ee84b5c0 c014a5b0 00000000 00000000 00000000
          1fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
          1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
          [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          [<c014bdfc>] (notifier_call_chain) from [<c014bef0>] (raw_notifier_call_chain+0x18/0x20)
          [<c014bef0>] (raw_notifier_call_chain) from [<c08509a8>] (dsa_port_mdb_add+0x48/0x74)
          [<c08509a8>] (dsa_port_mdb_add) from [<c087e248>] (__switchdev_handle_port_obj_add+0x54/0xd4)
          [<c087e248>] (__switchdev_handle_port_obj_add) from [<c087e2d0>] (switchdev_handle_port_obj_add+0x8/0x14)
          [<c087e2d0>] (switchdev_handle_port_obj_add) from [<c08523c4>] (dsa_slave_switchdev_blocking_event+0x94/0xa4)
          [<c08523c4>] (dsa_slave_switchdev_blocking_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          [<c014bdfc>] (notifier_call_chain) from [<c014c500>] (blocking_notifier_call_chain+0x50/0x68)
          [<c014c500>] (blocking_notifier_call_chain) from [<c087dfb4>] (switchdev_port_obj_notify+0x44/0xa8)
          [<c087dfb4>] (switchdev_port_obj_notify) from [<c087e0a8>] (switchdev_port_obj_add_now+0x90/0x104)
          [<c087e0a8>] (switchdev_port_obj_add_now) from [<c087e130>] (switchdev_port_obj_add_deferred+0x14/0x5c)
          [<c087e130>] (switchdev_port_obj_add_deferred) from [<c087de4c>] (switchdev_deferred_process+0x64/0x104)
          [<c087de4c>] (switchdev_deferred_process) from [<c087def8>] (switchdev_deferred_process_work+0xc/0x14)
          [<c087def8>] (switchdev_deferred_process_work) from [<c01447dc>] (process_one_work+0x218/0x50c)
          [<c01447dc>] (process_one_work) from [<c0145b84>] (worker_thread+0x44/0x5bc)
          [<c0145b84>] (worker_thread) from [<c014a6f8>] (kthread+0x148/0x150)
          [<c014a6f8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
          Exception stack(0xee871fb0 to 0xee871ff8)
          1fa0:                                     00000000 00000000 00000000 00000000
          1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
          1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
          Code: bad PC value
          ---[ end trace 1292c61abd17b130 ]---
      
          [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
          corresponds to
      
      	$ arm-linux-gnueabihf-addr2line -C -i -e vmlinux c08533ec
      
      	linux/net/dsa/switch.c:156
      	linux/net/dsa/switch.c:178
      	linux/net/dsa/switch.c:328
      
      Fixes: e6db98db ("net: dsa: add switch mdb bitmap functions")
      Signed-off-by: NChen-Yu Tsai <wens@csie.org>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58799865
    • D
      rxrpc: Fix local refcounting · 68553f1a
      David Howells 提交于
      Fix rxrpc_unuse_local() to handle a NULL local pointer as it can be called
      on an unbound socket on which rx->local is not yet set.
      
      The following reproduced (includes omitted):
      
      	int main(void)
      	{
      		socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
      		return 0;
      	}
      
      causes the following oops to occur:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000010
      	...
      	RIP: 0010:rxrpc_unuse_local+0x8/0x1b
      	...
      	Call Trace:
      	 rxrpc_release+0x2b5/0x338
      	 __sock_release+0x37/0xa1
      	 sock_close+0x14/0x17
      	 __fput+0x115/0x1e9
      	 task_work_run+0x72/0x98
      	 do_exit+0x51b/0xa7a
      	 ? __context_tracking_exit+0x4e/0x10e
      	 do_group_exit+0xab/0xab
      	 __x64_sys_exit_group+0x14/0x17
      	 do_syscall_64+0x89/0x1d4
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Reported-by: syzbot+20dee719a2e090427b5f@syzkaller.appspotmail.com
      Fixes: 730c5fd4 ("rxrpc: Fix local endpoint refcounting")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68553f1a
  8. 10 8月, 2019 2 次提交
    • J
      net/tls: swap sk_write_space on close · 57c722e9
      Jakub Kicinski 提交于
      Now that we swap the original proto and clear the ULP pointer
      on close we have to make sure no callback will try to access
      the freed state. sk_write_space is not part of sk_prot, remember
      to swap it.
      
      Reported-by: syzbot+dcdc9deefaec44785f32@syzkaller.appspotmail.com
      Fixes: 95fa1454 ("bpf: sockmap/tls, close can race with map free")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57c722e9
    • D
      sock: make cookie generation global instead of per netns · cd48bdda
      Daniel Borkmann 提交于
      Generating and retrieving socket cookies are a useful feature that is
      exposed to BPF for various program types through bpf_get_socket_cookie()
      helper.
      
      The fact that the cookie counter is per netns is quite a limitation
      for BPF in practice in particular for programs in host namespace that
      use socket cookies as part of a map lookup key since they will be
      causing socket cookie collisions e.g. when attached to BPF cgroup hooks
      or cls_bpf on tc egress in host namespace handling container traffic
      from veth or ipvlan devices with peer in different netns. Change the
      counter to be global instead.
      
      Socket cookie consumers must assume the value as opqaue in any case.
      Not every socket must have a cookie generated and knowledge of the
      counter value itself does not provide much value either way hence
      conversion to global is fine.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Martynas Pumputis <m@lambda.lt>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd48bdda
  9. 09 8月, 2019 9 次提交
    • D
      rxrpc: Don't bother generating maxSkew in the ACK packet · e8c3af6b
      David Howells 提交于
      Don't bother generating maxSkew in the ACK packet as it has been obsolete
      since AFS 3.1.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NJeffrey Altman <jaltman@auristor.com>
      e8c3af6b
    • D
      rxrpc: Fix local endpoint refcounting · 730c5fd4
      David Howells 提交于
      The object lifetime management on the rxrpc_local struct is broken in that
      the rxrpc_local_processor() function is expected to clean up and remove an
      object - but it may get requeued by packets coming in on the backing UDP
      socket once it starts running.
      
      This may result in the assertion in rxrpc_local_rcu() firing because the
      memory has been scheduled for RCU destruction whilst still queued:
      
      	rxrpc: Assertion failed
      	------------[ cut here ]------------
      	kernel BUG at net/rxrpc/local_object.c:468!
      
      Note that if the processor comes around before the RCU free function, it
      will just do nothing because ->dead is true.
      
      Fix this by adding a separate refcount to count active users of the
      endpoint that causes the endpoint to be destroyed when it reaches 0.
      
      The original refcount can then be used to refcount objects through the work
      processor and cause the memory to be rcu freed when that reaches 0.
      
      Fixes: 4f95dd78 ("rxrpc: Rework local endpoint management")
      Reported-by: syzbot+1e0edc4b8b7494c28450@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      730c5fd4
    • P
      netfilter: nf_flow_table: teardown flow timeout race · 1e5b2471
      Pablo Neira Ayuso 提交于
      Flows that are in teardown state (due to RST / FIN TCP packet) still
      have their offload flag set on. Hence, the conntrack garbage collector
      may race to undo the timeout adjustment that the fixup routine performs,
      leaving the conntrack entry in place with the internal offload timeout
      (one day).
      
      Update teardown flow state to ESTABLISHED and set tracking to liberal,
      then once the offload bit is cleared, adjust timeout if it is more than
      the default fixup timeout (conntrack might already have set a lower
      timeout from the packet path).
      
      Fixes: da5984e5 ("netfilter: nf_flow_table: add support for sending flows back to the slow path")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      1e5b2471
    • P
      netfilter: nf_flow_table: conntrack picks up expired flows · 3e68db2f
      Pablo Neira Ayuso 提交于
      Update conntrack entry to pick up expired flows, otherwise the conntrack
      entry gets stuck with the internal offload timeout (one day). The TCP
      state also needs to be adjusted to ESTABLISHED state and tracking is set
      to liberal mode in order to give conntrack a chance to pick up the
      expired flow.
      
      Fixes: ac2a6666 ("netfilter: add generic flow table infrastructure")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      3e68db2f
    • P
      netfilter: nf_tables: use-after-free in failing rule with bound set · 6a0a8d10
      Pablo Neira Ayuso 提交于
      If a rule that has already a bound anonymous set fails to be added, the
      preparation phase releases the rule and the bound set. However, the
      transaction object from the abort path still has a reference to the set
      object that is stale, leading to a use-after-free when checking for the
      set->bound field. Add a new field to the transaction that specifies if
      the set is bound, so the abort path can skip releasing it since the rule
      command owns it and it takes care of releasing it. After this update,
      the set->bound field is removed.
      
      [   24.649883] Unable to handle kernel paging request at virtual address 0000000000040434
      [   24.657858] Mem abort info:
      [   24.660686]   ESR = 0x96000004
      [   24.663769]   Exception class = DABT (current EL), IL = 32 bits
      [   24.669725]   SET = 0, FnV = 0
      [   24.672804]   EA = 0, S1PTW = 0
      [   24.675975] Data abort info:
      [   24.678880]   ISV = 0, ISS = 0x00000004
      [   24.682743]   CM = 0, WnR = 0
      [   24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000
      [   24.692207] [0000000000040434] pgd=0000000000000000
      [   24.697119] Internal error: Oops: 96000004 [#1] SMP
      [...]
      [   24.889414] Call trace:
      [   24.891870]  __nf_tables_abort+0x3f0/0x7a0
      [   24.895984]  nf_tables_abort+0x20/0x40
      [   24.899750]  nfnetlink_rcv_batch+0x17c/0x588
      [   24.904037]  nfnetlink_rcv+0x13c/0x190
      [   24.907803]  netlink_unicast+0x18c/0x208
      [   24.911742]  netlink_sendmsg+0x1b0/0x350
      [   24.915682]  sock_sendmsg+0x4c/0x68
      [   24.919185]  ___sys_sendmsg+0x288/0x2c8
      [   24.923037]  __sys_sendmsg+0x7c/0xd0
      [   24.926628]  __arm64_sys_sendmsg+0x2c/0x38
      [   24.930744]  el0_svc_common.constprop.0+0x94/0x158
      [   24.935556]  el0_svc_handler+0x34/0x90
      [   24.939322]  el0_svc+0x8/0xc
      [   24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863)
      [   24.948336] ---[ end trace cebbb9dcbed3b56f ]---
      
      Fixes: f6ac8585 ("netfilter: nf_tables: unbind set in rule from commit path")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      6a0a8d10
    • J
      net/tls: prevent skb_orphan() from leaking TLS plain text with offload · 41477662
      Jakub Kicinski 提交于
      sk_validate_xmit_skb() and drivers depend on the sk member of
      struct sk_buff to identify segments requiring encryption.
      Any operation which removes or does not preserve the original TLS
      socket such as skb_orphan() or skb_clone() will cause clear text
      leaks.
      
      Make the TCP socket underlying an offloaded TLS connection
      mark all skbs as decrypted, if TLS TX is in offload mode.
      Then in sk_validate_xmit_skb() catch skbs which have no socket
      (or a socket with no validation) and decrypted flag set.
      
      Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
      sk->sk_validate_xmit_skb are slightly interchangeable right now,
      they all imply TLS offload. The new checks are guarded by
      CONFIG_TLS_DEVICE because that's the option guarding the
      sk_buff->decrypted member.
      
      Second, smaller issue with orphaning is that it breaks
      the guarantee that packets will be delivered to device
      queues in-order. All TLS offload drivers depend on that
      scheduling property. This means skb_orphan_partial()'s
      trick of preserving partial socket references will cause
      issues in the drivers. We need a full orphan, and as a
      result netem delay/throttling will cause all TLS offload
      skbs to be dropped.
      
      Reusing the sk_buff->decrypted flag also protects from
      leaking clear text when incoming, decrypted skb is redirected
      (e.g. by TC).
      
      See commit 0608c69c ("bpf: sk_msg, sock{map|hash} redirect
      through ULP") for justification why the internal flag is safe.
      The only location which could leak the flag in is tcp_bpf_sendmsg(),
      which is taken care of by clearing the previously unused bit.
      
      v2:
       - remove superfluous decrypted mark copy (Willem);
       - remove the stale doc entry (Boris);
       - rely entirely on EOR marking to prevent coalescing (Boris);
       - use an internal sendpages flag instead of marking the socket
         (Boris).
      v3 (Willem):
       - reorganize the can_skb_orphan_partial() condition;
       - fix the flag leak-in through tcp_bpf_sendmsg.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41477662
    • R
      net sched: update skbedit action for batched events operations · e1fea322
      Roman Mashak 提交于
      Add get_fill_size() routine used to calculate the action size
      when building a batch of events.
      
      Fixes: ca9b0e27 ("pkt_action: add new action skbedit")
      Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1fea322
    • I
      net: sched: sch_taprio: fix memleak in error path for sched list parse · 51650d33
      Ivan Khoronzhuk 提交于
      In error case, all entries should be freed from the sched list
      before deleting it. For simplicity use rcu way.
      
      Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler")
      Acked-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51650d33
    • G
      inet: frags: re-introduce skb coalescing for local delivery · 891584f4
      Guillaume Nault 提交于
      Before commit d4289fcc ("net: IP6 defrag: use rbtrees for IPv6
      defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus
      generating many fragments) and running over an IPsec tunnel, reported
      more than 6Gbps throughput. After that patch, the same test gets only
      9Mbps when receiving on a be2net nic (driver can make a big difference
      here, for example, ixgbe doesn't seem to be affected).
      
      By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing
      (IPv4 fragment coalescing was dropped by commit 14fe22e3 ("Revert
      "ipv4: use skb coalescing in defragmentation"")).
      
      Without fragment coalescing, be2net runs out of Rx ring entries and
      starts to drop frames (ethtool reports rx_drops_no_frags errors). Since
      the netperf traffic is only composed of UDP fragments, any lost packet
      prevents reassembly of the full datagram. Therefore, fragments which
      have no possibility to ever get reassembled pile up in the reassembly
      queue, until the memory accounting exeeds the threshold. At that point
      no fragment is accepted anymore, which effectively discards all
      netperf traffic.
      
      When reassembly timeout expires, some stale fragments are removed from
      the reassembly queue, so a few packets can be received, reassembled
      and delivered to the netperf receiver. But the nic still drops frames
      and soon the reassembly queue gets filled again with stale fragments.
      These long time frames where no datagram can be received explain why
      the performance drop is so significant.
      
      Re-introducing fragment coalescing is enough to get the initial
      performances again (6.6Gbps with be2net): driver doesn't drop frames
      anymore (no more rx_drops_no_frags errors) and the reassembly engine
      works at full speed.
      
      This patch is quite conservative and only coalesces skbs for local
      IPv4 and IPv6 delivery (in order to avoid changing skb geometry when
      forwarding). Coalescing could be extended in the future if need be, as
      more scenarios would probably benefit from it.
      
      [0]: Test configuration
      Sender:
      ip xfrm policy flush
      ip xfrm state flush
      ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
      ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
      ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
      ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
      netserver -D -L fc00:2::1
      
      Receiver:
      ip xfrm policy flush
      ip xfrm state flush
      ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1
      ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow
      ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1
      ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow
      netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      891584f4
  10. 07 8月, 2019 3 次提交
    • V
      net: dsa: sja1105: Fix memory leak on meta state machine error path · 93fa8587
      Vladimir Oltean 提交于
      When RX timestamping is enabled and two link-local (non-meta) frames are
      received in a row, this constitutes an error.
      
      The tagger is always caching the last link-local frame, in an attempt to
      merge it with the meta follow-up frame when that arrives. To recover
      from the above error condition, the initial cached link-local frame is
      dropped and the second frame in a row is cached (in expectance of the
      second meta frame).
      
      However, when dropping the initial link-local frame, its backing memory
      was being leaked.
      
      Fixes: f3097be2 ("net: dsa: sja1105: Add a state machine for RX timestamping")
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93fa8587
    • V
      net: dsa: sja1105: Fix memory leak on meta state machine normal path · f163fed2
      Vladimir Oltean 提交于
      After a meta frame is received, it is associated with the cached
      sp->data->stampable_skb from the DSA tagger private structure.
      
      Cached means its refcount is incremented with skb_get() in order for
      dsa_switch_rcv() to not free it when the tagger .rcv returns NULL.
      
      The mistake is that skb_unref() is not the correct function to use. It
      will correctly decrement the refcount (which will go back to zero) but
      the skb memory will not be freed.  That is the job of kfree_skb(), which
      also calls skb_unref().
      
      But it turns out that freeing the cached stampable_skb is in fact not
      necessary.  It is still a perfectly valid skb, and now it is even
      annotated with the partial RX timestamp.  So remove the skb_copy()
      altogether and simply pass the stampable_skb with a refcount of 1
      (incremented by us, decremented by dsa_switch_rcv) up the stack.
      
      Fixes: f3097be2 ("net: dsa: sja1105: Add a state machine for RX timestamping")
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f163fed2
    • R
      net sched: update vlan action for batched events operations · b35475c5
      Roman Mashak 提交于
      Add get_fill_size() routine used to calculate the action size
      when building a batch of events.
      
      Fixes: c7e2b968 ("sched: introduce vlan action")
      Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b35475c5
  11. 06 8月, 2019 7 次提交
    • N
      net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER · 091adf9b
      Nikolay Aleksandrov 提交于
      Most of the bridge device's vlan init bugs come from the fact that its
      default pvid is created at the wrong time, way too early in ndo_init()
      before the device is even assigned an ifindex. It introduces a bug when the
      bridge's dev_addr is added as fdb during the initial default pvid creation
      the notification has ifindex/NDA_MASTER both equal to 0 (see example below)
      which really makes no sense for user-space[0] and is wrong.
      Usually user-space software would ignore such entries, but they are
      actually valid and will eventually have all necessary attributes.
      It makes much more sense to send a notification *after* the device has
      registered and has a proper ifindex allocated rather than before when
      there's a chance that the registration might still fail or to receive
      it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush
      from br_vlan_flush() since that case can no longer happen. At
      NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by
      br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything
      depending why it was called (if called due to NETDEV_REGISTER error
      it'll still be == 1, otherwise it could be any value changed during the
      device life time).
      
      For the demonstration below a small change to iproute2 for printing all fdb
      notifications is added, because it contained a workaround not to show
      entries with ifindex == 0.
      Command executed while monitoring: $ ip l add br0 type bridge
      Before (both ifindex and master == 0):
      $ bridge monitor fdb
      36:7e:8a:b3:56:ba dev * vlan 1 master * permanent
      
      After (proper br0 ifindex):
      $ bridge monitor fdb
      e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent
      
      v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
      v3: send the correct v2 patch with all changes (stub should return 0)
      v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in
          the br_vlan_bridge_event stub when bridge vlans are disabled
      
      [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389Reported-by: Nmichael-dev <michael-dev@fami-braun.de>
      Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      091adf9b
    • U
      net/smc: avoid fallback in case of non-blocking connect · cd206360
      Ursula Braun 提交于
      FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN
      triggers a fallback to TCP if the socket is in state SMC_INIT.
      But if a nonblocking connect is already started, fallback to TCP
      is no longer possible, even though the socket may still be in state
      SMC_INIT.
      And if a nonblocking connect is already started, a listen() call
      does not make sense.
      
      Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com
      Fixes: 50717a37 ("net/smc: nonblocking connect rework")
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd206360
    • U
      net/smc: do not schedule tx_work in SMC_CLOSED state · f9cedf1a
      Ursula Braun 提交于
      The setsockopts options TCP_NODELAY and TCP_CORK may schedule the
      tx worker. Make sure the socket is not yet moved into SMC_CLOSED
      state (for instance by a shutdown SHUT_RDWR call).
      
      Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com
      Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com
      Fixes: 01d2f7e2 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9cedf1a
    • D
      ipv6: Fix unbalanced rcu locking in rt6_update_exception_stamp_rt · cff6a327
      David Ahern 提交于
      The nexthop path in rt6_update_exception_stamp_rt needs to call
      rcu_read_unlock if it fails to find a fib6_nh match rather than
      just returning.
      
      Fixes: e659ba31 ("ipv6: Handle all fib6_nh in a nexthop in exception handling")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cff6a327
    • J
      net/tls: partially revert fix transition through disconnect with close · 5d92e631
      Jakub Kicinski 提交于
      Looks like we were slightly overzealous with the shutdown()
      cleanup. Even though the sock->sk_state can reach CLOSED again,
      socket->state will not got back to SS_UNCONNECTED once
      connections is ESTABLISHED. Meaning we will see EISCONN if
      we try to reconnect, and EINVAL if we try to listen.
      
      Only listen sockets can be shutdown() and reused, but since
      ESTABLISHED sockets can never be re-connected() or used for
      listen() we don't need to try to clean up the ULP state early.
      
      Fixes: 32857cf5 ("net/tls: fix transition through disconnect with close")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d92e631
    • J
      net: fix bpf_xdp_adjust_head regression for generic-XDP · 065af355
      Jesper Dangaard Brouer 提交于
      When generic-XDP was moved to a later processing step by commit
      458bf2f2 ("net: core: support XDP generic on stacked devices.")
      a regression was introduced when using bpf_xdp_adjust_head.
      
      The issue is that after this commit the skb->network_header is now
      changed prior to calling generic XDP and not after. Thus, if the header
      is changed by XDP (via bpf_xdp_adjust_head), then skb->network_header
      also need to be updated again.  Fix by calling skb_reset_network_header().
      
      Fixes: 458bf2f2 ("net: core: support XDP generic on stacked devices.")
      Reported-by: NBrandon Cazander <brandon.cazander@multapplied.net>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      065af355
    • D
      net: sched: use temporary variable for actions indexes · 7be8ef2c
      Dmytro Linkin 提交于
      Currently init call of all actions (except ipt) init their 'parm'
      structure as a direct pointer to nla data in skb. This leads to race
      condition when some of the filter actions were initialized successfully
      (and were assigned with idr action index that was written directly
      into nla data), but then were deleted and retried (due to following
      action module missing or classifier-initiated retry), in which case
      action init code tries to insert action to idr with index that was
      assigned on previous iteration. During retry the index can be reused
      by another action that was inserted concurrently, which causes
      unintended action sharing between filters.
      To fix described race condition, save action idr index to temporary
      stack-allocated variable instead on nla data.
      
      Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
      Signed-off-by: NDmytro Linkin <dmitrolin@mellanox.com>
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7be8ef2c