1. 07 Dec, 2017 4 commits
  2. 06 Dec, 2017 7 commits
  3. 05 Dec, 2017 7 commits
  4. 04 Dec, 2017 1 commit
    • E
      tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb() · eeea10b8
      Eric Dumazet committed
      James Morris reported kernel stack corruption bug [1] while
      running the SELinux testsuite, and bisected to a recent
      commit bffa72cf ("net: sk_buff rbnode reorg")
      
      We believe this commit is fine, but exposes an older bug.
      
      SELinux code runs from tcp_filter() and might send an ICMP,
      expecting IP options to be found in skb->cb[] using regular IPCB placement.
      
      We need to defer TCP mangling of skb->cb[] until after the tcp_filter() calls.
      
      This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very
      similar way we added them for IPv6.
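The ordering problem described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `skb_model`, `ip_cb_model`, `tcp_cb_model`, `fill_cb()`, `restore_cb()` and `rcv_model()` are hypothetical names and layouts invented for illustration, not the kernel's structures or the patch itself. It shows a filter reading the IP layout of `cb[]` before the TCP layout overwrites it, and a restore step that puts the IP layout back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CB_SIZE 48

/* Hypothetical stand-in for the IP layer's view of cb[]. */
struct ip_cb_model {
    int32_t iif;
    uint8_t optlen;
    uint8_t opts[11];
};

/* Hypothetical stand-in for TCP's view; the IP block is saved at the end. */
struct tcp_cb_model {
    uint32_t seq;
    uint32_t end_seq;
    uint32_t ack_seq;
    uint32_t flags;
    struct ip_cb_model saved_ip;
};

_Static_assert(sizeof(struct tcp_cb_model) <= CB_SIZE, "TCP view must fit in cb[]");

struct skb_model {
    unsigned char cb[CB_SIZE];
};

/* Interpret the start of cb[] using the IP layout. */
struct ip_cb_model read_ip_view(const struct skb_model *skb)
{
    struct ip_cb_model ip;
    memcpy(&ip, skb->cb, sizeof ip);
    return ip;
}

/* A filter hook that expects the IP layout and reports the option length it sees. */
int filter_sees_optlen(const struct skb_model *skb)
{
    return read_ip_view(skb).optlen;
}

/* Stash the IP block inside the TCP layout, then overwrite cb[] with TCP fields. */
void fill_cb(struct skb_model *skb, uint32_t seq, uint32_t end_seq)
{
    struct tcp_cb_model t;
    memset(&t, 0, sizeof t);
    memcpy(&t.saved_ip, skb->cb, sizeof t.saved_ip);
    t.seq = seq;
    t.end_seq = end_seq;
    memcpy(skb->cb, &t, sizeof t);
}

/* Move the saved IP block back to the start of cb[]. */
void restore_cb(struct skb_model *skb)
{
    memmove(skb->cb, skb->cb + offsetof(struct tcp_cb_model, saved_ip),
            sizeof(struct ip_cb_model));
}

/* Receive-path ordering: run the filter first, fill the TCP layout afterwards. */
int rcv_model(struct skb_model *skb, uint32_t seq, uint32_t end_seq)
{
    int seen = filter_sees_optlen(skb); /* cb[] is still in IP layout here */
    fill_cb(skb, seq, end_seq);         /* TCP takes over cb[] only now */
    return seen;
}
```

In this model, calling the filter after `fill_cb()` would read sequence numbers as if they were IP options, which is the kind of misinterpretation the commit message attributes to the older layout change.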
      
      [1]
      [  339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet
      [  339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5
      [  339.822505]
      [  339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15
      [  339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A   01/19/2017
      [  339.885060] Call Trace:
      [  339.896875]  <IRQ>
      [  339.908103]  dump_stack+0x63/0x87
      [  339.920645]  panic+0xe8/0x248
      [  339.932668]  ? ip_push_pending_frames+0x33/0x40
      [  339.946328]  ? icmp_send+0x525/0x530
      [  339.958861]  ? kfree_skbmem+0x60/0x70
      [  339.971431]  __stack_chk_fail+0x1b/0x20
      [  339.984049]  icmp_send+0x525/0x530
      [  339.996205]  ? netlbl_skbuff_err+0x36/0x40
      [  340.008997]  ? selinux_netlbl_err+0x11/0x20
      [  340.021816]  ? selinux_socket_sock_rcv_skb+0x211/0x230
      [  340.035529]  ? security_sock_rcv_skb+0x3b/0x50
      [  340.048471]  ? sk_filter_trim_cap+0x44/0x1c0
      [  340.061246]  ? tcp_v4_inbound_md5_hash+0x69/0x1b0
      [  340.074562]  ? tcp_filter+0x2c/0x40
      [  340.086400]  ? tcp_v4_rcv+0x820/0xa20
      [  340.098329]  ? ip_local_deliver_finish+0x71/0x1a0
      [  340.111279]  ? ip_local_deliver+0x6f/0xe0
      [  340.123535]  ? ip_rcv_finish+0x3a0/0x3a0
      [  340.135523]  ? ip_rcv_finish+0xdb/0x3a0
      [  340.147442]  ? ip_rcv+0x27c/0x3c0
      [  340.158668]  ? inet_del_offload+0x40/0x40
      [  340.170580]  ? __netif_receive_skb_core+0x4ac/0x900
      [  340.183285]  ? rcu_accelerate_cbs+0x5b/0x80
      [  340.195282]  ? __netif_receive_skb+0x18/0x60
      [  340.207288]  ? process_backlog+0x95/0x140
      [  340.218948]  ? net_rx_action+0x26c/0x3b0
      [  340.230416]  ? __do_softirq+0xc9/0x26a
      [  340.241625]  ? do_softirq_own_stack+0x2a/0x40
      [  340.253368]  </IRQ>
      [  340.262673]  ? do_softirq+0x50/0x60
      [  340.273450]  ? __local_bh_enable_ip+0x57/0x60
      [  340.285045]  ? ip_finish_output2+0x175/0x350
      [  340.296403]  ? ip_finish_output+0x127/0x1d0
      [  340.307665]  ? nf_hook_slow+0x3c/0xb0
      [  340.318230]  ? ip_output+0x72/0xe0
      [  340.328524]  ? ip_fragment.constprop.54+0x80/0x80
      [  340.340070]  ? ip_local_out+0x35/0x40
      [  340.350497]  ? ip_queue_xmit+0x15c/0x3f0
      [  340.361060]  ? __kmalloc_reserve.isra.40+0x31/0x90
      [  340.372484]  ? __skb_clone+0x2e/0x130
      [  340.382633]  ? tcp_transmit_skb+0x558/0xa10
      [  340.393262]  ? tcp_connect+0x938/0xad0
      [  340.403370]  ? ktime_get_with_offset+0x4c/0xb0
      [  340.414206]  ? tcp_v4_connect+0x457/0x4e0
      [  340.424471]  ? __inet_stream_connect+0xb3/0x300
      [  340.435195]  ? inet_stream_connect+0x3b/0x60
      [  340.445607]  ? SYSC_connect+0xd9/0x110
      [  340.455455]  ? __audit_syscall_entry+0xaf/0x100
      [  340.466112]  ? syscall_trace_enter+0x1d0/0x2b0
      [  340.476636]  ? __audit_syscall_exit+0x209/0x290
      [  340.487151]  ? SyS_connect+0xe/0x10
      [  340.496453]  ? do_syscall_64+0x67/0x1b0
      [  340.506078]  ? entry_SYSCALL64_slow_path+0x25/0x25
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: James Morris <james.l.morris@oracle.com>
      Tested-by: James Morris <james.l.morris@oracle.com>
      Tested-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 03 Dec, 2017 13 commits
  6. 02 Dec, 2017 8 commits
    • W
      ip6_gre: Add ERSPAN native tunnel support · 5a963eb6
      William Tu committed
      The patch adds support for ERSPAN tunnel over ipv6.
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • W
      ip6_gre: Refactor ip6gre xmit codes · 898b2979
      William Tu committed
      This patch refactors ip6gre_xmit_{ipv4, ipv6}.
      It is preparatory work for adding the ip6erspan tunnel.
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • W
      ip_gre: Refector the erpsan tunnel code. · a3222dc9
      William Tu committed
      Move two erspan functions to header file, erspan.h, so ipv6
      erspan implementation can use it.
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      rds: tcp: atomically purge entries from rds_tcp_conn_list during netns delete · f10b4cff
      Sowmini Varadhan committed
      The rds_tcp_kill_sock() function parses the rds_tcp_conn_list
      to find the rds_connection entries marked for deletion as part
      of the netns deletion under the protection of the rds_tcp_conn_lock.
      Since the rds_tcp_conn_list tracks rds_tcp_connections (which
      have a 1:1 mapping with rds_conn_path), multiple tc entries in
      the rds_tcp_conn_list will map to a single rds_connection, and will
      be deleted as part of the rds_conn_destroy() operation that is
      done outside the rds_tcp_conn_lock.
      
      The rds_tcp_conn_list traversal done under the protection of
      rds_tcp_conn_lock should not leave any doomed tc entries in
      the list after the rds_tcp_conn_lock is released, else another
      concurrently executing netns delete thread (for a different netns)
      may trip on these entries.
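The requirement above (no doomed entries left on the shared list once the lock is dropped) can be sketched in a single-threaded user-space model. Everything here is an assumption made for illustration: `tc_model`, `conn_list_model`, `purge_netns()` and `destroy_private()` are hypothetical names, and the `lock_held` flag is only a placeholder for a real lock. Doomed entries are unlinked onto a private list while the lock is held, and the expensive destroy step then walks only that private list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-path TCP connection entry. */
struct tc_model {
    int netns_id;
    bool destroyed;
    struct tc_model *next;
};

/* Shared list plus a flag standing in for the lock that protects it. */
struct conn_list_model {
    bool lock_held;
    struct tc_model *head;
};

/* Unlink every entry of netns_id under the "lock"; return them as a private list. */
struct tc_model *purge_netns(struct conn_list_model *l, int netns_id)
{
    struct tc_model *doomed = NULL;
    struct tc_model **pp;

    l->lock_held = true;
    pp = &l->head;
    while (*pp) {
        struct tc_model *tc = *pp;
        if (tc->netns_id == netns_id) {
            *pp = tc->next;     /* remove from the shared list */
            tc->next = doomed;  /* push onto the private list */
            doomed = tc;
        } else {
            pp = &tc->next;
        }
    }
    l->lock_held = false;
    return doomed;
}

/* Destroy the private list outside the lock; returns how many entries were handled. */
int destroy_private(const struct conn_list_model *l, struct tc_model *doomed)
{
    int n = 0;

    assert(!l->lock_held);
    for (; doomed; doomed = doomed->next) {
        doomed->destroyed = true;
        n++;
    }
    return n;
}

/* Count entries of netns_id that are still visible on the shared list. */
int count_netns(const struct conn_list_model *l, int netns_id)
{
    int n = 0;
    const struct tc_model *tc;

    for (tc = l->head; tc; tc = tc->next)
        if (tc->netns_id == netns_id)
            n++;
    return n;
}
```

Because the shared list no longer holds any entry for the dying namespace when `purge_netns()` returns, a second purge for another namespace cannot encounter them, whatever happens during the later destroy step.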
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      rds: tcp: correctly sequence cleanup on netns deletion. · 681648e6
      Sowmini Varadhan committed
      Commit 8edc3aff ("rds: tcp: Take explicit refcounts on struct net")
      introduces a regression in rds-tcp netns cleanup. The cleanup_net(),
      (and thus rds_tcp_dev_event notification) is only called from put_net()
      when all netns refcounts go to 0, but this cannot happen if the
      rds_connection itself is holding a c_net ref that it expects to
      release in rds_tcp_kill_sock.
      
      Instead, the rds_tcp_kill_sock callback should make sure to
      tear down state carefully, ensuring that the socket teardown
      is only done after all data-structures and workqs that depend
      on it are quiesced.
      
      The original motivation for commit 8edc3aff ("rds: tcp: Take explicit
      refcounts on struct net") was to resolve a race condition reported by
      syzkaller where workqs for tx/rx/connect were triggered after the
      namespace was deleted. Those worker threads should have been
      cancelled/flushed before socket tear-down and indeed,
      rds_conn_path_destroy() does try to sequence this by doing
           /* cancel cp_send_w */
           /* cancel cp_recv_w */
           /* flush cp_down_w */
           /* free data structures */
      Here the "flush cp_down_w" will trigger rds_conn_shutdown and thus
      invoke rds_tcp_conn_path_shutdown() to close the tcp socket, so that
      we ought to have satisfied the requirement that "socket-close is
      done after all other dependent state is quiesced". However,
      rds_conn_shutdown has a bug in that it *always* triggers the reconnect
      workq (and if connection is successful, we always restart tx/rx
      workqs so with the right timing, we risk the race conditions reported
      by syzkaller).
      
      Netns deletion is like module teardown: there is no need to restart
      a reconnect in this case. We can use the c_destroy_in_prog bit
      to avoid restarting the reconnect.
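The idea of gating the reconnect on a destroy-in-progress flag can be sketched as follows. This is a minimal model under assumptions: `conn_model`, `conn_shutdown()` and `conn_destroy()` are hypothetical names, and a counter stands in for queuing work on the reconnect workqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a connection and its teardown state. */
struct conn_model {
    bool destroy_in_prog;   /* models the c_destroy_in_prog bit */
    bool sock_open;
    int reconnects_queued;  /* models work queued on the reconnect workq */
};

/* Shut the connection down; only schedule a reconnect if it is not being destroyed. */
void conn_shutdown(struct conn_model *c)
{
    c->sock_open = false;
    if (!c->destroy_in_prog)
        c->reconnects_queued++;
}

/* Destroy path: mark the connection first so the shutdown does not requeue work. */
void conn_destroy(struct conn_model *c)
{
    c->destroy_in_prog = true;
    conn_shutdown(c);
}
```

In this model an ordinary shutdown still schedules one reconnect, while the destroy path leaves the counter untouched, so no tx/rx work can be restarted behind the teardown.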
      
      Fixes: 8edc3aff ("rds: tcp: Take explicit refcounts on struct net")
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      rds: tcp: remove redundant function rds_tcp_conn_paths_destroy() · 2d746c93
      Sowmini Varadhan committed
      A side-effect of Commit c14b0366 ("rds: tcp: set linger to 1
      when unloading a rds-tcp") is that we always send a RST on the tcp
      connection for rds_conn_destroy(), so rds_tcp_conn_paths_destroy()
      is not needed any more and is removed in this patch.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      tipc: fall back to smaller MTU if allocation of local send skb fails · 4c94cc2d
      Jon Maloy committed
      When sending node local messages the code is using an 'mtu' of 66060
      bytes to avoid unnecessary fragmentation. During situations of low
      memory tipc_msg_build() may sometimes fail to allocate such large
      buffers, resulting in unnecessary send failures. This can easily be
      remedied by falling back to a smaller MTU, and then reassembling the
      buffer chain as if the message were arriving from a remote node.
      
      At the same time, we change the initial MTU setting of the broadcast
      link to a lower value, so that large messages always are fragmented
      into smaller buffers even when we run in single node mode. Apart from
      obtaining the same advantage as for the 'fallback' solution above, this
      turns out to give a significant performance improvement. This can
      probably be explained with the __pskb_copy() operation performed on the
      buffer for each recipient during reception. We found the optimal value
      for this, considering the most relevant skb pool, to be 3744 bytes.
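The fallback described in the first paragraph can be sketched as a small user-space model. The names `build_chain()` and `send_local()`, the `max_alloc` parameter that simulates memory pressure, and the caller-supplied `fallback_mtu` are all assumptions made for illustration; per-fragment headers are ignored. Only the 66060-byte local MTU comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_MTU 66060 /* node-local 'mtu' quoted in the commit message */

/*
 * Model of building a buffer chain: returns the number of buffers needed,
 * or -1 if a single buffer of the required size cannot be allocated.
 * max_alloc simulates the largest allocation that currently succeeds.
 */
int build_chain(size_t msg_len, size_t mtu, size_t max_alloc)
{
    size_t buf;

    if (mtu == 0)
        return -1;
    buf = msg_len < mtu ? msg_len : mtu;
    if (buf > max_alloc)
        return -1;
    return (int)((msg_len + mtu - 1) / mtu);
}

/* Try the large local MTU first; on allocation failure retry with a smaller one. */
int send_local(size_t msg_len, size_t fallback_mtu, size_t max_alloc)
{
    int n = build_chain(msg_len, LOCAL_MTU, max_alloc);

    if (n < 0)
        n = build_chain(msg_len, fallback_mtu, max_alloc);
    return n;
}
```

With plenty of memory a 60000-byte message goes out as one buffer; when only small allocations succeed, the same message is split into several fragments instead of failing outright.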
      Acked-by: Ying Xue <ying.xue@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv() · c7799c06
      Tommi Rantala committed
      Remove the second tipc_rcv() call in tipc_udp_recv(). We have just
      checked that the bearer is not up, and calling tipc_rcv() with a bearer
      that is not up leads to a TIPC div-by-zero crash in
      tipc_node_calculate_timer(). The crash is rare in practice, but can
      happen like this:
      
        We're enabling a bearer, but it's not yet up and fully initialized.
        At the same time we receive a discovery packet, and in tipc_udp_recv()
        we end up calling tipc_rcv() with the not-yet-initialized bearer,
        causing later the div-by-zero crash in tipc_node_calculate_timer().
      
      Jon Maloy explains the impact of removing the second tipc_rcv() call:
        "link setup in the worst case will be delayed until the next arriving
         discovery messages, 1 sec later, and this is an acceptable delay."
      
      As the tipc_rcv() call is removed, just leave the function via the
      rcu_out label, so that we will kfree_skb().
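The resulting control flow can be sketched in a user-space model. `bearer_model`, `pkt_model` and `udp_recv_model()` are hypothetical names used only for illustration; the flags stand in for handing the packet to the receive path and for freeing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a bearer and a received packet. */
struct bearer_model {
    bool up;
};

struct pkt_model {
    bool delivered; /* handed to the receive path */
    bool freed;     /* dropped and released */
};

/* Deliver only when the bearer exists and is up; otherwise drop the packet. */
void udp_recv_model(const struct bearer_model *b, struct pkt_model *p)
{
    if (b != NULL && b->up) {
        p->delivered = true;
        return;
    }
    /* Bearer missing or not fully initialized: free instead of delivering. */
    p->freed = true;
}
```

A packet that arrives while the bearer is still coming up is simply dropped here, which matches the quoted assessment that link setup is at worst delayed until the next discovery message.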
      
      [   12.590450] Own node address <1.1.1>, network identity 1
      [   12.668088] divide error: 0000 [#1] SMP
      [   12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1
      [   12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
      [   12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
      [   12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
      [   12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
      [   12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000
      [   12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600
      [   12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001
      [   12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8
      [   12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800
      [   12.702338] FS:  0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000
      [   12.705099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0
      [   12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   12.712627] Call Trace:
      [   12.713390]  <IRQ>
      [   12.714011]  tipc_node_check_dest+0x2e8/0x350 [tipc]
      [   12.715286]  tipc_disc_rcv+0x14d/0x1d0 [tipc]
      [   12.716370]  tipc_rcv+0x8b0/0xd40 [tipc]
      [   12.717396]  ? minmax_running_min+0x2f/0x60
      [   12.718248]  ? dst_alloc+0x4c/0xa0
      [   12.718964]  ? tcp_ack+0xaf1/0x10b0
      [   12.719658]  ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
      [   12.720634]  tipc_udp_recv+0x71/0x1d0 [tipc]
      [   12.721459]  ? dst_alloc+0x4c/0xa0
      [   12.722130]  udp_queue_rcv_skb+0x264/0x490
      [   12.722924]  __udp4_lib_rcv+0x21e/0x990
      [   12.723670]  ? ip_route_input_rcu+0x2dd/0xbf0
      [   12.724442]  ? tcp_v4_rcv+0x958/0xa40
      [   12.725039]  udp_rcv+0x1a/0x20
      [   12.725587]  ip_local_deliver_finish+0x97/0x1d0
      [   12.726323]  ip_local_deliver+0xaf/0xc0
      [   12.726959]  ? ip_route_input_noref+0x19/0x20
      [   12.727689]  ip_rcv_finish+0xdd/0x3b0
      [   12.728307]  ip_rcv+0x2ac/0x360
      [   12.728839]  __netif_receive_skb_core+0x6fb/0xa90
      [   12.729580]  ? udp4_gro_receive+0x1a7/0x2c0
      [   12.730274]  __netif_receive_skb+0x1d/0x60
      [   12.730953]  ? __netif_receive_skb+0x1d/0x60
      [   12.731637]  netif_receive_skb_internal+0x37/0xd0
      [   12.732371]  napi_gro_receive+0xc7/0xf0
      [   12.732920]  receive_buf+0x3c3/0xd40
      [   12.733441]  virtnet_poll+0xb1/0x250
      [   12.733944]  net_rx_action+0x23e/0x370
      [   12.734476]  __do_softirq+0xc5/0x2f8
      [   12.734922]  irq_exit+0xfa/0x100
      [   12.735315]  do_IRQ+0x4f/0xd0
      [   12.735680]  common_interrupt+0xa2/0xa2
      [   12.736126]  </IRQ>
      [   12.736416] RIP: 0010:native_safe_halt+0x6/0x10
      [   12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d
      [   12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000
      [   12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      [   12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88
      [   12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
      [   12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000
      [   12.741831]  default_idle+0x2a/0x100
      [   12.742323]  arch_cpu_idle+0xf/0x20
      [   12.742796]  default_idle_call+0x28/0x40
      [   12.743312]  do_idle+0x179/0x1f0
      [   12.743761]  cpu_startup_entry+0x1d/0x20
      [   12.744291]  start_secondary+0x112/0x120
      [   12.744816]  secondary_startup_64+0xa5/0xa5
      [   12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00
      00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48
      89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
      [   12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0
      [   12.748555] ---[ end trace 1399ab83390650fd ]---
      [   12.749296] Kernel panic - not syncing: Fatal exception in interrupt
      [   12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000
      (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [   12.751215] Rebooting in 60 seconds..
      
      Fixes: c9b64d49 ("tipc: add replicast peer discovery")
      Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>