1. 19 7月, 2018 1 次提交
  2. 11 7月, 2018 2 次提交
    • A
      ipv6: xfrm: use 64-bit timestamps · 03dc7a35
      Arnd Bergmann 提交于
      get_seconds() is deprecated because it can overflow on 32-bit
      architectures.  For the xfrm_state->lastused member, we treat the data
      as a 64-bit number already, so we just need to use the right accessor
      that works on both 32-bit and 64-bit machines.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      03dc7a35
    • A
      xfrm: use time64_t for in-kernel timestamps · 386c5680
      Arnd Bergmann 提交于
      The lifetime managment uses '__u64' timestamps on the user space
      interface, but 'unsigned long' for reading the current time in the kernel
      with get_seconds().
      
      While this is probably safe beyond y2038, it will still overflow in 2106,
      and the get_seconds() call is deprecated because fo that.
      
      This changes the xfrm time handling to use time64_t consistently, along
      with reading the time using the safer ktime_get_real_seconds(). It still
      suffers from problems that can happen from a concurrent settimeofday()
      call or (to a lesser degree) a leap second update, but since the time
      stamps are part of the user API, there is nothing we can do to prevent
      that.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      386c5680
  3. 01 7月, 2018 1 次提交
    • N
      xfrm: Allow Set Mark to be Updated Using UPDSA · 6d8e85ff
      Nathan Harold 提交于
      Allow UPDSA to change "set mark" to permit
      policy separation of packet routing decisions from
      SA keying in systems that use mark-based routing.
      
      The set mark, used as a routing and firewall mark
      for outbound packets, is made update-able which
      allows routing decisions to be handled independently
      of keying/SA creation. To maintain consistency with
      other optional attributes, the set mark is only
      updated if sent with a non-zero value.
      
      The per-SA lock and the xfrm_state_lock are taken in
      that order to avoid a deadlock with
      xfrm_timer_handler(), which also takes the locks in
      that order.
      Signed-off-by: NNathan Harold <nharold@google.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      6d8e85ff
  4. 25 6月, 2018 1 次提交
    • F
      xfrm: policy: remove pcpu policy cache · e4db5b61
      Florian Westphal 提交于
      Kristian Evensen says:
        In a project I am involved in, we are running ipsec (Strongswan) on
        different mt7621-based routers. Each router is configured as an
        initiator and has around ~30 tunnels to different responders (running
        on misc. devices). Before the flow cache was removed (kernel 4.9), we
        got a combined throughput of around 70Mbit/s for all tunnels on one
        router. However, we recently switched to kernel 4.14 (4.14.48), and
        the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
        drop of around 20%. Reverting the flow cache removal restores, as
        expected, performance levels to that of kernel 4.9.
      
      When pcpu xdst exists, it has to be validated first before it can be
      used.
      
      A negative hit thus increases cost vs. no-cache.
      
      As number of tunnels increases, hit rate decreases so this pcpu caching
      isn't a viable strategy.
      
      Furthermore, the xdst cache also needs to run with BH off, so when
      removing this the bh disable/enable pairs can be removed too.
      
      Kristian tested a 4.14.y backport of this change and reported
      increased performance:
      
        In our tests, the throughput reduction has been reduced from around -20%
        to -5%. We also see that the overall throughput is independent of the
        number of tunnels, while before the throughput was reduced as the number
        of tunnels increased.
      Reported-by: NKristian Evensen <kristian.evensen@gmail.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      e4db5b61
  5. 23 6月, 2018 4 次提交
  6. 22 6月, 2018 7 次提交
    • E
      tcp_bbr: fix bbr pacing rate for internal pacing · cadefe5f
      Eric Dumazet 提交于
      This commit makes BBR use only the MSS (without any headers) to
      calculate pacing rates when internal TCP-layer pacing is used.
      
      This is necessary to achieve the correct pacing behavior in this case,
      since tcp_internal_pacing() uses only the payload length to calculate
      pacing delays.
      Signed-off-by: NKevin Yang <yyd@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cadefe5f
    • W
      tcp: ignore rcv_rtt sample with old ts ecr value · 3f6c65d6
      Wei Wang 提交于
      When receiving multiple packets with the same ts ecr value, only try
      to compute rcv_rtt sample with the earliest received packet.
      This is because the rcv_rtt calculated by later received packets
      could possibly include long idle time or other types of delay.
      For example:
      (1) server sends last packet of reply with TS val V1
      (2) client ACKs last packet of reply with TS ecr V1
      (3) long idle time passes
      (4) client sends next request data packet with TS ecr V1 (again!)
      At this time, the rcv_rtt computed on server with TS ecr V1 will be
      inflated with the idle time and should get ignored.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f6c65d6
    • N
      rhashtable: remove nulls_base and related code. · 9f9a7077
      NeilBrown 提交于
      This "feature" is unused, undocumented, and untested and so doesn't
      really belong.  A patch is under development to properly implement
      support for detecting when a search gets diverted down a different
      chain, which the common purpose of nulls markers.
      
      This patch actually fixes a bug too.  The table resizing allows a
      table to grow to 2^31 buckets, but the hash is truncated to 27 bits -
      any growth beyond 2^27 is wasteful an ineffective.
      
      This patch results in NULLS_MARKER(0) being used for all chains,
      and leaves the use of rht_is_a_null() to test for it.
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f9a7077
    • N
      rhashtable: split rhashtable.h · 0eb71a9d
      NeilBrown 提交于
      Due to the use of rhashtables in net namespaces,
      rhashtable.h is included in lots of the kernel,
      so a small changes can required a large recompilation.
      This makes development painful.
      
      This patch splits out rhashtable-types.h which just includes
      the major type declarations, and does not include (non-trivial)
      inline code.  rhashtable.h is no longer included by anything
      in the include/ directory.
      Common include files only include rhashtable-types.h so a large
      recompilation is only triggered when that changes.
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0eb71a9d
    • C
      VSOCK: fix loopback on big-endian systems · e5ab564c
      Claudio Imbrenda 提交于
      The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be
      used, and in fact in virtio_transport_common.c only 64 bit accessors are
      used. Using 32 bit accessors for 64 bit values breaks big endian systems.
      
      This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt.
      
      Fixes: b9116823 ("VSOCK: add loopback to virtio_transport")
      Signed-off-by: NClaudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5ab564c
    • P
      cls_flower: fix use after free in flower S/W path · 44a5cd43
      Paolo Abeni 提交于
      If flower filter is created without the skip_sw flag, fl_mask_put()
      can race with fl_classify() and we can destroy the mask rhashtable
      while a lookup operation is accessing it.
      
       BUG: unable to handle kernel paging request at 00000000000911d1
       PGD 0 P4D 0
       SMP PTI
       CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950
       Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
       RIP: 0010:rht_bucket_nested+0x20/0x60
       Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0
       RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206
       RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12
       RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001
       RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2
       R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000
       R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68
       FS:  0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0
       Call Trace:
        fl_lookup+0x134/0x140 [cls_flower]
        fl_classify+0xf3/0x180 [cls_flower]
        tcf_classify+0x78/0x150
        __netif_receive_skb_core+0x69e/0xa50
        netif_receive_skb_internal+0x42/0xf0
        tun_get_user+0xdd5/0xfd0 [tun]
        tun_sendmsg+0x52/0x70 [tun]
        handle_tx+0x2b3/0x5f0 [vhost_net]
        vhost_worker+0xab/0x100 [vhost]
        kthread+0xf8/0x130
        ret_from_fork+0x35/0x40
       Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress
       CR2: 00000000000911d1
      
      Fix the above waiting for a RCU grace period before destroying the
      rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy()
      must run in process context, as pointed out by Cong Wang.
      
      v1 -> v2: use tcf_queue_work to run rhashtable_destroy().
      
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44a5cd43
    • E
      net/packet: fix use-after-free · 945d015e
      Eric Dumazet 提交于
      We should put copy_skb in receive_queue only after
      a successful call to virtio_net_hdr_from_skb().
      
      syzbot report :
      
      BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline]
      BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline]
      BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
      Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553
      
      CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       __skb_unlink include/linux/skbuff.h:1843 [inline]
       __skb_dequeue include/linux/skbuff.h:1863 [inline]
       skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
       skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852
       packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331
       packet_release+0x630/0xd90 net/packet/af_packet.c:2991
       __sock_release+0xd7/0x260 net/socket.c:603
       sock_close+0x19/0x20 net/socket.c:1186
       __fput+0x35b/0x8b0 fs/file_table.c:209
       ____fput+0x15/0x20 fs/file_table.c:243
       task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x1b08/0x2750 kernel/exit.c:865
       do_group_exit+0x177/0x440 kernel/exit.c:968
       __do_sys_exit_group kernel/exit.c:979 [inline]
       __se_sys_exit_group kernel/exit.c:977 [inline]
       __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4448e9
      Code: Bad RIP value.
      RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9
      RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001
      RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000
      R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0
      R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 4553:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       skb_clone+0x1f5/0x500 net/core/skbuff.c:1282
       tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221
       deliver_skb net/core/dev.c:1925 [inline]
       deliver_ptype_list_skb net/core/dev.c:1940 [inline]
       __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
       netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
       netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
       tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
       tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
       call_write_iter include/linux/fs.h:1795 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
       vfs_write+0x1f8/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 4553:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
       __kfree_skb net/core/skbuff.c:642 [inline]
       kfree_skb+0x1a5/0x580 net/core/skbuff.c:659
       tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385
       deliver_skb net/core/dev.c:1925 [inline]
       deliver_ptype_list_skb net/core/dev.c:1940 [inline]
       __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
       netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
       netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
       tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
       tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
       call_write_iter include/linux/fs.h:1795 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
       vfs_write+0x1f8/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801b044ecc0
       which belongs to the cache skbuff_head_cache of size 232
      The buggy address is located 0 bytes inside of
       232-byte region [ffff8801b044ecc0, ffff8801b044eda8)
      The buggy address belongs to the page:
      page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0
      raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
      >ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                 ^
       ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: 58d19b19 ("packet: vnet_hdr support for tpacket_rcv")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      945d015e
  7. 21 6月, 2018 3 次提交
  8. 20 6月, 2018 11 次提交
    • W
      ip: limit use of gso_size to udp · 9887cba1
      Willem de Bruijn 提交于
      The ipcm(6)_cookie field gso_size is set only in the udp path. The ip
      layer copies this to cork only if sk_type is SOCK_DGRAM. This check
      proved too permissive. Ping and l2tp sockets have the same type.
      
      Limit to sockets of type SOCK_DGRAM and protocol IPPROTO_UDP to
      exclude ping sockets.
      
      v1 -> v2
      - remove irrelevant whitespace changes
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Reported-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9887cba1
    • M
      bpfilter: ignore binary files · 8b26a06a
      Matteo Croce 提交于
      net/bpfilter/bpfilter_umh is a binary file generated when bpfilter is
      enabled, add it to .gitignore to avoid committing it.
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b26a06a
    • M
      bpfilter: fix build error · 421780fd
      Matteo Croce 提交于
      bpfilter Makefile assumes that the system locale is en_US, and the
      parsing of objdump output fails.
      Set LC_ALL=C and, while at it, rewrite the objdump parsing so it spawns
      only 2 processes instead of 7.
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      421780fd
    • D
      net/sched: act_ife: preserve the action control in case of error · cbf56c29
      Davide Caratti 提交于
      in the following script
      
       # tc actions add action ife encode allow prio pass index 42
       # tc actions replace action ife encode allow tcindex drop index 42
      
      the action control should remain equal to 'pass', if the kernel failed
      to replace the TC action. Pospone the assignment of the action control,
      to ensure it is not overwritten in the error path of tcf_ife_init().
      
      Fixes: ef6980b6 ("introduce IFE action")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cbf56c29
    • D
      net/sched: act_ife: fix recursive lock and idr leak · 0a889b94
      Davide Caratti 提交于
      a recursive lock warning [1] can be observed with the following script,
      
       # $TC actions add action ife encode allow prio pass index 42
       IFE type 0xED3E
       # $TC actions replace action ife encode allow tcindex pass index 42
      
      in case the kernel was unable to run the last command (e.g. because of
      the impossibility to load 'act_meta_skbtcindex'). For a similar reason,
      the kernel can leak idr in the error path of tcf_ife_init(), because
      tcf_idr_release() is not called after successful idr reservation:
      
       # $TC actions add action ife encode allow tcindex index 47
       IFE type 0xED3E
       RTNETLINK answers: No such file or directory
       We have an error talking to the kernel
       # $TC actions add action ife encode allow tcindex index 47
       IFE type 0xED3E
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
       # $TC actions add action ife encode use mark 7 type 0xfefe pass index 47
       IFE type 0xFEFE
       RTNETLINK answers: No space left on device
       We have an error talking to the kernel
      
      Since tcfa_lock is already taken when the action is being edited, a call
      to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock
      again. On the other hand, tcf_idr_release() needs to be called in the
      error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation.
      Fix both problems in tcf_ife_init().
      Since the cleanup() routine can now be called when ife->params is NULL,
      also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu).
      
       [1]
       ============================================
       WARNING: possible recursive locking detected
       4.17.0-rc4.kasan+ #417 Tainted: G            E
       --------------------------------------------
       tc/3932 is trying to acquire lock:
       000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife]
      
       but task is already holding lock:
       000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&(&p->tcfa_lock)->rlock);
         lock(&(&p->tcfa_lock)->rlock);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       2 locks held by tc/3932:
        #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife]
        #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]
      
       stack backtrace:
       CPU: 3 PID: 3932 Comm: tc Tainted: G            E     4.17.0-rc4.kasan+ #417
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       Call Trace:
        dump_stack+0x9a/0xeb
        __lock_acquire+0xf43/0x34a0
        ? debug_check_no_locks_freed+0x2b0/0x2b0
        ? debug_check_no_locks_freed+0x2b0/0x2b0
        ? debug_check_no_locks_freed+0x2b0/0x2b0
        ? __mutex_lock+0x62f/0x1240
        ? kvm_sched_clock_read+0x1a/0x30
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0x18/0x170
        ? find_held_lock+0x39/0x1d0
        ? lock_acquire+0x10b/0x330
        lock_acquire+0x10b/0x330
        ? tcf_ife_cleanup+0x19/0x80 [act_ife]
        _raw_spin_lock_bh+0x38/0x70
        ? tcf_ife_cleanup+0x19/0x80 [act_ife]
        tcf_ife_cleanup+0x19/0x80 [act_ife]
        __tcf_idr_release+0xff/0x350
        tcf_ife_init+0xdde/0x13c0 [act_ife]
        ? ife_exit_net+0x290/0x290 [act_ife]
        ? __lock_is_held+0xb4/0x140
        tcf_action_init_1+0x67b/0xad0
        ? tcf_action_dump_old+0xa0/0xa0
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0x18/0x170
        ? kvm_sched_clock_read+0x1a/0x30
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0x18/0x170
        ? memset+0x1f/0x40
        tcf_action_init+0x30f/0x590
        ? tcf_action_init_1+0xad0/0xad0
        ? memset+0x1f/0x40
        tc_ctl_action+0x48e/0x5e0
        ? mutex_lock_io_nested+0x1160/0x1160
        ? tca_action_gd+0x990/0x990
        ? sched_clock+0x5/0x10
        ? find_held_lock+0x39/0x1d0
        rtnetlink_rcv_msg+0x4da/0x990
        ? validate_linkmsg+0x680/0x680
        ? sched_clock_cpu+0x18/0x170
        ? find_held_lock+0x39/0x1d0
        netlink_rcv_skb+0x127/0x350
        ? validate_linkmsg+0x680/0x680
        ? netlink_ack+0x970/0x970
        ? __kmalloc_node_track_caller+0x304/0x3a0
        netlink_unicast+0x40f/0x5d0
        ? netlink_attachskb+0x580/0x580
        ? _copy_from_iter_full+0x187/0x760
        ? import_iovec+0x90/0x390
        netlink_sendmsg+0x67f/0xb50
        ? netlink_unicast+0x5d0/0x5d0
        ? copy_msghdr_from_user+0x206/0x340
        ? netlink_unicast+0x5d0/0x5d0
        sock_sendmsg+0xb3/0xf0
        ___sys_sendmsg+0x60a/0x8b0
        ? copy_msghdr_from_user+0x340/0x340
        ? lock_downgrade+0x5e0/0x5e0
        ? tty_write_lock+0x18/0x50
        ? kvm_sched_clock_read+0x1a/0x30
        ? sched_clock+0x5/0x10
        ? sched_clock_cpu+0x18/0x170
        ? find_held_lock+0x39/0x1d0
        ? lock_downgrade+0x5e0/0x5e0
        ? lock_acquire+0x10b/0x330
        ? __audit_syscall_entry+0x316/0x690
        ? current_kernel_time64+0x6b/0xd0
        ? __fget_light+0x55/0x1f0
        ? __sys_sendmsg+0xd2/0x170
        __sys_sendmsg+0xd2/0x170
        ? __ia32_sys_shutdown+0x70/0x70
        ? syscall_trace_enter+0x57a/0xd60
        ? rcu_read_lock_sched_held+0xdc/0x110
        ? __bpf_trace_sys_enter+0x10/0x10
        ? do_syscall_64+0x22/0x480
        do_syscall_64+0xa5/0x480
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7fd646988ba0
       RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0
       RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003
       RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000
       R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000
       R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100
      
      Fixes: 4e8c8615 ("net sched: net sched: ife action fix late binding")
      Fixes: ef6980b6 ("introduce IFE action")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a889b94
    • L
      net: propagate dev_get_valid_name return code · 7892bd08
      Li RongQing 提交于
      if dev_get_valid_name failed, propagate its return code
      
      and remove the setting err to ENODEV, it will be set to
      0 again before dev_change_net_namespace exits.
      Signed-off-by: NLi RongQing <lirongqing@baidu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7892bd08
    • D
      net/tcp: Fix socket lookups with SO_BINDTODEVICE · 8c43bd17
      David Ahern 提交于
      Similar to 69678bcd ("udp: fix SO_BINDTODEVICE"), TCP socket lookups
      need to fail if dev_match is not true. Currently, a packet to a given port
      can match a socket bound to device when it should not. In the VRF case,
      this causes the lookup to hit a VRF socket and not a global socket
      resulting in a response trying to go through the VRF when it should not.
      
      Fixes: 3fa6f616 ("net: ipv4: add second dif to inet socket lookups")
      Fixes: 4297a0ef ("net: ipv6: add second dif to inet6 socket lookups")
      Reported-by: NLou Berger <lberger@labn.net>
      Diagnosed-by: NRenato Westphal <renato@opensourcerouting.org>
      Tested-by: NRenato Westphal <renato@opensourcerouting.org>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c43bd17
    • E
      net/ipv6: respect rcu grace period before freeing fib6_info · 9b0a8da8
      Eric Dumazet 提交于
      syzbot reported use after free that is caused by fib6_info being
      freed without a proper RCU grace period.
      
      CPU: 0 PID: 1407 Comm: udevd Not tainted 4.17.0+ #39
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       __read_once_size include/linux/compiler.h:188 [inline]
       find_rr_leaf net/ipv6/route.c:705 [inline]
       rt6_select net/ipv6/route.c:761 [inline]
       fib6_table_lookup+0x12b7/0x14d0 net/ipv6/route.c:1823
       ip6_pol_route+0x1c2/0x1020 net/ipv6/route.c:1856
       ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2082
       fib6_rule_lookup+0x211/0x6d0 net/ipv6/fib6_rules.c:122
       ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2110
       ip6_route_output include/net/ip6_route.h:82 [inline]
       icmpv6_xrlim_allow net/ipv6/icmp.c:211 [inline]
       icmp6_send+0x147c/0x2da0 net/ipv6/icmp.c:535
       icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0xa5/0x790 net/ipv6/route.c:2244
       dst_link_failure include/net/dst.h:427 [inline]
       ndisc_error_report+0xd1/0x1c0 net/ipv6/ndisc.c:695
       neigh_invalidate+0x246/0x550 net/core/neighbour.c:892
       neigh_timer_handler+0xaf9/0xde0 net/core/neighbour.c:978
       call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
       run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364 [inline]
       irq_exit+0x1d1/0x200 kernel/softirq.c:404
       exiting_irq arch/x86/include/asm/apic.h:527 [inline]
       smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
       </IRQ>
      RIP: 0010:strlen+0x5e/0xa0 lib/string.c:482
      Code: 24 00 74 3b 48 bb 00 00 00 00 00 fc ff df 4c 89 e0 48 83 c0 01 48 89 c2 48 89 c1 48 c1 ea 03 83 e1 07 0f b6 14 1a 38 ca 7f 04 <84> d2 75 23 80 38 00 75 de 48 83 c4 08 4c 29 e0 5b 41 5c 5d c3 48
      RSP: 0018:ffff8801af117850 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      RAX: ffff880197f53bd0 RBX: dffffc0000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff81c5b06c RDI: ffff880197f53bc0
      RBP: ffff8801af117868 R08: ffff88019a976540 R09: 0000000000000000
      R10: ffff88019a976540 R11: 0000000000000000 R12: ffff880197f53bc0
      R13: ffff880197f53bc0 R14: ffffffff899e4e90 R15: ffff8801d91c6a00
       strlen include/linux/string.h:267 [inline]
       getname_kernel+0x24/0x370 fs/namei.c:218
       open_exec+0x17/0x70 fs/exec.c:882
       load_elf_binary+0x968/0x5610 fs/binfmt_elf.c:780
       search_binary_handler+0x17d/0x570 fs/exec.c:1653
       exec_binprm fs/exec.c:1695 [inline]
       __do_execve_file.isra.35+0x16fe/0x2710 fs/exec.c:1819
       do_execveat_common fs/exec.c:1866 [inline]
       do_execve fs/exec.c:1883 [inline]
       __do_sys_execve fs/exec.c:1964 [inline]
       __se_sys_execve fs/exec.c:1959 [inline]
       __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f1576a46207
      Code: 77 19 f4 48 89 d7 44 89 c0 0f 05 48 3d 00 f0 ff ff 76 e0 f7 d8 64 41 89 01 eb d8 f7 d8 64 41 89 01 eb df b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 00 8c 2d 00 f7 d8 64 89 02
      RSP: 002b:00007ffff2784568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
      RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f1576a46207
      RDX: 0000000001215b10 RSI: 00007ffff2784660 RDI: 00007ffff2785670
      RBP: 0000000000625500 R08: 000000000000589c R09: 000000000000589c
      R10: 0000000000000000 R11: 0000000000000202 R12: 0000000001215b10
      R13: 0000000000000007 R14: 0000000001204250 R15: 0000000000000005
      
      Allocated by task 12188:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
       kmalloc include/linux/slab.h:513 [inline]
       kzalloc include/linux/slab.h:706 [inline]
       fib6_info_alloc+0xbb/0x280 net/ipv6/ip6_fib.c:152
       ip6_route_info_create+0x782/0x2b50 net/ipv6/route.c:3013
       ip6_route_add+0x23/0xb0 net/ipv6/route.c:3154
       ipv6_route_ioctl+0x5a5/0x760 net/ipv6/route.c:3660
       inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
       sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
       sock_ioctl+0x30d/0x680 net/socket.c:1097
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:500 [inline]
       do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
       ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
       __do_sys_ioctl fs/ioctl.c:708 [inline]
       __se_sys_ioctl fs/ioctl.c:706 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 1402:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kfree+0xd9/0x260 mm/slab.c:3813
       fib6_info_destroy+0x29b/0x350 net/ipv6/ip6_fib.c:207
       fib6_info_release include/net/ip6_fib.h:286 [inline]
       __ip6_del_rt_siblings net/ipv6/route.c:3235 [inline]
       ip6_route_del+0x11c4/0x13b0 net/ipv6/route.c:3316
       ipv6_route_ioctl+0x616/0x760 net/ipv6/route.c:3663
       inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
       sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
       sock_ioctl+0x30d/0x680 net/socket.c:1097
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:500 [inline]
       do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
       ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
       __do_sys_ioctl fs/ioctl.c:708 [inline]
       __se_sys_ioctl fs/ioctl.c:706 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801b5df2580
       which belongs to the cache kmalloc-256 of size 256
      The buggy address is located 8 bytes inside of
       256-byte region [ffff8801b5df2580, ffff8801b5df2680)
      The buggy address belongs to the page:
      page:ffffea0006d77c80 count:1 mapcount:0 mapping:ffff8801da8007c0 index:0xffff8801b5df2e40
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0006c5cc48 ffffea0007363308 ffff8801da8007c0
      raw: ffff8801b5df2e40 ffff8801b5df2080 0000000100000006 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801b5df2480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801b5df2500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      > ffff8801b5df2580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
       ffff8801b5df2600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801b5df2680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      
      Fixes: a64efe14 ("net/ipv6: introduce fib6_info struct and helpers")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reported-by: syzbot+9e6d75e3edef427ee888@syzkaller.appspotmail.com
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Tested-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b0a8da8
    • J
      net/ncsi: Use netdev_dbg for debug messages · 6e42a3f5
      Joel Stanley 提交于
      This moves all of the netdev_printk(KERN_DEBUG, ...) messages over to
      netdev_dbg.
      
      As Joe explains:
      
      > netdev_dbg is not included in object code unless
      > DEBUG is defined or CONFIG_DYNAMIC_DEBUG is set.
      > And then, it is not emitted into the log unless
      > DEBUG is set or this specific netdev_dbg is enabled
      > via the dynamic debug control file.
      
      Which is what we're after in this case.
      Acked-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e42a3f5
    • J
      net/ncsi: Drop no more channels message · 5d3b1467
      Joel Stanley 提交于
      This does not provide useful information. As the ncsi maintainer said:
      
       > either we get a channel or broadcom has gone out to lunch
      Acked-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d3b1467
    • J
      net/ncsi: Silence debug messages · 87975a01
      Joel Stanley 提交于
      In normal operation we see this series of messages as the host drives
      the network device:
      
       ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down
       ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config
       ftgmac100 1e660000.ethernet eth0: NCSI interface down
       ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up
       ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI interface up
       ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down
       ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config
       ftgmac100 1e660000.ethernet eth0: NCSI interface down
       ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up
       ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI interface up
      
      This makes all of these messages netdev_dbg. They are still useful to
      debug eg. misbehaving network device firmware, but we do not need them
      filling up the kernel logs in normal operation.
      Acked-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87975a01
  9. 17 6月, 2018 2 次提交
    • K
      net_sched: blackhole: tell upper qdisc about dropped packets · 7e85dc8c
      Konstantin Khlebnikov 提交于
      When blackhole is used on top of classful qdisc like hfsc it breaks
      qlen and backlog counters because packets are disappear without notice.
      
      In HFSC non-zero qlen while all classes are inactive triggers warning:
      WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
      and schedules watchdog work endlessly.
      
      This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS,
      this flag tells upper layer: this packet is gone and isn't queued.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e85dc8c
    • D
      atm: Preserve value of skb->truesize when accounting to vcc · 9bbe60a6
      David Woodhouse 提交于
      ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
      which they are to be sent. But it doesn't take ownership of those
      packets from the sock (if any) which originally owned them. They should
      remain owned by their actual sender until they've left the box.
      
      There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
      for certain skbs, precisely to avoid messing up sk_wmem_alloc
      accounting. Ideally that hack would cover the ATM use case too, but it
      doesn't — skbs which aren't owned by any sock, for example PPP control
      frames, still get their truesize adjusted when the low-level ATM driver
      adds headroom.
      
      This has always been an issue, it seems. The truesize of a packet
      increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
      for normal traffic, only for control frames. So I think we just got away
      with it, and we probably needed to send 2GiB of LCP echo frames before
      the misaccounting would ever have caused a problem and caused
      atm_may_send() to start refusing packets.
      
      Commit 14afee4b ("net: convert sock.sk_wmem_alloc from atomic_t to
      refcount_t") did exactly what it was intended to do, and turned this
      mostly-theoretical problem into a real one, causing PPPoATM to fail
      immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
      starts refusing to allow new packets.
      
      The least intrusive solution to this problem is to stash the value of
      skb->truesize that was accounted to the VCC, in a new member of the
      ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
      value instead of the then-current value of skb->truesize.
      
      Fixes: 158f323b ("net: adjust skb->truesize in pskb_expand_head()")
      Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
      Tested-by: NKevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9bbe60a6
  10. 16 6月, 2018 7 次提交
    • T
      xdp: Fix handling of devmap in generic XDP · 6d5fc195
      Toshiaki Makita 提交于
      Commit 67f29e07 ("bpf: devmap introduce dev_map_enqueue") changed
      the return value type of __devmap_lookup_elem() from struct net_device *
      to struct bpf_dtab_netdev * but forgot to modify generic XDP code
      accordingly.
      
      Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
      net_device is expected, then skb->dev was set to invalid value.
      
      v2:
      - Fix compiler warning without CONFIG_BPF_SYSCALL.
      
      Fixes: 67f29e07 ("bpf: devmap introduce dev_map_enqueue")
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: NYonghong Song <yhs@fb.com>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      6d5fc195
    • R
      neighbour: skip NTF_EXT_LEARNED entries during forced gc · f6a6f203
      Roopa Prabhu 提交于
      Commit 9ce33e46 ("neighbour: support for NTF_EXT_LEARNED flag")
      added support for NTF_EXT_LEARNED for neighbour entries.
      NTF_EXT_LEARNED entries are neigh entries managed by control
      plane (eg: Ethernet VPN implementation in FRR routing suite).
      Periodic gc already excludes these entries. This patch extends
      it to forced gc which the earlier patch missed.
      
      Fixes: 9ce33e46 ("neighbour: support for NTF_EXT_LEARNED flag")
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6a6f203
    • D
      tls: fix waitall behavior in tls_sw_recvmsg · 06030dba
      Daniel Borkmann 提交于
      Current behavior in tls_sw_recvmsg() is to wait for incoming tls
      messages and copy up to exactly len bytes of data that the user
      provided. This is problematic in the sense that i) if no packet
      is currently queued in strparser we keep waiting until one has been
      processed and pushed into tls receive layer for tls_wait_data() to
      wake up and push the decrypted bits to user space. Given after
      tls decryption, we're back at streaming data, use sock_rcvlowat()
      hint from tcp socket instead. Retain current behavior with MSG_WAITALL
      flag and otherwise use the hint target for breaking the loop and
      returning to application. This is done if currently no ctx->recv_pkt
      is ready, otherwise continue to process it from our strparser
      backlog.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06030dba
    • D
      tls: fix use-after-free in tls_push_record · a447da7d
      Daniel Borkmann 提交于
      syzkaller managed to trigger a use-after-free in tls like the
      following:
      
        BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
        Write of size 1 at addr ffff88037aa08000 by task a.out/2317
      
        CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
        Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
        Call Trace:
         dump_stack+0x71/0xab
         print_address_description+0x6a/0x280
         kasan_report+0x258/0x380
         ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
         tls_push_record.constprop.15+0x6a2/0x810 [tls]
         tls_sw_push_pending_record+0x2e/0x40 [tls]
         tls_sk_proto_close+0x3fe/0x710 [tls]
         ? tcp_check_oom+0x4c0/0x4c0
         ? tls_write_space+0x260/0x260 [tls]
         ? kmem_cache_free+0x88/0x1f0
         inet_release+0xd6/0x1b0
         __sock_release+0xc0/0x240
         sock_close+0x11/0x20
         __fput+0x22d/0x660
         task_work_run+0x114/0x1a0
         do_exit+0x71a/0x2780
         ? mm_update_next_owner+0x650/0x650
         ? handle_mm_fault+0x2f5/0x5f0
         ? __do_page_fault+0x44f/0xa50
         ? mm_fault_error+0x2d0/0x2d0
         do_group_exit+0xde/0x300
         __x64_sys_exit_group+0x3a/0x50
         do_syscall_64+0x9a/0x300
         ? page_fault+0x8/0x30
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This happened through fault injection where aead_req allocation in
      tls_do_encryption() eventually failed and we returned -ENOMEM from
      the function. Turns out that the use-after-free is triggered from
      tls_sw_sendmsg() in the second tls_push_record(). The error then
      triggers a jump to waiting for memory in sk_stream_wait_memory()
      resp. returning immediately in case of MSG_DONTWAIT. What follows is
      the trim_both_sgl(sk, orig_size), which drops elements from the sg
      list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
      when the socket is being closed, where tls_sk_proto_close() callback
      is invoked. The tls_complete_pending_work() will figure that there's
      a pending closed tls record to be flushed and thus calls into the
      tls_push_pending_closed_record() from there. ctx->push_pending_record()
      is called from the latter, which is the tls_sw_push_pending_record()
      from sw path. This again calls into tls_push_record(). And here the
      tls_fill_prepend() will panic since the buffer address has been freed
      earlier via trim_both_sgl(). One way to fix it is to move the aead
      request allocation out of tls_do_encryption() early into tls_push_record().
      This means we don't prep the tls header and advance state to the
      TLS_PENDING_CLOSED_RECORD before allocation which could potentially
      fail happened. That fixes the issue on my side.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
      Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a447da7d
    • G
      l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl() · ecd012e4
      Guillaume Nault 提交于
      pppol2tp_tunnel_ioctl() can act on an L2TPv3 tunnel, in which case
      'session' may be an Ethernet pseudo-wire.
      
      However, pppol2tp_session_ioctl() expects a PPP pseudo-wire, as it
      assumes l2tp_session_priv() points to a pppol2tp_session structure. For
      an Ethernet pseudo-wire l2tp_session_priv() points to an l2tp_eth_sess
      structure instead, making pppol2tp_session_ioctl() access invalid
      memory.
      
      Fixes: d9e31d17 ("l2tp: Add L2TP ethernet pseudowire support")
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ecd012e4
    • G
      l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels · de9bada5
      Guillaume Nault 提交于
      The /proc/net/pppol2tp handlers (pppol2tp_seq_*()) iterate over all
      L2TPv2 tunnels, and rightfully expect that only PPP sessions can be
      found there. However, l2tp_netlink accepts creating Ethernet sessions
      regardless of the underlying tunnel version.
      
      This confuses pppol2tp_seq_session_show(), which expects that
      l2tp_session_priv() returns a pppol2tp_session structure. When the
      session is an Ethernet pseudo-wire, a struct l2tp_eth_sess is returned
      instead. This leads to invalid memory access when
      pppol2tp_session_get_sock() later tries to dereference ps->sk.
      
      Fixes: d9e31d17 ("l2tp: Add L2TP ethernet pseudowire support")
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de9bada5
    • I
      ipv6: Only emit append events for appended routes · 6eba08c3
      Ido Schimmel 提交于
      Current code will emit an append event in the FIB notification chain for
      any route added with NLM_F_APPEND set, even if the route was not
      appended to any existing route.
      
      This is inconsistent with IPv4 where such an event is only emitted when
      the new route is appended after an existing one.
      
      Align IPv6 behavior with IPv4, thereby allowing listeners to more easily
      handle these events.
      
      Fixes: f34436a4 ("net/ipv6: Simplify route replace and appending into multipath route")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6eba08c3
  11. 15 6月, 2018 1 次提交