1. 14 Nov, 2017 1 commit
  2. 25 Oct, 2017 1 commit
    • net: af_packet: Convert timers to use timer_setup() · 17bfd8c8
      Committed by Kees Cook
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
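
The callback pattern described above can be sketched in userspace. This is a minimal model, not the kernel code: `struct timer_list` and `timer_setup()` here are simplified stand-ins, and `struct ring_state`, `retire_timer_expired()` and `fire_once()` are hypothetical names. It shows how a callback that receives only the timer pointer recovers its enclosing structure, which is what `from_timer()` does via `container_of()`.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct timer_list (illustrative only). */
struct timer_list {
    void (*function)(struct timer_list *t);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as the kernel helper: var is only used to name the type. */
#define from_timer(var, timer_ptr, field) \
    container_of(timer_ptr, __typeof__(*(var)), field)

/* Hypothetical owner of an embedded timer. */
struct ring_state {
    int expirations;
    struct timer_list retire_timer;
};

/* Simplified model: only records the callback, flags are ignored. */
static void timer_setup(struct timer_list *t,
                        void (*fn)(struct timer_list *), unsigned int flags)
{
    (void)flags;
    t->function = fn;
}

/* The callback gets the timer pointer and derives its owner from it. */
static void retire_timer_expired(struct timer_list *t)
{
    struct ring_state *rs = from_timer(rs, t, retire_timer);

    assert(&rs->retire_timer == t);
    rs->expirations++;
}

/* Simulates one timer expiry and returns the updated counter. */
int fire_once(void)
{
    struct ring_state rs = { 0 };

    timer_setup(&rs.retire_timer, retire_timer_expired, 0);
    rs.retire_timer.function(&rs.retire_timer);
    return rs.expirations;
}
```

Passing the timer pointer instead of an opaque `unsigned long` keeps the callback type-checked, since the owner is derived from a known member offset rather than from a cast.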
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Mike Maloney <maloney@google.com>
      Cc: Jarno Rajahalme <jarno@ovn.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 21 Oct, 2017 1 commit
    • packet: avoid panic in packet_getsockopt() · 509c7a1e
      Committed by Eric Dumazet
      syzkaller got crashes in packet_getsockopt() while processing the
      PACKET_ROLLOVER_STATS command while another thread was managing
      to change po->rollover.
      
      Using RCU will fix this bug. We might later add proper RCU annotations
      for sparse's sake.
      
      In v2: I replaced kfree(rollover) in fanout_add() to kfree_rcu()
      variant, as spotted by John.
      
      Fixes: a9b63918 ("packet: rollover statistics")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: John Sperbeck <jsperbeck@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 29 Sep, 2017 2 commits
  5. 26 Sep, 2017 1 commit
  6. 21 Sep, 2017 1 commit
    • packet: hold bind lock when rebinding to fanout hook · 008ba2a1
      Committed by Willem de Bruijn
      Packet socket bind operations must hold the po->bind_lock. This keeps
      po->running consistent with whether the socket is actually on a ptype
      list to receive packets.
      
      fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
      binds the fanout object to receive through packet_rcv_fanout.
      
      Make it hold the po->bind_lock when testing po->running and rebinding.
      Else, it can race with other rebind operations, such as that in
      packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
      can result in a socket being added to a fanout group twice, causing
      use-after-free KASAN bug reports, among others.
      
      Reported independently by both trinity and syzkaller.
      Verified that the syzkaller reproducer passes after this patch.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Reported-by: nixioaming <nixiaoming@huawei.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 30 Aug, 2017 1 commit
  8. 11 Aug, 2017 1 commit
  9. 25 Jul, 2017 1 commit
    • packet: fix use-after-free in prb_retire_rx_blk_timer_expired() · c800aaf8
      Committed by WANG Cong
      There are multiple reports showing we have a use-after-free in
      the timer prb_retire_rx_blk_timer_expired(), where we use struct
      tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by
      free_pg_vec().
      
      The interesting part is it is not freed via packet_release() but
      via packet_setsockopt(), which means we are not closing the socket.
      Looking into the big and fat function packet_set_ring(), this could
      happen if we satisfy the following conditions:
      
      1. closing == 0, not on packet_release() path
      2. req->tp_block_nr == 0, we don't allocate a new pg_vec
      3. rx_ring->pg_vec is already set as V3, which means we already called
         packet_set_ring() with req->tp_block_nr > 0 previously
      4. req->tp_frame_nr == 0, pass sanity check
      5. po->mapped == 0, never called mmap()
      
      In this scenario we are clearing the old rx_ring->pg_vec, so we need
      to free this pg_vec, but we don't stop the timer on this path because
      of closing==0.
      
      The timer has to be stopped as long as we need to free pg_vec, therefore
      the check on closing!=0 is wrong, we should check pg_vec!=NULL instead.
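
The difference between the two checks can be modelled in a few lines. This is a toy sketch under stated assumptions, not the code in packet_set_ring(): `struct ring_model`, `teardown_leaves_dangling_timer()` and `scenario()` are hypothetical names, and the timer is reduced to a flag. It reproduces the path from the conditions listed above, where the old vector is freed while closing == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of the ring state that matters for this bug. */
struct ring_model {
    void *pg_vec;       /* block vector the timer callback reads */
    bool  timer_armed;  /* true while the retire timer may still fire */
};

/*
 * Tear down the ring's block vector. Returns true if the timer is left
 * armed after its vector was freed, i.e. the use-after-free window.
 * fixed_check selects between the old test (closing) and the corrected
 * one (is there a vector to free).
 */
bool teardown_leaves_dangling_timer(struct ring_model *r, bool closing,
                                    bool fixed_check)
{
    assert(r != NULL);

    void *old = r->pg_vec;
    r->pg_vec = NULL;

    bool stop = fixed_check ? (old != NULL) : closing;
    if (stop)
        r->timer_armed = false;

    free(old);
    return r->timer_armed;
}

/* The setsockopt() path from the report: a vector exists, closing == 0. */
bool scenario(bool fixed_check)
{
    struct ring_model r = { malloc(64), true };

    return teardown_leaves_dangling_timer(&r, false, fixed_check);
}
```

With the old check the timer survives its vector in this scenario; keying the decision on the vector being freed closes that window regardless of how the teardown was reached.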
      
      Thanks to liujian for testing different fixes.
      
      Reported-by: alexander.levin@verizon.com
      Reported-by: Dave Jones <davej@codemonkey.org.uk>
      Reported-by: liujian (CE) <liujian56@huawei.com>
      Tested-by: liujian (CE) <liujian56@huawei.com>
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 20 Jul, 2017 1 commit
  11. 14 Jul, 2017 1 commit
    • net/packet: Fix Tx queue selection for AF_PACKET · ccd4eb49
      Committed by Iván Briano
      When PACKET_QDISC_BYPASS is not used, Tx queue selection will be done
      before the packet is enqueued, taking into account any mappings set by
      a queuing discipline such as mqprio without hardware offloading. This
      selection may be affected by a previously saved queue_mapping, either on
      the Rx path, or done before the packet reaches the device, as is
      currently the case for AF_PACKET.
      
      In order for queue selection to work as expected when using traffic
      control, there can't be another selection done before that point is
      reached, so move the call to packet_pick_tx_queue to
      packet_direct_xmit, leaving the default xmit path as it was before
      PACKET_QDISC_BYPASS was introduced.
      
      A forward declaration of packet_pick_tx_queue() is introduced to avoid
      the need to reorder the functions within the file.
      
      Fixes: d346a3fa ("packet: introduce PACKET_QDISC_BYPASS socket option")
      Signed-off-by: Iván Briano <ivan.briano@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 01 Jul, 2017 3 commits
  13. 11 Jun, 2017 1 commit
  14. 26 May, 2017 1 commit
  15. 16 May, 2017 1 commit
  16. 26 Apr, 2017 1 commit
  17. 25 Apr, 2017 1 commit
    • packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id. · 4a69a864
      Committed by Mike Maloney
      Fanout uses a per net global namespace. A process that intends to create
      a new fanout group can accidentally join an existing group. It is not
      possible to detect this.
      
      Add socket option PACKET_FANOUT_FLAG_UNIQUEID.  When specified the
      supplied fanout group id must be set to 0, and the kernel chooses an id
      that is not already in use.  This is an ephemeral flag so that
      other sockets can be added to this group using setsockopt, but NOT
      specifying this flag.  The current getsockopt(..., PACKET_FANOUT, ...)
      can be used to retrieve the new group id.
      
      We assume that there are not a lot of fanout groups and that this is not
      a high frequency call.
      
      The method assigns ids starting at zero and increases until it finds an
      unused id.  It keeps track of the last assigned id, and uses it as a
      starting point to find new ids.
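
The id search described above can be sketched as a small userspace allocator. This is an illustration under assumptions, not the kernel implementation: `struct id_space`, `MAX_GROUPS`, `id_in_use()` and `alloc_unique_id()` are hypothetical names, and the set of live groups is a tiny array rather than the kernel's fanout list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 8  /* hypothetical tiny table, enough for the sketch */

struct id_space {
    uint16_t in_use[MAX_GROUPS]; /* ids of existing groups */
    int      count;              /* number of valid entries in in_use */
    uint16_t next_hint;          /* one past the last id handed out */
};

static bool id_in_use(const struct id_space *s, uint16_t id)
{
    for (int i = 0; i < s->count; i++)
        if (s->in_use[i] == id)
            return true;
    return false;
}

/*
 * Linear search for a free 16-bit id, starting at the hint and wrapping
 * around once. Returns false if the table is full or no id is free.
 */
bool alloc_unique_id(struct id_space *s, uint16_t *out)
{
    assert(s != NULL && out != NULL);

    if (s->count == MAX_GROUPS)
        return false;

    uint16_t id = s->next_hint;
    do {
        if (!id_in_use(s, id)) {
            s->in_use[s->count++] = id;
            s->next_hint = (uint16_t)(id + 1);
            *out = id;
            return true;
        }
        id++;
    } while (id != s->next_hint);

    return false;
}
```

A linear scan is acceptable here for the reason the commit message gives: few groups are expected and the call is infrequent. Starting from the last assigned id avoids rescanning the low ids on every call.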
      Signed-off-by: Mike Maloney <maloney@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 31 Mar, 2017 3 commits
  19. 02 Mar, 2017 1 commit
    • net: don't call strlen() on the user buffer in packet_bind_spkt() · 540e2894
      Committed by Alexander Potapenko
      KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
      uninitialized memory in packet_bind_spkt():
      Acked-by: Eric Dumazet <edumazet@google.com>
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory
      CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
       0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48
       ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550
       0000000000000000 0000000000000092 00000000ec400911 0000000000000002
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff82559ae8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
       [<ffffffff818a6626>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
       [<ffffffff818a783b>] __msan_warning+0x5b/0xb0
      mm/kmsan/kmsan_instr.c:424
       [<     inline     >] strlen lib/string.c:484
       [<ffffffff8259b58d>] strlcpy+0x9d/0x200 lib/string.c:144
       [<ffffffff84b2eca4>] packet_bind_spkt+0x144/0x230
      net/packet/af_packet.c:3132
       [<ffffffff84242e4d>] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
       [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
       [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f
      arch/x86/entry/entry_64.o:?
      chained origin: 00000000eba00911
       [<ffffffff810bb787>] save_stack_trace+0x27/0x50
      arch/x86/kernel/stacktrace.c:67
       [<     inline     >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
       [<     inline     >] kmsan_save_stack mm/kmsan/kmsan.c:334
       [<ffffffff818a59f8>] kmsan_internal_chain_origin+0x118/0x1e0
      mm/kmsan/kmsan.c:527
       [<ffffffff818a7773>] __msan_set_alloca_origin4+0xc3/0x130
      mm/kmsan/kmsan_instr.c:380
       [<ffffffff84242b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
       [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
       [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f
      arch/x86/entry/entry_64.o:?
      origin description: ----address@SYSC_bind (origin=00000000eb400911)
      ==================================================================
      (the line numbers are relative to 4.8-rc6, but the bug persists
      upstream)
      
      , when I run the following program as root:
      
      =====================================
       #include <string.h>
       #include <sys/socket.h>
       #include <netpacket/packet.h>
       #include <net/ethernet.h>
      
       int main() {
         struct sockaddr addr;
         memset(&addr, 0xff, sizeof(addr));
         addr.sa_family = AF_PACKET;
         int fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL));
         bind(fd, &addr, sizeof(addr));
         return 0;
       }
      =====================================
      
      This happens because addr.sa_data copied from the userspace is not
      zero-terminated, and copying it with strlcpy() in packet_bind_spkt()
      results in calling strlen() on the kernel copy of that non-terminated
      buffer.
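
One way to avoid running strlen() over an unterminated sa_data is to copy the fixed-size field into a buffer one byte larger and terminate it explicitly. The following userspace sketch shows that approach; `copy_spkt_name()` and `SA_DATA_LEN` are hypothetical names, and the kernel function itself is not reproduced here.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Size of the fixed sa_data field, taken from the type itself. */
#define SA_DATA_LEN sizeof(((struct sockaddr *)0)->sa_data)

/*
 * Copy the possibly unterminated sa_data field into name and terminate it.
 * name_size must be at least SA_DATA_LEN + 1. Returns the length of the
 * resulting string, or 0 if the destination is too small.
 */
size_t copy_spkt_name(char *name, size_t name_size,
                      const struct sockaddr *addr)
{
    assert(name != NULL && addr != NULL);

    if (name_size < SA_DATA_LEN + 1)
        return 0;

    /* Bounded copy: never reads past the end of sa_data. */
    memcpy(name, addr->sa_data, SA_DATA_LEN);
    name[SA_DATA_LEN] = '\0';

    return strlen(name); /* safe now: a terminator is guaranteed */
}
```

Applied to the reproducer's input (every byte set to 0xff), the copy yields a string exactly as long as sa_data, and no read goes beyond the field.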
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 18 Feb, 2017 1 commit
    • packet: Do not call fanout_release from atomic contexts · 2bd624b4
      Committed by Anoob Soman
      Commit 66644982 ("packet: call fanout_release, while UNREGISTERING a
      netdev"), unfortunately, introduced the following issues.
      
      1. calling mutex_lock(&fanout_mutex) (fanout_release()) from inside
      rcu_read-side critical section. rcu_read_lock disables preemption, most often,
      which prohibits calling sleeping functions.
      
      [  ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side critical section!
      [  ]
      [  ] rcu_scheduler_active = 1, debug_locks = 0
      [  ] 4 locks held by ovs-vswitchd/1969:
      [  ]  #0:  (cb_lock){++++++}, at: [<ffffffff8158a6c9>] genl_rcv+0x19/0x40
      [  ]  #1:  (ovs_mutex){+.+.+.}, at: [<ffffffffa04878ca>] ovs_vport_cmd_del+0x4a/0x100 [openvswitch]
      [  ]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81564157>] rtnl_lock+0x17/0x20
      [  ]  #3:  (rcu_read_lock){......}, at: [<ffffffff81614165>] packet_notifier+0x5/0x3f0
      [  ]
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff810c9077>] lockdep_rcu_suspicious+0x107/0x110
      [  ]  [<ffffffff810a2da7>] ___might_sleep+0x57/0x210
      [  ]  [<ffffffff810a2fd0>] __might_sleep+0x70/0x90
      [  ]  [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
      [  ]  [<ffffffff810de93f>] ? vprintk_default+0x1f/0x30
      [  ]  [<ffffffff81186e88>] ? printk+0x4d/0x4f
      [  ]  [<ffffffff816106dd>] fanout_release+0x1d/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock).
      "sleeping function called from invalid context"
      
      [  ] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [  ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd
      [  ] INFO: lockdep is turned off.
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff810a2f52>] ___might_sleep+0x202/0x210
      [  ]  [<ffffffff810a2fd0>] __might_sleep+0x70/0x90
      [  ]  [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
      [  ]  [<ffffffff816106dd>] fanout_release+0x1d/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      3. calling dev_remove_pack(&fanout->prot_hook), from inside
      spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
      -> synchronize_net(), which might sleep.
      
      [  ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x00000002
      [  ] INFO: lockdep is turned off.
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff81186274>] __schedule_bug+0x64/0x73
      [  ]  [<ffffffff8162b8cb>] __schedule+0x6b/0xd10
      [  ]  [<ffffffff8162c5db>] schedule+0x6b/0x80
      [  ]  [<ffffffff81630b1d>] schedule_timeout+0x38d/0x410
      [  ]  [<ffffffff810ea3fd>] synchronize_sched_expedited+0x53d/0x810
      [  ]  [<ffffffff810ea6de>] synchronize_rcu_expedited+0xe/0x10
      [  ]  [<ffffffff8154eab5>] synchronize_net+0x35/0x50
      [  ]  [<ffffffff8154eae3>] dev_remove_pack+0x13/0x20
      [  ]  [<ffffffff8161077e>] fanout_release+0xbe/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      4. fanout_release() races with calls from different CPU.
      
      To fix the above problems, remove the call to fanout_release() under
      rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and
      netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order
      to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to
      __fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure
      fanout->prot_hook is removed as well.
      
      Fixes: 66644982 ("packet: call fanout_release, while UNREGISTERING a netdev")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Anoob Soman <anoob.soman@citrix.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 15 Feb, 2017 1 commit
    • packet: fix races in fanout_add() · d199fab6
      Committed by Eric Dumazet
      Multiple threads can call fanout_add() at the same time.
      
      We need to grab fanout_mutex earlier to avoid races that could
      lead to one thread freeing po->rollover that was set by another thread.
      
      Do the same in fanout_release(), for peace of mind, and to help us
      find lockdep issues earlier.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Fixes: 0648ab70 ("packet: rollover prepare: per-socket state")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 09 Feb, 2017 1 commit
  23. 21 Jan, 2017 1 commit
  24. 05 Jan, 2017 1 commit
  25. 04 Jan, 2017 1 commit
    • af_packet: TX_RING support for TPACKET_V3 · 7f953ab2
      Committed by Sowmini Varadhan
      Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
      does not currently have TX_RING support. As a result an application
      that wants the best perf for Tx and Rx (e.g. to handle request/response
      transactions) ends up needing 2 sockets, one with *_v2 for Tx and
      another with *_v3 for Rx.
      
      This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
      so that an application can use a single descriptor to get the benefits
      of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
      first filling up multiple frames in the Tx ring and then triggering a
      transmit. This patch only supports fixed size Tx frames for TPACKET_V3,
      and requires that tp_next_offset must be zero.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 25 Dec, 2016 1 commit
  27. 06 Dec, 2016 1 commit
    • [iov_iter] new primitives - copy_from_iter_full() and friends · cbbd26b8
      Committed by Al Viro
      copy_from_iter_full(), copy_from_iter_full_nocache() and
      csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
      et al., advancing iterator only in case of successful full copy
      and returning whether it had been successful or not.
      
      Convert some obvious users.  *NOTE* - do not blindly assume that
      something is a good candidate for those unless you are sure that
      not advancing iov_iter in failure case is the right thing in
      this case.  Anything that does short read/short write kind of
      stuff (or is in a loop, etc.) is unlikely to be a good one.
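
The contrast between the two families of primitives can be shown with a toy iterator. This is a userspace model under assumptions: `struct byte_iter`, `toy_copy_from_iter()` and `toy_copy_from_iter_full()` are hypothetical stand-ins, and a short copy is modelled as the source running out of bytes rather than as a faulting user access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an iov_iter over a single flat buffer. */
struct byte_iter {
    const unsigned char *data;
    size_t len;   /* total bytes in data */
    size_t pos;   /* bytes already consumed */
};

/* Partial-copy flavour: copies what is available and always advances. */
size_t toy_copy_from_iter(void *dst, size_t n, struct byte_iter *it)
{
    assert(dst != NULL && it != NULL && it->pos <= it->len);

    size_t avail = it->len - it->pos;
    size_t take = n < avail ? n : avail;

    memcpy(dst, it->data + it->pos, take);
    it->pos += take;
    return take;
}

/* All-or-nothing flavour: on failure the iterator is left untouched. */
bool toy_copy_from_iter_full(void *dst, size_t n, struct byte_iter *it)
{
    assert(dst != NULL && it != NULL && it->pos <= it->len);

    if (it->len - it->pos < n)
        return false;

    memcpy(dst, it->data + it->pos, n);
    it->pos += n;
    return true;
}
```

The caveat in the note above follows from this difference: code that expects short reads or short writes relies on the iterator having advanced by the partial amount, so the all-or-nothing variant is not a drop-in replacement there.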
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  28. 03 Dec, 2016 1 commit
    • packet: fix race condition in packet_set_ring · 84ac7260
      Committed by Philip Pettersson
      When packet_set_ring creates a ring buffer it will initialize a
      struct timer_list if the packet version is TPACKET_V3. This value
      can then be raced by a different thread calling setsockopt to
      set the version to TPACKET_V1 before packet_set_ring has finished.
      
      This leads to a use-after-free on a function pointer in the
      struct timer_list when the socket is closed as the previously
      initialized timer will not be deleted.
      
      The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
      changing the packet version while also taking the lock at the start
      of packet_set_ring.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: Philip Pettersson <philip.pettersson@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 19 Nov, 2016 3 commits
  30. 30 Oct, 2016 1 commit
    • packet: on direct_xmit, limit tso and csum to supported devices · 104ba78c
      Committed by Willem de Bruijn
      When transmitting on a packet socket with PACKET_VNET_HDR and
      PACKET_QDISC_BYPASS, validate device support for features requested
      in vnet_hdr.
      
      Drop TSO packets sent to devices that do not support TSO or have the
      feature disabled. Note that the latter currently do process those
      packets correctly, regardless of not advertising the feature.
      
      Because of SKB_GSO_DODGY, it is not sufficient to test device features
      with netif_needs_gso. Full validate_xmit_skb is needed.
      
      Switch to software checksum for non-TSO packets that request checksum
      offload if that device feature is unsupported or disabled. Note that
      similar to the TSO case, device drivers may perform checksum offload
      correctly even when not advertising it.
      
      When switching to software checksum, packets hit skb_checksum_help,
      which has two BUG_ON checks for the checksum not being in the linear
      segment. Packet sockets always allocate at least up to
      csum_start + csum_off + 2 as linear.
      
      Tested by running github.com/wdebruij/kerneltools/psock_txring_vnet.c
      
        ethtool -K eth0 tso off tx on
        psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v
        psock_txring_vnet -d $dst -s $src -i eth0 -l 2000 -n 1 -q -v -N
      
        ethtool -K eth0 tx off
        psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G
        psock_txring_vnet -d $dst -s $src -i eth0 -l 1000 -n 1 -q -v -G -N
      
      v2:
        - add EXPORT_SYMBOL_GPL(validate_xmit_skb_list)
      
      Fixes: d346a3fa ("packet: introduce PACKET_QDISC_BYPASS socket option")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  31. 07 Oct, 2016 1 commit
  32. 22 Jul, 2016 1 commit
  33. 20 Jul, 2016 1 commit