1. 02 Jul, 2016 (1 commit)
    • packet: Use symmetric hash for PACKET_FANOUT_HASH. · eb70db87
      Committed by David S. Miller
      People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that
      they want packets going in both directions on a flow to hash to the
      same bucket.
      
      The core kernel SKB hash became non-symmetric when the ipv6 flow label
      and other entities were incorporated into the standard flow hash order
      to increase entropy.
      
      But there are no users of PACKET_FANOUT_HASH who want an asymmetric
      hash; they all want a symmetric one.
      
      Therefore, use the flow dissector to compute a flat symmetric hash
      over only the protocol, addresses, and ports.  This hash is not
      installed into, and does not override, the normal skb hash, so this
      change has no effect whatsoever on the rest of the stack.
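      
      For reference, joining a fanout group in this mode from user space
      looks roughly like the sketch below (group id 42 and ETH_P_ALL are
      arbitrary illustrative choices, not from this patch):
      
      #include <arpa/inet.h>
      #include <sys/socket.h>
      #include <linux/if_ether.h>
      #include <linux/if_packet.h>
      
      /* Join fanout group 42 in hash mode; after this patch, both
       * directions of a flow hash to the same member socket. */
      int join_fanout_hash(void)
      {
          int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
          int arg = 42 | (PACKET_FANOUT_HASH << 16); /* id | mode */
      
          if (fd < 0 ||
              setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
                         &arg, sizeof(arg)) < 0)
              return -1;
          return fd;
      }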
      Reported-by: Eric Leblond <eric@regit.org>
      Tested-by: Eric Leblond <eric@regit.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb70db87
  2. 10 Jun, 2016 (1 commit)
  3. 15 Apr, 2016 (1 commit)
  4. 14 Apr, 2016 (1 commit)
  5. 07 Apr, 2016 (1 commit)
  6. 05 Apr, 2016 (1 commit)
    • sock: enable timestamping using control messages · c14ac945
      Committed by Soheil Hassas Yeganeh
      Currently, SO_TIMESTAMPING can only be enabled using setsockopt.
      This is very costly when users want to sample writes to gather
      tx timestamps.
      
      Add support for enabling SO_TIMESTAMPING via control messages by
      using the tsflags field of `struct sockcm_cookie` (added in the
      previous patches in this series) to set the tx_flags of the last skb
      created in a sendmsg. With this patch, the timestamp recording bits
      in the skbuff's tx_flags are overridden if SO_TIMESTAMPING is passed
      in a cmsg.
      
      Please note that this is only effective for overriding the timestamp
      recording flags. Users should enable timestamp reporting (e.g.,
      SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
      socket options, and then ask for SOF_TIMESTAMPING_TX_*
      via a control message per sendmsg to sample timestamps for each
      write.
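      
      As a usage sketch (error handling elided; the flag choices are
      illustrative): enable reporting once via setsockopt, then request
      recording for a single write via cmsg:
      
      #include <sys/socket.h>
      #include <linux/net_tstamp.h>
      
      static ssize_t send_sampled(int fd, const void *buf, size_t len)
      {
          /* One-time setup: report software timestamps, tagged with ids. */
          int report = SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID;
          setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
                     &report, sizeof(report));
      
          /* Per-write recording flags via cmsg, overriding the socket's
           * tx_flags for just this sendmsg. */
          char control[CMSG_SPACE(sizeof(__u32))] = {0};
          struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
          struct msghdr msg = {
              .msg_iov = &iov, .msg_iovlen = 1,
              .msg_control = control, .msg_controllen = sizeof(control),
          };
          struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
          cm->cmsg_level = SOL_SOCKET;
          cm->cmsg_type  = SO_TIMESTAMPING;
          cm->cmsg_len   = CMSG_LEN(sizeof(__u32));
          *(__u32 *)CMSG_DATA(cm) = SOF_TIMESTAMPING_TX_SOFTWARE;
          return sendmsg(fd, &msg, 0);
      }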
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c14ac945
  7. 10 Mar, 2016 (1 commit)
  8. 26 Feb, 2016 (1 commit)
  9. 09 Feb, 2016 (4 commits)
  10. 30 Nov, 2015 (1 commit)
  11. 18 Nov, 2015 (2 commits)
  12. 16 Nov, 2015 (5 commits)
  13. 06 Nov, 2015 (1 commit)
    • packet: race condition in packet_bind · 30f7ea1c
      Committed by Francesco Ruggeri
      There is a race condition between packet_notifier and packet_bind{_spkt}.
      
      It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
      time packet_bind{_spkt} takes a reference on the new netdevice and the
      time packet_do_bind sets po->ifindex.
      In this case the notification can be missed.
      If this happens during a dev_change_net_namespace, this can result in
      the netdevice being moved to the new namespace while the packet_sock
      in the old namespace still holds a reference on it. When the netdevice
      is later deleted in the new namespace, the deletion hangs since the
      packet_sock is not found in the new namespace's &net->packet.sklist.
      It can be reproduced with the script below.
      
      This patch makes packet_do_bind check again for the presence of the
      netdevice in the packet_sock's namespace after the synchronize_net
      in unregister_prot_hook.
      More generally, it also takes the rcu lock for the duration of the
      bind to stop dev_change_net_namespace/rollback_registered_many from
      going past the synchronize_net following unlist_netdevice, so that
      no NETDEV_UNREGISTER notifications can happen on the new netdevice
      while the bind is executing. In order to do this, some code from
      packet_bind{_spkt} is consolidated into packet_do_bind.
      
      # Reproducer script (Python 2)
      import socket, os, time, sys
      proto=7
      realDev='em1'
      vlanId=400
      if len(sys.argv) > 1:
         vlanId=int(sys.argv[1])
      dev='vlan%d' % vlanId
      
      os.system('taskset -p 0x10 %d' % os.getpid())
      
      s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
      os.system('ip link add link %s name %s type vlan id %d' %
                (realDev, dev, vlanId))
      os.system('ip netns add dummy')
      
      pid=os.fork()
      
      if pid == 0:
         # dev should be moved while packet_do_bind is in synchronize net
         os.system('taskset -p 0x20000 %d' % os.getpid())
         os.system('ip link set %s netns dummy' % dev)
         os.system('ip netns exec dummy ip link del %s' % dev)
         s.close()
         sys.exit(0)
      
      time.sleep(.004)
      try:
         s.bind(('%s' % dev, proto+1))
      except:
         print 'Could not bind socket'
         s.close()
         os.system('ip netns del dummy')
         sys.exit(0)
      
      os.waitpid(pid, 0)
      s.close()
      os.system('ip netns del dummy')
      sys.exit(0)
      Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30f7ea1c
  14. 13 Oct, 2015 (3 commits)
  15. 11 Oct, 2015 (1 commit)
    • bpf: fix cb access in socket filter programs · ff936a04
      Committed by Alexei Starovoitov
      eBPF socket filter programs may see junk in the 'u32 cb[5]' area,
      since it could have been used by protocol layers earlier.
      
      For socket filter programs used in af_packet we need to clean
      20 bytes of skb->cb area if it could be used by the program.
      For programs attached to TCP/UDP sockets we need to save/restore
      these 20 bytes, since it's used by protocol layers.
      
      Remove SK_RUN_FILTER macro, since it's no longer used.
      
      Long term we may move this bpf cb area to per-cpu scratch memory, but
      that requires the addition of new 'per-cpu load/store' instructions,
      so it is not suitable as a short-term fix.
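      
      The save/restore idea can be sketched in kernel style as follows
      (illustrative; the actual helper names and layout in the patch may
      differ):
      
      /* Simplified sketch: preserve the protocol layers' 20 cb bytes
       * around the program run (TCP/UDP sockets); af_packet instead
       * zeroes them before the run. */
      static u32 bpf_run_filter_save_cb(const struct bpf_prog *prog,
                                        struct sk_buff *skb)
      {
          u8 *cb_data = qdisc_skb_cb(skb)->data; /* aliases skb->cb */
          u8 saved[20];
          u32 res;
      
          memcpy(saved, cb_data, sizeof(saved));
          res = BPF_PROG_RUN(prog, skb);
          memcpy(cb_data, saved, sizeof(saved)); /* restore for the stack */
          return res;
      }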
      
      Fixes: d691f9e8 ("bpf: allow programs to write to certain skb fields")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ff936a04
  16. 05 Oct, 2015 (1 commit)
    • bpf, seccomp: prepare for upcoming criu support · bab18991
      Committed by Daniel Borkmann
      The current ongoing effort to dump existing cBPF seccomp filters back
      to user space requires holding the pre-transformed instructions, as
      we already do for socket filters on the sk_attach_filter() side, so
      that they can be reloaded in their original form at a later point in
      time by utilities such as criu.
      
      To prepare for this, simply extend the bpf_prog_create_from_user()
      API with a flag that tells whether we should store the original
      or not. Fanout filters could also make use of that in the future for
      things like diag. While fanout filters already use bpf_prog_destroy(),
      move seccomp over to it as well to handle original programs when
      present.
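      
      The API change described above looks roughly like this (a sketch;
      consult the kernel tree for the authoritative prototype):
      
      /* New 'save_orig' argument: when true, keep the original,
       * pre-transformed cBPF around so it can be dumped back later. */
      int bpf_prog_create_from_user(struct bpf_prog **pfp,
                                    struct sock_fprog *fprog,
                                    bpf_aux_classic_check_t trans,
                                    bool save_orig);
      
      /* seccomp would pass true so criu can retrieve the filter: */
      ret = bpf_prog_create_from_user(&sfilter->prog, &fprog,
                                      seccomp_check_filter, true);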
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tycho Andersen <tycho.andersen@canonical.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bab18991
  17. 24 Sep, 2015 (1 commit)
    • Fix AF_PACKET ABI breakage in 4.2 · d3869efe
      Committed by David Woodhouse
      Commit 7d824109 ("virtio: add explicit big-endian support to memory
      accessors") accidentally changed the virtio_net header used by
      AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.
      
      Since virtio_legacy_is_little_endian() is a very long identifier,
      define a vio_le macro and use that throughout the code instead of the
      hard-coded 'false' for little-endian.
      
      This restores the ABI to match 4.1 and earlier kernels, and makes my
      test program work again.
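      
      Per the description above, the fix itself is tiny; it amounts to
      something like the sketch below (the call site shown is illustrative):
      
      /* Shorthand for the legacy (host-native) virtio endianness,
       * replacing the hard-coded 'false' at every accessor call: */
      #define vio_le() virtio_legacy_is_little_endian()
      
      /* e.g., instead of __virtio16_to_cpu(false, ...): */
      hdr_len = __virtio16_to_cpu(vio_le(), vnet_hdr->hdr_len);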
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3869efe
  18. 18 Aug, 2015 (2 commits)
  19. 30 Jul, 2015 (1 commit)
  20. 29 Jul, 2015 (1 commit)
  21. 28 Jul, 2015 (1 commit)
  22. 23 Jun, 2015 (1 commit)
  23. 22 Jun, 2015 (3 commits)
  24. 18 May, 2015 (1 commit)
    • net-packet: fix null pointer exception in rollover mode · 4633c9e0
      Committed by Willem de Bruijn
      Rollover can be enabled as a flag or as a mode. Allocate state in
      both cases. This solves a NULL pointer dereference in
      fanout_demux_rollover when referencing po->rollover in rollover mode.
      
      Also make sure that in rollover mode each silo is tried (unlike the
      rollover flag, where the main socket is excluded after an initial
      try_self).
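      
      To illustrate the two configurations this fix covers, a user-space
      sketch (group id 7 is an arbitrary illustrative value):
      
      #include <sys/socket.h>
      #include <linux/if_packet.h>
      
      static int join_rollover(int fd, int as_mode)
      {
          /* low 16 bits = group id, high 16 bits = type and flags */
          int arg = as_mode
              ? 7 | (PACKET_FANOUT_ROLLOVER << 16)  /* standalone mode */
              : 7 | ((PACKET_FANOUT_HASH |
                      PACKET_FANOUT_FLAG_ROLLOVER) << 16); /* flag */
      
          return setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
                            &arg, sizeof(arg));
      }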
      
      Tested:
        Passes tools/testing/net/psock_fanout.c, which tests both the mode
        and the flag. My previous tests were limited to bench_rollover,
        which only stresses the flag. The test now completes safely. It
        still gives an error for rollover mode, because it does not expect
        the new headroom (ROOM_NORMAL) requirement. I will send a separate
        patch for the test.
      
      Fixes: 0648ab70 ("packet: rollover prepare: per-socket state")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      
      ----
      
      I should have run this test and caught this before submission, of
      course. Apologies for the oversight.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4633c9e0
  25. 15 May, 2015 (1 commit)
  26. 14 May, 2015 (2 commits)
    • packet: rollover statistics · a9b63918
      Committed by Willem de Bruijn
      Rollover indicates exceptional conditions. Export a counter to inform
      socket owners of this state.
      
      If no socket with sufficient room is found, rollover fails. Also count
      these events.
      
      Finally, also count when flows are rolled over early thanks to huge
      flow detection, to validate its correctness.
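      
      A user-space sketch for reading the new counters, assuming the
      PACKET_ROLLOVER_STATS getsockopt and struct tpacket_rollover_stats
      this patch introduces:
      
      #include <stdio.h>
      #include <sys/socket.h>
      #include <linux/if_packet.h>
      
      static void dump_rollover_stats(int fd)
      {
          struct tpacket_rollover_stats rs;
          socklen_t len = sizeof(rs);
      
          /* tp_all: all rollovers; tp_huge: early rollovers from
           * huge-flow detection; tp_failed: no room found anywhere. */
          if (getsockopt(fd, SOL_PACKET, PACKET_ROLLOVER_STATS,
                         &rs, &len) == 0)
              printf("all=%llu huge=%llu failed=%llu\n",
                     (unsigned long long)rs.tp_all,
                     (unsigned long long)rs.tp_huge,
                     (unsigned long long)rs.tp_failed);
      }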
      
      Tested:
        Read counters in bench_rollover on all other tests in the patchset
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9b63918
    • packet: rollover huge flows before small flows · 3b3a5b0a
      Committed by Willem de Bruijn
      Migrate flows from one socket to another socket in the fanout group
      not only when the socket is full. Start migrating huge flows early,
      to divert possible 4-tuple attacks without affecting normal traffic.
      
      Introduce fanout_flow_is_huge(). This detects huge flows, which are
      defined as taking up more than half the load. It does so cheaply, by
      storing the rxhashes of the N most recent packets. If over half of
      these match the rxhash of the current packet, the flow is classified
      as huge and rolled over early. This only protects against 4-tuple
      attacks. N is chosen to fit all data in a single cache line.
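      
      The detection can be sketched as follows (illustrative, not the
      kernel's exact code; ROLLOVER_HLEN stands in for N):
      
      #define ROLLOVER_HLEN 16 /* N: u32 history sized to a cache line */
      
      struct rollover_state {
          u32 history[ROLLOVER_HLEN];
          unsigned int victim; /* next history slot to overwrite */
      };
      
      static bool flow_is_huge(struct rollover_state *r, u32 rxhash)
      {
          int i, count = 0;
      
          for (i = 0; i < ROLLOVER_HLEN; i++)
              if (r->history[i] == rxhash)
                  count++;
      
          r->history[r->victim++ % ROLLOVER_HLEN] = rxhash;
          return count > ROLLOVER_HLEN / 2; /* over half: huge flow */
      }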
      
      Tested:
        Ran bench_rollover for 10 sec with 1.5 Mpps of single flow input.
      
          lpbb5:/export/hda3/willemb# ./bench_rollover -l 1000 -r -s
          cpu         rx       rx.k     drop.k   rollover     r.huge   r.failed
            0         14         14          0          0          0          0
            1         20         20          0          0          0          0
            2         16         16          0          0          0          0
            3    6168824    6168824          0    4867721    4867721          0
            4    4867741    4867741          0          0          0          0
            5         12         12          0          0          0          0
            6         15         15          0          0          0          0
            7         17         17          0          0          0          0
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b3a5b0a