1. 18 Dec, 2013 · 2 commits
  2. 10 Dec, 2013 · 1 commit
    • packet: introduce PACKET_QDISC_BYPASS socket option · d346a3fa
      Daniel Borkmann committed
      This patch introduces a PACKET_QDISC_BYPASS socket option that
      allows using an xmit() function similar to pktgen's instead of
      taking the dev_queue_xmit() path. This can be very useful when
      PF_PACKET applications need to be used in a pktgen-like
      scenario, but with a full, flexible packet payload that needs
      to be provided, for example.
      
      By default, nothing changes in behaviour for normal PF_PACKET
      TX users, so everything stays as is for applications. New users,
      however, can now set PACKET_QDISC_BYPASS if needed to i) keep
      their own packets from reentering packet_rcv() and ii) directly
      push the frame to the driver.
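A minimal sketch of enabling the option from userspace, assuming headers new enough to define PACKET_QDISC_BYPASS in <linux/if_packet.h>. The helper names set_qdisc_bypass() and open_bypass_tx_socket() are hypothetical. The socket is opened with protocol 0, the same trick the test setup below uses to keep the socket out of packet_rcv(); creating it needs CAP_NET_RAW.

```c
#include <errno.h>
#include <linux/if_packet.h>   /* PACKET_QDISC_BYPASS */
#include <sys/socket.h>        /* socket, setsockopt, AF_PACKET, SOL_PACKET */
#include <unistd.h>            /* close */

/* Hypothetical helper: turn the qdisc bypass on or off for a packet socket.
 * Returns 0 on success, -1 with errno set on failure. */
int set_qdisc_bypass(int fd, int on)
{
    int val = on ? 1 : 0;

    return setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &val, sizeof(val));
}

/* Hypothetical helper: open a TX-only packet socket with the bypass enabled.
 * With protocol 0 no protocol hook is registered when the socket is created,
 * so its own frames are not fed back through packet_rcv().
 * Requires CAP_NET_RAW. */
int open_bypass_tx_socket(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, 0);

    if (fd < 0)
        return -1;
    if (set_qdisc_bypass(fd, 1) < 0) {
        int saved = errno;   /* keep setsockopt's error for the caller */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Before transmitting, a sender would still bind() such a socket to an interface with a struct sockaddr_ll, or pass one to sendto().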
      
      In doing so we can increase pps (here 64 byte packets) for
      PF_PACKET a bit:
      
        # CPUs -- QDISC_BYPASS   -- qdisc path -- qdisc path[**]
        1 CPU  ==  1,509,628 pps --  1,208,708 --  1,247,436
        2 CPUs ==  3,198,659 pps --  2,536,012 --  1,605,779
        3 CPUs ==  4,787,992 pps --  3,788,740 --  1,735,610
        4 CPUs ==  6,173,956 pps --  4,907,799 --  1,909,114
        5 CPUs ==  7,495,676 pps --  5,956,499 --  2,014,422
        6 CPUs ==  9,001,496 pps --  7,145,064 --  2,155,261
        7 CPUs == 10,229,776 pps --  8,190,596 --  2,220,619
        8 CPUs == 11,040,732 pps --  9,188,544 --  2,241,879
        9 CPUs == 12,009,076 pps -- 10,275,936 --  2,068,447
       10 CPUs == 11,380,052 pps -- 11,265,337 --  1,578,689
       11 CPUs == 11,672,676 pps -- 11,845,344 --  1,297,412
       [...]
       20 CPUs == 11,363,192 pps -- 11,014,933 --  1,245,081
      
       [**]: qdisc path with packet_rcv(), which is probably how most
             people use it (hopefully no longer, where it is not needed)
      
      The test was done using a modified trafgen, sending a simple
      static 64-byte packet, on all CPUs. The trick in the fast
      "qdisc path" case is to avoid reentering packet_rcv() by
      setting the RAW socket protocol to zero, like:
      socket(PF_PACKET, SOCK_RAW, 0);
      
      The tradeoffs are also documented in this patch: if queues are
      busy, we will drop more packets, tc disciplines are ignored, and
      these packets are no longer visible to taps. For a pktgen-like
      scenario, we argue that this is acceptable.
      
      The pointer to the xmit function has been placed in a hole in the
      packet socket structure, between cached_dev and prot_hook, a region
      that is hot anyway since we work on cached_dev in every send path.
      
      Done in joint work together with Jesper Dangaard Brouer.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 30 Aug, 2013 · 1 commit
  4. 25 Apr, 2013 · 2 commits
  5. 20 Mar, 2013 · 1 commit
    • packet: packet fanout rollover during socket overload · 77f65ebd
      Willem de Bruijn committed
      Changes:
        v3->v2: rebase (no other changes)
                passes selftest
        v2->v1: read f->num_members only once
                fix bug: test rollover mode + flag
      
      Minimize packet drop in a fanout group. If one socket is full,
      roll over packets to another from the group. Maintain flow
      affinity during normal load using an rxhash fanout policy, while
      dispersing unexpected traffic storms that hit a single cpu, such
      as spoofed-source DoS flows. Rollover breaks affinity for flows
      arriving at saturated sockets during those conditions.
      
      The patch adds a fanout policy ROLLOVER that rotates between sockets,
      filling each socket before moving to the next. It also adds a fanout
      flag ROLLOVER. If passed along with any other fanout policy, the
      primary policy is applied until the chosen socket is full. Then,
      rollover selects another socket, to delay packet drop until the
      entire system is saturated.
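A simplified, hypothetical model of the selection step described above: the primary policy proposes a socket, and only if its queue is full does rollover probe the others, starting from the socket it used last. The names struct member and pick_with_rollover() are illustrative. This model keeps a single cursor and returns -1 when every socket is full, whereas the commit notes that the real code scales its counters with the number of sockets.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one fanout member: how many packets sit in its
 * receive queue and how many fit. */
struct member {
    unsigned queued;
    unsigned capacity;
};

static bool has_room(const struct member *m)
{
    return m->queued < m->capacity;
}

/* Choose a member for one packet. 'primary' is the index picked by the
 * primary policy (hash, cpu, ...). If that socket is full, probe every
 * member starting at '*last', the socket rollover used most recently,
 * and remember the winner. Returns -1 if all sockets are full, i.e. the
 * packet would be dropped. */
int pick_with_rollover(const struct member *m, size_t n, size_t primary,
                       size_t *last)
{
    if (has_room(&m[primary]))
        return (int)primary;

    for (size_t i = 0; i < n; i++) {
        size_t idx = (*last + i) % n;

        if (has_room(&m[idx])) {
            *last = idx;
            return (int)idx;
        }
    }
    return -1;
}
```

Starting from the last winner is the greedy choice the commit describes: it probes the socket most likely to have room first, at the cost of load imbalance.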
      
      Probing sockets is not free. Selecting the last used socket, as
      rollover does, is a greedy approach that maximizes chance of
      success, at the cost of extreme load imbalance. In practice, with
      sufficiently long queues to absorb bursts, sockets are drained in
      parallel and load balance looks uniform in `top`.
      
      To avoid contention, the patch scales counters with the number of
      sockets and accesses them lock-free. Values are bounds-checked to
      ensure correctness.
      
      Tested using an application with 9 threads pinned to CPUs, one socket
      per thread and sufficient busywork per packet operation to limit each
      thread to handling 32 Kpps. When sent a single UDP stream at 500 Kpps,
      a FANOUT_CPU setup processes 32 Kpps in total without this patch and
      270 Kpps with the patch. Tested with read() and with a packet ring
      (V1).
      
      Also, passes psock_fanout.c unit test added to selftests.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 08 Nov, 2012 · 1 commit
  7. 13 Oct, 2012 · 1 commit
  8. 04 Oct, 2011 · 1 commit
  9. 27 Aug, 2011 · 1 commit
  10. 25 Aug, 2011 · 1 commit
  11. 06 Jul, 2011 · 3 commits
    • packet: Add 'cpu' fanout policy. · 95ec3eb4
      David S. Miller committed
      Unfortunately we have to use a real modulus here as
      the multiply trick won't work as effectively with cpu
      numbers as it does with rxhash values.
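A sketch of why the CPU policy needs a modulus, assuming the "multiply trick" means scaling a 32-bit hash into [0, num) as (hash * num) >> 32. That spreads well-distributed rxhash values evenly, but small CPU ids all land on member 0. Both function names are hypothetical.

```c
#include <stdint.h>

/* Hash policy: map a 32-bit hash onto [0, num) with a multiply and a
 * shift instead of a division. Works because rxhash values cover the
 * whole 32-bit range. */
unsigned int scale_by_multiply(uint32_t hash, unsigned int num)
{
    return (unsigned int)(((uint64_t)hash * num) >> 32);
}

/* CPU policy: CPU ids are small integers (0, 1, 2, ...), so the
 * multiply trick would send every one of them to member 0. A real
 * modulus spreads them across the group. */
unsigned int scale_cpu(unsigned int cpu, unsigned int num)
{
    return cpu % num;
}
```

With four members, CPU 7 goes to member 3 under the modulus, while (7 * 4) >> 32 is 0; a hash near the top of the range, such as 0xC0000000, does reach member 3 with the multiply.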
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: Add pre-defragmentation support for ipv4 fanouts. · 7736d33f
      David S. Miller committed
      The skb->rxhash cannot be properly computed if the
      packet is a fragment.  To alleviate this, allow the
      AF_PACKET client to ask for defragmentation to be
      done at demux time.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: Add fanout support. · dc99f600
      David S. Miller committed
      Fanouts allow packet capturing to be demuxed to a set of AF_PACKET
      sockets.  Two fanout policies are implemented:
      
      1) Hashing based upon skb->rxhash
      
      2) Pure round-robin
      
      An AF_PACKET socket must be fully bound before it tries to add itself
      to a fanout.  All AF_PACKET sockets trying to join the same fanout
      must all have the same bind settings.
      
      Fanouts are identified (within a network namespace) by a 16-bit ID.
      The first socket to try to add itself to a fanout with a particular
      ID creates that fanout.  When the last socket leaves the fanout
      (which happens only when the socket is closed), that fanout is
      destroyed.
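A sketch of joining a fanout group with setsockopt(PACKET_FANOUT), using the constants from <linux/if_packet.h>: the 16-bit group ID goes in the low half of the argument, and the policy (PACKET_FANOUT_HASH, PACKET_FANOUT_LB, ...) plus optional flags such as PACKET_FANOUT_FLAG_DEFRAG (from the pre-defragmentation commit above) go in the high half. The helper names fanout_arg() and join_fanout() are hypothetical, and the socket must already be bound as described.

```c
#include <linux/if_packet.h>   /* PACKET_FANOUT, PACKET_FANOUT_* */
#include <stdint.h>
#include <sys/socket.h>        /* setsockopt, SOL_PACKET */

/* Build the PACKET_FANOUT argument: group id in bits 0-15, policy and
 * flags in bits 16-31. Unsigned math avoids overflowing a signed int
 * when a high flag bit such as PACKET_FANOUT_FLAG_DEFRAG is set. */
uint32_t fanout_arg(uint16_t group_id, uint16_t type_flags)
{
    return ((uint32_t)type_flags << 16) | group_id;
}

/* Join (or create) fanout group 'group_id'. The socket must already be
 * bound, with the same bind settings as every other member.
 * Returns 0 on success, -1 with errno set. */
int join_fanout(int fd, uint16_t group_id, uint16_t type_flags)
{
    uint32_t val = fanout_arg(group_id, type_flags);

    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &val, sizeof(val));
}
```

Each worker would open its own socket, bind it to the same interface and protocol, and call join_fanout(fd, 42, PACKET_FANOUT_HASH); the first call creates group 42 in the current network namespace.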
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 07 Jun, 2011 · 1 commit
  13. 02 Jun, 2011 · 1 commit
  14. 02 Jun, 2010 · 1 commit
    • packet_mmap: expose hw packet timestamps to network packet capture utilities · 614f60fa
      Scott McMillan committed
      This patch adds a setting, PACKET_TIMESTAMP, to specify the packet
      timestamp source that is exported to capture utilities like tcpdump by
      packet_mmap.
      
      PACKET_TIMESTAMP accepts the same integer bit field as
      SO_TIMESTAMPING.  However, only the SOF_TIMESTAMPING_SYS_HARDWARE and
      SOF_TIMESTAMPING_RAW_HARDWARE values are currently recognized by
      PACKET_TIMESTAMP.  SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over
      SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.
      
      If PACKET_TIMESTAMP is not set, a software timestamp generated inside
      the networking stack is used (the behavior before this setting was
      added).
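A minimal sketch, assuming <linux/net_tstamp.h> provides the SOF_TIMESTAMPING_* bits: request raw hardware timestamps for the ring, plus a small hypothetical helper, ts_source_for(), that models the precedence rule stated above. It reflects the behaviour this commit describes. The NIC itself usually also has to be put into timestamping mode (for example with the SIOCSHWTSTAMP ioctl), which is not shown.

```c
#include <linux/if_packet.h>   /* PACKET_TIMESTAMP */
#include <linux/net_tstamp.h>  /* SOF_TIMESTAMPING_* */
#include <sys/socket.h>        /* setsockopt, SOL_PACKET */

/* Ask packet_mmap to export the raw hardware timestamp instead of the
 * software one. Returns 0 on success, -1 with errno set. */
int use_raw_hw_timestamps(int fd)
{
    int req = SOF_TIMESTAMPING_RAW_HARDWARE;

    return setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, &req, sizeof(req));
}

enum ts_source { TS_SOFTWARE, TS_SYS_HARDWARE, TS_RAW_HARDWARE };

/* Hypothetical model of the selection rule: SYS_HARDWARE wins over
 * RAW_HARDWARE, and with neither bit set the software stamp is used. */
enum ts_source ts_source_for(int flags)
{
    if (flags & SOF_TIMESTAMPING_SYS_HARDWARE)
        return TS_SYS_HARDWARE;
    if (flags & SOF_TIMESTAMPING_RAW_HARDWARE)
        return TS_RAW_HARDWARE;
    return TS_SOFTWARE;
}
```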
      Signed-off-by: Scott McMillan <scott.a.mcmillan@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 13 Apr, 2010 · 1 commit
  16. 05 Feb, 2010 · 1 commit
    • packet: Add GSO/csum offload support. · bfd5f4a3
      Sridhar Samudrala committed
      This patch adds GSO/checksum offload to af_packet sockets using
      virtio_net_hdr. Based on Rusty's patch to add this support to tun.
      It allows GSO/checksum offload to be enabled when using raw socket
      backend with virtio_net.
      It also adds a PACKET_VNET_HDR socket option to prepend virtio_net_hdr in the
      receive path and process/skip virtio_net_hdr in the send path. This
      option is only allowed with SOCK_RAW sockets attached to ethernet
      type devices.
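A sketch of the send side with PACKET_VNET_HDR enabled, assuming the uapi header <linux/virtio_net.h>: every frame written to the socket is preceded by a struct virtio_net_hdr, and an all-zero header requests neither checksum offload nor GSO. The helper names enable_vnet_hdr() and build_vnet_frame() are hypothetical.

```c
#include <linux/if_packet.h>   /* PACKET_VNET_HDR */
#include <linux/virtio_net.h>  /* struct virtio_net_hdr, VIRTIO_NET_HDR_GSO_NONE */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>        /* setsockopt, SOL_PACKET */

/* Turn on the virtio_net_hdr prefix for a SOCK_RAW packet socket.
 * Returns 0 on success, -1 with errno set. */
int enable_vnet_hdr(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one));
}

/* Lay out one send buffer: virtio_net_hdr first, then the Ethernet
 * frame. Returns the total length, or 0 if 'out' is too small. */
size_t build_vnet_frame(uint8_t *out, size_t cap,
                        const uint8_t *frame, size_t len)
{
    struct virtio_net_hdr vh;

    if (len > cap || cap - len < sizeof(vh))
        return 0;
    memset(&vh, 0, sizeof(vh));            /* no csum offload requested */
    vh.gso_type = VIRTIO_NET_HDR_GSO_NONE; /* no segmentation */
    memcpy(out, &vh, sizeof(vh));
    memcpy(out + sizeof(vh), frame, len);
    return sizeof(vh) + len;
}
```

The resulting buffer is what would be passed to send() on the socket; on receive, the same header arrives in front of each frame.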
      
      v2 updates
      ----------
      Michael's Comments
      - Perform length check in packet_snd() when GSO is off even when
        vnet_hdr is present.
      - Check for SKB_GSO_FCOE type and return -EINVAL
      - don't allow tx/rx ring when vnet_hdr is enabled.
      Herbert's Comments
      - Removed ethernet specific code.
      - protocol value is assumed to be passed in by the caller.
      Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 05 Nov, 2009 · 1 commit
  18. 12 Oct, 2009 · 1 commit
  19. 05 Oct, 2009 · 1 commit
    • af_packet: add interframe drop cmsg (v6) · 97775007
      Neil Horman committed
      Add ancillary data to better represent loss information
      
      I've had a few requests recently to provide more detail regarding frame loss
      during an AF_PACKET packet capture session.  Specifically the requestors want to
      see where in a packet sequence frames were lost, i.e. they want to see that 40
      frames were lost between frames 302 and 303 in a packet capture file.  In order
      to do this we need:
      
      1) The kernel to export this data to user space
      2) The applications to make use of it
      
      This patch addresses item (1).  It does this by doing the following:
      
      A) Any time we drop a frame for which we would increment po->stats.tp_drops, we
      also now increment a stat called po->stats.tp_gap.
      
      B) Every time we successfully enqueue a frame to sk_receive_queue, we record the
      value of po->stats.tp_gap in skb->mark.  skb->cb would nominally be the place to
      record this, but since all the space there is used up, we're overloading
      skb->mark.  It's safe to do since any enqueued packet is guaranteed to be
      unshared at this point, and skb->mark isn't used for anything else in the rx
      path to the application.  After we record tp_gap in the skb, we zero
      po->stats.tp_gap.  This allows us to keep a counter of the number of frames lost
      between any two enqueued packets.
      
      C) When the application goes to dequeue a frame from the packet socket, we look
      at skb->mark for that frame.  If it is non-zero, we add a cmsg chunk to the
      msghdr of level SOL_PACKET and type PACKET_GAPDATA.  It's a 32-bit integer that
      represents the number of frames lost between this packet and the last previous
      frame received.
      
      Note there is a chance that if there is frame loss after a receive, and then the
      socket is closed, some gap data might be lost.  This is covered by the use of
      the PACKET_AUXDATA socket option, which gives total loss data.  With a bit of
      math, the final gap can be determined that way.
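A simplified, hypothetical model of the counting scheme in A) and B) above: tp_gap counts drops since the last successfully queued frame, its value is attached to the next queued frame, and it is then reset. The names gap_model, on_drop(), on_enqueue() and gap_before_nth_enqueue() are illustrative; the sketch models only the counters and does not use the cmsg API.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two per-socket counters. */
struct gap_model {
    uint32_t tp_drops;  /* total drops, like po->stats.tp_drops */
    uint32_t tp_gap;    /* drops since the last enqueued frame */
};

/* A frame could not be queued: count it in both counters. */
void on_drop(struct gap_model *s)
{
    s->tp_drops++;
    s->tp_gap++;
}

/* A frame was queued: return the gap value that travels with it
 * (the patch stores it in skb->mark) and restart the gap count. */
uint32_t on_enqueue(struct gap_model *s)
{
    uint32_t gap = s->tp_gap;

    s->tp_gap = 0;
    return gap;
}

/* Replay events ('D' = drop, 'Q' = enqueue) and return the gap stored
 * with the n-th queued frame (0-based), or UINT32_MAX if there are
 * fewer than n + 1 enqueues. */
uint32_t gap_before_nth_enqueue(const char *events, unsigned n)
{
    struct gap_model s = { 0, 0 };

    for (; *events; events++) {
        if (*events == 'D') {
            on_drop(&s);
        } else if (*events == 'Q') {
            uint32_t gap = on_enqueue(&s);

            if (n-- == 0)
                return gap;
        }
    }
    return UINT32_MAX;
}
```

For the event string "QDDDQQ", the second queued frame carries a gap of 3 and the third carries 0, which is how a capture tool could place the three lost frames between its second and first records.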
      
      I've tested this patch myself, and it works well.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      
       include/linux/if_packet.h |    2 ++
       net/packet/af_packet.c    |   33 +++++++++++++++++++++++++++++++++
       2 files changed, 35 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 22 May, 2009 · 1 commit
  21. 19 May, 2009 · 1 commit
  22. 19 Jul, 2008 · 1 commit
  23. 15 Jul, 2008 · 2 commits
    • packet: deliver VLAN TCI to userspace · 393e52e3
      Patrick McHardy committed
      Store the VLAN tag in the auxiliary data/tpacket2_hdr so userspace can
      properly deal with hardware VLAN tagging/stripping.
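A small sketch of splitting the delivered tag, assuming tp_vlan_tci (in struct tpacket2_hdr or struct tpacket_auxdata) holds the 16-bit 802.1Q tag control information in host byte order: 3 bits of priority (PCP), 1 DEI/CFI bit and a 12-bit VLAN ID. The struct vlan_tci_fields and decode_vlan_tci() names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the three 802.1Q TCI fields. */
struct vlan_tci_fields {
    uint8_t pcp;    /* priority code point, bits 15-13 */
    uint8_t dei;    /* drop eligible indicator (formerly CFI), bit 12 */
    uint16_t vid;   /* VLAN identifier, bits 11-0 */
};

struct vlan_tci_fields decode_vlan_tci(uint16_t tci)
{
    struct vlan_tci_fields f;

    f.pcp = (uint8_t)(tci >> 13);
    f.dei = (uint8_t)((tci >> 12) & 0x1);
    f.vid = (uint16_t)(tci & 0x0fff);
    return f;
}
```

For example, a TCI of 0xA00A decodes to priority 5, DEI 0 and VLAN 10.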
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: support extensible, 64 bit clean mmaped ring structure · bbd6ef87
      Patrick McHardy committed
      The tpacket_hdr is not 64 bit clean due to use of an unsigned long
      and can't be extended because the following struct sockaddr_ll needs
      to be at a fixed offset.
      
      Add support for a version 2 tpacket protocol that removes these
      limitations.
      
      Userspace can query the header size through a new getsockopt option
      and change the protocol version through a setsockopt option. The
      changes needed to switch to the new protocol version are:
      
      1. replace struct tpacket_hdr by struct tpacket2_hdr
      2. query header len and save
      3. set protocol version to 2
       - set up ring as usual
      4. for getting the sockaddr_ll, use (void *)hdr + TPACKET_ALIGN(hdrlen)
         instead of (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
      
      Steps 2 and 4 can be omitted if the struct sockaddr_ll isn't needed.
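Steps 2 to 4 above might look like this, as a sketch using the constants from <linux/if_packet.h>. The helper names switch_to_tpacket_v2() and v2_sockaddr_offset() are hypothetical, and the ring setup itself (PACKET_RX_RING or PACKET_TX_RING plus mmap()) is left out.

```c
#include <linux/if_packet.h>   /* PACKET_HDRLEN, PACKET_VERSION, TPACKET_V2, TPACKET_ALIGN */
#include <stddef.h>
#include <sys/socket.h>        /* getsockopt, setsockopt, SOL_PACKET, socklen_t */

/* Steps 2 and 3: query the v2 header length, then switch the socket to
 * TPACKET_V2. Must run before the ring is set up.
 * Returns 0 on success, -1 with errno set. */
int switch_to_tpacket_v2(int fd, int *hdrlen)
{
    int val = TPACKET_V2;
    socklen_t len = sizeof(val);

    /* In: the version to ask about. Out: that version's header size. */
    if (getsockopt(fd, SOL_PACKET, PACKET_HDRLEN, &val, &len) < 0)
        return -1;
    *hdrlen = val;

    val = TPACKET_V2;
    return setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
}

/* Step 4: offset of the struct sockaddr_ll from the start of a v2 frame,
 * replacing TPACKET_ALIGN(sizeof(struct tpacket_hdr)) used with v1. */
size_t v2_sockaddr_offset(int hdrlen)
{
    return TPACKET_ALIGN((size_t)hdrlen);
}
```

Given a frame pointer hdr from the ring, the address would then be at (struct sockaddr_ll *)((char *)hdr + v2_sockaddr_offset(hdrlen)).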
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 26 Apr, 2007 · 1 commit
  25. 09 Feb, 2007 · 1 commit
    • [PACKET]: Add optional checksum computation for recvmsg · 8dc41944
      Herbert Xu committed
      This patch is needed to allow ISC's DHCP server (and probably other
      DHCP servers/clients using AF_PACKET) to serve another
      client on the same Xen host.
      
      The problem is that packets between different domains on the same
      Xen host only have partial checksums.  Unfortunately this piece of
      information is not passed along in AF_PACKET unless you're using
      the mmap interface.  Since dhcpd doesn't support packet-mmap, UDP
      packets from the same host come out with apparently bogus checksums.
      
      This patch adds a mechanism for AF_PACKET recvmsg(2) to return the
      status along with the packet.  It does so by adding a new cmsg that
      contains this information along with some other relevant data such
      as the original packet length.
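A receive-side sketch, assuming the new cmsg is the one enabled with the PACKET_AUXDATA socket option (the name the later interframe-drop commit above uses) and carried as a struct tpacket_auxdata. A DHCP-style reader could check TP_STATUS_CSUMNOTREADY before rejecting a UDP checksum. The helper names enable_auxdata(), find_auxdata() and csum_not_ready() are hypothetical.

```c
#include <linux/if_packet.h>   /* PACKET_AUXDATA, struct tpacket_auxdata, TP_STATUS_CSUMNOTREADY */
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>        /* setsockopt, SOL_PACKET, CMSG_* */

/* Ask the kernel to attach a PACKET_AUXDATA cmsg to every recvmsg().
 * Returns 0 on success, -1 with errno set. */
int enable_auxdata(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_AUXDATA, &one, sizeof(one));
}

/* Scan a msghdr filled in by recvmsg() for the auxdata cmsg.
 * Returns true and copies it into *aux if found. */
bool find_auxdata(struct msghdr *msg, struct tpacket_auxdata *aux)
{
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == SOL_PACKET && c->cmsg_type == PACKET_AUXDATA &&
            c->cmsg_len >= CMSG_LEN(sizeof(*aux))) {
            memcpy(aux, CMSG_DATA(c), sizeof(*aux));
            return true;
        }
    }
    return false;
}

/* True if the packet carries only a partial checksum, so a bogus-looking
 * UDP/TCP checksum should not be treated as corruption. */
bool csum_not_ready(const struct tpacket_auxdata *aux)
{
    return (aux->tp_status & TP_STATUS_CSUMNOTREADY) != 0;
}
```

The control buffer handed to recvmsg() needs room for at least CMSG_SPACE(sizeof(struct tpacket_auxdata)) bytes for the cmsg to fit.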
      
      I didn't include the time stamp information since there is already
      a cmsg for that.
      
      This patch also changes the mmap code to set the CSUMNOTREADY flag
      on all packets instead of just outgoing packets on cooked sockets.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 03 Dec, 2006 · 1 commit
  27. 17 Apr, 2005 · 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds committed
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!