1. 20 Apr, 2016: 2 commits
    • bpf: add event output helper for notifications/sampling/logging · bd570ff9
      Daniel Borkmann committed
      This patch adds a new helper for cls/act programs that can push events
      to user space applications. For networking, this can be used, for
      example, for sampling, debugging, or logging purposes, or for pushing
      arbitrary wake-up events. The idea is similar to a43eec30 ("bpf:
      introduce bpf_perf_event_output() helper") and 39111695 ("samples:
      bpf: add bpf_perf_event_output example").
      
      The eBPF program uses a perf event array map that user space populates
      with fds from perf_event_open(). The eBPF program then calls into the
      helper, for example as skb_event_output(skb, &my_map, BPF_F_CURRENT_CPU,
      raw, sizeof(raw)), so that the raw data is pushed into the fd stored,
      for example, at the map index of the current CPU.
      
      User space can poll/mmap/etc. on this and has a data channel for
      receiving events that can be post-processed. The nice thing is that
      since the eBPF program and the user space application making use of it
      are tightly coupled, they can define their own arbitrary raw data
      format and decide what and when they want to push.
      
      While packet headers could, for example, be one part of the metadata
      being pushed, this is not a substitute for things like packet sockets,
      as the whole packet is not pushed and data flows in a single direction
      only. The intention is more of a generically usable, efficient event
      pipe to applications. The workflow is that tc can pin the map and
      applications can attach themselves, e.g. after cls/act setup, to one
      or multiple map slots; demuxing is done by the eBPF program.
      
      Adding this facility takes minimal effort: it reuses the helper
      introduced in a43eec30 ("bpf: introduce bpf_perf_event_output() helper"),
      and we get its functionality for free by overloading its BPF_FUNC_
      identifier for cls/act programs. ctx is currently unused, but will be
      made use of in the future. An example will be added to iproute2's BPF
      example files.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
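      As a rough sketch of how such a program could look (assuming the
      samples/bpf-style bpf_helpers.h; map name, section name, and event
      layout are illustrative, not from this patch):

          #include <uapi/linux/bpf.h>
          #include <linux/pkt_cls.h>
          #include "bpf_helpers.h"

          /* Perf event array; user space fills the slots with fds from
           * perf_event_open(), typically one per possible CPU. */
          struct bpf_map_def SEC("maps") my_map = {
                  .type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
                  .key_size    = sizeof(__u32),
                  .value_size  = sizeof(__u32),
                  .max_entries = 64,
          };

          SEC("classifier")
          int cls_main(struct __sk_buff *skb)
          {
                  /* Arbitrary, program-defined raw event format. */
                  struct { __u32 mark; __u32 len; } raw = {
                          .mark = skb->mark,
                          .len  = skb->len,
                  };

                  /* Push the event to the fd at the current CPU's slot. */
                  bpf_perf_event_output(skb, &my_map, BPF_F_CURRENT_CPU,
                                        &raw, sizeof(raw));
                  return TC_ACT_OK;
          }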
    • net/ipv6/addrconf: simplify sysctl registration · 607ea7cd
      Konstantin Khlebnikov committed
      Struct ctl_table_header holds a pointer to the sysctl table, which can
      be used to free it after unregistration. IPv4 sysctls already use this.
      Remove the redundant NULL assignment: ndev is allocated with kzalloc().
      
      This also saves some bytes: the sysctl table can be shorter than
      DEVCONF_MAX+1 if some options are disabled in the config.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
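      The resulting pattern, as a simplified sketch (mirroring what the IPv4
      sysctls already do; surrounding code condensed):

          /* Register: keep only the ctl_table_header. */
          p->sysctl_header = register_net_sysctl(net, path, table);

          /* Unregister: recover the table pointer from the header itself
           * instead of caching it separately, then free it. */
          struct ctl_table *table = p->sysctl_header->ctl_table_arg;
          unregister_net_sysctl_table(p->sysctl_header);
          kfree(table);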
  2. 19 Apr, 2016: 2 commits
  3. 17 Apr, 2016: 1 commit
  4. 16 Apr, 2016: 4 commits
  5. 15 Apr, 2016: 4 commits
    • bpf, verifier: add ARG_PTR_TO_RAW_STACK type · 435faee1
      Daniel Borkmann committed
      When passing buffers from eBPF stack space into a helper function, the
      ARG_PTR_TO_STACK argument type is available for helpers. The verifier
      makes sure that such buffers are initialized, within boundaries, etc.
      
      However, the downside is that a couple of helper functions, such as
      bpf_skb_load_bytes(), fill out the passed buffer in the expected
      success case anyway, so zero-initializing it prior to the helper call
      amounts to unneeded/wasted instructions in the eBPF program that can
      be avoided.
      
      Therefore, add a new helper function argument type called
      ARG_PTR_TO_RAW_STACK. The idea is to skip the STACK_MISC check in
      check_stack_boundary() and color the related stack slots as STACK_MISC
      after we have checked all call arguments.
      
      Helper functions using ARG_PTR_TO_RAW_STACK must make sure that every
      path of the helper function fills the provided buffer area, so that no
      uninitialized stack memory can be leaked. This means, for example,
      that error paths need to memset() the buffers, but the expected fast
      path doesn't have to do this anymore.
      
      Since no such helper needs more than one ARG_PTR_TO_RAW_STACK argument,
      we can keep it simple and don't need to check for multiple areas.
      Should such a use case appear in the future, check_raw_mode() will
      make sure we implement support for it first.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
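      From the program author's point of view, the benefit looks roughly
      like this sketch (buffer size and return handling are illustrative):

          unsigned char buf[16];

          /* No memset(buf, 0, sizeof(buf)) needed up front anymore: the
           * verifier trusts the helper to fill buf on every path and
           * colors the slots STACK_MISC after checking the call. */
          if (bpf_skb_load_bytes(skb, 0, buf, sizeof(buf)) < 0)
                  return TC_ACT_SHOT;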
    • GSO: Support partial segmentation offload · 802ab55a
      Alexander Duyck committed
      This patch adds support for something I am referring to as GSO partial.
      The basic idea is that we can support a broader range of devices for
      segmentation if we use fixed outer headers and have the hardware only
      really deal with segmenting the inner header.  The naming reflects the
      fact that everything before csum_start will be fixed headers, and
      everything after will be the region that is handled by hardware.
      
      With the current implementation it allows us to add support for the
      following GSO types with an inner TSO_MANGLEID or TSO6 offload:
      NETIF_F_GSO_GRE
      NETIF_F_GSO_GRE_CSUM
      NETIF_F_GSO_IPIP
      NETIF_F_GSO_SIT
      NETIF_F_GSO_UDP_TUNNEL
      NETIF_F_GSO_UDP_TUNNEL_CSUM
      
      In the case of hardware that already supports tunneling we may be able to
      extend this further to support TSO_TCPV4 without TSO_MANGLEID if the
      hardware can support updating inner IPv4 headers.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
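      For a driver, advertising this could look roughly like the following
      sketch (gso_partial_features is the field added by this series; the
      chosen feature set is illustrative):

          /* Advertise GSO partial plus the tunnel GSO types the hardware
           * can handle once the outer headers are treated as fixed. */
          netdev->hw_features |= NETIF_F_GSO_PARTIAL;
          netdev->gso_partial_features = NETIF_F_GSO_GRE |
                                         NETIF_F_GSO_GRE_CSUM |
                                         NETIF_F_GSO_UDP_TUNNEL |
                                         NETIF_F_GSO_UDP_TUNNEL_CSUM;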
    • GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values · 1530545e
      Alexander Duyck committed
      This patch does two things.
      
      First, it allows TCP to aggregate frames with a fixed IPv4 ID field.
      As a result we should now be able to aggregate flows that were
      converted from IPv6 to IPv4.  In addition, this gives us more
      flexibility for future implementations of segmentation, as we may be
      able to use a fixed IP ID when segmenting the flow.
      
      Second, it places limitations on the outer IPv4 ID header in the case
      of tunneled frames.  Specifically, it forces the IP ID to increment
      by 1 unless the DF bit is set in the outer IPv4 header.  This way we
      can avoid creating overlapping series of IP IDs that could possibly
      be fragmented if the frame goes through GRO and is then resegmented
      via GSO.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
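      In pseudo-C, the outer ID constraint amounts to something like this
      sketch (names such as expected_id are illustrative, not the exact GRO
      code):

          /* Tunneled frames: outer IP IDs must increment by exactly 1,
           * unless DF is set, in which case a fixed ID is acceptable. */
          if (!(iph->frag_off & htons(IP_DF)) &&
              ntohs(iph->id) != expected_id)   /* prev outer ID + 1 */
                  goto flush;                  /* don't aggregate */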
    • GSO: Add GSO type for fixed IPv4 ID · cbc53e08
      Alexander Duyck committed
      This patch adds support for TSO using IPv4 headers with a fixed IP ID
      field.  This is meant to allow us to do a lossless GRO in the case of
      TCP flows that use a fixed IP ID, such as those that convert IPv6
      headers to IPv4 headers.
      
      In addition, I am adding a feature that for now I am referring to as
      TSO with IP ID mangling.  Basically, when this flag is enabled the
      device has the option to either output the flow with incrementing IP
      IDs or with a fixed IP ID, regardless of what the original IP ID
      ordering was.  This is useful in cases where the DF bit is set and we
      do not care whether the original IP ID value is maintained.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
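      In terms of flags, this boils down to SKB_GSO_TCP_FIXEDID on the skb
      side and NETIF_F_TSO_MANGLEID on the device side; a driver opting in
      would do something like this sketch:

          /* The device may emit segments with either incrementing or
           * fixed IP IDs when the original ID ordering doesn't matter
           * (e.g. DF set), regardless of the input flow's ordering. */
          netdev->features |= NETIF_F_TSO_MANGLEID;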
  6. 14 Apr, 2016: 4 commits
  7. 12 Apr, 2016: 1 commit
  8. 08 Apr, 2016: 8 commits
  9. 06 Apr, 2016: 3 commits
    • mac80211: add A-MSDU tx support · 6e0456b5
      Felix Fietkau committed
      Requires software tx queueing and fast-xmit support. For good
      performance, drivers need frag_list support as well; this avoids the
      need to copy the data of aggregated frames. Running without it is only
      supported for debugging purposes.
      
      To avoid performance and packet size issues, the rate control module or
      driver needs to limit the maximum A-MSDU size by setting
      max_rc_amsdu_len in struct ieee80211_sta.
      Signed-off-by: Felix Fietkau <nbd@openwrt.org>
      [fix locking issue]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
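      A rate control module limiting the A-MSDU size could then do something
      like this sketch (function name and the 3839-byte cap are
      illustrative; 3839 is just the smaller of the two A-MSDU sizes HT
      defines):

          /* Cap the A-MSDU size for this station based on rate control
           * state; the exact policy is up to the module/driver. */
          static void my_rc_rate_update(struct ieee80211_sta *sta)
          {
                  sta->max_rc_amsdu_len = 3839;
          }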
    • mac80211: add fast-rx path · 49ddf8e6
      Johannes Berg committed
      The regular RX path has a lot of code, but with a few
      assumptions on the hardware it's possible to reduce the
      amount of code significantly. Currently the assumptions
      on the driver are the following:
       * hardware/driver reordering buffer (if supporting aggregation)
       * hardware/driver decryption & PN checking (if using encryption)
       * hardware/driver did de-duplication
       * hardware/driver did A-MSDU deaggregation
       * AP_LINK_PS is used (in AP mode)
       * no client powersave handling in mac80211 (in client mode)
      
      of which some are actually checked per packet:
       * de-duplication
       * PN checking
       * decryption
      and additionally packets must
       * not be A-MSDU (i.e. have been deaggregated by the driver/device)
       * be data packets
       * not be fragmented
       * be unicast
       * have an RFC 1042 header
      
      Additionally, we assume dynamically that:
       * there is no encryption, or it is CCMP/GCMP (TKIP/WEP/other not allowed)
       * the station is authorized
       * 4-addr format is not enabled
      
      Some data needed for the RX path is cached in a new per-station
      "fast_rx" structure, so that when processing packets on the fast RX
      path we only need to look at this structure and the packet itself,
      and no other memory.
      
      After doing the above per-packet checks, the data path collapses
      down to a pretty simple conversion function taking advantage of
      the data cached in the small fast_rx struct.
      
      This should speed up RX processing, and will make it easier to reason
      about parallelizing RX (for which statistics will still need to be
      per-CPU).
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
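      Condensed, the per-packet gate reduces to a handful of cheap tests,
      roughly like this sketch (illustrative, not the actual mac80211 code;
      the real path also checks the RFC 1042 header and decryption status):

          /* Bail out to the regular RX path on anything unusual. */
          if (!ieee80211_is_data_present(hdr->frame_control) ||
              is_multicast_ether_addr(hdr->addr1) ||
              ieee80211_has_a4(hdr->frame_control) ||
              ieee80211_has_morefrags(hdr->frame_control))
                  return false;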
    • udp: enable MSG_PEEK at non-zero offset · 627d2d6b
      samanthakumar committed
      Enable peeking at UDP datagrams at the offset specified with socket
      option SOL_SOCKET/SO_PEEK_OFF. Peek at any datagram in the queue, up
      to the end of the given datagram.
      
      Implement the SO_PEEK_OFF semantics introduced in commit ef64a54f
      ("sock: Introduce the SO_PEEK_OFF sock option"). Increase the offset
      on peek, decrease it on regular reads.
      
      When peeking, always checksum the packet immediately, to avoid
      recomputation on subsequent peeks and final read.
      
      The socket lock is not held for the duration of udp_recvmsg, so
      peek and read operations can run concurrently. Only the last store
      to sk_peek_off is preserved.
      Signed-off-by: Sam Kumar <samanthakumar@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
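      From user space, the semantics can be exercised roughly like this
      (a minimal sketch; error handling elided):

          int off = 0;

          /* Enable the peek offset; it starts at 0. */
          setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off));

          char buf[64];
          recv(fd, buf, 4, MSG_PEEK);    /* peek bytes 0..3, offset -> 4 */
          recv(fd, buf, 4, MSG_PEEK);    /* peek bytes 4..7, offset -> 8 */
          recv(fd, buf, sizeof(buf), 0); /* regular read decreases offset */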
  10. 05 Apr, 2016: 8 commits
  11. 03 Apr, 2016: 1 commit
  12. 02 Apr, 2016: 2 commits