1. 22 Apr, 2017 2 commits
  2. 21 Apr, 2017 1 commit
  3. 18 Apr, 2017 4 commits
    • D
      net: rtnetlink: plumb extended ack to doit function · c21ef3e3
      David Ahern committed
      Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
      for doit functions that call it directly.
      
      This is the first step to using extended error reporting in rtnetlink.
      From here individual subsystems can be updated to set netlink_ext_ack as
      needed.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c21ef3e3
    • I
      gso: Validate assumption of frag_list segementation · 7a7a9bd7
      Ilan Tayari committed
      Commit 07b26c94 ("gso: Support partial splitting at the frag_list
      pointer") assumes that all SKBs in a frag_list (except maybe the last
      one) contain the same amount of GSO payload.
      
      This assumption is not always correct, resulting in the following
      warning message in the log:
          skb_segment: too many frags
      
      For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
      one frag, and some with 2 frags.
      After GRO, the frag_list SKBs end up having different amounts of payload.
      If this frag_list SKB is then forwarded, the aforementioned assumption
      is violated.
      
      Validate the assumption, and fall back to software GSO if it is not true.
      
      Fixes: 07b26c94 ("gso: Support partial splitting at the frag_list pointer")
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a7a9bd7
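
The validation described in this commit can be pictured with a small user-space model. This is only a sketch: `struct chained_buf` and `chain_is_uniform` are hypothetical names, not kernel code, and the model assumes the check is simply "every chained buffer except the last carries the same payload size".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one SKB chained on a frag_list. */
struct chained_buf {
    size_t payload_len;          /* GSO payload carried by this buffer */
    struct chained_buf *next;    /* next buffer in the chain, or NULL */
};

/*
 * Return true when every buffer except the last carries exactly seg_size
 * bytes. The last buffer is not checked. A caller would fall back to
 * software segmentation when this returns false.
 */
bool chain_is_uniform(const struct chained_buf *head, size_t seg_size)
{
    assert(seg_size > 0);

    for (const struct chained_buf *b = head; b != NULL; b = b->next) {
        if (b->next != NULL && b->payload_len != seg_size)
            return false;
    }
    return true;
}
```

Checking the whole chain up front keeps the fast path unchanged for uniform chains and only costs one list walk.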
    • C
      Add uid and cookie bpf helper to cg_skb_func_proto · 9fd0f315
      Chenbo Feng committed
      BPF helper functions get_socket_cookie and get_socket_uid can be
      used for network traffic classifications, among others. Expose
      them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of
      commit 8f917bba ("bpf: pass sk to helper functions") the
      required skb->sk function is available at both cgroup bpf ingress
      and egress hooks. With these two new helpers, cg_skb_func_proto is
      effectively the same as sk_filter_func_proto.
      
      Change since V1:
      Instead of adding the helpers to cg_skb_func_proto, redirect
      cg_skb_func_proto to sk_filter_func_proto, since all helper functions
      in sk_filter_func_proto are applicable to cg_skb_func_proto now.
      Signed-off-by: Chenbo Feng <fengc@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9fd0f315
    • W
      net-timestamp: avoid use-after-free in ip_recv_error · 1862d620
      Willem de Bruijn committed
      Syzkaller reported a use-after-free in ip_recv_error at line
      
          info->ipi_ifindex = skb->dev->ifindex;
      
      This function is called on dequeue from the error queue, at which
      point the device pointer may no longer be valid.
      
      Save ifindex on enqueue in __skb_complete_tx_timestamp, when the
      pointer is valid or NULL. Store it in temporary storage skb->cb.
      
      It is safe to reference skb->dev here, as called from device drivers
      or dev_queue_xmit. The exception is when called from tcp_ack_tstamp;
      in that case it is NULL and ifindex is set to 0 (invalid).
      
      Do not return a pktinfo cmsg if ifindex is 0. This maintains the
      current behavior of not returning a cmsg if skb->dev was NULL.
      
      On dequeue, the ipv4 path will cast from sock_exterr_skb to
      in_pktinfo. Both have ifindex as their first element, so no explicit
      conversion is needed. This is by design, introduced in commit
      0b922b7a ("net: original ingress device index in PKTINFO"). For
      ipv6 ip6_datagram_support_cmsg converts to in6_pktinfo.
      
      Fixes: 829ae9d6 ("net-timestamp: allow reading recv cmsg on errqueue with origin tstamp")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1862d620
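
The idea of this fix, capturing the interface index while the device pointer is still trustworthy and reading only the saved copy later, can be sketched in user space. All `model_*` names below are hypothetical and the layout is an assumption for illustration; this is not the kernel's `sk_buff` or `sock_exterr_skb`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct model_dev { int ifindex; };

struct model_skb {
    struct model_dev *dev;   /* may become invalid once the buffer is queued */
    char cb[48];             /* scratch area owned by the current layer */
};

/* Hypothetical control-block layout with ifindex as its first member. */
struct model_err_cb { int ifindex; };

/* Enqueue side: dev is valid or NULL here, so record the index now. */
void model_enqueue_error(struct model_skb *skb)
{
    struct model_err_cb cb = { .ifindex = skb->dev ? skb->dev->ifindex : 0 };

    assert(sizeof(cb) <= sizeof(skb->cb));
    memcpy(skb->cb, &cb, sizeof(cb));
}

/*
 * Dequeue side: never touch skb->dev. Returns 1 and fills *ifindex when
 * packet info should be reported, 0 when the saved index is 0 (invalid).
 */
int model_dequeue_ifindex(const struct model_skb *skb, int *ifindex)
{
    struct model_err_cb cb;

    memcpy(&cb, skb->cb, sizeof(cb));
    if (cb.ifindex == 0)
        return 0;
    *ifindex = cb.ifindex;
    return 1;
}
```

Treating index 0 as "no device" is what lets the dequeue path keep the old behavior of returning no packet info when the device pointer was NULL.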
  4. 14 Apr, 2017 12 commits
  5. 13 Apr, 2017 1 commit
    • I
      gso: Support frag_list splitting with head_frag · eaffadbb
      Ilan Tayari committed
      A driver may use build_skb() for received packets.
      These SKBs then have a head_frag.
      
      Since commit d7e8883c ("net: make GRO aware of
      skb->head_frag"), GRO may build frag_list SKBs out of
      head_frag received SKBs.
      In such a case, the chained SKBs end up with a head_frag.
      
      Commit 07b26c94 ("gso: Support partial splitting at
      the frag_list pointer") adds partial segmentation of frag_list
      SKB chains into individual SKBs.
      However, this is not done if the chained SKBs have any
      linear part, because the device may not be able to DMA
      the private linear buffer.
      
      A chained frag_list SKB with head_frag is wrongfully
      detected in this case as having a private linear part
      and thus falls back to software GSO, while in fact the
      linear part is backed by a DMA page just like any other frag.
      
      This causes low performance when forwarding those packets
      that were built with build_skb().
      
      Allow partial segmentation at the frag_list pointer for
      chained SKBs with head_frag.
      
      Note that such SKBs can only be created by GRO, when applied
      to received packets with head_frag.
      Also note that this change only affects the data path that
      performs the partial segmentation at frag_list pointer, and
      not any of the other more common data paths.
      Signed-off-by: Ilan Tayari <ilant@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eaffadbb
  6. 12 Apr, 2017 4 commits
  7. 10 Apr, 2017 1 commit
  8. 08 Apr, 2017 1 commit
  9. 05 Apr, 2017 2 commits
    • V
      rtnl: Add support for netdev event to link messages · def12888
      Vlad Yasevich committed
      When netdev events happen, a rtnetlink_event() handler will send
      messages for every event in its white list.  These messages contain
      current information about a particular device, but they do not include
      the information about which event just happened.  The consumer of
      the message has to try to infer this information.  In some cases
      (e.g. NETDEV_NOTIFY_PEERS), that is not possible.
      
      This patch adds a new extension to the RTM_NEWLINK message called IFLA_EVENT
      that encodes which event triggered this
      message.  This allows the message consumer to easily determine
      if it is interested in a particular event or not.
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      def12888
    • V
      rtnetlink: Convert rtnetlink_event to white list · 5138e86f
      Vlad Yasevich committed
      The rtnetlink_event currently functions as a blacklist where
      we block certain netdev events from being sent to user space.
      As a result, events have been added to the system that userspace
      probably doesn't care about.
      
      This patch converts the implementation to a white list so that
      new events have to be specifically added to the list to
      be sent to userspace.  This forces new event implementers to
      consider whether a given event is useful to user space or if it's
      just a kernel event.
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5138e86f
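
The difference between the two filtering styles comes down to what the default branch does. The sketch below is a hypothetical model: the `MODEL_EV_*` codes and the function name are invented for illustration and are not the kernel's netdev event names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event codes; the real netdev event names are not used here. */
enum model_event {
    MODEL_EV_MTU_CHANGED,
    MODEL_EV_ADDR_CHANGED,
    MODEL_EV_PEER_NOTIFY,
    MODEL_EV_INTERNAL_ONLY,   /* stands in for a newly added kernel-only event */
    MODEL_EV_COUNT
};

/*
 * White-list filter: only events named explicitly are forwarded. Anything
 * not listed, including events added later, is dropped by the default case.
 */
bool model_event_goes_to_userspace(enum model_event ev)
{
    assert((int)ev >= 0 && (int)ev < MODEL_EV_COUNT);

    switch (ev) {
    case MODEL_EV_MTU_CHANGED:
    case MODEL_EV_ADDR_CHANGED:
    case MODEL_EV_PEER_NOTIFY:
        return true;
    default:
        return false;
    }
}
```

With a blacklist the default branch would return true instead, which is how unreviewed events end up reaching user space.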
  10. 04 Apr, 2017 5 commits
  11. 03 Apr, 2017 1 commit
    • A
      make skb_copy_datagram_msg() et.al. preserve ->msg_iter on error · 32786821
      Al Viro committed
      Fixes the mess observed in e.g. rsync over a noisy link we'd been
      seeing since last summer.  What happens is that we copy part of
      a datagram before noticing a checksum mismatch.  Datagram will be
      resent, all right, but we want the next try to go into the same place,
      not after it...
      
      All this family of primitives (copy/checksum and copy a datagram
      into destination) is "all or nothing" sort of interface - either
      we get 0 (meaning that copy had been successful) or we get an
      error (and no way to tell how much had been copied before we ran
      into whatever error it had been).  Make all of them leave iterator
      unadvanced in case of errors - all callers must be able to cope
      with that (an error might've been caught before the iterator had
      been advanced), it costs very little to arrange, it's safer for
      callers and actually fixes at least one bug in said callers.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      32786821
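
The "all or nothing" contract can be illustrated with a toy iterator. This is a sketch under stated assumptions: `struct model_iter`, `model_copy_and_verify`, and the additive checksum are hypothetical stand-ins, not the kernel's `iov_iter` or its checksum routines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical destination iterator: a buffer plus a write position. */
struct model_iter {
    unsigned char *buf;
    size_t pos;
    size_t len;
};

/*
 * Copy n bytes into the iterator while summing them. Returns 0 on success.
 * On any error the iterator is restored to its state on entry, so a retry
 * of the same datagram lands in the same place.
 */
int model_copy_and_verify(struct model_iter *it, const unsigned char *data,
                          size_t n, unsigned expected_sum)
{
    assert(it != NULL && data != NULL);

    const struct model_iter saved = *it;   /* snapshot before advancing */
    unsigned sum = 0;

    if (n > it->len - it->pos)
        return -1;                         /* nothing advanced yet */

    for (size_t i = 0; i < n; i++) {
        it->buf[it->pos++] = data[i];
        sum += data[i];
    }

    if (sum != expected_sum) {
        *it = saved;                       /* rewind: leave iterator unadvanced */
        return -1;
    }
    return 0;
}
```

Bytes written before the mismatch stay in the buffer, but since the position is rewound the retransmitted datagram simply overwrites them.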
  12. 02 Apr, 2017 1 commit
    • A
      bpf: introduce BPF_PROG_TEST_RUN command · 1cf1cae9
      Alexei Starovoitov committed
      Development and testing of networking bpf programs is quite cumbersome.
      Despite availability of user space bpf interpreters the kernel is
      the ultimate authority and execution environment.
      Current test frameworks for TC include creation of netns, veth,
      qdiscs and use of various packet generators just to test functionality
      of a bpf program. XDP testing is even more complicated, since
      qemu needs to be started with gro/gso disabled and precise queue
      configuration, transferring of xdp program from host into guest,
      attaching to virtio/eth0 and generating traffic from the host
      while capturing the results from the guest.
      
      Moreover, analyzing performance bottlenecks in an XDP program is
      impossible in a virtio environment, since the cost of running the program
      is tiny compared to the overhead of virtio packet processing,
      so performance testing can only be done on a physical NIC
      with another server generating traffic.
      
      Furthermore ongoing changes to user space control plane of production
      applications cannot be run on the test servers leaving bpf programs
      stubbed out for testing.
      
      Last but not least, the upstream llvm changes are validated by the bpf
      backend testsuite which has no ability to test the code generated.
      
      To improve this situation introduce BPF_PROG_TEST_RUN command
      to test and performance benchmark bpf programs.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1cf1cae9
  13. 31 Mar, 2017 1 commit
    • P
      sock: avoid dirtying sk_stamp, if possible · 6c7c98ba
      Paolo Abeni committed
      sock_recv_ts_and_drops() unconditionally sets sk->sk_stamp for
      every packet, even if the SOCK_TIMESTAMP flag is not set in the
      related socket.
      If selinux is enabled, this causes a cache miss for every packet
      since sk->sk_stamp and sk->sk_security share the same cacheline.
      With this change sk_stamp is set only if the SOCK_TIMESTAMP
      flag is set, and is cleared for the first packet, so that the user
      perceived behavior is unchanged.
      
      This gives up to 5% speed-up under udp-flood with small packets.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c7c98ba
  14. 29 Mar, 2017 2 commits
    • A
      net: break include loop netdevice.h, dsa.h, devlink.h · c6e970a0
      Andrew Lunn committed
      There is an include loop between netdevice.h, dsa.h, devlink.h because
      of NETDEV_ALIGN, making it impossible to use devlink structures in
      dsa.h.
      
      Break this loop by taking dsa.h out of netdevice.h and adding a forward
      declaration of dsa_switch_tree and the netdev_set_default_ethtool_ops()
      function, which is what netdevice.h requires.
      
      No longer having dsa.h in netdevice.h means the includes in dsa.h no
      longer get included. This breaks a few other files which depend on
      these includes. Add these directly in the affected files.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c6e970a0
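
A forward declaration is enough whenever a header only stores or passes a pointer to a type. The single-file sketch below mimics the two headers with comments; every `model_*` name is hypothetical and the struct contents are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* --- would live in the netdevice-like header --- */
struct model_switch_tree;                 /* forward declaration, no include */

struct model_netdev {
    struct model_switch_tree *tree;       /* pointer to incomplete type is fine */
};

/* --- would live in the dsa-like header, now free to include others --- */
struct model_switch_tree {
    int index;
};

/* Code that dereferences the pointer must see the full definition. */
int model_tree_index(const struct model_netdev *dev)
{
    assert(dev != NULL);
    return dev->tree ? dev->tree->index : -1;
}
```

Only translation units that look inside the struct need the second header, which is why a few files had to gain direct includes after the change.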
    • A
      devlink: Support for pipeline debug (dpipe) · 1555d204
      Arkadi Sharshevsky committed
      The pipeline debug is used to export the pipeline abstractions for the
      main objects - tables, headers and entries. The only support for set is
      for changing the counter parameter on a specific table.
      
      The basic structures:
      
      Header - can represent a real protocol header information or internal
               metadata. Generic protocol headers like IPv4 can be shared
               between drivers. Each driver can add local headers.
      
      Field - part of a header. Can represent protocol field or specific ASIC
              metadata field. Hardware special metadata fields can be mapped
              to different resources, for example switch ASIC ports can have
              an internal number which from the system's point of view is mapped
              to a netdevice ifindex.
      
      Match - represents specific match rule. Can describe match on specific
              field or header. The header index should be specified as well
              in order to support several header instances of the same type
              (tunneling).
      
      Action - represents specific action rule. Actions can describe operations
               on specific field values, such as set and increment, and
               header operations such as add and delete.
      
      Value - represents value which can be associated with specific match or
              action.
      
      Table - represents a hardware block which can be described with match/
              action behavior. The match/action can be done on the packet's
              data or on the internal metadata gathered along the
              packet's traversal through the pipeline, which is vendor specific
              and should be exported in order to provide understanding of
              the ASIC's behavior.
      
      Entry - represents single record in a specific table. The entry is
              identified by specific combination of values for match/action.
      
      Prior to accessing the tables/entries, the drivers provide the header/
      field database, which the driver exports to user-space. The database
      is split between the shared headers and unique headers.
      Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1555d204
  15. 25 Mar, 2017 2 commits