1. 23 Sep 2016, 2 commits
  2. 09 Aug 2016, 1 commit
  3. 27 Jul 2016, 1 commit
    • net: neigh: disallow transition to NUD_STALE if lladdr is unchanged in neigh_update() · d1c2b501
      Committed by He Chunhui
      NUD_STALE is used when the caller (e.g. arp_process()) can't guarantee
      neighbour reachability. If the entry was NUD_VALID and lladdr is unchanged,
      the entry state should not be changed.
      
      Currently the code adds an extra "NUD_CONNECTED" condition. So if the old
      state was NUD_DELAY or NUD_PROBE (they are NUD_VALID but not NUD_CONNECTED),
      the state can be changed to NUD_STALE.
      
      This may cause a problem. Because a NUD_STALE lladdr doesn't guarantee
      reachability, when we send traffic the state will be changed to
      NUD_DELAY. In the normal case, if we get no confirmation (by dst_confirm()),
      we will change the state to NUD_PROBE and send probe traffic. But now the
      state may be reset to NUD_STALE again (e.g. by broadcast ARP packets),
      so the probe traffic will not be sent. This situation may happen again and
      again, and packets will be sent to a non-reachable lladdr forever.
      
      The fix is to remove the "NUD_CONNECTED" condition. After that the
      "NEIGH_UPDATE_F_WEAK_OVERRIDE" condition (used by IPv6) in that branch will
      be redundant, so remove it.
      
      This change may increase probe traffic, but it is essential, since a
      NUD_STALE lladdr is unreliable. To ensure correctness, we prefer to
      re-resolve the lladdr when we cannot get confirmation, even while remote
      packets keep trying to set the NUD_STALE state.
      Signed-off-by: Chunhui He <hchunhui@mail.ustc.edu.cn>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 26 Jul 2016, 1 commit
    • bpf, events: fix offset in skb copy handler · aa7145c1
      Committed by Daniel Borkmann
      This patch fixes the __output_custom() routine we currently use with
      bpf_skb_copy(). I missed that when len is larger than the size of the
      current handle, we can issue multiple invocations of copy_func, and
      __output_custom() advances the destination but also the source buffer by
      the number of bytes written. For __output_custom() this is wrong, since
      there the source buffer points to a non-linear object, in our case an
      skb, which the copy_func helper is supposed to walk. Since the source is
      non-linear, we instead need to pass the offset into the helper, so that
      copy_func can use it for extracting the data from the source object.
      
      Therefore, adjust the callback signatures accordingly and pass the offset
      into the skb_header_pointer() invoked from the bpf_skb_copy() callback.
      __DEFINE_OUTPUT_COPY_BODY() is adjusted to accommodate two things:
      i) passing in whether we should advance the source buffer or not, which
      is a compile-time constant condition, and ii) passing in the offset for
      __output_custom(), which we do with the help of __VA_ARGS__, so everything
      can stay inlined as it currently is. Both changes allow for adapting the
      __output_* fast-path helpers without extra overhead.
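      
      A sketch of the adjusted callback, based on the description above; the
      exact kernel code may differ in details:
      
        #include <linux/skbuff.h>
        #include <linux/string.h>

        static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
                                          unsigned long off, unsigned long len)
        {
                /* walk the (possibly non-linear) skb at the given offset */
                void *ptr = skb_header_pointer((struct sk_buff *)skb, off,
                                               len, dst_buff);

                if (unlikely(!ptr))
                        return len;     /* report remaining bytes on failure */
                if (ptr != dst_buff)    /* data was linear: copy it over */
                        memcpy(dst_buff, ptr, len);

                return 0;
        }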
      
      Fixes: 555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
      Fixes: 7e3f977e ("perf, events: add non-linear data support for raw records")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 21 Jul 2016, 1 commit
  6. 20 Jul 2016, 3 commits
  7. 16 Jul 2016, 1 commit
    • bpf: avoid stack copy and use skb ctx for event output · 555c8a86
      Committed by Daniel Borkmann
      This work addresses a couple of issues the bpf_skb_event_output()
      helper currently has: i) We need two copies instead of just a
      single one for the skb data when it should be part of a sample.
      The data can be non-linear and thus needs to be extracted via
      bpf_skb_load_bytes() helper first, and then copied once again
      into the ring buffer slot. ii) Since bpf_skb_load_bytes()
      currently needs to be used first, the helper needs to see a
      constant size on the passed stack buffer to make sure the BPF
      verifier can do sanity checks on it during verification time.
      Thus, just passing skb->len (or any other non-constant value)
      wouldn't work, but changing bpf_skb_load_bytes() is also not
      the proper solution, since the two copies are generally still
      needed. iii) bpf_skb_load_bytes() is just for rather small
      buffers like headers, since they need to sit on the limited
      BPF stack anyway. Instead of working around in bpf_skb_load_bytes(),
      this work improves the bpf_skb_event_output() helper to address
      all 3 at once.
      
      We can make use of the passed in skb context that we have in
      the helper anyway, and use some of the reserved flag bits as
      a length argument. The helper will use the new __output_custom()
      facility from perf side with bpf_skb_copy() as callback helper
      to walk and extract the data. It will pass the data for setup
      to bpf_event_output(), which generates and pushes the raw record
      with an additional frag part. The linear data used in the first
      frag of the record serves as programmatically defined meta data
      passed along with the appended sample.
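      
      A hedged example of the resulting usage from a tc/cls_bpf program. The
      SEC()/bpf_map_def definitions and the bpf_perf_event_output() helper
      declaration are assumed from the kernel's BPF sample headers of that
      era; map, struct, and section names here are made up:
      
        #include <linux/bpf.h>
        #include <linux/pkt_cls.h>

        struct bpf_map_def SEC("maps") events = {
                .type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
                .key_size    = sizeof(int),
                .value_size  = sizeof(__u32),
                .max_entries = 64,
        };

        struct meta {           /* programmatically defined meta data */
                __u32 ifindex;
                __u32 pkt_len;
        };

        SEC("classifier")
        int sample(struct __sk_buff *skb)
        {
                struct meta m = {
                        .ifindex = skb->ifindex,
                        .pkt_len = skb->len,
                };
                /* the upper 32 flag bits carry the number of skb bytes to
                 * append to the sample after the meta data */
                __u64 flags = BPF_F_CURRENT_CPU | ((__u64)skb->len << 32);

                bpf_perf_event_output(skb, &events, flags, &m, sizeof(m));
                return TC_ACT_OK;
        }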
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 14 Jul 2016, 2 commits
    • dccp: limit sk_filter trim to payload · 4f0c40d9
      Committed by Willem de Bruijn
      DCCP verifies packet integrity, including length, at initial receive in
      dccp_invalid_packet, and pulls the headers later in dccp_enqueue_skb.
      
      A call to sk_filter in between can cause __skb_pull to wrap skb->len.
      skb_copy_datagram_msg interprets this as a negative value, so it
      (correctly) fails with EFAULT. The negative length is reported in
      ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.
      
      Introduce an sk_receive_skb variant that caps how far a filter
      program may trim packets, and call this in dccp with the header
      length. Excessively trimmed packets are now processed normally and
      queued for reception as zero-byte payloads.
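      
      A sketch of the capped-trim idea described above; treat the signatures
      as illustrative rather than a verbatim diff, and dh as the already
      parsed DCCP header:
      
        /* the filter may shrink the skb, but never below cap bytes */
        int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb,
                               unsigned int cap);

        static inline int sk_filter(struct sock *sk, struct sk_buff *skb)
        {
                return sk_filter_trim_cap(sk, skb, 1);  /* old behaviour */
        }

        /* DCCP receive path: floor the trim at the header length, so the
         * later __skb_pull() cannot wrap skb->len */
        __sk_receive_skb(sk, skb, 1, dh->dccph_doff * 4);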
      
      Fixes: 7c657876 ("[DCCP]: Initial implementation")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rose: limit sk_filter trim to payload · f4979fce
      Committed by Willem de Bruijn
      Sockets can have a filter program attached that drops or trims
      incoming packets based on the filter program return value.
      
      Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
      verifies this on arrival in rose_route_frame and unconditionally pulls
      the bytes in rose_recvmsg. The filter can trim packets to below this
      value in between, causing the pull to fail and leaving a partial header
      by the time skb_copy_datagram_msg runs.
      
      Place a lower bound on the size to which sk_filter may trim packets
      by introducing sk_filter_trim_cap and call this for rose packets.
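      
      A sketch of where the cap is applied on the Rose side; the exact call
      site and return handling are assumptions based on the text above:
      
        /* never let the filter trim below the minimal Rose header */
        if (sk_filter_trim_cap(sk, skb, ROSE_MIN_LEN))
                return 0;       /* filter dropped the frame */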
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 13 Jul 2016, 1 commit
  10. 12 Jul 2016, 1 commit
  11. 10 Jul 2016, 1 commit
  12. 06 Jul 2016, 3 commits
  13. 05 Jul 2016, 3 commits
  14. 03 Jul 2016, 1 commit
  15. 02 Jul 2016, 4 commits
  16. 01 Jul 2016, 1 commit
  17. 30 Jun 2016, 5 commits
    • fib_rules: Added NLM_F_EXCL support to fib_nl_newrule · 153380ec
      Committed by Mateusz Bajorski
      When adding a rule with the NLM_F_EXCL flag, check whether the same rule
      already exists. If it does, fail with -EEXIST.
      
      This is already implemented in iproute2:
              if (cmd == RTM_NEWRULE) {
                      req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL;
                      req.r.rtm_type = RTN_UNICAST;
              }
      
      Tested IPv4 and IPv6 with a net-next kernel on qemu x86.
      
      Expected behavior after the patch:
      localhost ~ # ip rule
      0:    from all lookup local
      32766:    from all lookup main
      32767:    from all lookup default
      localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
      localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
      RTNETLINK answers: File exists
      localhost ~ # ip rule
      0:    from all lookup local
      1005:    from 10.46.177.97 lookup 104
      32766:    from all lookup main
      32767:    from all lookup default
      
      There was already a topic regarding this, but I don't see any changes
      merged and the problem still occurs.
      https://lkml.kernel.org/r/1135778809.5944.7.camel+%28%29+localhost+%21+localdomain
      Signed-off-by: Mateusz Bajorski <mateusz.bajorski@nokia.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: add support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute · 80e73cc5
      Committed by Nikolay Aleksandrov
      This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute,
      which allows exporting per-slave statistics if the master device supports
      the linkxstats callback. The attribute is passed down to the linkxstats
      callback, and it is up to the callback user to use it (an example has
      been added to the only current user, the bridge). This allows us to
      query only specific slaves of master devices, like bridge ports, and
      export only what we're interested in, instead of having to dump all
      ports and search for a single one. This will be used to export per-port
      IGMP/MLD stats and also per-port vlan stats in the future, possibly
      other statistics as well.
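      
      A minimal sketch of a matching RTM_GETSTATS request; the filter-bit
      macro and struct if_stats_msg come from <linux/if_link.h>, while the
      function and variable names here are made up:
      
        #include <linux/if_link.h>
        #include <linux/rtnetlink.h>
        #include <string.h>

        struct stats_req {
                struct nlmsghdr     nlh;
                struct if_stats_msg ifsm;
        };

        static void build_getstats(struct stats_req *req, int port_ifindex)
        {
                memset(req, 0, sizeof(*req));
                req->nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(req->ifsm));
                req->nlh.nlmsg_type  = RTM_GETSTATS;
                req->nlh.nlmsg_flags = NLM_F_REQUEST;
                /* query the slave (e.g. a bridge port), not the master */
                req->ifsm.ifindex     = port_ifindex;
                req->ifsm.filter_mask =
                        IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_XSTATS_SLAVE);
        }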
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add bpf_skb_change_type helper · d2485c42
      Committed by Daniel Borkmann
      This work adds a helper for changing skb->pkt_type in a controlled way.
      We only allow a subset of possible values and can extend that in the
      future should other use cases come up. Doing this as a helper has the
      advantage that errors can be handled gracefully, keeping the helper
      extensible.
      
      It's the write counterpart to the pkt_type member we can already read
      from the struct __sk_buff context. The major use case is to change
      incoming skbs to PACKET_HOST in a programmatic way instead of having to
      recirculate via redirect(..., BPF_F_INGRESS), for example.
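      
      A hedged tc/cls_bpf example of the helper in use; the SEC() macro and
      helper declaration are assumed from the kernel BPF sample headers, and
      the section name is illustrative:
      
        #include <linux/bpf.h>
        #include <linux/if_packet.h>
        #include <linux/pkt_cls.h>

        SEC("classifier")
        int to_host(struct __sk_buff *skb)
        {
                /* rewrite the delivery type in place instead of
                 * recirculating via redirect(..., BPF_F_INGRESS) */
                bpf_skb_change_type(skb, PACKET_HOST);
                return TC_ACT_OK;
        }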
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add bpf_skb_change_proto helper · 6578171a
      Committed by Daniel Borkmann
      This patch adds a minimal helper for doing the groundwork of changing
      skb->protocol in a controlled way. Currently supported are v4-to-v6 and
      v6-to-v4 transitions, which allow, for example, for a minimal static
      NAT64 implementation where applications in containers that still
      require IPv4 can be transparently operated in an IPv6-only environment.
      For example, the host-facing veth of the container can transparently do
      the transitions in a programmatic way with the help of the clsact qdisc
      and cls_bpf.
      
      The idea is to separate concerns, keeping the complexity of the helper
      lower: programs use bpf_skb_change_proto(), bpf_skb_store_bytes() and
      bpf_lX_csum_replace() together to get the job done, instead of doing
      everything in a single helper (and thus partially duplicating helper
      functionality). Also, bpf_skb_change_proto() shouldn't need to deal
      with raw packet data, as this is done by other helpers.
      
      bpf_skb_proto_6_to_4() and bpf_skb_proto_4_to_6() unclone the skb to
      operate on a private one, push or pop the additionally required header
      space, and migrate the gso/gro meta data from the shared info. We mark
      the gso type as dodgy so that headers are checked and segments
      recalculated by the gso/gro engine. The gso_size target is adapted as
      well. The flags argument added is currently reserved and can be used
      for future extensions.
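      
      A hedged sketch of the division of labour from a cls_bpf program; the
      bpf_htons() definition and helper declarations are assumed from the
      usual sample headers, and the actual header rewrite steps are elided:
      
        #include <linux/bpf.h>
        #include <linux/if_ether.h>
        #include <linux/pkt_cls.h>

        SEC("classifier")
        int nat64_egress(struct __sk_buff *skb)
        {
                /* resize the header room and flip skb->protocol;
                 * the flags argument is reserved and must be 0 */
                if (bpf_skb_change_proto(skb, bpf_htons(ETH_P_IPV6), 0))
                        return TC_ACT_SHOT;

                /* the program then writes the new IPv6 header with
                 * bpf_skb_store_bytes() and patches the transport
                 * checksum with bpf_l4_csum_replace() ... */
                return TC_ACT_OK;
        }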
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: don't use raw processor id in generic helper · 80b48c44
      Committed by Daniel Borkmann
      Use smp_processor_id() for the generic helper bpf_get_smp_processor_id()
      instead of the raw variant. This allows for preemption checks when we
      have DEBUG_PREEMPT, and otherwise uses the raw variant anyway. We only
      need to keep the raw variant for socket filters, but we can reuse the
      helper that is already there from cBPF side.
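      
      The gist as a sketch, in the pre-BPF_CALL helper style of that era;
      treat it as illustrative rather than the verbatim diff:
      
        static u64 bpf_get_smp_processor_id(u64 r1, u64 r2, u64 r3,
                                            u64 r4, u64 r5)
        {
                /* checked accessor: complains under CONFIG_DEBUG_PREEMPT
                 * when called from preemptible context */
                return smp_processor_id();
        }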
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 29 Jun 2016, 2 commits
    • neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit() · b560f03d
      Committed by David Barroso
      neigh_xmit() expects to be called inside an RCU-bh read side critical
      section, and while one of its two current callers gets this right, the
      other one doesn't.
      
      More specifically, neigh_xmit() has two callers, mpls_forward() and
      mpls_output(), and while both callers call neigh_xmit() under
      rcu_read_lock(), this provides sufficient protection for neigh_xmit()
      only in the case of mpls_forward(), as that is always called from
      softirq context and therefore doesn't need explicit BH protection,
      while mpls_output() can be called from process context with softirqs
      enabled.
      
      When mpls_output() is called from process context, with softirqs
      enabled, we can be preempted by a softirq at any time, and RCU-bh
      considers the completion of a softirq as signaling the end of any
      pending read-side critical sections, so if we do get a softirq
      while we are in the part of neigh_xmit() that expects to be run inside
      an RCU-bh read side critical section, we can end up with an unexpected
      RCU grace period running right in the middle of that critical section,
      making things go boom.
      
      This patch fixes this impedance mismatch in the callee, by making
      neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
      expects to be treated as an RCU-bh read side critical section, as this
      seems a safer option than fixing it in the callers.
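      
      The shape of the fix, as an illustrative sketch rather than the
      verbatim diff:
      
        int neigh_xmit(int index, struct net_device *dev,
                       const void *addr, struct sk_buff *skb)
        {
                int err = 0;

                /* BH-flavoured read lock: safe even when entered from
                 * preemptible process context, as in mpls_output() */
                rcu_read_lock_bh();
                /* ... neighbour table lookup, neigh_create() and the
                 * actual transmit happen inside the critical section ... */
                rcu_read_unlock_bh();
                return err;
        }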
      
      Fixes: 4fd3d7d9 ("neigh: Add helper function neigh_xmit")
      Signed-off-by: David Barroso <dbarroso@fastly.com>
      Signed-off-by: Lennert Buytenhek <lbuytenhek@fastly.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: the space is required before the open parenthesis '(' · 8a01ed70
      Committed by Wei Tang
      A space is missing before the open parenthesis '(', and this
      introduces extra noise when checking patches that touch the
      surrounding code.
      Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 26 Jun 2016, 1 commit
    • net_sched: drop packets after root qdisc lock is released · 520ac30f
      Committed by Eric Dumazet
      Qdisc performance suffers when packets are dropped at enqueue()
      time because drops (kfree_skb()) are done while qdisc lock is held,
      delaying a dequeue() draining the queue.
      
      Nominal throughput can be reduced by 50% when this happens,
      at a time when we would like the dequeue() to proceed as fast as possible.
      
      Even FQ is vulnerable to this problem, while one of FQ goals was
      to provide some flow isolation.
      
      This patch adds a 'struct sk_buff **to_free' parameter to all
      qdisc->enqueue() implementations and to the qdisc_drop() helper.
      
      I measured a performance increase of up to 12%, but this patch
      is a prerequisite so that future batches in enqueue() can fly.
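      
      A sketch of the new contract described above; the helper shape follows
      the commit, while the call-site details are simplified:
      
        /* enqueue() no longer frees: it parks the skb on *to_free */
        static inline int qdisc_drop(struct sk_buff *skb, struct Qdisc *sch,
                                     struct sk_buff **to_free)
        {
                skb->next = *to_free;   /* defer the kfree_skb() */
                *to_free = skb;
                qdisc_qstats_drop(sch);
                return NET_XMIT_DROP;
        }

        /* caller, once the root qdisc lock has been released: */
        if (unlikely(to_free))
                kfree_skb_list(to_free);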
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 17 Jun 2016, 2 commits
  21. 16 Jun 2016, 2 commits
  22. 13 Jun 2016, 1 commit