1. 12 Aug, 2015 1 commit
  2. 11 Aug, 2015 5 commits
  3. 10 Aug, 2015 7 commits
  4. 08 Aug, 2015 1 commit
  5. 07 Aug, 2015 4 commits
    • net_dbg_ratelimited: turn into no-op when !DEBUG · d92cff89
      Committed by Jason A. Donenfeld
      The pr_debug family of functions turns into a no-op when -DDEBUG is not
      specified, opting instead to call "no_printk", which gets compiled to a
      no-op (but retains gcc's nice warnings about printf-style arguments).
      
      The problem with net_dbg_ratelimited is that it is defined to be a
      variant of net_ratelimited_function, which expands to essentially:
      
          if (net_ratelimit())
              pr_debug(fmt, ...);
      
      When DEBUG is not defined, then this becomes,
      
          if (net_ratelimit())
              ;
      
      This seems benign, except it isn't. Firstly, there's the obvious
      overhead of calling net_ratelimit needlessly, which does quite some book
      keeping for the rate limiting. Given that the pr_debug and
      net_dbg_ratelimited family of functions are sprinkled liberally through
      performance critical code, with developers assuming they'll be compiled
      out to a no-op most of the time, we certainly do not want this needless
      book keeping. Secondly, and most visibly, even though no debug message
      is printed when DEBUG is not defined, if there is a flood of
      invocations, dmesg winds up peppered with messages such as
      "net_ratelimit: 320 callbacks suppressed". This is because our
      aforementioned net_ratelimit() function actually prints this text in
      some circumstances. It's especially odd to see this when there isn't any
      other accompanying debug message.
      
      So, in sum, it doesn't make sense to have this function's current
      behavior, and instead it should match what every other debug family of
      functions in the kernel does with !DEBUG -- nothing.
      
      This patch replaces calls to net_dbg_ratelimited when !DEBUG with
      no_printk, keeping with the idiom of all the other debug print helpers.
      
      Also, though not strictly necessary, it guards the call with an if (0)
      so that all evaluation of any arguments is sure to be compiled out.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
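The difference between the two expansions described in this commit can be sketched in user-space C. This is only an illustration, not the kernel's actual macros: `net_ratelimit()` and `no_printk()` are simplified stand-ins, and the names `old_net_dbg_ratelimited`, `new_net_dbg_ratelimited`, `count_old` and `count_new` are made up for the example.

```c
/* Counts calls so the effect of each macro variant can be observed. */
static int ratelimit_calls;

/* Stand-in for the kernel's net_ratelimit(); always allows the message. */
static int net_ratelimit(void)
{
	ratelimit_calls++;
	return 1;
}

/* Never prints, but lets the compiler type-check printf-style arguments. */
static inline __attribute__((format(printf, 1, 2)))
int no_printk(const char *fmt, ...)
{
	(void)fmt;
	return 0;
}

/* Old shape with !DEBUG: the rate limiter still runs on every call. */
#define old_net_dbg_ratelimited(...)			\
	do {						\
		if (net_ratelimit())			\
			no_printk(__VA_ARGS__);		\
	} while (0)

/* New shape with !DEBUG: guarded by if (0), so nothing is evaluated. */
#define new_net_dbg_ratelimited(...)			\
	do {						\
		if (0)					\
			no_printk(__VA_ARGS__);		\
	} while (0)

/* Returns how often net_ratelimit() ran for n calls of the old variant. */
static int count_old(int n)
{
	ratelimit_calls = 0;
	for (int i = 0; i < n; i++)
		old_net_dbg_ratelimited("packet %d dropped\n", i);
	return ratelimit_calls;
}

/* Same measurement for the new variant. */
static int count_new(int n)
{
	ratelimit_calls = 0;
	for (int i = 0; i < n; i++)
		new_net_dbg_ratelimited("packet %d dropped\n", i);
	return ratelimit_calls;
}
```

Calling `count_old(5)` runs the stand-in rate limiter five times, while `count_new(5)` never reaches it. The format string is still checked by the compiler in both variants.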
    • net/mlx5_core: Support physical port counters · efea389d
      Committed by Gal Pressman
      Added physical port counters in the following standard formats to
      ethtool statistics:
        - IEEE 802.3
        - RFC2863
        - RFC2819
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Light-weight netdev open/stop · 5c50368f
      Committed by Achiad Shochat
      Create/destroy TIRs, TISs and flow tables upon PCI probe/remove rather
      than upon the netdev ndo_open/stop.
      
      Upon ndo_stop(), redirect all RX traffic to the (recently introduced)
      "Drop RQ" and then close only the RX/TX rings, leaving the TIRs,
      TISs and flow tables alive.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5_core: Introduce access function to modify RSS/LRO params · d9eea403
      Committed by Achiad Shochat
      To be used by the mlx5 Eth driver in following commit.
      
      This is in preparation for netdev "light-weight" open/stop flow
      change described in previous commit.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 04 Aug, 2015 1 commit
  7. 03 Aug, 2015 1 commit
    • ebpf: add skb->hash to offset map for usage in {cls, act}_bpf or filters · ba7591d8
      Committed by Daniel Borkmann
      Add skb->hash to the __sk_buff offset map, so it can be accessed from
      an eBPF program. We already do this for classic BPF filters, but not
      yet for eBPF. It might be useful as a demuxer in combination with
      helpers like bpf_clone_redirect(). Toy example:
      
        __section("cls-lb") int ingress_main(struct __sk_buff *skb)
        {
          unsigned int which = 3 + (skb->hash & 7);
          /* bpf_skb_store_bytes(skb, ...); */
          /* bpf_l{3,4}_csum_replace(skb, ...); */
          bpf_clone_redirect(skb, which, 0);
          return -1;
        }
      
      I was thinking whether to add skb_get_hash(), but then concluded the
      raw skb->hash seems fine in this case: we can directly access the hash
      w/o extra eBPF helper function call, it's filled out by many NICs on
      ingress, and in case the entropy level would not be sufficient, people
      can still implement their own specific sw fallback hash mix anyway.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
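The demux arithmetic in the toy example above, `3 + (skb->hash & 7)`, spreads flows over eight consecutive interface indexes. A minimal user-space sketch of just that mapping follows; `pick_ifindex` is a hypothetical name, and the base index 3 is simply taken from the commit's example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: maps a flow hash to one of eight ifindexes, 3..10.
 * Only the low three bits of the hash take part in the selection.
 */
static unsigned int pick_ifindex(uint32_t hash)
{
	return 3 + (hash & 7);
}
```

Packets of the same flow carry the same hash and therefore always land on the same index, which is what makes the raw hash usable as a demuxer.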
  8. 01 Aug, 2015 8 commits
  9. 31 Jul, 2015 4 commits
    • net/ipv6: add sysctl option accept_ra_min_hop_limit · 8013d1d7
      Committed by Hangbin Liu
      Commit 6fd99094 ("ipv6: Don't reduce hop limit for an interface")
      stopped accepting a hop limit from an RA if it is smaller than the
      current hop limit, for security reasons. However, this behavior
      departs from the RFC definition.
      
      RFC 4861, 6.3.4.  Processing Received Router Advertisements
         A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
         and Retrans Timer) may contain a value denoting that it is
         unspecified.  In such cases, the parameter should be ignored and the
         host should continue using whatever value it is already using.
      
         If the received Cur Hop Limit value is non-zero, the host SHOULD set
         its CurHopLimit variable to the received value.
      
      So add the sysctl option accept_ra_min_hop_limit to let users choose the
      minimum hop limit value they accept from an RA. The default is 1, to
      meet the RFC standard.
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
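The decision this sysctl introduces can be sketched as a small pure function. This is a simplified model under stated assumptions, not the kernel's RA handling code: `update_hop_limit` and its parameters are hypothetical names, and a received value of 0 is treated as "unspecified" per the RFC text quoted above.

```c
/*
 * Hypothetical model of the hop limit decision.
 * cur:    hop limit the host currently uses
 * ra:     Cur Hop Limit received in the Router Advertisement
 * min_hl: value of accept_ra_min_hop_limit
 * Returns the hop limit the host should use afterwards.
 */
static int update_hop_limit(int cur, int ra, int min_hl)
{
	if (ra == 0)		/* unspecified: keep the current value */
		return cur;
	if (ra < min_hl)	/* below the configured minimum: ignore */
		return cur;
	return ra;		/* acceptable: adopt the advertised value */
}
```

With the default minimum of 1, every non-zero advertised value is adopted, which matches the RFC's SHOULD. Raising the minimum to the current hop limit, for example 64, makes the host ignore any attempt to lower it.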
    • net: sched: fix refcount imbalance in actions · 28e6b67f
      Committed by Daniel Borkmann
      Since commit 55334a5d ("net_sched: act: refuse to remove bound action
      outside"), we end up with a wrong reference count for a tc action.
      
      Test case 1:
      
        FOO="1,6 0 0 4294967295,"
        BAR="1,6 0 0 4294967294,"
        tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
           action bpf bytecode "$FOO"
        tc actions show action bpf
          action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
          index 1 ref 1 bind 1
        tc actions replace action bpf bytecode "$BAR" index 1
        tc actions show action bpf
          action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
          index 1 ref 2 bind 1
        tc actions replace action bpf bytecode "$FOO" index 1
        tc actions show action bpf
          action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
          index 1 ref 3 bind 1
      
      Test case 2:
      
        FOO="1,6 0 0 4294967295,"
        tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
        tc actions show action gact
          action order 0: gact action pass
          random type none pass val 0
           index 1 ref 1 bind 1
        tc actions add action drop index 1
          RTNETLINK answers: File exists [...]
        tc actions show action gact
          action order 0: gact action pass
           random type none pass val 0
           index 1 ref 2 bind 1
        tc actions add action drop index 1
          RTNETLINK answers: File exists [...]
        tc actions show action gact
          action order 0: gact action pass
           random type none pass val 0
           index 1 ref 3 bind 1
      
      What happens is that in tcf_hash_check(), we check tcf_common for a given
      index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've
      found an existing action. Now there are the following cases:
      
        1) We do a late binding of an action. In that case, we leave the
           tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init()
           handler. This is correctly handled.
      
        2) We replace the given action, or we try to add one without replacing
           and find out that the action at a specific index already exists
           (thus, we go out with error in that case).
      
      In case of 2), we have to undo the reference count increase from
      tcf_hash_check() in the tcf_hash_release() function. Currently, we fail
      to do so because of the 'tcfc_bindcnt > 0' check which bails out early
      with an -EPERM error.
      
      Now, while commit 55334a5d prevents 'tc actions del action ...' on an
      already classifier-bound action to drop the reference count (which could
      then become negative, wrap around etc), this restriction only accounts for
      invocations outside a specific action's ->init() handler.
      
      One possible solution is to add a flag, so that we trigger the -EPERM
      only in situations where it is indeed relevant.
      
      After the patch, above test cases have correct reference count again.
      
      Fixes: 55334a5d ("net_sched: act: refuse to remove bound action outside")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
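The imbalance and the flag-based fix described in this commit can be modelled with two counters. The sketch below is a deliberately simplified, hypothetical model (`struct action`, `action_check`, `action_release`, `replace_existing` and the `strict` parameter are illustrative names, not the kernel's code): the lookup takes a reference, and the release only refuses bound actions with -EPERM when `strict` is set.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified, hypothetical model of a tc action's counters. */
struct action {
	int refcnt;
	int bindcnt;
};

/* Models the lookup: finding an existing action takes a reference. */
static void action_check(struct action *a, bool bind)
{
	a->refcnt++;
	if (bind)
		a->bindcnt++;
}

/*
 * Models the release. Only a strict caller (an explicit delete from
 * outside) is refused with -EPERM while the action is still bound.
 */
static int action_release(struct action *a, bool bind, bool strict)
{
	if (bind)
		a->bindcnt--;
	else if (strict && a->bindcnt > 0)
		return -EPERM;
	a->refcnt--;
	return 0;
}

/*
 * Models a replace (or failed add) on an existing action: the reference
 * taken by the lookup must be dropped again. Returns the resulting refcnt.
 */
static int replace_existing(struct action *a, bool strict)
{
	action_check(a, false);
	action_release(a, false, strict);
	return a->refcnt;
}
```

Starting from `ref 1 bind 1`, a strict release inside the replace path bails out early and leaves the count at 2, which is the growth seen in the test cases above. A non-strict release brings it back to 1.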
    • bpf: also show process name/pid in bpf_jit_dump · b13138ef
      Committed by Daniel Borkmann
      It can be useful for testing to see the actual process/pid that is loading
      a given filter. I was running some BPF test program and noticed unusual
      filter loads from time to time, triggered by some other application in the
      background. bpf_jit_disasm is still working after this change.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: provide helper that indicates eBPF was migrated · 7b36f929
      Committed by Daniel Borkmann
      During recent discussions we had with Michael, we found that it would
      be useful to have an indicator that tells the JIT that an eBPF program
      had been migrated from classic instructions into eBPF instructions, as
      only in that case A and X need to be cleared in the prologue. Such eBPF
      programs do not set a particular type, but all have BPF_PROG_TYPE_UNSPEC.
      Thus, introduce a small helper for cde66c2d ("s390/bpf: Only clear
      A and X for converted BPF programs") and possibly others in future.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
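Since migrated programs are described as all carrying BPF_PROG_TYPE_UNSPEC, such a helper can be very small. The following is a user-space sketch with stand-in types: `enum prog_type`, `struct prog` and `prog_was_classic` are hypothetical names for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

/* Stand-in for the kernel's program types; only UNSPEC matters here. */
enum prog_type {
	PROG_TYPE_UNSPEC,
	PROG_TYPE_SOCKET_FILTER,
	PROG_TYPE_SCHED_CLS,
};

/* Stand-in for the program descriptor a JIT would inspect. */
struct prog {
	enum prog_type type;
};

/* True if the program was converted from classic BPF instructions. */
static bool prog_was_classic(const struct prog *p)
{
	return p->type == PROG_TYPE_UNSPEC;
}
```

A JIT prologue could consult such a check to decide whether A and X need to be cleared, instead of clearing them for every program.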
    • netfilter: bridge: reduce nf_bridge_info to 32 bytes again · 72b1e5e4
      Committed by Florian Westphal
      We can use union for most of the temporary cruft (original ipv4/ipv6
      address, source mac, physoutdev) since they're used during different
      stages of br netfilter traversal.
      
      Also get rid of the last two ->mask users.
      
      Shrinks the struct from 48 to 32 bytes on 64-bit architectures.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
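The space saving comes from overlaying fields that are never live at the same time. The sketch below shows the general technique with made-up structs: the field names loosely follow the commit text, but the layouts are hypothetical and do not reproduce the real nf_bridge_info or its 48 and 32 byte sizes.

```c
#include <stdint.h>

/* Hypothetical "before": every temporary field has its own storage. */
struct flat_info {
	uint32_t ipv4_daddr;
	uint8_t  ipv6_daddr[16];
	uint8_t  neigh_header[8];
	void    *physoutdev;
};

/*
 * Hypothetical "after": the fields share storage, which is safe only
 * because each one is used in a different stage of the traversal.
 */
struct packed_info {
	union {
		uint32_t ipv4_daddr;
		uint8_t  ipv6_daddr[16];
		uint8_t  neigh_header[8];
		void    *physoutdev;
	} u;
};
```

The union is as large as its largest member, so `sizeof(struct packed_info)` is smaller than `sizeof(struct flat_info)`. The price is that the code must never read one member after writing another.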
    • netfilter: nf_ct_sctp: minimal multihoming support · d7ee3519
      Committed by Michal Kubeček
      Currently nf_conntrack_proto_sctp module handles only packets between
      primary addresses used to establish the connection. Any packets between
      secondary addresses are classified as invalid so that usual firewall
      configurations drop them. Allowing HEARTBEAT and HEARTBEAT-ACK chunks to
      establish a new conntrack would allow traffic between secondary
      addresses to pass through. A more sophisticated solution based on the
      addresses advertised in the initial handshake (and possibly also later
      dynamic address addition and removal) would be much harder to implement.
      Moreover, in general we cannot assume to always see the initial
      handshake as it can be routed through a different path.
      
      The patch adds two new conntrack states:
      
        SCTP_CONNTRACK_HEARTBEAT_SENT  - a HEARTBEAT chunk seen but not acked
        SCTP_CONNTRACK_HEARTBEAT_ACKED - a HEARTBEAT acked by HEARTBEAT-ACK
      
      State transition rules:
      
      - HEARTBEAT_SENT responds to usual chunks the same way as NONE (so that
        the behaviour changes as little as possible)
      - HEARTBEAT_ACKED responds to usual chunks the same way as ESTABLISHED
        does, except the resulting state is HEARTBEAT_ACKED rather than
        ESTABLISHED
      - previously existing states except NONE are preserved when HEARTBEAT or
        HEARTBEAT-ACK is seen
      - NONE (in the initial direction) changes to HEARTBEAT_SENT on HEARTBEAT
        and to CLOSED on HEARTBEAT-ACK
      - HEARTBEAT_SENT changes to HEARTBEAT_ACKED on HEARTBEAT-ACK in the
        reply direction
      - HEARTBEAT_SENT and HEARTBEAT_ACKED are preserved on HEARTBEAT and
        HEARTBEAT-ACK otherwise
      
      Normally, vtag is set from the INIT chunk for the reply direction and
      from the INIT-ACK chunk for the originating direction (i.e. each of
      these defines vtag value for the opposite direction). For secondary
      conntracks, we can't rely on seeing INIT/INIT-ACK and even if we have
      seen them, we would need to connect two different conntracks. Therefore
      simplified logic is applied: vtag of first packet in each direction
      (HEARTBEAT in the originating and HEARTBEAT-ACK in reply direction) is
      saved and all following packets in that direction are compared with this
      saved value. While INIT and INIT-ACK define vtag for the opposite
      direction, vtags extracted from HEARTBEAT and HEARTBEAT-ACK are always
      for their direction.
      
      Default timeout values for new states are
      
        HEARTBEAT_SENT: 30 seconds (default hb_interval)
        HEARTBEAT_ACKED: 210 seconds (hb_interval * path_max_retry + max_rto)
      
      (We cannot expect to see the shutdown sequence, so unlike ESTABLISHED,
      the HEARTBEAT_ACKED timeout shouldn't be too long.)
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
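The HEARTBEAT-related transition rules listed in this commit can be written down as a small state function. This is a sketch of the rules as stated in the message, not the kernel's transition table: all identifiers are hypothetical, only HEARTBEAT and HEARTBEAT-ACK chunks are modelled, and the NONE state in the reply direction (which the message does not cover) is simply left unchanged. The timeout helper assumes path_max_retry = 5 and max_rto = 60 seconds, the defaults suggested by RFC 4960, which reproduce the 210 seconds quoted above.

```c
enum ct_state {
	CT_NONE,
	CT_ESTABLISHED,
	CT_CLOSED,
	CT_HEARTBEAT_SENT,
	CT_HEARTBEAT_ACKED,
};

enum chunk { CHUNK_HEARTBEAT, CHUNK_HEARTBEAT_ACK };
enum dir { DIR_ORIGINAL, DIR_REPLY };

/* Next conntrack state after seeing a HEARTBEAT or HEARTBEAT-ACK chunk. */
static enum ct_state next_state(enum ct_state cur, enum chunk c, enum dir d)
{
	switch (cur) {
	case CT_NONE:
		if (d != DIR_ORIGINAL)
			return cur;	/* not specified in the message */
		return c == CHUNK_HEARTBEAT ? CT_HEARTBEAT_SENT : CT_CLOSED;
	case CT_HEARTBEAT_SENT:
		if (c == CHUNK_HEARTBEAT_ACK && d == DIR_REPLY)
			return CT_HEARTBEAT_ACKED;
		return cur;		/* preserved otherwise */
	default:
		return cur;		/* all other states are preserved */
	}
}

/* HEARTBEAT_ACKED timeout: hb_interval * path_max_retry + max_rto. */
static unsigned int heartbeat_acked_timeout(unsigned int hb_interval,
					    unsigned int path_max_retry,
					    unsigned int max_rto)
{
	return hb_interval * path_max_retry + max_rto;
}
```

For example, a new conntrack that sees a HEARTBEAT in the original direction moves to HEARTBEAT_SENT, and a HEARTBEAT-ACK in the reply direction then moves it to HEARTBEAT_ACKED. `heartbeat_acked_timeout(30, 5, 60)` gives the 210 second default.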
    • lwtunnel: Make lwtun_encaps[] static · 92a99bf3
      Committed by Thomas Graf
      Any external user should use the registration API instead of
      accessing this directly.
      
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Recompute sk_txhash on negative routing advice · 265f94ff
      Committed by Tom Herbert
      When a connection is failing, a transport protocol calls
      dst_negative_advice to try to get a better route. This patch also
      changes the sk_txhash in that function. This provides a rudimentary
      method to try to find a different path in the network since sk_txhash
      affects ECMP on the local host and through the network (via flow labels
      or UDP source port in encapsulation).
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Set sk_txhash from a random number · 877d1f62
      Committed by Tom Herbert
      This patch creates sk_set_txhash and eliminates the protocol-specific
      inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
      random number instead of performing flow dissection. sk_set_txhash
      is also allowed to be called multiple times for the same socket;
      we'll need this when redoing the hash for negative routing advice.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
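A user-space sketch of the idea behind this commit and the negative-advice commit listed above it: the transmit hash is just a random number, so it can be redrawn at any time. `struct sock_sketch`, `set_txhash` and `negative_advice` are hypothetical names, `rand()` stands in for the kernel's random source, and mapping 0 to 1 so that zero can mean "no hash set" is an assumption of this sketch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the socket; only the transmit hash is kept. */
struct sock_sketch {
	uint32_t txhash;
};

/* Draws a fresh random hash; may be called repeatedly for one socket. */
static void set_txhash(struct sock_sketch *sk)
{
	sk->txhash = (uint32_t)rand();
	if (sk->txhash == 0)	/* assumption: keep 0 free to mean "unset" */
		sk->txhash = 1;
}

/*
 * Models negative routing advice: a new hash may steer the flow onto a
 * different ECMP path, locally and further along in the network.
 */
static void negative_advice(struct sock_sketch *sk)
{
	set_txhash(sk);
}
```

Because no flow dissection is involved, the same function serves every protocol. A new draw is not guaranteed to select a different path, which is why the commit message calls the method rudimentary.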
    • drm/amdgpu: Drop drm/ prefix for including drm.h in amdgpu_drm.h · b3fcf36a
      Committed by Michel Dänzer
      This allows amdgpu_drm.h to be reused verbatim in libdrm.
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
    • drm/radeon: Drop drm/ prefix for including drm.h in radeon_drm.h · e13af53e
      Committed by Michel Dänzer
      This allows radeon_drm.h to be reused verbatim in libdrm.
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
  11. 29 Jul, 2015 1 commit