1. 07 8月, 2015 3 次提交
  2. 04 8月, 2015 1 次提交
  3. 03 8月, 2015 1 次提交
    • D
      ebpf: add skb->hash to offset map for usage in {cls, act}_bpf or filters · ba7591d8
      Daniel Borkmann 提交于
      Add skb->hash to the __sk_buff offset map, so it can be accessed from
      an eBPF program. We currently already do this for classic BPF filters,
      but not yet on eBPF, it might be useful as a demuxer in combination with
      helpers like bpf_clone_redirect(), toy example:
      
        __section("cls-lb") int ingress_main(struct __sk_buff *skb)
        {
          unsigned int which = 3 + (skb->hash & 7);
          /* bpf_skb_store_bytes(skb, ...); */
          /* bpf_l{3,4}_csum_replace(skb, ...); */
          bpf_clone_redirect(skb, which, 0);
          return -1;
        }
      
      I was thinking whether to add skb_get_hash(), but then concluded the
      raw skb->hash seems fine in this case: we can directly access the hash
      w/o extra eBPF helper function call, it's filled out by many NICs on
      ingress, and in case the entropy level would not be sufficient, people
      can still implement their own specific sw fallback hash mix anyway.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba7591d8
  4. 01 8月, 2015 8 次提交
  5. 31 7月, 2015 4 次提交
    • H
      net/ipv6: add sysctl option accept_ra_min_hop_limit · 8013d1d7
      Hangbin Liu 提交于
      Commit 6fd99094 ("ipv6: Don't reduce hop limit for an interface")
      disabled accept hop limit from RA if it is smaller than the current hop
      limit for security stuff. But this behavior kind of break the RFC definition.
      
      RFC 4861, 6.3.4.  Processing Received Router Advertisements
         A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
         and Retrans Timer) may contain a value denoting that it is
         unspecified.  In such cases, the parameter should be ignored and the
         host should continue using whatever value it is already using.
      
         If the received Cur Hop Limit value is non-zero, the host SHOULD set
         its CurHopLimit variable to the received value.
      
      So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
      hop limit value they can accept from RA. And set default to 1 to meet RFC
      standards.
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8013d1d7
    • D
      net: sched: fix refcount imbalance in actions · 28e6b67f
      Daniel Borkmann 提交于
      Since commit 55334a5d ("net_sched: act: refuse to remove bound action
      outside"), we end up with a wrong reference count for a tc action.
      
      Test case 1:
      
        FOO="1,6 0 0 4294967295,"
        BAR="1,6 0 0 4294967294,"
        tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \
           action bpf bytecode "$FOO"
        tc actions show action bpf
          action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
          index 1 ref 1 bind 1
        tc actions replace action bpf bytecode "$BAR" index 1
        tc actions show action bpf
          action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe
          index 1 ref 2 bind 1
        tc actions replace action bpf bytecode "$FOO" index 1
        tc actions show action bpf
          action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe
          index 1 ref 3 bind 1
      
      Test case 2:
      
        FOO="1,6 0 0 4294967295,"
        tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
        tc actions show action gact
          action order 0: gact action pass
          random type none pass val 0
           index 1 ref 1 bind 1
        tc actions add action drop index 1
          RTNETLINK answers: File exists [...]
        tc actions show action gact
          action order 0: gact action pass
           random type none pass val 0
           index 1 ref 2 bind 1
        tc actions add action drop index 1
          RTNETLINK answers: File exists [...]
        tc actions show action gact
          action order 0: gact action pass
           random type none pass val 0
           index 1 ref 3 bind 1
      
      What happens is that in tcf_hash_check(), we check tcf_common for a given
      index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've
      found an existing action. Now there are the following cases:
      
        1) We do a late binding of an action. In that case, we leave the
           tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init()
           handler. This is correctly handeled.
      
        2) We replace the given action, or we try to add one without replacing
           and find out that the action at a specific index already exists
           (thus, we go out with error in that case).
      
      In case of 2), we have to undo the reference count increase from
      tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to
      do so because of the 'tcfc_bindcnt > 0' check which bails out early with
      an -EPERM error.
      
      Now, while commit 55334a5d prevents 'tc actions del action ...' on an
      already classifier-bound action to drop the reference count (which could
      then become negative, wrap around etc), this restriction only accounts for
      invocations outside a specific action's ->init() handler.
      
      One possible solution would be to add a flag thus we possibly trigger
      the -EPERM ony in situations where it is indeed relevant.
      
      After the patch, above test cases have correct reference count again.
      
      Fixes: 55334a5d ("net_sched: act: refuse to remove bound action outside")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28e6b67f
    • D
      bpf: also show process name/pid in bpf_jit_dump · b13138ef
      Daniel Borkmann 提交于
      It can be useful for testing to see the actual process/pid who is loading
      a given filter. I was running some BPF test program and noticed unusual
      filter loads from time to time, triggered by some other application in the
      background. bpf_jit_disasm is still working after this change.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b13138ef
    • D
      bpf: provide helper that indicates eBPF was migrated · 7b36f929
      Daniel Borkmann 提交于
      During recent discussions we had with Michael, we found that it would
      be useful to have an indicator that tells the JIT that an eBPF program
      had been migrated from classic instructions into eBPF instructions, as
      only in that case A and X need to be cleared in the prologue. Such eBPF
      programs do not set a particular type, but all have BPF_PROG_TYPE_UNSPEC.
      Thus, introduce a small helper for cde66c2d ("s390/bpf: Only clear
      A and X for converted BPF programs") and possibly others in future.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b36f929
  6. 30 7月, 2015 7 次提交
    • F
      netfilter: bridge: reduce nf_bridge_info to 32 bytes again · 72b1e5e4
      Florian Westphal 提交于
      We can use union for most of the temporary cruft (original ipv4/ipv6
      address, source mac, physoutdev) since they're used during different
      stages of br netfilter traversal.
      
      Also get rid of the last two ->mask users.
      
      Shrinks struct from 48 to 32 on 64bit arch.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      72b1e5e4
    • M
      netfilter: nf_ct_sctp: minimal multihoming support · d7ee3519
      Michal Kubeček 提交于
      Currently nf_conntrack_proto_sctp module handles only packets between
      primary addresses used to establish the connection. Any packets between
      secondary addresses are classified as invalid so that usual firewall
      configurations drop them. Allowing HEARTBEAT and HEARTBEAT-ACK chunks to
      establish a new conntrack would allow traffic between secondary
      addresses to pass through. A more sophisticated solution based on the
      addresses advertised in the initial handshake (and possibly also later
      dynamic address addition and removal) would be much harder to implement.
      Moreover, in general we cannot assume to always see the initial
      handshake as it can be routed through a different path.
      
      The patch adds two new conntrack states:
      
        SCTP_CONNTRACK_HEARTBEAT_SENT  - a HEARTBEAT chunk seen but not acked
        SCTP_CONNTRACK_HEARTBEAT_ACKED - a HEARTBEAT acked by HEARTBEAT-ACK
      
      State transition rules:
      
      - HEARTBEAT_SENT responds to usual chunks the same way as NONE (so that
        the behaviour changes as little as possible)
      - HEARTBEAT_ACKED responds to usual chunks the same way as ESTABLISHED
        does, except the resulting state is HEARTBEAT_ACKED rather than
        ESTABLISHED
      - previously existing states except NONE are preserved when HEARTBEAT or
        HEARTBEAT-ACK is seen
      - NONE (in the initial direction) changes to HEARTBEAT_SENT on HEARTBEAT
        and to CLOSED on HEARTBEAT-ACK
      - HEARTBEAT_SENT changes to HEARTBEAT_ACKED on HEARTBEAT-ACK in the
        reply direction
      - HEARTBEAT_SENT and HEARTBEAT_ACKED are preserved on HEARTBEAT and
        HEARTBEAT-ACK otherwise
      
      Normally, vtag is set from the INIT chunk for the reply direction and
      from the INIT-ACK chunk for the originating direction (i.e. each of
      these defines vtag value for the opposite direction). For secondary
      conntracks, we can't rely on seeing INIT/INIT-ACK and even if we have
      seen them, we would need to connect two different conntracks. Therefore
      simplified logic is applied: vtag of first packet in each direction
      (HEARTBEAT in the originating and HEARTBEAT-ACK in reply direction) is
      saved and all following packets in that direction are compared with this
      saved value. While INIT and INIT-ACK define vtag for the opposite
      direction, vtags extracted from HEARTBEAT and HEARTBEAT-ACK are always
      for their direction.
      
      Default timeout values for new states are
      
        HEARTBEAT_SENT: 30 seconds (default hb_interval)
        HEARTBEAT_ACKED: 210 seconds (hb_interval * path_max_retry + max_rto)
      
      (We cannot expect to see the shutdown sequence so that, unlike
      ESTABLISHED, the HEARTBEAT_ACKED timeout shouldn't be too long.)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d7ee3519
    • T
      lwtunnel: Make lwtun_encaps[] static · 92a99bf3
      Thomas Graf 提交于
      Any external user should use the registration API instead of
      accessing this directly.
      
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92a99bf3
    • T
      net: Recompute sk_txhash on negative routing advice · 265f94ff
      Tom Herbert 提交于
      When a connection is failing a transport protocol calls
      dst_negative_advice to try to get a better route. This patch includes
      changing the sk_txhash in that function. This provides a rudimentary
      method to try to find a different path in the network since sk_txhash
      affects ECMP on the local host and through the network (via flow labels
      or UDP source port in encapsulation).
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      265f94ff
    • T
      net: Set sk_txhash from a random number · 877d1f62
      Tom Herbert 提交于
      This patch creates sk_set_txhash and eliminates protocol specific
      inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
      random number instead of performing flow dissection. sk_set_txash
      is also allowed to be called multiple times for the same socket,
      we'll need this when redoing the hash for negative routing advice.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      877d1f62
    • M
      drm/amdgpu: Drop drm/ prefix for including drm.h in amdgpu_drm.h · b3fcf36a
      Michel Dänzer 提交于
      This allows amdgpu_drm.h to be reused verbatim in libdrm.
      Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
      Reviewed-by: NChristian König <christian.koenig@amd.com>
      Signed-off-by: NMichel Dänzer <michel.daenzer@amd.com>
      b3fcf36a
    • M
      drm/radeon: Drop drm/ prefix for including drm.h in radeon_drm.h · e13af53e
      Michel Dänzer 提交于
      This allows radeon_drm.h to be reused verbatim in libdrm.
      Reviewed-by: NChristian König <christian.koenig@amd.com>
      Signed-off-by: NMichel Dänzer <michel.daenzer@amd.com>
      e13af53e
  7. 29 7月, 2015 2 次提交
  8. 28 7月, 2015 4 次提交
    • R
      cpufreq: Avoid attempts to create duplicate symbolic links · 559ed407
      Rafael J. Wysocki 提交于
      After commit 87549141 (cpufreq: Stop migrating sysfs files on
      hotplug) there is a problem with CPUs that share cpufreq policy
      objects with other CPUs and are initially offline.
      
      Say CPU1 shares a policy with CPU0 which is online and is registered
      first.  As part of the registration process, cpufreq_add_dev() is
      called for it.  It creates the policy object and a symbolic link
      to it from the CPU1's sysfs directory.  If CPU1 is registered
      subsequently and it is offline at that time, cpufreq_add_dev() will
      attempt to create a symbolic link to the policy object for it, but
      that link is present already, so a warning about that will be
      triggered.
      
      To avoid that warning, make cpufreq use an additional CPU mask
      containing related CPUs that are actually present for each policy
      object.  That mask is initialized when the policy object is populated
      after its creation (for the first online CPU using it) and it includes
      CPUs from the "policy CPUs" mask returned by the cpufreq driver's
      ->init() callback that are physically present at that time.  Symbolic
      links to the policy are created only for the CPUs in that mask.
      
      If cpufreq_add_dev() is invoked for an offline CPU, it checks the
      new mask and only creates the symlink if the CPU was not in it (the
      CPU is added to the mask at the same time).
      
      In turn, cpufreq_remove_dev() drops the given CPU from the new mask,
      removes its symlink to the policy object and returns, unless it is
      the CPU owning the policy object.  In that case, the policy object
      is moved to a new CPU's sysfs directory or deleted if the CPU being
      removed was the last user of the policy.
      
      While at it, notice that cpufreq_remove_dev() can't fail, because
      its return value is ignored, so make it ignore return values from
      __cpufreq_remove_dev_prepare() and __cpufreq_remove_dev_finish()
      and prevent these functions from aborting on errors returned by
      __cpufreq_governor().  Also drop the now unused sif argument from
      them.
      
      Fixes: 87549141 (cpufreq: Stop migrating sysfs files on hotplug)
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reported-and-tested-by: NRussell King <linux@arm.linux.org.uk>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      559ed407
    • H
      net/mlx4_en: Add support for hardware accelerated 802.1ad vlan · e38af4fa
      Hadar Hen Zion 提交于
      To enable device support in accelerated 802.1ad vlan, the port
      capability "packet has vlan enable" (phv_en) should be set.
      Firmware won't work properly, in case phv_en is not set.
      
      The user can enable "phv_en" port capability with the new ethtool
      private flag phv-bit. The phv-bit private flag default value is OFF,
      users who are interested in 802.1ad hardware acceleration should turn ON
      the phv-bit private flag:
      $ ethtool --set-priv-flags eth1 phv-bit on
      
      Once the private flag is set, the device is ready for 802.1ad vlan
      acceleration.
      
      The user should also change the interface device features and turn on
      "tx-vlan-stag-hw-insert" which is off by default:
      $ ethtool -K eth1  tx-vlan-stag-hw-insert on
      
      "phv-bit" private flag setting is available only for Physical
      Functions(PF), the Virtual Function (VF) will be able to use the feature
      by setting "tx-vlan-stag-hw-insert" ethtool device feature only if the
      feature was enabled by the Hypervisor.
      Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e38af4fa
    • H
      net/mlx4: Prepare VLAN macros for 802.1ad Hardware accelerated support · e802f8e4
      Hadar Hen Zion 提交于
      To add Hardware accelerated support in 802.1ad vlan, replace
      Current VLAN macros to CVLAN.
      Replace:
      MLX4_WQE_CTRL_INS_VLAN
      MLX4_CQE_VLAN_PRESENT_MASK
      With:
      MLX4_WQE_CTRL_INS_CVLAN
      MLX4_CQE_CVLAN_PRESENT_MASK
      Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e802f8e4
    • H
      net/mlx4_core: Preparations for 802.1ad VLAN support · 77fc29c4
      Hadar Hen Zion 提交于
      mlx4_core preparation to support hardware accelerated 802.1ad VLAN
      device.
      
      To allow 802.1ad accelerated device, "packet has vlan" (phv)
      Firmware capability should be available. Firmware without the
      phv capability won't behave properly and can't support 802.1ad device
      acceleration.
      
      The driver checks the Firmware capability and sets the phv bit
      accordingly in SET_PORT command.
      Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77fc29c4
  9. 27 7月, 2015 10 次提交