1. 02 6月, 2014 5 次提交
    • D
      net: filter: improve filter block macros · f8f6d679
      Daniel Borkmann 提交于
      Commit 9739eef1 ("net: filter: make BPF conversion more readable")
      started to introduce helper macros similar to BPF_STMT()/BPF_JUMP()
      macros from classic BPF.
      
      However, quite some statements in the filter conversion functions
      remained in the old style which gives a mixture of block macros and
      non block macros in the code. This patch makes the block macros itself
      more readable by using explicit member initialization, and converts
      the remaining ones where possible to remain in a more consistent state.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8f6d679
    • D
      net: filter: get rid of BPF_S_* enum · 34805931
      Daniel Borkmann 提交于
      This patch finally allows us to get rid of the BPF_S_* enum.
      Currently, the code performs unnecessary encode and decode
      workarounds in seccomp and filter migration itself when a filter
      is being attached in order to overcome BPF_S_* encoding which
      is not used anymore by the new interpreter resp. JIT compilers.
      
      Keeping it around would mean that also in future we would need
      to extend and maintain this enum and related encoders/decoders.
      We can get rid of all that and save us these operations during
      filter attaching. Naturally, also JIT compilers need to be updated
      by this.
      
      Before JIT conversion is being done, each compiler checks if A
      is being loaded at startup to obtain information if it needs to
      emit instructions to clear A first. Since BPF extensions are a
      subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements
      for extensions can be removed at that point. To ease and minimalize
      code changes in the classic JITs, we have introduced bpf_anc_helper().
      
      Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int),
      arm (JIT, int), i368 (int), ppc64 (JIT, int); for sparc we
      unfortunately didn't have access, but changes are analogous to
      the rest.
      
      Joint work with Alexei Starovoitov.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mircea Gherzan <mgherzan@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Acked-by: NChema Gonzalez <chemag@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34805931
    • D
      net: Revert mlx4 cpumask changes. · ee39facb
      David S. Miller 提交于
      This reverts commit 70a640d0
      ("net/mlx4_en: Use affinity hint") and commit
      c8865b64 ("cpumask: Utility function
      to set n'th cpu - local cpu first") because these changes break
      the build when SMP is disabled amongst other things.
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee39facb
    • Y
      net/mlx4_en: Use affinity hint · 70a640d0
      Yuval Atias 提交于
      The “affinity hint” mechanism is used by the user space
      daemon, irqbalancer, to indicate a preferred CPU mask for irqs.
      Irqbalancer can use this hint to balance the irqs between the
      cpus indicated by the mask.
      
      We wish the HCA to preferentially map the IRQs it uses to numa cores
      close to it.  To accomplish this, we use cpumask_set_cpu_local_first(), that
      sets the affinity hint according the following policy:
      First it maps IRQs to “close” numa cores.  If these are exhausted, the
      remaining IRQs are mapped to “far” numa cores.
      Signed-off-by: NYuval Atias <yuvala@mellanox.com>
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      70a640d0
    • A
      cpumask: Utility function to set n'th cpu - local cpu first · c8865b64
      Amir Vadai 提交于
      This function sets the n'th cpu - local cpu's first.
      For example: in a 16 cores server with even cpu's local, will get the
      following values:
      cpumask_set_cpu_local_first(0, numa, cpumask) => cpu 0 is set
      cpumask_set_cpu_local_first(1, numa, cpumask) => cpu 2 is set
      ...
      cpumask_set_cpu_local_first(7, numa, cpumask) => cpu 14 is set
      cpumask_set_cpu_local_first(8, numa, cpumask) => cpu 1 is set
      cpumask_set_cpu_local_first(9, numa, cpumask) => cpu 3 is set
      ...
      cpumask_set_cpu_local_first(15, numa, cpumask) => cpu 15 is set
      
      Curently this function will be used by multi queue networking devices to
      calculate the irq affinity mask, such that as many local cpu's as
      possible will be utilized to handle the mq device irq's.
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8865b64
  2. 30 5月, 2014 1 次提交
  3. 24 5月, 2014 8 次提交
    • D
      net: filter: let unattached filters use sock_fprog_kern · b1fcd35c
      Daniel Borkmann 提交于
      The sk_unattached_filter_create() API is used by BPF filters that
      are not directly attached or related to sockets, and are used in
      team, ptp, xt_bpf, cls_bpf, etc. As such all users do their own
      internal managment of obtaining filter blocks and thus already
      have them in kernel memory and set up before calling into
      sk_unattached_filter_create(). As a result, due to __user annotation
      in sock_fprog, sparse triggers false positives (incorrect type in
      assignment [different address space]) when filters are set up before
      passing them to sk_unattached_filter_create(). Therefore, let
      sk_unattached_filter_create() API use sock_fprog_kern to overcome
      this issue.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1fcd35c
    • D
      net: filter: remove DL macro · 8556ce79
      Daniel Borkmann 提交于
      Lets get rid of this macro. After commit 5bcfedf0 ("net: filter:
      simplify label names from jump-table"), labels have become more
      readable due to omission of BPF_ prefix but at the same time more
      generic, so that things like `git grep -n` would not find them. As
      a middle path, lets get rid of the DL macro as it's not strictly
      needed and would otherwise just hide the full name.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8556ce79
    • T
      l2tp: Add support for zero IPv6 checksums · 6b649fea
      Tom Herbert 提交于
      Added new L2TP configuration options to allow TX and RX of
      zero checksums in IPv6. Default is not to use them.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b649fea
    • T
      net: Make enabling of zero UDP6 csums more restrictive · 1c19448c
      Tom Herbert 提交于
      RFC 6935 permits zero checksums to be used in IPv6 however this is
      recommended only for certain tunnel protocols, it does not make
      checksums completely optional like they are in IPv4.
      
      This patch restricts the use of IPv6 zero checksums that was previously
      intoduced. no_check6_tx and no_check6_rx have been added to control
      the use of checksums in UDP6 RX and TX path. The normal
      sk_no_check_{rx,tx} settings are not used (this avoids ambiguity when
      dealing with a dual stack socket).
      
      A helper function has been added (udp_set_no_check6) which can be
      called by tunnel impelmentations to all zero checksums (send on the
      socket, and accept them as valid).
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c19448c
    • T
      net: Split sk_no_check into sk_no_check_{rx,tx} · 28448b80
      Tom Herbert 提交于
      Define separate fields in the sock structure for configuring disabling
      checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx.
      The SO_NO_CHECK socket option only affects sk_no_check_tx. Also,
      removed UDP_CSUM_* defines since they are no longer necessary.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28448b80
    • T
      net: Eliminate no_check from protosw · b26ba202
      Tom Herbert 提交于
      It doesn't seem like an protocols are setting anything other
      than the default, and allowing to arbitrarily disable checksums
      for a whole protocol seems dangerous. This can be done on a per
      socket basis.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b26ba202
    • S
      net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool. · ed616689
      Sucheta Chakraborty 提交于
      o min_tx_rate puts lower limit on the VF bandwidth. VF is guaranteed
        to have a bandwidth of at least this value.
        max_tx_rate puts cap on the VF bandwidth. VF can have a bandwidth
        of up to this value.
      
      o A new handler set_vf_rate for attr IFLA_VF_RATE has been introduced
        which takes 4 arguments:
        netdev, VF number, min_tx_rate, max_tx_rate
      
      o ndo_set_vf_rate replaces ndo_set_vf_tx_rate handler.
      
      o Drivers that currently implement ndo_set_vf_tx_rate should now call
        ndo_set_vf_rate instead and reject attempt to set a minimum bandwidth
        greater than 0 for IFLA_VF_TX_RATE when IFLA_VF_RATE is not yet
        implemented by driver.
      
      o If user enters only one of either min_tx_rate or max_tx_rate, then,
        userland should read back the other value from driver and set both
        for IFLA_VF_RATE.
        Drivers that have not yet implemented IFLA_VF_RATE should always
        return min_tx_rate as 0 when read from ip tool.
      
      o If both IFLA_VF_TX_RATE and IFLA_VF_RATE options are specified, then
        IFLA_VF_RATE should override.
      
      o Idea is to have consistent display of rate values to user.
      
      o Usage example: -
      
        ./ip link set p4p1 vf 0 rate 900
      
        ./ip link show p4p1
        32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
        DEFAULT qlen 1000
          link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
          vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 900 (Mbps), max_tx_rate 900Mbps
          vf 1 MAC f6:c6:7c:3f:3d:6c
          vf 2 MAC 56:32:43:98:d7:71
          vf 3 MAC d6:be:c3:b5:85:ff
          vf 4 MAC ee:a9:9a:1e:19:14
          vf 5 MAC 4a:d0:4c:07:52:18
          vf 6 MAC 3a:76:44:93:62:f9
          vf 7 MAC 82:e9:e7:e3:15:1a
      
        ./ip link set p4p1 vf 0 max_tx_rate 300 min_tx_rate 200
      
        ./ip link show p4p1
        32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
        DEFAULT qlen 1000
          link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
          vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 300 (Mbps), max_tx_rate 300Mbps,
          min_tx_rate 200Mbps
          vf 1 MAC f6:c6:7c:3f:3d:6c
          vf 2 MAC 56:32:43:98:d7:71
          vf 3 MAC d6:be:c3:b5:85:ff
          vf 4 MAC ee:a9:9a:1e:19:14
          vf 5 MAC 4a:d0:4c:07:52:18
          vf 6 MAC 3a:76:44:93:62:f9
          vf 7 MAC 82:e9:e7:e3:15:1a
      
        ./ip link set p4p1 vf 0 max_tx_rate 600 rate 300
      
        ./ip link show p4p1
        32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
        DEFAULT qlen 1000
          link/ether 00:0e:1e:08:b0:f brd ff:ff:ff:ff:ff:ff
          vf 0 MAC 3e:a0:ca:bd:ae:5, tx rate 600 (Mbps), max_tx_rate 600Mbps,
          min_tx_rate 200Mbps
          vf 1 MAC f6:c6:7c:3f:3d:6c
          vf 2 MAC 56:32:43:98:d7:71
          vf 3 MAC d6:be:c3:b5:85:ff
          vf 4 MAC ee:a9:9a:1e:19:14
          vf 5 MAC 4a:d0:4c:07:52:18
          vf 6 MAC 3a:76:44:93:62:f9
          vf 7 MAC 82:e9:e7:e3:15:1a
      Signed-off-by: NSucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed616689
    • M
      wait: swap EXIT_ZOMBIE(Z) and EXIT_DEAD(X) chars in TASK_STATE_TO_CHAR_STR · ad0f614e
      Masatake YAMATO 提交于
      In commit ad86622b ("wait: swap EXIT_ZOMBIE and EXIT_DEAD to hide
      EXIT_TRACE from user-space") the order of task state definitions were
      changed: EXIT_DEAD and EXIT_ZOMBIE were swapped.  Though the charterers
      for the states in TASK_STATE_TO_CHAR_STR string were not updated.  This
      patch synchronizes the string to the order of definitions.
      Signed-off-by: NMasatake YAMATO <yamato@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ad0f614e
  4. 23 5月, 2014 6 次提交
  5. 22 5月, 2014 2 次提交
    • E
      cfg80211: allow RSSI compensation · 67af9811
      Emmanuel Grumbach 提交于
      Channels in 2.4GHz band overlap, this means that if we
      send a probe request on channel 1 and then move to channel
      2, we will hear the probe response on channel 2. In this
      case, the RSSI will be lower than if we had heard it on
      the channel on which it was sent (1 in this case).
      
      The firmware / low level driver can parse the channel in
      the DS IE or HT IE and compensate the RSSI so that it will
      still have a valid value even if we heard the frame on an
      adjacent channel. This can be done up to a certain offset.
      
      Add this offset as a configuration for the low level driver.
      A low level driver that can compensate the low RSSI in this
      case should assign the maximal offset for which the RSSI
      value is still valid.
      Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      67af9811
    • A
      net: filter: cleanup invocation of internal BPF · 5fe821a9
      Alexei Starovoitov 提交于
      Kernel API for classic BPF socket filters is:
      
      sk_unattached_filter_create() - validate classic BPF, convert, JIT
      SK_RUN_FILTER() - run it
      sk_unattached_filter_destroy() - destroy socket filter
      
      Cleanup internal BPF kernel API as following:
      
      sk_filter_select_runtime() - final step of internal BPF creation.
        Try to JIT internal BPF program, if JIT is not available select interpreter
      SK_RUN_FILTER() - run it
      sk_filter_free() - free internal BPF program
      
      Disallow direct calls to BPF interpreter. Execution of the BPF program should
      be done with SK_RUN_FILTER() macro.
      
      Example of internal BPF create, run, destroy:
      
        struct sk_filter *fp;
      
        fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL);
        memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0]));
        fp->len = prog_len;
      
        sk_filter_select_runtime(fp);
      
        SK_RUN_FILTER(fp, ctx);
      
        sk_filter_free(fp);
      
      Sockets, seccomp, testsuite, tracing are using different ways to populate
      sk_filter, so first steps of program creation are not common.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fe821a9
  6. 21 5月, 2014 3 次提交
  7. 20 5月, 2014 6 次提交
  8. 19 5月, 2014 9 次提交