- 25 April 2014, 3 commits
-
-
Submitted by Rostislav Lisovy
Since some frequency bands (e.g. 5.9 GHz) allow channels with only 10 or 5 MHz bandwidth, this patch adds attributes that keep track of this information. When channel attributes are reported to user space, make sure not to break old tools: if the 'split wiphy dump' is enabled, report the extra attributes (if present) describing the bandwidth restrictions; if it is not enabled, completely omit those channels that have the IEEE80211_CHAN_NO_10MHZ or IEEE80211_CHAN_NO_20MHZ flag set. Add a check for the new bandwidth restriction flags in cfg80211_chandef_usable() to comply with the restrictions. Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
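A minimal sketch of how such restriction flags might gate a requested width in cfg80211_chandef_usable(), using the flag names from the commit message; this abbreviates the real control flow rather than quoting the patch:

    /* Sketch: refuse a chandef whose width is prohibited on its channel. */
    static bool chandef_width_allowed(const struct cfg80211_chan_def *chandef)
    {
            u32 prohibited_flags = 0;

            switch (chandef->width) {
            case NL80211_CHAN_WIDTH_10:
                    prohibited_flags |= IEEE80211_CHAN_NO_10MHZ;
                    break;
            case NL80211_CHAN_WIDTH_20:
            case NL80211_CHAN_WIDTH_20_NOHT:
                    prohibited_flags |= IEEE80211_CHAN_NO_20MHZ;
                    break;
            default:
                    break;
            }

            return !(chandef->chan->flags & prohibited_flags);
    }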
-
Submitted by Marek Kwaczynski
Some chips can encrypt management frames in hardware, but require the IV to be generated in the frame. Add a key flag that allows us to achieve this. Signed-off-by: Marek Kwaczynski <marek.kwaczynski@tieto.com> [use BIT(0) to fill that spot, fix indentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Michal Kazior
This patch splits cfg80211_check_combinations() into an iterator function and a simple user of that iterator. This makes it possible for drivers to assess how many channels a given iftype setup can use. This in turn can be used for future multi-interface/multi-channel channel switching. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
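A hedged sketch of driver-side usage of the resulting iterator; the cfg80211_iter_combinations() signature below reflects the API of that era, while the callback body and variable names are illustrative only:

    /* Record the largest number of distinct channels any matching
     * interface combination would permit. */
    static void count_max_channels(const struct ieee80211_iface_combination *c,
                                   void *data)
    {
            u32 *max_channels = data;

            *max_channels = max_t(u32, *max_channels, c->num_different_channels);
    }

    u32 max_channels = 0;

    cfg80211_iter_combinations(wiphy, num_different_channels, radar_detect,
                               iftype_num, count_max_channels, &max_channels);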
-
- 09 April 2014, 11 commits
-
-
Submitted by Rostislav Lisovy
Channels with 5/10 MHz bandwidth are not HT. We have to reflect this in the conf_is_ht() function, which returns whether a particular channel is HT or not. Signed-off-by: Rostislav Lisovy <rostislav.lisovy@fel.cvut.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
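A sketch of the resulting check, close to what mainline ended up with (treat the exact width list as an assumption):

    static inline bool conf_is_ht(struct ieee80211_conf *conf)
    {
            /* 5 MHz, 10 MHz and plain no-HT 20 MHz channels are not HT. */
            return (conf->chandef.width != NL80211_CHAN_WIDTH_5) &&
                   (conf->chandef.width != NL80211_CHAN_WIDTH_10) &&
                   (conf->chandef.width != NL80211_CHAN_WIDTH_20_NOHT);
    }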
-
Submitted by Kalle Valo
Commit a2f73b6c ("cfg80211: move regulatory flags to their own variable") renamed WIPHY_FLAG_CUSTOM_REGULATORY to REGULATORY_CUSTOM_REG, but missed one comment. Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Luciano Coelho
With single-channel drivers, we need to be able to change a running chanctx if we want to use chanctx reservation. Not all drivers may be able to do this, so add a flag that indicates support for it. Changing a running chanctx can also be used as an optimization in multi-channel drivers when the context needs to be reserved for future use. Introduce the IEEE80211_CHANCTX_RESERVED chanctx mode to mark a channel as reserved so nobody else can use it (since we know it's going to change). In the future, we may allow several vifs to use the same reservation as long as they plan to use the chanctx on the same future channel. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Luciano Coelho
Move the counting part of the interface combination check from cfg80211 to mac80211. This is needed to simplify locking when the driver has to perform a combination check by itself (e.g. with channel switching). Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Luciano Coelho
Some interface types don't require DFS (such as STATION, P2P_CLIENT, etc.). In order to centralize these decisions, make cfg80211_chandef_dfs_required() take the iftype into consideration. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
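A hedged sketch of the adjusted helper: client-type interfaces short-circuit to "no DFS required" and everything else falls back to the channel's radar flag. The shape follows the commit message; the exact case list and flag check are assumptions:

    int cfg80211_chandef_dfs_required(struct wiphy *wiphy,
                                      const struct cfg80211_chan_def *chandef,
                                      enum nl80211_iftype iftype)
    {
            switch (iftype) {
            case NL80211_IFTYPE_STATION:
            case NL80211_IFTYPE_P2P_CLIENT:
                    return 0;       /* clients never need DFS themselves */
            default:
                    break;
            }

            return !!(chandef->chan->flags & IEEE80211_CHAN_RADAR);
    }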
-
Submitted by Luciano Coelho
Separate the code that counts the interface types and channels from the code that checks the interface combinations. The new function that checks combinations is exported so it can be called by drivers. This is done in preparation for moving the interface combination checks out of cfg80211. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Ilan Peer
Add the option to hint to the wireless core that it is operating in an indoor environment. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Ilan Peer
Allow GO operation on a channel marked with IEEE80211_CHAN_GO_CONCURRENT iff there is an active station interface that is associated with an AP operating on the same channel in the 2 GHz band, or in the same UNII band in the 5 GHz band. This relaxation is not allowed if the channel is marked with IEEE80211_CHAN_RADAR. Note that this is a permissive approach to the FCC definitions, which require a clear assessment that the device operating the AP is an authorized master, i.e., with radar detection and DFS capabilities. It is assumed that such restrictions are enforced by user space. Furthermore, it is assumed that if the conditions that allowed the GO to operate on such a channel change, i.e., the station interface disconnects from the AP, it is the responsibility of user space to evacuate the GO from the channel. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by David Spinadel
The FCC is clarifying some soft configuration requirements, which among others include the following:

1. Indoor operation, where a device can use channels requiring indoor operation, subject to being able to guarantee indoor operation, i.e., the device is connected to AC power, or the device is under the control of a local master that is acting as an AP and is connected to AC power.

2. Concurrent GO operation, where devices may instantiate a P2P GO while they are under the guidance of an authorized master, for example on a channel on which a BSS is connected to an authorized master, i.e., one with DFS and radar detection capability in the UNII band.

See https://apps.fcc.gov/eas/comments/GetPublishedDocument.html?id=327&tn=528122

Add support for advertising the indoor-only and GO-concurrent channel properties. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Emmanuel Grumbach
This will allow the low-level driver to make decisions based on the vif, such as which queues to use. Since the vif might be NULL, we can't add it to the tracing functions. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> [fix staging rtl8821ae driver] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
Submitted by Johannes Berg
When dynamically creating interfaces from userspace, e.g. for P2P usage, such interfaces are usually owned by the process that created them, i.e. wpa_supplicant. Should wpa_supplicant crash, such interfaces will often cease operating properly and cause problems on restarting the process. To avoid this problem, introduce an ownership concept for interfaces. If an interface is owned by a netlink socket, it will be destroyed when the netlink socket is closed for any reason, including if the process it belongs to crashed. This gives us a race-free way to get rid of any such interfaces. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
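A hedged userspace sketch of opting in when creating an interface via libnl; the flag attribute name (NL80211_ATTR_IFACE_SOCKET_OWNER) reflects the nl80211 API of that era and should be treated as an assumption, as should the surrounding values:

    /* Create a P2P interface that is torn down with this netlink socket. */
    struct nl_msg *msg = nlmsg_alloc();

    genlmsg_put(msg, 0, 0, nl80211_id, 0, 0, NL80211_CMD_NEW_INTERFACE, 0);
    nla_put_u32(msg, NL80211_ATTR_WIPHY, wiphy_idx);
    nla_put_string(msg, NL80211_ATTR_IFNAME, "p2p-dev0");
    nla_put_u32(msg, NL80211_ATTR_IFTYPE, NL80211_IFTYPE_P2P_GO);
    nla_put_flag(msg, NL80211_ATTR_IFACE_SOCKET_OWNER);

    nl_send_auto(sock, msg);    /* if sock closes, the kernel removes the iface */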
-
- 02 April 2014, 4 commits
-
-
Submitted by Eric W. Biederman
Currently netpoll and skb_release_head_state assume that an skb is freeable in hard irq context except when skb->destructor is set. The reality is far from this. So add a function skb_irq_freeable to compute the full test, and in the process be the living documentation of what the requirements actually are for freeing an skb in hard irq context. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
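The full test reads roughly as below; this is a sketch of the helper as described, with the config-dependent fields under their usual guards treated as assumptions:

    /* True only if freeing this skb touches no state that is unsafe to
     * release from hard irq context. */
    static inline bool skb_irq_freeable(const struct sk_buff *skb)
    {
            return !skb->destructor &&
    #if IS_ENABLED(CONFIG_XFRM)
                   !skb->sp &&
    #endif
    #if IS_ENABLED(CONFIG_NF_CONNTRACK)
                   !skb->nfct &&
    #endif
                   !skb->_skb_refdst &&
                   !skb_has_frag_list(skb);
    }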
-
Submitted by Daniel Borkmann
This commit fixes a build error reported by Fengguang that is triggered when CONFIG_NETWORK_PHY_TIMESTAMPING is not set:

    ERROR: "ptp_classify_raw" [drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.ko] undefined!

The fix is to introduce a file of its own for the PTP BPF classifier, so that PTP_1588_CLOCK and/or NETWORK_PHY_TIMESTAMPING can select it independently of each other. The IXP4xx driver on ARM needs to select it as well, since it does not seem to select PTP_1588_CLOCK or similar that would pull it in automatically. This also allows hiding all of the internals of the BPF PTP program inside that file, exporting only the relevant API bits to drivers.

This patch also adds kdoc documentation for the ptp_classify_raw() API to make it clear that it can return PTP_CLASS_* defines. Also, the BPF program has been translated into bpf_asm code, so that it can be more easily read and altered (extensively documented in [1]). In the kernel tree under tools/net/ we have the bpf_asm and bpf_dbg tools, so the commented program can simply be translated via `./bpf_asm -c prog`, where prog is a file containing the commented code. This makes it easily readable/verifiable, and when something needs to change, jump offsets etc. do not need to be replaced manually, which is very error prone; instead, a newly translated version via bpf_asm can simply replace the old code. I have checked opcode diffs before/after and it is the very same filter.

[1] Documentation/networking/filter.txt

Fixes: 164d8c66 ("net: ptp: do not reimplement PTP/BPF classifier") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Jiri Benc <jbenc@redhat.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Phoebe Buckheister
Commit 9b2777d6 ("ieee802154: add TX power control to wpan_phy") and its follow-ups erroneously added CSMA and CCA parameters for 802.15.4 devices as PHY parameters, while they are actually MAC parameters and can differ between any two WPAN instances. Since it is now sensible to have multiple WPAN devices with differing CSMA/CCA parameters, make these parameters MAC parameters instead. Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Petri Gynther
UHID_CREATE2: the HID report descriptor data (rd_data) is an array in struct uhid_create2_req, instead of a pointer. This enables use from languages that don't support pointers, e.g. Python.

UHID_INPUT2: the data array is the last field of struct uhid_input2_req. This enables userspace to write only the required bytes to the kernel (ev.type + ev.u.input2.size + the part of the data array that matters), instead of the entire struct uhid_input2_req.

Note: UHID_CREATE2 increases the total size of struct uhid_event slightly, thus increasing the size of messages that are queued for userspace. However, this won't affect the userspace processing of these events.

[Jiri Kosina <jkosina@suse.cz>: adjust to hid_get_raw_report() and hid_output_raw_report() API changes] Signed-off-by: Petri Gynther <pgynther@google.com> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
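A sketch of the fixed-size request this implies; the field layout mirrors the uhid UAPI of that era, but treat the exact sizes and ordering as assumptions:

    struct uhid_create2_req {
            __u8    name[128];
            __u8    phys[64];
            __u8    uniq[64];
            __u16   rd_size;
            __u16   bus;
            __u32   vendor;
            __u32   product;
            __u32   version;
            __u32   country;
            __u8    rd_data[HID_MAX_DESCRIPTOR_SIZE];  /* array, not pointer */
    } __attribute__((__packed__));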
-
- 01 April 2014, 5 commits
-
-
Submitted by Eric Dumazet
The main difference between napi_frags_skb() and napi_gro_receive() is that the latter is called after the Ethernet header has already been pulled by the NIC driver (eth_type_trans() was called before napi_gro_receive()). Jerry Chu, in commit 299603e8 ("net-gro: Prepare GRO stack for the upcoming tunneling support"), tried to remove this difference by calling eth_type_trans() from napi_frags_skb() instead of doing it later from napi_frags_finish(). The goal was that napi_gro_complete() could call ptype->callbacks.gro_complete(skb, 0) (offset of first network header = 0). Also, the xxx_gro_receive() handlers all use off = skb_gro_offset(skb) to point to their own header, for the current skb and the ones held in gro_list. The problem is that this cleanup work defeated the frag0 optimization: it turns out the consecutive pskb_may_pull() calls are too expensive. This patch brings back the frag0 handling in napi_frags_skb(). As all skbs have their mac header in the skb head, we no longer need skb_gro_mac_header(). Reported-by: Michal Schmidt <mschmidt@redhat.com> Fixes: 299603e8 ("net-gro: Prepare GRO stack for the upcoming tunneling support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by david decotigny
This allows monitoring carrier on/off transitions and detecting link-flapping issues:
- new /sys/class/net/X/carrier_changes
- new rtnetlink IFLA_CARRIER_CHANGES (getlink)

Tested:
- grep . /sys/class/net/*/carrier_changes, together with ip link set dev X down/up and plugging/unplugging the cable
- updated iproute2: prints IFLA_CARRIER_CHANGES
- iproute2 20121211-2 (debian): unchanged behavior

Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>
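A small userspace sketch for polling the new counter (path as named in the commit; error handling trimmed):

    #include <stdio.h>

    /* Read the carrier transition count for one interface; a value that
     * grows quickly over time indicates link flapping. */
    static long carrier_changes(const char *ifname)
    {
            char path[128];
            long count = -1;
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/class/net/%s/carrier_changes", ifname);
            f = fopen(path, "r");
            if (f) {
                    fscanf(f, "%ld", &count);
                    fclose(f);
            }
            return count;
    }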
-
Submitted by Wang Yufen
Move the whole of rt6_need_strict as a static inline into ip6_route.h, so that it can be reused. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
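The moved helper is small; roughly (a sketch matching its long-standing definition):

    /* Link-local, multicast and loopback destinations must be routed
     * strictly against the given interface. */
    static inline bool rt6_need_strict(const struct in6_addr *daddr)
    {
            return ipv6_addr_type(daddr) &
                   (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL |
                    IPV6_ADDR_LOOPBACK);
    }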
-
Submitted by Florian Fainelli
The NET_ADDR_* values are exported in the /sys/class/net/<iface>/addr_assign_type sysfs attribute, and as such constitute a user-space ABI. Move the NET_ADDR_* definitions from include/linux/netdevice.h to include/uapi/linux/netdevice.h. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
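For reference, a sketch of the moved definitions (values and comments per mainline of that era; treat as an assumption):

    /* include/uapi/linux/netdevice.h */
    #define NET_ADDR_PERM       0   /* address is permanent (default) */
    #define NET_ADDR_RANDOM     1   /* address is generated randomly */
    #define NET_ADDR_STOLEN     2   /* address is stolen from another device */
    #define NET_ADDR_SET        3   /* address is set via dev_set_mac_address() */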
-
Submitted by Vlad Yasevich
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 31 March 2014, 8 commits
-
-
Submitted by Alexei Starovoitov
This patch replaces/reworks the kernel-internal BPF interpreter with an optimized BPF instruction set format that is modelled closer to native instruction sets and is designed to be JITed with a one-to-one mapping. Thus, the new interpreter is noticeably faster than the current implementation of sk_run_filter(), mainly for two reasons:

1. Fall-through jumps: classic BPF jump instructions are forced to go either to the 'true' or 'false' branch, which causes branch-miss penalties. The new BPF jump instructions have only one branch and fall through otherwise, which fits the CPU branch predictor logic better. `perf stat` shows a drastic difference in branch-misses between the old and new code.

2. Jump-threaded implementation of the interpreter vs. a switch statement: instead of a single table-jump at the top of the 'switch' statement, gcc will now generate multiple table-jump instructions, which helps CPU branch predictor logic.

Note that the verification of filters is still done through sk_chk_filter() in classic BPF format, so filters from user or kernel space are verified in the same way as now, and the same restrictions/constraints hold as well. We reuse current BPF JIT compilers in a way that this upgrade would even be fine as-is, but it nevertheless allows for a successive upgrade of BPF JIT compilers to the new format. The internal instruction set migration is done after the probing for JIT compilation, so in case JIT compilers are able to create a native opcode image, we use that; in all other cases we do a follow-up migration of the BPF program's instruction set, so that it can be transparently run in the new interpreter.

In short, the *internal* format extends BPF in the following way (more details can be taken from the appended documentation):

- the number of registers increases from 2 to 10
- register width increases from 32 bit to 64 bit
- conditional jt/jf targets are replaced with jt/fall-through
- adds signed > and >= insns
- the 16 4-byte stack slots for register spill-fill are replaced with up to 512 bytes of multi-use stack space
- introduces a bpf_call insn and a register passing convention for zero-overhead calls from/to other kernel functions
- adds arithmetic right shift and endianness conversion insns
- adds an atomic_add insn
- the old tax/txa insns are replaced with a 'mov dst,src' insn

Performance of two BPF filters generated by libpcap and bpf_asm, respectively, was measured on x86_64, i386 and arm32 (other libpcap programs have similar performance differences):

fprog #1 is taken from Documentation/networking/filter.txt:
    tcpdump -i eth0 port 22 -dd

fprog #2 is taken from 'man tcpdump':
    tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -dd

Raw performance data from the BPF micro-benchmark: SK_RUN_FILTER on the same SKB (cache-hit) or 10k SKBs (cache-miss); time in ns per call, smaller is better:

    --x86_64--
                 fprog #1   fprog #1   fprog #2   fprog #2
                 cache-hit  cache-miss cache-hit  cache-miss
    old BPF          90        101        192        202
    new BPF          31         71         47         97
    old BPF jit      12         34         17         44
    new BPF jit     TBD

    --i386--
                 fprog #1   fprog #1   fprog #2   fprog #2
                 cache-hit  cache-miss cache-hit  cache-miss
    old BPF         107        136        227        252
    new BPF          40        119         69        172

    --arm32--
                 fprog #1   fprog #1   fprog #2   fprog #2
                 cache-hit  cache-miss cache-hit  cache-miss
    old BPF         202        300        475        540
    new BPF         180        270        330        470
    old BPF jit      26        182         37        202
    new BPF jit     TBD

Thus, without changing any userland BPF filters, applications on top of AF_PACKET (or other families) such as libpcap/tcpdump, the cls_bpf classifier, netfilter's xt_bpf, the team driver's load-balancing mode, and many more will have better interpreter filtering performance.

While we are replacing the internal BPF interpreter, we also need to convert seccomp BPF in the same step to make use of the new internal structure, since it makes use of lower-level API details without being further decoupled through higher-level calls like sk_unattached_filter_{create,destroy}(), for example. Just as for normal socket filtering, seccomp BPF also experiences a time-to-verdict speedup. 05-sim-long_jumps.c from libseccomp was used as a micro-benchmark:

    seccomp_rule_add_exact(ctx,...
    seccomp_rule_add_exact(ctx,...
    rc = seccomp_load(ctx);
    for (i = 0; i < 10000000; i++)
        syscall(199, 100);

'short filter' has 2 rules, 'large filter' has 200 rules. 'short filter' performance is slightly better on x86_64/i386/arm32; 'large filter' is much faster on x86_64 and i386 and shows no difference on arm32.

    --x86_64-- short filter
    old BPF: 2.7 sec
        39.12% bench libc-2.15.so       [.] syscall
         8.10% bench [kernel.kallsyms]  [k] sk_run_filter
         6.31% bench [kernel.kallsyms]  [k] system_call
         5.59% bench [kernel.kallsyms]  [k] trace_hardirqs_on_caller
         4.37% bench [kernel.kallsyms]  [k] trace_hardirqs_off_caller
         3.70% bench [kernel.kallsyms]  [k] __secure_computing
         3.67% bench [kernel.kallsyms]  [k] lock_is_held
         3.03% bench [kernel.kallsyms]  [k] seccomp_bpf_load
    new BPF: 2.58 sec
        42.05% bench libc-2.15.so       [.] syscall
         6.91% bench [kernel.kallsyms]  [k] system_call
         6.25% bench [kernel.kallsyms]  [k] trace_hardirqs_on_caller
         6.07% bench [kernel.kallsyms]  [k] __secure_computing
         5.08% bench [kernel.kallsyms]  [k] sk_run_filter_int_seccomp

    --arm32-- short filter
    old BPF: 4.0 sec
        39.92% bench [kernel.kallsyms]  [k] vector_swi
        16.60% bench [kernel.kallsyms]  [k] sk_run_filter
        14.66% bench libc-2.17.so       [.] syscall
         5.42% bench [kernel.kallsyms]  [k] seccomp_bpf_load
         5.10% bench [kernel.kallsyms]  [k] __secure_computing
    new BPF: 3.7 sec
        35.93% bench [kernel.kallsyms]  [k] vector_swi
        21.89% bench libc-2.17.so       [.] syscall
        13.45% bench [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
         6.25% bench [kernel.kallsyms]  [k] __secure_computing
         3.96% bench [kernel.kallsyms]  [k] syscall_trace_exit

    --x86_64-- large filter
    old BPF: 8.6 seconds
        73.38% bench [kernel.kallsyms]  [k] sk_run_filter
        10.70% bench libc-2.15.so       [.] syscall
         5.09% bench [kernel.kallsyms]  [k] seccomp_bpf_load
         1.97% bench [kernel.kallsyms]  [k] system_call
    new BPF: 5.7 seconds
        66.20% bench [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
        16.75% bench libc-2.15.so       [.] syscall
         3.31% bench [kernel.kallsyms]  [k] system_call
         2.88% bench [kernel.kallsyms]  [k] __secure_computing

    --i386-- large filter
    old BPF: 5.4 sec
    new BPF: 3.8 sec

    --arm32-- large filter
    old BPF: 13.5 sec
        73.88% bench [kernel.kallsyms]  [k] sk_run_filter
        10.29% bench [kernel.kallsyms]  [k] vector_swi
         6.46% bench libc-2.17.so       [.] syscall
         2.94% bench [kernel.kallsyms]  [k] seccomp_bpf_load
         1.19% bench [kernel.kallsyms]  [k] __secure_computing
         0.87% bench [kernel.kallsyms]  [k] sys_getuid
    new BPF: 13.5 sec
        76.08% bench [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
        10.98% bench [kernel.kallsyms]  [k] vector_swi
         5.87% bench libc-2.17.so       [.] syscall
         1.77% bench [kernel.kallsyms]  [k] __secure_computing
         0.93% bench [kernel.kallsyms]  [k] sys_getuid

BPF filters generated by seccomp are very branchy, so the new internal BPF performance is better than the old one. Performance gains will be even higher when a BPF JIT is committed for the new structure, which is planned in future work (as successive JIT migrations). BPF has also been stress-tested with trinity's BPF fuzzer.

Joint work with Daniel Borkmann. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Kees Cook <keescook@chromium.org> Cc: Paul Moore <pmoore@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: linux-kernel@vger.kernel.org Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
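A sketch of the internal instruction encoding this introduces; the field names follow the kernel of that era (the struct was later renamed when it became struct bpf_insn), so treat the details as assumptions:

    struct sock_filter_int {
            __u8    code;           /* opcode */
            __u8    a_reg:4;        /* destination register */
            __u8    x_reg:4;        /* source register */
            __s16   off;            /* signed offset */
            __s32   imm;            /* signed immediate constant */
    };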
-
Submitted by Daniel Borkmann
As in ppp, we need to migrate the ISDN/PPP code to use the sk_unattached_filter API in order to decouple direct access to the filter structure. By using sk_unattached_filter_{create,destroy}, we also allow for the possibility of JIT-compiling filters for faster filter verdicts. Joint work with Alexei Starovoitov. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: isdn4linux@listserv.isdn4linux.de Signed-off-by: David S. Miller <davem@davemloft.net>
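The decoupled pattern in a user looks roughly like this (a hedged sketch; the API of that era took a struct sock_fprog, and error handling is trimmed):

    /* Classic BPF: accept IPv4 frames, drop everything else. */
    struct sock_filter insns[] = {
            BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),        /* load EtherType */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETH_P_IP, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, 0xffff),             /* accept */
            BPF_STMT(BPF_RET | BPF_K, 0),                  /* drop */
    };
    struct sock_fprog fprog = {
            .len    = ARRAY_SIZE(insns),
            .filter = insns,
    };
    struct sk_filter *filter = NULL;

    if (sk_unattached_filter_create(&filter, &fprog) == 0) {
            unsigned int verdict = SK_RUN_FILTER(filter, skb);
            /* ... act on verdict ... */
            sk_unattached_filter_destroy(filter);
    }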
-
Submitted by Daniel Borkmann
There are currently pch_gbe, cpts, and ixp4xx_eth drivers that open-code and reimplement a BPF classifier for the PTP protocol. Since all of them effectively do the very same thing and load the very same PTP/BPF filter, we can consolidate that code by introducing ptp_classify_raw() in the time-stamping core framework, which can then be used by drivers. As drivers get initialized after the core networking subsystem has been bootstrapped, they can make use of ptp_insns wrapped through ptp_classify_raw(), which allows the PTP classifier setup code in drivers to be simplified and removed. Joint work with Alexei Starovoitov. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Richard Cochran <richard.cochran@omicron.at> Cc: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
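A hedged sketch of the resulting driver-side call; ptp_classify_raw() returns a PTP_CLASS_* value, with PTP_CLASS_NONE meaning the skb carries no recognizable PTP event message:

    #include <linux/ptp_classify.h>

    static bool skb_wants_hw_timestamp(struct sk_buff *skb)
    {
            unsigned int type = ptp_classify_raw(skb);

            /* Only time-stamp recognized PTP event messages. */
            return type != PTP_CLASS_NONE;
    }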
-
Submitted by Daniel Borkmann
This patch migrates an open-coded sk_run_filter() implementation to proper use of the BPF API, that is, sk_unattached_filter_create(). This migration is needed, as we will be internally transforming the filter to a different representation, and it therefore needs to be decoupled. It is okay to do so, as skb_timestamping_init() is called during initialization of the network stack in a core initcall via sock_init(). This would effectively also allow PTP filters to be JIT-compiled if bpf_jit_enable is set. For better readability, some newlines are also introduced, and ptp_classify.h is now kernel-space only. Joint work with Alexei Starovoitov. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Richard Cochran <richard.cochran@omicron.at> Cc: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Daniel Borkmann
This patch basically does two things: i) it removes the extern keyword from the include/linux/filter.h file to be more consistent with the rest of Joe's changes, and ii) it moves filter accounting into the filter core framework. Filter accounting, mainly done through sk_filter_{un,}charge(), takes care of the case when sockets are cloned through sk_clone_lock(), so that removal of the filter on one socket won't result in eviction while it is still referenced by the other. These functions actually belong in net/core/filter.c and not include/net/sock.h, as we want to keep all of this in a central place. It's also not in the fast path, so uninlining them is fine and even allows us to get rid of sk_filter_release_rcu()'s EXPORT_SYMBOL and a forward declaration. Joint work with Alexei Starovoitov. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Daniel Borkmann
In order to open up the possibility of internally transforming a BPF program into an alternative and possibly non-trivially reversible representation, we need to keep the original BPF program around, so that it can be passed back to user space without the need for a complex decoder. The reason for that use case resides in commit a8fc9277 ("sk-filter: Add ability to get socket filter program (v2)"), that is, the ability to retrieve the currently attached BPF filter from a given socket, used mainly by the checkpoint-restore project, for example. Therefore, we add two helpers, sk_{store,release}_orig_filter, to take care of that. In the sk_unattached_filter_create() case, there is no such possibility/requirement to retrieve a loaded BPF program, so we can spare ourselves the work in that case. This approach will simplify and slightly speed up both the sk_get_filter() and sock_diag_put_filterinfo() handlers, as we won't need to successively decode filters anymore through sk_decode_filter(). As we still need sk_decode_filter() later on, we keep it around. Joint work with Alexei Starovoitov. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Daniel Borkmann
This patch adds a jited flag to the sk_filter struct in order to indicate whether a filter is currently JIT-compiled or not. The size of sk_filter is not expanded, as the 32-bit 'len' member allows upper bits to be reused, since a filter can currently grow only as large as BPF_MAXINSNS. Therefore, there is also enough room for other flags needed in the future to reuse the 'len' field if necessary. The jited flag also allows for alternative interpreter functions to run, as currently we can only detect JIT-compiled filters by testing that fp->bpf_func does not equal the address of sk_run_filter(). Joint work with Alexei Starovoitov. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
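The bitfield trick looks roughly like this (a sketch; field order per mainline of that era is an assumption):

    struct sk_filter {
            atomic_t        refcnt;
            u32             jited:1,        /* is the filter JIT-compiled? */
                            len:31;         /* number of filter blocks */
            /* ... */
            unsigned int    (*bpf_func)(const struct sk_buff *skb,
                                        const struct sock_filter *filter);
    };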
-
Submitted by Theodore Ts'o
Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since the latter opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
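A sketch of the cmpxchg() loop this implies; mainline added a generic inode_set_flags() helper along these lines, so treat the exact shape as an assumption:

    /* Atomically replace the masked bits of inode->i_flags with 'flags',
     * so readers never observe the flags momentarily cleared. */
    static void inode_set_flags(struct inode *inode, unsigned int flags,
                                unsigned int mask)
    {
            unsigned int old_flags, new_flags;

            WARN_ON_ONCE(flags & ~mask);
            do {
                    old_flags = ACCESS_ONCE(inode->i_flags);
                    new_flags = (old_flags & ~mask) | flags;
            } while (unlikely(cmpxchg(&inode->i_flags, old_flags,
                                      new_flags) != old_flags));
    }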
-
- 30 March 2014, 3 commits
-
-
Submitted by Eric W. Biederman
Stop taking the transmit lock when a network device has specified NETIF_F_LLTX. If no locks are needed to transmit a packet, this is the ideal scenario for netpoll, as all packets can be transmitted immediately. Even if some locks are needed in ndo_start_xmit, skipping any unnecessary serialization is desirable for netpoll, as it makes it more likely that a debugging packet may be transmitted immediately instead of being deferred until later. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
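A hedged sketch of the conditional locking this implies in netpoll's transmit path; netpoll_start_xmit() is assumed here as the name of the transmit helper, and the exact structure is an assumption based on the commit message:

    /* Only serialize against the stack's xmit path when the driver
     * actually requires the tx lock (i.e. lacks NETIF_F_LLTX). */
    if (dev->features & NETIF_F_LLTX) {
            status = netpoll_start_xmit(np, skb, dev, txq);
    } else if (__netif_tx_trylock(txq)) {
            status = netpoll_start_xmit(np, skb, dev, txq);
            __netif_tx_unlock(txq);
    }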
-
Submitted by Eric W. Biederman
The netpoll_rx_enable and netpoll_rx_disable functions have always controlled polling of the network driver's transmit and receive queues. Rename them to netpoll_poll_enable and netpoll_poll_disable to make their functionality clear. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric W. Biederman
The gfp parameter was added in:

    commit 47be03a2
    Author: Amerigo Wang <amwang@redhat.com>
    Date: Fri Aug 10 01:24:37 2012 +0000

    netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

    slave_enable_netpoll() and __netpoll_setup() may be called with read_lock() held, so they should use GFP_ATOMIC to allocate memory. Eric suggested passing gfp flags to __netpoll_setup().

    Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>

The reason for the gfp parameter was removed in:

    commit c4cdef9b
    Author: dingtianhong <dingtianhong@huawei.com>
    Date: Tue Jul 23 15:25:27 2013 +0800

    bonding: don't call slave_xxx_netpoll under spinlocks

    slave_xxx_netpoll will call synchronize_rcu_bh(), so the function may schedule and sleep; it shouldn't be called under spinlocks. bond_netpoll_setup() and bond_netpoll_cleanup() are always protected by the rtnl lock, so there is no need to take the read lock, as the slave list cannot be changed outside the rtnl lock.

    Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>

Nothing else that calls __netpoll_setup or ndo_netpoll_setup requires a gfp parameter, so remove the gfp parameter from both of these functions, making the code clearer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 29 March 2014, 6 commits
-
-
Submitted by ZhangZhen
Tejun Heo made WQ_NON_REENTRANT meaningless in commit dbf2576e ("workqueue: make all workqueues non-reentrant"), so remove its usages and definition. This patch doesn't introduce any behavior changes. tj: minor description updates. Signed-off-by: ZhangZhen <zhenzhang.zhang@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: James Chapman <jchapman@katalix.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
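The conversion in callers is mechanical; a sketch (workqueue name hypothetical):

    /* before: the flag was needed to avoid reentrancy across CPUs */
    wq = alloc_workqueue("foo_wq", WQ_NON_REENTRANT | WQ_UNBOUND, 1);

    /* after: all workqueues are non-reentrant, so just drop the flag */
    wq = alloc_workqueue("foo_wq", WQ_UNBOUND, 1);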
-
Submitted by Axel Lin
spi_bitbang_stop() never fails, so make it return void. Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Mark Brown <broonie@linaro.org>
-
Submitted by Vlad Yasevich
Some drivers incorrectly assign vlan acceleration features to vlan_features, thus causing issues for Q-in-Q vlan configurations. Warn the user of such cases. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Vlad Yasevich
skb_network_protocol() already accounts for multiple vlan headers that may be present in the skb. However, skb_mac_gso_segment() doesn't know anything about it and assumes that skb->mac_len is set correctly to skip all mac headers. That may not always be the case. If we are simply forwarding the packet (via bridge or macvtap), all vlan headers may not be accounted for. A simple solution is to allow skb_network_protocol() to return the vlan depth it has calculated. This way skb_mac_gso_segment() will correctly skip all mac headers. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
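A hedged sketch of the adjusted call site in skb_mac_gso_segment(); the out-parameter matches the commit's description, with error handling trimmed:

    int vlan_depth = skb->mac_len;
    __be16 type = skb_network_protocol(skb, &vlan_depth);

    if (unlikely(!type))
            return ERR_PTR(-EINVAL);

    /* Skip the real mac header plus however many VLAN tags are stacked. */
    __skb_pull(skb, vlan_depth);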
-
Submitted by Hannes Frederic Sowa
addrconf_join_solict and addrconf_join_anycast may cause actions which need the rtnl lock, especially on first address creation. A new DAD state is introduced which defers processing of the initial DAD into a workqueue. To get the rtnl lock we need to push the code paths which depend on those calls up to workqueues, specifically addrconf_verify and the DAD processing.

(v2) addrconf_dad_failure needs to be queued up to the workqueue, too. This patch introduces a new DAD state and stops the DAD processing in the workqueue (this is because of the possible ipv6_del_addr processing which removes the solicited multicast address from the device). addrconf_verify_lock is removed, too; after the transition it is not needed anymore. As we are no longer processing in the bottom half, we need to be a bit more careful about disabling bottom halves when we take spin_locks that are also used in bh context.

Relevant backtrace:

    [  541.030090] RTNL: assertion failed at net/core/dev.c (4496)
    [  541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 3.10.33-1-amd64-vyatta #1
    [  541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [  541.031146] ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
    [  541.031148] 0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
    [  541.031150] 0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
    [  541.031152] Call Trace:
    [  541.031153] <IRQ> [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
    [  541.031180] [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
    [  541.031183] [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
    [  541.031185] [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
    [  541.031189] [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
    [  541.031198] [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
    [  541.031208] [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
    [  541.031212] [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
    [  541.031216] [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
    [  541.031219] [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
    [  541.031223] [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
    [  541.031226] [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
    [  541.031229] [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
    [  541.031233] [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
    [  541.031241] [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
    [  541.031244] [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
    [  541.031247] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
    [  541.031255] [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
    [  541.031258] [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]

Hunks and backtrace stolen from a patch by Stephen Hemminger. Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Eric Dumazet
Dropping packets in __dev_queue_xmit() when the transmit queue is stopped (NIC TX ring buffer full or BQL limit reached) currently outputs a syslog message. It would be better to have a precise count of such events available in netdevice stats, so that monitoring tools can get a clue. This extends the work done in caf586e5 ("net: add a core netdev->rx_dropped counter"). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-