提交 · 5c90067c0f79705b8fa3207c734b18727f320637 · openeuler / Kernel

24 4月, 2015 2 次提交

mac80211: remove IEEE80211_RX_RA_MATCH · 5c90067c

由 Johannes Berg 提交于 4月 22, 2015

With promisc support gone, only AP and P2P-Device type interfaces
still clear IEEE80211_RX_RA_MATCH. In both cases this isn't really
necessary though, so we can remove that flag and the code.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

5c90067c

mac80211: remove support for IFF_PROMISC · df140465

由 Johannes Berg 提交于 4月 22, 2015

This support is essentially useless as typically networks are encrypted,
frames will be filtered by hardware, and rate scaling will be done with
the intended recipient in mind. For real monitoring of the network, the
monitor mode support should be used instead.

Removing it removes a lot of corner cases.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

df140465

23 4月, 2015 1 次提交

mac80211: make station hash table max_size configurable · ebd82b39

由 Johannes Berg 提交于 4月 23, 2015

Allow debug builds to configure the station hash table maximum
size in order to run with hash collisions in limited scenarios
such as hwsim testing. The default remains 0 which effectively
means no limit.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

ebd82b39

22 4月, 2015 7 次提交

mac80211: allow segmentation offloads · 80616c0d

由 Johannes Berg 提交于 4月 14, 2015

Implement the necessary software segmentation on the normal
TX path so that fast-xmit can use segmentation offload if
the hardware (or driver) supports it.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

80616c0d

mac80211: allow drivers to support S/G · 680a0dab

由 Johannes Berg 提交于 4月 13, 2015

If drivers want to support S/G (really just gather DMA on TX) then
we can now easily support this on the fast-xmit path since it just
needs to write to the ethernet header (and already has a check for
that being possible.)

However, disallow this on the regular TX path (which has to handle
fragmentation, software crypto, etc.) by calling skb_linearize().

Also allow the related HIGHDMA since that's not interesting to the
code in mac80211 at all anyway.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

680a0dab

mac80211: allow checksum offload only in fast-xmit · 2d981fdd

由 Johannes Berg 提交于 4月 10, 2015

When we go through the complete TX processing, there are a number
of things like fragmentation and software crypto that require the
checksum to be calculated already.

In favour of maintainability, instead of adding the necessary call
to skb_checksum_help() in all the places that need it, just do it
once before the regular TX processing.

Right now this only affects the TI wlcore and QCA ath10k drivers
since they're the only ones using checksum offload. The previous
commits enabled fast-xmit for them in almost all cases.

For wlcore this even fixes a corner case: when a key fails to be
programmed to hardware software encryption gets used, encrypting
frames with a bad checksum.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

2d981fdd

mac80211: extend fast-xmit to cover IBSS · 3ffd8840

由 Johannes Berg 提交于 4月 14, 2015

IBSS can be supported very easily since it uses the standard station
authorization state etc. so it just needs to be covered by the header
building switch statement.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

3ffd8840

mac80211: extend fast-xmit for more ciphers · e495c247

由 Johannes Berg 提交于 4月 10, 2015

When crypto is offloaded then in some cases it's all handled
by the device, and in others only some space for the IV must
be reserved in the frame. Handle both of these cases in the
fast-xmit path, up to a limit of 18 bytes of space for IVs.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

e495c247

mac80211: extend fast-xmit to driver fragmentation · 725b812c

由 Johannes Berg 提交于 4月 10, 2015

If the driver handles fragmentation then it wouldn't
be done in software so we can still use the fast-xmit
path in that case.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

725b812c

mac80211: add TX fastpath · 17c18bf8

由 Johannes Berg 提交于 3月 21, 2015

In order to speed up mac80211's TX path, add the "fast-xmit" cache
that will cache the data frame 802.11 header and other data to be
able to build the frame more quickly. This cache is rebuilt when
external triggers imply changes, but a lot of the checks done per
packet today are simplified away to the check for the cache.

There's also a more detailed description in the code.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

17c18bf8

20 4月, 2015 2 次提交

mac80211: lock rate control · 35c347ac

由 Johannes Berg 提交于 3月 05, 2015

Both minstrel (reported by Sven Eckelmann) and the iwlwifi rate
control aren't properly taking concurrency into account. It's
likely that the same is true for other rate control algorithms.

In the case of minstrel this manifests itself in crashes when an
update and other data access are run concurrently, for example
when the stations change bandwidth or similar. In iwlwifi, this
can cause firmware crashes.

Since fixing all rate control algorithms will be very difficult,
just provide locking for invocations. This protects the internal
data structures the algorithms maintain.

I've manipulated hostapd to test this, by having it change its
advertised bandwidth roughly ever 150ms. At the same time, I'm
running a flood ping between the client and the AP, which causes
this race of update vs. get_rate/status to easily happen on the
client. With this change, the system survives this test.
Reported-by: NSven Eckelmann <sven@open-mesh.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

35c347ac

mac80211: introduce plink lock for plink fields · 48bf6bed

由 Bob Copeland 提交于 4月 13, 2015

The mesh plink code uses sta->lock to serialize access to the
plink state fields between the peer link state machine and the
peer link timer.  Some paths (e.g. those involving
mps_qos_null_tx()) unfortunately hold this spinlock across
frame tx, which is soon to be disallowed.  Add a new spinlock
just for plink access.
Signed-off-by: NBob Copeland <me@bobcopeland.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

48bf6bed

18 4月, 2015 5 次提交

net: dsa: use DEVICE_ATTR_RW to declare temp1_max · e3122b7f

由 Vivien Didelot 提交于 4月 17, 2015

Since commit da4759c7 (sysfs: Use only return value from is_visible for
the file mode), it is possible to reduce the permissions of a file.

So declare temp1_max with the DEVICE_ATTR_RW macro and remove the write
permission in dsa_hwmon_attrs_visible if set_temp_limit isn't provided.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e3122b7f

net: remove unused 'dev' argument from netif_needs_gso() · 8b86a61d

由 Johannes Berg 提交于 4月 17, 2015

In commit 04ffcb25 ("net: Add ndo_gso_check") Tom originally
added the 'dev' argument to be able to call ndo_gso_check().

Then later, when generalizing this in commit 5f35227e
("net: Generalize ndo_gso_check to ndo_features_check")
Jesse removed the call to ndo_gso_check() in netif_needs_gso()
by calling the new ndo_features_check() in a different place.
This made the 'dev' argument unused.

Remove the unused argument and go back to the code as before.

Cc: Tom Herbert <therbert@google.com>
Cc: Jesse Gross <jesse@nicira.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b86a61d

act_mirred: Fix bogus header when redirecting from VLAN · f40ae913

由 Herbert Xu 提交于 4月 17, 2015

When you redirect a VLAN device to any device, you end up with
crap in af_packet on the xmit path because hard_header_len is
not equal to skb->mac_len.  So the redirected packet contains
four extra bytes at the start which then gets interpreted as
part of the MAC address.

This patch fixes this by only pushing skb->mac_len.  We also
need to fix ifb because it tries to undo the pushing done by
act_mirred.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f40ae913

inet_diag: fix access to tcp cc information · 521f1cf1

由 Eric Dumazet 提交于 4月 16, 2015

Two different problems are fixed here :

1) inet_sk_diag_fill() might be called without socket lock held.
   icsk->icsk_ca_ops can change under us and module be unloaded.
   -> Access to freed memory.
   Fix this using rcu_read_lock() to prevent module unload.

2) Some TCP Congestion Control modules provide information
   but again this is not safe against icsk->icsk_ca_ops
   change and nla_put() errors were ignored. Some sockets
   could not get the additional info if skb was almost full.

Fix this by returning a status from get_info() handlers and
using rcu protection as well.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

521f1cf1

tcp: tcp_get_info() should fetch socket fields once · fad9dfef

由 Eric Dumazet 提交于 4月 16, 2015

tcp_get_info() can be called without holding socket lock,
so any socket fields can change under us.

Use READ_ONCE() to fetch sk_pacing_rate and sk_max_pacing_rate

Fixes: 977cb0ec ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fad9dfef

17 4月, 2015 5 次提交

skbuff: Do not scrub skb mark within the same name space · 213dd74a

由 Herbert Xu 提交于 4月 16, 2015

On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote:
> Le 15/04/2015 15:57, Herbert Xu a écrit :
> >On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote:
> [snip]
> >Subject: skbuff: Do not scrub skb mark within the same name space
> >
> >The commit ea23192e ("tunnels:
> Maybe add a Fixes tag?
> Fixes: ea23192e ("tunnels: harmonize cleanup done on skb on rx path")
>
> >harmonize cleanup done on skb on rx path") broke anyone trying to
> >use netfilter marking across IPv4 tunnels.  While most of the
> >fields that are cleared by skb_scrub_packet don't matter, the
> >netfilter mark must be preserved.
> >
> >This patch rearranges skb_scurb_packet to preserve the mark field.
> nit: s/scurb/scrub
>
> Else it's fine for me.

Sure.

PS I used the wrong email for James the first time around.  So
let me repeat the question here.  Should secmark be preserved
or cleared across tunnels within the same name space? In fact,
do our security models even support name spaces?

---8<---
The commit ea23192e ("tunnels:
harmonize cleanup done on skb on rx path") broke anyone trying to
use netfilter marking across IPv4 tunnels.  While most of the
fields that are cleared by skb_scrub_packet don't matter, the
netfilter mark must be preserved.

This patch rearranges skb_scrub_packet to preserve the mark field.

Fixes: ea23192e ("tunnels: harmonize cleanup done on skb on rx path")
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

213dd74a

Revert "net: Reset secmark when scrubbing packet" · 4c0ee414

由 Herbert Xu 提交于 4月 16, 2015

This patch reverts commit b8fb4e06
because the secmark must be preserved even when a packet crosses
namespace boundaries.  The reason is that security labels apply to
the system as a whole and is not per-namespace.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c0ee414

bpf: fix bpf helpers to use skb->mac_header relative offsets · a166151c

由 Alexei Starovoitov 提交于 4月 15, 2015

For the short-term solution, lets fix bpf helper functions to use
skb->mac_header relative offsets instead of skb->data in order to
get the same eBPF programs with cls_bpf and act_bpf work on ingress
and egress qdisc path. We need to ensure that mac_header is set
before calling into programs. This is effectively the first option
from below referenced discussion.

More long term solution for LD_ABS|LD_IND instructions will be more
intrusive but also more beneficial than this, and implemented later
as it's too risky at this point in time.

I.e., we plan to look into the option of moving skb_pull() out of
eth_type_trans() and into netif_receive_skb() as has been suggested
as second option. Meanwhile, this solution ensures ingress can be
used with eBPF, too, and that we won't run into ABI troubles later.
For dealing with negative offsets inside eBPF helper functions,
we've implemented bpf_skb_clone_unwritable() to test for unwriteable
headers.

Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
Fixes: 608cd71a ("tc: bpf: generalize pedit action")
Fixes: 91bc4822 ("tc: bpf: add checksum helpers")
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a166151c

netns: remove duplicated include from net_namespace.c · 5a950ad5

由 Wei Yongjun 提交于 4月 16, 2015

Remove duplicated include.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a950ad5

fou: avoid missing unlock in failure path · 540207ae

由 WANG Cong 提交于 4月 15, 2015

Fixes: 7a6c8c34 ("fou: implement FOU_CMD_GET")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

540207ae

16 4月, 2015 2 次提交

lib/string_helpers.c: change semantics of string_escape_mem · 41416f23

由 Rasmus Villemoes 提交于 4月 15, 2015

The current semantics of string_escape_mem are inadequate for one of its
current users, vsnprintf().  If that is to honour its contract, it must
know how much space would be needed for the entire escaped buffer, and
string_escape_mem provides no way of obtaining that (short of allocating a
large enough buffer (~4 times input string) to let it play with, and
that's definitely a big no-no inside vsnprintf).

So change the semantics for string_escape_mem to be more snprintf-like:
Return the size of the output that would be generated if the destination
buffer was big enough, but of course still only write to the part of dst
it is allowed to, and (contrary to snprintf) don't do '\0'-termination.
It is then up to the caller to detect whether output was truncated and to
append a '\0' if desired.  Also, we must output partial escape sequences,
otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause
printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever
they previously contained.

This also fixes a bug in the escaped_string() helper function, which used
to unconditionally pass a length of "end-buf" to string_escape_mem();
since the latter doesn't check osz for being insanely large, it would
happily write to dst.  For example, kasprintf(GFP_KERNEL, "something and
then %pE", ...); is an easy way to trigger an oops.

In test-string_helpers.c, the -ENOMEM test is replaced with testing for
getting the expected return value even if the buffer is too small.  We
also ensure that nothing is written (by relying on a NULL pointer deref)
if the output size is 0 by passing NULL - this has to work for
kasprintf("%pE") to work.

In net/sunrpc/cache.c, I think qword_add still has the same semantics.
Someone should definitely double-check this.

In fs/proc/array.c, I made the minimum possible change, but longer-term it
should stop poking around in seq_file internals.

[andriy.shevchenko@linux.intel.com: simplify qword_add]
[andriy.shevchenko@linux.intel.com: add missed curly braces]
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

41416f23

kernel: conditionally support non-root users, groups and capabilities · 2813893f

由 Iulia Manda 提交于 4月 15, 2015

There are a lot of embedded systems that run most or all of their
functionality in init, running as root:root.  For these systems,
supporting multiple users is not necessary.

This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
non-root users, non-root groups, and capabilities optional.  It is enabled
under CONFIG_EXPERT menu.

When this symbol is not defined, UID and GID are zero in any possible case
and processes always have all capabilities.

The following syscalls are compiled out: setuid, setregid, setgid,
setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
getgroups, setfsuid, setfsgid, capget, capset.

Also, groups.c is compiled out completely.

In kernel/capability.c, capable function was moved in order to avoid
adding two ifdef blocks.

This change saves about 25 KB on a defconfig build.  The most minimal
kernels have total text sizes in the high hundreds of kB rather than
low MB.  (The 25k goes down a bit with allnoconfig, but not that much.

The kernel was booted in Qemu.  All the common functionalities work.
Adding users/groups is not possible, failing with -ENOSYS.

Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NIulia Manda <iulia.manda21@gmail.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2813893f

15 4月, 2015 1 次提交

mm: remove GFP_THISNODE · 4167e9b2

由 David Rientjes 提交于 4月 14, 2015

NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.

GFP_THISNODE is a secret combination of gfp bits that have different
behavior than expected.  It is a combination of __GFP_THISNODE,
__GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
allocator slowpath to fail without trying reclaim even though it may be
used in combination with __GFP_WAIT.

An example of the problem this creates: commit e97ca8e5 ("mm: fix
GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
that really just wanted __GFP_THISNODE.  The problem doesn't end there,
however, because even it was a no-op for alloc_misplaced_dst_page(),
which also sets __GFP_NORETRY and __GFP_NOWARN, and
migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
is set in GFP_TRANSHUGE.  Converting GFP_THISNODE to __GFP_THISNODE is a
no-op in these cases since the page allocator special-cases
__GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.

It's time to just remove GFP_THISNODE entirely.  We leave __GFP_THISNODE
to restrict an allocation to a local node, but remove GFP_THISNODE and
its obscurity.  Instead, we require that a caller clear __GFP_WAIT if it
wants to avoid reclaim.

This allows the aforementioned functions to actually reclaim as they
should.  It also enables any future callers that want to do
__GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim.  The
rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.

Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
it is unchanged.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: NPekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4167e9b2

14 4月, 2015 13 次提交

tcp/dccp: get rid of central timewait timer · 789f558c

由 Eric Dumazet 提交于 4月 12, 2015

Using a timer wheel for timewait sockets was nice ~15 years ago when
memory was expensive and machines had a single processor.

This does not scale, code is ugly and source of huge latencies
(Typically 30 ms have been seen, cpus spinning on death_lock spinlock.)

We can afford to use an extra 64 bytes per timewait sock and spread
timewait load to all cpus to have better behavior.

Tested:

On following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1
on the target (lpaa24)

Before patch :

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
419594

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
437171

While test is running, we can observe 25 or even 33 ms latencies.

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms
rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms
rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2

After patch :

About 90% increase of throughput :

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
810442

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
800992

And latencies are kept to minimal values during this load, even
if network utilization is 90% higher :

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms
rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

789f558c

netfilter: Fix format string of nfnetlink_log proc file · 20a1d165

由 Richard Weinberger 提交于 4月 13, 2015

The printed values are all of type unsigned integer, therefore use
%u instead of %d. Otherwise an user can face negative values.
Signed-off-by: NRichard Weinberger <richard@nod.at>
Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

20a1d165

netfilter: Fix format string of nfnetlink_queue proc file · 6b46f7b7

由 Richard Weinberger 提交于 4月 13, 2015

The printed values are all of type unsigned integer, therefore use
%u instead of %d. Otherwise an user can face negative values.

Fixes:
$ cat /proc/net/netfilter/nfnetlink_queue
    0  29508   278 2 65531     0 2004213241 -2129885586  1
    1 -27747     0 2 65531     0     0        0  1
    2 -27748     0 2 65531     0     0        0  1
Signed-off-by: NRichard Weinberger <richard@nod.at>
Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b46f7b7

netfilter: Fix portid types · cc6bc448

由 Richard Weinberger 提交于 4月 13, 2015

The netlink portid is an unsigned integer, use this type
also in netfilter.
Signed-off-by: NRichard Weinberger <richard@nod.at>
Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc6bc448

nfc: Fix portid type in urelease_work · 65bc4f93

由 Richard Weinberger 提交于 4月 13, 2015

portid is an unsigned integer. Fix urelease_work to
match all other portid user in the kernel.
Signed-off-by: NRichard Weinberger <richard@nod.at>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65bc4f93

netfilter: nf_tables: get rid of the expression example code · 97bb43c3

由 Pablo Neira Ayuso 提交于 4月 13, 2015

There's an example net/netfilter/nft_expr_template.c example file in tree that
got out of sync along time, remove it.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Acked-by: NPatrick McHardy <kaber@trash.net>

97bb43c3

netfilter: nft_dynset: dynamic stateful expression instantiation · 3e135cd4

由 Patrick McHardy 提交于 4月 11, 2015

Support instantiating stateful expressions based on a template that
are associated with dynamically created set entries. The expressions
are evaluated when adding or updating the set element.

This allows to maintain per flow state using the existing set
infrastructure and expression types, with arbitrary definitions of
a flow.

Usage is currently restricted to anonymous sets, meaning only a single
binding can exist, since the desired semantics of multiple independant
bindings haven't been defined so far.

Examples (userspace syntax is still WIP):

1. Limit the rate of new SSH connections per host, similar to iptables
   hashlimit:

	flow ip saddr timeout 60s \
	limit 10/second \
	accept

2. Account network traffic between each set of /24 networks:

	flow ip saddr & 255.255.255.0 . ip daddr & 255.255.255.0 \
	counter

3. Account traffic to each host per user:

	flow skuid . ip daddr \
	counter

4. Account traffic for each combination of source address and TCP flags:

	flow ip saddr . tcp flags \
	counter

The resulting set content after a Xmas-scan look like this:

{
	192.168.122.1 . fin | psh | urg : counter packets 1001 bytes 40040,
	192.168.122.1 . ack : counter packets 74 bytes 3848,
	192.168.122.1 . psh | ack : counter packets 35 bytes 3144
}
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

3e135cd4

netfilter: nf_tables: add flag to indicate set contains expressions · 7c6c6e95

由 Patrick McHardy 提交于 4月 11, 2015

Add a set flag to indicate that the set is used as a state table and
contains expressions for evaluation. This operation is mutually
exclusive with the mapping operation, so sets specifying both are
rejected. The lookup expression also rejects binding to state tables
since it only deals with loopup and map operations.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

7c6c6e95

netfilter: nf_tables: mark stateful expressions · 151d799a

由 Patrick McHardy 提交于 4月 11, 2015

Add a flag to mark stateful expressions.

This is used for dynamic expression instanstiation to limit the usable
expressions. Strictly speaking only the dynset expression can not be
used in order to avoid recursion, but since dynamically instantiating
non-stateful expressions will simply create an identical copy, which
behaves no differently than the original, this limits to expressions
where it actually makes sense to dynamically instantiate them.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

151d799a

netfilter: nf_tables: prepare for expressions associated to set elements · f25ad2e9

由 Patrick McHardy 提交于 4月 11, 2015

Preparation to attach expressions to set elements: add a set extension
type to hold an expression and dump the expression information with the
set element.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

f25ad2e9

netfilter: nf_tables: add helper functions for expression handling · 0b2d8a7b

由 Patrick McHardy 提交于 4月 11, 2015

Add helper functions for initializing, cloning, dumping and destroying
a single expression that is not part of a rule.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

0b2d8a7b

tcp: fix bogus RTT for CC when retransmissions are acked · 3d0d26c7

由 Kenneth Klette Jonassen 提交于 4月 11, 2015

Since retransmitted segments are not used for RTT estimation, previously
SACKed segments present in the rtx queue are used. This estimation can be
several times larger than the actual RTT. When a cumulative ack covers both
previously SACKed and retransmitted segments, CC may thus get a bogus RTT.

Such segments previously had an RTT estimation in tcp_sacktag_one(), so it
seems reasonable to not reuse them in tcp_clean_rtx_queue() at all.

Afaik, this has had no effect on SRTT/RTO because of Karn's check.
Signed-off-by: NKenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Tested-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d0d26c7

net: use jump label patching for ingress qdisc in __netif_receive_skb_core · 4577139b

由 Daniel Borkmann 提交于 4月 10, 2015

Even if we make use of classifier and actions from the egress
path, we're going into handle_ing() executing additional code
on a per-packet cost for ingress qdisc, just to realize that
nothing is attached on ingress.

Instead, this can just be blinded out as a no-op entirely with
the use of a static key. On input fast-path, we already make
use of static keys in various places, e.g. skb time stamping,
in RPS, etc. It makes sense to not waste time when we're assured
that no ingress qdisc is attached anywhere.

Enabling/disabling of that code path is being done via two
helpers, namely net_{inc,dec}_ingress_queue(), that are being
invoked under RTNL mutex when a ingress qdisc is being either
initialized or destructed.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4577139b

13 4月, 2015 2 次提交

netfilter: nf_tables: variable sized set element keys / data · 7d740264

由 Patrick McHardy 提交于 4月 11, 2015

This patch changes sets to support variable sized set element keys / data
up to 64 bytes each by using variable sized set extensions. This allows
to use concatenations with bigger data items suchs as IPv6 addresses.

As a side effect, small keys/data now don't require the full 16 bytes
of struct nft_data anymore but just the space they need.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

7d740264

netfilter: nf_tables: support variable sized data in nft_data_init() · d0a11fc3

由 Patrick McHardy 提交于 4月 11, 2015

Add a size argument to nft_data_init() and pass in the available space.
This will be used by the following patches to support variable sized
set element data.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

d0a11fc3

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功