Commits · 4ea2607f7871ca433ab0c5300289215974213f26 · openanolis / cloud-kernel

12 Oct, 2017 12 commits

ipv6: addrconf: don't use rtnl mutex in RTM_GETNETCONF · 4ea2607f

Committed by Florian Westphal on Oct 11, 2017

Instead of relying on the rtnl mutex, bump the device reference count.
After this change, values reported can change in parallel, but that's not
much different from the current state, as anyone can change the settings
right after rtnl_unlock (and before userspace has processed the reply).

While at it, switch to GFP_KERNEL allocation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ea2607f

net: sched: remove unused tcf_exts_get_dev helper and cls_flower->egress_dev · 7578d7b4

Committed by Jiri Pirko on Oct 11, 2017

The helper and the struct field are no longer used by any code,
so remove them.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7578d7b4

net: sched: convert cls_flower->egress_dev users to tc_setup_cb_egdev infra · 717503b9

Committed by Jiri Pirko on Oct 11, 2017

The only user of cls_flower->egress_dev is mlx5. So do the conversion
there, along with the code originating the call in the cls_flower
function fl_hw_replace_filter, to the newly introduced egress device
callback infrastructure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

717503b9

net: sched: introduce per-egress action device callbacks · b3f55bdd

Committed by Jiri Pirko on Oct 11, 2017

Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload an inserted rule and the specified
device acts as the tc action egress device.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3f55bdd

net: sched: make tc_action_ops->get_dev return dev and avoid passing net · 843e79d0

Committed by Jiri Pirko on Oct 11, 2017

Return dev directly, NULL if not possible. That is enough.

It makes no sense to pass struct net * to the get_dev op, as there is only
one net possible, the one the action was created in. So just store it in
the mirred priv and use it directly.

Rename the mirred op callback function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

843e79d0

net: qrtr: Support decoding incoming v2 packets · 194ccc88

Committed by Bjorn Andersson on Oct 10, 2017

Add the necessary logic for decoding incoming messages of version 2 as
well. Also make sure there's room for the bigger of version 1 and 2
headers in the code allocating skbs for outgoing messages.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

194ccc88

net: qrtr: Use sk_buff->cb in receive path · f507a9b6

Committed by Bjorn Andersson on Oct 10, 2017

Rather than parsing the header of incoming messages throughout the
implementation, do it once when we retrieve the message and store the
relevant information in the "cb" member of the sk_buff.

This allows us to, in a later commit, decode version 2 messages into
this same structure.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f507a9b6

net: qrtr: Clean up control packet handling · 1a7959c7

Committed by Bjorn Andersson on Oct 10, 2017

As the message header generation is deferred, the internal functions for
generating control packets can be simplified.

This patch modifies qrtr_alloc_ctrl_packet() to, in addition to the
sk_buff, return a reference to a struct qrtr_ctrl_pkt, which clarifies
and simplifies the helpers to the point that these functions can be
folded back into the callers.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1a7959c7

net: qrtr: Pass source and destination to enqueue functions · e7044482

Committed by Bjorn Andersson on Oct 10, 2017

Defer writing the message header to the skb until it's time to enqueue
the packet. As the receive path is reworked to decode the message header
as it's received from the transport and only pass around the payload in
the skb, this change means that we do not have to fill out the full
message header just to decode it immediately in qrtr_local_enqueue().

In the future this change also makes it possible to prepend message
headers based on the version of each link.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e7044482

net: qrtr: Add control packet definition to uapi · da7653f0

Committed by Bjorn Andersson on Oct 10, 2017

The QMUX protocol specification defines the structure of the special
control packet messages being sent between handlers of the control port.

Add these to the uapi header, as this structure and the associated types
are shared between the kernel and all userspace handlers of control
messages.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

da7653f0

net: qrtr: Move constants to header file · 28978713

Committed by Bjorn Andersson on Oct 10, 2017

The constants are used by both the name server and clients, so clarify
their value and move them to the uapi header.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

28978713

net: qrtr: Invoke sk_error_report() after setting sk_err · ae85bfa8

Committed by Bjorn Andersson on Oct 10, 2017

Rather than manually waking up any context sleeping on the sock to
signal an error, we should call sk_error_report(). This has the added
benefit that in-kernel consumers can override this notification with
their own callback.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae85bfa8

11 Oct, 2017 13 commits

cfg80211: implement regdb signature checking · 90a53e44

Committed by Johannes Berg on Sep 13, 2017

Currently CRDA implements the signature checking, and the previous
commits added the ability to load the whole regulatory database
into the kernel.

However, we really can't lose the signature checking, so implement
it in the kernel by loading a detached signature (regulatory.db.p7s)
and checking it against built-in keys.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

90a53e44

cfg80211: reg: remove support for built-in regdb · c8c240e2

Committed by Johannes Berg on Oct 15, 2015

Parsing and building C structures from a regdb is no longer needed
since the "firmware" file (regulatory.db) can be linked into the
kernel image to achieve the same effect.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

c8c240e2

cfg80211: support reloading regulatory database · 1ea4ff3e

Committed by Johannes Berg on Sep 13, 2017

If the regulatory database is loaded, and then updated, it may
be necessary to reload it. Add an nl80211 command to do this.

Note that this just reloads the database; it doesn't re-apply
the rules from it immediately.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

1ea4ff3e

cfg80211: support loading regulatory database as firmware file · 007f6c5e

Committed by Johannes Berg on Oct 15, 2015

As the current regulatory database is only about 4k in size, and already
difficult to extend, we decided that overall it would be better to
get rid of the complications with CRDA and load the database into the
kernel directly, but in a new format that is extensible.

The new file format can be extended since it carries a length field
on all the structs that need to be extensible.

In order to be able to request firmware when the module initializes,
move cfg80211 from subsys_initcall() to the later fs_initcall(); the
firmware loader is at the same level but linked earlier, so it can
be called from there. Otherwise, when both the firmware loader and
cfg80211 are built-in, the request will crash the kernel. We also
need to be before device_initcall() so that cfg80211 is available
for devices when they initialize.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

007f6c5e
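
To illustrate the length-field idea from the commit above (007f6c5e), here is a minimal user-space sketch. The record layout (`struct rec_v1`, a one-byte length followed by one flags byte) and `parse_record()` are hypothetical and are not the actual regulatory.db format; the point is only that a reader copies the fields it knows and uses the stored length to step over anything a newer writer appended.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: the first byte holds the record's total length. */
struct rec_v1 {
	uint8_t len;
	uint8_t flags;
};

/*
 * Parse the record at buf[off]. Returns the offset of the next record,
 * or 0 if the record is truncated or shorter than the fields we know.
 */
static size_t parse_record(const uint8_t *buf, size_t size, size_t off,
			   struct rec_v1 *out)
{
	uint8_t len;

	if (off >= size)
		return 0;

	len = buf[off];
	if (len < sizeof(*out) || len > size - off)
		return 0;

	memcpy(out, buf + off, sizeof(*out));	/* read only the known fields */
	return off + len;			/* skip unknown trailing fields */
}
```

A reader built against the short layout still walks a buffer whose records carry extra trailing bytes, because it advances by the stored length rather than by `sizeof(struct rec_v1)`.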

mac80211: only remove AP VLAN frames from TXQ · 2a9e2579

Committed by Johannes Berg on Oct 06, 2017

When removing an AP VLAN interface, mac80211 currently purges
the entire TXQ for the AP interface. Fix this by using the FQ
API introduced in the previous patch to filter frames.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

2a9e2579

mac80211: aead api to reduce redundancy · 4133da73

Committed by Xiang Gao on Oct 10, 2017

Currently, aes_ccm.c and aes_gcm.c are almost line-by-line copies of
each other. This patch reduces code redundancy by moving the code in these
two files to crypto/aead_api.c to make it a higher-level aead api. The
files aes_ccm.c and aes_gcm.c are removed and all the functions there are
now implemented in their headers using the newly added aead api.

Signed-off-by: Xiang Gao <qasdfgtyuiop@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

4133da73

openvswitch: add ct_clear action · b8226962

Committed by Eric Garver on Oct 10, 2017

This adds a ct_clear action for clearing conntrack state. ct_clear is
currently implemented in OVS userspace, but is not backed by an action
in the kernel datapath. This is useful for flows that may modify a
packet tuple after a ct lookup has already occurred.

Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8226962

net: dst: move cpu inside ifdef to avoid compilation warning · 833e0e2f

Committed by Jakub Kicinski on Oct 10, 2017

If CONFIG_DST_CACHE is not selected, the cpu variable
will be unused and we will see a compilation warning.
Move it under the ifdef.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: d66f2b91 ("bpf: don't rely on the verifier lock for metadata_dst allocation")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

833e0e2f

rtnetlink: bridge: use ext_ack instead of printk · b88d12e4

Committed by Florian Westphal on Oct 10, 2017

We can now piggyback error strings to userspace via extended acks
rather than using printk.

Before:
bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
RTNETLINK answers: Invalid argument

After:
bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
Error: invalid vlan id.

v3: drop 'RTM_' prefixes, suggested by David Ahern, they
are not useful, the add/del in bridge command line is enough.

Also reword error in response to malformed/bad vlan id attribute
size.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b88d12e4

net/core: Fix BUG to BUG_ON conditionals. · 9f77fad3

Committed by Tim Hansen on Oct 09, 2017

Convert conditional BUG() calls to the BUG_ON(conditional) macro.

This was found using make coccicheck M=net/core on linux next
tag next-2017092

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f77fad3
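
The conversion in the commit above (9f77fad3) is mechanical. A minimal user-space sketch follows, assuming stand-in `BUG()` and `BUG_ON()` macros that count hits instead of halting (the kernel's real macros do not behave this way); `check_len_old()` and `check_len_new()` are hypothetical names used only to show the before and after shapes.

```c
#include <stddef.h>

/*
 * User-space stand-ins, assumed for illustration only: they count hits
 * rather than crashing, so both forms can be compared side by side.
 */
static int bug_hits;
#define BUG()		(bug_hits++)
#define BUG_ON(cond)	do { if (cond) BUG(); } while (0)

/* Before: an open-coded conditional wrapped around BUG(). */
static void check_len_old(size_t len, size_t max)
{
	if (len > max)
		BUG();
}

/* After: the same check expressed with BUG_ON(). */
static void check_len_new(size_t len, size_t max)
{
	BUG_ON(len > max);
}
```

Both functions trigger on exactly the same inputs; the second form is simply shorter and is the pattern coccicheck suggests.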

bpf: don't rely on the verifier lock for metadata_dst allocation · d66f2b91

Committed by Jakub Kicinski on Oct 09, 2017

bpf_skb_set_tunnel_*() functions require allocation of per-cpu
metadata_dst. The allocation happens upon verification of the
first program using those helpers. In preparation for removing
the verifier lock, use cmpxchg() to make sure we only allocate
the metadata_dsts once.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

d66f2b91
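
The allocate-once pattern described in the commit above (d66f2b91) can be sketched in user space with C11 atomics standing in for the kernel's cmpxchg(). This is a sketch under stated assumptions, not the kernel code: `struct md_dst`, `md_dst_global` and `md_dst_alloc_once()` are hypothetical names, and a single heap object replaces the per-cpu allocation.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-cpu metadata_dst. */
struct md_dst {
	int placeholder;
};

static _Atomic(struct md_dst *) md_dst_global;

/* Returns 0 on success, -1 if the allocation failed. */
static int md_dst_alloc_once(void)
{
	struct md_dst *expected = NULL;
	struct md_dst *tmp;

	if (atomic_load(&md_dst_global))
		return 0;		/* someone already allocated it */

	tmp = calloc(1, sizeof(*tmp));
	if (!tmp)
		return -1;

	/* Publish only if the slot is still empty; a loser frees its copy. */
	if (!atomic_compare_exchange_strong(&md_dst_global, &expected, tmp))
		free(tmp);

	return 0;
}
```

No lock is needed: concurrent callers may each allocate, but only one pointer is ever published and the others are released immediately.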

ipv6: fix incorrect bitwise operator used on rt6i_flags · 442d713b

Committed by Colin Ian King on Oct 10, 2017

The use of the | operator always leads to true, which looks rather
suspect to me. Fix this by using & instead to just check the
RTF_CACHE entry bit.

Detected by CoverityScan, CID#1457734, #1457747 ("Wrong operator used")

Fixes: 35732d01 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

442d713b
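
Why the | form in the commit above (442d713b) is always true can be seen in a few lines. The sketch below uses a hypothetical `DEMO_RTF_CACHE` constant with an illustrative value and hypothetical helper names; it is not the patched kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bit; the name and value are assumptions for this sketch. */
#define DEMO_RTF_CACHE 0x01000000u

/* Buggy: OR-ing in a non-zero constant can never yield zero. */
static bool is_cache_buggy(uint32_t flags)
{
	return (flags | DEMO_RTF_CACHE) != 0;
}

/* Fixed: AND masks out everything except the bit being tested. */
static bool is_cache_fixed(uint32_t flags)
{
	return (flags & DEMO_RTF_CACHE) != 0;
}
```

With `flags == 0` the buggy test still reports a cache entry, while the fixed test reports one only when the bit is actually set.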

ipv6: fix dereference of rt6_ex before null check error · b2427e67

Committed by Colin Ian King on Oct 10, 2017

Currently rt6_ex is being dereferenced before it is null checked,
hence there is a possible null dereference bug. Fix this by only
dereferencing rt6_ex after it has been null checked.

Detected by CoverityScan, CID#1457749 ("Dereference before null check")

Fixes: 81eb8447 ("ipv6: take care of rt6_stats")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b2427e67
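
The ordering problem fixed by the commit above (b2427e67) looks roughly like the following. The types and the function name are hypothetical stand-ins, and the buggy ordering is shown only in a comment because executing it with a NULL argument would be undefined behaviour.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the route and its exception entry. */
struct demo_route {
	int id;
};

struct demo_exception {
	struct demo_route *rt;
};

/*
 * Buggy ordering (not compiled):
 *
 *	struct demo_route *rt = ex->rt;	// dereference happens first
 *	if (!ex)			// check comes too late
 *		return NULL;
 */

/* Fixed ordering: check the pointer, then dereference it. */
static struct demo_route *demo_exception_route(const struct demo_exception *ex)
{
	if (!ex)
		return NULL;

	return ex->rt;
}
```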

10 Oct, 2017 11 commits

openvswitch: Add erspan tunnel support. · ceaa001a

Committed by William Tu on Oct 04, 2017

Add erspan netlink interface for OVS.

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ceaa001a

ipv6: use rcu_dereference_bh() in ipv6_route_seq_next() · d0e60206

Committed by Wei Wang on Oct 09, 2017

This patch replaces rcu_dereference() with rcu_dereference_bh() in
ipv6_route_seq_next() to avoid the following warning:

[   19.431685] WARNING: suspicious RCU usage
[   19.433451] 4.14.0-rc3-00914-g66f5d6ce #118 Not tainted
[   19.435509] -----------------------------
[   19.437267] net/ipv6/ip6_fib.c:2259 suspicious
rcu_dereference_check() usage!
[   19.440790]
[   19.440790] other info that might help us debug this:
[   19.440790]
[   19.444734]
[   19.444734] rcu_scheduler_active = 2, debug_locks = 1
[   19.447757] 2 locks held by odhcpd/3720:
[   19.449480]  #0:  (&p->lock){+.+.}, at: [<ffffffffb1231f7d>]
seq_read+0x3c/0x333
[   19.452720]  #1:  (rcu_read_lock_bh){....}, at: [<ffffffffb1d2b984>]
ipv6_route_seq_start+0x5/0xfd
[   19.456323]
[   19.456323] stack backtrace:
[   19.458812] CPU: 0 PID: 3720 Comm: odhcpd Not tainted
4.14.0-rc3-00914-g66f5d6ce #118
[   19.462042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[   19.465414] Call Trace:
[   19.466788]  dump_stack+0x86/0xc0
[   19.468358]  lockdep_rcu_suspicious+0xea/0xf3
[   19.470183]  ipv6_route_seq_next+0x71/0x164
[   19.471963]  seq_read+0x244/0x333
[   19.473522]  proc_reg_read+0x48/0x67
[   19.475152]  ? proc_reg_write+0x67/0x67
[   19.476862]  __vfs_read+0x26/0x10b
[   19.478463]  ? __might_fault+0x37/0x84
[   19.480148]  vfs_read+0xba/0x146
[   19.481690]  SyS_read+0x51/0x8e
[   19.483197]  do_int80_syscall_32+0x66/0x15a
[   19.484969]  entry_INT80_compat+0x32/0x50
[   19.486707] RIP: 0023:0xf7f0be8e
[   19.488244] RSP: 002b:00000000ffa75d04 EFLAGS: 00000246 ORIG_RAX:
0000000000000003
[   19.491431] RAX: ffffffffffffffda RBX: 0000000000000009 RCX:
0000000008056068
[   19.493886] RDX: 0000000000001000 RSI: 0000000008056008 RDI:
0000000000001000
[   19.496331] RBP: 00000000000001ff R08: 0000000000000000 R09:
0000000000000000
[   19.498768] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[   19.501217] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000000

Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Reported-by: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0e60206

ipv6: addrlabel: remove refcounting · 2809c095

Committed by Eric Dumazet on Oct 09, 2017

After the previous patch ("ipv6: addrlabel: rework ip6addrlbl_get()")
we can remove the refcount from struct ip6addrlbl_entry,
since it is no longer elevated in ip6addrlbl_get().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2809c095

ipv6: addrlabel: rework ip6addrlbl_get() · 66c77ff3

Committed by Eric Dumazet on Oct 09, 2017

If we allocate the skb before the lookup, we can use RCU
without the need for ip6addrlbl_hold().

This means that the following patch can get rid of refcounting.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

66c77ff3

ipv6: avoid zeroing per cpu data again · bfd8e5a4

Committed by Eric Dumazet on Oct 09, 2017

Per-cpu allocations are already zeroed, so there is no need to clear them again.

Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bfd8e5a4

udp: fix bcast packet reception · 996b44fc

Committed by Paolo Abeni on Oct 09, 2017

The commit bc044e8d ("udp: perform source validation for
mcast early demux") does not take into account that broadcast packets
land in the same code path and need different checks for the
source address - notably, a zero source address is valid for bcast
and invalid for mcast.

As a result, the 2nd and later broadcast packets with a zero source
address landing on the same socket are dropped. This breaks dhcp servers.

Since we don't have stringent performance requirements for ingress
broadcast traffic, fix it by disabling UDP early demux for such traffic.

Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: bc044e8d ("udp: perform source validation for mcast early demux")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

996b44fc

netlink: do not set cb_running if dump's start() errs · 41c87425

Committed by Jason A. Donenfeld on Oct 09, 2017

It turns out that multiple places can call netlink_dump(), which means
it's still possible to dereference partially initialized values in
dump() that were the result of a start() call that returned an error.

This fixes the issue by calling start() _before_ setting cb_running to
true, so that there's no chance at all of hitting the dump() function
through any indirect paths.

It also moves the call to start() to be made while the mutex is held. This
has the nice side effect of serializing invocations of start(), which is
likely desirable anyway. It also prevents any possible other races that
might come out of this logic.

In testing this with several different pieces of tricky code to trigger
these issues, this commit fixes all avenues that I'm aware of.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

41c87425
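
The ordering described in the commit above (41c87425) can be sketched as follows. This is a simplified user-space model, not the netlink code: `struct demo_cb`, `struct demo_sock` and `demo_dump_begin()` are hypothetical, and the mutex that the real code holds around this sequence is only mentioned in a comment.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the netlink dump state. */
struct demo_cb {
	int (*start)(struct demo_cb *cb);
	void *state;		/* set up by start(), consumed by dump() */
};

struct demo_sock {
	bool cb_running;
	struct demo_cb cb;
};

static int demo_state;

static int demo_start_ok(struct demo_cb *cb)
{
	cb->state = &demo_state;
	return 0;
}

static int demo_start_fail(struct demo_cb *cb)
{
	(void)cb;		/* state is left unset on failure */
	return -EINVAL;
}

/* In the kernel this sequence would run with the callback mutex held. */
static int demo_dump_begin(struct demo_sock *sk,
			   int (*start)(struct demo_cb *cb))
{
	struct demo_cb cb = { .start = start, .state = NULL };
	int ret;

	if (sk->cb_running)
		return -EBUSY;

	if (cb.start) {
		ret = cb.start(&cb);
		if (ret)
			return ret;	/* cb_running stays false */
	}

	sk->cb = cb;
	sk->cb_running = true;		/* set only after start() succeeded */
	return 0;
}
```

Because `cb_running` is never set when `start()` fails, any path that checks the flag before calling the dump callback cannot observe half-initialized state.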

net: bridge: Export bridge multicast router state · 0912bda4

Committed by Yotam Gigi on Oct 09, 2017

Add an access function that, given a bridge netdevice, returns whether the
bridge device is currently an mrouter or not. The function uses the already
existing br_multicast_is_router function to check that.

This function is needed in order to allow ports that join an already
existing bridge to know the current mrouter state of the bridge device.
Together with the bridge device mrouter ports switchdev notifications, it
is possible to have full offloading of the semantics of the bridge device
mcast router state.

Because the bridge multicast router status can change in the
packet RX path, take the multicast_router bridge spinlock to protect the
read.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0912bda4

net: bridge: Notify on bridge device mrouter state changes · 77041420

Committed by Yotam Gigi on Oct 09, 2017

Add the SWITCHDEV_ATTR_ID_BRIDGE_MROUTER switchdev notification type, used
to indicate whether the bridge is or isn't an mrouter. Notify when the bridge
changes its state, similarly to the already existing bridged port mrouter
notifications.

The notification uses the switchdev_attr.u.mrouter boolean flag to indicate
the current bridge mrouter status. Thus, it only indicates whether the
bridge is currently used as an mrouter or not, and does not indicate the
exact mrouter state of the bridge (learning, permanent, etc.).

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77041420

ipv4: Fix traffic triggered IPsec connections. · 6c0e7284

Committed by Steffen Klassert on Oct 09, 2017

A recent patch removed the dst_free() on the allocated
dst_entry in ipv4_blackhole_route(). The dst_free() marked the
dst_entry as dead and added it to the gc list, i.e. it was set up
for one-time usage. As a result we may now have a blackhole
route cached at a socket in some IPsec scenarios. This makes the
connection unusable.

Fix this by marking the dst_entry directly at allocation time
as 'dead', so it is used only once.

Fixes: b838d5e1 ("ipv4: mark DST_NOGC and remove the operation of dst_free()")
Reported-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c0e7284

ipv6: Fix traffic triggered IPsec connections. · 62cf27e5

Committed by Steffen Klassert on Oct 09, 2017

A recent patch removed the dst_free() on the allocated
dst_entry in ipv6_blackhole_route(). The dst_free() marked
the dst_entry as dead and added it to the gc list, i.e. it
was set up for one-time usage. As a result we may now have
a blackhole route cached at a socket in some IPsec scenarios.
This makes the connection unusable.

Fix this by marking the dst_entry directly at allocation time
as 'dead', so it is used only once.

Fixes: 587fea74 ("ipv6: mark DST_NOGC and remove the operation of dst_free()")
Reported-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62cf27e5

09 Oct, 2017 4 commits

netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' · 98589a09

Committed by Shmulik Ladkani on Oct 09, 2017

Commit 2c16d603 ("netfilter: xt_bpf: support ebpf") introduced
support for attaching an eBPF object by an fd, with the
'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
IPT_SO_SET_REPLACE call.

However this breaks subsequent iptables calls:

 # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
 # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
 iptables: Invalid argument. Run `dmesg' for more information.

That's because iptables works by loading existing rules using
IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
the replacement set.

However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
(from the initial "iptables -m bpf" invocation) - so when the 2nd
invocation occurs, userspace passes a bogus fd number, which causes
'bpf_mt_check_v1' to fail.

One suggested solution [1] was to hack iptables userspace to perform an
"entries fixup" immediately after IPT_SO_GET_ENTRIES, by opening a new,
process-local fd for every 'xt_bpf_info_v1' entry seen.

However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested
deprecating the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.

This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
'.fd' and instead perform an in-kernel lookup for the bpf object given
the provided '.path'.

It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
expected to provide the path of the pinned object.

Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.

References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
            [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2

Reported-by: Rafael Buchbinder <rafi@rbk.ms>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

98589a09

netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook · 49f817d7

Committed by Lin Zhang on Oct 06, 2017

In the function {ipv4,ipv6}_synproxy_hook we expect a normal tcp packet,
but the real server may reply with an icmp error packet related to an
existing tcp conntrack, in which case we would access the wrong tcp data.

Fix it by checking the protocol field and only processing tcp traffic.

Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

49f817d7
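
The guard described in the commit above (49f817d7) amounts to an early return before any TCP parsing. The sketch below is a user-space model with hypothetical names (`struct demo_pkt`, `demo_synproxy_hook()`, the `DEMO_*` constants); it assumes that non-TCP packets are simply let through, which the commit message implies but does not spell out. The protocol numbers 1 (ICMP) and 6 (TCP) are the IANA-assigned values.

```c
#include <stdint.h>

/* IANA protocol numbers, given local names for this sketch. */
#define DEMO_PROTO_ICMP	1
#define DEMO_PROTO_TCP	6

/* Hypothetical packet summary: only the IP protocol field matters here. */
struct demo_pkt {
	uint8_t protocol;
};

enum demo_verdict {
	DEMO_ACCEPT,		/* pass the packet on untouched */
	DEMO_PROCESS_TCP,	/* safe to interpret the payload as TCP */
};

static enum demo_verdict demo_synproxy_hook(const struct demo_pkt *pkt)
{
	/* An ICMP error tied to a TCP conntrack entry must not be parsed as TCP. */
	if (pkt->protocol != DEMO_PROTO_TCP)
		return DEMO_ACCEPT;

	return DEMO_PROCESS_TCP;
}
```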

ipv6: avoid cache line dirtying in ipv6_dev_get_saddr() · cc429c8f

Committed by Eric Dumazet on Oct 07, 2017

By extending the rcu section a bit, we can avoid these
very expensive in6_ifa_put()/in6_ifa_hold() calls
done in __ipv6_dev_get_saddr() and ipv6_dev_get_saddr().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc429c8f

ipv6: __ipv6_dev_get_saddr() rcu conversion · f59c031e

Committed by Eric Dumazet on Oct 07, 2017

Callers hold rcu_read_lock(), so we do not need
the rcu_read_lock()/rcu_read_unlock() pair.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f59c031e

openanolis / cloud-kernel: last synced successfully more than 1 year ago