Commits · 87553aa5212f43d3d14b9b5d1dfba89f1a6e6f21 · openanolis / cloud-kernel

20 May, 2016 3 commits

tipc: block BH in TCP callbacks · b91083a4

Committed by Eric Dumazet on May 17, 2016

TCP stack can now run from process context.

Use read_lock_bh(&sk->sk_callback_lock) variant to restore previous
assumption.

Fixes: 5413d1ba ("net: do not block BH while processing socket backlog")
Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: tcp: block BH in TCP callbacks · 38036629

Committed by Eric Dumazet on May 17, 2016

TCP stack can now run from process context.

Use read_lock_bh(&sk->sk_callback_lock) variant to restore previous
assumption.

Fixes: 5413d1ba ("net: do not block BH while processing socket backlog")
Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

kcm: fix a signedness in kcm_splice_read() · f1971a2e

Committed by WANG Cong on May 17, 2016

skb_splice_bits() returns int, kcm_splice_read() returns ssize_t,
both are signed.
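
To illustrate why the signedness matters, here is a minimal userspace sketch. It assumes the problem was a signed return value being stored in an unsigned variable (the message does not spell this out); fake_splice_bits() and both wrappers are hypothetical names, not kernel functions.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: returns bytes copied, or a negative errno-style value. */
static int fake_splice_bits(int fail)
{
    return fail ? -11 : 128;
}

/* Signed result: the error stays negative and the caller can test for it. */
ssize_t read_signed(int fail)
{
    ssize_t copied = fake_splice_bits(fail);

    return copied;
}

/* Unsigned result: a negative return wraps to a very large byte count. */
size_t read_unsigned(int fail)
{
    size_t copied = (size_t)fake_splice_bits(fail);

    return copied;
}
```

With a failing helper, read_signed(1) stays negative, while read_unsigned(1) yields a huge positive count that a caller could mistake for success.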

We may need another patch to make them all ssize_t, but that
deserves a separate patch.

Fixes: 91687355 ("kcm: Splice support")
Reported-by: David Binderman <linuxdev.baldrick@gmail.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 May, 2016 3 commits

switchdev: pass pointer to fib_info instead of copy · da4ed551

Committed by Jiri Pirko on May 17, 2016

The problem is that fib_info->nh is [0] so the struct fib_info
allocation size depends on number of nexthops. If we just copy fib_info,
we do not copy the nexthops info and driver accesses memory which is not
ours.

Given the fact that fib4 does not defer operations and therefore it does
not need copy, just pass the pointer down to drivers as it was done
before.
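
The size problem described above can be sketched in plain C. This is only an illustration with a hypothetical struct route_info standing in for struct fib_info; the field names and values are assumptions.

```c
#include <stdlib.h>

/* Hypothetical miniature of a struct whose size depends on a trailing array. */
struct route_info {
    int nhs;    /* number of next hops */
    int nh[];   /* flexible array member, comparable to fib_info->nh */
};

/* Allocate the header plus n next hops and fill them with sample values. */
struct route_info *route_alloc(int n)
{
    struct route_info *ri = malloc(sizeof(*ri) + (size_t)n * sizeof(ri->nh[0]));

    if (!ri)
        return NULL;
    ri->nhs = n;
    for (int i = 0; i < n; i++)
        ri->nh[i] = 100 + i;
    return ri;
}

/* Bytes a plain struct copy would carry: the header only. */
size_t route_copy_size(void)
{
    return sizeof(struct route_info);
}

/* Bytes actually owned by the object, next hops included. */
size_t route_full_size(const struct route_info *ri)
{
    return sizeof(*ri) + (size_t)ri->nhs * sizeof(ri->nh[0]);
}

/* Passing the pointer keeps the next hops reachable. */
int route_last_nh(const struct route_info *ri)
{
    return ri->nh[ri->nhs - 1];
}
```

A by-value copy covers only route_copy_size() bytes, so any access to nh[] through the copy reads memory that was never copied; handing the original pointer down avoids that.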

Fixes: 850d0cbc ("switchdev: remove pointers from switchdev objects")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: close another race condition in tcf_mirred_release() · dc327f89

Committed by WANG Cong on May 16, 2016

We saw the following extra refcount release on veth device:

  kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1

Since we heavily use mirred action to redirect packets to veth, I think
this is caused by the following race condition:

CPU0:
tcf_mirred_release(): (in RCU callback)
	struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1);

CPU1:
mirred_device_event():
        spin_lock_bh(&mirred_list_lock);
        list_for_each_entry(m, &mirred_list, tcfm_list) {
                if (rcu_access_pointer(m->tcfm_dev) == dev) {
                        dev_put(dev);
                        /* Note : no rcu grace period necessary, as
                         * net_device are already rcu protected.
                         */
                        RCU_INIT_POINTER(m->tcfm_dev, NULL);
                }
        }
        spin_unlock_bh(&mirred_list_lock);

CPU0:
tcf_mirred_release():
        spin_lock_bh(&mirred_list_lock);
        list_del(&m->tcfm_list);
        spin_unlock_bh(&mirred_list_lock);
        if (dev)               // <======== Still refers to the old m->tcfm_dev
                dev_put(dev);  // <======== dev_put() is called on it again

The action init code path is good because it is impossible to modify
an action that is being removed.

So, fix this by moving everything under the spinlock.
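
The interleaving above can be replayed deterministically in a single thread. The following is a rough model, not the kernel code: struct dev, struct action and all function names are hypothetical, and the lock is only indicated by comments.

```c
#include <stddef.h>

/* Hypothetical miniature objects; not the kernel structures. */
struct dev { int refcnt; };
struct action { struct dev *dev; };

static void dev_put(struct dev *d)
{
    d->refcnt--;
}

/* Device-event path: drops the reference and clears the pointer. */
void device_event(struct action *a, struct dev *d)
{
    if (a->dev == d) {
        dev_put(d);
        a->dev = NULL;
    }
}

/* Racy release: the pointer was sampled before the event handler ran. */
void release_racy(struct action *a, struct dev *sampled_before_event)
{
    (void)a;
    if (sampled_before_event)
        dev_put(sampled_before_event);  /* second put on the same reference */
}

/* Fixed release: the pointer is read inside the critical section,
 * so it observes the NULL written by the event handler. */
void release_fixed(struct action *a)
{
    /* the list lock would be taken here */
    struct dev *d = a->dev;

    if (d)
        dev_put(d);
    /* the list lock would be released here */
}
```

Starting from a reference count of 1, the racy ordering ends at -1, which matches the "Usage count = -1" in the log line, while the fixed ordering ends at 0.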

Fixes: 2ee22a90 ("net_sched: act_mirred: remove spinlock in fast path")
Fixes: 6bd00b85 ("act_mirred: fix a race condition on mirred_list")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix nametable publication field in nl compat · 03aaaa9b

Committed by Richard Alpe on May 17, 2016

The publication field of the old netlink API should contain the
publication key and not the publication reference.

Fixes: 44a8ae94 (tipc: convert legacy nl name table dump to nl compat)
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 May, 2016 15 commits

netlink: Fix dump skb leak/double free · 92964c79

Committed by Herbert Xu on May 16, 2016

When we free cb->skb after a dump, we do it after releasing the
lock.  This means that a new dump could have started in the
meantime and we'll end up freeing their skb instead of ours.

This patch saves the skb and module before we unlock so we free
the right memory.

Fixes: 16b304f3 ("netlink: Eliminate kmalloc in netlink dump operation.")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: check nl sock before parsing nested attributes · 45e093ae

Committed by Richard Alpe on May 16, 2016

Make sure the socket for which the user is listing publication exists
before parsing the socket netlink attributes.

Prior to this patch a call without any socket caused a NULL pointer
dereference in tipc_nl_publ_dump().

Tested-and-reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.cm>
Signed-off-by: David S. Miller <davem@davemloft.net>

fq_codel: fix memory limitation drift · 77f57761

Committed by Eric Dumazet on May 15, 2016

memory_usage must be decreased in dequeue_func(), not in
fq_codel_dequeue(), otherwise packets dropped by Codel algo
are missing this decrease.

Also we need to clear memory_usage in fq_codel_reset()
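
A small sketch of the accounting idea, assuming a simplified array-backed queue. Only memory_usage and dequeue_func are names from the text above; struct pkt, struct queue, queue_dequeue() and queue_reset() are hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical miniature packet and queue. */
struct pkt { unsigned int truesize; int drop; };

struct queue {
    struct pkt *pkts;
    size_t head, len;
    unsigned int memory_usage;
};

/* Every packet leaving the queue passes through here, so the
 * accounting is updated for delivered and dropped packets alike. */
static struct pkt *dequeue_func(struct queue *q)
{
    if (q->head == q->len)
        return NULL;

    struct pkt *p = &q->pkts[q->head++];

    q->memory_usage -= p->truesize;
    return p;
}

/* Stand-in for the AQM step: skip packets marked for dropping. */
struct pkt *queue_dequeue(struct queue *q)
{
    struct pkt *p;

    while ((p = dequeue_func(q)) != NULL && p->drop)
        ;   /* dropped packet: already accounted for in dequeue_func() */
    return p;
}

/* Reset empties the queue and clears the counter as well. */
void queue_reset(struct queue *q)
{
    q->head = q->len = 0;
    q->memory_usage = 0;
}
```

If the decrement lived only in queue_dequeue(), the two dropped packets in a run such as drop, drop, deliver would never be subtracted and the counter would drift upward.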

Fixes: 95b58430 ("fq_codel: add memory limitation per queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: also make sch_handle_egress() drop monitor ready · 7e2c3aea

Committed by Daniel Borkmann on May 15, 2016

Follow-up for 8a3a4c6e ("net: make sch_handle_ingress() drop
monitor ready") to also make the egress side drop monitor ready.

Also here only TC_ACT_SHOT is a clear indication that something
went wrong. Hence don't provide false positives to drop monitors
such as 'perf record -e skb:kfree_skb ...'.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Use setup_timer and mod_timer. · 15db6e0d

Committed by Muhammad Falak R Wani on May 15, 2016

The function setup_timer combines the initialization of a timer with the
initialization of the timer's function and data fields. The multiline
code for timer initialization is now replaced with function setup_timer.

Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
   active timer (if the timer is inactive it will be activated).

Use setup_timer() and mod_timer() to set up and arm a timer, making the
code compact and aiding readability.

Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add generic constant blinding for use in jits · 4f3446bb

Committed by Daniel Borkmann on May 13, 2016

This work adds a generic facility for use from eBPF JIT compilers
that allows for further hardening of JIT generated images through
blinding constants. In response to the original work on BPF JIT
spraying published by Keegan McAllister [1], most BPF JITs were
changed to make images read-only and start at a randomized offset
in the page, where the rest was filled with trap instructions. We
have this nowadays in x86, arm, arm64 and s390 JIT compilers.
Additionally, later work also made eBPF interpreter images read
only for kernels supporting DEBUG_SET_MODULE_RONX, that is, x86,
arm, arm64 and s390 archs as well currently. This is done by
default for mentioned JITs when JITing is enabled. Furthermore,
we had a generic and configurable constant blinding facility on our
todo for quite some time now to further make spraying harder, and
first implementation since around netconf 2016.

We found that for systems where untrusted users can load cBPF/eBPF
code where JIT is enabled, start offset randomization helps a bit
to make jumps into crafted payload harder, but in case where larger
programs that cross page boundary are injected, we again have some
part of the program opcodes at a page start offset. With improved
guessing and more reliable payload injection, chances can increase
to jump into such payload. Elena Reshetova recently wrote a test
case for it [2, 3]. Moreover, eBPF comes with 64 bit constants, which
can leave some more room for payloads. Note that for all this,
additional bugs in the kernel are still required to make the jump
(and of course to guess right, to not jump into a trap) and naturally
the JIT must be enabled, which is disabled by default.

For helping mitigation, the general idea is to provide an option
bpf_jit_harden that admins can tweak along with bpf_jit_enable, so
that for cases where JIT should be enabled for performance reasons,
the generated image can be further hardened with blinding constants
for unprivileged users (bpf_jit_harden == 1), trading off
performance for these, but not for privileged ones. We also added
the option of blinding for all users (bpf_jit_harden == 2), which
is quite helpful for testing, e.g. with test_bpf.ko. No further
hardening levels of the bpf_jit_harden switch are intended; the
rationale is to have it dead simple to use as on/off. Since this
functionality would need to be duplicated over and over for JIT
compilers to use, which are already complex enough, we provide a
generic eBPF byte-code level based blinding implementation, which is
then just transparently JITed. JIT compilers need to make only a few
changes to integrate this facility and can be migrated one by one.

This option is for eBPF JITs and will be used in x86, arm64, s390
without too much effort, and soon ppc64 JITs, thus that native eBPF
can be blinded as well as cBPF to eBPF migrations, so that both can
be covered with a single implementation. The rule for JITs is that
bpf_jit_blind_constants() must be called from bpf_int_jit_compile(),
and in case blinding is disabled, we follow normally with JITing the
passed program. In case blinding is enabled and we fail during the
process of blinding itself, we must return with the interpreter.
Similarly, in case the JITing process after the blinding failed, we
return normally to the interpreter with the non-blinded code. Meaning,
interpreter doesn't change in any way and operates on eBPF code as
usual. For doing this pre-JIT blinding step, we need to make use of
a helper/auxiliary register, here BPF_REG_AX. This is strictly internal
to the JIT and not in any way part of the eBPF architecture. Just like
in the same way as JITs internally make use of some helper registers
when emitting code, only that here the helper register is one
abstraction level higher in eBPF bytecode, but nevertheless in JIT
phase. That helper register is needed since f.e. manually written
program can issue loads to all registers of eBPF architecture.

The core concept with the additional register is: blind out all 32
and 64 bit constants by converting BPF_K based instructions into a
small sequence from K_VAL into ((RND ^ K_VAL) ^ RND). Therefore, this
is transformed into: BPF_REG_AX := (RND ^ K_VAL), BPF_REG_AX ^= RND,
and REG <OP> BPF_REG_AX, so actual operation on the target register
is translated from BPF_K into BPF_X one that is operating on
BPF_REG_AX's content. During rewriting phase when blinding, RND is
newly generated via prandom_u32() for each processed instruction.
64 bit loads are split into two 32 bit loads to make translation and
patching not too complex. Only basic thing required by JITs is to
call the helper bpf_jit_blind_constants()/bpf_jit_prog_release_other()
pair, and to map BPF_REG_AX into an unused register.

Small bpf_jit_disasm extract from [2] when applied to x86 JIT:

echo 0 > /proc/sys/net/core/bpf_jit_harden

ffffffffa034f5e9 + <x>:
[...]
39: mov $0xa8909090,%eax
3e: mov $0xa8909090,%eax
43: mov $0xa8ff3148,%eax
48: mov $0xa89081b4,%eax
4d: mov $0xa8900bb0,%eax
52: mov $0xa810e0c1,%eax
57: mov $0xa8908eb4,%eax
5c: mov $0xa89020b0,%eax
[...]

echo 1 > /proc/sys/net/core/bpf_jit_harden

ffffffffa034f1e5 + <x>:
[...]
39: mov $0xe1192563,%r10d
3f: xor $0x4989b5f3,%r10d
46: mov %r10d,%eax
49: mov $0xb8296d93,%r10d
4f: xor $0x10b9fd03,%r10d
56: mov %r10d,%eax
59: mov $0x8c381146,%r10d
5f: xor $0x24c7200e,%r10d
66: mov %r10d,%eax
69: mov $0xeb2a830e,%r10d
6f: xor $0x43ba02ba,%r10d
76: mov %r10d,%eax
79: mov $0xd9730af,%r10d
7f: xor $0xa5073b1f,%r10d
86: mov %r10d,%eax
89: mov $0x9a45662b,%r10d
8f: xor $0x325586ea,%r10d
96: mov %r10d,%eax
[...]

As can be seen, original constants that carry payload are hidden
when enabled, actual operations are transformed from constant-based
to register-based ones, making jumps into constants ineffective.
Above extract/example uses single BPF load instruction over and
over, but of course all instructions with constants are blinded.
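
The rewrite from K_VAL into ((RND ^ K_VAL) ^ RND) can be modelled in a few lines of userspace C. This is a sketch of the arithmetic only, not the kernel implementation: struct blinded_imm and both functions are hypothetical, and reg_ax merely plays the role of BPF_REG_AX.

```c
#include <stdint.h>

/* The two immediates that replace one 32-bit constant K_VAL. */
struct blinded_imm {
    uint32_t masked;   /* RND ^ K_VAL: the only form of K_VAL left in the image */
    uint32_t rnd;      /* RND: emitted as a second immediate */
};

/* Rewrite step: derive both immediates from K_VAL and a random value. */
struct blinded_imm blind_constant(uint32_t k_val, uint32_t rnd)
{
    struct blinded_imm b = { k_val ^ rnd, rnd };

    return b;
}

/* Run-time effect of the rewritten sequence for "REG += K_VAL". */
uint32_t run_blinded_add(uint32_t reg, struct blinded_imm b)
{
    uint32_t reg_ax = b.masked;   /* BPF_REG_AX := (RND ^ K_VAL) */

    reg_ax ^= b.rnd;              /* BPF_REG_AX ^= RND           */
    return reg + reg_ax;          /* REG <OP> BPF_REG_AX         */
}
```

Plugging in the first mov/xor pair from the hardened extract, 0xe1192563 ^ 0x4989b5f3 gives back 0xa8909090, the constant visible in the unhardened extract, so the computed result is unchanged while the attacker-chosen value never appears as an immediate.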

Performance wise, JIT with blinding performs a bit slower than just
JIT and faster than interpreter case. This is expected, since we
still get all the performance benefits from JITing and in normal
use-cases not every single instruction needs to be blinded. Summing
up all 296 test cases averaged over multiple runs from test_bpf.ko
suite, interpreter was 55% slower than JIT only and JIT with blinding
was 8% slower than JIT only. Since there are also some extremes in
the test suite, I expect for ordinary workloads that the performance
for the JIT with blinding case is even closer to JIT only case,
f.e. nmap test case from suite has averaged timings in ns 29 (JIT),
35 (+ blinding), and 151 (interpreter).

BPF test suite, seccomp test suite, eBPF sample code and various
bigger networking eBPF programs have been tested with this and were
running fine. For testing purposes, I also adapted interpreter and
redirected blinded eBPF image to interpreter and also here all tests
pass.

[1] http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
[2] https://github.com/01org/jit-spray-poc-for-ksp/
[3] http://www.openwall.com/lists/kernel-hardening/2016/05/03/5

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Elena Reshetova <elena.reshetova@intel.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: prepare bpf_int_jit_compile/bpf_prog_select_runtime apis · d1c55ab5

Committed by Daniel Borkmann on May 13, 2016

Since the blinding is strictly only called from inside eBPF JITs,
we need to change signatures for bpf_int_jit_compile() and
bpf_prog_select_runtime() first in order to prepare that the
eBPF program we're dealing with can change underneath. Hence,
for call sites, we need to return the latest prog. No functional
change in this patch.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: split HAVE_BPF_JIT into cBPF and eBPF variant · 6077776b

Committed by Daniel Borkmann on May 13, 2016

Split the HAVE_BPF_JIT into two for distinguishing cBPF and eBPF JITs.

Current cBPF ones:

  # git grep -n HAVE_CBPF_JIT arch/
  arch/arm/Kconfig:44:    select HAVE_CBPF_JIT
  arch/mips/Kconfig:18:   select HAVE_CBPF_JIT if !CPU_MICROMIPS
  arch/powerpc/Kconfig:129:       select HAVE_CBPF_JIT
  arch/sparc/Kconfig:35:  select HAVE_CBPF_JIT

Current eBPF ones:

  # git grep -n HAVE_EBPF_JIT arch/
  arch/arm64/Kconfig:61:  select HAVE_EBPF_JIT
  arch/s390/Kconfig:126:  select HAVE_EBPF_JIT if PACK_STACK && HAVE_MARCH_Z196_FEATURES
  arch/x86/Kconfig:94:    select HAVE_EBPF_JIT                    if X86_64

Later code also needs this facility to check for eBPF JITs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: minor cleanups in ebpf code · 4936e352

Committed by Daniel Borkmann on May 13, 2016

Besides others, remove redundant comments where the code is self
documenting enough, and properly indent various bpf_verifier_ops
and bpf_prog_type_list declarations. Moreover, remove two exports
that actually have no module user.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: minor optimizations around tcp_hdr() usage · ea1627c2

Committed by Eric Dumazet on May 13, 2016

tcp_hdr() is slightly more expensive than using skb->data in contexts
where we know they point to the same byte.

In receive path, tcp_v4_rcv() and tcp_v6_rcv() are in this situation,
as tcp header has not been pulled yet.

In output path, the same can be said when we just pushed the tcp header
in the skb, in tcp_transmit_skb() and tcp_make_synack()

Also factorize the two checks for tcb->tcp_flags & TCPHDR_SYN in
tcp_transmit_skb() and pass tcp header pointer to tcp_ecn_send(),
so that compiler can further optimize and avoid a reload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: propagate __sock_cmsg_send() error · 2632616b

Committed by Eric Dumazet on May 13, 2016

__sock_cmsg_send() might return different error codes, not only -EINVAL.

Fixes: 24025c46 ("ipv4: process socket-level control messages in IPv4")
Fixes: ad1e46a8 ("ipv6: process socket-level control messages in IPv6")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qrtr: fix build problems · a986a05d

Committed by Arnd Bergmann on May 13, 2016

Having multiple loadable modules with the same name cannot work
with modprobe, and having both net/qrtr/smd.ko and drivers/soc/qcom/smd.ko
results in a (somewhat cryptic) build error:

ERROR: "qcom_smd_driver_unregister" [net/qrtr/smd.ko] undefined!
ERROR: "qcom_smd_driver_register" [net/qrtr/smd.ko] undefined!
ERROR: "qcom_smd_set_drvdata" [net/qrtr/smd.ko] undefined!
ERROR: "qcom_smd_send" [net/qrtr/smd.ko] undefined!
ERROR: "qcom_smd_get_drvdata" [net/qrtr/smd.ko] undefined!
ERROR: "qcom_smd_driver_unregister" [drivers/soc/qcom/wcnss_ctrl.ko] undefined!
ERROR: "qcom_smd_driver_register" [drivers/soc/qcom/wcnss_ctrl.ko] undefined!
ERROR: "qcom_smd_set_drvdata" [drivers/soc/qcom/wcnss_ctrl.ko] undefined!
ERROR: "qcom_smd_send" [drivers/soc/qcom/wcnss_ctrl.ko] undefined!
ERROR: "qcom_smd_get_drvdata" [drivers/soc/qcom/wcnss_ctrl.ko] undefined!

Also, the qrtr driver uses the SMD interface and has a Kconfig dependency,
but also allows for compile-testing when SMD is disabled. However,
with QCOM_SMD=m and COMPILE_TEST=y we can end up with QRTR_SMD=y and
that fails with a related link error.

This changes the dependency so we can still compile-test the driver but
not have it built-in if SMD is a module, to avoid running in the broken
configuration, and changes the Makefile to provide the driver under
a different module name.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: bdabad3e ("net: Add Qualcomm IPC router")
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_flower: Hardware offloaded filters statistics support · 10cbc684

Committed by Amir Vadai on May 13, 2016

Introduce a new command in ndo_setup_tc() for hardware offloaded
filters, to call the NIC driver, and make it update the statistics.
This will be done before dumping the filter and its statistics.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_gact: Update statistics when offloaded to hardware · 9fea47d9

Committed by Amir Vadai on May 13, 2016

Implement the stats_update callback that will be called by NIC drivers
for hardware offloaded filters.

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cls_u32: Add support for skip-sw flag to tc u32 classifier. · d34e3e18

Committed by Samudrala, Sridhar on May 12, 2016

On devices that support TC U32 offloads, this flag enables a filter to be
added only to HW. skip-sw and skip-hw are mutually exclusive flags. By
default without any flags, the filter is added to both HW and SW, but no
error checks are done in case of failure to add to HW. With skip-sw,
failure to add to HW is treated as an error.

Here is a sample script that adds 2 filters, one with skip-sw and the other
with skip-hw flag.

   # add ingress qdisc
   tc qdisc add dev p4p1 ingress

   # enable hw tc offload.
   ethtool -K p4p1 hw-tc-offload on

   # add u32 filter with skip-sw flag.
   tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
      handle 800:0:1 u32 ht 800: flowid 800:1 \
      skip-sw \
      match ip src 192.168.1.0/24 \
      action drop

   # add u32 filter with skip-hw flag.
   tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
      handle 800:0:2 u32 ht 800: flowid 800:2 \
      skip-hw \
      match ip src 192.168.2.0/24 \
      action drop
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 May, 2016 3 commits

net: switchdev: Drop EXPERIMENTAL from description · 8fbb89c6

Committed by Florian Fainelli on May 14, 2016

Switchdev has been around for quite a while now, putting "EXPERIMENTAL"
in the description is no longer accurate, drop it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/route: enforce hoplimit max value · 626abd59

Committed by Paolo Abeni on May 13, 2016

Currently, when creating or updating a route, no check is performed
on the hoplimit value in either the ipv4 or the ipv6 code.

The caller can, e.g., set hoplimit to 256, and when such a route is
used, packets will be sent with hoplimit/ttl equal to 0.

This commit adds checks for the RTAX_HOPLIMIT value, in both the ipv4
and ipv6 route code, substituting any value greater than 255 with 255.

This is consistent with what is currently done for ADVMSS and MTU
in the ipv4 code.
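
A tiny sketch of the effect, assuming the hop limit ends up in an 8-bit field (which would explain 256 turning into 0); both function names are hypothetical.

```c
#include <stdint.h>

/* Storing an unchecked value in an 8-bit field keeps only the low byte. */
uint8_t hoplimit_unchecked(uint32_t val)
{
    return (uint8_t)val;
}

/* Clamp as the commit describes: anything above 255 becomes 255. */
uint8_t hoplimit_clamped(uint32_t val)
{
    return (uint8_t)(val > 255 ? 255 : val);
}
```

Here hoplimit_unchecked(256) evaluates to 0, while hoplimit_clamped(256) evaluates to 255 and in-range values such as 64 pass through unchanged.
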
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nf_conntrack: avoid kernel pointer value leak in slab name · 31b0b385

Committed by Linus Torvalds on May 14, 2016

The slab name ends up being visible in the directory structure under
/sys, and even if you don't have access rights to the file you can see
the filenames.

Just use a 64-bit counter instead of the pointer to the 'net' structure
to generate a unique name.

This code will go away in 4.7 when the conntrack code moves to a single
kmemcache, but this is the backportable simple solution to avoiding
leaking kernel pointers to user space.
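
The naming idea can be sketched in userspace C as follows. The name format, the cache_seq counter and make_cache_name() are assumptions for illustration; kernel code would also need an atomic increment rather than the plain one used here.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical global counter; not atomic, so single-threaded use only. */
static uint64_t cache_seq;

/* Build a unique cache name from a counter instead of an object address,
 * so nothing about the memory layout shows up in the name. */
int make_cache_name(char *buf, size_t len)
{
    return snprintf(buf, len, "nf_conntrack_%" PRIu64, ++cache_seq);
}
```

Successive calls produce names ending in 1, 2, 3 and so on, which stay unique without exposing any address.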

Fixes: 5b3501fa ("netfilter: nf_conntrack: per netns nf_conntrack_cachep")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

13 May, 2016 6 commits

Bluetooth: fix power_on vs close race · bf389cab

Committed by Jiri Slaby on May 13, 2016

With all the latest fixes applied, I am still able to reproduce this
(and other) warning(s):
WARNING: CPU: 1 PID: 19684 at ../kernel/workqueue.c:4092 destroy_workqueue+0x70a/0x770()
...
Call Trace:
 [<ffffffff819fee81>] ? dump_stack+0xb3/0x112
 [<ffffffff8117377e>] ? warn_slowpath_common+0xde/0x140
 [<ffffffff811ce68a>] ? destroy_workqueue+0x70a/0x770
 [<ffffffff811739ae>] ? warn_slowpath_null+0x2e/0x40
 [<ffffffff811ce68a>] ? destroy_workqueue+0x70a/0x770
 [<ffffffffa0c944c9>] ? hci_unregister_dev+0x2a9/0x720 [bluetooth]
 [<ffffffffa0b301db>] ? vhci_release+0x7b/0xf0 [hci_vhci]
 [<ffffffffa0b30160>] ? vhci_flush+0x50/0x50 [hci_vhci]
 [<ffffffff8117cd73>] ? do_exit+0x863/0x2b90

This is due to a race present in the hci_unregister_dev path.
hdev->power_on work races with hci_dev_do_close. One tries to open,
the other tries to close, leading to a warning like the one above. (Another
example is a warning in kobject_get or kobject_put depending on who
wins the race.)

Fix this by switching those two racers to ensure hdev->power_on never
triggers while hci_dev_do_close is in progress.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

udp: Resolve NULL pointer dereference over flow-based vxlan device · ed7cbbce

Committed by Alexander Duyck on May 12, 2016

While testing an OpenStack configuration using VXLANs I saw the following
call trace:

 RIP: 0010:[<ffffffff815fad49>] udp4_lib_lookup_skb+0x49/0x80
 RSP: 0018:ffff88103867bc50  EFLAGS: 00010286
 RAX: ffff88103269bf00 RBX: ffff88103269bf00 RCX: 00000000ffffffff
 RDX: 0000000000004300 RSI: 0000000000000000 RDI: ffff880f2932e780
 RBP: ffff88103867bc60 R08: 0000000000000000 R09: 000000009001a8c0
 R10: 0000000000004400 R11: ffffffff81333a58 R12: ffff880f2932e794
 R13: 0000000000000014 R14: 0000000000000014 R15: ffffe8efbfd89ca0
 FS:  0000000000000000(0000) GS:ffff88103fd80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000488 CR3: 0000000001c06000 CR4: 00000000001426e0
 Stack:
  ffffffff81576515 ffffffff815733c0 ffff88103867bc98 ffffffff815fcc17
  ffff88103269bf00 ffffe8efbfd89ca0 0000000000000014 0000000000000080
  ffffe8efbfd89ca0 ffff88103867bcc8 ffffffff815fcf8b ffff880f2932e794
 Call Trace:
  [<ffffffff81576515>] ? skb_checksum+0x35/0x50
  [<ffffffff815733c0>] ? skb_push+0x40/0x40
  [<ffffffff815fcc17>] udp_gro_receive+0x57/0x130
  [<ffffffff815fcf8b>] udp4_gro_receive+0x10b/0x2c0
  [<ffffffff81605863>] inet_gro_receive+0x1d3/0x270
  [<ffffffff81589e59>] dev_gro_receive+0x269/0x3b0
  [<ffffffff8158a1b8>] napi_gro_receive+0x38/0x120
  [<ffffffffa0871297>] gro_cell_poll+0x57/0x80 [vxlan]
  [<ffffffff815899d0>] net_rx_action+0x160/0x380
  [<ffffffff816965c7>] __do_softirq+0xd7/0x2c5
  [<ffffffff8107d969>] run_ksoftirqd+0x29/0x50
  [<ffffffff8109a50f>] smpboot_thread_fn+0x10f/0x160
  [<ffffffff8109a400>] ? sort_range+0x30/0x30
  [<ffffffff81096da8>] kthread+0xd8/0xf0
  [<ffffffff81693c82>] ret_from_fork+0x22/0x40
  [<ffffffff81096cd0>] ? kthread_park+0x60/0x60

The above trace is seen when receiving a DHCP request over a flow-based
VXLAN tunnel.  I believe this is caused by the metadata dst having a NULL
dev value and as a result dev_net(dev) is causing a NULL pointer dereference.

To resolve this I am replacing the check for skb_dst(skb)->dev with just
skb->dev.  This makes sense as the callers of this function are usually in
the receive path and as such skb->dev should always be populated.  In
addition other functions in the area where these are called are already
using dev_net(skb->dev) to determine the namespace the UDP packet belongs
in.

Fixes: 63058308 ("udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skb")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sunrpc: set SOCK_FASYNC · b4411457

Committed by Eric Dumazet on May 12, 2016

sunrpc is using SOCKWQ_ASYNC_NOSPACE without setting SOCK_FASYNC,
so the recent optimizations done in sk_set_bit() and sk_clear_bit()
broke it.

There is still the risk that a subsequent sock_fasync() call
would clear SOCK_FASYNC, but sunrpc does not use this yet.

Fixes: 9317bb69 ("net: SOCKWQ_ASYNC_NOSPACE optimizations")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jiri Pirko <jiri@resnulli.us>
Reported-by: Huang, Ying <ying.huang@intel.com>
Tested-by: Jiri Pirko <jiri@resnulli.us>
Tested-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: eliminate risk of double link_up events · e7142c34

Committed by Jon Paul Maloy on May 11, 2016

When an ACTIVATE or data packet is received in a link in state
ESTABLISHING, the link does not immediately change state to
ESTABLISHED, but does instead return a LINK_UP event to the caller,
which will execute the state change in a different lock context.

This non-atomic approach incurs a low risk that we may have two
LINK_UP events pending simultaneously for the same link, resulting
in the final part of the setup procedure being executed twice. The
only potential harm caused by this is that we may see two LINK_UP
events issued to subscribers of the topology server, something that
may cause confusion.

This commit eliminates this risk by checking if the link is already
up before proceeding with the second half of the setup.
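
The check described above amounts to making the second half of the setup idempotent. A minimal sketch, with a hypothetical struct link and link_finish_setup() rather than the real TIPC code:

```c
/* Hypothetical miniature of a link; not the TIPC structures. */
enum link_state { LINK_ESTABLISHING, LINK_ESTABLISHED };

struct link {
    enum link_state state;
    int up_notifications;   /* LINK_UP events sent to subscribers */
};

/* Second half of link setup. Returns 1 if the setup ran, 0 if skipped. */
int link_finish_setup(struct link *l)
{
    if (l->state == LINK_ESTABLISHED)
        return 0;           /* duplicate LINK_UP event: nothing to do */

    l->state = LINK_ESTABLISHED;
    l->up_notifications++;
    return 1;
}
```

Handling two pending LINK_UP events for the same link then notifies subscribers only once.
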
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gre: Fix wrong tpi->proto in WCCP · da73b4e9

Committed by Haishuang Yan on May 11, 2016

When dealing with WCCP in gre6 tunnel, it sets the wrong tpi->protocol,
that is, ETH_P_IP instead of ETH_P_IPV6 for the encapsulated traffic.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_gre: Fix get_size calculation for gre6 tunnel · 23f72215

Committed by Haishuang Yan on May 11, 2016

Do not include attribute IFLA_GRE_TOS.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 May, 2016 10 commits

mac80211: allow software PS-Poll/U-APSD with AP_LINK_PS · 46fa38e8

Committed by Johannes Berg on May 03, 2016

When using RSS, frames might not be processed in the correct order,
and thus AP_LINK_PS must be used; most likely with firmware keeping
track of the powersave state, this is the case in iwlwifi now.

In this case, the driver can use ieee80211_sta_ps_transition() to
still have mac80211 manage powersave buffering. However, for U-APSD
and PS-Poll this isn't sufficient. If the device can't manage that
entirely on its own, mac80211's code should be used.

To allow this, export two functions: ieee80211_sta_uapsd_trigger()
and ieee80211_sta_pspoll().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: make wdev_list accessible to drivers · 53873f13

Committed by Johannes Berg on May 03, 2016

There's no harm in having drivers read the list, since they can
use RCU protection or RTNL locking; allow this to not require
each and every driver to also implement its own bookkeeping.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: remove erroneous comment · 8b9b2f06

Committed by Johannes Berg on May 03, 2016

The devlist_mtx mutex was removed about two years ago, in favour of just
using RTNL/RCU protection. Remove the comment still referencing it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cfg80211: allow finding vendor with OUI without specifying the OUI type · 9e9ea439

Committed by Emmanuel Grumbach on May 03, 2016

This allows finding vendor IE from a specific vendor.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: allow same PN for AMSDU sub-frames · f631a77b

Committed by Sara Sharon on May 03, 2016

Some hardware (iwlwifi, for example) de-aggregates AMSDUs and copies the
IV as is to the generated MPDUs, so the same PN appears in multiple
packets without being a replay attack.  Allow driver to explicitly
indicate that a frame is allowed to have the same PN as the previous
frame.

Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

mac80211: remove disconnected APs from BSS table · 20eb7ea9

Committed by David Spinadel on May 03, 2016

In some cases, after the sudden disappearance of an AP and reconnection to
another AP in the same ESS, user space gets the old AP in scan
results (cached). User space may decide to roam to that old AP
which will cause a disconnection and longer recovery.
Remove APs that are probably out of range from BSS table.

Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

dsa: Rename switch chip data to cd · ff04955c

Committed by Andrew Lunn on May 10, 2016

The dsa_switch structure contains a dsa_chip_data member called pd.
However in the rest of the code, pd is used for dsa_platform_data.
This is confusing. Rename it cd, which is already often used in dsa.c
and slave.c for this data type.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Remove master_dev from switch structure · c33063d6

Committed by Andrew Lunn on May 10, 2016

The switch drivers only use the master_dev member for dev_info()
messages.  Now that the device is passed to the old style probe, and
new style drivers are probed as true linux drivers, this is no longer
needed.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Move gpio reset into switch driver · 52638f71

Committed by Andrew Lunn on May 10, 2016

Resetting the switch is something the driver does, not the framework.
So move the parsing of this property into the driver.

There are no in kernel users of this property, so moving it does not
break anything. There is however a board which will make use of this
property making its way into the kernel.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: original ingress device index in PKTINFO · 0b922b7a

Committed by David Ahern on May 10, 2016

Applications such as OSPF and BFD need the original ingress device not
the VRF device; the latter can be derived from the former. To that end
add the skb_iif to inet_skb_parm and set it in ipv4 code after clearing
the skb control buffer similar to IPv6. From there the pktinfo can just
pull it from cb with the PKTINFO_SKB_CB cast.

The previous patch moving the skb->dev change to L3 means nothing else
is needed for IPv6; it just works.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
