1. 28 December 2017, 3 commits
2. 27 December 2017, 1 commit
    • rtnetlink: Replace implementation of ASSERT_RTNL() macro with WARN_ONCE() · 66364bdf
      Authored by Leon Romanovsky
      The ASSERT_RTNL() macro is an open-coded variant of WARN_ONCE(), with
      two exceptions. First, it prints a stack trace on every hit, not only
      once as WARN_ONCE() does. Second, the user can disable WARN_ONCE()
      prints by setting CONFIG_BUG to N.
      
      The multiple stack dumps are not actually needed: calls without the
      rtnl lock held are programming errors, and the user can do nothing
      about them except complain to the mailing list after the first
      occurrence of such a failure.
      
      A user who disabled BUG/WARN prints did so explicitly, because this
      option is enabled by default in the upstream kernel and in
      distributions. Such a user does not want to see prints about missing
      locks either.
      
      This patch replaces the open-coded variant with the already existing
      macro and changes the error print to fire only once.
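      
      A minimal sketch of the resulting macro (the exact message text is an
      assumption):
      
          #define ASSERT_RTNL() \
                  WARN_ONCE(!rtnl_is_locked(), \
                            "RTNL: assertion failed at %s (%d)\n", \
                            __FILE__, __LINE__)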
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 December 2017, 11 commits
  4. 21 December 2017, 9 commits
    • xfrm: wrap xfrmdev_ops with offload config · 9cb0d21d
      Authored by Shannon Nelson
      There's no reason to define netdev->xfrmdev_ops if
      the offload facility is not CONFIG'd in.
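      
      A sketch of the idea, guarding the member in struct net_device with
      the existing CONFIG_XFRM_OFFLOAD option:
      
          struct net_device {
                  /* ... */
          #ifdef CONFIG_XFRM_OFFLOAD
                  const struct xfrmdev_ops *xfrmdev_ops;
          #endif
                  /* ... */
          };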
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: check for xdo_dev_state_free · 7f05b467
      Authored by Shannon Nelson
      The current XFRM code assumes that we've implemented the
      xdo_dev_state_free() callback, even if it is meaningless to the driver.
      This patch adds a check for it before calling, as done in other APIs,
      to prevent a NULL function pointer kernel crash.
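      
      A minimal sketch of the guarded call (simplified; the surrounding
      helper shape is an assumption):
      
          static inline void xfrm_dev_state_free(struct xfrm_state *x)
          {
                  struct net_device *dev = x->xso.dev;
      
                  /* only call into the driver if it implemented the hook */
                  if (dev && dev->xfrmdev_ops &&
                      dev->xfrmdev_ops->xdo_dev_state_free)
                          dev->xfrmdev_ops->xdo_dev_state_free(x);
          }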
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • bpf: fix integer overflows · bb7f0f98
      Authored by Alexei Starovoitov
      There were various issues related to the limited size of integers used in
      the verifier:
       - `off + size` overflow in __check_map_access()
       - `off + reg->off` overflow in check_mem_access()
       - `off + reg->var_off.value` overflow or 32-bit truncation of
         `reg->var_off.value` in check_mem_access()
       - 32-bit truncation in check_stack_boundary()
      
      Make sure that any integer math cannot overflow by not allowing
      pointer math with large values.
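      
      For illustration, a hypothetical bounds check of this shape is
      defeated by 32-bit wraparound (this is not verifier code):
      
          /* with a signed 32-bit off near INT_MAX, off + size wraps
           * (undefined behaviour in C, wraps in practice) */
          int off = INT_MAX - 2, size = 4;
      
          if (off + size > value_size)    /* sum wraps to a negative value */
                  return -EACCES;         /* never taken: access wrongly allowed */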
      
      Also reduce the scope of "scalar op scalar" tracking.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • block: unalign call_single_data in struct request · 4ccafe03
      Authored by Jens Axboe
      A previous change blindly added massive alignment to the
      call_single_data structure in struct request. This ballooned it in size
      from 296 to 320 bytes on my setup, for no valid reason at all.
      
      Use the unaligned struct __call_single_data variant instead.
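      
      A sketch of the change inside struct request (assuming the aligned
      typedef call_single_data_t was in use before):
      
          struct request {
                  /* ... */
                  /* was: call_single_data_t csd; the typedef carries
                   * forced cacheline alignment, the plain struct does not */
                  struct __call_single_data csd;
                  /* ... */
          };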
      
      Fixes: 966a9671 ("smp: Avoid using two cache lines for struct call_single_data")
      Cc: stable@vger.kernel.org # v4.14
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store · 986ffdfd
      Authored by Yafang Shao
      sk_state_load is only used by AF_INET/AF_INET6, so rename it to
      inet_sk_state_load and move it into inet_sock.h.
      
      sk_state_store is removed as it is not used any more.
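      
      A sketch of the renamed helper, assuming it keeps sk_state_load's
      acquire semantics:
      
          static inline int inet_sk_state_load(const struct sock *sk)
          {
                  /* paired with the store-release when the state is set */
                  return smp_load_acquire(&sk->sk_state);
          }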
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint · 563e0bb0
      Authored by Yafang Shao
      Since sk_state is a common field of struct sock, the state-transition
      tracepoint should not be a TCP-specific feature. It traces all
      AF_INET state transitions, so rename it to inet_sock_set_state, with
      some minor changes, and move it into trace/events/sock.h. There is no
      need to create a file named trace/events/inet_sock.h for this single
      tracepoint.
      
      Two helpers are introduced to trace sk_state transitions:
          - void inet_sk_state_store(struct sock *sk, int newstate);
          - void inet_sk_set_state(struct sock *sk, int state);
      Since trace headers should not be included in other header files,
      they are defined in sock.c.
      
      Protocols such as SCTP may be compiled as modules, hence
      inet_sk_set_state() is exported.
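      
      A sketch of the store helper, assuming it emits the tracepoint and
      keeps the release ordering of the old sk_state_store:
      
          void inet_sk_state_store(struct sock *sk, int newstate)
          {
                  trace_inet_sock_set_state(sk, sk->sk_state, newstate);
                  smp_store_release(&sk->sk_state, newstate);
          }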
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Export to userspace the TCP state names for the trace events · d7b850a7
      Authored by Steven Rostedt (VMware)
      The TCP trace events (specifically tcp_set_state) map enums to symbol
      names via __print_symbolic(). But this only works for reading trace events
      from the tracefs trace files. If perf or trace-cmd were to record these
      events, the event format file does not convert the enum names into numbers,
      and you get something like:
      
      __print_symbolic(REC->oldstate,
          { TCP_ESTABLISHED, "TCP_ESTABLISHED" },
          { TCP_SYN_SENT, "TCP_SYN_SENT" },
          { TCP_SYN_RECV, "TCP_SYN_RECV" },
          { TCP_FIN_WAIT1, "TCP_FIN_WAIT1" },
          { TCP_FIN_WAIT2, "TCP_FIN_WAIT2" },
          { TCP_TIME_WAIT, "TCP_TIME_WAIT" },
          { TCP_CLOSE, "TCP_CLOSE" },
          { TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" },
          { TCP_LAST_ACK, "TCP_LAST_ACK" },
          { TCP_LISTEN, "TCP_LISTEN" },
          { TCP_CLOSING, "TCP_CLOSING" },
          { TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })
      
      Here trace-cmd and perf do not know the values of those enums.
      
      Use the TRACE_DEFINE_ENUM() macro, which has the trace events convert
      the enum names into their values at system boot. This allows perf and
      trace-cmd to see actual numbers and not enums:
      
      __print_symbolic(REC->oldstate,
          { 1, "TCP_ESTABLISHED" },
          { 2, "TCP_SYN_SENT" },
          { 3, "TCP_SYN_RECV" },
          { 4, "TCP_FIN_WAIT1" },
          { 5, "TCP_FIN_WAIT2" },
          { 6, "TCP_TIME_WAIT" },
          { 7, "TCP_CLOSE" },
          { 8, "TCP_CLOSE_WAIT" },
          { 9, "TCP_LAST_ACK" },
          { 10, "TCP_LISTEN" },
          { 11, "TCP_CLOSING" },
          { 12, "TCP_NEW_SYN_RECV" })
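      
      The mechanism is one TRACE_DEFINE_ENUM() per state in the trace
      header; a sketch:
      
          TRACE_DEFINE_ENUM(TCP_ESTABLISHED);
          TRACE_DEFINE_ENUM(TCP_SYN_SENT);
          /* ... one line per TCP state used by the event ... */
          TRACE_DEFINE_ENUM(TCP_NEW_SYN_RECV);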
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • block-throttle: avoid double charge · 111be883
      Authored by Shaohua Li
      If a bio is throttled and split after throttling, the bio could be
      resubmitted and enter throttling again. This would cause part of the
      bio to be charged multiple times. If the cgroup has an IO limit, the
      double charge significantly harms performance. Bio splits became
      quite common after the arbitrary bio size change.
      
      To fix this, we always set the BIO_THROTTLED flag when a bio is
      throttled. If the bio is cloned/split, we copy the flag to the new
      bio too, to avoid a double charge. However, a cloned bio could be
      directed to a new disk, where keeping the flag is a problem. The
      observation is that we always set a new disk for the bio in this
      case, so we can clear the flag in bio_set_dev().
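      
      A sketch of that clearing, written as a standalone helper for
      readability (the helper name is hypothetical; the real hook is
      bio_set_dev() itself):
      
          /* moving a bio to a different disk drops BIO_THROTTLED, so
           * the new queue's throttling charges it exactly once */
          static inline void bio_assign_dev(struct bio *bio,
                                            struct block_device *bdev)
          {
                  if (bio->bi_disk != bdev->bd_disk)
                          bio_clear_flag(bio, BIO_THROTTLED);
                  bio->bi_disk   = bdev->bd_disk;
                  bio->bi_partno = bdev->bd_partno;
          }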
      
      This issue has existed for a long time; the arbitrary bio size change
      just made it worse, so this should go into stable at least back to v4.2.
      
      V1 -> V2: do not add an extra field to the bio, based on discussion with Tejun.
      
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • cls_bpf: fix offload assumptions after callback conversion · 102740bd
      Authored by Jakub Kicinski
      cls_bpf used to take care of tracking what offload state a filter
      is in, i.e. it would track whether the offload request succeeded or
      not. This information was then used to issue correct requests to
      the driver, e.g. requesting statistics only for offloaded filters,
      removing only filters which were offloaded, using add instead of
      replace if the previous filter was not added, etc.
      
      This tracking of offload state no longer functions with the new
      callback infrastructure.  There could be multiple entities trying
      to offload the same filter.
      
      Throw out all the tracking and the corresponding commands, and simply
      pass both the old and the new bpf program to the drivers. Drivers
      will have to deal with offload state tracking by themselves.
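      
      A sketch of the resulting offload descriptor (field set abridged;
      only the prog/oldprog pair is the point here):
      
          struct tc_cls_bpf_offload {
                  struct tcf_exts *exts;
                  struct bpf_prog *oldprog;   /* previous program, NULL if none */
                  struct bpf_prog *prog;      /* new program, NULL on removal */
                  const char *name;
          };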
      
      Fixes: 3f7889c4 ("net: sched: cls_bpf: call block callbacks for offload")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 20 December 2017, 6 commits
  6. 19 December 2017, 7 commits
  7. 18 December 2017, 3 commits
    • KVM: Fix stack-out-of-bounds read in write_mmio · e39d200f
      Authored by Wanpeng Li
      Reported by syzkaller:
      
        BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
        Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298
      
        CPU: 6 PID: 32298 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #18
        Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
        Call Trace:
         dump_stack+0xab/0xe1
         print_address_description+0x6b/0x290
         kasan_report+0x28a/0x370
         write_mmio+0x11e/0x270 [kvm]
         emulator_read_write_onepage+0x311/0x600 [kvm]
         emulator_read_write+0xef/0x240 [kvm]
         emulator_fix_hypercall+0x105/0x150 [kvm]
         em_hypercall+0x2b/0x80 [kvm]
         x86_emulate_insn+0x2b1/0x1640 [kvm]
         x86_emulate_instruction+0x39a/0xb90 [kvm]
         handle_exception+0x1b4/0x4d0 [kvm_intel]
         vcpu_enter_guest+0x15a0/0x2640 [kvm]
         kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
         kvm_vcpu_ioctl+0x479/0x880 [kvm]
         do_vfs_ioctl+0x142/0x9a0
         SyS_ioctl+0x74/0x80
         entry_SYSCALL_64_fastpath+0x23/0x9a
      
      The patched-vmmcall path writes the 3-byte opcode 0F 01 C1 (vmcall)
      into guest memory; however, the write_mmio tracepoint always prints
      8 bytes through *(u64 *)val, since kvm splits the mmio access into
      8-byte chunks. This leaks 5 bytes from the kernel stack
      (CVE-2017-17741). This patch fixes it by accessing just the bytes
      which we operate on.
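      
      A sketch of the fix in the tracepoint's assignment, assuming the
      event now receives a pointer and copies only len bytes:
      
          /* copy only the bytes that were actually written */
          __entry->val = 0;
          if (val)
                  memcpy(&__entry->val, val,
                         min_t(u32, sizeof(__entry->val), len));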
      
      Before patch:
      
      syz-executor-5567  [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f
      
      After patch:
      
      syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC · f384dcfe
      Authored by Marc Zyngier
      If we don't have a usable GIC, do not try to set the vcpu affinity
      as this is guaranteed to fail.
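      
      A minimal sketch of such a guard (the specific check and call site
      here are assumptions, not the patch's exact code):
      
          /* hypothetical guard: only forward the timer irq when an
           * in-kernel GIC is usable */
          if (irqchip_in_kernel(vcpu->kvm))
                  ret = irq_set_vcpu_affinity(host_vtimer_irq, vcpu);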
      Reported-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Andre Przywara <andre.przywara@arm.com>
      Tested-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • bpf: x64: add JIT support for multi-function programs · 1c2a088a
      Authored by Alexei Starovoitov
      A typical JIT does several passes over the bpf instructions to
      compute the total size and the relative offsets of jumps and calls.
      With multiple bpf functions calling each other, all relative calls
      will have invalid offsets initially, therefore we need an additional
      last pass over the program to emit calls with correct offsets.
      For example, in the case of three bpf functions:
      main:
        call foo
        call bpf_map_lookup
        exit
      foo:
        call bar
        exit
      bar:
        exit
      
      We will call bpf_int_jit_compile() independently for main(), foo()
      and bar(); the x64 JIT typically does 4-5 passes to converge.
      After these initial passes the images for these 3 functions
      will be correct except for the call targets, since the start
      addresses of foo() and bar() are unknown while main() is being JITed
      (note that the call to bpf_map_lookup will be resolved properly
      during the initial passes).
      Once the start addresses of the 3 functions are known, we patch
      call_insn->imm to point to the right functions and call
      bpf_int_jit_compile() again, which needs only one pass.
      Additional safety checks are done to make sure this
      last pass doesn't produce an image that is larger or smaller
      than the previous pass.
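      
      A simplified sketch of that fixup loop (variable names are
      assumptions; abridged from the verifier-side logic):
      
          /* once every function's image address is known, rewrite each
           * bpf-to-bpf call immediate, then re-JIT in one final pass */
          for (k = 0; k < nfuncs; k++) {
                  struct bpf_insn *insn = func[k]->insnsi;
                  int i;
      
                  for (i = 0; i < func[k]->len; i++, insn++) {
                          if (insn->code != (BPF_JMP | BPF_CALL) ||
                              insn->src_reg != BPF_PSEUDO_CALL)
                                  continue;
                          /* insn->off holds the callee's index here */
                          insn->imm = (int)((u8 *)func[insn->off]->bpf_func -
                                            (u8 *)__bpf_call_base);
                  }
          }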
      
      When constant blinding is on, it's applied to all functions in the
      first pass, since doing it again in the last pass could change the
      size of the JITed code.
      
      Tested on x64 and arm64 hardware with JIT on/off and blinding on/off.
      x64 JITs bpf-to-bpf calls correctly, while arm64 falls back to the
      interpreter. All other JITs that support normal BPF_CALL will behave
      the same way, since a bpf-to-bpf call is equivalent to a
      bpf-to-kernel call from the JIT's point of view.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>