提交 · 11bf284f81b46f59d5f4a4522c13aa7852cfd560 · openanolis / cloud-kernel

15 11月, 2017 1 次提交

net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() · 11bf284f

由 Kirill Tkhai 提交于 11月 14, 2017

There is at least unlocked deletion of net->ipv4.fib_notifier_ops
from net::fib_notifier_ops:

ip_fib_net_exit()
  rtnl_unlock()
  fib4_notifier_exit()
    fib_notifier_ops_unregister(net->ipv4.notifier_ops)
      list_del_rcu(&ops->list)

So fib_seq_sum() can't use rtnl_lock() only for protection.

The possible solution could be to use rtnl_lock()
in fib_notifier_ops_unregister(), but this adds
a possible delay during net namespace creation,
so we better use rcu_read_lock() till someone
really needs the mutex (if that happens).
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11bf284f

14 11月, 2017 12 次提交

net-sysfs: trigger netlink notification on ifalias change via sysfs · c92eb77a

由 Roopa Prabhu 提交于 11月 13, 2017

This patch adds netlink notifications on iflias changes via sysfs.
makes it consistent with the netlink path which also calls
netdev_state_change. Also makes it consistent with other sysfs
netdev_store operations.
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c92eb77a

net: core: dev_get_valid_name is now the same as dev_alloc_name_ns · 87c320e5

由 Rasmus Villemoes 提交于 11月 13, 2017

If name contains a %, it's easy to see that this patch doesn't change
anything (other than eliminate the duplicate dev_valid_name
call). Otherwise, we'll now just spend a little time in snprintf()
copying name to the stack buffer allocated in dev_alloc_name_ns, and do
the __dev_get_by_name using that buffer rather than name.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87c320e5

net: core: maybe return -EEXIST in __dev_alloc_name · d6f295e9

由 Rasmus Villemoes 提交于 11月 13, 2017

If we're given format string with no %d, -EEXIST is a saner error code.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6f295e9

net: core: check dev_valid_name in __dev_alloc_name · 93809105

由 Rasmus Villemoes 提交于 11月 13, 2017

We currently only exclude non-sysfs-friendly names via
dev_get_valid_name; there doesn't seem to be a reason to allow such
names when we're called via dev_alloc_name.

This does duplicate the dev_valid_name check in the dev_get_valid_name()
case; we'll fix that shortly.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93809105

net: core: drop pointless check in __dev_alloc_name · 6224abda

由 Rasmus Villemoes 提交于 11月 13, 2017

The only caller passes a stack buffer as buf, so it won't equal the
passed-in name. Moreover, we're already using buf as a scratch buffer
inside the if (p) {} block, so if buf and name were the same, that
snprintf() call would be overwriting its own format string.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6224abda

net: core: eliminate dev_alloc_name{,_ns} code duplication · c46d7642

由 Rasmus Villemoes 提交于 11月 13, 2017

dev_alloc_name contained a BUG_ON(), which I moved to dev_alloc_name_ns;
the only other caller of that already has the same BUG_ON.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c46d7642

net: core: move dev_alloc_name_ns a little higher · 2c88b855

由 Rasmus Villemoes 提交于 11月 13, 2017

No functional change.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c88b855

net: core: improve sanity checking in __dev_alloc_name · 51f299dd

由 Rasmus Villemoes 提交于 11月 13, 2017

__dev_alloc_name is called from the public (and exported)
dev_alloc_name(), so we don't have a guarantee that strlen(name) is at
most IFNAMSIZ. If somebody manages to get __dev_alloc_name called with a
% char beyond the 31st character, we'd be making a snprintf() call that
will very easily crash the kernel (using an appropriate %p extension,
we'll likely dereference some completely bogus pointer).

In the normal case where strlen() is sane, we don't even save anything
by limiting to IFNAMSIZ, so just use strchr().
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51f299dd

tcp: allow drivers to tweak TSQ logic · 3a9b76fd

由 Eric Dumazet 提交于 11月 11, 2017

I had many reports that TSQ logic breaks wifi aggregation.

Current logic is to allow up to 1 ms of bytes to be queued into qdisc
and drivers queues.

But Wifi aggregation needs a bigger budget to allow bigger rates to
be discovered by various TCP Congestion Controls algorithms.

This patch adds an extra socket field, allowing wifi drivers to select
another log scale to derive TCP Small Queue credit from current pacing
rate.

Initial value is 10, meaning that this patch does not change current
behavior.

We expect wifi drivers to set this field to smaller values (tests have
been done with values from 6 to 9)

They would have to use following template :

if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
     skb->sk->sk_pacing_shift = MY_PACING_SHIFT;

Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Toke Høiland-Jørgensen <toke@toke.dk>
Cc: Kir Kolyshkin <kir@openvz.org>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3a9b76fd

fib_rules: exit_net cleanup check added · ce2b7db3

由 Vasily Averin 提交于 11月 12, 2017

Be sure that rules_ops list initialized in net_init hook was return
to initial state.
Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce2b7db3

fib_notifier: exit_net cleanup check added · 0b6f5955

由 Vasily Averin 提交于 11月 12, 2017

Be sure that fib_notifier_ops list initilized in net_init hook was return
to initial state.
Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b6f5955

netdev: exit_net cleanup check added · ee21b18b

由 Vasily Averin 提交于 11月 12, 2017

Be sure that dev_base_head list initialized in net_init hook was return
to initial state
Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ee21b18b

11 11月, 2017 2 次提交

sock: Remove the global prot_inuse counter. · 5290ada4

由 Tonghao Zhang 提交于 11月 09, 2017

The per-cpu counter for init_net is prepared in core_initcall.
The patch 7d720c3e ("percpu: add __percpu sparse annotations to net")
and d6d9ca0f ("net: this_cpu_xxx conversions") optimize the
routines. Then remove the old counter.

Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: NTonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5290ada4

tipc: improve link resiliency when rps is activated · 8d6e79d3

由 Jon Maloy 提交于 11月 08, 2017

Currently, the TIPC RPS dissector is based only on the incoming packets'
source node address, hence steering all traffic from a node to the same
core. We have seen that this makes the links vulnerable to starvation
and unnecessary resets when we turn down the link tolerance to very low
values.

To reduce the risk of this happening, we exempt probe and probe replies
packets from the convergence to one core per source node. Instead, we do
the opposite, - we try to diverge those packets across as many cores as
possible, by randomizing the flow selector key.

To make such packets identifiable to the dissector, we add a new
'is_keepalive' bit to word 0 of the LINK_PROTOCOL header. This bit is
set both for PROBE and PROBE_REPLY messages, and only for those.

It should be noted that these packets are not part of any flow anyway,
and only constitute a minuscule fraction of all packets sent across a
link. Hence, there is no risk that this will affect overall performance.
Acked-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d6e79d3

10 11月, 2017 1 次提交

net: allow per netns sysctl_rmem and sysctl_wmem for protos · a3dcaf17

由 Eric Dumazet 提交于 11月 07, 2017

As we want to gradually implement per netns sysctl_rmem and sysctl_wmem
on per protocol basis, add two new fields in struct proto,
and two new helpers : sk_get_wmem0() and sk_get_rmem0()

First user will be TCP. Then UDP and SCTP can be easily converted,
while DECNET probably wont get this support.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a3dcaf17

08 11月, 2017 2 次提交

pktgen: document 32-bit timestamp overflow · 7f5d3f27

由 Arnd Bergmann 提交于 11月 07, 2017

Timestamps in pktgen are currently retrieved using the deprecated
do_gettimeofday() function that wraps its signed 32-bit seconds in 2038
(on 32-bit architectures) and requires a division operation to calculate
microseconds.

The pktgen header is also defined with the same limitations, hardcoding
to a 32-bit seconds field that can be interpreted as unsigned to produce
times that only wrap in 2106. Whatever code reads the timestamps should
be aware of that problem in general, but probably doesn't care too
much as we are mostly interested in the time passing between packets,
and that is correctly represented.

Using 64-bit nanoseconds would be cheaper and good for 584 years. Using
monotonic times would also make this unambiguous by avoiding the overflow,
but would make it harder to correlate to the times with those on remote
machines. Either approach would require adding a new runtime flag and
implementing the same thing on the remote side, which we probably don't
want to do unless someone sees it as a real problem. Also, this should
be coordinated with other pktgen implementations and might need a new
magic number.

For the moment, I'm documenting the overflow in the source code, and
changing the implementation over to an open-coded ktime_get_real_ts64()
plus division, so we don't have to look at it again while scanning for
deprecated time interfaces.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f5d3f27

rtnetlink: fix missing size for IFLA_IF_NETNSID · 03ac738d

由 Colin Ian King 提交于 11月 06, 2017

The size for IFLA_IF_NETNSID is missing from the size calculation
because the proceeding semicolon was not removed. Fix this by removing
the semicolon.

Detected by CoverityScan, CID#1461135 ("Structurally dead code")

Fixes: 79e1ad14 ("rtnetlink: use netnsid to query interface")
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Acked-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03ac738d

05 11月, 2017 6 次提交

bpf: remove old offload/analyzer · b37a5306

由 Jakub Kicinski 提交于 11月 03, 2017

Thanks to the ability to load a program for a specific device,
running verifier twice is no longer needed.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b37a5306

xdp: allow attaching programs loaded for specific device · 248f346f

由 Jakub Kicinski 提交于 11月 03, 2017

Pass the netdev pointer to bpf_prog_get_type().  This way
BPF code can decide whether the device matches what the
code was loaded/translated for.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NSimon Horman <simon.horman@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

248f346f

net: bpf: rename ndo_xdp to ndo_bpf · f4e63525

由 Jakub Kicinski 提交于 11月 03, 2017

ndo_xdp is a control path callback for setting up XDP in the
driver.  We can reuse it for other forms of communication
between the eBPF stack and the drivers.  Rename the callback
and associated structures and definitions.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NSimon Horman <simon.horman@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f4e63525

rtnetlink: use netnsid to query interface · 79e1ad14

由 Jiri Benc 提交于 11月 02, 2017

Currently, when an application gets netnsid from the kernel (for example as
the result of RTM_GETLINK call on one end of the veth pair), it's not much
useful. There's no reliable way to get to the netns fd from the netnsid, nor
does any kernel API accept netnsid.

Extend the RTM_GETLINK call to also accept netnsid. It will operate on the
netns with the given netnsid in such case. Of course, the calling process
needs to have enough capabilities in the target name space; for now, require
CAP_NET_ADMIN. This can be relaxed in the future.

To signal to the calling process that the kernel understood the new
IFLA_IF_NETNSID attribute in the query, it will include it in the response.
This is needed to detect older kernels, as they will just ignore
IFLA_IF_NETNSID and query in the current name space.

This patch implemetns IFLA_IF_NETNSID only for get and dump. For set
operations, this can be extended later.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79e1ad14

net: export peernet2id_alloc · 7cbebc8a

由 Jiri Benc 提交于 11月 02, 2017

It will be used by openvswitch.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7cbebc8a

pktgen: do not abuse IN6_ADDR_HSIZE · df7e8e2e

由 Eric Dumazet 提交于 11月 04, 2017

pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an
IPv6 address.

Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old
bug is hitting us.

Fixes: 3f27fb23 ("ipv6: addrconf: add per netns perturbation in inet6_addr_hash()")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df7e8e2e

04 11月, 2017 1 次提交

netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed · 2b5ec1a5

由 Ye Yin 提交于 10月 26, 2017

When run ipvs in two different network namespace at the same host, and one
ipvs transport network traffic to the other network namespace ipvs.
'ipvs_property' flag will make the second ipvs take no effect. So we should
clear 'ipvs_property' when SKB network namespace changed.

Fixes: 621e84d6 ("dev: introduce skb_scrub_packet()")
Signed-off-by: NYe Yin <hustcat@gmail.com>
Signed-off-by: NWei Zhou <chouryzhou@gmail.com>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2b5ec1a5

03 11月, 2017 1 次提交

net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath · 46209401

由 Jiri Pirko 提交于 11月 03, 2017

In sch_handle_egress and sch_handle_ingress tp->q is used only in order
to update stats. So stats and filter list are the only things that are
needed in clsact qdisc fastpath processing. Introduce new mini_Qdisc
struct to hold those items. Also, introduce a helper to swap the
mini_Qdisc structures in case filter list head changes.

This removes need for tp->q usage without added overhead.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46209401

02 11月, 2017 1 次提交

License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318

由 Greg Kroah-Hartman 提交于 11月 01, 2017

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b2441318

01 11月, 2017 1 次提交

net: Add extack to fib_notifier_info · 6c31e5a9

由 David Ahern 提交于 10月 27, 2017

Add extack to fib_notifier_info and plumb through stack to
call_fib_rule_notifiers, call_fib_entry_notifiers and
call_fib6_entry_notifiers. This allows notifer handlers to
return messages to user.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c31e5a9

31 10月, 2017 1 次提交

net: filter: remove unused variable and fix warning · aa2bc739

由 Jakub Kicinski 提交于 10月 30, 2017

bpf_getsockopt bpf call sets the ret variable to zero and
never changes it.  What's worse in case CONFIG_INET is
not selected the variable is completely unused generating
a warning.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NLawrence Brakmo <brakmo@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa2bc739

29 10月, 2017 2 次提交

bpf: rename sk_actions to align with bpf infrastructure · bfa64075

由 John Fastabend 提交于 10月 27, 2017

Recent additions to support multiple programs in cgroups impose
a strict requirement, "all yes is yes, any no is no". To enforce
this the infrastructure requires the 'no' return code, SK_DROP in
this case, to be 0.

To apply these rules to SK_SKB program types the sk_actions return
codes need to be adjusted.

This fix adds SK_PASS and makes 'SK_DROP = 0'. Finally, remove
SK_ABORTED to remove any chance that the API may allow aborted
program flows to be passed up the stack. This would be incorrect
behavior and allow programs to break existing policies.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bfa64075

bpf: bpf_compute_data uses incorrect cb structure · 8108a775

由 John Fastabend 提交于 10月 27, 2017

SK_SKB program types use bpf_compute_data to store the end of the
packet data. However, bpf_compute_data assumes the cb is stored in the
qdisc layer format. But, for SK_SKB this is the wrong layer of the
stack for this type.

It happens to work (sort of!) because in most cases nothing happens
to be overwritten today. This is very fragile and error prone.
Fortunately, we have another hole in tcp_skb_cb we can use so lets
put the data_end value there.

Note, SK_SKB program types do not use data_meta, they are failed by
sk_skb_is_valid_access().
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8108a775

25 10月, 2017 2 次提交

bonding: remove rtmsg_ifinfo called after bond_lower_state_changed · ef5201c8

由 Xin Long 提交于 10月 24, 2017

After the patch 'rtnetlink: bring NETDEV_CHANGELOWERSTATE event
process back to rtnetlink_event', bond_lower_state_changed would
generate NETDEV_CHANGEUPPER event which would send a notification
to userspace in rtnetlink_event.

There's no need to call rtmsg_ifinfo to send the notification
any more. So this patch is to remove it from these places after
bond_lower_state_changed.

Besides, after this, rtmsg_ifinfo is not needed to be exported.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef5201c8

rtnetlink: bring NETDEV_CHANGELOWERSTATE event process back to rtnetlink_event · eeda3fb9

由 Xin Long 提交于 10月 24, 2017

This patch is to bring NETDEV_CHANGELOWERSTATE event process back
to rtnetlink_event so that bonding could use it instead of calling
rtmsg_ifinfo to send a notification to userspace after netdev lower
state is changed in the later patch.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eeda3fb9

24 10月, 2017 1 次提交

tcp: add tracepoint trace_tcp_send_reset · c24b14c4

由 Song Liu 提交于 10月 23, 2017

New tracepoint trace_tcp_send_reset is added and called from
tcp_v4_send_reset(), tcp_v6_send_reset() and tcp_send_active_reset().
Signed-off-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c24b14c4

23 10月, 2017 1 次提交

net: core: rtnetlink: use BUG_ON instead of if condition followed by BUG · 058c8d59

由 Gustavo A. R. Silva 提交于 10月 20, 2017

Use BUG_ON instead of if condition followed by BUG in do_setlink.

This issue was detected with the help of Coccinelle.
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

058c8d59

22 10月, 2017 4 次提交

bpf: Adding helper function bpf_getsockops · cd86d1fd

由 Lawrence Brakmo 提交于 10月 20, 2017

Adding support for helper function bpf_getsockops to socket_ops BPF
programs. This patch only supports TCP_CONGESTION.
Signed-off-by: NVlad Vysotsky <vlad@cs.ucla.edu>
Acked-by: NLawrence Brakmo <brakmo@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd86d1fd

net: ethtool: remove error check for legacy setting transceiver type · 95491e3c

由 Niklas Söderlund 提交于 10月 20, 2017

Commit 9cab88726929605 ("net: ethtool: Add back transceiver type")
restores the transceiver type to struct ethtool_link_settings and
convert_link_ksettings_to_legacy_settings() but forgets to remove the
error check for the same in convert_legacy_settings_to_link_ksettings().
This prevents older versions of ethtool to change link settings.

    # ethtool --version
    ethtool version 3.16

    # ethtool -s eth0 autoneg on speed 100 duplex full
    Cannot set new settings: Invalid argument
      not setting speed
      not setting duplex
      not setting autoneg

While newer versions of ethtool works.

    # ethtool --version
    ethtool version 4.10

    # ethtool -s eth0 autoneg on speed 100 duplex full
    [   57.703268] sh-eth ee700000.ethernet eth0: Link is Down
    [   59.618227] sh-eth ee700000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx

Fixes: 19cab887 ("net: ethtool: Add back transceiver type")
Signed-off-by: NNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reported-by: NRenjith R V <renjith.rv@quest-global.com>
Tested-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95491e3c

soreuseport: fix initialization race · 1b5f962e

由 Craig Gallek 提交于 10月 19, 2017

Syzkaller stumbled upon a way to trigger
WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39

There are two initialization paths for the sock_reuseport structure in a
socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through
SO_ATTACH_REUSEPORT_[CE]BPF before bind. The existing implementation
assumedthat the socket lock protected both of these paths when it actually
only protects the SO_ATTACH_REUSEPORT path. Syzkaller triggered this
double allocation by running these paths concurrently.

This patch moves the check for double allocation into the reuseport_alloc
function which is protected by a global spin lock.

Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
Fixes: c125e80b ("soreuseport: fast reuseport TCP socket selection")
Signed-off-by: NCraig Gallek <kraig@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b5f962e

sock: correct sk_wmem_queued accounting on efault in tcp zerocopy · 54d43117

由 Willem de Bruijn 提交于 10月 19, 2017

Syzkaller hits WARN_ON(sk->sk_wmem_queued) in sk_stream_kill_queues
after triggering an EFAULT in __zerocopy_sg_from_iter.

On this error, skb_zerocopy_stream_iter resets the skb to its state
before the operation with __pskb_trim. It cannot kfree_skb like
datagram callers, as the skb may have data from a previous send call.

__pskb_trim calls skb_condense for unowned skbs, which adjusts their
truesize. These tcp skbuffs are owned and their truesize must add up
to sk_wmem_queued. But they match because their skb->sk is NULL until
tcp_transmit_skb.

Temporarily set skb->sk when calling __pskb_trim to signal that the
skbuffs are owned and avoid the skb_condense path.

Fixes: 52267790 ("sock: add MSG_ZEROCOPY")
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54d43117

20 10月, 2017 1 次提交

bpf: remove mark access for SK_SKB program types · f7e9cb1e

由 John Fastabend 提交于 10月 18, 2017

The skb->mark field is a union with reserved_tailroom which is used
in the TCP code paths from stream memory allocation. Allowing SK_SKB
programs to set this field creates a conflict with future code
optimizations, such as "gifting" the skb to the egress path instead
of creating a new skb and doing a memcpy.

Because we do not have a released version of SK_SKB yet lets just
remove it for now. A more appropriate scratch pad to use at the
socket layer is dev_scratch, but lets add that in future kernels
when needed.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7e9cb1e

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功