提交 · 7d134041a89610ae552501fc88652805addcdee4 · openeuler / Kernel

25 5月, 2019 3 次提交

bpf: introduce new mov32 variant for doing explicit zero extension · 7d134041

由 Jiong Wang 提交于 5月 24, 2019

The encoding for this new variant is based on BPF_X format. "imm" field was
0 only, now it could be 1 which means doing zero extension unconditionally

  .code = BPF_ALU | BPF_MOV | BPF_X
  .dst_reg = DST
  .src_reg = SRC
  .imm  = 1

We use this new form for doing zero extension for which verifier will
guarantee SRC == DST.

Implications on JIT back-ends when doing code-gen for
BPF_ALU | BPF_MOV | BPF_X:
  1. No change if hardware already does zero extension unconditionally for
     sub-register write.
  2. Otherwise, when seeing imm == 1, just generate insns to clear high
     32-bit. No need to generate insns for the move because when imm == 1,
     dst_reg is the same as src_reg at the moment.

Interpreter doesn't need change as well. It is doing unconditionally zero
extension for mov32 already.

One helper macro BPF_ZEXT_REG is added to help creating zero extension
insn using this new mov32 variant.

One helper function insn_is_zext is added for checking one insn is an
zero extension on dst. This will be widely used by a few JIT back-ends in
later patches in this set.
Signed-off-by: NJiong Wang <jiong.wang@netronome.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

7d134041

bpf: verifier: mark verified-insn with sub-register zext flag · 5327ed3d

由 Jiong Wang 提交于 5月 24, 2019

eBPF ISA specification requires high 32-bit cleared when low 32-bit
sub-register is written. This applies to destination register of ALU32 etc.
JIT back-ends must guarantee this semantic when doing code-gen. x86_64 and
AArch64 ISA has the same semantics, so the corresponding JIT back-end
doesn't need to do extra work.

However, 32-bit arches (arm, x86, nfp etc.) and some other 64-bit arches
(PowerPC, SPARC etc) need to do explicit zero extension to meet this
requirement, otherwise code like the following will fail.

  u64_value = (u64) u32_value
  ... other uses of u64_value

This is because compiler could exploit the semantic described above and
save those zero extensions for extending u32_value to u64_value, these JIT
back-ends are expected to guarantee this through inserting extra zero
extensions which however could be a significant increase on the code size.
Some benchmarks show there could be ~40% sub-register writes out of total
insns, meaning at least ~40% extra code-gen.

One observation is these extra zero extensions are not always necessary.
Take above code snippet for example, it is possible u32_value will never be
casted into a u64, the value of high 32-bit of u32_value then could be
ignored and extra zero extension could be eliminated.

This patch implements this idea, insns defining sub-registers will be
marked when the high 32-bit of the defined sub-register matters. For
those unmarked insns, it is safe to eliminate high 32-bit clearnace for
them.

Algo:
 - Split read flags into READ32 and READ64.

 - Record index of insn that does sub-register write. Keep the index inside
   reg state and update it during verifier insn walking.

 - A full register read on a sub-register marks its definition insn as
   needing zero extension on dst register.

   A new sub-register write overrides the old one.

 - When propagating read64 during path pruning, also mark any insn defining
   a sub-register that is read in the pruned path as full-register.
Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NJiong Wang <jiong.wang@netronome.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

5327ed3d

bpf: implement bpf_send_signal() helper · 8b401f9e

由 Yonghong Song 提交于 5月 23, 2019

This patch tries to solve the following specific use case.

Currently, bpf program can already collect stack traces
through kernel function get_perf_callchain()
when certain events happens (e.g., cache miss counter or
cpu clock counter overflows). But such stack traces are
not enough for jitted programs, e.g., hhvm (jited php).
To get real stack trace, jit engine internal data structures
need to be traversed in order to get the real user functions.

bpf program itself may not be the best place to traverse
the jit engine as the traversing logic could be complex and
it is not a stable interface either.

Instead, hhvm implements a signal handler,
e.g. for SIGALARM, and a set of program locations which
it can dump stack traces. When it receives a signal, it will
dump the stack in next such program location.

Such a mechanism can be implemented in the following way:
  . a perf ring buffer is created between bpf program
    and tracing app.
  . once a particular event happens, bpf program writes
    to the ring buffer and the tracing app gets notified.
  . the tracing app sends a signal SIGALARM to the hhvm.

But this method could have large delays and causing profiling
results skewed.

This patch implements bpf_send_signal() helper to send
a signal to hhvm in real time, resulting in intended stack traces.
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

8b401f9e

24 5月, 2019 2 次提交

bpf: convert explored_states to hash table · dc2a4ebc

由 Alexei Starovoitov 提交于 5月 21, 2019

All prune points inside a callee bpf function most likely will have
different callsites. For example, if function foo() is called from
two callsites the half of explored states in all prune points in foo()
will be useless for subsequent walking of one of those callsites.
Fortunately explored_states pruning heuristics keeps the number of states
per prune point small, but walking these states is still a waste of cpu
time when the callsite of the current state is different from the callsite
of the explored state.

To improve pruning logic convert explored_states into hash table and
use simple insn_idx ^ callsite hash to select hash bucket.
This optimization has no effect on programs without bpf2bpf calls
and drastically improves programs with calls.
In the later case it reduces total memory consumption in 1M scale tests
by almost 3 times (peak_states drops from 5752 to 2016).

Care should be taken when comparing the states for equivalency.
Since the same hash bucket can now contain states with different indices
the insn_idx has to be part of verifier_state and compared.

Different hash table sizes and different hash functions were explored,
but the results were not significantly better vs this patch.
They can be improved in the future.

Hit/miss heuristic is not counting index miscompare as a miss.
Otherwise verifier stats become unstable when experimenting
with different hash functions.
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

dc2a4ebc

bpf: split explored_states · a8f500af

由 Alexei Starovoitov 提交于 5月 21, 2019

split explored_states into prune_point boolean mark
and link list of explored states.
This removes STATE_LIST_MARK hack and allows marks to be separate from states.
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

a8f500af

23 5月, 2019 14 次提交

ipv4/igmp: shrink struct ip_sf_list · 0db355d4

由 Eric Dumazet 提交于 5月 22, 2019

Removing two 4 bytes holes allows to use kmalloc-32
kmem cache instead of kmalloc-64 on 64bit kernels.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0db355d4

neighbor: Add tracepoint to __neigh_create · fc651001

由 David Ahern 提交于 5月 22, 2019

Add tracepoint to __neigh_create to enable debugging of new entries.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc651001

net: Set strict_start_type for routes and rules · 75425657

由 David Ahern 提交于 5月 22, 2019

New userspace on an older kernel can send unknown and unsupported
attributes resulting in an incompelete config which is almost
always wrong for routing (few exceptions are passthrough settings
like the protocol that installed the route).

Set strict_start_type in the policies for IPv4 and IPv6 routes and
rules to detect new, unsupported attributes and fail the route add.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

75425657

ipv4: Rename and export nh_update_mtu · 06c77c3e

由 David Ahern 提交于 5月 22, 2019

Rename nh_update_mtu to fib_nhc_update_mtu and export for use by the
nexthop code.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

06c77c3e

ipv4: export fib_info_update_nh_saddr · c3669486

由 David Ahern 提交于 5月 22, 2019

Add scope as input argument versus relying on fib_info reference in
fib_nh, and export fib_info_update_nh_saddr.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3669486

ipv4: export fib_flush · 9bd83667

由 David Ahern 提交于 5月 22, 2019

As nexthops are deleted, fib entries referencing it are marked dead.
Export fib_flush so those entries can be removed in a timely manner.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bd83667

ipv4: export fib_check_nh · ac1fab2d

由 David Ahern 提交于 5月 22, 2019

Change fib_check_nh to take net, table and scope as input arguments
over struct fib_config and export for use by nexthop code.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac1fab2d

ipv4: Add function to send route updates · 1bff1a0c

由 David Ahern 提交于 5月 22, 2019

Add fib_info_notify_update to walk the fib and send RTM_NEWROUTE
notifications with NLM_F_REPLACE set for entries linked to a fib_info
that have nh_updated flag set. This helper will be used by the nexthop
code to notify userspace of routes that are impacted when a nexthop
config is updated via replace. The new function and its helper are
similar to how fib_flush and fib_table_flush work for address delete
and link down events.

This notification is needed for legacy apps that do not understand
the new nexthop object. Apps that are nexthop aware can use the
RTA_NH_ID attribute in the route notification to just ignore it.

In the future this should be wrapped in a sysctl to allow OS'es that
are fully updated to avoid the notificaton storm.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1bff1a0c

ipv6: export function to send route updates · 19a3b7ee

由 David Ahern 提交于 5月 22, 2019

Add fib6_rt_update to send RTM_NEWROUTE with NLM_F_REPLACE set. This
helper will be used by the nexthop code to notify userspace of routes
that are impacted when a nexthop config is updated via replace.

This notification is needed for legacy apps that do not understand
the new nexthop object. Apps that are nexthop aware can use the
RTA_NH_ID attribute in the route notification to just ignore it.

In the future this should be wrapped in a sysctl to allow OS'es that
are fully updated to avoid the notificaton storm.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

19a3b7ee

ipv6: Add hook to bump sernum for a route to stubs · cdaa16a4

由 David Ahern 提交于 5月 22, 2019

Add hook to ipv6 stub to bump the sernum up to the root node for a
route. This is needed by the nexthop code when a nexthop config changes.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cdaa16a4

ipv6: Add delete route hook to stubs · 68a9b13d

由 David Ahern 提交于 5月 22, 2019

Add ip6_del_rt to the IPv6 stub. The hook is needed by the nexthop
code to remove entries linked to a nexthop that is getting deleted.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68a9b13d

net: phy: Add support for 100BaseT1 and 1000BaseT1 · b2557764

由 Andrew Lunn 提交于 5月 22, 2019

Add link modes for 100Mbps and 1Gbps over a single pair.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b2557764

dt-bindings: phy: dp83867: Add documentation for disabling clock output · 980066e6

由 Trent Piepho 提交于 5月 22, 2019

The clock output is generally only used for testing and development and
not used to daisy-chain PHYs.  It's just a source of RF noise afterward.

Add a mux value for "off".  I've added it as another enumeration to the
output property.  In the actual PHY, the mux and the output enable are
independently controllable.  However, it doesn't seem useful to be able
to describe the mux setting when the output is disabled.

Document that PHY's default setting will be left as is if the property
is omitted.

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: NTrent Piepho <tpiepho@impinj.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

980066e6

net: Add UNIX_DIAG_UID to Netlink UNIX socket diagnostics. · cae9910e

由 Felipe Gasper 提交于 5月 20, 2019

This adds the ability for Netlink to report a socket's UID along with the
other UNIX diagnostic information that is already available. This will
allow diagnostic tools greater insight into which users control which
socket.

To test this, do the following as a non-root user:

    unshare -U -r bash
    nc -l -U user.socket.$$ &

.. and verify from within that same session that Netlink UNIX socket
diagnostics report the socket's UID as 0. Also verify that Netlink UNIX
socket diagnostics report the socket's UID as the user's UID from an
unprivileged process in a different session. Verify the same from
a root process.
Signed-off-by: NFelipe Gasper <felipe@felipegasper.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cae9910e

21 5月, 2019 11 次提交

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 24 · fd9871f7

由 Thomas Gleixner 提交于 5月 19, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or any
  later version this program is distributed in the hope that it will
  be useful but without any warranty without even the implied warranty
  of merchantability or fitness for a particular purpose see the gnu
  general public license for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 50 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NJilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NSteve Winslow <swinslow@gmail.com>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154042.917228456@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

fd9871f7

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18 · c82ee6d3

由 Thomas Gleixner 提交于 5月 19, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 or at your option any
  later version this program is distributed in the hope that it will
  be useful but without any warranty without even the implied warranty
  of merchantability or fitness for a particular purpose see the gnu
  general public license for more details you should have received a
  copy of the gnu general public license along with this program see
  the file copying if not write to the free software foundation 675
  mass ave cambridge ma 02139 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 52 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NJilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: NSteve Winslow <swinslow@gmail.com>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154042.342335923@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c82ee6d3

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 17 · aaf4989b

由 Thomas Gleixner 提交于 5月 19, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 or at your option any
  later version this program is distributed in the hope that it will
  be useful but without any warranty without even the implied warranty
  of merchantability or fitness for a particular purpose see the gnu
  general public license for more details you should have received a
  copy of the gnu general public license along with this program if
  not see http www gnu org licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 13 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NSteve Winslow <swinslow@gmail.com>
Reviewed-by: NJilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154042.236620792@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

aaf4989b

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 · 1ccea77e

由 Thomas Gleixner 提交于 5月 19, 2019

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not see http www gnu org licenses

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details [based]
[from] [clk] [highbank] [c] you should have received a copy of the
gnu general public license along with this program if not see http
www gnu org licenses

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 355 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NJilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: NSteve Winslow <swinslow@gmail.com>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154041.837383322@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

1ccea77e

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 10 · aded9cb8

由 Thomas Gleixner 提交于 5月 19, 2019

Based on 1 normalized pattern(s):

  licensed under the fsf s gnu public license v2 or later

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 2 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NJilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: NSteve Winslow <swinslow@gmail.com>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154041.526489261@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

aded9cb8

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 7 · 9ab65aff

由 Thomas Gleixner 提交于 5月 19, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version this program is distributed in the
  hope that it will be useful but without any warranty without even
  the implied warranty of merchantability or fitness for a particular
  purpose see the gnu general public license for more details the full
  gnu general public license is included in this distribution in the
  file called copying

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 9 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NSteve Winslow <swinslow@gmail.com>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NJilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154041.244154651@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

9ab65aff

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 5 · 0a65089e

由 Thomas Gleixner 提交于 5月 19, 2019

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version this program is distributed in the
  hope that it will be useful but without any warranty without even
  the implied warranty of merchantability or fitness for a particular
  purpose see the gnu general public license for more details you
  should have received a copy of the gnu general public license along
  with this program if not see http www gnu org licenses the full gnu
  general public license is included in this distribution in the file
  called license

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NSteve Winslow <swinslow@gmail.com>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NJilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154041.052102771@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

0a65089e

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 4 · a636cd6c

由 Thomas Gleixner 提交于 5月 19, 2019

Based on 1 normalized pattern(s):

  licensed under gplv2 or later

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 118 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NJilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: NSteve Winslow <swinslow@gmail.com>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154040.961286471@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

a636cd6c

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 1 · 16216333

由 Thomas Gleixner 提交于 5月 19, 2019

Based on 2 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option [no]_[pad]_[ctrl] any later version this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin street fifth floor boston ma
02110 1301 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 176 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NJilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: NSteve Winslow <swinslow@gmail.com>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154040.652910950@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

16216333

treewide: Add SPDX license identifier for missed files · 457c8996

由 Thomas Gleixner 提交于 5月 19, 2019

Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
   initial scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

457c8996

tipc: Avoid copying bytes beyond the supplied data · 9bbcdb07

由 Chris Packham 提交于 5月 20, 2019

TLV_SET is called with a data pointer and a len parameter that tells us
how many bytes are pointed to by data. When invoking memcpy() we need
to careful to only copy len bytes.

Previously we would copy TLV_LENGTH(len) bytes which would copy an extra
4 bytes past the end of the data pointer which newer GCC versions
complain about.

 In file included from test.c:17:
 In function 'TLV_SET',
     inlined from 'test' at test.c:186:5:
 /usr/include/linux/tipc_config.h:317:3:
 warning: 'memcpy' forming offset [33, 36] is out of the bounds [0, 32]
 of object 'bearer_name' with type 'char[32]' [-Warray-bounds]
     memcpy(TLV_DATA(tlv_ptr), data, tlv_len);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 test.c: In function 'test':
 test.c::161:10: note:
 'bearer_name' declared here
     char bearer_name[TIPC_MAX_BEARER_NAME];
          ^~~~~~~~~~~

We still want to ensure any padding bytes at the end are initialised, do
this with a explicit memset() rather than copy bytes past the end of
data. Apply the same logic to TCM_SET.
Signed-off-by: NChris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bbcdb07

20 5月, 2019 1 次提交

of_net: fix of_get_mac_address retval if compiled without CONFIG_OF · 6a0a923d

由 Petr Štetiar 提交于 5月 19, 2019

of_get_mac_address prior to commit d01f449c ("of_net: add NVMEM
support to of_get_mac_address") could return only valid pointer or NULL,
after this change it could return only valid pointer or ERR_PTR encoded
error value, but I've forget to change the return value of
of_get_mac_address in case where the kernel is compiled without
CONFIG_OF, so I'm doing so now.

Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Fixes: d01f449c ("of_net: add NVMEM support to of_get_mac_address")
Reported-by: NOctavio Alvarez <octallk1@alvarezp.org>
Signed-off-by: NPetr Štetiar <ynezz@true.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a0a923d

19 5月, 2019 2 次提交

panic: add an option to replay all the printk message in buffer · de6da1e8

由 Feng Tang 提交于 5月 17, 2019

Currently on panic, kernel will lower the loglevel and print out pending
printk msg only with console_flush_on_panic().

Add an option for users to configure the "panic_print" to replay all
dmesg in buffer, some of which they may have never seen due to the
loglevel setting, which will help panic debugging .

[feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()]
  Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com
[feng.tang@intel.com: use logbuf lock to protect the console log index]
  Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.comSigned-off-by: NFeng Tang <feng.tang@intel.com>
Reviewed-by: NPetr Mladek <pmladek@suse.com>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

de6da1e8

mm/vmalloc.c: keep track of free blocks for vmap allocation · 68ad4a33

由 Uladzislau Rezki (Sony) 提交于 5月 17, 2019

Patch series "improve vmap allocation", v3.

Objective
---------

Please have a look for the description at:

  https://lkml.org/lkml/2018/10/19/786

but let me also summarize it a bit here as well.

The current implementation has O(N) complexity. Requests with different
permissive parameters can lead to long allocation time. When i say
"long" i mean milliseconds.

Description
-----------

This approach organizes the KVA memory layout into free areas of the
1-ULONG_MAX range, i.e.  an allocation is done over free areas lookups,
instead of finding a hole between two busy blocks.  It allows to have
lower number of objects which represent the free space, therefore to have
less fragmented memory allocator.  Because free blocks are always as large
as possible.

It uses the augment tree where all free areas are sorted in ascending
order of va->va_start address in pair with linked list that provides
O(1) access to prev/next elements.

Since the tree is augment, we also maintain the "subtree_max_size" of VA
that reflects a maximum available free block in its left or right
sub-tree.  Knowing that, we can easily traversal toward the lowest (left
most path) free area.

Allocation: ~O(log(N)) complexity.  It is sequential allocation method
therefore tends to maximize locality.  The search is done until a first
suitable block is large enough to encompass the requested parameters.
Bigger areas are split.

I copy paste here the description of how the area is split, since i
described it in https://lkml.org/lkml/2018/10/19/786

<snip>

A free block can be split by three different ways.  Their names are
FL_FIT_TYPE, LE_FIT_TYPE/RE_FIT_TYPE and NE_FIT_TYPE, i.e.  they
correspond to how requested size and alignment fit to a free block.

FL_FIT_TYPE - in this case a free block is just removed from the free
list/tree because it fully fits.  Comparing with current design there is
an extra work with rb-tree updating.

LE_FIT_TYPE/RE_FIT_TYPE - left/right edges fit.  In this case what we do
is just cutting a free block.  It is as fast as a current design.  Most of
the vmalloc allocations just end up with this case, because the edge is
always aligned to 1.

NE_FIT_TYPE - Is much less common case.  Basically it happens when
requested size and alignment does not fit left nor right edges, i.e.  it
is between them.  In this case during splitting we have to build a
remaining left free area and place it back to the free list/tree.

Comparing with current design there are two extra steps.  First one is we
have to allocate a new vmap_area structure.  Second one we have to insert
that remaining free block to the address sorted list/tree.

In order to optimize a first case there is a cache with free_vmap objects.
Instead of allocating from slab we just take an object from the cache and
reuse it.

Second one is pretty optimized.  Since we know a start point in the tree
we do not do a search from the top.  Instead a traversal begins from a
rb-tree node we split.
<snip>

De-allocation.  ~O(log(N)) complexity.  An area is not inserted straight
away to the tree/list, instead we identify the spot first, checking if it
can be merged around neighbors.  The list provides O(1) access to
prev/next, so it is pretty fast to check it.  Summarizing.  If merged then
large coalesced areas are created, if not the area is just linked making
more fragments.

There is one more thing that i should mention here.  After modification of
VA node, its subtree_max_size is updated if it was/is the biggest area in
its left or right sub-tree.  Apart of that it can also be populated back
to upper levels to fix the tree.  For more details please have a look at
the __augment_tree_propagate_from() function and the description.

Tests and stressing
-------------------

I use the "test_vmalloc.sh" test driver available under
"tools/testing/selftests/vm/" since 5.1-rc1 kernel.  Just trigger "sudo
./test_vmalloc.sh" to find out how to deal with it.

Tested on different platforms including x86_64/i686/ARM64/x86_64_NUMA.
Regarding last one, i do not have any physical access to NUMA system,
therefore i emulated it.  The time of stressing is days.

If you run the test driver in "stress mode", you also need the patch that
is in Andrew's tree but not in Linux 5.1-rc1.  So, please apply it:

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=e0cf7749bade6da318e98e934a24d8b62fab512c

After massive testing, i have not identified any problems like memory
leaks, crashes or kernel panics.  I find it stable, but more testing would
be good.

Performance analysis
--------------------

I have used two systems to test.  One is i5-3320M CPU @ 2.60GHz and
another is HiKey960(arm64) board.  i5-3320M runs on 4.20 kernel, whereas
Hikey960 uses 4.15 kernel.  I have both system which could run on 5.1-rc1
as well, but the results have not been ready by time i an writing this.

Currently it consist of 8 tests.  There are three of them which correspond
to different types of splitting(to compare with default).  We have 3
ones(see above).  Another 5 do allocations in different conditions.

a) sudo ./test_vmalloc.sh performance

When the test driver is run in "performance" mode, it runs all available
tests pinned to first online CPU with sequential execution test order.  We
do it in order to get stable and repeatable results.  Take a look at time
difference in "long_busy_list_alloc_test".  It is not surprising because
the worst case is O(N).

# i5-3320M
How many cycles all tests took:
CPU0=646919905370(default) cycles vs CPU0=193290498550(patched) cycles

# See detailed table with results here:
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_default.txt
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_patched.txt

# Hikey960 8x CPUs
How many cycles all tests took:
CPU0=3478683207 cycles vs CPU0=463767978 cycles

# See detailed table with results here:
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_default.txt
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_patched.txt

b) time sudo ./test_vmalloc.sh test_repeat_count=1

With this configuration, all tests are run on all available online CPUs.
Before running each CPU shuffles its tests execution order.  It gives
random allocation behaviour.  So it is rough comparison, but it puts in
the picture for sure.

# i5-3320M
<default>            vs            <patched>
real    101m22.813s                real    0m56.805s
user    0m0.011s                   user    0m0.015s
sys     0m5.076s                   sys     0m0.023s

# See detailed table with results here:
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_default.txt
ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_patched.txt

# Hikey960 8x CPUs
<default>            vs            <patched>
real    unknown                    real    4m25.214s
user    unknown                    user    0m0.011s
sys     unknown                    sys     0m0.670s

I did not manage to complete this test on "default Hikey960" kernel
version.  After 24 hours it was still running, therefore i had to cancel
it.  That is why real/user/sys are "unknown".

This patch (of 3):

Currently an allocation of the new vmap area is done over busy list
iteration(complexity O(n)) until a suitable hole is found between two busy
areas.  Therefore each new allocation causes the list being grown.  Due to
over fragmented list and different permissive parameters an allocation can
take a long time.  For example on embedded devices it is milliseconds.

This patch organizes the KVA memory layout into free areas of the
1-ULONG_MAX range.  It uses an augment red-black tree that keeps blocks
sorted by their offsets in pair with linked list keeping the free space in
order of increasing addresses.

Nodes are augmented with the size of the maximum available free block in
its left or right sub-tree.  Thus, that allows to take a decision and
traversal toward the block that will fit and will have the lowest start
address, i.e.  it is sequential allocation.

Allocation: to allocate a new block a search is done over the tree until a
suitable lowest(left most) block is large enough to encompass: the
requested size, alignment and vstart point.  If the block is bigger than
requested size - it is split.

De-allocation: when a busy vmap area is freed it can either be merged or
inserted to the tree.  Red-black tree allows efficiently find a spot
whereas a linked list provides a constant-time access to previous and next
blocks to check if merging can be done.  In case of merging of
de-allocated memory chunk a large coalesced area is created.

Complexity: ~O(log(N))

[urezki@gmail.com: v3]
  Link: http://lkml.kernel.org/r/20190402162531.10888-2-urezki@gmail.com
[urezki@gmail.com: v4]
  Link: http://lkml.kernel.org/r/20190406183508.25273-2-urezki@gmail.com
Link: http://lkml.kernel.org/r/20190321190327.11813-2-urezki@gmail.comSigned-off-by: NUladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: NRoman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

68ad4a33

18 5月, 2019 2 次提交

net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index · 02f3afd9

由 Parav Pandit 提交于 4月 05, 2019

To avoid any ambiguity between vport index and vport number,
rename functions that had vport, to vport_num or vport_index appropriately.

vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return
type to u16.

vport_index is an int in vport array. Hence change input type of vport
index in mlx5_eswitch_index_to_vport_num() to int.

Correct multiple eswitch representor interfaces use type u16 of
rep->vport as type int vport_index.

Send vport FW commands with correct eswitch u16 vport_num instead
host int vport_index.

Fixes: 5ae51620 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NVu Pham <vuhuong@mellanox.com>
Reviewed-by: NBodong Wang <bodong@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

02f3afd9

i2c: core: add device-managed version of i2c_new_dummy · b8f5fe3b

由 Heiner Kallweit 提交于 5月 16, 2019

i2c_new_dummy is typically called from the probe function of the
driver for the primary i2c client. It requires calls to
i2c_unregister_device in the error path of the probe function and
in the remove function.
This can be simplified by introducing a device-managed version.

Note the changed error case return value type: i2c_new_dummy returns
NULL whilst devm_i2c_new_dummy_device returns an ERR_PTR.
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
[wsa: rename new functions and fix minor kdoc issues]
Signed-off-by: NWolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: NPeter Rosin <peda@axentia.se>
Reviewed-by: NKieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: NWolfram Sang <wsa@the-dreams.de>

b8f5fe3b

17 5月, 2019 5 次提交

crypto: hash - fix incorrect HASH_MAX_DESCSIZE · e1354400

由 Eric Biggers 提交于 5月 14, 2019

The "hmac(sha3-224-generic)" algorithm has a descsize of 368 bytes,
which is greater than HASH_MAX_DESCSIZE (360) which is only enough for
sha3-224-generic.  The check in shash_prepare_alg() doesn't catch this
because the HMAC template doesn't set descsize on the algorithms, but
rather sets it on each individual HMAC transform.

This causes a stack buffer overflow when SHASH_DESC_ON_STACK() is used
with hmac(sha3-224-generic).

Fix it by increasing HASH_MAX_DESCSIZE to the real maximum.  Also add a
sanity check to hmac_init().

This was detected by the improved crypto self-tests in v5.2, by loading
the tcrypt module with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y enabled.  I
didn't notice this bug when I ran the self-tests by requesting the
algorithms via AF_ALG (i.e., not using tcrypt), probably because the
stack layout differs in the two cases and that made a difference here.

KASAN report:

    BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:359 [inline]
    BUG: KASAN: stack-out-of-bounds in shash_default_import+0x52/0x80 crypto/shash.c:223
    Write of size 360 at addr ffff8880651defc8 by task insmod/3689

    CPU: 2 PID: 3689 Comm: insmod Tainted: G            E     5.1.0-10741-g35c99ffa #11
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    Call Trace:
     __dump_stack lib/dump_stack.c:77 [inline]
     dump_stack+0x86/0xc5 lib/dump_stack.c:113
     print_address_description+0x7f/0x260 mm/kasan/report.c:188
     __kasan_report+0x144/0x187 mm/kasan/report.c:317
     kasan_report+0x12/0x20 mm/kasan/common.c:614
     check_memory_region_inline mm/kasan/generic.c:185 [inline]
     check_memory_region+0x137/0x190 mm/kasan/generic.c:191
     memcpy+0x37/0x50 mm/kasan/common.c:125
     memcpy include/linux/string.h:359 [inline]
     shash_default_import+0x52/0x80 crypto/shash.c:223
     crypto_shash_import include/crypto/hash.h:880 [inline]
     hmac_import+0x184/0x240 crypto/hmac.c:102
     hmac_init+0x96/0xc0 crypto/hmac.c:107
     crypto_shash_init include/crypto/hash.h:902 [inline]
     shash_digest_unaligned+0x9f/0xf0 crypto/shash.c:194
     crypto_shash_digest+0xe9/0x1b0 crypto/shash.c:211
     generate_random_hash_testvec.constprop.11+0x1ec/0x5b0 crypto/testmgr.c:1331
     test_hash_vs_generic_impl+0x3f7/0x5c0 crypto/testmgr.c:1420
     __alg_test_hash+0x26d/0x340 crypto/testmgr.c:1502
     alg_test_hash+0x22e/0x330 crypto/testmgr.c:1552
     alg_test.part.7+0x132/0x610 crypto/testmgr.c:4931
     alg_test+0x1f/0x40 crypto/testmgr.c:4952

Fixes: b68a7ec1 ("crypto: hash - Remove VLA usage")
Reported-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
Cc: <stable@vger.kernel.org> # v4.20+
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Tested-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

e1354400

slab: remove /proc/slab_allocators · 7878c231

由 Qian Cai 提交于 5月 16, 2019

It turned out that DEBUG_SLAB_LEAK is still broken even after recent
recue efforts that when there is a large number of objects like
kmemleak_object which is normal on a debug kernel,

  # grep kmemleak /proc/slabinfo
  kmemleak_object   2243606 3436210 ...

reading /proc/slab_allocators could easily loop forever while processing
the kmemleak_object cache and any additional freeing or allocating
objects will trigger a reprocessing. To make a situation worse,
soft-lockups could easily happen in this sitatuion which will call
printk() to allocate more kmemleak objects to guarantee an infinite
loop.

Also, since it seems no one had noticed when it was totally broken
more than 2-year ago - see the commit fcf88917 ("slab: fix a crash
by reading /proc/slab_allocators"), probably nobody cares about it
anymore due to the decline of the SLAB. Just remove it entirely.
Suggested-by: NVlastimil Babka <vbabka@suse.cz>
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NQian Cai <cai@lca.pw>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7878c231

ipv6: prevent possible fib6 leaks · 61fb0d01

由 Eric Dumazet 提交于 5月 15, 2019

At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible
for finding all percpu routes and set their ->from pointer
to NULL, so that fib6_ref can reach its expected value (1).

The problem right now is that other cpus can still catch the
route being deleted, since there is no rcu grace period
between the route deletion and call to fib6_drop_pcpu_from()

This can leak the fib6 and associated resources, since no
notifier will take care of removing the last reference(s).

I decided to add another boolean (fib6_destroying) instead
of reusing/renaming exception_bucket_flushed to ease stable backports,
and properly document the memory barriers used to implement this fix.

This patch has been co-developped with Wei Wang.

Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Cc: Wei Wang <weiwan@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin Lau <kafai@fb.com>
Acked-by: NWei Wang <weiwan@google.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Reviewed-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61fb0d01

net: test nouarg before dereferencing zerocopy pointers · 185ce5c3

由 Willem de Bruijn 提交于 5月 15, 2019

Zerocopy skbs without completion notification were added for packet
sockets with PACKET_TX_RING user buffers. Those signal completion
through the TP_STATUS_USER bit in the ring. Zerocopy annotation was
added only to avoid premature notification after clone or orphan, by
triggering a copy on these paths for these packets.

The mechanism had to define a special "no-uarg" mode because packet
sockets already use skb_uarg(skb) == skb_shinfo(skb)->destructor_arg
for a different pointer.

Before deferencing skb_uarg(skb), verify that it is a real pointer.

Fixes: 5cd8d46e ("packet: copy user buffers before orphan or clone")
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

185ce5c3

flow_offload: support CVLAN match · bae9ed69

由 Edward Cree 提交于 5月 14, 2019

Plumb it through from the flow_dissector.
Signed-off-by: NEdward Cree <ecree@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bae9ed69

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功