提交 · 1238c4fd4812e225c93bc4c152769afa8493f903 · openeuler / Kernel

24 11月, 2017 2 次提交

x86/PCI: Remove unused HyperTransport interrupt support · fd2fa6c1

由 Bjorn Helgaas 提交于 11月 22, 2017

There are no in-tree callers of ht_create_irq(), the driver interface for
HyperTransport interrupts, left.  Remove the unused entry point and all the
supporting code.

See 8b955b0d ("[PATCH] Initial generic hypertransport interrupt
support").
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-pci@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com

fd2fa6c1

net: accept UFO datagrams from tuntap and packet · 0c19f846

由 Willem de Bruijn 提交于 11月 21, 2017

Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.

Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.

It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 93991221 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
("net: avoid skb_warn_bad_offload false positives on UFO").

(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.

Tested
  Booted a v4.13 guest kernel with QEMU. On a host kernel before this
  patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
  enabled, same as on a v4.13 host kernel.

  A UFO packet sent from the guest appears on the tap device:
    host:
      nc -l -p -u 8000 &
      tcpdump -n -i tap0

    guest:
      dd if=/dev/zero of=payload.txt bs=1 count=2000
      nc -u 192.16.1.1 8000 < payload.txt

  Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
  packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes
  v1 -> v2
    - simplified set_offload change (review comment)
    - documented test procedure

Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c19f846

23 11月, 2017 2 次提交

bpf: fix branch pruning logic · c131187d

由 Alexei Starovoitov 提交于 11月 22, 2017

when the verifier detects that register contains a runtime constant
and it's compared with another constant it will prune exploration
of the branch that is guaranteed not to be taken at runtime.
This is all correct, but malicious program may be constructed
in such a way that it always has a constant comparison and
the other branch is never taken under any conditions.
In this case such path through the program will not be explored
by the verifier. It won't be taken at run-time either, but since
all instructions are JITed the malicious program may cause JITs
to complain about using reserved fields, etc.
To fix the issue we have to track the instructions explored by
the verifier and sanitize instructions that are dead at run time
with NOPs. We cannot reject such dead code, since llvm generates
it for valid C code, since it doesn't do as much data flow
analysis as the verifier does.

Fixes: 17a52670 ("bpf: verifier (add verifier core)")
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

c131187d

bpf: introduce ARG_PTR_TO_MEM_OR_NULL · db1ac496

由 Gianluca Borello 提交于 11月 22, 2017

With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, an helper
argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO
and the verifier can prove the value of this next argument is 0. However,
most helpers are just interested in handling <!NULL, 0>, so forcing them to
deal with <NULL, 0> makes the implementation of those helpers more
complicated for no apparent benefits, requiring them to explicitly handle
those corner cases with checks that bpf programs could start relying upon,
preventing the possibility of removing them later.

Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL
even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type
ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case.

Currently, the only helper that needs this is bpf_csum_diff_proto(), so
change arg1 and arg3 to this new type as well.

Also add a new battery of tests that explicitly test the
!ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the
various <NULL, 0> variations are focused on bpf_csum_diff, so cover also
other helpers.
Signed-off-by: NGianluca Borello <g.borello@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

db1ac496

22 11月, 2017 9 次提交

treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts · 841b86f3

由 Kees Cook 提交于 10月 23, 2017

With all callbacks converted, and the timer callback prototype
switched over, the TIMER_FUNC_TYPE cast is no longer needed,
so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
        $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
        $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

The now unused macros are also dropped from include/linux/timer.h.
Signed-off-by: NKees Cook <keescook@chromium.org>

841b86f3

timer: Remove redundant __setup_timer*() macros · 919b250f

由 Kees Cook 提交于 10月 22, 2017

With __init_timer*() now matching __setup_timer*(), remove the redundant
internal interface, clean up the resulting definitions and add more
documentation.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NKees Cook <keescook@chromium.org>

919b250f

timer: Pass function down to initialization routines · 188665b2

由 Kees Cook 提交于 10月 22, 2017

In preparation for removing more macros, pass the function down to the
initialization routines instead of doing it in macros.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NKees Cook <keescook@chromium.org>

188665b2

timer: Remove unused data arguments from macros · 1fe66ba5

由 Kees Cook 提交于 10月 22, 2017

With the .data field removed, the ignored data arguments in timer macros
can be removed.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shaohua Li <shli@fb.com>
Signed-off-by: NKees Cook <keescook@chromium.org>

1fe66ba5

timer: Switch callback prototype to take struct timer_list * argument · 354b46b1

由 Kees Cook 提交于 10月 22, 2017

Since all callbacks have been converted, we can switch the core
prototype to "struct timer_list *" now too.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NKees Cook <keescook@chromium.org>

354b46b1

timer: Pass timer_list pointer to callbacks unconditionally · c1eba5bc

由 Kees Cook 提交于 10月 22, 2017

Now that all timer callbacks are already taking their struct timer_list
pointer as the callback argument, just do this unconditionally and remove
the .data field.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NKees Cook <keescook@chromium.org>

c1eba5bc

timer: Remove setup_*timer() interface · 513ae785

由 Kees Cook 提交于 10月 22, 2017

With all callers converted to timer_setup(), the old setup_*timer()
interface can be removed.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NKees Cook <keescook@chromium.org>

513ae785

timer: Remove init_timer() interface · 7eeb6b89

由 Kees Cook 提交于 10月 11, 2017

All users of init_timer() have been updated. Remove the ancient interface.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: NKees Cook <keescook@chromium.org>

7eeb6b89

block/laptop_mode: Convert timers to use timer_setup() · bca237a5

由 Kees Cook 提交于 8月 28, 2017

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: linux-block@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: NKees Cook <keescook@chromium.org>

bca237a5

21 11月, 2017 5 次提交

sched/deadline: Don't use dubious signed bitfields · aa5222e9

由 Dan Carpenter 提交于 10月 13, 2017

It doesn't cause a run-time bug, but these bitfields should be unsigned.
When it's signed ->dl_throttled is set to either 0 or -1, instead of
0 and 1 as expected.

The sched.h file is included into tons of places so Sparse generates
a flood of warnings like this:

  ./include/linux/sched.h:477:54: error: dubious one-bit signed bitfield
Reported-by: NMatthew Wilcox <willy@infradead.org>
Reported-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NLuca Abeni <luca.abeni@santannapisa.it>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Cc: luca abeni <luca.abeni@santannapisa.it>
Link: http://lkml.kernel.org/r/20171013070121.dzcncojuj2f4utij@mwandaSigned-off-by: NIngo Molnar <mingo@kernel.org>

aa5222e9

bpf: make bpf_prog_offload_verifier_prep() static inline · 14380194

由 Jakub Kicinski 提交于 11月 20, 2017

Header implementation of bpf_prog_offload_verifier_prep() which
is used if CONFIG_NET=n should be a static inline.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

14380194

bpf: revert report offload info to user space · 1ee64009

由 Jakub Kicinski 提交于 11月 20, 2017

This reverts commit bd601b6a ("bpf: report offload info to user
space").  The ifindex by itself is not sufficient, we should provide
information on which network namespace this ifindex belongs to.
After considering some options we concluded that it's best to just
remove this API for now, and rework it in -next.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

1ee64009

bpf: turn bpf_prog_get_type() into a wrapper · 479321e9

由 Jakub Kicinski 提交于 11月 20, 2017

bpf_prog_get_type() is identical to bpf_prog_get_type_dev(),
with false passed as attach_drv.  Instead of keeping it as
an exported symbol turn it into static inline wrapper.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

479321e9

bpf: offload: move offload device validation out to the drivers · 288b3de5

由 Jakub Kicinski 提交于 11月 20, 2017

With TC shared block changes we can't depend on correct netdev
pointer being available in cls_bpf.  Move the device validation
to the driver.  Core will only make sure that offloaded programs
are always attached in the driver (or in HW by the driver).  We
trust that drivers which implement offload callbacks will perform
necessary checks.

Moving the checks to the driver is generally a useful thing,
in practice the check should be against a switchdev instance,
not a netdev, given that most ASICs will probably allow using
the same program on many ports.
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

288b3de5

19 11月, 2017 7 次提交

NTB: switchtec_ntb: Add skeleton NTB driver · e099b45b

由 Logan Gunthorpe 提交于 8月 03, 2017

Add a skeleton NTB driver which will be filled out in subsequent patches.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Reviewed-by: NStephen Bates <sbates@raithlin.com>
Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: NAllen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: NJon Mason <jdmason@kudzu.us>

e099b45b

NTB: switchtec_ntb: Introduce initial NTB driver · 33dea5aa

由 Logan Gunthorpe 提交于 8月 03, 2017

Seeing the Switchtec NTB hardware shares the same endpoint as the
management endpoint we utilize the class_interface API to register
an NTB driver for every Switchtec device in the system that has the
NTB class code.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Reviewed-by: NStephen Bates <sbates@raithlin.com>
Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: NAllen Hubbe <Allen.Hubbe@dell.com>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NJon Mason <jdmason@kudzu.us>

33dea5aa

NTB: Add check and comment for link up to mw_count() and mw_get_align() · fa5ab66e

由 Logan Gunthorpe 提交于 8月 03, 2017

Adds a comment and a check to ntb_mw_get_align() so that it always fails
if the function is called before the link is up.

Also adds a comment to ntb_mw_count() to note that it may return 0 if
it is called before the link is up.

This is to prevent accidental mis-use in clients that are testing
on hardware that this doesn't matter for.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Acked-by: NAllen Hubbe <Allen.Hubbe@dell.com>
Signed-off-by: NJon Mason <jdmason@kudzu.us>

fa5ab66e

NTB: switchtec: Add link event notifier callback · 48c302dc

由 Logan Gunthorpe 提交于 8月 03, 2017

In order for the Switchtec NTB code to handle link change events we
create a notifier callback in the switchtec code which gets called
whenever an appropriate event interrupt occurs.

In order to preserve userspace's ability to follow these events,
we compare the event count with a stored copy from last time we
checked.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Reviewed-by: NStephen Bates <sbates@raithlin.com>
Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NJon Mason <jdmason@kudzu.us>

48c302dc

NTB: switchtec: Add NTB hardware register definitions · c082b04c

由 Logan Gunthorpe 提交于 8月 03, 2017

There are two additional regions: ctrl and dbmsg. The first is
for generic NTB control and memory windows. The second is for doorbells
and message registers. This patch also adds a number of related
constants for using these registers.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Reviewed-by: NStephen Bates <sbates@raithlin.com>
Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com>
Signed-off-by: NJon Mason <jdmason@kudzu.us>

c082b04c

NTB: switchtec: Export class symbol for use in upper layer driver · 302e994d

由 Logan Gunthorpe 提交于 8月 03, 2017

We export the class pointer symbol and add an extern define in the
Switchtec header file.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Reviewed-by: NStephen Bates <sbates@raithlin.com>
Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NJon Mason <jdmason@kudzu.us>

302e994d

NTB: switchtec: Move structure definitions into a common header · 5a1c269f

由 Logan Gunthorpe 提交于 8月 03, 2017

Create the switchtec.h header in include/linux with hardware defines
and the switchtec_dev structure. Both moved directly from switchtec.c.
This is a prep patch for creating an NTB driver for Switchtec.
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Reviewed-by: NStephen Bates <sbates@raithlin.com>
Reviewed-by: NKurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NJon Mason <jdmason@kudzu.us>

5a1c269f

18 11月, 2017 15 次提交

sysvipc: make get_maxid O(1) again · 15df03c8

由 Davidlohr Bueso 提交于 11月 17, 2017

For a custom microbenchmark on a 3.30GHz Xeon SandyBridge, which calls
IPC_STAT over and over, it was calculated that, on avg the cost of
ipc_get_maxid() for increasing amounts of keys was:

 10 keys: ~900 cycles
 100 keys: ~15000 cycles
 1000 keys: ~150000 cycles
 10000 keys: ~2100000 cycles

This is unsurprising as maxid is currently O(n).

By having the max_id available in O(1) we save all those cycles for each
semctl(_STAT) command, the idr_find can be expensive -- which some real
(customer) workloads actually poll on.

Note that this used to be the case, until commit 7ca7e564 ("ipc:
store ipcs into IDRs").  The cost is the extra idr_find when doing
RMIDs, but we simply go backwards, and should not take too many
iterations to find the new value.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170831172049.14576-5-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

15df03c8

sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE · b8fd9983

由 Davidlohr Bueso 提交于 11月 17, 2017

Patch series "sysvipc: ipc-key management improvements".

Here are a few improvements I spotted while eyeballing Guillaume's
rhashtable implementation for ipc keys.  The first and fourth patches
are the interesting ones, the middle two are trivial.

This patch (of 4):

The next_id object-allocation functionality was introduced in commit
03f59566 ("ipc: add sysctl to specify desired next object id").

Given that these new entries are _only_ exported under the
CONFIG_CHECKPOINT_RESTORE option, there is no point for the common case
to even know about ->next_id.  As such rewrite ipc_buildid() such that
it can do away with the field as well as unnecessary branches when
adding a new identifier.  The end result also better differentiates both
cases, so the code ends up being cleaner; albeit the small duplications
regarding the default case.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170831172049.14576-2-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b8fd9983

kernel/reboot.c: add devm_register_reboot_notifier() · 2d8364ba

由 Andrey Smirnov 提交于 11月 17, 2017

Add devm_* wrapper around register_reboot_notifier to simplify device
specific reboot notifier registration/unregistration.

[akpm@linux-foundation.org: move `struct device' forward decl to top-of-file]
Link: http://lkml.kernel.org/r/20170320171753.1705-1-andrew.smirnov@gmail.comSigned-off-by: NAndrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d8364ba

kcov: support comparison operands collection · ded97d2c

由 Victor Chibotaru 提交于 11月 17, 2017

Enables kcov to collect comparison operands from instrumented code.
This is done by using Clang's -fsanitize=trace-cmp instrumentation
(currently not available for GCC).

The comparison operands help a lot in fuzz testing.  E.g.  they are used
in Syzkaller to cover the interiors of conditional statements with way
less attempts and thus make previously unreachable code reachable.

To allow separate collection of coverage and comparison operands two
different work modes are implemented.  Mode selection is now done via a
KCOV_ENABLE ioctl call with corresponding argument value.

Link: http://lkml.kernel.org/r/20171011095459.70721-1-glider@google.comSigned-off-by: NVictor Chibotaru <tchibo@google.com>
Signed-off-by: NAlexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: <syzkaller@googlegroups.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ded97d2c

kernel/panic.c: add TAINT_AUX · 4efb442c

由 Borislav Petkov 提交于 11月 17, 2017

This is the gist of a patch which we've been forward-porting in our
kernels for a long time now and it probably would make a good sense to
have such TAINT_AUX flag upstream which can be used by each distro etc,
how they see fit.  This way, we won't need to forward-port a distro-only
version indefinitely.

Add an auxiliary taint flag to be used by distros and others.  This
obviates the need to forward-port whatever internal solutions people
have in favor of a single flag which they can map arbitrarily to a
definition of their pleasing.

The "X" mnemonic could also mean eXternal, which would be taint from a
distro or something else but not the upstream kernel.  We will use it to
mark modules for which we don't provide support.  I.e., a really
eXternal module.

Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnicSigned-off-by: NBorislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4efb442c

pid: remove pidhash · e8cfbc24

由 Gargi Sharma 提交于 11月 17, 2017

pidhash is no longer required as all the information can be looked up
from idr tree.  nr_hashed represented the number of pids that had been
hashed.  Since, nr_hashed and PIDNS_HASH_ADDING are no longer relevant,
it has been renamed to pid_allocated and PIDNS_ADDING respectively.

[gs051095@gmail.com: v6]
  Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com
Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.comSigned-off-by: NGargi Sharma <gs051095@gmail.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>	[ia64]
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e8cfbc24

pid: replace pid bitmap implementation with IDR API · 95846ecf

由 Gargi Sharma 提交于 11月 17, 2017

Patch series "Replacing PID bitmap implementation with IDR API", v4.

This series replaces kernel bitmap implementation of PID allocation with
IDR API.  These patches are written to simplify the kernel by replacing
custom code with calls to generic code.

The following are the stats for pid and pid_namespace object files
before and after the replacement.  There is a noteworthy change between
the IDR and bitmap implementation.

Before
   text       data        bss        dec        hex    filename
   8447       3894         64      12405       3075    kernel/pid.o
After
   text       data        bss        dec        hex    filename
   3397        304          0       3701        e75    kernel/pid.o

Before
   text       data        bss        dec        hex    filename
   5692       1842        192       7726       1e2e    kernel/pid_namespace.o
After
   text       data        bss        dec        hex    filename
   2854        216         16       3086        c0e    kernel/pid_namespace.o

The following are the stats for ps, pstree and calling readdir on /proc
for 10,000 processes.

ps:
        With IDR API    With bitmap
real    0m1.479s        0m2.319s
user    0m0.070s        0m0.060s
sys     0m0.289s        0m0.516s

pstree:
        With IDR API    With bitmap
real    0m1.024s        0m1.794s
user    0m0.348s        0m0.612s
sys     0m0.184s        0m0.264s

proc:
        With IDR API    With bitmap
real    0m0.059s        0m0.074s
user    0m0.000s        0m0.004s
sys     0m0.016s        0m0.016s

This patch (of 2):

Replace the current bitmap implementation for Process ID allocation.
Functions that are no longer required, for example, free_pidmap(),
alloc_pidmap(), etc.  are removed.  The rest of the functions are
modified to use the IDR API.  The change was made to make the PID
allocation less complex by replacing custom code with calls to generic
API.

[gs051095@gmail.com: v6]
  Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
[avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
  Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.comSigned-off-by: NGargi Sharma <gs051095@gmail.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

95846ecf

pipe: add proc_dopipe_max_size() to safely assign pipe_max_size · 7a8d1819

由 Joe Lawrence 提交于 11月 17, 2017

pipe_max_size is assigned directly via procfs sysctl:

  static struct ctl_table fs_table[] = {
          ...
          {
                  .procname       = "pipe-max-size",
                  .data           = &pipe_max_size,
                  .maxlen         = sizeof(int),
                  .mode           = 0644,
                  .proc_handler   = &pipe_proc_fn,
                  .extra1         = &pipe_min_size,
          },
          ...

  int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
                   size_t *lenp, loff_t *ppos)
  {
          ...
          ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
          ...

and then later rounded in-place a few statements later:

          ...
          pipe_max_size = round_pipe_size(pipe_max_size);
          ...

This leaves a window of time between initial assignment and rounding
that may be visible to other threads.  (For example, one thread sets a
non-rounded value to pipe_max_size while another reads its value.)

Similar reads of pipe_max_size are potentially racy:

  pipe.c :: alloc_pipe_info()
  pipe.c :: pipe_set_size()

Add a new proc_dopipe_max_size() that consolidates reading the new value
from the user buffer, verifying bounds, and calling round_pipe_size()
with a single assignment to pipe_max_size.

Link: http://lkml.kernel.org/r/1507658689-11669-4-git-send-email-joe.lawrence@redhat.comSigned-off-by: NJoe Lawrence <joe.lawrence@redhat.com>
Reported-by: NMikulas Patocka <mpatocka@redhat.com>
Reviewed-by: NMikulas Patocka <mpatocka@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7a8d1819

lib/genalloc.c: make the avail variable an atomic_long_t · 36a3d1dd

由 Stephen Bates 提交于 11月 17, 2017

If the amount of resources allocated to a gen_pool exceeds 2^32 then the
avail atomic overflows and this causes problems when clients try and
borrow resources from the pool.  This is only expected to be an issue on
64 bit systems.

Add the <linux/atomic.h> header to pull in atomic_long* operations.  So
that 32 bit systems continue to use atomic32_t but 64 bit systems can
use atomic64_t.

Link: http://lkml.kernel.org/r/1509033843-25667-1-git-send-email-sbates@raithlin.comSigned-off-by: NStephen Bates <sbates@raithlin.com>
Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: NDaniel Mentz <danielmentz@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

36a3d1dd

include/linux/radix-tree.h: remove unneeded #include <linux/bug.h> · f5bba9d1

由 Masahiro Yamada 提交于 11月 17, 2017

This include was added by commit 187f1882 ("BUG: headers with
BUG/BUG_ON etc. need linux/bug.h") because BUG_ON() was used in this
header at that time.

Some time later, commit 6d75f366 ("lib: radix-tree: check accounting
of existing slot replacement users") removed the use of BUG_ON() from
this header.

Since then, there is no reason to include <linux/bug.h>.

Link: http://lkml.kernel.org/r/1505660151-4383-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Chris Mi <chrism@mellanox.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f5bba9d1

include/linux/bitfield.h: include <linux/build_bug.h> instead of <linux/bug.h> · 8001541c

由 Masahiro Yamada 提交于 11月 17, 2017

Since commit bc6245e5 ("bug: split BUILD_BUG stuff out into
<linux/build_bug.h>"), #include <linux/build_bug.h> is better to pull
minimal headers needed for BUILG_BUG() family.

Link: http://lkml.kernel.org/r/1505700775-19826-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Cc: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8001541c

include/linux/compiler-clang.h: handle randomizable anonymous structs · 4ca59b14

由 Sandipan Das 提交于 11月 17, 2017

The GCC randomize layout plugin can randomize the member offsets of
sensitive kernel data structures.  To use this feature, certain
annotations and members are added to the structures which affect the
member offsets even if this plugin is not used.

All of these structures are completely randomized, except for task_struct
which leaves out some of its members.  All the other members are wrapped
within an anonymous struct with the __randomize_layout attribute.  This is
done using the randomized_struct_fields_start and
randomized_struct_fields_end defines.

When the plugin is disabled, the behaviour of this attribute can vary
based on the GCC version.  For GCC 5.1+, this attribute maps to
__designated_init otherwise it is just an empty define but the anonymous
structure is still present.  For other compilers, both
randomized_struct_fields_start and randomized_struct_fields_end default
to empty defines meaning the anonymous structure is not introduced at
all.

So, if a module compiled with Clang, such as a BPF program, needs to
access task_struct fields such as pid and comm, the offsets of these
members as recognized by Clang are different from those recognized by
modules compiled with GCC.  If GCC 4.6+ is used to build the kernel,
this can be solved by introducing appropriate defines for Clang so that
the anonymous structure is seen when determining the offsets for the
members.

Link: http://lkml.kernel.org/r/20171109064645.25581-1-sandipan@linux.vnet.ibm.comSigned-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4ca59b14

iopoll: avoid -Wint-in-bool-context warning · d15809f3

由 Arnd Bergmann 提交于 11月 17, 2017

When we pass the result of a multiplication as the timeout or the delay,
we can get a warning from gcc-7:

drivers/mmc/host/bcm2835.c:596:149: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
drivers/mfd/arizona-core.c:247:195: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
drivers/gpu/drm/sun4i/sun4i_hdmi_i2c.c:49:27: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]

The warning is a bit questionable inside of a macro, but this is
intentional on the side of the gcc developers. It is also an indication
of another problem: we evaluate the timeout and sleep arguments multiple
times, which can have undesired side-effects when those are complex
expressions.

This changes the two iopoll variants to use local variables for storing
copies of the timeouts. This adds some more type safety, and avoids
both the double-evaluation and the gcc warning.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81484
Link: http://lkml.kernel.org/r/20170726133756.2161367-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20171102114048.1526955-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NMark Brown <broonie@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d15809f3

kernel debug: support resetting WARN_ONCE for all architectures · aaf5dcfb

由 Andi Kleen 提交于 11月 17, 2017

Some architectures store the WARN_ONCE state in the flags field of the
bug_entry.  Clear that one too when resetting once state through
/sys/kernel/debug/clear_warn_once

Pointed out by Michael Ellerman

Improves the earlier patch that add clear_warn_once.

[ak@linux.intel.com: add a missing ifdef CONFIG_MODULES]
  Link: http://lkml.kernel.org/r/20171020170633.9593-1-andi@firstfloor.org
[akpm@linux-foundation.org: fix unused var warning]
[akpm@linux-foundation.org: Use 0200 for clear_warn_once file, per mpe]
[akpm@linux-foundation.org: clear BUGFLAG_DONE in clear_once_table(), per mpe]
Link: http://lkml.kernel.org/r/20171019204642.7404-1-andi@firstfloor.orgSigned-off-by: NAndi Kleen <ak@linux.intel.com>
Tested-by: NMichael Ellerman <mpe@ellerman.id.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aaf5dcfb

mm, compaction: persistently skip hugetlbfs pageblocks · 21dc7e02

由 David Rientjes 提交于 11月 17, 2017

It is pointless to migrate hugetlb memory as part of memory compaction
if the hugetlb size is equal to the pageblock order.  No defragmentation
is occurring in this condition.

It is also pointless to for the freeing scanner to scan a pageblock
where a hugetlb page is pinned.  Unconditionally skip these pageblocks,
and do so peristently so that they are not rescanned until it is
observed that these hugepages are no longer pinned.

It would also be possible to do this by involving the hugetlb subsystem
in marking pageblocks to no longer be skipped when they hugetlb pages
are freed.  This is a simple solution that doesn't involve any
additional subsystems in pageblock skip manipulation.

[rientjes@google.com: fix build]
  Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708201734390.117182@chino.kir.corp.google.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1708151639130.106658@chino.kir.corp.google.comSigned-off-by: NDavid Rientjes <rientjes@google.com>
Tested-by: NMichal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

21dc7e02

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功