提交 · 736706bee3298208343a76096370e4f6a5c55915 · openeuler / Kernel

05 3月, 2019 3 次提交

get rid of legacy 'get_ds()' function · 736706be

由 Linus Torvalds 提交于 3月 04, 2019

Every in-kernel use of this function defined it to KERNEL_DS (either as
an actual define, or as an inline function).  It's an entirely
historical artifact, and long long long ago used to actually read the
segment selector valueof '%ds' on x86.

Which in the kernel is always KERNEL_DS.

Inspired by a patch from Jann Horn that just did this for a very small
subset of users (the ones in fs/), along with Al who suggested a script.
I then just took it to the logical extreme and removed all the remaining
gunk.

Roughly scripted with

   git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/'
   git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d'

plus manual fixups to remove a few unusual usage patterns, the couple of
inline function cases and to fix up a comment that had become stale.

The 'get_ds()' function remains in an x86 kvm selftest, since in user
space it actually does something relevant.
Inspired-by: NJann Horn <jannh@google.com>
Inspired-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

736706be

aio: simplify - and fix - fget/fput for io_submit() · 84c4e1f8

由 Linus Torvalds 提交于 3月 03, 2019

Al Viro root-caused a race where the IOCB_CMD_POLL handling of
fget/fput() could cause us to access the file pointer after it had
already been freed:

 "In more details - normally IOCB_CMD_POLL handling looks so:

   1) io_submit(2) allocates aio_kiocb instance and passes it to
      aio_poll()

   2) aio_poll() resolves the descriptor to struct file by req->file =
      fget(iocb->aio_fildes)

   3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
      aio_kiocb to 2 (bumps by 1, that is).

   4) aio_poll() calls vfs_poll(). After sanity checks (basically,
      "poll_wait() had been called and only once") it locks the queue.
      That's what the extra reference to iocb had been for - we know we
      can safely access it.

   5) With queue locked, we check if ->woken has already been set to
      true (by aio_poll_wake()) and, if it had been, we unlock the
      queue, drop a reference to aio_kiocb and bugger off - at that
      point it's a responsibility to aio_poll_wake() and the stuff
      called/scheduled by it. That code will drop the reference to file
      in req->file, along with the other reference to our aio_kiocb.

   6) otherwise, we see whether we need to wait. If we do, we unlock the
      queue, drop one reference to aio_kiocb and go away - eventual
      wakeup (or cancel) will deal with the reference to file and with
      the other reference to aio_kiocb

   7) otherwise we remove ourselves from waitqueue (still under the
      queue lock), so that wakeup won't get us. No async activity will
      be happening, so we can safely drop req->file and iocb ourselves.

  If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
  won't get freed under us, so we can do all the checks and locking
  safely. And we don't touch ->file if we detect that case.

  However, vfs_poll() most certainly *does* touch the file it had been
  given. So wakeup coming while we are still in ->poll() might end up
  doing fput() on that file. That case is not too rare, and usually we
  are saved by the still present reference from descriptor table - that
  fput() is not the final one.

  But if another thread closes that descriptor right after our fget()
  and wakeup does happen before ->poll() returns, we are in trouble -
  final fput() done while we are in the middle of a method:

Al also wrote a patch to take an extra reference to the file descriptor
to fix this, but I instead suggested we just streamline the whole file
pointer handling by submit_io() so that the generic aio submission code
simply keeps the file pointer around until the aio has completed.

Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

84c4e1f8

x86-64: add warning for non-canonical user access address dereferences · 00c42373

由 Linus Torvalds 提交于 2月 26, 2019

This adds a warning (once) for any kernel dereference that has a user
exception handler, but accesses a non-canonical address.  It basically
is a simpler - and more limited - version of commit 9da3f2b7
("x86/fault: BUG() when uaccess helpers fault on kernel addresses") that
got reverted.

Note that unlike that original commit, this only causes a warning,
because there are real situations where we currently can do this
(notably speculative argument fetching for uprobes etc).  Also, unlike
that original commit, this _only_ triggers for #GP accesses, so the
cases of valid kernel pointers that cross into a non-mapped page aren't
affected.

The intent of this is two-fold:

 - the uprobe/tracing accesses really do need to be more careful. In
   particular, from a portability standpoint it's just wrong to think
   that "a pointer is a pointer", and use the same logic for any random
   pointer value you find on the stack. It may _work_ on x86-64, but it
   doesn't necessarily work on other architectures (where the same
   pointer value can be either a kernel pointer _or_ a user pointer, and
   you really need to be much more careful in how you try to access it)

   The warning can hopefully end up being a reminder that just any
   random pointer access won't do.

 - Kees in particular wanted a way to actually report invalid uses of
   wild pointers to user space accessors, instead of just silently
   failing them. Automated fuzzers want a way to get reports if the
   kernel ever uses invalid values that the fuzzer fed it.

   The non-canonical address range is a fair chunk of the address space,
   and with this you can teach syzkaller to feed in invalid pointer
   values and find cases where we do not properly validate user
   addresses (possibly due to bad uses of "set_fs()").
Acked-by: NKees Cook <keescook@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

00c42373

04 3月, 2019 1 次提交
- L
  
  Linux 5.0 · 1c163f4c
  由 Linus Torvalds 提交于 3月 03, 2019
  
  1c163f4c
03 3月, 2019 5 次提交

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · c027c7cf

由 Linus Torvalds 提交于 3月 02, 2019

Pull ARM SoC fixes from Arnd Bergmann:
 "One more set of simple ARM platform fixes:

   - A boot regression on qualcomm msm8998

   - Gemini display controllers got turned off by accident

   - incorrect reference counting in optee"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  tee: optee: add missing of_node_put after of_device_is_available
  arm64: dts: qcom: msm8998: Extend TZ reserved memory area
  ARM: dts: gemini: Re-enable display controller

c027c7cf

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e7c42a89

由 Linus Torvalds 提交于 3月 02, 2019

Pull x86 fixes from Thomas Gleixner:
 "Two last minute fixes:

   - Prevent value evaluation via functions happening in the user access
     enabled region of __put_user() (put another way: make sure to
     evaluate the value to be stored in user space _before_ enabling
     user space accesses)

   - Correct the definition of a Hyper-V hypercall constant"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/hyper-v: Fix definition of HV_MAX_FLUSH_REP_COUNT
  x86/uaccess: Don't leak the AC flag into __put_user() value evaluation

e7c42a89

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · df49fd0f

由 Linus Torvalds 提交于 3月 02, 2019

Pull SCSI fixes from James Bottomley:
 "Nine small fixes.

  The resume fix is a cosmetic removal of a warning with an incorrect
  condition causing it to alarm people wrongly.

  The other eight patches correct a thinko in Christoph Hellwig's DMA
  conversion series. Without it all these drivers end up with 32 bit DMA
  masks meaning they bounce any page over 4GB before sending it to the
  controller.

  Nowadays, even laptops mostly have memory above 4GB, so this can lead
  to significant performance degradation with all the bouncing"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: core: Avoid that system resume triggers a kernel warning
  scsi: hptiop: fix calls to dma_set_mask()
  scsi: hisi_sas: fix calls to dma_set_mask_and_coherent()
  scsi: csiostor: fix calls to dma_set_mask_and_coherent()
  scsi: bfa: fix calls to dma_set_mask_and_coherent()
  scsi: aic94xx: fix calls to dma_set_mask_and_coherent()
  scsi: 3w-sas: fix calls to dma_set_mask_and_coherent()
  scsi: 3w-9xxx: fix calls to dma_set_mask_and_coherent()
  scsi: lpfc: fix calls to dma_set_mask_and_coherent()

df49fd0f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c93d9218

由 Linus Torvalds 提交于 3月 02, 2019

Pull networking fixes from David Miller:

 1) Fix refcount leak in act_ipt during replace, from Davide Caratti.

 2) Set task state properly in tun during blocking reads, from Timur
    Celik.

 3) Leaked reference in DSA, from Wen Yang.

 4) NULL deref in act_tunnel_key, from Vlad Buslov.

 5) cipso_v4_erro can reference the skb IPCB in inappropriate contexts
    thus referencing garbage, from Nazarov Sergey.

 6) Don't accept RTA_VIA and RTA_GATEWAY in contexts where those
    attributes make no sense.

 7) Fix hung sendto in tipc, from Tung Nguyen.

 8) Out-of-bounds access in netlabel, from Paul Moore.

 9) Grant reference leak in xen-netback, from Igor Druzhinin.

10) Fix tx stalls with lan743x, from Bryan Whitehead.

11) Fix interrupt storm with mv88e6xxx, from Hein Kallweit.

12) Memory leak in sit on device registry failure, from Mao Wenan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  net: sit: fix memory leak in sit_init_net()
  net: dsa: mv88e6xxx: Fix statistics on mv88e6161
  geneve: correctly handle ipv6.disable module parameter
  net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode
  bpf: fix sanitation rewrite in case of non-pointers
  ipv4: Add ICMPv6 support when parse route ipproto
  MIPS: eBPF: Fix icache flush end address
  lan743x: Fix TX Stall Issue
  net: phy: phylink: fix uninitialized variable in phylink_get_mac_state
  net: aquantia: regression on cpus with high cores: set mode with 8 queues
  selftests: fixes for UDP GRO
  bpf: drop refcount if bpf_map_new_fd() fails in map_create()
  net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X
  net: dsa: mv88e6xxx: Fix u64 statistics
  xen-netback: don't populate the hash cache on XenBus disconnect
  xen-netback: fix occasional leak of grant ref mappings under memory pressure
  sctp: chunk.c: correct format string for size_t in printk
  net: netem: fix skb length BUG_ON in __skb_to_sgvec
  netlabel: fix out-of-bounds memory accesses
  ipv4: Pass original device to ip_rcv_finish_core
  ...

c93d9218

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · fa3294c5

由 Linus Torvalds 提交于 3月 02, 2019

Pull more crypto fixes from Herbert Xu:
 "This fixes a couple of issues in arm64/chacha that was introduced in
  5.0"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: arm64/chacha - fix hchacha_block_neon() for big endian
  crypto: arm64/chacha - fix chacha_4block_xor_neon() for big endian

fa3294c5

02 3月, 2019 17 次提交

net: sit: fix memory leak in sit_init_net() · 07f12b26

由 Mao Wenan 提交于 3月 01, 2019

If register_netdev() is failed to register sitn->fb_tunnel_dev,
it will go to err_reg_dev and forget to free netdev(sitn->fb_tunnel_dev).

BUG: memory leak
unreferenced object 0xffff888378daad00 (size 512):
  comm "syz-executor.1", pid 4006, jiffies 4295121142 (age 16.115s)
  hex dump (first 32 bytes):
    00 e6 ed c0 83 88 ff ff 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
backtrace:
    [<00000000d6dcb63e>] kvmalloc include/linux/mm.h:577 [inline]
    [<00000000d6dcb63e>] kvzalloc include/linux/mm.h:585 [inline]
    [<00000000d6dcb63e>] netif_alloc_netdev_queues net/core/dev.c:8380 [inline]
    [<00000000d6dcb63e>] alloc_netdev_mqs+0x600/0xcc0 net/core/dev.c:8970
    [<00000000867e172f>] sit_init_net+0x295/0xa40 net/ipv6/sit.c:1848
    [<00000000871019fa>] ops_init+0xad/0x3e0 net/core/net_namespace.c:129
    [<00000000319507f6>] setup_net+0x2ba/0x690 net/core/net_namespace.c:314
    [<0000000087db4f96>] copy_net_ns+0x1dc/0x330 net/core/net_namespace.c:437
    [<0000000057efc651>] create_new_namespaces+0x382/0x730 kernel/nsproxy.c:107
    [<00000000676f83de>] copy_namespaces+0x2ed/0x3d0 kernel/nsproxy.c:165
    [<0000000030b74bac>] copy_process.part.27+0x231e/0x6db0 kernel/fork.c:1919
    [<00000000fff78746>] copy_process kernel/fork.c:1713 [inline]
    [<00000000fff78746>] _do_fork+0x1bc/0xe90 kernel/fork.c:2224
    [<000000001c2e0d1c>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
    [<00000000ec48bd44>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<0000000039acff8a>] 0xffffffffffffffff
Signed-off-by: NMao Wenan <maowenan@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07f12b26

net: dsa: mv88e6xxx: Fix statistics on mv88e6161 · a6da21bb

由 Andrew Lunn 提交于 3月 01, 2019

Despite what the datesheet says, the silicon implements the older way
of snapshoting the statistics. Change the op.

Reported-by: Chris.Healy@zii.aero
Tested-by: Chris.Healy@zii.aero
Fixes: 0ac64c39 ("net: dsa: mv88e6xxx: mv88e6161 uses mv88e6320 stats snapshot")
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6da21bb

geneve: correctly handle ipv6.disable module parameter · cf1c9ccb

由 Jiri Benc 提交于 2月 28, 2019

When IPv6 is compiled but disabled at runtime, geneve_sock_add returns
-EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
operation of bringing up the tunnel.

Ignore failure of IPv6 socket creation for metadata based tunnels caused by
IPv6 not being available.

This is the same fix as what commit d074bf96 ("vxlan: correctly handle
ipv6.disable module parameter") is doing for vxlan.

Note there's also commit c0a47e44 ("geneve: should not call rt6_lookup()
when ipv6 was disabled") which fixes a similar issue but for regular
tunnels, while this patch is needed for metadata based tunnels.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf1c9ccb

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · f08d6114

由 David S. Miller 提交于 3月 01, 2019

Alexei Starovoitov says:

====================
pull-request: bpf 2019-03-01

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) fix sanitation rewrite, from Daniel.

2) fix error path on map_new_fd, from Peng.

3) fix icache flush address, from Paul.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f08d6114

net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode · ed8fe202

由 Heiner Kallweit 提交于 2月 28, 2019

When debugging another issue I faced an interrupt storm in this
driver (88E6390, port 9 in SGMII mode), consisting of alternating
link-up / link-down interrupts. Analysis showed that the driver
wanted to set a cmode that was set already. But so far
mv88e6390x_port_set_cmode() doesn't check this and powers down
SERDES, what causes the link to break, and eventually results in
the described interrupt storm.

Fix this by checking whether the cmode actually changes. We want
that the very first call to mv88e6390x_port_set_cmode() always
configures the registers, therefore initialize port.cmode with
a value that is different from any supported cmode value.
We have to take care that we only init the ports cmode once
chip->info->num_ports is set.

v2:
- add small helper and init the number of actual ports only

Fixes: 364e9d77 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change")
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed8fe202

bpf: fix sanitation rewrite in case of non-pointers · 3612af78

由 Daniel Borkmann 提交于 3月 01, 2019

Marek reported that he saw an issue with the below snippet in that
timing measurements where off when loaded as unpriv while results
were reasonable when loaded as privileged:

    [...]
    uint64_t a = bpf_ktime_get_ns();
    uint64_t b = bpf_ktime_get_ns();
    uint64_t delta = b - a;
    if ((int64_t)delta > 0) {
    [...]

Turns out there is a bug where a corner case is missing in the fix
d3bd7413 ("bpf: fix sanitation of alu op with pointer / scalar
type from different paths"), namely fixup_bpf_calls() only checks
whether aux has a non-zero alu_state, but it also needs to test for
the case of BPF_ALU_NON_POINTER since in both occasions we need to
skip the masking rewrite (as there is nothing to mask).

Fixes: d3bd7413 ("bpf: fix sanitation of alu op with pointer / scalar type from different paths")
Reported-by: NMarek Majkowski <marek@cloudflare.com>
Reported-by: NArthur Fabre <afabre@cloudflare.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/netdev/CAJPywTJqP34cK20iLM5YmUMz9KXQOdu1-+BZrGMAGgLuBWz7fg@mail.gmail.com/T/Acked-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

3612af78

ipv4: Add ICMPv6 support when parse route ipproto · 5e1a99ea

由 Hangbin Liu 提交于 2月 27, 2019

For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers.
But for ip -6 route, currently we only support tcp, udp and icmp.

Add ICMPv6 support so we can match ipv6-icmp rules for route lookup.

v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to
rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family.
Reported-by: NJianlin Shi <jishi@redhat.com>
Fixes: eacb9384 ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5e1a99ea

MIPS: eBPF: Fix icache flush end address · d1a2930d

由 Paul Burton 提交于 3月 01, 2019

The MIPS eBPF JIT calls flush_icache_range() in order to ensure the
icache observes the code that we just wrote. Unfortunately it gets the
end address calculation wrong due to some bad pointer arithmetic.

The struct jit_ctx target field is of type pointer to u32, and as such
adding one to it will increment the address being pointed to by 4 bytes.
Therefore in order to find the address of the end of the code we simply
need to add the number of 4 byte instructions emitted, but we mistakenly
add the number of instructions multiplied by 4. This results in the call
to flush_icache_range() operating on a memory region 4x larger than
intended, which is always wasteful and can cause crashes if we overrun
into an unmapped page.

Fix this by correcting the pointer arithmetic to remove the bogus
multiplication, and use braces to remove the need for a set of brackets
whilst also making it obvious that the target field is a pointer.
Signed-off-by: NPaul Burton <paul.burton@mips.com>
Fixes: b6bd53f9 ("MIPS: Add missing file for eBPF JIT.")
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

d1a2930d

lan743x: Fix TX Stall Issue · 90490ef7

由 Bryan Whitehead 提交于 2月 26, 2019

It has been observed that tx queue stalls while downloading
from certain web sites (example www.speedtest.net)

The cause has been tracked down to a corner case where
dma descriptors where not setup properly. And there for a tx
completion interrupt was not signaled.

This fix corrects the problem by properly marking the end of
a multi descriptor transmission.

Fixes: 23f0703c ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: NBryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

90490ef7

net: phy: phylink: fix uninitialized variable in phylink_get_mac_state · d25ed413

由 Heiner Kallweit 提交于 2月 26, 2019

When debugging an issue I found implausible values in state->pause.
Reason in that state->pause isn't initialized and later only single
bits are changed. Also the struct itself isn't initialized in
phylink_resolve(). So better initialize state->pause and other
not yet initialized fields.

v2:
- use right function name in subject
v3:
- initialize additional fields

Fixes: 9525ae83 ("phylink: add phylink infrastructure")
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d25ed413

net: aquantia: regression on cpus with high cores: set mode with 8 queues · 15f3ddf5

由 Dmitry Bogdanov 提交于 2月 26, 2019

Recently the maximum number of queues was increased up to 8, but
NIC was not fully configured for 8 queues. In setups with more than 4 CPU
cores parts of TX traffic gets lost if the kernel routes it to queues 4th-8th.

This patch sets a tx hw traffic mode with 8 queues.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202651

Fixes: 71a963cf ("net: aquantia: increase max number of hw queues")
Reported-by: NNicholas Johnson <nicholas.johnson@outlook.com.au>
Signed-off-by: NDmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: NIgor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

15f3ddf5

selftests: fixes for UDP GRO · ada641ff

由 Paolo Abeni 提交于 2月 26, 2019

The current implementation for UDP GRO tests is racy: the receiver
may flush the RX queue while the sending is still transmitting and
incorrectly report RX errors, with a wrong number of packet received.

Add explicit timeouts to the receiver for both connection activation
(first packet received for UDP) and reception completion, so that
in the above critical scenario the receiver will wait for the
transfer completion.

Fixes: 3327a9c4 ("selftests: add functionals test for UDP GRO")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ada641ff

Merge tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · a215ce8f

由 Linus Torvalds 提交于 3月 01, 2019

Pull IOMMU fix from Joerg Roedel:
 "One important fix for a memory corruption issue in the Intel VT-d
  driver that triggers on hardware with deep PCI hierarchies"

* tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/dmar: Fix buffer overflow during PCI bus notification

a215ce8f

Merge branch 'akpm' (patches from Andrew) · 2d28e01d

由 Linus Torvalds 提交于 3月 01, 2019

Merge misc fixes from Andrew Morton:
 "2 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  hugetlbfs: fix races and page leaks during migration
  kasan: turn off asan-stack for clang-8 and earlier

2d28e01d

hugetlbfs: fix races and page leaks during migration · cb6acd01

由 Mike Kravetz 提交于 2月 28, 2019

hugetlb pages should only be migrated if they are 'active'.  The
routines set/clear_page_huge_active() modify the active state of hugetlb
pages.

When a new hugetlb page is allocated at fault time, set_page_huge_active
is called before the page is locked.  Therefore, another thread could
race and migrate the page while it is being added to page table by the
fault code.  This race is somewhat hard to trigger, but can be seen by
strategically adding udelay to simulate worst case scheduling behavior.
Depending on 'how' the code races, various BUG()s could be triggered.

To address this issue, simply delay the set_page_huge_active call until
after the page is successfully added to the page table.

Hugetlb pages can also be leaked at migration time if the pages are
associated with a file in an explicitly mounted hugetlbfs filesystem.
For example, consider a two node system with 4GB worth of huge pages
available.  A program mmaps a 2G file in a hugetlbfs filesystem.  It
then migrates the pages associated with the file from one node to
another.  When the program exits, huge page counts are as follows:

  node0
  1024    free_hugepages
  1024    nr_hugepages

  node1
  0       free_hugepages
  1024    nr_hugepages

  Filesystem                         Size  Used Avail Use% Mounted on
  nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool

That is as expected.  2G of huge pages are taken from the free_hugepages
counts, and 2G is the size of the file in the explicitly mounted
filesystem.  If the file is then removed, the counts become:

  node0
  1024    free_hugepages
  1024    nr_hugepages

  node1
  1024    free_hugepages
  1024    nr_hugepages

  Filesystem                         Size  Used Avail Use% Mounted on
  nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool

Note that the filesystem still shows 2G of pages used, while there
actually are no huge pages in use.  The only way to 'fix' the filesystem
accounting is to unmount the filesystem

If a hugetlb page is associated with an explicitly mounted filesystem,
this information in contained in the page_private field.  At migration
time, this information is not preserved.  To fix, simply transfer
page_private from old to new page at migration time if necessary.

There is a related race with removing a huge page from a file and
migration.  When a huge page is removed from the pagecache, the
page_mapping() field is cleared, yet page_private remains set until the
page is actually freed by free_huge_page().  A page could be migrated
while in this state.  However, since page_mapping() is not set the
hugetlbfs specific routine to transfer page_private is not called and we
leak the page count in the filesystem.

To fix that, check for this condition before migrating a huge page.  If
the condition is detected, return EBUSY for the page.

Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
Fixes: bcc54222 ("mm: hugetlb: introduce page_huge_active")
Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
[mike.kravetz@oracle.com: v2]
  Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
[mike.kravetz@oracle.com: update comment and changelog]
  Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.comSigned-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cb6acd01

kasan: turn off asan-stack for clang-8 and earlier · 6baec880

由 Arnd Bergmann 提交于 2月 28, 2019

Building an arm64 allmodconfig kernel with clang results in over 140
warnings about overly large stack frames, the worst ones being:

drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'

None of these happen with gcc today, and almost all of these are the
result of a single known issue in llvm. Hopefully it will eventually
get fixed with the clang-9 release.

In the meantime, the best idea I have is to turn off asan-stack for
clang-8 and earlier, so we can produce a kernel that is safe to run.

I have posted three patches that address the frame overflow warnings
that are not addressed by turning off asan-stack, so in combination with
this change, we get much closer to a clean allmodconfig build, which in
turn is necessary to do meaningful build regression testing.

It is still possible to turn on the CONFIG_ASAN_STACK option on all
versions of clang, and it's always enabled for gcc, but when
CONFIG_COMPILE_TEST is set, the option remains invisible, so
allmodconfig and randconfig builds (which are normally done with a
forced CONFIG_COMPILE_TEST) will still result in a mostly clean build.

Link: http://lkml.kernel.org/r/20190222222950.3997333-1-arnd@arndb.de
Link: https://bugs.llvm.org/show_bug.cgi?id=38809Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NQian Cai <cai@lca.pw>
Reviewed-by: NMark Brown <broonie@kernel.org>
Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6baec880

Merge tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm · 6357c812

由 Linus Torvalds 提交于 3月 01, 2019

Pull drm fixes from Dave Airlie:
 "Three final fixes, one for a feature that is new in this kernel, one
  bochs fix for qemu riscv and one atomic modesetting fix.

  I've left a few of the other late fixes until next as I didn't want to
  throw in anything that wasn't really necessary"

* tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm:
  drm/bochs: Fix the ID mismatch error
  drm: Block fb changes for async plane updates
  drm/amd/display: Use vrr friendly pageflip throttling in DC.

6357c812

01 3月, 2019 14 次提交

bpf: drop refcount if bpf_map_new_fd() fails in map_create() · 352d20d6

由 Peng Sun 提交于 2月 27, 2019

In bpf/syscall.c, map_create() first set map->usercnt to 1, a file
descriptor is supposed to return to userspace. When bpf_map_new_fd()
fails, drop the refcount.

Fixes: bd5f5f4e ("bpf: Add BPF_MAP_GET_FD_BY_ID")
Signed-off-by: NPeng Sun <sironhide0null@gmail.com>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

352d20d6

Merge tag 'qcom-fixes-for-5.0-rc8' of... · 6089e656

由 Arnd Bergmann 提交于 3月 01, 2019

Merge tag 'qcom-fixes-for-5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux into arm/fixes

Qualcomm ARM64 Fixes for 5.0-rc8

* Fix TZ memory area size to avoid crashes during boot

* tag 'qcom-fixes-for-5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/agross/linux:
  arm64: dts: qcom: msm8998: Extend TZ reserved memory area

6089e656

Merge tag 'tee-fix-for-v5.0' of... · 36baa6ed

由 Arnd Bergmann 提交于 3月 01, 2019

Merge tag 'tee-fix-for-v5.0' of https://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes

OP-TEE driver
- add missing of_node_put after of_device_is_available

* tag 'tee-fix-for-v5.0' of https://git.linaro.org/people/jens.wiklander/linux-tee:
  tee: optee: add missing of_node_put after of_device_is_available

36baa6ed

Merge tag 'mips_fixes_5.0_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · bf23aba1

由 Linus Torvalds 提交于 2月 28, 2019

Pull MIPS fixes from Paul Burton:
 "A few more MIPS fixes:

   - Fix 16b cmpxchg() operations which could erroneously fail if bits
     15:8 of the old value are non-zero. In practice I'm not aware of
     any actual users of 16b cmpxchg() on MIPS, but this fixes the
     support for it was was introduced in v4.13.

   - Provide a struct device to dma_alloc_coherent for Lantiq XWAY
     systems with a "Voice MIPS Macro Core" (VMMC) device.

   - Provide DMA masks for BCM63xx ethernet devices, fixing a regression
     introduced in v4.19.

   - Fix memblock reservation for the kernel when the system has a
     non-zero PHYS_OFFSET, correcting the memblock conversion performed
     in v4.20"

* tag 'mips_fixes_5.0_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: fix memory setup for platforms with PHYS_OFFSET != 0
  MIPS: BCM63XX: provide DMA masks for ethernet devices
  MIPS: lantiq: pass struct device to DMA API functions
  MIPS: fix truncation in __cmpxchg_small for short values

bf23aba1

Merge tag 'for-linus-5.0-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 3eb07d20

由 Linus Torvalds 提交于 2月 28, 2019

Pull orangefs fixlet from Mike Marshall:
 "Remove two un-needed BUG_ONs"

* tag 'for-linus-5.0-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: remove two un-needed BUG_ONs...

3eb07d20

net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X · d235c48b

由 Maxime Chevallier 提交于 2月 28, 2019

Upon setting the cmode on 6390 and 6390X, the associated serdes
interfaces must be powered off/on.

Both 6390X and 6390 share code to do so, but it currently uses the 6390
specific helper mv88e6390_serdes_power() to disable and enable the
serdes interface.

This call will fail silently on 6390X when trying so set a 10G interface
such as XAUI or RXAUI, since mv88e6390_serdes_power() internally grabs
the lane number based on modes supported by the 6390, and returns 0 when
getting -ENODEV as a lane number.

Using mv88e6390x_serdes_power() should be safe here, since we explicitly
rule-out all ports but the 9 and 10, and because modes supported by 6390
ports 9 and 10 are a subset of those supported on 6390X.

This was tested on 6390X using RXAUI mode.

Fixes: 364e9d77 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change")
Signed-off-by: NMaxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d235c48b

net: dsa: mv88e6xxx: Fix u64 statistics · 6e46e2d8

由 Andrew Lunn 提交于 2月 28, 2019

The switch maintains u64 counters for the number of octets sent and
received. These are kept as two u32's which need to be combined.  Fix
the combing, which wrongly worked on u16's.

Fixes: 80c4627b ("dsa: mv88x6xxx: Refactor getting a single statistic")
Reported-by: NChris Healy <Chris.Healy@zii.aero>
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e46e2d8

xen-netback: don't populate the hash cache on XenBus disconnect · a2288d4e

由 Igor Druzhinin 提交于 2月 28, 2019

Occasionally, during the disconnection procedure on XenBus which
includes hash cache deinitialization there might be some packets
still in-flight on other processors. Handling of these packets includes
hashing and hash cache population that finally results in hash cache
data structure corruption.

In order to avoid this we prevent hashing of those packets if there
are no queues initialized. In that case RCU protection of queues guards
the hash cache as well.
Signed-off-by: NIgor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: NPaul Durrant <paul.durrant@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2288d4e

xen-netback: fix occasional leak of grant ref mappings under memory pressure · 99e87f56

由 Igor Druzhinin 提交于 2月 28, 2019

Zero-copy callback flag is not yet set on frag list skb at the moment
xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
leaking grant ref mappings since xenvif_zerocopy_callback() is never
called for these fragments. Those eventually build up and cause Xen
to kill Dom0 as the slots get reused for new mappings:

"d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005"

That behavior is observed under certain workloads where sudden spikes
of page cache writes coexist with active atomic skb allocations from
network traffic. Additionally, rework the logic to deal with frag_list
deallocation in a single place.
Signed-off-by: NPaul Durrant <paul.durrant@citrix.com>
Signed-off-by: NIgor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: NWei Liu <wei.liu2@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99e87f56

sctp: chunk.c: correct format string for size_t in printk · ac510505

由 Matthias Maennich 提交于 2月 28, 2019

According to Documentation/core-api/printk-formats.rst, size_t should be
printed with %zu, rather than %Zu.

In addition, using %Zu triggers a warning on clang (-Wformat-extra-args):

net/sctp/chunk.c:196:25: warning: data argument not used by format string [-Wformat-extra-args]
                                    __func__, asoc, max_data);
                                    ~~~~~~~~~~~~~~~~^~~~~~~~~
./include/linux/printk.h:440:49: note: expanded from macro 'pr_warn_ratelimited'
        printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
./include/linux/printk.h:424:17: note: expanded from macro 'printk_ratelimited'
                printk(fmt, ##__VA_ARGS__);                             \
                       ~~~    ^

Fixes: 5b5e0928 ("lib/vsprintf.c: remove %Z support")
Link: https://github.com/ClangBuiltLinux/linux/issues/378Signed-off-by: NMatthias Maennich <maennich@google.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac510505

net: netem: fix skb length BUG_ON in __skb_to_sgvec · 5845f706

由 Sheng Lan 提交于 2月 28, 2019

It can be reproduced by following steps:
1. virtio_net NIC is configured with gso/tso on
2. configure nginx as http server with an index file bigger than 1M bytes
3. use tc netem to produce duplicate packets and delay:
   tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90%
4. continually curl the nginx http server to get index file on client
5. BUG_ON is seen quickly

[10258690.371129] kernel BUG at net/core/skbuff.c:4028!
[10258690.371748] invalid opcode: 0000 [#1] SMP PTI
[10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.0.0-rc6 #2
[10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
[10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
[10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
[10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
[10258690.372094] FS:  0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
[10258690.372094] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
[10258690.372094] Call Trace:
[10258690.372094]  <IRQ>
[10258690.372094]  skb_to_sgvec+0x11/0x40
[10258690.372094]  start_xmit+0x38c/0x520 [virtio_net]
[10258690.372094]  dev_hard_start_xmit+0x9b/0x200
[10258690.372094]  sch_direct_xmit+0xff/0x260
[10258690.372094]  __qdisc_run+0x15e/0x4e0
[10258690.372094]  net_tx_action+0x137/0x210
[10258690.372094]  __do_softirq+0xd6/0x2a9
[10258690.372094]  irq_exit+0xde/0xf0
[10258690.372094]  smp_apic_timer_interrupt+0x74/0x140
[10258690.372094]  apic_timer_interrupt+0xf/0x20
[10258690.372094]  </IRQ>

In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's
linear data size and nonlinear data size, thus BUG_ON triggered.
Because the skb is cloned and a part of nonlinear data is split off.

Duplicate packet is cloned in netem_enqueue() and may be delayed
some time in qdisc. When qdisc len reached the limit and returns
NET_XMIT_DROP, the skb will be retransmit later in write queue.
the skb will be fragmented by tso_fragment(), the limit size
that depends on cwnd and mss decrease, the skb's nonlinear
data will be split off. The length of the skb cloned by netem
will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(),
the BUG_ON trigger.

To fix it, netem returns NET_XMIT_SUCCESS to upper stack
when it clones a duplicate packet.

Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()")
Signed-off-by: NSheng Lan <lansheng@huawei.com>
Reported-by: NQin Ji <jiqin.ji@huawei.com>
Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5845f706

Merge tag 'mmc-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 3a8ed368

由 Linus Torvalds 提交于 2月 28, 2019

Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Fix NULL ptr crash for a special test case
   - Align max segment size with logical block size to prevent bugs in
     v5.1-rc1.

  MMC host:
   - cqhci: Minor fixes
   - tmio: Prevent interrupt storm
   - tmio: Fixup SD/MMC card initialization
   - spi: Allow card to be detected during probe
   - sdhci-esdhc-imx: Fixup fix for ERR004536"

* tag 'mmc-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: sdhci-esdhc-imx: correct the fix of ERR004536
  mmc: core: align max segment size with logical block size
  mmc: cqhci: Fix a tiny potential memory leak on error condition
  mmc: cqhci: fix space allocated for transfer descriptor
  mmc: core: Fix NULL ptr crash from mmc_should_fail_request
  mmc: tmio: fix access width of Block Count Register
  mmc: tmio_mmc_core: don't claim spurious interrupts
  mmc: spi: Fix card detection during probe

3a8ed368

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 3f25a599

由 Linus Torvalds 提交于 2月 28, 2019

Pull crypto fixes from Herbert Xu:
 "This fixes a compiler warning introduced by a previous fix, as well as
  two crash bugs on ARM"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: sha512/arm - fix crash bug in Thumb2 build
  crypto: sha256/arm - fix crash bug in Thumb2 build
  crypto: ccree - add missing inline qualifier

3f25a599

kvm: properly check debugfs dentry before using it · 8ed0579c

由 Greg Kroah-Hartman 提交于 2月 28, 2019

debugfs can now report an error code if something went wrong instead of
just NULL.  So if the return value is to be used as a "real" dentry, it
needs to be checked if it is an error before dereferencing it.

This is now happening because of ff9fb72b ("debugfs: return error
values, not NULL").  syzbot has found a way to trigger multiple debugfs
files attempting to be created, which fails, and then the error code
gets passed to dentry_path_raw() which obviously does not like it.
Reported-by: NEric Biggers <ebiggers@kernel.org>
Reported-and-tested-by: syzbot+7857962b4d45e602b8ad@syzkaller.appspotmail.com
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8ed0579c

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功