Commits · a94fed6c9ffb02a303eb00d9d7953101ea8ee0b1 · openeuler / raspberrypi-kernel

Dec 27, 2019 · 40 commits

sdei_watchdog: refresh 'last_timestamp' when enabling nmi_watchdog · a94fed6c

Authored by Xiongfeng Wang on Feb 15, 2019

euler inclusion
category: feature
Bugzilla: 5515
CVE: N/A

----------------------------------------

The trigger period of the secure timer is set by firmware. We need to
check the timestamp every time the secure timer fires to make sure the
hardlockup detection is not executed too soon. We also need to refresh
'last_timestamp' to the current time when we enable the nmi_watchdog;
otherwise, a false hardlockup may be detected when the secure timer
fires for the first time.
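
The following is a minimal sketch of the timestamp gating described above. It assumes a single watchdog state and a hypothetical 10 s minimum period. `struct wd_state`, `wd_enable()`, `wd_should_check()` and `WD_MIN_PERIOD_NS` are illustrative names, not the driver's actual code. Without the refresh, a stale `last_timestamp` lets the very first timer firing pass the elapsed-time check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU watchdog state; names are illustrative only. */
struct wd_state {
    bool enabled;
    uint64_t last_timestamp; /* ns: time of the last accepted check */
};

/* Hypothetical minimum spacing between two hardlockup checks: 10 s. */
#define WD_MIN_PERIOD_NS 10000000000ULL

/* Enabling refreshes last_timestamp, so the first firing is measured from now. */
static void wd_enable(struct wd_state *wd, uint64_t now_ns)
{
    wd->last_timestamp = now_ns;
    wd->enabled = true;
}

/*
 * Called when the firmware-programmed timer fires. Returns true only if
 * enough time has passed for a hardlockup check to be meaningful; an
 * early firing is ignored instead of being treated as a lockup.
 */
static bool wd_should_check(struct wd_state *wd, uint64_t now_ns)
{
    if (!wd->enabled)
        return false;
    if (now_ns - wd->last_timestamp < WD_MIN_PERIOD_NS)
        return false;
    wd->last_timestamp = now_ns;
    return true;
}
```

For example, suppose `last_timestamp` is left at 0 and the watchdog is enabled at t = 100 s. A firing at t = 101 s then passes the check, even though only 1 s has elapsed. After `wd_enable()`, the same firing is ignored.
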
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

pagecache: add sysctl interface to limit pagecache · 6174ecb5

Authored by zhong jiang on Feb 15, 2019

euleros inclusion
category: feature
feature: pagecache limit

Add a proc sysctl interface to set a pagecache limit for memory reclaim.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Revert "net/mlx5e: Fail attempt to offload e-switch TC flows" · c8e46f29

Authored by Keefe LIU on Feb 15, 2019

hulk inclusion
category: bugfix
bugzilla: 6105
CVE: NA

-------------------------------------------------

Patch "net/mlx5e: Fail attempt to offload e-switch TC flows"
depends on another patch, "net/mlx5e: Use dedicated uplink
vport netdev representor". The depended-on patch isn't a bugfix
and has too many conflicts with the present code, so we'd
better revert this patch.
Signed-off-by: Keefe LIU <liuqifa@huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mm, memory_hotplug: do not clear numa_node association after hot_remove · b5e63033

Authored by Michal Hocko on Feb 13, 2019

mainline inclusion
from mainline-5.0-rc1
commit 46a3679b8190101e4ebdfe252ef79e6150a4f2ac
category: bugfix
bugzilla: 5960
CVE: NA

---------------------

Per-cpu numa_node provides a default node for each possible cpu.  The
association gets initialized during boot when the architecture-specific
code explores cpu->NUMA affinity.  When the whole NUMA node is
removed, though, we clear this association:

try_offline_node
  check_and_unmap_cpu_on_node
    unmap_cpu_on_node
      numa_clear_node
        numa_set_node(cpu, NUMA_NO_NODE)

This means that whoever calls cpu_to_node for a cpu associated with such a
node will get NUMA_NO_NODE.  This is problematic for two reasons.  First
it is fragile because __alloc_pages_node would simply blow up on an
out-of-bound access.  We have encountered this when loading kvm module

  BUG: unable to handle kernel paging request at 00000000000021c0
  IP: __alloc_pages_nodemask+0x93/0xb70
  PGD 800000ffe853e067 PUD 7336bbc067 PMD 0
  Oops: 0000 [#1] SMP
  [...]
  CPU: 88 PID: 1223749 Comm: modprobe Tainted: G        W          4.4.156-94.64-default #1
  RIP: __alloc_pages_nodemask+0x93/0xb70
  RSP: 0018:ffff887354493b40  EFLAGS: 00010202
  RAX: 00000000000021c0 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00000000014000c0
  RBP: 00000000014000c0 R08: ffffffffffffffff R09: 0000000000000000
  R10: ffff88fffc89e790 R11: 0000000000014000 R12: 0000000000000101
  R13: ffffffffa0772cd4 R14: ffffffffa0769ac0 R15: 0000000000000000
  FS:  00007fdf2f2f1700(0000) GS:ffff88fffc880000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000021c0 CR3: 00000077205ee000 CR4: 0000000000360670
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
    alloc_vmcs_cpu+0x3d/0x90 [kvm_intel]
    hardware_setup+0x781/0x849 [kvm_intel]
    kvm_arch_hardware_setup+0x28/0x190 [kvm]
    kvm_init+0x7c/0x2d0 [kvm]
    vmx_init+0x1e/0x32c [kvm_intel]
    do_one_initcall+0xca/0x1f0
    do_init_module+0x5a/0x1d7
    load_module+0x1393/0x1c90
    SYSC_finit_module+0x70/0xa0
    entry_SYSCALL_64_fastpath+0x1e/0xb7
  DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xb7

on an older kernel but the code is basically the same in the current Linus
tree as well.  alloc_vmcs_cpu could use alloc_pages_nodemask which would
recognize NUMA_NO_NODE and use alloc_pages_node which would translate it
to numa_mem_id but that is wrong as well because it would use a cpu
affinity of the local CPU which might be quite far from the original node.
It is also reasonable to expect that cpu_to_node will provide a sane
value and there might be many more callers like that.

The second problem is that __register_one_node relies on cpu_to_node to
properly associate cpus back to the node when it is onlined.  We do not
want to lose that link as there is no arch independent way to get it from
the early boot time AFAICS.

Drop the whole check_and_unmap_cpu_on_node machinery and keep the
association to fix both issues.  The NODE_DATA(nid) is not deallocated so
it will stay in place and if anybody wants to allocate from that node then
a fallback node will be used.
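
The toy model below illustrates both problems. It assumes a fixed-size per-node array and a per-cpu node table. `toy_node`, `node_data[]`, `cpu_node[]`, `remove_node_old()`, `remove_node_new()` and the other helpers are hypothetical; they only mirror the shape of the kernel code. Clearing the link leaves callers with NUMA_NO_NODE, and re-onlining the node then finds no CPUs to re-associate.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)
#define MAX_NUMNODES 4
#define NR_CPUS 8

/* Toy stand-ins for NODE_DATA() and the per-cpu numa_node (illustrative). */
struct toy_node { int id; int online; };
static struct toy_node node_data[MAX_NUMNODES];
static int cpu_node[NR_CPUS];

static void toy_boot(void)
{
    for (int n = 0; n < MAX_NUMNODES; n++)
        node_data[n] = (struct toy_node){ .id = n, .online = 1 };
    for (int c = 0; c < NR_CPUS; c++)
        cpu_node[c] = c / 2;            /* two CPUs per node, set at boot */
}

/* Old behaviour: removing a node also forgets its cpu->node links. */
static void remove_node_old(int nid)
{
    node_data[nid].online = 0;
    for (int c = 0; c < NR_CPUS; c++)
        if (cpu_node[c] == nid)
            cpu_node[c] = NUMA_NO_NODE;
}

/* New behaviour: the node data stays allocated and the links are kept. */
static void remove_node_new(int nid)
{
    node_data[nid].online = 0;
}

/* Callers assume a sane node id; NUMA_NO_NODE would index node_data[-1]. */
static const struct toy_node *node_of_cpu(int cpu)
{
    int nid = cpu_node[cpu];
    /* Guard only so the sketch stays well-defined; real callers may lack it. */
    return (nid >= 0 && nid < MAX_NUMNODES) ? &node_data[nid] : NULL;
}

/* Re-onlining relies on the links to re-associate CPUs with the node. */
static int cpus_on_node(int nid)
{
    int count = 0;
    for (int c = 0; c < NR_CPUS; c++)
        count += (cpu_node[c] == nid);
    return count;
}
```

After `remove_node_old(1)`, CPU 2 no longer maps to any node, and node 1 has no CPUs to bring back. After `remove_node_new(1)`, both the lookup and the re-association still work.
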

Thanks to Vlastimil Babka for his live system debugging skills that helped
debugging the issue.

Link: http://lkml.kernel.org/r/20181108100413.966-1-mhocko@kernel.org
Fixes: e13fe869 ("cpu-hotplug,memory-hotplug: clear cpu_to_node() when offlining the node")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

nvme-pci: rerun irq setup on IO queue init errors · 2c337261

Authored by Keith Busch on Feb 14, 2019

mainline inclusion
from mainline-5.0-rc2
commit 8fae268b40f5191227ae7050a99cb2cf1b914ddd
category: bugfix
bugzilla: 6924
CVE: NA
---------------------------

If the driver is unable to create a subset of IO queues for any reason,
the read/write and polled queue sets will not match the actual allocated
hardware contexts. This leaves gaps in the CPU affinity mappings and
causes the following kernel panic after blk_mq_map_queue_type() returns
a NULL hctx.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000198
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP
  CPU: 64 PID: 1171 Comm: kworker/u259:1 Not tainted 4.20.0+ #241
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
  Workqueue: nvme-wq nvme_scan_work [nvme_core]
  RIP: 0010:blk_mq_init_allocated_queue+0x2d9/0x440
  RSP: 0018:ffffb1bf0abc3cd0 EFLAGS: 00010286
  RAX: 000000000000001f RBX: ffff8ea744cf0718 RCX: 0000000000000000
  RDX: 0000000000000002 RSI: 000000000000007c RDI: ffffffff9109a820
  RBP: ffff8ea7565f7008 R08: 000000000000001f R09: 000000000000003f
  R10: ffffb1bf0abc3c00 R11: 0000000000000000 R12: 000000000001d008
  R13: ffff8ea7565f7008 R14: 000000000000003f R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff8ea757200000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000198 CR3: 0000000013058000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   blk_mq_init_queue+0x35/0x60
   nvme_validate_ns+0xc6/0x7c0 [nvme_core]
   ? nvme_identify_ctrl.isra.56+0x7e/0xc0 [nvme_core]
   nvme_scan_work+0xc8/0x340 [nvme_core]
   ? __wake_up_common+0x6d/0x120
   ? try_to_wake_up+0x55/0x410
   process_one_work+0x1e9/0x3d0
   worker_thread+0x2d/0x3d0
   ? process_one_work+0x3d0/0x3d0
   kthread+0x111/0x130
   ? kthread_park+0x90/0x90
   ret_from_fork+0x1f/0x30
  Modules linked in: nvme nvme_core serio_raw
  CR2: 0000000000000198

Fix this by re-running the interrupt vector setup from scratch with a
reduced count, repeating until the number of created queues matches the
irq affinity plus polling queue sets.
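
Here is a sketch of the retry loop in the spirit of this fix. It assumes a controller that silently creates fewer queues than requested. `struct toy_nvme`, `toy_setup_irqs()`, `toy_create_io_queues()` and `toy_setup_io_queues()` are hypothetical stand-ins, not the nvme-pci functions. Each pass sizes the irq and queue sets first; if fewer queues come up, the whole setup is redone with the smaller count.

```c
#include <assert.h>

/* Hypothetical device model: the controller creates at most hw_limit IO queues. */
struct toy_nvme {
    int hw_limit;    /* queues the controller will actually create */
    int irq_queues;  /* IO queues the irq vectors / queue maps were sized for */
    int online;      /* IO queues actually created */
    int attempts;    /* number of setup passes */
};

/* Stand-in for allocating irq vectors and sizing the queue sets for n queues. */
static void toy_setup_irqs(struct toy_nvme *d, int n)
{
    d->irq_queues = n;
}

/* Stand-in for queue creation, which may create fewer queues than requested. */
static int toy_create_io_queues(struct toy_nvme *d, int n)
{
    d->online = n < d->hw_limit ? n : d->hw_limit;
    return d->online;
}

/*
 * If fewer queues come up than the irq setup assumed, redo the irq setup
 * from scratch with the smaller count so the two always end up matching.
 */
static int toy_setup_io_queues(struct toy_nvme *d, int wanted)
{
    int n = wanted;

    for (;;) {
        d->attempts++;
        toy_setup_irqs(d, n);
        int created = toy_create_io_queues(d, n);
        if (created == 0 || created == n)
            return created;
        n = created;    /* shrink and retry; n strictly decreases */
    }
}
```

With `hw_limit = 3` and 8 queues requested, the first pass creates 3. The second pass sizes the irq sets for 3, so the queue maps no longer leave gaps.
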
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

[Conflict:
drivers/nvme/host/pci.c
conflict commit:
4e2241066("nvme-pci: use atomic bitops to mark a queue enabled")
5271edd41("nvme-pci: refactor nvme_disable_io_queues")
3b6592f70("nvme: utilize two queue maps, one for reads and one for
	writes")
]
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mm/vmstat.c: fix NUMA statistics updates · 656c1d66

Authored by Janne Huttunen on Feb 13, 2019

mainline inclusion
from mainline-4.20-rc3
commit 13c9aaf7fa01cc7600c61981609feadeef3354ec
category: bugfix
bugzilla: 5955
CVE: NA

------------------------

Scan through the whole array to see if an update is needed.  While we're
at it, use sizeof() to be safe against any possible type changes in the
future.

The bug here is that we wouldn't sync per-cpu counters into global ones
if there was an update of numa_stats for higher cpus.  This is a highly
theoretical issue, though, because it is much more probable that
zone_stats are updated, so we would refresh anyway.  So I wouldn't
bother to mark this for stable, but it is still nice to fix.
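
This sketch shows why a byte-wise scan must be sized with sizeof() rather than with an item count. It assumes the per-cpu diffs are 16-bit values. `struct pcp_diffs`, `NR_ITEMS`, `any_nonzero()` and the two `needs_sync_*()` helpers are hypothetical; `any_nonzero()` only plays the role of a memchr_inv()-style check. With an item count, only the first half of the array is examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_ITEMS 6 /* hypothetical number of counters */

/* Per-cpu deltas stored as 16-bit values (assumption for illustration). */
struct pcp_diffs { uint16_t diff[NR_ITEMS]; };

/* Byte-wise "is anything non-zero?" scan over 'bytes' bytes. */
static bool any_nonzero(const void *p, size_t bytes)
{
    const unsigned char *c = p;
    for (size_t i = 0; i < bytes; i++)
        if (c[i])
            return true;
    return false;
}

/* Buggy: passes an element count where a byte count is expected. */
static bool needs_sync_buggy(const struct pcp_diffs *d)
{
    return any_nonzero(d->diff, NR_ITEMS);
}

/* Fixed: sizeof() gives the byte size and tracks future type changes. */
static bool needs_sync_fixed(const struct pcp_diffs *d)
{
    return any_nonzero(d->diff, sizeof(d->diff));
}
```

A non-zero value in the last counter is missed by the buggy check, because it scans only 6 of the 12 bytes. The sizeof() version catches it.
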

[mhocko@suse.com: changelog enhancement]
Link: http://lkml.kernel.org/r/1541601517-17282-1-git-send-email-janne.huttunen@nokia.com
Fixes: 1d90ca89 ("mm: update NUMA counter threshold size")
Signed-off-by: Janne Huttunen <janne.huttunen@nokia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mm/gup_benchmark.c: prevent integer overflow in ioctl · ed747dac

Authored by Dan Carpenter on Feb 13, 2019

mainline inclusion
from mainline-4.20-rc1
commit 4b408c74ee5a0b74fc9265c2fe39b0e7dec7c056
category: bugfix
bugzilla: 5946
CVE: NA

----------------------------

The concern here is that "gup->size" is a u64 and "nr_pages" is unsigned
long.  On 32 bit systems we could trick the kernel into allocating fewer
pages than expected.
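
This sketch shows the truncation and a check-before-narrowing guard. It models 32-bit `unsigned long` explicitly with `uint32_t` so the effect is visible on any host. `ulong32`, `TOY_PAGE_SIZE`, `nr_pages_unchecked()` and `nr_pages_checked()` are hypothetical names; the sketch is not claimed to be the exact upstream code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Model a 32-bit 'unsigned long' explicitly so the truncation shows on any host. */
typedef uint32_t ulong32;
#define TOY_PAGE_SIZE 4096u

/* Unchecked: the u64 quotient is silently truncated to 32 bits. */
static ulong32 nr_pages_unchecked(uint64_t size)
{
    return (ulong32)(size / TOY_PAGE_SIZE);
}

/* Checked: reject sizes that do not fit the narrower type before converting. */
static int nr_pages_checked(uint64_t size, ulong32 *nr_pages)
{
    if (size > UINT32_MAX)
        return -EINVAL;
    *nr_pages = (ulong32)(size / TOY_PAGE_SIZE);
    return 0;
}
```

A size of 2^44 + 8192 bytes is 2^32 + 2 pages. Truncated to 32 bits, it becomes 2, so only two page pointers would be allocated for a much larger request.
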

Link: http://lkml.kernel.org/r/20181025061546.hnhkv33diogf2uis@kili.mountain
Fixes: 64c349f4 ("mm: add infrastructure for get_user_pages_fast() benchmarking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: zhongjiang <zhongjiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mm: add memmap interface to reserved memory for mremap syscall useage · 65103a7c

Authored by zhong jiang on Feb 14, 2019

euleros inclusion
category: feature
feature: memmap

Add a memmap interface to reserve memory for mremap syscall usage.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone · 65e9b099

Authored by Mikhail Zaslonko on Feb 13, 2019

mainline inclusion
from mainline-5.0-rc5
commit 24feb47c5fa5b825efb0151f28906dfdad027e61
category: bugfix
bugzilla: 7436
CVE: NA

------------------------------

If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized.  This may lead
to VM_BUG_ON due to uninitialized struct pages access from
test_pages_in_a_zone() function triggered by memory_hotplug sysfs
handlers.

Here are the panic examples:
 CONFIG_DEBUG_VM_PGFLAGS=y
 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   test_pages_in_a_zone+0xde/0x160
   show_valid_zones+0x5c/0x190
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix this by checking whether the pfn to check is within the zone.
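
This toy sketch shows the "check zone membership first" pattern. It assumes a zone is a single contiguous pfn range. The kernel has its own helper for this test, zone_spans_pfn(); `struct toy_zone`, `toy_zone_spans_pfn()` and `toy_range_in_zone()` below are hypothetical and only mirror those semantics. The walk stops before it would touch metadata for pfns outside the zone.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy zone spanning pfns [start_pfn, start_pfn + spanned_pages). Illustrative only. */
struct toy_zone {
    unsigned long start_pfn;
    unsigned long spanned_pages;
};

static bool toy_zone_spans_pfn(const struct toy_zone *z, unsigned long pfn)
{
    return pfn >= z->start_pfn && pfn < z->start_pfn + z->spanned_pages;
}

/*
 * Report whether every pfn in [start, end) belongs to zone z. Zone
 * membership is checked before any per-pfn metadata would be read, so
 * the walk never touches the possibly uninitialized tail of a section.
 */
static bool toy_range_in_zone(const struct toy_zone *z,
                              unsigned long start, unsigned long end)
{
    for (unsigned long pfn = start; pfn < end; pfn++) {
        if (!toy_zone_spans_pfn(z, pfn))
            return false;
        /* a real walk would inspect the struct page for pfn here */
    }
    return true;
}
```
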

[mhocko@suse.com: separated this change from http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Link: http://lkml.kernel.org/r/20190128144506.15603-3-mhocko@kernel.org

Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone · d970f071

Authored by Michal Hocko on Feb 13, 2019

mainline inclusion
from mainline-5.0-rc5
commit efad4e475c312456edb3c789d0996d12ed744c13
category: bugfix
bugzilla: 7429
CVE: NA

------------------------------

Patch series "mm, memory_hotplug: fix uninitialized pages fallouts", v2.

Mikhail Zaslonko has posted fixes for the two bugs quite some time ago
[1].  I have pushed back on those fixes because I believed that it is
much better to plug the problem at the initialization time rather than
play whack-a-mole all over the hotplug code and find all the places
which expect the full memory section to be initialized.

We ended up with commit 2830bf6f05fb ("mm, memory_hotplug:
initialize struct pages for the full memory section") merged, and it
caused a regression [2][3].  The reason is that there might be memory
layouts where two NUMA nodes share the same memory section, so the
merged fix is simply incorrect.

In order to plug this hole we really have to be zone range aware in
those handlers.  I have split up the original patch into two.  One is
unchanged (patch 2) and I took a different approach for `removable'
crash.

[1] http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1666948
[3] http://lkml.kernel.org/r/20190125163938.GA20411@dhcp22.suse.cz

This patch (of 2):

Mikhail has reported the following VM_BUG_ON triggered when reading sysfs
removable state of a memory block:

 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   is_mem_section_removable+0xb4/0x190
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

The reason is that the memory block spans the zone boundary and we are
stumbling over an uninitialized struct page.  Fix this by enforcing the
zone range in is_mem_section_removable so that we never run away from a zone.
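
A second pattern is to clamp the end of the walk to the zone end up front, instead of testing each pfn. The sketch below assumes the check visits the first pfn of each 512-pfn pageblock. `TOY_PAGEBLOCK_NR_PAGES`, `min_ul()` and `toy_blocks_visited()` are hypothetical; in the kernel the end bound would come from something like zone_end_pfn(). The function counts the pageblocks that a clamped removability walk would visit.

```c
#include <assert.h>

/* Hypothetical pageblock size in pfns. */
#define TOY_PAGEBLOCK_NR_PAGES 512UL

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/*
 * Count the pageblocks visited for [start_pfn, start_pfn + nr_pages)
 * when the walk is clamped so it never goes past zone_end (exclusive).
 */
static unsigned long toy_blocks_visited(unsigned long start_pfn,
                                        unsigned long nr_pages,
                                        unsigned long zone_end)
{
    unsigned long end_pfn = min_ul(start_pfn + nr_pages, zone_end);
    unsigned long visited = 0;

    for (unsigned long pfn = start_pfn; pfn < end_pfn;
         pfn += TOY_PAGEBLOCK_NR_PAGES)
        visited++;  /* a real check would test the pageblock at pfn here */
    return visited;
}
```

Take a 4096-pfn memory block whose zone ends at pfn 2048. The clamped walk visits 4 pageblocks instead of 8, so it never reaches the uninitialized struct pages beyond the zone.
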

Link: http://lkml.kernel.org/r/20190128144506.15603-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Debugged-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

Revert "iommu/io-pgtable-arm: Check for v7s-incapable systems" · 60a40145

Authored by Yong Wu on Feb 12, 2019

mainline inclusion
from mainline-5.0-rc1
commit 2713fe37153efb90b7a8427a2f53fa49216faf5c
category: bugfix
bugzilla: 5951
CVE: NA

-----------------

This reverts commit 82db33dc.

After commit 29859aeb ("iommu/io-pgtable-arm-v7s: Abort
allocation when table address overflows the PTE"), v7s will return a
failure if the page table allocation isn't as expected, so this
PHYS_OFFSET check is unnecessary now.

Moreover, this check may itself lead to failures. For example, if
CONFIG_RANDOMIZE_BASE is enabled, "memstart_addr" will be updated
randomly, so PHYS_OFFSET may be random.
Reported-by: CK Hu <ck.hu@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

arm64/numa: Unify common error path in numa_init() · ce32fae0

Authored by Anshuman Khandual on Feb 12, 2019

mainline inclusion
from mainline-4.20-rc1
commit 52338088ef0569290b7eae0759c58a3de494e6c0
category: bugfix
bugzilla: 5595
CVE: NA

-------------------------

At present numa_free_distance() is called before numa_distance is
even initialized with numa_alloc_distance(), which is pointless.
Instead, let's call numa_free_distance() on the common error path inside
numa_init() after numa_alloc_distance() has succeeded.
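
This sketch shows the unified error path as a goto-based cleanup. `distance_table`, `alloc_distance()`, `free_distance()`, `numa_init_sketch()` and the `nodes_found` flag are hypothetical stand-ins for the arm64 code, and `free_calls` exists only so the sketch can be checked. A failure of the allocation itself returns directly, since there is nothing to free. Every later failure jumps to a single label that frees the table.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the distance table and its helpers. */
static int *distance_table;
static int free_calls;

static int alloc_distance(void)
{
    distance_table = calloc(64, sizeof(*distance_table));
    return distance_table ? 0 : -ENOMEM;
}

static void free_distance(void)
{
    free(distance_table);
    distance_table = NULL;
    free_calls++;
}

static int numa_init_sketch(int (*init_func)(void), int nodes_found)
{
    int ret = alloc_distance();
    if (ret < 0)
        return ret;             /* nothing allocated yet, nothing to free */

    ret = init_func();
    if (ret < 0)
        goto out_free_distance;

    if (!nodes_found) {         /* e.g. no NUMA configuration parsed */
        ret = -EINVAL;
        goto out_free_distance;
    }

    return 0;

out_free_distance:              /* single cleanup point after allocation */
    free_distance();
    return ret;
}

static int init_ok(void) { return 0; }
static int init_fail(void) { return -ENODEV; }
```
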

Fixes: 1a2db300 ("arm64, numa: Add NUMA support for arm64 platforms")
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

drivers/rtc/rtc-lib.c: check whether tm->tm_year in int32 range · bdf7854a

Authored by Xuefeng Wang on Feb 15, 2019

euler inclusion
category: bugfix
bugzilla: 9301
CVE: N/A

When setting an RTC alarm (RTC_WKALM_SET), tm_year is not checked to
be in a suitable range. Use INT_MAX - 1900 to check it.
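
Here is a minimal sketch of the range check. `struct toy_rtc_time` and `toy_check_tm_year()` are hypothetical; only the `tm_year` field (years since 1900, an `int`) mirrors `struct rtc_time`. Where exactly the patch places the check is not shown here. Comparing against INT_MAX - 1900 tests the bound without performing the overflowing addition that UBSAN reports in the log that follows.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Minimal stand-in: tm_year counts years since 1900, as in struct rtc_time. */
struct toy_rtc_time { int tm_year; };

/* Reject tm_year values for which tm_year + 1900 would overflow int. */
static int toy_check_tm_year(const struct toy_rtc_time *tm)
{
    if (tm->tm_year > INT_MAX - 1900)
        return -EINVAL;
    return 0;
}
```

The value from the report, 2147483647 (INT_MAX), is rejected. INT_MAX - 1900 is the largest value that is still accepted.
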

UBSAN: Undefined behaviour in drivers/rtc/rtc-lib.c:119:30
signed integer overflow:
2147483647 + 1900 cannot be represented in type 'int'
CPU: 1 PID: 20994 Comm: syz-executor0 Not tainted 4.19.18-514.55.6.9.x86_64
+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1
04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xca/0x13e lib/dump_stack.c:113
 ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
 handle_overflow+0x193/0x1e2 lib/ubsan.c:190
 rtc_tm_to_time64+0x267/0x280 drivers/rtc/rtc-lib.c:119
 rtc_tm_to_ktime+0x16/0x70 drivers/rtc/rtc-lib.c:129
 rtc_set_alarm+0x1a9/0x2d0 drivers/rtc/interface.c:466
 rtc_dev_ioctl+0x6db/0x810 drivers/rtc/rtc-dev.c:380
 vfs_ioctl fs/ioctl.c:46 [inline]
 do_vfs_ioctl+0x1a5/0x10b0 fs/ioctl.c:690
 ksys_ioctl+0x89/0xa0 fs/ioctl.c:705
 __do_sys_ioctl fs/ioctl.c:712 [inline]
 __se_sys_ioctl fs/ioctl.c:710 [inline]
 __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:710
 do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x462589
Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89
f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08
0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8
64 89 01 48
RSP: 002b:00007f5348896c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000000072bf00 RCX: 0000000000462589
RDX: 0000000020000000 RSI: 000000004028700f RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f53488976bc
R13: 00000000004bf67e R14: 00000000006f96e0 R15: 00000000ffffffff

==========================================================================
Signed-off-by: Xuefeng Wang <wxf.wang@hisilicon.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

arm64: Add memory hotplug support · a7a4dd0b

Authored by Robin Murphy on Feb 15, 2019

mainline inclusion
from mainline-5.0-rc1
commit 4ab215061554ae2a4b78744a5dd3b3c6639f16a7
category: feature
bugzilla: NA
CVE: NA

--------------------------------

Wire up the basic support for hot-adding memory. Since memory hotplug
is fairly tightly coupled to sparsemem, we tweak pfn_valid() to also
cross-check the presence of a section in the manner of the generic
implementation, before falling back to memblock to check for no-map
regions within a present section as before. By having arch_add_memory()
create the linear mapping first, this then makes everything work in the
way that __add_section() expects.
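
The toy model below shows the two-stage pfn_valid() ordering described above. It assumes a fixed number of sections and a single no-map hole. `TOY_PFN_SECTION_SHIFT`, `TOY_NR_SECTIONS`, `section_present[]`, `toy_is_map_memory()` and `toy_pfn_valid()` are hypothetical; the real code consults the sparsemem section table and memblock. Section presence is checked first, and only then the no-map lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy sparsemem layout; all sizes here are assumptions for illustration. */
#define TOY_PFN_SECTION_SHIFT 18   /* 2^18 pfns per section */
#define TOY_NR_SECTIONS 16

static bool section_present[TOY_NR_SECTIONS];  /* set when a section is (hot-)added */
static uint64_t nomap_start, nomap_end;        /* one no-map hole: [start, end) pfns */

/* Stand-in for the memblock no-map lookup. */
static bool toy_is_map_memory(uint64_t pfn)
{
    return !(pfn >= nomap_start && pfn < nomap_end);
}

static bool toy_pfn_valid(uint64_t pfn)
{
    uint64_t nr = pfn >> TOY_PFN_SECTION_SHIFT;

    if (nr >= TOY_NR_SECTIONS)
        return false;              /* beyond any possible section */
    if (!section_present[nr])
        return false;              /* section not present (e.g. not hot-added yet) */
    return toy_is_map_memory(pfn); /* then the no-map check, as before */
}
```
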

We expect hotplug to be ACPI-driven, so the swapper_pg_dir updates
should be safe from races by virtue of the global device hotplug lock.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

arm64: irqflags: Fix clang build warnings · 62fc9247

Authored by Julien Thierry on Feb 15, 2019

hulk inclusion
category: bugfix
bugzilla: 9299
CVE: NA

-------------------------------------------------

Clang complains when passing asm operands that are smaller than the
registers they are mapped to:

arch/arm64/include/asm/irqflags.h:50:10: warning: value size does not
	match register size specified by the constraint and modifier
	[-Wasm-operand-widths]
                : "r" (GIC_PRIO_IRQON)

Fix it by casting the affected input operands to a type of the correct
size.
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Wei Li <liwei391@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

arm64: xen: Use existing helper to check interrupt status · 303da41a

Authored by Julien Thierry on Feb 15, 2019

mainline inclusion
from mainline-v4.20-rc1
commit b0506a8bbb42a859f6d25b3ecc4b6da93bae8d5a
category: bugfix
bugzilla: 9300
CVE: NA

--------------------------------

The status of interrupts might depend on more than just pstate. Use
interrupts_disabled() instead of raw_irqs_disabled_flags() to take the full
context into account.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Wei Li <liwei391@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

xprtrdma: Simplify RPC wake-ups on connect · 4d4b0a1e

Authored by Chuck Lever on Feb 13, 2019

mainline inclusion
from mainline-4.20
commit 31e62d25b5b8155b2ff6a7c6d31256475dbbcc7a
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

Currently, when a connection is established, rpcrdma_conn_upcall
invokes rpcrdma_conn_func and then
wake_up_all(&ep->rep_connect_wait). The former wakes waiting RPCs,
but the connect worker is not done yet, and that leads to races,
double wakes, and difficulty understanding how this logic is
supposed to work.

Instead, collect all the "connection established" logic in the
connect worker (xprt_rdma_connect_worker). A disconnect worker is
retained to handle provider upcalls safely.

Fixes: 254f91e2 ("xprtrdma: RPC/RDMA must invoke ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Conflicts:
  net/sunrpc/xprtrdma/transport.c
  net/sunrpc/xprtrdma/verbs.c
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

xprtrdma: Eliminate "connstate" variable from rpcrdma_conn_upcall() · 133376e5

Authored by Chuck Lever on Feb 13, 2019

mainline inclusion
from mainline-4.20
commit aadc5a94483b138c8d9ade6e8416b089733a34dd
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

Clean up.

Since commit 173b8f49 ("xprtrdma: Demote "connect" log messages")
there has been no need to initialize connstate to zero. In fact, in
this code path there's now no reason not to set rep_connected
directly.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Conflicts:
  net/sunrpc/xprtrdma/verbs.c
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

macsec: update operstate when lower device changes · 2e8f3598

Authored by Sabrina Dubroca on Feb 13, 2019

mainline inclusion
from mainline-4.20
commit e6ac075882b2afcdf2d5ab328ce4ab42a1eb9593
category: bugfix
bugzilla: 6048
CVE: NA

-------------------------------------------------

Like all other virtual devices (macvlan, vlan), the operstate of a
macsec device should match the state of its lower device. This is done
by calling netif_stacked_transfer_operstate from its netdevice notifier.

We also need to call netif_stacked_transfer_operstate when a new macsec
device is created, so that its operstate is set properly. This is only
relevant when we try to bring the device up directly when we create it.
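
The toy model below shows the operstate propagation. It assumes only carrier and dormant state matter. `struct toy_dev`, `toy_transfer_operstate()` and `toy_newlink()` are hypothetical; the kernel helper named above is netif_stacked_transfer_operstate(). The same transfer runs from the lower-device notifier and once at creation time, so a new macsec device on a down lower device starts with its carrier off.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device state; struct net_device tracks far more than this. */
struct toy_dev {
    bool carrier;
    bool dormant;
};

/* Copy the lower device's carrier and dormant state onto the upper device. */
static void toy_transfer_operstate(const struct toy_dev *lower,
                                   struct toy_dev *upper)
{
    upper->dormant = lower->dormant;
    upper->carrier = lower->carrier;
}

/* Creation path: sync once immediately instead of waiting for a notifier event. */
static void toy_newlink(const struct toy_dev *lower, struct toy_dev *upper)
{
    upper->carrier = true;   /* hypothetical default for a fresh device */
    upper->dormant = false;
    toy_transfer_operstate(lower, upper);
}
```
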

Radu Rendec proposed a similar patch, inspired by the 802.1q driver,
that included changing the administrative state of the macsec device,
instead of just the operstate. This version is similar to what the
macvlan driver does, and updates only the operstate.

Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
Reported-by: Radu Rendec <radu.rendec@gmail.com>
Reported-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

macsec: let the administrator set UP state even if lowerdev is down · 8f4bd8b0

Authored by Sabrina Dubroca on Feb 13, 2019

mainline inclusion
from mainline-4.20
commit 07bddef9839378bd6f95b393cf24c420529b4ef1
category: bugfix
bugzilla: 6026
CVE: NA

-------------------------------------------------

Currently, the kernel doesn't let the administrator set a macsec device
up unless its lower device is currently up. This is inconsistent, as a
macsec device that is up won't automatically go down when its lower
device goes down.

Now that linkstate propagation works, there's really no reason for this
limitation, so let's remove it.

Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
Reported-by: Radu Rendec <radu.rendec@gmail.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

net: phy: meson-gxl: Use the genphy_soft_reset callback · 666d9947

Authored by Timotej Lazar on Feb 13, 2019

mainline inclusion
from mainline-4.20
commit f2f98c1d7fa81e25a5cf910edc9db4d3c6f36c1b
category: bugfix
bugzilla: 6040
CVE: NA

-------------------------------------------------

Since the referenced commit, Ethernet fails to come up at boot on the
board meson-gxl-s905x-libretech-cc. Fix this by re-enabling the
genphy_soft_reset callback for the Amlogic Meson GXL PHY driver.

Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset")
Signed-off-by: Timotej Lazar <timotej.lazar@araneo.si>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

net: phy: micrel: set soft_reset callback to genphy_soft_reset for KSZ9031 · 712665c1

Authored by Heiner Kallweit on Feb 13, 2019

mainline inclusion
from mainline-4.20
commit 1d16073a326891c2a964e4cb95bc18fbcafb5f74
category: bugfix
bugzilla: 6040
CVE: NA

-------------------------------------------------

So far genphy_soft_reset was used automatically if the PHY driver
didn't implement the soft_reset callback. This changed with the
mentioned commit and broke KSZ9031. To fix this, configure the
KSZ9031 PHY driver to use genphy_soft_reset.

Fixes: 6e2d85ec0559 ("net: phy: Stop with excessive soft reset")
Reported-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Sekhar Nori <nsekhar@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

net: phy: Stop with excessive soft reset · 7d13e6f2

Authored by Florian Fainelli on Feb 13, 2019

mainline inclusion
from mainline-4.20
commit 6e2d85ec05591b739059f65fe8438c9c5999f7d8
category: bugfix
bugzilla: 6040
CVE: NA

-------------------------------------------------

While consolidating the PHY reset in phy_init_hw() into an unconditional
BMCR soft-reset, I became quite trigger-happy with those. This was later
deactivated for the Generic PHY driver in commit 0878fff1 ("net: phy: Do
not perform software reset for Generic PHY"), on the premise that a prior
software entity (e.g. the bootloader) might have applied workarounds.

Since we have a hook to wire-up a soft_reset callback, just use that and
get rid of the call to genphy_soft_reset() entirely. This speeds up
initialization and link establishment for most PHYs out there that do
not require a reset.
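
This sketch shows the opt-in pattern that this commit and the Meson GXL and KSZ9031 follow-up fixes above rely on. It uses toy types: `struct toy_phy`, `struct toy_phy_driver`, `toy_phy_init_hw()` and `toy_genphy_soft_reset()` are hypothetical stand-ins for `struct phy_device`, `struct phy_driver`, `phy_init_hw()` and `genphy_soft_reset()`. The core issues a reset only when the driver supplies a `soft_reset` callback.

```c
#include <assert.h>
#include <stddef.h>

struct toy_phy;

/* Toy driver ops: soft_reset is optional and may be NULL. */
struct toy_phy_driver {
    const char *name;
    int (*soft_reset)(struct toy_phy *phydev);
};

struct toy_phy {
    const struct toy_phy_driver *drv;
    int resets;  /* how many BMCR soft resets were issued */
};

/* Stand-in for genphy_soft_reset(), which would set BMCR_RESET and poll. */
static int toy_genphy_soft_reset(struct toy_phy *phydev)
{
    phydev->resets++;
    return 0;
}

/* After the change: reset only if the driver opted in via the callback. */
static int toy_phy_init_hw(struct toy_phy *phydev)
{
    if (phydev->drv->soft_reset)
        return phydev->drv->soft_reset(phydev);
    return 0;  /* no implicit genphy_soft_reset() fallback any more */
}

/* A driver that still needs the reset wires it up explicitly. */
static const struct toy_phy_driver needs_reset = {
    .name = "needs-reset", .soft_reset = toy_genphy_soft_reset,
};
static const struct toy_phy_driver no_reset = {
    .name = "no-reset", .soft_reset = NULL,
};
```

This opt-in design is why the follow-up fixes were needed: drivers that relied on the old implicit reset must now set `.soft_reset` themselves.
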

Fixes: 87aa9f9c ("net: phy: consolidate PHY reset in phy_init_hw()")
Tested-by: Wang, Dongsheng <dongsheng.wang@hxt-semitech.com>
Tested-by: Chris Healy <cphealy@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

net/mlx5: Use multi threaded workqueue for page fault handling · 84158a1d

Committed by Moni Shoua on Feb 13, 2019

mainline inclusion
from mainline-5.0
commit 90290db7669ba680b37b7006cbf6e5cee6cba779
category: bugfix
bugzilla: 6037
CVE: NA

-------------------------------------------------

Page fault events are processed in a workqueue context. Since each QP
can have up to two concurrent, unrelated page faults (one for the
requester and one for the responder), page-fault handling can be done in
parallel. Achieve this by making the workqueue multi-threaded. The number
of threads equals the number of command interface channels, to avoid
command interface bottlenecks.

In addition to multi-threads, change the workqueue flags to give it high
priority.

A stress benchmark shows that before this change, 85% of page faults
waited in the queue for 8 seconds or more, while after the change 98% of
page faults waited for 64 milliseconds or less. The number of threads was
chosen as the number of channels to the command interface.

Fixes: d9aaed83 ("{net,IB}/mlx5: Refactor page fault handling")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

84158a1d

i40e: Protect access to VF control methods · c7004aec

Committed by Jan Sokolowski on Feb 13, 2019

mainline inclusion
from mainline-5.0
commit f5a7b21b243952d4d26a2c91a041d122c0306504
category: bugfix
bugzilla: 6032
CVE: NA

-------------------------------------------------

A scenario has been found in which simultaneous
addition/removal and modification of VFs might cause
unstable behaviour, up to and including kernel panics.

Protect the methods that create/modify/destroy VFs
by locking them behind an atomically set bit in the PF state
bitfield.
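
A minimal userspace sketch of the locking pattern, assuming C11 atomics in place of the kernel's `test_and_set_bit()`/`clear_bit()`. The bit number `VF_OPS_PENDING_BIT`, `struct pf_model`, `set_num_vfs()` and the `-EAGAIN` return value are hypothetical choices for illustration, not taken from the i40e driver. A method that finds the bit already set backs off instead of racing with the operation in progress.

```c
#include <errno.h>
#include <stdatomic.h>

#define VF_OPS_PENDING_BIT 0    /* hypothetical bit in the PF state word */

struct pf_model {
	atomic_ulong state;     /* stand-in for the PF state bitfield */
	int num_vfs;
};

/* Atomically set a bit; return nonzero if it was already set. */
static int test_and_set_state_bit(atomic_ulong *state, int bit)
{
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_or(state, mask) & mask) != 0;
}

static void clear_state_bit(atomic_ulong *state, int bit)
{
	atomic_fetch_and(state, ~(1UL << bit));
}

/* A VF-modifying method guarded by the bit. */
static int set_num_vfs(struct pf_model *pf, int n)
{
	if (test_and_set_state_bit(&pf->state, VF_OPS_PENDING_BIT))
		return -EAGAIN;     /* another create/modify/destroy is running */
	pf->num_vfs = n;        /* critical section: reconfigure the VFs */
	clear_state_bit(&pf->state, VF_OPS_PENDING_BIT);
	return 0;
}
```

Whether the real driver retries, waits or fails outright when the bit is held is not stated in the commit message; the sketch simply reports `-EAGAIN` to the caller.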

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c7004aec

ntb_netdev: fix sleep time mismatch · 86d78bfc

Committed by Jon Mason on Feb 13, 2019

mainline inclusion
from mainline-4.20
commit a861594b1b7ffd630f335b351c4e9f938feadb8e
category: bugfix
bugzilla: 6015
CVE: NA

-------------------------------------------------

The tx_time should be in usecs (according to the comment above the
variable), but the timer is re-armed using msecs. Change it to match the
expected units.
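
The size of the mismatch is easy to estimate. The sketch below uses simplified round-up conversions in the spirit of the kernel's `msecs_to_jiffies()` and `usecs_to_jiffies()`, an assumed `MODEL_HZ` of 250 and an illustrative `tx_time` of 100; none of these values come from the driver. At this tick rate, treating 100 as milliseconds re-arms the timer after 25 jiffies instead of 1.

```c
#define MODEL_HZ      250UL     /* assumed tick rate for the example */
#define MSEC_PER_SEC  1000UL
#define USEC_PER_SEC  1000000UL

/* Round-up conversions; assumes MODEL_HZ divides 1000 evenly. */
static unsigned long model_msecs_to_jiffies(unsigned long ms)
{
	const unsigned long ms_per_jiffy = MSEC_PER_SEC / MODEL_HZ;

	return (ms + ms_per_jiffy - 1) / ms_per_jiffy;
}

static unsigned long model_usecs_to_jiffies(unsigned long us)
{
	const unsigned long us_per_jiffy = USEC_PER_SEC / MODEL_HZ;

	return (us + us_per_jiffy - 1) / us_per_jiffy;
}
```

The fix presumably amounts to converting `tx_time` with the microsecond helper at the re-arm site instead of the millisecond one.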

Fixes: e74bfeed ("NTB: Add flow control to the ntb_netdev")
Suggested-by: Gerd W. Haeussler <gerd.haeussler@cesys-it.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

86d78bfc

vxlan: changelink: Fix handling of default remotes · af568256

Committed by Petr Machata on Feb 13, 2019

mainline inclusion
from mainline-4.20
commit ce5e098f7a10b4bf8e948c12fa350320c5c3afad
category: bugfix
bugzilla: 6009
CVE: NA

-------------------------------------------------

Default remotes are stored as FDB entries with an Ethernet address of
00:00:00:00:00:00. When a request is made to change a remote address of
a VXLAN device, vxlan_changelink() first deletes the existing default
remote, and then creates a new FDB entry.

This works well as long as the list of default remotes matches exactly
the configuration of a VXLAN remote address. Thus when the VXLAN device
has a remote of X, there should be exactly one default remote FDB entry
X. If the VXLAN device has no remote address, there should be no such
entry.

Besides using "ip link set", it is possible to manipulate the list of
default remotes by using "bridge fdb". It is therefore easy to break
the above condition. Under such circumstances, the __vxlan_fdb_delete()
call doesn't delete the FDB entry itself, but just one remote. The
following vxlan_fdb_create() then creates a new FDB entry, leading to a
situation where two entries exist for the address 00:00:00:00:00:00,
each with a different subset of default remotes.

An even more obvious breakage rooted in the same cause can be observed
when a remote address is configured for a VXLAN device that did not have
one before. In that case vxlan_changelink() doesn't remove any remote,
and just creates a new FDB entry for the new address:

$ ip link add name vx up type vxlan id 2000 dstport 4789
$ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.20 self permanent
$ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.30 self permanent
$ ip link set dev vx type vxlan remote 192.0.2.30
$ bridge fdb sh dev vx | grep 00:00:00:00:00:00
00:00:00:00:00:00 dst 192.0.2.30 self permanent <- new entry, 1 rdst
00:00:00:00:00:00 dst 192.0.2.20 self permanent <- orig. entry, 2 rdsts
00:00:00:00:00:00 dst 192.0.2.30 self permanent

To fix this, instead of calling vxlan_fdb_create() directly, defer to
vxlan_fdb_update(). That has logic to handle the duplicates properly.
Additionally, it also handles notifications, so drop that call from
changelink as well.
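
The difference between the two code paths can be sketched with a toy FDB. Everything here (`struct fdb`, `fdb_create()`, `fdb_update()`, `fdb_count()`) is a hypothetical simplification rather than the vxlan driver's data structures: "create" always adds a new entry, while "update" first looks for an existing entry with the same MAC address and appends the remote only if it is not already listed. Replaying the reproducer above, with 192.0.2.30 already present, the update path leaves a single 00:00:00:00:00:00 entry.

```c
#include <string.h>

#define MAX_ENTRIES 8
#define MAX_RDSTS   8

/* Toy FDB: one entry per MAC address, each with a list of remotes. */
struct fdb_entry {
	char mac[18];               /* "00:00:00:00:00:00" plus NUL */
	char rdst[MAX_RDSTS][16];   /* remote IPv4 addresses as strings */
	int n_rdst;
};

struct fdb {
	struct fdb_entry e[MAX_ENTRIES];
	int n;
};

static struct fdb_entry *fdb_find(struct fdb *f, const char *mac)
{
	for (int i = 0; i < f->n; i++)
		if (strcmp(f->e[i].mac, mac) == 0)
			return &f->e[i];
	return NULL;
}

/* Old changelink behaviour: always add a brand-new entry. */
static int fdb_create(struct fdb *f, const char *mac, const char *ip)
{
	if (f->n == MAX_ENTRIES)
		return -1;
	struct fdb_entry *e = &f->e[f->n++];
	strcpy(e->mac, mac);
	strcpy(e->rdst[0], ip);
	e->n_rdst = 1;
	return 0;
}

/* Fixed behaviour: reuse an existing entry, skipping duplicate remotes. */
static int fdb_update(struct fdb *f, const char *mac, const char *ip)
{
	struct fdb_entry *e = fdb_find(f, mac);

	if (!e)
		return fdb_create(f, mac, ip);
	for (int i = 0; i < e->n_rdst; i++)
		if (strcmp(e->rdst[i], ip) == 0)
			return 0;   /* remote already listed */
	if (e->n_rdst == MAX_RDSTS)
		return -1;
	strcpy(e->rdst[e->n_rdst++], ip);
	return 0;
}

static int fdb_count(const struct fdb *f, const char *mac)
{
	int n = 0;

	for (int i = 0; i < f->n; i++)
		n += strcmp(f->e[i].mac, mac) == 0;
	return n;
}
```

The toy omits notifications; per the commit message, the real update path also takes care of those, which is why changelink no longer sends its own.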

Fixes: 0241b836 ("vxlan: fix default fdb entry netlink notify ordering during netdev create")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

af568256

mm/readahead.c: simplify get_next_ra_size() · ad3fb6ba

Committed by Gao Xiang on Feb 13, 2019

mainline inclusion
from mainline-5.0-rc1
commit 20ff1c950500380c6f74ec1a0d6f4eafab673ef6
category: bugfix
bugzilla: 5928
CVE: NA

-----------------------------

This is a trivial simplification of get_next_ra_size() that is clear
enough for humans to understand.

It also fixes a potential overflow if ra->size (< ra_pages) is too large.
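
For reference, a userspace rendering of the helper before and after the simplification, reconstructed from the mainline versions rather than copied from this tree: `ra->size` is passed as the plain argument `cur`, and `min()` is open-coded. The old code computed `2 * cur` before clamping, which can wrap around for a very large `cur`. The new code returns `max` as soon as `cur` exceeds `max / 2`, so it only multiplies when the result cannot exceed `max`.

```c
/* Before: 2 * cur is computed first and may wrap around. */
static unsigned long get_next_ra_size_old(unsigned long cur, unsigned long max)
{
	unsigned long newsize = (cur < max / 16) ? 4 * cur : 2 * cur;

	return newsize < max ? newsize : max;
}

/* After: every branch stays within max, so nothing can overflow. */
static unsigned long get_next_ra_size(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)
		return 4 * cur;   /* small windows ramp up quickly */
	if (cur <= max / 2)
		return 2 * cur;   /* at most max */
	return max;
}
```

For example, with max = 128 pages a 4-page window grows to 16, a 32-page window to 64, and anything above 64 pages is clamped to 128.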

Link: http://lkml.kernel.org/r/1540707206-19649-1-git-send-email-hsiangkao@aol.com
Signed-off-by: Gao Xiang <hsiangkao@aol.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ad3fb6ba

kernel/sysctl: add panic_print into sysctl · 3df173ee

Committed by Feng Tang on Feb 14, 2019

mainline inclusion
from mainline-5.0
commit d999bd9392de
category: bugfix
bugzilla: 6829
CVE: NA

-------------------------------------------------

This makes it possible to choose at runtime which system information is
printed on panic, in addition to setting it on the kernel command line.
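
As an illustration, the sketch below writes a mask to the sysctl file. The path `/proc/sys/kernel/panic_print` follows from the sysctl name. The bit values are those documented for the `panic_print=` boot parameter in kernel-parameters.txt and should be checked against the running kernel. `set_panic_print()` is a hypothetical helper that takes the path as a parameter so it can be tried on an ordinary file; writing the real file requires root.

```c
#include <stdio.h>

/* Documented panic_print bits (verify against your kernel's docs). */
#define PANIC_PRINT_TASK_INFO    0x01UL
#define PANIC_PRINT_MEM_INFO     0x02UL
#define PANIC_PRINT_TIMER_INFO   0x04UL
#define PANIC_PRINT_LOCK_INFO    0x08UL  /* only with CONFIG_LOCKDEP */
#define PANIC_PRINT_FTRACE_INFO  0x10UL

/* Write mask as a decimal number; returns 0 on success, -1 on error. */
static int set_panic_print(const char *path, unsigned long mask)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%lu\n", mask) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling `set_panic_print("/proc/sys/kernel/panic_print", PANIC_PRINT_TASK_INFO | PANIC_PRINT_MEM_INFO)` would request task and memory information on the next panic, matching `panic_print=3` on the command line.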

Link: http://lkml.kernel.org/r/1543398842-19295-3-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3df173ee

genirq/debugfs: Reset domain debugfs_file on removal of the debugfs file · 033a4322

Committed by Marc Zyngier on Feb 14, 2019

mainline inclusion
from mainline-4.20
commit 94967b55ebf3
category: bugfix
bugzilla: 5755
CVE: NA

-------------------------------------------------

When removing a debugfs file for a given irq domain, we fail to clear the
corresponding debugfs_file field, meaning that the file for that domain
won't be created again if we need to do so.

It turns out that this is exactly what irq_domain_update_bus_token() does
(delete the old file, update the domain name, recreate the file).

This has no impact other than making debugging more difficult, but we
do value ease of debugging, so clear the debugfs_file field.
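
A toy model of why the stale field blocks recreation. `struct domain_model`, `add_domain_file()`, `remove_domain_file()` and `update_bus_token()` are hypothetical simplifications: the handle is a plain pointer, and creation is assumed to be skipped while the field is still set. Clearing the field on removal is what lets the delete, rename, recreate sequence succeed.

```c
#include <stdbool.h>
#include <stddef.h>

struct domain_model {
	const char *name;
	void *debugfs_file;        /* non-NULL while the file exists */
};

static int dummy_dentry;       /* stands in for a struct dentry */

/* Creation is skipped while the field still looks set. */
static bool add_domain_file(struct domain_model *d)
{
	if (d->debugfs_file)
		return false;
	d->debugfs_file = &dummy_dentry;
	return true;
}

/* The fix: forget the handle once the file is removed. */
static void remove_domain_file(struct domain_model *d)
{
	d->debugfs_file = NULL;
}

/* Delete the old file, update the name, recreate the file. */
static bool update_bus_token(struct domain_model *d, const char *new_name)
{
	remove_domain_file(d);
	d->name = new_name;
	return add_domain_file(d);
}
```

Dropping the `NULL` assignment in `remove_domain_file()` makes `update_bus_token()` return false, which corresponds to the missing file described above.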

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20181001100522.180054-2-marc.zyngier@arm.com
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

033a4322

fork, memcg: fix cached_stacks case · 00ff24bd

Committed by Shakeel Butt on Feb 14, 2019

mainline inclusion
from mainline-5.0
commit ba4a4574
category: bugfix
bugzilla: 5751
CVE: NA

-------------------------------------------------

Commit 5eed6f1d ("fork,memcg: fix crash in free_thread_stack on
memcg charge fail") fixes a crash caused by a failed memcg charge of
the kernel stack. However, that fix misses the cached_stacks case, which
this patch fixes: the same crash can happen if the memcg charge of a
cached stack fails.

Link: http://lkml.kernel.org/r/20190102180145.57406-1-shakeelb@google.com
Fixes: 5eed6f1d ("fork,memcg: fix crash in free_thread_stack on memcg charge fail")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

00ff24bd

fork, memcg: fix crash in free_thread_stack on memcg charge fail · 94775c2e

Committed by Rik van Riel on Feb 14, 2019

mainline inclusion
from mainline-4.20
commit 5eed6f1d
category: bugfix
bugzilla: 5751
CVE: NA

-------------------------------------------------

Commit 9b6f7e16 ("mm: rework memcg kernel stack accounting") will
result in fork failing if allocating a kernel stack for a task in
dup_task_struct exceeds the kernel memory allowance for that cgroup.

Unfortunately, it also results in a crash.

This is due to the code jumping to free_stack and calling
free_thread_stack when the memcg kernel stack charge fails, but without
tsk->stack pointing at the freshly allocated stack.

This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:

 #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
 #7 [ffffc900244efce0] general_protection at ffffffff818ff082
    [exception RIP: llist_add_batch+7]
    RIP: ffffffff8150d487  RSP: ffffc900244efd98  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: ffff88085ef55980  RCX: 0000000000000000
    RDX: ffff88085ef55980  RSI: 343834343531203a  RDI: 343834343531203a
    RBP: ffffc900244efd98   R8: 0000000000000001   R9: ffff8808578c3600
    R10: 0000000000000000  R11: 0000000000000001  R12: ffff88029f6c21c0
    R13: 0000000000000286  R14: ffff880147759b00  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
 #9 [ffffc900244efdb8] copy_process at ffffffff81086e37
    RIP: 000000000049b948  RSP: 00007ffcdb307830  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000896030  RCX: 000000000049b948
    RDX: 0000000000000000  RSI: 00007ffcdb307790  RDI: 00000000005d7421
    RBP: 000000000067370f   R8: 00007ffcdb3077b0   R9: 000000000001ed00
    R10: 0000000000000008  R11: 0000000000000246  R12: 0000000000000040
    R13: 000000000000000f  R14: 0000000000000000  R15: 000000000088d018
    ORIG_RAX: 000000000000003a  CS: 0033  SS: 002b

The simplest fix is to assign tsk->stack right where it is allocated.
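
A userspace model of the ordering problem, assuming only the stack pointer matters. `struct task_model`, `free_thread_stack()`, `alloc_thread_stack()` and `dup_task()` are hypothetical simplifications of `dup_task_struct()`; the memcg charge is reduced to a flag, and `malloc()`/`free()` stand in for vmalloc. Because the stack is recorded in `tsk->stack` as soon as it is obtained, either from the per-CPU cache (the case covered by the cached_stacks follow-up in this log) or freshly allocated, the error path frees the right pointer instead of whatever `tsk->stack` held before.

```c
#include <stdlib.h>

#define STACK_SIZE 4096

struct task_model {
	void *stack;
};

static int stacks_freed;    /* counts frees done on the error path */

static void free_thread_stack(struct task_model *tsk)
{
	free(tsk->stack);       /* a stale pointer here is what oopsed */
	tsk->stack = NULL;
	stacks_freed++;
}

/* Take a cached stack if available, else allocate one, and record it
 * in tsk->stack immediately (the fix). */
static void *alloc_thread_stack(struct task_model *tsk, void **cache)
{
	void *s = *cache;

	if (s)
		*cache = NULL;
	else
		s = malloc(STACK_SIZE);
	tsk->stack = s;
	return s;
}

/* charge_fails simulates the memcg kernel stack charge failing. */
static int dup_task(struct task_model *tsk, void **cache, int charge_fails)
{
	if (!alloc_thread_stack(tsk, cache))
		return -1;
	if (charge_fails) {
		free_thread_stack(tsk);
		return -1;
	}
	return 0;
}
```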

Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e16 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel@surriel.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

94775c2e

mm: handle no memcg case in memcg_kmem_charge() properly · c84d2f61

Committed by Roman Gushchin on Feb 14, 2019

mainline inclusion
from mainline-4.20
commit e68599a3
category: bugfix
bugzilla: 5751
CVE: NA

-------------------------------------------------

Mike Galbraith reported a regression caused by commit 9b6f7e16
("mm: rework memcg kernel stack accounting") on a system booted with the
"cgroup_disable=memory" option: the system panics with the following
stack trace:

  BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
  PGD 0 P4D 0
  Oops: 0002 [#1] PREEMPT SMP PTI
  CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-preempt+ #410
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fed4
  RIP: 0010:page_counter_try_charge+0x22/0xc0
  Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 0f 84 a7 00 00 00 41 56 48 89 f8 49 89 fe 49
  Call Trace:
   try_charge+0xcb/0x780
   memcg_kmem_charge_memcg+0x28/0x80
   memcg_kmem_charge+0x8b/0x1d0
   copy_process.part.41+0x1ca/0x2070
   _do_fork+0xd7/0x3d0
   do_syscall_64+0x5a/0x180
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

The problem occurs because get_mem_cgroup_from_current() returns NULL
if the memory controller is disabled. Check for this case at the
beginning of memcg_kmem_charge() and just return 0 if
mem_cgroup_disabled() returns true. This is how the case is handled in
many other places in the memory controller code.
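
The guard can be shown with a minimal model. `struct memcg_model`, `memcg_disabled`, `memcg_from_current_model()` and `memcg_kmem_charge_model()` are hypothetical stand-ins for the memcg internals; the lookup returns NULL when the controller is disabled, as described above, and the early return keeps the charge path from dereferencing it.

```c
#include <stdbool.h>
#include <stddef.h>

struct memcg_model {
	long usage;
};

static bool memcg_disabled;              /* models cgroup_disable=memory */
static struct memcg_model root_memcg;

/* Like the real lookup, yields NULL when the controller is disabled. */
static struct memcg_model *memcg_from_current_model(void)
{
	return memcg_disabled ? NULL : &root_memcg;
}

static int memcg_kmem_charge_model(long nr_pages)
{
	struct memcg_model *memcg;

	if (memcg_disabled)       /* the fix: mem_cgroup_disabled() check */
		return 0;
	memcg = memcg_from_current_model();
	memcg->usage += nr_pages; /* NULL dereference without the check */
	return 0;
}
```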

Link: http://lkml.kernel.org/r/20181029215123.17830-1-guro@fb.com
Fixes: 9b6f7e16 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reported-by: Mike Galbraith <efault@gmx.de>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c84d2f61

mm: rework memcg kernel stack accounting · a4b41d8b

Committed by Roman Gushchin on Feb 14, 2019

mainline inclusion
from mainline-4.20
commit 9b6f7e16
category: bugfix
bugzilla: 5751
CVE: NA

-------------------------------------------------

If CONFIG_VMAP_STACK is set, kernel stacks are allocated using
__vmalloc_node_range() with __GFP_ACCOUNT.  So kernel stack pages are
charged against corresponding memory cgroups on allocation and uncharged
on releasing them.

The problem is that we do cache kernel stacks in small per-cpu caches and
do reuse them for new tasks, which can belong to different memory cgroups.

Each stack page still holds a reference to the original cgroup, so the
cgroup can't be released until the vmap area is released.

For the vmap area (and with it the cgroup) to be released, more than two
consecutive exits without a fork in between must happen on the current
CPU, which is very unlikely. As a result, I saw a significant number of
dying cgroups (in theory, up to 2 * number_of_cpu + number_of_tasks) that
couldn't be released even under significant memory pressure.

As a cgroup structure can take a significant amount of memory (first of
all, per-cpu data like memcg statistics), it leads to a noticeable waste
of memory.
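
The follow-up fixes in this log refer to a memcg kernel stack charge in `dup_task_struct()` that can fail. That suggests the rework charges a stack when it is attached to a task and uncharges it when the task releases it, instead of relying on `__GFP_ACCOUNT` at allocation time. The toy model below assumes exactly that; `struct cg`, `struct stack_model`, `account_stack()` and `unaccount_stack()` are hypothetical. Under this scheme, a stack that passes through the per-CPU cache no longer pins the cgroup that originally allocated it.

```c
#include <stddef.h>

struct cg {
	long charged;              /* stack pages charged to this cgroup */
};

struct stack_model {
	struct cg *charged_to;     /* cgroup currently holding the charge */
	long nr_pages;
};

/* Charge the owning task's cgroup when the stack is attached. */
static void account_stack(struct stack_model *s, struct cg *owner)
{
	s->charged_to = owner;
	owner->charged += s->nr_pages;
}

/* Uncharge on release, even if the stack then goes back to the cache. */
static void unaccount_stack(struct stack_model *s)
{
	if (s->charged_to) {
		s->charged_to->charged -= s->nr_pages;
		s->charged_to = NULL;
	}
}
```

Reusing the same stack for a task in another cgroup moves the charge, so the first cgroup's count drops to zero and nothing keeps it alive.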

Link: http://lkml.kernel.org/r/20180827162621.30187-1-guro@fb.com
Fixes: ac496bf4 ("fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y")
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Jing xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a4b41d8b

futex: Fix (possible) missed wakeup · cc835d2d

Committed by Peter Zijlstra on Feb 14, 2019

mainline inclusion
from mainline-5.0
commit b061c38b
category: bugfix
bugzilla: 7208
CVE: NA

-------------------------------------------------

We must not rely on wake_q_add() to delay the wakeup; in particular
commit:

  1d0dcb3a ("futex: Implement lockless wakeups")

moved wake_q_add() before smp_store_release(&q->lock_ptr, NULL), which
could result in futex_wait() waking before observing ->lock_ptr ==
NULL and going back to sleep again.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 1d0dcb3a ("futex: Implement lockless wakeups")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

cc835d2d

lib/vsprintf: Print time and date in human readable format via %pt · 76711303

Committed by Andy Shevchenko on Feb 14, 2019

mainline inclusion
from mainline-4.20
commit 4d42c447
category: bugfix
bugzilla: 5743
CVE: NA

-------------------------------------------------

Some users print the time and date held in a struct rtc_time in
human-readable format.

Instead of open-coding that each time, introduce the %ptR[dt][r] specifier.
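
The documented output of the new specifier is `YYYY-mm-ddTHH:MM:SS`. The `d` and `t` suffixes select only the date or the time, and `r` suppresses the usual +1900 year and +1 month adjustment. In the kernel it is used as, for example, `pr_info("%ptR\n", &tm)` with a `struct rtc_time`. The sketch below re-implements the default and raw full forms in userspace for illustration only; `struct rtc_time_model` and `format_ptR()` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical subset of struct rtc_time (same field meanings). */
struct rtc_time_model {
	int tm_sec, tm_min, tm_hour;
	int tm_mday;
	int tm_mon;     /* 0..11 */
	int tm_year;    /* years since 1900 */
};

/* Default form adds 1900 and 1; raw != 0 mimics %ptRr (no adjustment). */
static int format_ptR(char *buf, size_t len,
		      const struct rtc_time_model *tm, int raw)
{
	int year = tm->tm_year + (raw ? 0 : 1900);
	int mon = tm->tm_mon + (raw ? 0 : 1);

	return snprintf(buf, len, "%04d-%02d-%02dT%02d:%02d:%02d",
			year, mon, tm->tm_mday,
			tm->tm_hour, tm->tm_min, tm->tm_sec);
}
```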

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

76711303

lib/vsprintf: Hash printed address for netdev bits fallback · 949e63ee

Committed by Geert Uytterhoeven on Feb 14, 2019

mainline inclusion
from mainline-4.20
commit 431bca24
category: bugfix
bugzilla: 5743
CVE: NA

-------------------------------------------------

The handler for "%pN" falls back to printing the raw pointer value when
using a different format than the (sole supported) special format
"%pNF", potentially leaking sensitive information regarding the kernel
layout in memory.

Avoid this leak by printing the hashed address instead.
Note that there are no in-tree users of the fallback.

Fixes: ad67b74d ("printk: hash addresses printed with %p")
Link: http://lkml.kernel.org/r/20181011084249.4520-4-geert+renesas@glider.be
To: "Tobin C . Harding" <me@tobin.cc>
To: Andrew Morton <akpm@linux-foundation.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

949e63ee

lib/vsprintf: Hash legacy clock addresses · e4fcbc74

Committed by Geert Uytterhoeven on Feb 14, 2019

mainline inclusion
from mainline-4.20
commit ec12bc29
category: bugfix
bugzilla: 5743
CVE: NA

-------------------------------------------------

On platforms using the Common Clock Framework, "%pC" prints the clock's
name. On legacy platforms, it prints the clock's unhashed address,
potentially leaking sensitive information regarding the kernel layout in
memory.

Avoid this leak by printing the hashed address instead.  To distinguish
between clocks, a 32-bit unique identifier is as good as an actual
pointer value.
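
A sketch of the idea behind the hashed ID, under clearly different assumptions from the kernel. Mainline hashes the pointer with siphash and a random boot-time key and, reportedly, keeps only 32 bits on 64-bit builds before zero-padding to pointer width. Here a fixed-key splitmix64 finalizer stands in for siphash, and `model_hash()` and `model_ptr_to_id()` are hypothetical names. The output is stable for a given pointer, so clocks remain distinguishable, but it does not reveal the address.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in mixer (splitmix64 finalizer); NOT the kernel's siphash. */
static uint64_t model_hash(uint64_t v, uint64_t key)
{
	uint64_t z = v ^ key;

	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}

/* Print a 32-bit ID derived from ptr, zero-padded to pointer width. */
static int model_ptr_to_id(char *buf, size_t len, const void *ptr,
			   uint64_t key)
{
	uint32_t id = (uint32_t)model_hash((uint64_t)(uintptr_t)ptr, key);

	return snprintf(buf, len, "%0*lx", (int)(2 * sizeof(void *)),
			(unsigned long)id);
}
```

On a 64-bit build the result therefore always starts with eight zeros, which is a quick visual cue that a value has been hashed rather than printed raw.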

Fixes: ad67b74d ("printk: hash addresses printed with %p")
Link: http://lkml.kernel.org/r/20181011084249.4520-3-geert+renesas@glider.be
To: "Tobin C . Harding" <me@tobin.cc>
To: Andrew Morton <akpm@linux-foundation.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e4fcbc74

lib/vsprintf: Prepare for more general use of ptr_to_id() · e71c0fea

Committed by Geert Uytterhoeven on Feb 14, 2019

mainline inclusion
from mainline-4.20
commit 9073dac1
category: bugfix
bugzilla: 5743
CVE: NA

-------------------------------------------------

Move the function and its dependencies up so it can be called from
special pointer type formatting routines.

Link: http://lkml.kernel.org/r/20181011084249.4520-2-geert+renesas@glider.be
To: "Tobin C . Harding" <me@tobin.cc>
To: Andrew Morton <akpm@linux-foundation.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[pmladek@suse.com: Split into separate patch]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e71c0fea

lib/vsprintf: Make ptr argument const in ptr_to_id() · d6224b3e

Committed by Geert Uytterhoeven on Feb 14, 2019

mainline inclusion
from mainline-4.20
commit f31b224c
category: bugfix
bugzilla: 5743
CVE: NA

-------------------------------------------------

Make the ptr argument const to avoid adding casts in future callers.

Link: http://lkml.kernel.org/r/20181011084249.4520-2-geert+renesas@glider.be
To: "Tobin C . Harding" <me@tobin.cc>
To: Andrew Morton <akpm@linux-foundation.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[pmladek@suse.com: split into separate patch]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d6224b3e