提交 · ddb51d94f2bd3082975433051b440d0db1c85d0c · openeuler / Kernel

22 9月, 2020 40 次提交

arm64/ascend: Enable CONFIG_ASCEND_AUTO_TUNING_HUGEPAGE for hulk_defconfig · ddb51d94

由 Weilong Chen 提交于 9月 22, 2020

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

-------------------------------------------------

Enable the hugepage auto tuning features for hulk default config.
Signed-off-by: NWeilong Chen <chenweilong@huawei.com>
Reviewed-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

ddb51d94

arm64/ascend: Notifier will return a freed val to indecate print logs · 525b0962

由 Weilong Chen 提交于 9月 22, 2020

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

-------------------------------------------------

When there's not enough memory, a log of oom messages will be dump.
This patch is to reduce messages by using an indecate val passed to
the notifier.
Signed-off-by: NWeilong Chen <chenweilong@huawei.com>
Reviewed-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

525b0962

arm64/ascend: Add hugepage flags change interface · db1d159b

由 Weilong Chen 提交于 9月 22, 2020

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

-------------------------------------------------

Add a variable to change alloc hugepage gfp flags, and export it out.
Hugepage need to be accounted by cgroup. We use this to set ACCOUNT
flag to memory subsystem.
Signed-off-by: NWeilong Chen <chenweilong@huawei.com>
Reviewed-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

db1d159b

arm64/ascend: Add set hugepage number helper function · b6bcd500

由 Weilong Chen 提交于 9月 22, 2020

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

-------------------------------------------------

Add helper function for change system hugepage nr, and export it out.
Signed-off-by: NWeilong Chen <chenweilong@huawei.com>
Reviewed-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

b6bcd500

arm64/ascend: Add mmap hook when alloc hugepage · d9952490

由 Weilong Chen 提交于 9月 22, 2020

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

-------------------------------------------------

Use block notify chain to notify user that mmap event.
Signed-off-by: NWeilong Chen <chenweilong@huawei.com>
Reviewed-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

d9952490

arm64/ascend: Add new CONFIG for auto-tuning hugepage · 2597ada2

由 Weilong Chen 提交于 9月 22, 2020

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

-------------------------------------------------

Add a new config ASCEND_AUTO_TUNING_HUGEPAGE to support for
auto-tuning hugepage.
Signed-off-by: NWeilong Chen <chenweilong@huawei.com>
Reviewed-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

2597ada2

dm thin metadata: Fix use-after-free in dm_bm_set_read_only · 4fb84de6

由 Ye Bin 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.9
commit 3a653b20
category: bugfix
bugzilla: 41672
CVE: NA

-----------------------------------------------

The following error ocurred when testing disk online/offline:

[  301.798344] device-mapper: thin: 253:5: aborting current metadata transaction
[  301.848441] device-mapper: thin: 253:5: failed to abort metadata transaction
[  301.849206] Aborting journal on device dm-26-8.
[  301.850489] EXT4-fs error (device dm-26) in __ext4_new_inode:943: Journal has aborted
[  301.851095] EXT4-fs (dm-26): Delayed block allocation failed for inode 398742 at logical offset 181 with max blocks 19 with error 30
[  301.854476] BUG: KASAN: use-after-free in dm_bm_set_read_only+0x3a/0x40 [dm_persistent_data]

Reason is:

 metadata_operation_failed
    abort_transaction
        dm_pool_abort_metadata
	    __create_persistent_data_objects
	        r = __open_or_format_metadata
	        if (r) --> If failed will free pmd->bm but pmd->bm not set NULL
		    dm_block_manager_destroy(pmd->bm);
    set_pool_mode
	dm_pool_metadata_read_only(pool->pmd);
	dm_bm_set_read_only(pmd->bm);  --> use-after-free

Add checks to see if pmd->bm is NULL in dm_bm_set_read_only and
dm_bm_set_read_write functions.  If bm is NULL it means creating the
bm failed and so dm_bm_is_read_only must return true.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

4fb84de6

dm thin metadata: Avoid returning cmd->bm wild pointer on error · b06ad21f

由 Ye Bin 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.9
commit 219403d7
category: bugfix
bugzilla: 41672
CVE: NA

-----------------------------------------------

Maybe __create_persistent_data_objects() caller will use PTR_ERR as a
pointer, it will lead to some strange things.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

b06ad21f

dm cache metadata: Avoid returning cmd->bm wild pointer on error · 46906bf8

由 Ye Bin 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.9
commit d16ff19e
category: bugfix
bugzilla: 41672
CVE: NA

-----------------------------------------------

Maybe __create_persistent_data_objects() caller will use PTR_ERR as a
pointer, it will lead to some strange things.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

46906bf8

watchdog: Enable CONFIG_ASCEND_WATCHDOG_SYSFS_CONFIGURE in hulk_defconfig · 391eab17

由 Wang Wensheng 提交于 9月 22, 2020

ascend inclusion
category: bugfix
bugzilla: NA
CVE: NA

Enable CONFIG_ASCEND_WATCHDOG_SYSFS_CONFIGURE in hulk_defconfig
Signed-off-by: NWang Wensheng <wangwensheng4@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

391eab17

watchdog: Add interface to config timeout and pretimeout in sysfs · 9476b8f0

由 Wang Wensheng 提交于 9月 22, 2020

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

Those interfaces exist already in sysfs of watchdog driver core, but
they are readonly. This patch add write hook so we can config timeout
and pretimeout by writing those files.

Use Kconfig ASCEND_WATCHDOG_SYSFS_CONFIGURE to select this feature.
Signed-off-by: NWang Wensheng <wangwensheng4@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

9476b8f0

mm/swapfile: fix and annotate various data races · 1180dc6a

由 Qian Cai 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.9-rc1
commit a449bf58
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

swap_info_struct si.highest_bit, si.swap_map[offset] and si.flags could
be accessed concurrently separately as noticed by KCSAN,

=== si.highest_bit ===

 write to 0xffff8d5abccdc4d4 of 4 bytes by task 5353 on cpu 24:
  swap_range_alloc+0x81/0x130
  swap_range_alloc at mm/swapfile.c:681
  scan_swap_map_slots+0x371/0xb90
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0xf2/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

 read to 0xffff8d5abccdc4d4 of 4 bytes by task 6672 on cpu 70:
  scan_swap_map_slots+0x4a6/0xb90
  scan_swap_map_slots at mm/swapfile.c:892
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0xf2/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 70 PID: 6672 Comm: oom01 Tainted: G        W    L 5.5.0-next-20200205+ #3
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

=== si.swap_map[offset] ===

 write to 0xffffbc370c29a64c of 1 bytes by task 6856 on cpu 86:
  __swap_entry_free_locked+0x8c/0x100
  __swap_entry_free_locked at mm/swapfile.c:1209 (discriminator 4)
  __swap_entry_free.constprop.20+0x69/0xb0
  free_swap_and_cache+0x53/0xa0
  unmap_page_range+0x7f8/0x1d70
  unmap_single_vma+0xcd/0x170
  unmap_vmas+0x18b/0x220
  exit_mmap+0xee/0x220
  mmput+0x10e/0x270
  do_exit+0x59b/0xf40
  do_group_exit+0x8b/0x180

 read to 0xffffbc370c29a64c of 1 bytes by task 6855 on cpu 20:
  _swap_info_get+0x81/0xa0
  _swap_info_get at mm/swapfile.c:1140
  free_swap_and_cache+0x40/0xa0
  unmap_page_range+0x7f8/0x1d70
  unmap_single_vma+0xcd/0x170
  unmap_vmas+0x18b/0x220
  exit_mmap+0xee/0x220
  mmput+0x10e/0x270
  do_exit+0x59b/0xf40
  do_group_exit+0x8b/0x180

=== si.flags ===

 write to 0xffff956c8fc6c400 of 8 bytes by task 6087 on cpu 23:
  scan_swap_map_slots+0x6fe/0xb50
  scan_swap_map_slots at mm/swapfile.c:887
  get_swap_pages+0x39d/0x5c0
  get_swap_page+0x377/0x524
  add_to_swap+0xe4/0x1c0
  shrink_page_list+0x1795/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

 read to 0xffff956c8fc6c400 of 8 bytes by task 6207 on cpu 63:
  _swap_info_get+0x41/0xa0
  __swap_info_get at mm/swapfile.c:1114
  put_swap_page+0x84/0x490
  __remove_mapping+0x384/0x5f0
  shrink_page_list+0xff1/0x2870
  shrink_inactive_list+0x316/0x880
  shrink_lruvec+0x8dc/0x1380
  shrink_node+0x317/0xd80
  do_try_to_free_pages+0x1f7/0xa10
  try_to_free_pages+0x26c/0x5e0
  __alloc_pages_slowpath+0x458/0x1290

The writes are under si->lock but the reads are not. For si.highest_bit
and si.swap_map[offset], data race could trigger logic bugs, so fix them
by having WRITE_ONCE() for the writes and READ_ONCE() for the reads
except those isolated reads where they compare against zero which a data
race would cause no harm. Thus, annotate them as intentional data races
using the data_race() macro.

For si.flags, the readers are only interested in a single bit where a
data race there would cause no issue there.

[cai@lca.pw: add a missing annotation for si->flags in memory.c]
  Link: http://lkml.kernel.org/r/1581612647-5958-1-git-send-email-cai@lca.pwSigned-off-by: NQian Cai <cai@lca.pw>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/1581095163-12198-1-git-send-email-cai@lca.pwSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
[yyl: remove data_race(), this macro is used when KCSAN is enabled]
Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

1180dc6a

serial: 8250: fix null-ptr-deref in serial8250_start_tx() · fafc4a44

由 Yang Yingliang 提交于 9月 22, 2020

stable inclusion
from linux-4.19.135
commit c358255ff1dfa51ddbcbc8dfcc4eaa5719008daa

--------------------------------

commit f4c23a14 upstream.

I got null-ptr-deref in serial8250_start_tx():

[   78.114630] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[   78.123778] Mem abort info:
[   78.126560]   ESR = 0x86000007
[   78.129603]   EC = 0x21: IABT (current EL), IL = 32 bits
[   78.134891]   SET = 0, FnV = 0
[   78.137933]   EA = 0, S1PTW = 0
[   78.141064] user pgtable: 64k pages, 48-bit VAs, pgdp=00000027d41a8600
[   78.147562] [0000000000000000] pgd=00000027893f0003, p4d=00000027893f0003, pud=00000027893f0003, pmd=00000027c9a20003, pte=0000000000000000
[   78.160029] Internal error: Oops: 86000007 [#1] SMP
[   78.164886] Modules linked in: sunrpc vfat fat aes_ce_blk crypto_simd cryptd aes_ce_cipher crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce ses enclosure sg sbsa_gwdt ipmi_ssif spi_dw_mmio sch_fq_codel vhost_net tun vhost vhost_iotlb tap ip_tables ext4 mbcache jbd2 ahci hisi_sas_v3_hw libahci hisi_sas_main libsas hns3 scsi_transport_sas hclge libata megaraid_sas ipmi_si hnae3 ipmi_devintf ipmi_msghandler br_netfilter bridge stp llc nvme nvme_core xt_sctp sctp libcrc32c dm_mod nbd
[   78.207383] CPU: 11 PID: 23258 Comm: null-ptr Not tainted 5.8.0-rc6+ #48
[   78.214056] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B210.01 03/12/2020
[   78.222888] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--)
[   78.228435] pc : 0x0
[   78.230618] lr : serial8250_start_tx+0x160/0x260
[   78.235215] sp : ffff800062eefb80
[   78.238517] x29: ffff800062eefb80 x28: 0000000000000fff
[   78.243807] x27: ffff800062eefd80 x26: ffff202fd83b3000
[   78.249098] x25: ffff800062eefd80 x24: ffff202fd83b3000
[   78.254388] x23: ffff002fc5e50be8 x22: 0000000000000002
[   78.259679] x21: 0000000000000001 x20: 0000000000000000
[   78.264969] x19: ffffa688827eecc8 x18: 0000000000000000
[   78.270259] x17: 0000000000000000 x16: 0000000000000000
[   78.275550] x15: ffffa68881bc67a8 x14: 00000000000002e6
[   78.280841] x13: ffffa68881bc67a8 x12: 000000000000c539
[   78.286131] x11: d37a6f4de9bd37a7 x10: ffffa68881cccff0
[   78.291421] x9 : ffffa68881bc6000 x8 : ffffa688819daa88
[   78.296711] x7 : ffffa688822a0f20 x6 : ffffa688819e0000
[   78.302002] x5 : ffff800062eef9d0 x4 : ffffa68881e707a8
[   78.307292] x3 : 0000000000000000 x2 : 0000000000000002
[   78.312582] x1 : 0000000000000001 x0 : ffffa688827eecc8
[   78.317873] Call trace:
[   78.320312]  0x0
[   78.322147]  __uart_start.isra.9+0x64/0x78
[   78.326229]  uart_start+0xb8/0x1c8
[   78.329620]  uart_flush_chars+0x24/0x30
[   78.333442]  n_tty_receive_buf_common+0x7b0/0xc30
[   78.338128]  n_tty_receive_buf+0x44/0x2c8
[   78.342122]  tty_ioctl+0x348/0x11f8
[   78.345599]  ksys_ioctl+0xd8/0xf8
[   78.348903]  __arm64_sys_ioctl+0x2c/0xc8
[   78.352812]  el0_svc_common.constprop.2+0x88/0x1b0
[   78.357583]  do_el0_svc+0x44/0xd0
[   78.360887]  el0_sync_handler+0x14c/0x1d0
[   78.364880]  el0_sync+0x140/0x180
[   78.368185] Code: bad PC value

SERIAL_PORT_DFNS is not defined on each arch, if it's not defined,
serial8250_set_defaults() won't be called in serial8250_isa_init_ports(),
so the p->serial_in pointer won't be initialized, and it leads a null-ptr-deref.
Fix this problem by calling serial8250_set_defaults() after init uart port.
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200721143852.4058352-1-yangyingliang@huawei.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

fafc4a44

timekeeping: Prevent 32bit truncation in scale64_check_overflow() · 5ad5d337

由 Wen Yang 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.6
commit 4cbbc3a0
category: bugfix
bugzilla: 32422
CVE: NA

------------------------

While unlikely the divisor in scale64_check_overflow() could be >= 32bit in
scale64_check_overflow(). do_div() truncates the divisor to 32bit at least
on 32bit platforms.

Use div64_u64() instead to avoid the truncation to 32-bit.

[ tglx: Massaged changelog ]
Signed-off-by: NWen Yang <wenyang@linux.alibaba.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200120100523.45656-1-wenyang@linux.alibaba.comSigned-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

5ad5d337

lib : kobject: fix refcount imblance on kobject_rename · 960ac933

由 Lin Yi 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.3-rc1
commit 122f8ec7
category: bugfix
bugzilla: 18462
CVE: NA

-------------------------------------------------

the kobj refcount increased by kobject_get should be released before
error return, otherwise lead to a memory leak.
Signed-off-by: NLin Yi <teroincn@163.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLi Heng <liheng40@huawei.com>
Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

960ac933

genirq/debugfs: Add missing sanity checks to interrupt injection · 4b50cb05

由 Thomas Gleixner 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.6
commit a740a423
category: bugfix
bugzilla: 32425
CVE: NA

-------------------

Interrupts cannot be injected when the interrupt is not activated and when
a replay is already in progress.

Fixes: 536e2e34 ("genirq/debugfs: Triggering of interrupts from userspace")
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NMarc Zyngier <maz@kernel.org>
Link: https://lkml.kernel.org/r/20200306130623.500019114@linutronix.deSigned-off-by: NYuan Can <yuancan@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

4b50cb05

ovl: fix WARN_ON nlink drop to zero · 62834c28

由 Miklos Szeredi 提交于 9月 22, 2020

mainline inclusion
from mainline-5.7-rc1
commit 83552eac
category: bugfix
bugzilla: 37632
CVE: NA

---------------------------

Changes to underlying layers should not cause WARN_ON(), but this repro
does:

 mkdir w l u mnt
 sudo mount -t overlay -o workdir=w,lowerdir=l,upperdir=u overlay mnt
 touch mnt/h
 ln u/h u/k
 rm -rf mnt/k
 rm -rf mnt/h
 dmesg

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 116244 at fs/inode.c:302 drop_nlink+0x28/0x40

After upper hardlinks were added while overlay is mounted, unlinking all
overlay hardlinks drops overlay nlink to zero before all upper inodes
are unlinked.

After unlink/rename prevent i_nlink from going to zero if there are still
hashed aliases (i.e. cached hard links to the victim) remaining.
Reported-by: NPhasip <phasip@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

Conflicts:
	fs/overlayfs/dir.c
	[ Non bugfix patch 0e32992f("ovl: remove the 'locked'
	  argument of ovl_nlink_{start,end}") removed all locked
	  arg is not applied. ]
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

62834c28

ovl: fix some xino configurations · 682c799b

由 Amir Goldstein 提交于 9月 22, 2020

mainline inclusion
from mainline-5.6-rc6
commit 53afcd31
category: bugfix
bugzilla: 37632
CVE: NA

---------------------------

Fix up two bugs in the coversion to xino_mode:
1. xino=off does not always end up in disabled mode
2. xino=auto on 32bit arch should end up in disabled mode

Take a proactive approach to disabling xino on 32bit kernel:
1. Disable XINO_AUTO config during build time
2. Disable xino with a warning on mount time

As a by product, xino=on on 32bit arch also ends up in disabled mode.
We never intended to enable xino on 32bit arch and this will make the
rest of the logic simpler.

Fixes: 0f831ec8 ("ovl: simplify ovl_same_sb() helper")
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

682c799b

ovl: fix corner case of non-constant st_dev; st_ino · e414ff73

由 Amir Goldstein 提交于 9月 22, 2020

mainline inclusion
from mainline-5.6-rc1
commit b7bf9908
category: bugfix
bugzilla: 37632
CVE: NA

---------------------------

On non-samefs overlay without xino, non pure upper inodes should use a
pseudo_dev assigned to each unique lower fs, but if lower layer is on the
same fs and upper layer, it has no pseudo_dev assigned.

In this overlay layers setup:
 - two filesystems, A and B
 - upper layer is on A
 - lower layer 1 is also on A
 - lower layer 2 is on B

Non pure upper overlay inode, whose origin is in layer 1 will have the
st_dev;st_ino values of the real lower inode before copy up and the
st_dev;st_ino values of the real upper inode after copy up.

Fix this inconsitency by assigning a unique pseudo_dev also for upper fs,
that will be used as st_dev value along with the lower inode st_dev for
overlay inodes in the case above.
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

e414ff73

ovl: fix corner case of conflicting lower layer uuid · 616834bf

由 Amir Goldstein 提交于 9月 22, 2020

mainline inclusion
from mainline-5.6-rc1
commit 1b81dddd
category: bugfix
bugzilla: 37632
CVE: NA

---------------------------

This fixes ovl_lower_uuid_ok() to correctly detect the corner case:
 - two filesystems, A and B, both have null uuid
 - upper layer is on A
 - lower layer 1 is also on A
 - lower layer 2 is on B

In this case, bad_uuid would not have been set for B, because the check
only involved the list of lower fs.  Hence we'll try to decode a layer 2
origin on layer 1 and fail.

We check for conflicting (and null) uuid among all lower layers, including
those layers that are on the same fs as the upper layer.
Reported-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

616834bf

ovl: generalize the lower_fs[] array · 1de120e2

由 Amir Goldstein 提交于 9月 22, 2020

mainline inclusion
from mainline-5.6-rc1
commit 07f1e596
category: bugfix
bugzilla: 37632
CVE: NA

---------------------------

Rename lower_fs[] array to fs[], extend its size by one and use index fsid
(instead of fsid-1) to access the fs[] array.

Initialize fs[0] with upper fs values. fsid 0 is reserved even with lower
only overlay, so fs[0] remains null in this case.
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

Conflicts:
	fs/overlayfs/super.c
	[ Non bugfix patch 1bd0a3ae("ovl: use pr_fmt auto generate
	  prefix") reconstructed code is not applied. ]
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

1de120e2

ovl: simplify ovl_same_sb() helper · 6ff7f4d5

由 Amir Goldstein 提交于 9月 22, 2020

mainline inclusion
from mainline-5.6-rc1
commit 0f831ec8
category: bugfix
bugzilla: 37632
CVE: NA

---------------------------

No code uses the sb returned from this helper, so make it retrun a boolean
and rename it to ovl_same_fs().

The xino mode is irrelevant when all layers are on same fs, so instead of
describing samefs with mode OVL_XINO_OFF, use a new xino_mode state, which
is 0 in the case of samefs, -1 in the case of xino=off and > 0 with xino
enabled.

Create a new helper ovl_same_dev(), to use instead of the common check for
(ovl_same_fs() || xinobits).
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

Conflicts:
	fs/overlayfs/overlayfs.h
	[ Non bugfix patch 1e92e307("ovl: abstract ovl_inode lock
	  with a helper") reconstructed code is not applied. ]
	fs/overlayfs/super.c
	[ Non bugfix patch 1bd0a3ae("ovl: use pr_fmt auto generate
	  prefix") reconstructed code is not applied. ]
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

6ff7f4d5

ovl: generalize the lower_layers[] array · d701cf73

由 Amir Goldstein 提交于 9月 22, 2020

mainline inclusion
from mainline-5.6-rc1
commit 94375f9d
category: bugfix
bugzilla: 37632
CVE: NA

---------------------------

Rename lower_layers[] array to layers[], extend its size by one and
initialize layers[0] with upper layer values.  Lower layers are now
addressed with index 1..numlower.  layers[0] is reserved even with lower
only overlay.

[SzM: replace ofs->numlower with ofs->numlayer, the latter's value is
incremented by one]
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

d701cf73

ovl: fix lookup failure on multi lower squashfs · d66c7cb7

由 Amir Goldstein 提交于 9月 22, 2020

mainline inclusion
from mainline-5.5-rc2
commit 7e63c87f
category: bugfix
bugzilla: 37632
CVE: NA

---------------------------

In the past, overlayfs required that lower fs have non null uuid in
order to support nfs export and decode copy up origin file handles.

Commit 9df085f3 ("ovl: relax requirement for non null uuid of
lower fs") relaxed this requirement for nfs export support, as long
as uuid (even if null) is unique among all lower fs.

However, said commit unintentionally also relaxed the non null uuid
requirement for decoding copy up origin file handles, regardless of
the unique uuid requirement.

Amend this mistake by disabling decoding of copy up origin file handle
from lower fs with a conflicting uuid.

We still encode copy up origin file handles from those fs, because
file handles like those already exist in the wild and because they
might provide useful information in the future.

There is an unhandled corner case described by Miklos this way:
- two filesystems, A and B, both have null uuid
- upper layer is on A
- lower layer 1 is also on A
- lower layer 2 is on B

In this case bad_uuid won't be set for B, because the check only
involves the list of lower fs.  Hence we'll try to decode a layer 2
origin on layer 1 and fail.

We will deal with this corner case later.
Reported-by: NColin Ian King <colin.king@canonical.com>
Tested-by: NColin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/lkml/20191106234301.283006-1-colin.king@canonical.com/
Fixes: 9df085f3 ("ovl: relax requirement for non null uuid ...")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

d66c7cb7

fat: don't allow to mount if the FAT length == 0 · ba28aebc

由 OGAWA Hirofumi 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.8-rc1
commit b1b65750
category: bugfix
bugzilla: 36873
CVE: NA

---------------------------

If FAT length == 0, the image doesn't have any data. And it can be the
cause of overlapping the root dir and FAT entries.

Also Windows treats it as invalid format.

Reported-by: syzbot+6f1624f937d9d6911e2d@syzkaller.appspotmail.com
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: http://lkml.kernel.org/r/87r1wz8mrd.fsf@mail.parknet.co.jpSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NZheng Bin <zhengbin13@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

ba28aebc

serial: amba-pl011: Make sure we initialize the port.lock spinlock · 86d39267

由 John Stultz 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.8-rc1
commit 8508f4cb
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

Valentine reported seeing:

[    3.626638] INFO: trying to register non-static key.
[    3.626639] the code is fine but needs lockdep annotation.
[    3.626640] turning off the locking correctness validator.
[    3.626644] CPU: 7 PID: 51 Comm: kworker/7:1 Not tainted 5.7.0-rc2-00115-g8c2e9790f196 #116
[    3.626646] Hardware name: HiKey960 (DT)
[    3.626656] Workqueue: events deferred_probe_work_func
[    3.632476] sd 0:0:0:0: [sda] Optimal transfer size 8192 bytes not a multiple of physical block size (16384 bytes)
[    3.640220] Call trace:
[    3.640225]  dump_backtrace+0x0/0x1b8
[    3.640227]  show_stack+0x20/0x30
[    3.640230]  dump_stack+0xec/0x158
[    3.640234]  register_lock_class+0x598/0x5c0
[    3.640235]  __lock_acquire+0x80/0x16c0
[    3.640236]  lock_acquire+0xf4/0x4a0
[    3.640241]  _raw_spin_lock_irqsave+0x70/0xa8
[    3.640245]  uart_add_one_port+0x388/0x4b8
[    3.640248]  pl011_register_port+0x70/0xf0
[    3.640250]  pl011_probe+0x184/0x1b8
[    3.640254]  amba_probe+0xdc/0x180
[    3.640256]  really_probe+0xe0/0x338
[    3.640257]  driver_probe_device+0x60/0xf8
[    3.640259]  __device_attach_driver+0x8c/0xd0
[    3.640260]  bus_for_each_drv+0x84/0xd8
[    3.640261]  __device_attach+0xe4/0x140
[    3.640263]  device_initial_probe+0x1c/0x28
[    3.640265]  bus_probe_device+0xa4/0xb0
[    3.640266]  deferred_probe_work_func+0x7c/0xb8
[    3.640269]  process_one_work+0x2c0/0x768
[    3.640271]  worker_thread+0x4c/0x498
[    3.640272]  kthread+0x14c/0x158
[    3.640275]  ret_from_fork+0x10/0x1c

Which seems to be due to the fact that after allocating the uap
structure, nothing initializes the spinlock.

Its a little confusing, as uart_port_spin_lock_init() is one
place where the lock is supposed to be initialized, but it has
an exception for the case where the port is a console.

This makes it seem like a deeper fix is needed to properly
register the console, but I'm not sure what that entails, and
Andy suggested that this approach is less invasive.

Thus, this patch resolves the issue by initializing the spinlock
in the driver, and resolves the resulting warning.

Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: linux-serial@vger.kernel.org
Reported-by: NValentin Schneider <valentin.schneider@arm.com>
Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Reviewed-and-tested-by: NValentin Schneider <valentin.schneider@arm.com>
Link: https://lore.kernel.org/r/20200428184050.6501-1-john.stultz@linaro.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

86d39267

perf top: Fix wrong hottest instruction highlighted · ed0207eb

由 He Kuang 提交于 9月 22, 2020

mainline inclusion
from mainline-5.0-rc5
commit da06d568
category: bugfix
bugzilla: 38073
CVE: NA

-------------------------------------------------

The annotation line percentage is compared and inserted into the rbtree,
but the percent field of 'struct annotation_data' is an array, the
comparison result between them is the address difference.

This patch compares the right slot of percent array according to
opts->percent_type and makes things right.

The problem can be reproduced by pressing 'H' in perf top annotation view.
It should highlight the instruction line which has the highest sampling
percentage.
Signed-off-by: NHe Kuang <hekuang@huawei.com>
Reviewed-by: NJiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190120160523.4391-1-hekuang@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NWei Li <liwei391@huawei.com>
Reviewed-by: NJian Cheng <cj.chengjian@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

ed0207eb

xfs: prohibit fs freezing when using empty transactions · 3d507560

由 Darrick J. Wong 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.7-rc1
commit 27fb5a72
category: bugfix
bugzilla: 33320
CVE: NA

---------------------------

I noticed that fsfreeze can take a very long time to freeze an XFS if
there happens to be a GETFSMAP caller running in the background.  I also
happened to notice the following in dmesg:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 43492 at fs/xfs/xfs_super.c:853 xfs_quiesce_attr+0x83/0x90 [xfs]
Modules linked in: xfs libcrc32c ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 ip_set_hash_ip ip_set_hash_net xt_tcpudp xt_set ip_set_hash_mac ip_set nfnetlink ip6table_filter ip6_tables bfq iptable_filter sch_fq_codel ip_tables x_tables nfsv4 af_packet [last unloaded: xfs]
CPU: 2 PID: 43492 Comm: xfs_io Not tainted 5.6.0-rc4-djw #rc4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:xfs_quiesce_attr+0x83/0x90 [xfs]
Code: 7c 07 00 00 85 c0 75 22 48 89 df 5b e9 96 c1 00 00 48 c7 c6 b0 2d 38 a0 48 89 df e8 57 64 ff ff 8b 83 7c 07 00 00 85 c0 74 de <0f> 0b 48 89 df 5b e9 72 c1 00 00 66 90 0f 1f 44 00 00 41 55 41 54
RSP: 0018:ffffc900030f3e28 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88802ac54000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81e4a6f0 RDI: 00000000ffffffff
RBP: ffff88807859f070 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000000
R13: ffff88807859f388 R14: ffff88807859f4b8 R15: ffff88807859f5e8
FS:  00007fad1c6c0fc0(0000) GS:ffff88807e000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0c7d237000 CR3: 0000000077f01003 CR4: 00000000001606a0
Call Trace:
 xfs_fs_freeze+0x25/0x40 [xfs]
 freeze_super+0xc8/0x180
 do_vfs_ioctl+0x70b/0x750
 ? __fget_files+0x135/0x210
 ksys_ioctl+0x3a/0xb0
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x50/0x1a0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

These two things appear to be related.  The assertion trips when another
thread initiates a fsmap request (which uses an empty transaction) after
the freezer waited for m_active_trans to hit zero but before the the
freezer executes the WARN_ON just prior to calling xfs_log_quiesce.

The lengthy delays in freezing happen because the freezer calls
xfs_wait_buftarg to clean out the buffer lru list.  Meanwhile, the
GETFSMAP caller is continuing to grab and release buffers, which means
that it can take a very long time for the buffer lru list to empty out.

We fix both of these races by calling sb_start_write to obtain freeze
protection while using empty transactions for GETFSMAP and for metadata
scrubbing.  The other two users occur during mount, during which time we
cannot fs freeze.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

Conflict: fs/xfs/scrub/scrub.c
Signed-off-by: Nyu kuai <yukuai3@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

3d507560

xfs: Use scnprintf() for avoiding potential buffer overflow · cb2fab4e

由 Takashi Iwai 提交于 9月 22, 2020

mainline inclusion
from mainline-v5.7-rc1
commit 17bb60b7
category: bugfix
bugzilla: 33322
CVE: NA

---------------------------

Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit.  Fix it by replacing with scnprintf().
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Nyu kuai <yukuai3@huawei.com>
Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

cb2fab4e

xfs: use bitops interface for buf log item AIL flag check · 830a5fbb

由 Brian Foster 提交于 9月 22, 2020

mainline inclusion
from mainline-5.5-rc3
commit 826f7e34
category: bugfix
bugzilla: 27823
CVE: NA

---------------------------

The xfs_log_item flags were converted to atomic bitops as of commit
22525c17 ("xfs: log item flags are racy"). The assert check for
AIL presence in xfs_buf_item_relse() still uses the old value based
check. This likely went unnoticed as XFS_LI_IN_AIL evaluates to 0
and causes the assert to unconditionally pass. Fix up the check.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Fixes: 22525c17 ("xfs: log item flags are racy")
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Nyu kuai <yukuai3@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

830a5fbb

xfs: fix some memory leaks in log recovery · 3d4eefab

由 Darrick J. Wong 提交于 9月 22, 2020

mainline inclusion
from mainline-5.5-rc1
commit 050552cb
category: bugfix
bugzilla: 26779
CVE: NA

---------------------------

Fix a few places where we xlog_alloc_buffer a buffer, hit an error, and
then bail out without freeing the buffer.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

Conflict: fs/xfs/xfs_log_recover.c
Signed-off-by: Nyu kuai <yukuai3@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

3d4eefab

xfs: convert EIO to EFSCORRUPTED when log contents are invalid · bb99c879

由 Darrick J. Wong 提交于 9月 22, 2020

mainline inclusion
from mainline-5.5-rc1
commit 895e196f
category: bugfix
bugzilla: 26779
CVE: NA

---------------------------

Convert EIO to EFSCORRUPTED in the logging code when we can determine
that the log contents are invalid.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: Nyu kuai <yukuai3@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

bb99c879

xfs: fix inode fork extent count overflow · 59ce3b4b

由 Dave Chinner 提交于 9月 22, 2020

mainline inclusion
from mainline-5.5-rc1
commit 3f8a4f1d
category: bugfix
bugzilla: 26723
CVE: NA

---------------------------

[commit message is verbose for discussion purposes - will trim it
down later. Some questions about implementation details at the end.]

Zorro Lang recently ran a new test to stress single inode extent
counts now that they are no longer limited by memory allocation.
The test was simply:

This test uncovered a problem where the hole punching operation
appeared to finish with no error, but apparently only created 268M
extents instead of the 10 billion it was supposed to.

Further, trying to punch out extents that should have been present
resulted in success, but no change in the extent count. It looked
like a silent failure.

While running the test and observing the behaviour in real time,
I observed the extent coutn growing at ~2M extents/minute, and saw
this after about an hour:

> sleep 60 ; \
> xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
fsxattr.nextents = 127657993
fsxattr.nextents = 129683339

And a few minutes later this:

fsxattr.nextents = 4177861124

Ah, what? Where did that 4 billion extra extents suddenly come from?

Stop the workload, unmount, mount:

fsxattr.nextents = 166044375

And it's back at the expected number. i.e. the extent count is
correct on disk, but it's screwed up in memory. I loaded up the
extent list, and immediately:

fsxattr.nextents = 4192576215

It's bad again. So, where does that number come from?
xfs_fill_fsxattr():

                if (ip->i_df.if_flags & XFS_IFEXTENTS)
                        fa->fsx_nextents = xfs_iext_count(&ip->i_df);
                else
                        fa->fsx_nextents = ip->i_d.di_nextents;

And that's the behaviour I just saw in a nutshell. The on disk count
is correct, but once the tree is loaded into memory, it goes whacky.
Clearly there's something wrong with xfs_iext_count():

inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp)
{
        return ifp->if_bytes / sizeof(struct xfs_iext_rec);
}

Simple enough, but 134M extents is 2**27, and that's right about
where things went wrong. A struct xfs_iext_rec is 16 bytes in size,
which means 2**27 * 2**4 = 2**31 and we're right on target for an
integer overflow. And, sure enough:

struct xfs_ifork {
        int                     if_bytes;       /* bytes in if_u1 */
....

Once we get 2**27 extents in a file, we overflow if_bytes and the
in-core extent count goes wrong. And when we reach 2**28 extents,
if_bytes wraps back to zero and things really start to go wrong
there. This is where the silent failure comes from - only the first
2**28 extents can be looked up directly due to the overflow, all the
extents above this index wrap back to somewhere in the first 2**28
extents. Hence with a regular pattern, trying to punch a hole in the
range that didn't have holes mapped to a hole in the first 2**28
extents and so "succeeded" without changing anything. Hence "silent
failure"...

Fix this by converting if_bytes to a int64_t and converting all the
index variables and size calculations to use int64_t types to avoid
overflows in future. Signed integers are still used to enable easy
detection of extent count underflows. This enables scalability of
extent counts to the limits of the on-disk format - MAXEXTNUM
(2**31) extents.

Current testing is at over 500M extents and still going:

fsxattr.nextents = 517310478
Reported-by: NZorro Lang <zlang@redhat.com>
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

Conflict: fs/xfs/libxfs/xfs_inode_fork.h
Signed-off-by: Nyu kuai <yukuai3@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

59ce3b4b

nvme: fix memory leak caused by incorrect subsystem free · 068400df

由 Logan Gunthorpe 提交于 9月 22, 2020

mainline inclusion
from mainline-5.3-rc2
commit e654dfd3
category: bugfix
bugzilla: 19960
CVE: NA
---------------------------

When freeing the subsystem after finding another match with
__nvme_find_get_subsystem(), use put_device() instead of
__nvme_release_subsystem() which calls kfree() directly.

Per the documentation, put_device() should always be used
after device_initialization() is called. Otherwise, leaks
like the one below which was detected by kmemleak may occur.

Once the call of __nvme_release_subsystem() is removed it no
longer makes sense to keep the helper, so fold it back
into nvme_release_subsystem().

unreferenced object 0xffff8883d12bfbc0 (size 16):
  comm "nvme", pid 2635, jiffies 4294933602 (age 739.952s)
  hex dump (first 16 bytes):
    6e 76 6d 65 2d 73 75 62 73 79 73 32 00 88 ff ff  nvme-subsys2....
  backtrace:
    [<000000007d8fc208>] __kmalloc_track_caller+0x16d/0x2a0
    [<0000000081169e5f>] kvasprintf+0xad/0x130
    [<0000000025626f25>] kvasprintf_const+0x47/0x120
    [<00000000fa66ad36>] kobject_set_name_vargs+0x44/0x120
    [<000000004881f8b3>] dev_set_name+0x98/0xc0
    [<000000007124dae3>] nvme_init_identify+0x1995/0x38e0
    [<000000009315020a>] nvme_loop_configure_admin_queue+0x4fa/0x5e0
    [<000000001a63e766>] nvme_loop_create_ctrl+0x489/0xf80
    [<00000000a46ecc23>] nvmf_dev_write+0x1a12/0x2220
    [<000000002259b3d5>] __vfs_write+0x66/0x120
    [<000000002f6df81e>] vfs_write+0x154/0x490
    [<000000007e8cfc19>] ksys_write+0x10a/0x240
    [<00000000ff5c7b85>] __x64_sys_write+0x73/0xb0
    [<00000000fee6d692>] do_syscall_64+0xaa/0x470
    [<00000000997e1ede>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: ab9e00cc ("nvme: track subsystems")
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSun Ke <sunke32@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

068400df

nvme: fix possible deadlock when nvme_update_formats fails · 066b9c4c

由 Sagi Grimberg 提交于 9月 22, 2020

mainline inclusion
from mainline-5.4-rc4
commit 6abff1b9
category: bugfix
bugzilla: 24170
CVE: NA
---------------------------

nvme_update_formats may fail to revalidate the namespace and
attempt to remove the namespace. This may lead to a deadlock
as nvme_ns_remove will attempt to acquire the subsystem lock
which is already acquired by the passthru command with effects.

Move the invalid namepsace removal to after the passthru command
releases the subsystem lock.
Reported-by: NJudy Brock <judy.brock@samsung.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NSun Ke <sunke32@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

066b9c4c

dm verity: don't prefetch hash blocks for already-verified data · 12f27b72

由 xianrong.zhou 提交于 9月 22, 2020

mainline inclusion
from mainline-5.6-rc1
commit 0a531c5a
category: bugfix
bugzilla: 29444
CVE: NA
---------------------------

Try to skip prefetching hash blocks that won't be needed due to the
"check_at_most_once" option being enabled and the corresponding data
blocks already having been verified.

Since prefetching operates on a range of data blocks, do this by just
trimming the two ends of the range.  This doesn't skip every unneeded
hash block, since data blocks in the middle of the range could also be
unneeded, and hash blocks are still prefetched in large clusters as
controlled by dm_verity_prefetch_cluster.  But it can still help a lot.

In a test on Android Q launching 91 apps every 15s repeated 21 times,
prefetching was only done for 447177/4776629 = 9.36% of data blocks.
Tested-by: Nruxian.feng <ruxian.feng@transsion.com>
Co-developed-by: Nyuanjiong.gao <yuanjiong.gao@transsion.com>
Signed-off-by: Nyuanjiong.gao <yuanjiong.gao@transsion.com>
Signed-off-by: Nxianrong.zhou <xianrong.zhou@transsion.com>
[EB: simplified the 'while' loops and improved the commit message]
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NSun Ke <sunke32@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

12f27b72

arm64: kprobes: Recover pstate.D in single-step exception handler · 9bcec7a7

由 Masami Hiramatsu 提交于 9月 22, 2020

mainline inclusion
from mainline-5.3-rc3
commit b3980e48
category: bugfix
bugzilla: 20080
CVE: NA

-------------------------------------------------

kprobes manipulates the interrupted PSTATE for single step, and
doesn't restore it. Thus, if we put a kprobe where the pstate.D
(debug) masked, the mask will be cleared after the kprobe hits.

Moreover, in the most complicated case, this can lead a kernel
crash with below message when a nested kprobe hits.

[  152.118921] Unexpected kernel single-step exception at EL1

When the 1st kprobe hits, do_debug_exception() will be called.
At this point, debug exception (= pstate.D) must be masked (=1).
But if another kprobes hits before single-step of the first kprobe
(e.g. inside user pre_handler), it unmask the debug exception
(pstate.D = 0) and return.
Then, when the 1st kprobe setting up single-step, it saves current
DAIF, mask DAIF, enable single-step, and restore DAIF.
However, since "D" flag in DAIF is cleared by the 2nd kprobe, the
single-step exception happens soon after restoring DAIF.

This has been introduced by commit 7419333f ("arm64: kprobe:
Always clear pstate.D in breakpoint exception handler")

To solve this issue, this stores all DAIF bits and restore it
after single stepping.
Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
Fixes: 7419333f ("arm64: kprobe: Always clear pstate.D in breakpoint exception handler")
Reviewed-by: NJames Morse <james.morse@arm.com>
Tested-by: NJames Morse <james.morse@arm.com>
Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: NWill Deacon <will@kernel.org>
Signed-off-by: NWei Li <liwei391@huawei.com>
Reviewed-by: NHanjun Guo <guohanjun@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

9bcec7a7

nbd: fix possible page fault for nbd disk · 86d4595d

由 Xiubo Li 提交于 9月 22, 2020

mainline inclusion
from mainline-5.4-rc1
commit 8454d685
category: bugfix
bugzilla: 23253
CVE: NA
---------------------------

When the NBD_CFLAG_DESTROY_ON_DISCONNECT flag is set and at the same
time when the socket is closed due to the server daemon is restarted,
just before the last DISCONNET is totally done if we start a new connection
by using the old nbd_index, there will be crashing randomly, like:

<3>[  110.151949] block nbd1: Receive control failed (result -32)
<1>[  110.152024] BUG: unable to handle page fault for address: 0000058000000840
<1>[  110.152063] #PF: supervisor read access in kernel mode
<1>[  110.152083] #PF: error_code(0x0000) - not-present page
<6>[  110.152094] PGD 0 P4D 0
<4>[  110.152106] Oops: 0000 [#1] SMP PTI
<4>[  110.152120] CPU: 0 PID: 6698 Comm: kworker/u5:1 Kdump: loaded Not tainted 5.3.0-rc4+ #2
<4>[  110.152136] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
<4>[  110.152166] Workqueue: knbd-recv recv_work [nbd]
<4>[  110.152187] RIP: 0010:__dev_printk+0xd/0x67
<4>[  110.152206] Code: 10 e8 c5 fd ff ff 48 8b 4c 24 18 65 48 33 0c 25 28 00 [...]
<4>[  110.152244] RSP: 0018:ffffa41581f13d18 EFLAGS: 00010206
<4>[  110.152256] RAX: ffffa41581f13d30 RBX: ffff96dd7374e900 RCX: 0000000000000000
<4>[  110.152271] RDX: ffffa41581f13d20 RSI: 00000580000007f0 RDI: ffffffff970ec24f
<4>[  110.152285] RBP: ffffa41581f13d80 R08: ffff96dd7fc17908 R09: 0000000000002e56
<4>[  110.152299] R10: ffffffff970ec24f R11: 0000000000000003 R12: ffff96dd7374e900
<4>[  110.152313] R13: 0000000000000000 R14: ffff96dd7374e9d8 R15: ffff96dd6e3b02c8
<4>[  110.152329] FS:  0000000000000000(0000) GS:ffff96dd7fc00000(0000) knlGS:0000000000000000
<4>[  110.152362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  110.152383] CR2: 0000058000000840 CR3: 0000000067cc6002 CR4: 00000000001606f0
<4>[  110.152401] Call Trace:
<4>[  110.152422]  _dev_err+0x6c/0x83
<4>[  110.152435]  nbd_read_stat.cold+0xda/0x578 [nbd]
<4>[  110.152448]  ? __switch_to_asm+0x34/0x70
<4>[  110.152468]  ? __switch_to_asm+0x40/0x70
<4>[  110.152478]  ? __switch_to_asm+0x34/0x70
<4>[  110.152491]  ? __switch_to_asm+0x40/0x70
<4>[  110.152501]  ? __switch_to_asm+0x34/0x70
<4>[  110.152511]  ? __switch_to_asm+0x40/0x70
<4>[  110.152522]  ? __switch_to_asm+0x34/0x70
<4>[  110.152533]  recv_work+0x35/0x9e [nbd]
<4>[  110.152547]  process_one_work+0x19d/0x340
<4>[  110.152558]  worker_thread+0x50/0x3b0
<4>[  110.152568]  kthread+0xfb/0x130
<4>[  110.152577]  ? process_one_work+0x340/0x340
<4>[  110.152609]  ? kthread_park+0x80/0x80
<4>[  110.152637]  ret_from_fork+0x35/0x40

This is very easy to reproduce by running the nbd-runner.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NXiubo Li <xiubli@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NSunKe <sunke32@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

86d4595d

nbd: rename the runtime flags as NBD_RT_ prefixed · 1d6a11af

由 Xiubo Li 提交于 9月 22, 2020

mainline inclusion
from mainline-5.4-rc1
commit ec76a7b9
category: bugfix
bugzilla: 23253
CVE: NA
---------------------------

Preparing for the destory when disconnecting crash fixing.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NXiubo Li <xiubli@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NSun Ke <sunke32@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

1d6a11af

jbd2: flush_descriptor(): Do not decrease buffer head's ref count · 16b5dfc8

由 Chandan Rajendra 提交于 9月 22, 2020

mainline inclusion
from mainline-5.4-rc1
commit 547b9ad6
category: bugfix
bugzilla: 23039
CVE: NA
---------------------------

When executing generic/388 on a ppc64le machine, we notice the following
call trace,

VFS: brelse: Trying to free free buffer
WARNING: CPU: 0 PID: 6637 at /root/repos/linux/fs/buffer.c:1195 __brelse+0x84/0xc0

Call Trace:
 __brelse+0x80/0xc0 (unreliable)
 invalidate_bh_lru+0x78/0xc0
 on_each_cpu_mask+0xa8/0x130
 on_each_cpu_cond_mask+0x130/0x170
 invalidate_bh_lrus+0x44/0x60
 invalidate_bdev+0x38/0x70
 ext4_put_super+0x294/0x560
 generic_shutdown_super+0xb0/0x170
 kill_block_super+0x38/0xb0
 deactivate_locked_super+0xa4/0xf0
 cleanup_mnt+0x164/0x1d0
 task_work_run+0x110/0x160
 do_notify_resume+0x414/0x460
 ret_from_except_lite+0x70/0x74

The warning happens because flush_descriptor() drops bh reference it
does not own. The bh reference acquired by
jbd2_journal_get_descriptor_buffer() is owned by the log_bufs list and
gets released when this list is processed. The reference for doing IO is
only acquired in write_dirty_buffer() later in flush_descriptor().
Reported-by: NHarish Sriram <harish@linux.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NChandan Rajendra <chandan@linux.ibm.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

16b5dfc8

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功