- 09 4月, 2021 40 次提交
-
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc4 commit 2d01ddc8 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- If journalling is still working at the moment we get to writing error information to the superblock we cannot write directly to the superblock as such write could race with journalled update of the superblock and cause journal checksum failures, writing inconsistent information to the journal or other problems. We cannot journal the superblock directly from the error handling functions as we are running in uncertain context and could deadlock so just punt journalled superblock update to a workqueue. Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-5-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc4 commit 05c2c00f category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- Protect all superblock modifications (including checksum computation) with a superblock buffer lock. That way we are sure computed checksum matches current superblock contents (a mismatch could cause checksum failures in nojournal mode or if an unjournalled superblock update races with a journalled one). Also we avoid modifying superblock contents while it is being written out (which can cause DIF/DIX failures if we are running in nojournal mode). Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-4-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> conflicts: fs/ext4/file.c Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc4 commit 4392fbc4 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- Everybody passes 1 as sync argument of ext4_commit_super(). Just drop it. Reviewed-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-3-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc4 commit e789ca0c category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- save_error_info() is always called together with ext4_handle_error(). Combine them into a single call and move unconditional bits out of save_error_info() into ext4_handle_error(). Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201216101844.22917-2-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit c92dc856 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- When filesystem inconsistency is detected with group locked, we currently try to modify superblock to store error there without blocking. However this can cause superblock checksum failures (or DIF/DIX failure) when the superblock is just being written out. Make error handling code just store error information in ext4_sb_info structure and copy it to on-disk superblock only in ext4_commit_super(). In case of error happening with group locked, we just postpone the superblock flushing to a workqueue. [ Added fixup so that s_first_error_* does not get updated after the file system is remounted. Also added fix for syzbot failure. - Ted ] Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201127113405.26867-8-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: Hillf Danton <hdanton@sina.com> Reported-by: syzbot+9043030c040ce1849a60@syzkaller.appspotmail.com Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit 02a7780e category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- We convert errno's to ext4 on-disk format error codes in save_error_info(). Add a function and a bit of macro magic to make this simpler. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-7-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit 40676623 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- Just move error info related functions in super.c close to ext4_handle_error(). We'll want to combine save_error_info() with ext4_handle_error() and this makes change more obvious and saves a forward declaration as well. No functional change. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-6-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit 014c9caa category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- The only difference between __ext4_abort() and __ext4_error() is that the former one ignores errors=continue mount option. Unify the code to reduce duplication. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-5-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit 93c20bc3 category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- We use __ext4_error() when ext4_protect_reserved_inode() finds filesystem corruption. However EXT4_ERROR_INODE_ERR() is perfectly capable of reporting all the needed information. So just use that. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-4-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
mainline inclusion from mainline-v5.11-rc1 commit 81414b4d category: bugfix bugzilla: 50839 CVE: NA ----------------------------------------------- Superblock is written out either through ext4_commit_super() or through ext4_handle_dirty_super(). In both cases we recompute the checksum so it is not necessary to recompute it after updating superblock free inodes & blocks counters. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201127113405.26867-3-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NYe Bin <yebin10@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Lior Ribak 提交于
stable inclusion from stable-5.10.24 commit 5ab9464a2a3c538eedbb438f1802f2fd98d0953f bugzilla: 51348 -------------------------------- commit e7850f4d upstream. There is a deadlock in bm_register_write: First, in the begining of the function, a lock is taken on the binfmt_misc root inode with inode_lock(d_inode(root)). Then, if the user used the MISC_FMT_OPEN_FILE flag, the function will call open_exec on the user-provided interpreter. open_exec will call a path lookup, and if the path lookup process includes the root of binfmt_misc, it will try to take a shared lock on its inode again, but it is already locked, and the code will get stuck in a deadlock To reproduce the bug: $ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" > /proc/sys/fs/binfmt_misc/register backtrace of where the lock occurs (#5): 0 schedule () at ./arch/x86/include/asm/current.h:15 1 0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=<optimized out>, state=state@entry=2) at kernel/locking/rwsem.c:992 2 0xffffffff81b5150a in __down_read_common (state=2, sem=<optimized out>) at kernel/locking/rwsem.c:1213 3 __down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1222 4 down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1355 5 0xffffffff811ee22a in inode_lock_shared (inode=<optimized out>) at ./include/linux/fs.h:783 6 open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177 7 path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366 8 0xffffffff811efe1c in do_filp_open (dfd=<optimized out>, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396 9 0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=<optimized out>, flags@entry=0) at fs/exec.c:913 10 0xffffffff811e4a92 in open_exec (name=<optimized out>) at fs/exec.c:948 11 0xffffffff8124aa84 in bm_register_write (file=<optimized out>, buffer=<optimized out>, count=19, ppos=<optimized out>) at fs/binfmt_misc.c:682 12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF ", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603 13 0xffffffff811defda in ksys_write (fd=<optimized out>, buf=0xa758d0 ":iiiii:E::ii::i:CF ", count=19) at fs/read_write.c:658 14 0xffffffff81b49813 in do_syscall_64 (nr=<optimized out>, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46 15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120 To solve the issue, the open_exec call is moved to before the write lock is taken by bm_register_write Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com Fixes: 948b701a ("binfmt_misc: add persistent opened binary handler for containers") Signed-off-by: NLior Ribak <liorribak@gmail.com> Acked-by: NHelge Deller <deller@gmx.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Daiyue Zhang 提交于
stable inclusion from stable-5.10.24 commit 109720342efd6ace3d2e8f34a25ea65036bb1d3b bugzilla: 51348 -------------------------------- [ Upstream commit 14fbbc82 ] Commit b0841eef ("configfs: provide exclusion between IO and removals") uses ->frag_dead to mark the fragment state, thus no bothering with extra refcount on config_item when opening a file. The configfs_get_config_item was removed in __configfs_open_file, but not with config_item_put. So the refcount on config_item will lost its balance, causing use-after-free issues in some occasions like this: Test: 1. Mount configfs on /config with read-only items: drwxrwx--- 289 root root 0 2021-04-01 11:55 /config drwxr-xr-x 2 root root 0 2021-04-01 11:54 /config/a --w--w--w- 1 root root 4096 2021-04-01 11:53 /config/a/1.txt ...... 2. Then run: for file in /config do echo $file grep -R 'key' $file done 3. __configfs_open_file will be called in parallel, the first one got called will do: if (file->f_mode & FMODE_READ) { if (!(inode->i_mode & S_IRUGO)) goto out_put_module; config_item_put(buffer->item); kref_put() package_details_release() kfree() the other one will run into use-after-free issues like this: BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0 Read of size 8 at addr fffffff155f02480 by task grep/13096 CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: G W 4.14.116-kasan #1 TGID: 13096 Comm: grep Call trace: dump_stack+0x118/0x160 kasan_report+0x22c/0x294 __asan_load8+0x80/0x88 __configfs_open_file+0x1bc/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 Allocated by task 2138: kasan_kmalloc+0xe0/0x1ac kmem_cache_alloc_trace+0x334/0x394 packages_make_item+0x4c/0x180 configfs_mkdir+0x358/0x740 vfs_mkdir2+0x1bc/0x2e8 SyS_mkdirat+0x154/0x23c el0_svc_naked+0x34/0x38 Freed by task 13096: kasan_slab_free+0xb8/0x194 kfree+0x13c/0x910 package_details_release+0x524/0x56c kref_put+0xc4/0x104 config_item_put+0x24/0x34 __configfs_open_file+0x35c/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 el0_svc_naked+0x34/0x38 To fix this issue, remove the config_item_put in __configfs_open_file to balance the refcount of config_item. Fixes: b0841eef ("configfs: provide exclusion between IO and removals") Signed-off-by: NDaiyue Zhang <zhangdaiyue1@huawei.com> Signed-off-by: NYi Chen <chenyi77@huawei.com> Signed-off-by: NGe Qiu <qiuge@huawei.com> Reviewed-by: NChao Yu <yuchao0@huawei.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Ondrej Mosnacek 提交于
stable inclusion from stable-5.10.24 commit caa86901c863e7c3646d189f2deb9e844afd0568 bugzilla: 51348 -------------------------------- [ Upstream commit 53cb2454 ] An xattr 'get' handler is expected to return the length of the value on success, yet _nfs4_get_security_label() (and consequently also nfs4_xattr_get_nfs4_label(), which is used as an xattr handler) returns just 0 on success. Fix this by returning label.len instead, which contains the length of the result. Fixes: aa9c2669 ("NFS: Client implementation of Labeled-NFS") Signed-off-by: NOndrej Mosnacek <omosnace@redhat.com> Reviewed-by: NJames Morris <jamorris@linux.microsoft.com> Reviewed-by: NPaul Moore <paul@paul-moore.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Trond Myklebust 提交于
stable inclusion from stable-5.10.24 commit e181960ec51d5fa089d6e8e2478febe01ca8be04 bugzilla: 51348 -------------------------------- [ Upstream commit 47397915 ] The fact that the lookup revalidation failed, does not mean that the inode contents have changed. Fixes: 5ceb9d7f ("NFS: Refactor nfs_lookup_revalidate()") Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Trond Myklebust 提交于
stable inclusion from stable-5.10.24 commit dd756d05bee58077ea0239861022ca83e7d8d23d bugzilla: 51348 -------------------------------- [ Upstream commit 82e7ca13 ] There should be no reason to expect the directory permissions to change just because the directory contents changed or a negative lookup timed out. So let's avoid doing a full call to nfs_mark_for_revalidate() in that case. Furthermore, if this is a negative dentry, and we haven't actually done a new lookup, then we have no reason yet to believe the directory has changed at all. So let's remove the gratuitous directory inode invalidation altogether when called from nfs_lookup_revalidate_negative(). Reported-by: NGeert Jansen <gerardu@amazon.com> Fixes: 5ceb9d7f ("NFS: Refactor nfs_lookup_revalidate()") Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Paulo Alcantara 提交于
stable inclusion from stable-5.10.24 commit d308202c1b96024a2f3325642f5e087cf997b5d9 bugzilla: 51348 -------------------------------- commit 04ad69c3 upstream. In case of interrupted syscalls, prevent sending CLOSE commands for compound CREATE+CLOSE requests by introducing an CIFS_CP_CREATE_CLOSE_OP flag to indicate lower layers that it should not send a CLOSE command to the MIDs corresponding the compound CREATE+CLOSE request. A simple reproducer: #!/bin/bash mount //server/share /mnt -o username=foo,password=*** tc qdisc add dev eth0 root netem delay 450ms stat -f /mnt &>/dev/null & pid=$! sleep 0.01 kill $pid tc qdisc del dev eth0 root umount /mnt Before patch: ... 6 0.256893470 192.168.122.2 → 192.168.122.15 SMB2 402 Create Request File: ;GetInfo Request FS_INFO/FileFsFullSizeInformation;Close Request 7 0.257144491 192.168.122.15 → 192.168.122.2 SMB2 498 Create Response File: ;GetInfo Response;Close Response 9 0.260798209 192.168.122.2 → 192.168.122.15 SMB2 146 Close Request File: 10 0.260841089 192.168.122.15 → 192.168.122.2 SMB2 130 Close Response, Error: STATUS_FILE_CLOSED Signed-off-by: NPaulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com> Reviewed-by: NAurelien Aptel <aaptel@suse.com> CC: <stable@vger.kernel.org> Signed-off-by: NSteve French <stfrench@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jan Kara 提交于
stable inclusion from stable-5.10.24 commit d44c9780ed40db88626c9354868eab72159c7a7f bugzilla: 51348 -------------------------------- commit 56887cff upstream. Commit 384d87ef ("block: Do not discard buffers under a mounted filesystem") made paths issuing discard or zeroout requests to the underlying device try to grab block device in exclusive mode. If that failed we returned EBUSY to userspace. This however caused unexpected fallout in userspace where e.g. FUSE filesystems issue discard requests from userspace daemons although the device is open exclusively by the kernel. Also shrinking of logical volume by LVM issues discard requests to a device which may be claimed exclusively because there's another LV on the same PV. So to avoid these userspace regressions, fall back to invalidate_inode_pages2_range() instead of returning EBUSY to userspace and return EBUSY only of that call fails as well (meaning that there's indeed someone using the particular device range we are trying to discard). Link: https://bugzilla.kernel.org/show_bug.cgi?id=211167 Fixes: 384d87ef ("block: Do not discard buffers under a mounted filesystem") CC: stable@vger.kernel.org Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Theodore Ts'o 提交于
stable inclusion from stable-5.10.24 commit 64578f9417e1e3482f3e4492496772fca130f526 bugzilla: 51348 -------------------------------- [ Upstream commit 027f14f5 ] If we try to make any changes via the journal between when the journal is initialized, but before the multi-block allocated is initialized, we will end up deferencing a NULL pointer when the journal commit callback function calls ext4_process_freed_data(). The proximate cause of this failure was commit 2d01ddc8 ("ext4: save error info to sb through journal if available") since file system corruption problems detected before the call to ext4_mb_init() would result in a journal commit before we aborted the mount of the file system.... and we would then trigger the NULL pointer deref. Link: https://lore.kernel.org/r/YAm8qH/0oo2ofSMR@mit.eduReported-by: NMurphy Zhou <jencce.kernel@gmail.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Steven J. Magnani 提交于
stable inclusion from stable-5.10.24 commit 82d6c12899e2645bd17d6f9c7d494f360e1089e1 bugzilla: 51348 -------------------------------- [ Upstream commit 63c9e47a ] When extending a file, udf_do_extend_file() may enter following empty indirect extent. At the end of udf_do_extend_file() we revert prev_epos to point to the last written extent. However if we end up not adding any further extent in udf_do_extend_file(), the reverting points prev_epos into the header area of the AED and following updates of the extents (in udf_update_extents()) will corrupt the header. Make sure that we do not follow indirect extent if we are not going to add any more extents so that returning back to the last written extent works correctly. Link: https://lore.kernel.org/r/20210107234116.6190-2-magnani@ieee.orgSigned-off-by: NSteven J. Magnani <magnani@ieee.org> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Aurelien Aptel 提交于
stable inclusion from stable-5.10.24 commit 3370a84d781ca5227682bd6e747aaefb6dcc8e21 bugzilla: 51348 -------------------------------- commit a249cc8b upstream. With multichannel, operations like the queries from "ls -lR" can cause all credits to be used and errors to be returned since max_credits was not being set correctly on the secondary channels and thus the client was requesting 0 credits incorrectly in some cases (which can lead to not having enough credits to perform any operation on that channel). Signed-off-by: NAurelien Aptel <aaptel@suse.com> CC: <stable@vger.kernel.org> # v5.8+ Reviewed-by: NShyam Prasad N <sprasad@microsoft.com> Signed-off-by: NSteve French <stfrench@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Paulo Alcantara 提交于
stable inclusion from stable-5.10.24 commit 3d0bbd97eb6f32bcc1365252aa04a8984bab5007 bugzilla: 51348 -------------------------------- commit 14302ee3 upstream. In cifs_statfs(), if server->ops->queryfs is not NULL, then we should use its return value rather than always returning 0. Instead, use rc variable as it is properly set to 0 in case there is no server->ops->queryfs. Signed-off-by: NPaulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: NAurelien Aptel <aaptel@suse.com> Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com> CC: <stable@vger.kernel.org> Signed-off-by: NSteve French <stfrench@microsoft.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Christian Brauner 提交于
stable inclusion from stable-5.10.24 commit 36e1efcdc54274d03e67ed6a9d5c1c2a2e77e947 bugzilla: 51348 -------------------------------- commit ee2e3f50 upstream. Creating a series of detached mounts, attaching them to the filesystem, and unmounting them can be used to trigger an integer overflow in ns->mounts causing the kernel to block any new mounts in count_mounts() and returning ENOSPC because it falsely assumes that the maximum number of mounts in the mount namespace has been reached, i.e. it thinks it can't fit the new mounts into the mount namespace anymore. Depending on the number of mounts in your system, this can be reproduced on any kernel that supportes open_tree() and move_mount() by compiling and running the following program: /* SPDX-License-Identifier: LGPL-2.1+ */ #define _GNU_SOURCE #include <errno.h> #include <fcntl.h> #include <getopt.h> #include <limits.h> #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/mount.h> #include <sys/stat.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h> /* open_tree() */ #ifndef OPEN_TREE_CLONE #define OPEN_TREE_CLONE 1 #endif #ifndef OPEN_TREE_CLOEXEC #define OPEN_TREE_CLOEXEC O_CLOEXEC #endif #ifndef __NR_open_tree #if defined __alpha__ #define __NR_open_tree 538 #elif defined _MIPS_SIM #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #define __NR_open_tree 4428 #endif #if _MIPS_SIM == _MIPS_SIM_NABI32 /* n32 */ #define __NR_open_tree 6428 #endif #if _MIPS_SIM == _MIPS_SIM_ABI64 /* n64 */ #define __NR_open_tree 5428 #endif #elif defined __ia64__ #define __NR_open_tree (428 + 1024) #else #define __NR_open_tree 428 #endif #endif /* move_mount() */ #ifndef MOVE_MOUNT_F_EMPTY_PATH #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */ #endif #ifndef __NR_move_mount #if defined __alpha__ #define __NR_move_mount 539 #elif defined _MIPS_SIM #if _MIPS_SIM == _MIPS_SIM_ABI32 /* o32 */ #define __NR_move_mount 4429 #endif #if _MIPS_SIM == _MIPS_SIM_NABI32 /* n32 */ #define __NR_move_mount 6429 #endif #if _MIPS_SIM == _MIPS_SIM_ABI64 /* n64 */ #define __NR_move_mount 5429 #endif #elif defined __ia64__ #define __NR_move_mount (428 + 1024) #else #define __NR_move_mount 429 #endif #endif static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags) { return syscall(__NR_open_tree, dfd, filename, flags); } static inline int sys_move_mount(int from_dfd, const char *from_pathname, int to_dfd, const char *to_pathname, unsigned int flags) { return syscall(__NR_move_mount, from_dfd, from_pathname, to_dfd, to_pathname, flags); } static bool is_shared_mountpoint(const char *path) { bool shared = false; FILE *f = NULL; char *line = NULL; int i; size_t len = 0; f = fopen("/proc/self/mountinfo", "re"); if (!f) return 0; while (getline(&line, &len, f) > 0) { char *slider1, *slider2; for (slider1 = line, i = 0; slider1 && i < 4; i++) slider1 = strchr(slider1 + 1, ' '); if (!slider1) continue; slider2 = strchr(slider1 + 1, ' '); if (!slider2) continue; *slider2 = '\0'; if (strcmp(slider1 + 1, path) == 0) { /* This is the path. Is it shared? */ slider1 = strchr(slider2 + 1, ' '); if (slider1 && strstr(slider1, "shared:")) { shared = true; break; } } } fclose(f); free(line); return shared; } static void usage(void) { const char *text = "mount-new [--recursive] <base-dir>\n"; fprintf(stderr, "%s", text); _exit(EXIT_SUCCESS); } #define exit_usage(format, ...) \ ({ \ fprintf(stderr, format "\n", ##__VA_ARGS__); \ usage(); \ }) #define exit_log(format, ...) \ ({ \ fprintf(stderr, format "\n", ##__VA_ARGS__); \ exit(EXIT_FAILURE); \ }) static const struct option longopts[] = { {"help", no_argument, 0, 'a'}, { NULL, no_argument, 0, 0 }, }; int main(int argc, char *argv[]) { int exit_code = EXIT_SUCCESS, index = 0; int dfd, fd_tree, new_argc, ret; char *base_dir; char *const *new_argv; char target[PATH_MAX]; while ((ret = getopt_long_only(argc, argv, "", longopts, &index)) != -1) { switch (ret) { case 'a': /* fallthrough */ default: usage(); } } new_argv = &argv[optind]; new_argc = argc - optind; if (new_argc < 1) exit_usage("Missing base directory\n"); base_dir = new_argv[0]; if (*base_dir != '/') exit_log("Please specify an absolute path"); /* Ensure that target is a shared mountpoint. */ if (!is_shared_mountpoint(base_dir)) exit_log("Please ensure that \"%s\" is a shared mountpoint", base_dir); dfd = open(base_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC); if (dfd < 0) exit_log("%m - Failed to open base directory \"%s\"", base_dir); ret = mkdirat(dfd, "detached-move-mount", 0755); if (ret < 0) exit_log("%m - Failed to create required temporary directories"); ret = snprintf(target, sizeof(target), "%s/detached-move-mount", base_dir); if (ret < 0 || (size_t)ret >= sizeof(target)) exit_log("%m - Failed to assemble target path"); /* * Having a mount table with 10000 mounts is already quite excessive * and shoult account even for weird test systems. */ for (size_t i = 0; i < 10000; i++) { fd_tree = sys_open_tree(dfd, "detached-move-mount", OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC | AT_EMPTY_PATH); if (fd_tree < 0) { fprintf(stderr, "%m - Failed to open %d(detached-move-mount)", dfd); exit_code = EXIT_FAILURE; break; } ret = sys_move_mount(fd_tree, "", dfd, "detached-move-mount", MOVE_MOUNT_F_EMPTY_PATH); if (ret < 0) { if (errno == ENOSPC) fprintf(stderr, "%m - Buggy mount counting"); else fprintf(stderr, "%m - Failed to attach mount to %d(detached-move-mount)", dfd); exit_code = EXIT_FAILURE; break; } close(fd_tree); ret = umount2(target, MNT_DETACH); if (ret < 0) { fprintf(stderr, "%m - Failed to unmount %s", target); exit_code = EXIT_FAILURE; break; } } (void)unlinkat(dfd, "detached-move-mount", AT_REMOVEDIR); close(dfd); exit(exit_code); } and wait for the kernel to refuse any new mounts by returning ENOSPC. How many iterations are needed depends on the number of mounts in your system. Assuming you have something like 50 mounts on a standard system it should be almost instantaneous. The root cause of this is that detached mounts aren't handled correctly when source and target mount are identical and reside on a shared mount causing a broken mount tree where the detached source itself is propagated which propagation prevents for regular bind-mounts and new mounts. This ultimately leads to a miscalculation of the number of mounts in the mount namespace. Detached mounts created via open_tree(fd, path, OPEN_TREE_CLONE) are essentially like an unattached new mount, or an unattached bind-mount. They can then later on be attached to the filesystem via move_mount() which calls into attach_recursive_mount(). Part of attaching it to the filesystem is making sure that mounts get correctly propagated in case the destination mountpoint is MS_SHARED, i.e. is a shared mountpoint. This is done by calling into propagate_mnt() which walks the list of peers calling propagate_one() on each mount in this list making sure it receives the propagation event. The propagate_one() functions thereby skips both new mounts and bind mounts to not propagate them "into themselves". Both are identified by checking whether the mount is already attached to any mount namespace in mnt->mnt_ns. The is what the IS_MNT_NEW() helper is responsible for. However, detached mounts have an anonymous mount namespace attached to them stashed in mnt->mnt_ns which means that IS_MNT_NEW() doesn't realize they need to be skipped causing the mount to propagate "into itself" breaking the mount table and causing a disconnect between the number of mounts recorded as being beneath or reachable from the target mountpoint and the number of mounts actually recorded/counted in ns->mounts ultimately causing an overflow which in turn prevents any new mounts via the ENOSPC issue. So teach propagation to handle detached mounts by making it aware of them. I've been tracking this issue down for the last couple of days and then verifying that the fix is correct by unmounting everything in my current mount table leaving only /proc and /sys mounted and running the reproducer above overnight verifying the number of mounts counted in ns->mounts. With this fix the counts are correct and the ENOSPC issue can't be reproduced. This change will only have an effect on mounts created with the new mount API since detached mounts cannot be created with the old mount API so regressions are extremely unlikely. Link: https://lore.kernel.org/r/20210306101010.243666-1-christian.brauner@ubuntu.com Fixes: 2db154b3 ("vfs: syscall: Add move_mount(2) to move mounts around") Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: <stable@vger.kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Acked-by: N Weilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: bugfix bugzilla: 48265 CVE: NA -------------------------------- This fixes two problems: 1) when cpu offline, we should clear cpu mask from all associated resctrl group but not only default group. 2) when cpu online, we should set cpu mask for default group and update default group's cpus to default state if cdp on, this operation is to fill code and data fields of mpam sysregs with appropriate value. Fixes: 2e2c511ff49d ("arm64/mpam: resctrl: Handle cpuhp and resctrl_dom allocation") Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NJian Cheng <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: bugfix bugzilla: 48265 CVE: NA -------------------------------- When we support configure different types of resources for a resource, the wrong history value will be updated in the default group after remounting. e.g. > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMax,mbMin && cd resctrl/ > echo 'MBMIN:0=2;1=2;2=2;3=2' > schemata > cat schemata L3:0=7fff;1=7fff;2=7fff;3=7fff MBMAX:0=100;1=100;2=100;3=100 MBMIN:0=2;1=2;2=2;3=2 > cd .. && umount /sys/fs/resctrl/ > mount -t resctrl resctrl /sys/fs/resctrl/ -o mbMax,mbMin && cd resctrl/ && cat schemata L3:0=7fff;1=7fff;2=7fff;3=7fff MBMAX:0=100;1=100;2=100;3=100 MBMIN:0=0;1=0;2=0;3=0 > echo 'MBMAX:0=10;1=10;2=10;3=10' > schemata > cat schemata L3:0=7fff;1=7fff;2=7fff;3=7fff MBMAX:0=10;1=10;2=10;3=10 MBMIN:0=2;1=2;2=2;3=2 #update error history value When writing schemata sysfile, call path like this: resctrl_group_schemata_write() -=> resctrl_update_groups_config() -=> resctrl_group_update_domains() -=> resctrl_group_update_domain_ctrls() { .../*refresh new_ctrl array of supported conf type once for each resource*/ } We should refresh new_ctrl field in struct resctrl_staged_config by resctrl_group_init_alloc() before calling resctrl_group_update_domain_ctrls(). Fixes: 6b2471f089be ("arm64/mpam: resctrl: Support priority and hardlimit(Memory bandwidth) configuration") Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: bugfix bugzilla: 48265 CVE: NA -------------------------------- This function is called only when we mount resctrl sysfs, for error handling we need to destroy schemata list when next few steps failed after creation of schemata list. Fixes: 7e9b5caeefff ("arm64/mpam: resctrl: Add helpers for init and destroy schemata list") Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: bugfix bugzilla: 48265 CVE: NA -------------------------------- Use fs_context to parse mount options, this old process parsing from parse_rdtgroupfs_options() will be obsoleted and removed. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- Sometimes monitoring will have such anomalies: e.g. > cd /sys/fs/resctrl/ && grep . mon_data/* mon_data/mon_L3CODE_00:14336 mon_data/mon_L3CODE_01:344064 mon_data/mon_L3CODE_02:2048 mon_data/mon_L3CODE_03:27648 mon_data/mon_L3DATA_00:0 #L3DATA's monitoring data always be 0 mon_data/mon_L3DATA_01:0 mon_data/mon_L3DATA_02:0 mon_data/mon_L3DATA_03:0 mon_data/mon_MB_00:392 mon_data/mon_MB_01:552 mon_data/mon_MB_02:160 mon_data/mon_MB_03:0 If cdp on, tasks in resctrl default group with closid=0 and rmid=0 don't know how to fill proper partid_i/pmg_i and partid_d/pmg_d into MPAMx_ELx sysregs by mpam_sched_in() called by __switch_to(), it's because current cpu's default closid and rmid are also equal to 0 and to make the operation modifying configuration passed. Update per cpu default closid of none-zero value, call update_closid_rmid() to update each cpu's mpam proper MPAMx_ELx sysregs for setting partid and pmg when mounting resctrl sysfs, it looks like a practical method. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- So far there are some declarations shared by resctrlfs.c and mpam core module files under kernel/mpam directory scattered in mpam.h and resctrl.h, this is organized like this: -- asm/ +-- resctrl.h + +-- mpam.h | + +-- mpam_resource.h | | + | | | -- fs/ | | +-> mpam/ +-- resctrlfs.c <----+----+------> +-- mpam_resctrl.c ... We move this declarations shared by resctrlfs.c and mpam/ to resctrl.h and split another declarations into mpam_internal.h, also including moving mpam_resource.h to mpam/ directory, currently this is organized like this: -- asm/ +-- mpam.h +----> export to other modules(e.g. SMMU master io) +-- resctrl.h + | -- mpam/ | +-- mpam_internal.h | + +-- mpam_resource.h | | + | | | -- fs/ | +----+-> mpam/ +-- resctrlfs.c <----+-----------> +-- mpam_resctrl.c ... In this way can we build a clearer framework for MPAM usage. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- Some resource's properities such as closid and rmid are exported like Intel-RDT in our resctrl design, but there also has two main differences, one is MB(Memory Bandwidth), for we MB is also divided into two directories MB and MB_MON to show respective properties about control and monitor type as same as LxCache, another is we adopt features sysfile under resources' directories, which indicates the properties of control type of corresponding resource, for instance MB hardlimit. e.g. > mount -t resctrl resctrl /sys/fs/resctrl -o mbHdl > cd /sys/fs/resctrl/ && cat info/MB/features mbHdl@1 #indicate MBHDL setting's upper bound is 1 > cat schemata L3:0=7fff;1=7fff;2=7fff;3=7fff MB:0=100;1=100;2=100;3=100 MBHDL:0=1;1=1;2=1;3=1 Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- For MPAM, a rmid can do monitoring work only with a monitor resource allocated, we adopt a mechanism for monitor resource dynamic allocation and recycling, it is different from Intel-RDT operation who creates a kworker thread for dynamically monitoring Cache usage and checks if it is below a threshold adjustable for rmid free, for we have detected that this method will affect the cpu utilization in many cases, sometimes this influence cannot be accepted. Our method is simple, as different resource's monitor number varies, we deliever two list, one for storing rmids which has exclusive monitor resource and another for storing this rmids which have monitor resource shared, this shared monitor id always be 0. it works like this, if a new rmid apply for a resource monitor which is in used, then we put this rmid to the tail of latter list and temporarily give a default monitor id 0 util someone releases available monitor resource, if this new rmid has all resources' monitor resource needed, then it will be put into exclusive list. This implements the LRU allocation of monitor resources and give users part control rights of allocation and release, if resctrl group's quantity can be guaranteed or user don't need monitoring too many groups synchronously, this is a more appropriate way for user deployment, not only that, also can it avoid the risk of inaccuracy in monitoring when monitoring operation happen to too many groups at the same time. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- So far we use sd_closid, including {reqpartid, intpartid}, to label each resctrl group including ctrlgroup and mongroup, This can perfectly handle this case where number of reqpartid exceeds intpartid, this always happen when intpartid narrowing supported, otherwise their two are of same number. So we use excessive reqpartid to indicate (1)- how configurations can be synchronized from the configuration indexed by intpartid, not only that, (2)- take part of monitor role. But reqpartid in (2) with pmg still be scattered, So far we have not yet a right way to explain how can we use their two properly. In order to ensure their resources can be fully utilized, and given this idea from Intel-RDT's design which uses rmid for monitoring, a rmid remap matrix is delivered for transforming partid and pmg to rmid, this matrix is organized like this: [bitmap entry indexed by partid] [col pos is partid] [0] [1] [2] [3] [4] [5] occ->bitmap[:0] 1 0 0 1 1 1 bitmap[:1] 1 0 0 1 1 1 bitmap[:2] 1 1 1 1 1 1 bitmap[:3] 1 1 1 1 1 1 [row pos-1 is pmg] Calculate rmid = partid + NR_partid * pmg occ represents if this bitmap has been used by a partid, it is because a certain partid should not be accompany with a duplicated pmg for monitoring, this design easily saves a lot of space, and can also decrease time complexity of allocating and free rmid process from O(NR_partid)* O(NR_pmg) to O(NR_partid) + O(log(NR_pmg)) compared with using list. By this way, we get a continuous rmid set with upper bound(NR_pmg * NR_partid - 1), given an rmid we can assume that if it's a valid rmid by judging whether it falls within this range or not. rmid implicts the reqpartid info, so we can use relevant helpers to get this reqpartid for sd_closid@reqpartid and perfectly accomplish this configuration sync mission, this also makes closid simpler which can be consists of intpartid index only, also each resctrl group is happy to own consecutive rmid. This also has some profound influences, for instance for MPAM there also support SMMU io using partid and pmg, we can use a single helper mpam_rmid_to_partid_pmg() in SMMU driver to complete this remap process for rmid input from outside user space. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- Currently we use partid and pmg (Performance Monitoring Group) to filter some performance events so that the performance of a particular partid and pmg can be monitored, but pmg looks useless except for making a filter with partid, especially when pmg varies in different MPAM resources, it makes difficult to allocate pmg resource when creating new mon group in resctrl sysfs, even causes a lot of waste. So we use a software-defined sd_closid instead of 32-bit integer to label each rdtgroup (including mon group), sd_closid include intpartid for allocation and reqpartid for synchronizing configuration and monitoring, Given MPAM has narrowing feature, also includes the concept (hw_reqpartid, hw_intpartid we named), when narrowing is not supported, number of intpartid and reqpartid equals to hw_reqpartid, otherwise intpartid and reqpartid is related to minimum number of both hw_reqpartid and hw_intpartid supported across different resources, by using this way, not only we solve above problem but also use relax reqpartid for creating new mon group. additionally, pmg is also preferred when it is available. e.g. hw_intpartid: 0 1 2 3 4 5 6 7 hw_reqpartid: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | | | | | | | | | | | | | | | | | | | | | | | | resctrl ctrl group: p0 p1 p2 p3 p4 p5 p6 p7 | | | | | | | | | | resctrl mon group: | +-----------------------m4 m5 m6 m7 +-----------------m0 m1 m2 m3 In this case, use extra reqpartid to create m0, m1, m2, m3 mon group for p2 ctrl group, and m4, m5, m6, m7 for p4. As we know reqpartid both supports allocating and monitoring filter, we should synchronize config of ctrl group with child mon groups under this design, each mon group's configuration indexed by a reqpartid that called slave is closely following it's father ctrl group that called master whenever configuration changes. not only that, we let task_struct keep both intpartid and reqpartid so we can know if tasks belong to a same ctrl group through intpartid and change cpu's partid by writing MPAMx_ELx through reqpartid when tasks switching. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- Code in resctrlfs.c is not shared with x86 RDT currently, but may be updated to support both in the future, so remove unrelated CONFIG for now to make code clearer. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- We redesign monitoring process for user, as following illustrates: e.g. before rewriting: mount /sys/fs/resctrl && cd /sys/fs/resctrl mkdir p1 && cd p1 echo 1 > ctrlmon # this allocates a monitor resource for this group ... # associating task/cpu with this group grep . mon_data/* # get monitor data from mon_data directory e.g. after rewriting: mount /sys/fs/resctrl && cd /sys/fs/resctrl mkdir p1 && cd p1 # automically allocating a monitoring resource ... # associate task/cpu with this group grep . mon_data/* # directly get monitor data ctrlmon is used for manually allocating a monitor resource for monitoring a specified group (labeled by partid and pmg), we delete ctrlmon because this action is redundant. User should know which group has been allocated a available monitor resource and only this monitor resource is released then this monitor resource can be reallocated to a new group after, this action is redundant and unnecessary, as monitor resource is used only when monitoring process happens, so a relax monitor resource can be allocated to multiple groups and take effect when monitoring process happened. But should some restrictions be known, a monitor resource for monitoring Cache-occupancy might be kept for a long time until it doesn't need to be use anymore, or below a threshold as like intel-RDT limbo list works, otherwise you may see that the monitoring result is very small beyond exception when you force switch one mon resource from one group to another. We deliver a simple LRU mon resource allocation mechanism, but so far it just assign a monitor according to the order in which groups was created, this is incomplete and needs subsequent improvement. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- Replace u32 bitmask with bitmap for closid allocation, it's because closid may be too large to use 32 bits. This also support cdp, when cdp is enabled, closid will be assigned twice once time, giving closid to code LxCache and closid+1 to data LxDATA, so do free process. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- Add a schema list for each rdt domain, we use this list to store changes from schemata row instead of previous ctrlval array live in resctrl resource structure, when mounting resctrl sysfs happened, we would reset all resource's configuration into default by resctrl_group_update_domains(). Currently each row in schemata sysfile occupy a list node, this may be extended for perfecting control types. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- Initialize schemata list when mount resctrl sysfs and destroy it when umount, each list node contains the value updated by schemata (in resctrl sysfs) row. Partial code is borrowed from 250656171d95 ("x86/resctrl: Stop using Lx CODE/DATA resources"), as it illustrates: Now that CDP enable/disable is global, and the closid offset correction is based on the configuration being applied, we are using different hw_closid slots in the ctrl array for CODE/DATA schema. This lets us merge them using the same Lx resource twice for CDP's CODE/DATA schema. This keeps the illusion of separate caches in the resctrl code. When CDP is enabled for a cache, create two schema generating the names and setting the configuration type. We can now remove the initialisation of the illusionary hw_resources: 'cdp_capable' just requires setting a flag, resctrl knows what to do from there. Link: http://www.linux-arm.org/git?p=linux-jm.git;a=commit;h=250656171d95dea079cc661098a0984e7237aa25Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: feature feature: ARM MPAM support bugzilla: 48265 CVE: NA -------------------------------- We now bridge resctrl intermediate processing module and mpam devices module, a large block of code refer to configuration and monitoring process involved need to be modified. We change the previous method where straightly writing MSCs' registers, this jobs are handed over to helpers offered by mpam devices module instead, when configuration or monitoring action happened, each domains' ctrlval array changed by resctrl sysfs input would be updated into mpam config structure live in each mpam component structure, relevant helpers provided by mpam devices module will soon accomplish the remaining jobs. Comparasion: configuration or monitoring old new + + | | | +---------+------------+ | | intermediate helpers | | +---------+------------+ | | | | +--+-----------------+----+ | [reading writing MMIO] | +-------------------------+ So far we nearly accomplish the mission that open up process between resctrl sysfs and mpam devices module but still incomplete currently, also some proper actions are needed after. Also this moves relevant structures such as struct mongroup to suitable place,. Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang ShaoBo 提交于
hulk inclusion category: bugfix bugzilla: 48265 CVE: NA -------------------------------- There are two problems related to schemata: 1) When rmdir a group and then mkdir a new group under resctrl root directory, the new group still inherits the schemata configuration from old. e.g. > mount -t resctrl resctrl /sys/fs/resctrl > cd /sys/fs/resctrl > mkdir p1 && cd p1 > echo 'L3:0=7f' > schemata > cd .. && rmdir p1 && mkdir p1 && cd p1 > cat schemata L3:0=7f;1=7fff;2=7fff;3=7fff MB:0=100;1=100;2=100;3=100 2) It still exists when umount /sys/fs/resctrl and remount. e.g. > mount -t resctrl resctrl /sys/fs/resctrl > cd /sys/fs/resctrl > echo 'L3:0=7f' > schemata > umount /sys/fs/resctrl > mount -t resctrl resctrl /sys/fs/resctrl > cat schemata L3:0=7f;1=7fff;2=7fff;3=7fff MB:0=100;1=100;2=100;3=100 Firstly we make each resctrl resource obtains their corresponding default configuration. NOTE we use zero to initialize L3 default value instead of max cpbm bits, as zero configurarion equals to maximum configuration for L3 MSCs. And we use max-percentage masks of max bandwidth to generate maximum configuration for MB. Then we reset resources' configuration settings to default value and back MSCs to default state, when mkdir or umount happended. Fixes: caf75b6b2540 ("resctrlfs: mpam: init struct for mpam") Fixes: 916dd9321e3c ("resctrlfs: init support resctrlfs") Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Xie XiuQi 提交于
hulk inclusion category: bugfix bugzilla: 48265 CVE: NA -------------------------------- Rewrite the source file's licence of mpam feature. Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NWang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: NCheng Jian <cj.chengjian@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-