- 22 1月, 2018 33 次提交
-
-
由 Qu Wenruo 提交于
Since tree-checker has verified leaf when reading from disk, we don't need the existing verify_dir_item() or btrfs_is_name_len_valid() checks. Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
Add checker for dir item, for key types DIR_ITEM, DIR_INDEX and XATTR_ITEM. This checker does comprehensive checks for: 1) dir_item header and its data size Against item boundary and maximum name/xattr length. This part is mostly the same as old verify_dir_item(). 2) dir_type Against maximum file types, and against key type. Since XATTR key should only have FT_XATTR dir item, and normal dir item type should not have XATTR key. The check between key->type and dir_type is newly introduced by this patch. 3) name hash For XATTR and DIR_ITEM key, key->offset is name hash (crc32c). Check the hash of the name against the key to ensure it's correct. The name hash check is only found in btrfs-progs before this patch. Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NSu Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
This callback is called directly from VFS, no locks are held at the allocation time. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
There's only one callsite with GFP_NOFS. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
All callers use GFP_NOFS, we don't have to pass it as an argument. The built-in tests pass GFP_KERNEL, but they run only at module load time and NOFS works there as well. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Use __clear_extent_bit directly in case we want to pass unknown gfp flags. Otherwise all clear_extent_bit callers use GFP_NOFS, so we can sink them to the function and reduce argument count, at the cost that __clear_extent_bit has to be exported. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We take the fs_devices::device_list_mutex mutex in write_all_supers which will prevent any add/del changes to the device list. Therefore we don't need to use the RCU variant list_for_each_entry_rcu in any of the called functions. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We don't need to use the mutex as we do not modify the devices nor the list itself and just read information about device counts. Move copying fsid out of the protected section, not applicable to RCU same as the rest of the retrieved information. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
We don't need to use the mutex as we do not modify the devices nor the list itself and just read some information: does not change during device lifetime: - devid - uuid - name (ie. the path) may change in parallel to the ioctl call, but can lead only to reporting inacurracy: - bytes_used - total_bytes Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Split the conditions a bit. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Overview of the main locks protecting various device-related structures. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Reviewed-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Reviewed-by: NAnand Jain <anand.jain@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
A helper to free a device and all it's dynamically allocated members, like the rcu_string name or flush_bio. This is going to replace all open coded places. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 David Sterba 提交于
Make it clear that it is an RCU helper, we want to use the name free_device for a wrapper freeing all device members. Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
These rules have been hidden in several if-else and are not straightforward to follow, for example, dio submit hook's nocsum case has a bug , i.e. doing async submit instead of sync submit, which has been fixed recently. This is documenting the rules for reference. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
After commit 996478ca ("btrfs: change how we decide to commit transactions during flushing") there is no need to hold the delayed_rsv during the percpu_counter_compare call since we get the byte's snapshot earlier. So hold the lock only while reading delayed_rsv. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
Adding __init macro gives kernel a hint that this function is only used during the initialization phase and its memory resources can be freed up after. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
Function btrfs_add_device() is adding the device item so rename to reflect that in the function. Similarly we have btrfs_rm_dev_item(). Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Qu Wenruo 提交于
btrfs_create_tree() will unconditionally generate UUID for any root. So for quota tree and data reloc tree created by kernel, they will have unique UUIDs. However UUID in root item is only referred by UUID tree, which only records UUID for fs trees. This makes unique UUIDs for quota/data reloc tree meaningless. Leave the UUID as zero for non-fs tree, making btrfs-debug-tree output less confusing. Reported-by: NMisono Tomohiro <misono.tomohiro@jp.fujitsu.com> Signed-off-by: NQu Wenruo <wqu@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
A cleanup patch no functional change, we hold volume_mutex before calling btrfs_rm_device, so move it into the function itself. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Right before we go into this loop locked_end is set to alloc_end - 1 and is being used in nearby functions, no need to have exceptions. This just makes the code consistent, no functional changes. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Fallocating a file in btrfs goes through several stages. The one before actually inserting the fallocated extents is to create a qgroup reservation, covering the desired range. To this end there is a loop in btrfs_fallocate which checks to see if there are holes in the fallocated range or !PREALLOC extents past EOF and if so create qgroup reservations for them. Unfortunately, the main condition of the loop is burried right at the end of its body rather than in the actual while statement which makes it non-obvious. Fix this by moving the condition in the while statement where it belongs. No functional changes. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
It was introduced because btrfs used to do blkdev_put in a deferred work, now that btrfs has blkdev_put in place, this rcu_barrier can be removed. modprobe -r btrfs will do btrfs_cleanup_fs_uuids(), where it cleanup every %fs_devices on the list, but when we do btrfs_close_devices(), we have replaced the devices on the list with dummy ones which only have the same name and uuid, so modprobe -r btrfs will free those instead of what we were using, this change won't cause a problem for it. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ copied 2nd paragraph from mailinglist discussion ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
btrfs_balance_delayed_items is the sole caller of btrfs_wq_run_delayed_node and already includes one of the checks whether the delayed inodes should be run. On the other hand btrfs_wq_run_delayed_node duplicates that check and performs an additional one for wq congestion. Let's remove the duplicate check and move the congestion one in btrfs_balance_delayed_items, leaving btrfs_wq_run_delayed_node to only care about setting up the wq run. No functional changes. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Currently btrfs_async_run_delayed_root's implementation uses 3 goto labels to mimic the functionality of a simple do {} while loop. Refactor the function to use a do {} while construct, making intention clear and code easier to follow. No functional changes. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
The following callpath is always invoked with mirror_num set to 0, so let's remove it as an argument and directly pass 0 to __do_redpage. No functional change. extent_readpages __extent_readpages __do_contiguous_readpages __do_readpage Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
It's sole callsite was removed in a previous patch so just nuke it for good. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
As per atomic_t.txt documentation : - RMW operations that have a return value are fully ordered; atomic_xchg is one such operation so it already includes everything it needs w.r.t memory ordering and add a comment to be more explicit about that. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Commit addc3fa7 ("Btrfs: Fix the problem that the dirty flag of dev stats is cleared") reworked the way device stats changes are tracked. A new atomic dev_stats_ccnt counter was introduced which is incremented every time any of the device stats counters are changed. This serves as a flag whether there are any pending stats changes. However, this patch only partially implemented the correct memory barriers necessary: - It only ordered the stores to the counters but not the reads e.g. btrfs_run_dev_stats - It completely omitted any comments documenting the intended design and how the memory barriers pair with each-other This patch provides the necessary comments as well as adds a missing smp_rmb in btrfs_run_dev_stats. Furthermore since dev_stats_cnt is only a snapshot at best there was no point in reading the counter twice - once in btrfs_dev_stats_dirty and then again when assigning stats_cnt. Just collapse both reads into 1. Fixes: addc3fa7 ("Btrfs: Fix the problem that the dirty flag of dev stats is cleared") Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
btrfs_end_bio() is using btrfs_dev_stat_inc() and then btrfs_dev_stat_print_on_error() separately instead use btrfs_dev_stat_inc_and_print() directly. As of now there isn't any bio in btrfs which is - a non-empty write and also the REQ_PREFLUSH flag is set. So in actual the condition if (bio->bi_opf & REQ_PREFLUSH) is never true in btrfs_end_bio(), and so there won't be any redundant error log by using btrfs_dev_stat_inc_and_print() separately one for write and another for flush. This consolidation will help to add the device critical error handles in the function btrfs_dev_stat_inc_and_print() and which can be renamed as needed. Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
It's pointless to defer it to a kthread helper as we're not under a special context. For reference, commit 1f78160c ("Btrfs: using rcu lock in the reader side of devices list") introduced RCU freeing for device structures. Originally the blkdev_put was called from free_device and rcu_barrier had to be called. This is no longer required, bdev and our device structures are now freed separately. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ enhance changelog ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Liu Bo 提交于
In functions like btrfs_create(), we run both btrfs_balance_delayed_items() and btrfs_btree_balance_dirty() after the operation, but btrfs_btree_balance_dirty() is surely going to run btrfs_balance_delayed_items(). This keeps only btrfs_btree_balance_dirty(). Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com> Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 20 1月, 2018 1 次提交
-
-
由 Alexey Dobriyan 提交于
do_task_stat() accesses IP and SP of a task without bumping reference count of a stack (which became an entity with independent lifetime at some point). Steps to reproduce: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/time.h> #include <sys/resource.h> #include <unistd.h> #include <sys/wait.h> int main(void) { setrlimit(RLIMIT_CORE, &(struct rlimit){}); while (1) { char buf[64]; char buf2[4096]; pid_t pid; int fd; pid = fork(); if (pid == 0) { *(volatile int *)0 = 0; } snprintf(buf, sizeof(buf), "/proc/%u/stat", pid); fd = open(buf, O_RDONLY); read(fd, buf2, sizeof(buf2)); close(fd); waitpid(pid, NULL, 0); } return 0; } BUG: unable to handle kernel paging request at 0000000000003fd8 IP: do_task_stat+0x8b4/0xaf0 PGD 800000003d73e067 P4D 800000003d73e067 PUD 3d558067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 1417 Comm: a.out Not tainted 4.15.0-rc8-dirty #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc27 04/01/2014 RIP: 0010:do_task_stat+0x8b4/0xaf0 Call Trace: proc_single_show+0x43/0x70 seq_read+0xe6/0x3b0 __vfs_read+0x1e/0x120 vfs_read+0x84/0x110 SyS_read+0x3d/0xa0 entry_SYSCALL_64_fastpath+0x13/0x6c RIP: 0033:0x7f4d7928cba0 RSP: 002b:00007ffddb245158 EFLAGS: 00000246 Code: 03 b7 a0 01 00 00 4c 8b 4c 24 70 4c 8b 44 24 78 4c 89 74 24 18 e9 91 f9 ff ff f6 45 4d 02 0f 84 fd f7 ff ff 48 8b 45 40 48 89 ef <48> 8b 80 d8 3f 00 00 48 89 44 24 20 e8 9b 97 eb ff 48 89 44 24 RIP: do_task_stat+0x8b4/0xaf0 RSP: ffffc90000607cc8 CR2: 0000000000003fd8 John Ogness said: for my tests I added an else case to verify that the race is hit and correctly mitigated. Link: http://lkml.kernel.org/r/20180116175054.GA11513@avx2Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Reported-by: N"Kohli, Gaurav" <gkohli@codeaurora.org> Tested-by: NJohn Ogness <john.ogness@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 1月, 2018 1 次提交
-
-
由 Andrea Arcangeli 提交于
The previous fix in commit 384632e6 ("userfaultfd: non-cooperative: fix fork use after free") corrected the refcounting in case of UFFD_EVENT_FORK failure for the fork userfault paths. That still didn't clear the vma->vm_userfaultfd_ctx of the vmas that were set to point to the aborted new uffd ctx earlier in dup_userfaultfd. Link: http://lkml.kernel.org/r/20171223002505.593-2-aarcange@redhat.comSigned-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Reviewed-by: NMike Rapoport <rppt@linux.vnet.ibm.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 1月, 2018 1 次提交
-
-
由 Kees Cook 提交于
This is a logical revert of commit e37fdb78 ("exec: Use secureexec for setting dumpability") This weakens dumpability back to checking only for uid/gid changes in current (which is useless), but userspace depends on dumpability not being tied to secureexec. https://bugzilla.redhat.com/show_bug.cgi?id=1528633Reported-by: NTom Horsley <horsley1953@gmail.com> Fixes: e37fdb78 ("exec: Use secureexec for setting dumpability") Cc: stable@vger.kernel.org Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 1月, 2018 4 次提交
-
-
由 Darrick J. Wong 提交于
Fix some integer overflow problems if offset + count happen to be large enough to cause an integer overflow. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Aliaksei Karaliou 提交于
xfs_qm_init_quotainfo() does not check result of register_shrinker() which was tagged as __must_check recently, reported by sparse. Signed-off-by: NAliaksei Karaliou <akaraliou.dev@gmail.com> [darrick: move xfs_qm_destroy_quotainos nearer xfs_qm_init_quotainos] Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Aliaksei Karaliou 提交于
xfs_qm_destroy_quotainfo() does not destroy quotainfo->qi_tree_lock while destroys quotainfo->qi_quotaofflock. Signed-off-by: NAliaksei Karaliou <akaraliou.dev@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Chris Mason 提交于
refcounts have a generic implementation and an asm optimized one. The generic version has extra debugging to make sure that once a refcount goes to zero, refcount_inc won't increase it. The btrfs delayed inode code wasn't expecting this, and we're tripping over the warnings when the generic refcounts are used. We ended up with this race: Process A Process B btrfs_get_delayed_node() spin_lock(root->inode_lock) radix_tree_lookup() __btrfs_release_delayed_node() refcount_dec_and_test(&delayed_node->refs) our refcount is now zero refcount_add(2) <--- warning here, refcount unchanged spin_lock(root->inode_lock) radix_tree_delete() With the generic refcounts, we actually warn again when process B above tries to release his refcount because refcount_add() turned into a no-op. We saw this in production on older kernels without the asm optimized refcounts. The fix used here is to use refcount_inc_not_zero() to detect when the object is in the middle of being freed and return NULL. This is almost always the right answer anyway, since we usually end up pitching the delayed_node if it didn't have fresh data in it. This also changes __btrfs_release_delayed_node() to remove the extra check for zero refcounts before radix tree deletion. btrfs_get_delayed_node() was the only path that was allowing refcounts to go from zero to one. Fixes: 6de5f18e ("btrfs: fix refcount_t usage when deleting btrfs_delayed_node") CC: <stable@vger.kernel.org> # 4.12+ Signed-off-by: NChris Mason <clm@fb.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-