- 11 4月, 2018 7 次提交
-
-
由 Trond Myklebust 提交于
Move the delegation recall out of the generic code, and into the NFSv4 specific callback. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
Move the delegation return out of generic code and down into the NFSv4 specific unlink code. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
Move the delegation return out of generic code. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Trond Myklebust 提交于
The 'fmode' argument can take an FMODE_EXEC value, which we want to filter out before comparing to the delegation type. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Jeff Layton 提交于
If we have a RECLAIM_COMPLETE with a populated cl_lock_waitq, then that implies that a reconnect has occurred. Since we can't expect a CB_NOTIFY_LOCK callback at that point, just wake up the entire queue so that all the tasks can re-poll for their locks. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Jeff Layton 提交于
The task is expected to sleep for a while here, and it's possible that a new EXCHANGE_ID has occurred in the interim, and we were assigned a new clientid. Since this is a per-client list, there isn't a lot of value in vetting the clientid on the incoming request. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Jeff Layton 提交于
We may get a notification and lose the race to another client. Ensure that we wait again for a notification in that case. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 01 3月, 2018 9 次提交
-
-
由 Chengguang Xu 提交于
There is lack of cache destroy operation for ceph_file_cachep when failing from fscache register. Signed-off-by: NChengguang Xu <cgxu519@icloud.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Filipe Manana 提交于
If we have a file with 2 (or more) hard links in the same directory, remove one of the hard links, create a new file (or link an existing file) in the same directory with the name of the removed hard link, and then finally fsync the new file, we end up with a log that fails to replay, causing a mount failure. Example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/testdir $ touch /mnt/testdir/foo $ ln /mnt/testdir/foo /mnt/testdir/bar $ sync $ unlink /mnt/testdir/bar $ touch /mnt/testdir/bar $ xfs_io -c "fsync" /mnt/testdir/bar <power failure> $ mount /dev/sdb /mnt mount: mount(2) failed: /mnt: No such file or directory When replaying the log, for that example, we also see the following in dmesg/syslog: [71813.671307] BTRFS info (device dm-0): failed to delete reference to bar, inode 258 parent 257 [71813.674204] ------------[ cut here ]------------ [71813.675694] BTRFS: Transaction aborted (error -2) [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 __btrfs_unlink_inode+0x17b/0x355 [btrfs] [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last unloaded: btrfs] [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G W 4.15.0-rc9-btrfs-next-56+ #1 [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs] [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286 [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 0000000000000001 [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 00000000ffffffff [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 0000000000000001 [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: ffff880217d474c8 [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 0000000000000101 [71813.679669] FS: 00007f7cee84c480(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000 [71813.679669] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 00000000001606e0 [71813.679669] Call Trace: [71813.679669] btrfs_unlink_inode+0x17/0x41 [btrfs] [71813.679669] drop_one_dir_item+0xfa/0x131 [btrfs] [71813.679669] add_inode_ref+0x71e/0x851 [btrfs] [71813.679669] ? __lock_is_held+0x39/0x71 [71813.679669] ? replay_one_buffer+0x53/0x53a [btrfs] [71813.679669] replay_one_buffer+0x4a4/0x53a [btrfs] [71813.679669] ? rcu_read_unlock+0x3a/0x57 [71813.679669] ? __lock_is_held+0x39/0x71 [71813.679669] walk_up_log_tree+0x101/0x1d2 [btrfs] [71813.679669] walk_log_tree+0xad/0x188 [btrfs] [71813.679669] btrfs_recover_log_trees+0x1fa/0x31e [btrfs] [71813.679669] ? replay_one_extent+0x544/0x544 [btrfs] [71813.679669] open_ctree+0x1cf6/0x2209 [btrfs] [71813.679669] btrfs_mount_root+0x368/0x482 [btrfs] [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 [71813.679669] ? __lockdep_init_map+0x176/0x1c2 [71813.679669] ? mount_fs+0x64/0x10b [71813.679669] mount_fs+0x64/0x10b [71813.679669] vfs_kern_mount+0x68/0xce [71813.679669] btrfs_mount+0x13e/0x772 [btrfs] [71813.679669] ? trace_hardirqs_on_caller+0x14c/0x1a6 [71813.679669] ? __lockdep_init_map+0x176/0x1c2 [71813.679669] ? mount_fs+0x64/0x10b [71813.679669] mount_fs+0x64/0x10b [71813.679669] vfs_kern_mount+0x68/0xce [71813.679669] do_mount+0x6e5/0x973 [71813.679669] ? memdup_user+0x3e/0x5c [71813.679669] SyS_mount+0x72/0x98 [71813.679669] entry_SYSCALL_64_fastpath+0x1e/0x8b [71813.679669] RIP: 0033:0x7f7cedf150ba [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206 [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c [71813.679669] ---[ end trace 83bd473fc5b4663b ]--- [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: errno=-2 No such entry [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: errno=-2 No such entry (Failed to recover log tree) [71813.903357] BTRFS error (device dm-0): cleaner transaction attach returned -30 [71814.128078] BTRFS error (device dm-0): open_ctree failed This happens because the log has inode reference items for both inode 258 (the first file we created) and inode 259 (the second file created), and when processing the reference item for inode 258, we replace the corresponding item in the subvolume tree (which has two names, "foo" and "bar") witht he one in the log (which only has one name, "foo") without removing the corresponding dir index keys from the parent directory. Later, when processing the inode reference item for inode 259, which has a name of "bar" associated to it, we notice that dir index entries exist for that name and for a different inode, so we attempt to unlink that name, which fails because the inode reference item for inode 258 no longer has the name "bar" associated to it, making a call to btrfs_unlink_inode() fail with a -ENOENT error. Fix this by unlinking all the names in an inode reference item from a subvolume tree that are not present in the inode reference item found in the log tree, before overwriting it with the item from the log tree. Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Filipe Manana 提交于
If in the same transaction we rename a special file (fifo, character/block device or symbolic link), create a hard link for it having its old name then sync the log, we will end up with a log that can not be replayed and at when attempting to replay it, an EEXIST error is returned and mounting the filesystem fails. Example scenario: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt $ mkdir /mnt/testdir $ mkfifo /mnt/testdir/foo # Make sure everything done so far is durably persisted. $ sync # Create some unrelated file and fsync it, this is just to create a log # tree. The file must be in the same directory as our special file. $ touch /mnt/testdir/f1 $ xfs_io -c "fsync" /mnt/testdir/f1 # Rename our special file and then create a hard link with its old name. $ mv /mnt/testdir/foo /mnt/testdir/bar $ ln /mnt/testdir/bar /mnt/testdir/foo # Create some other unrelated file and fsync it, this is just to persist # the log tree which was modified by the previous rename and link # operations. Alternatively we could have modified file f1 and fsync it. $ touch /mnt/f2 $ xfs_io -c "fsync" /mnt/f2 <power failure> $ mount /dev/sdc /mnt mount: mount /dev/sdc on /mnt failed: File exists This happens because when both the log tree and the subvolume's tree have an entry in the directory "testdir" with the same name, that is, there is one key (258 INODE_REF 257) in the subvolume tree and another one in the log tree (where 258 is the inode number of our special file and 257 is the inode for directory "testdir"). Only the data of those two keys differs, in the subvolume tree the index field for inode reference has a value of 3 while the log tree it has a value of 5. Because the same key exists in both trees, but have different index, the log replay fails with an -EEXIST error when attempting to replay the inode reference from the log tree. Fix this by setting the last_unlink_trans field of the inode (our special file) to the current transaction id when a hard link is created, as this forces logging the parent directory inode, solving the conflict at log replay time. A new generic test case for fstests was also submitted. Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Filipe Manana 提交于
When doing an incremental send of a filesystem with the no-holes feature enabled, we end up issuing a write operation when using the no data mode send flag, instead of issuing an update extent operation. Fix this by issuing the update extent operation instead. Trivial reproducer: $ mkfs.btrfs -f -O no-holes /dev/sdc $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdc /mnt/sdc $ mount /dev/sdd /mnt/sdd $ xfs_io -f -c "pwrite -S 0xab 0 32K" /mnt/sdc/foobar $ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap1 $ xfs_io -c "fpunch 8K 8K" /mnt/sdc/foobar $ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap2 $ btrfs send /mnt/sdc/snap1 | btrfs receive /mnt/sdd $ btrfs send --no-data -p /mnt/sdc/snap1 /mnt/sdc/snap2 \ | btrfs receive -vv /mnt/sdd Before this change the output of the second receive command is: receiving snapshot snap2 uuid=f6922049-8c22-e544-9ff9-fc6755918447... utimes write foobar, offset 8192, len 8192 utimes foobar BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=f6922049-8c22-e544-9ff9-... After this change it is: receiving snapshot snap2 uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64... utimes update_extent foobar: offset=8192, len=8192 utimes foobar BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64... Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Anand Jain 提交于
The fs_info::super_copy is a byte copy of the on-disk structure and all members must use the accessor macros/functions to obtain the right value. This was missing in update_super_roots and in sysfs readers. Moving between opposite endianness hosts will report bogus numbers in sysfs, and mount may fail as the root will not be restored correctly. If the filesystem is always used on a same endian host, this will not be a problem. Fix this by using the btrfs_set_super...() functions to set fs_info::super_copy values, and for the sysfs, use the cached fs_info::nodesize/sectorsize values. CC: stable@vger.kernel.org Fixes: df93589a ("btrfs: export more from FS_INFO to sysfs") Signed-off-by: NAnand Jain <anand.jain@oracle.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Hans van Kranenburg 提交于
In case of using DUP, we search for enough unallocated disk space on a device to hold two stripes. The devices_info[ndevs-1].max_avail that holds the amount of unallocated space found is directly assigned to stripe_size, while it's actually twice the stripe size. Later on in the code, an unconditional division of stripe_size by dev_stripes corrects the value, but in the meantime there's a check to see if the stripe_size does not exceed max_chunk_size. Since during this check stripe_size is twice the amount as intended, the check will reduce the stripe_size to max_chunk_size if the actual correct to be used stripe_size is more than half the amount of max_chunk_size. The unconditional division later tries to correct stripe_size, but will actually make sure we can't allocate more than half the max_chunk_size. Fix this by moving the division by dev_stripes before the max chunk size check, so it always contains the right value, instead of putting a duct tape division in further on to get it fixed again. Since in all other cases than DUP, dev_stripes is 1, this change only affects DUP. Other attempts in the past were made to fix this: * 37db63a4 "Btrfs: fix max chunk size check in chunk allocator" tried to fix the same problem, but still resulted in part of the code acting on a wrongly doubled stripe_size value. * 86db2578 "Btrfs: fix max chunk size on raid5/6" unintentionally broke this fix again. The real problem was already introduced with the rest of the code in 73c5de00. The user visible result however will be that the max chunk size for DUP will suddenly double, while it's actually acting according to the limits in the code again like it was 5 years ago. Reported-by: NNaohiro Aota <naohiro.aota@wdc.com> Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html Fixes: 73c5de00 ("btrfs: quasi-round-robin for chunk allocation") Fixes: 86db2578 ("Btrfs: fix max chunk size on raid5/6") Signed-off-by: NHans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> [ update comment ] Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
Essentially duplicate the error handling from the above block which handles the !PageUptodate(page) case and additionally clear EXTENT_BOUNDARY. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Nikolay Borisov 提交于
add_pending_csums was added as part of the new data=ordered implementation in e6dcd2dc ("Btrfs: New data=ordered implementation"). Even back then it called the btrfs_csum_file_blocks which can fail but it never bothered handling the failure. In ENOMEM situation this could lead to the filesystem failing to write the checksums for a particular extent and not detect this. On read this could lead to the filesystem erroring out due to crc mismatch. Fix it by propagating failure from add_pending_csums and handling them. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NJosef Bacik <jbacik@fb.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Jeff Mahoney 提交于
The srcu_struct in btrfs_fs_info scales in size with NR_CPUS. On kernels built with NR_CPUS=8192, this can result in kmalloc failures that prevent mounting. There is work in progress to try to resolve this for every user of srcu_struct but using kvzalloc will work around the failures until that is complete. As an example with NR_CPUS=512 on x86_64: the overall size of subvol_srcu is 3460 bytes, fs_info is 6496. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
- 27 2月, 2018 5 次提交
-
-
由 Chengguang Xu 提交于
When specifying string type mount option (e.g., logdev) several times in a mount, current option parsing may cause memory leak. Hence, call kfree for previous one in this case. Signed-off-by: NChengguang Xu <cgxu519@icloud.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Jan Kara 提交于
When blkdev_open() races with device removal and creation it can happen that unhashed bdev inode gets associated with newly created gendisk like: CPU0 CPU1 blkdev_open() bdev = bd_acquire() del_gendisk() bdev_unhash_inode(bdev); remove device create new device with the same number __blkdev_get() disk = get_gendisk() - gets reference to gendisk of the new device Now another blkdev_open() will not find original 'bdev' as it got unhashed, create a new one and associate it with the same 'disk' at which point problems start as we have two independent page caches for one device. Fix the problem by verifying that the bdev inode didn't get unhashed before we acquired gendisk reference. That way we make sure gendisk can get associated only with visible bdev inodes. Tested-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jan Kara 提交于
When two blkdev_open() calls race with device removal and recreation, __blkdev_get() can use looked up gendisk after it is freed: CPU0 CPU1 CPU2 del_gendisk(disk); bdev_unhash_inode(inode); blkdev_open() blkdev_open() bdev = bd_acquire(inode); - creates and returns new inode bdev = bd_acquire(inode); - returns the same inode __blkdev_get(devt) __blkdev_get(devt) disk = get_gendisk(devt); - got structure of device going away <finish device removal> <new device gets created under the same device number> disk = get_gendisk(devt); - got new device structure if (!bdev->bd_openers) { does the first open } if (!bdev->bd_openers) - false } else { put_disk_and_module(disk) - remember this was old device - this was last ref and disk is now freed } disk_unblock_events(disk); -> oops Fix the problem by making sure we drop reference to disk in __blkdev_get() only after we are really done with it. Reported-by: NHou Tao <houtao1@huawei.com> Tested-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jan Kara 提交于
Add a proper counterpart to get_disk_and_module() - put_disk_and_module(). Currently it is opencoded in several places. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jan Kara 提交于
Commit e864f395 "fs: add RWF_DSYNC aand RWF_SYNC" added additional way for direct IO to become synchronous and thus trigger fsync from the IO completion handler. Then commit 9830f4be "fs: Use RWF_* flags for AIO operations" allowed these flags to be set for AIO as well. However that commit forgot to update the condition checking whether the IO completion handling should be defered to a workqueue and thus AIO DIO with RWF_[D]SYNC set will call fsync() from IRQ context resulting in sleep in atomic. Fix the problem by checking directly iocb flags (the same way as it is done in dio_complete()) instead of checking all conditions that could lead to IO being synchronous. CC: Christoph Hellwig <hch@lst.de> CC: Goldwyn Rodrigues <rgoldwyn@suse.com> CC: stable@vger.kernel.org Reported-by: NMark Rutland <mark.rutland@arm.com> Tested-by: NMark Rutland <mark.rutland@arm.com> Fixes: 9830f4beSigned-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 2月, 2018 3 次提交
-
-
由 Chengguang Xu 提交于
When failing from ceph_fs_debugfs_init() in ceph_real_mount(), there is lack of dput of root_dentry and it causes slab errors, so change the calling order of ceph_fs_debugfs_init() and open_root_dentry() and do some cleanups to avoid this issue. Signed-off-by: NChengguang Xu <cgxu519@icloud.com> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Chengguang Xu 提交于
When parsing string option, in order to avoid memory leak we need to carefully free it first in case of specifying same option several times. Signed-off-by: NChengguang Xu <cgxu519@icloud.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Zhi Zhang 提交于
Client should release unlinked inode from its cache ASAP. But client can't release inode with dirty caps. Link: http://tracker.ceph.com/issues/22886Signed-off-by: NZhi Zhang <zhang.david2011@gmail.com> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 23 2月, 2018 6 次提交
-
-
由 Darrick J. Wong 提交于
During log recovery, the per-AG reservations aren't yet set up, so log recovery has to reserve enough blocks to handle all possible btree splits. Reported-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Eric Sandeen 提交于
Apparently different gcc versions have competing and incompatible notions of how to initialize at declaration, so just give up and fall back to the time-tested memset(). Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Randy Dunlap 提交于
Fix build error in fs/signalfd.c by using same method that is used in kernel/signal.c: separate blocks for different signal si_code values. ./fs/signalfd.c: error: 'BUS_MCEERR_AR' undeclared (first use in this function) Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Luck, Tony 提交于
Each read from a file in efivarfs results in two calls to EFI (one to get the file size, another to get the actual data). On X86 these EFI calls result in broadcast system management interrupts (SMI) which affect performance of the whole system. A malicious user can loop performing reads from efivarfs bringing the system to its knees. Linus suggested per-user rate limit to solve this. So we add a ratelimit structure to "user_struct" and initialize it for the root user for no limit. When allocating user_struct for other users we set the limit to 100 per second. This could be used for other places that want to limit the rate of some detrimental user action. In efivarfs if the limit is exceeded when reading, we take an interruptible nap for 50ms and check the rate limit again. Signed-off-by: NTony Luck <tony.luck@intel.com> Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Colin Ian King 提交于
The structure nlmclnt_fl_close_lock_ops s local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: fs/nfs/nfs3proc.c:876:33: warning: symbol 'nlmclnt_fl_close_lock_ops' was not declared. Should it be static? Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Bill.Baker@oracle.com 提交于
nfs4_update_server unconditionally releases the nfs_client for the source server. If migration fails, this can cause the source server's nfs_client struct to be left with a low reference count, resulting in use-after-free. Also, adjust reference count handling for ELOOP. NFS: state manager: migration failed on NFSv4 server nfsvmu10 with error 6 WARNING: CPU: 16 PID: 17960 at fs/nfs/client.c:281 nfs_put_client+0xfa/0x110 [nfs]() nfs_put_client+0xfa/0x110 [nfs] nfs4_run_state_manager+0x30/0x40 [nfsv4] kthread+0xd8/0xf0 BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8 nfs4_xdr_enc_write+0x6b/0x160 [nfsv4] rpcauth_wrap_req+0xac/0xf0 [sunrpc] call_transmit+0x18c/0x2c0 [sunrpc] __rpc_execute+0xa6/0x490 [sunrpc] rpc_async_schedule+0x15/0x20 [sunrpc] process_one_work+0x160/0x470 worker_thread+0x112/0x540 ? rescuer_thread+0x3f0/0x3f0 kthread+0xd8/0xf0 This bug was introduced by 32e62b7c ("NFS: Add nfs4_update_server"), but the fix applies cleanly to 52442f9b ("NFS4: Avoid migration loops") Reported-by: NHelen Chao <helen.chao@oracle.com> Fixes: 52442f9b ("NFS4: Avoid migration loops") Signed-off-by: NBill Baker <bill.baker@oracle.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 22 2月, 2018 1 次提交
-
-
由 Trond Myklebust 提交于
Passing a pointer to a unsigned integer to test_bit() is broken. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 14 2月, 2018 1 次提交
-
-
由 Andreas Gruenbacher 提交于
It turns out that commit 3974320c "Implement iomap for block_map" introduced a few bugs that trigger occasional failures with xfstest generic/476: In gfs2_iomap_begin, we jump to do_alloc when we determine that we are beyond the end of the allocated metadata (height > ip->i_height). There, we can end up calling hole_size with a metapath that doesn't match the current metadata tree, which doesn't make sense. After untangling the code at do_alloc, fix this by checking if the block we are looking for is within the range of allocated metadata. In addition, add a BUG() in case gfs2_iomap_begin is accidentally called for reading stuffed files: this is handled separately. Make sure we don't truncate iomap->length for reads beyond the end of the file; in that case, the entire range counts as a hole. Finally, revert to taking a bitmap write lock when doing allocations. It's unclear why that change didn't lead to any failures during testing. Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
- 13 2月, 2018 1 次提交
-
-
由 jia zhang 提交于
Commit: df04abfd ("fs/proc/kcore.c: Add bounce buffer for ktext data") ... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y. However, accessing the vsyscall user page will cause an SMAP fault. Replace memcpy() with copy_from_user() to fix this bug works, but adding a common way to handle this sort of user page may be useful for future. Currently, only vsyscall page requires KCORE_USER. Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com> Reviewed-by: NJiri Olsa <jolsa@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 2月, 2018 1 次提交
-
-
由 Linus Torvalds 提交于
This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 2月, 2018 5 次提交
-
-
由 Nicolas Pitre 提交于
Commit b9f5fb18 ("cramfs: fix MTD dependency") did what it says. Since commit 9059a349 ("kconfig: fix relational operators for bool and tristate symbols") it is possible to do it slightly better though. Signed-off-by: NNicolas Pitre <nico@linaro.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arnd Bergmann 提交于
There is now only one caller left for svcxdr_dupstr() and this is inside of an #ifdef, so we can get a warning when the option is disabled: fs/nfsd/nfs4xdr.c:241:1: error: 'svcxdr_dupstr' defined but not used [-Werror=unused-function] This changes the remaining caller to use a nicer IS_ENABLED() check, which lets the compiler drop the unused code silently. Fixes: e40d99e6183e ("NFSD: Clean up symlink argument XDR decoders") Suggested-by: NRasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Amir Goldstein 提交于
The time values in stat and inode may differ for overlayfs and stat time values are the correct ones to use. This is also consistent with the fact that fill_post_wcc() also stores stat time values. This means introducing a stat call that could fail, where previously we were just copying values out of the inode. To be conservative about changing behavior, we fall back to copying values out of the inode in the error case. It might be better just to clear fh_pre_saved (though note the BUG_ON in set_change_info). Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Amir Goldstein 提交于
The values of stat->mtime and inode->i_mtime may differ for overlayfs and stat->mtime is the correct value to use when encoding getattr. This is also consistent with the fact that other attr times are also encoded from stat values. Both callers of lease_get_mtime() already have the value of stat->mtime, so the only needed change is that lease_get_mtime() will not overwrite this value with inode->i_mtime in case the inode does not have an exclusive lease. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
A client that sends more than a hundred ops in a single compound currently gets an rpc-level GARBAGE_ARGS error. It would be more helpful to return NFS4ERR_RESOURCE, since that gives the client a better idea how to recover (for example by splitting up the compound into smaller compounds). This is all a bit academic since we've never actually seen a reason for clients to send such long compounds, but we may as well fix it. While we're there, just use NFSD4_MAX_OPS_PER_COMPOUND == 16, the constant we already use in the 4.1 case, instead of hard-coding 100. Chances anyone actually uses even 16 ops per compound are small enough that I think there's a neglible risk or any regression. This fixes pynfs test COMP6. Reported-by: N"Lu, Xinyu" <luxy.fnst@cn.fujitsu.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 07 2月, 2018 1 次提交
-
-
由 Steve French 提交于
The last two updates to MS-SMB2 protocol documentation added various flags and structs (especially relating to SMB3.1.1 tree connect). Add missing defines and structs to smb2pdu.h Signed-off-by: NSteve French <smfrench@gmail.com> Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
-