- 07 9月, 2017 10 次提交
-
-
由 Yan, Zheng 提交于
Snapdir inode has no capability. __choose_mds() should choose mds base on capabilities of snapdir's parent inode. Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
In LSSNAP case, req->r_dentry is already set to snapdir dentry. Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
Ensure that when writeback errors are marked that we report those to all file descriptions that were open at the time of the error. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
These flags tell mds if there is pending capsnap explicitly. Without this explicit notification, mds can only conclude if client has pending capsnap. The method mds use is inefficient and error-prone. Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yanhu Cao 提交于
startsync is a no-op, has been for years. Remove it. Link: http://tracker.ceph.com/issues/20604Signed-off-by: NYanhu Cao <gmayyyha@gmail.com> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
OSD has a configurable limitation of max write size. OSD return error if write request size is larger than the limitation. For now, set max write size to CEPH_MSG_MAX_DATA_LEN. It should be small enough. Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
libceph returns -EIO when read size > CEPH_MSG_MAX_DATA_LEN. Link: http://tracker.ceph.com/issues/20528Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 02 9月, 2017 1 次提交
-
-
由 Oleg Nesterov 提交于
The race was introduced by me in commit 971316f0 ("epoll: ep_unregister_pollwait() can use the freed pwq->whead"). I did not realize that nothing can protect eventpoll after ep_poll_callback() sets ->whead = NULL, only whead->lock can save us from the race with ep_free() or ep_remove(). Move ->whead = NULL to the end of ep_poll_callback() and add the necessary barriers. TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even before this patch. Hopefully this explains use-after-free reported by syzcaller: BUG: KASAN: use-after-free in debug_spin_lock_before ... _raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159 ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148 this is spin_lock(eventpoll->lock), ... Freed by task 17774: ... kfree+0xe8/0x2c0 mm/slub.c:3883 ep_free+0x22c/0x2a0 fs/eventpoll.c:865 Fixes: 971316f0 ("epoll: ep_unregister_pollwait() can use the freed pwq->whead") Reported-by: N范龙飞 <long7573@126.com> Cc: stable@vger.kernel.org Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 9月, 2017 4 次提交
-
-
由 Steve French 提交于
When mounting to older servers, such as Windows XP (or even Windows 7), the limited error messages that can be passed back to user space can get confusing since the default dialect has changed from SMB1 (CIFS) to more secure SMB3 dialect. Log additional information when the user chooses to use the default dialects and when the server does not support the dialect requested. Signed-off-by: NSteve French <smfrench@gmail.com> Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com> Acked-by: NPavel Shilovsky <pshilov@microsoft.com>
-
由 Dave Kleikamp 提交于
jfs had previously avoided the use of MAX_LFS_FILESIZE because it hadn't accounted for the whole 32-bit index range on 32-bit systems. That has been fixed by commit 0cc3b0ec ("Clarify (and fix) MAX_LFS_FILESIZE macros"), so we can simplify the code now. Suggested by Andreas Dilger. Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Cc: jfs-discussion@lists.sourceforge.net Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jérôme Glisse 提交于
Replace all mmu_notifier_invalidate_page() calls by *_invalidate_range() and make sure it is bracketed by calls to *_invalidate_range_start()/end(). Note that because we can not presume the pmd value or pte value we have to assume the worst and unconditionaly report an invalidation as happening. Signed-off-by: NJérôme Glisse <jglisse@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Bernhard Held <berny156@gmx.de> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: axie <axie@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yan, Zheng 提交于
ceph_readpage() unlocks page prematurely prematurely in the case that page is reading from fscache. Caller of readpage expects that page is uptodate when it get unlocked. So page shoule get locked by completion callback of fscache_read_or_alloc_pages() Cc: stable@vger.kernel.org # 4.1+, needs backporting for < 4.7 Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 31 8月, 2017 2 次提交
-
-
由 Steve French 提交于
Recent patch had an endian warning ie cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup() Signed-off-by: NSteve French <smfrench@gmail.com> CC: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Acked-by: NPavel Shilovsky <pshilov@microsoft.com>
-
由 Pavel Shilovsky 提交于
Currently the maximum size of SMB2/3 header is set incorrectly which leads to hanging of directory listing operations on encrypted SMB3 connections. Fix this by setting the maximum size to 170 bytes that is calculated as RFC1002 length field size (4) + transform header size (52) + SMB2 header size (64) + create response size (56). Cc: <stable@vger.kernel.org> Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com> Signed-off-by: NSteve French <smfrench@gmail.com> Acked-by: NSachin Prabhu <sprabhu@redhat.com>
-
- 29 8月, 2017 1 次提交
-
-
由 Helge Deller 提交于
Commit 464d6242 ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()") changed the calculation on how many bytes need to be zeroed when userspace handed over a NULL pointer for a fdset array in the select syscall. The calculation was changed in compat_get_fd_set() wrongly from memset(fdset, 0, ((nr + 1) & ~1)*sizeof(compat_ulong_t)); to memset(fdset, 0, ALIGN(nr, BITS_PER_LONG)); The ALIGN(nr, BITS_PER_LONG) calculates the number of _bits_ which need to be zeroed in the target fdset array (rounded up to the next full bits for an unsigned long). But the memset() call expects the number of _bytes_ to be zeroed. This leads to clearing more memory than wanted (on the stack area or even at kmalloc()ed memory areas) and to random kernel crashes as we have seen them on the parisc platform. The correct change should have been memset(fdset, 0, (ALIGN(nr, BITS_PER_LONG) / BITS_PER_LONG) * BYTES_PER_LONG); which is the same as can be archieved with a call to zero_fd_set(nr, fdset). Fixes: 464d6242 ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()" Acked-by: N: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NHelge Deller <deller@gmx.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 8月, 2017 1 次提交
-
-
由 Ross Zwisler 提交于
In DAX there are two separate places where the 2MiB range of a PMD is defined. The first is in the page tables, where a PMD mapping inserted for a given address spans from (vmf->address & PMD_MASK) to ((vmf->address & PMD_MASK) + PMD_SIZE - 1). That is, from the 2MiB boundary below the address to the 2MiB boundary above the address. So, for example, a fault at address 3MiB (0x30 0000) falls within the PMD that ranges from 2MiB (0x20 0000) to 4MiB (0x40 0000). The second PMD range is in the mapping->page_tree, where a given file offset is covered by a radix tree entry that spans from one 2MiB aligned file offset to another 2MiB aligned file offset. So, for example, the file offset for 3MiB (pgoff 768) falls within the PMD range for the order 9 radix tree entry that ranges from 2MiB (pgoff 512) to 4MiB (pgoff 1024). This system works so long as the addresses and file offsets for a given mapping both have the same offsets relative to the start of each PMD. Consider the case where the starting address for a given file isn't 2MiB aligned - say our faulting address is 3 MiB (0x30 0000), but that corresponds to the beginning of our file (pgoff 0). Now all the PMDs in the mapping are misaligned so that the 2MiB range defined in the page tables never matches up with the 2MiB range defined in the radix tree. The current code notices this case for DAX faults to storage with the following test in dax_pmd_insert_mapping(): if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR) goto unlock_fallback; This test makes sure that the pfn we get from the driver is 2MiB aligned, and relies on the assumption that the 2MiB alignment of the pfn we get back from the driver matches the 2MiB alignment of the faulting address. However, faults to holes were not checked and we could hit the problem described above. This was reported in response to the NVML nvml/src/test/pmempool_sync TEST5: $ cd nvml/src/test/pmempool_sync $ make TEST5 You can grab NVML here: https://github.com/pmem/nvml/ The dmesg warning you see when you hit this error is: WARNING: CPU: 13 PID: 2900 at fs/dax.c:641 dax_insert_mapping_entry+0x2df/0x310 Where we notice in dax_insert_mapping_entry() that the radix tree entry we are about to replace doesn't match the locked entry that we had previously inserted into the tree. This happens because the initial insertion was done in grab_mapping_entry() using a pgoff calculated from the faulting address (vmf->address), and the replacement in dax_pmd_load_hole() => dax_insert_mapping_entry() is done using vmf->pgoff. In our failure case those two page offsets (one calculated from vmf->address, one using vmf->pgoff) point to different order 9 radix tree entries. This failure case can result in a deadlock because the radix tree unlock also happens on the pgoff calculated from vmf->address. This means that the locked radix tree entry that we swapped in to the tree in dax_insert_mapping_entry() using vmf->pgoff is never unlocked, so all future faults to that 2MiB range will block forever. Fix this by validating that the faulting address's PMD offset matches the PMD offset from the start of the file. This check is done at the very beginning of the fault and covers faults that would have mapped to storage as well as faults to holes. I left the COLOUR check in dax_pmd_insert_mapping() in place in case we ever hit the insanity condition where the alignment of the pfn we get from the driver doesn't match the alignment of the userspace address. Link: http://lkml.kernel.org/r/20170822222436.18926-1-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Reported-by: N"Slusarz, Marcin" <marcin.slusarz@intel.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 8月, 2017 2 次提交
-
-
由 Chuck Lever 提交于
When processing an NFSv4 WRITE operation, argp->end should never point past the end of the data in the final page of the page list. Otherwise, nfsd4_decode_compound can walk into uninitialized memory. More critical, nfsd4_decode_write is failing to increment argp->pagelen when it increments argp->pagelist. This can cause later xdr decoders to assume more data is available than really is, which can cause server crashes on malformed requests. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Eric W. Biederman 提交于
The implementation of TIOCGPTPEER has two issues. When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened the wrong vfsmount is passed to dentry_open. Which results in the kernel displaying the wrong pathname for the peer. The second is simply by caching the vfsmount and dentry of the peer it leaves them open, in a way they were not previously Which because of the inreased reference counts can cause unnecessary behaviour differences resulting in regressions. To fix these move the ioctl into tty_io.c at a generic level allowing the ioctl to have access to the struct file on which the ioctl is being called. This allows the path of the slave to be derived when opening the slave through TIOCGPTPEER instead of requiring the path to the slave be cached. Thus removing the need for caching the path. A new function devpts_ptmx_path is factored out of devpts_acquire and used to implement a function devpts_mntget. The new function devpts_mntget takes a filp to perform the lookup on and fsi so that it can confirm that the superblock that is found by devpts_ptmx_path is the proper superblock. v2: Lots of fixes to make the code actually work v3: Suggestions by Linus - Removed the unnecessary initialization of filp in ptm_open_peer - Simplified devpts_ptmx_path as gotos are no longer required [ This is the fix for the issue that was reverted in commit 143c97cc, but this time without breaking 'pbuilder' due to increased reference counts - Linus ] Fixes: 54ebbfb1 ("tty: add TIOCGPTPEER ioctl") Reported-by: NChristian Brauner <christian.brauner@canonical.com> Reported-and-tested-by: NStefan Lippers-Hollmann <s.l-h@gmx.de> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 8月, 2017 4 次提交
-
-
由 Omar Sandoval 提交于
This fixes several instances of blk_status_t and bare errno ints being mixed up, some of which are real bugs. In the normal case, 0 matches BLK_STS_OK, so we don't observe any effects of the missing conversion, but in case of errors or passes through the repair/retry paths, the errors get mixed up. The changes were identified using 'sparse', we don't have reports of the buggy behaviour. Fixes: 4e4cbee9 ("block: switch bios to blk_status_t") Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com>
-
由 Linus Torvalds 提交于
This reverts commit c8c03f18. It turns out that while fixing the ptmx file descriptor to have the correct 'struct path' to the associated slave pty is a really good thing, it breaks some user space tools for a very annoying reason. The problem is that /dev/ptmx and its associated slave pty (/dev/pts/X) are on different mounts. That was what caused us to have the wrong path in the first place (we would mix up the vfsmount of the 'ptmx' node, with the dentry of the pty slave node), but it also means that now while we use the right vfsmount, having the pty master open also keeps the pts mount busy. And it turn sout that that makes 'pbuilder' very unhappy, as noted by Stefan Lippers-Hollmann: "This patch introduces a regression for me when using pbuilder 0.228.7[2] (a helper to build Debian packages in a chroot and to create and update its chroots) when trying to umount /dev/ptmx (inside the chroot) on Debian/ unstable (full log and pbuilder configuration file[3] attached). [...] Setting up build-essential (12.3) ... Processing triggers for libc-bin (2.24-15) ... I: unmounting dev/ptmx filesystem W: Could not unmount dev/ptmx: umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1).)" apparently pbuilder tries to unmount the /dev/pts filesystem while still holding at least one master node open, which is arguably not very nice, but we don't break user space even when fixing other bugs. So this commit has to be reverted. I'll try to figure out a way to avoid caching the path to the slave pty in the master pty. The only thing that actually wants that slave pty path is the "TIOCGPTPEER" ioctl, and I think we could just recreate the path at that time. Reported-by: NStefan Lippers-Hollmann <s.l-h@gmx.de> Cc: Eric W Biederman <ebiederm@xmission.com> Cc: Christian Brauner <christian.brauner@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ronnie Sahlberg 提交于
Add checking for the path component length and verify it is <= the maximum that the server advertizes via FileFsAttributeInformation. With this patch cifs.ko will now return ENAMETOOLONG instead of ENOENT when users to access an overlong path. To test this, try to cd into a (non-existing) directory on a CIFS share that has a too long name: cd /mnt/aaaaaaaaaaaaaaa... and it now should show a good error message from the shell: bash: cd: /mnt/aaaaaaaaaaaaaaaa...aaaaaa: File name too long rh bz 1153996 Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com> Signed-off-by: NSteve French <smfrench@gmail.com> Cc: <stable@vger.kernel.org>
-
由 Sachin Prabhu 提交于
The df for a SMB2 share triggers a GetInfo call for FS_FULL_SIZE_INFORMATION. The values returned are used to populate struct statfs. The problem is that none of the information returned by the call contains the total blocks available on the filesystem. Instead we use the blocks available to the user ie. quota limitation when filling out statfs.f_blocks. The information returned does contain Actual free units on the filesystem and is used to populate statfs.f_bfree. For users with quota enabled, it can lead to situations where the total free space reported is more than the total blocks on the system ending up with df reports like the following # df -h /mnt/a Filesystem Size Used Avail Use% Mounted on //192.168.22.10/a 2.5G -2.3G 2.5G - /mnt/a To fix this problem, we instead populate both statfs.f_bfree with the same value as statfs.f_bavail ie. CallerAvailableAllocationUnits. This is similar to what is done already in the code for cifs and df now reports the quota information for the user used to mount the share. # df --si /mnt/a Filesystem Size Used Avail Use% Mounted on //192.168.22.10/a 2.7G 101M 2.6G 4% /mnt/a Signed-off-by: NSachin Prabhu <sprabhu@redhat.com> Signed-off-by: NPierguido Lambri <plambri@redhat.com> Signed-off-by: NSteve French <smfrench@gmail.com> Cc: <stable@vger.kernel.org>
-
- 18 8月, 2017 3 次提交
-
-
由 Darrick J. Wong 提交于
If we fail a mount on account of cow recovery errors, it's possible that a previous quotacheck left some dquots in memory. The bailout clause of xfs_mountfs forgets to purge these, and so we leak them. Fix that. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
Way back when we established inode block-map redo log items, it was discovered that we needed to prevent the VFS from evicting inodes during log recovery because any given inode might be have bmap redo items to replay even if the inode has no link count and is ultimately deleted, and any eviction of an unlinked inode causes the inode to be truncated and freed too early. To make this possible, we set MS_ACTIVE so that inodes would not be torn down immediately upon release. Unfortunately, this also results in the quota inodes not being released at all if a later part of the mount process should fail, because we never reclaim the inodes. So, set MS_ACTIVE right before we do the last part of log recovery and clear it immediately after we finish the log recovery so that everything will be torn down properly if we abort the mount. Fixes: 17c12bcd ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped") Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Linus Torvalds 提交于
Christian Brauner reported that if you use the TIOCGPTPEER ioctl() to get a slave pty file descriptor, the resulting file descriptor doesn't look right in /proc/<pid>/fd/<fd>. In particular, he wanted to use readlink() on /proc/self/fd/<fd> to get the pathname of the slave pty (basically implementing "ptsname{_r}()"). The reason for that was that we had generated the wrong 'struct path' when we create the pty in ptmx_open(). In particular, the dentry was correct, but the vfsmount pointed to the mount of the ptmx node. That _can_ be correct - in case you use "/dev/pts/ptmx" to open the master - but usually is not. The normal case is to use /dev/ptmx, which then looks up the pts/ directory, and then the vfsmount of the ptmx node is obviously the /dev directory, not the /dev/pts/ directory. We actually did have the right vfsmount available, but in the wrong place (it gets looked up in 'devpts_acquire()' when we get a reference to the pts filesystem), and so ptmx_open() used the wrong mnt pointer. The end result of this confusion was that the pty worked fine, but when if you did TIOCGPTPEER to get the slave side of the pty, end end result would also work, but have that dodgy 'struct path'. And then when doing "d_path()" on to get the pathname, the vfsmount would not match the root of the pts directory, and d_path() would return an empty pathname thinking that the entry had escaped a bind mount into another mount. This fixes the problem by making devpts_acquire() return the vfsmount for the pts filesystem, allowing ptmx_open() to trivially just use the right mount for the pts dentry, and create the proper 'struct path'. Reported-by: NChristian Brauner <christian.brauner@ubuntu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: NEric Biederman <ebiederm@xmission.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 8月, 2017 1 次提交
-
-
由 Oleg Nesterov 提交于
The ADDR_NO_RANDOMIZE checks in stack_maxrandom_size() and randomize_stack_top() are not required. PF_RANDOMIZE is set by load_elf_binary() only if ADDR_NO_RANDOMIZE is not set, no need to re-check after that. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NDmitry Safonov <dsafonov@virtuozzo.com> Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: http://lkml.kernel.org/r/20170815154011.GB1076@redhat.com
-
- 14 8月, 2017 2 次提交
-
-
由 Tahsin Erdogan 提交于
When updating an extended attribute, if the padded value sizes are the same, a shortcut is taken to avoid the bulk of the work. This was fine until the xattr hash update was moved inside ext4_xattr_set_entry(). With that change, the hash update got missed in the shortcut case. Thanks to ZhangYi (yizhang089@gmail.com) for root causing the problem. Fixes: daf83281 ("ext4: eliminate xattr entry e_hash recalculation for removes") Reported-by: NMiklos Szeredi <miklos@szeredi.hu> Signed-off-by: NTahsin Erdogan <tahsin@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Arnd Bergmann <arnd@arndb.de> As Stefan pointed out, I misremembered what clang can do specifically, and it turns out that the variable-length array at the end of the structure did not work (a flexible array would have worked here but not solved the problem): fs/ext4/mballoc.c:2303:17: error: fields must have a constant size: 'variable length array in structure' extension will never be supported ext4_grpblk_t counters[blocksize_bits + 2]; This reverts part of my previous patch, using a fixed-size array again, but keeping the check for the array overflow. Fixes: 2df2c340 ("ext4: fix warning about stack corruption") Reported-by: NStefan Agner <stefan@agner.ch> Tested-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 12 8月, 2017 3 次提交
-
-
由 Christoph Hellwig 提交于
Fix the min_t calls in the zeroing and dirtying helpers to perform the comparisms on 64-bit types, which prevents them from incorrectly being truncated, and larger zeroing operations being stuck in a never ending loop. Special thanks to Markus Stockhausen for spotting the bug. Reported-by: NPaul Menzel <pmenzel@molgen.mpg.de> Tested-by: NPaul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Omar Sandoval 提交于
When we try to allocate a free inode by searching the inobt, we try to find the inode nearest the parent inode by searching chunks both left and right of the chunk containing the parent. As an optimization, we cache the leftmost and rightmost records that we previously searched; if we do another allocation with the same parent inode, we'll pick up the search where it last left off. There's a bug in the case where we found a free inode to the left of the parent's chunk: we need to update the cached left and right records, but because we already reassigned the right record to point to the left, we end up assigning the left record to both the cached left and right records. This isn't a correctness problem strictly, but it can result in the next allocation rechecking chunks unnecessarily or allocating inodes further away from the parent than it needs to. Fix it by swapping the record pointer after we update the cached left and right records. Fixes: bd169565 ("xfs: speed up free inode search") Signed-off-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
The blocklayout code does not compile cleanly for a 32-bit sector_t, and also has no reliable checks for devices sizes, which makes it unsafe to use with a kernel that doesn't support large block devices. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reported-by: NArnd Bergmann <arnd@arndb.de> Fixes: 5c83746a ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing") Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 11 8月, 2017 4 次提交
-
-
由 Jeff Layton 提交于
This ensures that we see errors on fsync when writeback fails. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Mike Rapoport 提交于
When the process exit races with outstanding mcopy_atomic, it would be better to return ESRCH error. When such race occurs the process and it's mm are going away and returning "no such process" to the uffd monitor seems better fit than ENOSPC. Link: http://lkml.kernel.org/r/1502111545-32305-1-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com> Suggested-by: NMichal Hocko <mhocko@suse.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
Nadav reported KSM can corrupt the user data by the TLB batching race[1]. That means data user written can be lost. Quote from Nadav Amit: "For this race we need 4 CPUs: CPU0: Caches a writable and dirty PTE entry, and uses the stale value for write later. CPU1: Runs madvise_free on the range that includes the PTE. It would clear the dirty-bit. It batches TLB flushes. CPU2: Writes 4 to /proc/PID/clear_refs , clearing the PTEs soft-dirty. We care about the fact that it clears the PTE write-bit, and of course, batches TLB flushes. CPU3: Runs KSM. Our purpose is to pass the following test in write_protect_page(): if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) || (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte))) Since it will avoid TLB flush. And we want to do it while the PTE is stale. Later, and before replacing the page, we would be able to change the page. Note that all the operations the CPU1-3 perform canhappen in parallel since they only acquire mmap_sem for read. We start with two identical pages. Everything below regards the same page/PTE. CPU0 CPU1 CPU2 CPU3 ---- ---- ---- ---- Write the same value on page [cache PTE as dirty in TLB] MADV_FREE pte_mkclean() 4 > clear_refs pte_wrprotect() write_protect_page() [ success, no flush ] pages_indentical() [ ok ] Write to page different value [Ok, using stale PTE] replace_page() Later, CPU1, CPU2 and CPU3 would flush the TLB, but that is too late. CPU0 already wrote on the page, but KSM ignored this write, and it got lost" In above scenario, MADV_FREE is fixed by changing TLB batching API including [set|clear]_tlb_flush_pending. Remained thing is soft-dirty part. This patch changes soft-dirty uses TLB batching API instead of flush_tlb_mm and KSM checks pending TLB flush by using mm_tlb_flush_pending so that it will flush TLB to avoid data lost if there are other parallel threads pending TLB flush. [1] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com Link: http://lkml.kernel.org/r/20170802000818.4760-8-namit@vmware.comSigned-off-by: NMinchan Kim <minchan@kernel.org> Signed-off-by: NNadav Amit <namit@vmware.com> Reported-by: NNadav Amit <namit@vmware.com> Tested-by: NNadav Amit <namit@vmware.com> Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
As Tetsuo points out: "Commit 385386cf ("mm: vmstat: move slab statistics from zone to node counters") broke "Slab:" field of /proc/meminfo . It shows nearly 0kB" In addition to /proc/meminfo, this problem also affects the slab counters OOM/allocation failure info dumps, can cause early -ENOMEM from overcommit protection, and miscalculate image size requirements during suspend-to-disk. This is because the patch in question switched the slab counters from the zone level to the node level, but forgot to update the global accessor functions to read the aggregate node data instead of the aggregate zone data. Use global_node_page_state() to access the global slab counters. Fixes: 385386cf ("mm: vmstat: move slab statistics from zone to node counters") Link: http://lkml.kernel.org/r/20170801134256.5400-1-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Stefan Agner <stefan@agner.ch> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 8月, 2017 1 次提交
-
-
由 Trond Myklebust 提交于
If the call to TEST_STATEID returns NFS4ERR_OLD_STATEID, then it just means we raced with other calls to OPEN. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 09 8月, 2017 1 次提交
-
-
由 Weston Andros Adamson 提交于
The client was freeing the nfs4_ff_layout_ds, but not the contained nfs4_ff_ds_version array. Signed-off-by: NWeston Andros Adamson <dros@primarydata.com> Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-