- 25 1月, 2014 14 次提交
-
-
由 Oleg Nesterov 提交于
put_files_struct() and close_files() do rcu_read_lock() to make rcu_dereference_check_fdtable() happy. This looks a bit ugly, files_fdtable() just reads the pointer, we can simply use rcu_dereference_raw() to avoid the warning. The patch also changes close_files() to return fdt, this avoids another rcu_read_lock()/files_fdtable() in put_files_struct(). I think close_files() needs more cleanups: - we do not need xchg() exactly because we are the last user of this files_struct - "if (file)" should be turned into WARN_ON(!file) Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Oleg Nesterov 提交于
rcu_dereference_check_fdtable() looks very wrong, 1. rcu_my_thread_group_empty() was added by 844b9a87 "vfs: fix RCU-lockdep false positive due to /proc" but it doesn't really fix the problem. A CLONE_THREAD (without CLONE_FILES) task can hit the same race with get_files_struct(). And otoh rcu_my_thread_group_empty() can suppress the correct warning if the caller is the CLONE_FILES (without CLONE_THREAD) task. 2. files->count == 1 check is not really right too. Even if this files_struct is not shared it is not safe to access it lockless unless the caller is the owner. Otoh, this check is sub-optimal. files->count == 0 always means it is safe to use it lockless even if files != current->files, but put_files_struct() has to take rcu_read_lock(). See the next patch. This patch removes the buggy checks and turns fcheck_files() into __fcheck_files() which uses rcu_dereference_raw(), the "unshared" callers, fget_light() and fget_raw_light(), can use it to avoid the warning from RCU-lockdep. fcheck_files() is trivially reimplemented as rcu_lockdep_assert() plus __fcheck_files(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
it's never called with NULL argument... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
kill pointless method instances and don't bother with ->owner - it's ignored for procfs files anyway, make use of remove_proc_subtree() for removal, get rid of cell->proc_dir. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
pass owner explicitly to __register_nls(), make register_nls() a macro passing THIS_MODULE as the owner argument to __register_nls(). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
* don't assume that ->dest_count won't change between copy_from_user() and memdup_user() * use fdget instead of fget * don't bother comparing superblocks when we'd already compared vfsmounts * get rid of excessive goto * use file_inode() instead of open-coding the sucker Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
* pass on-disk superblock to qnx4_chkroot() explicitly * don't leave stale (and unused) pointers in qnx4_super_block * free stuff in ->kill_sb(); ->put_super() becomes empty and dies * simplify failure exits Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
simplifies failure exits in ->mount()... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
failure exits are simpler that way Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and return saner errors from ->mount(), while we are at it Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
don't bother open-coding it... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
If ecryptfs_readlink_lower() fails, buf remains an uninitialized pointer and passing it nd_set_link() won't do anything good. Fixed by switching ecryptfs_readlink_lower() to saner API - make it return buf or ERR_PTR(...) and update callers. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 1月, 2014 1 次提交
-
-
由 Andreas Rohner 提交于
There is a bug in the function nilfs_segctor_collect, which results in active data being written to a segment, that is marked as clean. It is possible, that this segment is selected for a later segment construction, whereby the old data is overwritten. The problem shows itself with the following kernel log message: nilfs_sufile_do_cancel_free: segment 6533 must be clean Usually a few hours later the file system gets corrupted: NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0 NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660) The issue can be reproduced with a file system that is nearly full and with the cleaner running, while some IO intensive task is running. Although it is quite hard to reproduce. This is what happens: 1. The cleaner starts the segment construction 2. nilfs_segctor_collect is called 3. sc_stage is on NILFS_ST_SUFILE and segments are freed 4. sc_stage is on NILFS_ST_DAT current segment is full 5. nilfs_segctor_extend_segments is called, which allocates a new segment 6. The new segment is one of the segments freed in step 3 7. nilfs_sufile_cancel_freev is called and produces an error message 8. Loop around and the collection starts again 9. sc_stage is on NILFS_ST_SUFILE and segments are freed including the newly allocated segment, which will contain active data and can be allocated at a later time 10. A few hours later another segment construction allocates the segment and causes file system corruption This can be prevented by simply reordering the statements. If nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments the freed segments are marked as dirty and cannot be allocated any more. Signed-off-by: NAndreas Rohner <andreas.rohner@gmx.net> Reviewed-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: NAndreas Rohner <andreas.rohner@gmx.net> Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 1月, 2014 2 次提交
-
-
由 Chuansheng Liu 提交于
In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to call destroy_work_on_stack() which frees the debug object to pair with INIT_WORK_ONSTACK(). Signed-off-by: NLiu, Chuansheng <chuansheng.liu@intel.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit 6f96b306)
-
由 Jie Liu 提交于
With CRC check is enabled, if trying to set an attributes value just equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote attr write verification procedure failure, which would yield the back trace like below: <snip> XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c <snip> Call Trace: [<ffffffff816f0042>] dump_stack+0x45/0x56 [<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs] [<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs] [<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs] [<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs] [<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs] [<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs] [<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs] [<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460 [<ffffffff81097230>] ? wake_up_state+0x20/0x20 [<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs] [<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs] [<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs] [<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs] [<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs] [<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs] [<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs] [<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs] [<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs] [<ffffffff811df9b2>] generic_setxattr+0x62/0x80 [<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0 [<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10 [<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0 [<ffffffff811e054e>] setxattr+0x12e/0x1c0 [<ffffffff811c6e82>] ? final_putname+0x22/0x50 [<ffffffff811c708b>] ? putname+0x2b/0x40 [<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90 [<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0 [<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0 [<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0 [<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f Tests: setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile This patch fix it to check the remote EA size is greater than the XATTR_SIZE_MAX rather than more than or equal to it, because it's valid if the specified EA value size is equal to the limitation as per VFS setxattr interface. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit 85dd0707)
-
- 07 1月, 2014 1 次提交
-
-
由 Eric Whitney 提交于
Commit f5a44db5 introduced a regression on filesystems created with the bigalloc feature (cluster size > blocksize). It causes xfstests generic/006 and /013 to fail with an unexpected JBD2 failure and transaction abort that leaves the test file system in a read only state. Other xfstests run on bigalloc file systems are likely to fail as well. The cause is the accidental use of a cluster mask where a cluster offset was needed in ext4_ext_map_blocks(). Signed-off-by: NEric Whitney <enwlinux@gmail.com>
-
- 03 1月, 2014 1 次提交
-
-
由 Jason Baron 提交于
The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock. That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with epoll_ctl(b, EPOLL_CTL_DEL, a, x). The deadlock was introduced with commmit 67347fe4 ("epoll: do not take global 'epmutex' for simple topologies"). The acquistion of the ep->mtx for the destination 'ep' was added such that a concurrent EPOLL_CTL_ADD operation would see the correct state of the ep (Specifically, the check for '!list_empty(&f.file->f_ep_links') However, by simply not acquiring the lock, we do not serialize behind the ep->mtx from the add path, and thus may perform a full path check when if we had waited a little longer it may not have been necessary. However, this is a transient state, and performing the full loop checking in this case is not harmful. The important point is that we wouldn't miss doing the full loop checking when required, since EPOLL_CTL_ADD always locks any 'ep's that its operating upon. The reason we don't need to do lock ordering in the add path, is that we are already are holding the global 'epmutex' whenever we do the double lock. Further, the original posting of this patch, which was tested for the intended performance gains, did not perform this additional locking. Signed-off-by: NJason Baron <jbaron@akamai.com> Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 1月, 2014 1 次提交
-
-
由 Tetsuo Handa 提交于
GLOCK_BUG_ON() might call this function without RCU read lock. Make sure that RCU read lock is held when using task_struct returned from pid_task(). Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 28 12月, 2013 3 次提交
-
-
由 Shirish Pargaonkar 提交于
Set FILE_CREATED on O_CREAT|O_EXCL. cifs code didn't change during commit 116cc022 Kernel bugzilla 66251 Signed-off-by: NShirish Pargaonkar <spargaonkar@suse.com> Acked-by: NJeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: NSteve French <smfrench@gmail.com>
-
由 Sachin Prabhu 提交于
When we obtain tcon from cifs_sb, we use cifs_sb_tlink() to first obtain tlink which also grabs a reference to it. We do not drop this reference to tlink once we are done with the call. The patch fixes this issue by instead passing tcon as a parameter and avoids having to obtain a reference to the tlink. A lookup for the tcon is already made in the calling functions and this way we avoid having to re-run the lookup. This is also consistent with the argument list for other similar calls for M-F symlinks. We should also return an ENOSYS when we do not find a protocol specific function to lookup the MF Symlink data. Signed-off-by: NSachin Prabhu <sprabhu@redhat.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> CC: Stable <stable@kernel.org> Signed-off-by: NSteve French <smfrench@gmail.com>
-
由 Steve French 提交于
Signed-off-by: NSteve French <smfrench@gmail.com> Signed-off-by: NGregor Beck <gbeck@sernet.de> Reviewed-by: NJeff Layton <jlayton@redhat.com>
-
- 23 12月, 2013 1 次提交
-
-
由 Linus Torvalds 提交于
Since commit 36bc08cc ("fs/aio: Add support to aio ring pages migration") the aio ring setup code has used a special per-ring backing inode for the page allocations, rather than just using random anonymous pages. However, rather than remembering the pages as it allocated them, it would allocate the pages, insert them into the file mapping (dirty, so that they couldn't be free'd), and then forget about them. And then to look them up again, it would mmap the mapping, and then use "get_user_pages()" to get back an array of the pages we just created. Now, not only is that incredibly inefficient, it also leaked all the pages if the mmap failed (which could happen due to excessive number of mappings, for example). So clean it all up, making it much more straightforward. Also remove some left-overs of the previous (broken) mm_populate() usage that was removed in commit d6c355c7 ("aio: fix race in ring buffer page lookup introduced by page migration support") but left the pointless and now misleading MAP_POPULATE flag around. Tested-and-acked-by: NBenjamin LaHaise <bcrl@kvack.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 12月, 2013 2 次提交
-
-
由 Benjamin LaHaise 提交于
The arbitrary restriction on page counts offered by the core migrate_page_move_mapping() code results in rather suspicious looking fiddling with page reference counts in the aio_migratepage() operation. To fix this, make migrate_page_move_mapping() take an extra_count parameter that allows aio to tell the code about its own reference count on the page being migrated. While cleaning up aio_migratepage(), make it validate that the old page being passed in is actually what aio_migratepage() expects to prevent misbehaviour in the case of races. Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
-
由 Benjamin LaHaise 提交于
e34ecee2 reworked the percpu reference counting to correct a bug trinity found. Unfortunately, the change lead to kioctxes being leaked because there was no final reference count to put. Add that reference count back in to fix things. Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org
-
- 21 12月, 2013 1 次提交
-
-
由 Luck, Tony 提交于
Some pstore backing devices use on board flash as persistent storage. These have limited numbers of write cycles so it is a poor idea to use them from high frequency operations. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 12月, 2013 3 次提交
-
-
由 Theodore Ts'o 提交于
The missing casts can cause the high 64-bits of the physical blocks to be lost. Set up new macros which allows us to make sure the right thing happen, even if at some point we end up supporting larger logical block numbers. Thanks to the Emese Revfy and the PaX security team for reporting this issue. Reported-by: NPaX Team <pageexec@freemail.hu> Reported-by: Emese Revfy <re.emese@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Steven Whitehouse 提交于
We need to wait for any outstanding DIO to complete in a couple of situations. Firstly, in case we are changing out of deferred mode (in inode_go_sync) where GLF_DIRTY will not be set. That call could be prefixed with a test for gl_state == LM_ST_DEFERRED but it doesn't seem worth it bearing in mind that the test for outstanding DIO is very quick anyway, in the usual case that there is none. The second case is in inode_go_lock which will catch the cases where we have a cached EX lock, but where we grant deferred locks against it so that there is no glock state transistion. We only need to wait if the state is not deferred, since DIO is valid anyway in that state. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
In patch 209806ab we allowed local deferred locks to be granted against a cached exclusive lock. That opened up a corner case which this patch now fixes. The solution to the problem is to check whether we have cached pages each time we do direct I/O and if so to unmap, flush and invalidate those pages. Since the glock state machine normally does that for us, mostly the code will be a no-op. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 18 12月, 2013 1 次提交
-
-
由 Jan Kara 提交于
Akira-san has been reporting rare deadlocks of his machine when running xfstests test 269 on ext4 filesystem. The problem turned out to be in ext4_da_reserve_metadata() and ext4_da_reserve_space() which called ext4_should_retry_alloc() while holding i_data_sem. Since ext4_should_retry_alloc() can force a transaction commit, this is a lock ordering violation and leads to deadlocks. Fix the problem by just removing the retry loops. These functions should just report ENOSPC to the caller (e.g. ext4_da_write_begin()) and that function must take care of retrying after dropping all necessary locks. Reported-and-tested-by: NAkira Fujita <a-fujita@rs.jp.nec.com> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 17 12月, 2013 8 次提交
-
-
由 Dave Chinner 提交于
If we are doing aysnc writeback of metadata, we can get write errors but have nobody to report them to. At the moment, we simply attempt to reissue the write from io completion in the hope that it's a transient error. When it's not a transient error, the buffer is stuck forever in this loop, and we cannot break out of it. Eventually, unmount will hang because the AIL cannot be emptied and everything goes downhill from them. To solve this problem, only retry the write IO once before aborting it. We don't throw the buffer away because some transient errors can last minutes (e.g. FC path failover) or even hours (thin provisioned devices that have run out of backing space) before they go away. Hence we really want to keep trying until we can't try any more. Because the buffer was not cleaned, however, it does not get removed from the AIL and hence the next pass across the AIL will start IO on it again. As such, we still get the "retry forever" semantics that we currently have, but we allow other access to the buffer in the mean time. Meanwhile the filesystem can continue to modify the buffer and relog it, so the IO errors won't hang the log or the filesystem. Now when we are pushing the AIL, we can see all these "permanent IO error" buffers and we can issue a warning about failures before we retry the IO. We can also catch these buffers when unmounting an issue a corruption warning, too. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
When swalloc is specified as a mount option, allocations are supposed to be aligned to the stripe width rather than the stripe unit of the underlying filesystem. However, it does not do this. What the implementation does is round up the allocation size to a stripe width, hence ensuring that all allocations span a full stripe width. It does not, however, ensure that that allocation is aligned to a stripe width, and hence the allocations can span multiple underlying stripes and so still see RMW cycles for things like direct IO on MD RAID. So, if the swalloc mount option is set, change the allocation alignment in xfs_bmap_btalloc() to use the stripe width rather than the stripe unit. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Christoph Hellwig 提交于
The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that handles the case of a shut down filesystem. Most of the users have private, uncached buffers that can just be freed in this case, but the complex error handling in xfs_bioerror_relse messes up the case when it's called without a locked buffer. Remove xfsbdstrat and opencode the error handling in the callers. All but one can simply return an error and don't need to deal with buffer state, and the one caller that cares about the buffer state could do with a major cleanup as well, but we'll defer that to later. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The function xfs_bmap_isaeof() is used to indicate that an allocation is occurring at or past the end of file, and as such should be aligned to the underlying storage geometry if possible. Commit 27a3f8f2 ("xfs: introduce xfs_bmap_last_extent") changed the behaviour of this function for empty files - it turned off allocation alignment for this case accidentally. Hence large initial allocations from direct IO are not getting correctly aligned to the underlying geometry, and that is cause write performance to drop in alignment sensitive configurations. Fix it by considering allocation into empty files as requiring aligned allocation again. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit f9b395a8)
-
由 Jie Liu 提交于
xfs_quota(8) will hang up if trying to turn group/project quota off before the user quota is off, this could be 100% reproduced by: # mount -ouquota,gquota /dev/sda7 /xfs # mkdir /xfs/test # xfs_quota -xc 'off -g' /xfs <-- hangs up # echo w > /proc/sysrq-trigger # dmesg SysRq : Show Blocked State task PC stack pid father xfs_quota D 0000000000000000 0 27574 2551 0x00000000 [snip] Call Trace: [<ffffffff81aaa21d>] schedule+0xad/0xc0 [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0 [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0 [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0 [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs] [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30 [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs] [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs] [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs] [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs] [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs] [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs] [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs] [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs] [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs] [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0 [<ffffffff812fba4b>] ? iput+0x5b/0x90 [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0 [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b It's fine if we turn user quota off at first, then turn off other kind of quotas if they are enabled since the group/project dquot refcount is decreased to zero once the user quota if off. Otherwise, those dquots refcount is non-zero due to the user dquot might refer to them as hint(s). Hence, above operation cause an infinite loop at xfs_qm_dquot_walk() while trying to purge dquot cache. This problem has been around since Linux 3.4, it was introduced by: [ b84a3a96 xfs: remove the per-filesystem list of dquots ] Originally we will release the group dquot pointers because the user dquots maybe carrying around as a hint via xfs_qm_detach_gdquots(). However, with above change, there is no such work to be done before purging group/project dquot cache. In order to solve this problem, this patch introduces a special routine xfs_qm_dqpurge_hints(), and it would release the group/project dquot pointers the user dquots maybe carrying around as a hint, and then it will proceed to purge the user dquot cache if requested. Cc: stable@vger.kernel.org Signed-off-by: NJie Liu <jeff.liu@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit df8052e7)
-
由 Jie Liu 提交于
For CRC enabled v5 super block, change a file's ownership can simply trigger an ASSERT failure at xfs_setattr_nonsize() if both group and project quota are enabled, i.e, [ 305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621 [ 305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable] [ 305.383939] Call Trace: [ 305.385536] [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs] [ 305.387142] [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs] [ 305.388727] [<ffffffff811ca388>] notify_change+0x1a8/0x350 [ 305.390298] [<ffffffff811ac39d>] chown_common+0xfd/0x110 [ 305.391868] [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110 [ 305.393440] [<ffffffff811ad760>] SyS_lchown+0x20/0x30 [ 305.394995] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f [ 305.399870] RIP [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs] This fix adjust the assertion to check if the super block support both quota inodes or not. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit 5a01dd54)
-
由 Jie Liu 提交于
After the previous fix, there still has another ASSERT failure if turning off any type of quota while fsstress is running at the same time. Backtrace in this case: [ 50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118 [ 50.867924] ------------[ cut here ]------------ ... <snip> [ 50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable] [ 50.867999] invalid opcode: 0000 [#1] SMP [ 50.869407] Call Trace: [ 50.869446] [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs] [ 50.869512] [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs] [ 50.869564] [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs] [ 50.869615] [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs] [ 50.869655] [<ffffffff811becd5>] vfs_mkdir+0x95/0x130 [ 50.869689] [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0 [ 50.869723] [<ffffffff811bf689>] SyS_mkdir+0x19/0x20 [ 50.869757] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f [ 50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip> [ 50.870003] RIP [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs] [ 50.870050] RSP <ffff88002941fd60> [ 50.879251] ---[ end trace c93a2b342341c65b ]--- We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(), however the assertion itself is not right IMHO. While performing quota off, we firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking any special locks, see xfs_qm_scall_quotaoff(). Hence there is no guarantee that the desired quota is still active. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit 37eb9706)
-
由 Mark Tinguely 提交于
Fix the leak of kernel memory in xfs_dir2_node_removename() when xfs_dir2_leafn_remove() returns an error code. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit ef701600)
-
- 14 12月, 2013 1 次提交
-
-
由 Bob Peterson 提交于
This patch fixes a slab memory leak that sometimes can occur for files with a very short lifespan. The problem occurs when a dinode is deleted before it has gotten to the journal properly. In the leak scenario, the bd object is pinned for journal committment (queued to the metadata buffers queue: sd_log_le_buf) but is subsequently unpinned and dequeued before it finds its way to the ail or the revoke queue. In this rare circumstance, the bd object needs to be freed from slab memory, or it is forgotten. We have to be very careful how we do it, though, because multiple processes can call gfs2_remove_from_journal. In order to avoid double-frees, only the process that does the unpinning is allowed to free the bd. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-