- 08 7月, 2011 5 次提交
-
-
由 Christoph Hellwig 提交于
Get rid of the special case where we use unlogged timestamp updates for a truncate to the current inode size, and just call xfs_setattr_nonsize for it to treat it like a utimes calls. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Christoph Hellwig 提交于
Split up xfs_setattr into two functions, one for the complex truncate handling, and one for the trivial attribute updates. Also move both new routines to xfs_iops.c as they are fairly Linux-specific. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Christoph Hellwig 提交于
GCC 4.6 complains about an array subscript is above array bounds when using the btree index to index into the agf_levels array. The only two indices passed in are 0 and 1, and we have an assert insuring that. Replace the trick of using the array index directly with using constants in the already existing branch for assigning the XFS_BTREE_LASTREC_UPDATE flag. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Christoph Hellwig 提交于
The non-blockig behaviour in xfs_vm_writepage currently is conditional on having both the WB_SYNC_NONE sync_mode and the nonblocking flag set. The latter used to be used by both pdflush, kswapd and a few other places in older kernels, but has been fading out starting with the introduction of the per-bdi flusher threads. Enable the non-blocking behaviour for all WB_SYNC_NONE calls to get back the behaviour we want. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Christoph Hellwig 提交于
Now that we reject direct reclaim in addition to always using GFP_NOFS allocation there's no chance we'll ever end up in ->writepage with PF_FSTRANS set. Add a WARN_ON if we hit this case, and stop checking if we'd actually need to start a transaction. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 07 7月, 2011 1 次提交
-
-
由 Dave Chinner 提交于
When inodes are marked stale in a transaction, they are treated specially when the inode log item is being inserted into the AIL. It tries to avoid moving the log item forward in the AIL due to a race condition with the writing the underlying buffer back to disk. The was "fixed" in commit de25c181 ("xfs: avoid moving stale inodes in the AIL"). To avoid moving the item forward, we return a LSN smaller than the commit_lsn of the completing transaction, thereby trying to trick the commit code into not moving the inode forward at all. I'm not sure this ever worked as intended - it assumes the inode is already in the AIL, but I don't think the returned LSN would have been small enough to prevent moving the inode. It appears that the reason it worked is that the lower LSN of the inodes meant they were inserted into the AIL and flushed before the inode buffer (which was moved to the commit_lsn of the transaction). The big problem is that with delayed logging, the returning of the different LSN means insertion takes the slow, non-bulk path. Worse yet is that insertion is to a position -before- the commit_lsn so it is doing a AIL traversal on every insertion, and has to walk over all the items that have already been inserted into the AIL. It's expensive. To compound the matter further, with delayed logging inodes are likely to go from clean to stale in a single checkpoint, which means they aren't even in the AIL at all when we come across them at AIL insertion time. Hence these were all getting inserted into the AIL when they simply do not need to be as inodes marked XFS_ISTALE are never written back. Transactional/recovery integrity is maintained in this case by the other items in the unlink transaction that were modified (e.g. the AGI btree blocks) and committed in the same checkpoint. So to fix this, simply unpin the stale inodes directly in xfs_inode_item_committed() and return -1 to indicate that the AIL insertion code does not need to do any further processing of these inodes. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 24 6月, 2011 3 次提交
-
-
由 Dave Chinner 提交于
If the attribute fork on an inode is in btree format and has multiple levels (i.e node format rather than leaf format), then a lookup failure will trigger an assert failure in xfs_da_path_shift if the flag XFS_DA_OP_OKNOENT is not set. This flag is used to indicate to the directory btree code that not finding an entry is not a fatal error. In the case of doing a lookup for a directory name removal, this is valid as a user cannot insert an arbitrary name to remove from the directory btree. However, in the case of the attribute tree, a user has direct control over the attribute name and can ask for any random name to be removed without any validation. In this case, fsstress is asking for a non-existent user.selinux attribute to be removed, and that is causing xfs_da_path_shift() to fall off the bottom of the tree where it asserts that a lookup failure is allowed. Because the flag is not set, we die a horrible death on a debug enable kernel. Prevent this assert from firing on attribute removes by adding the op_flag XFS_DA_OP_OKNOENT to atribute removal operations. Discovered when testing on a SELinux enabled system by fsstress in test 070 by trying to remove a non-existent user.selinux attribute. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
When an inode is truncated down, speculative preallocation is removed from the inode. This should also reset the state bits for controlling whether preallocation is subsequently removed when the file is next closed. The flag is not being cleared, so repeated operations on a file that first involve a truncate (e.g. multiple repeated dd invocations on a file) give different file layouts for the second and subsequent invocations. Fix this by clearing the XFS_IDIRTY_RELEASE state bit when the XFS_ITRUNCATED bit is detected in xfs_release() and hence ensure that speculative delalloc is removed on files that have been truncated down. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
XFS inodes has several per-lifetime state fields that determine the behaviour of the inode. These state fields are not all reset when an inode is reused from the reclaimable state. This can lead to unexpected behaviour of the new inode such as speculative preallocation not being truncated away in the expected manner for local files until the inode is subsequently truncated, freed or cycles out of the cache. It can also lead to an inode being considered to be a filestream inode or having been truncated when that is not the case. Rework the reinitialisation of the inode when it is recycled to ensure that it is pristine before it is reused. While there, also fix the resetting of state flags in the recycling error paths so the inode does not become unreclaimable. Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 16 6月, 2011 1 次提交
-
-
由 Christoph Hellwig 提交于
There's no reason not to support cache flushing on external log devices. The only thing this really requires is flushing the data device first both in fsync and log commits. A side effect is that we also have to remove the barrier write test during mount, which has been superflous since the new FLUSH+FUA code anyway. Also use the chance to flush the RT subvolume write cache before the fsync commit, which is required for correct semantics. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 15 6月, 2011 1 次提交
-
-
由 Al Viro 提交于
->mknod() should return negative on errors and PTR_ERR() gives already negative value... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 04 6月, 2011 11 次提交
-
-
由 David Sterba 提交于
With Linus' tree, today's linux-next build (powercp ppc64_defconfig) produced this warning: fs/btrfs/delayed-inode.c: In function 'btrfs_delayed_update_inode': fs/btrfs/delayed-inode.c:1598:6: warning: 'ret' may be used uninitialized in this function Introduced by commit 16cdcec7 ("btrfs: implement delayed inode items operation"). This fixes a bug in btrfs_update_inode(): if the returned value from btrfs_delayed_update_inode is a nonzero garbage, inode stat data are not updated and several call paths may hit a BUG_ON or fail with strange code. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
wrap checking of filesystem 'closing' flag and fix a few missing memory barriers. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 Chris Mason 提交于
This makes the inode map cache default to off until we fix the overflow problem when the free space crcs don't fit inside a single page. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Arne Jansen 提交于
With the removal of the implicit plugging scrub ends up doing more and smaller I/O than necessary. This patch adds explicit plugging per chunk. Signed-off-by: NArne Jansen <sensille@gmx.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 David Sterba 提交于
commit 4cb5300b ("Btrfs: add mount -o auto_defrag") accesses inode number directly while it should use the helper with the new inode number allocator. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
With xfstest 254 I can panic the box every time with the inode number caching stuff on. This is because we clean the inodes out when we delete the subvolume, but then we write out the inode cache which adds an inode to the subvolume inode tree, and then when it gets evicted again the root gets added back on the dead roots list and is deleted again, so we have a double free. To stop this from happening just return 0 if refs is 0 (and we're not the tree root since tree root always has refs of 0). With this fix 254 no longer panics. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Tested-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Arne Jansen 提交于
In degraded mode the struct btrfs_device of missing devs don't have device->name set. A kstrdup of NULL correctly returns NULL. Don't BUG in this case. Signed-off-by: NArne Jansen <sensille@gmx.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 liubo 提交于
This adds extra checks to make sure the inode map we are caching really belongs to a FS root instead of a special relocation tree. It prevents crashes during balancing operations. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
The free space cache uses only one page for crcs right now, which means we can't have a cache file bigger than the crcs we can fit in the first page. This adds a check to enforce that restriction. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
The nitems counter needs to start at zero Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Arne Jansen 提交于
The current scrub implementation reuses bios and pages as often as possible, allocating them only on start and releasing them when finished. This leads to more problems with the block layer than it's worth. The elevator gets confused when there are more pages added to the bio than bi_size suggests. This patch completely rips out the reuse of bios and pages and allocates them freshly for each submit. Signed-off-by: NArne Jansen <sensille@gmx.net> Signed-off-by: NChris Maosn <chris.mason@oracle.com>
-
- 03 6月, 2011 6 次提交
-
-
由 Ben Gardiner 提交于
The free space fixup is currently initiated during mount after the call to ubifs_write_master() which results in a write to PEBs; this has been observed with the patch 'assert no fixup when writing a node' applied: Move the free space fixup on mount to before the calls to ubifs_recover_inl_heads() and ubifs_write_master(). This results in no assertions with the previously mentioned patch applied. Artem: tweaked the patch a bit Signed-off-by: NBen Gardiner <bengardiner@nanometrics> Reviewed-by: NMatthew L. Creech <mlcreech@gmail.com> Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
由 Ben Gardiner 提交于
The current 'mount_ubifs()' implementation does not initialize the LPT until the the master node is marked dirty. Move the LPT initialization to before marking the master node dirty. This is a preparation for the next patch which will move the free-space-fixup check to before marking the master node dirty, because we have to fix-up the free space before doing any writes. Artem: massaged the patch and commit message. Signed-off-by: NBen Gardiner <bengardiner@nanometrics.ca> Reviewed-by: NMatthew L. Creech <mlcreech@gmail.com> Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
由 Ben Gardiner 提交于
The current free space fixup can result in some writing to the UBI volume when the space_fixup flag is set. To catch instances where UBIFS is writing to the NAND while the space_fixup flag is set, add an assert to ubifs_write_node(). Artem: tweaked the patch, added similar assertion to the write buffer write path. Signed-off-by: NBen Gardiner <bengardiner@nanometrics.ca> Reviewed-by: NMatthew L. Creech <mlcreech@gmail.com> Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
由 Artem Bityutskiy 提交于
UBIFS maintains per-filesystem and global clean znode counters ('c->clean_zn_cnt' and 'ubifs_clean_zn_cnt'). It is important to maintain correct values there since the shrinker relies on 'ubifs_clean_zn_cnt'. However, in case of failures during commit the counters were corrupted. E.g., if a failure happens in the middle of 'write_index()', then some nodes in the commit list ('c->cnext') are marked as clean, and some are marked as dirty. And the 'ubifs_destroy_tnc_subtree()' frees does not retrun correct count, and we end up with non-zero 'c->clean_zn_cnt' when unmounting. This means that if we have 2 file-sytem and one of them fails, and we unmount it, 'ubifs_clean_zn_cnt' stays incorrect and confuses the shrinker. Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
由 Artem Bityutskiy 提交于
UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write failure because it forgets to free the 'struct ubifs_dent_node *dent' object. Although the object is small, the alignment can make it large - e.g., 2KiB if the min. I/O unit is 2KiB. Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable@kernel.org
-
由 Artem Bityutskiy 提交于
Sometimes VM asks the shrinker to return amount of objects it can shrink, and we return the ubifs_clean_zn_cnt in that case. However, it is possible that this counter is negative for a short period of time, due to the way UBIFS TNC code updates it. And I can observe the following warnings sometimes: shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788 This patch makes sure UBIFS never returns negative count of objects. Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable@kernel.org
-
- 01 6月, 2011 5 次提交
-
-
由 Artem Bityutskiy 提交于
Unfortunately, the recovery fix d1606a59b6be4ea392eabd40d1250aa1eeb19efb (UBIFS: fix extremely rare mount failure) broke recovery. This commit make UBIFS drop the last min. I/O unit in all journal heads, but this is needed only for the GC head. And this does not work for non-GC heads. For example, if suppose we have min. I/O units A and B, and A contains a valid node X, which was fsynced, and then a group of nodes Y which spans the rest of A and B. In this case we'll drop not only Y, but also X, which is obviously incorrect. This patch fixes the issue and additionally makes recovery to drop last min. I/O unit only for the GC head, and leave things as they have been for ages for the other heads - this is safer. Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
由 Artem Bityutskiy 提交于
Instead of passing "grouped" parameter to 'ubifs_recover_leb()' which tells whether the nodes are grouped in the LEB to recover, pass the journal head number and let 'ubifs_recover_leb()' look at the journal head's 'grouped' flag. This patch is a preparation to a further fix where we'll need to know the journal head number for other purposes. Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
由 Artem Bityutskiy 提交于
Journal heads are different in a way how UBIFS writes nodes there. All normal journal heads receive grouped nodes, while the GC journal heads receives ungrouped nodes. This patch adds a 'grouped' flag to 'struct ubifs_jhead' which describes this property. This patch is a preparation to a further recovery fix. Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
由 Artem Bityutskiy 提交于
Commit ab51afe05273741f72383529ef488aa1ea598ec6 was a good clean-up, but it introduced a regression - now UBIFS prints scary error messages during recovery on all corrupted nodes, even though the corruptions are expected (due to a power cut). This patch fixes the issue. Additionally fix a typo in a commentary introduced by the same commit. Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
由 Tejun Heo 提交于
d4dc210f (block: don't block events on excl write for non-optical devices) added dereferencing of bdev->bd_disk to test GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE; however, bdev->bd_disk can be %NULL if open failed which can lead to an oops. Test the flag after testing open was successful, not before. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NDavid Miller <davem@davemloft.net> Tested-by: NDavid Miller <davem@davemloft.net> Cc: stable@kernel.org Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 30 5月, 2011 7 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Sage Weil 提交于
The dentry_unhash push-down series missed that shink_dcache_parent needs to be called prior to rmdir or dir rename to clear DCACHE_REFERENCED and allow efficient dentry reclaim. Reported-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jens Axboe 提交于
It was not a good idea to start dereferencing disk->queue from the fs sysfs strategy for displaying discard alignment. We ran into first a NULL pointer deref, and after fixing that we sometimes see unvalid disk->queue pointer values. Since discard is the only one of the bunch actually looking into the queue, just revert the change. This reverts commit 23ceb5b7. Conflicts: fs/partitions/check.c
-
由 Tyler Hicks 提交于
Now that ecryptfs_lookup_interpose() is no longer using ecryptfs_header_cache_2 to read in metadata, the kmem_cache can be removed and the ecryptfs_header_cache_1 kmem_cache can be renamed to ecryptfs_header_cache. Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
-
由 Tyler Hicks 提交于
ecryptfs_lookup_interpose() has turned into spaghetti code over the years. This is an effort to clean it up. - Shorten overly descriptive variable names such as ecryptfs_dentry - Simplify gotos and error paths - Create helper function for reading plaintext i_size from metadata It also includes an optimization when reading i_size from the metadata. A complete page-sized kmem_cache_alloc() was being done to read in 16 bytes of metadata. The buffer for that is now statically declared. Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
-
由 Tyler Hicks 提交于
Instead of having the calling functions translate the true/false return code to either 0 or -EINVAL, have contains_ecryptfs_marker() return 0 or -EINVAL so that the calling functions can just reuse the return code. Also, rename the function to ecryptfs_validate_marker() to avoid callers mistakenly thinking that it returns true/false codes. Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>
-
由 Tyler Hicks 提交于
Only unlock and d_add() new inodes after the plaintext inode size has been read from the lower filesystem. This fixes a race condition that was sometimes seen during a multi-job kernel build in an eCryptfs mount. https://bugzilla.kernel.org/show_bug.cgi?id=36002Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com> Reported-by: NDavid <david@unsolicited.net> Tested-by: NDavid <david@unsolicited.net>
-