- 19 December 2013, 3 commits
-
By Christoph Hellwig
Make it clear that we're only locking against the extent map on the data fork. Also clean the function up a little bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Christoph Hellwig
We can just use xfs_iunlock without any loss of clarity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Christoph Hellwig
Neither the inode number nor the generation changes on a live inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
- 12 December 2013, 3 commits
-
By Dave Chinner
The function xfs_bmap_isaeof() is used to indicate that an allocation is occurring at or past the end of file, and as such should be aligned to the underlying storage geometry if possible. Commit 27a3f8f2 ("xfs: introduce xfs_bmap_last_extent") changed the behaviour of this function for empty files - it accidentally turned off allocation alignment for this case. Hence large initial allocations from direct IO are not getting correctly aligned to the underlying geometry, and that is causing write performance to drop in alignment sensitive configurations. Fix it by considering allocation into empty files as requiring aligned allocation again. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
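A minimal sketch of the behaviour being restored - not the kernel's actual xfs_bmap_isaeof() body, whose arguments and helpers differ - assuming the offset being allocated and the data fork's extent count are already known:

    #include <stdbool.h>

    /*
     * Illustrative only: treat an empty data fork (no extents at all) as
     * "allocating at or past EOF", so the first allocation of a file is
     * stripe-aligned just like a normal append beyond the last extent.
     */
    static bool alloc_should_align(long long alloc_off, long long nextents,
                                   long long last_ext_off, long long last_ext_len)
    {
            if (nextents == 0)
                    return true;    /* empty file: initial allocation is an append */
            return alloc_off >= last_ext_off + last_ext_len;
    }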
-
By Ben Myers
rec.ir_startino is an agino rather than an ino. Use the correct macro when dealing with it in xfs_difree. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
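For context, a hedged illustration of the agino/ino distinction the fix is about; the conversion macros below are the standard ones from the XFS format headers, but the exact xfs_difree() hunk is not reproduced here:

    /*
     * rec.ir_startino is AG-relative (an agino), not an absolute inode number
     * (an ino), so it must go through the agino conversion macros.  Sketch only.
     */
    xfs_agino_t   agino = rec.ir_startino;                       /* AG-relative */
    xfs_ino_t     ino   = XFS_AGINO_TO_INO(mp, agno, agino);     /* absolute ino */
    xfs_agblock_t agbno = XFS_AGINO_TO_AGBNO(mp, agino);         /* AG block */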
-
By Dave Chinner
If we are using a large directory block size and memory becomes fragmented, we can get memory allocation failures trying to kmem_alloc(64k) for a temporary buffer. However, there is no need for a directory-buffer-sized allocation, as the end result ends up in the inode literal area. This is, at most, slightly less than 2k of space, and hence we don't need an allocation larger than that for a temporary buffer. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
- 10 December 2013, 2 commits
-
By Jie Liu
xfs_quota(8) will hang up if we try to turn group/project quota off before the user quota is off; this can be reproduced 100% of the time by:

    # mount -ouquota,gquota /dev/sda7 /xfs
    # mkdir /xfs/test
    # xfs_quota -xc 'off -g' /xfs   <-- hangs up
    # echo w > /proc/sysrq-trigger
    # dmesg

SysRq : Show Blocked State
task                     PC stack  pid father
xfs_quota  D 0000000000000000  0 27574  2551 0x00000000
[snip]
Call Trace:
[<ffffffff81aaa21d>] schedule+0xad/0xc0
[<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
[<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
[<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
[<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
[<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
[<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
[<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
[<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
[<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
[<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
[<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
[<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
[<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
[<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
[<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
[<ffffffff812fba4b>] ? iput+0x5b/0x90
[<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
[<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b

It's fine if we turn the user quota off first and then turn off the other kinds of quota if they are enabled, since the group/project dquot refcounts drop to zero once the user quota is off. Otherwise, those dquots' refcounts remain non-zero because the user dquots may still refer to them as hints. Hence the above operation causes an infinite loop in xfs_qm_dquot_walk() while trying to purge the dquot cache. This problem has been around since Linux 3.4; it was introduced by: [ b84a3a96 xfs: remove the per-filesystem list of dquots ] Originally we would release the group dquot pointers that the user dquots may be carrying around as hints via xfs_qm_detach_gdquots(). However, with the above change, there is no such work done before purging the group/project dquot cache. To solve this problem, this patch introduces a special routine xfs_qm_dqpurge_hints(), which releases the group/project dquot pointers that the user dquots may be carrying around as hints, and then proceeds to purge the user dquot cache if requested. Cc: stable@vger.kernel.org Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Jie Liu
For a CRC-enabled v5 superblock, changing a file's ownership can trigger an ASSERT failure in xfs_setattr_nonsize() if both group and project quota are enabled, i.e.:

[ 305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621
[ 305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable]
[ 305.383939] Call Trace:
[ 305.385536] [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs]
[ 305.387142] [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs]
[ 305.388727] [<ffffffff811ca388>] notify_change+0x1a8/0x350
[ 305.390298] [<ffffffff811ac39d>] chown_common+0xfd/0x110
[ 305.391868] [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110
[ 305.393440] [<ffffffff811ad760>] SyS_lchown+0x20/0x30
[ 305.394995] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[ 305.399870] RIP [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs]

This fix adjusts the assertion to check whether the superblock supports both quota inodes. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
-
- 07 December 2013, 5 commits
-
By Christoph Hellwig
Split out an xfs_setattr_time helper to share code between truncate and regular setattr, similar to xfs_setattr_mode. I might also have another caller growing for this in the near future. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Christoph Hellwig
Remove the pointless tp argument, and properly align the local variable declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Jie Liu
After the previous fix, there is still another ASSERT failure if we turn off any type of quota while fsstress is running at the same time. Backtrace in this case:

[ 50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118
[ 50.867924] ------------[ cut here ]------------
... <snip>
[ 50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable]
[ 50.867999] invalid opcode: 0000 [#1] SMP
[ 50.869407] Call Trace:
[ 50.869446] [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs]
[ 50.869512] [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs]
[ 50.869564] [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs]
[ 50.869615] [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs]
[ 50.869655] [<ffffffff811becd5>] vfs_mkdir+0x95/0x130
[ 50.869689] [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0
[ 50.869723] [<ffffffff811bf689>] SyS_mkdir+0x19/0x20
[ 50.869757] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[ 50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip>
[ 50.870003] RIP [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs]
[ 50.870050] RSP <ffff88002941fd60>
[ 50.879251] ---[ end trace c93a2b342341c65b ]---

We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(); however, the assertion itself is not right IMHO. While performing quota off, we first clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking any special locks, see xfs_qm_scall_quotaoff(). Hence there is no guarantee that the desired quota is still active. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Jie Liu
The xfs_quota_priv header file is only included by the xfs_qm header and there are not many users of its contents, so we can move that content into the xfs_qm header file and kill it. This patch also removes the unused macro DQFLAGTO_TYPESTR. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Jie Liu
In xfs_qm_scall_trunc_qfiles(), we ignore the error if removing the user quota metadata fails, and proceed to remove the group and project quotas if they are present. In user space, however, the remove operation breaks off and returns as soon as removing any kind of quota fails. Also, for v5 superblocks we can enable both group and project quota at the same time; in that case the current error handling lets the project result overwrite a group error, even though they might fail for different reasons. It seems we'd better make the error handling consistent with user space and not try to remove another kind of quota metadata if the previous operation has failed. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
-
- 06 December 2013, 3 commits
-
By Mark Tinguely
Fix the leak of kernel memory in xfs_dir2_node_removename() when xfs_dir2_leafn_remove() returns an error code. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Mark Tinguely
Recovery builds a list of items on the transaction's r_itemq head. Normally these items are committed and freed. But in the event of a recovery error, these allocations are leaked. If the error occurs during item reordering, then reconstruct the r_itemq list before deleting the list to avoid leaking the entries that were on one of the temporary lists. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Dave Chinner
This loop in xfs_growfs_data_private() is incorrect for V4 superblock filesystems:

    for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
            agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);

For V4 filesystems, we don't have an agfl header structure, and so XFS_AGFL_SIZE() returns an entire sector's worth of entries, which we then index from an offset into the sector. Hence: buffer overrun. This problem was introduced in 3.10 by commit 77c95bba ("xfs: add CRC checks to the AGFL"), which changed the AGFL structure but failed to update the growfs code to handle the different structures. Fix it by using the correct offset into the buffer for both V4 and V5 filesystems. Cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Ben Myers <bpm@sgi.com>
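A hedged sketch of the corrected initialisation: XFS_BUF_TO_AGFL_BNO() is assumed here as the accessor that returns the start of the AGFL entry array for either on-disk layout (the upstream fix uses an equivalent helper), and the surrounding growfs transaction code is omitted.

    /*
     * Index from the first AGFL entry rather than from the start of the
     * buffer, so the same loop is correct whether or not the block carries
     * a v5 AGFL header.  Sketch only, based on the loop quoted above.
     */
    __be32  *agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, bp);
    int     bucket;

    for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
            agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);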
-
- 05 December 2013, 5 commits
-
By Jie Liu
For the discard operation, we should return EINVAL if the given range length is less than a block size; otherwise it will go through the file system and discard data blocks, as the end of the range might be evaluated to -1, e.g.:

    # fstrim -v -o 0 -l 100 /xfs7
    /xfs7: 9811378176 bytes were trimmed

This issue can be triggered via xfstests/generic/288. Also, getting the request queue pointer via bdev_get_queue() instead of a hard-coded pointer dereference is not a bad thing. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
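A minimal sketch of the kind of up-front validation described above, assuming the ioctl's trim range has already been copied from user space; the local names are illustrative and the hunk is not claimed to match the patch exactly (bdev_get_queue() and blk_queue_discard() are the standard block-layer accessors):

    /*
     * Reject trim requests the device cannot service, and ranges shorter
     * than one filesystem block, instead of letting the end offset wrap
     * and walking the whole free-space tree.
     */
    struct request_queue    *q = bdev_get_queue(mp->m_ddev_targp->bt_bdev);

    if (!blk_queue_discard(q))
            return -EOPNOTSUPP;
    if (range.len < mp->m_sb.sb_blocksize)
            return -EINVAL;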
-
By Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Dan Carpenter
If we allocate less than sizeof(struct attrlist) then we end up corrupting memory or doing a ZERO_SIZE_PTR dereference. This can only be triggered with CAP_SYS_ADMIN. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
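A hedged sketch of the guard this implies in the attr-list-by-handle path; the structure field carrying the user-supplied length is assumed here rather than quoted from the patch hunk:

    /*
     * A caller-supplied buffer smaller than the fixed attrlist header can
     * never hold a valid result, so fail early rather than handing a tiny
     * length to the allocator and walking off the end of it later.
     */
    if (al_hreq.buflen < sizeof(struct attrlist))
            return -EINVAL;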
-
By Christoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Eric Sandeen
The "verbose" argument to xfs_setsize_buftarg_flags() has been unused since commit ffe37436 ("xfs: stop using the page cache to back the buffer cache"). Remove it, and fold the function into xfs_setsize_buftarg() now that there's no need for different types of callers. Fix inconsistent comment spacing while we're at it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
- 18 November 2013, 3 commits
-
By Dave Chinner
Michael L Semon reported that generic/069 runtime increased on v5 superblocks by 100% compared to v4 superblocks. His perf-based analysis pointed directly at the timestamp updates being done by the write path in this workload. The append writers are doing 4-byte writes, so there are lots of timestamp updates occurring. The thing is, they aren't being triggered by timestamp changes - they are being triggered by the inode change counter needing to be updated. That is, every write(2) system call needs to bump the inode version count, and it does that through the timestamp update mechanism. Hence test generic/069 is running 3 orders of magnitude more timestamp update transactions on v5 filesystems due to the fact it does a huge number of *4 byte* write(2) calls. This isn't a real world scenario we really need to address - anyone doing such sequential IO should be using fwrite(3), not write(2). i.e. fwrite(3) buffers the writes in userspace to minimise the number of write(2) syscalls, and the problem goes away. However, there is a small change we can make to improve the situation - removing the expensive lock operation on the change counter update. All inode version counter changes in XFS occur under the ip->i_ilock during a transaction, and therefore we don't actually need the spin lock that provides exclusive access to it through inc_inode_iversion(). Hence avoid the lock and just open code the increment ourselves when logging the inode. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
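A hedged sketch of the open-coded bump when the inode core is logged; the field names follow that era's VFS inode and XFS in-core inode, but the fragment is illustrative rather than a reproduction of the patch:

    /*
     * The change counter is only ever modified under ip->i_ilock inside a
     * transaction, so exclusive access is already guaranteed; bump it
     * directly instead of paying for the version-counter spinlock.
     */
    if (flags & XFS_ILOG_CORE) {
            VFS_I(ip)->i_version++;
            ip->i_d.di_changecount = VFS_I(ip)->i_version;
    }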
-
By Dave Chinner
v5 filesystems use 512 byte inodes as a minimum, so they read inodes in clusters that are effectively half the size of those on a v4 filesystem with 256 byte inodes. For v5 filesystems, scale the inode cluster size with the size of the inode so that we keep a constant 32 inodes per cluster ratio for all inode IO. This only works if mkfs.xfs sets the inode alignment appropriately for larger inode clusters, so this functionality is made conditional on mkfs doing the right thing. xfs_repair needs to know about the inode alignment changes, too.

Wall time:
              create  bulkstat  find+stat  ls -R   unlink
    v4        237s    161s      173s       201s    299s
    v5        235s    163s      205s       31s     356s
    patched   234s    160s      182s       29s     317s

System time:
              create  bulkstat  find+stat  ls -R   unlink
    v4        2601s   2490s     1653s      1656s   2960s
    v5        2637s   2497s     1681s      20s     3216s
    patched   2613s   2451s     1658s      20s     3007s

So, wall time same or down across the board, system time same or down across the board, and cache hit rates all improve except for the ls -R case, which is a pure cold cache directory read workload on v5 filesystems... So, this patch removes most of the performance and CPU usage differential between v4 and v5 filesystems on traversal related workloads. Note: while this patch is currently for v5 filesystems only, there is no reason it can't be ported back to v4 filesystems. This hasn't been done here because bringing the code back to v4 requires forwards and backwards kernel compatibility testing, i.e. to determine if older kernels(*) do the right thing with larger inode alignments while still only using 8k inode cluster sizes. None of this testing and validation on v4 filesystems has been done, so for the moment larger inode clusters are limited to v5 superblocks. (*) a current default config v4 filesystem should mount just fine on 2.6.23 (when lazy-count support was introduced), and so if we change the alignment emitted by mkfs without a feature bit then we have to make sure it works properly on all kernels since 2.6.23. And if we allow it to be changed when the lazy-count bit is not set, then it's all kernels since v2 logs were introduced that need to be tested for compatibility... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
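A hedged sketch of the mount-time scaling rule this describes: keep roughly 32 inodes per cluster, gated on a CRC-enabled superblock and on mkfs having set a large enough inode alignment. The field and macro names approximate that kernel's mount code rather than quoting the patch:

    /*
     * Scale the inode cluster size with the inode size so v5 filesystems
     * keep about the same inodes-per-cluster ratio as v4, but only if the
     * on-disk inode alignment set by mkfs covers the larger cluster.
     */
    if (xfs_sb_version_hascrc(&mp->m_sb)) {
            int     new_size = mp->m_inode_cluster_size;

            new_size *= mp->m_sb.sb_inodesize / XFS_DINODE_MIN_SIZE;
            if (mp->m_sb.sb_inoalignmt >= XFS_B_TO_FSBT(mp, new_size))
                    mp->m_inode_cluster_size = new_size;
    }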
-
By Mark Tinguely
xfs_trans_ijoin() activates the inode in a transaction and can also specify which lock to free when the transaction is committed or canceled. The xfs_bmap_add_attrfork() caller locks the inode and adds the lock to the transaction, but then also manually removes the lock. Change the routine so that it does not add the lock to the transaction and instead manually removes the lock on completion. While here, clean up the xfs_trans_cancel flags and goto names. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
-
- 13 November 2013, 1 commit
-
By Jan Kara
When there are processes heavily creating small files while sync(2) is running, it can easily happen that quite a few new files are created between the WB_SYNC_NONE and WB_SYNC_ALL passes of sync(2). That can happen especially if there are several busy filesystems (remember that sync traverses filesystems sequentially and waits in the WB_SYNC_ALL phase on one fs before starting it on another fs). Because the WB_SYNC_ALL pass is slow (e.g. it causes a transaction commit and cache flush for each inode in ext3), the resulting sync(2) times are rather large. The following script reproduces the problem:

    function run_writers
    {
            for (( i = 0; i < 10; i++ )); do
                    mkdir $1/dir$i
                    for (( j = 0; j < 40000; j++ )); do
                            dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
                    done &
            done
    }

    for dir in "$@"; do
            run_writers $dir
    done

    sleep 40
    time sync

Fix the problem by disregarding inodes dirtied after sync(2) was called in the WB_SYNC_ALL pass. To allow for this, sync_inodes_sb() now takes a time stamp of when the sync started, which is used for setting up the work for the flusher threads. To give some numbers, when the above script is run on two ext4 filesystems on a simple SATA drive, the average sync time from 10 runs is 267.549 seconds with standard deviation 104.799426. With the patched kernel, the average sync time from 10 runs is 2.995 seconds with standard deviation 0.096. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
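A hedged sketch of the core idea in fs/fs-writeback.c terms; the list and field names (wb->b_dirty, inode->dirtied_when, i_wb_list) follow that era's writeback code, but the fragment is illustrative and not the actual patch hunk, which threads the timestamp through the writeback work item instead:

    /*
     * sync(2) records when it started; during the WB_SYNC_ALL pass, any
     * inode first dirtied after that point is skipped, since sync(2) never
     * promised to write it and waiting for it only inflates the sync time.
     */
    unsigned long start_time = jiffies;     /* noted in sync_inodes_sb() */
    struct inode *inode;

    list_for_each_entry(inode, &wb->b_dirty, i_wb_list) {
            if (time_after(inode->dirtied_when, start_time))
                    continue;       /* dirtied after sync(2) began */
            /* ... queue the inode for WB_SYNC_ALL writeback ... */
    }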
-
- 07 November 2013, 3 commits
-
By Gu Zheng
Introduce a KM_ZERO flag, used to allocate zeroed memory, and convert kmem_{zone_}zalloc to call kmem_{zone_}alloc() with KM_ZERO directly, in order to avoid the separate zeroing step. And following Dave's suggestion, make kmem_{zone_}zalloc static inlines in kmem.h as they're now just simple wrappers. V2: make kmem_{zone_}zalloc static inline in kmem.h as Dave suggested. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
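A hedged sketch of the resulting wrappers in kmem.h; this follows the shape described in the message (kmem_alloc()/kmem_zone_alloc() honouring a KM_ZERO flag) and is close to, but not quoted from, the patch:

    static inline void *
    kmem_zalloc(size_t size, xfs_km_flags_t flags)
    {
            return kmem_alloc(size, flags | KM_ZERO);
    }

    static inline void *
    kmem_zone_zalloc(kmem_zone_t *zone, xfs_km_flags_t flags)
    {
            return kmem_zone_alloc(zone, flags | KM_ZERO);
    }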
-
By Dave Chinner
To help track down AGI/AGF lock ordering issues, I added these tracepoints to tell us when an AGI or AGF is read and locked. With these we can now determine if the lock ordering goes wrong from tracing captures. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Dave Chinner
While debugging a log tail issue on a RHEL6 kernel, I added these trace points to trace log items being added, moved and removed in the AIL, and how that affected the log tail LSN that was written to the log. They were very helpful in that they immediately identified the cause of the problem being seen. Hence I'd like to always have them available for use. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
-
- 05 November 2013, 1 commit
-
By Dave Chinner
Removing an inode from the namespace involves removing the directory entry and dropping the link count on the inode. Removing the directory entry can result in locking an AGF (directory blocks were freed) and removing a link count can result in placing the inode on an unlinked list, which results in locking an AGI. The big problem here is that we have an ordering constraint on AGF and AGI locking - inode allocation locks the AGI, then can allocate a new extent for new inodes, locking the AGF after the AGI. Similarly, freeing the inode removes the inode from the unlinked list, requiring that we lock the AGI first, and then freeing the inode can result in an inode chunk being freed and hence freeing disk space, requiring that we lock an AGF. Hence the ordering that is imposed by other parts of the code is AGI before AGF. This means we cannot remove the directory entry before we drop the inode reference count and put it on the unlinked list, as this results in a lock order of AGF then AGI, and this can deadlock against inode allocation and freeing. Therefore we must drop the link counts before we remove the directory entry. This is still safe from a transactional point of view - it is not until we get to xfs_bmap_finish() that we have the possibility of multiple transactions in this operation. Hence as long as we remove the directory entry and drop the link count in the first transaction of the remove operation, there are no transactional constraints on the ordering here. Change the ordering of the operations in the xfs_remove() function to align the ordering of AGI and AGF locking to match that of the rest of the code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
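A hedged sketch of the reordering inside xfs_remove(): the link count drop (which may touch the AGI via the unlinked list) is issued before the directory entry removal (which may touch an AGF when freeing directory blocks). The function names exist in that kernel, but the error labels and the directory/is_dir handling are simplified here.

    /*
     * Drop the link count first: this may place the inode on the AGI
     * unlinked list (AGI lock).  Only then remove the directory entry,
     * which may free directory blocks (AGF lock).  This matches the global
     * AGI -> AGF lock ordering.
     */
    error = xfs_droplink(tp, ip);
    if (error)
            goto out_trans_cancel;

    error = xfs_dir_removename(tp, dp, name, ip->i_ino,
                               &first_block, &free_list, resblks);
    if (error)
            goto out_bmap_cancel;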
-
- 01 November 2013, 1 commit
-
By Jie Liu
In xfs_iext_add(), if extent(s) are being appended to the last page in the indirection array and the new extent(s) don't fit in the page, the number of extents (erp->er_extcount) in a newly allocated entry should be the minimum of count and XFS_LINEAR_EXTS, instead of count. For now, there is no existing test case that demonstrates a problem with er_extcount being set incorrectly here, but it obviously looks like a bug. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
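The corrected assignment, in sketch form; whether the actual patch uses the kernel's MIN() or min_t() spelling is not reproduced here:

    /*
     * A freshly added indirection-array entry can hold at most
     * XFS_LINEAR_EXTS extents, so clamp the count stored in it.
     */
    erp->er_extcount = MIN(count, XFS_LINEAR_EXTS);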
-
- 31 October 2013, 10 commits
-
By Eric Sandeen
Today, if xfs_sb_read_verify encounters a v4 superblock with junk past the v4 fields, which includes data in sb_crc, it will be treated as a failing checksum and a significant corruption. There are known prior bugs which leave junk at the end of the V4 superblock; we don't need to actually fail the verification in this case if other checks pan out ok. So if this is a secondary superblock, and the primary superblock doesn't indicate that this is a V5 filesystem, don't treat this as an actual checksum failure. We should probably check the garbage condition as we do in xfs_repair, and possibly warn about it or self-heal, but that's a different scope of work. Stable folks: this can go back to v3.10, which is what introduced the sb CRC checking that is tripped up by old, stale, incorrect V4 superblocks with unzeroed bits. Cc: stable@vger.kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Geyslan G. Bem
In xlog_verify_iclog a debug check of the incore log buffers prints an error if icptr is null and then goes on to dereference the pointer regardless. Convert this to an assert so that the intention is clear. This was reported by Coverity. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
-
By Denis Efremov
The ASSERT on args takes place after args has already been dereferenced. This assertion is redundant since we are going to panic anyway. Found by the Linux Driver Verification project (linuxtesting.org) - PVS-Studio analyzer. Signed-off-by: Denis Efremov <yefremov.denis@gmail.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Dave Chinner
Page cache allocation doesn't always go through ->write_begin, and hence we don't always get the opportunity to set the allocation context to GFP_NOFS. Failing to do this means we open up the direct reclaim stack to recurse into the filesystem and consume a significant amount of stack. On RHEL6.4 kernels we are seeing ra_submit() and generic_file_splice_read() from an nfsd context recursing into the filesystem via the inode cache shrinker and evicting inodes. This is causing truncation to be run (e.g. EOF block freeing) and causing bmap btree block merges and free space btree block splits to occur. These btree manipulations are occurring with the call chain already 30 functions deep, and hence there is not enough stack space to complete such operations. To avoid these specific overruns, we need to prevent the page cache allocation from recursing via direct reclaim. We can do that because the allocation functions take the allocation context from that which is stored in the mapping for the inode. We don't set that right now, so the default is GFP_HIGHUSER_MOVABLE, which is effectively a GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so when we initialise an inode, set the mapping gfp mask appropriately. This makes the use of AOP_FLAG_NOFS redundant from other parts of the XFS IO path, so get rid of it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
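A minimal sketch of the inode-initialisation change this describes; mapping_gfp_mask()/mapping_set_gfp_mask() are the standard pagemap helpers, and the exact mask manipulation in the patch may differ:

    /*
     * Clear __GFP_FS from the address_space allocation mask so page cache
     * allocations made on behalf of this inode behave like GFP_NOFS and
     * cannot recurse into the filesystem through direct reclaim.
     */
    struct inode    *inode = VFS_I(ip);

    mapping_set_gfp_mask(inode->i_mapping,
                         mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS);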
-
By Dave Chinner
The kbuild test robot indicated that there were some new sparse warnings in fs/xfs/xfs_dquot_buf.c. Actually, there were a lot more that it wasn't warning about, so fix them all up. Reported-by: kbuild test robot Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Dave Chinner
The directory block format verifier fails to check that the leaf entry count is in a valid range, and so if it is corrupted then it can lead to dereferencing a pointer outside the block buffer. While we can't exactly validate the count without first walking the directory block, we can ensure the count lands in the valid area within the directory block and hence avoid out-of-block references. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Dave Chinner
Rather than hiding the ftype field size accounting inside the dirent padding for the ".." and first entry offset functions for v2 directory formats, add explicit functions that calculate it correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Dave Chinner
Many of the vectorised function calls now take no parameters and return a constant value. There is no reason for these to be vectored functions, so convert them to constants.

Binary sizes:
       text    data     bss     dec     hex filename
     794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
     792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
     792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
     789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
     789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
     789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
     789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6
     791421   96802    1096  889319   d91e7 fs/xfs/xfs.o.p7
     791701   96802    1096  889599   d92ff fs/xfs/xfs.o.p8
     791205   96802    1096  889103   d91cf fs/xfs/xfs.o.p9

Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Dave Chinner
The next step in the vectorisation process is the directory free block encode/decode operations. There are relatively few of these, though there are quite a number of calls to them.

Binary sizes:
       text    data     bss     dec     hex filename
     794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
     792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
     792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
     789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
     789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
     789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
     789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6
     791421   96802    1096  889319   d91e7 fs/xfs/xfs.o.p7
     791701   96802    1096  889599   d92ff fs/xfs/xfs.o.p8

Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-
By Dave Chinner
Conversion from on-disk structures to in-core header structures currently relies on magic number checks. If the magic number is wrong, but one of the supported values, we do the wrong thing with the encode/decode operation. Split these functions so that there are discrete operations for the specific directory format we are handling. In doing this, move all the header encode/decode functions to xfs_da_format.c as they are directly manipulating the on-disk format. It should be noted that all the growth in binary size is from xfs_da_format.c - the rest of the code actually shrinks.

       text    data     bss     dec     hex filename
     794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
     792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
     792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
     789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
     789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
     789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
     789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6
     791421   96802    1096  889319   d91e7 fs/xfs/xfs.o.p7

Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
-