- 22 5月, 2011 1 次提交
-
-
由 Chris Mason 提交于
Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 21 5月, 2011 1 次提交
-
-
由 Miao Xie 提交于
Changelog V5 -> V6: - Fix oom when the memory load is high, by storing the delayed nodes into the root's radix tree, and letting btrfs inodes go. Changelog V4 -> V5: - Fix the race on adding the delayed node to the inode, which is spotted by Chris Mason. - Merge Chris Mason's incremental patch into this patch. - Fix deadlock between readdir() and memory fault, which is reported by Itaru Kitayama. Changelog V3 -> V4: - Fix nested lock, which is reported by Itaru Kitayama, by updating space cache inode in time. Changelog V2 -> V3: - Fix the race between the delayed worker and the task which does delayed items balance, which is reported by Tsutomu Itoh. - Modify the patch address David Sterba's comment. - Fix the bug of the cpu recursion spinlock, reported by Chris Mason Changelog V1 -> V2: - break up the global rb-tree, use a list to manage the delayed nodes, which is created for every directory and file, and used to manage the delayed directory name index items and the delayed inode item. - introduce a worker to deal with the delayed nodes. Compare with Ext3/4, the performance of file creation and deletion on btrfs is very poor. the reason is that btrfs must do a lot of b+ tree insertions, such as inode item, directory name item, directory name index and so on. If we can do some delayed b+ tree insertion or deletion, we can improve the performance, so we made this patch which implemented delayed directory name index insertion/deletion and delayed inode update. Implementation: - introduce a delayed root object into the filesystem, that use two lists to manage the delayed nodes which are created for every file/directory. One is used to manage all the delayed nodes that have delayed items. And the other is used to manage the delayed nodes which is waiting to be dealt with by the work thread. - Every delayed node has two rb-tree, one is used to manage the directory name index which is going to be inserted into b+ tree, and the other is used to manage the directory name index which is going to be deleted from b+ tree. - introduce a worker to deal with the delayed operation. This worker is used to deal with the works of the delayed directory name index items insertion and deletion and the delayed inode update. When the delayed items is beyond the lower limit, we create works for some delayed nodes and insert them into the work queue of the worker, and then go back. When the delayed items is beyond the upper bound, we create works for all the delayed nodes that haven't been dealt with, and insert them into the work queue of the worker, and then wait for that the untreated items is below some threshold value. - When we want to insert a directory name index into b+ tree, we just add the information into the delayed inserting rb-tree. And then we check the number of the delayed items and do delayed items balance. (The balance policy is above.) - When we want to delete a directory name index from the b+ tree, we search it in the inserting rb-tree at first. If we look it up, just drop it. If not, add the key of it into the delayed deleting rb-tree. Similar to the delayed inserting rb-tree, we also check the number of the delayed items and do delayed items balance. (The same to inserting manipulation) - When we want to update the metadata of some inode, we cached the data of the inode into the delayed node. the worker will flush it into the b+ tree after dealing with the delayed insertion and deletion. - We will move the delayed node to the tail of the list after we access the delayed node, By this way, we can cache more delayed items and merge more inode updates. - If we want to commit transaction, we will deal with all the delayed node. - the delayed node will be freed when we free the btrfs inode. - Before we log the inode items, we commit all the directory name index items and the delayed inode update. I did a quick test by the benchmark tool[1] and found we can improve the performance of file creation by ~15%, and file deletion by ~20%. Before applying this patch: Create files: Total files: 50000 Total time: 1.096108 Average time: 0.000022 Delete files: Total files: 50000 Total time: 1.510403 Average time: 0.000030 After applying this patch: Create files: Total files: 50000 Total time: 0.932899 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.215732 Average time: 0.000024 [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3 Many thanks for Kitayama-san's help! Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dave@jikos.cz> Tested-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Tested-by: NItaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 18 5月, 2011 4 次提交
-
-
由 Joel Becker 提交于
configfs_readdir() will use the existing inode numbers of inodes in the dcache, but it makes them up for attribute files that aren't currently instantiated. There is a race where a closing attribute file can be tearing down at the same time as configfs_readdir() is trying to get its inode number. We want to get the inode number of open attribute files, because they should match while instantiated. We can't lock down the transition where dentry->d_inode is set to NULL, so we just check for NULL there. We can, however, ensure that an inode we find isn't iput() in configfs_d_iput() until after we've accessed it. Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
由 Joel Becker 提交于
When configfs is faking mkdir() on its subsystem or default group objects, it starts by adding a negative dentry. It then tries to instantiate the group. If that should fail, it must clean up after itself. I was using d_delete() here, but configfs_attach_group() promises to return an empty dentry on error. d_delete() explodes with the entry dentry. Let's try d_drop() instead. The unhashing is what we want for our dentry. Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
由 Jeff Layton 提交于
As Metze pointed out, commit 84cdf74e broke mapchars option: Commit "cifs: fix unaligned accesses in cifsConvertToUCS" (84cdf74e) does multiple steps in just one commit (moving the function and changing it without testing). put_unaligned_le16(temp, &target[j]); is never called for any codepoint the goes via the 'default' switch statement. As a result we put just zero (or maybe uninitialized) bytes into the target buffer. His proposed patch looks correct, but doesn't apply to the current head of the tree. This patch should also fix it. Cc: <stable@kernel.org> # .38.x: 581ade4d: cifs: clean up various nits in unicode routines (try #2) Reported-by: NStefan Metzmacher <metze@samba.org> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Jeff Layton 提交于
The is_path_accessible check uses a QPathInfo call, which isn't supported by ancient win9x era servers. Fall back to an older SMBQueryInfo call if it fails with the magic error codes. Cc: stable@kernel.org Reported-and-Tested-by: NSandro Bonazzola <sandro.bonazzola@gmail.com> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
- 15 5月, 2011 5 次提交
-
-
由 Li Zefan 提交于
Steps to reproduce the bug: - Call FS_IOC_SETLFAGS ioctl with flags=FS_COMPR_FL - Call FS_IOC_SETFLAGS ioctl with flags=0 - Call FS_IOC_GETFLAGS ioctl, and you'll see FS_COMPR_FL is still set! Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
As we've added per file compression/cow support. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file COW in btrfs, but FS_NOCOW_FL is sufficient. The fact is we don't have corresponding BTRFS_INODE_COW flag. COW is default, and FS_NOCOW_FL can be used to switch off COW for a single file. If we mount btrfs with nodatacow, a newly created file will be set with the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the FS_NOCOW_FL flag. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 liubo 提交于
When a btrfs disk is created by mixed data & metadata option, it will have no pure data or pure metadata space info. In btrfs's for-linus branch, commit 78b1ea13838039cd88afdd62519b40b344d6c920 (Btrfs: fix OOPS of empty filesystem after balance) initializes space infos at the very beginning. The problem is this initialization does not take the mixed case into account, which will cause btrfs will easily get into ENOSPC in mixed case. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Daniel J Blueman 提交于
If posix_acl_from_xattr() returns an error code, a negative address is dereferenced causing an oops; fix by checking for error code first. Signed-off-by: NDaniel J Blueman <daniel.blueman@gmail.com> Reviewed-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 14 5月, 2011 8 次提交
-
-
由 Linus Torvalds 提交于
It's a hot function, and we're better off not mixing types in the mask calculations. The compiler just ends up mixing 16-bit and 32-bit operations, for no good reason. So do everything in 'unsigned int' rather than mixing 'unsigned int' masking with a 'umode_t' (16-bit) mode variable. This, together with the parent commit (47a150ed: "Cache user_ns in struct cred") makes acl_permission_check() much nicer. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sunil Mushran 提交于
During resource migration, if the target node were to die, the thread doing the migration spins until the target node is not removed from the domain map. This patch slows the spin by making the thread wait for the recovery to kick in. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
由 Sunil Mushran 提交于
Patch skips mount recovery for hard-ro mounts which otherwise leads to an oops. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Acked-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
由 Sunil Mushran 提交于
If o2hb finds unexpected values in the heartbeat slot, it prints a message "ERROR: Device "dm-6": another node is heartbeating in our slot!" This message could be misleading. This patch adds two more messages to help users better diagnose the problem. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Acked-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
由 Sunil Mushran 提交于
We have seen isolated cases (very few, I might add) of o2hb not detecting all live nodes on startup. One plausible reasoning for it is that other node had a hb io delay at the same time. The live threshold set at 2 (as low as it can be) could be increased to ameliorate the situation. But increasing the threshold directly affects mount time. Currently it takes around 5 secs to mount a volume in o2cb cluster with local heartbeat. Increasing the threshold will make mounts even slower. As the issue itself is rare, we have left things as they are for the local heartbeat mode. However we can improve the situation for global heartbeat mode as in that mode, we start the heartbeat much before the mount. This patch doubles the live threshold for the start of the first region in global heartbeat mode. Addresses internal Oracle bug#10635585. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Acked-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
由 Sunil Mushran 提交于
Patch fixes a bug in the o2dlm protocol negotiation in that it is using the builtin version rather than the negotiated version during the domain join. This causes join errors when a node having kernel >= 2.6.37 joins a cluster with nodes having kernels < 2.6.37. This only affects the o2cb cluster stack. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Reported-by: NJacek Stepniewski <Jacek.Stepniewski@agora.pl> Acked-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
由 Tristan Ye 提交于
In the case of removing a partial extent record which covers a hole, current punching-hole logic will try to remove more than the length of whole extent record, which leads to the failure of following assert(fs/ocfs2/alloc.c): 5507 BUG_ON(cpos < le32_to_cpu(rec->e_cpos) || trunc_range > rec_range); This patch tries to skip existing hole at the last attempt of removing a partial extent record, what's more, it also adds some necessary comments for better understanding of punching-hole codes. Signed-off-by: NTristan Ye <tristan.ye@oracle.com> Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
由 Marcus Meissner 提交于
CLANG found that there is a path that has data_ac uninitialized, this place 2917 /* This gets us the dx_root */ 2918 ret = ocfs2_reserve_new_metadata_blocks(osb, 1, &meta_ac); 2919 if (ret) { 3 Taking true branch 2920 mlog_errno(ret); 2921 goto out; 4 Control jumps to line 3168 2922 } Goes to the out: label without data_ac being initialized. Ciao, Marcus Signed-Off-By: NMarcus Meissner <meissner@suse.de> Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <jlbec@evilplan.org>
-
- 13 5月, 2011 5 次提交
-
-
由 Arne Jansen 提交于
In a multi device setup, the chunk allocator currently always allocates chunks on the devices in the same order. This leads to a very uneven distribution, especially with RAID1 or RAID10 and an uneven number of devices. This patch always sorts the devices before allocating, and allocates the stripes on the devices with the most available space, as long as there is enough space available. In a low space situation, it first tries to maximize striping. The patch also simplifies the allocator and reduces the checks for corner cases. The simplification is done by several means. First, it defines the properties of each RAID type upfront. These properties are used afterwards instead of differentiating cases in several places. Second, the old allocator defined a minimum stripe size for each block group type, tried to find a large enough chunk, and if this fails just allocates a smaller one. This is now done in one step. The largest possible chunk (up to max_chunk_size) is searched and allocated. Because we now have only one pass, the allocation of the map (struct map_lookup) is moved down to the point where the number of stripes is already known. This way we avoid reallocation of the map. We still avoid allocating stripes that are not a multiple of STRIPE_SIZE.
-
由 Arne Jansen 提交于
currently alloc_start is disregarded if the requested chunk size is bigger than (device size - alloc_start), but smaller than the device size. The only situation where I see this could have made sense was when a chunk equal the size of the device has been requested. This was possible as the allocator failed to take alloc_start into account when calculating the request chunk size. As this gets fixed by this patch, the workaround is not necessary anymore.
-
由 Arne Jansen 提交于
this function won't be used here anymore, so move it super.c where it is used for df-calculation
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
由 David Sterba 提交于
As per printk_ratelimit comment, it should not be used. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 12 5月, 2011 10 次提交
-
-
由 Arne Jansen 提交于
setting the readonly flag prevents writes in case an error is detected Signed-off-by: NArne Jansen <sensille@gmx.net>
-
由 Ilya Dryomov 提交于
btrfs scrub - make fixups sync, don't reuse fixup bios Fixups are already sync for csum failures, this patch makes them sync for EIO case as well. Fixups are now sharing pages with the parent sbio - instead of allocating a separate page to do a fixup we grab the page from the sbio buffer. Fixup bios are no longer reused. struct fixup is no longer needed, instead pass [sbio pointer, index]. Originally this was added to look at the possibility of sharing the code between drive swap and scrub, but it actually fixes a serious bug in scrub code where errors that could be corrected were ignored and reported as uncorrectable. btrfs scrub - restore bios properly after media errors The current code reallocates a bio after a media error. This is a temporary measure introduced in v3 after a serious problem related to bio reuse was found in v2 of scrub patchset. Basically we did not reset bv_offset and bv_len fields of the bio_vec structure. They are changed in case I/O error happens, for example, at offset 512 or 1024 into the page. Also bi_flags field wasn't properly setup before reusing the bio. Signed-off-by: NArne Jansen <sensille@gmx.net>
-
由 Jan Schmidt 提交于
adds ioctls necessary to start and cancel scrubs, to get current progress and to get info about devices to be scrubbed. Note that the scrub is done per-device and that the ioctl only returns after the scrub for this devices is finished or has been canceled. Signed-off-by: NArne Jansen <sensille@gmx.net>
-
由 Arne Jansen 提交于
This adds an initial implementation for scrub. It works quite straightforward. The usermode issues an ioctl for each device in the fs. For each device, it enumerates the allocated device chunks. For each chunk, the contained extents are enumerated and the data checksums fetched. The extents are read sequentially and the checksums verified. If an error occurs (checksum or EIO), a good copy is searched for. If one is found, the bad copy will be rewritten. All enumerations happen from the commit roots. During a transaction commit, the scrubs get paused and afterwards continue from the new roots. This commit is based on the series originally posted to linux-btrfs with some improvements that resulted from comments from David Sterba, Ilya Dryomov and Jan Schmidt. Signed-off-by: NArne Jansen <sensille@gmx.net>
-
由 Trond Myklebust 提交于
Currently, writebacks may end up recursing back into the filesystem due to GFP_KERNEL direct reclaims in the pnfs subsystem. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Andy Adamson 提交于
Prevents an infinite loop as list was never emptied. Signed-off-by: NAndy Adamson <andros@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Andy Adamson 提交于
Free the slot and resend the RPC with new session <slot#,seq#>. For nfs4_async_handle_error, return -EAGAIN and set the task->tk_status to 0 to restart the async rpc in the rpc_restart_call_prepare state which resets the slot. For nfs4_handle_exception, retrying a call that uses nfs4_call_sync will reset the slot via nfs41_call_sync_prepare. For open/close/lock/locku/delegreturn/layoutcommit/unlink/rename/write cachethis is true, so these operations will not trigger an NFS4ERR_RETRY_UNCACHED_REP. Signed-off-by: NAndy Adamson <andros@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Henry C Chang 提交于
We increments i_wrbuffer_ref when taking the Fb cap. This breaks the dirty page accounting and causes looping in __ceph_do_pending_vmtruncate, and ceph client hangs. This bug can be reproduced occasionally by running blogbench. Add a new field i_wb_ref to inode and dedicate it to Fb reference counting. Signed-off-by: NHenry C Chang <henry.cy.chang@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Henry C Chang 提交于
Signed-off-by: NHenry C Chang <henry.cy.chang@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Henry C Chang 提交于
The mds session, s, could be freed during ceph_put_mds_session. Move dout before ceph_put_mds_session. Signed-off-by: NHenry C Chang <henry.cy.chang@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 10 5月, 2011 6 次提交
-
-
由 Miklos Szeredi 提交于
Some cases (e.g. ecryptfs) can call ->dentry_revalidate with NULL nameidata. https://bugzilla.kernel.org/show_bug.cgi?id=34732 Tyler Hicks pointed out that this bug was introduced by commit e7c0a167 "fuse: make fuse_dentry_revalidate() RCU aware" Reported-by: NWitold Baryluk <baryluk@smp.if.uj.edu.pl> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Ryusuke Konishi 提交于
After having applied commit 9954e7af ("nilfs2: add free entries count only if clear bit operation succeeded"), a free routine of nilfs came to fall into an infinite loop, outputting the same message endlessly: nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed nilfs_palloc_freev: entry number 29497 already freed ... That patch broke the routine so that a loop counter is never updated in an abnormal state. This fixes the regression. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Dave Chinner 提交于
The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One is caused by a race condition in determining whether there is a psh in progress or not. The XFS_AIL_PUSHING_BIT is used to determine whether a push is currently in progress. When the AIL push work completes, it checked whether the target changed and cleared the PUSHING bit to allow a new push to be requeued. The race condition is as follows: Thread 1 push work smp_wmb() smp_rmb() check ailp->xa_target unchanged update ailp->xa_target test/set PUSHING bit does not queue clear PUSHING bit does not requeue Now that the push target is updated, new attempts to push the AIL will not trigger as the push target will be the same, and hence despite trying to push the AIL we won't ever wake it again. The fix is to ensure that the AIL push work clears the PUSHING bit before it checks if the target is unchanged. As a result, both push triggers operate on the same test/set bit criteria, so even if we race in the push work and miss the target update, the thread requesting the push will still set the PUSHING bit and queue the push work to occur. For safety sake, the same queue check is done if the push work detects the target change, though only one of the two will will queue new work due to the use of test_and_set_bit() checks. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> (cherry picked from commit e4d3c4a4)
-
由 Dave Chinner 提交于
The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems noticed was that updates of the push target are not 32 bit safe as the target is a 64 bit value. We cannot copy a 64 bit LSN without the possibility of corrupting the result when racing with another updating thread. We have function to do this update safely without needing to care about 32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when updating the AIL push target. Also move the reading of the target in the push work inside the AIL lock, and use XFS_LSN_CMP() for the unlocked comparison during work termination to close read holes as well. Signed-off-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> (cherry picked from commit fd5670f2)
-
由 Dave Chinner 提交于
The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. One of the problems discovered is a target mismatch between the item pushing loop and the target itself. The push trigger checks for the target increasing (i.e. new target > current) while the push loop only pushes items that have a LSN < current. As a result, we can get the situation where the push target is X, the items at the tail of the AIL have LSN X and they don't get pushed. The push work then completes thinking it is done, and cannot be restarted until the push target increases to >= X + 1. If the push target then never increases (because the tail is not moving), then we never run the push work again and we stall. Fix it by making sure log items with a LSN that matches the target exactly are pushed during the loop. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> (cherry picked from commit cb64026b)
-
由 Dave Chinner 提交于
The recent conversion of the xfsaild functionality to a work queue introduced a hard-to-hit log space grant hang. The main cause is a regression where a work exit path fails to clear the PUSHING state and recheck the target correctly. Make both exit paths do the same PUSHING bit clearing and target checking when the "no more work to be done" condition is hit. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> (cherry picked from commit ea35a200)
-