- 04 June 2009, 1 commit
-
-
Submitted by Jan Kara

It is not possible to take a read lock and then try to take the same write lock in one thread, as that can block on a downconvert requested by another node, leading to deadlock. So first drop the quota lock held for reading and only after that take it for writing.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
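The pattern here, drop the shared lock before taking the exclusive one rather than upgrading in place, can be illustrated outside the kernel. A minimal userspace sketch using POSIX rwlocks (not ocfs2's cluster locks; the function names are hypothetical):

```c
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t quota_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Hypothetical stand-in for the quota update. Caller holds the read
 * lock on entry; this function drops it and returns with no lock held. */
static void update_quota(void)
{
	/* Wrong: calling pthread_rwlock_wrlock() while still holding the
	 * read lock may deadlock (and in the cluster case, block on a
	 * downconvert requested by another node). So: */
	pthread_rwlock_unlock(&quota_lock);   /* drop the read lock first */
	pthread_rwlock_wrlock(&quota_lock);   /* ...then take it for writing */
	printf("quota updated under write lock\n");
	pthread_rwlock_unlock(&quota_lock);
}

int main(void)
{
	pthread_rwlock_rdlock(&quota_lock);
	/* ... read quota state ... */
	update_quota();
	return 0;
}
```

Note that any state read under the read lock must be revalidated after the write lock is acquired, since another thread may have slipped in between the two.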
-
- 06 May 2009, 2 commits
-
-
Submitted by Coly Li

In the mainline ocfs2 code, the masklog interface lives in files under /sys/fs/o2cb/masklog, but the comments in fs/ocfs2/cluster/masklog.h still reference the old /proc interface; they are out of date. This patch updates the comments in cluster/masklog.h and also provides a bash script example showing how to change the log mask bits.

Signed-off-by: Coly Li <coly.li@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
Submitted by Tao Ma

Currently the kernel defines XATTR_LIST_MAX as 65536 in include/linux/limits.h. This is the largest buffer used for listing xattrs, but with the ocfs2 xattr tree we actually have no limit on the number of xattrs. If the filesystem has more names than fit in the buffer, the kernel logs get polluted with messages like the following when listing:

(27738,0):ocfs2_iterate_xattr_buckets:3158 ERROR: status = -34
(27738,0):ocfs2_xattr_tree_list_index_block:3264 ERROR: status = -34

So don't print an "ERROR" message here, as this is not an ocfs2 error.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 01 May 2009, 1 commit
-
-
Submitted by Joel Becker

The ocfs2 directory index updates two blocks when we remove an entry: the dx root and the dx leaf. OCFS2_DELETE_INODE_CREDITS was only accounting for the dx leaf. This shows up when ocfs2_delete_inode() runs out of credits in jbd2_journal_dirty_metadata() at "J_ASSERT_JH(jh, handle->h_buffer_credits > 0);".

The test that caught this was running dirop_file_racer from the ocfs2-test suite with a 250-character filename PREFIX. Run on a 512B blocksize, it forces the orphan dir index to grow large enough to trigger.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
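The fix is a bookkeeping change: the credits total must count every block the operation dirties. A hedged sketch of the accounting idea; the real macro is ocfs2's OCFS2_DELETE_INODE_CREDITS, but the per-block cost names and values below are invented for illustration:

```c
#include <stdio.h>

/* Hypothetical per-block journal costs, for illustration only. */
#define INODE_UPDATE_CREDITS	1
#define DX_ROOT_UPDATE_CREDITS	1	/* the block that was missing from the total */
#define DX_LEAF_UPDATE_CREDITS	1

/* An index-entry removal dirties the dx root *and* the dx leaf. */
#define DELETE_INODE_CREDITS \
	(INODE_UPDATE_CREDITS + DX_ROOT_UPDATE_CREDITS + DX_LEAF_UPDATE_CREDITS)

int main(void)
{
	/* A journal handle started with too few credits trips the
	 * J_ASSERT_JH(jh, handle->h_buffer_credits > 0) assertion later,
	 * when the extra dirtied block is submitted. */
	printf("reserve %d credits per delete\n", DELETE_INODE_CREDITS);
	return 0;
}
```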
-
- 30 April 2009, 1 commit
-
-
Submitted by Tao Ma

With indexed directories enabled, we now use ocfs2_dir_lookup_result to wrap all the buffer_heads used for directory access, so remove the two unused variables.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 24 April 2009, 1 commit
-
-
Submitted by Sunil Mushran

In ocfs2_dentry_attach_lock(), if we are unable to get the dentry lock, we need to call iput(inode), because a failure here means no d_instantiate(), which means the normally matching iput() will not be called during dput(dentry).

This patch fixes the oops that accompanies the following message:

(3996,1):dlm_empty_lockres:2708 ERROR: lockres W00000000000000000a1046b06a4382 still has local locks!
kernel BUG in dlm_empty_lockres at /rpmbuild/smushran/BUILD/ocfs2-1.4.2/fs/ocfs2/dlm/dlmmaster.c:2709!

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
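The underlying rule, every reference taken must be dropped on the failure path, can be modeled in plain C. A minimal sketch with a hypothetical refcounted object (not the kernel's inode API):

```c
#include <stdio.h>

struct obj { int refcount; };

static void obj_get(struct obj *o) { o->refcount++; }
static void obj_put(struct obj *o) { o->refcount--; }

/* Hypothetical lock attach that can fail. */
static int attach_lock(struct obj *o) { (void)o; return -1; /* simulate failure */ }

static int instantiate(struct obj *o)
{
	obj_get(o);		/* reference handed to the dentry on success */
	if (attach_lock(o) < 0) {
		obj_put(o);	/* failure: nobody will drop it for us later */
		return -1;
	}
	return 0;
}

int main(void)
{
	struct obj o = { .refcount = 0 };

	instantiate(&o);
	printf("refcount after failed attach: %d\n", o.refcount); /* 0: no leak */
	return 0;
}
```

Without the obj_put() on the error path, the count stays at 1 forever, which is exactly the leaked local lock the dlm assertion complains about.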
-
- 22 April 2009, 2 commits
-
-
Submitted by Joel Becker

The old %llu vs. u64 battle. Cast them correctly.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
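For reference, the usual fix: u64 is unsigned long long on some architectures and unsigned long on others, so a %llu format needs an explicit cast. A minimal sketch, with a hypothetical block number standing in for the real values:

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t blkno = 1234567890123ULL;	/* stand-in for a u64 block number */

	/* Without the cast, platforms where the 64-bit type is
	 * unsigned long warn about a format mismatch against %llu. */
	printf("block %llu\n", (unsigned long long)blkno);
	return 0;
}
```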
-
Submitted by Tao Ma

fs/ocfs2/dir.c: In function ‘ocfs2_extend_dir’:
fs/ocfs2/dir.c:2700: warning: ‘ret’ may be used uninitialized in this function
fs/ocfs2/suballoc.c: In function ‘ocfs2_get_suballoc_slot_bit’:
fs/ocfs2/suballoc.c:2216: warning: comparison is always true due to limited range of data type

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 15 April 2009, 1 commit
-
-
Submitted by Miklos Szeredi

Rearrange the locking of i_mutex on the destination and the call to ocfs2_rw_lock() so the locks are only held while buffers are copied with the pipe_to_file() actor, and not while waiting for more data on the pipe.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 08 April 2009, 1 commit
-
-
Submitted by Tao Ma

In ocfs2_expand_inline_dir, we calculate whether we need one extra cluster in case we can't store the dx inline in the root, and save the result in dx_alloc. So add it in when we call ocfs2_reserve_clusters.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
- 07 April 2009, 1 commit
-
-
Submitted by Miklos Szeredi

There's a possible deadlock in generic_file_splice_write(), splice_from_pipe() and ocfs2_file_splice_write():

- task A calls generic_file_splice_write()
- this calls inode_double_lock(), which locks i_mutex on both pipe->inode and the target inode
- ordering depends on the inode pointers; it can happen that pipe->inode is locked first
- __splice_from_pipe() needs more data and calls pipe_wait()
- this releases the lock on pipe->inode and goes to interruptible sleep
- task B calls generic_file_splice_write(), similarly to the first
- this locks pipe->inode, then tries to lock the target inode, but that is already held by task A
- task A is interrupted; it tries to lock pipe->inode, but fails, as it is already held by task B
- ABBA deadlock

Fix this by explicitly ordering the locks: the outer lock must be on the target inode and the inner lock (which is later unlocked and relocked) must be on pipe->inode. This is OK because pipe inodes and target inodes form two non-overlapping sets; generic_file_splice_write() and friends are never called with a target which is a pipe.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
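The fix is the classic cure for ABBA deadlocks: impose a fixed global lock order. A minimal userspace sketch with POSIX mutexes standing in for the two i_mutex locks (names are illustrative only):

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t target_inode_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pipe_inode_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Every writer takes the locks in the same fixed order: target first
 * (outer), pipe second (inner). The inner lock may be dropped and
 * retaken while waiting for data; the outer lock is held throughout. */
static void *splice_writer(void *arg)
{
	pthread_mutex_lock(&target_inode_mutex);	/* outer */
	pthread_mutex_lock(&pipe_inode_mutex);		/* inner */

	/* ... copy buffers; on "need more data", unlock and relock
	 * only the inner pipe lock, never the outer one ... */
	pthread_mutex_unlock(&pipe_inode_mutex);
	/* pipe_wait() would sleep here */
	pthread_mutex_lock(&pipe_inode_mutex);

	pthread_mutex_unlock(&pipe_inode_mutex);
	pthread_mutex_unlock(&target_inode_mutex);
	printf("writer %ld done\n", (long)arg);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, splice_writer, (void *)1L);
	pthread_create(&b, NULL, splice_writer, (void *)2L);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}
```

Because no task ever holds the inner lock while waiting for the outer one, the circular wait in the report above cannot form.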
-
- 04 April 2009, 29 commits
-
-
Submitted by Srinivas Eeda

During recovery, a node recovers orphans in its own slot and in the dead node(s). But if the dead nodes were holding orphans in offline slots, those are left unrecovered. If the dead node is the last one to die, is holding orphans in other slots, and is the first one to mount, then it only recovers its own slot, which leaves orphans in the offline slots. This patch queues complete_recovery to clean orphans in all offline slots during mount and node recovery.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Hisashi Hifumi

A page can have multiple buffers, and even if the page is not uptodate, some of its buffers can be, on a pagesize != blocksize setup. This aop checks whether all the buffers corresponding to the part of the file we want to read are uptodate. If so, we do not have to issue an actual read IO to disk even though the page itself is not uptodate, because the portion we want to read is. block_is_partially_uptodate is already used by ext2/3/4. With this patch, random read/write mixed workloads, and random reads after random writes, can be optimized for a performance improvement.

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
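The check itself is simple range logic over per-buffer flags. A userspace model of the idea (the kernel's real implementation is block_is_partially_uptodate() in fs/buffer.c; the types below are invented), assuming a 4K page holding four 1K buffers:

```c
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE	4096
#define BLOCK_SIZE	1024
#define BUFS_PER_PAGE	(PAGE_SIZE / BLOCK_SIZE)

/* Model: one uptodate flag per buffer in the page. */
struct page_model { bool buf_uptodate[BUFS_PER_PAGE]; };

/* Return true if every buffer overlapping [from, from + count) is
 * uptodate, so the read can be served without touching the disk. */
static bool partially_uptodate(const struct page_model *p,
			       unsigned from, unsigned count)
{
	unsigned first = from / BLOCK_SIZE;
	unsigned last = (from + count - 1) / BLOCK_SIZE;

	for (unsigned i = first; i <= last; i++)
		if (!p->buf_uptodate[i])
			return false;
	return true;
}

int main(void)
{
	struct page_model p = { .buf_uptodate = { true, true, false, false } };

	/* Read of the first 2K touches only uptodate buffers: no IO needed. */
	printf("[0, 2048):    %d\n", partially_uptodate(&p, 0, 2048));    /* 1 */
	/* Read crossing into buffer 2 still needs a real read. */
	printf("[1024, 3072): %d\n", partially_uptodate(&p, 1024, 2048)); /* 0 */
	return 0;
}
```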
-
Submitted by Wengang Wang

For nfs exporting, ocfs2_get_dentry() returns the dentry for an fh. ocfs2_get_dentry() may read from disk when the inode is not in memory, without taking any cluster lock, and this can lead to the filesystem loading a stale inode.

This patch fixes the problem as follows. When the inode is not in memory, we take the cluster lock (PR) of the alloc inode that the inode in question was allocated from (this makes the node on which the deletion was done sync the alloc inode) before reading the inode itself. Then we check the bitmap of the group the inode was allocated from to see whether the bit is clear. If it is clear, the inode is stale. If the bit is set, we then check the generation as the existing code does. We have to read the inode in question from disk first to learn its alloc slot and alloc bit; if it is not stale, we read it out using ocfs2_iget(), and that second read should then come from cache.

We also have to add a per-superblock nfs_sync_lock to cover the lock on the alloc inode and that on the inode in question, because ocfs2_get_dentry() and ocfs2_delete_inode() lock them in reverse order. nfs_sync_lock is taken in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(), so that multiple ocfs2_delete_inode() calls can run concurrently in the normal case.

[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
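The staleness test at the core of this reduces to a bitmap lookup plus a generation compare. A userspace model of the check (the bitmap helpers and layout here are invented for illustration, not ocfs2's on-disk format):

```c
#include <stdbool.h>
#include <stdio.h>
#include <errno.h>

/* Model of an inode allocation group's bitmap: bit set = allocated. */
struct alloc_group { unsigned long long bitmap; };

static bool bit_is_set(const struct alloc_group *g, unsigned bit)
{
	return (g->bitmap >> bit) & 1ULL;
}

/* Stale-inode check, run under the alloc inode's cluster lock: a clear
 * bit means the inode was freed on another node, so the handle is
 * stale. A set bit means we must still compare generations. */
static int check_inode_valid(const struct alloc_group *g, unsigned bit,
			     unsigned disk_gen, unsigned fh_gen)
{
	if (!bit_is_set(g, bit))
		return -ESTALE;		/* freed: handle is stale */
	if (disk_gen != fh_gen)
		return -ESTALE;		/* slot reused: handle is stale */
	return 0;			/* safe to load the inode */
}

int main(void)
{
	struct alloc_group g = { .bitmap = 0x5ULL };	/* bits 0 and 2 set */

	printf("bit 2, gen match: %d\n", check_inode_valid(&g, 2, 7, 7));
	printf("bit 1 (freed):    %d\n", check_inode_valid(&g, 1, 7, 7));
	return 0;
}
```

The cluster lock matters as much as the check: without forcing the deleting node to sync the alloc inode first, the bitmap read could itself be stale.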
-
Submitted by Sunil Mushran

The debugfs file mle_state now prints the largest number of mles in any one hash chain.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch attempts to fix a fine race between purging and migration.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch removes struct dlm_lock_name and adds its entries directly to struct dlm_master_list_entry. Under the new scheme, all mles, whether backed by a lockres or not, have the name populated in mle->mname. This allows us to get rid of the code that figured out the location of the mle name.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch shows the number of lockres' and mles in the debugfs file, dlm_state.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch inlines dlm_set_lockres_owner() and dlm_change_lockres_owner().

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch replaces the lockres counts that tracked the number of locally and remotely mastered lockres' with a current and a total count. The total count is the number of lockres' that have been created since the dlm domain was created. The locally and remotely mastered counts can be computed from the locking_state output.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

The lifetime of an mle is limited to the duration of the lockres mastery process. While this lifetime is typically fairly short, we have noticed the number of mles explode under certain circumstances. This patch tracks the number of each type of mle and should help us determine how best to speed up the mastery process.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

The previous patch explicitly did not indent dlm_cleanup_master_list() so as to keep that patch readable. This patch properly indents the function.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

With this patch, the mles are stored in a hash instead of a simple list. This should improve mle lookup time when the number of outstanding masteries is large.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
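The win is the usual list-to-hash one: lookup drops from a scan of every outstanding mle to a scan of one bucket's chain. A minimal userspace sketch of chained hashing keyed on the lock name (the structures and hash function are invented, not the dlm's):

```c
#include <stdio.h>
#include <string.h>

#define HASH_BUCKETS 128

struct mle {
	char name[32];
	struct mle *hash_next;	/* chain within one bucket */
};

static struct mle *master_hash[HASH_BUCKETS];

/* Toy string hash; a real implementation would use something like the
 * kernel's full_name_hash(). */
static unsigned hash_name(const char *name)
{
	unsigned h = 5381;

	while (*name)
		h = h * 33 + (unsigned char)*name++;
	return h % HASH_BUCKETS;
}

static void mle_insert(struct mle *m)
{
	unsigned b = hash_name(m->name);

	m->hash_next = master_hash[b];
	master_hash[b] = m;
}

/* O(chain length) instead of O(total mles). */
static struct mle *mle_lookup(const char *name)
{
	for (struct mle *m = master_hash[hash_name(name)]; m; m = m->hash_next)
		if (strcmp(m->name, name) == 0)
			return m;
	return NULL;
}

int main(void)
{
	struct mle a = { .name = "M000lockres1" };
	struct mle *found;

	mle_insert(&a);
	found = mle_lookup("M000lockres1");
	printf("found: %s\n", found ? found->name : "(none)");
	return 0;
}
```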
-
Submitted by Sunil Mushran

This patch adds code to create and destroy the dlm->master_hash.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch refactors dlm_clean_master_list() so as to make it easier to convert the mle list to a hash.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

For a master mle, the name is stored in the attached lockres, in a struct qstr. For block and migration mles, the name is stored inline in struct dlm_lock_name. This patch makes struct dlm_lock_name look like a struct qstr. While we could use struct qstr directly, we don't, because we want to avoid having to malloc and free the lockname string, as the mle's lifetime is fairly short.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
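A sketch of the two layouts contrasted: struct qstr carries a pointer to an externally allocated string, while the inline variant embeds a fixed-size buffer so no allocation is needed. Field names and the size cap below are illustrative, not the dlm's actual definitions:

```c
#include <stdio.h>

#define LOCKID_NAME_MAX 32	/* plausible fixed cap, for illustration */

/* qstr-like layout (simplified): the name lives elsewhere, so using
 * this for short-lived mles would force a malloc/free per mle. */
struct qstr_like {
	unsigned int hash;
	unsigned int len;
	const unsigned char *name;	/* pointer, storage external */
};

/* Inline variant, laid out to mirror the qstr fields so code reading
 * hash/len/name works on either; storage is embedded. */
struct lock_name_like {
	unsigned int hash;
	unsigned int len;
	unsigned char name[LOCKID_NAME_MAX];	/* inline buffer */
};

int main(void)
{
	printf("qstr-like: %zu bytes + external string\n",
	       sizeof(struct qstr_like));
	printf("inline:    %zu bytes, no allocation\n",
	       sizeof(struct lock_name_like));
	return 0;
}
```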
-
Submitted by Sunil Mushran

This patch encapsulates adding and removing of the mle from the dlm->master_list. This patch is part of the series of patches that converts the mle list to a mle hash.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Tao Ma

In ocfs2, the block group search looks for the "emptiest" group to allocate from, so if the allocator has many equally (or almost equally) empty groups, new block groups tend to get spread out amongst them. We therefore add osb_inode_alloc_group to ocfs2_super to record the last used inode allocation group. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. I have done some basic testing and the results are a ten-times improvement on some cold-cache stat workloads.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Tao Ma

Inode groups used to be allocated from the local alloc file, but since we want all inodes to be contiguous enough, we now try to allocate them directly from the global_bitmap.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Tao Ma

In ocfs2, the inode block search looks for the "emptiest" inode group to allocate from. So if an inode alloc file has many equally (or almost equally) empty groups, new inodes tend to get spread out amongst them, which in turn can put them all over the disk. This is undesirable because directory operations on conceptually "nearby" inodes then force a large number of seeks.

So we add ip_last_used_group to core directory inodes, recording the last used allocation group, and another field, ip_last_used_slot, in case inode stealing happens. When claiming a new inode, we pass in the directory's inode so the allocation can use this information. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
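The strategy is a classic allocation hint: try the group the parent directory last allocated from before falling back to the global search. A userspace sketch of the idea (group layout and names are invented):

```c
#include <stdio.h>

#define NUM_GROUPS 4

static int group_free[NUM_GROUPS] = { 10, 50, 50, 50 };

/* Per-directory hint: last group we allocated an inode from. */
struct dir_hint { int last_used_group; };

static int alloc_inode_group(struct dir_hint *d)
{
	/* Try the hint first, keeping sibling inodes physically close... */
	if (d->last_used_group >= 0 && group_free[d->last_used_group] > 0)
		return d->last_used_group;

	/* ...otherwise fall back to the "emptiest group" search. */
	int best = 0;
	for (int g = 1; g < NUM_GROUPS; g++)
		if (group_free[g] > group_free[best])
			best = g;
	d->last_used_group = best;	/* remember for next time */
	return best;
}

int main(void)
{
	struct dir_hint d = { .last_used_group = -1 };

	for (int i = 0; i < 3; i++) {
		int g = alloc_inode_group(&d);
		group_free[g]--;
		printf("inode %d -> group %d\n", i, g);
	}
	/* All three land in the same group instead of spreading out. */
	return 0;
}
```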
-
Submitted by Mark Fasheh

ocfs2_dx_dir_rebalance() is passed the block offset of a dx leaf which needs rebalancing. Since we rebalance an entire cluster at a time, however, this function needs to calculate the start of that cluster, in blocks. The calculation was wrong, which would result in a read of non-leaf blocks. Fix it by adding ocfs2_block_to_cluster_start(), which is a more straightforward way of determining this.

Reported-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
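Rounding a block number down to its cluster boundary is a shift exercise, since both sizes are powers of two. A sketch of the calculation (the real helper's signature in ocfs2 may differ; the geometry below is illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* Example geometry: 512-byte blocks, 4K clusters -> 8 blocks/cluster. */
#define BLOCK_BITS		9
#define CLUSTER_BITS		12
#define BLKS_PER_CLUSTER_BITS	(CLUSTER_BITS - BLOCK_BITS)

/* Round a block offset down to the first block of its cluster by
 * clearing the low bits. */
static uint64_t block_to_cluster_start(uint64_t block)
{
	return (block >> BLKS_PER_CLUSTER_BITS) << BLKS_PER_CLUSTER_BITS;
}

int main(void)
{
	/* Block 21 sits in the cluster that begins at block 16. */
	printf("%llu\n", (unsigned long long)block_to_cluster_start(21));
	return 0;
}
```

Getting this rounding wrong means reading whatever blocks happen to precede the leaf, which is exactly the non-leaf read the patch describes.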
-
Submitted by Mark Fasheh

ocfs2_empty_dir() is far more expensive than checking the link count. Since both need to be checked at the same time, we can improve performance by checking the link count first.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Mark Fasheh

Since the disk format is finalized, we can set this feature bit in the supported mask.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
-
Submitted by Mark Fasheh

This little bit of extra accounting speeds up ocfs2_empty_dir() dramatically by allowing us to short-circuit the full directory scan.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Mark Fasheh

Since we've now got a directory format capable of handling a large number of entries, we can increase the maximum link count supported. This only gets increased if the directory indexing feature is turned on.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
-
Submitted by Mark Fasheh

The only operation which doesn't get faster with directory indexing is insert, which still has to walk the entire unindexed portion of the directory to find a free block. This patch improves directory insert performance by maintaining a singly linked list of directory leaf blocks which have space for additional dirents.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
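The idea is a free-space list: instead of scanning every directory block on insert, walk a chain of only those blocks known to have room. A userspace sketch (structure and names invented for illustration):

```c
#include <stdio.h>
#include <stddef.h>

struct dir_block {
	int free_bytes;			/* space left for new dirents */
	struct dir_block *free_next;	/* next block with free space */
};

/* Insert walks only the free list, not the whole directory. */
static struct dir_block *find_space(struct dir_block *free_list, int need)
{
	for (struct dir_block *b = free_list; b; b = b->free_next)
		if (b->free_bytes >= need)
			return b;
	return NULL;	/* no room anywhere: caller extends the directory */
}

int main(void)
{
	struct dir_block b2 = { .free_bytes = 12, .free_next = NULL };
	struct dir_block b1 = { .free_bytes = 40, .free_next = &b2 };

	struct dir_block *hit = find_space(&b1, 24);
	printf("found block with %d free bytes\n", hit ? hit->free_bytes : -1);
	return 0;
}
```

The maintenance cost is small: a block joins the list when a dirent is deleted from it and leaves when it fills up, so the list stays proportional to the actual free space.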
-
Submitted by Mark Fasheh

Allow us to store a small number of directory index records in the ocfs2_dx_root_block. This saves a disk read on small to medium-sized directories (fewer than about 250 entries). The inline root is automatically turned into a root block with extents if the directory grows beyond its capacity.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
-
Submitted by Mark Fasheh

This patch makes use of ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value and a pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant-time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
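Each leaf record is tiny: a name hash plus the block where the real dirent lives. A sketch of the record shape and the two-step lookup it enables (a simplified illustration, not ocfs2's on-disk format or its ext3-derived hash):

```c
#include <stdio.h>
#include <stdint.h>

/* Simplified index record: hash of the name -> block holding the dirent. */
struct dx_entry {
	uint32_t name_hash;
	uint64_t dirent_blk;
};

/* Toy name hash standing in for the real one. */
static uint32_t name_hash(const char *name)
{
	uint32_t h = 0;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return h;
}

int main(void)
{
	/* A tiny "leaf" of index records. */
	struct dx_entry leaf[] = {
		{ name_hash("alpha"), 100 },
		{ name_hash("beta"),  104 },
	};

	/* Lookup: hash the name, find the matching record, then read
	 * only that one unindexed block to match the actual dirent. */
	uint32_t h = name_hash("beta");
	for (size_t i = 0; i < sizeof(leaf) / sizeof(leaf[0]); i++)
		if (leaf[i].name_hash == h) {
			printf("'beta' dirent in block %llu\n",
			       (unsigned long long)leaf[i].dirent_blk);
			break;
		}
	return 0;
}
```

Because records are fixed-length and small, a single leaf block indexes many dirents, and hash collisions only cost a scan of one unindexed block.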
-
Submitted by Mark Fasheh

Many directory manipulation calls pass around a tuple of a dirent and its containing buffer_head. Dir indexing has a bit more state, but instead of adding yet more arguments to functions, we introduce 'struct ocfs2_dir_lookup_result'. In this patch it simply holds the same tuple, but future patches will add more state.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
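The refactoring pattern is plain: bundle a widening argument tuple into a result struct so new state can be added later without touching every signature. A sketch with placeholder fields (not ocfs2's real definitions):

```c
#include <stdio.h>

struct buffer_head;	/* opaque here */
struct dirent_model { const char *name; };

/* One result object instead of (dirent, bh) pairs threaded through
 * every directory call; later patches can add index state here. */
struct dir_lookup_result {
	struct dirent_model *dl_entry;	/* the found dirent */
	struct buffer_head *dl_leaf_bh;	/* block it lives in */
};

/* One place to drop whatever references the lookup took. */
static void release_lookup(struct dir_lookup_result *res)
{
	res->dl_entry = NULL;
	res->dl_leaf_bh = NULL;
}

int main(void)
{
	struct dirent_model de = { .name = "file" };
	struct dir_lookup_result res = { .dl_entry = &de, .dl_leaf_bh = NULL };

	printf("found: %s\n", res.dl_entry->name);
	release_lookup(&res);
	return 0;
}
```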
-
Submitted by Sunil Mushran

This patch removes the debugfs file local_alloc_stats, as that information is now included in the fs_state debugfs file.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-