- 04 June 2009, 1 commit
-
-
Submitted by Jan Kara

It is not possible to take a read lock and then try to take the same write lock in one thread, as that can block on a downconvert requested by another node, leading to deadlock. So first drop the quota lock held for reading and only after that take it for writing.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
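The pattern here, drop the shared lock before taking the exclusive one rather than upgrading in place, can be illustrated outside the kernel. A minimal userspace sketch using POSIX rwlocks (not ocfs2's cluster locks; the function names are hypothetical):

```c
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t quota_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Hypothetical stand-in for the quota update. Caller holds the read
 * lock on entry; this function drops it and returns with no lock held. */
static void update_quota(void)
{
	/* Wrong: calling pthread_rwlock_wrlock() while still holding the
	 * read lock may deadlock (and in the cluster case, block on a
	 * downconvert requested by another node). So: */
	pthread_rwlock_unlock(&quota_lock);   /* drop the read lock first */
	pthread_rwlock_wrlock(&quota_lock);   /* ...then take it for writing */
	printf("quota updated under write lock\n");
	pthread_rwlock_unlock(&quota_lock);
}

int main(void)
{
	pthread_rwlock_rdlock(&quota_lock);
	/* ... read quota state ... */
	update_quota();
	return 0;
}
```

Note that any state read under the read lock must be revalidated after the write lock is acquired, since another thread may have slipped in between the two.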
-
- 06 May 2009, 2 commits
-
-
Submitted by Coly Li

In the mainline ocfs2 code, the masklog interface lives in files under /sys/fs/o2cb/masklog, but the comments in fs/ocfs2/cluster/masklog.h still reference the old /proc interface; they are out of date. This patch updates the comments in cluster/masklog.h and also provides a bash script example showing how to change the log mask bits.

Signed-off-by: Coly Li <coly.li@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
Submitted by Tao Ma

Currently the kernel defines XATTR_LIST_MAX as 65536 in include/linux/limits.h. This is the largest buffer used for listing xattrs, but with the ocfs2 xattr tree we actually have no limit on the number of xattrs. If the filesystem has more names than fit in the buffer, the kernel logs get polluted with messages like the following when listing:

(27738,0):ocfs2_iterate_xattr_buckets:3158 ERROR: status = -34
(27738,0):ocfs2_xattr_tree_list_index_block:3264 ERROR: status = -34

So don't print an "ERROR" message here, as this is not an ocfs2 error.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 01 May 2009, 1 commit
-
-
Submitted by Joel Becker

The ocfs2 directory index updates two blocks when we remove an entry: the dx root and the dx leaf. OCFS2_DELETE_INODE_CREDITS was only accounting for the dx leaf. This shows up when ocfs2_delete_inode() runs out of credits in jbd2_journal_dirty_metadata() at "J_ASSERT_JH(jh, handle->h_buffer_credits > 0);".

The test that caught this was running dirop_file_racer from the ocfs2-test suite with a 250-character filename PREFIX. Run on a 512B blocksize, it forces the orphan dir index to grow large enough to trigger.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
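The fix is a bookkeeping change: the credits total must count every block the operation dirties. A hedged sketch of the accounting idea; the real macro is ocfs2's OCFS2_DELETE_INODE_CREDITS, but the per-block cost names and values below are invented for illustration:

```c
#include <stdio.h>

/* Hypothetical per-block journal costs, for illustration only. */
#define INODE_UPDATE_CREDITS	1
#define DX_ROOT_UPDATE_CREDITS	1	/* the block that was missing from the total */
#define DX_LEAF_UPDATE_CREDITS	1

/* An index-entry removal dirties the dx root *and* the dx leaf. */
#define DELETE_INODE_CREDITS \
	(INODE_UPDATE_CREDITS + DX_ROOT_UPDATE_CREDITS + DX_LEAF_UPDATE_CREDITS)

int main(void)
{
	/* A journal handle started with too few credits trips the
	 * J_ASSERT_JH(jh, handle->h_buffer_credits > 0) assertion later,
	 * when the extra dirtied block is submitted. */
	printf("reserve %d credits per delete\n", DELETE_INODE_CREDITS);
	return 0;
}
```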
-
- 30 April 2009, 1 commit
-
-
Submitted by Tao Ma

With indexed directories enabled, we now use ocfs2_dir_lookup_result to wrap all the buffer_heads used for directory access, so remove the two unused variables.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 24 April 2009, 1 commit
-
-
Submitted by Sunil Mushran

In ocfs2_dentry_attach_lock(), if we are unable to get the dentry lock, we need to call iput(inode), because a failure here means no d_instantiate(), which means the normally matching iput() will not be called during dput(dentry).

This patch fixes the oops that accompanies the following message:

(3996,1):dlm_empty_lockres:2708 ERROR: lockres W00000000000000000a1046b06a4382 still has local locks!
kernel BUG in dlm_empty_lockres at /rpmbuild/smushran/BUILD/ocfs2-1.4.2/fs/ocfs2/dlm/dlmmaster.c:2709!

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
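The underlying rule, every reference taken must be dropped on the failure path, can be modeled in plain C. A minimal sketch with a hypothetical refcounted object (not the kernel's inode API):

```c
#include <stdio.h>

struct obj { int refcount; };

static void obj_get(struct obj *o) { o->refcount++; }
static void obj_put(struct obj *o) { o->refcount--; }

/* Hypothetical lock attach that can fail. */
static int attach_lock(struct obj *o) { (void)o; return -1; /* simulate failure */ }

static int instantiate(struct obj *o)
{
	obj_get(o);		/* reference handed to the dentry on success */
	if (attach_lock(o) < 0) {
		obj_put(o);	/* failure: nobody will drop it for us later */
		return -1;
	}
	return 0;
}

int main(void)
{
	struct obj o = { .refcount = 0 };

	instantiate(&o);
	printf("refcount after failed attach: %d\n", o.refcount); /* 0: no leak */
	return 0;
}
```

Without the obj_put() on the error path, the count stays at 1 forever, which is exactly the leaked local lock the dlm assertion complains about.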
-
- 22 April 2009, 2 commits
-
-
Submitted by Joel Becker

The old %llu vs. u64 battle. Cast them correctly.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
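For reference, the usual fix: u64 is unsigned long long on some architectures and unsigned long on others, so a %llu format needs an explicit cast. A minimal sketch, with a hypothetical block number standing in for the real values:

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t blkno = 1234567890123ULL;	/* stand-in for a u64 block number */

	/* Without the cast, platforms where the 64-bit type is
	 * unsigned long warn about a format mismatch against %llu. */
	printf("block %llu\n", (unsigned long long)blkno);
	return 0;
}
```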
-
Submitted by Tao Ma

fs/ocfs2/dir.c: In function ‘ocfs2_extend_dir’:
fs/ocfs2/dir.c:2700: warning: ‘ret’ may be used uninitialized in this function
fs/ocfs2/suballoc.c: In function ‘ocfs2_get_suballoc_slot_bit’:
fs/ocfs2/suballoc.c:2216: warning: comparison is always true due to limited range of data type

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 15 April 2009, 1 commit
-
-
Submitted by Miklos Szeredi

Rearrange the locking of i_mutex on the destination and the call to ocfs2_rw_lock() so the locks are only held while buffers are copied with the pipe_to_file() actor, and not while waiting for more data on the pipe.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 08 April 2009, 1 commit
-
-
Submitted by Tao Ma

In ocfs2_expand_inline_dir, we calculate whether we need one extra cluster in case we can't store the dx inline in the root, and save the result in dx_alloc. So add it in when we call ocfs2_reserve_clusters.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
- 07 April 2009, 1 commit
-
-
Submitted by Miklos Szeredi

There's a possible deadlock in generic_file_splice_write(), splice_from_pipe() and ocfs2_file_splice_write():

- task A calls generic_file_splice_write()
- this calls inode_double_lock(), which locks i_mutex on both pipe->inode and the target inode
- ordering depends on the inode pointers; it can happen that pipe->inode is locked first
- __splice_from_pipe() needs more data and calls pipe_wait()
- this releases the lock on pipe->inode and goes to interruptible sleep
- task B calls generic_file_splice_write(), similarly to the first
- this locks pipe->inode, then tries to lock the target inode, but that is already held by task A
- task A is interrupted; it tries to lock pipe->inode, but fails, as it is already held by task B
- ABBA deadlock

Fix this by explicitly ordering the locks: the outer lock must be on the target inode and the inner lock (which is later unlocked and relocked) must be on pipe->inode. This is OK because pipe inodes and target inodes form two non-overlapping sets; generic_file_splice_write() and friends are never called with a target which is a pipe.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
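The fix is the classic cure for ABBA deadlocks: impose a fixed global lock order. A minimal userspace sketch with POSIX mutexes standing in for the two i_mutex locks (names are illustrative only):

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t target_inode_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pipe_inode_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Every writer takes the locks in the same fixed order: target first
 * (outer), pipe second (inner). The inner lock may be dropped and
 * retaken while waiting for data; the outer lock is held throughout. */
static void *splice_writer(void *arg)
{
	pthread_mutex_lock(&target_inode_mutex);	/* outer */
	pthread_mutex_lock(&pipe_inode_mutex);		/* inner */

	/* ... copy buffers; on "need more data", unlock and relock
	 * only the inner pipe lock, never the outer one ... */
	pthread_mutex_unlock(&pipe_inode_mutex);
	/* pipe_wait() would sleep here */
	pthread_mutex_lock(&pipe_inode_mutex);

	pthread_mutex_unlock(&pipe_inode_mutex);
	pthread_mutex_unlock(&target_inode_mutex);
	printf("writer %ld done\n", (long)arg);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, splice_writer, (void *)1L);
	pthread_create(&b, NULL, splice_writer, (void *)2L);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}
```

Because no task ever holds the inner lock while waiting for the outer one, the circular wait in the report above cannot form.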
-
- 04 April 2009, 29 commits
-
-
Submitted by Srinivas Eeda

During recovery, a node recovers orphans in its own slot and in the dead node(s). But if the dead nodes were holding orphans in offline slots, those are left unrecovered. If the dead node is the last one to die, is holding orphans in other slots, and is the first one to mount, then it only recovers its own slot, which leaves orphans in the offline slots. This patch queues complete_recovery to clean orphans in all offline slots during mount and node recovery.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Hisashi Hifumi

A page can have multiple buffers, and even if the page is not uptodate, some of its buffers can be, on a pagesize != blocksize setup. This aop checks whether all the buffers corresponding to the part of the file we want to read are uptodate. If so, we do not have to issue an actual read IO to disk even though the page itself is not uptodate, because the portion we want to read is. block_is_partially_uptodate is already used by ext2/3/4. With this patch, random read/write mixed workloads, and random reads after random writes, can be optimized for a performance improvement.

Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
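The check itself is simple range logic over per-buffer flags. A userspace model of the idea (the kernel's real implementation is block_is_partially_uptodate() in fs/buffer.c; the types below are invented), assuming a 4K page holding four 1K buffers:

```c
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE	4096
#define BLOCK_SIZE	1024
#define BUFS_PER_PAGE	(PAGE_SIZE / BLOCK_SIZE)

/* Model: one uptodate flag per buffer in the page. */
struct page_model { bool buf_uptodate[BUFS_PER_PAGE]; };

/* Return true if every buffer overlapping [from, from + count) is
 * uptodate, so the read can be served without touching the disk. */
static bool partially_uptodate(const struct page_model *p,
			       unsigned from, unsigned count)
{
	unsigned first = from / BLOCK_SIZE;
	unsigned last = (from + count - 1) / BLOCK_SIZE;

	for (unsigned i = first; i <= last; i++)
		if (!p->buf_uptodate[i])
			return false;
	return true;
}

int main(void)
{
	struct page_model p = { .buf_uptodate = { true, true, false, false } };

	/* Read of the first 2K touches only uptodate buffers: no IO needed. */
	printf("[0, 2048):    %d\n", partially_uptodate(&p, 0, 2048));    /* 1 */
	/* Read crossing into buffer 2 still needs a real read. */
	printf("[1024, 3072): %d\n", partially_uptodate(&p, 1024, 2048)); /* 0 */
	return 0;
}
```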
-
Submitted by Wengang Wang

For nfs exporting, ocfs2_get_dentry() returns the dentry for an fh. ocfs2_get_dentry() may read from disk when the inode is not in memory, without taking any cluster lock, and this can lead to the filesystem loading a stale inode.

This patch fixes the problem as follows. When the inode is not in memory, we take the cluster lock (PR) of the alloc inode that the inode in question was allocated from (this makes the node on which the deletion was done sync the alloc inode) before reading the inode itself. Then we check the bitmap of the group the inode was allocated from to see whether the bit is clear. If it is clear, the inode is stale. If the bit is set, we then check the generation as the existing code does. We have to read the inode in question from disk first to learn its alloc slot and alloc bit; if it is not stale, we read it out using ocfs2_iget(), and that second read should then come from cache.

We also have to add a per-superblock nfs_sync_lock to cover the lock on the alloc inode and that on the inode in question, because ocfs2_get_dentry() and ocfs2_delete_inode() lock them in reverse order. nfs_sync_lock is taken in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(), so that multiple ocfs2_delete_inode() calls can run concurrently in the normal case.

[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
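The staleness test at the core of this reduces to a bitmap lookup plus a generation compare. A userspace model of the check (the bitmap helpers and layout here are invented for illustration, not ocfs2's on-disk format):

```c
#include <stdbool.h>
#include <stdio.h>
#include <errno.h>

/* Model of an inode allocation group's bitmap: bit set = allocated. */
struct alloc_group { unsigned long long bitmap; };

static bool bit_is_set(const struct alloc_group *g, unsigned bit)
{
	return (g->bitmap >> bit) & 1ULL;
}

/* Stale-inode check, run under the alloc inode's cluster lock: a clear
 * bit means the inode was freed on another node, so the handle is
 * stale. A set bit means we must still compare generations. */
static int check_inode_valid(const struct alloc_group *g, unsigned bit,
			     unsigned disk_gen, unsigned fh_gen)
{
	if (!bit_is_set(g, bit))
		return -ESTALE;		/* freed: handle is stale */
	if (disk_gen != fh_gen)
		return -ESTALE;		/* slot reused: handle is stale */
	return 0;			/* safe to load the inode */
}

int main(void)
{
	struct alloc_group g = { .bitmap = 0x5ULL };	/* bits 0 and 2 set */

	printf("bit 2, gen match: %d\n", check_inode_valid(&g, 2, 7, 7));
	printf("bit 1 (freed):    %d\n", check_inode_valid(&g, 1, 7, 7));
	return 0;
}
```

The cluster lock matters as much as the check: without forcing the deleting node to sync the alloc inode first, the bitmap read could itself be stale.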
-
Submitted by Sunil Mushran

The debugfs file mle_state now prints the largest number of mles in any one hash chain.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch attempts to fix a fine race between purging and migration.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch removes struct dlm_lock_name and adds its entries directly to struct dlm_master_list_entry. Under the new scheme, all mles, whether backed by a lockres or not, have the name populated in mle->mname. This allows us to get rid of the code that figured out the location of the mle name.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch shows the number of lockres' and mles in the debugfs file, dlm_state.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch inlines dlm_set_lockres_owner() and dlm_change_lockres_owner().

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch replaces the lockres counts that tracked the number of locally and remotely mastered lockres' with a current and a total count. The total count is the number of lockres' that have been created since the dlm domain was created. The locally and remotely mastered counts can be computed from the locking_state output.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

The lifetime of an mle is limited to the duration of the lockres mastery process. While this lifetime is typically fairly short, we have noticed the number of mles explode under certain circumstances. This patch tracks the number of each type of mle and should help us determine how best to speed up the mastery process.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

The previous patch explicitly did not indent dlm_cleanup_master_list() so as to keep that patch readable. This patch properly indents the function.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

With this patch, the mles are stored in a hash instead of a simple list. This should improve mle lookup time when the number of outstanding masteries is large.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
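The win is the usual list-to-hash one: lookup drops from a scan of every outstanding mle to a scan of one bucket's chain. A minimal userspace sketch of chained hashing keyed on the lock name (the structures and hash function are invented, not the dlm's):

```c
#include <stdio.h>
#include <string.h>

#define HASH_BUCKETS 128

struct mle {
	char name[32];
	struct mle *hash_next;	/* chain within one bucket */
};

static struct mle *master_hash[HASH_BUCKETS];

/* Toy string hash; a real implementation would use something like the
 * kernel's full_name_hash(). */
static unsigned hash_name(const char *name)
{
	unsigned h = 5381;

	while (*name)
		h = h * 33 + (unsigned char)*name++;
	return h % HASH_BUCKETS;
}

static void mle_insert(struct mle *m)
{
	unsigned b = hash_name(m->name);

	m->hash_next = master_hash[b];
	master_hash[b] = m;
}

/* O(chain length) instead of O(total mles). */
static struct mle *mle_lookup(const char *name)
{
	for (struct mle *m = master_hash[hash_name(name)]; m; m = m->hash_next)
		if (strcmp(m->name, name) == 0)
			return m;
	return NULL;
}

int main(void)
{
	struct mle a = { .name = "M000lockres1" };
	struct mle *found;

	mle_insert(&a);
	found = mle_lookup("M000lockres1");
	printf("found: %s\n", found ? found->name : "(none)");
	return 0;
}
```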
-
Submitted by Sunil Mushran

This patch adds code to create and destroy the dlm->master_hash.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

This patch refactors dlm_clean_master_list() so as to make it easier to convert the mle list to a hash.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Sunil Mushran

For a master mle, the name is stored in the attached lockres, in a struct qstr. For block and migration mles, the name is stored inline in struct dlm_lock_name. This patch makes struct dlm_lock_name look like a struct qstr. While we could use struct qstr directly, we don't, because we want to avoid having to malloc and free the lockname string, as the mle's lifetime is fairly short.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
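A sketch of the two layouts contrasted: struct qstr carries a pointer to an externally allocated string, while the inline variant embeds a fixed-size buffer so no allocation is needed. Field names and the size cap below are illustrative, not the dlm's actual definitions:

```c
#include <stdio.h>

#define LOCKID_NAME_MAX 32	/* plausible fixed cap, for illustration */

/* qstr-like layout (simplified): the name lives elsewhere, so using
 * this for short-lived mles would force a malloc/free per mle. */
struct qstr_like {
	unsigned int hash;
	unsigned int len;
	const unsigned char *name;	/* pointer, storage external */
};

/* Inline variant, laid out to mirror the qstr fields so code reading
 * hash/len/name works on either; storage is embedded. */
struct lock_name_like {
	unsigned int hash;
	unsigned int len;
	unsigned char name[LOCKID_NAME_MAX];	/* inline buffer */
};

int main(void)
{
	printf("qstr-like: %zu bytes + external string\n",
	       sizeof(struct qstr_like));
	printf("inline:    %zu bytes, no allocation\n",
	       sizeof(struct lock_name_like));
	return 0;
}
```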
-
Submitted by Sunil Mushran

This patch encapsulates adding and removing of the mle from the dlm->master_list. This patch is part of the series of patches that converts the mle list to a mle hash.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Tao Ma

In ocfs2, the block group search looks for the "emptiest" group to allocate from, so if the allocator has many equally (or almost equally) empty groups, new block groups tend to get spread out amongst them. We therefore add osb_inode_alloc_group to ocfs2_super to record the last used inode allocation group. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. I have done some basic testing and the results are a ten-times improvement on some cold-cache stat workloads.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Tao Ma

Inode groups used to be allocated from the local alloc file, but since we want all inodes to be contiguous enough, we now try to allocate them directly from the global_bitmap.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Tao Ma

In ocfs2, the inode block search looks for the "emptiest" inode group to allocate from. So if an inode alloc file has many equally (or almost equally) empty groups, new inodes tend to get spread out amongst them, which in turn can put them all over the disk. This is undesirable because directory operations on conceptually "nearby" inodes then force a large number of seeks.

So we add ip_last_used_group to core directory inodes, recording the last used allocation group, and another field, ip_last_used_slot, in case inode stealing happens. When claiming a new inode, we pass in the directory's inode so the allocation can use this information. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
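The strategy is a classic allocation hint: try the group the parent directory last allocated from before falling back to the global search. A userspace sketch of the idea (group layout and names are invented):

```c
#include <stdio.h>

#define NUM_GROUPS 4

static int group_free[NUM_GROUPS] = { 10, 50, 50, 50 };

/* Per-directory hint: last group we allocated an inode from. */
struct dir_hint { int last_used_group; };

static int alloc_inode_group(struct dir_hint *d)
{
	/* Try the hint first, keeping sibling inodes physically close... */
	if (d->last_used_group >= 0 && group_free[d->last_used_group] > 0)
		return d->last_used_group;

	/* ...otherwise fall back to the "emptiest group" search. */
	int best = 0;
	for (int g = 1; g < NUM_GROUPS; g++)
		if (group_free[g] > group_free[best])
			best = g;
	d->last_used_group = best;	/* remember for next time */
	return best;
}

int main(void)
{
	struct dir_hint d = { .last_used_group = -1 };

	for (int i = 0; i < 3; i++) {
		int g = alloc_inode_group(&d);
		group_free[g]--;
		printf("inode %d -> group %d\n", i, g);
	}
	/* All three land in the same group instead of spreading out. */
	return 0;
}
```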
-
Submitted by Mark Fasheh

ocfs2_dx_dir_rebalance() is passed the block offset of a dx leaf which needs rebalancing. Since we rebalance an entire cluster at a time, however, this function needs to calculate the start of that cluster, in blocks. The calculation was wrong, which would result in a read of non-leaf blocks. Fix it by adding ocfs2_block_to_cluster_start(), which is a more straightforward way of determining this.

Reported-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
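Rounding a block number down to its cluster boundary is a shift exercise, since both sizes are powers of two. A sketch of the calculation (the real helper's signature in ocfs2 may differ; the geometry below is illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* Example geometry: 512-byte blocks, 4K clusters -> 8 blocks/cluster. */
#define BLOCK_BITS		9
#define CLUSTER_BITS		12
#define BLKS_PER_CLUSTER_BITS	(CLUSTER_BITS - BLOCK_BITS)

/* Round a block offset down to the first block of its cluster by
 * clearing the low bits. */
static uint64_t block_to_cluster_start(uint64_t block)
{
	return (block >> BLKS_PER_CLUSTER_BITS) << BLKS_PER_CLUSTER_BITS;
}

int main(void)
{
	/* Block 21 sits in the cluster that begins at block 16. */
	printf("%llu\n", (unsigned long long)block_to_cluster_start(21));
	return 0;
}
```

Getting this rounding wrong means reading whatever blocks happen to precede the leaf, which is exactly the non-leaf read the patch describes.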
-
Submitted by Mark Fasheh

ocfs2_empty_dir() is far more expensive than checking the link count. Since both need to be checked at the same time, we can improve performance by checking the link count first.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Mark Fasheh

Since the disk format is finalized, we can set this feature bit in the supported mask.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
-
Submitted by Mark Fasheh

This little bit of extra accounting speeds up ocfs2_empty_dir() dramatically by allowing us to short-circuit the full directory scan.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-
Submitted by Mark Fasheh

Since we've now got a directory format capable of handling a large number of entries, we can increase the maximum link count supported. This only gets increased if the directory indexing feature is turned on.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
-
Submitted by Mark Fasheh

The only operation which doesn't get faster with directory indexing is insert, which still has to walk the entire unindexed portion of the directory to find a free block. This patch improves directory insert performance by maintaining a singly linked list of directory leaf blocks which have space for additional dirents.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
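The idea is a free-space list: instead of scanning every directory block on insert, walk a chain of only those blocks known to have room. A userspace sketch (structure and names invented for illustration):

```c
#include <stdio.h>
#include <stddef.h>

struct dir_block {
	int free_bytes;			/* space left for new dirents */
	struct dir_block *free_next;	/* next block with free space */
};

/* Insert walks only the free list, not the whole directory. */
static struct dir_block *find_space(struct dir_block *free_list, int need)
{
	for (struct dir_block *b = free_list; b; b = b->free_next)
		if (b->free_bytes >= need)
			return b;
	return NULL;	/* no room anywhere: caller extends the directory */
}

int main(void)
{
	struct dir_block b2 = { .free_bytes = 12, .free_next = NULL };
	struct dir_block b1 = { .free_bytes = 40, .free_next = &b2 };

	struct dir_block *hit = find_space(&b1, 24);
	printf("found block with %d free bytes\n", hit ? hit->free_bytes : -1);
	return 0;
}
```

The maintenance cost is small: a block joins the list when a dirent is deleted from it and leaves when it fills up, so the list stays proportional to the actual free space.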
-
Submitted by Mark Fasheh

Allow us to store a small number of directory index records in the ocfs2_dx_root_block. This saves a disk read on small to medium-sized directories (fewer than about 250 entries). The inline root is automatically turned into a root block with extents if the directory grows beyond its capacity.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
-
Submitted by Mark Fasheh

This patch makes use of ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value and a pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant-time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
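Each leaf record is tiny: a name hash plus the block where the real dirent lives. A sketch of the record shape and the two-step lookup it enables (a simplified illustration, not ocfs2's on-disk format or its ext3-derived hash):

```c
#include <stdio.h>
#include <stdint.h>

/* Simplified index record: hash of the name -> block holding the dirent. */
struct dx_entry {
	uint32_t name_hash;
	uint64_t dirent_blk;
};

/* Toy name hash standing in for the real one. */
static uint32_t name_hash(const char *name)
{
	uint32_t h = 0;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return h;
}

int main(void)
{
	/* A tiny "leaf" of index records. */
	struct dx_entry leaf[] = {
		{ name_hash("alpha"), 100 },
		{ name_hash("beta"),  104 },
	};

	/* Lookup: hash the name, find the matching record, then read
	 * only that one unindexed block to match the actual dirent. */
	uint32_t h = name_hash("beta");
	for (size_t i = 0; i < sizeof(leaf) / sizeof(leaf[0]); i++)
		if (leaf[i].name_hash == h) {
			printf("'beta' dirent in block %llu\n",
			       (unsigned long long)leaf[i].dirent_blk);
			break;
		}
	return 0;
}
```

Because records are fixed-length and small, a single leaf block indexes many dirents, and hash collisions only cost a scan of one unindexed block.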
-
Submitted by Mark Fasheh

Many directory manipulation calls pass around a tuple of a dirent and its containing buffer_head. Dir indexing has a bit more state, but instead of adding yet more arguments to functions, we introduce 'struct ocfs2_dir_lookup_result'. In this patch it simply holds the same tuple, but future patches will add more state.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
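The refactoring pattern is plain: bundle a widening argument tuple into a result struct so new state can be added later without touching every signature. A sketch with placeholder fields (not ocfs2's real definitions):

```c
#include <stdio.h>

struct buffer_head;	/* opaque here */
struct dirent_model { const char *name; };

/* One result object instead of (dirent, bh) pairs threaded through
 * every directory call; later patches can add index state here. */
struct dir_lookup_result {
	struct dirent_model *dl_entry;	/* the found dirent */
	struct buffer_head *dl_leaf_bh;	/* block it lives in */
};

/* One place to drop whatever references the lookup took. */
static void release_lookup(struct dir_lookup_result *res)
{
	res->dl_entry = NULL;
	res->dl_leaf_bh = NULL;
}

int main(void)
{
	struct dirent_model de = { .name = "file" };
	struct dir_lookup_result res = { .dl_entry = &de, .dl_leaf_bh = NULL };

	printf("found: %s\n", res.dl_entry->name);
	release_lookup(&res);
	return 0;
}
```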
-
Submitted by Sunil Mushran

This patch removes the debugfs file local_alloc_stats, as that information is now included in the fs_state debugfs file.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
-