1. 23 Jun 2009, 4 commits
  2. 16 Jun 2009, 2 commits
    • ocfs2/net: Use wait_event() in o2net_send_message_vec() · 9af0b38f
      Authored by Sunil Mushran
      Replace wait_event_interruptible() with wait_event() in o2net_send_message_vec(),
      because this function is called by the dlm, which expects signals to be blocked
      (see the sketch after this entry).
      
      Fixes oss bugzilla#1126
      http://oss.oracle.com/bugzilla/show_bug.cgi?id=1126
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      9af0b38f
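      The behavioral difference: wait_event_interruptible() returns -ERESTARTSYS when a
      signal is pending, while wait_event() sleeps until the condition becomes true
      regardless of signals. A minimal sketch of the pattern using the generic
      wait-queue macros (the names below are illustrative, not the actual o2net code):

      	#include <linux/wait.h>

      	static DECLARE_WAIT_QUEUE_HEAD(demo_wq);
      	static int demo_done;	/* set by the reply handler in real code */

      	static void demo_wait_for_reply(void)
      	{
      		/*
      		 * wait_event_interruptible() would return -ERESTARTSYS on a
      		 * pending signal, forcing every caller to handle restarts:
      		 *
      		 *	err = wait_event_interruptible(demo_wq, demo_done);
      		 *
      		 * wait_event() ignores signals entirely, which is what a
      		 * caller such as the dlm (running with signals blocked)
      		 * expects.
      		 */
      		wait_event(demo_wq, demo_done);
      	}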
    • ocfs2: Adjust rightmost path in ocfs2_add_branch. · 6b791bcc
      Authored by Tao Ma
      In ocfs2_add_branch, we use the rightmost rec of the leaf extent block
      to generate the e_cpos for the newly added branch. In most cases this is
      fine, but if the parent extent block's rightmost rec covers more clusters
      than the leaf does, inserting clusters into that gap causes a kernel
      panic. The message looks like this:
      (7445,1):ocfs2_insert_at_leaf:3775 ERROR: bug expression:
      le16_to_cpu(el->l_next_free_rec) >= le16_to_cpu(el->l_count)
      (7445,1):ocfs2_insert_at_leaf:3775 ERROR: inode 66053, depth 0, count 28,
      next free 28, rec.cpos 270, rec.clusters 1, insert.cpos 275, insert.clusters 1
       [<fa7ad565>] ? ocfs2_do_insert_extent+0xb58/0xda0 [ocfs2]
       [<fa7b08f2>] ? ocfs2_insert_extent+0x5bd/0x6ba [ocfs2]
       [<fa7b1b8b>] ? ocfs2_add_clusters_in_btree+0x37f/0x564 [ocfs2]
      ...
      
      The panic can easily be reproduced with the following small test case
      (bs=512, cs=4K; all error handling has been removed to keep it readable).
      
      #include <fcntl.h>
      #include <unistd.h>

      int main(int argc, char **argv)
      {
      	int fd, i;
      	char buf[5] = "test";

      	/* O_CREAT needs a mode argument. */
      	fd = open(argv[1], O_RDWR|O_CREAT, 0644);

      	/* 30 sparse writes, one every 40960 bytes (10 clusters). */
      	for (i = 0; i < 30; i++) {
      		lseek(fd, 40960 * i, SEEK_SET);
      		write(fd, buf, 5);
      	}

      	/* Truncate to 1146880 bytes, i.e. 280 clusters. */
      	ftruncate(fd, 1146880);

      	/* Write at offset 1126400, i.e. cluster 275. */
      	lseek(fd, 1126400, SEEK_SET);
      	write(fd, buf, 5);

      	close(fd);

      	return 0;
      }
      
      The reason for the panic is that the 30 writes and the ftruncate leave
      the file's extent list looking like this:
      
      	Tree Depth: 1   Count: 19   Next Free Rec: 1
      	## Offset        Clusters       Block#
      	0  0             280            86183
      	SubAlloc Bit: 7   SubAlloc Slot: 0
      	Blknum: 86183   Next Leaf: 0
      	CRC32: 00000000   ECC: 0000
      	Tree Depth: 0   Count: 28   Next Free Rec: 28
      	## Offset        Clusters       Block#          Flags
      	0  0             1              143368          0x0
      	1  10            1              143376          0x0
      	...
      	26 260           1              143576          0x0
      	27 270           1              143584          0x0
      
      Now another write at 1126400 (cluster 275), which lands in the gap
      between 271 and 280, triggers ocfs2_add_branch, but after that function
      the tree looks like this:
      	Tree Depth: 1   Count: 19   Next Free Rec: 2
      	## Offset        Clusters       Block#
      	0  0             280            86183
      	1  271           0             143592
      So the extent records now overlap, and the subsequent insert hits the bug.
      
      This patch removes the gap before adding the new branch, so that the
      root (branch) rightmost rec covers the same right edge as the leaf (a
      simplified sketch of the idea follows this entry). In the above case,
      before adding the branch the tree is changed to:
      	Tree Depth: 1   Count: 19   Next Free Rec: 1
      	## Offset        Clusters       Block#
      	0  0             271            86183
      	SubAlloc Bit: 7   SubAlloc Slot: 0
      	Blknum: 86183   Next Leaf: 0
      	CRC32: 00000000   ECC: 0000
      	Tree Depth: 0   Count: 28   Next Free Rec: 28
      	## Offset        Clusters       Block#          Flags
      	0  0             1              143368          0x0
      	1  10            1              143376          0x0
      	...
      	26 260           1              143576          0x0
      	27 270           1              143584          0x0
      After the branch is added, the tree looks like this:
      	Tree Depth: 1   Count: 19   Next Free Rec: 2
      	## Offset        Clusters       Block#
      	0  0             271            86183
      	1  271           0             143592
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      6b791bcc
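      Conceptually, the adjustment trims the rightmost record along the tree's
      rightmost path so that its right edge matches the leaf's right edge before
      the new branch's e_cpos is derived from the leaf. A rough, simplified
      sketch of the idea in plain C (illustrative structures only; the real code
      operates on ocfs2's little-endian on-disk extent records):

      	struct rec {
      		unsigned int cpos;	/* logical start cluster */
      		unsigned int clusters;	/* clusters covered */
      	};

      	struct elist {
      		int next_free;
      		struct rec recs[32];
      	};

      	/*
      	 * Shrink the parent's rightmost record so it ends where the leaf's
      	 * rightmost record ends (271 in the example above). The new branch's
      	 * e_cpos then starts right after the existing records instead of
      	 * overlapping them.
      	 */
      	static void adjust_rightmost(struct elist *parent, const struct elist *leaf)
      	{
      		const struct rec *lr = &leaf->recs[leaf->next_free - 1];
      		struct rec *pr = &parent->recs[parent->next_free - 1];
      		unsigned int leaf_end = lr->cpos + lr->clusters;

      		if (pr->cpos + pr->clusters > leaf_end)
      			pr->clusters = leaf_end - pr->cpos;
      	}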
  3. 12 Jun 2009, 3 commits
  4. 10 Jun 2009, 1 commit
    • ocfs2: fdatasync should skip unimportant metadata writeout · e04cc15f
      Authored by Hisashi Hifumi
      In ocfs2, fdatasync and fsync are currently identical. fdatasync should
      skip committing the transaction when inode->i_state has only I_DIRTY_SYNC
      set, since that indicates nothing but atime and/or mtime updates. The
      following patch improves fdatasync throughput (a sketch of the check
      appears after this entry).
      
      #sysbench --num-threads=16 --max-requests=300000 --test=fileio
      --file-block-size=4K --file-total-size=16G --file-test-mode=rndwr
      --file-fsync-mode=fdatasync run
      
      Results:
      -2.6.30-rc8
      Test execution summary:
          total time:                          107.1445s
          total number of events:              119559
          total time taken by event execution: 116.1050
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0010s
               max:                            0.1220s
               approx.  95 percentile:         0.0016s
      
      Threads fairness:
          events (avg/stddev):           7472.4375/303.60
          execution time (avg/stddev):   7.2566/0.64
      
      -2.6.30-rc8-patched
      Test execution summary:
          total time:                          86.8529s
          total number of events:              300016
          total time taken by event execution: 24.3077
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0001s
               max:                            0.0336s
               approx.  95 percentile:         0.0001s
      
      Threads fairness:
          events (avg/stddev):           18751.0000/718.75
          execution time (avg/stddev):   1.5192/0.05
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      e04cc15f
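      A minimal sketch of the check described above, assuming it sits in the
      filesystem's ->fsync handler (illustrative helper name; not the literal
      patch):

      	#include <linux/fs.h>

      	static int demo_can_skip_commit(struct inode *inode, int datasync)
      	{
      		/*
      		 * For fdatasync (datasync != 0), the journal commit can be
      		 * skipped when I_DIRTY_DATASYNC is not set: only timestamps
      		 * changed, which fdatasync is allowed to ignore.
      		 */
      		return datasync && !(inode->i_state & I_DIRTY_DATASYNC);
      	}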
  5. 04 Jun 2009, 9 commits
    • ocfs2: Remove redundant gotos in ocfs2_mount_volume() · 06c59bb8
      Authored by Tao Ma
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      06c59bb8
    • ocfs2: Add statistics for the checksum and ecc operations. · 73be192b
      Authored by Joel Becker
      It would be nice to know how often we get checksum failures. Even
      better, how many of them we can fix with the single-bit ECC. So we add
      a statistics structure. The structure can be installed into debugfs
      wherever the user wants (a rough sketch of such counters follows this
      entry).

      For ocfs2, we'll put it in the superblock-specific debugfs directory and
      pass it down from our higher-level functions. The stats are only
      registered with debugfs when the filesystem supports metadata ECC.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      73be192b
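      A rough sketch of what such counters and their debugfs registration could
      look like (the structure and file names here are illustrative, not the
      actual ocfs2 code):

      	#include <linux/debugfs.h>
      	#include <linux/types.h>

      	/* Illustrative per-filesystem checksum/ECC counters. */
      	struct demo_blockcheck_stats {
      		u64		check_count;	/* blocks checked */
      		u64		failure_count;	/* checksum failures seen */
      		u64		recover_count;	/* failures fixed by ECC */
      		struct dentry	*debugfs_dir;	/* caller-chosen directory */
      	};

      	/* Expose the counters under the directory the caller picked. */
      	static void demo_stats_install(struct demo_blockcheck_stats *stats,
      				       struct dentry *parent)
      	{
      		stats->debugfs_dir = parent;
      		debugfs_create_u64("blocks_checked", 0444, parent,
      				   &stats->check_count);
      		debugfs_create_u64("checksum_failures", 0444, parent,
      				   &stats->failure_count);
      		debugfs_create_u64("ecc_recoveries", 0444, parent,
      				   &stats->recover_count);
      	}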
    • ocfs2 patch to track delayed orphan scan timer statistics · 15633a22
      Authored by Srinivas Eeda
      Patch to track delayed orphan scan timer statistics.
      
      Modifies ocfs2_osb_dump to print the following:
        Orphan Scan=> Local: 10  Global: 21  Last Scan: 67 seconds ago
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      15633a22
    • ocfs2: timer to queue scan of all orphan slots · 83273932
      Authored by Srinivas Eeda
      When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
      before moving the dentry to the orphan directory. Other nodes that have
      this dentry in cache have a PR on the same dentry lock.  When the EX is
      requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
      during downconvert.  The inode is finally deleted when the last node to iput
      the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.
      
      A problem arises if a node is forced to free dentry locks because of memory
      pressure. If this happens, the node will no longer get downconvert
      notifications for the dentries that have been unlinked on another node.
      If that node also happens to be actively using the corresponding inode
      and ends up performing the last iput on it, it will fail to delete the
      inode because the MAYBE_ORPHANED flag was never set.
      
      This patch fixes this shortcoming by introducing a periodic scan of the
      orphan directories to delete such inodes (a skeleton of such a scan
      follows this entry). Care has been taken to distribute the workload
      across the cluster so that no one node has to perform the task all the
      time.
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      83273932
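      A skeleton of the kind of self-rescheduling scan described above, using
      the kernel's delayed-work API (the names, interval, and stub scan body
      are illustrative, not the actual ocfs2 implementation):

      	#include <linux/workqueue.h>
      	#include <linux/jiffies.h>

      	#define DEMO_ORPHAN_SCAN_INTERVAL	(300 * HZ)	/* made-up interval */

      	static struct delayed_work demo_orphan_scan_work;

      	static void demo_scan_orphan_slots(void)
      	{
      		/* Placeholder: walk each orphan slot and delete inodes
      		 * whose i_nlink has dropped to zero. */
      	}

      	static void demo_orphan_scan_worker(struct work_struct *work)
      	{
      		demo_scan_orphan_slots();
      		/* Re-arm so the scan keeps running periodically. */
      		schedule_delayed_work(&demo_orphan_scan_work,
      				      DEMO_ORPHAN_SCAN_INTERVAL);
      	}

      	static void demo_orphan_scan_start(void)
      	{
      		INIT_DELAYED_WORK(&demo_orphan_scan_work,
      				  demo_orphan_scan_worker);
      		schedule_delayed_work(&demo_orphan_scan_work,
      				      DEMO_ORPHAN_SCAN_INTERVAL);
      	}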
    • ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories · edd45c08
      Authored by Jan Kara
      ocfs2_write_begin() takes ip_alloc_sem before the local alloc locks, so
      change ocfs2_extend_dir() and ocfs2_expand_inline_dir() to use the same
      lock ordering.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      edd45c08
    • ocfs2: Fix possible deadlock in quota recovery · 80d73f15
      Authored by Jan Kara
      In ocfs2_finish_quota_recovery() we acquired the global quota file lock
      and started recovering the local quota file. During this process we need
      to get quota structures, which calls ocfs2_dquot_acquire(), and that
      takes the global quota file lock again. This second lock attempt can
      block if some other node has requested the quota file lock in the
      meantime. Fix the problem by moving the quota file locking down into the
      function where it is really needed, so that dqget() and dqput() are no
      longer called with the lock held.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      80d73f15
    • ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() · 65bac575
      Authored by Jan Kara
      We called vfs_dq_transfer() with the global quota file lock held. This
      can deadlock: if vfs_dq_transfer() has to allocate a new quota
      structure, it calls ocfs2_dquot_acquire(), which tries to take the quota
      file lock again, and that can block if another node requested the lock
      in the meantime.

      Since we have to call vfs_dq_transfer() with the transaction already
      started, and the quota file lock ranks above the transaction start, we
      cannot rely on ocfs2_dquot_acquire() or ocfs2_dquot_release() taking the
      lock when they need it. Fix the problem by acquiring pointers to all
      quota structures needed by vfs_dq_transfer() before calling the function
      (see the sketch after this entry). This guarantees that all quota
      structures are already allocated and can only be freed after we drop our
      references to them, so the quota file lock is never needed inside
      vfs_dq_transfer().
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      65bac575
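      A rough sketch of the approach with the quota API of that era
      (dqget()/dqput()/vfs_dq_transfer()); the helper below and its error
      handling are illustrative, not the actual ocfs2_setattr() code:

      	#include <linux/fs.h>
      	#include <linux/quota.h>
      	#include <linux/quotaops.h>

      	static int demo_transfer_quota(struct inode *inode, struct iattr *attr)
      	{
      		struct dquot *uq = NULL, *gq = NULL;
      		int status;

      		/* Pin the dquots before the transaction starts, so the
      		 * transfer never has to allocate one (and thus never needs
      		 * the global quota file lock). */
      		if (attr->ia_valid & ATTR_UID)
      			uq = dqget(inode->i_sb, attr->ia_uid, USRQUOTA);
      		if (attr->ia_valid & ATTR_GID)
      			gq = dqget(inode->i_sb, attr->ia_gid, GRPQUOTA);

      		/* ... start the journal transaction here ... */

      		status = vfs_dq_transfer(inode, attr) ? -EDQUOT : 0;

      		/* ... commit the journal transaction here ... */

      		if (uq)
      			dqput(uq);
      		if (gq)
      			dqput(gq);
      		return status;
      	}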
    • ocfs2: Fix lock inversion in ocfs2_local_read_info() · b4c30de3
      Authored by Jan Kara
      This function is called with dqio_mutex held, but it has to acquire a
      lock on the global quota file, which ranks above dqio_mutex. This is not
      a deadlockable lock inversion, since this code path is taken only during
      mount when no one else can race with us, but let's clean it up to
      silence lockdep.

      We simply drop dqio_mutex at the beginning of the function and reacquire
      it at the end, since we don't need it there; no one can race with us at
      this point.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      b4c30de3
    • ocfs2: Fix possible deadlock in ocfs2_global_read_dquot() · 4e8a3019
      Authored by Jan Kara
      A thread must not take a read lock and then try to take the same lock
      for writing: the write request can block on a downconvert requested by
      another node, leading to deadlock. So first drop the quota lock held for
      reading and only then take it for writing (see the sketch after this
      entry).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      4e8a3019
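      The general shape of the fix, sketched with hypothetical lock helpers
      (lock_quota_read()/lock_quota_write() stand in for the real cluster-lock
      calls):

      	/* Hypothetical helpers for the cluster quota lock. The point is the
      	 * ordering: never upgrade read -> write while still holding the
      	 * read lock. */
      	void lock_quota_read(void);
      	void unlock_quota_read(void);
      	void lock_quota_write(void);
      	void unlock_quota_write(void);

      	static void demo_read_then_update(void)
      	{
      		lock_quota_read();
      		/* ... read the global quota entry ... */
      		unlock_quota_read();

      		/* Only after the read lock is fully dropped do we ask for
      		 * the write lock; a downconvert requested by another node
      		 * can now be served, so no deadlock is possible. */
      		lock_quota_write();
      		/* ... re-read and update the entry under the write lock ... */
      		unlock_quota_write();
      	}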
  6. 23 May 2009, 1 commit
  7. 09 May 2009, 1 commit
  8. 06 May 2009, 2 commits
    • ocfs2: update comments in masklog.h · 2b53bc7b
      Authored by Coly Li
      In the mainline ocfs2 code, the interface for masklog is in files under
      /sys/fs/o2cb/masklog, but the comments in fs/ocfs2/cluster/masklog.h
      reference the old /proc interface.  They are out of date.
      
      This patch updates the comments in cluster/masklog.h; the new comments
      also include a bash script example showing how to change the log mask
      bits.
      Signed-off-by: Coly Li <coly.li@suse.de>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      2b53bc7b
    • ocfs2: Don't printk the error when listing too many xattrs. · a46fa684
      Authored by Tao Ma
      Currently the kernel defines XATTR_LIST_MAX as 65536
      in include/linux/limits.h.  This is the largest buffer that is used for
      listing xattrs.
      
      But with the ocfs2 xattr tree there is effectively no limit on the
      number of xattrs. If the filesystem has more names than fit in the
      buffer, the kernel log gets polluted with messages like this when
      listing:
      
      (27738,0):ocfs2_iterate_xattr_buckets:3158 ERROR: status = -34
      (27738,0):ocfs2_xattr_tree_list_index_block:3264 ERROR: status = -34
      
      So don't print an "ERROR" message, as this is not an ocfs2 error.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      a46fa684
  9. 01 May 2009, 1 commit
    • ocfs2: Fix a missing credit when deleting from indexed directories. · dfa13f39
      Authored by Joel Becker
      The ocfs2 directory index updates two blocks when we remove an entry -
      the dx root and the dx leaf.  OCFS2_DELETE_INODE_CREDITS was only
      accounting for the dx leaf.  This shows up when ocfs2_delete_inode()
      runs out of credits in jbd2_journal_dirty_metadata() at
      "J_ASSERT_JH(jh, handle->h_buffer_credits > 0);".
      
      The test that caught this was running dirop_file_racer from the
      ocfs2-test suite with a 250-character filename PREFIX. Run on a 512B
      blocksize, it forces the orphan dir index to grow large enough to
      trigger the problem.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      dfa13f39
  10. 30 Apr 2009, 1 commit
  11. 24 Apr 2009, 1 commit
  12. 22 Apr 2009, 2 commits
  13. 15 Apr 2009, 1 commit
  14. 08 Apr 2009, 1 commit
  15. 07 Apr 2009, 1 commit
    • splice: fix deadlock in splicing to file · 7bfac9ec
      Authored by Miklos Szeredi
      There's a possible deadlock in generic_file_splice_write(),
      splice_from_pipe() and ocfs2_file_splice_write():
      
       - task A calls generic_file_splice_write()
       - this calls inode_double_lock(), which locks i_mutex on both
         pipe->inode and target inode
       - ordering depends on inode pointers, can happen that pipe->inode is
         locked first
       - __splice_from_pipe() needs more data, calls pipe_wait()
       - this releases lock on pipe->inode, goes to interruptible sleep
       - task B calls generic_file_splice_write(), similarly to the first
       - this locks pipe->inode, then tries to lock inode, but that is
         already held by task A
       - task A is interrupted, it tries to lock pipe->inode, but fails, as
         it is already held by task B
       - ABBA deadlock
      
      Fix this by explicitly ordering the locks: the outer lock must be on the
      target inode, and the inner lock (which is later unlocked and relocked)
      must be on pipe->inode (see the sketch after this entry). This is safe
      because pipe inodes and target inodes form two non-overlapping sets;
      generic_file_splice_write() and friends are never called with a target
      that is a pipe.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7bfac9ec
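      A minimal illustration of the principle (generic pthread mutexes, not
      the splice code itself): as long as every path takes the target lock
      before the pipe lock, the ABBA interleaving described above cannot
      occur.

      	#include <pthread.h>

      	static pthread_mutex_t target_lock = PTHREAD_MUTEX_INITIALIZER;
      	static pthread_mutex_t pipe_lock   = PTHREAD_MUTEX_INITIALIZER;

      	static void demo_splice_write(void)
      	{
      		/* Outer lock: always the target inode's lock first. */
      		pthread_mutex_lock(&target_lock);

      		/* Inner lock: the pipe's lock. It may be dropped and
      		 * re-taken while waiting for more data, but the outer lock
      		 * is never released in between, so the ordering holds. */
      		pthread_mutex_lock(&pipe_lock);
      		/* ... move data from the pipe to the target file ... */
      		pthread_mutex_unlock(&pipe_lock);

      		pthread_mutex_unlock(&target_lock);
      	}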
  16. 04 Apr 2009, 9 commits