Commits · f945b7abcb6cfd3106c9855aa2aa6e4396a19d76 · OpenHarmony / kernel_linux

04 Apr, 2009 · 32 commits

ocfs2: recover orphans in offline slots during recovery and mount · 9140db04

Authored by Srinivas Eeda on Mar 06, 2009

During recovery, a node recovers orphans in its own slot and in the slots of
the dead node(s). But if the dead nodes were holding orphans in offline slots,
those orphans are left unrecovered.

If the dead node is the last one to die and is holding orphans in other slots
and is the first one to mount, then it only recovers its own slot, which
leaves orphans in offline slots.

This patch queues complete_recovery to clean orphans for all offline slots
during mount and node recovery.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Pagecache usage optimization on ocfs2 · 1fca3a05

Authored by Hisashi Hifumi on Mar 05, 2009

A page can have multiple buffers, and even if a page is not uptodate, some
buffers can be uptodate in a pagesize != blocksize environment.
This aop checks that all buffers which correspond to the part of a file
that we want to read are uptodate. If so, we do not have to issue actual
read IO to the HDD even if the page is not uptodate, because the portion we
want to read is uptodate.
The "block_is_partially_uptodate" function is already used by ext2/3/4.
With this patch, random read/write mixed workloads or random read after
random write workloads can be optimized, improving performance.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

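The check described in this commit can be modeled outside the kernel. The following is a minimal Python sketch, assuming a page is represented as a list of per-buffer uptodate flags. The function name `range_is_uptodate` and its parameters are hypothetical and chosen for illustration; this is not the signature of the kernel's `block_is_partially_uptodate`.

```python
def range_is_uptodate(buffer_uptodate, block_size, offset, length):
    """Return True if every buffer overlapping [offset, offset + length) is uptodate.

    buffer_uptodate: list of bools, one per block-sized buffer in the page.
    offset, length: byte range inside the page that the caller wants to read.
    """
    if length <= 0:
        return True
    page_size = block_size * len(buffer_uptodate)
    if offset < 0 or offset + length > page_size:
        raise ValueError("range falls outside the page")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    # Only the buffers that back the requested range matter.
    return all(buffer_uptodate[first:last + 1])


# 4096-byte page with 1024-byte blocks; only the third buffer is stale.
flags = [True, True, False, True]
print(range_is_uptodate(flags, 1024, 0, 2048))     # True
print(range_is_uptodate(flags, 1024, 1024, 2048))  # False
```

In this model the page as a whole is not uptodate, yet a read of the first 2048 bytes needs no IO because both buffers it touches are uptodate.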

ocfs2: fix rare stale inode errors when exporting via nfs · 6ca497a8

Authored by wengang wang on Mar 06, 2009

For NFS exporting, ocfs2_get_dentry() returns the dentry for a file handle (fh).
ocfs2_get_dentry() may read from disk when the inode is not in memory,
without any cross-cluster lock. This leads to the file system loading a
stale inode.

This patch fixes the above problem.

The solution is that, when the inode is not in memory, we get the cluster
lock (PR) of the alloc inode from which the inode in question was allocated
(this causes the node on which the deletion is done to sync the alloc inode)
before reading out the inode itself. Then we check the bitmap in the group
(the one the inode in question was allocated from) to see if the bit is
clear. If it's clear, then the inode is stale. If the bit is set, we then
check the generation as the existing code does.

We have to read out the inode in question from disk first to know its alloc
slot and alloc bit. And if it's not stale, we read it out using ocfs2_iget().
The second read should then be from cache.

We also have to add a per-superblock nfs_sync_lock to cover the lock for the
alloc inode and that for the inode in question. This is because
ocfs2_get_dentry() and ocfs2_delete_inode() lock them in reverse order.
nfs_sync_lock is locked in EX mode in ocfs2_get_dentry() and in PR mode in
ocfs2_delete_inode(), so that multiple ocfs2_delete_inode() calls can run
concurrently in the normal case.

[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

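The two-step staleness test in this commit (allocation bit first, then generation) can be sketched as a small model. This is an illustrative Python sketch only: `DiskInode`, `is_stale`, and the list-of-bools bitmap are hypothetical stand-ins, and all cluster locking (the PR lock on the alloc inode and nfs_sync_lock) is left out.

```python
from dataclasses import dataclass


@dataclass
class DiskInode:
    alloc_bit: int   # position of this inode in its allocation group's bitmap
    generation: int  # generation number stored with the inode


def is_stale(inode, group_bitmap, handle_generation):
    """Return True if an NFS file handle refers to a stale inode."""
    # Step 1: a clear allocation bit means the inode has been freed.
    if not group_bitmap[inode.alloc_bit]:
        return True
    # Step 2: the bit is set, so fall back to the generation comparison.
    return inode.generation != handle_generation


bitmap = [True, False, True]
print(is_stale(DiskInode(alloc_bit=1, generation=7), bitmap, 7))  # True
print(is_stale(DiskInode(alloc_bit=2, generation=7), bitmap, 7))  # False
print(is_stale(DiskInode(alloc_bit=2, generation=8), bitmap, 7))  # True
```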

ocfs2/dlm: Tweak mle_state output · 9405dccf

Authored by Sunil Mushran on Feb 26, 2009

The debugfs file, mle_state, now prints the largest number of mles in one
hash link.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Do not purge lockres that is being migrated dlm_purge_lockres() · 516b7e52

Authored by Sunil Mushran on Feb 26, 2009

This patch attempts to fix a fine race between purging and migration.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Remove struct dlm_lock_name in struct dlm_master_list_entry · 7141514b

Authored by Sunil Mushran on Feb 26, 2009

This patch removes struct dlm_lock_name and adds the entries directly
to struct dlm_master_list_entry. Under the new scheme, all mles, whether
backed by a lockres or not, will have the name populated in mle->mname.
This allows us to get rid of code that was figuring out the location of
the mle name.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Show the number of lockres/mles in dlm_state · e64ff146

Authored by Sunil Mushran on Feb 26, 2009

This patch shows the number of lockres' and mles in the debugfs file, dlm_state.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: dlm_set_lockres_owner() and dlm_change_lockres_owner() inlined · 7d62a978

Authored by Sunil Mushran on Feb 26, 2009

This patch inlines dlm_set_lockres_owner() and dlm_change_lockres_owner().
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Improve lockres counts · 6800791a

Authored by Sunil Mushran on Feb 26, 2009

This patch replaces the lockres counts that tracked the number of
locally and remotely mastered lockres' with a current and total count. The
total count is the number of lockres' that have been created since the dlm
domain was created.

The numbers of locally and remotely mastered lockres' can be computed using
the locking_state output.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Track number of mles · 2041d8fd

Authored by Sunil Mushran on Feb 26, 2009

The lifetime of an mle is limited to the duration of the lockres mastery
process. While typically this lifetime is fairly short, we have noticed
the number of mles explode under certain circumstances. This patch tracks
the number of each type of mle and should help us determine
how best to speed up the mastery process.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Indent dlm_cleanup_master_list() · 67ae1f06

Authored by Sunil Mushran on Feb 26, 2009

The previous patch explicitly did not indent dlm_cleanup_master_list()
so as to make the patch readable. This patch properly indents the
function.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Activate dlm->master_hash for master list entries · 2ed6c750

Authored by Sunil Mushran on Feb 26, 2009

With this patch, the mles are stored in a hash and not a simple list.
This should improve the mle lookup time when the number of outstanding
masteries is large.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Create and destroy the dlm->master_hash · e2b66ddc

Authored by Sunil Mushran on Feb 26, 2009

This patch adds code to create and destroy the dlm->master_hash.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Refactor dlm_clean_master_list() · c2cd4a44

Authored by Sunil Mushran on Feb 26, 2009

This patch refactors dlm_clean_master_list() so as to make it
easier to convert the mle list to a hash.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Clean up struct dlm_lock_name · f77a9a78

Authored by Sunil Mushran on Feb 26, 2009

For a master mle, the name is stored in the attached lockres in a struct qstr.
For block and migration mles, the name is stored inline in struct dlm_lock_name.
This patch attempts to make struct dlm_lock_name look like a struct qstr. While
we could use struct qstr, we don't because we want to avoid having to malloc
and free the lockname string as the mle's lifetime is fairly short.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/dlm: Encapsulate adding and removing of mle from dlm->master_list · 1c084577

Authored by Sunil Mushran on Feb 26, 2009

This patch encapsulates adding and removing of the mle from the
dlm->master_list. This patch is part of the series of patches that
converts the mle list to an mle hash.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Optimize inode group allocation by recording last used group. · feb473a6

Authored by Tao Ma on Feb 25, 2009

In ocfs2, the block group search looks for the "emptiest" group
to allocate from. So if the allocator has many equally (or almost
equally) empty groups, new block groups will tend to get spread
out amongst them.

So we add osb_inode_alloc_group in ocfs2_super to record the last
used inode allocation group.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.

I have done some basic tests, and the results show a tenfold improvement on
some cold-cache stat workloads.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Allocate inode groups from global_bitmap. · 60ca81e8

Authored by Tao Ma on Feb 25, 2009

Inode groups used to be allocated from the local alloc file,
but since we want all inodes to be contiguous enough, we
will try to allocate them directly from global_bitmap.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Optimize inode allocation by remembering last group · 13821151

Authored by Tao Ma on Feb 25, 2009

In ocfs2, the inode block search looks for the "emptiest" inode
group to allocate from. So if an inode alloc file has many equally
(or almost equally) empty groups, new inodes will tend to get
spread out amongst them, which in turn can put them all over the
disk. This is undesirable because directory operations on conceptually
"nearby" inodes force a large number of seeks.

So we add ip_last_used_group in core directory inodes which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming a new inode,
we pass in the directory's inode so that the allocation can use this
information.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

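The effect of remembering the last used group can be illustrated with a toy allocator. This is a rough Python sketch under stated assumptions: groups are modeled as a dict of free-slot counts, `pick_emptiest` and `DirAllocHint` are hypothetical names, and the `ip_last_used_slot` field used for inode stealing is not modeled.

```python
def pick_emptiest(free_counts):
    """Return the group with the most free inode slots, or None if all are full."""
    candidates = [g for g, free in free_counts.items() if free > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda g: free_counts[g])


class DirAllocHint:
    """Per-directory hint, standing in for ip_last_used_group."""

    def __init__(self):
        self.last_used_group = None

    def alloc_inode(self, free_counts):
        group = self.last_used_group
        # Reuse the remembered group while it still has room.
        if group is None or free_counts.get(group, 0) == 0:
            group = pick_emptiest(free_counts)
        if group is None:
            return None
        free_counts[group] -= 1
        self.last_used_group = group
        return group


free = {0: 5, 1: 5, 2: 6}
hint = DirAllocHint()
print([hint.alloc_inode(free) for _ in range(3)])  # [2, 2, 2]
```

Calling `pick_emptiest` on every allocation instead would pick groups 2, 0, and 1 for the same three requests, which is the spreading behaviour the commit message describes.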

ocfs2: fix leaf start calculation in ocfs2_dx_dir_rebalance() · 1d46dc08

Authored by Mark Fasheh on Feb 19, 2009

ocfs2_dx_dir_rebalance() is passed the block offset of a dx leaf which needs
rebalancing. However, since we rebalance an entire cluster at a time, this
function needs to calculate the beginning of that cluster, in blocks. The
calculation was wrong, which would result in a read of non-leaf blocks. Fix
the calculation by adding ocfs2_block_to_cluster_start(), which is a more
straightforward way of determining this.
Reported-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

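The arithmetic behind finding a cluster's first block can be shown in a few lines. The sketch below is a Python model, assuming block and cluster sizes are powers of two given as bit counts; the parameter names are illustrative and do not match the kernel helper's signature.

```python
def block_to_cluster_start(blkno, blocksize_bits, clustersize_bits):
    """Return the first block number of the cluster that contains blkno."""
    bits = clustersize_bits - blocksize_bits  # log2(blocks per cluster)
    if bits < 0:
        raise ValueError("cluster size must not be smaller than block size")
    # Drop the block's offset within its cluster, then scale back to blocks.
    return (blkno >> bits) << bits


# 4 KiB blocks (2**12) and 32 KiB clusters (2**15): 8 blocks per cluster.
print(block_to_cluster_start(21, 12, 15))  # 16
print(block_to_cluster_start(16, 12, 15))  # 16
```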

ocfs2: re-order ocfs2_empty_dir checks · b80b549c

Authored by Mark Fasheh on Feb 18, 2009

ocfs2_empty_dir() is far more expensive than checking link count. Since both
need to be checked at the same time, we can improve performance by checking
link count first.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Enable indexed directories · 3a8df2b9

Authored by Mark Fasheh on Nov 24, 2008

Since the disk format is finalized, we can set this feature bit in the
supported mask.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>

ocfs2: Add total entry count to dx_root_block · e3a93c2d

Authored by Mark Fasheh on Feb 17, 2009

This little bit of extra accounting speeds up ocfs2_empty_dir()
dramatically by allowing us to short-circuit the full directory scan.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Increase max links count · 198a1ca3

Authored by Mark Fasheh on Nov 20, 2008

Since we've now got a directory format capable of handling a large number of
entries, we can increase the maximum link count supported. This only gets
increased if the directory indexing feature is turned on.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Introduce dir free space list · e7c17e43

Authored by Mark Fasheh on Jan 29, 2009

The only operation which doesn't get faster with directory indexing is
insert, which still has to walk the entire unindexed directory portion to
find a free block. This patch provides an improvement in directory insert
performance by maintaining a singly linked list of directory leaf blocks
which have space for additional dirents.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>

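The idea of a singly linked list of leaf blocks with free space can be sketched as follows. This Python model is illustrative only: `LeafBlock`, `DirFreeList`, and the byte-count bookkeeping are hypothetical, and the policy of unlinking a block once it is completely full is an assumption, since the commit message does not describe the on-disk format or the unlink rule.

```python
class LeafBlock:
    def __init__(self, blkno, free_bytes):
        self.blkno = blkno
        self.free_bytes = free_bytes
        self.next_free = None  # next leaf block with free space, if any


class DirFreeList:
    def __init__(self):
        self.head = None

    def add(self, leaf):
        """Link a leaf block at the head of the free list."""
        leaf.next_free = self.head
        self.head = leaf

    def claim(self, needed):
        """Reserve `needed` bytes; return the block number used, or None."""
        prev, cur = None, self.head
        while cur is not None:
            if cur.free_bytes >= needed:
                cur.free_bytes -= needed
                if cur.free_bytes == 0:
                    # Assumed policy: unlink a block once it is full.
                    if prev is None:
                        self.head = cur.next_free
                    else:
                        prev.next_free = cur.next_free
                    cur.next_free = None
                return cur.blkno
            prev, cur = cur, cur.next_free
        return None


free_list = DirFreeList()
free_list.add(LeafBlock(blkno=10, free_bytes=16))
free_list.add(LeafBlock(blkno=11, free_bytes=64))
print(free_list.claim(32))  # 11
print(free_list.claim(32))  # 11
print(free_list.claim(32))  # None
```

An insert only walks blocks that are known to have some room, rather than every leaf block of the unindexed directory.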

ocfs2: Store dir index records inline · 4ed8a6bb

Authored by Mark Fasheh on Nov 24, 2008

Allow us to store a small number of directory index records in the
ocfs2_dx_root_block. This saves us a disk read on small to medium sized
directories (less than about 250 entries). The inline root is automatically
turned into a root block with extents if the directory size increases beyond
its capacity.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Add a name indexed b-tree to directory inodes · 9b7895ef

Authored by Mark Fasheh on Nov 12, 2008

This patch makes use of Ocfs2's flexible btree code to add an additional
tree to directory inodes. The new tree stores an array of small,
fixed-length records in each leaf block. Each record stores a hash value
and a pointer to a block in the traditional (unindexed) directory tree where a
dirent with the given name hash resides. Lookup exclusively uses this tree
to find dirents, thus providing us with constant time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of
unfixed checkpatch errors. I left that as-is so that tracking changes would
be easier.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Introduce dir lookup helper struct · 4a12ca3a

Authored by Mark Fasheh on Nov 12, 2008

Many directory manipulation calls pass around a tuple of a dirent and its
containing buffer_head. Dir indexing has a bit more state, but instead of
adding yet more arguments to functions, we introduce 'struct
ocfs2_dir_lookup_result'. In this patch, it simply holds the same tuple, but
future patches will add more state.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Remove debugfs file local_alloc_stats · 59b526a3

Authored by Sunil Mushran on Dec 16, 2008

This patch removes the debugfs file local_alloc_stats as that information
is now included in the fs_state debugfs file.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Expose the file system state via debugfs · 50397507

Authored by Sunil Mushran on Dec 17, 2008

This patch creates a per-mount debugfs file, fs_state, which exposes
information such as the cluster stack in use, the states of the downconvert,
recovery and commit threads, the number of journal txns, some allocation
stats, the list of all slots, etc.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Move struct recovery_map to a header file · 96a6c64b

Authored by Sunil Mushran on Dec 16, 2008

Move the definition of struct recovery_map from journal.c to journal.h. This
is preparation for the next patch.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2/hb: Expose the list of heartbeating nodes via debugfs · 87d3d3f3

Authored by Sunil Mushran on Dec 17, 2008

This patch creates a debugfs file, o2hb/livesnodes, which exposes the
aggregate list of heartbeating nodes across all heartbeat regions.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

03 Apr, 2009 · 8 commits

NFS: Add mount options to enable local caching on NFS · b797cac7

Authored by David Howells on Apr 03, 2009

Add NFS mount options to allow the local caching support to be enabled.

The attached patch makes it possible for the NFS filesystem to be told to make
use of the network filesystem local caching service (FS-Cache).

To be able to use this, a recent nfsutils package is required.

There are three variant NFS mount options that can be added to a mount command
to control caching for a mount.  Only the last one specified takes effect:

 (*) Adding "fsc" will request caching.

 (*) Adding "fsc=<string>" will request caching and also specify a uniquifier.

 (*) Adding "nofsc" will disable caching.

For example:

	mount warthog:/ /a -o fsc

The cache of a particular superblock (NFS FSID) will be shared between all
mounts of that volume, provided they have the same connection parameters and
are not marked 'nosharecache'.

Where it is otherwise impossible to distinguish superblocks because all the
parameters are identical, but the 'nosharecache' option is supplied, a
uniquifying string must be supplied, else only the first mount will be
permitted to use the cache.

If there's a key collision, then the second mount will disable caching and give
a warning into the kernel log.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>

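The "only the last one specified takes effect" rule for the three option variants can be modeled with a short parser. This Python sketch is illustrative and is not how the kernel parses mount options; `resolve_fsc` is a hypothetical name, and treating caching as disabled when none of the options appears is an assumption drawn from the wording that "fsc" requests caching.

```python
def resolve_fsc(options):
    """Resolve fsc / fsc=<string> / nofsc in a comma-separated option string.

    Returns (caching_requested, uniquifier). The last matching option wins.
    """
    enabled, uniquifier = False, None  # assumed default: caching not requested
    for opt in options.split(","):
        opt = opt.strip()
        if opt == "fsc":
            enabled, uniquifier = True, None
        elif opt.startswith("fsc="):
            enabled, uniquifier = True, opt[len("fsc="):]
        elif opt == "nofsc":
            enabled, uniquifier = False, None
    return enabled, uniquifier


print(resolve_fsc("rw,fsc"))          # (True, None)
print(resolve_fsc("fsc,nofsc"))       # (False, None)
print(resolve_fsc("nofsc,fsc=home"))  # (True, 'home')
```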

NFS: Display local caching state · 5d1acff1

Authored by David Howells on Apr 03, 2009

Display the local caching state in /proc/fs/nfsfs/volumes.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>

NFS: Store pages from an NFS inode into a local cache · 7f8e05f6

Authored by David Howells on Apr 03, 2009

Store pages from an NFS inode into the cache data storage object associated
with that inode.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>

NFS: Read pages from FS-Cache into an NFS inode · 9a9fc1c0

Authored by David Howells on Apr 03, 2009

Read pages from an FS-Cache data storage object representing an inode into an
NFS inode.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>

NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching · f42b293d

Authored by David Howells on Apr 03, 2009

nfs_readpage_async() needs to be non-static so that it can be used as a
fallback for the local on-disk caching should an EIO crop up when reading the
cache.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>

NFS: Add read context retention for FS-Cache to call back with · 1fcdf534

Authored by David Howells on Apr 03, 2009

Add read context retention so that FS-Cache can call back into NFS when a read
operation on the cache fails with EIO rather than reading data. This permits
NFS to then fetch the data from the server instead, using the appropriate
security context.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>

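The fallback behaviour described here (use the cache, and go to the server if the cache read fails with EIO) can be modeled in a simplified, synchronous form. This Python sketch is only an analogy: the real mechanism retains a read context for an asynchronous callback, and `read_page`, `read_from_cache`, and `read_from_server` are hypothetical names.

```python
import errno


def read_page(index, read_from_cache, read_from_server):
    """Try the local cache first; fall back to the server on EIO."""
    try:
        return read_from_cache(index)
    except OSError as err:
        if err.errno != errno.EIO:
            raise  # other errors are not handled by this fallback
        return read_from_server(index)


def broken_cache(index):
    # Simulates a cache whose backing store returns an I/O error.
    raise OSError(errno.EIO, "cache read failed")


print(read_page(3, broken_cache, lambda i: f"server page {i}"))  # server page 3
```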

NFS: FS-Cache page management · 545db45f

Authored by David Howells on Apr 03, 2009

FS-Cache page management for NFS.  This includes hooking the releasing and
invalidation of pages marked with PG_fscache (aka PG_private_2) and waiting for
completion of the write-to-cache flag (PG_fscache_write aka PG_owner_priv_2).
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>

NFS: Add some new I/O counters for FS-Cache doing things for NFS · 6a51091d

Authored by David Howells on Apr 03, 2009

Add some new NFS I/O counters for FS-Cache doing things for NFS.  A new line is
emitted into /proc/pid/mountstats if caching is enabled that looks like:

	fsc: <rok> <rfl> <wok> <wfl> <unc>

Where <rok> is the number of pages read successfully from the cache, <rfl> is
the number of failed page reads against the cache, <wok> is the number of
successful page writes to the cache, <wfl> is the number of failed page writes
to the cache, and <unc> is the number of NFS pages that have been disconnected
from the cache.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>

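A tool that reads the new mountstats line might parse it along these lines. This is a minimal Python sketch: `FscCounters` and `parse_fsc_line` are hypothetical names, the field names are paraphrases of the five counters listed in the commit message, and the sample values are made up for illustration.

```python
from typing import NamedTuple


class FscCounters(NamedTuple):
    read_ok: int     # <rok>: pages read successfully from the cache
    read_fail: int   # <rfl>: failed page reads against the cache
    write_ok: int    # <wok>: successful page writes to the cache
    write_fail: int  # <wfl>: failed page writes to the cache
    uncached: int    # <unc>: NFS pages disconnected from the cache


def parse_fsc_line(line):
    """Parse a line of the form 'fsc: <rok> <rfl> <wok> <wfl> <unc>'."""
    label, sep, rest = line.strip().partition(":")
    if sep != ":" or label != "fsc":
        raise ValueError("not an fsc line")
    fields = rest.split()
    if len(fields) != 5:
        raise ValueError("expected five counters")
    return FscCounters(*(int(f) for f in fields))


# Made-up sample values, for illustration only.
counters = parse_fsc_line("\tfsc: 120 3 118 0 7")
print(counters.read_ok, counters.uncached)  # 120 7
```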

OpenHarmony / kernel_linux · last synced about 4 years ago
