提交 · 7bfac9ecf0585962fe13584f5cf526d8c8e76f17 · OpenHarmony / kernel_linux

07 4月, 2009 1 次提交

splice: fix deadlock in splicing to file · 7bfac9ec

由 Miklos Szeredi 提交于 4月 06, 2009

There's a possible deadlock in generic_file_splice_write(),
splice_from_pipe() and ocfs2_file_splice_write():

 - task A calls generic_file_splice_write()
 - this calls inode_double_lock(), which locks i_mutex on both
   pipe->inode and target inode
 - ordering depends on inode pointers, can happen that pipe->inode is
   locked first
 - __splice_from_pipe() needs more data, calls pipe_wait()
 - this releases lock on pipe->inode, goes to interruptible sleep
 - task B calls generic_file_splice_write(), similarly to the first
 - this locks pipe->inode, then tries to lock inode, but that is
   already held by task A
 - task A is interrupted, it tries to lock pipe->inode, but fails, as
   it is already held by task B
 - ABBA deadlock

Fix this by explicitly ordering locks: the outer lock must be on
target inode and the inner lock (which is later unlocked and relocked)
must be on pipe->inode.  This is OK, pipe inodes and target inodes
form two nonoverlapping sets, generic_file_splice_write() and friends
are not called with a target which is a pipe.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Acked-by: NMark Fasheh <mfasheh@suse.com>
Acked-by: NJens Axboe <jens.axboe@oracle.com>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7bfac9ec

04 4月, 2009 32 次提交

ocfs2: recover orphans in offline slots during recovery and mount · 9140db04

由 Srinivas Eeda 提交于 3月 06, 2009

During recovery, a node recovers orphans in it's slot and the dead node(s). But
if the dead nodes were holding orphans in offline slots, they will be left
unrecovered.

If the dead node is the last one to die and is holding orphans in other slots
and is the first one to mount, then it only recovers it's own slot, which
leaves orphans in offline slots.

This patch queues complete_recovery to clean orphans for all offline slots
during mount and node recovery.
Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

9140db04

ocfs2: Pagecache usage optimization on ocfs2 · 1fca3a05

由 Hisashi Hifumi 提交于 3月 05, 2009

A page can have multiple buffers and even if a page is not uptodate, some buffers
can be uptodate on pagesize != blocksize environment.
This aops checks that all buffers which correspond to a part of a file
that we want to read are uptodate. If so, we do not have to issue actual
read IO to HDD even if a page is not uptodate because the portion we
want to read are uptodate.
"block_is_partially_uptodate" function is already used by ext2/3/4.
With the following patch random read/write mixed workloads or random read after
random write workloads can be optimized and we can get performance improvement.
Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

1fca3a05

ocfs2: fix rare stale inode errors when exporting via nfs · 6ca497a8

由 wengang wang 提交于 3月 06, 2009

For nfs exporting, ocfs2_get_dentry() returns the dentry for fh.
ocfs2_get_dentry() may read from disk when the inode is not in memory,
without any cross cluster lock. this leads to the file system loading a
stale inode.

This patch fixes above problem.

Solution is that in case of inode is not in memory, we get the cluster
lock(PR) of alloc inode where the inode in question is allocated from (this
causes node on which deletion is done sync the alloc inode) before reading
out the inode itsself. then we check the bitmap in the group (the inode in
question allcated from) to see if the bit is clear. if it's clear then it's
stale. if the bit is set, we then check generation as the existing code
does.

We have to read out the inode in question from disk first to know its alloc
slot and allot bit. And if its not stale we read it out using ocfs2_iget().
The second read should then be from cache.

And also we have to add a per superblock nfs_sync_lock to cover the lock for
alloc inode and that for inode in question. this is because ocfs2_get_dentry()
and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked
in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so
that mutliple ocfs2_delete_inode() can run concurrently in normal case.

[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

6ca497a8

ocfs2/dlm: Tweak mle_state output · 9405dccf

由 Sunil Mushran 提交于 2月 26, 2009

The debugfs file, mle_state, now prints the number of largest number of mles
in one hash link.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

9405dccf

ocfs2/dlm: Do not purge lockres that is being migrated dlm_purge_lockres() · 516b7e52

由 Sunil Mushran 提交于 2月 26, 2009

This patch attempts to fix a fine race between purging and migration.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

516b7e52

ocfs2/dlm: Remove struct dlm_lock_name in struct dlm_master_list_entry · 7141514b

由 Sunil Mushran 提交于 2月 26, 2009

This patch removes struct dlm_lock_name and adds the entries directly
to struct dlm_master_list_entry. Under the new scheme, both mles that
are backed by a lockres or not, will have the name populated in mle->mname.
This allows us to get rid of code that was figuring out the location of
the mle name.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

7141514b

ocfs2/dlm: Show the number of lockres/mles in dlm_state · e64ff146

由 Sunil Mushran 提交于 2月 26, 2009

This patch shows the number of lockres' and mles in the debugfs file, dlm_state.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

e64ff146

ocfs2/dlm: dlm_set_lockres_owner() and dlm_change_lockres_owner() inlined · 7d62a978

由 Sunil Mushran 提交于 2月 26, 2009

This patch inlines dlm_set_lockres_owner() and dlm_change_lockres_owner().
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

7d62a978

ocfs2/dlm: Improve lockres counts · 6800791a

由 Sunil Mushran 提交于 2月 26, 2009

This patch replaces the lockres counts that tracked the number number of
locally and remotely mastered lockres' with a current and total count. The
total count is the number of lockres' that have been created since the dlm
domain was created.

The number of locally and remotely mastered counts can be computed using
the locking_state output.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

6800791a

ocfs2/dlm: Track number of mles · 2041d8fd

由 Sunil Mushran 提交于 2月 26, 2009

The lifetime of a mle is limited to the duration of the lockres mastery
process. While typically this lifetime is fairly short, we have noticed
the number of mles explode under certain circumstances. This patch tracks
the number of each different types of mles and should help us determine
how best to speed up the mastery process.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

2041d8fd

ocfs2/dlm: Indent dlm_cleanup_master_list() · 67ae1f06

由 Sunil Mushran 提交于 2月 26, 2009

The previous patch explicitly did not indent dlm_cleanup_master_list()
so as to make the patch readable. This patch properly indents the
function.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

67ae1f06

ocfs2/dlm: Activate dlm->master_hash for master list entries · 2ed6c750

由 Sunil Mushran 提交于 2月 26, 2009

With this patch, the mles are stored in a hash and not a simple list.
This should improve the mle lookup time when the number of outstanding
masteries is large.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

2ed6c750

ocfs2/dlm: Create and destroy the dlm->master_hash · e2b66ddc

由 Sunil Mushran 提交于 2月 26, 2009

This patch adds code to create and destroy the dlm->master_hash.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

e2b66ddc

ocfs2/dlm: Refactor dlm_clean_master_list() · c2cd4a44

由 Sunil Mushran 提交于 2月 26, 2009

This patch refactors dlm_clean_master_list() so as to make it
easier to convert the mle list to a hash.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

c2cd4a44

ocfs2/dlm: Clean up struct dlm_lock_name · f77a9a78

由 Sunil Mushran 提交于 2月 26, 2009

For master mle, the name it stored in the attached lockres in struct qstr.
For block and migration mle, the name is stored inline in struct dlm_lock_name.
This patch attempts to make struct dlm_lock_name look like a struct qstr. While
we could use struct qstr, we don't because we want to avoid having to malloc
and free the lockname string as the mle's lifetime is fairly short.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

f77a9a78

ocfs2/dlm: Encapsulate adding and removing of mle from dlm->master_list · 1c084577

由 Sunil Mushran 提交于 2月 26, 2009

This patch encapsulates adding and removing of the mle from the
dlm->master_list. This patch is part of the series of patches that
converts the mle list to a mle hash.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

1c084577

ocfs2: Optimize inode group allocation by recording last used group. · feb473a6

由 Tao Ma 提交于 2月 25, 2009

In ocfs2, the block group search looks for the "emptiest" group
to allocate from. So if the allocator has many equally(or almost
equally) empty groups, new block group will tend to get spread
out amongst them.

So we add osb_inode_alloc_group in ocfs2_super to record the last
used inode allocation group.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.

I have done some basic test and the results are a ten times improvement on
some cold-cache stat workloads.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

feb473a6

ocfs2: Allocate inode groups from global_bitmap. · 60ca81e8

由 Tao Ma 提交于 2月 25, 2009

Inode groups used to be allocated from local alloc file,
but since we want all inodes to be contiguous enough, we
will try to allocate them directly from global_bitmap.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

60ca81e8

ocfs2: Optimize inode allocation by remembering last group · 13821151

由 Tao Ma 提交于 2月 25, 2009

In ocfs2, the inode block search looks for the "emptiest" inode
group to allocate from. So if an inode alloc file has many equally
(or almost equally) empty groups, new inodes will tend to get
spread out amongst them, which in turn can put them all over the
disk. This is undesirable because directory operations on conceptually
"nearby" inodes force a large number of seeks.

So we add ip_last_used_group in core directory inodes which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming new inode,
we passed in directory's inode so that the allocation can use this
information.
For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

13821151

ocfs2: fix leaf start calculation in ocfs2_dx_dir_rebalance() · 1d46dc08

由 Mark Fasheh 提交于 2月 19, 2009

ocfs2_dx_dir_rebalance() is passed the block offset of a dx leaf which needs
rebalancing. Since we rebalance an entire cluster at a time however, this
function needs to calculate the beginning of that cluster, in blocks. The
calculation was wrong, which would result in a read of non-leaf blocks. Fix
the calculation by adding ocfs2_block_to_cluster_start() which is a more
straight-forward way of determining this.
Reported-by: NTristan Ye <tristan.ye@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

1d46dc08

ocfs2: re-order ocfs2_empty_dir checks · b80b549c

由 Mark Fasheh 提交于 2月 18, 2009

ocfs2_empty_dir() is far more expensive than checking link count. Since both
need to be checked at the same time, we can improve performance by checking
link count first.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

b80b549c

ocfs2: Enable indexed directories · 3a8df2b9

由 Mark Fasheh 提交于 11月 24, 2008

Since the disk format is finalized, we can set this feature bit in the
supported mask.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Acked-by: NJoel Becker <Joel.Becker@oracle.com>

3a8df2b9

ocfs2: Add total entry count to dx_root_block · e3a93c2d

由 Mark Fasheh 提交于 2月 17, 2009

This little bit of extra accounting speeds up ocfs2_empty_dir()
dramatically by allowing us to short-circuit the full directory scan.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

e3a93c2d

ocfs2: Increase max links count · 198a1ca3

由 Mark Fasheh 提交于 11月 20, 2008

Since we've now got a directory format capable of handling a large number of
entries, we can increase the maximum link count supported. This only gets
increased if the directory indexing feature is turned on.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>

198a1ca3

ocfs2: Introduce dir free space list · e7c17e43

由 Mark Fasheh 提交于 1月 29, 2009

The only operation which doesn't get faster with directory indexing is
insert, which still has to walk the entire unindexed directory portion to
find a free block. This patch provides an improvement in directory insert
performance by maintaining a singly linked list of directory leaf blocks
which have space for additional dirents.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>

e7c17e43

ocfs2: Store dir index records inline · 4ed8a6bb

由 Mark Fasheh 提交于 11月 24, 2008

Allow us to store a small number of directory index records in the
ocfs2_dx_root_block. This saves us a disk read on small to medium sized
directories (less than about 250 entries). The inline root is automatically
turned into a root block with extents if the directory size increases beyond
it's capacity.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>

4ed8a6bb

ocfs2: Add a name indexed b-tree to directory inodes · 9b7895ef

由 Mark Fasheh 提交于 11月 12, 2008

This patch makes use of Ocfs2's flexible btree code to add an additional
tree to directory inodes. The new tree stores an array of small,
fixed-length records in each leaf block. Each record stores a hash value,
and pointer to a block in the traditional (unindexed) directory tree where a
dirent with the given name hash resides. Lookup exclusively uses this tree
to find dirents, thus providing us with constant time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of
unfixed checkpatch errors. I left that as-is so that tracking changes would
be easier.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>

9b7895ef

ocfs2: Introduce dir lookup helper struct · 4a12ca3a

由 Mark Fasheh 提交于 11月 12, 2008

Many directory manipulation calls pass around a tuple of dirent, and it's
containing buffer_head. Dir indexing has a bit more state, but instead of
adding yet more arguments to functions, we introduce 'struct
ocfs2_dir_lookup_result'. In this patch, it simply holds the same tuple, but
future patches will add more state.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>

4a12ca3a

ocfs2: Remove debugfs file local_alloc_stats · 59b526a3

由 Sunil Mushran 提交于 12月 16, 2008

This patch removes the debugfs file local_alloc_stats as that information
is now included in the fs_state debugfs file.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

59b526a3

ocfs2: Expose the file system state via debugfs · 50397507

由 Sunil Mushran 提交于 12月 17, 2008

This patch creates a per mount debugfs file, fs_state, which exposes
information like, cluster stack in use, states of the downconvert, recovery
and commit threads, number of journal txns, some allocation stats, list of
all slots, etc.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

50397507

ocfs2: Move struct recovery_map to a header file · 96a6c64b

由 Sunil Mushran 提交于 12月 16, 2008

Move the definition of struct recovery_map from journal.c to journal.h. This
is preparation for the next patch.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

96a6c64b

ocfs2/hb: Expose the list of heartbeating nodes via debugfs · 87d3d3f3

由 Sunil Mushran 提交于 12月 17, 2008

This patch creates a debugfs file, o2hb/livesnodes, which exposes the
aggregate list of heartbeating node across all heartbeat regions.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

87d3d3f3

01 4月, 2009 2 次提交

mm: page_mkwrite change prototype to match fault · c2ec175c

由 Nick Piggin 提交于 3月 31, 2009

Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags.  There should be no functional change.

This makes it possible to return much more detailed error information to
the VM (and also can provide more information eg.  virtual_address to the
driver, which might be important in some special cases).

This is required for a subsequent fix.  And will also make it easier to
merge page_mkwrite() with fault() in future.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Felix Blyakher <felixb@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c2ec175c

New helper - current_umask() · ce3b0f8d

由 Al Viro 提交于 3月 29, 2009

current->fs->umask is what most of fs_struct users are doing.
Put that into a helper function.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ce3b0f8d

28 3月, 2009 1 次提交
- A
  constify dentry_operations: OCFS2 · d8fba0ff
  由 Al Viro 提交于 2月 20, 2009
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  d8fba0ff
13 3月, 2009 4 次提交

ocfs2: Use xs->bucket to set xattr value outside · 712e53e4

由 Tao Ma 提交于 3月 12, 2009

A long time ago, xs->base is allocated a 4K size and all the contents
in the bucket are copied to the it. Now we use ocfs2_xattr_bucket to
abstract xattr bucket and xs->base is initialized to the start of the
bu_bhs[0]. So xs->base + offset will overflow when the value root is
stored outside the first block.

Then why we can survive the xattr test by now? It is because we always
read the bucket contiguously now and kernel mm allocate continguous
memory for us. We are lucky, but we should fix it. So just get the
right value root as other callers do.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

712e53e4

ocfs2: Fix a bug found by sparse check. · 74e77eb3

由 Tao Ma 提交于 3月 12, 2009

We need to use le32_to_cpu to test rec->e_cpos in
ocfs2_dinode_insert_check.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

74e77eb3

ocfs2: tweak to get the maximum inline data size with xattr · d9ae49d6

由 Tiger Yang 提交于 3月 05, 2009

Replace max_inline_data with max_inline_data_with_xattr
to ensure it correct when xattr inlined.
Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

d9ae49d6

ocfs2: reserve xattr block for new directory with inline data · 6c9fd1dc

由 Tiger Yang 提交于 3月 06, 2009

If this is a new directory with inline data, we choose to
reserve the entire inline area for directory contents and
force an external xattr block.
Signed-off-by: NTiger Yang <tiger.yang@oracle.com>
Acked-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mfasheh@suse.com>

6c9fd1dc

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年