Commits · 7ce1418f95e918cfc5ad36e3ec3431145c768cd0 · openeuler / Kernel

22 May, 2010 2 commits

ocfs2: Use __dquot_transfer to avoid lock inversion · 52a9ee28

Authored by Jan Kara on May 13, 2010

dquot_transfer() acquires its own references to dquots via dqget(). Thus it waits
for dq_lock, which creates a lock inversion because dq_lock ranks above
transaction start but the transaction is already started in ocfs2_setattr(). Fix
the problem by passing our own references directly to __dquot_transfer().
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>

52a9ee28

quota: unify quota init condition in setattr · 12755627

Authored by Dmitry Monakhov on Apr 08, 2010

Quota must be initialized if a size or uid/gid change is requested.
But the initialization is performed in two different places:
in the case of i_size, the file system is responsible for dquot init,
but in the case of uid/gid, init is called internally in
dquot_transfer().
This ambiguity makes the code harder to understand.
Let's move this logic to one common helper function.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
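
The single condition described in the commit above can be pictured with a small user-space model. This is only a sketch under assumptions: the struct layouts, the ATTR_* bit values, and the name needs_quota_init() are made up for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ATTR_* bits and structures. */
#define ATTR_UID  (1u << 1)
#define ATTR_GID  (1u << 2)
#define ATTR_SIZE (1u << 3)

struct model_inode { uint32_t uid, gid; uint64_t size; };
struct model_iattr { unsigned valid; uint32_t uid, gid; uint64_t size; };

/*
 * One common check: quota must be initialized when the request really
 * changes the size, the owner, or the group of the inode.
 */
bool needs_quota_init(const struct model_inode *inode,
		      const struct model_iattr *ia)
{
	return ((ia->valid & ATTR_SIZE) && ia->size != inode->size) ||
	       ((ia->valid & ATTR_UID) && ia->uid != inode->uid) ||
	       ((ia->valid & ATTR_GID) && ia->gid != inode->gid);
}
```

A setattr path modeled this way would call needs_quota_init() once, before starting any transaction, instead of relying on separate size and uid/gid code paths.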

12755627

19 May, 2010 3 commits

Ocfs2: Optimize punching-hole code. · c1631d4a

Authored by Tristan Ye on May 11, 2010

This patch simplifies the logic of handling existing holes and
skipping extent blocks and removes some confusing comments.

The patch survived the fill_verify_holes testcase in ocfs2-test.
It also passed my manual sanity check and stress tests with enormous
extent records.

Currently punching a hole on a file with 3+ extent tree depth was
really a performance disaster.  It can even take several hours,
though we may not hit this in real life with such a huge extent
number.

One simple way to improve the performance is quite straightforward.
From the logic of truncate, we can punch the hole from hole_end to
hole_start, which reduces the overhead of btree operations in a
significant way, such as tree rotation and moving.

The following is the test result when punching a hole from offset 0 to the
end of a 1G file. The file consists of 256k extent records, each record
covering 4k of data (just one cluster; the cluster size is 4k):

===========================================================================
 * Original punching-hole mechanism:
===========================================================================

   I waited 1 hour for its completion, unfortunately it's still ongoing.

===========================================================================
 * Patched punching-hole mechanism:
===========================================================================

   real 0m2.518s
   user 0m0.000s
   sys  0m2.445s

That means we've gained more than a 1000-fold performance improvement in this
case, whee! It's fairly cool, and it looks like the performance gain will
grow as the number of extent records grows.

The patch is based on my previous 2 patches, which optimized the truncate
code and fixed hole punching to handle CoW.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
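
Why removing from hole_end toward hole_start is cheaper can be illustrated with a loose analogy. The sketch below is not the ocfs2 b-tree code: it models the extent records as a packed array (an assumption for illustration) and counts how many records must be moved when everything is deleted lowest-first versus highest-first. The names delete_record() and punch_all() are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Delete the record at index idx from recs[0..*count) and close the gap.
 * Returns the number of records that had to be moved.
 */
static size_t delete_record(unsigned *recs, size_t *count, size_t idx)
{
	size_t moved = *count - idx - 1;

	memmove(&recs[idx], &recs[idx + 1], moved * sizeof(*recs));
	(*count)--;
	return moved;
}

/*
 * Remove all n records, either always deleting the highest remaining
 * record (from_end != 0) or always the lowest one (from_end == 0).
 * Returns the total number of record moves.
 */
unsigned long long punch_all(size_t n, int from_end)
{
	unsigned *recs = malloc(n * sizeof(*recs));
	unsigned long long moves = 0;
	size_t count = n;

	if (!recs)
		return 0;
	for (size_t i = 0; i < n; i++)
		recs[i] = (unsigned)i;
	while (count > 0)
		moves += delete_record(recs, &count, from_end ? count - 1 : 0);
	free(recs);
	return moves;
}
```

In this model, deleting lowest-first costs n(n-1)/2 moves while deleting highest-first costs none, which mirrors the direction of the effect the commit reports, though not its exact magnitude.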

c1631d4a

Ocfs2: Fix hole punching to correctly do CoW during cluster zeroing. · e8aec068

Authored by Tristan Ye on May 11, 2010

Based on the previous patch that optimizes truncate, the bugfix for
refcount trees when punching holes is fairly easy
and straightforward, since most of the work needed for
refcounting is already done in ocfs2_remove_btree_range().

This patch performs CoW for refcounted extents when a hole is punched
whose start or end offset is in the middle of a cluster, which means
partial zeroing of the cluster will be performed soon.

The patch has been tested to fix the following bug:

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1216
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
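
The "start or end offset in the middle of a cluster" condition from the commit above comes down to an alignment test. The following is a minimal sketch, assuming a power-of-two cluster size given as a bit shift; the function names are hypothetical, and whether the affected cluster is actually refcounted is a separate check that is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if byte offset off does not sit on a cluster boundary. */
static bool in_middle_of_cluster(uint64_t off, unsigned cluster_bits)
{
	uint64_t mask = ((uint64_t)1 << cluster_bits) - 1;

	return (off & mask) != 0;
}

/*
 * A hole [start, start + len) needs partial cluster zeroing, and hence a
 * CoW check, when either of its edges falls inside a cluster.
 */
bool hole_needs_cow_check(uint64_t start, uint64_t len, unsigned cluster_bits)
{
	return in_middle_of_cluster(start, cluster_bits) ||
	       in_middle_of_cluster(start + len, cluster_bits);
}
```

With 4k clusters (cluster_bits of 12), a hole covering bytes 4096 to 8191 touches only whole clusters, while a hole starting at byte 100 needs the first cluster partially zeroed.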

e8aec068

Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead. · 78f94673

Authored by Tristan Ye on May 11, 2010

Truncate is just a special case of punching holes (from new i_size to
end), so we can take advantage of the existing
ocfs2_remove_btree_range() to reduce the complexity and redundancy in
alloc.c.  The goal here is to make truncate more generic and
straightforward.

Several functions only used by ocfs2_commit_truncate() will simply be
removed.

ocfs2_remove_btree_range() was originally used by the hole punching
code, which didn't take refcount trees into account (definitely a bug).
We therefore need to change that func a bit to handle refcount trees.
It must take the refcount lock, calculate and reserve blocks for
refcount tree changes, and decrease refcounts at the end.  We replace 
ocfs2_lock_allocators() here by adding a new func
ocfs2_reserve_blocks_for_rec_trunc() which accepts some extra blocks to
reserve.  This will not hurt any other code using
ocfs2_remove_btree_range() (such as dir truncate and hole punching).

I merged the following steps into one patch since they may be
logically doing one thing, though I know it looks a little bit fat
to review.

1). Remove redundant code used by ocfs2_commit_truncate(), since we're
    moving to ocfs2_remove_btree_range anyway.

2). Add a new func ocfs2_reserve_blocks_for_rec_trunc() for purpose of
    accepting some extra blocks to reserve.

3). Change ocfs2_prepare_refcount_change_for_del() a bit to fit our
    needs.  It's safe to do this since it's only being called by
    truncate.

4). Change ocfs2_remove_btree_range() a bit to take refcount case into
    account.

5). Finally, we change ocfs2_commit_truncate() to call
    ocfs2_remove_btree_range() in a proper way.

The patch has passed the normal sanity checks; stress tests
with a heavier workload are still expected.

Based on this patch, fixing the hole punching bug will be fairly easy.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

78f94673

06 May, 2010 2 commits

ocfs2: use allocation reservations during file write · 4fe370af

Authored by Mark Fasheh on Dec 07, 2009

Add a per-inode reservations structure and pass it through to the
reservations code.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

4fe370af

ocfs2: Make ocfs2_journal_dirty() void. · ec20cec7

Authored by Joel Becker on Mar 19, 2010

jbd[2]_journal_dirty_metadata() only returns 0.  It's been returning 0
since before the kernel moved to git.  There is no point in checking
this error.

ocfs2_journal_dirty() has been faithfully returning the status since the
beginning.  All over ocfs2, we have blocks of code checking this can't
fail status.  In the past few years, we've tried to avoid adding these
checks, because they are pointless.  But anyone who looks at our code
assumes they are needed.

Finally, ocfs2_journal_dirty() is made a void function.  All error
checking is removed from other files.  We'll BUG_ON() the status of
jbd2_journal_dirty_metadata() just in case they change it someday.  They
won't.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ec20cec7

01 May, 2010 1 commit

ocfs2: Avoid direct write if we fall back to buffered I/O · 6b933c8e

Authored by Li Dongyang on Apr 17, 2010

When we fall back to buffered write from direct write, we call
__generic_file_aio_write(), but that will end up doing a direct write
even though we are only prepared to do a buffered write, because the file
has the O_DIRECT flag set. This is a fix for
https://bugzilla.novell.com/show_bug.cgi?id=591039
revised with Joel's comments.
Signed-off-by: Li Dongyang <lidongyang@novell.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

6b933c8e

16 Apr, 2010 1 commit

ocfs2: Reset status if we want to restart file extension. · 79681842

Authored by Tao Ma on Apr 16, 2010

In __ocfs2_extend_allocation, we will restart our file extension
if ((!status) && restart_func). But there is a bug: the
status is still left as -EAGAIN. This is a really old bug,
but it was masked by the return value of ocfs2_journal_dirty.
So it shows up now that we have made ocfs2_journal_dirty void.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
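
The restart pattern described in the commit above can be sketched in user space. This is a simplified model under assumptions, not the ocfs2 function: extend_once() is a hypothetical allocator that hands out a limited number of clusters per pass and reports -EAGAIN while work remains, and the restart flag plays the role of restart_func.

```c
#include <errno.h>

/*
 * Hypothetical allocator: grants at most chunk clusters per call and
 * returns -EAGAIN while clusters remain to be allocated.
 */
static int extend_once(unsigned *remaining, unsigned chunk)
{
	unsigned got;

	if (chunk == 0)
		return -EINVAL;
	got = *remaining < chunk ? *remaining : chunk;
	*remaining -= got;
	return *remaining ? -EAGAIN : 0;
}

/* Extend by clusters, restarting until done; *passes counts the attempts. */
int extend_allocation(unsigned clusters, unsigned chunk, unsigned *passes)
{
	unsigned remaining = clusters;
	int status, restart;

	*passes = 0;
restart_all:
	restart = 0;
	(*passes)++;
	status = extend_once(&remaining, chunk);
	if (status < 0 && status != -EAGAIN)
		goto leave;
	if (status == -EAGAIN) {
		restart = 1;
		/* Without this reset, the check at "leave" never restarts. */
		status = 0;
	}
leave:
	if (!status && restart)
		goto restart_all;
	return status;
}
```

If the reset of status were dropped in this model, the first -EAGAIN would be returned to the caller as an error instead of triggering another pass.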

79681842

31 Mar, 2010 1 commit

ocfs2: one more warning fix in ocfs2_file_aio_write(), v2 · a03ab788

Authored by Coly Li on Mar 26, 2010

This patch fixes another compiler warning in ocfs2_file_aio_write(), like this:
fs/ocfs2/file.c: In function ‘ocfs2_file_aio_write’:
fs/ocfs2/file.c:2026: warning: suggest parentheses around ‘&&’ within ‘||’

As Joel suggested, '!ret' is unary, so this version removes the parentheses around '!ret'.
Signed-off-by: Coly Li <coly.li@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
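
The warning quoted in the commit above is about operator precedence: `&&` binds tighter than `||`, so explicit parentheses do not change the result, they only make the intended grouping visible. The sketch below uses hypothetical condition names, not the actual expression from fs/ocfs2/file.c, and contrasts the grouping the compiler assumes with the other possible reading.

```c
#include <stdbool.h>

/*
 * The grouping the compiler applies to "is_sync || o_sync && !direct_io",
 * written out explicitly so no warning is emitted. The unary '!' needs no
 * parentheses of its own.
 */
bool should_sync(bool is_sync, bool o_sync, bool direct_io)
{
	return is_sync || (o_sync && !direct_io);
}

/* The other reading, which gives different answers for some inputs. */
bool should_sync_other_grouping(bool is_sync, bool o_sync, bool direct_io)
{
	return (is_sync || o_sync) && !direct_io;
}
```

For is_sync set and direct_io set, the first function returns true and the second returns false, which is why spelling out the grouping is worth a line of review.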

a03ab788

05 Mar, 2010 4 commits

dquot: cleanup dquot initialize routine · 871a2931

Authored by Christoph Hellwig on Mar 03, 2010

Get rid of the initialize dquot operation - it is now always called from
the filesystem and if a filesystem really needs its own (which none
currently does) it can just call into its own routine directly.

Rename the now static low-level dquot_initialize helper to __dquot_initialize
and vfs_dq_init to dquot_initialize to have a consistent namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

871a2931

dquot: move dquot initialization responsibility into the filesystem · 907f4554

Authored by Christoph Hellwig on Mar 03, 2010

Currently various places in the VFS call vfs_dq_init directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the initialization.  For most metadata operations
this is a straightforward move into the methods, but for truncate and
open it's a bit more complicated.

For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those which is harmless.

For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.

Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in ->open.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

907f4554

dquot: cleanup dquot transfer routine · b43fa828

Authored by Christoph Hellwig on Mar 03, 2010

Get rid of the transfer dquot operation - it is now always called from
the filesystem and if a filesystem really needs its own (which none
currently does) it can just call into its own routine directly.

Rename the now static low-level dquot_transfer helper to __dquot_transfer
and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
and make the new dquot_transfer return a normal negative errno value
which all callers expect.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

b43fa828

dquot: cleanup space allocation / freeing routines · 5dd4056d

Authored by Christoph Hellwig on Mar 03, 2010

Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs its own (which none currently does)
it can just call into its own routine directly.

Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around them to move as much code as
possible into the common block for CONFIG_QUOTA vs not.  Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

5dd4056d

28 Feb, 2010 1 commit

ocfs2: send SIGXFSZ if new filesize exceeds limit -v2 · 5051f768

Authored by Wengang Wang on Feb 26, 2010

This patch makes ocfs2 send SIGXFSZ if new file size exceeds the rlimit.
Processes may get SIGXFSZ on one node (in the cluster) while others will
not on another if file size limits are different on the two nodes.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

5051f768

27 Feb, 2010 2 commits

ocfs2: fix warning in ocfs2_file_aio_write() · 66b116c9

Authored by Coly Li on Feb 25, 2010

This patch fixes a compiler warning in ocfs2_file_aio_write().
Signed-off-by: Coly Li <coly.li@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

66b116c9

ocfs2: Clean up the checks for CoW and direct I/O. · 96a1cc73

Authored by Wengang Wang on Feb 09, 2010

When ocfs2 has to do CoW for refcounted extents, we disable direct I/O
and go through the buffered I/O path.  This makes the combined check
easier to read.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

96a1cc73

03 Feb, 2010 1 commit

ocfs2: Add parenthesis to wrap the check for O_DIRECT. · 60c48674

Authored by Tao Ma on Feb 03, 2010

Add parentheses to wrap the check for O_DIRECT.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

60c48674

26 Jan, 2010 1 commit

ocfs2/trivial: Remove trailing whitespaces · 2bd63216

Authored by Sunil Mushran on Jan 25, 2010

Patch removes trailing whitespaces.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

2bd63216

31 Dec, 2009 1 commit

ocfs2: Handle O_DIRECT when writing to a refcounted cluster. · 86470e98

Authored by Tao Ma on Dec 03, 2009

When writing to a refcounted cluster with O_DIRECT,
we need to fall back to buffered write. And when it is finished,
we need to flush the page and the journal as we do for other
O_DIRECT writes.

This patch fixes oss bug 1191.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1191
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Tested-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

86470e98

10 Dec, 2009 1 commit

vfs: Implement proper O_SYNC semantics · 6b2f3d1f

Authored by Christoph Hellwig on Oct 27, 2009

While Linux provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
since that day we had generic_osync_around with only minor changes and the
great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
patches which are required before this patch it's actually surprisingly
simple, we just need to figure out when to set the datasync flag to
vfs_fsync_range and when not.

This patch renames the existing O_SYNC flag to O_DSYNC while keeping its
numerical value to keep binary compatibility, and adds a new real O_SYNC
flag.  To guarantee backwards compatibility it is defined as expanding to
both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
sure we are backwards-compatible when compiled against the new headers.

This also means that all places that don't care about the differences can
just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actually care need to check __O_SYNC in addition.  Drivers and
network filesystems have been updated in a fail safe way to always do the
full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
lower layers are kept that way for now to stay failsafe.

We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
to make sure we always get these sane options.

Note that parisc really screwed up their headers as they already define a
O_DSYNC that has always been a no-op.  We try to repair it by using it for
the new O_DSYNC and redefining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
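
The flag layout described in the commit above can be sketched with a few bit tests. The numeric values and the MODEL_ names below are assumptions chosen for illustration (the real values are architecture-specific); only the relationship between the flags follows the commit message: O_SYNC expands to the O_DSYNC bit plus one additional bit.

```c
#include <stdbool.h>

/* Illustrative values only; not taken from any real header. */
#define MODEL_O_DSYNC     0x001000u
#define MODEL_O_SYNC_BIT  0x400000u   /* plays the role of __O_SYNC */
#define MODEL_O_SYNC      (MODEL_O_SYNC_BIT | MODEL_O_DSYNC)

/* Code that does not care about the difference only checks O_DSYNC. */
bool wants_data_integrity(unsigned flags)
{
	return (flags & MODEL_O_DSYNC) != 0;
}

/* Enforce early that the extra bit never appears without O_DSYNC. */
unsigned normalize_open_flags(unsigned flags)
{
	if (flags & MODEL_O_SYNC_BIT)
		flags |= MODEL_O_DSYNC;
	return flags;
}

/*
 * Value for a datasync-style argument: 1 when only data integrity is
 * requested, 0 when full O_SYNC semantics are requested.
 */
int datasync_arg(unsigned flags)
{
	return (flags & MODEL_O_SYNC_BIT) ? 0 : 1;
}
```

Because MODEL_O_SYNC includes the MODEL_O_DSYNC bit, a caller that only tests for O_DSYNC still does the right thing for files opened with O_SYNC.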

6b2f3d1f

29 Oct, 2009 1 commit

ocfs2: duplicate inline data properly during reflink. · 2f48d593

Authored by Tao Ma on Oct 15, 2009

The old reflink fails to handle inodes with inline data and will oops
if it encounters them.  This patch copies inline data to the new inode.
Extended attributes may still be refcounted.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Tested-by: Tristan Ye <tristan.ye@oracle.com>

2f48d593

23 Sep, 2009 3 commits

ocfs2: Call refcount tree remove process properly. · 8b2c0dba

Authored by Tao Ma on Aug 18, 2009

Now with xattr refcount support, we need to check whether
we have refcounted xattrs before we remove the refcount tree.

Now the mechanism is:
1) Check whether i_clusters == 0; if not, exit.
2) Check whether we have i_xattr_loc in the dinode; if yes, exit.
3) Check whether we have an inline xattr stored outside; if yes, exit.
4) Remove the tree.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
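
The four steps listed in the commit above amount to a chain of early exits. The sketch below is a user-space model under assumptions: struct model_dinode and its inline_xattr_value_outside field are hypothetical simplifications of the on-disk inode, and the function only decides whether removal may proceed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the fields the checks look at. */
struct model_dinode {
	uint32_t i_clusters;              /* data clusters still allocated */
	uint64_t i_xattr_loc;             /* non-zero if an xattr block exists */
	bool inline_xattr_value_outside;  /* inline xattr with external value */
};

/* Steps 1 to 3 as early exits; returning true corresponds to step 4. */
bool can_remove_refcount_tree(const struct model_dinode *di)
{
	if (di->i_clusters != 0)
		return false;
	if (di->i_xattr_loc != 0)
		return false;
	if (di->inline_xattr_value_outside)
		return false;
	return true;
}
```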

8b2c0dba

ocfs2: CoW a reflinked cluster when it is truncated. · 37f8a2bf

Authored by Tao Ma on Aug 26, 2009

When we truncate a file to a specific size which resides in a reflinked
cluster, we need to CoW it since ocfs2_zero_range_for_truncate will
zero the space after the size (just another type of write).

So we add a "max_cpos" in ocfs2_refcount_cow so that it will stop when
it hits the max cluster offset.
Signed-off-by: Tao Ma <tao.ma@oracle.com>

37f8a2bf

ocfs2: Integrate CoW in file write. · 293b2f70

Authored by Tao Ma on Aug 25, 2009

When we use mmap, we CoW the refcounted clusters in
ocfs2_write_begin_nolock. For normal file
I/O (including direct I/O), we do CoW in
ocfs2_prepare_inode_for_write.
Signed-off-by: Tao Ma <tao.ma@oracle.com>

293b2f70

14 Sep, 2009 2 commits

ocfs2: Update syncing after splicing to match generic version · d23c937b

Authored by Jan Kara on Aug 18, 2009

Update ocfs2 specific splicing code to use generic syncing helper. The sync now
does not happen under rw_lock because generic_write_sync() acquires i_mutex
which ranks above rw_lock. That should not matter because standard fsync path
does not hold it either.
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
CC: ocfs2-devel@oss.oracle.com
Signed-off-by: Jan Kara <jack@suse.cz>

d23c937b

ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock · 918941a3

Authored by Jan Kara on Aug 17, 2009

Use the new helper. We have to submit data pages ourselves in case of O_SYNC
write because __generic_file_aio_write does not do it for us. OCFS2 developers
might think about moving the sync out of i_mutex which seems to be easily
possible but that's out of scope of this patch.

CC: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>

918941a3

05 Sep, 2009 3 commits

ocfs2: Pass ocfs2_caching_info into ocfs_init_*_extent_tree(). · 5e404e9e

Authored by Joel Becker on Feb 13, 2009

With this commit, extent tree operations are divorced from inodes and
rely on ocfs2_caching_info.  Phew!
Signed-off-by: Joel Becker <joel.becker@oracle.com>

5e404e9e

ocfs2: ocfs2_add_clusters_in_btree() no longer needs struct inode. · cbee7e1a

Authored by Joel Becker on Feb 13, 2009

One more function that doesn't need a struct inode to pass to its
children.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

cbee7e1a

ocfs2: Pass struct ocfs2_caching_info to the journal functions. · 0cf2f763

Authored by Joel Becker on Feb 12, 2009

The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions.  Thus the
journal locks a metadata cache with the cache io_lock function.  It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).
Signed-off-by: Joel Becker <joel.becker@oracle.com>

0cf2f763

21 Jul, 2009 1 commit

ocfs2: Initialize count in aio_write before generic_write_checks · cefcb800

Authored by Goldwyn Rodrigues on Jul 11, 2009

generic_write_checks() expects count to be initialized to the size of
the write.  Writes to files opened with O_DIRECT|O_LARGEFILE write 0 bytes
because count is uninitialized.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

cefcb800

11 Jul, 2009 1 commit

ocfs2: log the actual return value of ocfs2_file_aio_write() · 812e7a6a

Authored by Wengang Wang on Jul 10, 2009

In ocfs2_file_aio_write(), log_exit() may not log the value
that is actually returned. This patch fixes it.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

812e7a6a

23 Jun, 2009 1 commit

ocfs2: Update atime in splice read if necessary. · 1962f39a

Authored by Tao Ma on Jun 19, 2009

We should call ocfs2_inode_lock_atime instead of ocfs2_inode_lock
in ocfs2_file_splice_read like we do in ocfs2_file_aio_read so
that we can update atime in splice read if necessary.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

1962f39a

10 Jun, 2009 1 commit

ocfs2: fdatasync should skip unimportant metadata writeout · e04cc15f

Authored by Hisashi Hifumi on Jun 09, 2009

In ocfs2, fdatasync and fsync are identical.
I think fdatasync should skip committing the transaction when
inode->i_state has only I_DIRTY_SYNC set, which indicates
only atime and/or mtime updates.
The following patch improves fdatasync throughput.

#sysbench --num-threads=16 --max-requests=300000 --test=fileio
--file-block-size=4K --file-total-size=16G --file-test-mode=rndwr
--file-fsync-mode=fdatasync run

Results:
-2.6.30-rc8
Test execution summary:
    total time:                          107.1445s
    total number of events:              119559
    total time taken by event execution: 116.1050
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0010s
         max:                            0.1220s
         approx.  95 percentile:         0.0016s

Threads fairness:
    events (avg/stddev):           7472.4375/303.60
    execution time (avg/stddev):   7.2566/0.64

-2.6.30-rc8-patched
Test execution summary:
    total time:                          86.8529s
    total number of events:              300016
    total time taken by event execution: 24.3077
    per-request statistics:
         min:                            0.0000s
         avg:                            0.0001s
         max:                            0.0336s
         approx.  95 percentile:         0.0001s

Threads fairness:
    events (avg/stddev):           18751.0000/718.75
    execution time (avg/stddev):   1.5192/0.05
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
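
The decision described in the commit above can be modeled as a small predicate. This is a sketch under assumptions: the MODEL_ flag values and the name must_commit_journal() are hypothetical, and the second flag stands for "metadata that fdatasync must still write out".

```c
#include <stdbool.h>

/* Illustrative stand-ins for inode state bits; values are assumptions. */
#define MODEL_I_DIRTY_SYNC      (1u << 0)  /* e.g. only timestamps changed */
#define MODEL_I_DIRTY_DATASYNC  (1u << 1)  /* metadata needed to read data */

/*
 * fsync always commits; fdatasync may skip the commit when nothing that
 * matters for retrieving the data is dirty.
 */
bool must_commit_journal(bool datasync, unsigned i_state)
{
	if (datasync && !(i_state & MODEL_I_DIRTY_DATASYNC))
		return false;
	return true;
}
```

In this model a timestamp-only update followed by fdatasync skips the journal commit, which is the case the benchmark in the commit message exercises.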

e04cc15f

04 Jun, 2009 1 commit

ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() · 65bac575

Authored by Jan Kara on Jun 02, 2009

We called vfs_dq_transfer() with global quota file lock held. This can lead
to deadlocks as if vfs_dq_transfer() has to allocate new quota structure,
it calls ocfs2_dquot_acquire() which tries to get quota file lock again and
this can block if another node requested the lock in the mean time.

Since we have to call vfs_dq_transfer() with transaction already started
and quota file lock ranks above the transaction start, we cannot just rely
on ocfs2_dquot_acquire() or ocfs2_dquot_release() on getting the lock
if they need it. We fix the problem by acquiring pointers to all quota
structures needed by vfs_dq_transfer() already before calling the function.
By this we are sure that all quota structures are properly allocated and
they can be freed only after we drop references to them. Thus we don't need
quota file lock anywhere inside vfs_dq_transfer().
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

65bac575

15 Apr, 2009 1 commit

ocfs2: fix i_mutex locking in ocfs2_splice_to_file() · 328eaaba

Authored by Miklos Szeredi on Apr 14, 2009

Rearrange locking of i_mutex on destination and call to
ocfs2_rw_lock() so locks are only held while buffers are copied with
the pipe_to_file() actor, and not while waiting for more data on the
pipe.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

328eaaba

07 Apr, 2009 1 commit

splice: fix deadlock in splicing to file · 7bfac9ec

Authored by Miklos Szeredi on Apr 06, 2009

There's a possible deadlock in generic_file_splice_write(),
splice_from_pipe() and ocfs2_file_splice_write():

 - task A calls generic_file_splice_write()
 - this calls inode_double_lock(), which locks i_mutex on both
   pipe->inode and target inode
 - ordering depends on inode pointers, can happen that pipe->inode is
   locked first
 - __splice_from_pipe() needs more data, calls pipe_wait()
 - this releases lock on pipe->inode, goes to interruptible sleep
 - task B calls generic_file_splice_write(), similarly to the first
 - this locks pipe->inode, then tries to lock inode, but that is
   already held by task A
 - task A is interrupted, it tries to lock pipe->inode, but fails, as
   it is already held by task B
 - ABBA deadlock

Fix this by explicitly ordering locks: the outer lock must be on
target inode and the inner lock (which is later unlocked and relocked)
must be on pipe->inode.  This is OK, pipe inodes and target inodes
form two nonoverlapping sets, generic_file_splice_write() and friends
are not called with a target which is a pipe.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7bfac9ec

09 Jan, 2009 1 commit

remove lots of double-semicolons · c19a28e1

Authored by Fernando Carrijo on Jan 07, 2009

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c19a28e1

06 Jan, 2009 2 commits

ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. · 13723d00

Authored by Joel Becker on Oct 17, 2008

The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
commit triggers and allow us to compute metadata ecc right before the
buffers are written out.  This commit provides ecc for inodes, extent
blocks, group descriptors, and quota blocks.  It is not safe to use
extended attributes and metaecc at the same time yet.

The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
the type of block at their root.  Before, it didn't matter, but now the
root block must use the appropriate ocfs2_journal_access_*() function.
To keep this abstract, the structures now have a pointer to the matching
journal_access function and a wrapper call to call it.

A few places use naked ocfs2_write_block() calls instead of adding the
blocks to the journal.  We make sure to calculate their checksum and ecc
before the write.

Since we pass around the journal_access functions, let's typedef them
in ocfs2.h.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
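
The "pointer to the matching journal_access function and a wrapper call" idea from the commit above can be sketched as follows. Every type and function name here is a hypothetical stand-in written for illustration; the real ocfs2 structures and signatures differ.

```c
/* Hypothetical stand-ins for a journal handle and a metadata block. */
struct model_handle { int id; };
struct model_block { unsigned touched; };

/* One typedef shared by all per-metadata-type access functions. */
typedef int (*journal_access_func)(struct model_handle *h,
				   struct model_block *b, int type);

/* Two per-type variants; each would hook up its own commit trigger. */
int access_inode_block(struct model_handle *h, struct model_block *b, int type)
{
	(void)h;
	(void)type;
	b->touched |= 1u;   /* marks "inode" handling in this model */
	return 0;
}

int access_extent_block(struct model_handle *h, struct model_block *b, int type)
{
	(void)h;
	(void)type;
	b->touched |= 2u;   /* marks "extent block" handling in this model */
	return 0;
}

/* The tree hides what kind of block sits at its root. */
struct model_extent_tree {
	struct model_block *root;
	journal_access_func root_access;
};

/* Wrapper: callers stay abstract, the tree picks the right function. */
int tree_root_journal_access(struct model_handle *h,
			     struct model_extent_tree *et, int type)
{
	return et->root_access(h, et->root, type);
}
```

A tree rooted in an inode would be set up with access_inode_block, and one rooted in an extent block with access_extent_block; the calling code is the same in both cases.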

13723d00

ocfs2: Add quota calls for allocation and freeing of inodes and space · a90714c1

Authored by Jan Kara on Oct 09, 2008

Add quota calls for allocation and freeing of inodes and space, also update
estimates on number of needed credits for a transaction. Move out inode
allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called
outside of a transaction.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

a90714c1
