提交 · 6d6b77f163c7eabedbba00ed2abb7d4a570bff76 · openanolis / cloud-kernel

23 8月, 2011 2 次提交

block: separate priority boosting from REQ_META · 65299a3b

由 Christoph Hellwig 提交于 8月 23, 2011

Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule,
and lave REQ_META purely for marking requests as metadata in blktrace.

All existing callers of REQ_META except for XFS are updated to also
set REQ_PRIO for now.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NNamhyung Kim <namhyung@gmail.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

65299a3b

block: remove READ_META and WRITE_META · 5dc06c5a

由 Christoph Hellwig 提交于 8月 23, 2011

Replace all occurnanced of the undocumented READ_META with READ | REQ_META
and remove the unused WRITE_META define.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

5dc06c5a

23 7月, 2011 1 次提交

ext3: Fix data corruption in inodes with journalled data · b22570d9

由 Jan Kara 提交于 7月 23, 2011

When journalling data for an inode (either because it is a symlink or
because the filesystem is mounted in data=journal mode), ext3_evict_inode()
can discard unwritten data by calling truncate_inode_pages(). This is
because we don't mark the buffer / page dirty when journalling data but only
add the buffer to the running transaction and thus mm does not know there
are still unwritten data.

Fix the problem by carefully tracking transaction containing inode's data,
committing this transaction, and writing uncheckpointed buffers when inode
should be reaped.
Signed-off-by: NJan Kara <jack@suse.cz>

b22570d9

21 7月, 2011 2 次提交

fs: simplify the blockdev_direct_IO prototype · aacfc19c

由 Christoph Hellwig 提交于 6月 24, 2011

Simple filesystems always pass inode->i_sb_bdev as the block device
argument, and never need a end_io handler.  Let's simply things for
them and for my grepping activity by dropping these arguments.  The
only thing not falling into that scheme is ext4, which passes and
end_io handler without needing special flags (yet), but given how
messy the direct I/O code there is use of __blockdev_direct_IO
in one instead of two out of three cases isn't going to make a large
difference anyway.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

aacfc19c

fs: move inode_dio_wait calls into ->setattr · 562c72aa

由 Christoph Hellwig 提交于 6月 24, 2011

Let filesystems handle waiting for direct I/O requests themselves instead
of doing it beforehand. This means filesystem-specific locks to prevent
new dio referenes from appearing can be held. This is important to allow
generalizing i_dio_count to non-DIO_LOCKING filesystems.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

562c72aa

25 6月, 2011 3 次提交

ext3: Improve truncate error handling · ee3e77f1

由 Jan Kara 提交于 6月 03, 2011

New truncate calling convention allows us to handle errors from
ext3_block_truncate_page(). So reorganize the code so that
ext3_block_truncate_page() is called before we change inode size.

This also removes unnecessary block zeroing from error recovery after failed
buffered writes (zeroing isn't needed because we could have never written
non-zero data to disk). We have to be careful and keep zeroing in direct IO
write error recovery because there we might have already overwritten end of the
last file block.
Signed-off-by: NJan Kara <jack@suse.cz>

ee3e77f1

ext3: Convert ext3 to new truncate calling convention · 40680f2f

由 Jan Kara 提交于 5月 24, 2011

Mostly trivial conversion. We fix a bug that IS_IMMUTABLE and IS_APPEND files
could not be truncated during failed writes as we change the code. In fact the
test is not needed at all because both IS_IMMUTABLE and IS_APPEND is tested in
upper layers in do_sys_[f]truncate(), may_write(), etc.
Signed-off-by: NJan Kara <jack@suse.cz>

40680f2f

ext3: Add fixed tracepoints · 785c4bcc

由 Lukas Czerner 提交于 5月 23, 2011

This commit adds fixed tracepoints to the ext3 code. It is based on ext4
tracepoints, however due to the differences of both file systems, there
are some tracepoints missing (those for delaloc and for multi-block
allocator) and there are some ext3 specific as well (for reservation
windows).

Here is a list:

ext3_free_inode
ext3_request_inode
ext3_allocate_inode
ext3_evict_inode
ext3_drop_inode
ext3_mark_inode_dirty
ext3_write_begin
ext3_ordered_write_end
ext3_writeback_write_end
ext3_journalled_write_end
ext3_ordered_writepage
ext3_writeback_writepage
ext3_journalled_writepage
ext3_readpage
ext3_releasepage
ext3_invalidatepage
ext3_discard_blocks
ext3_request_blocks
ext3_allocate_blocks
ext3_free_blocks
ext3_sync_file_enter
ext3_sync_file_exit
ext3_sync_fs
ext3_rsv_window_add
ext3_discard_reservation
ext3_alloc_new_reservation
ext3_reserved
ext3_forget
ext3_read_block_bitmap
ext3_direct_IO_enter
ext3_direct_IO_exit
ext3_unlink_enter
ext3_unlink_exit
ext3_truncate_enter
ext3_truncate_exit
ext3_get_blocks_enter
ext3_get_blocks_exit
ext3_load_inode
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: NJan Kara <jack@suse.cz>

785c4bcc

27 5月, 2011 1 次提交

fs: pass exact type of data dirties to ->dirty_inode · aa385729

由 Christoph Hellwig 提交于 5月 27, 2011

Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.

This is just the prototype change with no user for it yet.  I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.

Also remove incorrect comments that ->dirty_inode can't block.  That
has been changed a long time ago, and many implementations rely on it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

aa385729

31 3月, 2011 1 次提交

Fix common misspellings · 25985edc

由 Lucas De Marchi 提交于 3月 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

24 3月, 2011 1 次提交

ext3: Fix writepage credits computation for ordered mode · 523334ba

由 Yongqiang Yang 提交于 3月 24, 2011

Original computation forgets to count writes of indirect block themselves
(it only counts with blocks necessary for their allocation) in ordered mode.
Acked-by: NAmir Goldstein <amir73il@users.sf.net>
Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>

523334ba

10 3月, 2011 1 次提交

block: remove per-queue plugging · 7eaceacc

由 Jens Axboe 提交于 3月 10, 2011

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

7eaceacc

11 1月, 2011 1 次提交

ext3: Add more journal error check · 156e7431

由 Namhyung Kim 提交于 11月 25, 2010

Check return value of ext3_journal_get_write_acccess() and
ext3_journal_dirty_metadata().
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>

156e7431

28 10月, 2010 2 次提交

ext3: Update kernel-doc comments · a4c18ad2

由 Namhyung Kim 提交于 10月 19, 2010

Update missing/broken argument descriptions and fix formatting.
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>

a4c18ad2

N
ext3: Remove misplaced BUFFER_TRACE() in ext3_truncate() · bfa01dfb
由 Namhyung Kim 提交于 10月 15, 2010
```
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>
```
bfa01dfb

26 10月, 2010 1 次提交

fs: kill block_prepare_write · ebdec241

由 Christoph Hellwig 提交于 10月 06, 2010

__block_write_begin and block_prepare_write are identical except for slightly
different calling conventions.  Convert all callers to the __block_write_begin
calling conventions and drop block_prepare_write.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ebdec241

10 8月, 2010 4 次提交

A
convert ext3 to ->evict_inode() · ac14a95b
由 Al Viro 提交于 6月 06, 2010
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
ac14a95b

remove inode_setattr · 1025774c

由 Christoph Hellwig 提交于 6月 04, 2010

Replace inode_setattr with opencoded variants of it in all callers.  This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

 spufs: explicitly checks for ATTR_SIZE earlier
 btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
 ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1025774c

introduce __block_write_begin · 6e1db88d

由 Christoph Hellwig 提交于 6月 04, 2010

Split up the block_write_begin implementation - __block_write_begin is a new
trivial wrapper for block_prepare_write that always takes an already
allocated page and can be either called from block_write_begin or filesystem
code that already has a page allocated.  Remove the handling of already
allocated pages from block_write_begin after switching all callers that
do it to __block_write_begin.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6e1db88d

sort out blockdev_direct_IO variants · eafdc7d1

由 Christoph Hellwig 提交于 6月 04, 2010

Move the call to vmtruncate to get rid of accessive blocks to the callers
in prepearation of the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
paramters is shorted than the name suffix.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

eafdc7d1

06 8月, 2010 1 次提交

ext3: Fix dirtying of journalled buffers in data=journal mode · 5f11e6a4

由 Jan Kara 提交于 8月 05, 2010

In data=journal mode, we still use block_write_begin() to prepare page for
writing. This function can occasionally mark buffer dirty which violates
journalling assumptions - when a buffer is part of a transaction, it should be
dirty and a buffer can be already part of a forget list of some transaction
when block_write_begin() gets called. This violation of journalling assumptions
then results in "JBD: Spotted dirty metadata buffer..." warnings.

In fact, temporary dirtying the buffer while the page is still locked does not
really cause problems to the journalling because we won't write the buffer
until the page gets unlocked. So we just have to make sure to clear dirty bits
before unlocking the page.
Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NJan Kara <jack@suse.cz>

5f11e6a4

21 7月, 2010 2 次提交

ext3: Avoid filesystem corruption after a crash under heavy delete load · f25f6242

由 Jan Kara 提交于 7月 12, 2010

It can happen that ext3_free_branches calls ext3_forget() for an indirect block
in an earlier transaction than a transaction in which we clear pointer to this
indirect block. Thus if we crash before a transaction clearing the block
pointer is committed, we will see indirect block pointing to already freed
blocks and complain during orphan list cleanup.

The fix is simple: Make sure ext3_forget() is called in the transaction
doing block pointer clearing.

This is a backport of an ext4 fix by Amir G. <amir73il@users.sourceforge.net>
Signed-off-by: NJan Kara <jack@suse.cz>

f25f6242

ext3: remove vestiges of nobh support · 4c4d3901

由 Christoph Hellwig 提交于 6月 07, 2010

The nobh option was only supported for writeback mode, but given that all
write paths (except mmapped writed) actually create buffer heads, it
effectively was a no-op already.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

4c4d3901

22 5月, 2010 1 次提交

quota: unify quota init condition in setattr · 12755627

由 Dmitry Monakhov 提交于 4月 08, 2010

Quota must being initialized if size or uid/git changes requested.
But initialization performed in two different places:
in case of i_size file system is responsible for dquot init
, but in case of uid/gid init will be called internally in
dquot_transfer().
This ambiguity makes code harder to understand.
Let's move this logic to one common helper function.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NJan Kara <jack@suse.cz>

12755627

30 3月, 2010 1 次提交

ext3: fix broken handling of EXT3_STATE_NEW · de329820

由 Linus Torvalds 提交于 3月 29, 2010

In commit 9df93939 ("ext3: Use bitops to read/modify
EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to
use bitops for its state handling.  However, unline the same ext4
change, it didn't actually change the name of the field when it changed
the semantics of it.

As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that
initialized the field to EXT3_STATE_NEW.  And that does not work
_at_all_ when we're now working with individually named bits rather than
values that get masked.  So the code tried to mark the state to be new,
but in actual fact set the field to EXT3_STATE_JDATA.  Which makes no
sense at all, and screws up all the code that checks whether the inode
was newly allocated.

In particular, it made the xattr code unhappy, and caused various random
behavior, like apparently

	https://bugzilla.redhat.com/show_bug.cgi?id=577911

So fix the initialization, and rename the field to match ext4 so that we
don't have this happen again.

Cc: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Daniel J Walsh <dwalsh@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

de329820

06 3月, 2010 1 次提交

pass writeback_control to ->write_inode · a9185b41

由 Christoph Hellwig 提交于 3月 05, 2010

This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a9185b41

05 3月, 2010 7 次提交

dquot: cleanup dquot initialize routine · 871a2931

由 Christoph Hellwig 提交于 3月 03, 2010

Get rid of the initialize dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_initialize helper to __dquot_initialize
and vfs_dq_init to dquot_initialize to have a consistent namespace.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

871a2931

dquot: move dquot initialization responsibility into the filesystem · 907f4554

由 Christoph Hellwig 提交于 3月 03, 2010

Currently various places in the VFS call vfs_dq_init directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the initialization.   For most metadata operations
this is a straight forward move into the methods, but for truncate and
open it's a bit more complicated.

For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those which is harmless.

For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.

Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in ->open.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

907f4554

dquot: cleanup dquot transfer routine · b43fa828

由 Christoph Hellwig 提交于 3月 03, 2010

Get rid of the transfer dquot operation - it is now always called from
the filesystem and if a filesystem really needs it's own (which none
currently does) it can just call into it's own routine directly.

Rename the now static low-level dquot_transfer helper to __dquot_transfer
and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
and make the new dquot_transfer return a normal negative errno value
which all callers expect.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

b43fa828

dquot: cleanup space allocation / freeing routines · 5dd4056d

由 Christoph Hellwig 提交于 3月 03, 2010

Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs their own (which none currently does)
it can just call into it's own routine directly.

Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much as possible
code into the common block for CONFIG_QUOTA vs not.  Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

5dd4056d

ext3: add writepage sanity checks · 49792c80

由 Dmitry Monakhov 提交于 3月 02, 2010

- There is theoretical possibility to perform writepage on
   RO superblock. Add explicit check for what case.
- Page must being locked before writepage.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NJan Kara <jack@suse.cz>

49792c80

ext3: Truncate allocated blocks if direct IO write fails to update i_size · 7eb4969e

由 Jan Kara 提交于 3月 01, 2010

We have to truncate blocks allocated to file during direct IO when we
fail to update i_size properly.
Signed-off-by: NJan Kara <jack@suse.cz>

7eb4969e

ext3: Use bitops to read/modify EXT3_I(inode)->i_state · 9df93939

由 Jan Kara 提交于 1月 06, 2010

At several places we modify EXT3_I(inode)->i_state without holding i_mutex
(ext3_release_file, ext3_bmap, ext3_journalled_writepage, ext3_do_update_inode,
...). These modifications are racy and we can lose updates to i_state. So
convert handling of i_state to use bitops which are atomic.
Signed-off-by: NJan Kara <jack@suse.cz>

9df93939

23 12月, 2009 1 次提交

ext3: quota macros cleanup [V2] · c459001f

由 Dmitry Monakhov 提交于 12月 09, 2009

Currently all quota block reservation macros contains hardcoded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NJan Kara <jack@suse.cz>

c459001f

10 12月, 2009 1 次提交

ext3: Fix data / filesystem corruption when write fails to copy data · 68eb3db0

由 Jan Kara 提交于 12月 01, 2009

When ext3_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks already
instantiated beyond i_size. Although these blocks were never inside i_size, we
have to truncate pagecache of these blocks so that corresponding buffers get
unmapped. Otherwise subsequent __block_prepare_write (called because we are
retrying the write) will find the buffers mapped, not call ->get_block, and
thus the page will be backed by already freed blocks leading to filesystem and
data corruption.
Reported-by: NJames Y Knight <foom@fuhm.net>
Signed-off-by: NJan Kara <jack@suse.cz>

68eb3db0

04 12月, 2009 1 次提交

tree-wide: fix typos "offest" -> "offset" · bf48aabb

由 Uwe Kleine-König 提交于 10月 28, 2009

This patch was generated by

	git grep -E -i -l 'offest' | xargs -r perl -p -i -e 's/offest/offset/'
Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

bf48aabb

11 11月, 2009 2 次提交

ext3: Wait for proper transaction commit on fsync · fe8bc91c

由 Jan Kara 提交于 10月 16, 2009

We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction
commit. What we do is that we track which transaction has last changed
the inode and which transaction last changed allocation and force it to
disk on fsync.
Signed-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

fe8bc91c

ext3: retry failed direct IO allocations · ea0174a7

由 Eric Sandeen 提交于 10月 12, 2009

On a 256M 4k block filesystem, doing this in a loop:

    dd if=/dev/zero of=test oflag=direct bs=1M count=64
    rm -f test

eventually leads to spurious ENOSPC:

    dd: writing `test': No space left on device

As with other block allocation callers, it looks like we need to
potentially retry the allocations on the initial ENOSPC.

A similar patch went into ext4 (commit
fbbf6945)
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NJan Kara <jack@suse.cz>

ea0174a7

16 9月, 2009 2 次提交

ext3: Add locking to ext3_do_update_inode · 4f003fd3

由 Chris Mason 提交于 9月 08, 2009

I've been struggling with this off and on while I've been testing the
data=guarded work.  The symptom is corrupted orphan lists and inodes
with the wrong i_size stored on disk.  I was convinced the
data=guarded code was just missing a call to ext3_mark_inode_dirty, but
tracing showed the i_disksize I was sending to ext3_mark_inode_dirty
wasn't actually making it to the drive.

ext3_mark_inode_dirty can be called without locks held (atime updates
and a few others), so the data=guarded code uses locks while updating
the in-memory inode, and then calls ext3_mark_inode_dirty
without any locks held.

But, ext3_mark_inode_dirty has no internal locking to make sure that
only one CPU is updating the buffer head at a time.  Generally this
works out ok because everyone that changes the inode then calls
ext3_mark_inode_dirty themselves.  Even though it races, eventually
someone updates the buffer heads and things move on.

But there is still a risk of the wrong values getting in, and the
data=guarded code seems to hit the race very often.

Since everyone that changes the inode also logs it, it should be
possible to fix this with some memory barriers.  I'll leave that as an
exercise to the reader and lock the buffer head instead.

It it probably a good idea to have a different patch series for lockless
bit flipping on the ext3 i_state field.  ext3_do_update_inode &= clears
EXT3_STATE_NEW without any locks held.
Signed-off-by: NChris Mason <chris.mason@oracle.com>
Signed-off-by: NJan Kara <jack@suse.cz>

4f003fd3

ext3: Fix possible deadlock between ext3_truncate() and ext3_get_blocks() · 00171d3c

由 Jan Kara 提交于 8月 11, 2009

During truncate we are sometimes forced to start a new transaction as the
amount of blocks to be journaled is both quite large and hard to predict. So
far we restarted a transaction while holding truncate_mutex and that violates
lock ordering because truncate_mutex ranks below transaction start (and it
can lead to a real deadlock with ext3_get_blocks() allocating new blocks
from ext3_writepage()).

Luckily, the problem is easy to fix: We just drop the truncate_mutex before
restarting the transaction and acquire it afterwards. We are safe to do this as
by the time ext3_truncate() is called, all the page cache for the truncated
part of the file is dropped and so writepage() cannot come and allocate new
blocks in the part of the file we are truncating. The rest of writers is
stopped by us holding i_mutex.
Signed-off-by: NJan Kara <jack@suse.cz>

00171d3c

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功