Commits · 9c28cbccec66a5ca292c6659bf5a0fe0c8459fa7 · OpenHarmony / kernel_linux

16 Sep, 2009 18 commits

jbd: Journal block numbers can ever be only 32-bit use unsigned int for them · 9c28cbcc

Authored by Jan Kara on Aug 03, 2009

It does not make sense to store journal block numbers as unsigned long,
since they can only be 32-bit (because of an on-disk format limitation). So
change in-memory structures and variables to use unsigned int instead.
Signed-off-by: Jan Kara <jack@suse.cz>
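
The width argument above can be sketched in user space. This is only an illustration, assuming the on-disk block number is a 32-bit big-endian field; `blocknr_from_disk()` is a hypothetical helper, not a jbd function.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode a 32-bit big-endian on-disk field into an
 * in-memory block number. Nothing wider than 32 bits can come out of it,
 * so unsigned int is enough to hold the result.
 */
unsigned int blocknr_from_disk(const unsigned char raw[4])
{
    uint32_t v = ((uint32_t)raw[0] << 24) |
                 ((uint32_t)raw[1] << 16) |
                 ((uint32_t)raw[2] << 8)  |
                 (uint32_t)raw[3];

    /* Assumption of this sketch: unsigned int is at least 32 bits wide. */
    assert(UINT_MAX >= UINT32_MAX);
    return (unsigned int)v;
}
```

Even the largest value the field can encode fits, which is why a wider in-memory type buys nothing here.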

JBD: round commit timer up to avoid uncommitted transaction · b449fc6f

Authored by Andreas Dilger on Jul 30, 2009

Fix jiffies rounding in the jbd commit timer setup code.  Rounding down could
cause the timer to fire before the corresponding transaction has expired.  That
transaction can then stay uncommitted forever unless a new transaction is
created or an explicit sync/umount happens.
Signed-off-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Jan Kara <jack@suse.cz>
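
The rounding problem can be illustrated with a simplified user-space model. This is a sketch under stated assumptions: `HZ_MODEL` and all function names below are hypothetical, and the kernel's real jiffies-rounding helpers also apply a per-CPU skew that is ignored here.

```c
#include <assert.h>

/* Hypothetical tick rate for this model: 250 ticks per second. */
#define HZ_MODEL 250UL

/* Round a tick count down to a whole-second boundary. */
unsigned long round_down_to_second(unsigned long ticks)
{
    return ticks - (ticks % HZ_MODEL);
}

/* Round a tick count up to a whole-second boundary. */
unsigned long round_up_to_second(unsigned long ticks)
{
    unsigned long rem = ticks % HZ_MODEL;

    return rem ? ticks + (HZ_MODEL - rem) : ticks;
}

/*
 * Pick the timer expiry for a transaction that expires at t_expires.
 * Rounding up guarantees the timer never fires before the expiry.
 */
unsigned long commit_timer_expiry_model(unsigned long t_expires)
{
    unsigned long timer = round_up_to_second(t_expires);

    assert(timer >= t_expires);
    return timer;
}
```

With an expiry of 1010 ticks, rounding down gives 1000, so the timer would fire while the transaction is still unexpired; rounding up gives 1250, which is late but never early.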

writeback: fix possible bdi writeback refcounting problem · 1ef7d9aa

Authored by Nick Piggin on Sep 15, 2009

wb_clear_pending AFAIKS should not be called after the item has been
put on the list, except by the worker threads. It could lead to the
situation where the refcount is decremented below 0 and cause lots of
problems.

Presumably the !wb_has_dirty_io case is not a common one, so it can
be discovered when the thread wakes up to check?

Also add a comment in bdi_work_clear.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: Fix bdi use after free in wb_work_complete() · 77b9d059

Authored by Nick Piggin on Sep 15, 2009

By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it
can already have been deallocated and used for something else in the
!on stack case, giving a false positive in this test and causing
corruption.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: improve scalability of bdi writeback work queues · 77fad5e6

Authored by Nick Piggin on Sep 15, 2009

If you're going to do an atomic RMW on each list entry, there's not much
point in all the RCU complexities of the list walking. This is only going
to help the multi-thread case I guess, but it doesn't hurt to do now.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: remove smp_mb(), it's not needed with list_add_tail_rcu() · deed62ed

Authored by Nick Piggin on Sep 15, 2009

list_add_tail_rcu() contains the required barriers.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: use schedule_timeout_interruptible() · 49db0414

Authored by Jens Axboe on Sep 15, 2009

Gets rid of a manual set_current_state().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: add comments to bdi_work structure · 8010c3b6

Authored by Jens Axboe on Sep 15, 2009

And document its retriever, get_next_work_item().
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: separate starting of sync vs opportunistic writeback · b6e51316

Authored by Jens Axboe on Sep 16, 2009

bdi_start_writeback() is currently split into two paths, one for
WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback()
for WB_SYNC_ALL writeback and let bdi_start_writeback() handle
only WB_SYNC_NONE.

Push down the writeback_control allocation and only accept the
parameters that make sense for each function. This cleans up
the API considerably.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: inline allocation failure handling in bdi_alloc_queue_work() · bcddc3f0

Authored by Jens Axboe on Sep 13, 2009

This gets rid of work == NULL in bdi_queue_work() and puts the
OOM handling where it belongs.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: use RCU to protect bdi_list · cfc4ba53

Authored by Jens Axboe on Sep 14, 2009

Now that bdi_writeback_all() no longer handles integrity writeback,
it doesn't have to block anymore. This means that we can switch
bdi_list reader side protection to RCU.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: only use bdi_writeback_all() for WB_SYNC_NONE writeout · f11fcae8

Authored by Jens Axboe on Sep 15, 2009

Data integrity writeback must use bdi_start_writeback() and ensure
that wbc->sb and wbc->bdi are set.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

fs: Assign bdi in super_block · 32a88aa1

Authored by Jens Axboe on Sep 16, 2009

We do this automatically in get_sb_bdev() from the set_bdev_super()
callback. Filesystems that have their own private backing_dev_info
must assign that in ->fill_super().

Note that ->s_bdi assignment is required for proper writeback!
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: make wb_writeback() take an argument structure · c4a77a6c

Authored by Jens Axboe on Sep 16, 2009

We need to be able to pass in range_cyclic as well, so instead
of growing yet another argument, split the arguments into a
struct wb_writeback_args structure that we can use internally.
Also makes it easier to just copy all members to an on-stack
struct, since we can't access work after clearing the pending
bit.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
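
The copy-before-release pattern described above can be sketched in user space. Everything here is a hypothetical model: `struct wb_args_model`, `struct work_model` and the functions below are illustrative names and do not reproduce the kernel's actual structures or fields.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the argument structure; fields are illustrative. */
struct wb_args_model {
    long nr_pages;
    int range_cyclic;
};

/* Hypothetical work item that carries the arguments and a pending flag. */
struct work_model {
    struct wb_args_model args;
    int pending;
};

/* Allocate a pending work item; returns NULL on allocation failure. */
struct work_model *alloc_work_model(long nr_pages, int range_cyclic)
{
    struct work_model *work = malloc(sizeof(*work));

    if (!work)
        return NULL;
    work->args.nr_pages = nr_pages;
    work->args.range_cyclic = range_cyclic;
    work->pending = 1;
    return work;
}

/*
 * Models clearing the pending bit. In this sketch the item is freed right
 * away to make the hazard explicit: 'work' is invalid after this call.
 */
void clear_pending_model(struct work_model *work)
{
    work->pending = 0;
    free(work);
}

/* Copy all arguments to the stack, release the item, then use the copy. */
long process_work_model(struct work_model *work)
{
    struct wb_args_model args;

    assert(work->pending);
    args = work->args;          /* one struct assignment copies every member */
    clear_pending_model(work);  /* 'work' must not be dereferenced below */
    return args.nr_pages;
}
```

Grouping the parameters in one structure means a single assignment snapshots them all, so adding another member later does not require touching the copy site.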

writeback: merely wakeup flusher thread if work allocation fails for WB_SYNC_NONE · f0fad8a5

Authored by Christoph Hellwig on Sep 11, 2009

Since it's an opportunistic writeback and not a data integrity action,
don't punt to blocking writeback. Just wake up the thread and it will
flush old data.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: get rid of wbc->for_writepages · 1fe06ad8

Authored by Jens Axboe on Sep 15, 2009

It's only set, it's never checked. Kill it.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

fs: remove bdev->bd_inode_backing_dev_info · 2c96ce9f

Authored by Jens Axboe on Sep 15, 2009

It has been unused since it was introduced in:

commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac
Author: Andrew Morton <akpm@osdl.org>
Date:   Fri May 21 00:46:17 2004 -0700

    [PATCH] block device layer: separate backing_dev_info infrastructure

So let's just kill it.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

driver model: constify attribute groups · a4dbd674

Authored by David Brownell on Jun 24, 2009

Let attribute group vectors be declared "const".  We'd
like to let most attribute metadata live in read-only
sections... this is a start.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

15 Sep, 2009 4 commits

udf: Fix possible corruption when close races with write · cbc8cc33

Authored by Jan Kara on Aug 07, 2009

When we close a file, we remove preallocated blocks from it. But this
truncation was not protected by i_mutex and thus it could have raced with a
write through a different fd and caused crashes or even filesystem corruption.
Signed-off-by: Jan Kara <jack@suse.cz>

udf: Perform preallocation only for regular files · 81056dd0

Authored by Jan Kara on Jul 16, 2009

So far we have also preallocated blocks for directories, but that raises the
problem of when to get rid of preallocated blocks we don't need. So far
we removed them in udf_clear_inode(), which has the disadvantages that
1) blocks are unavailable long after writing to a directory finished
   and thus one can get out of space unnecessarily early
2) releasing blocks from udf_clear_inode is problematic because VFS
   does not expect us to redirty inode there and it also slows down
   memory reclaim.

So preallocate blocks only for regular files where we can drop preallocation
in udf_release_file.
Signed-off-by: Jan Kara <jack@suse.cz>

udf: Remove wrong assignment in udf_symlink · 7c6e3d1a

Authored by Jan Kara on Jul 16, 2009

Recomputation of the pointer was wrong (it should have been just an increment).
Luckily, we never use the computed value. Remove it.
Signed-off-by: Jan Kara <jack@suse.cz>

udf: Remove dead code · 5891d9dd

Authored by Jan Kara on Jul 16, 2009

Remove code that is never used.
Signed-off-by: Jan Kara <jack@suse.cz>

14 Sep, 2009 18 commits

fsync: wait for data writeout completion before calling ->fsync · 2daea67e

Authored by Christoph Hellwig on Sep 03, 2009

Currently vfs_fsync(_range) first calls filemap_fdatawrite to write out
the data, then calls into ->fsync to write out the metadata and then finally
calls filemap_fdatawait to wait for the data I/O to complete.  What sounds
like a clever micro-optimization actually is a nasty trap for many filesystems.

For many modern filesystems i_size or other inode information is only
updated on I/O completion and we need to wait for I/O to finish before
we can write out the metadata.  For old-fashioned filesystems that
instantiate blocks during the actual write and also update the metadata
at that point it opens up a large window where we could expose uninitialized
blocks after a crash.  While a few filesystems that need it already wait
for the I/O to finish inside their ->fsync methods it is rather suboptimal
as it is done under the i_mutex and also always for the whole file instead
of just a part as we could do for O_SYNC handling.

Here is a small audit of all fsync instances in the tree:

 - spufs_mfc_fsync:
 - ps3flash_fsync:
 - vol_cdev_fsync:
 - printer_fsync:
 - fb_deferred_io_fsync:
 - bad_file_fsync:
 - simple_sync_file:

	don't care - filesystems/drivers don't use the page cache or are
	purely in-memory.

 - simple_fsync:
 - file_fsync:
 - affs_file_fsync:
 - fat_file_fsync:
 - jfs_fsync:
 - ubifs_fsync:
 - reiserfs_dir_fsync:
 - reiserfs_sync_file:

	never touch pagecache themselves.  We need to wait before if we do
	not want to expose stale data after an allocation.

 - afs_fsync:
 - fuse_fsync_common:

	do the waiting writeback itself in awkward ways, would benefit from
	proper semantics

 - block_fsync:

	Does a filemap_write_and_wait on the block device inode.  Because we
	now have f_mapping that is the same inode we call it on in vfs_fsync.
	So just removing it and letting the VFS do the work in one go would
	be an improvement.

 - btrfs_sync_file:
 - cifs_fsync:
 - xfs_file_fsync:

	need the wait first and currently do it themselves. would benefit from
	doing it outside i_mutex.

 - coda_fsync:
 - ecryptfs_fsync:
 - exofs_file_fsync:
 - shm_fsync:

	only passes the fsync through to the lower layer

 - ext3_sync_file:

	doesn't seem to care, comments are confusing.

 - ext4_sync_file:

	would need the wait to work correctly for delalloc mode with late
	i_size updates.  Otherwise the ext3 comment applies.

	currently implements its own writeback and wait in an odd way,
	could benefit from doing it properly.

 - gfs2_fsync:

	not needed for journaled data mode, but probably harmless there.
	Currently writes back data asynchronously itself.  Needs some
	major audit.

 - hostfs_fsync:

	just calls fsync/datasync on the host FD.  Without the wait before
	data might not even be inflight yet if we're unlucky.

 - hpfs_file_fsync:
 - ncp_fsync:

	no-ops.  Dangerous before and after.

 - jffs2_fsync:

	just calls jffs2_flush_wbuf_gc, not sure how this relates to data.

 - nfs_fsync_dir:

	just increments stats, claims all directory operations are synchronous

 - nfs_file_fsync:

	only writes out data???  Looks very odd.

 - nilfs_sync_file:

	looks like it expects all data done, but not sure from the code

 - ntfs_dir_fsync:
 - ntfs_file_fsync:

	appear to do their own data writeback.  Very convoluted code.

 - ocfs2_sync_file:

	does its own data writeback, but no wait.  probably needs the wait.

 - smb_fsync:

	according to a comment expects all pages written already, probably needs
	the wait before.

This patch only changes vfs_fsync_range, removal of the wait in the methods
that have it is left to the filesystem maintainers.  Note that most
filesystems really do need an audit for their fsync methods given the
gems found in this very brief audit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
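
Why the ordering matters for filesystems that update i_size only on I/O completion can be shown with a toy model. This is a sketch only: `struct inode_model` and every function below are hypothetical and merely mimic the sequence of steps described in the commit message, not the VFS code.

```c
#include <assert.h>

/* Hypothetical inode model for a filesystem with late i_size updates. */
struct inode_model {
    long pending_size;  /* size that the in-flight data I/O will establish */
    long i_size;        /* in-memory size, updated only on I/O completion */
    long on_disk_size;  /* size recorded by the last metadata write */
};

/* Step 1: start data writeout (models filemap_fdatawrite). */
void submit_data_model(struct inode_model *ino, long new_size)
{
    ino->pending_size = new_size;
}

/* Step 2: wait for data I/O; completion is what updates i_size. */
void wait_for_data_model(struct inode_model *ino)
{
    assert(ino->pending_size >= ino->i_size);
    ino->i_size = ino->pending_size;
}

/* Step 3: write metadata (models ->fsync); it records the current i_size. */
void write_metadata_model(struct inode_model *ino)
{
    ino->on_disk_size = ino->i_size;
}

/* Old order: write data, write metadata, then wait for the data. */
long fsync_old_order_model(struct inode_model *ino, long new_size)
{
    submit_data_model(ino, new_size);
    write_metadata_model(ino);   /* sees the stale i_size */
    wait_for_data_model(ino);
    return ino->on_disk_size;
}

/* New order: write data, wait for it, then write metadata. */
long fsync_new_order_model(struct inode_model *ino, long new_size)
{
    submit_data_model(ino, new_size);
    wait_for_data_model(ino);
    write_metadata_model(ino);   /* sees the updated i_size */
    return ino->on_disk_size;
}
```

In this model the old order leaves the stale size on disk after a 4096-byte write, while the new order records 4096.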

vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() · 18f2ee70

Authored by Jan Kara on Aug 18, 2009

Remove these three functions since nobody uses them anymore.
Signed-off-by: Jan Kara <jack@suse.cz>

fat: Opencode sync_page_range_nolock() · 2f3d675b

Authored by Jan Kara on Aug 17, 2009

fat_cont_expand() is the only user of sync_page_range_nolock(). It's also the
only user of generic_osync_inode() which does not have a file open.  So
opencode needed actions for FAT so that we can convert generic_osync_inode() to
a standard syncing path.

Update a comment about generic_osync_inode().

CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>

xfs: Convert sync_page_range() to simple filemap_write_and_wait_range() · af0f4414

Authored by Jan Kara on Aug 18, 2009

Christoph Hellwig says that it is enough for XFS to call
filemap_write_and_wait_range() instead of sync_page_range() because we do
all the metadata syncing when forcing the log.

CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

ocfs2: Update syncing after splicing to match generic version · d23c937b

Authored by Jan Kara on Aug 18, 2009

Update ocfs2 specific splicing code to use generic syncing helper. The sync now
does not happen under rw_lock because generic_write_sync() acquires i_mutex
which ranks above rw_lock. That should not matter because standard fsync path
does not hold it either.
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
CC: ocfs2-devel@oss.oracle.com
Signed-off-by: Jan Kara <jack@suse.cz>

ntfs: Use new syncing helpers and update comments · ebbbf757

Authored by Jan Kara on Aug 18, 2009

Use new syncing helpers in .write and .aio_write functions. Also
remove superfluous syncing in ntfs_file_buffered_write() and update
comments about generic_osync_inode().

CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
Signed-off-by: Jan Kara <jack@suse.cz>

ext4: Remove syncing logic from ext4_file_write · 0d34ec62

Authored by Jan Kara on Aug 18, 2009

The syncing is now properly handled by generic_file_aio_write() so
no special ext4 code is needed.

CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Signed-off-by: Jan Kara <jack@suse.cz>

ext3: Remove syncing logic from ext3_file_write · e367626b

Authored by Jan Kara on Aug 18, 2009

Syncing is now properly done by generic_file_aio_write() so no special logic is
needed in ext3.

CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>

ext2: Update comment about generic_osync_inode · a2a735ad

Authored by Jan Kara on Aug 18, 2009

We rely on generic_write_sync() now.

CC: linux-ext4@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>

vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode · 148f948b

Authored by Jan Kara on Aug 17, 2009

Introduce new function for generic inode syncing (vfs_fsync_range) and use
it from fsync() path. Introduce also new helper for syncing after a sync
write (generic_write_sync) using the generic function.

Use these new helpers for syncing from generic VFS functions. This makes
O_SYNC writes to block devices acquire i_mutex for syncing. If we really
care about this, we can make block_fsync() drop the i_mutex and reacquire
it before it returns.

CC: Evgeniy Polyakov <zbr@ioremap.net>
CC: ocfs2-devel@oss.oracle.com
CC: Joel Becker <joel.becker@oracle.com>
CC: Felix Blyakher <felixb@sgi.com>
CC: xfs@oss.sgi.com
CC: Anton Altaparmakov <aia21@cantab.net>
CC: linux-ntfs-dev@lists.sourceforge.net
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

vfs: Rename generic_file_aio_write_nolock · eef99380

Authored by Christoph Hellwig on Aug 20, 2009

generic_file_aio_write_nolock() is now used only by block devices and the raw
character device. Filesystems should use __generic_file_aio_write() in case
generic_file_aio_write() doesn't suit them. So rename the function to
blkdev_aio_write() and move it to fs/block_dev.c.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock · 918941a3

Authored by Jan Kara on Aug 17, 2009

Use the new helper. We have to submit data pages ourselves in case of O_SYNC
write because __generic_file_aio_write does not do it for us. OCFS2 developers
might think about moving the sync out of i_mutex which seems to be easily
possible but that's out of scope of this patch.

CC: ocfs2-devel@oss.oracle.com
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>

fs/Kconfig: move nilfs2 outside misc filesystems · 41f4db0f

Authored by Ryusuke Konishi on Aug 08, 2009

Some people asked me questions like the following:

On Wed, 15 Jul 2009 13:11:21 +0200, Leon Woestenberg wrote:
> just wondering, any reasons why NILFS2 is one of the miscellaneous
> filesystems and, for example, btrfs, is not in Kconfig?

Actually, nilfs is NOT a filesystem that came from another operating system,
but a filesystem created purely for Linux.  Nor is it a flash
filesystem; it is meant for generic block devices.

So, this moves nilfs outside the misc category as I responded in LKML
"Re: Why does NILFS2 hide under Miscellaneous filesystems?"
(Message-Id: <20090716.002526.93465395.ryusuke@osrg.net>).
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: convert nilfs_bmap_lookup to an inline function · 0f3fe33b

Authored by Ryusuke Konishi on Aug 15, 2009

nilfs_bmap_lookup() is now a wrapper around
nilfs_bmap_lookup_at_level().

This moves nilfs_bmap_lookup() to a header file, converting it to
an inline function, which gives the compiler an opportunity for optimization.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: allow btree code to directly call dat operations · 2e0c2c73

Authored by Ryusuke Konishi on Aug 15, 2009

The current btree code is written so that btree functions call dat
operations via wrapper functions in bmap.c when they allocate, free,
or modify virtual block addresses.

This abstraction requires additional function calls and causes
frequent calls to nilfs_bmap_get_dat(), since it is used in
every wrapper function.

This removes the wrapper functions and makes the dat operations directly
callable from btree.c and direct.c, which gives the compiler more
opportunity for optimization.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: add update functions of virtual block address to dat · bd8169ef

Authored by Ryusuke Konishi on Aug 15, 2009

This is preparation for the subsequent cleanup ("nilfs2: allow btree
to directly call dat operations").

This adds functions that bundle a few operations for changing a
virtual block address entry in the dat file.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: remove individual gfp constants for each metadata file · 7a102b09

Authored by Ryusuke Konishi on Aug 15, 2009

This gets rid of NILFS_CPFILE_GFP, NILFS_SUFILE_GFP, NILFS_DAT_GFP,
and NILFS_IFILE_GFP.  All of these constants refer to NILFS_MDT_GFP,
and can be removed.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: stop zero-fill of btree path just before free it · 3218929d

Authored by Ryusuke Konishi on Aug 15, 2009

The btree path object is cleared just before it is freed.

This removes the code that performs the unnecessary clear operation.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

OpenHarmony / kernel_linux · last synced about 4 years ago
