Commits · 2142914e3eb1168978e842f65cfd182be7582861 · openeuler / Kernel

08 May, 2013 · 1 commit

aio: don't include aio.h in sched.h · a27bb332

Committed by Kent Overstreet on May 07, 2013

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

03 May, 2013 · 1 commit

ext4: fix fio regression · e30b5dca

Committed by Yan, Zheng on May 03, 2013

We (Linux Kernel Performance project) found a regression introduced
by commit:

  f7fec032 ext4: track all extent status in extent status tree

The commit causes a performance decrease of about 20% in the fio random
write test. The profiler shows that rb_next() uses a lot of CPU time. The
call stack is:

  rb_next
  ext4_es_find_delayed_extent
  ext4_map_blocks
  _ext4_get_block
  ext4_get_block_write
  __blockdev_direct_IO
  ext4_direct_IO
  generic_file_direct_write
  __generic_file_aio_write
  ext4_file_write
  aio_rw_vect_retry
  aio_run_iocb
  do_io_submit
  sys_io_submit
  system_call_fastpath
  io_submit
  td_io_getevents
  io_u_queued_complete
  thread_main
  main
  __libc_start_main

The cause is that ext4_es_find_delayed_extent() has no upper bound: it
keeps searching until a delayed extent is found. When there are a lot of
non-delayed entries in the extent status tree,
ext4_es_find_delayed_extent() may use a lot of CPU time.

Reported-by: LKP project <lkp@linux.intel.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>

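The missing upper bound can be illustrated with a small sketch. It is hedged and simplified: a sorted array and the hypothetical `struct es_entry` stand in for the kernel's red-black extent status tree, and `find_delayed_bounded()` is not a kernel function. The caller passes the end of the range it cares about, and the scan gives up as soon as it passes that end instead of running on until any delayed extent turns up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified extent record (not the kernel's struct extent_status). */
struct es_entry {
    unsigned int lblk;   /* first logical block */
    unsigned int len;    /* number of blocks */
    int delayed;         /* nonzero for a delayed (not yet allocated) extent */
};

/*
 * Return the index of the first delayed extent overlapping [start, end],
 * or -1 if there is none. 'es' is sorted by lblk with no overlaps.
 * The loop stops as soon as an entry begins after 'end', instead of
 * walking the rest of the collection looking for any delayed extent.
 */
static int find_delayed_bounded(const struct es_entry *es, size_t n,
                                unsigned int start, unsigned int end)
{
    assert(start <= end);
    for (size_t i = 0; i < n; i++) {
        if (es[i].lblk > end)
            break;                          /* past the upper bound: stop */
        if (es[i].lblk + es[i].len <= start)
            continue;                       /* entirely before the range */
        if (es[i].delayed)
            return (int)i;
    }
    return -1;
}
```

In a real tree the starting point would come from a lookup rather than a scan from index 0; the sketch only shows where the early exit belongs.
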
23 Feb, 2013 · 1 commit

new helper: file_inode(file) · 496ad9aa

Committed by Al Viro on Jan 23, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
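
The commit message carries no description beyond its title. The helper's apparent purpose is to give callers one accessor for the inode behind a struct file instead of open-coding the pointer chain. A minimal sketch with mock structures (hypothetical and userspace only; the real structures have many more fields), assuming that, as in kernels of that era, the inode is reached through f_path.dentry->d_inode:

```c
#include <assert.h>

/* Minimal mock structures (hypothetical, for illustration only). */
struct inode  { unsigned long i_ino; };
struct dentry { struct inode *d_inode; };
struct path   { struct dentry *dentry; };
struct file   { struct path f_path; };

/* One accessor hides how the inode is reached from a struct file, so the
 * chain below can later be changed in one place. */
static inline struct inode *file_inode(struct file *f)
{
    return f->f_path.dentry->d_inode;
}
```
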

18 Feb, 2013 · 2 commits

ext4: rename and improbe ext4_es_find_extent() · be401363

Committed by Zheng Liu on Feb 18, 2013

This commit renames ext4_es_find_extent() to ext4_es_find_delayed_extent()
and improves the function. First, the input and output parameters are
split. Second, the function no longer returns the first block of the
next delayed extent after 'es'.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>

ext4: refine extent status tree · 06b0c886

Committed by Zheng Liu on Feb 18, 2013

This commit refines the extent status tree code.

1) A prefix 'es_' is added to the extent status tree structure
members.

2) Refactored es_remove_extent() so that __es_remove_extent() can be
used by es_insert_extent() to remove the old extent entry(-ies) before
inserting a new one.

3) Renamed extent_status_end() to ext4_es_end().

4) ext4_es_can_be_merged() is defined to check whether two extents can
be merged or not.

5) Updated and clarified comments.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

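Item 4) can be made concrete with a hedged sketch. The status values, `struct es_sketch`, and the exact rules below are assumptions chosen for illustration (same status, logically adjacent, combined length still fits, physically contiguous when the extent maps disk blocks); they are not copied from ext4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status values and entry layout, for illustration only. */
enum es_status { ES_WRITTEN, ES_UNWRITTEN, ES_DELAYED, ES_HOLE };

struct es_sketch {
    uint32_t lblk;          /* first logical block */
    uint32_t len;           /* length in blocks */
    uint64_t pblk;          /* first physical block (unused for delayed/hole) */
    enum es_status status;
};

/* a and b can be merged into one entry if they have the same status, b starts
 * right where a ends logically, the combined length still fits in 'len', and,
 * for extents backed by disk blocks, b is also physically contiguous with a. */
static bool es_can_be_merged(const struct es_sketch *a, const struct es_sketch *b)
{
    assert(a && b);
    if (a->status != b->status)
        return false;
    if ((uint64_t)a->lblk + a->len != b->lblk)
        return false;
    if ((uint64_t)a->len + b->len > UINT32_MAX)
        return false;
    if (a->status == ES_WRITTEN || a->status == ES_UNWRITTEN)
        return a->pblk + a->len == b->pblk;
    return true;            /* delayed extents and holes have no physical blocks */
}
```
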
09 Feb, 2013 · 1 commit

ext4: pass context information to jbd2__journal_start() · 9924a92a

Committed by Theodore Ts'o on Feb 08, 2013

So we can better understand what bits of ext4 are responsible for
long-running jbd2 handles, use jbd2__journal_start() so we can pass
context information for logging purposes.

The recommended way for finding the longer-running handles is:

   T=/sys/kernel/debug/tracing
   EVENT=$T/events/jbd2/jbd2_handle_stats
   echo "interval > 5" > $EVENT/filter
   echo 1 > $EVENT/enable

   ./run-my-fs-benchmark

   cat $T/trace > /tmp/problem-handles

This will list handles that were active for longer than 5 jiffies
(20ms with HZ=250).  Having longer-running handles is bad, because a
commit started at the wrong time could stall for those 20+ milliseconds,
which could delay an fsync() or an O_SYNC operation.  Here is an example
line from the trace file describing a handle which lived on for 311
jiffies, or over 1.2 seconds:

postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
   tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
   dirtied_blocks 0

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

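The numbers in the filter and in the example trace can be cross-checked with a little arithmetic. The sketch assumes HZ=250, the tick rate implied by "311 jiffies, or over 1.2 seconds"; `sketch_jiffies_to_ms()` is a hypothetical helper, not the kernel's own conversion routine.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: HZ = 250, i.e. one jiffy is 4 ms. */
#define SKETCH_HZ 250u

/* Convert a jiffies count to milliseconds at the assumed tick rate. */
static unsigned int sketch_jiffies_to_ms(unsigned int j)
{
    assert(j <= UINT_MAX / 1000u);      /* keep j * 1000 from overflowing */
    return j * 1000u / SKETCH_HZ;
}
```

With these assumptions the filter threshold of 5 jiffies is 20 ms, and the 311-jiffy handle in the trace lived for 1244 ms.
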
26 Dec, 2012 · 1 commit

ext4: remove unaligned AIO warning printk · a28a9178

Committed by Eric Sandeen on Dec 25, 2012

Although I put this in, I now think it was a bad decision.  For most
users, there is very little to be done in this case.  They get the
message, once per day, with no real context or proposed action.  TBH,
it generates support calls when it probably does not need to; the
message sounds more dire than the situation really is.

Just nuke it.  Normal investigation via blktrace or whatnot can
reveal poor IO patterns if bad performance is encountered.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

18 Dec, 2012 · 1 commit

lseek: the "whence" argument is called "whence" · 965c8e59

Committed by Andrew Morton on Dec 17, 2012

But the kernel decided to call it "origin" instead.  Fix most of the
sites.

Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 Dec, 2012 · 1 commit

ext4: Remove CONFIG_EXT4_FS_XATTR · 939da108

Committed by Tao Ma on Dec 10, 2012

Ted has sent out an RFC about removing this feature. Eric and Jan
confirmed that both Red Hat and SUSE enable this feature in all their
products.  David also said that "As far as I know, it's enabled in all
Android kernels that use ext4."  So it seems OK for us.

What's more, inline data depends on xattr for its implementation, and,
to be frank, I haven't run any tests with inline data enabled while
xattr is disabled.  So I think we should add inline data and remove this
config option in the same release.

[ The savings if you disable CONFIG_EXT4_FS_XATTR are only 27k, which
  isn't much in the grand scheme of things.  Since no one seems to be
  testing this configuration except for some automated compile farms, on
  balance we are better off removing this config option, so that it is
  effectively always enabled. -- tytso ]

Cc: David Brown <davidb@codeaurora.org>
Cc: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

09 Nov, 2012 · 1 commit

ext4: introduce lseek SEEK_DATA/SEEK_HOLE support · c8c0df24

Committed by Zheng Liu on Nov 08, 2012

This patch makes ext4 really support the SEEK_DATA/SEEK_HOLE flags.
Block-mapped and extent-mapped files are fully implemented together,
because ext4_map_blocks hides these differences.

After applying this patch, xfstest #285 fails when the file is
block-mapped, because block-mapped files don't support fallocate(2).

I had tried to use ext4_ext_walk_space() to retrieve the offset for an
extent-mapped file. But in the end I decided to keep using
ext4_map_blocks() to support SEEK_DATA/SEEK_HOLE, because
ext4_map_blocks() can hide the difference between block-mapped and
extent-mapped files. Moreover, as a next step, the extent status tree
will track all extent status, and we can get all mappings from this
tree. So I think that using ext4_map_blocks() is a better choice.

CC: Hugh Dickins <hughd@google.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

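From userspace, the new support is exercised through lseek(2). A minimal sketch, assuming Linux with glibc (where `_GNU_SOURCE` exposes SEEK_DATA and SEEK_HOLE); `print_data_segments()` is a hypothetical helper that walks a file's data ranges by alternating the two whence values.

```c
#define _GNU_SOURCE         /* SEEK_DATA and SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Print each data range of an open file as [start, end).
 * Returns 0 when the end of the data is reached, -1 on other errors. */
static int print_data_segments(int fd)
{
    off_t pos = 0;
    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? 0 : -1;   /* ENXIO: no more data */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
        pos = hole;
    }
}
```

On a filesystem that only has the generic fallback, the loop reports the whole file as one data range ending at EOF; a hole-aware filesystem reports the actual allocated ranges.
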
09 Oct, 2012 · 1 commit

mm: kill vma flag VM_CAN_NONLINEAR · 0b173bc4

Committed by Konstantin Khlebnikov on Oct 08, 2012

Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().

Filesystems must implement this method to get non-linear mapping
support; a filesystem that uses filemap_fault() can use
generic_file_remap_pages().

Device drivers can now implement this method and obtain non-linear vma
support.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

05 Oct, 2012 · 1 commit

ext4: fix ext4_flush_completed_IO wait semantics · c278531d

Committed by Dmitry Monakhov on Oct 05, 2012

BUG #1) All places where we call ext4_flush_completed_IO are broken
    because buffered IO and DIO/AIO go through three stages:
    1) submitted IO,
    2) completed IO (in i_completed_io_list), conversion pending,
    3) finished IO (conversion done).
    By calling ext4_flush_completed_IO we flush only the requests
    which are in stage (2), which is wrong because:
     1) punch_hole and truncate _must_ wait for all outstanding unwritten IO,
        regardless of its state.
     2) fsync and nolock_dio_read should also wait, because there is
        a time window between end_page_writeback() and ext4_add_complete_io().
        As a result, integrity fsync is broken in the case of a buffered
        write to a fallocated region:
        fsync                                      blkdev_completion
         ->filemap_write_and_wait_range
                                                   ->ext4_end_bio
                                                     ->end_page_writeback
          <-- filemap_write_and_wait_range returns
         ->ext4_flush_completed_IO
           sees an empty i_completed_io_list, but the pending
           conversion still exists
                                                     ->ext4_add_complete_io

BUG #2) The race window becomes wider due to the 'ext4: completed_io
locking cleanup V4' patch series.

This patch makes the following changes:
1) ext4_flush_completed_io() now first tries to flush completed IO and then
   waits for any outstanding unwritten IO via ext4_unwritten_wait().
2) Renames the function to a more appropriate name.
3) Asserts that all callers of ext4_flush_unwritten_io hold i_mutex, to
   prevent an endless wait.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

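Change 1) replaces "flush only what is in stage (2)" with "wait until nothing is outstanding". As a hedged userspace analogue (pthreads instead of the kernel's counter and wait queue; every name here is hypothetical), the sketch counts each conversion from submission until it is finished, so a waiter also sees IO sitting in the window shown in the diagram.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical per-inode tracker for unwritten-extent conversions. */
struct unwritten_tracker {
    pthread_mutex_t lock;
    pthread_cond_t  done;
    unsigned int    pending;    /* conversions submitted but not finished */
};

/* Called at submission time (stage 1). */
static void unwritten_start(struct unwritten_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    t->pending++;
    pthread_mutex_unlock(&t->lock);
}

/* Called only after the conversion is done (stage 3). */
static void unwritten_end(struct unwritten_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    assert(t->pending > 0);
    if (--t->pending == 0)
        pthread_cond_broadcast(&t->done);
    pthread_mutex_unlock(&t->lock);
}

/* Block until no conversion is outstanding, whatever stage it is in. */
static void unwritten_wait(struct unwritten_tracker *t)
{
    pthread_mutex_lock(&t->lock);
    while (t->pending > 0)
        pthread_cond_wait(&t->done, &t->lock);
    pthread_mutex_unlock(&t->lock);
}
```

Because the count is raised before the IO can complete and dropped only after conversion, there is no moment where an outstanding conversion is invisible to the waiter.
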
29 Sep, 2012 · 1 commit

ext4: give i_aiodio_unwritten a more appropriate name · e27f41e1

Committed by Dmitry Monakhov on Sep 28, 2012

The AIO/DIO prefix is wrong, because the counter accounts for unwritten
extents, which may also be scheduled from buffered write endio.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

23 Jul, 2012 · 4 commits

ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super() · 044ce47f

Committed by Jan Kara on Jul 22, 2012

The last user of ext4_mark_super_dirty() in ext4_file_open() is so
rare that it can just as well modify the superblock properly, by
journalling the change.  Change it and get rid of ext4_mark_super_dirty(),
as it's not needed anymore.

Artem: small amendments.
Artem: tested using xfstests for both journalled and non-journalled ext4.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

ext4: don't take the i_mutex lock when doing DIO overwrites · 4bd809db

Committed by Zheng Liu on Jul 22, 2012

Direct I/O that is aligned and overwrites blocks that are already
allocated can be parallelized.  In ext4_file_dio_write(), we first check
whether these conditions are satisfied.  If so, we take i_data_sem and
release the i_mutex lock directly.  Meanwhile, iocb->private is set to
indicate that this is a DIO overwrite, and it will be handled in
ext4_ext_direct_IO.

[ Added fix from Dan Carpenter to fix locking bug on the error path. ]

CC: Tao Ma <tm@tao.ma>
CC: Eric Sandeen <sandeen@redhat.com>
CC: Robin Dong <hao.bigrat@gmail.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

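The two conditions can be written as a predicate. This is a hedged sketch only: the function name, its parameters, and the single contiguous "mapped" byte range are hypothetical simplifications; a real implementation has to look the range up in the file's block map rather than receive it from the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if a direct write of 'count' bytes at 'pos' is both
 * block-aligned and a pure overwrite of already-allocated blocks. */
static bool dio_can_skip_i_mutex(uint64_t pos, uint64_t count,
                                 uint32_t blocksize, uint64_t i_size,
                                 uint64_t mapped_start, uint64_t mapped_end)
{
    uint64_t mask = (uint64_t)blocksize - 1;   /* blocksize is a power of two */

    assert(blocksize && (blocksize & mask) == 0);
    if ((pos & mask) || (count & mask))
        return false;                  /* unaligned: may need partial-block zeroing */
    if (pos + count > i_size)
        return false;                  /* extending write: changes i_size */
    /* overwrite: every byte in [pos, pos + count) is already mapped */
    return pos >= mapped_start && pos + count <= mapped_end;
}
```
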
ext4: use core vfs llseek code for dir seeks · ec7268ce

Committed by Eric Sandeen on Apr 30, 2012

Use the new functionality in generic_file_llseek_size() to
accept a custom EOF position, and un-cut-and-paste all the
vfs llseek code from ext4.

Also fix up comments on ext4_llseek() to reflect reality.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: allow custom EOF in generic_file_llseek code · e8b96eb5

Committed by Eric Sandeen on Apr 30, 2012

For ext3/4 htree directories, using the vfs llseek function with
SEEK_END goes to i_size like for any other file, but in reality
we want the maximum possible hash value.  Recent changes
in ext4 have cut & pasted generic_file_llseek() back into fs/ext4/dir.c,
but replicating this core code seems like a bad idea, especially
since the copy has already diverged from the vfs.

This patch updates generic_file_llseek_size to accept
both a custom maximum offset, and a custom EOF position.  With this
in place, ext4_dir_llseek can pass in the appropriate maximum hash
position for both maxsize and eof, and get what it wants.

As far as I know, this does not fix any bugs - nfs in the kernel
doesn't use SEEK_END, and I don't know of any user who does.  But
some ext4 folks seem keen on doing the right thing here, and I can't
really argue.

(Patch also fixes up some comments slightly)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

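A hedged userspace model of the extended seek logic. `llseek_size_sketch()` and `loff_sketch_t` are hypothetical; they mirror only the idea described above, that the caller supplies both the largest valid offset and the position SEEK_END is measured from, which lets a directory pass its maximum hash value for both.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

typedef long long loff_sketch_t;

/* Return the new position, or -EINVAL if it falls outside [0, maxsize].
 * SEEK_END is relative to 'eof' rather than to i_size. */
static loff_sketch_t llseek_size_sketch(loff_sketch_t *f_pos, loff_sketch_t offset,
                                        int whence, loff_sketch_t maxsize,
                                        loff_sketch_t eof)
{
    switch (whence) {
    case SEEK_SET: break;
    case SEEK_CUR: offset += *f_pos; break;
    case SEEK_END: offset += eof; break;
    default: return -EINVAL;
    }
    if (offset < 0 || offset > maxsize)
        return -EINVAL;
    *f_pos = offset;
    return offset;
}
```

For a directory, passing the maximum hash position as both `maxsize` and `eof` makes SEEK_END land at the end of the hash space instead of at i_size.
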
10 Jul, 2012 · 1 commit

ext4: split ext4_file_write into buffered IO and direct IO · fbe10494

Committed by Zheng Liu on Jul 09, 2012

ext4_file_dio_write() is defined in order to separate buffered IO from
direct IO in ext4.  This patch only refactors the write path.

CC: Tao Ma <tm@tao.ma>
CC: Eric Sandeen <sandeen@redhat.com>
CC: Robin Dong <hao.bigrat@gmail.com>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

29 May, 2012 · 1 commit

ext4: use consistent ssize_t type in ext4_file_write() · 8563000d

Committed by Zheng Liu on May 28, 2012

The generic_file_aio_write() function returns ssize_t, and
ext4_file_write() returns ssize_t, so use an ssize_t to collect the
return value from generic_file_aio_write().  It shouldn't matter, since
the VFS read/write paths shouldn't allow a read greater than INT_MAX,
but there was previously a bug in the AIO code paths, and it's best if
we use a consistent type so that the return value from
generic_file_aio_write() can't get truncated.

Reported-by: Jouni Siren <jouni.siren@iki.fi>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

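The truncation the message guards against is easy to show. A hedged sketch that assumes a 64-bit build (enforced by the static assertion); `sketch_big_write()` is a hypothetical stand-in for a write path that reports more than INT_MAX bytes, as the earlier AIO bug could.

```c
#include <assert.h>
#include <limits.h>
#include <sys/types.h>   /* ssize_t */

/* Assumption: a 64-bit build, where ssize_t is wider than int. */
_Static_assert(sizeof(ssize_t) > sizeof(int), "sketch assumes 64-bit ssize_t");

/* Hypothetical stand-in for a write path that reports more than INT_MAX bytes. */
static ssize_t sketch_big_write(void)
{
    return (ssize_t)INT_MAX + 4096;
}
```

Storing that result in an int cannot preserve it (the converted value is implementation-defined and typically negative), so the caller would see a wrong byte count; keeping ssize_t end to end avoids the problem.
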
28 Oct, 2011 · 1 commit

ext4: replace cut'n'pasted llseek code with generic_file_llseek_size · 4cce0e28

Committed by Andi Kleen on Sep 15, 2011

This gives ext4 the benefits of unlocked llseek.

Cc: tytso@mit.edu
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

25 Oct, 2011 · 1 commit

ext4: prevent stack overrun in ext4_file_open · cf803903

Committed by Darrick J. Wong on Oct 25, 2011

In ext4_file_open, the filesystem records the mountpoint of the first
file that is opened after mounting the filesystem.  It does this by
allocating a 64-byte stack buffer, calling d_path() to grab the mount
point through which this file was accessed, and then memcpy()ing 64
bytes into the superblock's s_last_mounted field, starting from the
return value of d_path(), which is stored as "cp".  However, if cp >
buf (which it frequently is since path components are prepended
starting at the end of buf) then we can end up copying stack data into
the superblock.

Writing stack variables into the superblock doesn't sound like a great
idea, so use strlcpy instead.  Andi Kleen suggested using strlcpy
instead of strncpy.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

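The bug pattern and the fix can be reproduced in userspace. `fake_d_path()` is a hypothetical stand-in that, like d_path(), writes the path at the end of the buffer and returns a pointer into it; `copy_bounded()` is a hand-written strlcpy-style copy (strlcpy itself is not available in every C library). Copying a fixed 64 bytes from the returned pointer, as the old code did, would read past the end of `buf` whenever the path is shorter than the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the path right-aligned at the end of buf and return a pointer to
 * its first character, so the result usually starts after buf[0]. */
static char *fake_d_path(const char *path, char *buf, size_t buflen)
{
    size_t len = strlen(path) + 1;          /* include the NUL */
    if (len > buflen)
        return NULL;
    char *cp = buf + buflen - len;
    memcpy(cp, path, len);
    return cp;
}

/* Bounded copy in the spirit of strlcpy(): reads no further than the NUL
 * of src, always NUL-terminates dst (when dstlen > 0), returns strlen(src). */
static size_t copy_bounded(char *dst, const char *src, size_t dstlen)
{
    size_t srclen = strlen(src);
    if (dstlen) {
        size_t n = srclen < dstlen - 1 ? srclen : dstlen - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```
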
26 Jul, 2011 · 1 commit

fs: take the ACL checks to common code · 4e34e719

Committed by Christoph Hellwig on Jul 23, 2011

Replace the ->check_acl method with a ->get_acl method that simply reads an
ACL from disk after having a cache miss. This means we can replace the ACL
checking boilerplate code with a single implementation in namei.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

21 Jul, 2011 · 1 commit

Ext4: handle SEEK_HOLE/SEEK_DATA generically · c334b113

Committed by Josef Bacik on Jul 18, 2011

Since Ext4 has its own lseek, we need to make sure it handles
SEEK_HOLE/SEEK_DATA. For now, just do the same thing that is done in the
generic case; somebody else can come along and make it do fancy things
later. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

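The generic behaviour referred to here treats the whole file as data, with one virtual hole at EOF. A hedged model of that fallback, with hypothetical helper names and offsets as plain `long long`; negative offsets and the final range check against the maximum file size are left out.

```c
#include <assert.h>
#include <errno.h>

/* SEEK_DATA: any offset inside the file is already data. */
static long long generic_seek_data(long long offset, long long i_size)
{
    return offset >= i_size ? -ENXIO : offset;
}

/* SEEK_HOLE: the only hole is the virtual one at EOF. */
static long long generic_seek_hole(long long offset, long long i_size)
{
    return offset >= i_size ? -ENXIO : i_size;
}
```
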
26 May, 2011 · 1 commit

ext4: Convert ext4 to new truncate calling convention · ae24f28d

Committed by Jan Kara on May 25, 2011

Trivial conversion. Fix up one error-handling case calling vmtruncate()
and remove the ->truncate callback. We also fix a bug where IS_IMMUTABLE
and IS_APPEND files could not be truncated during failed writes. In fact,
the test can be completely removed, as upper layers do the necessary
permission checks for truncate in do_sys_[f]truncate() and may_open()
anyway.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

12 Feb, 2011 · 1 commit

ext4: serialize unaligned asynchronous DIO · e9e3bcec

Committed by Eric Sandeen on Feb 12, 2011

ext4 has a data corruption case when doing non-block-aligned
asynchronous direct IO into a sparse file, as demonstrated
by xfstest 240.

The root cause is that while ext4 preallocates space in the
hole, mappings of that space still look "new" and 
dio_zero_block() will zero out the unwritten portions.  When
more than one AIO thread is going, they both find this "new"
block and race to zero out their portion; this is uncoordinated
and causes data corruption.

Dave Chinner fixed this for xfs by simply serializing all
unaligned asynchronous direct IO.  I've done the same here.
The difference is that we only wait on conversions, not all IO.
This is a very big hammer, and I'm not very pleased with
stuffing this into ext4_file_write().  But since ext4 is
DIO_LOCKING, we need to serialize it at this high level.

I tried to move this into ext4_ext_direct_IO, but by then
we have the i_mutex already, and we will wait on the
work queue to do conversions - which must also take the
i_mutex.  So that won't work.

This was originally exposed by qemu-kvm installing to
a raw disk image with a normal sector-63 alignment.  I've
tested a backport of this patch with qemu, and it does
avoid the corruption.  It is also quite a lot slower
(14 min for package installs, vs. 8 min for well-aligned)
but I'll take slow correctness over fast corruption any day.

Mingming suggested that we can track outstanding
conversions, and wait on those so that non-sparse
files won't be affected, and I've implemented that here;
unaligned AIO to nonsparse files won't take a perf hit.

[tytso@mit.edu: Keep the mutex as a hashed array instead
 of bloating the ext4 inode]

[tytso@mit.edu: Fix up namespace issues so that global
 variables are protected with an "ext4_" prefix.]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

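The hashed-array idea from the tytso note above can be sketched in userspace. The names, the array size of 37, and the use of pthread mutexes are assumptions for illustration; the point is that a lock is chosen by hashing the inode's address, so no per-inode field is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Small global lock table instead of a mutex embedded in every inode.
 * Unrelated inodes may share a slot, which only costs some false contention. */
#define SKETCH_AIO_LOCKS 37    /* a prime keeps the modulo spread reasonable */

static pthread_mutex_t sketch_aio_locks[SKETCH_AIO_LOCKS];

static void sketch_aio_locks_init(void)
{
    for (int i = 0; i < SKETCH_AIO_LOCKS; i++)
        pthread_mutex_init(&sketch_aio_locks[i], NULL);
}

/* The same inode address always maps to the same lock. */
static pthread_mutex_t *sketch_aio_lock_for(const void *inode)
{
    return &sketch_aio_locks[(uintptr_t)inode % SKETCH_AIO_LOCKS];
}
```
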
17 Jan, 2011 · 1 commit

fallocate should be a file operation · 2fe17c10

Committed by Christoph Hellwig on Jan 14, 2011

Currently all filesystems except XFS implement fallocate asynchronously,
while XFS forced a commit. Both of these are suboptimal: in the case of
O_SYNC I/O we really want our allocation on disk, especially for the
!KEEP_SIZE case where we actually grow the file with user-visible zeroes.
On the other hand, always committing the transaction is a bad idea for
fast-path uses of fallocate, for example in recent Samba versions. Given
that block allocation is a data plane operation anyway, change it from
an inode operation to a file operation so that we have the file structure
available, which lets us check for O_SYNC.

This also includes moving the code around for a few of the filesystems,
and removing the already unneeded S_ISDIR checks, given that we only wire
up fallocate for regular files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

11 Jan, 2011 · 1 commit

ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary · 8aefcd55

Committed by Theodore Ts'o on Jan 10, 2011

Replace the jbd2_inode structure (which is 48 bytes) with a pointer
and only allocate the jbd2_inode when it is needed --- that is, when
the file system has a journal present and the inode has been opened
for writing.  This allows us to further slim down the ext4_inode_info
structure.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

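A hedged sketch of the allocate-on-demand pattern: the per-inode structure keeps only a pointer, and the record is allocated the first time a writer opens the file on a journalled filesystem. Both struct types and `sketch_attach_jinode()` are hypothetical; the placeholder array only mirrors the 48-byte figure in the message on a 64-bit build.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the tracking record and the slimmed inode info. */
struct jinode_sketch { void *placeholder[6]; };   /* 48 bytes on 64-bit */
struct inode_info_sketch { struct jinode_sketch *jinode; };

/* Allocate the record only when it is needed: a journal is present and the
 * file is opened for writing. Single-threaded sketch; real code must also
 * cope with two openers racing to allocate it. */
static int sketch_attach_jinode(struct inode_info_sketch *ei,
                                bool has_journal, bool open_for_write)
{
    if (!has_journal || !open_for_write || ei->jinode)
        return 0;
    ei->jinode = calloc(1, sizeof(*ei->jinode));
    return ei->jinode ? 0 : -ENOMEM;
}
```
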
28 Oct, 2010 · 1 commit

ext4: improve llseek error handling for overly large seek offsets · e0d10bfa

Committed by Toshiyuki Okajima on Oct 27, 2010

The llseek system call should return EINVAL if passed a seek offset
which would result in a write error.  What this maximum offset should be
depends on whether the huge_file file system feature is set, and on
whether the file is extent-based.

If the file does not have the "EXT4_EXTENTS_FL" flag, the maximum size
which can be written (write system call) is different from the maximum
offset which can be sought to (lseek system call).

For example, the following 2 cases demonstrate the differences
between the maximum size which can be written, versus the seek offset
allowed by the llseek system call:

#1: mkfs.ext3 <dev>; mount -t ext4 <dev>
#2: mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev>

Table: the maximum file size that can be written or sought to, for each
       filesystem feature setting and file flag setting
+============+===============================+===============================+
| \ File flag|                               |                               |
|      \     |     !EXT4_EXTENTS_FL          |        EXT4_EXTENTS_FL        |
|case       \|                               |                               |
+------------+-------------------------------+-------------------------------+
| #1         |   write:      2194719883264   | write:       --------------   |
|            |   seek:       2199023251456   | seek:        --------------   |
+------------+-------------------------------+-------------------------------+
| #2         |   write:      4402345721856   | write:       17592186044415   |
|            |   seek:      17592186044415   | seek:        17592186044415   |
+------------+-------------------------------+-------------------------------+

The differences exist because ext4 has 2 maxbytes values:
sb->s_maxbytes (= extent-mapped maxbytes) and
EXT4_SB(sb)->s_bitmap_maxbytes (= block-mapped maxbytes).  However,
generic_file_llseek uses only the extent-mapped maxbytes.
(The llseek of ext4_file_operations is generic_file_llseek, which uses
sb->s_maxbytes.)

Therefore we create an ext4 llseek function which uses both maxbytes
values.

The new function is derived from generic_file_llseek().
If the "EXT4_EXTENTS_FL" file flag is not set, the function uses
EXT4_SB(inode->i_sb)->s_bitmap_maxbytes instead of
inode->i_sb->s_maxbytes.

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>

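The per-file choice of limit can be sketched as follows. Assumptions: the two constants are taken from the write limits for case #2 in the table above, and `seek_set_checked()` is a hypothetical helper that handles only SEEK_SET; it shows the flag test, not the full ext4 llseek.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Case #2 write limits from the table: block-mapped vs extent-mapped. */
#define SKETCH_BITMAP_MAXBYTES 4402345721856LL
#define SKETCH_EXTENT_MAXBYTES 17592186044415LL

/* Pick the limit from the EXT4_EXTENTS_FL state and reject SEEK_SET
 * targets beyond it, instead of always using the extent-mapped limit. */
static long long seek_set_checked(long long offset, bool extents_fl)
{
    long long maxbytes = extents_fl ? SKETCH_EXTENT_MAXBYTES
                                    : SKETCH_BITMAP_MAXBYTES;
    if (offset < 0 || offset > maxbytes)
        return -EINVAL;
    return offset;
}
```
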
27 Jul, 2010 · 1 commit

ext4: fix EFBIG edge case when writing to large non-extent file · d889dc83

Committed by Toshiyuki Okajima on Jul 27, 2010

Running the following reproducer confirms that the write system call
returns 0 when it should fail with EFBIG.

#!/bin/sh

/bin/dd if=/dev/zero of=./img bs=1k count=1 seek=1024k > /dev/null 2>&1
/sbin/mkfs.ext3 -Fq ./img
/bin/mount -o loop -t ext4 ./img /mnt
/bin/touch /mnt/file
strace /bin/dd if=/dev/zero of=/mnt/file conv=notrunc bs=1k count=1 seek=$((2194719883264/1024)) 2>&1 | /bin/egrep "write.* 1024\) = "
/bin/umount /mnt
exit

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>

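The reproducer seeks to byte 2194719883264 (2143281136 blocks of 1 KiB), which is exactly the block-mapped write limit shown for case #1 in the table of the e0d10bfa entry above. A hedged sketch of the kind of check that makes such a write fail with EFBIG instead of returning 0; `write_limit_check()` is hypothetical and much simpler than the kernel's write checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Return how many bytes may be written at 'pos' under 'maxbytes',
 * or -EFBIG if the write starts at or beyond the limit. */
static long long write_limit_check(long long pos, size_t count, long long maxbytes)
{
    if (count == 0)
        return 0;
    if (pos >= maxbytes)
        return -EFBIG;                      /* nothing fits: fail, don't return 0 */
    if ((long long)count > maxbytes - pos)
        count = (size_t)(maxbytes - pos);   /* short write up to the limit */
    return (long long)count;
}
```
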
12 Jun, 2010 · 1 commit

ext4: Clean up s_dirt handling · a0375156

Committed by Theodore Ts'o on Jun 11, 2010

We don't need to set s_dirt in most of the ext4 code when journaling
is enabled.  In ext3/4 some of the summary statistics for # of free
inodes, blocks, and directories are calculated from the per-block
group statistics when the file system is mounted or unmounted.  As a
result the superblock doesn't have to be updated, either via the
journal or by setting s_dirt.  There are a few exceptions, most
notably when resizing the file system, where the superblock needs to
be modified --- and in that case it should be done as a journalled
operation if possible, and s_dirt set only in no-journal mode.

This patch will optimize out some unneeded disk writes when using ext4
with a journal.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

17 May, 2010 · 1 commit

ext4: Use bitops to read/modify i_flags in struct ext4_inode_info · 12e9b892

Committed by Dmitry Monakhov on May 16, 2010

At several places we modify EXT4_I(inode)->i_flags without holding
i_mutex (ext4_do_update_inode, ...). These modifications are racy and
we can lose updates to i_flags. So convert handling of i_flags to use
bitops which are atomic.


https://bugzilla.kernel.org/show_bug.cgi?id=15792

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

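A hedged userspace analogue of the bitops conversion, using C11 atomics in place of the kernel's set_bit()/clear_bit()/test_bit(); the flag numbers and helper names are hypothetical. Each update is a single atomic read-modify-write, so concurrent updates to different bits cannot overwrite each other the way a plain load, modify, and store can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit numbers, for illustration only. */
enum { SKETCH_FL_APPEND = 0, SKETCH_FL_IMMUTABLE = 1 };

static void flag_set(atomic_ulong *flags, int bit)
{
    atomic_fetch_or(flags, 1UL << bit);      /* atomic OR: other bits untouched */
}

static void flag_clear(atomic_ulong *flags, int bit)
{
    atomic_fetch_and(flags, ~(1UL << bit));  /* atomic AND with the inverted mask */
}

static bool flag_test(atomic_ulong *flags, int bit)
{
    return (atomic_load(flags) >> bit) & 1UL;
}
```
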
05 Mar, 2010 · 2 commits

dquot: cleanup dquot initialize routine · 871a2931

Committed by Christoph Hellwig on Mar 03, 2010

Get rid of the initialize dquot operation - it is now always called from
the filesystem, and if a filesystem really needs its own (which none
currently does) it can just call into its own routine directly.

Rename the now static low-level dquot_initialize helper to
__dquot_initialize, and vfs_dq_init to dquot_initialize, to have a
consistent namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
dquot: move dquot initialization responsibility into the filesystem · 907f4554

Committed by Christoph Hellwig on Mar 03, 2010

Currently various places in the VFS call vfs_dq_init directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the initialization.  For most metadata operations
this is a straightforward move into the methods, but for truncate and
open it's a bit more complicated.

For truncate we currently only call vfs_dq_init for the sys_truncate case,
because open already takes care of it for ftruncate and open(O_TRUNC); the
new code causes an additional, harmless vfs_dq_init for those.

For open, the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.

Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in ->open.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

04 Mar, 2010 · 1 commit

Get rid of mnt_mountpoint abuses in ext4 · 3899167d

Committed by Al Viro on Jan 23, 2010

The path to mnt/mnt->mnt_root is no worse than that to
mnt->mnt_parent/mnt->mnt_mountpoint, *and* it doesn't need the sucker
pinned down (mnt is not going away and mnt->mnt_root won't change).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

25 Jan, 2010 · 1 commit

ext4: Use bitops to read/modify EXT4_I(inode)->i_state · 19f5fb7a

Committed by Theodore Ts'o on Jan 24, 2010

At several places we modify EXT4_I(inode)->i_state without holding
i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
ext4_do_update_inode, ...). These modifications are racy and we can
lose updates to i_state. So convert handling of i_state to use bitops
which are atomic.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

28 Sep, 2009 · 1 commit

const: mark struct vm_struct_operations · f0f37e2f

Committed by Alexey Dobriyan on Sep 27, 2009

* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 Sep, 2009 · 1 commit

ext4: Remove syncing logic from ext4_file_write · 0d34ec62

Committed by Jan Kara on Aug 18, 2009

The syncing is now properly handled by generic_file_aio_write() so
no special ext4 code is needed.

CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Signed-off-by: Jan Kara <jack@suse.cz>

09 Sep, 2009 · 1 commit

ext[234]: move over to 'check_acl' permission model · 1d5ccd1c

Committed by Linus Torvalds on Aug 28, 2009

Don't implement per-filesystem 'extX_permission()' functions that have
to be called for every path component operation, and instead just expose
the actual ACL checking so that the VFS layer can now do it for us.

Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

13 Jun, 2009 · 1 commit

ext4: update the s_last_mounted field in the superblock · bc0b0d6d

Committed by Theodore Ts'o on Jun 13, 2009

This field can be very helpful when a system administrator is trying
to sort through large numbers of block devices or filesystem images.
What is stored in this field can be ambiguous if multiple filesystem
namespaces are in play; what we store in practice is the mountpoint as
seen from the namespace of the process which first opens a file in the
filesystem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

28 Mar, 2009 · 1 commit

ext4: Fix discard of inode prealloc space with delayed allocation. · d6014301

Committed by Aneesh Kumar K.V on Mar 27, 2009

With delayed allocation we should not (and cannot) discard the inode
prealloc space during file close, because we may still have dirty pages
for which we haven't allocated blocks yet. With this fix, after each
get_blocks request we check whether there are zero reserved blocks; if
so, and there are no writers on the file, we discard the inode prealloc
space.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

openeuler / Kernel: synced successfully about 1 year ago
