提交 · 93917411be8db5d21abf15c781dfa43c5cc6edf2 · openeuler / Kernel

21 5月, 2011 2 次提交

ext4: Remove unnecessary wait_event ext4_run_lazyinit_thread() · e1290b3e

由 Lukas Czerner 提交于 5月 20, 2011

For some reason we have been waiting for lazyinit thread to start in the
ext4_run_lazyinit_thread() but it is not needed since it was jus
unnecessary complexity, so get rid of it. We can also remove li_task and
li_wait_task since it is not used anymore.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>

e1290b3e

ext4: Use schedule_timeout_interruptible() for waiting in lazyinit thread · 4ed5c033

由 Lukas Czerner 提交于 5月 20, 2011

In order to make lazyinit eat approx. 10% of io bandwidth at max, we
are sleeping between zeroing each single inode table. For that purpose
we are using timer which wakes up thread when it expires. It is set
via add_timer() and this may cause troubles in the case that thread
has been woken up earlier and in next iteration we call add_timer() on
still running timer hence hitting BUG_ON in add_timer(). We could fix
that by using mod_timer() instead however we can use
schedule_timeout_interruptible() for waiting and hence simplifying
things a lot.

This commit exchange the old "waiting mechanism" with simple
schedule_timeout_interruptible(), setting the time to sleep. Hence we
do not longer need li_wait_daemon waiting queue and others, so get rid
of it.

Addresses-Red-Hat-Bugzilla: #699708
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>

4ed5c033

09 5月, 2011 1 次提交

ext4: move ext4_add_groupblocks() to mballoc.c · 2846e820

由 Amir Goldstein 提交于 5月 09, 2011

In preparation for the next patch, the function ext4_add_groupblocks()
is moved to mballoc.c, where it could use some static functions.
Signed-off-by: NAmir Goldstein <amir73il@users.sf.net>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

2846e820

19 4月, 2011 1 次提交

ext4: check for ext[23] file system features when mounting as ext[23] · 2035e776

由 Theodore Ts'o 提交于 4月 18, 2011

Provide better emulation for ext[23] mode by enforcing that the file
system does not have any unsupported file system features as defined
by ext[23] when emulating the ext[23] file system driver when
CONFIG_EXT4_USE_FOR_EXT23 is defined.

This causes the file system type information in /proc/mounts to be
correct for the automatically mounted root file system.  This also
means that "mount -t ext2 /dev/sda /mnt" will fail if /dev/sda
contains an ext3 or ext4 file system, just as one would expect if the
original ext2 file system driver were in use.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

2035e776

24 3月, 2011 1 次提交

ext4: use little-endian bitops · 50e0168c

由 Akinobu Mita 提交于 3月 23, 2011

As a preparation for removing ext2 non-atomic bit operations from
asm/bitops.h.  This converts ext2 non-atomic bit operations to
little-endian bit operations.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Acked-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

50e0168c

12 2月, 2011 1 次提交

ext4: serialize unaligned asynchronous DIO · e9e3bcec

由 Eric Sandeen 提交于 2月 12, 2011

ext4 has a data corruption case when doing non-block-aligned
asynchronous direct IO into a sparse file, as demonstrated
by xfstest 240.

The root cause is that while ext4 preallocates space in the
hole, mappings of that space still look "new" and 
dio_zero_block() will zero out the unwritten portions.  When
more than one AIO thread is going, they both find this "new"
block and race to zero out their portion; this is uncoordinated
and causes data corruption.

Dave Chinner fixed this for xfs by simply serializing all
unaligned asynchronous direct IO.  I've done the same here.
The difference is that we only wait on conversions, not all IO.
This is a very big hammer, and I'm not very pleased with
stuffing this into ext4_file_write().  But since ext4 is
DIO_LOCKING, we need to serialize it at this high level.

I tried to move this into ext4_ext_direct_IO, but by then
we have the i_mutex already, and we will wait on the
work queue to do conversions - which must also take the
i_mutex.  So that won't work.

This was originally exposed by qemu-kvm installing to
a raw disk image with a normal sector-63 alignment.  I've
tested a backport of this patch with qemu, and it does
avoid the corruption.  It is also quite a lot slower
(14 min for package installs, vs. 8 min for well-aligned)
but I'll take slow correctness over fast corruption any day.

Mingming suggested that we can track outstanding
conversions, and wait on those so that non-sparse
files won't be affected, and I've implemented that here;
unaligned AIO to nonsparse files won't take a perf hit.

[tytso@mit.edu: Keep the mutex as a hashed array instead
 of bloating the ext4 inode]

[tytso@mit.edu: Fix up namespace issues so that global
 variables are protected with an "ext4_" prefix.]
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

e9e3bcec

17 1月, 2011 1 次提交

fallocate should be a file operation · 2fe17c10

由 Christoph Hellwig 提交于 1月 14, 2011

Currently all filesystems except XFS implement fallocate asynchronously,
while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC
I/O we really want our allocation on disk, especially for the !KEEP_SIZE
case where we actually grow the file with user-visible zeroes. On the
other hand always commiting the transaction is a bad idea for fast-path
uses of fallocate like for example in recent Samba versions. Given
that block allocation is a data plane operation anyway change it from
an inode operation to a file operation so that we have the file structure
available that lets us check for O_SYNC.

This also includes moving the code around for a few of the filesystems,
and remove the already unnedded S_ISDIR checks given that we only wire
up fallocate for regular files.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2fe17c10

11 1月, 2011 8 次提交

ext4: flush the i_completed_io_list during ext4_truncate · 3889fd57

由 Jiaying Zhang 提交于 1月 10, 2011

Ted first found the bug when running 2.6.36 kernel with dioread_nolock
mount option that xfstests #13 complained about wrong file size during fsck.
However, the bug exists in the older kernels as well although it is
somehow harder to trigger.

The problem is that ext4_end_io_work() can happen after we have truncated an
inode to a smaller size. Then when ext4_end_io_work() calls
ext4_convert_unwritten_extents(), we may reallocate some blocks that have
been truncated, so the inode size becomes inconsistent with the allocated
blocks.

The following patch flushes the i_completed_io_list during truncate to reduce
the risk that some pending end_io requests are executed later and convert
already truncated blocks to initialized.

Note that although the fix helps reduce the problem a lot there may still
be a race window between vmtruncate() and ext4_end_io_work(). The fundamental
problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem
held, it can race with an ongoing write request so that the io_end request is
processed later when the corresponding blocks have been truncated.

Ted and I have discussed the problem offline and we saw a few ways to fix
the race completely:

a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold
whenever vmtruncate() is called. The i_mutex lock prevents any new write
requests from entering writeback and the i_alloc_sem prevents the race
from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate()
is called from do_truncate(), which is probably the most common case.
However, there are places where we may call vmtruncate() without holding
either i_mutex or i_alloc_sem. I would like to ask for other people's
opinions on what locks are expected to be held before calling vmtruncate().
There seems a disagreement among the callers of that function.

b) We change the ext4 write path so that we change the extent tree to contain
the newly allocated blocks and update i_size both at the same time --- when
the write of the data blocks is completed.

c) We add some additional locking to synchronize vmtruncate() and
ext4_end_io_work(). This approach may have performance implications so we
need to be careful.

All of the above proposals may require more substantial changes, so
we may consider to take the following patch as a bandaid.
Signed-off-by: NJiaying Zhang <jiayingz@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

3889fd57

ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary · 8aefcd55

由 Theodore Ts'o 提交于 1月 10, 2011

Replace the jbd2_inode structure (which is 48 bytes) with a pointer
and only allocate the jbd2_inode when it is needed --- that is, when
the file system has a journal present and the inode has been opened
for writing.  This allows us to further slim down the ext4_inode_info
structure.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

8aefcd55

ext4: drop i_state_flags on architectures with 64-bit longs · 353eb83c

由 Theodore Ts'o 提交于 1月 10, 2011

We can store the dynamic inode state flags in the high bits of
EXT4_I(inode)->i_flags, and eliminate i_state_flags.  This saves 8
bytes from the size of ext4_inode_info structure, which when
multiplied by the number of the number of in the inode cache, can save
a lot of memory.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

353eb83c

ext4: reorder ext4_inode_info structure elements to remove unneeded padding · 8a2005d3

由 Theodore Ts'o 提交于 1月 10, 2011

By reordering the elements in the ext4_inode_info structure, we can
reduce the padding needed on an x86_64 system by 16 bytes.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

8a2005d3

ext4: drop ec_type from the ext4_ext_cache structure · b05e6ae5

由 Theodore Ts'o 提交于 1月 10, 2011

We can encode the ec_type information by using ee_len == 0 to denote
EXT4_EXT_CACHE_NO, ee_start == 0 to denote EXT4_EXT_CACHE_GAP, and if
neither is true, then the cache type must be EXT4_EXT_CACHE_EXTENT.
This allows us to reduce the size of ext4_ext_inode by another 8
bytes. (ec_type is 4 bytes, plus another 4 bytes of padding)
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

b05e6ae5

ext4: use ext4_lblk_t instead of sector_t for logical blocks · 01f49d0b

由 Theodore Ts'o 提交于 1月 10, 2011

This fixes a number of places where we used sector_t instead of
ext4_lblk_t for logical blocks, which for ext4 are still 32-bit data
types.  No point wasting space in the ext4_inode_info structure, and
requiring 64-bit arithmetic on 32-bit systems, when it isn't
necessary.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

01f49d0b

ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED · f2321097

由 Theodore Ts'o 提交于 1月 10, 2011

Remove the short element i_delalloc_reserved_flag from the
ext4_inode_info structure and replace it a new bit in i_state_flags.
Since we have an ext4_inode_info for every ext4 inode cached in the
inode cache, any savings we can produce here is a very good thing from
a memory utilization perspective.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f2321097

ext4: Use ext4_error_file() to print the pathname to the corrupted inode · f7c21177

由 Theodore Ts'o 提交于 1月 10, 2011

Where the file pointer is available, use ext4_error_file() instead of
ext4_error_inode().
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f7c21177

20 12月, 2010 2 次提交

ext4: zero out nanosecond timestamps for small inodes · af0b44a1

由 Eric Sandeen 提交于 12月 19, 2010

When nanosecond timestamp resolution isn't supported on an ext4
partition (inode size = 128), stat() appears to be returning
uninitialized garbage in the nanosecond component of timestamps.

EXT4_INODE_GET_XTIME should zero out tv_nsec when EXT4_FITS_IN_INODE
evaluates to false.
Reported-by: NJordan Russell <jr-list-2010@quo.to>
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

af0b44a1

ext4: optimize ext4_check_dir_entry() with unlikely() annotations · cad3f007

由 Theodore Ts'o 提交于 12月 19, 2010

This function gets called a lot for large directories, and the answer
is almost always "no, no, there's no problem".  This means using
unlikely() is a good thing.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

cad3f007

16 12月, 2010 3 次提交

T
ext4: Add second mount options field since the s_mount_opt is full up · a2595b8a
由 Theodore Ts'o 提交于 12月 15, 2010
```
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
```
a2595b8a

ext4: Move struct ext4_mount_options from ext4.h to super.c · 673c6100

由 Theodore Ts'o 提交于 12月 15, 2010

Move the ext4_mount_options structure definition from ext4.h, since it
is only used in super.c.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

673c6100

ext4: Simplify the usage of clear_opt() and set_opt() macros · fd8c37ec

由 Theodore Ts'o 提交于 12月 15, 2010

Change clear_opt() and set_opt() to take a superblock pointer instead
of a pointer to EXT4_SB(sb)->s_mount_opt.  This makes it easier for us
to support a second mount option field.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

fd8c37ec

15 12月, 2010 1 次提交

ext4: Turn off multiple page-io submission by default · 1449032b

由 Theodore Ts'o 提交于 12月 14, 2010

Jon Nelson has found a test case which causes postgresql to fail with
the error:

psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581

Under memory pressure, it looks like part of a file can end up getting
replaced by zero's.  Until we can figure out the cause, we'll roll
back the change and use block_write_full_page() instead of
ext4_bio_write_page().  The new, more efficient writing function can
be used via the mount option mblk_io_submit, so we can test and fix
the new page I/O code.

To reproduce the problem, install postgres 8.4 or 9.0, and pin enough
memory such that the system just at the end of triggering writeback
before running the following sql script:

begin;
create temporary table foo as select x as a, ARRAY[x] as b FROM
generate_series(1, 10000000 ) AS x;
create index foo_a_idx on foo (a);
create index foo_b_idx on foo USING GIN (b);
rollback;

If the temporary table is created on a hard drive partition which is
encrypted using dm_crypt, then under memory pressure, approximately
30-40% of the time, pgsql will issue the above failure.

This patch should fix this problem, and the problem will come back if
the file system is mounted with the mblk_io_submit mount option.
Reported-by: NJon Nelson <jnelson@jamponi.net>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

1449032b

09 11月, 2010 2 次提交

ext4: fix potential race when freeing ext4_io_page structures · 83668e71

由 Theodore Ts'o 提交于 11月 08, 2010

Use an atomic_t and make sure we don't free the structure while we
might still be submitting I/O for that page.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

83668e71

ext4: handle writeback of inodes which are being freed · f7ad6d2e

由 Theodore Ts'o 提交于 11月 08, 2010

The following BUG can occur when an inode which is getting freed when
it still has dirty pages outstanding, and it gets deleted (in this
because it was the target of a rename).  In ordered mode, we need to
make sure the data pages are written just in case we crash before the
rename (or unlink) is committed.  If the inode is being freed then
when we try to igrab the inode, we end up tripping the BUG_ON at
fs/ext4/page-io.c:146.

To solve this problem, we need to keep track of the number of io
callbacks which are pending, and avoid destroying the inode until they
have all been completed.  That way we don't have to bump the inode
count to keep the inode from being destroyed; an approach which
doesn't work because the count could have already been dropped down to
zero before the inode writeback has started (at which point we're not
allowed to bump the count back up to 1, since it's already started
getting freed).

Thanks to Dave Chinner for suggesting this approach, which is also
used by XFS.

  kernel BUG at /scratch_space/linux-2.6/fs/ext4/page-io.c:146!
  Call Trace:
   [<ffffffff811075b1>] ext4_bio_write_page+0x172/0x307
   [<ffffffff811033a7>] mpage_da_submit_io+0x2f9/0x37b
   [<ffffffff811068d7>] mpage_da_map_and_submit+0x2cc/0x2e2
   [<ffffffff811069b3>] mpage_add_bh_to_extent+0xc6/0xd5
   [<ffffffff81106c66>] write_cache_pages_da+0x2a4/0x3ac
   [<ffffffff81107044>] ext4_da_writepages+0x2d6/0x44d
   [<ffffffff81087910>] do_writepages+0x1c/0x25
   [<ffffffff810810a4>] __filemap_fdatawrite_range+0x4b/0x4d
   [<ffffffff810815f5>] filemap_fdatawrite_range+0xe/0x10
   [<ffffffff81122a2e>] jbd2_journal_begin_ordered_truncate+0x7b/0xa2
   [<ffffffff8110615d>] ext4_evict_inode+0x57/0x24c
   [<ffffffff810c14a3>] evict+0x22/0x92
   [<ffffffff810c1a3d>] iput+0x212/0x249
   [<ffffffff810bdf16>] dentry_iput+0xa1/0xb9
   [<ffffffff810bdf6b>] d_kill+0x3d/0x5d
   [<ffffffff810be613>] dput+0x13a/0x147
   [<ffffffff810b990d>] sys_renameat+0x1b5/0x258
   [<ffffffff81145f71>] ? _atomic_dec_and_lock+0x2d/0x4c
   [<ffffffff810b2950>] ? cp_new_stat+0xde/0xea
   [<ffffffff810b29c1>] ? sys_newlstat+0x2d/0x38
   [<ffffffff810b99c6>] sys_rename+0x16/0x18
   [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b
Reported-by: NNick Bowler <nbowler@elliptictech.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Tested-by: NNick Bowler <nbowler@elliptictech.com>

f7ad6d2e

02 11月, 2010 1 次提交

tree-wide: fix comment/printk typos · b595076a

由 Uwe Kleine-König 提交于 11月 01, 2010

"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",
Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

b595076a

28 10月, 2010 12 次提交

ext4: move ext4_mb_{get,put}_buddy_cache_lock and make them static · eee4adc7

由 Eric Sandeen 提交于 10月 27, 2010

These functions are only used within fs/ext4/mballoc.c, so move them
so they are used after they are defined, and then make them be static.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

eee4adc7

T
ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end() · 61d08673
由 Theodore Ts'o 提交于 10月 27, 2010
```
Fix a namespace leak from fs/ext4
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
```
61d08673

ext4: move flush_completed_IO to fs/ext4/fsync.c and make it static · 4a873a47

由 Theodore Ts'o 提交于 10月 27, 2010

Fix a namespace leak by moving the function to the file where it is
used and making it static.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

4a873a47

ext4: make various ext4 functions be static · 1f109d5a

由 Theodore Ts'o 提交于 10月 27, 2010

These functions have no need to be exported beyond file context.

No functions needed to be moved for this commit; just some function
declarations changed to be static and removed from header files.

(A similar patch was submitted by Eric Sandeen, but I wanted to handle
code movement in separate patches to make sure code changes didn't
accidentally get dropped.)
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

1f109d5a

T
ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*() · 5dabfc78
由 Theodore Ts'o 提交于 10月 27, 2010
```
This is a cleanup to avoid namespace leaks out of fs/ext4
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
```
5dabfc78

ext4: Add batched discard support for ext4 · 7360d173

由 Lukas Czerner 提交于 10月 27, 2010

Walk through allocation groups and trim all free extents. It can be
invoked through FITRIM ioctl on the file system. The main idea is to
provide a way to trim the whole file system if needed, since some SSD's
may suffer from performance loss after the whole device was filled (it
does not mean that fs is full!).

It search for free extents in allocation groups specified by Byte range
start -> start+len. When the free extent is within this range, blocks
are marked as used and then trimmed. Afterwards these blocks are marked
as free in per-group bitmap.

Since fstrim is a long operation it is good to have an ability to
interrupt it by a signal. This was added by Dmitry Monakhov.
Thanks Dimitry.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

7360d173

ext4: use bio layer instead of buffer layer in mpage_da_submit_io · bd2d0210

由 Theodore Ts'o 提交于 10月 27, 2010

Call the block I/O layer directly instad of going through the buffer
layer. This should give us much better performance and scalability,
as well as lowering our CPU utilization when doing buffered writeback.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

bd2d0210

ext4: remove unused ext4_sb_info members · 640e9396

由 Eric Sandeen 提交于 10月 27, 2010

Not that these take up a lot of room, but the structure is long enough
as it is, and there's no need to confuse people with these various
undocumented & unused structure members...
Signed-off-by: NEric Sandeen <sandeen@redaht.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

640e9396

ext4: improve llseek error handling for overly large seek offsets · e0d10bfa

由 Toshiyuki Okajima 提交于 10月 27, 2010

The llseek system call should return EINVAL if passed a seek offset
which results in a write error.  What this maximum offset should be
depends on whether or not the huge_file file system feature is set,
and whether or not the file is extent based or not.


If the file has no "EXT4_EXTENTS_FL" flag, the maximum size which can be 
written (write systemcall) is different from the maximum size which can be 
sought (lseek systemcall).

For example, the following 2 cases demonstrates the differences
between the maximum size which can be written, versus the seek offset
allowed by the llseek system call:

#1: mkfs.ext3 <dev>; mount -t ext4 <dev>
#2: mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev>

Table. the max file size which we can write or seek
       at each filesystem feature tuning and file flag setting
+============+===============================+===============================+
| \ File flag|                               |                               |
|      \     |     !EXT4_EXTENTS_FL          |        EXT4_EXTETNS_FL        |
|case       \|                               |                               |
+------------+-------------------------------+-------------------------------+
| #1         |   write:      2194719883264   | write:       --------------   |
|            |   seek:       2199023251456   | seek:        --------------   |
+------------+-------------------------------+-------------------------------+
| #2         |   write:      4402345721856   | write:       17592186044415   |
|            |   seek:      17592186044415   | seek:        17592186044415   |
+------------+-------------------------------+-------------------------------+

The differences exist because ext4 has 2 maxbytes which are sb->s_maxbytes
(= extent-mapped maxbytes) and EXT4_SB(sb)->s_bitmap_maxbytes (= block-mapped 
maxbytes).  Although generic_file_llseek uses only extent-mapped maxbytes.
(llseek of ext4_file_operations is generic_file_llseek which uses
sb->s_maxbytes.)

Therefore we create ext4 llseek function which uses 2 maxbytes.

The new own function originates from generic_file_llseek().
If the file flag, "EXT4_EXTENTS_FL" is not set, the function alters 
inode->i_sb->s_maxbytes into EXT4_SB(inode->i_sb)->s_bitmap_maxbytes.
Signed-off-by: NToshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>

e0d10bfa

ext4: add interface to advertise ext4 features in sysfs · 857ac889

由 Lukas Czerner 提交于 10月 27, 2010

User-space should have the opportunity to check what features doest ext4
support in each particular copy. This adds easy interface by creating new
"features" directory in sys/fs/ext4/. In that directory files
advertising feature names can be created.

Add lazy_itable_init to the feature list.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

857ac889

ext4: add support for lazy inode table initialization · bfff6873

由 Lukas Czerner 提交于 10月 27, 2010

When the lazy_itable_init extended option is passed to mke2fs, it
considerably speeds up filesystem creation because inode tables are
not zeroed out.  The fact that parts of the inode table are
uninitialized is not a problem so long as the block group descriptors,
which contain information regarding how much of the inode table has
been initialized, has not been corrupted However, if the block group
checksums are not valid, e2fsck must scan the entire inode table, and
the the old, uninitialized data could potentially cause e2fsck to
report false problems.

Hence, it is important for the inode tables to be initialized as soon
as possble.  This commit adds this feature so that mke2fs can safely
use the lazy inode table initialization feature to speed up formatting
file systems.

This is done via a new new kernel thread called ext4lazyinit, which is
created on demand and destroyed, when it is no longer needed.  There
is only one thread for all ext4 filesystems in the system. When the
first filesystem with inititable mount option is mounted, ext4lazyinit
thread is created, then the filesystem can register its request in the
request list.

This thread then walks through the list of requests picking up
scheduled requests and invoking ext4_init_inode_table(). Next schedule
time for the request is computed by multiplying the time it took to
zero out last inode table with wait multiplier, which can be set with
the (init_itable=n) mount option (default is 10).  We are doing
this so we do not take the whole I/O bandwidth. When the thread is no
longer necessary (request list is empty) it frees the appropriate
structures and exits (and can be created later later by another
filesystem).

We do not disturb regular inode allocations in any way, it just do not
care whether the inode table is, or is not zeroed. But when zeroing, we
have to skip used inodes, obviously. Also we should prevent new inode
allocations from the group, while zeroing is on the way. For that we
take write alloc_sem lock in ext4_init_inode_table() and read alloc_sem
in the ext4_claim_inode, so when we are unlucky and allocator hits the
group which is currently being zeroed, it just has to wait.

This can be suppresed using the mount option no_init_itable.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

bfff6873

ext4: use dedicated slab caches for group_info structures · fb1813f4

由 Curt Wohlgemuth 提交于 10月 27, 2010

ext4_group_info structures are currently allocated with kmalloc().
With a typical 4K block size, these are 136 bytes each -- meaning
they'll each consume a 256-byte slab object.  On a system with many
ext4 large partitions, that's a lot of wasted kernel slab space.
(E.g., a single 1TB partition will have about 8000 block groups, using
about 2MB of slab, of which nearly 1MB is wasted.)

This patch creates an array of slab pointers created as needed --
depending on the superblock block size -- and uses these slabs to
allocate the group info objects.

Google-Bug-Id: 2980809
Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

fb1813f4

10 8月, 2010 1 次提交
- A
  convert ext4 to ->evict_inode() · 0930fcc1
  由 Al Viro 提交于 6月 07, 2010
```
pretty much brute-force...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  0930fcc1
05 8月, 2010 1 次提交

ext4: re-inline ext4_rec_len_(to|from)_disk functions · 0cfc9255

由 Eric Sandeen 提交于 8月 05, 2010

commit 3d0518f4, "ext4: New rec_len encoding for very
large blocksizes" made several changes to this path, but from
a perf perspective, un-inlining ext4_rec_len_from_disk() seems
most significant.  This function is called from ext4_check_dir_entry(),
which on a file-creation workload is called extremely often.

I tested this with bonnie:

# bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32

(this does 200 iterations) and got this for the file creations:

ext4 stock:   Average =  21206.8 files/s
ext4 inlined: Average =  22346.7 files/s  (+5%)
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

0cfc9255

02 8月, 2010 1 次提交

ext4: Add mount options in superblock · 8b67f04a

由 Theodore Ts'o 提交于 8月 01, 2010

Allow mount options to be stored in the superblock. Also add default
mount option bits for nobarrier, block_validity, discard, and nodelalloc.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

8b67f04a

27 7月, 2010 1 次提交

ext4: fix ext4_get_blocks references · 79e83036

由 Eric Sandeen 提交于 7月 27, 2010

ext4_get_blocks got renamed to ext4_map_blocks, but left stale
comments and a prototype littered around.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

79e83036

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功