Commits · 4df3d265bf8f3762e1d77f554ee279c39dedb020 · openanolis / cloud-kernel

Jan 29, 2008 · 29 commits

ext4: Take read lock during overwrite case. · 4df3d265

Committed by Aneesh Kumar K.V on Jan 28, 2008

When we are overwriting a file and not actually allocating new file system
blocks, we need to take only the read lock on i_data_sem.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

4df3d265

ext4: Convert truncate_mutex to read write semaphore. · 0e855ac8

Committed by Aneesh Kumar K.V on Jan 28, 2008

We currently take the truncate_mutex for every read. This would have a
performance impact on large CPU configurations. Convert the lock to a
read-write semaphore and take the read lock when we are only reading the file.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

0e855ac8

ext4: Make ext4_get_blocks_wrap take the truncate_mutex early. · c278bfec

Committed by Aneesh Kumar K.V on Jan 28, 2008

When migrating an inode from ext3 to ext4 format, we need to make sure the
test for inode type and the walk of inode data happen inside the lock. To make
this happen, take truncate_mutex early, before checking the i_flags.

This should actually enable us to remove verify_chain().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

c278bfec

ext4: remove unused code from ext4_find_entry() · 01f4adc0

Committed by Mariusz Kozlowski on Jan 28, 2008

The unused code found in ext3_find_entry() is also present (and still
unused) in the ext4_find_entry() code. This patch removes it.
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

01f4adc0

ext4: Check for the correct error return from · 221879c9

Committed by Aneesh Kumar K.V on Jan 28, 2008

ext4_ext_get_blocks returns negative values on error. We should
check for <= 0.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

221879c9

jbd2: Fix assertion failure in fs/jbd2/checkpoint.c · f5a7a6b0

Committed by Jan Kara on Jan 28, 2008

Before we start committing a transaction, we call
__journal_clean_checkpoint_list() to clean up the transaction's written-back
buffers.

If this call happens to remove all of them (and there were already some
buffers), __journal_remove_checkpoint() will decide to free the transaction
because it isn't (yet) a committing transaction and soon we fail some
assertion - the transaction really isn't ready to be freed :).

We change the check in __journal_remove_checkpoint() to free only a
transaction in T_FINISHED state.  The locking there is subtle though (as
everywhere in JBD ;().  We use j_list_lock to protect the check and a
subsequent call to __journal_drop_transaction() and do the same in the end
of journal_commit_transaction() which is the only place where a transaction
can get to T_FINISHED state.

Probably I'm too paranoid here and such locking is not really necessary -
checkpoint lists are processed only from log_do_checkpoint() where a
transaction must be already committed to be processed or from
__journal_clean_checkpoint_list() where kjournald itself calls it and thus
transaction cannot change state either.  Better be safe if something
changes in future...
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

f5a7a6b0

ext4: add block bitmap validation · abcb2947

Committed by Aneesh Kumar K.V on Jan 28, 2008

When a new block bitmap is read from disk in read_block_bitmap(),
there are a few bits that should ALWAYS be set. In particular, the bits
corresponding to the blocks that hold the block bitmap, the inode bitmap,
and the inode tables. Validate the block bitmap against these blocks.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

abcb2947
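
Note: the validation idea in this commit can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel's implementation: the `toy_*` names and the simplified descriptor (metadata locations given as group-relative block offsets) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified group descriptor: metadata locations are block
 * offsets relative to the start of the group. */
struct toy_group_desc {
    uint32_t block_bitmap;     /* offset of the block bitmap block */
    uint32_t inode_bitmap;     /* offset of the inode bitmap block */
    uint32_t inode_table;      /* offset of the first inode table block */
    uint32_t inode_table_len;  /* number of inode table blocks */
};

/* Return true if bit nr is set (bit 0 is the least significant bit of byte 0). */
static bool toy_test_bit(const uint8_t *bitmap, uint32_t nr)
{
    assert(bitmap != NULL);
    return (bitmap[nr / 8] >> (nr % 8)) & 1u;
}

/* A bitmap read from disk is plausible only if every metadata block of the
 * group is marked as in use. nbits is the number of valid bits in bitmap. */
bool toy_block_bitmap_valid(const struct toy_group_desc *gd,
                            const uint8_t *bitmap, uint32_t nbits)
{
    uint32_t end = gd->inode_table + gd->inode_table_len;

    /* Reject descriptors that point outside the bitmap or wrap around. */
    if (end < gd->inode_table || end > nbits ||
        gd->block_bitmap >= nbits || gd->inode_bitmap >= nbits)
        return false;

    if (!toy_test_bit(bitmap, gd->block_bitmap))
        return false;
    if (!toy_test_bit(bitmap, gd->inode_bitmap))
        return false;
    for (uint32_t b = gd->inode_table; b < end; b++)
        if (!toy_test_bit(bitmap, b))
            return false;
    return true;
}
```

A bitmap that fails this check cannot describe the group it was read for, so a caller can treat it as corrupted instead of allocating from it.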

Add buffer head related helper functions · 389d1b08

Committed by Aneesh Kumar K.V on Jan 28, 2008

Add the buffer head helper functions bh_uptodate_or_lock() and
bh_submit_read(), which can be used by file systems.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

389d1b08

ext4: Change the default behaviour on error · bb4f397a

Committed by Aneesh Kumar K.V on Jan 28, 2008

The ext4 file system was by default ignoring errors and continuing. This
is not a good default, as continuing on error could lead to file system
corruption. Change the default to mark the file system
read-only. Debian and Ubuntu already do this by default in their
fstab.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

bb4f397a

ext4: fix oops on corrupted ext4 mount · e7c95593

Committed by Eric Sandeen on Jan 28, 2008

When mounting an ext4 filesystem with corrupted s_first_data_block, things
can go very wrong and oops.

Because blocks_count in ext4_fill_super is a u64, and we must use do_div,
the calculation of db_count is done differently than on ext3.  If
first_data_block is corrupted such that it is larger than ext4_blocks_count,
for example, then the intermediate blocks_count value may go negative,
but sign-extend to a very large value:

        blocks_count = (ext4_blocks_count(es) -
                        le32_to_cpu(es->s_first_data_block) +
                        EXT4_BLOCKS_PER_GROUP(sb) - 1);

This is then assigned to s_groups_count which is an unsigned long:

        sbi->s_groups_count = blocks_count;

This may result in a value of 0xFFFFFFFF which is then used to compute
db_count:

        db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
                   EXT4_DESC_PER_BLOCK(sb);

and in this case db_count will wind up as 0 because the addition overflows
32 bits.  This in turn causes the kmalloc for group_desc to be of 0 size:

        sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *),
                                    GFP_KERNEL);

and eventually in ext4_check_descriptors, dereferencing
sbi->s_group_desc[desc_block] will result in a NULL pointer dereference.

The simplest test seems to be to sanity check s_first_data_block,
EXT4_BLOCKS_PER_GROUP, and ext4_blocks_count values to be sure
their combination won't result in a bad intermediate value for
blocks_count.  We could just check for db_count == 0, but
catching it at the root cause seems like it provides more info.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mingming Cao <cmm@us.ibm.com>

e7c95593
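
Note: the wrap-around described in this commit can be reproduced in a few lines of userspace C. This is a sketch under stated assumptions: `uint32_t` stands in for a 32-bit `unsigned long`, the `toy_*` names are hypothetical, and the sanity check only mirrors the spirit of the fix rather than quoting the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of descriptor blocks, computed as in the commit message but with an
 * explicit 32-bit type standing in for a 32-bit unsigned long. */
uint32_t toy_db_count(uint32_t groups_count, uint32_t desc_per_block)
{
    assert(desc_per_block != 0);
    /* The addition wraps modulo 2^32 when groups_count is close to 2^32. */
    return (groups_count + desc_per_block - 1) / desc_per_block;
}

/* Hypothetical sanity check: a first data block at or beyond the end of the
 * file system, or an empty group size, cannot describe a valid layout. */
bool toy_geometry_sane(uint64_t blocks_count, uint32_t first_data_block,
                       uint32_t blocks_per_group)
{
    if (blocks_per_group == 0)
        return false;
    if (first_data_block >= blocks_count)
        return false;
    return true;
}
```

With `groups_count = 0xFFFFFFFF` and 128 descriptors per block, the sum wraps to 126 and the division yields 0, which is the zero-sized allocation the message describes. Rejecting the bad superblock values up front avoids ever reaching that computation.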

ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*) · 07620f69

Committed by Adrian Bunk on Jan 28, 2008

Based on a report by Robert P. J. Day.
Signed-off-by: Adrian Bunk <bunk@kernel.org>

07620f69

ext4: Return after ext4_error in case of failures · cb47dce7

Committed by Aneesh Kumar K.V on Jan 28, 2008

This fixes some instances where we were continuing after calling
ext4_error(). ext4_error() panics only if the errors=panic mount option is
set, so we need to make sure we return correctly after an ext4_error() call.

Reported by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

cb47dce7

ext3: Fix the max file size for ext3 file system. · fe7fdc37

Committed by Aneesh Kumar K.V on Jan 28, 2008

The max file size for the ext3 file system is currently calculated
with a hardcoded 4K block size. This patch fixes it to be
calculated with the actual block size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

fe7fdc37
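
Note: to see why a hardcoded 4K block size gives the wrong limit, here is a rough sketch of how the range addressable through the classic direct/indirect block map depends on the block size. It assumes 12 direct pointers and 4-byte block pointers, and it deliberately ignores the additional caps (such as the sector count stored in i_blocks) that the real calculation applies. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable through 12 direct pointers plus single, double, and
 * triple indirect blocks, for a block size of 2^blocksize_bits bytes.
 * Simplified sketch: the real kernel code applies further limits. */
uint64_t toy_max_mapped_bytes(unsigned int blocksize_bits)
{
    assert(blocksize_bits >= 10 && blocksize_bits <= 16);

    uint64_t ptrs = 1ULL << (blocksize_bits - 2);  /* 4-byte pointers per block */
    uint64_t blocks = 12 + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;

    return blocks << blocksize_bits;
}
```

Under these assumptions a 1K block size maps roughly 16 GiB while a 4K block size maps roughly 4 TiB, so a limit computed for 4K blocks overstates what a 1K file system can address.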

ext2: Fix the max file size for ext2 file system. · 902be4c5

Committed by Aneesh Kumar K.V on Jan 28, 2008

The max file size for the ext2 file system is currently calculated
with a hardcoded 4K block size. This patch fixes it to be
calculated with the actual block size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

902be4c5

ext4: store maxbytes for bitmapped files and return EFBIG as appropriate · e2b46574

Committed by Eric Sandeen on Jan 28, 2008

Calculate & store the max offset for bitmapped files, and
catch too-large seeks, truncates, and writes in ext4, shortening
or rejecting as appropriate.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

e2b46574
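
Note: the "shorten or reject" policy for writes can be sketched as follows. This is an illustration only; the function name and signature are hypothetical and are not the helpers the patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Decide how much of a write of len bytes at offset pos may proceed when the
 * file's maximum size is maxbytes. Returns the (possibly shortened) length,
 * or -EFBIG if the write starts at or beyond the limit. */
int64_t toy_clamp_write(uint64_t pos, uint64_t len, uint64_t maxbytes)
{
    assert(maxbytes <= (uint64_t)INT64_MAX);

    if (len == 0)
        return 0;
    if (pos >= maxbytes)
        return -EFBIG;           /* nothing can be written: reject */
    if (len > maxbytes - pos)
        len = maxbytes - pos;    /* partial fit: shorten the write */
    return (int64_t)len;
}
```

A write that fits is passed through unchanged, one that straddles the limit is truncated to the remaining space, and one that starts past the limit fails with EFBIG.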

ext4: export iov_shorten from kernel for ext4's use · 19295529

Committed by Eric Sandeen on Jan 28, 2008

Export iov_shorten() from kernel so that ext4 can
truncate too-large writes to bitmapped files.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

19295529

ext4: different maxbytes functions for bitmap & extent files · cd2291a4

Committed by Eric Sandeen on Jan 28, 2008

Use two different maxbytes functions for bitmapped and extent-based
files.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

cd2291a4

ext4: Support large files · 8180a562

Committed by Aneesh Kumar K.V on Jan 28, 2008

This patch converts ext4_inode i_blocks to represent the total
blocks occupied by the inode in units of the file system block size.
Previously the variable represented this in 512-byte
units, which limited the total size of the file.

The feature is enabled transparently when we write an inode
whose i_blocks cannot be represented as 512-byte units in a
48-bit variable.

Inode flag: EXT4_HUGE_FILE_FL
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

8180a562

ext4: Add support for 48 bit inode i_blocks. · 0fc1b451

Committed by Aneesh Kumar K.V on Jan 28, 2008

Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode
to represent the higher 16 bits of i_blocks. With this change the max file
size becomes (2**48 - 1) * 512 bytes.

We add a RO_COMPAT feature to the super block to indicate that inodes
have i_blocks represented as a split 48 bits. A super block with this
feature set cannot be mounted read-write on a kernel with CONFIG_LSF
disabled.

Super block flag: EXT4_FEATURE_RO_COMPAT_HUGE_FILE
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

0fc1b451
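
Note: the split 48-bit representation can be sketched in userspace C. The struct and function names below are hypothetical stand-ins for the on-disk fields, and little-endian byte-order conversion is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two on-disk fields: the existing 32-bit low
 * part and the 16-bit field that now carries bits 32..47. */
struct toy_raw_inode {
    uint32_t i_blocks_lo;
    uint16_t i_blocks_high;
};

/* Combine the two halves into one 48-bit count. */
uint64_t toy_inode_blocks(const struct toy_raw_inode *raw)
{
    assert(raw != NULL);
    return ((uint64_t)raw->i_blocks_high << 32) | raw->i_blocks_lo;
}

/* Store a count into the split fields; returns 0 on success or -1 if the
 * value needs more than 48 bits. */
int toy_set_inode_blocks(struct toy_raw_inode *raw, uint64_t blocks)
{
    assert(raw != NULL);
    if (blocks >> 48)
        return -1;
    raw->i_blocks_lo = (uint32_t)blocks;
    raw->i_blocks_high = (uint16_t)(blocks >> 32);
    return 0;
}

/* Largest size expressible when the 48-bit count is in 512-byte units. */
uint64_t toy_max_bytes_512(void)
{
    return ((1ULL << 48) - 1) * 512;
}
```

The last helper evaluates the (2**48 - 1) * 512 limit quoted in the message, which is just under 2**57 bytes (128 PiB).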

ext4: Rename i_dir_acl to i_size_high · a48380f7

Committed by Aneesh Kumar K.V on Jan 28, 2008

Rename ext4_inode.i_dir_acl to i_size_high.
Drop ext4_inode_info.i_dir_acl, as it is not used.
Rename ext4_inode.i_size to ext4_inode.i_size_lo.
Add a helper function for accessing the combined ext4_inode i_size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

a48380f7

ext4: Rename i_file_acl to i_file_acl_lo · 7973c0c1

Committed by Aneesh Kumar K.V on Jan 28, 2008

Rename i_file_acl to i_file_acl_lo. This helps
in finding bugs where we use i_file_acl instead
of the combined i_file_acl_lo and i_file_acl_high.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

7973c0c1

ext4: Fix sparse warnings. · 1d03ec98

Committed by Aneesh Kumar K.V on Jan 28, 2008

Fix sparse warnings related to static functions
and local variables.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

1d03ec98

ext4: Introduce ext4_update_*_feature · 99e6f829

Committed by Aneesh Kumar K.V on Jan 28, 2008

Introduce ext4_update_*_feature and use them instead
of open-coding.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

99e6f829

ext4: fixes block group number being set to a negative value · 2aa9fc4c

Committed by Avantika Mathur on Jan 28, 2008

This patch fixes various places where the group number is set to a negative
value.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

2aa9fc4c

ext4: add ext4_group_t, and change all group variables to this type. · fd2d4291

Committed by Avantika Mathur on Jan 28, 2008

In many places variables for block group are of type int, which limits the
maximum number of block groups to 2^31. Each block group can have up to
2^15 blocks, with a 4K block size, so the max filesystem size is limited to
2^31 * (2^15 * 2^12) = 2^58 bytes, or 256 PB.

This patch introduces a new type ext4_group_t, of type unsigned long, to
represent block group numbers in ext4.
All occurrences of block group variables are converted to type ext4_group_t.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>

fd2d4291
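
Note: the size arithmetic in this commit can be checked with a small helper. The function is a hypothetical illustration that takes each quantity as a power of two. The 32-bit case in the usage note assumes a 32-bit unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum file system size in bytes when there are 2^group_bits block groups,
 * 2^blocks_per_group_bits blocks per group, and 2^block_size_bits bytes per
 * block. */
uint64_t toy_max_fs_bytes(unsigned int group_bits,
                          unsigned int blocks_per_group_bits,
                          unsigned int block_size_bits)
{
    unsigned int total_bits =
        group_bits + blocks_per_group_bits + block_size_bits;

    assert(total_bits < 64);  /* result must fit in 64 bits */
    return 1ULL << total_bits;
}
```

`toy_max_fs_bytes(31, 15, 12)` gives 2^58 bytes, which is 256 when shifted right by 50 (that is, 256 PB in binary units). One more usable bit in the group number, as with a 32-bit unsigned type, doubles the limit.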

ext4 extents: remove unneeded casts · bba90743

Committed by Eric Sandeen on Jan 28, 2008

There are many casts in extents.c which are not needed,
as the variables are already the type of the cast, or
are being promoted for no particular reason in printk's.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

bba90743

ext4: Introduce ext4_lblk_t · 725d26d3

Committed by Aneesh Kumar K.V on Jan 28, 2008

This patch adds a new data type ext4_lblk_t to represent
the logical file blocks.

This is the preparatory patch to support large files in ext4.
The follow-up patch will convert the ext4_inode i_blocks to
represent the number of blocks in file system block size. This
change makes it possible to have a block number of 2**32 - 1, which
will result in overflow if the block number is represented by a
signed long. This patch converts all the block numbers to type
ext4_lblk_t, which is a typedef for __u32.

Also remove the dead code ext4_ext_walk_space.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>

725d26d3

ext4: Avoid rec_len overflow with 64KB block size · a72d7f83

Committed by Jan Kara on Jan 28, 2008

With 64KB blocksize, a directory entry can have size 64KB, which does not fit
into the 16 bits we have for entry length. So we store 0xffff instead and
convert the value when read from / written to disk. The patch also converts
some places to use ext4_next_entry() when we are changing them anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

a72d7f83
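
Note: the conversion described in this commit can be sketched in userspace C. The `toy_*` names are hypothetical, and on-disk little-endian conversion is left out. The sentinel works because directory entry lengths are multiples of 4, so 0xffff cannot occur as a genuine length.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_REC_LEN 0xffffu  /* sentinel stored on disk for a 64KB entry */

/* Convert the 16-bit on-disk value to the real entry length. */
uint32_t toy_rec_len_from_disk(uint16_t dlen)
{
    if (dlen == TOY_MAX_REC_LEN)
        return 1u << 16;
    return dlen;
}

/* Convert a real entry length (at most 64KB) to the 16-bit on-disk value. */
uint16_t toy_rec_len_to_disk(uint32_t len)
{
    assert(len <= (1u << 16));
    if (len == (1u << 16))
        return TOY_MAX_REC_LEN;
    return (uint16_t)len;
}
```

Without the sentinel, a length of 65536 truncated to 16 bits would become 0, which directory-walking code would treat as a corrupted entry.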

ext4: Support large blocksize up to PAGESIZE · afc7cbca

Committed by Takashi Sato on Jan 28, 2008

This patch set supports large block sizes (>4k, <=64k) in ext4 by
just enlarging the block size limit. But it is NOT possible to have a 64kB
blocksize on ext4 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is to represent a 64k rec_len
with an otherwise impossible value, rec_len = 0xffff, to handle this.

The patch set consists of the following 2 patches.
  [1/2]  ext4: enlarge blocksize
         - Allow blocksize up to pagesize

  [2/2]  ext4: fix rec_len overflow
         - prevent rec_len from overflow with 64KB blocksize

With this patch set, on a ppc64 box with 64k pages we could create an
ext4dev file system with a 64k block size and handle empty directory blocks.
Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>

afc7cbca

Jan 28, 2008 · 4 commits

cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions · 0871714e

Committed by Jens Axboe on Jan 28, 2008

Currently you must be root to set the idle I/O priority class on a process.
This is due to the fact that the idle class is implemented as a true idle
class, meaning that it will not make progress if someone else is
requesting disk access. Unfortunately this means that it opens DoS
opportunities by locking down file system resources, hence it is root
only at the moment.

This patch relaxes the idle class a little, by removing the truly idle
part (which entails a grace period with associated timer). The
modifications make the idle class as close to zero impact as can be done
while still guaranteeing progress. This means we can relax the root only
criteria as well.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

0871714e
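
Note: for readers unfamiliar with how a process asks for the idle class, the sketch below shows how an I/O priority value is encoded before being passed to the ioprio_set(2) system call. The `TOY_*` constants mirror the kernel's ioprio encoding (class in the bits above a 13-bit data field). The names themselves are local to this example, and the system call is intentionally not invoked here.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the kernel's I/O priority encoding. */
#define TOY_IOPRIO_CLASS_SHIFT 13
enum toy_ioprio_class {
    TOY_IOPRIO_CLASS_NONE = 0,
    TOY_IOPRIO_CLASS_RT   = 1,
    TOY_IOPRIO_CLASS_BE   = 2,
    TOY_IOPRIO_CLASS_IDLE = 3
};

/* Pack a scheduling class and per-class priority level into one value. */
uint32_t toy_ioprio_value(enum toy_ioprio_class cls, uint32_t data)
{
    assert(data < (1u << TOY_IOPRIO_CLASS_SHIFT));
    return ((uint32_t)cls << TOY_IOPRIO_CLASS_SHIFT) | data;
}

/* Extract the scheduling class from a packed value. */
enum toy_ioprio_class toy_ioprio_class_of(uint32_t value)
{
    return (enum toy_ioprio_class)(value >> TOY_IOPRIO_CLASS_SHIFT);
}
```

Before this patch, passing a value with the idle class for a process required root. The relaxed implementation is what allows unprivileged callers to use it.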

io context sharing: preliminary support · d38ecf93

Committed by Jens Axboe on Jan 24, 2008

Detach task state from ioc, instead keep track of how many processes
are accessing the ioc.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

d38ecf93

ioprio: move io priority from task_struct to io_context · fd0928df

Committed by Jens Axboe on Jan 24, 2008

This is where it belongs and then it doesn't take up space for a
process that doesn't do IO.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

fd0928df

__bio_clone: don't calculate hw/phys segment counts · 5d84070e

Committed by Jens Axboe on Jan 25, 2008

If the user sets a new ->bi_bdev on the bio after __bio_clone() has
returned it, the "segment counts valid" flag still remains set even though
the counts may be different with the new target. So don't calculate segment
counts in __bio_clone().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

5d84070e

Jan 26, 2008 · 7 commits

ocfs2: clean up bh null checks · 2fe5c1d7

Committed by Mark Fasheh on Jan 23, 2008

If we know a buffer_head is non-null, then brelse() is unnecessary and
put_bh() can be used instead. Also, an explicit check for NULL is
unnecessary when using brelse(). This patch only covers buffer_head_io.c and
resize.c, which have recently added code which exhibits this problem.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

2fe5c1d7

ocfs2: document access rules for blocked_lock_list · 7ec373cf

Committed by Mark Fasheh on Jan 23, 2008

ocfs2_super->blocked_lock_list and ocfs2_super->blocked_lock_count have some
usage restrictions which aren't immediately obvious to anyone reading the
code. It's a good idea to document this so that we avoid making costly
mistakes in the future.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

7ec373cf

configfs: file.c fix possible recursive locking · 116ba5d5

Committed by Joonwoo Park on Dec 26, 2007

configfs_register_subsystem() with default_groups triggers recursive locking.
It seems that mutex_lock_nested() is needed.

=============================================
[ INFO: possible recursive locking detected ]
2.6.24-rc6 #145
---------------------------------------------
swapper/1 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40c9a9e>] configfs_add_file+0x2e/0x70

but task is already holding lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca985>] configfs_register_subsystem+0x55/0x130

other info that might help us debug this:
1 lock held by swapper/1:
 #0:  (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca985>] configfs_register_subsystem+0x55/0x130

stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.24-rc6 #145
 [<c40053ba>] show_trace_log_lvl+0x1a/0x30
 [<c4005e82>] show_trace+0x12/0x20
 [<c400687e>] dump_stack+0x6e/0x80
 [<c404ec72>] __lock_acquire+0xe62/0x1120
 [<c404efb2>] lock_acquire+0x82/0xa0
 [<c43fda88>] mutex_lock_nested+0x98/0x2e0
 [<c40c9a9e>] configfs_add_file+0x2e/0x70
 [<c40c9b0c>] configfs_create_file+0x2c/0x40
 [<c40ca639>] configfs_attach_item+0x139/0x220
 [<c40ca734>] configfs_attach_group+0x14/0x140
 [<c40ca7e9>] configfs_attach_group+0xc9/0x140
 [<c40ca9f6>] configfs_register_subsystem+0xc6/0x130
 [<c45c8186>] init_netconsole+0x2b6/0x300
 [<c45a75f2>] kernel_init+0x142/0x320
 [<c4004fb3>] kernel_thread_helper+0x7/0x14
 =======================
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

116ba5d5

configfs: dir.c fix possible recursive locking · ba611edf

Committed by Joonwoo Park on Dec 26, 2007

configfs_register_subsystem() with default_groups triggers recursive locking.
It seems that mutex_lock_nested() is needed.

=============================================
[ INFO: possible recursive locking detected ]
2.6.24-rc6 #141
---------------------------------------------
swapper/1 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca76f>] configfs_attach_group+0x4f/0x190

but task is already holding lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca9d5>] configfs_register_subsystem+0x55/0x130

other info that might help us debug this:
1 lock held by swapper/1:
 #0:  (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca9d5>] configfs_register_subsystem+0x55/0x130

stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.24-rc6 #141
 [<c40053ba>] show_trace_log_lvl+0x1a/0x30
 [<c4005e82>] show_trace+0x12/0x20
 [<c400687e>] dump_stack+0x6e/0x80
 [<c404ec72>] __lock_acquire+0xe62/0x1120
 [<c404efb2>] lock_acquire+0x82/0xa0
 [<c43fdad8>] mutex_lock_nested+0x98/0x2e0
 [<c40ca76f>] configfs_attach_group+0x4f/0x190
 [<c40caa46>] configfs_register_subsystem+0xc6/0x130
 [<c45c8186>] init_netconsole+0x2b6/0x300
 [<c45a75f2>] kernel_init+0x142/0x320
 [<c4004fb3>] kernel_thread_helper+0x7/0x14
 =======================
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

ba611edf

configfs: Remove EXPERIMENTAL · 02ac0499

Committed by Joel Becker on Dec 31, 2007

configfs has been alive and kicking for a while now.  It underpins some
non-EXPERIMENTAL subsystems, such as OCFS2's cluster stack.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

02ac0499

ocfs2: bump version number · 0e5ae032

Committed by Mark Fasheh on Nov 06, 2007

Bump the printed version to 1.5.0. This helps us quickly identify which
version of Ocfs2 a bug filer is running.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

0e5ae032

ocfs2/dlm: Clear joining_node on hearbeat node down · 2d4b1cbb

Committed by Tao Ma on Jan 10, 2008

Currently the process of dlm join contains 2 steps: query join and assert join.
After query join, the joined node will set its joining_node. So if the joining
node happens to panic before the 2nd step, the joined node will fail to clear
its joining_node flag because that node isn't in the domain map. This causes
at least 2 problems:
1. All new join requests will fail, so no new node can mount the volume.
2. The joined node can't umount the volume, since during the umount process it
has to wait for the joining_node to be unknown. So the umount will hang.

The solution is to clear the joining_node before we check the domain map.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

2d4b1cbb

openanolis / cloud-kernel · synced successfully about 1 year ago