提交 · e2b4657453c0d5571bd3c7256585c486ed42d364 · openanolis / cloud-kernel

29 1月, 2008 15 次提交

ext4: store maxbytes for bitmapped files and return EFBIG as appropriate · e2b46574

由 Eric Sandeen 提交于 1月 28, 2008

Calculate & store the max offset for bitmapped files, and
catch too-large seeks, truncates, and writes in ext4, shortening
or rejecting as appropriate.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>

e2b46574

ext4: export iov_shorten from kernel for ext4's use · 19295529

由 Eric Sandeen 提交于 1月 28, 2008

Export iov_shorten() from kernel so that ext4 can
truncate too-large writes to bitmapped files.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>

19295529

ext4: different maxbytes functions for bitmap & extent files · cd2291a4

由 Eric Sandeen 提交于 1月 28, 2008

use 2 different maxbytes functions for bitmapped & extent-based
files.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>

cd2291a4

ext4: Support large files · 8180a562

由 Aneesh Kumar K.V 提交于 1月 28, 2008

This patch converts ext4_inode i_blocks to represent total
blocks occupied by the inode in file system block size.
Earlier the variable used to represent this in 512 byte
block size. This actually limited the total size of the file.

The feature is enabled transparently when we write an inode
whose i_blocks cannot be represnted as 512 byte units in a
48 bit variable.

inode flag  EXT4_HUGE_FILE_FL
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

8180a562

ext4: Add support for 48 bit inode i_blocks. · 0fc1b451

由 Aneesh Kumar K.V 提交于 1月 28, 2008

Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode
to represet the higher 16 bits for i_blocks. With this change max_file
size becomes (2**48 -1 )* 512 bytes.

We add a RO_COMPAT feature to the super block to indicate that inode
have i_blocks represented as a split 48 bits. Super block with this
feature set cannot be mounted read write on a kernel with CONFIG_LSF
disabled.

Super block flag EXT4_FEATURE_RO_COMPAT_HUGE_FILE
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

0fc1b451

ext4: Rename i_dir_acl to i_size_high · a48380f7

由 Aneesh Kumar K.V 提交于 1月 28, 2008

Rename ext4_inode.i_dir_acl to i_size_high
drop ext4_inode_info.i_dir_acl as it is not used
Rename ext4_inode.i_size to ext4_inode.i_size_lo
Add helper function for accessing the ext4_inode combined i_size.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

a48380f7

ext4: Rename i_file_acl to i_file_acl_lo · 7973c0c1

由 Aneesh Kumar K.V 提交于 1月 28, 2008

Rename i_file_acl to i_file_acl_lo. This helps
in finding bugs where we use i_file_acl instead
of the combined i_file_acl_lo and i_file_acl_high
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

7973c0c1

ext4: Fix sparse warnings. · 1d03ec98

由 Aneesh Kumar K.V 提交于 1月 28, 2008

Fix sparse warnings related to static functions
and local variables.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

1d03ec98

ext4: Introduce ext4_update_*_feature · 99e6f829

由 Aneesh Kumar K.V 提交于 1月 28, 2008

Introduce ext4_update_*_feature and use them instead
of opencoding.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

99e6f829

ext4: fixes block group number being set to a negative value · 2aa9fc4c

由 Avantika Mathur 提交于 1月 28, 2008

This patch fixes various places where the group number is set to a negative
value.
Signed-off-by: NAvantika Mathur <mathur@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

2aa9fc4c

ext4: add ext4_group_t, and change all group variables to this type. · fd2d4291

由 Avantika Mathur 提交于 1月 28, 2008

In many places variables for block group are of type int, which limits the
maximum number of block groups to 2^31. Each block group can have up to
2^15 blocks, with a 4K block size, and the max filesystem size is limited to
2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB

This patch introduces a new type ext4_group_t, of type unsigned long, to
represent block group numbers in ext4.
All occurrences of block group variables are converted to type ext4_group_t.
Signed-off-by: NAvantika Mathur <mathur@us.ibm.com>

fd2d4291

ext4 extents: remove unneeded casts · bba90743

由 Eric Sandeen 提交于 1月 28, 2008

There are many casts in extents.c which are not needed,
as the variables are already the type of the cast, or
are being promoted for no particular reason in printk's.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NMingming Cao <cmm@us.ibm.com>

bba90743

ext4: Introduce ext4_lblk_t · 725d26d3

由 Aneesh Kumar K.V 提交于 1月 28, 2008

This patch adds a new data type ext4_lblk_t to represent
the logical file blocks.

This is the preparatory patch to support large files in ext4
The follow up patch with convert the ext4_inode i_blocks to
represent the number of blocks in file system block size. This
changes makes it possible to have a block number 2**32 -1 which
will result in overflow if the block number is represented by
signed long. This patch convert all the block number to type
ext4_lblk_t which is typedef to __u32

Also remove dead code ext4_ext_walk_space
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: NEric Sandeen <sandeen@redhat.com>

725d26d3

ext4: Avoid rec_len overflow with 64KB block size · a72d7f83

由 Jan Kara 提交于 1月 28, 2008

With 64KB blocksize, a directory entry can have size 64KB which does not fit
into 16 bits we have for entry lenght. So we store 0xffff instead and convert
value when read from / written to disk. The patch also converts some places
to use ext4_next_entry() when we are changing them anyway.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMingming Cao <cmm@us.ibm.com>

a72d7f83

ext4: Support large blocksize up to PAGESIZE · afc7cbca

由 Takashi Sato 提交于 1月 28, 2008

This patch set supports large block size(>4k, <=64k) in ext4,
just enlarging the block size limit. But it is NOT possible to have 64kB
blocksize on ext4 without some changes to the directory handling
code.  The reason is that an empty 64kB directory block would have a
rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
the filesystem.  The proposed solution is treat 64k rec_len
with a an impossible value like rec_len = 0xffff to handle this.

The Patch-set consists of the following 2 patches.
  [1/2]  ext4: enlarge blocksize
         - Allow blocksize up to pagesize

  [2/2]  ext4: fix rec_len overflow
         - prevent rec_len from overflow with 64KB blocksize

Now on 64k page ppc64 box runs with this patch set we could create a 64k
block size ext4dev, and able to handle empty directory block.
Signed-off-by: NTakashi Sato <sho@tnes.nec.co.jp>
Signed-off-by: NMingming Cao <cmm@us.ibm.com>

afc7cbca

28 1月, 2008 4 次提交

cfq-iosched: relax IOPRIO_CLASS_IDLE restrictions · 0871714e

由 Jens Axboe 提交于 1月 28, 2008

Currently you must be root to set idle io prio class on a process. This
is due to the fact that the idle class is implemented as a true idle
class, meaning that it will not make progress if someone else is
requesting disk access. Unfortunately this means that it opens DOS
opportunities by locking down file system resources, hence it is root
only at the moment.

This patch relaxes the idle class a little, by removing the truly idle
part (which entals a grace period with associated timer). The
modifications make the idle class as close to zero impact as can be done
while still guarenteeing progress. This means we can relax the root only
criteria as well.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

0871714e

io context sharing: preliminary support · d38ecf93

由 Jens Axboe 提交于 1月 24, 2008

Detach task state from ioc, instead keep track of how many processes
are accessing the ioc.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

d38ecf93

ioprio: move io priority from task_struct to io_context · fd0928df

由 Jens Axboe 提交于 1月 24, 2008

This is where it belongs and then it doesn't take up space for a
process that doesn't do IO.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

fd0928df

__bio_clone: don't calculate hw/phys segment counts · 5d84070e

由 Jens Axboe 提交于 1月 25, 2008

If the users sets a new ->bi_bdev on the bio after __bio_clone() has
returned it, the "segment counts valid" flag still remains even though
it may be different with the new target. So don't calculate segment
counts in __bio_clone().
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

5d84070e

26 1月, 2008 21 次提交

ocfs2: clean up bh null checks · 2fe5c1d7

由 Mark Fasheh 提交于 1月 23, 2008

If we know a buffer_head is non-null, then brelse() is unnecessary and
put_bh() can be used instead. Also, an explicit check for NULL is
unnecessary when using brelse(). This patch only covers buffer_head_io.c and
resize.c, which have recently added code which exhibits this problem.
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

2fe5c1d7

ocfs2: document access rules for blocked_lock_list · 7ec373cf

由 Mark Fasheh 提交于 1月 23, 2008

ocfs2_super->blocked_lock_list and ocfs2_super->blocked_lock_count have some
usage restrictions which aren't immediately obvious to anyone reading the
code. It's a good idea to document this so that we avoid making costly
mistakes in the future.
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

7ec373cf

configfs: file.c fix possible recursive locking · 116ba5d5

由 Joonwoo Park 提交于 12月 26, 2007

configfs_register_subsystem() with default_groups triggers recursive locking.
it seems that mutex_lock_nested is needed.

=============================================
[ INFO: possible recursive locking detected ]
2.6.24-rc6 #145
---------------------------------------------
swapper/1 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40c9a9e>] configfs_add_file+0x2e/0x70

but task is already holding lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca985>] configfs_register_subsystem+0x55/0x130

other info that might help us debug this:
1 lock held by swapper/1:
 #0:  (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca985>] configfs_register_subsystem+0x55/0x130

stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.24-rc6 #145
 [<c40053ba>] show_trace_log_lvl+0x1a/0x30
 [<c4005e82>] show_trace+0x12/0x20
 [<c400687e>] dump_stack+0x6e/0x80
 [<c404ec72>] __lock_acquire+0xe62/0x1120
 [<c404efb2>] lock_acquire+0x82/0xa0
 [<c43fda88>] mutex_lock_nested+0x98/0x2e0
 [<c40c9a9e>] configfs_add_file+0x2e/0x70
 [<c40c9b0c>] configfs_create_file+0x2c/0x40
 [<c40ca639>] configfs_attach_item+0x139/0x220
 [<c40ca734>] configfs_attach_group+0x14/0x140
 [<c40ca7e9>] configfs_attach_group+0xc9/0x140
 [<c40ca9f6>] configfs_register_subsystem+0xc6/0x130
 [<c45c8186>] init_netconsole+0x2b6/0x300
 [<c45a75f2>] kernel_init+0x142/0x320
 [<c4004fb3>] kernel_thread_helper+0x7/0x14
 =======================
Signed-off-by: NJoonwoo Park <joonwpark81@gmail.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

116ba5d5

configfs: dir.c fix possible recursive locking · ba611edf

由 Joonwoo Park 提交于 12月 26, 2007

configfs_register_subsystem() with default_groups triggers recursive locking.
it seems that mutex_lock_nested is needed.

=============================================
[ INFO: possible recursive locking detected ]
2.6.24-rc6 #141
---------------------------------------------
swapper/1 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca76f>] configfs_attach_group+0x4f/0x190

but task is already holding lock:
 (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca9d5>] configfs_register_subsystem+0x55/0x130

other info that might help us debug this:
1 lock held by swapper/1:
 #0:  (&sb->s_type->i_mutex_key#3){--..}, at: [<c40ca9d5>] configfs_register_subsystem+0x55/0x130

stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.24-rc6 #141
 [<c40053ba>] show_trace_log_lvl+0x1a/0x30
 [<c4005e82>] show_trace+0x12/0x20
 [<c400687e>] dump_stack+0x6e/0x80
 [<c404ec72>] __lock_acquire+0xe62/0x1120
 [<c404efb2>] lock_acquire+0x82/0xa0
 [<c43fdad8>] mutex_lock_nested+0x98/0x2e0
 [<c40ca76f>] configfs_attach_group+0x4f/0x190
 [<c40caa46>] configfs_register_subsystem+0xc6/0x130
 [<c45c8186>] init_netconsole+0x2b6/0x300
 [<c45a75f2>] kernel_init+0x142/0x320
 [<c4004fb3>] kernel_thread_helper+0x7/0x14
 =======================
Signed-off-by: NJoonwoo Park <joonwpark81@gmail.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

ba611edf

configfs: Remove EXPERIMENTAL · 02ac0499

由 Joel Becker 提交于 12月 31, 2007

configfs has been alive and kicking for a while now.  It underpins some
non-EXPERIMENTAL subsystems, such as OCFS2's cluster stack.
Signed-off-by: NJoel Becker <joel.becker@oracle.com>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

02ac0499

ocfs2: bump version number · 0e5ae032

由 Mark Fasheh 提交于 11月 06, 2007

Bump the printed version to 1.5.0. This helps us quickly identify which
version of Ocfs2 a bug filer is running.
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

0e5ae032

ocfs2/dlm: Clear joining_node on hearbeat node down · 2d4b1cbb

由 Tao Ma 提交于 1月 10, 2008

Currently the process of dlm join contains 2 steps: query join and assert join.
After query join, the joined node will set its joining_node. So if the joining
node happens to panic before the 2nd step, the joined node will fail to clear
its joining_node flag because that node isn't in the domain map. It at least
cause 2 problems.
1. All the new join request will fail. So no new node can mount the volume.
2. The joined node can't umount the volume since during the umount process it
has to wait for the joining_node to be unknown. So the umount will be hanged.

The solution is to clear the joining_node before we check the domain map.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

2d4b1cbb

ocfs2: convert byte order of constant instead of variable · 4092d49f

由 Marcin Slusarz 提交于 12月 25, 2007

Convert byte order of constant instead of variable it will be done at
compile time vs run time. Remove unused le32_and_cpu.
Signed-off-by: NMarcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

4092d49f

ocfs2: Update default cluster timeouts · 17104683

由 Sunil Mushran 提交于 11月 06, 2007

Lots of people are having trouble with the default timeouts, which are too
low. These new values are derived from an informal survey taken on
ocfs2-users, as well as data from bug reports. This should reduce the amount
of cluster disconnects and subsequent fencing seen during normal workloads.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

17104683

ocfs2: printf fixes · 634bf74d

由 Jan Kara 提交于 12月 19, 2007

Explicitely convert loff_t to long long in printf. Just for sure...
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

634bf74d

ocfs2: Use generic_file_llseek · 32c3c0e2

由 Jan Kara 提交于 12月 19, 2007

We should use generic_file_llseek() and not default_llseek() so that
s_maxbytes gets properly checked when seeking.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

32c3c0e2

ocfs2: Safer read_inline_data() · d2849fb2

由 Jan Kara 提交于 12月 19, 2007

In ocfs2_read_inline_data() we should store file size in loff_t. Although
the file size should fit in 32 bits we cannot be sure in case filesystem is
corrupted.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

d2849fb2

ocfs2: Silence false lockdep warnings · 5fa0613e

由 Jan Kara 提交于 1月 11, 2008

Create separate lockdep lock classes for system file's i_mutexes. They are
used to guard allocations and similar things and thus rank differently
than i_mutex of a regular file or directory.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

5fa0613e

[PATCH 2/2] ocfs2: cluster aware flock() · 53fc622b

由 Mark Fasheh 提交于 12月 20, 2007

Hook up ocfs2_flock(), using the new flock lock type in dlmglue.c. A new
mount option, "localflocks" is added so that users can revert to old
functionality as need be.
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

53fc622b

[PATCH 1/2] ocfs2: add flock lock type · cf8e06f1

由 Mark Fasheh 提交于 12月 20, 2007

This adds a new dlmglue lock type which is intended to back flock()
requests.

Since these locks are driven from userspace, usage rules are much more
liberal than the typical Ocfs2 internal cluster lock. As a result, we can't
make use of most dlmglue features - lock caching and lock level
optimizations in particular. Additionally, userspace is free to deadlock
itself, so we have to deal with that in the same way as the rest of the
kernel - by allowing a signal to abort a lock request.

In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock()
does it's own dlm coordination. We still use the same helper functions
though, so duplicated code is kept to a minimum.
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

cf8e06f1

ocfs2: Local alloc window size changeable via mount option · 2fbe8d1e

由 Sunil Mushran 提交于 12月 20, 2007

Local alloc is a performance optimization in ocfs2 in which a node
takes a window of bits from the global bitmap and then uses that for
all small local allocations. This window size is fixed to 8MB currently.
This patch allows users to specify the window size in MB including
disabling it by passing in 0. If the number specified is too large,
the fs will use the default value of 8MB.

mount -o localalloc=X /dev/sdX /mntpoint
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

2fbe8d1e

ocfs2: Support commit= mount option · d147b3d6

由 Mark Fasheh 提交于 11月 07, 2007

Mostly taken from ext3. This allows the user to set the jbd commit interval,
in seconds. The default of 5 seconds stays the same, but now users can
easily increase the commit interval. Typically, this would be increased in
order to benefit performance at the expense of data-safety.
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

d147b3d6

ocfs2: Add missing permission checks · 0957f007

由 Mark Fasheh 提交于 12月 18, 2007

Check that an online resize is being driven by a user with permission to
change system resource limits.
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

0957f007

[PATCH 2/2] ocfs2: Implement group add for online resize · 7909f2bf

由 Tao Ma 提交于 12月 18, 2007

This patch adds the ability for a userspace program to request that a
properly formatted cluster group be added to the main allocation bitmap for
an Ocfs2 file system. The request is made via an ioctl, OCFS2_IOC_GROUP_ADD.
On a high level, this is similar to ext3, but we use a different ioctl as
the structure which has to be passed through is different.

During an online resize, tunefs.ocfs2 will format any new cluster groups
which must be added to complete the resize, and call OCFS2_IOC_GROUP_ADD on
each one. Kernel verifies that the core cluster group information is valid
and then does the work of linking it into the global allocation bitmap.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

7909f2bf

[PATCH 1/2] ocfs2: Add group extend for online resize · d659072f

由 Tao Ma 提交于 12月 18, 2007

This patch adds the ability for a userspace program to request an extend of
last cluster group on an Ocfs2 file system. The request is made via ioctl,
OCFS2_IOC_GROUP_EXTEND. This is derived from EXT3_IOC_GROUP_EXTEND, but is
obviously Ocfs2 specific.

tunefs.ocfs2 would call this for an online-resize operation if the last
cluster group isn't full.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

d659072f

ocfs2: Initalize bitmap_cpg of ocfs2_super to be the maximum. · e9d578a8

由 Tao Ma 提交于 12月 18, 2007

This value is initialized from global_bitmap->id2.i_chain.cl_cpg. If there
is only 1 group, it will be equal to the total clusters in the volume. So
as for online resize, it should change for all the nodes in the cluster.
It isn't easy and there is no corresponding lock for it.

bitmap_cpg is only used in 2 areas:
1. Check whether the suballoc is too large for us to allocate from the global
bitmap, so it is little used. And now the suballoc size is 2048, it rarely
meet this situation and the check is almost useless.
2. Calculate which group a cluster belongs to. We use it during truncate to
figure out which cluster group an extent belongs too. But we should be OK
if we increase it though as the cluster group calculated shouldn't change
and we only ever have a small bitmap_cpg on file systems with a single
cluster group.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>

e9d578a8

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功