Commits · 9b4c0ff32ccd87ab52d4c5bd0a0536febce11370 · openanolis / cloud-kernel

08 Sep, 2010 7 commits

ocfs2: Fix deadlock when allocating page · 9b4c0ff3

Committed by Jan Kara on Aug 24, 2010

We cannot call grab_cache_page() when holding filesystem locks or with
a transaction started as grab_cache_page() calls page allocation with
GFP_KERNEL flag and thus page reclaim can recurse back into the filesystem
causing deadlocks or various assertion failures. We have to use
find_or_create_page() instead and pass it GFP_NOFS as we do with other
allocations.
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <tao.ma@oracle.com>

ocfs2: properly set and use inode group alloc hint · b2b6ebf5

Committed by Mark Fasheh on Aug 26, 2010

We were setting ac->ac_last_group in ocfs2_claim_suballoc_bits from
res->sr_bg_blkno.  Unfortunately, res->sr_bg_blkno is going to be zero under
normal (non-fragmented) circumstances. The discontig block group patches
effectively turned off that feature. Fix this by correctly calculating what
the next group hint should be.
Acked-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: Tao Ma <tao.ma@oracle.com>

ocfs2: Use the right group in nfs sync check. · 889f004a

Committed by Tao Ma on Sep 02, 2010

We have added discontig block groups, so an inode
can now be allocated in a discontig block group. So get
the group in ocfs2_get_suballoc_slot_bit.

The old ocfs2_test_suballoc_bit gets group block no
from the allocation inode which is wrong. Fix it by
passing the right group.
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>

ocfs2: Flush drive's caches on fdatasync · 04eda1a1

Committed by Jan Kara on Aug 05, 2010

When 'barrier' mount option is specified, we have to issue a cache flush
during fdatasync(2). We have to do this even if inode doesn't have
I_DIRTY_DATASYNC set because we still have to get written *data* to disk so
that they are not lost in case of crash.
Acked-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <tao.ma@oracle.com>

ocfs2: make __ocfs2_page_mkwrite handle file end properly. · f63afdb2

Committed by Tao Ma on Jul 17, 2010

__ocfs2_page_mkwrite currently mishandles the end of the file:
1. The last page should be the page that contains i_size - 1.
2. The length within the last page is also calculated incorrectly.
Fix both.
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
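
The two points in the message above can be sketched as plain arithmetic. This is a user-space illustration, not the kernel code: the 4 KiB page size and the names `sketch_last_page_index` and `sketch_last_page_len` are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the illustration (4 KiB); not taken from the commit. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1ull << SKETCH_PAGE_SHIFT)

/* Index of the page that holds the last valid byte, i.e. byte i_size - 1. */
uint64_t sketch_last_page_index(uint64_t i_size)
{
    assert(i_size > 0);
    return (i_size - 1) >> SKETCH_PAGE_SHIFT;
}

/* Number of valid bytes in that last page, in the range 1..PAGE_SIZE. */
uint64_t sketch_last_page_len(uint64_t i_size)
{
    assert(i_size > 0);
    return ((i_size - 1) & (SKETCH_PAGE_SIZE - 1)) + 1;
}
```

With these definitions, a file of exactly one page (i_size = 4096) ends in page 0 with a full-page length, whereas computing the index from i_size itself would point one page too far.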

ocfs2: Fix incorrect checksum validation error · f5ce5a08

Committed by Sunil Mushran on Aug 12, 2010

For local mounts, ocfs2_read_locked_inode() calls ocfs2_read_blocks_sync() to
read the inode off the disk. The latter first checks to see if that block is
cached in the journal, and, if so, returns that block. That is ok.

But ocfs2_read_locked_inode() goes wrong when it tries to validate the checksum
of such blocks. Blocks that are cached in the journal may not have had their
checksum computed as yet. We should not validate the checksums of such blocks.

Fixes ossbz#1282
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1282
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Tao Ma <tao.ma@oracle.com>

ocfs2: Fix metaecc error messages · dc696ace

Committed by Sunil Mushran on Aug 12, 2010

Like the tools, the checksum validate function now prints the values in hex.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>

10 Aug, 2010 6 commits

convert remaining ->clear_inode() to ->evict_inode() · b57922d9

Committed by Al Viro on Jun 07, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Make ->drop_inode() just return whether inode needs to be dropped · 45321ac5

Committed by Al Viro on Jun 07, 2010

... and let iput_final() do the actual eviction or retention
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

convert ocfs2 to ->evict_inode() · 066d92dc

Committed by Al Viro on Jun 08, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

check ATTR_SIZE constraints in inode_change_ok · 2c27c65e

Committed by Christoph Hellwig on Jun 04, 2010

Make sure we check the truncate constraints early on in ->setattr by adding
those checks to inode_change_ok.  Also clean up and document inode_change_ok
to make this obvious.

As a fallout we don't have to call inode_newsize_ok from simple_setsize and
simplify it down to a truncate_setsize which doesn't return an error.  This
simplifies a lot of setattr implementations and means we use truncate_setsize
almost everywhere.  Get rid of fat_setsize now that it's trivial and mark
ext2_setsize static to make the calling convention obvious.

Keep the inode_newsize_ok in vmtruncate for now as all callers need an
audit for its removal anyway.

Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
needs a deeper audit, but that is left for later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

remove inode_setattr · 1025774c

Committed by Christoph Hellwig on Jun 04, 2010

Replace inode_setattr with opencoded variants of it in all callers.  This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

 spufs: explicitly checks for ATTR_SIZE earlier
 btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
 ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed trimming down the opencoded variant.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

sort out blockdev_direct_IO variants · eafdc7d1

Committed by Christoph Hellwig on Jun 04, 2010

Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation for the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
parameters is shorter than the name suffix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

08 Aug, 2010 8 commits

O2net: Disallow o2net accept connection request from itself. · 415cf32c

Committed by Tristan Ye on Aug 02, 2010

Currently, o2net_accept_one() is allowed to accept a connection from
the listening node itself. Such a fake connection will not be successfully
established because no handshake is detected afterwards, and it later ends up
triggering the connecting worker in a loop.

We fix this by treating such a connection request as 'invalid',
since a node never requests a connection to itself
in an OCFS2 cluster.

The fix doesn't hurt a user's scan for o2net-listener; it always gets a
successful connection from userspace.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2/dlm: remove potential deadlock -V3 · b11f1f1a

Committed by Wengang Wang on Jul 30, 2010

When we need to take both dlm_domain_lock and dlm->spinlock, we should take
them in this order: dlm_domain_lock, then dlm->spinlock.

There are paths that disobey this order: dlm_run_purge_list calls
dlm_lockres_put() with dlm->spinlock held. dlm_lockres_put() calls dlm_put() at
the last ref, and dlm_put() takes dlm_domain_lock.

Fix:
Don't grab/put the dlm when initialising/releasing a lockres.
That grab is not required because we don't call dlm_unregister_domain()
based on refcount.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
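
The ordering rule in the message above can be illustrated with a tiny rank checker. This is a single-threaded user-space sketch, not the dlm code: the `sketch_*` names and the rank numbering are hypothetical, and real locks are replaced by bits in a mask so that only the ordering logic is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be taken before a higher one. */
enum sketch_rank {
    SKETCH_RANK_DOMAIN_LOCK = 0,  /* stands in for dlm_domain_lock */
    SKETCH_RANK_DLM_SPINLOCK = 1, /* stands in for dlm->spinlock */
};

/* Bit i is set while the lock of rank i is held. */
struct sketch_held {
    unsigned int mask;
};

/* Returns false instead of "deadlocking" when the order is violated. */
bool sketch_acquire(struct sketch_held *h, enum sketch_rank rank)
{
    /* Any held lock of equal or higher rank means we are out of order. */
    if ((h->mask >> rank) != 0)
        return false;
    h->mask |= 1u << rank;
    return true;
}

void sketch_release(struct sketch_held *h, enum sketch_rank rank)
{
    assert(h->mask & (1u << rank));
    h->mask &= ~(1u << rank);
}
```

Taking the domain lock and then the spinlock succeeds. Asking for the domain lock while the spinlock is already held is rejected, which corresponds to the path the commit removes.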

ocfs2/dlm: avoid incorrect bit set in refmap on recovery master · a524812b

Committed by Wengang Wang on Jul 30, 2010

In the following situation, there remains an incorrect bit in refmap on the
recovery master. Finally the recovery master will fail at purging the lockres
due to the incorrect bit in refmap.

1) node A has no interest on lockres A any longer, so it is purging it.
2) the owner of lockres A is node B, so node A is sending de-ref message
to node B.
3) at this time, node B crashed. node C becomes the recovery master. it recovers
lockres A(because the master is the dead node B).
4) node A migrated lockres A to node C with a refbit there.
5) node A failed to send de-ref message to node B because it crashed. The failure
is ignored. no other action is done for lockres A any more.

Normally, re-sending the deref message to the recovery master would fix it. However,
ignoring the failure of the deref to the original master and not recovering the lockres
to the recovery master has the same effect, and the latter is simpler.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>

Fix the nested PR lock calling issue in ACL · 845b6cf3

Committed by Jiaju Zhang on Jul 28, 2010

Hi,

Thanks a lot for all the review and comments so far;) I'd like to send
the improved (V4) version of this patch.

This patch fixes a deadlock in OCFS2 ACL. We found this bug in an OCFS2
and Samba integration scenario; the symptom is that several smbd
processes hang under heavy workload. Finally we found out it
is the nested PR lock calling that leads to this deadlock:

 node1        node2
              gr PR
                |
                V
 PR(EX)---> BAST:OCFS2_LOCK_BLOCKED
                |
                V
              rq PR
                |
                V
              wait=1

After requesting the 2nd PR lock, the process "smbd" went into D
state. It can only be woken up when the 1st PR lock's RO holder equals
zero. There should be an ocfs2_inode_unlock in the calling path later
on, which can decrement the RO holder. But since it has been in
uninterruptible sleep, the unlock function has no chance to be called.

The related stack trace is:
smbd          D ffff8800013d0600     0  9522   5608 0x00000000
 ffff88002ca7fb18 0000000000000282 ffff88002f964500 ffff88002ca7fa98
 ffff8800013d0600 ffff88002ca7fae0 ffff88002f964340 ffff88002f964340
 ffff88002ca7ffd8 ffff88002ca7ffd8 ffff88002f964340 ffff88002f964340
Call Trace:
[<ffffffff80350425>] schedule_timeout+0x175/0x210
[<ffffffff8034f580>] wait_for_common+0xf0/0x210
[<ffffffffa03e12b9>] __ocfs2_cluster_lock+0x3b9/0xa90 [ocfs2]
[<ffffffffa03e7665>] ocfs2_inode_lock_full_nested+0x255/0xdb0 [ocfs2]
[<ffffffffa0446019>] ocfs2_get_acl+0x69/0x120 [ocfs2]
[<ffffffffa0446368>] ocfs2_check_acl+0x28/0x80 [ocfs2]
[<ffffffff800e3507>] acl_permission_check+0x57/0xb0
[<ffffffff800e357d>] generic_permission+0x1d/0xc0
[<ffffffffa03eecea>] ocfs2_permission+0x10a/0x1d0 [ocfs2]
[<ffffffff800e3f65>] inode_permission+0x45/0x100
[<ffffffff800d86b3>] sys_chdir+0x53/0x90
[<ffffffff80007458>] system_call_fastpath+0x16/0x1b
[<00007f34a4ef6927>] 0x7f34a4ef6927

For details, please see:
https://bugzilla.novell.com/show_bug.cgi?id=614332 and
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1278
Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Count more refcount records in file system fragmentation. · 8a2e70c4

Committed by Tao Ma on Jul 22, 2010

The refcount record calculation in ocfs2_calc_refcount_meta_credits
is too optimistic: it assumes that we can always allocate contiguous clusters
and handle an already existing refcount rec as a whole. Actually,
because of file system fragmentation, we may have to split
a refcount record into 3 parts during the transaction. So consider
the worst case in record calculation.

Cc: stable@kernel.org
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2 fix o2dlm dlm run purgelist (rev 3) · 7beaf243

Committed by Srinivas Eeda on Jul 19, 2010

This patch fixes two problems in dlm_run_purgelist

1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge
the same lockres instead of trying the next lockres.

2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock
before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres.
spinlock is reacquired but in this window lockres can get reused. This leads
to BUG.

This patch modifies dlm_run_purgelist to skip a lockres if it's in use and purge
the next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the
lockres spinlock, protecting it from getting reused.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2/dlm: fix a dead lock · 6d98c3cc

Committed by Wengang Wang on Jul 16, 2010

When we have to take both dlm->master_lock and lockres->spinlock,
take them in order

lockres->spinlock and then dlm->master_lock.

The patch fixes a violation of the rule.
We can simply move taking dlm->master_lock to where we have dropped res->spinlock
since when we access res->state and free mle memory we don't need master_lock's
protection.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: do not overwrite error codes in ocfs2_init_acl · 6eda3dd3

Committed by Tiger Yang on Jul 16, 2010

Setting the acl while creating a new inode depends on
the error codes of posix_acl_create_masq. This patch fixes
an issue where those error codes were overwritten.
Reported-by: Pawel Zawora <pzawora@gmail.com>
Cc: <stable@kernel.org> [ .33, .34 ]
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

04 Aug, 2010 1 commit

jbd2: Change j_state_lock to be a rwlock_t · a931da6a

Committed by Theodore Ts'o on Aug 03, 2010

Lockstat reports have shown that j_state_lock is a major source of
lock contention, especially on systems with more than 4 CPU cores.  So
change it to be a read/write spinlock.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

27 Jul, 2010 2 commits

direct-io: move aio_complete into ->end_io · 552ef802

Committed by Christoph Hellwig on Jul 27, 2010

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been committed.  That means
the aio_complete call needs to be moved into the ->end_io callback so
that the filesystem can control exactly when to call it.

This makes a bit of a mess out of dio_complete and makes the ->end_io callback
prototype even more complicated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

direct-io: move aio_complete into ->end_io · 40e2e973

Committed by Christoph Hellwig on Jul 18, 2010

Filesystems with unwritten extent support must not complete an AIO request
until the transaction to convert the extent has been committed.  That means
the aio_complete call needs to be moved into the ->end_io callback so
that the filesystem can control exactly when to call it.

This makes a bit of a mess out of dio_complete and makes the ->end_io callback
prototype even more complicated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alex Elder <aelder@sgi.com>

20 Jul, 2010 1 commit

fs/ocfs2: Remove unnecessary casts of private_data · 33fa1d90

Committed by Joe Perches on Jul 12, 2010

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

17 Jul, 2010 1 commit

ocfs2: Silence gcc warning in ocfs2_write_zero_page(). · 5453258d

Committed by Joel Becker on Jul 16, 2010

ocfs2_write_zero_page() has a loop that won't ever be skipped, but gcc
doesn't know that.  Set ret=0 just to make gcc happy.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

16 Jul, 2010 3 commits

jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions · 13ceef09

Committed by Jan Kara on Jul 14, 2010

OCFS2 uses t_commit trigger to compute and store checksum of the just
committed blocks. When a buffer has b_frozen_data, checksum is computed
for it instead of b_data but this can result in an old checksum being
written to the filesystem in the following scenario:

1) transaction1 is opened
2) handle1 is opened
3) journal_access(handle1, bh)
    - This sets jh->b_transaction to transaction1
4) modify(bh)
5) journal_dirty(handle1, bh)
6) handle1 is closed
7) start committing transaction1, opening transaction2
8) handle2 is opened
9) journal_access(handle2, bh)
    - This copies off b_frozen_data to make it safe for transaction1 to commit.
      jh->b_next_transaction is set to transaction2.
10) jbd2_journal_write_metadata() checksums b_frozen_data
11) the journal correctly writes b_frozen_data to the disk journal
12) handle2 is closed
    - There was no dirty call for the bh on handle2, so it is never queued for
      any more journal operation
13) Checkpointing finally happens, and it just spools the bh via normal buffer
writeback.  This will write b_data, which was never triggered on and thus
contains a wrong (old) checksum.

This patch fixes the problem by calling the trigger at the moment data is
frozen for journal commit - i.e., either when b_frozen_data is created by
do_get_write_access or just before we write a buffer to the log if
b_frozen_data does not exist. We also rename the trigger to t_frozen as
that better describes when it is called.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node · a39953dd

Committed by Wengang Wang on Jul 14, 2010

For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set
before sending DLM_MIG_LOCKRES_MSG message to the target. We are using
dlm_migration_can_proceed() for that purpose. However, if the node is
down, dlm_migration_can_proceed() will also return "go ahead". In this
rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove
the BUG_ON() that trips over this condition.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Don't duplicate pages past i_size during CoW. · f5e27b6d

Committed by Tao Ma on Jul 14, 2010

During CoW, the pages after i_size don't contain valid data, so there's
no need to read and duplicate them.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

13 Jul, 2010 6 commits

ocfs2: tighten up strlen() checking · e372357b

Committed by Dan Carpenter on Jul 10, 2010

This function is only called from one place and it's like this:
	dlm_register_domain(conn->cc_name, dlm_key, &fs_version);

The "conn->cc_name" is 64 characters long.  If strlen(conn->cc_name)
were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because
strlen() doesn't count the terminating NUL character.

In fact, if you look at how O2NM_MAX_NAME_LEN is used, it mostly describes
64 character buffers.  The only exception is nd_name from struct
o2nm_node.

Anyway I looked into it and in this case the domain string comes from
osb->uuid_str in ocfs2_setup_osb_uuid().  That's 32 characters plus a NUL,
which easily fits into O2NM_MAX_NAME_LEN.  This patch doesn't change how
the code works, but I think it makes the code a little cleaner.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
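
The off-by-one the message describes can be shown with a small user-space check. This is a sketch rather than the patched kernel function: `SKETCH_MAX_NAME_LEN` mirrors the value 64 quoted above, and `sketch_domain_name_fits` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Size of the name buffer in bytes, including the terminating NUL. */
#define SKETCH_MAX_NAME_LEN 64

/*
 * A name fits only if its length is strictly less than the buffer size,
 * because strlen() does not count the terminating NUL. Testing with
 * "<=" or ">" would wrongly accept a 64-character name.
 */
bool sketch_domain_name_fits(const char *name)
{
    assert(name != NULL);
    return strlen(name) < SKETCH_MAX_NAME_LEN;
}
```

A 63-character name passes, a 64-character name is rejected, and a 32-character UUID string like the one mentioned in the message fits comfortably.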

ocfs2: Make xattr reflink work with new local alloc reservation. · 121a39bb

Committed by Tao Ma on Jul 09, 2010

The new reservation code in local alloc has added the limitation
that the caller should handle the case that the local alloc
doesn't give us enough contiguous clusters. This broke the old
xattr reflink code.

So this patch updates the xattr reflink code so that it can
handle the case that local alloc gives us one cluster at a time.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: make xattr extension work with new local alloc reservation. · a78f9f46

Committed by Tao Ma on Jul 09, 2010

The old ocfs2_xattr_extent_allocation is too optimistic about
the clusters we can get. If the file system is
too fragmented, ocfs2_add_clusters_in_btree will return
EAGAIN and we need to allocate clusters once again.

So this patch changes it to a while loop so that we can allocate
clusters until we reach clusters_to_add.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: stable@kernel.org
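
The retry loop described above might look roughly like the following. It is a user-space sketch under stated assumptions: `sketch_add_clusters` is a hypothetical stand-in for a single allocation attempt that may grant fewer clusters than requested and report -EAGAIN, and none of the names come from the ocfs2 sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical allocator state: how much one call may grant, and a call counter. */
struct sketch_alloc {
    unsigned int max_per_call;
    unsigned int calls;
};

/*
 * Hypothetical single allocation attempt. Grants at most max_per_call
 * clusters; returns 0 if the request was fully met, -EAGAIN otherwise.
 */
int sketch_add_clusters(struct sketch_alloc *a, unsigned int want,
                        unsigned int *got)
{
    assert(a != NULL && got != NULL);
    a->calls++;
    *got = want < a->max_per_call ? want : a->max_per_call;
    return *got == want ? 0 : -EAGAIN;
}

/* Keep allocating until clusters_to_add is reached or no progress is made. */
int sketch_extend(struct sketch_alloc *a, unsigned int clusters_to_add)
{
    while (clusters_to_add > 0) {
        unsigned int got = 0;
        int ret = sketch_add_clusters(a, clusters_to_add, &got);

        if (ret < 0 && ret != -EAGAIN)
            return ret;      /* a real error: give up */
        if (got == 0)
            return -ENOSPC;  /* no progress: avoid looping forever */
        clusters_to_add -= got;
    }
    return 0;
}
```

On a badly fragmented volume, modelled here by `max_per_call = 1`, a request for five clusters completes after five attempts instead of failing on the first -EAGAIN.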

ocfs2: Remove the redundant cpu_to_le64. · 0a463b74

Committed by Tao Ma on Jul 08, 2010

In ocfs2_block_group_alloc, we set c_blkno from bg->bg_blkno.
But bg->bg_blkno has already been converted to little endian
in ocfs2_block_group_fill. So remove the extra cpu_to_le64.
Reported-by: Marcos Matsunaga <Marcos.Matsunaga@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2/dlm: don't access beyond bitmap size · f471c9df

Committed by Wengang Wang on Jun 30, 2010

dlm->recovery_map is defined as
	unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)];

We should treat O2NM_MAX_NODES as the bitmap size in bits.
This patch fixes a bit operation that takes O2NM_MAX_NODES + 1 as the bitmap size.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
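
To make the size convention concrete, here is a user-space sketch of a bitmap whose search is bounded by its size in bits. The `SKETCH_*` macros and functions are illustrative re-implementations, not the kernel's bitops, and the value 255 for the node limit is an assumption made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed node limit for the illustration. */
#define SKETCH_MAX_NODES 255
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of longs needed to hold n bits, rounded up. */
#define SKETCH_BITS_TO_LONGS(n) \
    (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

void sketch_set_bit(unsigned long *map, size_t nbits, size_t bit)
{
    assert(bit < nbits);
    map[bit / SKETCH_BITS_PER_LONG] |= 1UL << (bit % SKETCH_BITS_PER_LONG);
}

/*
 * Returns the first set bit at or after start, or nbits if there is none.
 * The scan never looks at bit nbits or beyond, so nbits must be the true
 * bitmap size; passing nbits + 1 would examine a bit the map does not own.
 */
size_t sketch_find_next_bit(const unsigned long *map, size_t nbits,
                            size_t start)
{
    for (size_t bit = start; bit < nbits; bit++) {
        if (map[bit / SKETCH_BITS_PER_LONG] &
            (1UL << (bit % SKETCH_BITS_PER_LONG)))
            return bit;
    }
    return nbits;
}
```

Callers compare the result against the same size they passed in, so using one consistent value for both the declaration and the search keeps the "not found" check meaningful.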

ocfs2: No need to zero pages past i_size. · 693c241a

Committed by Joel Becker on Jul 02, 2010

When ocfs2 fills a hole, it does so by allocating clusters.  When a
cluster is larger than the write, ocfs2 must zero the portions of the
cluster outside of the write.  If the clustersize is smaller than a
pagecache page, this is handled by the normal pagecache mechanisms, but
when the clustersize is larger than a page, ocfs2's write code will zero
the pages adjacent to the write.  This makes sure the entire cluster is
zeroed correctly.

Currently ocfs2 behaves exactly the same when writing past i_size.
However, this means ocfs2 is writing zeroed pages for portions of a new
cluster that are beyond i_size.  The page writeback code isn't expecting
this.  It treats all pages past the one containing i_size as left behind
due to a previous truncate operation.

Thankfully, ocfs2 calculates the number of pages it will be working on
up front.  The rest of the write code merely honors the original
calculation.  We can simply trim the number of pages to only cover the
actual file data.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: stable@kernel.org

09 Jul, 2010 2 commits

ocfs2: Zero the tail cluster when extending past i_size. · 5693486b

Committed by Joel Becker on Jul 01, 2010

ocfs2's allocation unit is the cluster.  This can be larger than a block
or even a memory page.  This means that a file may have many blocks in
its last extent that are beyond the block containing i_size.  There also
may be more unwritten extents after that.

When ocfs2 grows a file, it zeros the entire cluster in order to ensure
future i_size growth will see cleared blocks.  Unfortunately,
block_write_full_page() drops the pages past i_size.  This means that
ocfs2 is actually leaking garbage data into the tail end of that last
cluster.  This is a bug.

We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect
when a write or truncate is past i_size.  They will use
ocfs2_zero_extend() to ensure the data is properly zeroed.

Older versions of ocfs2_zero_extend() simply zeroed every block between
i_size and the zeroing position.  This presumes three things:

1) There is allocation for all of these blocks.
2) The extents are not unwritten.
3) The extents are not refcounted.

(1) and (2) hold true for non-sparse filesystems, which used to be the
only users of ocfs2_zero_extend().  (3) is another bug.

Since we're now using ocfs2_zero_extend() for sparse filesystems as
well, we teach ocfs2_zero_extend() to check every extent between
i_size and the zeroing position.  If the extent is unwritten, it is
ignored.  If it is refcounted, it is CoWed.  Then it is zeroed.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: stable@kernel.org

ocfs2: When zero extending, do it by page. · a4bfb4cf

Committed by Joel Becker on Jul 06, 2010

ocfs2_zero_extend() does its zeroing block by block, but it calls a
function named ocfs2_write_zero_page().  Let's have
ocfs2_write_zero_page() handle the page level.  From
ocfs2_zero_extend()'s perspective, it is now page-at-a-time.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: stable@kernel.org

28 Jun, 2010 1 commit

ocfs2: update gfp/slab.h includes · 327f935a

Committed by Tejun Heo on Mar 30, 2010

Implicit slab.h inclusion via percpu.h is about to go away.  Make sure
gfp.h or slab.h is included as necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

17 Jun, 2010 1 commit

fix typos concerning "initiali[zs]e" · 421f91d2

Committed by Uwe Kleine-König on Jun 11, 2010

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

16 Jun, 2010 1 commit

ocfs2: Limit default local alloc size within bitmap range. · 1739da40

Committed by Tao Ma on Jun 09, 2010

In commit 6b82021b, we increased
our local alloc size and calculate how many megabytes we can
get according to group size and volume size.
But we also need to check the maximum number of bits a local alloc block
bitmap can have. With bs=512, cs=32K and a local volume of 160G,
it calculates 96MB while the maximum local alloc size is only
76M. So the bitmap will overflow and corrupt the system truncate
log file. See bug
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1262
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
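
The numbers in the message above can be checked with a little arithmetic. The sketch below is not the ocfs2 calculation: it assumes one bitmap bit per cluster, and the bitmap size of 304 bytes used in the usage note is only inferred from the 76M figure quoted above for bs=512 and cs=32K, not read from the ocfs2 headers. All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest local alloc window, in MB, that a bitmap of bitmap_bytes can
 * describe when each bit tracks one cluster of cluster_bytes.
 */
uint64_t sketch_max_local_alloc_mb(uint64_t bitmap_bytes,
                                   uint64_t cluster_bytes)
{
    assert(cluster_bytes > 0);
    uint64_t max_bits = bitmap_bytes * 8;
    return (max_bits * cluster_bytes) >> 20; /* bytes to MB */
}

/* Clamp a wanted default size (in MB) to what the bitmap can hold. */
uint64_t sketch_clamp_local_alloc_mb(uint64_t wanted_mb,
                                     uint64_t bitmap_bytes,
                                     uint64_t cluster_bytes)
{
    uint64_t max_mb = sketch_max_local_alloc_mb(bitmap_bytes, cluster_bytes);
    return wanted_mb < max_mb ? wanted_mb : max_mb;
}
```

Under the inferred 304-byte bitmap, 2432 bits of 32K clusters cover exactly 76 MB, so a computed default of 96 MB has to be clamped to 76 MB to stay inside the bitmap.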

openanolis / cloud-kernel: synced successfully more than 1 year ago