1. 24 May 2011 (9 commits)
    • Btrfs: kill BTRFS_I(inode)->block_group · d82a6f1d
      Committed by Josef Bacik
      Originally this was going to be used as a way to give hints to the
      allocator, but frankly we can get much better hints elsewhere, and it's
      not even used at all for anything useful.  In addition to being
      completely useless, when we initialize an inode we try to find a freeish
      block group to set as the inode's block group, and with a completely
      full 40GB fs this takes _forever_, so I imagine with, say, a 1TB fs it
      is just unbearable.  So just axe the thing altogether; we don't need it,
      and it saves us 8 bytes in the inode and 500 microseconds per inode
      lookup in my test case.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
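
      In outline, the change drops one u64 from struct btrfs_inode along with
      the block-group search at inode init. A minimal sketch, with the
      surrounding layout illustrative rather than verbatim:

          struct btrfs_inode {
                  /* ... existing fields ... */
                  /* u64 block_group;  -- removed: only ever written, never
                   * consulted as a real allocator hint, and finding a
                   * "freeish" group at init is what made creation crawl */
          };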
    • Btrfs: don't look at the extent buffer level 3 times in a row · 7e2355ba
      Committed by Josef Bacik
      We have a bit of debugging in btrfs_search_slot to make sure the level
      of the cow block is the same as the original block we were cow'ing.  I
      don't think I've ever seen it tripped, so kill it.  This saves us 2
      kmaps per level in our search.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
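
      The removed check looked roughly like this (illustrative, not the
      verbatim kernel code); each btrfs_header_level() call on an unmapped
      extent buffer costs a kmap/kunmap pair:

          if (cow && btrfs_header_level(cow) != btrfs_header_level(b))
                  WARN_ON(1);     /* never observed to trip; deleted */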
    • Btrfs: map the node block when looking for readahead targets · cb25c2ea
      Committed by Josef Bacik
      If we have particularly full nodes, we could call btrfs_node_blockptr up to 32
      times, which is 32 pairs of kmap/kunmap, which _sucks_.  So go ahead and map the
      extent buffer while we look for readahead targets.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
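
      Sketch of the idea; the helpers below are hypothetical and stand in for
      the era's extent-buffer mapping calls:

          map_extent_buffer_page(eb);                /* one kmap */
          for (i = 0; i < nritems; i++)
                  consider_readahead(read_blockptr_mapped(eb, i));
          unmap_extent_buffer_page(eb);              /* one kunmap */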
    • Btrfs: set range_start to the right start in count_range_bits · af60bed2
      Committed by Josef Bacik
      In count_range_bits we are adjusting total_bytes based on the range we are
      searching for, but we don't adjust the range start according to the range we are
      searching for, which makes for weird results.  For example, if the range
      
      [0-8192]
      
      is set DELALLOC, but I search for 4096-8192, I will get back 4096 for the number
      of bytes found, but the range_start will be 0, which makes it look like the
      range is [0-4096].  So instead set range_start = max(cur_start, state->start).
      This makes everything come out right.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
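
      The fix is one line in the found-state bookkeeping (sketch; the
      surrounding loop is elided):

          if (!found)
                  *start = max(cur_start, state->start);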
    • Btrfs: fix how we do space reservation for truncate · fcb80c2a
      Committed by Josef Bacik
      The ceph guys keep running into problems where we have space reserved in
      our orphan block rsv when freeing it up.  This is because they tend to do
      snapshots a lot, so their truncates tend to use a bunch of space, so when
      we go to do things like update the inode we have to steal reservation
      space in order to make the reservation happen.  This happens because
      truncate can use as much space as it freaking feels like, but we still
      have to hold space for removing the orphan item and updating the inode,
      which will definitely always happen.  So in order to fix this we need to
      split all of the reservation stuff up.  So with this patch we have
      
      1) The orphan block reserve which only holds the space for deleting our orphan
      item when everything is over.
      
      2) The truncate block reserve which gets allocated and used specifically for the
      space that the truncate will use on a per truncate basis.
      
      3) The transaction will always have 1 item's worth of data reserved so we can
      update the inode normally.
      
      Hopefully this will make the ceph problem go away.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
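
      A rough sketch of the per-truncate reserve from item 2; the call shape
      follows the btrfs block_rsv API of this era, but treat the details as
      assumptions:

          rsv = btrfs_alloc_block_rsv(root);
          if (!rsv)
                  return -ENOMEM;
          /* fill rsv with only what this truncate step needs; the orphan
           * reserve is left alone until the orphan item is deleted */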
    • Btrfs: kill trans_mutex · a4abeea4
      Committed by Josef Bacik
      We use trans_mutex for lots of things, here's a basic list
      
      1) To serialize trans_handles joining the currently running transaction
      2) To make sure that no new trans handles are started while we are committing
      3) To protect the dead_roots list and the transaction lists
      
      Really, serializing trans_handles joining is not too hard; it mostly
      comes down to safely acquiring a reference to the transaction.  So
      replace the trans_mutex with a trans_lock spinlock and use it to do the
      following
      
      1) Protect fs_info->running_transaction.  All trans handles have to do is check
      this, and then take a reference of the transaction and keep on going.
      2) Protect the fs_info->trans_list.  This doesn't get used too much;
      basically it just holds the current transactions, which will usually be
      at most the currently committing transaction and the currently running
      transaction.
      3) Protect the dead roots list.  This is only ever processed by splicing the
      list so this is relatively simple.
      4) Protect the fs_info->reloc_ctl stuff.  This is very lightweight and was using
      the trans_mutex before, so this is a pretty straightforward change.
      5) Protect fs_info->no_trans_join.  Because we don't hold the trans_lock over
      the entirety of the commit we need to have a way to block new people from
      creating a new transaction while we're doing our work.  So we set no_trans_join
      and in join_transaction we test to see if that is set, and if it is we do a
      wait_on_commit.
      6) Make the transaction use count atomic so we don't need to take locks to
      modify it when we're dropping references.
      7) Add a commit_lock to the transaction to make sure multiple people trying to
      commit the same transaction don't race and commit at the same time.
      8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
      trans.
      
      I have tested this with xfstests, but obviously it is a pretty hairy change so
      lots of testing is greatly appreciated.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
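
      A condensed sketch of the new join path under the spinlock; field names
      follow the list above and the retry helper is hypothetical:

          spin_lock(&fs_info->trans_lock);
          if (fs_info->no_trans_join) {
                  spin_unlock(&fs_info->trans_lock);
                  return wait_on_commit_then_retry();   /* hypothetical */
          }
          cur_trans = fs_info->running_transaction;
          if (cur_trans) {
                  atomic_inc(&cur_trans->use_count);    /* item 6 */
                  spin_unlock(&fs_info->trans_lock);
                  return cur_trans;
          }
          spin_unlock(&fs_info->trans_lock);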
    • Btrfs: if we've already started a trans handle, use that one · 2a1eb461
      Committed by Josef Bacik
      We currently track trans handles in current->journal_info, but we don't
      actually use it.  This patch fixes that.  It covers the case where we
      have multiple people starting transactions down the call chain: instead
      of allocating a new handle and all that, we just increase the use count
      of the current handle, save the old block_rsv, and return.  I tested
      this with xfstests and it worked out fine.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
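
      Sketch of the reuse path (illustrative):

          if (current->journal_info) {
                  struct btrfs_trans_handle *h = current->journal_info;

                  h->use_count++;
                  h->orig_rsv = h->block_rsv;   /* save the old block_rsv */
                  h->block_rsv = NULL;
                  return h;
          }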
    • Btrfs: take away the num_items argument from btrfs_join_transaction · 7a7eaa40
      Committed by Josef Bacik
      I keep forgetting that btrfs_join_transaction() just ignores the
      num_items argument, which leads me to sending pointless patches and
      looking stupid :).  So just kill the num_items argument from
      btrfs_join_transaction and btrfs_start_ioctl_transaction, since neither
      of them uses it.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
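
      The resulting prototypes, per the description (sketch):

          struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root);
          struct btrfs_trans_handle *btrfs_start_ioctl_transaction(struct btrfs_root *root);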
    • Btrfs: make sure to use the delalloc reserve when filling delalloc · 74b21075
      Committed by Josef Bacik
      In the prealloc filling code and the compressed code we don't set
      trans->block_rsv to the delalloc block reserve properly, which is going
      to make us use metadata from the wrong pool; this patch fixes that.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
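
      The gist of the fix, assuming the usual fs_info reserve naming of this
      era:

          trans->block_rsv = &root->fs_info->delalloc_block_rsv;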
  2. 18 May 2011 (4 commits)
    • configfs: Fix race between configfs_readdir() and configfs_d_iput() · 24307aa1
      Committed by Joel Becker
      configfs_readdir() will use the existing inode numbers of inodes in the
      dcache, but it makes them up for attribute files that aren't currently
      instantiated.  There is a race where a closing attribute file can be
      tearing down at the same time as configfs_readdir() is trying to get its
      inode number.
      
      We want to get the inode number of open attribute files, because they
      should match while instantiated.  We can't lock down the transition
      where dentry->d_inode is set to NULL, so we just check for NULL there.
      We can, however, ensure that an inode we find isn't iput() in
      configfs_d_iput() until after we've accessed it.
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
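
      Sketch of the NULL check described above; the fallback helper is
      hypothetical:

          struct inode *inode = NULL;

          if (dentry)
                  inode = dentry->d_inode;   /* may be NULL mid-teardown */
          ino = inode ? inode->i_ino : configfs_fake_ino(sd);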
    • configfs: Don't try to d_delete() negative dentries. · df7f9967
      Committed by Joel Becker
      When configfs is faking mkdir() on its subsystem or default group
      objects, it starts by adding a negative dentry.  It then tries to
      instantiate the group.  If that should fail, it must clean up after
      itself.
      
      I was using d_delete() here, but configfs_attach_group() promises to
      return an empty dentry on error.  d_delete() explodes with the empty
      dentry.  Let's try d_drop() instead.  The unhashing is what we want for
      our dentry.
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
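
      The error-path change in essence (sketch):

          /* was: d_delete(dentry) -- oopses on the empty dentry */
          d_drop(dentry);    /* just unhash it */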
    • cifs: fix cifsConvertToUCS() for the mapchars case · 11379b5e
      Committed by Jeff Layton
      As Metze pointed out, commit 84cdf74e broke mapchars option:
      
          Commit "cifs: fix unaligned accesses in cifsConvertToUCS"
          (84cdf74e) does multiple steps
          in just one commit (moving the function and changing it without
          testing).
      
          put_unaligned_le16(temp, &target[j]); is never called for any
          codepoint that goes via the 'default' switch statement. As a result
          we put just zero (or maybe uninitialized) bytes into the target
          buffer.
      
      His proposed patch looks correct, but doesn't apply to the current head
      of the tree. This patch should also fix it.
      
      Cc: <stable@kernel.org> # .38.x: 581ade4d: cifs: clean up various nits in unicode routines (try #2)
      Reported-by: Stefan Metzmacher <metze@samba.org>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
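
      Shape of the fix in the 'default' arm (sketch; the conversion of the
      source character into 'temp' is elided):

          default:
                  /* ... convert one source char into temp ... */
                  put_unaligned_le16(temp, &target[j]);  /* was missing */
                  break;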
    • cifs: add fallback in is_path_accessible for old servers · 221d1d79
      Committed by Jeff Layton
      The is_path_accessible check uses a QPathInfo call, which isn't
      supported by ancient win9x era servers. Fall back to an older
      SMBQueryInfo call if it fails with the magic error codes.
      
      Cc: stable@kernel.org
      Reported-and-Tested-by: Sandro Bonazzola <sandro.bonazzola@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
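
      Condensed shape of the fallback; the helper names are hypothetical, and
      the error codes shown are an assumption about which ones count as
      "magic":

          rc = query_path_info(xid, tcon, full_path);        /* QPathInfo */
          if (rc == -EOPNOTSUPP || rc == -EINVAL)
                  rc = query_path_info_legacy(xid, tcon, full_path);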
  3. 15 May 2011 (5 commits)
  4. 14 May 2011 (8 commits)
  5. 12 May 2011 (6 commits)
  6. 10 May 2011 (8 commits)
    • fuse: fix oops in revalidate when called with NULL nameidata · d2433905
      Committed by Miklos Szeredi
      Some cases (e.g. ecryptfs) can call ->d_revalidate with a NULL
      nameidata.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=34732
      
      Tyler Hicks pointed out that this bug was introduced by commit
      e7c0a167 "fuse: make fuse_dentry_revalidate() RCU aware"
      Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
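
      The defensive shape of the fix (sketch):

          if (nd && (nd->flags & LOOKUP_RCU))
                  return -ECHILD;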
    • nilfs2: fix infinite loop in nilfs_palloc_freev function · 349dbc36
      Committed by Ryusuke Konishi
      After commit 9954e7af ("nilfs2: add free
      entries count only if clear bit operation succeeded") was applied, a
      free routine of nilfs could fall into an infinite loop, outputting the
      same message endlessly:
      
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed
       nilfs_palloc_freev: entry number 29497 already freed ...
      
      That patch broke the routine so that a loop counter is never updated
      in an abnormal state.  This fixes the regression.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
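
      A minimal illustration of the bug class (not the nilfs code itself): if
      the index only advances on a successful clear, an already-freed entry
      spins forever.

          while (n < nitems) {
                  if (!test_and_clear_bit(n, bitmap))
                          pr_warn("entry %lu already freed\n", n);
                  n++;    /* the fix: advance unconditionally */
          }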
    • xfs: fix race condition in AIL push trigger · 7ac95657
      Committed by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One is caused by a
      race condition in determining whether there is a push in progress or
      not.
      
      The XFS_AIL_PUSHING_BIT is used to determine whether a push is
      currently in progress.  When the AIL push work completes, it checks
      whether the target changed and clears the PUSHING bit to allow a
      new push to be requeued. The race condition is as follows:
      
      	Thread 1		push work
      
      	smp_wmb()
      				smp_rmb()
      				check ailp->xa_target unchanged
      	update ailp->xa_target
      	test/set PUSHING bit
      	does not queue
      				clear PUSHING bit
      				does not requeue
      
      Now that the push target is updated, new attempts to push the AIL
      will not trigger as the push target will be the same, and hence
      despite trying to push the AIL we won't ever wake it again.
      
      The fix is to ensure that the AIL push work clears the PUSHING bit
      before it checks if the target is unchanged.
      
      As a result, both push triggers operate on the same test/set bit
      criteria, so even if we race in the push work and miss the target
      update, the thread requesting the push will still set the PUSHING
      bit and queue the push work to occur. For safety's sake, the same
      queue check is done if the push work detects the target change,
      though only one of the two will queue new work due to the use
      of test_and_set_bit() checks.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit e4d3c4a4)
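
      Sketch of the corrected completion path; names follow the description
      above, and the barrier choice is an assumption:

          clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
          smp_rmb();
          if (XFS_LSN_CMP(ailp->xa_target, target) != 0 &&
              !test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))
                  queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, 0);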
    • xfs: make AIL target updates and compares 32bit safe. · fe0da767
      Committed by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      noticed was that updates of the push target are not 32 bit safe as
      the target is a 64 bit value.
      
      We cannot copy a 64 bit LSN without the possibility of corrupting
      the result when racing with another updating thread. We have a
      function to do this update safely without needing to care about
      32/64 bit issues - xfs_trans_ail_copy_lsn() - so use that when
      updating the AIL push target.
      
      Also move the reading of the target in the push work inside the AIL
      lock, and use XFS_LSN_CMP() for the unlocked comparison during work
      termination to close read holes as well.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit fd5670f2)
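
      The helper named above, used as described (sketch):

          /* racy 64-bit store replaced by the safe copy */
          xfs_trans_ail_copy_lsn(ailp, &ailp->xa_target, &threshold_lsn);

          spin_lock(&ailp->xa_lock);
          target = ailp->xa_target;    /* read under the AIL lock */
          spin_unlock(&ailp->xa_lock);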
    • xfs: always push the AIL to the target · 50e86686
      Committed by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. One of the problems
      discovered is a target mismatch between the item pushing loop and
      the target itself.
      
      The push trigger checks for the target increasing (i.e. new target >
      current) while the push loop only pushes items that have an LSN <
      current. As a result, we can get the situation where the push target
      is X, the items at the tail of the AIL have LSN X and they don't get
      pushed. The push work then completes thinking it is done, and cannot
      be restarted until the push target increases to >= X + 1. If the
      push target then never increases (because the tail is not moving),
      then we never run the push work again and we stall.
      
      Fix it by making sure log items with an LSN that matches the target
      exactly are pushed during the loop.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit cb64026b)
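
      The comparison change in essence (sketch): items whose LSN equals the
      target are pushed too.

          while (lip && XFS_LSN_CMP(lip->li_lsn, target) <= 0) {
                  /* push lip and walk to the next AIL item */
          }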
    • xfs: exit AIL push work correctly when AIL is empty · 9e7004e7
      Committed by Dave Chinner
      The recent conversion of the xfsaild functionality to a work queue
      introduced a hard-to-hit log space grant hang. The main cause is a
      regression where a work exit path fails to clear the PUSHING state
      and recheck the target correctly.
      
      Make both exit paths do the same PUSHING bit clearing and target
      checking when the "no more work to be done" condition is hit.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit ea35a200)
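
      Both exits can share one epilogue so neither forgets the PUSHING/target
      handling; a sketch of the shape, with the label illustrative:

          out_done:
                  clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
                  /* recheck ailp->xa_target and requeue if it moved,
                   * exactly like the normal completion path above */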
    • xfs: ensure reclaim cursor is reset correctly at end of AG · 228d62dd
      Committed by Dave Chinner
      On a 32 bit highmem PowerPC machine, the XFS inode cache was growing
      without bound and exhausting low memory causing the OOM killer to be
      triggered. After some effort, the problem was reproduced on a 32 bit
      x86 highmem machine.
      
      The problem is that the per-ag inode reclaim index cursor was not
      getting reset to the start of the AG if the radix tree tag lookup
      found no more reclaimable inodes. Hence every further reclaim
      attempt started at the same index beyond where any reclaimable
      inodes lay, and no further background reclaim ever occurred from the
      AG.
      
      Without background inode reclaim the VM-driven cache shrinker
      simply cannot keep up with cache growth, and OOM is the result.
      
      While the change that exposed the problem was the conversion of the
      inode reclaim to use work queues for background reclaim, it was not
      the cause of the bug. The bug was introduced when the cursor code
      was added, just waiting for some weird configuration to strike....
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Christian Kujau <lists@nerdbynature.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      
      (cherry picked from commit b2232219)
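
      Sketch of the fix; the cursor field name follows this era's code, but
      treat it as an assumption:

          if (!nr_found) {
                  pag->pag_ici_reclaim_cursor = 0;   /* wrap to AG start */
                  break;
          }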
    • Don't lock guardpage if the stack is growing up · a09a79f6
      Committed by Mikulas Patocka
      The Linux kernel excludes the guard page when performing mlock on a VMA
      with a down-growing stack. However, some architectures have an
      up-growing stack, and locking of the guard page should be excluded in
      that case too.
      
      This patch fixes lvm2 on PA-RISC (and possibly other architectures with
      an up-growing stack). lvm2 calculates the number of used pages when
      locking and when unlocking, and reports an internal error if the
      numbers mismatch.
      
      [ Patch changed fairly extensively to also fix /proc/<pid>/maps for the
        grows-up case, and to move things around a bit to clean it all up and
        share the infrastructure with the /proc bits.
      
        Tested on ia64 that has both grow-up and grow-down segments  - Linus ]
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Tested-by: Tony Luck <tony.luck@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
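
      The symmetric treatment in essence (sketch; the real patch also adds
      shared helpers for the /proc side):

          if (vma->vm_flags & VM_GROWSDOWN)
                  start += PAGE_SIZE;    /* skip guard page at the bottom */
          else if (vma->vm_flags & VM_GROWSUP)
                  end -= PAGE_SIZE;      /* skip guard page at the top */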