Commits · 13ae246db4a02971ef4f557af1f6d3e21d64b710 · openeuler / Kernel

26 Feb, 2012 · 1 commit

autofs: work around unhappy compat problem on x86-64 · a32744d4

Committed by Ian Kent on Feb 22, 2012

When the autofs protocol version 5 packet type was added in commit
5c0a32fc ("autofs4: add new packet type for v5 communications"), it
obviously tried quite hard to be word-size agnostic, and uses explicitly
sized fields that are all correctly aligned.

However, with the final "char name[NAME_MAX+1]" array at the end, the
actual size of the structure ends up being not very well defined:
because the struct isn't marked 'packed', doing a "sizeof()" on it will
align the size of the struct up to the biggest alignment of the members
it has.

And despite all the members being the same, the alignment of them is
different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte
alignment on x86-64. And while 'NAME_MAX+1' ends up being a nice round
number (256), the name[] array starts out 4-byte aligned.

End result: the "packed" size of the structure is 300 bytes: 4-byte, but
not 8-byte aligned.

As a result, despite all the fields being in the same place on all
architectures, sizeof() will round up that size to 304 bytes on
architectures that have 8-byte alignment for u64.

Note that this is *not* a problem for 32-bit compat mode on POWER, since
there __u64 is 8-byte aligned even in 32-bit mode. But on x86, 32-bit
and 64-bit alignment is different for 64-bit entities, and as a result
the structure that has exactly the same layout has different sizes.

So on x86-64, but no other architecture, we will just subtract 4 from
the size of the structure when running in a compat task. That way we
will write the properly sized packet that user mode expects.

Not pretty. Sadly, this very subtle, and unnecessary, size difference
has been encoded in user space that wants to read packets of *exactly*
the right size, and will refuse to touch anything else.
Reported-and-tested-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a32744d4
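
For illustration, a user-space sketch of the size arithmetic above. The struct is modeled on the v5 packet fields (two header ints, a 32-bit wait-queue token, `dev`, a 64-bit `ino`, five more 32-bit fields, then `name`); its field names, the assumed 32-bit token, and both helper functions are hypothetical, not the kernel's definitions. The fields end at byte 300 on every architecture, but `sizeof()` rounds up to 304 wherever `uint64_t` is 8-byte aligned, which the helper compensates for on x86-64 only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef NAME_MAX
#define NAME_MAX 255
#endif

/* Hypothetical mirror of the v5 packet: explicitly sized, naturally placed fields. */
struct v5_packet {
	int32_t  proto_version;     /* header */
	int32_t  type;              /* header */
	uint32_t wait_queue_token;  /* assumed 32-bit, as on x86 */
	uint32_t dev;
	uint64_t ino;               /* 4-byte aligned on x86-32, 8-byte on x86-64 */
	uint32_t uid;
	uint32_t gid;
	uint32_t pid;
	uint32_t tgid;
	uint32_t len;
	char     name[NAME_MAX + 1];
};

/* Bytes actually covered by the fields, i.e. the size without tail padding. */
static size_t v5_packet_payload_size(void)
{
	return offsetof(struct v5_packet, name) + sizeof(((struct v5_packet *)0)->name);
}

/* Sketch of the workaround: a 32-bit task on x86-64 expects the x86-32 size. */
static size_t v5_packet_size_for_task(bool compat_task)
{
	size_t sz = sizeof(struct v5_packet);
#if defined(__x86_64__)
	if (compat_task)
		sz -= 4;  /* drop the tail padding that x86-32 never had */
#else
	(void)compat_task;
#endif
	return sz;
}
```

The subtraction is deliberately tied to x86-64, mirroring the message: on POWER the 64-bit alignment is the same in both modes, so no adjustment is needed there.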

25 Feb, 2012 · 2 commits

epoll: ep_unregister_pollwait() can use the freed pwq->whead · 971316f0

Committed by Oleg Nesterov on Feb 24, 2012

signalfd_cleanup() ensures that ->signalfd_wqh is not used, but
this is not enough. eppoll_entry->whead still points to the memory
we are going to free, ep_unregister_pollwait()->remove_wait_queue()
is obviously unsafe.

Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL,
change ep_unregister_pollwait() to check pwq->whead != NULL under
rcu_read_lock() before remove_wait_queue(). We add the new helper,
ep_remove_wait_queue(), for this.

This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because
->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand.
ep_unregister_pollwait()->remove_wait_queue() can play with already
freed and potentially reused ->sighand, but this is fine. This memory
must have the valid ->signalfd_wqh until rcu_read_unlock().
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

971316f0
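
A single-threaded model of the fix, for illustration only. `struct poll_entry`, `entry_on_pollfree()` and `entry_unregister()` are hypothetical stand-ins for `eppoll_entry`, `ep_poll_callback(POLLFREE)` and `ep_remove_wait_queue()`. The RCU read-side section and the SLAB_DESTROY_BY_RCU guarantee that make the real NULL check race-free are only mentioned in comments, not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }

static void list_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct wait_queue { struct list_node head; };  /* hypothetical wait queue head */

struct poll_entry {                             /* hypothetical eppoll_entry */
	struct list_node link;
	struct wait_queue *whead;                   /* NULL once the queue is going away */
};

/* POLLFREE path: the queue owner is about to free the queue. */
static void entry_on_pollfree(struct poll_entry *pe)
{
	pe->whead = NULL;
	list_del_init(&pe->link);
}

/* Unregister path: only touch the queue if it still exists. */
static void entry_unregister(struct poll_entry *pe)
{
	/* In the kernel this load and the removal happen under rcu_read_lock(). */
	struct wait_queue *whead = pe->whead;

	if (whead != NULL)
		list_del_init(&pe->link);
}
```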

epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() · d80e731e

Committed by Oleg Nesterov on Feb 24, 2012

This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This makes this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it is used with
epoll.

The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.

In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.

Note:

	- we do not have wake_up_all_poll() but wake_up_poll()
	  is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

	- signalfd_cleanup() uses POLLHUP along with POLLFREE,
	  we need a couple of simple changes in eventpoll.c to
	  make sure it can't be "lost".
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d80e731e

24 Feb, 2012 · 2 commits

Btrfs: fix compiler warnings on 32 bit systems · e77266e4

Committed by Chris Mason on Feb 24, 2012

The enospc tracing code added some interesting uses of
u64 pointer casts.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e77266e4

Restore direct_io / truncate locking API · 37fbf4bf

Committed by Anton Altaparmakov on Feb 23, 2012

With kernel 3.1, Christoph removed i_alloc_sem and replaced it with
calls (namely inode_dio_wait() and inode_dio_done()) which are
EXPORT_SYMBOL_GPL() thus they cannot be used by non-GPL file systems and
further inode_dio_wait() was pushed from notify_change() into the file
system ->setattr() method but no non-GPL file system can make this call.

That means non-GPL file systems cannot exist any more unless they do not
use any VFS functionality related to reading/writing as far as I can
tell or at least as long as they want to implement direct i/o.

Both Linus and Al (and others) have said on LKML that this breakage of
the VFS API should not have happened and that the change was simply
missed as it was not documented in the change logs of the patches that
did those changes.

This patch changes the two function exports in question to be
EXPORT_SYMBOL() thus restoring the VFS API as it used to be - accessible
for all modules.

Christoph, who introduced the two functions and exported them GPL-only
is CC-ed on this patch to give him the opportunity to object to the
symbols being changed in this manner if he did indeed intend them to be
GPL-only and does not want them to become available to all modules.
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

37fbf4bf

23 Feb, 2012 · 5 commits

Btrfs: increase the global block reserve estimates · 5500cdbe

Committed by Liu Bo on Feb 23, 2012

When doing IO with large amounts of data fragmentation, the global block
reserve calculations are too low. This increases them to avoid
ENOSPC crashes.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

5500cdbe

Btrfs: clear the extent uptodate bits during parent transid failures · 50653190

Committed by Chris Mason on Feb 22, 2012

If btrfs reads a block and finds a parent transid mismatch, it clears
the uptodate flags on the extent buffer, and the pages inside it.  But
we only clear the uptodate bits in the state tree if the block straddles
more than one page.

This is from an old optimization to reduce contention on the extent
state tree.  But it is buggy because the code that retries a read from
a different copy of the block is going to find the uptodate state bits
set and skip the IO.

The end result of the bug is that we'll never actually read the good
copy (if there is one).

The fix here is to always clear the uptodate state bits, which is safe
because this code is only called when the parent transid fails.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

50653190

Btrfs: add extra sanity checks on the path names in btrfs_mksubvol · 16780cab

Committed by Chris Mason on Feb 20, 2012

Signed-off-by: Chris Mason <chris.mason@oracle.com>

16780cab

Btrfs: make sure we update latest_bdev · a6b0d5c8

Committed by Chris Mason on Feb 20, 2012

When we are setting up the mount, we close all the
devices that were not actually part of the metadata we found.

But, we don't make sure that one of those devices wasn't
fs_devices->latest_bdev, which means we can do a use after free
on the one we closed.

This updates latest_bdev as it goes.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a6b0d5c8

Btrfs: improve error handling for btrfs_insert_dir_item callers · fe66a05a

Committed by Chris Mason on Feb 20, 2012

This allows us to gracefully continue if we aren't able to insert
directory items, both for normal files/dirs and snapshots.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

fe66a05a

22 Feb, 2012 · 3 commits

sys_poll: fix incorrect type for 'timeout' parameter · faf30900

Committed by Linus Torvalds on Feb 21, 2012

The 'poll()' system call timeout parameter is supposed to be 'int', not
'long'.

Now, the reason this matters is that right now 32-bit compat mode is
broken on at least x86-64, because the 32-bit code just calls
'sys_poll()' directly on x86-64, and the 32-bit argument will have been
zero-extended, turning a signed 'int' into a large unsigned 'long'
value.

We could just introduce a 'compat_sys_poll()' function for this, and
that may eventually be what we have to do, but since the actual standard
poll() semantics is *supposed* to be 'int', and since at least on x86-64
glibc sign-extends the argument before invoking the system call (so
nobody can actually use a 64-bit timeout value in user space _anyway_,
even in 64-bit binaries), the simpler solution would seem to be to just
fix the definition of the system call to match what it should have been
from the very start.

If it turns out that somebody somehow circumvents the user-level libc
64-bit sign extension and actually uses a large unsigned 64-bit timeout
despite that not being how poll() is supposed to work, we will need to
do the compat_sys_poll() approach.
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

faf30900
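
A user-space illustration of the zero-extension problem, using `int64_t` to stand for a 64-bit `long`. The helper names are hypothetical. A timeout of -1 ("wait forever") survives sign-extension, but zero-extension turns it into 4294967295, which a `long` parameter reads as a timeout of roughly 49.7 days in milliseconds.

```c
#include <assert.h>
#include <stdint.h>

/* What a 64-bit 'long' parameter sees if the 32-bit register was zero-extended. */
static int64_t zero_extend_arg(int32_t user_timeout)
{
	return (int64_t)(uint32_t)user_timeout;
}

/* What an 'int' parameter (or libc's sign extension) preserves. */
static int64_t sign_extend_arg(int32_t user_timeout)
{
	return (int64_t)user_timeout;
}
```

Non-negative timeouts are unaffected either way, which is why the bug only shows up for negative values such as -1.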

xfs: make inode quota check more general · c922bbc8

Committed by Mitsuo Hayasaka on Feb 06, 2012

The xfs checks quota when reserving disk blocks and inodes. In the block
reservation, it checks if the total number of blocks including current
usage and new reservation exceed quota. In the inode reservation,
it checks using the total number of inodes including only current usage
without new reservation. However, this inode quota check works well
since the caller of xfs_trans_dquot() always sets the argument of the
number of new inode reservation to 1 or 0 and inode is reserved one by
one in current xfs.

To make it more general, this patch changes it to the same way as the
block quota check.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

c922bbc8
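
A hypothetical reduction of the inode hard-limit check, ignoring soft limits and timers. It assumes, as the message implies, that the old test compared only current usage (`icount >= hardlimit`); the function names are illustrative. Both forms agree when exactly one inode is reserved, and only the new form catches a multi-inode reservation that would overshoot the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old style (assumed): compares current usage only; fine while ninos is 0 or 1. */
static bool inode_check_old(uint64_t icount, uint64_t ninos, uint64_t hardlimit)
{
	if (ninos == 0)
		return false;  /* nothing reserved, nothing to check */
	return hardlimit != 0 && icount >= hardlimit;
}

/* General form: usage plus the new reservation, like the block check. */
static bool inode_check_new(uint64_t icount, uint64_t ninos, uint64_t hardlimit)
{
	if (ninos == 0)
		return false;
	return hardlimit != 0 && icount + ninos > hardlimit;
}
```

Here `true` means "reject the reservation", and a limit of 0 means no limit is set.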

xfs: change available ranges of softlimit and hardlimit in quota check · 20f12d8a

Committed by Mitsuo Hayasaka on Feb 06, 2012

In general, quota allows us to use disk blocks and inodes up to each
limit, that is, they are available if they don't exceed their limitations.
Current xfs sets the available ranges to below the limits, except in the
disk inode quota check. So, this patch changes the ranges to go up to, but
not beyond, the limits.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

20f12d8a
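
A sketch of the comparison change, assuming the old block check rejected a request once usage plus the request reached the limit (`>=`) and the new one rejects only when it would go beyond the limit (`>`). Function names are hypothetical; a limit of 0 means "no limit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Before (assumed): reaching the limit exactly is already rejected. */
static bool block_check_old(uint64_t total, uint64_t limit)
{
	return limit != 0 && total >= limit;
}

/* After: usage may reach the limit; only exceeding it is rejected. */
static bool block_check_new(uint64_t total, uint64_t limit)
{
	return limit != 0 && total > limit;
}
```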

21 Feb, 2012 · 1 commit

Btrfs: be less strict on finding next node in clear_extent_bit · 692e5759

Committed by Liu Bo on Feb 16, 2012

In clear_extent_bit, it is enough that next node is adjacent in tree level.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

692e5759

18 Feb, 2012 · 2 commits

NFSv4: fix server_scope memory leak · abe9a6d5

Committed by Weston Andros Adamson on Feb 16, 2012

server_scope would never be freed if nfs4_check_cl_exchange_flags() returned
non-zero.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

abe9a6d5
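
A hypothetical sketch of the general fix pattern for this kind of leak: every error return after the allocation goes through a label that frees it. `struct server_scope`, `check_flags()` (a stand-in for nfs4_check_cl_exchange_flags()) and the negative error values are illustrative, not the NFS client's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct server_scope { size_t len; char data[64]; };  /* hypothetical */
struct client { struct server_scope *server_scope; };

/* Stand-in flag check: bit 0 set means "invalid", returned as -22 (-EINVAL). */
static int check_flags(unsigned int flags)
{
	return (flags & 0x1u) ? -22 : 0;
}

static int handle_exchange_id(struct client *clp, unsigned int flags, const char *scope)
{
	struct server_scope *s = calloc(1, sizeof(*s));
	int status;

	if (s == NULL)
		return -12;  /* -ENOMEM stand-in */
	strncpy(s->data, scope, sizeof(s->data) - 1);
	s->len = strlen(s->data);

	status = check_flags(flags);
	if (status != 0)
		goto out_free;  /* returning directly here would leak 's' */

	free(clp->server_scope);  /* replace any previous scope */
	clp->server_scope = s;
	return 0;

out_free:
	free(s);
	return status;
}
```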

NFSv4.1: Fix a NFSv4.1 session initialisation regression · f86f36a6

Committed by Trond Myklebust on Feb 14, 2012

Commit aacd5537 (NFSv4.1: cleanup init and reset of session slot tables)
introduces a regression in the session initialisation code. New tables
now find their sequence ids initialised to 0, rather than the mandated
value of 1 (see RFC5661).

Fix the problem by merging nfs4_reset_slot_table() and nfs4_init_slot_table().
Since the tbl->max_slots is initialised to 0, the test in
nfs4_reset_slot_table for max_reqs != tbl->max_slots will automatically
pass for an empty table.
Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f86f36a6
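
A hypothetical sketch of a merged init/reset routine. Because an empty table has `max_slots == 0`, any non-zero request differs from it, so the same function allocates a fresh table and then sets every sequence id to the initial value (1 for NFSv4.1 sessions, per RFC 5661). The struct layout and names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct slot { uint32_t seq_nr; };
struct slot_table { struct slot *slots; uint32_t max_slots; };

/* Hypothetical merged init/reset: (re)allocate on size change, then reset ids. */
static int slot_table_reset(struct slot_table *tbl, uint32_t max_reqs, uint32_t ivalue)
{
	if (max_reqs == 0)
		return -1;  /* sketch: require at least one slot */
	if (max_reqs != tbl->max_slots) {
		struct slot *s = calloc(max_reqs, sizeof(*s));

		if (s == NULL)
			return -1;
		free(tbl->slots);
		tbl->slots = s;
		tbl->max_slots = max_reqs;
	}
	for (uint32_t i = 0; i < tbl->max_slots; i++)
		tbl->slots[i].seq_nr = ivalue;
	return 0;
}
```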

17 Feb, 2012 · 9 commits

ecryptfs: remove the second argument of k[un]map_atomic() · 465c9343

Committed by Cong Wang on Feb 10, 2012

Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>

465c9343

eCryptfs: Copy up lower inode attrs after setting lower xattr · 545d6809

Committed by Tyler Hicks on Feb 07, 2012

After passing through a ->setxattr() call, eCryptfs needs to copy the
inode attributes from the lower inode to the eCryptfs inode, as they
may have changed in the lower filesystem's ->setxattr() path.

One example is if an extended attribute containing a POSIX Access
Control List is being set. The new ACL may cause the lower filesystem to
modify the mode of the lower inode and the eCryptfs inode would need to
be updated to reflect the new mode.

https://launchpad.net/bugs/926292
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Sebastien Bacher <seb128@ubuntu.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: <stable@vger.kernel.org>

545d6809

eCryptfs: Improve statfs reporting · 4a26620d

Committed by Tyler Hicks on Nov 05, 2011

statfs() calls on eCryptfs files returned the wrong filesystem type and,
when using filename encryption, the wrong maximum filename length.

If mount-wide filename encryption is enabled, the cipher block size and
the lower filesystem's max filename length will determine the max
eCryptfs filename length. Pre-tested, known good lengths are used when
the lower filesystem's namelen is 255 and a cipher with 8 or 16 byte
block sizes is used. In other, less common cases, we fall back to a safe
rounded-down estimate when determining the eCryptfs namelen.

https://launchpad.net/bugs/885744
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: John Johansen <john.johansen@canonical.com>

4a26620d

Btrfs: fix a bug on overcommit stuff · d9b0218f

Committed by Liu Bo on Feb 16, 2012

When overcommitting, we should check the sum of pinned space and
bytes for delayed item.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

d9b0218f

Btrfs: kick out redundant stuff in convert_extent_bit · 9d47c767

Committed by Liu Bo on Feb 16, 2012

clear_state_bit will do merge_state for us, so kick out the redundant one.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

9d47c767

Btrfs: skip states when they does not contain bits to clear · 0449314a

Committed by Liu Bo on Feb 16, 2012

Clearing a range's bits is different from setting them, since we don't
need to touch states that do not contain the bits we want to clear.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

0449314a

Btrfs: check return value of lookup_extent_mapping() correctly · 285190d9

Committed by Tsutomu Itoh on Feb 16, 2012

This patch corrects error checking of lookup_extent_mapping().
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

285190d9

Btrfs: fix deadlock on page lock when doing auto-defragment · 600a45e1

Committed by Miao Xie on Feb 16, 2012

When I ran xfstests circularly on an auto-defragment btrfs, the deadlock
happened.

Steps to reproduce:
[tty0]
 # export MOUNT_OPTIONS="-o autodefrag"
 # export TEST_DEV=<partition1>
 # export TEST_DIR=<mountpoint1>
 # export SCRATCH_DEV=<partition2>
 # export SCRATCH_MNT=<mountpoint2>
 # while [ 1 ]
 > do
 > ./check 091 127 263
 > sleep 1
 > done
[tty1]
 # while [ 1 ]
 > do
 > echo 3 > /proc/sys/vm/drop_caches
 > done

Several hours later, the test processes hang, and the deadlock occurs
on the page lock.

The reason is that:
  Auto defrag task		Flush thread			Test task
				btrfs_writepages()
				  add ordered extent
				  (including page 1, 2)
				  set page 1 writeback
				  set page 2 writeback
				endio_fn()
				  end page 2 writeback
								release page 2
lock page 1
alloc and lock page 2
page 2 is not uptodate
  btrfs_readpage()
    start ordered extent()
    btrfs_writepages()
      try  to lock page 1

so deadlock happens.

Fix this bug by unlocking the page which is in writeback, and re-locking it
after the writeback end.
Signed-off-by: Miao Xie <miax@cn.fujitsu.com>

600a45e1

Btrfs: fix return value check of extent_io_ops · 013bd4c3

Committed by Tsutomu Itoh on Feb 16, 2012

This patch adds the check on the return value of extent_io_ops.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

013bd4c3

16 Feb, 2012 · 1 commit

btrfs: honor umask when creating subvol root · 12fc9d09

Committed by Florian Albrechtskirchinger on Feb 10, 2012

Set the subvol root inode permissions based on the current umask.

  12fc9d09

15 Feb, 2012 · 9 commits

btrfs: silence warning in raid array setup · 8a334426

Committed by David Sterba on Oct 07, 2011

Raid array setup code creates an extent buffer in an usual way. When the
PAGE_CACHE_SIZE is > super block size, the extent pages are not marked
up-to-date, which triggers a WARN_ON in the following
write_extent_buffer call. Add an explicit up-to-date call to silence the
warning.
Signed-off-by: David Sterba <dsterba@suse.cz>

8a334426

btrfs: fix structs where bitfields and spinlock/atomic share 8B word · c08782da

Committed by David Sterba on Jan 26, 2012

On ia64, powerpc64 and sparc64 the bitfield is modified through a RMW cycle and current
gcc rewrites the adjacent 4B word, which in case of a spinlock or atomic has
disastrous effect.

https://lkml.org/lkml/2012/2/1/220
Signed-off-by: David Sterba <dsterba@suse.cz>

c08782da
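
A layout sketch of the hazard, using hypothetical structs; `int32_t lock` stands in for a 4-byte spinlock or atomic. In `shared_word` the bitfield lands in the same aligned 8-byte word as the lock, so an 8-byte read-modify-write of the bits also rewrites the lock. `separate_words` shows one way to give each its own word. The checks only inspect the layout; they do not reproduce the miscompilation, and this is not necessarily how the patch rearranged the btrfs structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hazard: a 4-byte lock and a bitfield packed into one aligned 8-byte word. */
struct shared_word {
	_Alignas(8) int32_t lock;  /* stand-in for spinlock_t / atomic_t */
	unsigned int in_use : 1;
	unsigned int dirty  : 1;
};

/* One workaround: keep the lock alone in its own 8-byte word. */
struct separate_words {
	_Alignas(8) int32_t lock;
	_Alignas(8) unsigned int flags;  /* plain bits, updated with masks */
};

/* Index of the 8-byte word that a byte offset falls into. */
static size_t word8_index(size_t byte_offset)
{
	return byte_offset / 8;
}
```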

btrfs: delalloc for page dirtied out-of-band in fixup worker · 87826df0

Committed by Jeff Mahoney on Feb 15, 2012

 We encountered an issue that was easily observable on s/390 systems but
 could really happen anywhere. The timing just seemed to hit reliably
 on s/390 with limited memory.

 The gist is that when an unexpected set_page_dirty() happened, we'd
 run into the BUG() in btrfs_writepage_fixup_worker since it wasn't
 properly set up for delalloc.

 This patch does the following:
 - Performs the missing delalloc in the fixup worker
 - Allow the start hook to return -EBUSY which informs __extent_writepage
   that it should mark the page skipped and not to redirty it. This is
   required since the fixup worker can fail with -ENOSPC and the page
   will have already been redirtied. That causes an Oops in
   drop_outstanding_extents later. Retrying the fixup worker could
   lead to an infinite loop. Deferring the page redirty also saves us
   some cycles since the page would be stuck in a resubmit-redirty loop
   until the fixup worker completes. It's not harmful, just wasteful.
 - If the fixup worker fails, we mark the page and mapping as errored,
   and end the writeback, similar to what we would do had the page
   actually been submitted to writeback.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>

87826df0

Btrfs: fix memory leak in load_free_space_cache() · a7e221e9

Committed by Tsutomu Itoh on Feb 14, 2012

load_free_space_cache() has forgotten to free path.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

a7e221e9

btrfs: don't check DUP chunks twice · 859acaf1

Committed by Arne Jansen on Feb 09, 2012

Because scrub enumerates the dev extent tree to find the chunks to scrub,
it currently finds each DUP chunk twice and also scrubs it twice. This
patch makes sure that scrub_chunk only checks that part of the chunk the
dev extent has been found for. This only changes the behaviour for DUP
chunks.
Reported-and-tested-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Arne Jansen <sensille@gmx.net>

859acaf1

Btrfs: fix trim 0 bytes after a device delete · 2cac13e4

Committed by Liu Bo on Feb 09, 2012

A user reported a bug of btrfs's trim, that is we will trim 0 bytes
after a device delete.

The reproducer:

$ mkfs.btrfs disk1
$ mkfs.btrfs disk2
$ mount disk1 /mnt
$ fstrim -v /mnt
$ btrfs device add disk2 /mnt
$ btrfs device del disk1 /mnt
$ fstrim -v /mnt

This is because after we delete the device, the block group may start from
a non-zero place, which will confuse trim to discard nothing.
Reported-by: Lutz Euler <lutz.euler@freenet.de>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

2cac13e4

Btrfs: return the internal error unchanged if btrfs_get_extent_fiemap() call... · 6af021d8

Committed by Jeff Liu on Feb 09, 2012

Btrfs: return the internal error unchanged if btrfs_get_extent_fiemap() call failed for SEEK_DATA/SEEK_HOLE inquiry

Given that ENXIO only means "offset beyond EOF" for either a SEEK_DATA or SEEK_HOLE inquiry
in a desired file range, we should return the internal error unchanged if the
btrfs_get_extent_fiemap() call failed, rather than ENXIO.

Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>

6af021d8

Btrfs: avoid positive number with ERR_PTR · 8f24b496

Committed by Jan Schmidt on Feb 08, 2012

inode_ref_info() returns 1 when the element wasn't found and < 0 on error,
just like btrfs_search_slot(). In iref_to_path() it's an error when the
inode ref can't be found, thus we return ERR_PTR(ret) in that case. In order
to avoid ERR_PTR(1), we now set ret to -ENOENT in that case.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

8f24b496
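
A user-space sketch of why `ERR_PTR(1)` is dangerous. The kernel's error pointers encode errno values in the top `MAX_ERRNO` (4095) addresses, so `IS_ERR()` treats a pointer as an error only when it falls in that range; the simplified copies below follow that rule. `path_from_lookup()` is a hypothetical caller that maps the positive "not found" result to `-ENOENT` before wrapping it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space copies of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup result: 0 = found, 1 = not found, < 0 = error. */
static void *path_from_lookup(int ret, void *found)
{
	if (ret > 0)
		ret = -ENOENT;  /* never wrap a positive value in ERR_PTR() */
	if (ret < 0)
		return ERR_PTR(ret);
	return found;
}
```

Without the `ret > 0` mapping, the caller would receive the pointer value 1, which `IS_ERR()` reports as valid and which would fault on first dereference.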

btrfs: Sector Size check during Mount · 941b2ddf

Committed by Keith Mannthey on Nov 29, 2011

Gracefully fail when trying to mount a BTRFS file system that has a
sectorsize smaller than PAGE_SIZE.

On PPC it is possible to build a FS while using a 4k PAGE_SIZE kernel
then boot into a 64K PAGE_SIZE kernel.  Presently open_ctree fails in an
endless loop and hangs the machine in this situation.

My debugging has shown this sector size < page size case to be a non-trivial
situation, and a graceful exit from it would be nice for the
time being.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>

941b2ddf

14 Feb, 2012 · 5 commits

ocfs2: deal with wraparounds of i_nlink in ocfs2_rename() · 847c9db5

Committed by Al Viro on Feb 12, 2012

unfortunately, nlink_t may be smaller than 32 bits and ->i_nlink
on ocfs2 can grow up to 0xffffffff; storing it in nlink_t variable
will lose upper bits on such architectures.  Needs to be made u32,
until we get kernel-side nlink_t uniformly 32bit...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

847c9db5
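
A sketch of the truncation, assuming an architecture where `nlink_t` is 16 bits (modeled here with a hypothetical `small_nlink_t`). The helper names are illustrative; the point is that 65536 links read back as 0 through the narrow type, while a `u32` (here `uint32_t`) keeps every on-disk value.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t small_nlink_t;  /* hypothetical 16-bit nlink_t */

/* Buggy: storing the on-disk count in the narrow type drops the upper bits. */
static small_nlink_t load_nlink_narrow(uint32_t disk_nlink)
{
	return (small_nlink_t)disk_nlink;
}

/* Fixed: keep the full 32-bit on-disk value. */
static uint32_t load_nlink_u32(uint32_t disk_nlink)
{
	return disk_nlink;
}
```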

vfs: fix compat_sys_stat() handling of overflows in st_nlink · fcf83067

Committed by Al Viro on Feb 12, 2012

Massaged cp_compat_stat() into form closer to cp_new_stat(); the only
real issue had been in handling of st_nlink overflows - native 32bit
stat(2) returns -EOVERFLOW in such situations, compat one silently
loses upper bits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fcf83067
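
A sketch of the overflow check, assuming a 16-bit compat `st_nlink` field (modeled by a hypothetical `compat_nlink_t` typedef). The idea is to store into the narrow type and compare against the original, returning `-EOVERFLOW` on mismatch instead of silently keeping only the low bits; `copy_nlink()` is illustrative, not the kernel's `cp_compat_stat()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint16_t compat_nlink_t;  /* assumption: 16-bit compat st_nlink */

/* Copy nlink into the narrower field, failing like native 32-bit stat(2). */
static int copy_nlink(uint32_t nlink, compat_nlink_t *out)
{
	compat_nlink_t narrow = (compat_nlink_t)nlink;

	if (narrow != nlink)
		return -EOVERFLOW;  /* upper bits would otherwise be lost */
	*out = narrow;
	return 0;
}
```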

quota: Fix deadlock with suspend and quotas · dcdbed85

Committed by Jan Kara on Feb 10, 2012

This script causes a kernel deadlock:
set -e
DEVICE=/dev/vg1/linear
lvchange -ay $DEVICE
mkfs.ext3 $DEVICE
mount -t ext3 -o usrquota,grpquota $DEVICE /mnt/test
quotacheck -gu /mnt/test
umount /mnt/test
mount -t ext3 -o usrquota,grpquota $DEVICE /mnt/test
quotaon /mnt/test
dmsetup suspend $DEVICE
setquota -u root 1 2 3 4 /mnt/test &
sleep 1
dmsetup resume $DEVICE

setquota acquired semaphore s_umount for read and then tried to perform a
transaction (and waits because the device is suspended).  dmsetup resume tries
to acquire s_umount for write before resuming the device (and waits for
setquota).

Fix the deadlock by grabbing a thawed superblock for quota commands which need
it.
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dcdbed85

vfs: Provide function to get superblock and wait for it to thaw · 6b6dc836

Committed by Jan Kara on Feb 10, 2012

In quota code we need to find a superblock corresponding to a device and wait
for superblock to be unfrozen. However this waiting has to happen without
s_umount semaphore because that is required for superblock to thaw. So provide
a function in VFS for this to keep dances with s_umount where they belong.

[AV: implementation switched to saner variant]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6b6dc836

vfs: fix panic in __d_lookup() with high dentry hashtable counts · 074b8517

Committed by Dimitri Sivanich on Feb 08, 2012

When the number of dentry cache hash table entries gets too high
(2147483648 entries), as happens by default on a 16TB system, use of a
signed integer in the dcache_init() initialization loop prevents the
dentry_hashtable from getting initialized, causing a panic in
__d_lookup().  Fix this in dcache_init() and similar areas.
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

074b8517
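
A sketch of the signed-bound problem, with hypothetical helper names. With an `int` bound, `1 << 31` overflows (undefined behavior in C, in practice usually a negative value), so a loop such as `i < (1 << shift)` can end before touching any bucket. Computing both the bound and the index as unsigned keeps 2147483648 representable.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical: number of hash buckets for a given shift, computed unsigned. */
static unsigned int table_size(unsigned int shift)
{
	return 1U << shift;
}

/* Initialize every bucket using an unsigned index that can reach table_size(). */
static void init_buckets(void **table, unsigned int shift)
{
	for (unsigned int i = 0; i < table_size(shift); i++)
		table[i] = NULL;
}
```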
