Commits · ccdb357ccb77cc4cbe4f7abee9efd19957f0753a · openeuler / Kernel

25 Feb, 2010 (1 commit)

NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN · 58255a4e

Committed by Chuck Lever on Feb 24, 2010

The server's callback client should stop trying to connect to the
client's callback server as soon as it gets ECONNREFUSED.

The NFS server's callback client does not call rpc_ping(), but appears
to have its own "ping" procedure, so it wasn't covered by commit
caabea8a.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

21 Feb, 2010 (2 commits)

xfs_export_operations.commit_metadata · 978ebd97

Committed by Ben Myers on Feb 17, 2010

This is the commit_metadata export operation for XFS.

- Takes one inode to be committed.

- Forces the log up to the lsn of the inode.

- Doesn't force the log if the inode doesn't have a pincount.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
[bfields@citi.umich.edu: trivial whitespace fix]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

commit_metadata export operation replacing nfsd_sync_dir · f501912a

Committed by Ben Myers on Feb 17, 2010

- Add commit_metadata export_operation to allow the underlying filesystem to
decide how to commit an inode most efficiently.

- Usage of nfsd_sync_dir and write_inode_now has been replaced with the
commit_metadata function that takes a svc_fh.

- The commit_metadata function calls the commit_metadata export_op if it's
there, or else falls back to sync_inode instead of fsync and write_inode_now
because only metadata need be synced here.

- nfsd4_sync_rec_dir now uses vfs_fsync so that commit_metadata can be static.
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
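
The dispatch described in this commit (call the filesystem's commit_metadata export operation if it exists, otherwise fall back to a generic inode sync) can be sketched in user-space C. Everything below is hypothetical: `struct export_ops_sketch` and `struct fake_inode` stand in for the kernel's `export_operations` and `struct inode`, and `generic_sync_inode()` stands in for `sync_inode()`. It illustrates the pattern only and is not the nfsd implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for an inode: only the state the sketch needs. */
struct fake_inode {
	int metadata_dirty;   /* 1 while inode metadata is not yet on disk */
	int synced_by_fs_op;  /* set when the filesystem hook did the work */
};

/* Hypothetical analogue of struct export_operations; the hook is optional. */
struct export_ops_sketch {
	int (*commit_metadata)(struct fake_inode *inode);
};

/* Stand-in for the generic fallback (sync_inode() in the commit message). */
static int generic_sync_inode(struct fake_inode *inode)
{
	inode->metadata_dirty = 0;
	return 0;
}

/* Prefer the filesystem's own commit_metadata hook, else fall back. */
static int commit_metadata_sketch(const struct export_ops_sketch *ops,
				  struct fake_inode *inode)
{
	if (ops && ops->commit_metadata)
		return ops->commit_metadata(inode);
	return generic_sync_inode(inode);
}
```

Keeping the hook optional lets filesystems such as XFS commit metadata in their own cheapest way, while everyone else gets the generic behaviour without extra code.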

09 Feb, 2010 (2 commits)

lockd: don't clear sm_monitored on nsm_reboot_lookup · 7e469af9

Committed by Jeff Layton on Feb 05, 2010

When lockd gets a notify downcall from statd, it'll search its hosts
cache and then clear the sm_monitored bit on the host it finds. The idea
is apparently to make lockd redo a SM_MON on the next lock request.

This is unnecessary and causes the kernel's NSM cache to go out of sync
with statd. statd doesn't stop monitoring a host when it gets a
SM_NOTIFY and there's no guarantee that another lock will occur after
the reclaim and before the unmount. In that event, no SM_UNMON will
occur.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

lockd: release reference to nsm_handle in nlm_host_rebooted · cdd30fa1

Committed by Jeff Layton on Feb 05, 2010

nsm_reboot_lookup takes a reference to the nsm_handle that it returns,
but nlm_host_rebooted never releases that reference.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

30 Jan, 2010 (1 commit)

nfsd: Use vfs_fsync_range() in nfsd_commit · aa696a6f

Committed by Trond Myklebust on Jan 29, 2010

The NFS COMMIT operation allows the client to specify the exact byte range
that it wishes to sync to disk in order to optimise server performance.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
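
As a sketch of what a byte-range COMMIT means in practice, the helper below turns an NFSv3-style (offset, count) pair into the inclusive start/end pair that a range-based sync such as vfs_fsync_range() expects. It follows RFC 1813, where a count of 0 means "flush from offset to end of file". The helper name, the use of INT64_MAX as "end of file", the overflow clamp, and the assumption of a non-negative offset are all choices of this sketch, not details taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a COMMIT (offset, count) pair to an inclusive
 * [start, end] byte range.  Assumes offset >= 0.  A count of 0 means
 * "to end of file"; a range that would overflow is clamped the same way.
 */
static void commit_range_sketch(int64_t offset, uint32_t count,
				int64_t *start, int64_t *end)
{
	*start = offset;
	if (count == 0 || (int64_t)count - 1 > INT64_MAX - offset)
		*end = INT64_MAX;
	else
		*end = offset + (int64_t)count - 1;
}
```

Syncing only the requested range is what lets the server avoid flushing unrelated dirty pages of a large file.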

28 Jan, 2010 (1 commit)

NFSD: Create PF_INET6 listener in write_ports · 37498292

Committed by Chuck Lever on Jan 26, 2010

Try to create a PF_INET6 listener for NFSD, if IPv6 is enabled in the
kernel.

Make sure nfsd_serv's reference count is decreased if
__write_ports_addxprt() failed to create a listener.  See
__write_ports_addfd().

Our current plan is to rely on rpc.nfsd to create appropriate IPv6
listeners when server-side NFS/IPv6 support is desired.  Legacy
behavior, via the write_threads or write_svc kernel APIs, will remain
the same -- only IPv4 listeners are created.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[bfields@citi.umich.edu: Move error-handling code to end]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

27 Jan, 2010 (2 commits)

SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found" · 68717908

Committed by Chuck Lever on Jan 26, 2010

write_ports() converts svc_create_xprt()'s ENOENT error return to
EPROTONOSUPPORT so that rpc.nfsd (in user space) can report an error
message that makes sense.

It turns out that several of the other kernel APIs rpc.nfsd use can
also return ENOENT from svc_create_xprt(), by way of lockd_up().

On the client side, an NFSv2 or NFSv3 mount request can also return
the result of lockd_up(). This error may also be returned during an
NFSv4 mount request, since the NFSv4 callback service uses
svc_create_xprt() to create the callback listener. An ENOENT error
return results in a confusing error message from the mount command.

Let's have svc_create_xprt() return EPROTONOSUPPORT instead of ENOENT.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
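
A minimal sketch of the error mapping, assuming a hypothetical two-entry transport table: the internal lookup keeps using -ENOENT for "no such transport class", and only the public entry point translates it to -EPROTONOSUPPORT so that user space can print a meaningful message. `find_xprt_class_sketch()` and `create_xprt_sketch()` are illustrative names, not SUNRPC functions.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical registry of transport class names. */
static const char *const xprt_classes_sketch[] = { "udp", "tcp" };

/* Internal lookup: returns the class index, or -ENOENT if unknown. */
static int find_xprt_class_sketch(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(xprt_classes_sketch) / sizeof(xprt_classes_sketch[0]); i++)
		if (strcmp(xprt_classes_sketch[i], name) == 0)
			return (int)i;
	return -ENOENT;
}

/* Public entry point: report an unknown transport as -EPROTONOSUPPORT. */
static int create_xprt_sketch(const char *name)
{
	int idx = find_xprt_class_sketch(name);

	if (idx == -ENOENT)
		return -EPROTONOSUPPORT;
	return idx;   /* stand-in for "listener created" */
}
```

Doing the translation once, at the shared entry point, means rpc.nfsd, lockd_up() callers and the mount path all see the same error without each converting it.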

SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt() · d6783b2b

Committed by Chuck Lever on Jan 26, 2010

Clean up:  Bruce observed we have more or less common logic in each of
svc_create_xprt()'s callers:  the check to create an IPv6 RPC listener
socket only if CONFIG_IPV6 is set.  I'm about to add another case
that does just the same.

If we move the ifdefs into __svc_xpo_create(), then svc_create_xprt()
call sites can get rid of the "#ifdef" ugliness, and can use the same
logic with or without IPv6 support available in the kernel.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
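
The idea of keeping the IPv6 conditional in a single helper can be sketched as follows. `SKETCH_HAVE_IPV6` is a hypothetical build flag standing in for CONFIG_IPV6, and returning -EAFNOSUPPORT for an unsupported family is this sketch's choice. The point is that `create_listeners_sketch()` contains no `#ifdef` and behaves sensibly whether or not the flag is defined.

```c
#include <errno.h>
#include <sys/socket.h>

/* Only this helper knows whether IPv6 support is compiled in. */
static int xpo_create_sketch(int family)
{
	switch (family) {
	case AF_INET:
		return 0;              /* stand-in for "IPv4 listener created" */
#ifdef SKETCH_HAVE_IPV6
	case AF_INET6:
		return 0;              /* stand-in for "IPv6 listener created" */
#endif
	default:
		return -EAFNOSUPPORT;  /* family not available in this build */
	}
}

/* A caller with no #ifdef: it simply tries IPv6 and always creates IPv4. */
static int create_listeners_sketch(int *ipv6_ok)
{
	*ipv6_ok = (xpo_create_sketch(AF_INET6) == 0);
	return xpo_create_sketch(AF_INET);
}
```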

15 Jan, 2010 (1 commit)

nfsd41: Create the recovery entry for the NFSv4.1 client · 8b8aae40

Committed by Ricardo Labiaga on Dec 11, 2009

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

13 Jan, 2010 (4 commits)

nfsd: use vfs_fsync for non-directories · 6a68f89e

Committed by Christoph Hellwig on Dec 25, 2009

Instead of opencoding the fsync calling sequence use vfs_fsync.  This also
gets rid of the useless i_mutex over the data writeout.

Consolidate the remaining special code for syncing directories and document
its quirks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd4: Use FIRST_NFS4_OP in nfsd4_decode_compound() · de3cab79

Committed by Ricardo Labiaga on Dec 11, 2009

Since we're checking for LAST_NFS4_OP, use FIRST_NFS4_OP to be consistent.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

nfsd41: nfsd4_decode_compound() does not recognize all ops · c551866e

Committed by Ricardo Labiaga on Dec 11, 2009

The server incorrectly assumes that the operations in the
array start with value 0.  The first operation (OP_ACCESS)
has a value of 3, causing the check in nfsd4_decode_compound
to be off.

Instead of comparing that the operation number is less than
the number of elements in the array, the server should verify
that it is less than the maximum valid operation number
defined by LAST_NFS4_OP.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
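
To make the offset problem concrete, here is a sketch comparing the two checks. It assumes, hypothetically, a table with one entry per operation starting at OP_ACCESS (3) and a last operation number of 58; both constants are illustrative. With a count-based check, opcodes 0 to 2 are wrongly accepted and the last three valid opcodes are wrongly rejected, while a range check against the first and last operation numbers (the approach of this commit and of de3cab79) gets both ends right.

```c
/* Hypothetical constants: first op is OP_ACCESS (3); 58 is assumed last. */
#define SKETCH_FIRST_OP 3
#define SKETCH_LAST_OP  58
#define SKETCH_NOPS     (SKETCH_LAST_OP - SKETCH_FIRST_OP + 1)   /* 56 entries */

/* Buggy: treats the operation number as if it were a 0-based index. */
static int op_valid_buggy(unsigned int op)
{
	return op < SKETCH_NOPS;
}

/* Fixed: checks the operation number against the valid range. */
static int op_valid_fixed(unsigned int op)
{
	return op >= SKETCH_FIRST_OP && op <= SKETCH_LAST_OP;
}
```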

lib: Introduce generic list_sort function · 2c761270

Committed by Dave Chinner on Jan 12, 2010

There are two copies of list_sort() in the tree already, one in the DRM
code, another in ubifs.  Now XFS needs this as well.  Create a generic
list_sort() function from the ubifs version and convert existing users
to it so we don't end up with yet another copy in the tree.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Artem Bityutskiy <dedekind@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
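
For readers unfamiliar with the technique, the following is a minimal merge sort on a singly linked list with a caller-supplied comparator and an opaque `priv` pointer passed through to it. It only illustrates list merge sort: `struct node` and `list_sort_sketch()` are hypothetical, and the generic kernel list_sort() operates on doubly linked `struct list_head` lists with its own merge strategy. Taking from the left run on ties keeps the sort stable.

```c
#include <stddef.h>

/* Hypothetical list node: a key plus a next pointer. */
struct node {
	int key;
	struct node *next;
};

/* Comparator returning <0, 0 or >0; priv is passed through untouched. */
typedef int (*cmp_fn)(void *priv, const struct node *a, const struct node *b);

/* Merge two sorted lists; ties take from 'a' first, keeping the sort stable. */
static struct node *merge(void *priv, cmp_fn cmp, struct node *a, struct node *b)
{
	struct node head = { 0, NULL };
	struct node *tail = &head;

	while (a && b) {
		if (cmp(priv, a, b) <= 0) {
			tail->next = a;
			a = a->next;
		} else {
			tail->next = b;
			b = b->next;
		}
		tail = tail->next;
	}
	tail->next = a ? a : b;
	return head.next;
}

/* Top-down merge sort: split at the midpoint, sort both halves, merge. */
static struct node *list_sort_sketch(void *priv, struct node *list, cmp_fn cmp)
{
	struct node *slow, *fast, *second;

	if (!list || !list->next)
		return list;

	/* Find the midpoint with slow/fast pointers. */
	slow = list;
	fast = list->next;
	while (fast && fast->next) {
		slow = slow->next;
		fast = fast->next->next;
	}
	second = slow->next;
	slow->next = NULL;

	return merge(priv, cmp, list_sort_sketch(priv, list, cmp),
		     list_sort_sketch(priv, second, cmp));
}
```

Merge sort suits linked lists because merging only relinks nodes, needing no extra buffers and no random access.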

12 Jan, 2010 (2 commits)

smaps: fix wrong rss count · 7f53a09e

Committed by Minchan Kim on Jan 08, 2010

A long time ago we counted the zero page as file_rss, and vm_normal_page()
didn't return NULL for it.

But now we have reinstated ZERO_PAGE, and vm_normal_page() can return NULL
for the zero page.  Also, we no longer count it in file_rss.

As a result, RSS and PSS can't be matched.  For consistency, let's ignore
the zero page in smaps_pte_range.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Matt Mackall <mpm@selenic.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
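
A sketch of the accounting rule after this fix: entries for which no normal page is found (modelled here as NULL, as vm_normal_page() returns for the zero page) are skipped before either RSS or PSS is updated, so both totals cover the same pages. `struct fake_page`, the 4096-byte page size and the plain integer division for PSS are simplifications for illustration.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed page size */

/* Hypothetical page descriptor: only the map count matters here. */
struct fake_page {
	unsigned long mapcount;   /* processes mapping this page, >= 1 */
};

/*
 * Sum RSS and PSS over a range of entries.  A NULL entry models
 * vm_normal_page() returning NULL (e.g. for the zero page) and is
 * skipped before either total is touched.
 */
static void smaps_account_sketch(struct fake_page *const *ptes, size_t n,
				 unsigned long *rss, unsigned long *pss)
{
	size_t i;

	*rss = 0;
	*pss = 0;
	for (i = 0; i < n; i++) {
		const struct fake_page *page = ptes[i];

		if (!page)
			continue;
		*rss += SKETCH_PAGE_SIZE;
		*pss += SKETCH_PAGE_SIZE / page->mapcount;
	}
}
```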

proc: partially revert "procfs: provide stack information for threads" · 1306d603

Committed by KOSAKI Motohiro on Jan 08, 2010

Commit d899bf7b (procfs: provide stack information for threads) started
showing stack information in /proc/{pid}/status.  But it causes a large
performance regression.  Unfortunately, /proc/{pid}/status is also read by
the ps command, and ps is one of the most important components.  The cost
comes from taking mmap_sem and walking the page tables, both of which are
heavy operations.

With many processes running, ps performance is as follows:

[before d899bf7b]

% perf stat ps >/dev/null

 Performance counter stats for 'ps':

     4090.435806  task-clock-msecs         #      0.032 CPUs
             229  context-switches         #      0.000 M/sec
               0  CPU-migrations           #      0.000 M/sec
             234  page-faults              #      0.000 M/sec
      8587565207  cycles                   #   2099.425 M/sec
      9866662403  instructions             #      1.149 IPC
      3789415411  cache-references         #    926.409 M/sec
        30419509  cache-misses             #      7.437 M/sec

   128.859521955  seconds time elapsed

[after d899bf7b]

% perf stat  ps  > /dev/null

 Performance counter stats for 'ps':

     4305.081146  task-clock-msecs         #      0.028 CPUs
             480  context-switches         #      0.000 M/sec
               2  CPU-migrations           #      0.000 M/sec
             237  page-faults              #      0.000 M/sec
      9021211334  cycles                   #   2095.480 M/sec
     10605887536  instructions             #      1.176 IPC
      3612650999  cache-references         #    839.160 M/sec
        23917502  cache-misses             #      5.556 M/sec

   152.277819582  seconds time elapsed

Thus, this patch reverts it. Fortunately, /proc/{pid}/task/{tid}/smaps
provides almost the same information, so we can use that instead.

Commit d899bf7b introduced two features:

 1) Add the annotation of [thread stack: xxxx] mark to
    /proc/{pid}/task/{tid}/maps.
 2) Add StackUsage field to /proc/{pid}/status.

I only revert (2), because I haven't seen (1) cause a regression.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 Jan, 2010 (6 commits)

quota: Fix dquot_transfer for filesystems different from ext4 · 05b5d898

Committed by Jan Kara on Jan 06, 2010

Commit fd8fbfc1 modified the way we find the amount of reserved space
belonging to an inode. The amount of reserved space is checked
from dquot_transfer, and thus inode_reserved_space gets called
even for filesystems that don't provide a get_reserved_space callback,
which results in a BUG.

Fix the problem by checking for the get_reserved_space callback and
returning 0 if the filesystem does not provide it.

CC: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
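
The shape of the fix, treating an absent optional callback as "no reserved space", can be sketched like this. `struct quota_ops_sketch`, `struct fake_inode_q` and `inode_reserved_space_sketch()` are hypothetical analogues of the quota operations table and inode_reserved_space(); the real callback signature may differ.

```c
#include <stddef.h>

struct fake_inode_q;   /* forward declaration for the callback type */

/* Hypothetical analogue of a quota-operations table. */
struct quota_ops_sketch {
	/* Optional: filesystems that never reserve space leave this NULL. */
	long long (*get_reserved_space)(const struct fake_inode_q *inode);
};

struct fake_inode_q {
	const struct quota_ops_sketch *qops;
	long long reserved;
};

/* Return reserved space, treating a missing callback as "nothing reserved". */
static long long inode_reserved_space_sketch(const struct fake_inode_q *inode)
{
	if (!inode->qops || !inode->qops->get_reserved_space)
		return 0;
	return inode->qops->get_reserved_space(inode);
}
```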

GFS2: Use MAX_LFS_FILESIZE for meta inode size · ba198098

Committed by Steven Whitehouse on Jan 08, 2010

Using ~0ULL was causing sign issues in filemap_fdatawrite_range, so
use MAX_LFS_FILESIZE instead.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
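
Why ~0ULL is a problem for a signed file offset can be shown in a few lines. On the usual two's-complement compilers (the conversion is implementation-defined in C), ~0ULL stored into a signed 64-bit offset becomes -1, so an "end" of ~0ULL compares smaller than any start. `loff_sketch_t` and `range_nonempty()` are hypothetical, and `SKETCH_MAX_LFS_FILESIZE` assumes the 64-bit value of MAX_LFS_FILESIZE (LLONG_MAX).

```c
#include <limits.h>

/* Signed 64-bit file offset, standing in for the kernel's loff_t. */
typedef long long loff_sketch_t;

/* Assumed 64-bit value of MAX_LFS_FILESIZE: the largest positive offset. */
#define SKETCH_MAX_LFS_FILESIZE ((loff_sketch_t)LLONG_MAX)

/* Hypothetical range check: an inclusive [start, end] range is empty if end < start. */
static int range_nonempty(loff_sketch_t start, loff_sketch_t end)
{
	return end >= start;
}
```

With `(loff_sketch_t)~0ULL` as the end, the range from 0 looks empty; with `SKETCH_MAX_LFS_FILESIZE` it covers the whole file as intended.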

xfs: Ensure we force all busy extents in range to disk · fd45e478

Committed by Dave Chinner on Jan 02, 2010

When we search for and find a busy extent during allocation we
force the log out to ensure the extent free transaction is on
disk before the allocation transaction. The current implementation
has a subtle bug in it--it does not handle multiple overlapping
ranges.

That is, if we free lots of little extents into a single
contiguous extent, then allocate the contiguous extent, the busy
search code stops searching at the first extent it finds that
overlaps the allocated range. It then uses the commit LSN of the
transaction to force the log out to.

Unfortunately, the other busy ranges might have more recent
commit LSNs than the first busy extent that is found, and this
results in xfs_alloc_search_busy() returning before all the
extent free transactions are on disk for the range being
allocated. This can lead to potential metadata corruption or
stale data exposure after a crash because log replay won't replay
all the extent free transactions that cover the allocation range.
Modified-by: Alex Elder <aelder@sgi.com>

(Dropped the "found" argument from the xfs_alloc_busysearch trace
event.)
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
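
The difference between stopping at the first overlapping busy extent and scanning all of them can be sketched as below. The record layout and function names are hypothetical, and LSNs are modelled as plain integers (real XFS LSNs combine a cycle and a block number and are compared with a dedicated macro). The fixed search returns the newest LSN among all overlapping extents, which is the point the log must be forced to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical busy-extent record: [start, start + len) freed at commit LSN lsn. */
struct busy_extent_sketch {
	uint64_t start;
	uint64_t len;
	uint64_t lsn;
};

static int overlaps(const struct busy_extent_sketch *b, uint64_t start, uint64_t len)
{
	return b->start < start + len && start < b->start + b->len;
}

/* Buggy shape: stops at the first overlapping busy extent. */
static uint64_t busy_lsn_first(const struct busy_extent_sketch *busy, size_t n,
			       uint64_t start, uint64_t len)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (overlaps(&busy[i], start, len))
			return busy[i].lsn;
	return 0;   /* 0 = nothing busy, no log force needed */
}

/* Fixed shape: newest LSN across every overlapping busy extent. */
static uint64_t busy_lsn_all(const struct busy_extent_sketch *busy, size_t n,
			     uint64_t start, uint64_t len)
{
	uint64_t max_lsn = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (overlaps(&busy[i], start, len) && busy[i].lsn > max_lsn)
			max_lsn = busy[i].lsn;
	return max_lsn;
}
```

For three small freed extents with LSNs 10, 30 and 20 that together form one allocated range, the buggy search forces the log only to 10, leaving the LSN 30 free transaction possibly not yet on disk.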

xfs: Don't flush stale inodes · 44e08c45

Committed by Dave Chinner on Jan 02, 2010

Because inodes remain in cache much longer than inode buffers do
under memory pressure, we can get the situation where we have
stale, dirty inodes being reclaimed but the backing storage has
been freed.  Hence we should never, ever flush XFS_ISTALE inodes
to disk as there is no guarantee that the backing buffer is in
cache and still marked stale when the flush occurs.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: fix timestamp handling in xfs_setattr · d6d59bad

Committed by Christoph Hellwig on Dec 23, 2009

We currently have some rather odd code in xfs_setattr for
updating the a/c/mtime timestamps:

 - first we do a non-transaction update if all three are updated
   together
 - second we implicitly update the ctime for various changes
   instead of relying on the ATTR_CTIME flag
 - third we set the timestamps to the current time instead of the
   arguments in the iattr structure in many cases.

This patch makes sure we update it in a consistent way:

 - always transactional
 - ctime is only updated if ATTR_CTIME is set or we do a size
   update, which is a special case
 - always to the times passed in from the caller instead of the
   current time

The only non-size caller of xfs_setattr that doesn't come from
the VFS is updated to set ATTR_CTIME and pass in a valid ctime
value.
Reported-by: Eric Blake <ebb9@byu.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
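
A sketch of the consistent rule for the plain timestamp case: each of atime, mtime and ctime is written only when its flag is set, and always from the caller-supplied value rather than the current time. The `SK_ATTR_*` flags and both structures are hypothetical stand-ins for the VFS ATTR_* bits and struct iattr; the size-update special case and the transaction itself are left out.

```c
/* Hypothetical flag bits, modelled loosely on the VFS ATTR_* flags. */
#define SK_ATTR_ATIME (1u << 0)
#define SK_ATTR_MTIME (1u << 1)
#define SK_ATTR_CTIME (1u << 2)

/* Hypothetical stand-in for struct iattr: requested times plus a mask. */
struct iattr_sketch {
	unsigned int valid;
	long long atime, mtime, ctime;
};

/* Hypothetical in-core timestamps. */
struct inode_times_sketch {
	long long atime, mtime, ctime;
};

/* Copy only the requested timestamps, always from the caller's values. */
static void apply_times_sketch(struct inode_times_sketch *ip,
			       const struct iattr_sketch *ia)
{
	if (ia->valid & SK_ATTR_ATIME)
		ip->atime = ia->atime;
	if (ia->valid & SK_ATTR_MTIME)
		ip->mtime = ia->mtime;
	if (ia->valid & SK_ATTR_CTIME)   /* no implicit ctime bump */
		ip->ctime = ia->ctime;
}
```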

xfs: use DECLARE_EVENT_CLASS · ea9a4888

Committed by Christoph Hellwig on Dec 21, 2009

Using DECLARE_EVENT_CLASS allows us to share trace event code
instead of duplicating it in the binary.  This was not available
before 2.6.33 so it had to be done as a separate step once the
prerequisite was merged.

This only requires changes to xfs_trace.h and the results are
rather impressive:

hch@brick:~/work/linux-2.6/obj-kvm$ size fs/xfs/xfs.o*
text	   data	    bss	    dec	    hex	filename
 607732	  41884	   3616	 653232	  9f7b0	fs/xfs/xfs.o
1026732	  41884	   3808	1072424	 105d28	fs/xfs/xfs.o.old
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

09 Jan, 2010 (1 commit)

xfs: kill some warnings on i386 builds · a539bd8c

Committed by Dave Chinner on Dec 17, 2009

Randy Dunlap reported printk() format-related warnings on i386
builds in his environment.  Dave Chinner provided this
patch to eliminate them.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

08 Jan, 2010 (3 commits)

GFS2: Fix gfs2_xattr_acl_chmod() · e412bdb1

Committed by Steven Whitehouse on Dec 21, 2009

The ref counting for the bh returned by gfs2_ea_find() was
wrong. This patch ensures that we always drop the ref count
to that bh correctly.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix locking bug in rename · 24b977b5

Committed by Steven Whitehouse on Dec 09, 2009

The rename code was taking a resource group lock in cases where
it wasn't actually needed; this caused problems if the rename
was resulting in an inode being unlinked. The patch ensures that
we only take the rgrp lock early if it is really needed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

GFS2: Ensure uptodate inode size when using O_APPEND · 56aa616a

Committed by Steven Whitehouse on Dec 08, 2009

The VFS reads the inode size during generic_file_aio_write() but
with no locking around it. In order to get the expected result
from O_APPEND opens, this patch updates the inode size before
calling generic_file_aio_write().

There is of course still a race here, in that there is nothing to
prevent another node coming in and extending the file in the
meantime. On the other hand, when used with file locking this
will ensure that the expected results are obtained.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

07 Jan, 2010 (6 commits)

reiserfs: Relax reiserfs_xattr_set_handle() while acquiring xattr locks · 31370f62

Committed by Frederic Weisbecker on Jan 07, 2010

Fix remaining xattr locks acquired in reiserfs_xattr_set_handle()
while we are holding the reiserfs lock to avoid lock inversions.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>

reiserfs: Fix unreachable statement · e0baec1b

Committed by Jiri Slaby on Jan 06, 2010

Stanse found an unreachable statement in reiserfs_ioctl. There is an
if followed by an error assignment and `break' with no braces. Add the
braces so that we don't break every time, but only in the error case,
so that REISERFS_IOC_SETVERSION actually works when it returns no
error.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Reiserfs <reiserfs-devel@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
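
The bug class is easy to reproduce in isolation. In this sketch the ioctl number and function names are hypothetical, and the brace-less version is indented the way the compiler actually reads it: the early `break` is always taken, so a successful call silently never stores the new version. The braced version only leaves early on error.

```c
#include <errno.h>

#define SKETCH_IOC_SETVERSION 1u   /* hypothetical ioctl command number */

/* Brace-less shape: the early break is taken even when there is no error. */
static int ioctl_buggy(unsigned int cmd, int copy_failed,
		       unsigned int *generation, unsigned int newv)
{
	int err = 0;

	switch (cmd) {
	case SKETCH_IOC_SETVERSION:
		if (copy_failed)
			err = -EFAULT;
		break;                   /* unconditional, despite the intent */
		*generation = newv;      /* unreachable */
		break;
	default:
		err = -ENOTTY;
	}
	return err;
}

/* Braced shape, as in the fix: only the error path leaves early. */
static int ioctl_fixed(unsigned int cmd, int copy_failed,
		       unsigned int *generation, unsigned int newv)
{
	int err = 0;

	switch (cmd) {
	case SKETCH_IOC_SETVERSION:
		if (copy_failed) {
			err = -EFAULT;
			break;
		}
		*generation = newv;
		break;
	default:
		err = -ENOTTY;
	}
	return err;
}
```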

reiserfs: Don't call reiserfs_get_acl() with the reiserfs lock · 6c287054

Committed by Frederic Weisbecker on Jan 07, 2010

reiserfs_get_acl is usually not called under the reiserfs lock,
as it doesn't need it. But it happens when it is called by
reiserfs_acl_chmod(), which creates a dependency inversion against
the private xattr inodes mutexes for the given inode.

We need to call it without the reiserfs lock, especially since
it's unnecessary.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>

FDPIC: Respect PT_GNU_STACK exec protection markings when creating NOMMU stack · 04e4f2b1

Committed by Mike Frysinger on Jan 06, 2010

The current code will load the stack size and protection markings, but
then only use the markings in the MMU code path.  The NOMMU code path
always passes PROT_EXEC to the mmap() call.  While this doesn't matter
to most people whilst the code is running, it will cause a pointless
icache flush when starting every FDPIC application.  Typically this
icache flush will be of a region on the order of 128KB in size, or may
be the entire icache, depending on the facilities available on the CPU.

In the case where the arch default behaviour seems to be desired
(EXSTACK_DEFAULT), we probe VM_STACK_FLAGS for VM_EXEC to determine
whether we should be setting PROT_EXEC or not.

For arches that support an MPU (Memory Protection Unit - an MMU without
the virtual mapping capability), setting PROT_EXEC or not will make an
important difference.

It should be noted that this change also affects the executability of
the brk region, since ELF-FDPIC shares that region with the stack.  However,
this is probably irrelevant as NOMMU programs aren't likely to use the
brk region, preferring instead allocation via mmap().
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
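
The protection choice described above can be sketched as a small pure function. The `EXSTACK_*_SK` enumerators are hypothetical mirrors of the kernel's EXSTACK_* values, and `arch_default_exec` models the "VM_STACK_FLAGS contains VM_EXEC" probe; the PROT_* flags are the standard mmap() ones from <sys/mman.h>.

```c
#include <sys/mman.h>

/* Hypothetical mirror of the EXSTACK_* stack-executability choices. */
enum exstack_sketch {
	EXSTACK_DEFAULT_SK,    /* defer to the architecture default */
	EXSTACK_DISABLE_X_SK,  /* stack explicitly marked non-executable */
	EXSTACK_ENABLE_X_SK,   /* stack explicitly marked executable */
};

/*
 * Pick mmap() protection for a NOMMU stack.  arch_default_exec models
 * "VM_STACK_FLAGS contains VM_EXEC".
 */
static int stack_prot_sketch(enum exstack_sketch exec_stack, int arch_default_exec)
{
	int prot = PROT_READ | PROT_WRITE;

	if (exec_stack == EXSTACK_ENABLE_X_SK ||
	    (exec_stack == EXSTACK_DEFAULT_SK && arch_default_exec))
		prot |= PROT_EXEC;
	return prot;
}
```

Leaving PROT_EXEC out whenever the markings allow it is what avoids the icache flush on every FDPIC program start.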

nfs: fix oops in nfs_rename() · 56335936

Committed by OGAWA Hirofumi on Jan 06, 2010

A recent change failed to update "rehash".  As a result, the dentry
could be added to the hash twice.

This explains the Oops (dereference of the freed dentry in
__d_lookup()) on my machine.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: Marvin <marvin24@gmx.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

nfsd: make sure data is on disk before calling ->fsync · 7211a4e8

Committed by Christoph Hellwig on Dec 25, 2009

nfsd is not using vfs_fsync, so I missed it when changing the calling
convention during the 2.6.32 window.  This patch fixes it to not only
start the data writeout, but also wait for it to complete before calling
into ->fsync.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

05 Jan, 2010 (6 commits)

exofs: simple_write_end does not mark_inode_dirty · efd124b9

Committed by Boaz Harrosh on Dec 27, 2009

exofs uses simple_write_end() for its .write_end handler. But
it is not enough because simple_write_end() does not call
mark_inode_dirty() when it extends i_size. So even if we do
call mark_inode_dirty at the beginning of write out, with a very
long IO and a saturated system we might get .write_inode()
called while still extend-writing to the file and miss out on the last
i_size updates.

So override .write_end, call simple_write_end(), and afterwards, if
i_size was changed, call mark_inode_dirty().

It stands to reason that since simple_write_end() was the one extending
i_size, it should also call mark_inode_dirty(). But it looks like all
users of simple_write_end() are memory-bound pseudo filesystems, which
couldn't care less about mark_inode_dirty(). I might submit a
warning-comment patch to simple_write_end() in the future.

CC: Stable <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
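
A user-space sketch of the override, with hypothetical types: `simple_write_end_sketch()` models a write_end that grows i_size without marking the inode dirty, and the wrapper records the old size, delegates, and marks the inode dirty only if the size changed. The `dirty` flag stands in for a call to mark_inode_dirty().

```c
/* Hypothetical inode: just a size and a dirty flag. */
struct fake_inode_x {
	long long i_size;
	int dirty;   /* stands in for "mark_inode_dirty() was called" */
};

/* Models simple_write_end(): may grow i_size but never marks the inode dirty. */
static int simple_write_end_sketch(struct fake_inode_x *inode,
				   long long pos, int copied)
{
	if (pos + copied > inode->i_size)
		inode->i_size = pos + copied;
	return copied;
}

/* Override in the spirit of the fix: delegate, then dirty if i_size changed. */
static int exofs_write_end_sketch(struct fake_inode_x *inode,
				  long long pos, int copied)
{
	long long old_size = inode->i_size;
	int ret = simple_write_end_sketch(inode, pos, copied);

	if (inode->i_size != old_size)
		inode->dirty = 1;
	return ret;
}
```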

exofs: fix pnfs_osd re-definitions in pre-pnfs trees · 89be5030

Committed by Boaz Harrosh on Dec 21, 2009

Some on-disk exofs constants and types are defined in the pnfs_osd_xdr.h
file. Since we needed these types before the pnfs-objects code was
accepted to mainline, we duplicated the minimal needed definitions into
an exofs local header. The definitions were conditionally included
depending on !CONFIG_PNFS. So if PNFS was present in the tree, the
definitions were taken from there, and if not they were defined locally.

That was all good, but CONFIG_PNFS is planned to be included upstream
before pnfs-objects is also included. (The first pnfs batch might be
pnfs-files only.)

So condition exofs local definitions on the absence of pnfs_osd_xdr.h
inclusion (__PNFS_OSD_XDR_H__ not defined). User code must make sure
that in the future pnfs_osd_xdr.h will be included before fs/exofs/pnfs.h,
which happens to be so in current code.

Once pnfs-objects hits mainline, exofs's local header will be removed.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

reiserfs: Relax lock on xattr removing · 4f3be1b5

Committed by Frederic Weisbecker on Jan 05, 2010

When we remove an xattr, we call lookup_and_delete_xattr()
that takes some private xattr inodes mutexes. But we hold
the reiserfs lock at this time, which leads to dependency
inversions.

We can safely call lookup_and_delete_xattr() without the
reiserfs lock, where xattr inodes lookups only need the
xattr inodes mutexes.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>

reiserfs: Relax the lock before truncating pages · 108d3943

Committed by Frederic Weisbecker on Jan 05, 2010

While truncating a file, reiserfs_setattr() calls inode_setattr(),
which will truncate the mapping for the given inode, but for that
it needs the page locks.

In order to release these, the owners need the reiserfs lock to
complete their jobs. But they can't, as we don't release it before
calling inode_setattr().

We need to do that to fix the following softlockups:

INFO: task flush-8:0:2149 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:0     D f51af998     0  2149      2 0x00000000
 f51af9ac 00000092 00000002 f51af998 c2803304 00000000 c1894ad0 010f3000
 f51af9cc c1462604 c189ef80 f51af974 c1710304 f715b450 f715b5ec c2807c40
 00000000 0005bb00 c2803320 c102c55b c1710304 c2807c50 c2803304 00000246
Call Trace:
 [<c1462604>] ? schedule+0x434/0xb20
 [<c102c55b>] ? resched_task+0x4b/0x70
 [<c106fa22>] ? mark_held_locks+0x62/0x80
 [<c146414d>] ? mutex_lock_nested+0x1fd/0x350
 [<c14640b9>] mutex_lock_nested+0x169/0x350
 [<c1178cde>] ? reiserfs_write_lock+0x2e/0x40
 [<c1178cde>] reiserfs_write_lock+0x2e/0x40
 [<c11719a2>] do_journal_end+0xc2/0xe70
 [<c1172912>] journal_end+0xb2/0x120
 [<c11686b3>] ? pathrelse+0x33/0xb0
 [<c11729e4>] reiserfs_end_persistent_transaction+0x64/0x70
 [<c1153caa>] reiserfs_get_block+0x12ba/0x15f0
 [<c106fa22>] ? mark_held_locks+0x62/0x80
 [<c1154b24>] reiserfs_writepage+0xa74/0xe80
 [<c1465a27>] ? _raw_spin_unlock_irq+0x27/0x50
 [<c11f3d25>] ? radix_tree_gang_lookup_tag_slot+0x95/0xc0
 [<c10b5377>] ? find_get_pages_tag+0x127/0x1a0
 [<c106fa22>] ? mark_held_locks+0x62/0x80
 [<c106fcd4>] ? trace_hardirqs_on_caller+0x124/0x170
 [<c10bc1e0>] __writepage+0x10/0x40
 [<c10bc9ab>] write_cache_pages+0x16b/0x320
 [<c10bc1d0>] ? __writepage+0x0/0x40
 [<c10bcb88>] generic_writepages+0x28/0x40
 [<c10bcbd5>] do_writepages+0x35/0x40
 [<c11059f7>] writeback_single_inode+0xc7/0x330
 [<c11067b2>] writeback_inodes_wb+0x2c2/0x490
 [<c1106a86>] wb_writeback+0x106/0x1b0
 [<c1106cf6>] wb_do_writeback+0x106/0x1e0
 [<c1106c18>] ? wb_do_writeback+0x28/0x1e0
 [<c1106e0a>] bdi_writeback_task+0x3a/0xb0
 [<c10cbb13>] bdi_start_fn+0x63/0xc0
 [<c10cbab0>] ? bdi_start_fn+0x0/0xc0
 [<c105d1f4>] kthread+0x74/0x80
 [<c105d180>] ? kthread+0x0/0x80
 [<c100327a>] kernel_thread_helper+0x6/0x10
3 locks held by flush-8:0/2149:
 #0:  (&type->s_umount_key#30){+++++.}, at: [<c110676f>] writeback_inodes_wb+0x27f/0x490
 #1:  (&journal->j_mutex){+.+...}, at: [<c117199a>] do_journal_end+0xba/0xe70
 #2:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178cde>] reiserfs_write_lock+0x2e/0x40
INFO: task fstest:3813 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fstest        D 00000002     0  3813   3812 0x00000000
 f5103c94 00000082 f5103c40 00000002 f5ad5450 00000007 f5103c28 011f3000
 00000006 f5ad5450 c10bb005 00000480 c1710304 f5ad5450 f5ad55ec c2907c40
 00000001 f5ad5450 f5103c74 00000046 00000002 f5ad5450 00000007 f5103c6c
Call Trace:
 [<c10bb005>] ? free_hot_cold_page+0x1d5/0x280
 [<c1462d64>] io_schedule+0x74/0xc0
 [<c10b5a45>] sync_page+0x35/0x60
 [<c146325a>] __wait_on_bit_lock+0x4a/0x90
 [<c10b5a10>] ? sync_page+0x0/0x60
 [<c10b59e5>] __lock_page+0x85/0x90
 [<c105d660>] ? wake_bit_function+0x0/0x60
 [<c10bf654>] truncate_inode_pages_range+0x1e4/0x2d0
 [<c10bf75f>] truncate_inode_pages+0x1f/0x30
 [<c10bf7cf>] truncate_pagecache+0x5f/0xa0
 [<c10bf86a>] vmtruncate+0x5a/0x70
 [<c10fdb7d>] inode_setattr+0x5d/0x190
 [<c1150117>] reiserfs_setattr+0x1f7/0x2f0
 [<c1464569>] ? down_write+0x49/0x70
 [<c10fde01>] notify_change+0x151/0x330
 [<c10e6f3d>] do_truncate+0x6d/0xa0
 [<c10f4ce2>] do_filp_open+0x9a2/0xcf0
 [<c1465aec>] ? _raw_spin_unlock+0x2c/0x50
 [<c10fec50>] ? alloc_fd+0xe0/0x100
 [<c10e602d>] do_sys_open+0x6d/0x130
 [<c1002cfb>] ? sysenter_exit+0xf/0x16
 [<c10e615e>] sys_open+0x2e/0x40
 [<c1002ccc>] sysenter_do_call+0x12/0x32
3 locks held by fstest/3813:
 #0:  (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [<c10e6f33>] do_truncate+0x63/0xa0
 #1:  (&sb->s_type->i_alloc_sem_key#3){+.+.+.}, at: [<c10fdf07>] notify_change+0x257/0x330
 #2:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178c8e>] reiserfs_write_lock_once+0x2e/0x50
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>

reiserfs: Fix recursive lock on lchown · 5fe1533f

Committed by Frederic Weisbecker on Jan 04, 2010

On chown, reiserfs will call reiserfs_setattr() to change the owner
of the given inode, but it may also recursively call
reiserfs_setattr() to propagate the owner change to the private xattr
files for this inode.

Hence, the reiserfs lock may be acquired twice which is not wanted
as reiserfs_setattr() calls journal_begin() that is going to try to
relax the lock in order to safely acquire the journal mutex.

Using reiserfs_write_lock_once() from reiserfs_setattr() solves
the problem.

This fixes the following warning, that precedes a lockdep report.

WARNING: at fs/reiserfs/lock.c:95 reiserfs_lock_check_recursive+0x3f/0x50()
Hardware name: MS-7418
Unwanted recursive reiserfs lock!
Pid: 4189, comm: fsstress Not tainted 2.6.33-rc2-tip-atom+ #195
Call Trace:
 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
 [<c103f7ac>] warn_slowpath_common+0x6c/0xc0
 [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
 [<c103f84b>] warn_slowpath_fmt+0x2b/0x30
 [<c1178bff>] reiserfs_lock_check_recursive+0x3f/0x50
 [<c1172ae3>] do_journal_begin_r+0x83/0x350
 [<c1172f2d>] journal_begin+0x7d/0x140
 [<c106509a>] ? in_group_p+0x2a/0x30
 [<c10fda71>] ? inode_change_ok+0x91/0x140
 [<c115007d>] reiserfs_setattr+0x15d/0x2e0
 [<c10f9bf3>] ? dput+0xe3/0x140
 [<c1465adc>] ? _raw_spin_unlock+0x2c/0x50
 [<c117831d>] chown_one_xattr+0xd/0x10
 [<c11780a3>] reiserfs_for_each_xattr+0x113/0x2c0
 [<c1178310>] ? chown_one_xattr+0x0/0x10
 [<c14641e9>] ? mutex_lock_nested+0x2a9/0x350
 [<c117826f>] reiserfs_chown_xattrs+0x1f/0x60
 [<c106509a>] ? in_group_p+0x2a/0x30
 [<c10fda71>] ? inode_change_ok+0x91/0x140
 [<c1150046>] reiserfs_setattr+0x126/0x2e0
 [<c1177c20>] ? reiserfs_getxattr+0x0/0x90
 [<c11b0d57>] ? cap_inode_need_killpriv+0x37/0x50
 [<c10fde01>] notify_change+0x151/0x330
 [<c10e659f>] chown_common+0x6f/0x90
 [<c10e67bd>] sys_lchown+0x6d/0x80
 [<c1002ccc>] sysenter_do_call+0x12/0x32
---[ end trace 7c2b77224c1442fc ]---
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
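
The "lock once" idea can be sketched in a single-threaded model: the helper takes the lock only if it is not already held and reports whether it did, and the matching unlock releases only in that case. A nested call (like the recursive reiserfs_setattr() on the xattr files) therefore neither re-acquires nor prematurely releases the lock. The structure and function names are hypothetical, and a real implementation would also need to track which task owns the lock.

```c
/* Hypothetical filesystem-wide lock, modelled single-threaded. */
struct fs_lock_sketch {
	int held;   /* 1 while the (single) caller holds the lock */
};

/* Take the lock unless it is already held; return 1 if we took it. */
static int lock_once_sketch(struct fs_lock_sketch *l)
{
	if (l->held)
		return 0;
	l->held = 1;
	return 1;
}

/* Release only if the matching lock_once_sketch() call acquired it. */
static void unlock_once_sketch(struct fs_lock_sketch *l, int taken)
{
	if (taken)
		l->held = 0;
}
```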

sysfs: Add lockdep annotations for the sysfs active reference · 846f9974

Committed by Eric W. Biederman on Jan 02, 2010

Holding locks over device_del -> kobject_del -> sysfs_deactivate can
cause deadlocks if those same locks are grabbed in sysfs show or store
methods.

I model the s_active count + completion as a sleeping read/write lock.
I describe to lockdep sysfs_get_active as a read_trylock,
sysfs_put_active as a read_unlock, and sysfs_deactivate as a
write_lock and write_unlock pair.  This seems to capture the essence
for purposes of finding deadlocks, and in my testing it finds real
issues and ignores non-issues.

This brings us back to the point that holding locks over kobject_del
is a problem that ideally we should find a way of addressing, but at
least lockdep can tell us about the problems instead of requiring
developers to debug rare, strange system deadlocks that happen when
sysfs files are removed while being written to.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

04 Jan, 2010 (1 commit)

binfmt_elf_fdpic: Fix build breakage introduced by coredump changes. · 2f48912d

Committed by Daisuke HATAYAMA on Jan 04, 2010

Commit f6151dfe introduces build
breakage, so this patch fixes it together with some printk formatting
cleanup.
Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

03 Jan, 2010 (1 commit)

reiserfs: Fix mistake in down_write() conversion · f3e22f48

Committed by Frederic Weisbecker on Jan 03, 2010

Fix a mistake in commit 0719d343
(reiserfs: Fix reiserfs lock <-> i_xattr_sem dependency inversion)
that accidentally converted a down_write() into a down_read().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
