Commits · 04f2cbe35699d22dbf428373682ead85ca1240f5 · openeuler / raspberrypi-kernel

25 Jul, 2008 5 commits

hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE)... · 04f2cbe3

Committed by Mel Gorman on Jul 23, 2008

hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed

After patch 2 in this series, a process that successfully calls mmap() for
a MAP_PRIVATE mapping will be guaranteed to successfully fault until a
process calls fork(). At that point, the next write fault from the parent
could fail due to COW if the child still has a reference.

We only reserve pages for the parent but a copy must be made to avoid
leaking data from the parent to the child after fork(). Reserves could be
taken for both parent and child at fork time to guarantee faults but if
the mapping is large it is highly likely we will not have sufficient pages
for the reservation, and it is common to fork only to exec() immediately
after. A failure here would be very undesirable.

Note that the current behaviour of mainline with MAP_PRIVATE pages is
pretty bad. The following situation is allowed to occur today.

1. Process calls mmap(MAP_PRIVATE)
2. Process calls mlock() to fault all pages and makes sure it succeeds
3. Process forks()
4. Process writes to MAP_PRIVATE mapping while child still exists
5. If the COW fails at this point, the process gets SIGKILLed even though it
had taken care to ensure the pages existed
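
The sequence above can be sketched in user space. This is only an illustration under stated assumptions: it uses ordinary anonymous memory instead of a hugetlbfs file (so the COW allocation is not expected to fail), it omits the mlock() step, and `child_keeps_old_data` is a hypothetical helper name that does not appear in the patch. It shows why a copy must be made on the parent's write: the child has to keep seeing the pre-fork data.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the patch. Returns 1 if the child still
 * reads the pre-fork byte after the parent has written to its private
 * mapping, 0 if it does not, and -1 if the setup fails.
 */
int child_keeps_old_data(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    char *p;
    int go[2];
    int status;
    int wrote;
    pid_t pid;

    if (page <= 0)
        return -1;
    len = (size_t)page;

    /* Step 1: private mapping (ordinary anonymous memory in this sketch). */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 'A';                 /* fault the page in before fork() */

    if (pipe(go) != 0) {
        munmap(p, len);
        return -1;
    }

    /* Step 3: fork; the child now holds a reference to the same page. */
    pid = fork();
    if (pid < 0) {
        close(go[0]);
        close(go[1]);
        munmap(p, len);
        return -1;
    }
    if (pid == 0) {
        char c;

        close(go[1]);
        /* Block until the parent has done its write. */
        if (read(go[0], &c, 1) != 1)
            _exit(2);
        _exit(p[0] == 'A' ? 0 : 1);
    }

    /* Step 4: the parent's write triggers COW while the child exists. */
    close(go[0]);
    p[0] = 'B';
    wrote = (write(go[1], "x", 1) == 1);
    close(go[1]);

    if (waitpid(pid, &status, 0) != pid) {
        munmap(p, len);
        return -1;
    }
    munmap(p, len);

    if (!wrote || !WIFEXITED(status) || WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

With a hugetlbfs-backed mapping and a small pool, step 4 is the point where the allocation for the copy can fail.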

This patch improves the situation by guaranteeing the reliability of the
process that successfully calls mmap(). When the parent performs COW, it
will try to satisfy the allocation without using reserves. If that fails
the parent will steal the page leaving any children without a page.
Faults from the child after that point will result in failure. If the
child COW happens first, an attempt will be made to allocate the page
without reserves and the child will get SIGKILLed on failure.

To summarise the new behaviour:

1. If the original mapper performs COW on a private mapping with multiple
references, it will attempt to allocate a hugepage from the pool or
the buddy allocator without using the existing reserves. On fail, VMAs
mapping the same area are traversed and the page being COW'd is unmapped
where found. It will then steal the original page as the last mapper in
the normal way.

2. The VMAs the pages were unmapped from are flagged to note that pages
with data no longer exist. Future no-page faults on those VMAs will
terminate the process as otherwise it would appear that data was corrupted.
A warning is printed to the console that this situation occurred.

3. If the child performs COW first, it will attempt to satisfy the COW
from the pool if there are enough pages or via the buddy allocator if
overcommit is allowed and the buddy allocator can satisfy the request. If
it fails, the child will be killed.

If the pool is large enough, existing applications will not notice that
the reserves were a factor. Existing applications depending on the
no-reserves being set are unlikely to exist as for much of the history of
hugetlbfs, pages were prefaulted at mmap(), allocating the pages at that
point or failing the mmap().

[npiggin@suse.de: fix CONFIG_HUGETLB=n build]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

04f2cbe3

hugetlb: reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() · a1e78772

Committed by Mel Gorman on Jul 23, 2008

This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in
a similar manner to the reservations taken for MAP_SHARED mappings.  The
reserve count is accounted both globally and on a per-VMA basis for
private mappings.  This guarantees that a process that successfully calls
mmap() will successfully fault all pages in the future unless fork() is
called.

After this patch, private mappings of hugetlbfs files behave as
follows:

1. The process calling mmap() is guaranteed to succeed all future faults until
   it forks().
2. On fork(), the parent may die due to SIGKILL on writes to the private
   mapping if enough pages are not available for the COW. For reasonably
   reliable behaviour in the face of a small huge page pool, children of
   hugepage-aware processes should not reference the mappings; such as
   might occur when fork()ing to exec().
3. On fork(), the child VMAs inherit no reserves. Reads on pages already
   faulted by the parent will succeed. Successful writes will depend on enough
   huge pages being free in the pool.
4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper
   and at fault time otherwise.

Before this patch, all reads or writes in the child potentially need page
allocations that can later lead to the death of the parent.  This applies
to reads and writes of uninstantiated pages as well as COW.  After the
patch it is only a write to an instantiated page that causes problems.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a1e78772

fix soft lock up at NFS mount via per-SB LRU-list of unused dentries · da3bbdd4

Committed by Kentaro Makita on Jul 23, 2008

[Summary]

 Split LRU-list of unused dentries to one per superblock to avoid soft
 lock up during NFS mounts and remounting of any filesystem.

 Previously I posted here:
 http://lkml.org/lkml/2008/3/5/590

[Descriptions]

- background

  dentry_unused is a list of dentries which are not referenced.
  dentry_unused grows when references on directories or files are
  released.  This list can be very long if there is a lot of free memory.

- the problem

  When shrink_dcache_sb() is called, it scans all of dentry_unused linearly
  under spin_lock(), and if dentry->d_sb is different from the given
  superblock, it moves on to the next dentry.  This scan is very costly if
  there are many entries, and very inefficient if there are many superblocks.

  IOW, when we need to shrink the unused dentries of one superblock, we scan
  the unused dentries of all superblocks in the system.  For example, we scan
  500 dentries to unmount a filesystem, but also scan 1,000,000 or more unused
  dentries on other superblocks.

  In our case, when mounting NFS*, shrink_dcache_sb() is called to shrink
  unused dentries on NFS, but it scans 100,000,000 unused dentries on
  superblocks in the system such as local ext3 filesystems.  I hear NFS
  mounting took 1 min on some systems in use.

* : NFS uses a virtual filesystem in the rpc layer, so NFS is affected by
  this problem.

  100,000,000 is a possible number on large systems.

  A per-superblock LRU of unused dentries can reduce the cost in a
  reasonable manner.

- How to fix

  I found this problem is solved by David Chinner's "Per-superblock
  unused dentry LRU lists V3"(1), so I rebased it and added some fixes to
  reclaim with fairness, following Andrew Morton's comments(2).

  1) http://lkml.org/lkml/2006/5/25/318
  2) http://lkml.org/lkml/2006/5/25/320

  Split the LRU list of unused dentries into one per superblock.  Then, NFS
  mounting will check only the dentries under one superblock instead of all
  of them.  But this splitting will break the LRU ordering of dentry_unused.
  So, I've attempted to reclaim unused dentries fairly by calculating the
  number of dentries to scan on this sb in the following way

  number of dentries to scan on this sb =
  count * (number of dentries on this sb / number of dentries in the machine)
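
The formula above can be written out as a small helper. This is a minimal user-space sketch, not the code from the patch: the function name `sb_scan_share` and its parameter names are hypothetical, and the multiply-before-divide ordering is an assumption made here to avoid losing precision with integer arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the share of `count` dentries that should be
 * scanned on one superblock, proportional to how many of the machine's
 * unused dentries live on that superblock.
 */
uint64_t sb_scan_share(uint64_t count, uint64_t nr_on_sb, uint64_t nr_total)
{
    if (nr_total == 0)
        return 0;               /* nothing to scan anywhere */

    /*
     * Multiply first so that small per-sb counts do not round the ratio
     * down to zero. Assumes count * nr_on_sb fits in 64 bits.
     */
    return count * nr_on_sb / nr_total;
}
```

For example, with count = 1000 and a superblock holding 250 of 1000 unused dentries, the helper asks for 250 dentries to be scanned on that superblock.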

- ToDo
 - I have to measure performance numbers and do stress tests.

 - When an unmount occurs during prune_dcache() while it is scanning the
  same superblock, it is unable to reach the next superblock because the
  current one has gone away.  We restart scanning superblocks from the first
  one, which causes unfair reclaim of unused dentries on the first
  superblock.  But I think this happens very rarely.

- Test Results

  Result on 6GB boxes with excessive unused dentries.

Without patch:

$ cat /proc/sys/fs/dentry-state
10181835        10180203        45      0       0       0
# mount -t nfs 10.124.60.70:/work/kernel-src nfs
real    0m1.830s
user    0m0.001s
sys     0m1.653s

 With this patch:
$ cat /proc/sys/fs/dentry-state
10236610        10234751        45      0       0       0
# mount -t nfs 10.124.60.70:/work/kernel-src nfs
real    0m0.106s
user    0m0.002s
sys     0m0.032s

[akpm@linux-foundation.org: fix comments]
Signed-off-by: Kentaro Makita <k-makita@np.css.fujitsu.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: David Chinner <dgc@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

da3bbdd4

mm: remove double indirection on tlb parameter to free_pgd_range() & Co · 42b77728

Committed by Jan Beulich on Jul 23, 2008

The double indirection here is not needed anywhere and hence (at least)
confusing.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

42b77728

mm/vmstat.c: proper externs · c748e134

Committed by Adrian Bunk on Jul 23, 2008

This patch adds proper extern declarations for five variables in
include/linux/vmstat.h
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c748e134

23 Jul, 2008 3 commits

netns: make get_proc_net() static · 8086cd45

Committed by Adrian Bunk on Jul 22, 2008

get_proc_net() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8086cd45

proc: fix /proc/*/pagemap some more · ee1e6ab6

Committed by Alexey Dobriyan on Jul 21, 2008

struct pagemap_walk was placed on stack, some hooks are initialized, the
rest (->pgd_entry, ->pud_entry, ->pte_entry) are valid but junk.
Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ee1e6ab6

execve filename: document and export via auxiliary vector · 65191087

Committed by John Reiser on Jul 21, 2008

The Linux kernel puts the filename argument of execve() into the new
address space.  Many developers are surprised to learn this.  Those who
know and could use it, object "But it's not documented."

Those who want to use it dislike the expression
  (char *)(1+ strlen(env[-1+ n_env]) + env[-1+ n_env])
because it requires locating the last original environment variable,
and assumes that the filename follows the characters.

This patch documents the insertion of the filename, and makes it easier
to find by adding a new tag AT_EXECFN in the ElfXX_auxv_t; see <elf.h>.

In many cases readlink("/proc/self/exe",) gives the same answer.  But if
all the original pages get unmapped, then the kernel erases the symlink
for /proc/self/exe.  This can happen when a program decompressor does a
good job of cleaning up after uncompressing directly to memory, so that
the address space of the target program looks the same as if compression
had never happened.  One example is http://upx.sourceforge.net .

One notable use of the underlying concept (what path containED the
executable) is glibc expanding $ORIGIN in DT_RUNPATH.  In practice for
the near term, it may be a good idea for user-mode code to use both
/proc/self/exe and AT_EXECFN as fall-back methods for each other.
/proc/self/exe can fail due to unmapping, AT_EXECFN can fail because it
won't be present on non-new systems.  The auxvec or {AT_EXECFN}.d_val
also can get overwritten, although in nearly all cases this would be the
result of a bug.
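
The mutual fall-back suggested above might look like the following user-space sketch. It relies on `getauxval()` from glibc's `<sys/auxv.h>`, which is a later convenience and not part of this patch; on a libc without it the auxiliary vector would have to be walked by hand. The helper name `exec_path` is hypothetical. Note that AT_EXECFN holds the filename as it was passed to execve(), so it can be a relative path, whereas /proc/self/exe resolves to an absolute one.

```c
#define _DEFAULT_SOURCE
#include <elf.h>        /* AT_EXECFN */
#include <string.h>
#include <sys/auxv.h>   /* getauxval(), assumed available (glibc) */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy a path for the running executable into buf.
 * Tries /proc/self/exe first and falls back to AT_EXECFN.
 * Returns 0 on success, -1 if neither method yields a usable path.
 */
int exec_path(char *buf, size_t size)
{
    ssize_t n;
    const char *fn;

    if (buf == NULL || size == 0)
        return -1;

    /* readlink() does not NUL-terminate, so leave room and do it here. */
    n = readlink("/proc/self/exe", buf, size - 1);
    if (n > 0) {
        buf[n] = '\0';
        return 0;
    }

    /* Fall back to the filename recorded in the auxiliary vector. */
    fn = (const char *)getauxval(AT_EXECFN);
    if (fn != NULL && strlen(fn) < size) {
        strcpy(buf, fn);
        return 0;
    }
    return -1;
}
```

A caller that needs an absolute path still has to resolve a relative AT_EXECFN value itself, for instance against the working directory at startup.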

The runtime cost is one NEW_AUX_ENT using two words of stack space.  The
underlying value is maintained already as bprm->exec; setup_arg_pages()
in fs/exec.c slides it for stack_shift, etc.
Signed-off-by: John Reiser <jreiser@BitWagon.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

65191087

22 Jul, 2008 5 commits

driver core: Suppress sysfs warnings for device_rename(). · 36ce6dad

Committed by Cornelia Huck on Jun 10, 2008

driver core: Suppress sysfs warnings for device_rename().

Renaming network devices to an already existing name is not
something we want sysfs to print a scary warning for, since the
callers can deal with this correctly. So let's introduce
sysfs_create_link_nowarn() which gets rid of the common warning.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

36ce6dad

debugfs: Implement debugfs_remove_recursive() · 9505e637

Committed by Haavard Skinnemoen on Jul 01, 2008

debugfs_remove_recursive() will remove a dentry and all its children.
Drivers can use this to zap their whole debugfs tree so that they don't
need to keep track of every single debugfs dentry they created.

It may fail to remove the whole tree in certain cases:

sh-3.2# rmmod atmel-mci < /sys/kernel/debug/mmc0/ios/clock
mmc0: card b368 removed
atmel_mci atmel_mci.0: Lost dma0chan1, falling back to PIO
sh-3.2# ls /sys/kernel/debug/mmc0/
ios

But I'm not sure if that case can be handled in any sane manner.
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
Cc: Pierre Ossman <drzeus-list@drzeus.cx>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

9505e637

sysfs: don't call notify_change · 93265d13

Committed by Miklos Szeredi on Jun 16, 2008

sysfs_chmod_file() calls notify_change() to change the permission bits
on a sysfs file.  Replace with explicit call to sysfs_setattr() and
fsnotify_change().

This is equivalent, except that security_inode_setattr() is not
called.  This function is called by drivers, so the security checks do
not make any sense.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

93265d13

driver core: remove KOBJ_NAME_LEN define · aab0de24

Committed by Kay Sievers on May 02, 2008

Kobjects have not had a limit on name size for a while, so stop
pretending that they do.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

aab0de24

device create: coda: convert device_create to device_create_drvdata · 6143b599

Committed by Greg Kroah-Hartman on May 21, 2008

device_create() is race-prone, so use the race-free
device_create_drvdata() instead as device_create() is going away.

Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

6143b599

21 Jul, 2008 1 commit

tty: Ldisc revamp · a352def2

Committed by Alan Cox on Jul 16, 2008

Move the line disciplines towards a conventional ->ops arrangement. For
the moment the actual 'tty_ldisc' struct in the tty is kept as part of
the tty struct but this can then be changed if it turns out that when it
all settles down we want to refcount ldiscs separately to the tty.

Pull the ldisc code out of /proc and put it with our ldisc code.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a352def2

19 Jul, 2008 2 commits

nfsd: nfs4xdr.c do-while is not a compound statement · 5108b276

Committed by Harvey Harrison on Jul 17, 2008

The WRITEMEM macro produces sparse warnings of the form:
fs/nfsd/nfs4xdr.c:2668:2: warning: do-while statement is not a compound statement
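
As a rough illustration of the kind of construct this warning is about (the macros below are hypothetical and are not the WRITEMEM definition): a do-while whose body is a bare statement is legal C, but its body is not a compound statement. Wrapping the body in braces keeps the behaviour and gives the loop a compound body.

```c
#include <stddef.h>
#include <string.h>

/* Body is a bare `if` statement: legal C, but not a braced block. */
#define COPY_IF_NONEMPTY_BARE(dst, src, n) \
    do if ((n) > 0) memcpy((dst), (src), (n)); while (0)

/* Same behaviour with the body wrapped in a compound statement. */
#define COPY_IF_NONEMPTY(dst, src, n) \
    do { if ((n) > 0) memcpy((dst), (src), (n)); } while (0)

/*
 * Hypothetical demo: copy n bytes with each macro in turn and return n.
 * Both expansions behave identically at run time.
 */
size_t copy_demo(char *dst, const char *src, size_t n)
{
    COPY_IF_NONEMPTY_BARE(dst, src, n);
    COPY_IF_NONEMPTY(dst, src, n);
    return n;
}
```
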
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

5108b276

nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c · ad1060c8

Committed by J. Bruce Fields on Jul 18, 2008

Thanks to problem report and original patch from Harvey Harrison.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benny Halevy <bhalevy@panasas.com>

ad1060c8

18 Jul, 2008 4 commits

proc: consolidate per-net single-release callers · b6fcbdb4

Committed by Pavel Emelyanov on Jul 18, 2008

They are symmetrical to single_open ones :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b6fcbdb4

proc: consolidate per-net single_open callers · de05c557

Committed by Pavel Emelyanov on Jul 18, 2008

There are already 7 of them - time to kill some duplicate code.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

de05c557

configfs: Allow ->make_item() and ->make_group() to return detailed errors. · a6795e9e

Committed by Joel Becker on Jul 17, 2008

The configfs operations ->make_item() and ->make_group() currently
return a new item/group.  A return of NULL signifies an error.  Because
of this, -ENOMEM is the only return code bubbled up the stack.

Multiple folks have requested the ability to return specific error codes
when these operations fail.  This patch adds that ability by changing the
->make_item/group() ops to return ERR_PTR() values.  These errors are
bubbled up appropriately.  NULL returns are changed to -ENOMEM for
compatibility.
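
The error-pointer convention described above can be sketched in user space as follows. ERR_PTR(), PTR_ERR() and IS_ERR() here are stand-ins written for this example and modelled on the kernel helpers of the same names; `struct item`, `make_item()` and `create_item()` are hypothetical and only mimic the shape of a configfs ->make_item() op and its caller.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error codes occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct item {
    int id;
};

static struct item the_item;

/* Hypothetical ->make_item()-style op: returns an item or ERR_PTR(). */
struct item *make_item(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return ERR_PTR(-EINVAL);    /* a specific error, not just NULL */
    the_item.id = 1;
    return &the_item;
}

/* Hypothetical caller: bubbles the error up, mapping NULL to -ENOMEM. */
int create_item(const char *name, struct item **out)
{
    struct item *it = make_item(name);

    if (it == NULL)
        return -ENOMEM;             /* compatibility with old NULL returns */
    if (IS_ERR(it))
        return (int)PTR_ERR(it);
    *out = it;
    return 0;
}
```

The point of the convention is that a single pointer return value can carry either a valid object or a specific negative errno.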

Also updated are the in-kernel users of configfs.

This is a rework of reverted commit 11c3b792.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

a6795e9e

Revert "configfs: Allow ->make_item() and ->make_group() to return detailed errors." · f89ab861

Committed by Joel Becker on Jul 17, 2008

This reverts commit 11c3b792.  The code
will move to PTR_ERR().
Signed-off-by: Joel Becker <joel.becker@oracle.com>

f89ab861

17 Jul, 2008 2 commits

[PATCH] ocfs2: fix oops in mmap_truncate testing · c0420ad2

Committed by Coly Li on Jun 30, 2008

This patch fixes a mmap_truncate bug which was found by the ocfs2 test suite.

In an ocfs2 cluster with more than one node, run the program mmap_truncate,
which races mmap writes and truncates from multiple processes. While the test
is running, a stat from another node forces writeout, causing an oops in
ocfs2_get_block() because it sees a buffer to write which isn't allocated.

This patch fixes the bug by clearing the dirty and uptodate bits in the
buffer, leaving the buffer unmapped and returning.

The fix was suggested by Mark Fasheh, and I coded up the patch.
Signed-off-by: Coly Li <coyli@suse.de>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

c0420ad2

Fix compile issues in fs/compat_ioctl.c when CONFIG_BLOCK is disabled · 3c3622dc

Committed by Randy Dunlap on Jul 16, 2008

Fix fs/compat_ioctl.c to handle CONFIG_BLOCK=n, CONFIG_SCSI=n to avoid
build errors:

In file included from include/scsi/scsi.h:12,
                 from fs/compat_ioctl.c:71:
include/scsi/scsi_cmnd.h:27:25: warning: "BLK_MAX_CDB" is not defined
include/scsi/scsi_cmnd.h:28:3: error: #error MAX_COMMAND_SIZE can not be bigger than BLK_MAX_CDB
In file included from include/scsi/scsi.h:12,
                 from fs/compat_ioctl.c:71:
include/scsi/scsi_cmnd.h: In function 'scsi_bidi_cmnd':
include/scsi/scsi_cmnd.h:182: error: implicit declaration of function 'blk_bidi_rq'
include/scsi/scsi_cmnd.h:183: error: dereferencing pointer to incomplete type
include/scsi/scsi_cmnd.h: In function 'scsi_in':
include/scsi/scsi_cmnd.h:189: error: dereferencing pointer to incomplete type
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3c3622dc

16 Jul, 2008 18 commits

NFSv4: Remove BKL from the nfsv4 state recovery · f839c4c1

Committed by Trond Myklebust on Jun 11, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f839c4c1

SUNRPC: Remove the BKL from the callback functions · a86dc496

Committed by Trond Myklebust on Jun 11, 2008

Push it into those callback functions that actually need it.

Note that all the NFS operations use their own locking, so don't need the
BKL. Ditto for the rpcbind client.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a86dc496

NFS: Remove BKL from the readdir code · c3cc8c01

Committed by Trond Myklebust on Jun 11, 2008

Page accesses are serialised using the page locks, whereas all attribute
updates are serialised using the inode->i_lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

c3cc8c01

NFS: Remove BKL from the symlink code · 76566991

Committed by Trond Myklebust on Jun 11, 2008

Page cache accesses are serialised using page locks, whereas attribute
updates are serialised using inode->i_lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

76566991

NFS: Remove BKL from the sillydelete operations · 52e2e8d3

Committed by Trond Myklebust on Jun 11, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

52e2e8d3

NFS: Remove the BKL from the rename, rmdir and unlink operations · bd9bb454

Committed by Trond Myklebust on Jun 11, 2008

Attribute updates are safe, and dentry operations are protected using VFS
level locks. Defer removing the BKL from sillyrename until a separate
patch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

bd9bb454

NFS: Remove BKL from NFS lookup code · fc0f684c

Committed by Trond Myklebust on Jun 11, 2008

All dentry-related operations are already BKL-safe, since they are
protected by the VFS locking. No extra locks should be needed in the NFS
code.

In the case of nfs_revalidate_inode(), we're only doing an attribute
update (protected by the inode->i_lock).
In the case of nfs_lookup(), we're instantiating a new dentry, so there
should be no contention possible until after we call d_materialise_unique.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fc0f684c

NFS: Remove the BKL from nfs_link() · fc81af53

Committed by Trond Myklebust on Jun 11, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fc81af53

NFS: Remove the BKL from the inode creation operations · f1e2eda2

Committed by Trond Myklebust on Jun 11, 2008

nfs_instantiate() does not require the BKL, neither do the attribute
updates or the RPC code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f1e2eda2

NFS: Remove BKL usage from open() · bba67e0e

Committed by Trond Myklebust on Jun 11, 2008

All the NFSv4 stateful operations are already protected by other locks (in
particular by the rpc_sequence locks).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

bba67e0e

NFS: Remove BKL usage from the write path · b6a2e569

Committed by Trond Myklebust on Jun 11, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b6a2e569

NFS: Remove the BKL from the permission checking code · 4d80f2ec

Committed by Trond Myklebust on Jun 11, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

4d80f2ec

NFS: Remove attribute update related BKL references · fa6dc9dc

Committed by Trond Myklebust on Jun 11, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fa6dc9dc

NFS: Remove BKL requirement from attribute updates · a3d01454

Committed by Trond Myklebust on Jun 11, 2008

The main problem is dealing with inode->i_size: we need to set the
inode->i_lock on all attribute updates, and so vmtruncate won't cut it.
Make an NFS-private version of vmtruncate that has the necessary locking
semantics.

The result should be that the following inode attribute updates are
protected by inode->i_lock
	nfsi->cache_validity
	nfsi->read_cache_jiffies
	nfsi->attrtimeo
	nfsi->attrtimeo_timestamp
	nfsi->change_attr
	nfsi->last_updated
	nfsi->cache_change_attribute
	nfsi->access_cache
	nfsi->access_cache_entry_lru
	nfsi->access_cache_inode_lru
	nfsi->acl_access
	nfsi->acl_default
	nfsi->nfs_page_tree
	nfsi->ncommit
	nfsi->npages
	nfsi->open_files
	nfsi->silly_list
	nfsi->acl
	nfsi->open_states
	inode->i_size
	inode->i_atime
	inode->i_mtime
	inode->i_ctime
	inode->i_nlink
	inode->i_uid
	inode->i_gid

The following is protected by dir->i_mutex
	nfsi->cookieverf
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a3d01454

NFS: Protect inode->i_nlink updates using inode->i_lock · 1b83d707

Committed by Trond Myklebust on Jun 11, 2008

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

1b83d707

nfs: set correct fl_len in nlmclnt_test() · d67d1c7b

Committed by Felix Blyakher on Jul 15, 2008

fcntl(F_GETLK) on an nfs client incorrectly returns
the values for the conflicting lock. The fl_len value is
always 1.
If the conflicting lock is (0, 4095), the F_GETLK
request for (1024, 10) returns (0, 1), which doesn't
even cover the requested range, and is quite confusing.
The fix is trivial: set fl_end from the fl_end value
received from the nfs server.
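
To show why the end of the range matters, here is a small user-space sketch of how an inclusive byte range [start, end] maps onto the l_start/l_len pair that F_GETLK reports in struct flock. The helper name `range_to_flock` is hypothetical and the numbers in the usage note are chosen for this example, not taken from the report above; the sketch also ignores the "lock to end of file" case.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: describe the inclusive byte range [start, end]
 * as a struct flock. If end is never filled in and stays equal to
 * start, the computed length is 1 whatever the real lock covers.
 */
void range_to_flock(off_t start, off_t end, struct flock *fl)
{
    fl->l_whence = SEEK_SET;
    fl->l_start = start;
    fl->l_len = end - start + 1;    /* inclusive range, hence the +1 */
}
```

For instance, a lock covering bytes 0 through 4095 inclusive yields l_start = 0 and l_len = 4096, while a stale end equal to start yields l_len = 1.
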
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

d67d1c7b

lockd: Pass "struct sockaddr *" to new failover-by-IP function · 367c8c7b

Committed by Chuck Lever on Jun 30, 2008

Pass a more generic socket address type to nlmsvc_unlock_all_by_ip() to
allow for future support of IPv6.  Also provide additional sanity
checking in failover_unlock_ip() when constructing the server's IP
address.

As an added bonus, provide clean kerneldoc comments on related NLM
interfaces which were recently added.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

367c8c7b

lockd: get host reference in nlmsvc_create_block() instead of callers · 560de0e6

Committed by J. Bruce Fields on Jul 15, 2008

It may not be obvious (till you look at the definition of
nlm_alloc_call()) that a function like nlmsvc_create_block() should
consume a reference on success or failure, so I find it clearer if it
takes the reference it needs itself.

And both callers already do this immediately before the call anyway.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

560de0e6