Commits · 17cadc95372e28024be0874e67329c1862912c5d · OpenHarmony / kernel_linux

10 Oct, 2007 23 commits

NFS: Don't force a dcache revalidation if nfs_wcc_update_inode succeeds · 17cadc95

Committed by Trond Myklebust on Sep 27, 2007

The reason is that if the weak cache consistency update was successful,
then we know that our client must be the only one that changed the
directory, and we've already updated the dcache to reflect the change.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

17cadc95

NFS: fix nfs_verify_change_attribute · 7957c141

Committed by Trond Myklebust on Sep 28, 2007

We always want to check that the verifier and directory
cache_change_attribute match. This also allows us to remove the 'wraparound
hack' for the cache_change_attribute. If we're only checking for equality,
then we don't care about wraparound issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

7957c141

NFSv4: Make NFSv4 ACCESS calls return attributes too... · 76b32999

Committed by Trond Myklebust on Aug 10, 2007

It doesn't really make sense to cache an access call without also
revalidating the attributes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

76b32999

NFSv4: Simplify _nfs4_do_access() · af22f94a

Committed by Trond Myklebust on Aug 10, 2007

Currently, _nfs4_do_access() is just a copy of nfs_do_access() with added
conversion of the open flags into an access mask. This patch merges the
duplicate functionality.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

af22f94a

NFS: Replace file->private_data with calls to nfs_file_open_context() · cd3758e3

Committed by Trond Myklebust on Aug 10, 2007

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

cd3758e3

NFS: Add a helper to extract the nfs_open_context from a struct file · c03025d5

Committed by Trond Myklebust on Aug 10, 2007

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

c03025d5

RPCRDMA: rpc rdma transport switch · f58851e6

Committed by "Talpey, Thomas" on Sep 10, 2007

This implements the configuration and building of the core transport
switch implementation of the rpcrdma transport. Stubs are provided for
the rpcrdma protocol handling, and the infiniband/iwarp verbs interface.
These are provided in following patches.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f58851e6

RPCRDMA: Kconfig and header file with rpcrdma protocol definitions · c3a57ed7

Committed by "Talpey, Thomas" on Sep 10, 2007

This file implements the configuration target, protocol template and
constants for the rpcrdma transport framing, for use by the xprtrdma
rpc transport implementation.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

c3a57ed7

NFS/SUNRPC: support transport protocol naming · 4fa016eb

Committed by "Talpey, Thomas" on Sep 10, 2007

To prepare for including non-sockets-based RPC transports, select
RPC transports by an identifier (to be used in following patches).
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

4fa016eb

SUNRPC: rearrange RPC sockets definitions · 49c36fcc

Committed by "Talpey, Thomas" on Sep 10, 2007

To prepare for including non-sockets-based RPC transports, move the
sockets-dependent definitions into their own file.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

49c36fcc

SUNRPC: rename the rpc_xprtsock_create structure · 3c341b0b

Committed by "Talpey, Thomas" on Sep 10, 2007

To prepare for including non-sockets-based RPC transports, change the
overly suggestive name of the transport creation arguments struct.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

3c341b0b

SUNRPC: Provide a new API for registering transport implementations · 81c098af

Committed by "Talpey, Thomas" on Sep 10, 2007

To allow transport capabilities to be loaded dynamically, provide an API
for registering and unregistering the transports with the RPC client.
Eventually xprt_create_transport() will be changed to search the list of
registered transports when initializing a fresh transport.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

81c098af

SUNRPC: mark bulk read/write data in xdrbuf · 4f22ccc3

Committed by "Talpey, Thomas" on Sep 10, 2007

Adds a flag word to the xdrbuf struct which indicates any bulk
disposition of the data. This enables RPC transport providers to
marshal it efficiently/appropriately, and may enable other
optimizations.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

4f22ccc3

SUNRPC: export per-transport rpcbind netid's · 4417c8c4

Committed by "Talpey, Thomas" on Sep 10, 2007

The rpcbind (v3+) netid is provided by each RPC client transport. This fixes
an omission in IPv6 rpcbind client support, and enables future extension.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

4417c8c4

SUNRPC: move per-transport rpcbind netid's · 4f40ee4a

Committed by "Talpey, Thomas" on Sep 10, 2007

Move the TCP/UDP rpcbind netid's from the rpcbind client to a global header.
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

4f40ee4a

SUNRPC: fix a signed v. unsigned comparison nit in rpc_bind_new_program · 89eb21c3

Committed by Chuck Lever on Sep 11, 2007

/home/cel/linux/net/sunrpc/clnt.c: In function ‘rpc_bind_new_program’:
/home/cel/linux/net/sunrpc/clnt.c:445: warning:
	comparison between signed and unsigned

RPC version numbers are u32, not int.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

89eb21c3
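
The warning quoted in 89eb21c3 comes from C's implicit conversion rules: when an int is compared with an unsigned int, the int is converted to unsigned first. The sketch below is only an illustration in user-space C, not the kernel code; `less_mixed` and `less_u32` are hypothetical names, and the explicit cast merely spells out what the compiler would otherwise do silently.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mixed comparison: with an int on the left and an unsigned int on the
 * right, the int is converted to unsigned before comparing. The cast
 * below makes that implicit conversion visible, so -1 becomes UINT_MAX.
 */
static bool less_mixed(int vers, unsigned int nrvers)
{
	return (unsigned int)vers < nrvers;
}

/* Same-type comparison: both operands are 32-bit unsigned, nothing is converted. */
static bool less_u32(uint32_t vers, uint32_t nrvers)
{
	return vers < nrvers;
}
```

With these helpers, `less_mixed(-1, 4u)` is false even though -1 is mathematically smaller than 4. Declaring the value as an unsigned 32-bit quantity from the start removes the mixed comparison altogether.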

SUNRPC: Add support for formatted universal addresses · 756805e7

Committed by Chuck Lever on Aug 16, 2007

"Universal addresses" are a string representation of an IP address and
port. They are described fully in RFC 3530, section 2.2. Add support
for generating them in the RPC client's socket transport module.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

756805e7
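
As a rough illustration of the universal address format mentioned in 756805e7, the IPv4 form is commonly written as `h1.h2.h3.h4.p1.p2`, where `p1` and `p2` are the high and low bytes of the port. The sketch below is a user-space approximation, not the kernel's implementation; `format_universal_addr` is a hypothetical name, and it assumes the address and port are passed in host byte order.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format an IPv4 address and port as "h1.h2.h3.h4.p1.p2".
 * Assumption: addr and port are in host byte order.
 * Returns the number of characters snprintf() would have written.
 */
static int format_universal_addr(char *buf, size_t len,
				 uint32_t addr, uint16_t port)
{
	return snprintf(buf, len, "%u.%u.%u.%u.%u.%u",
			(unsigned int)((addr >> 24) & 0xff),
			(unsigned int)((addr >> 16) & 0xff),
			(unsigned int)((addr >> 8) & 0xff),
			(unsigned int)(addr & 0xff),
			(unsigned int)(port >> 8),    /* p1: high byte of the port */
			(unsigned int)(port & 0xff)); /* p2: low byte of the port */
}
```

For example, 192.168.0.1 with port 2049 (8 * 256 + 1) is rendered as `192.168.0.1.8.1`.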

SUNRPC: Add hex-formatted address support to rpc_peeraddr2str() · fbfe3cc6

Committed by Chuck Lever on Aug 06, 2007

Add support for the NFS client's need to export volume information
with IP addresses formatted in hex instead of decimal.

This isn't used yet, but subsequent patches (not in this series) will
change the NFS client to use this functionality.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fbfe3cc6

Re: [NFS] [PATCH] Attribute timeout handling and wrapping u32 jiffies · c7e15961

Committed by Fabio Olive Leite on Jul 26, 2007

I would like to discuss the idea that the current checks for attribute
timeout using time_after are inadequate for 32bit architectures, since
time_after works correctly only when the two timestamps being compared
are within 2^31 jiffies of each other. The signed overflow caused by
comparing values more than 2^31 jiffies apart will flip the result,
causing incorrect assumptions of validity.

2^31 jiffies is a fairly large period of time (~25 days) when compared
to the lifetime of most kernel data structures, but for long lived NFS
mounts that can sit idle for months (think that for some reason autofs
cannot be used), it is easy to compare inode attribute timestamps with
very disparate or even bogus values (as in when jiffies have wrapped
many times, where the comparison doesn't even make sense).

Currently the code tests for attribute timeout by simply adding the
desired amount of jiffies to the stored timestamp and comparing that
with the current timestamp of obtained attribute data with time_after.
This is incorrect, as it returns true for the desired timeout period
and another full 2^31 range of jiffies.

In testing with artificial jumps (several small jumps, not one big
crank) of the jiffies I was able to reproduce a problem found in a
server with very long lived NFS mounts, where attributes would not be
refreshed even after touching files and directories in the server:

Initial uptime:
03:42:01 up 6 min, 0 users, load average: 0.01, 0.12, 0.07

NFS volume is mounted and time is advanced:
03:38:09 up 25 days, 2 min, 0 users, load average: 1.22, 1.05, 1.08

# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r--  1 root root 0 Dec 17 03:38 /local/A/foo/bar
-rw-r--r--  1 root root 0 Nov 22 00:36 /nfs/A/foo/bar

# touch /local/A/foo/bar

# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r--  1 root root 0 Dec 17 03:47 /local/A/foo/bar
-rw-r--r--  1 root root 0 Nov 22 00:36 /nfs/A/foo/bar

We can see the local mtime is updated, but the NFS mount still shows
the old value. The patch below makes it work:

Initial setup...
07:11:02 up 25 days, 1 min,  0 users,  load average: 0.15, 0.03, 0.04

# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r--  1 root root 0 Jan 11 07:11 /local/A/foo/bar
-rw-r--r--  1 root root 0 Jan 11 07:11 /nfs/A/foo/bar

# touch /local/A/foo/bar

# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r--  1 root root 0 Jan 11 07:14 /local/A/foo/bar
-rw-r--r--  1 root root 0 Jan 11 07:14 /nfs/A/foo/bar
Signed-off-by: Fabio Olive Leite <fleite@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

c7e15961
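
The wraparound problem described in c7e15961 can be reproduced in user space with 32-bit arithmetic. The sketch below is an illustration under stated assumptions, not the patch itself: the `*32` helpers are hypothetical stand-ins that mimic how jiffies comparisons behave when long is 32 bits wide, and the two-sided check is one plausible way to bound the comparison on both ends.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit stand-in for time_after(a, b): true if a is later than b. */
static bool time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* 32-bit stand-in for time_after_eq(a, b): true if a is b or later. */
static bool time_after_eq32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* True if lo <= a <= hi in wrapping 32-bit time. */
static bool time_in_range32(uint32_t a, uint32_t lo, uint32_t hi)
{
	return time_after_eq32(a, lo) && time_after_eq32(hi, a);
}

/* One-sided check: expired only if now is later than stamp + timeout. */
static bool attr_expired_one_sided(uint32_t now, uint32_t stamp,
				   uint32_t timeout)
{
	return time_after32(now, stamp + timeout);
}

/* Two-sided check: valid only while now lies within [stamp, stamp + timeout]. */
static bool attr_expired_two_sided(uint32_t now, uint32_t stamp,
				   uint32_t timeout)
{
	return !time_in_range32(now, stamp, stamp + timeout);
}
```

With a stamp of 1000 and a timeout of 3000 ticks, both checks agree for nearby times. Once `now` is more than 2^31 ticks past the stamp, the one-sided check reports the attributes as still valid while the two-sided check reports them as expired. The two-sided form still cannot tell apart times that differ by a full 2^32 wrap.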

NFS: Fall back to synchronous writes when a background write errors... · 7b159fc1

Committed by Trond Myklebust on Jul 25, 2007

This helps prevent huge queues of background writes from building up
whenever the server runs out of disk or quota space, or if someone changes
the file access modes behind our backs.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

7b159fc1

NFS: Clean up NFS writeback flush code · ed90ef51

Committed by Trond Myklebust on Jul 20, 2007

The only user of nfs_sync_mapping_range() is nfs_getattr(), which uses it
to flush out the entire inode without sending a commit. We therefore
replace nfs_sync_mapping_range with a more appropriate helper.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

ed90ef51

VFS: Remove writeback_control->fs_private · 90e9a3f9

Committed by Trond Myklebust on Jul 22, 2007

The only user of this field was NFS.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

90e9a3f9

NFS: Clean up write code... · 9cccef95

Committed by Trond Myklebust on Jul 22, 2007

The addition of nfs_page_mkwrite means that we should no longer need to
create requests inside nfs_writepage().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

9cccef95

09 Oct, 2007 1 commit

mm: set_page_dirty_balance() vs ->page_mkwrite() · a200ee18

Committed by Peter Zijlstra on Oct 08, 2007

All the current page_mkwrite() implementations also set the page dirty. Which
results in the set_page_dirty_balance() call to _not_ call balance, because the
page is already found dirty.

This allows us to dirty a _lot_ of pages without ever hitting
balance_dirty_pages().  Not good (tm).

Force a balance call if ->page_mkwrite() was successful.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a200ee18

08 Oct, 2007 1 commit

Don't do load-average calculations at even 5-second intervals · 0c2043ab

Committed by Linus Torvalds on Oct 07, 2007

It turns out that there are a few other five-second timers in the
kernel, and if the timers get in sync, the load-average can get
artificially inflated by events that just happen to coincide.

So just offset the load average calculation by a timer tick.

Noticed by Anders Boström, for whom the coincidence started triggering
on one of his machines with the JBD jiffies rounding code (JBD is one of
the subsystems that also end up using a 5-second timer by default).
Tested-by: Anders Boström <anders@bostrom.dyndns.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0c2043ab

27 Sep, 2007 1 commit

Revert "[PATCH] x86-64: fix x86_64-mm-sched-clock-share" · ff0ce684

Committed by Linus Torvalds on Sep 26, 2007

This reverts commit 184c44d2.

As noted by Dave Jones:
   "Linus, please revert the above cset.  It doesn't seem to be
    necessary (it was added to fix a miscompile in 'make allnoconfig'
    which doesn't seem to be repeatable with it reverted) and actively
    breaks the ARM SA1100 framebuffer driver."
Requested-by: Dave Jones <davej@redhat.com>
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ff0ce684

21 Sep, 2007 1 commit

signalfd simplification · b8fceee1

Committed by Davide Libenzi on Sep 20, 2007

This simplifies the signalfd code by no longer keeping it attached to the
sighand for its whole lifetime.

This way, the signalfd remains attached to the sighand only during
poll(2) (and select and epoll) and read(2).  This also allows us to remove
all the custom "tsk == current" checks in kernel/signal.c, since
dequeue_signal() will only be called by "current".

I think this is also what Ben was suggesting some time ago.

The external effect of this is that a thread can extract only its own
private signals and the group ones.  I think this is acceptable
behaviour, in that those are the signals the thread would be able to
fetch w/out signalfd.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b8fceee1

20 Sep, 2007 4 commits

sched: add /proc/sys/kernel/sched_compat_yield · 1799e35d

Committed by Ingo Molnar on Sep 19, 2007

Add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield()
more aggressive, by moving the yielding task to the last position
in the rbtree.

with sched_compat_yield=0:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  2539 mingo     20   0  1576  252  204 R   50  0.0   0:02.03 loop_yield
  2541 mingo     20   0  1576  244  196 R   50  0.0   0:02.05 loop

with sched_compat_yield=1:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  2584 mingo     20   0  1576  248  196 R   99  0.0   0:52.45 loop
  2582 mingo     20   0  1576  256  204 R    0  0.0   0:00.00 loop_yield
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

1799e35d

Fix NUMA Memory Policy Reference Counting · 480eccf9

Committed by Lee Schermerhorn on Sep 18, 2007

This patch proposes fixes to the reference counting of memory policy in the
page allocation paths and in show_numa_map().  Extracted from my "Memory
Policy Cleanups and Enhancements" series as stand-alone.

Shared policy lookup [shmem] has always added a reference to the policy,
but this was never unrefed after page allocation or after formatting the
numa map data.

Default system policy should not require additional ref counting, nor
should the current task's task policy.  However, show_numa_map() calls
get_vma_policy() to examine what may be [likely is] another task's policy.
The latter case needs protection against freeing of the policy.

This patch adds a reference count to a mempolicy returned by
get_vma_policy() when the policy is a vma policy or another task's
mempolicy.  Again, shared policy is already reference counted on lookup.  A
matching "unref" [__mpol_free()] is performed in alloc_page_vma() for
shared and vma policies, and in show_numa_map() for shared and another
task's mempolicy.  We can call __mpol_free() directly, saving an admittedly
inexpensive inline NULL test, because we know we have a non-NULL policy.

Handling policy ref counts for hugepages is a bit trickier.
huge_zonelist() returns a zone list that might come from a shared or vma
'BIND policy.  In this case, we should hold the reference until after the
huge page allocation in dequeue_hugepage().  The patch modifies
huge_zonelist() to return a pointer to the mempolicy if it needs to be
unref'd after allocation.

Kernel Build [16cpu, 32GB, ia64] - average of 10 runs:

                 w/o patch           w/ refcount patch
              Avg     Std Devn       Avg     Std Devn
Real:       100.59      0.38       100.63      0.43
User:      1209.60      0.37      1209.91      0.31
System:      81.52      0.42        81.64      0.34
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

480eccf9

Fix user namespace exiting OOPs · 28f300d2

Committed by Pavel Emelyanov on Sep 18, 2007

It turned out, that the user namespace is released during the do_exit() in
exit_task_namespaces(), but the struct user_struct is released only during the
put_task_struct(), i.e.  MUCH later.

On debug kernels with poisoned slabs this will cause the oops in
uid_hash_remove() because the head of the chain, which resides inside the
struct user_namespace, will be already freed and poisoned.

Since the uid hash itself is required only when someone can search it, i.e.
when the namespace is alive, we can safely unhash all the user_struct-s from
it during the namespace exiting.  The subsequent free_uid() will complete the
user_struct destruction.

For example, this simple program

   #include <sched.h>

   char stack[2 * 1024 * 1024];

   int f(void *foo)
   {
   	return 0;
   }

   int main(void)
   {
   	clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0);
   	return 0;
   }

run on a kernel with CONFIG_USER_NS turned on will oops the
kernel immediately.

This was spotted during OpenVZ kernel testing.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

28f300d2

Convert uid hash to hlist · 735de223

Committed by Pavel Emelyanov on Sep 18, 2007

Surprisingly (as spotted by Alexey Dobriyan), the uid hash still uses
list_heads, thus occupying twice as much space as it could.  Convert it to
hlist_heads.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

735de223
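
The space saving claimed in 735de223 follows from the shape of the two list head types: a doubly linked `list_head` carries two pointers per bucket, while an `hlist_head` carries only one. The sketch below re-declares the structures in user space for illustration; the `*_table_bytes` helpers are hypothetical and simply compute the size of a bucket array.

```c
#include <stddef.h>

/* Doubly linked list head: two pointers per hash bucket. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hash list node: forward pointer plus a pointer to the previous link. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* Hash list head: a single pointer per hash bucket. */
struct hlist_head {
	struct hlist_node *first;
};

/* Bytes needed for a table of `buckets` list_head entries. */
static size_t list_table_bytes(size_t buckets)
{
	return buckets * sizeof(struct list_head);
}

/* Bytes needed for a table of `buckets` hlist_head entries. */
static size_t hlist_table_bytes(size_t buckets)
{
	return buckets * sizeof(struct hlist_head);
}
```

For any bucket count, the `list_head` table is twice the size of the `hlist_head` table. The price is that an hlist has no O(1) access to its tail, which a hash bucket does not need.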

17 Sep, 2007 2 commits

Fix non-ISA link error in drivers/scsi/advansys.c · fa890d58

Committed by Matthew Wilcox on Sep 16, 2007

When CONFIG_ISA is disabled, the isa_driver support will not be compiled
in.  Define stubs so that we don't get link-time errors.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fa890d58

[NET] skbuff: Add skb_cow_head · d9cc2048

Committed by Herbert Xu on Sep 16, 2007

This patch adds an optimised version of skb_cow that avoids the copy if
the header can be modified even if the rest of the payload is cloned.

This can be used in encapsulating paths where we only need to modify the
header. As it is, this can be used in PPPOE and bridging.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9cc2048

12 Sep, 2007 4 commits

Fix select on /proc files without ->poll · dd23aae4

Committed by Alexey Dobriyan on Sep 11, 2007

Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit
786d7e16 aka "Fix rmmod/read/write races
in /proc entries" broke SBCL + SLIME combo.

The old code in do_select() used DEFAULT_POLLMASK if it couldn't find a
->poll handler.  The new code makes ->poll always present and returns 0 by
default, which is not correct.  Return DEFAULT_POLLMASK instead.

Steps to reproduce:

	install emacs, SBCL, SLIME
	emacs
	M-x slime	in *inferior-lisp* buffer
	[watch it doing "Connecting to Swank on port X.."]

Please, apply before 2.6.23.

P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Taneli Vahakangas <vahakang@cs.helsinki.fi>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dd23aae4
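
The behaviour fixed in dd23aae4 can be sketched as a wrapper that always exists but must fall back to "always ready" when the wrapped file has no poll handler of its own. Everything below is a hypothetical user-space model: the `sk_` names, the structure layout and the numeric bit values are assumptions for illustration, not the /proc code.

```c
#include <stddef.h>

/* Illustrative poll bits; the numeric values are assumptions for this sketch. */
enum {
	SK_POLLIN     = 0x001,
	SK_POLLOUT    = 0x004,
	SK_POLLRDNORM = 0x040,
	SK_POLLWRNORM = 0x100
};

/* Mask reported for files that cannot say anything more specific. */
#define SK_DEFAULT_POLLMASK \
	(SK_POLLIN | SK_POLLOUT | SK_POLLRDNORM | SK_POLLWRNORM)

struct sk_file;
typedef unsigned int (*sk_poll_fn)(struct sk_file *file);

/* Hypothetical file object whose poll handler may be missing (NULL). */
struct sk_file {
	sk_poll_fn poll;
};

/* Example handler that reports the file as readable only. */
static unsigned int sk_readable_only(struct sk_file *file)
{
	(void)file;
	return SK_POLLIN;
}

/*
 * Wrapper that is always installed. Returning 0 when the underlying
 * handler is missing would make select() wait forever, so fall back to
 * the default mask instead.
 */
static unsigned int sk_proc_poll(struct sk_file *file)
{
	if (file->poll)
		return file->poll(file);
	return SK_DEFAULT_POLLMASK;
}
```

A file without a handler then reports itself as readable and writable, which matches what callers saw before the wrapper was introduced.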

PTR_ALIGN · a83308e6

Committed by Matthew Wilcox on Sep 11, 2007

The AdvanSys driver wants to align some pointers, and the ALIGN macro
doesn't work for pointers.  Rather than try to make it work, add a new
PTR_ALIGN macro which is typesafe.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a83308e6
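
A typesafe pointer alignment macro of the kind described in a83308e6 can be approximated in user space as follows. This is a sketch, not the kernel header: `ALIGN_UP`, `PTR_ALIGN_SKETCH`, `demo_buf` and `aligned_offset` are hypothetical names, the alignment is assumed to be a power of two, and `__typeof__` is a GNU C extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) \
	(((x) + ((uintptr_t)(a) - 1)) & ~((uintptr_t)(a) - 1))

/*
 * Pointer variant: do the arithmetic on an integer, then cast the result
 * back to the pointer's own type so callers keep their type information.
 */
#define PTR_ALIGN_SKETCH(p, a) \
	((__typeof__(p))ALIGN_UP((uintptr_t)(p), (a)))

/* 16-byte aligned scratch buffer so the offsets below are deterministic. */
static _Alignas(16) char demo_buf[64];

/* Offset into demo_buf of the first aligned address at or after start. */
static size_t aligned_offset(size_t start, size_t align)
{
	char *p = PTR_ALIGN_SKETCH(demo_buf + start, align);

	return (size_t)(p - demo_buf);
}
```

Here `aligned_offset(1, 8)` yields 8 and `aligned_offset(9, 16)` yields 16, while an already aligned offset is returned unchanged. Because the macro casts back to the type of its argument, the result of aligning a `char *` can be assigned to a `char *` without a further cast.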

leds: Add missing include for leds.h · df96efd7

Committed by Yoichi Yuasa on Sep 11, 2007

This patch has added #include <linux/spinlock.h> to include/linux/leds.h
for rwlock_t.
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>

df96efd7

ide: add ide_dev_is_sata() helper (take 2) · 6c3c22f3

Committed by Sergei Shtylyov on Sep 11, 2007

Make the SATA drive detection code from eighty_ninty_three() into inline
ide_dev_is_sata() helper fixing it along the way to be more strict while
checking word 80 for the reserved values...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

6c3c22f3

11 Sep, 2007 3 commits

PCI: irq and pci_ids patch for Intel Tolapai · 99fa9844

Committed by Jason Gaston on Aug 30, 2007

This patch adds the Intel Tolapai LPC and SMBus Controller DID's.
Signed-off-by: Jason Gaston <jason.d.gaston@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

99fa9844

PCI AER: fix warnings when PCIEAER=n · 5547bbee

Committed by Randy Dunlap on Aug 23, 2007

Fix warnings when CONFIG_PCIEAER=n:

drivers/pci/pcie/portdrv_pci.c:105: warning: statement with no effect
drivers/pci/pcie/portdrv_pci.c:226: warning: statement with no effect
drivers/scsi/arcmsr/arcmsr_hba.c:352: warning: statement with no effect
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

5547bbee

[NETFILTER]: Fix/improve deadlock condition on module removal netfilter · 16fcec35

Committed by Neil Horman on Sep 11, 2007

So I've had a deadlock reported to me.  I've found that the sequence of
events goes like this:

1) process A (modprobe) runs to remove ip_tables.ko

2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket,
increasing the ip_tables socket_ops use count

3) process A acquires a file lock on the file ip_tables.ko, calls remove_module
in the kernel, which in turn executes the ip_tables module cleanup routine,
which calls nf_unregister_sockopt

4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the
calling process into uninterruptible sleep, expecting the process using the
socket option code to wake it up when it exits the kernel

5) the user of the socket option code (process B) in do_ipt_get_ctl, calls
ipt_find_table_lock, which in this case calls request_module to load
ip_tables_nat.ko

6) request_module forks a copy of modprobe (process C) to load the module and
blocks until modprobe exits.

7) Process C, forked by request_module, processes the dependencies of
ip_tables_nat.ko, of which ip_tables.ko is one.

8) Process C attempts to lock the requested module and all its dependencies, it
blocks when it attempts to lock ip_tables.ko (which was previously locked in
step 3)

There's not really any great permanent solution to this that I can see, but I've
developed a two part solution that corrects the problem

Part 1) Modifies the nf_sockopt registration code so that, instead of using a
use counter internal to the nf_sockopt_ops structure, we instead use a pointer
to the registering module's owner to do module reference counting when nf_sockopt
calls a module's set/get routine.  This prevents the deadlock by preventing step 4
from happening.

Part 2) Enhances the modprobe utility so that by default it performs non-blocking
remove operations (the same way rmmod does), and add an option to explicitly
request blocking operation.  So if you select blocking operation in modprobe you
can still cause the above deadlock, but only if you explicitly try (and since
root can do any old stupid thing it would like....  :)  ).
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

16fcec35

OpenHarmony / kernel_linux, last synced more than 3 years ago
