Commits · bb346f63976823c2959b0c5917928f12cbf96e4a · openeuler / Kernel

29 Sep, 2012 23 commits

NFSv4.1: reset the inode MDS threshold counters on layout destruction · bb346f63

Committed by Trond Myklebust on Sep 20, 2012

Instead of resetting the inode MDS threshold counters when we mark
the layout for destruction, do it as part of freeing the layout.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

bb346f63

NFSv4.1: Get rid of pNFS layout state "NFS_LAYOUT_INVALID" · 965938b8

Committed by Trond Myklebust on Sep 20, 2012

In all cases where we set NFS_LAYOUT_INVALID, we also set NFS_LAYOUT_DESTROYED.
Furthermore, in all cases where we test for NFS_LAYOUT_INVALID, we should
also be testing for NFS_LAYOUT_DESTROYED, since the latter means that
we hold no valid layout segments.
Ergo the two are redundant.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

965938b8

NFSv4.1: Simplify the pNFS return-on-close code · 1f7977c1

Committed by Trond Myklebust on Sep 20, 2012

Confine it to the nfs4_do_close() code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

1f7977c1

NFSv4.1: Fix a race in the pNFS return-on-close code · 7fdab069

Committed by Trond Myklebust on Sep 20, 2012

If we sleep after dropping the inode->i_lock, then we are no longer
atomic with respect to the rpc_wake_up() call in pnfs_layout_remove_lseg().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

7fdab069

NFSv4.1: pnfs_layout_io_set_failed must clear invalid lsegs · 115ce575

Committed by Trond Myklebust on Sep 20, 2012

If pnfs_layout_io_test_failed() authorises a retry of the failed layoutgets,
we should clear the existing layout segments so that we start afresh. Do
this in pnfs_layout_io_set_failed().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

115ce575

NFSv4.1: Don't drop the pnfs_layout_hdr after a layoutget failure · 3e621214

Committed by Trond Myklebust on Sep 24, 2012

We want to cache the pnfs_layout_hdr after a layoutget or i/o
failure so that pnfs_update_layout() can find it and know when
it is time to retry.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

3e621214

NFSv4.1: Fix a reference leak in pnfs_update_layout · 830ffb56

Committed by Trond Myklebust on Sep 20, 2012

If we exit after the call to pnfs_find_alloc_layout(), we have to ensure
that we put the struct pnfs_layout_hdr.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

830ffb56

NFSv4.1: pNFS data servers may be temporarily offline · 1dfed273

Committed by Trond Myklebust on Sep 18, 2012

In cases where the pNFS data server is just temporarily out of service,
we want to mark it as such, and then try again later. Typically that will
be in cases of network connection errors etc.
This patch allows us to mark the devices as being "unavailable" for such
transient errors, and will make them available for retries after a
2 minute timeout period.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

1dfed273
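
The retry behaviour described in 1dfed273 can be sketched in plain C as follows. This is only a user-space model under stated assumptions: `struct device_model`, `mark_unavailable()`, `test_unavailable()` and `RETRY_TIMEOUT_SECONDS` are hypothetical names chosen for illustration and are not taken from the patch, and time is passed in as plain seconds rather than kernel jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant: the commit message specifies a 2 minute timeout. */
#define RETRY_TIMEOUT_SECONDS 120L

/* Hypothetical user-space stand-in for a pNFS device entry. */
struct device_model {
    bool unavailable;   /* set after a transient error */
    long marked_at;     /* time in seconds when the flag was set */
};

/* Record a transient failure, e.g. a network connection error. */
void mark_unavailable(struct device_model *dev, long now)
{
    assert(dev != NULL);
    dev->unavailable = true;
    dev->marked_at = now;
}

/*
 * Return true while the device should still be skipped. Once the
 * timeout has elapsed, clear the flag so that the caller retries it.
 */
bool test_unavailable(struct device_model *dev, long now)
{
    assert(dev != NULL);
    if (!dev->unavailable)
        return false;
    if (now - dev->marked_at >= RETRY_TIMEOUT_SECONDS) {
        dev->unavailable = false;
        return false;
    }
    return true;
}
```

In this model a caller would check `test_unavailable()` before sending I/O to a data server and fall back to another path while it returns true. Clearing the flag lazily on the next check avoids the need for a timer.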

NFSv4.1: Retry pNFS after a 2 minute timeout · 25c75333

Committed by Trond Myklebust on Sep 18, 2012

If we had to fall back to read/write through MDS, then assume that we should
retry pNFS after a suitable timeout period.
The following patch sets a timeout of 2 minutes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

25c75333

NFSv4.1: Add helpers for setting/reading the I/O fail bit · b9e028fd

Committed by Trond Myklebust on Sep 18, 2012

...and make them local to the pnfs.c file.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b9e028fd

NFSv4.1: Replace dprintk() in pnfs_update_layout with something less buggy · f86bbcf8

Committed by Trond Myklebust on Sep 26, 2012

Dereferencing nfsi->layout in order to read plh_flags without holding
a spin lock is bug prone. Furthermore, the dprintk() tells you nothing
about whether or not the call succeeded.
Replace it with something that tells you about whether or not a valid
layout segment was returned for the inode in question.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

f86bbcf8

NFSv4.1: Replace get_device_info() with filelayout_get_device_info() · 78e4e05c

Committed by Trond Myklebust on Sep 18, 2012

Fix the namespace pollution issue.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

78e4e05c

NFSv4.1: Cleanup; add "pnfs_" prefix to put_lseg() and get_lseg() · 9369a431

Committed by Trond Myklebust on Sep 18, 2012

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

9369a431

NFSv4.1: Cleanup; add "pnfs_" prefix to get_layout_hdr() and put_layout_hdr() · 70c3bd2b

Committed by Trond Myklebust on Sep 18, 2012

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

70c3bd2b

NFSv4.1: Cleanup add a "pnfs_" prefix to mark_matching_lsegs_invalid · 49a85061

Committed by Trond Myklebust on Sep 18, 2012

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

49a85061

NFS: Clean up the pNFS layoutget interface · a0b0a6e3

Committed by Trond Myklebust on Sep 17, 2012

Ensure that we do return errors from nfs4_proc_layoutget() and that we
don't mark the layout as having failed if the error was due to a
signal or resource problem on the client side.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a0b0a6e3

SUNRPC: Get rid of the redundant xprt->shutdown bit field · d19751e7

Committed by Trond Myklebust on Sep 11, 2012

It is only set after everyone has dereferenced the transport,
and serves no useful purpose: setting it is racy, so all the
socket code, etc still needs to be able to cope with the cases
where they miss reading it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

d19751e7

NFS: Write the entire file if a server reboot occurs during fsync() · dcfc4f25

Committed by Trond Myklebust on Sep 11, 2012

This is to ensure that we don't clear the NFS_CONTEXT_RESEND_WRITES
flag while there are still writes that haven't been resent.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

dcfc4f25

NFS: Fix fdatasync/fsync() when confronted with a server reboot · 05990d1b

Committed by Trond Myklebust on Sep 11, 2012

If the server reboots before it can commit the unstable writes to disk,
then nfs_commit_release_pages() will detect this when it compares the
verifier returned by COMMIT to the one returned by WRITE. When this
happens, the client needs to resend those writes in order to guarantee
that they make it to stable storage.

This patch adds a signalling mechanism to notify fsync() that it
needs to retry all writes before it can exit.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

05990d1b
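
The verifier comparison and the retry signal described in 05990d1b can be illustrated with a small user-space model. This is a simplified sketch, not the mechanism the patch implements: `struct file_context_model`, `verifier_changed()`, `commit_done()` and `fsync_must_retry()` are hypothetical names, and the flag is cleared with a plain test-and-clear instead of the kernel's per-context flag handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* NFS write verifiers are 8 opaque bytes. */
#define VERIFIER_SIZE 8

struct write_verifier {
    unsigned char data[VERIFIER_SIZE];
};

/* Hypothetical per-open-file state that carries the "resend" signal. */
struct file_context_model {
    bool resend_writes;
};

/* A changed verifier means unstable writes may have been lost. */
bool verifier_changed(const struct write_verifier *from_write,
                      const struct write_verifier *from_commit)
{
    assert(from_write != NULL && from_commit != NULL);
    return memcmp(from_write->data, from_commit->data, VERIFIER_SIZE) != 0;
}

/* Model of COMMIT completion: raise the signal on a mismatch. */
void commit_done(struct file_context_model *ctx,
                 const struct write_verifier *from_write,
                 const struct write_verifier *from_commit)
{
    if (verifier_changed(from_write, from_commit))
        ctx->resend_writes = true;
}

/* Model of the fsync side: test and clear the signal after a flush pass. */
bool fsync_must_retry(struct file_context_model *ctx)
{
    bool retry = ctx->resend_writes;

    ctx->resend_writes = false;
    return retry;
}
```

An fsync loop built on this model would flush, wait for the commits, and repeat for as long as `fsync_must_retry()` returns true.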

NFSv4: Convert the nfs4_lock_state->ls_flags to a bit field · 795a88c9

Committed by Trond Myklebust on Sep 10, 2012

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

795a88c9

NFS: Clean up helper function nfs4_select_rw_stateid() · 2a369153

Committed by Trond Myklebust on Aug 13, 2012

We want to be able to pass on the information that the page was not
dirtied under a lock. Instead of adding a flag parameter, do this
by passing a pointer to a 'struct nfs_lock_owner' that may be NULL.

Also reuse this structure in struct nfs_lock_context to carry the
fl_owner_t and pid_t.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

2a369153

NFS: Convert nfs_get_lock_context to return an ERR_PTR on failure · b3c54de6

Committed by Trond Myklebust on Aug 13, 2012

We want to be able to distinguish between allocation failures, and
the case where the lock context is not needed (because there are no
locks).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b3c54de6

SUNRPC: Optimise away unnecessary data moves in xdr_align_pages · a11a2bf4

Committed by Trond Myklebust on Aug 02, 2012

We only have to call xdr_shrink_pagelen() if the remaining RPC
message does not fit in the page buffer length that we supplied
to xdr_align_pages().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

a11a2bf4

27 Sep, 2012 2 commits

NFSv4.1: decode_getdeviceinfo should check xdr_read_pages() return value · 13fe4ba1

Committed by Trond Myklebust on Aug 01, 2012

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

13fe4ba1

SUNRPC: Fix the return value of xdr_align_pages() · 8a9a8b83

Committed by Trond Myklebust on Aug 01, 2012

The callers of xdr_align_pages() expect it to return the number of bytes
of actual XDR data remaining in the pages.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

8a9a8b83

25 Sep, 2012 3 commits

NFS4: avoid underflow when converting error to pointer. · 62d98c93

Committed by NeilBrown on Sep 17, 2012

In nfs4_create_sec_client, 'flavor' can hold a negative error
code (returned from nfs4_negotiate_security), even though it
is an 'enum' and hence unsigned.

The code is careful to cast it to an (int) before testing if it
is negative, however it doesn't cast to an (int) before calling
ERR_PTR.

On a machine where "void*" is larger than "int", this results in
the unsigned equivalent of -1 (e.g. 0xffffffff) being converted
to a pointer.  Subsequent code determines that this is not
negative, and so dereferences it with predictable results.

So: cast 'flavor' to a (signed) int before passing to ERR_PTR.

cc: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

62d98c93
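
The widening problem described in 62d98c93 can be reproduced in user space. The sketch below is a model under stated assumptions: `ERR_PTR()` and `IS_ERR()` are re-created here as simplified stand-ins for the kernel helpers, `unsigned int` stands in for an enum with an unsigned underlying type, and `err_ptr_without_cast()`, `err_ptr_with_cast()` and `check_err_ptr_cast()` are hypothetical names. The conversion of an out-of-range `unsigned int` to `int` is implementation-defined in C; the sketch assumes the usual two's-complement wraparound.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space models of the kernel's ERR_PTR() and IS_ERR(). */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Buggy form: the unsigned value is zero-extended when widened to long. */
void *err_ptr_without_cast(unsigned int flavor)
{
    return ERR_PTR(flavor);
}

/* Fixed form: convert to int first so that the value is sign-extended. */
void *err_ptr_with_cast(unsigned int flavor)
{
    return ERR_PTR((int)flavor);
}

/* Self-check of the behaviour described in the commit message. */
void check_err_ptr_cast(void)
{
    unsigned int flavor = (unsigned int)-EACCES;

    assert(IS_ERR(err_ptr_with_cast(flavor)));
    /* The bug is only visible where long (and void *) is wider than int. */
    if (sizeof(long) > sizeof(int)) {
        assert(!IS_ERR(err_ptr_without_cast(flavor)));
    }
}
```

On a 64-bit Linux build, the uncast value becomes a pointer just below 4 GiB, which `IS_ERR()` does not recognise as an error, so a caller would go on to dereference it.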

NFS: fix the return value check by using IS_ERR · e8d920c5

Committed by Wei Yongjun on Sep 21, 2012

In case of error, the function rpcauth_create() returns ERR_PTR()
and never returns a NULL pointer. The NULL test in the return value
check should be replaced with IS_ERR().

The dpatch engine was used to auto-generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

e8d920c5
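
Why a NULL test misses this kind of failure can be shown with a short user-space model. The helpers below are simplified stand-ins for the kernel's `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()`; `create_auth_model()` is a hypothetical constructor that, like the function named in the commit message, reports failure with an encoded error pointer and never with NULL. The other function names are also illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space models of the kernel error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_auth;

/* Hypothetical constructor: fails with ERR_PTR(), never with NULL. */
void *create_auth_model(int should_fail)
{
    return should_fail ? ERR_PTR(-ENOMEM) : (void *)&dummy_auth;
}

/* Buggy check: an error pointer is non-NULL, so the failure is missed. */
int status_from_null_test(const void *auth)
{
    return auth == NULL ? -ENOMEM : 0;
}

/* Correct check: detect the error pointer and recover the errno. */
int status_from_is_err(const void *auth)
{
    return IS_ERR(auth) ? (int)PTR_ERR(auth) : 0;
}

/* Self-check: only the IS_ERR() test notices the failure. */
void check_error_detection(void)
{
    void *auth = create_auth_model(1);

    assert(status_from_null_test(auth) == 0);
    assert(status_from_is_err(auth) == -ENOMEM);
}
```

With the NULL test, the failed call is reported as success and the caller keeps using a pointer that is really an encoded error value.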

SUNRPC: Set alloc_slot for backchannel tcp ops · 84e28a30

Committed by Bryan Schumaker on Sep 24, 2012

f39c1bfb (SUNRPC: Fix a UDP transport
regression) introduced the "alloc_slot" function for xprt operations,
but never created one for the backchannel operations.  This patch fixes
a null pointer dereference when mounting NFS over v4.1.

Call Trace:
 [<ffffffffa0207957>] ? xprt_reserve+0x47/0x50 [sunrpc]
 [<ffffffffa02023a4>] call_reserve+0x34/0x60 [sunrpc]
 [<ffffffffa020e280>] __rpc_execute+0x90/0x400 [sunrpc]
 [<ffffffffa020e61a>] rpc_async_schedule+0x2a/0x40 [sunrpc]
 [<ffffffff81073589>] process_one_work+0x139/0x500
 [<ffffffff81070e70>] ? alloc_worker+0x70/0x70
 [<ffffffffa020e5f0>] ? __rpc_execute+0x400/0x400 [sunrpc]
 [<ffffffff81073d1e>] worker_thread+0x15e/0x460
 [<ffffffff8145c839>] ? preempt_schedule+0x49/0x70
 [<ffffffff81073bc0>] ? rescuer_thread+0x230/0x230
 [<ffffffff81079603>] kthread+0x93/0xa0
 [<ffffffff81465d04>] kernel_thread_helper+0x4/0x10
 [<ffffffff81079570>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff81465d00>] ? gs_change+0x13/0x13
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

84e28a30

20 Sep, 2012 8 commits

SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT · a519fc7a

Committed by Trond Myklebust on Sep 12, 2012

Instead of doing a shutdown() call, we need to do an actual close().
Ditto if/when the server is sending us junk RPC headers.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@vger.kernel.org

a519fc7a

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · c46de226

Committed by Linus Torvalds on Sep 19, 2012

Pull block fixes from Jens Axboe:
 "A small collection of driver fixes/updates and a core fix for 3.6.  It
  contains:

   - Bug fixes for mtip32xx, and support for new hardware (just addition
     of IDs).  They have been queued up for 3.7 for a few weeks as well.

   - rate-limit a failing command error message in block core.

   - A fix for an old cciss bug from Stephen.

   - Prevent overflow of partition count from Alan."

* 'for-linus' of git://git.kernel.dk/linux-block:
  cciss: fix handling of protocol error
  blk: add an upper sanity check on partition adding
  mtip32xx: fix user_buffer check in exec_drive_command
  mtip32xx: Remove dead code
  mtip32xx: Change printk to pr_xxxx
  mtip32xx: Proper reporting of write protect status on big-endian
  mtip32xx: Increase timeout for standby command
  mtip32xx: Handle NCQ commands during the security locked state
  mtip32xx: Add support for new devices
  block: rate-limit the error message from failing commands

c46de226

Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh · 077fee00

Committed by Linus Torvalds on Sep 19, 2012

Pull SuperH fixes from Paul Mundt.

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
  sh: Fix up TIF_NOTIFY_RESUME sans TIF_SIGPENDING handling.
  sh: pfc: Release spinlock in sh_pfc_gpio_request_enable() error path
  sh: intc: Fix up multi-evt irq association.

077fee00

Merge tag 'rpmsg-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg · cf42d543

Committed by Linus Torvalds on Sep 19, 2012

Pull rpmsg fix from Ohad Ben-Cohen:
 "A quick rpmsg fix from Fernando, fixing two buggy invocations of
  dma_free_coherent"

* tag 'rpmsg-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg:
  rpmsg: fix dma_free_coherent dev parameter

cf42d543

Merge tag 'md-3.6-fixes' of git://neil.brown.name/md · 4b92c17e

Committed by Linus Torvalds on Sep 19, 2012

Pull md fixes from NeilBrown:
 "3 fixes for md in 3.6.

  One reverts a recent patch which turns out to not be such a good idea.

  Other two fix minor bugs with the new (since 3.3) 'replacement' code
  and have been tagged for -stable."

* tag 'md-3.6-fixes' of git://neil.brown.name/md:
  md: make sure metadata is updated when spares are activated or removed.
  md/raid5: fix calculate of 'degraded' when a replacement becomes active.
  Revert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE."

4b92c17e

Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · c5c473e2

Committed by Linus Torvalds on Sep 19, 2012

Pull workqueue / powernow-k8 fix from Tejun Heo:
 "This is the fix for the bug where cpufreq/powernow-k8 was tripping
  BUG_ON() in try_to_wake_up_local() by migrating workqueue worker to a
  different CPU.

    https://bugzilla.kernel.org/show_bug.cgi?id=47301

  As discussed, the fix is now two parts - one to reimplement
  work_on_cpu() so that it doesn't create a new kthread each time and
  the actual fix which makes powernow-k8 use work_on_cpu() instead of
  performing manual migration.

  While pretty late in the merge cycle, both changes are on the safer
  side.  Jiri and I verified two existing users of work_on_cpu() and
  Duncan confirmed that the powernow-k8 fix survived about 18 hours of
  testing."

* 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
  workqueue: reimplement work_on_cpu() using system_wq

c5c473e2

cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU · 6889125b

Committed by Tejun Heo on Sep 18, 2012

powernowk8_target() runs off a per-cpu work item and if the
cpufreq_policy->cpu is different from the current one, it migrates the
kworker to the target CPU by manipulating current->cpus_allowed.  The
function migrates the kworker back to the original CPU but this is
still broken.  Workqueue concurrency management requires the kworkers
to stay on the same CPU and powernowk8_target() ends up triggering
BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on
fidvid_mutex and sleeps.

It is unclear why this bug is being reported now.  Duncan says it
appeared to be a regression of 3.6-rc1 and couldn't reproduce it on
3.5.  Bisection seemed to point to 63d95a91 "workqueue: use @pool
instead of @gcwq or @cpu where applicable" which is a non-functional
change.  Given that the reproduce case sometimes took up to days to
trigger, it's easy to be misled while bisecting.  Maybe something made
contention on fidvid_mutex more likely?  I don't know.

This patch fixes the bug by using work_on_cpu() instead if @pol->cpu
isn't the same as the current one.  The code assumes that
cpufreq_policy->cpu is kept online by the caller, which Rafael tells
me is the case.

stable: ed48ece2 ("workqueue: reimplement work_on_cpu() using
        system_wq") should be applied before this; otherwise, the
        behavior could be horrible.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Duncan <1i5t5.duncan@cox.net>
Tested-by: Duncan <1i5t5.duncan@cox.net>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@vger.kernel.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301

6889125b

workqueue: reimplement work_on_cpu() using system_wq · ed48ece2

Committed by Tejun Heo on Sep 18, 2012

The existing work_on_cpu() implementation is hugely inefficient.  It
creates a new kthread, executes that single function and then lets the
kthread die on each invocation.

Now that system_wq can handle concurrent executions, there's no
advantage of doing this.  Reimplement work_on_cpu() using system_wq
which makes it simpler and way more efficient.

stable: While this isn't a fix in itself, it's needed to fix a
        workqueue related bug in cpufreq/powernow-k8.  AFAICS, this
        shouldn't break other existing users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org

ed48ece2

19 Sep, 2012 4 commits

md: make sure metadata is updated when spares are activated or removed. · 6dafab6b

Committed by NeilBrown on Sep 19, 2012

It isn't always necessary to update the metadata when spares are
removed as the presence-or-not of a spare isn't really important to
the integrity of an array.
Also activating a spare doesn't always require updating the metadata
as the update on 'recovery-completed' is usually sufficient.

However the introduction of 'replacement' devices has made these
transitions sometimes more important.  For example the 'Replacement'
flag isn't cleared until the original device is removed, so we need
to ensure a metadata update after that 'spare' is removed.

So set MD_CHANGE_DEVS whenever a spare is activated or removed, to
complement the current situation where it is set when a spare is added
or a device is failed (or a number of other less common situations).

This is suitable for -stable as out-of-date metadata could lead
to data corruption.
This is only relevant for 3.3 and later (when 'replacement' was
introduced).

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

6dafab6b

md/raid5: fix calculate of 'degraded' when a replacement becomes active. · e5c86471

Committed by NeilBrown on Sep 19, 2012

When a replacement device becomes active, we mark the device that it
replaces as 'faulty' so that it can subsequently get removed.
However 'calc_degraded' only pays attention to the primary device, not
the replacement, so the array appears to become degraded, which is
wrong.

So teach 'calc_degraded' to consider any replacement if a primary
device is faulty.

This is suitable for -stable as an incorrect 'degraded' value can
confuse md and could lead to data corruption.
This is only relevant for 3.3 and later.

Cc: stable@vger.kernel.org
Reported-by: Robin Hill <robin@robinhill.me.uk>
Reported-by: John Drescher <drescherjm@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

e5c86471

Revert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE." · a852d7b8

Committed by NeilBrown on Sep 19, 2012

This reverts commit 895e3c5c.

While this patch seemed like a good idea and did help some workloads,
it hurts other workloads.
Large sequential O_DIRECT writes were faster, but
small random O_DIRECT writes were slower.

Other changes (batching RAID5 writes) have improved the sequential
writes using a different mechanism, so the net result of this patch
is definitely negative.  So revert it.
Reported-by: Shaohua Li <shli@kernel.org>
Tested-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

a852d7b8

Merge tag 'hwspinlock-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock · 925a6f0b

Committed by Linus Torvalds on Sep 18, 2012

Pull hwspinlock fix from Ohad Ben-Cohen:
 "A single hwspinlock fix by Wei Yongjun, which prevents potential NULL
  dereferences"

* tag 'hwspinlock-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock:
  hwspinlock/core: move the dereference below the NULL test

925a6f0b
