Commits · e09c978aae5bedfdb379be80363b024b7d82638b · openanolis / cloud-kernel

29 Aug, 2016 1 commit

NFSv4.1: Fix Oopsable condition in server callback races · e09c978a

Authored by Trond Myklebust on Aug 27, 2016

The slot table hasn't been an array since v3.7. Ensure that we
use nfs4_lookup_slot() to access the slot correctly.

Fixes: 87dda67e ("NFSv4.1: Allow SEQUENCE to resize the slot table...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.8+

e09c978a
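As a rough illustration of why indexing breaks once the slot table is no longer an array, the sketch below models the table as a singly linked list and finds a slot by walking it. The class and method names are hypothetical and only mirror the idea behind nfs4_lookup_slot(); they are not the kernel's data structures.

```python
class Slot:
    def __init__(self, slot_nr):
        self.slot_nr = slot_nr
        self.next = None


class SlotTable:
    """Hypothetical slot table kept as a linked list rather than an array."""

    def __init__(self, count):
        self.head = None
        tail = None
        for nr in range(count):
            slot = Slot(nr)
            if tail is None:
                self.head = slot
            else:
                tail.next = slot
            tail = slot

    def lookup(self, slot_nr):
        # Walk the list; there is no table[slot_nr] shortcut to rely on.
        slot = self.head
        while slot is not None:
            if slot.slot_nr == slot_nr:
                return slot
            slot = slot.next
        return None  # caller must handle a missing slot


table = SlotTable(3)
print(table.lookup(2) is not None)
print(table.lookup(5))
```

A caller that treated the table as an array would skip the "slot not found" case entirely, which is the kind of condition a lookup helper makes explicit.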

23 Aug, 2016 2 commits

pnfs/blocklayout: update last_write_offset atomically with extents · 41963c10

Authored by Benjamin Coddington on Aug 22, 2016

Block/SCSI layout write completion may add committable extents to the
extent tree before updating the layout's last-written byte under the inode
lock. If a sync happens before this value is updated, then
prepare_layoutcommit may find and encode these extents which would produce
a LAYOUTCOMMIT request whose encoded extents are larger than the request's
loca_length.

Fix this by using a last-written byte value that is updated atomically with
the extent tree so that committable extents always match.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

41963c10

pNFS: The client must not do I/O to the DS if its lease has expired · b88fa69e

Authored by Trond Myklebust on Aug 23, 2016

Ensure that the client conforms to the normative behaviour described in
RFC5661 Section 12.7.2: "If a client believes its lease has expired,
it MUST NOT send I/O to the storage device until it has validated its
lease."

So ensure that we wait for the lease to be validated before using
the layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.20+

b88fa69e

20 Aug, 2016 1 commit

pNFS: Handle NFS4ERR_OLD_STATEID correctly in LAYOUTSTAT calls · 9a0fe867

Authored by Trond Myklebust on Aug 19, 2016

We normally want to update the stateid and then retry.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9a0fe867

16 Aug, 2016 2 commits

pNFS/flexfiles: Set reasonable default retrans values for the data channel · 15d03055

Authored by Trond Myklebust on Aug 16, 2016

Prior to this patch, the retrans value was set at 5, meaning that we
could see a maximum retransmission timeout value of more than 6 minutes.
That's a tad high for NFSv3 where the protocol does allow the server to
drop requests at any time.

Since this is a data channel, let's just set retrans to 0, and the default
timeout to 60s. The user can continue to adjust these defaults using the
dataserver_retrans and dataserver_timeo module parameters.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

15d03055

NFS: Allow the mount option retrans=0 · a956beda

Authored by Trond Myklebust on Aug 16, 2016

We should allow retrans=0 as just meaning that every timeout is a major
timeout, and that there is no increment in the timeout value.

For instance, this means that we would allow TCP users to specify a
flat timeout value of 60s, by specifying "timeo=600,retrans=0" in their
mount option string.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a956beda
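To make the "flat timeout" arithmetic concrete, here is a minimal sketch. It assumes a simplified linear-backoff model in which each retransmission waits one more `timeo` than the previous one; the function name and the model are illustrative and are not the kernel's RPC timer code. The only facts taken from the message are that `timeo` is given in tenths of a second and that `timeo=600,retrans=0` means a flat 60s.

```python
def timeout_schedule(timeo_ds, retrans):
    """Return the wait (in seconds) before each timeout fires.

    timeo_ds is in tenths of a second, like the `timeo` mount option.
    Assumption: linear backoff, where retransmission n waits (n + 1) * timeo.
    """
    base = timeo_ds / 10.0
    return [base * (n + 1) for n in range(retrans + 1)]


flat = timeout_schedule(600, 0)     # "timeo=600,retrans=0"
growing = timeout_schedule(600, 5)  # same base with five retransmissions

print(flat)     # a single 60 s wait, so every timeout is a major timeout
print(growing)  # waits grow by 60 s each time under the assumed model
```

With retrans=0 the list has exactly one entry, which is what "no increment in the timeout value" amounts to in this model.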

15 Aug, 2016 1 commit

pNFS/flexfiles: Fix layoutstat periodic reporting · 1c8d477a

Authored by Trond Myklebust on Aug 14, 2016

Putting the periodicity timer in the mirror instances is causing
non-scalable reporting behaviour and missed reporting intervals.
When you recall layouts and/or implement client side mirroring, it
leads to consecutive reports with only a few ms between RPC calls.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: d0379a5d ("pNFS/flexfiles: Support server-supplied...")

1c8d477a

13 Aug, 2016 1 commit

nfsd: don't return an unhashed lock stateid after taking mutex · dd257933

Authored by Jeff Layton on Aug 11, 2016

nfsd4_lock will take the st_mutex before working with the stateid it
gets, but between the time when we drop the cl_lock and take the mutex,
the stateid could become unhashed (a'la FREE_STATEID). If that happens
the lock stateid returned to the client will be forgotten.

Fix this by first moving the st_mutex acquisition into
lookup_or_create_lock_state. Then, have it check to see if the lock
stateid is still hashed after taking the mutex. If it's not, then put
the stateid and try the find/create again.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

dd257933
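The "take the mutex, then re-check that the object is still hashed, otherwise retry" pattern described above can be sketched as follows. This is a simplified user-space model, not the nfsd code: `Table`, `LockStateid`, and `lookup_locked` are hypothetical names, with `table.lock` standing in for cl_lock and `st.mutex` for st_mutex.

```python
import threading


class LockStateid:
    def __init__(self):
        self.mutex = threading.Lock()  # stands in for st_mutex
        self.hashed = True             # cleared when the stateid is freed


class Table:
    def __init__(self):
        self.lock = threading.Lock()   # stands in for cl_lock
        self.entries = {}

    def find_or_create(self, key):
        with self.lock:
            st = self.entries.get(key)
            if st is None:
                st = self.entries[key] = LockStateid()
            return st

    def unhash(self, key):
        with self.lock:
            st = self.entries.pop(key, None)
        if st is not None:
            with st.mutex:
                st.hashed = False


def lookup_locked(table, key):
    """Return a stateid with its mutex held and known to be still hashed."""
    while True:
        st = table.find_or_create(key)
        st.mutex.acquire()
        if st.hashed:
            return st
        st.mutex.release()  # lost the race: drop it and look up again


table = Table()
first = lookup_locked(table, "owner-1")
first.mutex.release()
table.unhash("owner-1")
second = lookup_locked(table, "owner-1")
second.mutex.release()
print(second is not first, second.hashed)
```

The key point is that the hashed check happens only after the mutex is held, so a concurrent free cannot slip in between the check and the use.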

12 Aug, 2016 2 commits

proc, meminfo: use correct helpers for calculating LRU sizes in meminfo · 2f95ff90

Authored by Mel Gorman on Aug 11, 2016

meminfo_proc_show() and si_mem_available() are using the wrong helpers
for calculating the size of the LRUs. The user-visible impact is that
there appears to be an abnormally high number of unevictable pages.

Link: http://lkml.kernel.org/r/20160805105805.GR2799@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2f95ff90

nfsd: Fix race between FREE_STATEID and LOCK · 42691398

Authored by Chuck Lever on Aug 11, 2016

When running LTP's nfslock01 test, the Linux client can send a LOCK
and a FREE_STATEID request at the same time. The outcome is:

Frame 324    R OPEN stateid [2,O]

Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64
Frame 115008 R LOCK stateid [1,L]
Frame 115012 C WRITE stateid [0,L] offset 672000 len 64
Frame 115016 R WRITE NFS4_OK
Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64
Frame 115022 R LOCKU NFS4_OK
Frame 115025 C FREE_STATEID stateid [2,L]
Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid [3,L]
Frame 115034 C WRITE stateid [0,L] offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID

In other words, the server returns stateid L in a successful LOCK
reply, but it has already released it. Subsequent uses of stateid L
fail.

To address this, protect the generation check in nfsd4_free_stateid
with the st_mutex. This should guarantee that only one of two
outcomes occurs: either LOCK returns a fresh valid stateid, or
FREE_STATEID returns NFS4ERR_LOCKS_HELD.
Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

42691398

11 Aug, 2016 1 commit

nfsd: fix dentry refcounting on create · 502aa0a5

Authored by Josef Bacik on Aug 10, 2016

b44061d0 introduced a dentry ref counting bug.  Previously we were
grabbing one ref to dchild in nfsd_create(), but with the creation of
nfsd_create_locked() we have a ref for dchild from the lookup in
nfsd_create(), and then another ref in nfsd_create_locked().  The ref
from the lookup in nfsd_create() is never dropped and results in
dentries still in use at unmount.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Fixes: b44061d0 "nfsd: reorganize nfsd_create"
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

502aa0a5
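The imbalance described above (two references taken, one dropped) can be modelled with a toy reference counter. The classes and functions below are hypothetical stand-ins for the dentry and the two create helpers, and the "fixed" variant shows only one possible way to restore the balance; the message does not say which reference the real patch removes.

```python
class Dentry:
    def __init__(self):
        self.refs = 0

    def get(self):
        self.refs += 1
        return self

    def put(self):
        assert self.refs > 0, "refcount underflow"
        self.refs -= 1


def create_locked(dchild):
    dchild.get()  # reference taken by the inner helper
    # ... the object would be created here ...
    dchild.put()  # inner helper drops only its own reference


def create_buggy(dchild):
    dchild.get()  # reference from the lookup
    create_locked(dchild)
    # bug: the lookup reference is never dropped


def create_balanced(dchild):
    dchild.get()  # reference from the lookup
    create_locked(dchild)
    dchild.put()  # every get is paired with a put


leaked = Dentry()
create_buggy(leaked)
clean = Dentry()
create_balanced(clean)
print(leaked.refs, clean.refs)
```

A non-zero count at the end corresponds to the "dentries still in use at unmount" symptom.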

10 Aug, 2016 2 commits

mm, writeback: flush plugged IO in wakeup_flusher_threads() · 51350ea0

Authored by Konstantin Khlebnikov on Aug 04, 2016

I've found a funny live-lock between raid10 barriers during resync and
memory controller hard limits. Inside mpage_readpages() the task holds on to
its plugged bio, which blocks the barrier in raid10. Its memory cgroup has
no free memory, thus the task goes into reclaim, but all reclaimable
pages are dirty and cannot be written because raid10 is rebuilding and
stuck on the barrier.

The common flush of such IO in schedule() never happens, because the caller
doesn't go to sleep.

The lock is 'live' because changing the memory limit or killing the tasks
which hold that stuck bio unblocks everything.

That was what happened in 3.18.x but I see no difference in upstream
logic.  Theoretically this might happen even without memory cgroup.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@fb.com>

51350ea0

mm: memcontrol: only mark charged pages with PageKmemcg · c4159a75

Authored by Vladimir Davydov on Aug 08, 2016

To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
which sets page->_mapcount to -512.  Currently, we set/clear PageKmemcg
in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
with __GFP_ACCOUNT, including those that aren't actually charged to any
cgroup, i.e. allocated from the root cgroup context.  To avoid overhead
in case cgroups are not used, we only do that if memcg_kmem_enabled() is
true.  The latter is set iff there are kmem-enabled memory cgroups
(online or offline).  The root cgroup is not considered kmem-enabled.

As a result, if a page is allocated with __GFP_ACCOUNT for the root
cgroup when there are kmem-enabled memory cgroups and is freed after all
kmem-enabled memory cgroups were removed, e.g.

  # no memory cgroups have been created yet, create one
  mkdir /sys/fs/cgroup/memory/test
  # run something allocating pages with __GFP_ACCOUNT, e.g.
  # a program using pipe
  dmesg | tail
  # remove the memory cgroup
  rmdir /sys/fs/cgroup/memory/test

we'll get bad page state bug complaining about page->_mapcount != -1:

  BUG: Bad page state in process swapper/0  pfn:1fd945c
  page:ffffea007f651700 count:0 mapcount:-511 mapping:          (null) index:0x0
  flags: 0x1000000000000000()

To avoid that, let's mark with PageKmemcg only those pages that are
actually charged to and hence pin a non-root memory cgroup.

Fixes: 4949148a ("mm: charge/uncharge kmemcg from generic page allocator paths")
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c4159a75
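The failure sequence above can be walked through with a small model. It uses the two values quoted in the message (-512 for a marked page, -1 for the value the free path expects) and assumes, as an interpretation of the log, that the dump prints `_mapcount + 1`, which would explain the reported mapcount of -511. The helper names are hypothetical and only mimic the old "mark whenever kmem accounting is enabled" behaviour.

```python
PAGE_KMEMCG_MAPCOUNT = -512  # value quoted in the commit message
MAPCOUNT_FREE = -1           # value the free path expects


class Page:
    def __init__(self):
        self._mapcount = MAPCOUNT_FREE


def alloc_account(page, kmem_enabled):
    # Old behaviour: mark the page whenever accounting is enabled,
    # even if the page is not charged to any non-root cgroup.
    if kmem_enabled:
        page._mapcount = PAGE_KMEMCG_MAPCOUNT


def free_prepare(page, kmem_enabled):
    # The mark is cleared only while accounting is still enabled.
    if kmem_enabled and page._mapcount == PAGE_KMEMCG_MAPCOUNT:
        page._mapcount = MAPCOUNT_FREE
    return page._mapcount == MAPCOUNT_FREE  # False means "bad page state"


page = Page()
alloc_account(page, kmem_enabled=True)       # a kmem-enabled cgroup exists
ok = free_prepare(page, kmem_enabled=False)  # cgroup removed before the free
print(ok, page._mapcount + 1)
```

Marking only pages that are actually charged removes the dependency on the global "enabled" state being the same at allocation and free time.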

09 Aug, 2016 2 commits

ceph: initialize pathbase in the !dentry case in encode_caps_cb() · 4eacd4cb

Authored by Ilya Dryomov on Aug 09, 2016

pathbase is the base inode; set it to 0 if we've got no path.

Coverity-id: 146348
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

4eacd4cb

ceph: fix null pointer dereference in ceph_flush_snaps() · e4d2b16a

Authored by Yan, Zheng on Aug 04, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

e4d2b16a

08 Aug, 2016 2 commits

block: rename bio bi_rw to bi_opf · 1eff9d32

Authored by Jens Axboe on Aug 05, 2016

Since commit 63a4cc24, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokenness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.
Signed-off-by: Jens Axboe <axboe@fb.com>

1eff9d32
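To show why code that sets the field by hand silently breaks once the op code lives in the high bits, here is a small bit-packing sketch. The field width, the number of op bits, and the op and flag values are all assumptions chosen for illustration; only the layout (flags low, op code high) comes from the message.

```python
WORD_BITS = 32                  # assumed width of the field
OP_BITS = 3                     # assumed number of bits reserved for the op code
OP_SHIFT = WORD_BITS - OP_BITS
FLAG_MASK = (1 << OP_SHIFT) - 1


def set_op_attrs(op, flags):
    """Pack an op code into the high bits and flags into the low bits."""
    assert 0 <= op < (1 << OP_BITS)
    assert flags & ~FLAG_MASK == 0
    return (op << OP_SHIFT) | flags


def get_op(word):
    return word >> OP_SHIFT


def get_flags(word):
    return word & FLAG_MASK


OP_WRITE = 1        # hypothetical op code
FLAG_SYNC = 1 << 4  # hypothetical flag bit

packed = set_op_attrs(OP_WRITE, FLAG_SYNC)
legacy = FLAG_SYNC | 1  # old-style assignment with a "write" bit in the low bits

print(get_op(packed), get_op(legacy))
```

The legacy value decodes as op 0, so the request type is lost without any compile-time warning; renaming the member turns that silent failure into a build error.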

block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b

Authored by Jens Axboe on Aug 05, 2016

Commit abf54548 changed it from an 'rw' flags type to the
newer ops based interface, but now we're effectively leaking
some bdev internals to the rest of the kernel. Since we only
care about whether it's a read or a write at that level, just
pass in a bool 'is_write' parameter instead.

Then we can also move op_is_write() and friends back under
CONFIG_BLOCK protection.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

c11f0c0b

07 Aug, 2016 1 commit

fs: return EPERM on immutable inode · 337684a1

Authored by Eryu Guan on Aug 02, 2016

In most cases, EPERM is returned on immutable inode, and there're only a
few places returning EACCES. I noticed this when running LTP on
overlayfs, setxattr03 failed due to unexpected EACCES on immutable
inode.

So convert all EACCES to EPERM on immutable inodes.
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

337684a1

06 Aug, 2016 5 commits

NFSv4: Cap the transport reconnection timer at 1/2 lease period · 8d480326

Authored by Trond Myklebust on Aug 05, 2016

We don't want to miss a lease period renewal due to the TCP connection
failing to reconnect in a timely fashion. To ensure this doesn't happen,
cap the reconnection timer so that we retry the connection attempt
at least every 1/2 lease period.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8d480326
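The capping rule can be sketched in a few lines. The doubling backoff and the 3 second starting delay are assumptions made for the example; the message only states that the reconnection timer is capped at half the lease period.

```python
def next_reconnect_delay(current, lease_period, initial=3.0):
    """Double the delay after a failed attempt, capped at lease_period / 2.

    Assumptions: exponential backoff and a 3 s initial delay (illustrative).
    """
    cap = lease_period / 2.0
    proposed = initial if current <= 0 else current * 2
    return min(proposed, cap)


delays = []
delay = 0.0
for _ in range(6):
    delay = next_reconnect_delay(delay, lease_period=90.0)
    delays.append(delay)

print(delays)
```

Once the cap is reached the delay stays at half the lease period, so at least one reconnection attempt falls inside every lease interval.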

NFSv4: Cleanup the setting of the nfs4 lease period · fb10fb67

Authored by Trond Myklebust on Aug 05, 2016

Make a helper function nfs4_set_lease_period() and have
nfs41_setup_state_renewal() and nfs4_do_fsinfo() use it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

fb10fb67

ramoops: use persistent_ram_free() instead of kfree() for freeing prz · e976e564

Authored by Hiraku Toyooka on Jul 25, 2016

persistent_ram_zone (prz) structures are allocated by persistent_ram_new(),
which includes vmap() or ioremap(). But they are currently freed by
kfree(). Use persistent_ram_free() instead to correct this asymmetry.
Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.kw@hitachi.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Seiji Aguchi <seiji.aguchi.tr@hitachi.com>
Signed-off-by: Kees Cook <keescook@chromium.org>

e976e564

ramoops: use DT reserved-memory bindings · 529182e2

Authored by Kees Cook on Jul 29, 2016

Instead of a ramoops-specific node, use a child node of /reserved-memory.
This requires that of_platform_device_create() be explicitly called
for the node, though, since "/reserved-memory" does not have its own
"compatible" property.
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rob Herring <robh@kernel.org>

529182e2

NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED · 206b3bb5

Authored by Trond Myklebust on Aug 05, 2016

We should handle those errors in the same way we handle the other
stateid errors: by invalidating the faulty layout stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

206b3bb5

05 Aug, 2016 13 commits

nfsd: remove some dead code in nfsd_create_locked() · 2b118859

Authored by Dan Carpenter on Aug 03, 2016

We changed this around in f135af1041f ('nfsd: reorganize nfsd_create')
so "dchild" can't be an error pointer any more. Also, dchild can't be
NULL here (and dput would already handle this even if it was).
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2b118859

nfsd: drop unnecessary MAY_EXEC check from create · fa08139d

Authored by J. Bruce Fields on Jul 21, 2016

We need an fh_verify to make sure we at least have a dentry, but actual
permission checks happen later.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

fa08139d

nfsd: clean up bad-type check in nfsd_create_locked · 71423274

Authored by J. Bruce Fields on Jul 22, 2016

Minor cleanup, no change in behavior.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

71423274

nfsd: remove unnecessary positive-dentry check · d03d9fe4

Authored by J. Bruce Fields on Jul 21, 2016

vfs_{create,mkdir,mknod} each begin with a call to may_create(), which
returns EEXIST if the object already exists.

This check is therefore unnecessary.

(In the NFSv2 case, nfsd_proc_create also has such a check.  Contrary to
RFC 1094, our code seems to believe that a CREATE of an existing file
should succeed.  I'm leaving that behavior alone.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d03d9fe4

nfsd: reorganize nfsd_create · b44061d0

Authored by J. Bruce Fields on Jul 20, 2016

There's some odd logic in nfsd_create() that allows it to be called with
the parent directory either locked or unlocked.  The only already-locked
caller is NFSv2's nfsd_proc_create().  It's less confusing to split out
the unlocked case into a separate function which the NFSv2 code can call
directly.

Also fix some comments while we're here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

b44061d0

nfsd: check d_can_lookup in fh_verify of directories · e75b23f9

Authored by J. Bruce Fields on Jul 19, 2016

Create and other nfsd ops generally assume we can call lookup_one_len on
inodes with S_IFDIR set.  Al says that this assumption isn't true in
general, though it should be for the filesystem objects nfsd sees.

Add a check just to make sure our assumption isn't violated.

Remove a couple checks for i_op->lookup in create code.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e75b23f9

nfsd: remove redundant zero-length check from create · 12391d07

Authored by J. Bruce Fields on Jul 19, 2016

lookup_one_len already has this check.

The only effect of this patch is to return access instead of perm in the
0-length-filename case.  I actually prefer nfserr_perm (or _inval?), but
I doubt anyone cares.

The isdotent check seems redundant too, but I worry that some client
might actually care about that strange nfserr_exist error.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

12391d07

nfsd: Make creates return EEXIST instead of EACCES · 7eed34f1

Authored by Oleg Drokin on Jul 14, 2016

When doing a create (mkdir/mknod) on a name, it's worth
checking whether the name exists first before returning EACCES in case
the directory is not writeable by the user.
This makes return values on the client more consistent
regardless of whether the entry is cached in the local
cache or not.
Another positive side effect is that certain programs only expect
EEXIST in that case, even though POSIX allows any valid
error to be returned.
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7eed34f1
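The ordering of the two checks is the whole change, so a tiny model may help. The function below is a hypothetical stand-in for a create path, with a Python set playing the role of the directory; it is not the nfsd implementation.

```python
import errno


def create_entry(directory, name, writable):
    """Return 0 on success or a negative errno, checking existence first."""
    if name in directory:
        return -errno.EEXIST   # reported even if the directory is read-only
    if not writable:
        return -errno.EACCES
    directory.add(name)
    return 0


entries = {"existing"}
print(create_entry(entries, "existing", writable=False))
print(create_entry(entries, "new", writable=False))
print(create_entry(entries, "new", writable=True))
```

With the checks in this order, a client gets the same answer for an existing name whether or not it already has the entry cached.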

mm/block: convert rw_page users to bio op use · abf54548

Authored by Mike Christie on Aug 04, 2016

The rw_page users were not converted to use bio/req ops. As a result
bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
be sent down as reads.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Fixes: 4e1b2d52 ("block, fs, drivers: remove REQ_OP compat defs and related code")

Modified by me to:

1) Drop op_flags passing into ->rw_page(), as we don't use it.
2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK
Signed-off-by: Jens Axboe <axboe@fb.com>

abf54548

Fixup direct bi_rw modifiers · b571bc60

Authored by Shaun Tancheff on Jul 30, 2016

bi_rw should be using bio_set_op_attrs to set bi_rw.
Signed-off-by: Shaun Tancheff <shaun@tancheff.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

b571bc60

f2fs: drop bio->bi_rw manual assignment · 1aee6b9a

Authored by Jens Axboe on Jul 27, 2016

Merge 4fc29c1a included this extra line, but it's not needed (or
useful) since we'll bio_set_op_attrs() right after to properly set
the op and flags for the bio.
Signed-off-by: Jens Axboe <axboe@fb.com>

1aee6b9a

block: add missing group association in bio-cloning functions · 20bd723e

Authored by Paolo Valente on Jul 27, 2016

When a bio is cloned, the newly created bio must be associated with
the same blkcg as the original bio (if BLK_CGROUP is enabled). If
this operation is not performed, then the new bio is not associated
with any group, and the group of the current task is returned when
the group of the bio is requested.

Depending on the cloning frequency, this may cause a large
percentage of the bios belonging to a given group to be treated
as if belonging to other groups (in most cases as if belonging to
the root group). The expected group isolation may thereby be broken.

This commit adds the missing association in bio-cloning functions.

Fixes: da2f0f74 ("Btrfs: add support for blkio controllers")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Nikolay Borisov <kernel@kyup.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

20bd723e
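The effect of a missing association can be illustrated with a toy model. `Bio`, `bio_group`, and the two clone helpers are hypothetical names; the fallback to the current task's group when a bio has no association follows the description in the message.

```python
class Bio:
    def __init__(self, blkcg=None):
        self.blkcg = blkcg  # None means "not associated with any group"


def bio_group(bio, current_task_group):
    """Group used for accounting: the bio's own, else the current task's."""
    return bio.blkcg if bio.blkcg is not None else current_task_group


def clone_without_association(src):
    return Bio()  # the group association is lost


def clone_with_association(src):
    return Bio(blkcg=src.blkcg)  # the clone inherits the original's group


original = Bio(blkcg="group-A")
lost = bio_group(clone_without_association(original), "root")
kept = bio_group(clone_with_association(original), "root")
print(lost, kept)
```

When the clone is submitted from a context running in the root group, the unassociated clone is accounted there, which is how isolation between groups erodes.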

writeback: Write dirty times for WB_SYNC_ALL writeback · dc5ff2b1

Authored by Jan Kara on Jul 26, 2016

Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
queue_io() so that inodes which have only dirty timestamps are properly
written on fsync(2) and sync(2). However there are other call sites -
most notably going through write_inode_now() - which expect inode to be
clean after WB_SYNC_ALL writeback. This is not currently true as we do
not clear I_DIRTY_TIME in __writeback_single_inode() even for
WB_SYNC_ALL writeback in all the cases. This then resulted in the
following oops because bdev_write_inode() did not clean the inode and
writeback code later stumbled over a dirty inode with detached wb.

  general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
  Modules linked in:
  CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: writeback wb_workfn (flush-11:0)
  task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
  RIP: 0010:[<ffffffff818884d2>]  [<ffffffff818884d2>]
  locked_inode_to_wb_and_lock_list+0xa2/0x750
  RSP: 0018:ffff88006cdaf7d0  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
  RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
  RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
  R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
  R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
  FS:  0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
  DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
  Stack:
   ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
   ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
   ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
  Call Trace:
   [<     inline     >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
   [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
   [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
   [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
   [<     inline     >] wb_do_writeback fs/fs-writeback.c:1844
   [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
   [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
   [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
   [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
   [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
  Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
  00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
  80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
  RIP  [<     inline     >] wb_get include/linux/backing-dev-defs.h:212
  RIP  [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
  fs/fs-writeback.c:281
   RSP <ffff88006cdaf7d0>
  ---[ end trace 986a4d314dcb2694 ]---

Fix the problem by making sure __writeback_single_inode() writes inode
only with dirty times in WB_SYNC_ALL mode.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>

dc5ff2b1

04 Aug, 2016 4 commits

block: remove BLK_DEV_DAX config option · 99a01cdf

Authored by Ross Zwisler on Aug 03, 2016

The functionality for block device DAX was already removed with commit
acc93d30 ("Revert "block: enable dax for raw block devices"")

However, we still had a config option hanging around that was always
disabled because it depended on CONFIG_BROKEN. This config option was
introduced in commit 03cdadb0 ("block: disable block device DAX by
default")

This change reverts that commit, removing the dead config option.

Link: http://lkml.kernel.org/r/20160729182314.6368-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

99a01cdf

hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() · 8a545f18

Authored by Dan Carpenter on Jul 13, 2016

We can't pass error pointers to kfree() or it causes an oops.

Fixes: 52b209f7 ('get rid of hostfs_read_inode()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

8a545f18

Btrfs: fix __MAX_CSUM_ITEMS · 42049bf6

Authored by Chris Mason on Aug 03, 2016

Jeff Mahoney's cleanup commit (14a1e067) wasn't correct for csums on
machines where the pagesize >= metadata blocksize.

This just reverts the relevant hunks to bring the old math back.
Signed-off-by: Chris Mason <clm@fb.com>

42049bf6

cachefiles: Fix race between inactivating and culling a cache object · db20a892

Authored by David Howells on Aug 03, 2016

There's a race between cachefiles_mark_object_inactive() and
cachefiles_cull():

 (1) cachefiles_cull() can't delete a backing file until the cache object
     is marked inactive, but as soon as that's the case it's fair game.

 (2) cachefiles_mark_object_inactive() marks the object as being inactive
     and *only then* reads the i_blocks on the backing inode - but
     cachefiles_cull() might've managed to delete it by this point.

Fix this by making sure cachefiles_mark_object_inactive() gets any data it
needs from the backing inode before deactivating the object.

Without this, the following oops may occur:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
...
CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G          I    ------------   3.10.0-470.el7.x86_64 #1
Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
RSP: 0018:ffff8800b77c3d70  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
FS:  0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
 ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
 ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
Call Trace:
 [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
 [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
 [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
 [<ffffffff810a605b>] process_one_work+0x17b/0x470
 [<ffffffff810a6e96>] worker_thread+0x126/0x410
 [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
 [<ffffffff810ae64f>] kthread+0xcf/0xe0
 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff81695418>] ret_from_fork+0x58/0x90
 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140

The oopsing code shows:

	callq  0xffffffff810af6a0 <wake_up_bit>
	mov    0xf8(%r12),%rax
	mov    0x30(%rax),%rax
	mov    0x98(%rax),%rax   <---- oops here
	lock add %rax,0x130(%rbx)

where this is:

	d_backing_inode(object->dentry)->i_blocks

Fixes: a5b3a80b (CacheFiles: Provide read-and-reset release counters for cachefilesd)
Reported-by: Jianhong Yin <jiyin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

db20a892
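The ordering fix described above (read what you need from the backing inode before the object becomes fair game for culling) can be shown with a deterministic toy model. All names are hypothetical, and the culler is invoked inline to force the worst-case interleaving rather than running in a separate thread.

```python
class BackingInode:
    def __init__(self, i_blocks):
        self.i_blocks = i_blocks


class CacheObject:
    def __init__(self, inode):
        self.inode = inode
        self.active = True


def cull(obj):
    # The culler may delete the backing file once the object is inactive.
    if not obj.active:
        obj.inode = None


def mark_inactive_racy(obj):
    obj.active = False
    cull(obj)                  # worst case: the culler runs right here
    return obj.inode.i_blocks  # fails: the inode is already gone


def mark_inactive_safe(obj):
    i_blocks = obj.inode.i_blocks  # read first, while still active
    obj.active = False
    cull(obj)
    return i_blocks


print(mark_inactive_safe(CacheObject(BackingInode(8))))
try:
    mark_inactive_racy(CacheObject(BackingInode(8)))
except AttributeError as exc:
    print("racy ordering failed:", exc)
```

The AttributeError in the racy variant plays the role of the NULL pointer dereference in the oops.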
