1. 19 Aug 2016, 3 commits
    • f2fs: allow copying file range only in between regular files · fe8494bf
      Chao Yu committed
      Allow copying a data range only when both input files are regular
      files; otherwise, deny the request.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • Revert "f2fs: move i_size_write in f2fs_write_end" · 3024c9a1
      Chao Yu committed
      This reverts commit a2ee0a30.
      
      When testing with generic/032 of the xfstests suite, the following
      failure is reported:
      
      generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad)
          --- tests/generic/032.out	2015-01-11 16:52:27.643681072 +0800
          +++ results/generic/032.out.bad	2016-08-06 13:44:43.861330500 +0800
          @@ -1,5 +1,5 @@
           QA output created by 032
          -100 iterations
          -0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
          -*
          -0100000
          +1: [768..775]: unwritten
          +Unwritten extents found!
          ...
          (Run 'diff -u tests/generic/032.out results/generic/032.out.bad'  to see the entire diff)
      Ran: generic/032
      Failures: generic/032
      Failed 1 of 1 tests
      
      In write_end(), we should update the inode's i_size before unlocking
      the page; otherwise, newly written data can be lost in the following
      race condition:
      
      Thread A			Thread B
      - write_end
       - unlock page
      				- writepages
      				 - lock_page
      				  - writepage
      				  if page is out-of-range of file size,
      				  we will skip writing the page.
       - update i_size
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
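The effect of the ordering can be seen in a single-threaded model of the writeback check. This is a simplified sketch, not the f2fs implementation: the helper name and the 4096-byte page size are assumptions. A page whose start offset is at or beyond the current file size is treated as out of range and skipped.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/*
 * Simplified model of the writeback decision: a page that starts at or
 * beyond i_size is considered out of range, so it is not written.
 */
static bool sketch_writepage_would_skip(unsigned long page_index,
					unsigned long i_size)
{
	return page_index * SKETCH_PAGE_SIZE >= i_size;
}
```

If a writer fills page 0 of an empty file and unlocks the page while `i_size` is still 0, a concurrent writeback sees `sketch_writepage_would_skip(0, 0)` return true and drops the data. With `i_size` already raised to 4096 before the unlock, the same page is written.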
    • Revert "f2fs: use percpu_rw_semaphore" · b873b798
      Jaegeuk Kim committed
      LKP reported a -36.3% regression of fsmark.files_per_sec due to this patch.
      I've confirmed that fxmark [1] also shows a slight regression for DWAL.
      
      [1] https://github.com/sslab-gatech/fxmark
      
      This reverts commit ec795418.
  2. 13 Aug 2016, 1 commit
  3. 12 Aug 2016, 2 commits
  4. 11 Aug 2016, 1 commit
  5. 10 Aug 2016, 2 commits
    • mm, writeback: flush plugged IO in wakeup_flusher_threads() · 51350ea0
      Konstantin Khlebnikov committed
      I've found a funny live-lock between raid10 barriers during resync and
      memory controller hard limits. Inside mpage_readpages() the task holds
      on to its plugged bio, which blocks the barrier in raid10. Its memory
      cgroup has no free memory, so the task goes into reclaim, but all
      reclaimable pages are dirty and cannot be written because raid10 is
      rebuilding and stuck on the barrier.
      
      The common flush of such IO in schedule() never happens, because the
      caller doesn't go to sleep.
      
      The lock is 'live' because changing the memory limit or killing the
      tasks which hold the stuck bio lets everything make progress again.
      
      That was what happened in 3.18.x, but I see no difference in upstream
      logic.  Theoretically this might happen even without memory cgroup.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • mm: memcontrol: only mark charged pages with PageKmemcg · c4159a75
      Vladimir Davydov committed
      To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
      which sets page->_mapcount to -512.  Currently, we set/clear PageKmemcg
      in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
      with __GFP_ACCOUNT, including those that aren't actually charged to any
      cgroup, i.e. allocated from the root cgroup context.  To avoid overhead
      in case cgroups are not used, we only do that if memcg_kmem_enabled() is
      true.  The latter is set iff there are kmem-enabled memory cgroups
      (online or offline).  The root cgroup is not considered kmem-enabled.
      
      As a result, if a page is allocated with __GFP_ACCOUNT for the root
      cgroup when there are kmem-enabled memory cgroups and is freed after all
      kmem-enabled memory cgroups were removed, e.g.
      
        # no memory cgroups have been created yet, create one
        mkdir /sys/fs/cgroup/memory/test
        # run something allocating pages with __GFP_ACCOUNT, e.g.
        # a program using pipe
        dmesg | tail
        # remove the memory cgroup
        rmdir /sys/fs/cgroup/memory/test
      
      we'll get a bad page state bug complaining about page->_mapcount != -1:
      
        BUG: Bad page state in process swapper/0  pfn:1fd945c
        page:ffffea007f651700 count:0 mapcount:-511 mapping:          (null) index:0x0
        flags: 0x1000000000000000()
      
      To avoid that, let's mark with PageKmemcg only those pages that are
      actually charged to and hence pin a non-root memory cgroup.
      
      Fixes: 4949148a ("mm: charge/uncharge kmemcg from generic page allocator paths")
      Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 09 Aug 2016, 2 commits
  7. 08 Aug 2016, 2 commits
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe committed
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: Jens Axboe <axboe@fb.com>
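Why a direct assignment silently breaks can be shown with a toy encoding. The layout below is illustrative only: the `SKETCH_*` names, the 3-bit op field, and the op values are assumptions of this sketch, not the kernel's definitions. It packs the op code into the top bits of an `unsigned int` and the flags into the remaining low bits.

```c
#include <assert.h>

/* Illustrative layout only; the kernel's real constants may differ. */
#define SKETCH_OP_BITS	3
#define SKETCH_OP_SHIFT	(8 * sizeof(unsigned int) - SKETCH_OP_BITS)
#define SKETCH_OP_READ	0u
#define SKETCH_OP_WRITE	1u

/* Pack an op code (high bits) and flags (low bits) into one word. */
static unsigned int sketch_set_op_attrs(unsigned int op, unsigned int flags)
{
	assert(op < (1u << SKETCH_OP_BITS));
	assert((flags >> SKETCH_OP_SHIFT) == 0);
	return (op << SKETCH_OP_SHIFT) | flags;
}

/* Extract the op code from the packed word. */
static unsigned int sketch_get_op(unsigned int opf)
{
	return opf >> SKETCH_OP_SHIFT;
}
```

In this model, legacy-style code that stores the write value directly into the field only sets a low flag bit, so `sketch_get_op()` reports a read. Renaming the member turns that silent misbehavior into a compile error.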
    • block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b
      Jens Axboe committed
      Commit abf54548 changed it from an 'rw' flags type to the
      newer ops based interface, but now we're effectively leaking
      some bdev internals to the rest of the kernel. Since we only
      care about whether it's a read or a write at that level, just
      pass in a bool 'is_write' parameter instead.
      
      Then we can also move op_is_write() and friends back under
      CONFIG_BLOCK protection.
      Reviewed-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  8. 07 Aug 2016, 1 commit
  9. 06 Aug 2016, 6 commits
    • rxrpc: Fix races between skb free, ACK generation and replying · 372ee163
      David Howells committed
      Inside the kafs filesystem it is possible to occasionally have a call
      processed and terminated before we've had a chance to check whether we need
      to clean up the rx queue for that call because afs_send_simple_reply() ends
      the call when it is done, but this is done in a workqueue item that might
      happen to run to completion before afs_deliver_to_call() completes.
      
      Further, it is possible for rxrpc_kernel_send_data() to be called to send a
      reply before the last request-phase data skb is released.  The rxrpc skb
      destructor is where the ACK processing is done and the call state is
      advanced upon release of the last skb.  ACK generation is also deferred to
      a work item because it's possible that the skb destructor is not called in
      a context where kernel_sendmsg() can be invoked.
      
      To this end, the following changes are made:
      
       (1) kernel_rxrpc_data_consumed() is added.  This should be called whenever
           an skb is emptied so as to crank the ACK and call states.  This does
           not release the skb, however.  kernel_rxrpc_free_skb() must now be
           called to achieve that.  These together replace
           rxrpc_kernel_data_delivered().
      
       (2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().
      
           This makes afs_deliver_to_call() easier to work with, as the skb can simply
           be discarded unconditionally here without trying to work out what the
           return value of the ->deliver() function means.
      
           The ->deliver() functions can, via afs_data_complete(),
           afs_transfer_reply() and afs_extract_data() mark that an skb has been
           consumed (thereby cranking the state) without the need to
           conditionally free the skb to make sure the state is correct on an
           incoming call for when the call processor tries to send the reply.
      
       (3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
           has finished with a packet and MSG_PEEK isn't set.
      
       (4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().
      
           Because of this, we no longer need to clear the destructor and put the
           call before we free the skb in cases where we don't want the ACK/call
           state to be cranked.
      
       (5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
           than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
           the delivery function already), and the caller is now responsible for
           producing an abort if that was the last packet.
      
       (6) There are many bits of unmarshalling code where:
      
       		ret = afs_extract_data(call, skb, last, ...);
      		switch (ret) {
      		case 0:		break;
      		case -EAGAIN:	return 0;
      		default:	return ret;
      		}
      
           is to be found.  As -EAGAIN can now be passed back to the caller, we
           now just return if ret < 0:
      
       		ret = afs_extract_data(call, skb, last, ...);
      		if (ret < 0)
      			return ret;
      
       (7) Checks for trailing data and empty final data packets have been
           consolidated into afs_data_complete().  So:
      
      		if (skb->len > 0)
      			return -EBADMSG;
      		if (!last)
      			return 0;
      
           becomes:
      
      		ret = afs_data_complete(call, skb, last);
      		if (ret < 0)
      			return ret;
      
       (8) afs_transfer_reply() now checks the amount of data it has against the
           amount of data desired and the amount of data in the skb and returns
           an error to induce an abort if we don't get exactly what we want.
      
      Without these changes, the following oops can occasionally be observed,
      particularly if some printks are inserted into the delivery path:
      
      general protection fault: 0000 [#1] SMP
      Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
      CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G            E   4.7.0-fsdevel+ #1303
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      Workqueue: kafsd afs_async_workfn [kafs]
      task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
      RIP: 0010:[<ffffffff8108fd3c>]  [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1
      RSP: 0018:ffff88040c073bc0  EFLAGS: 00010002
      RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
      RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
      R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
      FS:  0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
      Stack:
       0000000000000006 000000000be04930 0000000000000000 ffff880400000000
       ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
       ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
      Call Trace:
       [<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74
       [<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1
       [<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189
       [<ffffffff810915f4>] lock_acquire+0x122/0x1b6
       [<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6
       [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
       [<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49
       [<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
       [<ffffffff814c928f>] skb_dequeue+0x18/0x61
       [<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs]
       [<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs]
       [<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs]
       [<ffffffff81063a3a>] process_one_work+0x29d/0x57c
       [<ffffffff81064ac2>] worker_thread+0x24a/0x385
       [<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0
       [<ffffffff810696f5>] kthread+0xf3/0xfb
       [<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40
       [<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • NFSv4: Cap the transport reconnection timer at 1/2 lease period · 8d480326
      Trond Myklebust committed
      We don't want to miss a lease period renewal due to the TCP connection
      failing to reconnect in a timely fashion. To ensure this doesn't happen,
      cap the reconnection timer so that we retry the connection attempt
      at least every 1/2 lease period.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
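The capping rule amounts to a simple clamp. The following is a minimal sketch with a hypothetical helper name; it assumes both values use the same time unit (seconds here) and does not reflect the actual sunrpc interface.

```c
/*
 * Clamp a reconnect timeout so that a retry happens at least once every
 * half lease period. Both arguments are assumed to be in the same unit.
 */
static unsigned long sketch_cap_reconnect(unsigned long timeout,
					  unsigned long lease_period)
{
	unsigned long cap = lease_period / 2;

	return timeout > cap ? cap : timeout;
}
```

With a 90-second lease, a 300-second backoff would be reduced to 45 seconds, while a 10-second timeout is left unchanged.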
    • NFSv4: Cleanup the setting of the nfs4 lease period · fb10fb67
      Trond Myklebust committed
      Make a helper function nfs4_set_lease_period() and have
      nfs41_setup_state_renewal() and nfs4_do_fsinfo() use it.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • ramoops: use persistent_ram_free() instead of kfree() for freeing prz · e976e564
      Hiraku Toyooka committed
      persistent_ram_zone (prz) structures are allocated by persistent_ram_new(),
      which includes vmap() or ioremap(). But they are currently freed by
      kfree(). Use persistent_ram_free() instead to correct this asymmetric usage.
      Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.kw@hitachi.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Cc: Seiji Aguchi <seiji.aguchi.tr@hitachi.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • ramoops: use DT reserved-memory bindings · 529182e2
      Kees Cook committed
      Instead of a ramoops-specific node, use a child node of /reserved-memory.
      This requires that of_platform_device_create() be explicitly called
      for the node, though, since "/reserved-memory" does not have its own
      "compatible" property.
      Suggested-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rob Herring <robh@kernel.org>
    • NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED · 206b3bb5
      Trond Myklebust committed
      We should handle those errors in the same way we handle the other
      stateid errors: by invalidating the faulty layout stateid.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  10. 05 Aug 2016, 13 commits
    • nfsd: remove some dead code in nfsd_create_locked() · 2b118859
      Dan Carpenter committed
      We changed this around in f135af1041f ('nfsd: reorganize nfsd_create')
      so "dchild" can't be an error pointer any more.  Also, dchild can't be
      NULL here (and dput would already handle this even if it were).
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: drop unnecessary MAY_EXEC check from create · fa08139d
      J. Bruce Fields committed
      We need an fh_verify to make sure we at least have a dentry, but actual
      permission checks happen later.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: clean up bad-type check in nfsd_create_locked · 71423274
      J. Bruce Fields committed
      Minor cleanup, no change in behavior.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: remove unnecessary positive-dentry check · d03d9fe4
      J. Bruce Fields committed
      vfs_{create,mkdir,mknod} each begin with a call to may_create(), which
      returns EEXIST if the object already exists.
      
      This check is therefore unnecessary.
      
      (In the NFSv2 case, nfsd_proc_create also has such a check.  Contrary to
      RFC 1094, our code seems to believe that a CREATE of an existing file
      should succeed.  I'm leaving that behavior alone.)
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: reorganize nfsd_create · b44061d0
      J. Bruce Fields committed
      There's some odd logic in nfsd_create() that allows it to be called with
      the parent directory either locked or unlocked.  The only already-locked
      caller is NFSv2's nfsd_proc_create().  It's less confusing to split out
      the unlocked case into a separate function which the NFSv2 code can call
      directly.
      
      Also fix some comments while we're here.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: check d_can_lookup in fh_verify of directories · e75b23f9
      J. Bruce Fields committed
      Create and other nfsd ops generally assume we can call lookup_one_len on
      inodes with S_IFDIR set.  Al says that this assumption isn't true in
      general, though it should be for the filesystem objects nfsd sees.
      
      Add a check just to make sure our assumption isn't violated.
      
      Remove a couple of checks for i_op->lookup in create code.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: remove redundant zero-length check from create · 12391d07
      J. Bruce Fields committed
      lookup_one_len already has this check.
      
      The only effect of this patch is to return access instead of perm in the
      0-length-filename case.  I actually prefer nfserr_perm (or _inval?), but
      I doubt anyone cares.
      
      The isdotent check seems redundant too, but I worry that some client
      might actually care about that strange nfserr_exist error.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: Make creates return EEXIST instead of EACCES · 7eed34f1
      Oleg Drokin committed
      When doing a create (mkdir/mknod) on a name, it's worth
      checking whether the name exists first before returning EACCES in
      case the directory is not writeable by the user.
      This makes return values on the client more consistent
      regardless of whether the entry is cached in the local
      cache or not.
      Another positive side effect is that certain programs only expect
      EEXIST in that case, even though POSIX allows any valid
      error to be returned.
      Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
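The ordering described in this commit can be modeled in a few lines. This is a hedged sketch with a hypothetical function name and boolean inputs; it does not reproduce nfsd's actual permission code and only shows the existence check taking priority over the write-permission check.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check order: report an existing name first,
 * and only then complain about a directory the user cannot write to.
 */
static int sketch_create_check(bool name_exists, bool dir_writable)
{
	if (name_exists)
		return -EEXIST;
	if (!dir_writable)
		return -EACCES;
	return 0;
}
```

With this order, a create on an existing name in a read-only directory yields `-EEXIST`, which is the result a client would also derive from a cached positive entry.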
    • mm/block: convert rw_page users to bio op use · abf54548
      Mike Christie committed
      The rw_page users were not converted to use bio/req ops. As a result,
      bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
      be sent down as reads.
      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Fixes: 4e1b2d52 ("block, fs, drivers: remove REQ_OP compat defs and related code")
      
      Modified by me to:
      
      1) Drop op_flags passing into ->rw_page(), as we don't use it.
      2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • Fixup direct bi_rw modifiers · b571bc60
      Shaun Tancheff committed
      Direct modifiers of bi_rw should use bio_set_op_attrs() to set it.
      Signed-off-by: Shaun Tancheff <shaun@tancheff.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: David Sterba <dsterba@suse.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • f2fs: drop bio->bi_rw manual assignment · 1aee6b9a
      Jens Axboe committed
      Merge 4fc29c1a included this extra line, but it's not needed (or
      useful) since we call bio_set_op_attrs() right after to properly set
      the op and flags for the bio.
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: add missing group association in bio-cloning functions · 20bd723e
      Paolo Valente committed
      When a bio is cloned, the newly created bio must be associated with
      the same blkcg as the original bio (if BLK_CGROUP is enabled). If
      this operation is not performed, then the new bio is not associated
      with any group, and the group of the current task is returned when
      the group of the bio is requested.
      
      Depending on the cloning frequency, this may cause a large
      percentage of the bios belonging to a given group to be treated
      as if belonging to other groups (in most cases as if belonging to
      the root group). The expected group isolation may thereby be broken.
      
      This commit adds the missing association in bio-cloning functions.
      
      Fixes: da2f0f74 ("Btrfs: add support for blkio controllers")
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Reviewed-by: Nikolay Borisov <kernel@kyup.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
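The failure mode can be reproduced in a toy model. All `sketch_*` types and functions below are hypothetical stand-ins, not the block layer's structures. A lookup falls back to the current task's group when the bio carries no association, so a clone that does not copy the association is attributed to the wrong group.

```c
#include <stddef.h>

struct sketch_group {
	int id;
};

struct sketch_bio {
	const struct sketch_group *group;	/* NULL when not associated */
};

/* Group lookup: fall back to the current task's group when unassociated. */
static const struct sketch_group *
sketch_bio_group(const struct sketch_bio *bio,
		 const struct sketch_group *current_task_group)
{
	return bio->group ? bio->group : current_task_group;
}

/* Clone step that carries the original bio's association over. */
static void sketch_clone_association(struct sketch_bio *dst,
				     const struct sketch_bio *src)
{
	dst->group = src->group;
}
```

Before `sketch_clone_association()` runs, a clone processed by a task in the root group is accounted to the root group. Afterwards it resolves to the same group as the original.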
    • writeback: Write dirty times for WB_SYNC_ALL writeback · dc5ff2b1
      Jan Kara committed
      Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
      queue_io() so that inodes which have only dirty timestamps are properly
      written on fsync(2) and sync(2). However there are other call sites -
      most notably going through write_inode_now() - which expect inode to be
      clean after WB_SYNC_ALL writeback. This is not currently true as we do
      not clear I_DIRTY_TIME in __writeback_single_inode() even for
      WB_SYNC_ALL writeback in all the cases. This then resulted in the
      following oops because bdev_write_inode() did not clean the inode and
      writeback code later stumbled over a dirty inode with detached wb.
      
        general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
        Modules linked in:
        CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Workqueue: writeback wb_workfn (flush-11:0)
        task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
        RIP: 0010:[<ffffffff818884d2>]  [<ffffffff818884d2>]
        locked_inode_to_wb_and_lock_list+0xa2/0x750
        RSP: 0018:ffff88006cdaf7d0  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
        RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
        RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
        R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
        R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
        FS:  0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
        DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
        Stack:
         ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
         ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
         ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
        Call Trace:
         [<     inline     >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
         [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
         [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
         [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
         [<     inline     >] wb_do_writeback fs/fs-writeback.c:1844
         [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
         [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
         [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
         [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
         [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
        Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
        00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
        80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
        RIP  [<     inline     >] wb_get include/linux/backing-dev-defs.h:212
        RIP  [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
        fs/fs-writeback.c:281
         RSP <ffff88006cdaf7d0>
        ---[ end trace 986a4d314dcb2694 ]---
      
      Fix the problem by making sure __writeback_single_inode() also writes
      inodes that have only dirty timestamps in WB_SYNC_ALL mode.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
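The intended behavior can be approximated with a small state model. Everything here is a sketch under stated assumptions: the `SKETCH_*` names and flag value are invented, and the `times_expired` input stands in for whatever non-sync condition already triggers a timestamp write. Only the rule "WB_SYNC_ALL always writes dirty timestamps" comes from the commit message.

```c
#include <stdbool.h>

#define SKETCH_I_DIRTY_TIME 0x1u	/* hypothetical flag value */

enum sketch_sync_mode { SKETCH_SYNC_NONE, SKETCH_SYNC_ALL };

/*
 * Returns the inode state bits left after writeback and sets *wrote_times
 * when the timestamps were written. In SKETCH_SYNC_ALL mode the dirty-time
 * bit is always cleared, so the caller can rely on a clean inode.
 */
static unsigned int sketch_writeback(unsigned int state,
				     enum sketch_sync_mode mode,
				     bool times_expired, bool *wrote_times)
{
	*wrote_times = false;
	if ((state & SKETCH_I_DIRTY_TIME) &&
	    (mode == SKETCH_SYNC_ALL || times_expired)) {
		state &= ~SKETCH_I_DIRTY_TIME;
		*wrote_times = true;
	}
	return state;
}
```

A caller in the position of write_inode_now() would use `SKETCH_SYNC_ALL` and could then assume that no dirty-time bit remains.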
  11. 04 Aug 2016, 5 commits
    • block: remove BLK_DEV_DAX config option · 99a01cdf
      Ross Zwisler committed
      The functionality for block device DAX was already removed with commit
      acc93d30 ("Revert "block: enable dax for raw block devices"")
      
      However, we still had a config option hanging around that was always
      disabled because it depended on CONFIG_BROKEN.  This config option was
      introduced in commit 03cdadb0 ("block: disable block device DAX by
      default")
      
      This change reverts that commit, removing the dead config option.
      
      Link: http://lkml.kernel.org/r/20160729182314.6368-1-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() · 8a545f18
      Dan Carpenter committed
      We can't pass error pointers to kfree(), or it causes an oops.
      
      Fixes: 52b209f7 ('get rid of hostfs_read_inode()')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
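The hazard can be demonstrated in user space with stand-ins for the kernel's error-pointer helpers. The `sketch_*` functions below imitate the ERR_PTR()/IS_ERR() idea of encoding a small negative errno in the pointer value; they are not the kernel macros, and guarding the free call is just one possible way to avoid the problem, not necessarily what this commit does.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True when the pointer value lies in the top SKETCH_MAX_ERRNO addresses. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Free only real allocations; an encoded error must never reach free(). */
static void sketch_free(void *ptr)
{
	if (!sketch_is_err(ptr))
		free(ptr);
}
```

An error pointer is not a heap address, so handing it to the allocator's free routine corrupts state or crashes, which is the oops the commit message refers to.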
    • Btrfs: fix __MAX_CSUM_ITEMS · 42049bf6
      Chris Mason committed
      Jeff Mahoney's cleanup commit (14a1e067) wasn't correct for csums on
      machines where the pagesize >= metadata blocksize.
      
      This just reverts the relevant hunks to bring the old math back.
      Signed-off-by: Chris Mason <clm@fb.com>
    • cachefiles: Fix race between inactivating and culling a cache object · db20a892
      David Howells committed
      There's a race between cachefiles_mark_object_inactive() and
      cachefiles_cull():
      
       (1) cachefiles_cull() can't delete a backing file until the cache object
           is marked inactive, but as soon as that's the case it's fair game.
      
       (2) cachefiles_mark_object_inactive() marks the object as being inactive
           and *only then* reads the i_blocks on the backing inode - but
           cachefiles_cull() might've managed to delete it by this point.
      
      Fix this by making sure cachefiles_mark_object_inactive() gets any data it
      needs from the backing inode before deactivating the object.
      
      Without this, the following oops may occur:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
      IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
      ...
      CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G          I    ------------   3.10.0-470.el7.x86_64 #1
      Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
      Workqueue: fscache_object fscache_object_work_func [fscache]
      task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
      RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
      RSP: 0018:ffff8800b77c3d70  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
      RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
      RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
      R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
      R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
      FS:  0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
       ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
       ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
      Call Trace:
       [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
       [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
       [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
       [<ffffffff810a605b>] process_one_work+0x17b/0x470
       [<ffffffff810a6e96>] worker_thread+0x126/0x410
       [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
       [<ffffffff810ae64f>] kthread+0xcf/0xe0
       [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff81695418>] ret_from_fork+0x58/0x90
       [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
      
      The oopsing code shows:
      
      	callq  0xffffffff810af6a0 <wake_up_bit>
      	mov    0xf8(%r12),%rax
      	mov    0x30(%rax),%rax
      	mov    0x98(%rax),%rax   <---- oops here
      	lock add %rax,0x130(%rbx)
      
      where this is:
      
      	d_backing_inode(object->dentry)->i_blocks
      
      Fixes: a5b3a80b (CacheFiles: Provide read-and-reset release counters for cachefilesd)
      Reported-by: Jianhong Yin <jiyin@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Steve Dickson <steved@redhat.com>
      cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
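The shape of the fix, reading what is needed before giving up ownership, can be sketched as follows. This is a single-threaded illustration with hypothetical names; real code would also need the appropriate atomic operations or barriers, which are omitted here.

```c
struct sketch_object {
	int active;				/* non-zero while in use */
	const unsigned long *backing_blocks;	/* stands in for i_blocks */
};

/*
 * Read the block count first, then mark the object inactive. Once the
 * active flag is cleared, a culler may delete the backing file, so the
 * backing data must not be touched after that point.
 */
static unsigned long sketch_mark_inactive(struct sketch_object *obj)
{
	unsigned long blocks = *obj->backing_blocks;

	obj->active = 0;
	return blocks;
}
```

Reversing the two statements recreates the window described in point (2) of the commit message: the dereference would happen after the object became fair game for culling.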
    • fs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2 · 4b2e0162
      Geert Uytterhoeven committed
      With gcc < 4.2 (e.g. 4.1.2):
      
            CC      fs/proc/task_mmu.o
          cc1: error: unrecognized command line option "-Wno-override-init"
      
      To fix this, only enable the compiler option when it is actually
      supported by the compiler.
      
      Fixes: ca52953f ("fs/proc/task_mmu.c: suppress compilation warnings with W=1")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 03 Aug 2016, 2 commits