1. 13 8月, 2016 1 次提交
  2. 12 8月, 2016 2 次提交
  3. 11 8月, 2016 1 次提交
  4. 10 8月, 2016 2 次提交
    • K
      mm, writeback: flush plugged IO in wakeup_flusher_threads() · 51350ea0
      Konstantin Khlebnikov 提交于
      I've found funny live-lock between raid10 barriers during resync and
      memory controller hard limits. Inside mpage_readpages() task holds on to
      its plug bio which blocks the barrier in raid10. Its memory cgroup have
      no free memory thus the task goes into reclaimer but all reclaimable
      pages are dirty and cannot be written because raid10 is rebuilding and
      stuck on the barrier.
      
      Common flush of such IO in schedule() never happens, because the caller
      doesn't go to sleep.
      
      Lock is 'live' because changing memory limit or killing tasks which
      holds that stuck bio unblock whole progress.
      
      That was what happened in 3.18.x but I see no difference in upstream
      logic.  Theoretically this might happen even without memory cgroup.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      51350ea0
    • V
      mm: memcontrol: only mark charged pages with PageKmemcg · c4159a75
      Vladimir Davydov 提交于
      To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
      which sets page->_mapcount to -512.  Currently, we set/clear PageKmemcg
      in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
      with __GFP_ACCOUNT, including those that aren't actually charged to any
      cgroup, i.e. allocated from the root cgroup context.  To avoid overhead
      in case cgroups are not used, we only do that if memcg_kmem_enabled() is
      true.  The latter is set iff there are kmem-enabled memory cgroups
      (online or offline).  The root cgroup is not considered kmem-enabled.
      
      As a result, if a page is allocated with __GFP_ACCOUNT for the root
      cgroup when there are kmem-enabled memory cgroups and is freed after all
      kmem-enabled memory cgroups were removed, e.g.
      
        # no memory cgroups has been created yet, create one
        mkdir /sys/fs/cgroup/memory/test
        # run something allocating pages with __GFP_ACCOUNT, e.g.
        # a program using pipe
        dmesg | tail
        # remove the memory cgroup
        rmdir /sys/fs/cgroup/memory/test
      
      we'll get bad page state bug complaining about page->_mapcount != -1:
      
        BUG: Bad page state in process swapper/0  pfn:1fd945c
        page:ffffea007f651700 count:0 mapcount:-511 mapping:          (null) index:0x0
        flags: 0x1000000000000000()
      
      To avoid that, let's mark with PageKmemcg only those pages that are
      actually charged to and hence pin a non-root memory cgroup.
      
      Fixes: 4949148a ("mm: charge/uncharge kmemcg from generic page allocator paths")
      Reported-and-tested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4159a75
  5. 09 8月, 2016 2 次提交
  6. 08 8月, 2016 2 次提交
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe 提交于
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1eff9d32
    • J
      block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b
      Jens Axboe 提交于
      Commit abf54548 changed it from an 'rw' flags type to the
      newer ops based interface, but now we're effectively leaking
      some bdev internals to the rest of the kernel. Since we only
      care about whether it's a read or a write at that level, just
      pass in a bool 'is_write' parameter instead.
      
      Then we can also move op_is_write() and friends back under
      CONFIG_BLOCK protection.
      Reviewed-by: NMike Christie <mchristi@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c11f0c0b
  7. 07 8月, 2016 1 次提交
  8. 06 8月, 2016 5 次提交
  9. 05 8月, 2016 13 次提交
    • D
      nfsd: remove some dead code in nfsd_create_locked() · 2b118859
      Dan Carpenter 提交于
      We changed this around in f135af1041f ('nfsd: reorganize nfsd_create')
      so "dchild" can't be an error pointer any more.  Also, dchild can't be
      NULL here (and dput would already handle this even if it was).
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      2b118859
    • J
      nfsd: drop unnecessary MAY_EXEC check from create · fa08139d
      J. Bruce Fields 提交于
      We need an fh_verify to make sure we at least have a dentry, but actual
      permission checks happen later.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      fa08139d
    • J
      nfsd: clean up bad-type check in nfsd_create_locked · 71423274
      J. Bruce Fields 提交于
      Minor cleanup, no change in behavior.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      71423274
    • J
      nfsd: remove unnecessary positive-dentry check · d03d9fe4
      J. Bruce Fields 提交于
      vfs_{create,mkdir,mknod} each begin with a call to may_create(), which
      returns EEXIST if the object already exists.
      
      This check is therefore unnecessary.
      
      (In the NFSv2 case, nfsd_proc_create also has such a check.  Contrary to
      RFC 1094, our code seems to believe that a CREATE of an existing file
      should succeed.  I'm leaving that behavior alone.)
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      d03d9fe4
    • J
      nfsd: reorganize nfsd_create · b44061d0
      J. Bruce Fields 提交于
      There's some odd logic in nfsd_create() that allows it to be called with
      the parent directory either locked or unlocked.  The only already-locked
      caller is NFSv2's nfsd_proc_create().  It's less confusing to split out
      the unlocked case into a separate function which the NFSv2 code can call
      directly.
      
      Also fix some comments while we're here.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      b44061d0
    • J
      nfsd: check d_can_lookup in fh_verify of directories · e75b23f9
      J. Bruce Fields 提交于
      Create and other nfsd ops generally assume we can call lookup_one_len on
      inodes with S_IFDIR set.  Al says that this assumption isn't true in
      general, though it should be for the filesystem objects nfsd sees.
      
      Add a check just to make sure our assumption isn't violated.
      
      Remove a couple checks for i_op->lookup in create code.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      e75b23f9
    • J
      nfsd: remove redundant zero-length check from create · 12391d07
      J. Bruce Fields 提交于
      lookup_one_len already has this check.
      
      The only effect of this patch is to return access instead of perm in the
      0-length-filename case.  I actually prefer nfserr_perm (or _inval?), but
      I doubt anyone cares.
      
      The isdotent check seems redundant too, but I worry that some client
      might actually care about that strange nfserr_exist error.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      12391d07
    • O
      nfsd: Make creates return EEXIST instead of EACCES · 7eed34f1
      Oleg Drokin 提交于
      When doing a create (mkdir/mknod) on a name, it's worth
      checking the name exists first before returning EACCES in case
      the directory is not writeable by the user.
      This makes return values on the client more consistent
      regardless of whenever the entry there is cached in the local
      cache or not.
      Another positive side effect is certain programs only expect
      EEXIST in that case even despite POSIX allowing any valid
      error to be returned.
      Signed-off-by: NOleg Drokin <green@linuxhacker.ru>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      7eed34f1
    • M
      mm/block: convert rw_page users to bio op use · abf54548
      Mike Christie 提交于
      The rw_page users were not converted to use bio/req ops. As a result
      bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
      be sent down as reads.
      Signed-off-by: NMike Christie <mchristi@redhat.com>
      Fixes: 4e1b2d52 ("block, fs, drivers: remove REQ_OP compat defs and related code")
      
      Modified by me to:
      
      1) Drop op_flags passing into ->rw_page(), as we don't use it.
      2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK
      Signed-off-by: NJens Axboe <axboe@fb.com>
      abf54548
    • S
      Fixup direct bi_rw modifiers · b571bc60
      Shaun Tancheff 提交于
      bi_rw should be using bio_set_op_attrs to set bi_rw.
      Signed-off-by: NShaun Tancheff <shaun@tancheff.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: David Sterba <dsterba@suse.com>
      Cc: Mike Christie <mchristi@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      b571bc60
    • J
      f2fs: drop bio->bi_rw manual assignment · 1aee6b9a
      Jens Axboe 提交于
      Merge 4fc29c1a included this extra line, but it's not needed (or
      useful) since we'll bio_set_op_attrs() right after to properly set
      the op and flags for the bio.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1aee6b9a
    • P
      block: add missing group association in bio-cloning functions · 20bd723e
      Paolo Valente 提交于
      When a bio is cloned, the newly created bio must be associated with
      the same blkcg as the original bio (if BLK_CGROUP is enabled). If
      this operation is not performed, then the new bio is not associated
      with any group, and the group of the current task is returned when
      the group of the bio is requested.
      
      Depending on the cloning frequency, this may cause a large
      percentage of the bios belonging to a given group to be treated
      as if belonging to other groups (in most cases as if belonging to
      the root group). The expected group isolation may thereby be broken.
      
      This commit adds the missing association in bio-cloning functions.
      
      Fixes: da2f0f74 ("Btrfs: add support for blkio controllers")
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Reviewed-by: NNikolay Borisov <kernel@kyup.com>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      20bd723e
    • J
      writeback: Write dirty times for WB_SYNC_ALL writeback · dc5ff2b1
      Jan Kara 提交于
      Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and
      queue_io() so that inodes which have only dirty timestamps are properly
      written on fsync(2) and sync(2). However there are other call sites -
      most notably going through write_inode_now() - which expect inode to be
      clean after WB_SYNC_ALL writeback. This is not currently true as we do
      not clear I_DIRTY_TIME in __writeback_single_inode() even for
      WB_SYNC_ALL writeback in all the cases. This then resulted in the
      following oops because bdev_write_inode() did not clean the inode and
      writeback code later stumbled over a dirty inode with detached wb.
      
        general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
        Modules linked in:
        CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        Workqueue: writeback wb_workfn (flush-11:0)
        task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000
        RIP: 0010:[<ffffffff818884d2>]  [<ffffffff818884d2>]
        locked_inode_to_wb_and_lock_list+0xa2/0x750
        RSP: 0018:ffff88006cdaf7d0  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050
        RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286
        RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000
        R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0
        R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40
        FS:  0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0
        DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
        Stack:
         ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948
         ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948
         ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e
        Call Trace:
         [<     inline     >] inode_to_wb_and_lock_list fs/fs-writeback.c:309
         [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554
         [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600
         [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709
         [<     inline     >] wb_do_writeback fs/fs-writeback.c:1844
         [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884
         [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094
         [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228
         [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
         [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392
        Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e
        00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42>
        80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3
        RIP  [<     inline     >] wb_get include/linux/backing-dev-defs.h:212
        RIP  [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750
        fs/fs-writeback.c:281
         RSP <ffff88006cdaf7d0>
        ---[ end trace 986a4d314dcb2694 ]---
      
      Fix the problem by making sure __writeback_single_inode() writes inode
      only with dirty times in WB_SYNC_ALL mode.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Tested-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      dc5ff2b1
  10. 04 8月, 2016 5 次提交
    • R
      block: remove BLK_DEV_DAX config option · 99a01cdf
      Ross Zwisler 提交于
      The functionality for block device DAX was already removed with commit
      acc93d30 ("Revert "block: enable dax for raw block devices"")
      
      However, we still had a config option hanging around that was always
      disabled because it depended on CONFIG_BROKEN.  This config option was
      introduced in commit 03cdadb0 ("block: disable block device DAX by
      default")
      
      This change reverts that commit, removing the dead config option.
      
      Link: http://lkml.kernel.org/r/20160729182314.6368-1-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99a01cdf
    • D
      hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() · 8a545f18
      Dan Carpenter 提交于
      We can't pass error pointers to kfree() or it causes an oops.
      
      Fixes: 52b209f7 ('get rid of hostfs_read_inode()')
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      8a545f18
    • C
      Btrfs: fix __MAX_CSUM_ITEMS · 42049bf6
      Chris Mason 提交于
      Jeff Mahoney's cleanup commit (14a1e067) wasn't correct for csums on
      machines where the pagesize >= metadata blocksize.
      
      This just reverts the relevant hunks to bring the old math back.
      Signed-off-by: NChris Mason <clm@fb.com>
      42049bf6
    • D
      cachefiles: Fix race between inactivating and culling a cache object · db20a892
      David Howells 提交于
      There's a race between cachefiles_mark_object_inactive() and
      cachefiles_cull():
      
       (1) cachefiles_cull() can't delete a backing file until the cache object
           is marked inactive, but as soon as that's the case it's fair game.
      
       (2) cachefiles_mark_object_inactive() marks the object as being inactive
           and *only then* reads the i_blocks on the backing inode - but
           cachefiles_cull() might've managed to delete it by this point.
      
      Fix this by making sure cachefiles_mark_object_inactive() gets any data it
      needs from the backing inode before deactivating the object.
      
      Without this, the following oops may occur:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
      IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
      ...
      CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G          I    ------------   3.10.0-470.el7.x86_64 #1
      Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
      Workqueue: fscache_object fscache_object_work_func [fscache]
      task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
      RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
      RSP: 0018:ffff8800b77c3d70  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
      RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
      RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
      R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
      R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
      FS:  0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
       ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
       ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
      Call Trace:
       [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
       [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
       [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
       [<ffffffff810a605b>] process_one_work+0x17b/0x470
       [<ffffffff810a6e96>] worker_thread+0x126/0x410
       [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
       [<ffffffff810ae64f>] kthread+0xcf/0xe0
       [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff81695418>] ret_from_fork+0x58/0x90
       [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
      
      The oopsing code shows:
      
      	callq  0xffffffff810af6a0 <wake_up_bit>
      	mov    0xf8(%r12),%rax
      	mov    0x30(%rax),%rax
      	mov    0x98(%rax),%rax   <---- oops here
      	lock add %rax,0x130(%rbx)
      
      where this is:
      
      	d_backing_inode(object->dentry)->i_blocks
      
      Fixes: a5b3a80b (CacheFiles: Provide read-and-reset release counters for cachefilesd)
      Reported-by: NJianhong Yin <jiyin@redhat.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Reviewed-by: NSteve Dickson <steved@redhat.com>
      cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      db20a892
    • G
      fs/proc: Add compiler check for -Wno-override-init to support gcc < 4.2 · 4b2e0162
      Geert Uytterhoeven 提交于
      With gcc < 4.2 (e.g. 4.1.2):
      
            CC      fs/proc/task_mmu.o
          cc1: error: unrecognized command line option "-Wno-override-init"
      
      To fix this, only enable the compiler option when it is actually
      supported by the compiler.
      
      Fixes: ca52953f ("fs/proc/task_mmu.c: suppress compilation warnings with W=1")
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NValdis Kletnieks <valdis.kletnieks@vt.edu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4b2e0162
  11. 03 8月, 2016 6 次提交