1. 14 2月, 2017 32 次提交
  2. 11 2月, 2017 1 次提交
    • O
      Btrfs: fix btrfs_decompress_buf2page() · 6e78b3f7
      Omar Sandoval 提交于
      If btrfs_decompress_buf2page() is handed a bio with its page in the
      middle of the working buffer, then we adjust the offset into the working
      buffer. After we copy into the bio, we advance the iterator by the
      number of bytes we copied. Then, we have some logic to handle the case
      of discontiguous pages and adjust the offset into the working buffer
      again. However, if we didn't advance the bio to a new page, we may enter
      this case in error, essentially repeating the adjustment that we already
      made when we entered the function. The end result is bogus data in the
      bio.
      
      Previously, we only checked for this case when we advanced to a new
      page, but the conversion to bio iterators changed that. This restores
      the old, correct behavior.
      
      A case I saw when testing with zlib was:
      
          buf_start = 42769
          total_out = 46865
          working_bytes = total_out - buf_start = 4096
          start_byte = 45056
      
      The condition (total_out > start_byte && buf_start < start_byte) is
      true, so we adjust the offset:
      
          buf_offset = start_byte - buf_start = 2287
          working_bytes -= buf_offset = 1809
          current_buf_start = buf_start = 42769
      
      Then, we copy
      
          bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
          buf_offset += bytes = 4096
          working_bytes -= bytes = 0
          current_buf_start += bytes = 44578
      
      After bio_advance(), we are still in the same page, so start_byte is the
      same. Then, we check (total_out > start_byte && current_buf_start < start_byte),
      which is true! So, we adjust the values again:
      
          buf_offset = start_byte - buf_start = 2287
          working_bytes = total_out - start_byte = 1809
          current_buf_start = buf_start + buf_offset = 45056
      
      But note that working_bytes was already zero before this, so we should
      have stopped copying.
      
      Fixes: 974b1adc ("btrfs: use bio iterators for the decompression handlers")
      Reported-by: NPat Erley <pat-lkml@erley.org>
      Reviewed-by: NChris Mason <clm@fb.com>
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Tested-by: NLiu Bo <bo.li.liu@oracle.com>
      6e78b3f7
  3. 10 2月, 2017 2 次提交
  4. 09 2月, 2017 1 次提交
  5. 08 2月, 2017 1 次提交
    • H
      mm: fix KPF_SWAPCACHE in /proc/kpageflags · b6789123
      Hugh Dickins 提交于
      Commit 6326fec1 ("mm: Use owner_priv bit for PageSwapCache, valid
      when PageSwapBacked") aliased PG_swapcache to PG_owner_priv_1 (and
      depending on PageSwapBacked being true).
      
      As a result, the KPF_SWAPCACHE bit in '/proc/kpageflags' should now be
      synthesized, instead of being shown on unrelated pages which just happen
      to have PG_owner_priv_1 set.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6789123
  6. 04 2月, 2017 1 次提交
    • M
      fs: break out of iomap_file_buffered_write on fatal signals · d1908f52
      Michal Hocko 提交于
      Tetsuo has noticed that an OOM stress test which performs large write
      requests can cause the full memory reserves depletion.  He has tracked
      this down to the following path
      
      	__alloc_pages_nodemask+0x436/0x4d0
      	alloc_pages_current+0x97/0x1b0
      	__page_cache_alloc+0x15d/0x1a0          mm/filemap.c:728
      	pagecache_get_page+0x5a/0x2b0           mm/filemap.c:1331
      	grab_cache_page_write_begin+0x23/0x40   mm/filemap.c:2773
      	iomap_write_begin+0x50/0xd0             fs/iomap.c:118
      	iomap_write_actor+0xb5/0x1a0            fs/iomap.c:190
      	? iomap_write_end+0x80/0x80             fs/iomap.c:150
      	iomap_apply+0xb3/0x130                  fs/iomap.c:79
      	iomap_file_buffered_write+0x68/0xa0     fs/iomap.c:243
      	? iomap_write_end+0x80/0x80
      	xfs_file_buffered_aio_write+0x132/0x390 [xfs]
      	? remove_wait_queue+0x59/0x60
      	xfs_file_write_iter+0x90/0x130 [xfs]
      	__vfs_write+0xe5/0x140
      	vfs_write+0xc7/0x1f0
      	? syscall_trace_enter+0x1d0/0x380
      	SyS_write+0x58/0xc0
      	do_syscall_64+0x6c/0x200
      	entry_SYSCALL64_slow_path+0x25/0x25
      
      the oom victim has access to all memory reserves to make a forward
      progress to exit easier.  But iomap_file_buffered_write and other
      callers of iomap_apply loop to complete the full request.  We need to
      check for fatal signals and back off with a short write instead.
      
      As the iomap_apply delegates all the work down to the actor we have to
      hook into those.  All callers that work with the page cache are calling
      iomap_write_begin so we will check for signals there.  dax_iomap_actor
      has to handle the situation explicitly because it copies data to the
      userspace directly.  Other callers like iomap_page_mkwrite work on a
      single page or iomap_fiemap_actor do not allocate memory based on the
      given len.
      
      Fixes: 68a9f5e7 ("xfs: implement iomap based buffered write path")
      Link: http://lkml.kernel.org/r/20170201092706.9966-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d1908f52
  7. 01 2月, 2017 2 次提交
    • D
      fscache: Fix dead object requeue · e26bfebd
      David Howells 提交于
      Under some circumstances, an fscache object can become queued such that it
      fscache_object_work_func() can be called once the object is in the
      OBJECT_DEAD state.  This results in the kernel oopsing when it tries to
      invoke the handler for the state (which is hard coded to 0x2).
      
      The way this comes about is something like the following:
      
       (1) The object dispatcher is processing a work state for an object.  This
           is done in workqueue context.
      
       (2) An out-of-band event comes in that isn't masked, causing the object to
           be queued, say EV_KILL.
      
       (3) The object dispatcher finishes processing the current work state on
           that object and then sees there's another event to process, so,
           without returning to the workqueue core, it processes that event too.
           It then follows the chain of events that initiates until we reach
           OBJECT_DEAD without going through a wait state (such as
           WAIT_FOR_CLEARANCE).
      
           At this point, object->events may be 0, object->event_mask will be 0
           and oob_event_mask will be 0.
      
       (4) The object dispatcher returns to the workqueue processor, and in due
           course, this sees that the object's work item is still queued and
           invokes it again.
      
       (5) The current state is a work state (OBJECT_DEAD), so the dispatcher
           jumps to it - resulting in an OOPS.
      
      When I'm seeing this, the work state in (1) appears to have been either
      LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
      fscache_osm_lookup_oob).
      
      The window for (2) is very small:
      
       (A) object->event_mask is cleared whilst the event dispatch process is
           underway - though there's no memory barrier to force this to the top
           of the function.
      
           The window, therefore is from the time the object was selected by the
           workqueue processor and made requeueable to the time the mask was
           cleared.
      
       (B) fscache_raise_event() will only queue the object if it manages to set
           the event bit and the corresponding event_mask bit was set.
      
           The enqueuement is then deferred slightly whilst we get a ref on the
           object and get the per-CPU variable for workqueue congestion.  This
           slight deferral slightly increases the probability by allowing extra
           time for the workqueue to make the item requeueable.
      
      Handle this by giving the dead state a processor function and checking the
      for the dead state address rather than seeing if the processor function is
      address 0x2.  The dead state processor function can then set a flag to
      indicate that it's occurred and give a warning if it occurs more than once
      per object.
      
      If this race occurs, an oops similar to the following is seen (note the RIP
      value):
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
      IP: [<0000000000000002>] 0x1
      PGD 0
      Oops: 0010 [#1] SMP
      Modules linked in: ...
      CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1
      Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
      Workqueue: fscache_object fscache_object_work_func [fscache]
      task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000
      RIP: 0010:[<0000000000000002>]  [<0000000000000002>] 0x1
      RSP: 0018:ffff880717547df8  EFLAGS: 00010202
      RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200
      RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480
      RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510
      R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000
      R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600
      FS:  0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00
       ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000
       ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900
      Call Trace:
       [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache]
       [<ffffffff8109d5db>] process_one_work+0x17b/0x470
       [<ffffffff8109e4ac>] worker_thread+0x21c/0x400
       [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
       [<ffffffff810a5acf>] kthread+0xcf/0xe0
       [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff816460d8>] ret_from_fork+0x58/0x90
       [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJeremy McNicoll <jeremymc@redhat.com>
      Tested-by: NFrank Sorenson <sorenson@redhat.com>
      Tested-by: NBenjamin Coddington <bcodding@redhat.com>
      Reviewed-by: NBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e26bfebd
    • D
      fscache: Clear outstanding writes when disabling a cookie · 6bdded59
      David Howells 提交于
      fscache_disable_cookie() needs to clear the outstanding writes on the
      cookie it's disabling because they cannot be completed after.
      
      Without this, fscache_nfs_open_file() gets stuck because it disables the
      cookie when the file is opened for writing but can't uncache the pages till
      afterwards - otherwise there's a race between the open routine and anyone
      who already has it open R/O and is still reading from it.
      
      Looking in /proc/pid/stack of the offending process shows:
      
      [<ffffffffa0142883>] __fscache_wait_on_page_write+0x82/0x9b [fscache]
      [<ffffffffa014336e>] __fscache_uncache_all_inode_pages+0x91/0xe1 [fscache]
      [<ffffffffa01740fa>] nfs_fscache_open_file+0x59/0x9e [nfs]
      [<ffffffffa01ccf41>] nfs4_file_open+0x17f/0x1b8 [nfsv4]
      [<ffffffff8117350e>] do_dentry_open+0x16d/0x2b7
      [<ffffffff811743ac>] vfs_open+0x5c/0x65
      [<ffffffff81184185>] path_openat+0x785/0x8fb
      [<ffffffff81184343>] do_filp_open+0x48/0x9e
      [<ffffffff81174710>] do_sys_open+0x13b/0x1cb
      [<ffffffff811747b9>] SyS_open+0x19/0x1b
      [<ffffffff81001c44>] do_syscall_64+0x80/0x17a
      [<ffffffff8165c2da>] return_from_SYSCALL_64+0x0/0x7a
      [<ffffffffffffffff>] 0xffffffffffffffff
      Reported-by: NJianhong Yin <jiyin@redhat.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      Acked-by: NSteve Dickson <steved@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6bdded59