1. 14 11月, 2018 8 次提交
  2. 18 10月, 2018 3 次提交
    • E
      fscache: Fix out of bound read in long cookie keys · fa520c47
      Eric Sandeen 提交于
      fscache_set_key() can incur an out-of-bounds read, reported by KASAN:
      
       BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
       Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615
      
      and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236
      
        BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
        BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
        Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466
      
      This happens for any index_key_len which is not divisible by 4 and is
      larger than the size of the inline key, because the code allocates exactly
      index_key_len for the key buffer, but the hashing loop is stepping through
      it 4 bytes (u32) at a time in the buf[] array.
      
      Fix this by calculating how many u32 buffers we'll need by using
      DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
      buffer to hold the index_key, then using that same count as the hashing
      index limit.
      
      Fixes: ec0328e4 ("fscache: Maintain a catalogue of allocated cookies")
      Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fa520c47
    • D
      fscache: Fix incomplete initialisation of inline key space · 1ff22883
      David Howells 提交于
      The inline key in struct rxrpc_cookie is insufficiently initialized,
      zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
      bytes will end up hashing uninitialized memory because the memcpy only
      partially fills the last buf[] element.
      
      Fix this by clearing fscache_cookie objects on allocation rather than using
      the slab constructor to initialise them.  We're going to pretty much fill
      in the entire struct anyway, so bringing it into our dcache writably
      shouldn't incur much overhead.
      
      This removes the need to do clearance in fscache_set_key() (where we aren't
      doing it correctly anyway).
      
      Also, we don't need to set cookie->key_len in fscache_set_key() as we
      already did it in the only caller, so remove that.
      
      Fixes: ec0328e4 ("fscache: Maintain a catalogue of allocated cookies")
      Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
      Reported-by: NEric Sandeen <sandeen@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ff22883
    • A
      cachefiles: fix the race between cachefiles_bury_object() and rmdir(2) · 169b8033
      Al Viro 提交于
      the victim might've been rmdir'ed just before the lock_rename();
      unlike the normal callers, we do not look the source up after the
      parents are locked - we know it beforehand and just recheck that it's
      still the child of what used to be its parent.  Unfortunately,
      the check is too weak - we don't spot a dead directory since its
      ->d_parent is unchanged, dentry is positive, etc.  So we sail all
      the way to ->rename(), with hosting filesystems _not_ expecting
      to be asked renaming an rmdir'ed subdirectory.
      
      The fix is easy, fortunately - the lock on parent is sufficient for
      making IS_DEADDIR() on child safe.
      
      Cc: stable@vger.kernel.org
      Fixes: 9ae326a6 (CacheFiles: A cache that backs onto a mounted filesystem)
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      169b8033
  3. 15 10月, 2018 1 次提交
    • D
      afs: Fix clearance of reply · f0a7d188
      David Howells 提交于
      The recent patch to fix the afs_server struct leak didn't actually fix the
      bug, but rather fixed some of the symptoms.  The problem is that an
      asynchronous call that holds a resource pointed to by call->reply[0] will
      find the pointer cleared in the call destructor, thereby preventing the
      resource from being cleaned up.
      
      In the case of the server record leak, the afs_fs_get_capabilities()
      function in devel code sets up a call with reply[0] pointing at the server
      record that should be altered when the result is obtained, but this was
      being cleared before the destructor was called, so the put in the
      destructor does nothing and the record is leaked.
      
      Commit f014ffb0 removed the additional ref obtained by
      afs_install_server(), but the removal of this ref is actually used by the
      garbage collector to mark a server record as being defunct after the record
      has expired through lack of use.
      
      The offending clearance of call->reply[0] upon completion in
      afs_process_async_call() has been there from the origin of the code, but
      none of the asynchronous calls actually use that pointer currently, so it
      should be safe to remove (note that synchronous calls don't involve this
      function).
      
      Fix this by the following means:
      
       (1) Revert commit f014ffb0.
      
       (2) Remove the clearance of reply[0] from afs_process_async_call().
      
      Without this, afs_manage_servers() will suffer an assertion failure if it
      sees a server record that didn't get used because the usage count is not 1.
      
      Fixes: f014ffb0 ("afs: Fix afs_server struct leak")
      Fixes: 08e0e7c8 ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0a7d188
  4. 13 10月, 2018 3 次提交
  5. 12 10月, 2018 3 次提交
    • D
      afs: Fix afs_server struct leak · f014ffb0
      David Howells 提交于
      Fix a leak of afs_server structs.  The routine that installs them in the
      various lookup lists and trees gets a ref on leaving the function, whether
      it added the server or a server already exists.  It shouldn't increment
      the refcount if it added the server.
      
      The effect of this that "rmmod kafs" will hang waiting for the leaked
      server to become unused.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f014ffb0
    • A
      gfs2: Fix iomap buffered write support for journaled files (2) · fee5150c
      Andreas Gruenbacher 提交于
      It turns out that the fix in commit 6636c3cc56 is bad; the assertion
      that the iomap code no longer creates buffer heads is incorrect for
      filesystems that set the IOMAP_F_BUFFER_HEAD flag.
      
      Instead, what's happening is that gfs2_iomap_begin_write treats all
      files that have the jdata flag set as journaled files, which is
      incorrect as long as those files are inline ("stuffed").  We're handling
      stuffed files directly via the page cache, which is why we ended up with
      pages without buffer heads in gfs2_page_add_databufs.
      
      Fix this by handling stuffed journaled files correctly in
      gfs2_iomap_begin_write.
      
      This reverts commit 6636c3cc5690c11631e6366cf9a28fb99c8b25bb.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      fee5150c
    • D
      afs: Fix cell proc list · 6b3944e4
      David Howells 提交于
      Access to the list of cells by /proc/net/afs/cells has a couple of
      problems:
      
       (1) It should be checking against SEQ_START_TOKEN for the keying the
           header line.
      
       (2) It's only holding the RCU read lock, so it can't just walk over the
           list without following the proper RCU methods.
      
      Fix these by using an hlist instead of an ordinary list and using the
      appropriate accessor functions to follow it with RCU.
      
      Since the code that adds a cell to the list must also necessarily change,
      sort the list on insertion whilst we're at it.
      
      Fixes: 989782dc ("afs: Overhaul cell database management")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b3944e4
  6. 10 10月, 2018 1 次提交
  7. 09 10月, 2018 1 次提交
    • D
      filesystem-dax: Fix dax_layout_busy_page() livelock · d7782145
      Dan Williams 提交于
      In the presence of multi-order entries the typical
      pagevec_lookup_entries() pattern may loop forever:
      
      	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
      				min(end - index, (pgoff_t)PAGEVEC_SIZE),
      				indices)) {
      		...
      		for (i = 0; i < pagevec_count(&pvec); i++) {
      			index = indices[i];
      			...
      		}
      		index++; /* BUG */
      	}
      
      The loop updates 'index' for each index found and then increments to the
      next possible page to continue the lookup. However, if the last entry in
      the pagevec is multi-order then the next possible page index is more
      than 1 page away. Fix this locally for the filesystem-dax case by
      checking for dax-multi-order entries. Going forward new users of
      multi-order entries need to be similarly careful, or we need a generic
      way to report the page increment in the radix iterator.
      
      Fixes: 5fac7408 ("mm, fs, dax: handle layout changes to pinned dax...")
      Cc: <stable@vger.kernel.org>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d7782145
  8. 06 10月, 2018 5 次提交
    • D
      xfs: fix data corruption w/ unaligned reflink ranges · b3998900
      Dave Chinner 提交于
      When reflinking sub-file ranges, a data corruption can occur when
      the source file range includes a partial EOF block. This shares the
      unknown data beyond EOF into the second file at a position inside
      EOF, exposing stale data in the second file.
      
      XFS only supports whole block sharing, but we still need to
      support whole file reflink correctly.  Hence if the reflink
      request includes the last block of the souce file, only proceed with
      the reflink operation if it lands at or past the destination file's
      current EOF. If it lands within the destination file EOF, reject the
      entire request with -EINVAL and make the caller go the hard way.
      
      This avoids the data corruption vector, but also avoids disruption
      of returning EINVAL to userspace for the common case of whole file
      cloning.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b3998900
    • D
      xfs: fix data corruption w/ unaligned dedupe ranges · dceeb47b
      Dave Chinner 提交于
      A deduplication data corruption is Exposed by fstests generic/505 on
      XFS. It is caused by extending the block match range to include the
      partial EOF block, but then allowing unknown data beyond EOF to be
      considered a "match" to data in the destination file because the
      comparison is only made to the end of the source file. This corrupts
      the destination file when the source extent is shared with it.
      
      XFS only supports whole block dedupe, but we still need to appear to
      support whole file dedupe correctly.  Hence if the dedupe request
      includes the last block of the souce file, don't include it in the
      actual XFS dedupe operation. If the rest of the range dedupes
      successfully, then report the partial last block as deduped, too, so
      that userspace sees it as a successful dedupe rather than return
      EINVAL because we can't dedupe unaligned blocks.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      dceeb47b
    • A
      ocfs2: fix locking for res->tracking and dlm->tracking_list · cbe355f5
      Ashish Samant 提交于
      In dlm_init_lockres() we access and modify res->tracking and
      dlm->tracking_list without holding dlm->track_lock.  This can cause list
      corruptions and can end up in kernel panic.
      
      Fix this by locking res->tracking and dlm->tracking_list with
      dlm->track_lock instead of dlm->spinlock.
      
      Link: http://lkml.kernel.org/r/1529951192-4686-1-git-send-email-ashish.samant@oracle.comSigned-off-by: NAshish Samant <ashish.samant@oracle.com>
      Reviewed-by: NChangwei Ge <ge.changwei@h3c.com>
      Acked-by: NJoseph Qi <jiangqi903@gmail.com>
      Acked-by: NJun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbe355f5
    • J
      proc: restrict kernel stack dumps to root · f8a00cef
      Jann Horn 提交于
      Currently, you can use /proc/self/task/*/stack to cause a stack walk on
      a task you control while it is running on another CPU.  That means that
      the stack can change under the stack walker.  The stack walker does
      have guards against going completely off the rails and into random
      kernel memory, but it can interpret random data from your kernel stack
      as instruction pointers and stack pointers.  This can cause exposure of
      kernel stack contents to userspace.
      
      Restrict the ability to inspect kernel stacks of arbitrary tasks to root
      in order to prevent a local attacker from exploiting racy stack unwinding
      to leak kernel task stack contents.  See the added comment for a longer
      rationale.
      
      There don't seem to be any users of this userspace API that can't
      gracefully bail out if reading from the file fails.  Therefore, I believe
      that this change is unlikely to break things.  In the case that this patch
      does end up needing a revert, the next-best solution might be to fake a
      single-entry stack based on wchan.
      
      Link: http://lkml.kernel.org/r/20180927153316.200286-1-jannh@google.com
      Fixes: 2ec220e2 ("proc: add /proc/*/stack")
      Signed-off-by: NJann Horn <jannh@google.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Ken Chen <kenchen@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f8a00cef
    • L
      ocfs2: fix crash in ocfs2_duplicate_clusters_by_page() · 69eb7765
      Larry Chen 提交于
      ocfs2_duplicate_clusters_by_page() may crash if one of the extent's pages
      is dirty.  When a page has not been written back, it is still in dirty
      state.  If ocfs2_duplicate_clusters_by_page() is called against the dirty
      page, the crash happens.
      
      To fix this bug, we can just unlock the page and wait until the page until
      its not dirty.
      
      The following is the backtrace:
      
      kernel BUG at /root/code/ocfs2/refcounttree.c:2961!
      [exception RIP: ocfs2_duplicate_clusters_by_page+822]
      __ocfs2_move_extent+0x80/0x450 [ocfs2]
      ? __ocfs2_claim_clusters+0x130/0x250 [ocfs2]
      ocfs2_defrag_extent+0x5b8/0x5e0 [ocfs2]
      __ocfs2_move_extents_range+0x2a4/0x470 [ocfs2]
      ocfs2_move_extents+0x180/0x3b0 [ocfs2]
      ? ocfs2_wait_for_recovery+0x13/0x70 [ocfs2]
      ocfs2_ioctl_move_extents+0x133/0x2d0 [ocfs2]
      ocfs2_ioctl+0x253/0x640 [ocfs2]
      do_vfs_ioctl+0x90/0x5f0
      SyS_ioctl+0x74/0x80
      do_syscall_64+0x74/0x140
      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Once we find the page is dirty, we do not wait until it's clean, rather we
      use write_one_page() to write it back
      
      Link: http://lkml.kernel.org/r/20180829074740.9438-1-lchen@suse.com
      [lchen@suse.com: update comments]
        Link: http://lkml.kernel.org/r/20180830075041.14879-1-lchen@suse.com
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NLarry Chen <lchen@suse.com>
      Acked-by: NChangwei Ge <ge.changwei@h3c.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69eb7765
  9. 05 10月, 2018 3 次提交
  10. 04 10月, 2018 2 次提交
    • M
      ovl: fix format of setxattr debug · 1a8f8d2a
      Miklos Szeredi 提交于
      Format has a typo: it was meant to be "%.*s", not "%*s".  But at some point
      callers grew nonprintable values as well, so use "%*pE" instead with a
      maximized length.
      Reported-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 3a1e819b ("ovl: store file handle of lower inode on copy up")
      Cc: <stable@vger.kernel.org> # v4.12
      1a8f8d2a
    • A
      ovl: fix access beyond unterminated strings · 601350ff
      Amir Goldstein 提交于
      KASAN detected slab-out-of-bounds access in printk from overlayfs,
      because string format used %*s instead of %.*s.
      
      > BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604
      > Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811
      >
      > CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36
      ...
      >  printk+0xa7/0xcf kernel/printk/printk.c:1996
      >  ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689
      
      Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Fixes: 359f392c ("ovl: lookup index entry for copy up origin")
      Cc: <stable@vger.kernel.org> # v4.13
      601350ff
  11. 03 10月, 2018 4 次提交
    • S
      smb3: fix lease break problem introduced by compounding · 7af929d6
      Steve French 提交于
      Fixes problem (discovered by Aurelien) introduced by recent commit:
      commit b24df3e3
      ("cifs: update receive_encrypted_standard to handle compounded responses")
      
      which broke the ability to respond to some lease breaks
      (lease breaks being ignored is a problem since can block
      server response for duration of the lease break timeout).
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      7af929d6
    • R
      cifs: only wake the thread for the very last PDU in a compound · 4e34feb5
      Ronnie Sahlberg 提交于
      For compounded PDUs we whould only wake the waiting thread for the
      very last PDU of the compound.
      We do this so that we are guaranteed that the demultiplex_thread will
      not process or access any of those MIDs any more once the send/recv
      thread starts processing.
      
      Else there is a race where at the end of the send/recv processing we
      will try to delete all the mids of the compound. If the multiplex
      thread still has other mids to process at this point for this compound
      this can lead to an oops.
      
      Needed to fix recent commit:
      commit 730928c8
      ("cifs: update smb2_queryfs() to use compounding")
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      4e34feb5
    • R
      cifs: add a warning if we try to to dequeue a deleted mid · ddf83afb
      Ronnie Sahlberg 提交于
      cifs_delete_mid() is called once we are finished handling a mid and we
      expect no more work done on this mid.
      
      Needed to fix recent commit:
      commit 730928c8
      ("cifs: update smb2_queryfs() to use compounding")
      
      Add a warning if someone tries to dequeue a mid that has already been
      flagged to be deleted.
      Also change list_del() to list_del_init() so that if we have similar bugs
      resurface in the future we will not oops.
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      ddf83afb
    • A
      smb2: fix missing files in root share directory listing · 0595751f
      Aurelien Aptel 提交于
      When mounting a Windows share that is the root of a drive (eg. C$)
      the server does not return . and .. directory entries. This results in
      the smb2 code path erroneously skipping the 2 first entries.
      
      Pseudo-code of the readdir() code path:
      
      cifs_readdir(struct file, struct dir_context)
          initiate_cifs_search            <-- if no reponse cached yet
              server->ops->query_dir_first
      
          dir_emit_dots
              dir_emit                    <-- adds "." and ".." if we're at pos=0
      
          find_cifs_entry
              initiate_cifs_search        <-- if pos < start of current response
                                               (restart search)
              server->ops->query_dir_next <-- if pos > end of current response
                                               (fetch next search res)
      
          for(...)                        <-- loops over cur response entries
                                                starting at pos
              cifs_filldir                <-- skip . and .., emit entry
                  cifs_fill_dirent
                  dir_emit
      	pos++
      
      A) dir_emit_dots() always adds . & ..
         and sets the current dir pos to 2 (0 and 1 are done).
      
      Therefore we always want the index_to_find to be 2 regardless of if
      the response has . and ..
      
      B) smb1 code initializes index_of_last_entry with a +2 offset
      
        in cifssmb.c CIFSFindFirst():
      		psrch_inf->index_of_last_entry = 2 /* skip . and .. */ +
      			psrch_inf->entries_in_buffer;
      
      Later in find_cifs_entry() we want to find the next dir entry at pos=2
      as a result of (A)
      
      	first_entry_in_buffer = cfile->srch_inf.index_of_last_entry -
      					cfile->srch_inf.entries_in_buffer;
      
      This var is the dir pos that the first entry in the buffer will
      have therefore it must be 2 in the first call.
      
      If we don't offset index_of_last_entry by 2 (like in (B)),
      first_entry_in_buffer=0 but we were instructed to get pos=2 so this
      code in find_cifs_entry() skips the 2 first which is ok for non-root
      shares, as it skips . and .. from the response but is not ok for root
      shares where the 2 first are actual files
      
      		pos_in_buf = index_to_find - first_entry_in_buffer;
                      // pos_in_buf=2
      		// we skip 2 first response entries :(
      		for (i = 0; (i < (pos_in_buf)) && (cur_ent != NULL); i++) {
      			/* go entry by entry figuring out which is first */
      			cur_ent = nxt_dir_entry(cur_ent, end_of_smb,
      						cfile->srch_inf.info_level);
      		}
      
      C) cifs_filldir() skips . and .. so we can safely ignore them for now.
      
      Sample program:
      
      int main(int argc, char **argv)
      {
      	const char *path = argc >= 2 ? argv[1] : ".";
      	DIR *dh;
      	struct dirent *de;
      
      	printf("listing path <%s>\n", path);
      	dh = opendir(path);
      	if (!dh) {
      		printf("opendir error %d\n", errno);
      		return 1;
      	}
      
      	while (1) {
      		de = readdir(dh);
      		if (!de) {
      			if (errno) {
      				printf("readdir error %d\n", errno);
      				return 1;
      			}
      			printf("end of listing\n");
      			break;
      		}
      		printf("off=%lu <%s>\n", de->d_off, de->d_name);
      	}
      
      	return 0;
      }
      
      Before the fix with SMB1 on root shares:
      
      <.>            off=1
      <..>           off=2
      <$Recycle.Bin> off=3
      <bootmgr>      off=4
      
      and on non-root shares:
      
      <.>    off=1
      <..>   off=4  <-- after adding .., the offsets jumps to +2 because
      <2536> off=5       we skipped . and .. from response buffer (C)
      <411>  off=6       but still incremented pos
      <file> off=7
      <fsx>  off=8
      
      Therefore the fix for smb2 is to mimic smb1 behaviour and offset the
      index_of_last_entry by 2.
      
      Test results comparing smb1 and smb2 before/after the fix on root
      share, non-root shares and on large directories (ie. multi-response
      dir listing):
      
      PRE FIX
      =======
      pre-1-root VS pre-2-root:
              ERR pre-2-root is missing [bootmgr, $Recycle.Bin]
      pre-1-nonroot VS pre-2-nonroot:
              OK~ same files, same order, different offsets
      pre-1-nonroot-large VS pre-2-nonroot-large:
              OK~ same files, same order, different offsets
      
      POST FIX
      ========
      post-1-root VS post-2-root:
              OK same files, same order, same offsets
      post-1-nonroot VS post-2-nonroot:
              OK same files, same order, same offsets
      post-1-nonroot-large VS post-2-nonroot-large:
              OK same files, same order, same offsets
      
      REGRESSION?
      ===========
      pre-1-root VS post-1-root:
              OK same files, same order, same offsets
      pre-1-nonroot VS post-1-nonroot:
              OK same files, same order, same offsets
      
      BugLink: https://bugzilla.samba.org/show_bug.cgi?id=13107Signed-off-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NPaulo Alcantara <palcantara@suse.deR>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      0595751f
  12. 01 10月, 2018 2 次提交
    • D
      xfs: fix error handling in xfs_bmap_extents_to_btree · e55ec4dd
      Dave Chinner 提交于
      Commit 01239d77 ("xfs: fix a null pointer dereference in
      xfs_bmap_extents_to_btree") attempted to fix a null pointer
      dreference when a fuzzing corruption of some kind was found.
      This fix was flawed, resulting in assert failures like:
      
      XFS: Assertion failed: ifp->if_broot == NULL, file: fs/xfs/libxfs/xfs_bmap.c, line: 715
      .....
      Call Trace:
        xfs_bmap_extents_to_btree+0x6b9/0x7b0
        __xfs_bunmapi+0xae7/0xf00
        ? xfs_log_reserve+0x1c8/0x290
        xfs_reflink_remap_extent+0x20b/0x620
        xfs_reflink_remap_blocks+0x7e/0x290
        xfs_reflink_remap_range+0x311/0x530
        vfs_dedupe_file_range_one+0xd7/0xe0
        vfs_dedupe_file_range+0x15b/0x1a0
        do_vfs_ioctl+0x267/0x6c0
      
      The problem is that the error handling code now asserts that the
      inode fork is not in btree format before the error handling code
      undoes the modifications that put the fork back in extent format.
      Fix this by moving the assert back to after the xfs_iroot_realloc()
      call that returns the fork to extent format, and clean up the jump
      labels to be meaningful.
      
      Also, returning ENOSPC when xfs_btree_get_bufl() fails to
      instantiate the buffer that was allocated (the actual fix in the
      commit mentioned above) is incorrect. This is a fatal error - only
      an invalid block address or a filesystem shutdown can result in
      failing to get a buffer here.
      
      Hence change this to EFSCORRUPTED so that the higher layer knows
      this was a corruption related failure and should not treat it as an
      ENOSPC error.  This should result in a shutdown (via cancelling a
      dirty transaction) which is necessary as we do not attempt to clean
      up the (invalid) block that we have already allocated.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      e55ec4dd
    • K
      pstore/ram: Fix failure-path memory leak in ramoops_init · bac6f6cd
      Kees Cook 提交于
      As reported by nixiaoming, with some minor clarifications:
      
      1) memory leak in ramoops_register_dummy():
         dummy_data = kzalloc(sizeof(*dummy_data), GFP_KERNEL);
         but no kfree() if platform_device_register_data() fails.
      
      2) memory leak in ramoops_init():
         Missing platform_device_unregister(dummy) and kfree(dummy_data)
         if platform_driver_register(&ramoops_driver) fails.
      
      I've clarified the purpose of ramoops_register_dummy(), and added a
      common cleanup routine for all three failure paths to call.
      Reported-by: Nnixiaoming <nixiaoming@huawei.com>
      Cc: stable@vger.kernel.org
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      bac6f6cd
  13. 29 9月, 2018 4 次提交
    • B
      iomap: set page dirty after partial delalloc on mkwrite · 561295a3
      Brian Foster 提交于
      The iomap page fault mechanism currently dirties the associated page
      after the full block range of the page has been allocated. This
      leaves the page susceptible to delayed allocations without ever
      being set dirty on sub-page block sized filesystems.
      
      For example, consider a page fault on a page with one preexisting
      real (non-delalloc) block allocated in the middle of the page. The
      first iomap_apply() iteration performs delayed allocation on the
      range up to the preexisting block, the next iteration finds the
      preexisting block, and the last iteration attempts to perform
      delayed allocation on the range after the prexisting block to the
      end of the page. If the first allocation succeeds and the final
      allocation fails with -ENOSPC, iomap_apply() returns the error and
      iomap_page_mkwrite() fails to dirty the page having already
      performed partial delayed allocation. This eventually results in the
      page being invalidated without ever converting the delayed
      allocation to real blocks.
      
      This problem is reliably reproduced by generic/083 on XFS on ppc64
      systems (64k page size, 4k block size). It results in leaked
      delalloc blocks on inode reclaim, which triggers an assert failure
      in xfs_fs_destroy_inode() and filesystem accounting inconsistency.
      
      Move the set_page_dirty() call from iomap_page_mkwrite() to the
      actor callback, similar to how the buffer head implementation works.
      The actor callback is called iff ->iomap_begin() returns success, so
      ensures the page is dirtied as soon as possible after an allocation.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      561295a3
    • B
      xfs: remove invalid log recovery first/last cycle check · ec2ed0b5
      Brian Foster 提交于
      One of the first steps of log recovery is to check for the special
      case of a zeroed log. If the first cycle in the log is zero or the
      tail portion of the log is zeroed, the head is set to the first
      instance of cycle 0. xlog_find_zeroed() includes a sanity check that
      enforces that the first cycle in the log must be 1 if the last cycle
      is 0. While this is true in most cases, the check is not totally
      valid because it doesn't consider the case where the filesystem
      crashed after a partial/out of order log buffer completion that
      wraps around the end of the physical log.
      
      For example, consider a filesystem that has completed most of the
      first cycle of the log, reaches the end of the physical log and
      splits the next single log buffer write into two in order to wrap
      around the end of the log. If these I/Os are reordered, the second
      (wrapped) I/O completes and the first happens to fail, the log is
      left in a state where the last cycle of the log is 0 and the first
      cycle is 2. This causes the xlog_find_zeroed() sanity check to fail
      and prevents the filesystem from mounting. This situation has been
      reproduced on particular systems via repeated runs of generic/475.
      
      This is an expected state that log recovery already knows how to
      deal with, however. Since the log is still partially zeroed, the
      head is detected correctly and points to a valid tail. The
      subsequent stale block detection clears blocks beyond the head up to
      the tail (within a maximum range), with the express purpose of
      clearing such out of order writes. As expected, this removes the out
      of order cycle 2 blocks at the physical start of the log.
      
      In other words, the only thing that prevents a clean mount and
      recovery of the filesystem in this scenario is the specific (last ==
      0 && first != 1) sanity check in xlog_find_zeroed(). Since the log
      head/tail are now independently validated via cycle, log record and
      CRC checks, this highly specific first cycle check is of dubious
      value. Remove it and rely on the higher level validation to
      determine whether log content is sane and recoverable.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      ec2ed0b5
    • E
      xfs: validate inode di_forkoff · 339e1a3f
      Eric Sandeen 提交于
      Verify the inode di_forkoff, lifted from xfs_repair's
      process_check_inode_forkoff().
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      339e1a3f
    • C
      xfs: skip delalloc COW blocks in xfs_reflink_end_cow · f5f3f959
      Christoph Hellwig 提交于
      The iomap direct I/O code issues a single ->end_io call for the whole
      I/O request, and if some of the extents cowered needed a COW operation
      it will call xfs_reflink_end_cow over the whole range.
      
      When we do AIO writes we drop the iolock after doing the initial setup,
      but before the I/O completion.  Between dropping the lock and completing
      the I/O we can have a racing buffered write create new delalloc COW fork
      extents in the region covered by the outstanding direct I/O write, and
      thus see delalloc COW fork extents in xfs_reflink_end_cow.  As
      concurrent writes are fundamentally racy and no guarantees are given we
      can simply skip those.
      
      This can be easily reproduced with xfstests generic/208 in always_cow
      mode.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      f5f3f959