1. 28 9月, 2016 4 次提交
    • D
      fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps · 02027d42
      Deepa Dinamani 提交于
      CURRENT_TIME_SEC is not y2038 safe. current_time() will
      be transitioned to use 64 bit time along with vfs in a
      separate patch.
      There is no plan to transistion CURRENT_TIME_SEC to use
      y2038 safe time interfaces.
      
      current_time() will also be extended to use superblock
      range checking parameters when range checking is introduced.
      
      This works because alloc_super() fills in the the s_time_gran
      in super block to NSEC_PER_SEC.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      02027d42
    • D
      fs: Replace CURRENT_TIME with current_time() for inode timestamps · 078cd827
      Deepa Dinamani 提交于
      CURRENT_TIME macro is not appropriate for filesystems as it
      doesn't use the right granularity for filesystem timestamps.
      Use current_time() instead.
      
      CURRENT_TIME is also not y2038 safe.
      
      This is also in preparation for the patch that transitions
      vfs timestamps to use 64 bit time and hence make them
      y2038 safe. As part of the effort current_time() will be
      extended to do range checks. Hence, it is necessary for all
      file system timestamps to use current_time(). Also,
      current_time() will be transitioned along with vfs to be
      y2038 safe.
      
      Note that whenever a single call to current_time() is used
      to change timestamps in different inodes, it is because they
      share the same time granularity.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NFelipe Balbi <balbi@kernel.org>
      Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
      Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      078cd827
    • D
      fs: proc: Delete inode time initializations in proc_alloc_inode() · 2554c72e
      Deepa Dinamani 提交于
      proc uses new_inode_pseudo() to allocate a new inode.
      This in turn calls the proc_inode_alloc() callback.
      But, at this point, inode is still not initialized
      with the super_block pointer which only happens just
      before alloc_inode() returns after the call to
      inode_init_always().
      
      Also, the inode times are initialized again after the
      call to new_inode_pseudo() in proc_inode_alloc().
      The assignemet in proc_alloc_inode() is redundant and
      also doesn't work after the current_time() api is
      changed to take struct inode* instead of
      struct *super_block.
      
      This bug was reported after current_time() was used to
      assign times in proc_alloc_inode().
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com> [0-day test robot]
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2554c72e
    • D
      vfs: Add current_time() api · 3cd88666
      Deepa Dinamani 提交于
      current_fs_time() is used for inode timestamps.
      
      Change the signature of the function to take inode pointer
      instead of superblock as per Linus's suggestion.
      
      Also, move the api under vfs as per the discussion on the
      thread: https://lkml.org/lkml/2016/6/9/36 . As per Arnd's
      suggestion on the thread, changing the function name.
      
      current_fs_time() will be deleted after all the references
      to it are replaced by current_time().
      
      There was a bug reported by kbuild test bot with the change
      as some of the calls to current_time() were made before the
      super_block was initialized. Catch these accidental assignments
      as timespec_trunc() does for wrong granularities. This allows
      for the function to work right even in these circumstances.
      But, adds a warning to make the user aware of the bug.
      
      A coccinelle script was used to identify all the current
      .alloc_inode super_block callbacks that updated inode timestamps.
      proc filesystem was the only one that was modifying inode times
      as part of this callback. The series includes a patch to fix that.
      
      Note that timespec_trunc() will also be moved to fs/inode.c
      in a separate patch when this will need to be revamped for
      bounds checking purposes.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3cd88666
  2. 10 9月, 2016 4 次提交
    • E
      fscrypto: require write access to mount to set encryption policy · ba63f23d
      Eric Biggers 提交于
      Since setting an encryption policy requires writing metadata to the
      filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
      Otherwise, a user could cause a write to a frozen or readonly
      filesystem.  This was handled correctly by f2fs but not by ext4.  Make
      fscrypt_process_policy() handle it rather than relying on the filesystem
      to get it right.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Acked-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ba63f23d
    • E
      fscrypto: only allow setting encryption policy on directories · 002ced4b
      Eric Biggers 提交于
      The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
      policy on nondirectory files.  This was unintentional, and in the case
      of nonempty regular files did not behave as expected because existing
      data was not actually encrypted by the ioctl.
      
      In the case of ext4, the user could also trigger filesystem errors in
      ->empty_dir(), e.g. due to mismatched "directory" checksums when the
      kernel incorrectly tried to interpret a regular file as a directory.
      
      This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
      kernels v4.6 and later.  It appears that older kernels only permitted
      directories and that the check was accidentally lost during the
      refactoring to share the file encryption code between ext4 and f2fs.
      
      This patch restores the !S_ISDIR() check that was present in older
      kernels.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      002ced4b
    • E
      fscrypto: add authorization check for setting encryption policy · 163ae1c6
      Eric Biggers 提交于
      On an ext4 or f2fs filesystem with file encryption supported, a user
      could set an encryption policy on any empty directory(*) to which they
      had readonly access.  This is obviously problematic, since such a
      directory might be owned by another user and the new encryption policy
      would prevent that other user from creating files in their own directory
      (for example).
      
      Fix this by requiring inode_owner_or_capable() permission to set an
      encryption policy.  This means that either the caller must own the file,
      or the caller must have the capability CAP_FOWNER.
      
      (*) Or also on any regular file, for f2fs v4.6 and later and ext4
          v4.8-rc1 and later; a separate bug fix is coming for that.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      163ae1c6
    • D
      mm: fix show_smap() for zone_device-pmd ranges · ca120cf6
      Dan Williams 提交于
      Attempting to dump /proc/<pid>/smaps for a process with pmd dax mappings
      currently results in the following VM_BUG_ONs:
      
       kernel BUG at mm/huge_memory.c:1105!
       task: ffff88045f16b140 task.stack: ffff88045be14000
       RIP: 0010:[<ffffffff81268f9b>]  [<ffffffff81268f9b>] follow_trans_huge_pmd+0x2cb/0x340
       [..]
       Call Trace:
        [<ffffffff81306030>] smaps_pte_range+0xa0/0x4b0
        [<ffffffff814c2755>] ? vsnprintf+0x255/0x4c0
        [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
        [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
        [<ffffffff81307656>] show_smap+0xa6/0x2b0
      
       kernel BUG at fs/proc/task_mmu.c:585!
       RIP: 0010:[<ffffffff81306469>]  [<ffffffff81306469>] smaps_pte_range+0x499/0x4b0
       Call Trace:
        [<ffffffff814c2795>] ? vsnprintf+0x255/0x4c0
        [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
        [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
        [<ffffffff81307696>] show_smap+0xa6/0x2b0
      
      These locations are sanity checking page flags that must be set for an
      anonymous transparent huge page, but are not set for the zone_device
      pages associated with dax mappings.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ca120cf6
  3. 06 9月, 2016 2 次提交
    • W
      btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress · ce129655
      Wang Xiaoguang 提交于
      In btrfs_async_reclaim_metadata_space(), we use ticket's address to
      determine whether asynchronous metadata reclaim work is making progress.
      
      	ticket = list_first_entry(&space_info->tickets,
      				  struct reserve_ticket, list);
      	if (last_ticket == ticket) {
      		flush_state++;
      	} else {
      		last_ticket = ticket;
      		flush_state = FLUSH_DELAYED_ITEMS_NR;
      		if (commit_cycles)
      			commit_cycles--;
      	}
      
      But indeed it's wrong, we should not rely on local variable's address to
      do this check, because addresses may be same. In my test environment, I
      dd one 168MB file in a 256MB fs, found that for this file, every time
      wait_reserve_ticket() called, local variable ticket's address is same,
      
      For above codes, assume a previous ticket's address is addrA, last_ticket
      is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and
      wake up it, then another ticket is added, but with the same address addrA,
      now last_ticket will be same to current ticket, then current ticket's flush
      work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR,
      which may result in some enospc issues(I have seen this in my test machine).
      Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ce129655
    • C
      Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns · cbd60aa7
      Chris Mason 提交于
      We use a btrfs_log_ctx structure to pass information into the
      tree log commit, and get error values out.  It gets added to a per
      log-transaction list which we walk when things go bad.
      
      Commit d1433deb added an optimization to skip waiting for the log
      commit, but didn't take root_log_ctx out of the list.  This
      patch makes sure we remove things before exiting.
      Signed-off-by: NChris Mason <clm@fb.com>
      Fixes: d1433deb
      cc: stable@vger.kernel.org # 3.15+
      cbd60aa7
  4. 05 9月, 2016 3 次提交
  5. 04 9月, 2016 1 次提交
    • L
      devpts: return NULL pts 'priv' entry for non-devpts nodes · 3e423945
      Linus Torvalds 提交于
      In commit 8ead9dd5 ("devpts: more pty driver interface cleanups") I
      made devpts_get_priv() just return the dentry->fs_data directly.  And
      because I thought it wouldn't happen, I added a warning if you ever saw
      a pts node that wasn't on devpts.
      
      And no, that warning never triggered under any actual real use, but you
      can trigger it by creating nonsensical pts nodes by hand.
      
      So just revert the warning, and make devpts_get_priv() return NULL for
      that case like it used to.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org # 4.6+
      Cc: Eric W Biederman" <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3e423945
  6. 01 9月, 2016 17 次提交
  7. 31 8月, 2016 2 次提交
    • K
      sysfs: correctly handle read offset on PREALLOC attrs · 17d0774f
      Konstantin Khlebnikov 提交于
      Attributes declared with __ATTR_PREALLOC use sysfs_kf_read() which returns
      zero bytes for non-zero offset. This breaks script checkarray in mdadm tool
      in debian where /bin/sh is 'dash' because its builtin 'read' reads only one
      byte at a time. Script gets 'i' instead of 'idle' when reads current action
      from /sys/block/$dev/md/sync_action and as a result does nothing.
      
      This patch adds trivial implementation of partial read: generate whole
      string and move required part into buffer head.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 4ef67a8c ("sysfs/kernfs: make read requests on pre-alloc files use the buffer.")
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950
      Cc: Stable <stable@vger.kernel.org> # v3.19+
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      17d0774f
    • T
      kernfs: don't depend on d_find_any_alias() when generating notifications · df6a58c5
      Tejun Heo 提交于
      kernfs_notify_workfn() sends out file modified events for the
      scheduled kernfs_nodes.  Because the modifications aren't from
      userland, it doesn't have the matching file struct at hand and can't
      use fsnotify_modify().  Instead, it looked up the inode and then used
      d_find_any_alias() to find the dentry and used fsnotify_parent() and
      fsnotify() directly to generate notifications.
      
      The assumption was that the relevant dentries would have been pinned
      if there are listeners, which isn't true as inotify doesn't pin
      dentries at all and watching the parent doesn't pin the child dentries
      even for dnotify.  This led to, for example, inotify watchers not
      getting notifications if the system is under memory pressure and the
      matching dentries got reclaimed.  It can also be triggered through
      /proc/sys/vm/drop_caches or a remount attempt which involves shrinking
      dcache.
      
      fsnotify_parent() only uses the dentry to access the parent inode,
      which kernfs can do easily.  Update kernfs_notify_workfn() so that it
      uses fsnotify() directly for both the parent and target inodes without
      going through d_find_any_alias().  While at it, supply the target file
      name to fsnotify() from kernfs_node->name.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NEvgeny Vereshchagin <evvers@ya.ru>
      Fixes: d911d987 ("kernfs: make kernfs_notify() trigger inotify events too")
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: stable@vger.kernel.org # v3.16+
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df6a58c5
  8. 30 8月, 2016 4 次提交
  9. 29 8月, 2016 3 次提交