1. 23 Jan, 2016 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Al Viro committed
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
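The naming rule in this commit message (inode_foo(inode) being mutex_foo(&inode->i_mutex)) can be sketched in userspace. This is only an illustration: the toy `struct mutex` built on C11 atomics and the one-field `struct inode` are assumptions standing in for the kernel types, and `inode_lock_nested` is left out because the toy mutex has no lock classes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace mutex; an assumed stand-in for the kernel's struct mutex. */
struct mutex {
    atomic_bool locked;
};

static bool mutex_trylock(struct mutex *m)
{
    /* Succeeds only if the flag was previously clear. */
    return !atomic_exchange(&m->locked, true);
}

static void mutex_lock(struct mutex *m)
{
    while (!mutex_trylock(m))
        ; /* spin; a real mutex would sleep instead */
}

static bool mutex_is_locked(struct mutex *m)
{
    return atomic_load(&m->locked);
}

static void mutex_unlock(struct mutex *m)
{
    assert(mutex_is_locked(m));
    atomic_store(&m->locked, false);
}

/* Assumed minimal inode: only the field the wrappers touch. */
struct inode {
    struct mutex i_mutex;
};

/* The wrappers: callers stop naming ->i_mutex directly. */
static inline void inode_lock(struct inode *inode)
{
    mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct inode *inode)
{
    mutex_unlock(&inode->i_mutex);
}

static inline bool inode_trylock(struct inode *inode)
{
    return mutex_trylock(&inode->i_mutex);
}

static inline bool inode_is_locked(struct inode *inode)
{
    return mutex_is_locked(&inode->i_mutex);
}
```

Because callers only see the `inode_*` names, the lock type behind them can later change (the message mentions an rwsem) without touching every call site.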
  2. 15 Jan, 2016 1 commit
    • kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov committed
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 31 Dec, 2015 1 commit
  4. 09 Dec, 2015 2 commits
    • replace ->follow_link() with new method that could stay in RCU mode · 6b255391
      Al Viro committed
      new method: ->get_link(); replacement of ->follow_link().  The differences
      are:
      	* inode and dentry are passed separately
      	* might be called both in RCU and non-RCU mode;
      	  the former is indicated by passing it a NULL dentry.
      	* when called that way it isn't allowed to block
      	  and should return ERR_PTR(-ECHILD) if it needs to be called
      	  in non-RCU mode.
      
      It's a flagday change - the old method is gone, all in-tree instances
      converted.  Conversion isn't hard; that said, so far very few instances
      do not immediately bail out when called in RCU mode.  That'll change
      in the next commits.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
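A rough userspace sketch of the calling convention described in this commit message: a NULL dentry signals RCU mode, in which the method must not block and returns ERR_PTR(-ECHILD) when it would need to. The struct layouts, `example_get_link`, `slow_read_link`, and the `cached_target` field are hypothetical, and the two-argument signature is a simplification rather than the actual kernel prototype; the `ERR_PTR` helpers only imitate the kernel macros of the same name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Imitations of the kernel's error-pointer helpers (assumption). */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-4095;
}

/* Hypothetical minimal types for the sketch. */
struct dentry {
    int unused;
};

struct inode {
    const char *cached_target; /* NULL when the link body is not in memory */
};

/* Hypothetical stand-in for a fetch that may sleep (disk or network I/O). */
static const char *slow_read_link(struct inode *inode)
{
    assert(inode != NULL);
    return "resolved/target";
}

static const char *example_get_link(struct dentry *dentry, struct inode *inode)
{
    if (inode->cached_target)
        return inode->cached_target; /* safe in either mode: no blocking */

    if (!dentry)
        return ERR_PTR(-ECHILD);     /* RCU mode: ask to be called again in non-RCU mode */

    return slow_read_link(inode);    /* non-RCU mode: blocking is allowed */
}
```

An instance that "immediately bails out" in RCU mode would simply start with the `!dentry` check.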
    • cifs: avoid unused variable and label · 8c36e9df
      Arnd Bergmann committed
      The newly introduced cifs_clone_file_range() function produces
      two harmless compile-time warnings:
      
      cifsfs.c: In function 'cifs_clone_file_range':
      cifsfs.c:963:1: warning: label 'out_unlock' defined but not used [-Wunused-label]
      cifsfs.c:924:20: warning: unused variable 'src_tcon' [-Wunused-variable]
      
      In both cases, removing the extraneous line avoids the warning.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: c6f2a1e2e5f8 ("vfs: pull btrfs clone API to vfs layer")
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  5. 08 Dec, 2015 1 commit
    • vfs: pull btrfs clone API to vfs layer · 04b38d60
      Christoph Hellwig committed
      The btrfs clone ioctls are now adopted by other file systems, with NFS
      and CIFS already having support for them, and XFS being under active
      development.  To avoid growth of various slightly incompatible
      implementations, add one to the VFS.  Note that clones are different from
      file copies in several ways:
      
       - they are atomic vs other writers
       - they support whole file clones
       - they support 64-bit length clones
       - they do not allow partial success (aka short writes)
       - clones are expected to be a fast metadata operation
      
      Because of that it would be rather cumbersome to try to piggyback them on
      top of the recent copy_file_range infrastructure.  The converse isn't
      true and the copy_file_range system call could try clone file range as
      a first attempt to copy, something that further patches will enable.
      
      Based on earlier work from Peng Tao.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 09 Nov, 2015 1 commit
  7. 04 Nov, 2015 1 commit
  8. 03 Nov, 2015 1 commit
  9. 12 Sep, 2015 1 commit
  10. 05 Sep, 2015 1 commit
    • fs: create and use seq_show_option for escaping · a068acf2
      Kees Cook committed
      Many file systems that implement the show_options hook fail to correctly
      escape their output which could lead to unescaped characters (e.g.  new
      lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
      could lead to confusion, spoofed entries (resulting in things like
      systemd issuing false d-bus "mount" notifications), and who knows what
      else.  This looks like it would only be the root user stepping on
      themselves, but it's possible weird things could happen in containers or
      in other situations with delegated mount privileges.
      
      Here's an example using overlay with setuid fusermount trusting the
      contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
      of "sudo" is something more sneaky:
      
        $ BASE="ovl"
        $ MNT="$BASE/mnt"
        $ LOW="$BASE/lower"
        $ UP="$BASE/upper"
        $ WORK="$BASE/work/ 0 0
        none /proc fuse.pwn user_id=1000"
        $ mkdir -p "$LOW" "$UP" "$WORK"
        $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
        $ cat /proc/mounts
        none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
        none /proc fuse.pwn user_id=1000 0 0
        $ fusermount -u /proc
        $ cat /proc/mounts
        cat: /proc/mounts: No such file or directory
      
      This fixes the problem by adding new seq_show_option and
      seq_show_option_n helpers, and updating the vulnerable show_option
      handlers to use them as needed.  Some, like SELinux, need to be open
      coded due to unusual existing escape mechanisms.
      
      [akpm@linux-foundation.org: add lost chunk, per Kees]
      [keescook@chromium.org: seq_show_option should be using const parameters]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Jan Kara <jack@suse.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Cc: J. R. Okajima <hooanon05g@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
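The idea behind the escaping helper can be sketched in userspace C: any byte that could break the one-mount-per-line, space-separated format of /proc/mounts is written as a backslash followed by three octal digits. The helper names `append_escaped` and `show_option`, the buffer-based interface, and the exact escape sets are assumptions made for illustration, not the kernel's seq_file code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in dst (capacity cap),
 * writing every byte found in esc as a backslash plus three octal digits.
 * Returns 0 on success, -1 if dst is too small.
 */
static int append_escaped(char *dst, size_t cap, const char *src, const char *esc)
{
    assert(dst != NULL && src != NULL && esc != NULL);
    size_t len = strlen(dst);

    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        if (strchr(esc, c)) {
            if (len + 4 >= cap)
                return -1;
            len += (size_t)snprintf(dst + len, cap - len, "\\%03o", (unsigned)c);
        } else {
            if (len + 1 >= cap)
                return -1;
            dst[len++] = (char)c;
            dst[len] = '\0';
        }
    }
    return 0;
}

/* Emit ",name" or ",name=value" with separators and whitespace escaped. */
static int show_option(char *dst, size_t cap, const char *name, const char *value)
{
    if (append_escaped(dst, cap, ",", ""))
        return -1;
    if (append_escaped(dst, cap, name, ",= \t\n\\"))
        return -1;
    if (value) {
        if (append_escaped(dst, cap, "=", ""))
            return -1;
        if (append_escaped(dst, cap, value, ", \t\n\\"))
            return -1;
    }
    return 0;
}
```

With this kind of escaping, the embedded newline in the `WORK` path from the example above would be emitted as the four characters `\012`, so it could no longer start a forged second line.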
  11. 11 May, 2015 1 commit
  12. 16 Apr, 2015 1 commit
  13. 12 Apr, 2015 1 commit
  14. 20 Nov, 2014 1 commit
  15. 08 Oct, 2014 1 commit
    • locks: plumb a "priv" pointer into the setlease routines · e6f5c789
      Jeff Layton committed
      In later patches, we're going to add a new lock_manager_operation to
      finish setting up the lease while still holding the i_lock.  To do
      this, we'll need to pass a little bit of info in the fcntl setlease
      case (primarily an fasync structure). Plumb the extra pointer into
      there in advance of that.
      
      We declare this pointer as a void ** to make it clear that this is
      private info, and that the caller isn't required to set this unless
      the lm_setup specifically requires it.
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  16. 18 Aug, 2014 1 commit
  17. 16 Aug, 2014 1 commit
  18. 08 Aug, 2014 1 commit
  19. 20 Jun, 2014 1 commit
  20. 22 May, 2014 3 commits
  21. 07 May, 2014 3 commits
  22. 17 Apr, 2014 1 commit
    • cifs: Wait for writebacks to complete before attempting write. · c11f1df5
      Sachin Prabhu committed
      Problem reported in Red Hat bz 1040329 for strict writes where we cache
      only when we hold oplock and write direct to the server when we don't.
      
      When we receive an oplock break, we first change the oplock value for
      the inode in cifsInodeInfo->oplock to indicate that we no longer hold
      the oplock before we enqueue a task to flush changes to the backing
      device. Once we have completed flushing the changes, we return the
      oplock to the server.
      
      There are two ways in which data corruption can occur here:
      1) While we flush changes to the backing device as part of the oplock
      break, we can have processes write to the file. These writes check for
      the oplock, find none and attempt to write directly to the server.
      These direct writes made while we are flushing from cache could be
      overwritten by data being flushed from the cache causing data
      corruption.
      2) While a thread runs in cifs_strict_writev, the machine could receive
      and process an oplock break after the thread has checked the oplock and
      found that it allows us to cache and before we have made changes to the
      cache. In that case, we end up with a dirty page in cache when we
      shouldn't have any. This will be flushed later and will overwrite all
      subsequent writes to the part of the file represented by this page.
      
      Before making any writes to the server, we need to confirm that we are
      not in the process of flushing data to the server and if we are, we
      should wait until the process is complete before we attempt the write.
      We should also wait for existing writes to complete before we process
      an oplock break request which changes oplock values.
      
      We add a version-specific downgrade_oplock() operation to allow for
      differences in the oplock values set for the different smb versions.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
      Signed-off-by: Steve French <smfrench@gmail.com>
  23. 04 Apr, 2014 2 commits
  24. 02 Apr, 2014 1 commit
  25. 13 Mar, 2014 1 commit
    • fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Theodore Ts'o committed
      Previously, the no-op "mount -o mount /dev/xxx" operation when the
      file system is already mounted read-write causes an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly not
      documented or guaranteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
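The message suggests that, once the call lives in each file system's remount_fs(), most file systems could sync only on a read-write to read-only transition. A minimal sketch of that decision follows; the helper name `remount_needs_sync` is hypothetical, and this shows the possible follow-up optimization rather than what the commit itself changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as the Linux MS_RDONLY mount flag. */
#define MS_RDONLY 1UL

/*
 * Hypothetical helper: true only when a mount that is currently
 * read-write is being remounted read-only.
 */
static bool remount_needs_sync(unsigned long old_flags, unsigned long new_flags)
{
    bool was_rw = (old_flags & MS_RDONLY) == 0;
    bool now_ro = (new_flags & MS_RDONLY) != 0;

    return was_rw && now_ro;
}
```

A pseudo-filesystem or a read-only format such as romfs would never need the sync at all, which is the other case the message calls out.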
  26. 25 Oct, 2013 1 commit
  27. 07 Oct, 2013 1 commit
    • cifs: Fix inability to write files >2GB to SMB2/3 shares · 2f6c9479
      Jan Klos committed
      When connecting to SMB2/3 shares, the maximum file size in the superblock is set to the non-LFS maximum. This happens because the cap_large_files bit differs between SMB1 and SMB2/3, while cifs_read_super() always checks capabilities against the SMB1 bit. (In SMB2/3 the large-files bit is just an internal flag that is not negotiated, and the SMB1 bit value corresponds to the multichannel capability, so LFS may happen to work if the server sends the 0x08 flag.)
      
      The patch fixes this by checking for the correct bit according to the protocol version.
      
      CC: Stable <stable@kernel.org>
      Signed-off-by: Jan Klos <honza.klos@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
  28. 10 Sep, 2013 1 commit
  29. 09 Sep, 2013 2 commits
  30. 01 Aug, 2013 1 commit
    • cifs: set sb->s_d_op before calling d_make_root() · 66ffd113
      Jeff Layton committed
      Currently, the s_root dentry doesn't get its d_op pointer set to
      anything. This breaks lookups in the root of case-insensitive mounts
      since that relies on having d_hash and d_compare routines that know to
      treat the filename as case-insensitive.
      
      cifs.ko has been broken this way for a long time, but commit 1c929cfe
      ("switch cifs"), added a cryptic comment which is removed in the patch
      below, which makes me wonder if this was done deliberately for some
      reason. It's not clear to me why we'd want the s_root not to have d_op
      set properly.
      
      It may have something to do with d_automount or d_revalidate on the
      root, but my suspicion in looking over the code is that Al was just
      trying to preserve the existing behavior when changing this code over to
      use s_d_op.
      
      This patch changes it so that we set s_d_op before calling d_make_root
      and removes the comment. I tested mounting, accessing and unmounting
      several types of shares (including DFS referrals) and everything still
      seemed to work OK afterward. I could be missing something however, so
      please do let me know if I am.
      Reported-by: Jan-Marek Glogowski <glogow@fbihome.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
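Why the ordering matters can be shown with a small mock: if a new dentry takes its operations from the superblock at creation time, a root created before s_d_op is set ends up with none. The struct layouts, `ci_ops`, and `make_root` are simplified assumptions, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down versions of the kernel structures. */
struct dentry_operations {
    int case_insensitive;
};

struct super_block {
    const struct dentry_operations *s_d_op;
};

struct dentry {
    const struct dentry_operations *d_op;
};

/* Example operations table for a case-insensitive mount. */
static const struct dentry_operations ci_ops = { .case_insensitive = 1 };

/* Mock of root creation: the dentry copies whatever s_d_op holds right now. */
static struct dentry make_root(const struct super_block *sb)
{
    assert(sb != NULL);
    struct dentry root = { .d_op = sb->s_d_op };
    return root;
}
```

Calling `make_root()` first and assigning `s_d_op = &ci_ops` afterwards leaves the root with a NULL `d_op`, which matches the broken root lookups on case-insensitive mounts that the message describes.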
  31. 29 Jun, 2013 2 commits
    • locks: protect most of the file_lock handling with i_lock · 1c8c601a
      Jeff Layton committed
      Having a global lock that protects all of this code is a clear
      scalability problem. Instead of doing that, move most of the code to be
      protected by the i_lock instead. The exceptions are the global lists
      that the ->fl_link sits on, and the ->fl_block list.
      
      ->fl_link is what connects these structures to the
      global lists, so we must ensure that we hold those locks when iterating
      over or updating these lists.
      
      Furthermore, sound deadlock detection requires that we hold the
      blocked_list state steady while checking for loops. We also must ensure
      that the search and update to the list are atomic.
      
      For the checking and insertion side of the blocked_list, push the
      acquisition of the global lock into __posix_lock_file and ensure that
      checking and update of the blocked_list is done without dropping the
      lock in between.
      
      On the removal side, when waking up blocked lock waiters, take the
      global lock before walking the blocked list and dequeue the waiters from
      the global list prior to removal from the fl_block list.
      
      With this, deadlock detection should be race free while we minimize
      excessive file_lock_lock thrashing.
      
      Finally, in order to avoid a lock inversion problem when handling
      /proc/locks output we must ensure that manipulations of the fl_block
      list are also protected by the file_lock_lock.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [readdir] convert cifs · be4ccdcc
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  32. 24 Jun, 2013 1 commit