1. 11 Apr, 2016 1 commit
  2. 31 Mar, 2016 1 commit
    • posix_acl: Inode acl caching fixes · b8a7a3a6
      Andreas Gruenbacher committed
      When get_acl() is called for an inode whose ACL is not cached yet, the
      get_acl inode operation is called to fetch the ACL from the filesystem.
      The inode operation is responsible for updating the cached acl with
      set_cached_acl().  This is done without locking at the VFS level, so
      another task can call set_cached_acl() or forget_cached_acl() before the
      get_acl inode operation gets to calling set_cached_acl(), and then
      get_acl's call to set_cached_acl() results in caching an outdated ACL.
      
      Prevent this from happening by setting the cached ACL pointer to a
      task-specific sentinel value before calling the get_acl inode operation.
      Move the responsibility for updating the cached ACL from the get_acl
      inode operations to get_acl().  There, only set the cached ACL if the
      sentinel value hasn't changed.
      
      The sentinel values are chosen to have odd values.  Likewise, the value
      of ACL_NOT_CACHED is odd.  In contrast, ACL object pointers always have
      an even value (ACLs are aligned in memory).  This makes it possible to
      distinguish uncached ACL values from ACL objects.
      
      In addition, switch from guarding inode->i_acl and inode->i_default_acl
      updates by the inode->i_lock spinlock to using xchg() and cmpxchg().
      
      Filesystems that do not want ACLs returned from their get_acl inode
      operations to be cached must call forget_cached_acl() to prevent the VFS
      from doing so.
      
      (Patch written by Al Viro and Andreas Gruenbacher.)
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
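
The sentinel scheme described in this commit message can be illustrated in userspace with C11 atomics. This is only a sketch under stated assumptions, not the kernel code: `struct acl`, `acl_slot`, `task_sentinel()`, `cached_get_acl()`, `forget_cached()` and the two demo fetchers are hypothetical names, reference counting and error handling are left out, and `atomic_exchange()` / `atomic_compare_exchange_strong()` stand in for the kernel's xchg() and cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct posix_acl; only its alignment matters. */
struct acl { int count; };

/* An odd value can never equal the address of an aligned object. */
#define ACL_NOT_CACHED ((struct acl *)(uintptr_t)-1)

typedef _Atomic(struct acl *) acl_slot;

int is_uncached(const struct acl *p)
{
	return ((uintptr_t)p & 1) != 0;
}

/* Odd per-task sentinel derived from the address of a per-task object. */
struct acl *task_sentinel(const void *task)
{
	return (struct acl *)((uintptr_t)task | 1);
}

void forget_cached(acl_slot *slot)
{
	atomic_exchange(slot, ACL_NOT_CACHED);
}

struct acl *cached_get_acl(acl_slot *slot, const void *task,
			   struct acl *(*fetch)(void *), void *arg)
{
	struct acl *cur = atomic_load(slot);
	struct acl *sentinel = task_sentinel(task);
	struct acl *expected = ACL_NOT_CACHED;
	struct acl *acl;

	if (!is_uncached(cur))
		return cur;	/* a real ACL (or NULL) is already cached */

	/* Claim the slot; if this fails, the final publish below fails too. */
	atomic_compare_exchange_strong(slot, &expected, sentinel);

	acl = fetch(arg);		/* may race with set/forget on the slot */
	assert(!is_uncached(acl));	/* real objects are aligned, hence even */

	/* Publish only if nobody touched the slot since we claimed it. */
	expected = sentinel;
	atomic_compare_exchange_strong(slot, &expected, acl);
	return acl;
}

/* Demo fetchers: a plain one, and one that simulates a concurrent forget. */
struct acl demo_acl = { 3 };

struct acl *fetch_plain(void *arg)
{
	(void)arg;
	return &demo_acl;
}

struct acl *fetch_racing(void *arg)
{
	forget_cached(arg);	/* arg is the slot being filled */
	return &demo_acl;
}
```

With `fetch_plain` the fetched ACL ends up in the slot. With `fetch_racing` the slot is reset while the fetch is in flight, so the sentinel is gone by the time the result would be published and the possibly stale ACL is returned to the caller but not cached.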
  3. 23 Mar, 2016 1 commit
    • fs/coredump: prevent fsuid=0 dumps into user-controlled directories · 378c6520
      Jann Horn committed
      This commit fixes the following security hole affecting systems where
      all of the following conditions are fulfilled:
      
       - The fs.suid_dumpable sysctl is set to 2.
       - The kernel.core_pattern sysctl's value starts with "/". (Systems
         where kernel.core_pattern starts with "|/" are not affected.)
       - Unprivileged user namespace creation is permitted. (This is
         true on Linux >=3.8, but some distributions disallow it by
         default using a distro patch.)
      
      Under these conditions, if a program executes under secure exec rules,
      causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
      namespace, changes its root directory and crashes, the coredump will be
      written using fsuid=0 and a path derived from kernel.core_pattern - but
      this path is interpreted relative to the root directory of the process,
      allowing the attacker to control where a coredump will be written with
      root privileges.
      
      To fix the security issue, always interpret core_pattern for dumps that
      are written under SUID_DUMP_ROOT relative to the root directory of init.
      Signed-off-by: Jann Horn <jann@thejh.net>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
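
The three preconditions listed in this commit message can be written down as a small predicate. This is a hedged illustration only: `coredump_config_affected()` is a hypothetical helper, and the caller is assumed to have obtained the fs.suid_dumpable and kernel.core_pattern values and the user-namespace policy by whatever means the system provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when all three conditions from the commit message hold:
 * suid_dumpable is 2, core_pattern starts with "/" (a pipe handler
 * starts with "|" and is therefore excluded), and unprivileged user
 * namespace creation is permitted.
 */
bool coredump_config_affected(int suid_dumpable, const char *core_pattern,
			      bool unpriv_userns_allowed)
{
	assert(core_pattern != NULL);

	return suid_dumpable == 2 &&
	       core_pattern[0] == '/' &&
	       unpriv_userns_allowed;
}
```

A pattern such as "|/some/handler %P" fails the second condition because its first character is "|", which matches the note in the message that pipe patterns are not affected.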
  4. 18 Mar, 2016 1 commit
    • fs crypto: move per-file encryption from f2fs tree to fs/crypto · 0b81d077
      Jaegeuk Kim committed
      This patch adds the renamed functions moved from the f2fs crypto files.
      
      1. definitions for per-file encryption used by ext4 and f2fs.
      
      2. crypto.c for encrypt/decrypt functions
       a. IO preparation:
        - fscrypt_get_ctx / fscrypt_release_ctx
       b. before IOs:
        - fscrypt_encrypt_page
        - fscrypt_decrypt_page
        - fscrypt_zeroout_range
       c. after IOs:
        - fscrypt_decrypt_bio_pages
        - fscrypt_pullback_bio_page
        - fscrypt_restore_control_page
      
      3. policy.c supporting context management.
       a. For ioctls:
        - fscrypt_process_policy
        - fscrypt_get_policy
       b. For context permission
        - fscrypt_has_permitted_context
        - fscrypt_inherit_context
      
      4. keyinfo.c to handle permissions
        - fscrypt_get_encryption_info
        - fscrypt_free_encryption_info
      
      5. fname.c to support filename encryption
       a. general wrapper functions
        - fscrypt_fname_disk_to_usr
        - fscrypt_fname_usr_to_disk
        - fscrypt_setup_filename
        - fscrypt_free_filename
      
       b. specific filename handling functions
        - fscrypt_fname_alloc_buffer
        - fscrypt_fname_free_buffer
      
      6. Makefile and Kconfig
      
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Signed-off-by: Michael Halcrow <mhalcrow@google.com>
      Signed-off-by: Ildar Muslukhov <ildarm@google.com>
      Signed-off-by: Uday Savagaonkar <savagaon@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  5. 14 Mar, 2016 1 commit
  6. 05 Mar, 2016 2 commits
  7. 21 Feb, 2016 6 commits
    • ima: load policy using path · 7429b092
      Dmitry Kasatkin committed
      IMA policies cannot currently be appraised or have their signatures
      vetted, because the only way to load a policy is to write its contents
      directly, as follows:
      
      cat policy-file > <securityfs>/ima/policy
      
      If the kernel is instead given the path to the IMA policy, it can load
      the policy itself and later appraise the file or vet its signature, if
      it has one.  This patch adds support to load the IMA policy with a
      given path as follows:
      
      echo /etc/ima/ima_policy > /sys/kernel/security/ima/policy
      
      Changelog v4+:
      - moved kernel_read_file_from_path() error messages to callers
      v3:
      - moved kernel_read_file_from_path() to a separate patch
      v2:
      - after re-ordering the patches, replace calling integrity_kernel_read()
        to read the file with kernel_read_file_from_path() (Mimi)
      - Patch description re-written by Luis R. Rodriguez
      Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@huawei.com>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
    • kexec: replace call to copy_file_from_fd() with kernel version · b804defe
      Mimi Zohar committed
      Replace copy_file_from_fd() with kernel_read_file_from_fd().
      
      Two new identifiers named READING_KEXEC_IMAGE and READING_KEXEC_INITRAMFS
      are defined for measuring, appraising or auditing the kexec image and
      initramfs.
      
      Changelog v3:
      - return -EBADF, not -ENOEXEC
      - identifier change
      - split patch, moving copy_file_from_fd() to a separate patch
      - split patch, moving IMA changes to a separate patch
      v0:
      - use kstat file size type loff_t, not size_t
      - Calculate the file hash from the in memory buffer - Dave Young
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Acked-by: Dave Young <dyoung@redhat.com>
    • module: replace copy_module_from_fd with kernel version · a1db7420
      Mimi Zohar committed
      Replace copy_module_from_fd() with kernel_read_file_from_fd().
      
      Although none of the upstreamed LSMs define a kernel_module_from_file
      hook, IMA is called, based on policy, to prevent unsigned kernel modules
      from being loaded by the original kernel module syscall and to
      measure/appraise signed kernel modules.
      
      The security function security_kernel_module_from_file() was called prior
      to reading a kernel module.  Preventing unsigned kernel modules from being
      loaded by the original kernel module syscall remains on the pre-read
      kernel_read_file() security hook.  Instead of reading the kernel module
      twice, once for measuring/appraising and again for loading the kernel
      module, the signature validation is moved to the kernel_post_read_file()
      security hook.
      
      This patch removes the security_kernel_module_from_file() hook and security
      call.
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
    • vfs: define kernel_copy_file_from_fd() · b844f0ec
      Mimi Zohar committed
      This patch defines kernel_read_file_from_fd(), a wrapper for the VFS
      common kernel_read_file().
      
      Changelog:
      - Separated from the kernel modules patch
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
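
To make the idea of reading a whole file through a descriptor concrete, here is a userspace analogue. It is a sketch only and is not the kernel's kernel_read_file_from_fd(): the name `read_file_from_fd()`, its parameters and its choice of error codes are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the entire regular file behind fd into a freshly allocated
 * buffer.  Returns 0 on success or a negative errno value.  A max_size
 * of 0 means "no limit".  The caller frees *buf.
 */
int read_file_from_fd(int fd, void **buf, off_t *size, off_t max_size)
{
	struct stat st;
	char *data;
	off_t done = 0;

	assert(buf != NULL && size != NULL);

	if (fstat(fd, &st) < 0)
		return -errno;		/* -EBADF for an invalid descriptor */
	if (!S_ISREG(st.st_mode) || st.st_size <= 0)
		return -EINVAL;
	if (max_size > 0 && st.st_size > max_size)
		return -EFBIG;

	data = malloc((size_t)st.st_size);
	if (data == NULL)
		return -ENOMEM;

	while (done < st.st_size) {
		ssize_t n = pread(fd, data + done,
				  (size_t)(st.st_size - done), done);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0) {
			/* read error, or the file shrank underneath us */
			int err = n < 0 ? errno : EIO;

			free(data);
			return -err;
		}
		done += n;
	}

	*buf = data;
	*size = done;
	return 0;
}
```

Reading once into a single buffer is what lets a later step hash or verify exactly the bytes that will be used, which is the motivation given in the neighbouring commit messages for not reading the file twice.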
    • firmware: replace call to fw_read_file_contents() with kernel version · e40ba6d5
      Mimi Zohar committed
      Replace fw_read_file_contents() with kernel_read_file_from_path().
      
      Although none of the upstreamed LSMs define a kernel_fw_from_file hook,
      IMA is called by the security function to prevent unsigned firmware from
      being loaded and to measure/appraise signed firmware, based on policy.
      
      Instead of reading the firmware twice, once for measuring/appraising the
      firmware and again for reading the firmware contents into memory, the
      kernel_post_read_file() security hook calculates the file hash based on
      the in memory file buffer.  The firmware is read once.
      
      This patch removes the LSM kernel_fw_from_file() hook and security call.
      
      Changelog v4+:
      - revert dropped buf->size assignment - reported by Sergey Senozhatsky
      v3:
      - remove kernel_fw_from_file hook
      - use kernel_read_file_from_path() - requested by Luis
      v2:
      - reordered and squashed firmware patches
      - fix MAX firmware size (Kees Cook)
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
    • vfs: define kernel_read_file_from_path · 09596b94
      Mimi Zohar committed
      This patch defines kernel_read_file_from_path(), a wrapper for the VFS
      common kernel_read_file().
      
      Changelog:
      - revert error msg regression - reported by Sergey Senozhatsky
      - Separated from the IMA patch
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
  8. 19 Feb, 2016 2 commits
  9. 08 Feb, 2016 1 commit
  10. 31 Jan, 2016 2 commits
    • block: revert runtime dax control of the raw block device · 9f4736fe
      Dan Williams committed
      Dynamically enabling DAX requires that the page cache first be flushed
      and invalidated.  This must occur atomically with the change of DAX mode
      otherwise we confuse the fsync/msync tracking and violate data
      durability guarantees.  Eliminate the possibility of DAX-disabled to
      DAX-enabled transitions for now and revisit this for the next cycle.
      
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • fs, block: force direct-I/O for dax-enabled block devices · 65f87ee7
      Dan Williams committed
      Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
      to a block-device special file.  Both regular files and device special
      files can use the common filp->f_mapping->host lookup to determine if
      DAX is enabled.
      
      Otherwise, we confuse the DAX code that does not expect to find live
      data in the page cache:
      
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
          __delete_from_page_cache+0x9f6/0xb60()
          Modules linked in:
          CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
           00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
           ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
           ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
          Call Trace:
           [<     inline     >] __dump_stack lib/dump_stack.c:15
           [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
           [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
           [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
           [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
           [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
           [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
           [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
           [<     inline     >] wp_pfn_shared mm/memory.c:2208
           [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
           [<     inline     >] handle_pte_fault mm/memory.c:3323
           [<     inline     >] __handle_mm_fault mm/memory.c:3417
           [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
           [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
           [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
           [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
           [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
           [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
          arch/x86/entry/entry_64.S:185
          ---[ end trace dae21e0f85f1f98c ]---
      
      Fixes: 5a023cdb ("block: enable dax for raw block devices")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Suggested-by: Matthew Wilcox <willy@linux.intel.com>
      Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  11. 23 Jan, 2016 2 commits
    • dax: support dirty DAX entries in radix tree · f9fe48be
      Ross Zwisler committed
      Add support for tracking dirty DAX entries in the struct address_space
      radix tree.  This tree is already used for dirty page writeback, and it
      already supports the use of exceptional (non struct page*) entries.
      
      In order to properly track dirty DAX pages we will insert new
      exceptional entries into the radix tree that represent dirty DAX PTE or
      PMD pages.  These exceptional entries will also contain the writeback
      addresses for the PTE or PMD faults that we can use at fsync/msync time.
      
      There are currently two types of exceptional entries (shmem and shadow)
      that can be placed into the radix tree, and this adds a third.  We rely
      on the fact that only one type of exceptional entry can be found in a
      given radix tree based on its usage.  This happens for free with DAX vs
      shmem but we explicitly prevent shadow entries from being added to radix
      trees for DAX mappings.
      
      The only shadow entries that would be generated for DAX radix trees
      would be to track zero page mappings that were created for holes.  These
      pages would receive minimal benefit from having shadow entries, and the
      choice to have only one type of exceptional entry in a given radix tree
      makes the logic simpler both in clear_exceptional_entry() and in the
      rest of DAX.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • wrappers for ->i_mutex access · 5955102c
      Al Viro committed
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
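
The wrapper pattern named in this commit message, inode_foo(inode) being mutex_foo(&inode->i_mutex), can be sketched in userspace as follows. Everything here is a toy under stated assumptions: `struct mutex` is a spinning flag standing in for the kernel mutex, `struct inode` carries only the one field, and the `subclass` argument of the nested variant is accepted but ignored because it is a lock-debugging annotation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's struct mutex (illustration only). */
struct mutex { atomic_bool locked; };

static inline bool mutex_is_locked(struct mutex *m)
{
	return atomic_load(&m->locked);
}

static inline bool mutex_trylock(struct mutex *m)
{
	return !atomic_exchange(&m->locked, true);	/* true if acquired */
}

static inline void mutex_lock(struct mutex *m)
{
	while (!mutex_trylock(m))
		;	/* spin; the real mutex sleeps instead */
}

static inline void mutex_lock_nested(struct mutex *m, unsigned int subclass)
{
	(void)subclass;	/* annotation only in this sketch */
	mutex_lock(m);
}

static inline void mutex_unlock(struct mutex *m)
{
	assert(mutex_is_locked(m));	/* unlocking a free mutex is a bug */
	atomic_store(&m->locked, false);
}

/* Hypothetical minimal inode: only the lock is modelled. */
struct inode { struct mutex i_mutex; };

/* The wrappers: callers no longer name ->i_mutex directly. */
static inline void inode_lock(struct inode *inode)
{
	mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct inode *inode)
{
	mutex_unlock(&inode->i_mutex);
}

static inline bool inode_trylock(struct inode *inode)
{
	return mutex_trylock(&inode->i_mutex);
}

static inline bool inode_is_locked(struct inode *inode)
{
	return mutex_is_locked(&inode->i_mutex);
}

static inline void inode_lock_nested(struct inode *inode, unsigned int subclass)
{
	mutex_lock_nested(&inode->i_mutex, subclass);
}
```

Because callers go through the wrappers, the type of the underlying lock can later change (the message announces a switch to an rwsem) by editing only the wrapper bodies.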
  12. 09 Jan, 2016 3 commits
  13. 01 Jan, 2016 1 commit
  14. 31 Dec, 2015 1 commit
  15. 30 Dec, 2015 1 commit
  16. 23 Dec, 2015 1 commit
  17. 09 Dec, 2015 2 commits
    • replace ->follow_link() with new method that could stay in RCU mode · 6b255391
      Al Viro committed
      new method: ->get_link(); replacement of ->follow_link().  The differences
      are:
      	* inode and dentry are passed separately
      	* might be called both in RCU and non-RCU mode;
      	  the former is indicated by passing it a NULL dentry.
      	* when called that way it isn't allowed to block
      	  and should return ERR_PTR(-ECHILD) if it needs to be called
      	  in non-RCU mode.
      
      It's a flagday change - the old method is gone, all in-tree instances
      converted.  Conversion isn't hard; that said, so far very few instances
      do not immediately bail out when called in RCU mode.  That'll change
      in the next commits.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
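
The calling convention described in this commit message (a NULL dentry signals RCU mode, and ERR_PTR(-ECHILD) asks to be called again in non-RCU mode) can be sketched as below. This is a userspace toy, not the kernel interface: the two structs and `toy_get_link()` are hypothetical, the signature is simplified compared with the real method, and the ERR_PTR helpers are re-implemented locally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local re-implementation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(intptr_t error)
{
	return (void *)error;
}

static inline intptr_t PTR_ERR(const void *ptr)
{
	return (intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical minimal inode and dentry for the sketch. */
struct inode {
	const char *cached_link;	/* NULL until the body is in memory */
	const char *on_disk;		/* pretend backing store */
};

struct dentry {
	struct inode *d_inode;
};

/*
 * Simplified ->get_link()-style method: inode and dentry are passed
 * separately, and dentry == NULL means "RCU mode, do not block".
 */
const char *toy_get_link(struct dentry *dentry, struct inode *inode)
{
	assert(inode != NULL);

	if (inode->cached_link != NULL)
		return inode->cached_link;	/* no blocking needed */

	if (dentry == NULL)
		return ERR_PTR(-ECHILD);	/* retry in non-RCU mode */

	/* Non-RCU mode: blocking work (here: a pretend disk read) is fine. */
	inode->cached_link = inode->on_disk;
	return inode->cached_link;
}
```

A caller would first try with a NULL dentry and, on seeing -ECHILD, leave RCU mode and call again with the real dentry. Once the body is cached, later RCU-mode calls succeed without blocking.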
    • don't put symlink bodies in pagecache into highmem · 21fc61c7
      Al Viro committed
      kmap() in page_follow_link_light() needed to go - allowing an
      arbitrary number of kmaps to be held for a long time is a great way to
      deadlock the system.
      
      new helper (inode_nohighmem(inode)) needs to be used for pagecache
      symlink inodes; done for all in-tree cases.  page_follow_link_light()
      instrumented to yell about anything missed.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  18. 08 Dec, 2015 2 commits
    • vfs: pull btrfs clone API to vfs layer · 04b38d60
      Christoph Hellwig committed
      The btrfs clone ioctls are now adopted by other file systems, with NFS
      and CIFS already having support for them, and XFS being under active
      development.  To avoid growth of various slightly incompatible
      implementations, add one to the VFS.  Note that clones are different from
      file copies in several ways:
      
       - they are atomic vs other writers
       - they support whole file clones
       - they support 64-bit length clones
       - they do not allow partial success (aka short writes)
       - clones are expected to be a fast metadata operation
      
      Because of that it would be rather cumbersome to try to piggyback them on
      top of the recent copy_file_range infrastructure.  The converse isn't
      true and the copy_file_range system call could try clone file range as
      a first attempt to copy, something that further patches will enable.
      
      Based on earlier work from Peng Tao.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • locks: new locks_mandatory_area calling convention · acc15575
      Christoph Hellwig committed
      Pass a loff_t end for the last byte instead of the 32-bit count
      parameter to allow full file clones even on 32-bit architectures.
      While we're at it also simplify the read/write selection.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: J. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  19. 07 Dec, 2015 5 commits
  20. 02 Dec, 2015 1 commit
    • vfs: add copy_file_range syscall and vfs helper · 29732938
      Zach Brown committed
      Add a copy_file_range() system call for offloading copies between
      regular files.
      
      This gives an interface to underlying layers of the storage stack which
      can copy without reading and writing all the data.  There are a few
      candidates that should support copy offloading in the nearer term:
      
      - btrfs shares extent references with its clone ioctl
      - NFS has patches to add a COPY command which copies on the server
      - SCSI has a family of XCOPY commands which copy in the device
      
      This system call avoids the complexity of also accelerating the creation
      of the destination file by operating on an existing destination file
      descriptor, not a path.
      
      Currently the high level vfs entry point limits copy offloading to files
      on the same mount and super (and not in the same file).  This can be
      relaxed if we get implementations which can copy between file systems
      safely.
      Signed-off-by: Zach Brown <zab@redhat.com>
      [Anna Schumaker: Change -EINVAL to -EBADF during file verification,
                       Change flags parameter from int to unsigned int,
                       Add function to include/linux/syscalls.h,
                       Check copy len after file open mode,
                       Don't forbid ranges inside the same file,
                       Use rw_verify_area() to verify ranges,
                       Use file_out rather than file_in,
                       Add COPY_FR_REFLINK flag]
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
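
From userspace, the system call added by this commit is typically reached through the glibc `copy_file_range()` wrapper, which operates on two existing file descriptors as the message describes. The following is a minimal sketch under stated assumptions: `copy_fd_range()` and `copy_fallback()` are hypothetical helper names, both descriptors use their current file offsets, and the set of errno values that trigger the read/write fallback is a design choice of this example rather than something the commit prescribes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read/write loop used when the offloaded copy is refused. */
static int copy_fallback(int fd_in, int fd_out, size_t len)
{
	const size_t bufsz = 65536;
	char *buf = malloc(bufsz);
	int ret = 0;

	if (buf == NULL)
		return -ENOMEM;

	while (len > 0 && ret == 0) {
		ssize_t n = read(fd_in, buf, len < bufsz ? len : bufsz);
		ssize_t off = 0;

		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0)
			ret = -errno;
		if (n <= 0)
			break;			/* error or end of input */

		while (off < n) {
			ssize_t w = write(fd_out, buf + off, (size_t)(n - off));

			if (w < 0 && errno == EINTR)
				continue;
			if (w < 0) {
				ret = -errno;
				break;
			}
			off += w;
		}
		len -= (size_t)n;
	}

	free(buf);
	return ret;
}

/* Copy up to len bytes from fd_in to fd_out; 0 or a negative errno. */
int copy_fd_range(int fd_in, int fd_out, size_t len)
{
	assert(fd_in >= 0 && fd_out >= 0);

	while (len > 0) {
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL, len, 0);

		if (n < 0 && errno == EINTR)
			continue;
		if (n < 0 && (errno == ENOSYS || errno == EXDEV ||
			      errno == EINVAL || errno == EOPNOTSUPP))
			return copy_fallback(fd_in, fd_out, len);
		if (n < 0)
			return -errno;
		if (n == 0)
			break;			/* end of input */
		len -= (size_t)n;
	}
	return 0;
}
```

The loop is needed because the call may copy fewer bytes than requested. The fallback covers kernels without the system call and file pairs the kernel declines to copy directly, such as the cross-mount case the message mentions.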
  21. 16 Nov, 2015 1 commit
    • locks: Allow disabling mandatory locking at compile time · 9e8925b6
      Jeff Layton committed
      Mandatory locking appears to be almost unused and buggy and there
      appears to be no real interest in doing anything with it.  Since effectively
      no one uses the code and since the code is buggy let's allow it to be
      disabled at compile time.  I would just suggest removing the code but
      undoubtedly that will break some piece of userspace code somewhere.
      
      For the distributions that don't care about this piece of code
      this gives a nice starting point to make mandatory locking go away.
      
      Cc: Benjamin Coddington <bcodding@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jeff Layton <jeff.layton@primarydata.com>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
  22. 11 Nov, 2015 1 commit
  23. 08 Nov, 2015 1 commit