1. 19 10月, 2017 4 次提交
  2. 13 10月, 2017 2 次提交
  3. 12 10月, 2017 3 次提交
    • R
      ext4: add sanity check for encryption + DAX · 7d3e06a8
      Ross Zwisler 提交于
      We prevent DAX from being used on inodes which are using ext4's built in
      encryption via a check in ext4_set_inode_flags().  We do have what appears
      to be an unsafe transition of S_DAX in ext4_set_context(), though, where
      S_DAX can get disabled without us doing a proper writeback + invalidate.
      
      There are also issues with mm-level races when changing the value of S_DAX,
      as well as issues with the VM_MIXEDMAP flag:
      
      https://www.spinics.net/lists/linux-xfs/msg09859.html
      
      I actually think we are safe in this case because of the following:
      
      1) You can't encrypt an existing file.  Encryption can only be set on an
      empty directory, with new inodes in that directory being created with
      encryption turned on, so I don't think it's possible to turn encryption on
      for a file that has open DAX mmaps or outstanding I/Os.
      
      2) There is no way to turn encryption off on a given file.  Once an inode
      is encrypted, it stays encrypted for the life of that inode, so we don't
      have to worry about the case where we turn encryption off and S_DAX
      suddenly turns on.
      
      3) The only way we end up in ext4_set_context() to turn on encryption is
      when we are creating a new file in the encrypted directory.  This happens
      as part of ext4_create() before the inode has been allowed to do any I/O.
      Here's the call tree:
      
       ext4_create()
         __ext4_new_inode()
      	 ext4_set_inode_flags() // sets S_DAX
      	 fscrypt_inherit_context()
      		fscrypt_get_encryption_info();
      		ext4_set_context() // sets EXT4_INODE_ENCRYPT, clears S_DAX
      
      So, I actually think it's safe to transition S_DAX in ext4_set_context()
      without any locking, writebacks or invalidations.  I've added a
      WARN_ON_ONCE() sanity check to make sure that we are notified if we ever
      encounter a case where we are encrypting an inode that already has data,
      in which case we need to add code to safely transition S_DAX.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      7d3e06a8
    • R
      ext4: prevent data corruption with journaling + DAX · e9072d85
      Ross Zwisler 提交于
      The current code has the potential for data corruption when changing an
      inode's journaling mode, as that can result in a subsequent unsafe change
      in S_DAX.
      
      I've captured an instance of this data corruption in the following fstest:
      
      https://patchwork.kernel.org/patch/9948377/
      
      Prevent this data corruption from happening by disallowing changes to the
      journaling mode if the '-o dax' mount option was used.  This means that for
      a given filesystem we could have a mix of inodes using either DAX or
      data journaling, but whatever state the inodes are in will be held for the
      duration of the mount.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      e9072d85
    • R
      ext4: prevent data corruption with inline data + DAX · 559db4c6
      Ross Zwisler 提交于
      If an inode has inline data it is currently prevented from using DAX by a
      check in ext4_set_inode_flags().  When the inode grows inline data via
      ext4_create_inline_data() or removes its inline data via
      ext4_destroy_inline_data_nolock(), the value of S_DAX can change.
      
      Currently these changes are unsafe because we don't hold off page faults
      and I/O, write back dirty radix tree entries and invalidate all mappings.
      There are also issues with mm-level races when changing the value of S_DAX,
      as well as issues with the VM_MIXEDMAP flag:
      
      https://www.spinics.net/lists/linux-xfs/msg09859.html
      
      The unsafe transition of S_DAX can reliably cause data corruption, as shown
      by the following fstest:
      
      https://patchwork.kernel.org/patch/9948381/
      
      Fix this issue by preventing the DAX mount option from being used on
      filesystems that were created to support inline data.  Inline data is an
      option given to mkfs.ext4.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      CC: stable@vger.kernel.org
      559db4c6
  4. 07 10月, 2017 1 次提交
    • T
      ext4: fix interaction between i_size, fallocate, and delalloc after a crash · 51e3ae81
      Theodore Ts'o 提交于
      If there are pending writes subject to delayed allocation, then i_size
      will show size after the writes have completed, while i_disksize
      contains the value of i_size on the disk (since the writes have not
      been persisted to disk).
      
      If fallocate(2) is called with the FALLOC_FL_KEEP_SIZE flag, either
      with or without the FALLOC_FL_ZERO_RANGE flag set, and the new size
      after the fallocate(2) is between i_size and i_disksize, then after a
      crash, if a journal commit has resulted in the changes made by the
      fallocate() call to be persisted after a crash, but the delayed
      allocation write has not resolved itself, i_size would not be updated,
      and this would cause the following e2fsck complaint:
      
      Inode 12, end of extent exceeds allowed value
      	(logical block 33, physical block 33441, len 7)
      
      This can only take place on a sparse file, where the fallocate(2) call
      is allocating blocks in a range which is before a pending delayed
      allocation write which is extending i_size.  Since this situation is
      quite rare, and the window in which the crash must take place is
      typically < 30 seconds, in practice this condition will rarely happen.
      
      Nevertheless, it can be triggered in testing, and in particular by
      xfstests generic/456.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reported-by: NAmir Goldstein <amir73il@gmail.com>
      Cc: stable@vger.kernel.org
      51e3ae81
  5. 02 10月, 2017 14 次提交
    • T
      ext4: retry allocations conservatively · 68fd9750
      Theodore Ts'o 提交于
      Now that we no longer try to reserve metadata blocks for delayed
      allocations (which tended to overestimate the required number of
      blocks significantly), we really don't need retry allocations when the
      disk is very full as aggressively any more.
      
      The only time when it makes sense to retry an allocation is if we have
      freshly deleted blocks that will only become available after a
      transaction commit.  And if we lose that race, it's not worth it to
      try more than once.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      68fd9750
    • C
      ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA · 545052e9
      Christoph Hellwig 提交于
      Switch to the iomap_seek_hole and iomap_seek_data helpers for
      implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the code that
      isn't needed any more.
      
      Note that with this patch ext4 will now always depend on the iomap code
      instead of only when CONFIG_DAX is enabled, and it requires adding a
      call into the extent status tree for iomap_begin as well to properly
      deal with delalloc extents.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      [More fixes and cleanups by Andreas]
      545052e9
    • A
      ext4: Add iomap support for inline data · 7046ae35
      Andreas Gruenbacher 提交于
      Report inline data as a IOMAP_F_DATA_INLINE mapping.  This allows to use
      iomap_seek_hole and iomap_seek_data in ext4_llseek and makes switching
      to iomap_fiemap in ext4_fiemap easier.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      7046ae35
    • A
      iomap: Add IOMAP_F_DATA_INLINE flag · 9ca250a5
      Andreas Gruenbacher 提交于
      Add a new IOMAP_F_DATA_INLINE flag to indicate that a mapping is in a
      disk area that contains data as well as metadata.  In iomap_fiemap, map
      this flag to FIEMAP_EXTENT_DATA_INLINE.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      9ca250a5
    • A
      iomap: Switch from blkno to disk offset · 19fe5f64
      Andreas Gruenbacher 提交于
      Replace iomap->blkno, the sector number, with iomap->addr, the disk
      offset in bytes.  For invalid disk offsets, use the special value
      IOMAP_NULL_ADDR instead of IOMAP_NULL_BLOCK.
      
      This allows to use iomap for mappings which are not block aligned, such
      as inline data on ext4.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>  # iomap, xfs
      Reviewed-by: NJan Kara <jack@suse.cz>
      19fe5f64
    • L
      Linux 4.14-rc3 · 9e66317d
      Linus Torvalds 提交于
      9e66317d
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 368f8998
      Linus Torvalds 提交于
      Pull x86 fixes from Thomas Gleixner:
       "This contains the following fixes and improvements:
      
         - Avoid dereferencing an unprotected VMA pointer in the fault signal
           generation code
      
         - Fix inline asm call constraints for GCC 4.4
      
         - Use existing register variable to retrieve the stack pointer
           instead of forcing the compiler to create another indirect access
           which results in excessive extra 'mov %rsp, %<dst>' instructions
      
         - Disable branch profiling for the memory encryption code to prevent
           an early boot crash
      
         - Fix a sparse warning caused by casting the __user annotation in
           __get_user_asm_u64() away
      
         - Fix an off by one error in the loop termination of the error patch
           in the x86 sysfs init code
      
         - Add missing CPU IDs to various Intel specific drivers to enable the
           functionality on recent hardware
      
         - More (init) constification in the numachip code"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm: Use register variable to get stack pointer value
        x86/mm: Disable branch profiling in mem_encrypt.c
        x86/asm: Fix inline asm call constraints for GCC 4.4
        perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
        perf/x86/intel/rapl: Add missing CPU IDs
        perf/x86/msr: Add missing CPU IDs
        perf/x86/intel/cstate: Add missing CPU IDs
        x86: Don't cast away the __user in __get_user_asm_u64()
        x86/sysfs: Fix off-by-one error in loop termination
        x86/mm: Fix fault error path using unsafe vma pointer
        x86/numachip: Add const and __initconst to numachip2_clockevent
      368f8998
    • L
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c42ed9f9
      Linus Torvalds 提交于
      Pull timer fixes from Thomas Gleixner:
       "This adds a new timer wheel function which is required for the
        conversion of the timer callback function from the 'unsigned long
        data' argument to 'struct timer_list *timer'. This conversion has two
        benefits:
      
         1) It makes struct timer_list smaller
      
         2) Many callers hand in a pointer to the timer or to the structure
            containing the timer, which happens via type casting both at setup
            and in the callback. This change gets rid of the typecasts.
      
        Once the conversion is complete, which is planned for 4.15, the old
        setup function and the intermediate typecast in the new setup function
        go away along with the data field in struct timer_list.
      
        Merging this now into mainline allows a smooth queueing of the actual
        conversion in the affected maintainer trees without creating
        dependencies"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        um/time: Fixup namespace collision
        timer: Prepare to change timer callback argument type
      c42ed9f9
    • L
      Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 82513545
      Linus Torvalds 提交于
      Pull smp/hotplug fixes from Thomas Gleixner:
       "This addresses the fallout of the new lockdep mechanism which covers
        completions in the CPU hotplug code.
      
        The lockdep splats are false positives, but there is no way to
        annotate that reliably. The solution is to split the completions for
        CPU up and down, which requires some reshuffling of the failure
        rollback handling as well"
      
      * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        smp/hotplug: Hotplug state fail injection
        smp/hotplug: Differentiate the AP completion between up and down
        smp/hotplug: Differentiate the AP-work lockdep class between up and down
        smp/hotplug: Callback vs state-machine consistency
        smp/hotplug: Rewrite AP state machine core
        smp/hotplug: Allow external multi-instance rollback
        smp/hotplug: Add state diagram
      82513545
    • L
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7e103ace
      Linus Torvalds 提交于
      Pull scheduler fixes from Thomas Gleixner:
       "The scheduler pull request comes with the following updates:
      
         - Prevent a divide by zero issue by validating the input value of
           sysctl_sched_time_avg
      
         - Make task state printing consistent all over the place and have
           explicit state characters for IDLE and PARKED so they wont be
           displayed as 'D' state which confuses tools"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/sysctl: Check user input value of sysctl_sched_time_avg
        sched/debug: Add explicit TASK_PARKED printing
        sched/debug: Ignore TASK_IDLE for SysRq-W
        sched/debug: Add explicit TASK_IDLE printing
        sched/tracing: Use common task-state helpers
        sched/tracing: Fix trace_sched_switch task-state printing
        sched/debug: Remove unused variable
        sched/debug: Convert TASK_state to hex
        sched/debug: Implement consistent task-state printing
      7e103ace
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1c6f705b
      Linus Torvalds 提交于
      Pull perf fixes from Thomas Gleixner:
      
       - Prevent a division by zero in the perf aux buffer handling
      
       - Sync kernel headers with perf tool headers
      
       - Fix a build failure in the syscalltbl code
      
       - Make the debug messages of perf report --call-graph work correctly
      
       - Make sure that all required perf files are in the MANIFEST for
         container builds
      
       - Fix the atrr.exclude kernel handling so it respects the
         perf_event_paranoid and the user permissions
      
       - Make perf test on s390x work correctly
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/aux: Only update ->aux_wakeup in non-overwrite mode
        perf test: Fix vmlinux failure on s390x part 2
        perf test: Fix vmlinux failure on s390x
        perf tools: Fix syscalltbl build failure
        perf report: Fix debug messages with --call-graph option
        perf evsel: Fix attr.exclude_kernel setting for default cycles:p
        tools include: Sync kernel ABI headers with tooling headers
        perf tools: Get all of tools/{arch,include}/ in the MANIFEST
      1c6f705b
    • L
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1de47f3c
      Linus Torvalds 提交于
      Pull  locking fixes from Thomas Gleixner:
       "Two fixes for locking:
      
         - Plug a hole the pi_stat->owner serialization which was changed
           recently and failed to fixup two usage sites.
      
         - Prevent reordering of the rwsem_has_spinner() check vs the
           decrement of rwsem count in up_write() which causes a missed
           wakeup"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rwsem-xadd: Fix missed wakeup due to reordering of load
        futex: Fix pi_state->owner serialization
      1de47f3c
    • L
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3d9d62b9
      Linus Torvalds 提交于
      Pull irq fixes from Thomas Gleixner:
      
       - Add a missing NULL pointer check in free_irq()
      
       - Fix a memory leak/memory corruption in the generic irq chip
      
       - Add missing rcu annotations for radix tree access
      
       - Use ffs instead of fls when extracting data from a chip register in
         the MIPS GIC irq driver
      
       - Fix the unmasking of IPI interrupts in the MIPS GIC driver so they
         end up at the target CPU and not at CPU0
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irq/generic-chip: Don't replace domain's name
        irqdomain: Add __rcu annotations to radix tree accessors
        irqchip/mips-gic: Use effective affinity to unmask
        irqchip/mips-gic: Fix shifts to extract register fields
        genirq: Check __free_irq() return value for NULL
      3d9d62b9
    • L
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 156069f8
      Linus Torvalds 提交于
      Pull objtool fixes from Thomas Gleixner:
       "Two small fixes for objtool:
      
         - Support frame pointer setup via 'lea (%rsp), %rbp' which was not
           yet supported and caused build warnings
      
         - Disable unreacahble warnings for GCC4.4 and older to avoid false
           positives caused by the compiler itself"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Support unoptimized frame pointer setup
        objtool: Skip unreachable warnings for GCC 4.4 and older
      156069f8
  6. 01 10月, 2017 2 次提交
  7. 30 9月, 2017 14 次提交