1. 26 9月, 2016 4 次提交
    • B
      xfs: pass current lsn to log recovery buffer validation · 22db9af2
      Brian Foster 提交于
      The current LSN must be available to the buffer validation function to
      provide the ability to update the metadata LSN of the buffer. Pass the
      current_lsn value down to xlog_recover_validate_buf_type() in
      preparation.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      22db9af2
    • B
      xfs: rework log recovery to submit buffers on LSN boundaries · 12818d24
      Brian Foster 提交于
      The fix to log recovery to update the metadata LSN in recovered buffers
      introduces the requirement that a buffer is submitted only once per
      current LSN. Log recovery currently submits buffers on transaction
      boundaries. This is not sufficient as the abstraction between log
      records and transactions allows for various scenarios where multiple
      transactions can share the same current LSN. If independent transactions
      share an LSN and both modify the same buffer, log recovery can
      incorrectly skip updates and leave the filesystem in an inconsisent
      state.
      
      In preparation for proper metadata LSN updates during log recovery,
      update log recovery to submit buffers for write on LSN change boundaries
      rather than transaction boundaries. Explicitly track the current LSN in
      a new struct xlog field to handle the various corner cases of when the
      current LSN may or may not change.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      12818d24
    • D
      xfs: quiesce the filesystem after recovery on readonly mount · ddeb14f4
      Dave Chinner 提交于
      Recently we've had a number of reports where log recovery on a v5
      filesystem has reported corruptions that looked to be caused by
      recovery being re-run over the top of an already-recovered
      metadata. This has uncovered a bug in recovery (fixed elsewhere)
      but the vector that caused this was largely unknown.
      
      A kdump test started tripping over this problem - the system
      would be crashed, the kdump kernel and environment would boot and
      dump the kernel core image, and then the system would reboot. After
      reboot, the root filesystem was triggering log recovery and
      corruptions were being detected. The metadumps indicated the above
      log recovery issue.
      
      What is happening is that the kdump kernel and environment is
      mounting the root device read-only to find the binaries needed to do
      it's work. The result of this is that it is running log recovery.
      However, because there were unlinked files and EFIs to be processed
      by recovery, the completion of phase 1 of log recovery could not
      mark the log clean. And because it's a read-only mount, the unmount
      process does not write records to the log to mark it clean, either.
      Hence on the next mount of the filesystem, log recovery was run
      again across all the metadata that had already been recovered and
      this is what triggered corruption warnings.
      
      To avoid this problem, we need to ensure that a read-only mount
      always updates the log when it completes the second phase of
      recovery. We already handle this sort of issue with rw->ro remount
      transitions, so the solution is as simple as quiescing the
      filesystem at the appropriate time during the mount process. This
      results in the log being marked clean so the mount behaviour
      recorded in the logs on repeated RO mounts will change (i.e. log
      recovery will no longer be run on every mount until a RW mount is
      done). This is a user visible change in behaviour, but it is
      harmless.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ddeb14f4
    • D
      xfs: remote attribute blocks aren't really userdata · 292378ed
      Dave Chinner 提交于
      When adding a new remote attribute, we write the attribute to the
      new extent before the allocation transaction is committed. This
      means we cannot reuse busy extents as that violates crash
      consistency semantics. Hence we currently treat remote attribute
      extent allocation like userdata because it has the same overwrite
      ordering constraints as userdata.
      
      Unfortunately, this also allows the allocator to incorrectly apply
      extent size hints to the remote attribute extent allocation. This
      results in interesting failures, such as transaction block
      reservation overruns and in-memory inode attribute fork corruption.
      
      To fix this, we need to separate the busy extent reuse configuration
      from the userdata configuration. This changes the definition of
      XFS_BMAPI_METADATA slightly - it now means that allocation is
      metadata and reuse of busy extents is acceptible due to the metadata
      ordering semantics of the journal. If this flag is not set, it
      means the allocation is that has unordered data writeback, and hence
      busy extent reuse is not allowed. It no longer implies the
      allocation is for user data, just that the data write will not be
      strictly ordered. This matches the semantics for both user data
      and remote attribute block allocation.
      
      As such, This patch changes the "userdata" field to a "datatype"
      field, and adds a "no busy reuse" flag to the field.
      When we detect an unordered data extent allocation, we immediately set
      the no reuse flag. We then set the "user data" flags based on the
      inode fork we are allocating the extent to. Hence we only set
      userdata flags on data fork allocations now and consider attribute
      fork remote extents to be an unordered metadata extent.
      
      The result is that remote attribute extents now have the expected
      allocation semantics, and the data fork allocation behaviour is
      completely unchanged.
      
      It should be noted that there may be other ways to fix this (e.g.
      use ordered metadata buffers for the remote attribute extent data
      write) but they are more invasive and difficult to validate both
      from a design and implementation POV. Hence this patch takes the
      simple, obvious route to fixing the problem...
      Reported-and-tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      292378ed
  2. 30 8月, 2016 1 次提交
    • D
      xfs: track log done items directly in the deferred pending work item · ea78d808
      Darrick J. Wong 提交于
      Christoph reports slab corruption when a deferred refcount update
      aborts during _defer_finish().  The cause of this was broken log item
      state tracking in xfs_defer_pending -- upon an abort,
      _defer_trans_abort() will call abort_intent on all intent items,
      including the ones that have already had a done item attached.
      
      This is incorrect because each intent item has 2 refcount: the first
      is released when the intent item is committed to the log; and the
      second is released when the _done_ item is committed to the log, or
      by the intent creator if there is no done item.  In other words, once
      we log the done item, responsibility for releasing the intent item's
      second refcount is transferred to the done item and /must not/ be
      performed by anything else.
      
      The dfp_committed flag should have been tracking whether or not we had
      a done item so that _defer_trans_abort could decide if it needs to
      abort the intent item, but due to a thinko this was not the case.  Rip
      it out and track the done item directly so that we do the right thing
      w.r.t. intent item freeing.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reported-by: NChristoph Hellwig <hch@infradead.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ea78d808
  3. 29 8月, 2016 1 次提交
  4. 26 8月, 2016 7 次提交
  5. 17 8月, 2016 13 次提交
  6. 15 8月, 2016 3 次提交
  7. 14 8月, 2016 4 次提交
    • L
      Merge tag 'fixes-for-linus-4.8' of... · 118253a5
      Linus Torvalds 提交于
      Merge tag 'fixes-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull h8300 and unicore32 architecture fixes from Guenter Roeck:
       "Two patches to fix h8300 and unicore32 builds.
      
        unicore32 builds have been broken since v4.6.  The fix has been
        available in -next since March of this year.
      
        h8300 builds have been broken since the last commit window.  The fix
        has been available in -next since June of this year"
      
      * tag 'fixes-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        h8300: Add missing include file to asm/io.h
        unicore32: mm: Add missing parameter to arch_vma_access_permitted
      118253a5
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 120c5475
      Linus Torvalds 提交于
      Pull arm64 fixes from Catalin Marinas:
      
       - support for nr_cpus= command line argument (maxcpus was previously
         changed to allow secondary CPUs to be hot-plugged)
      
       - ARM PMU interrupt handling fix
      
       - fix potential TLB conflict in the hibernate code
      
       - improved handling of EL1 instruction aborts (better error reporting)
      
       - removal of useless jprobes code for stack saving/restoring
      
       - defconfig updates
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: defconfig: enable CONFIG_LOCALVERSION_AUTO
        arm64: defconfig: add options for virtualization and containers
        arm64: hibernate: handle allocation failures
        arm64: hibernate: avoid potential TLB conflict
        arm64: Handle el1 synchronous instruction aborts cleanly
        arm64: Remove stack duplicating code from jprobes
        drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" property
        drivers/perf: arm-pmu: convert arm_pmu_mutex to spinlock
        arm64: Support hard limit of cpu count by nr_cpus
      120c5475
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 329f4152
      Linus Torvalds 提交于
      Pull KVM fixes from Radim Krčmář:
       "KVM:
         - lock kvm_device list to prevent corruption on device creation.
      
        PPC:
         - split debugfs initialization from creation of the xics device to
           unlock the newly taken kvm lock earlier.
      
        s390:
         - prevent userspace from triggering two WARN_ON_ONCE.
      
        MIPS:
         - fix several issues in the management of TLB faults (Cc: stable)"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        MIPS: KVM: Propagate kseg0/mapped tlb fault errors
        MIPS: KVM: Fix gfn range check in kseg0 tlb faults
        MIPS: KVM: Add missing gfn range check
        MIPS: KVM: Fix mapped fault broken commpage handling
        KVM: Protect device ops->create and list_add with kvm->lock
        KVM: PPC: Move xics_debugfs_init out of create
        KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed
        KVM: s390: set the prefix initially properly
      329f4152
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · a1e21033
      Linus Torvalds 提交于
      Pull block fixes from Jens Axboe:
      
       - an NVMe fix from Gabriel, fixing a suspend/resume issue on some
         setups
      
       - addition of a few missing entries in the block queue sysfs
         documentation, from Joe
      
       - a fix for a sparse shadow warning for the bvec iterator, from
         Johannes
      
       - a writeback deadlock involving raid issuing barriers, and not
         flushing the plug when we wakeup the flusher threads.  From
         Konstantin
      
       - a set of patches for the NVMe target/loop/rdma code, from Roland and
         Sagi
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        bvec: avoid variable shadowing warning
        doc: update block/queue-sysfs.txt entries
        nvme: Suspend all queues before deletion
        mm, writeback: flush plugged IO in wakeup_flusher_threads()
        nvme-rdma: Remove unused includes
        nvme-rdma: start async event handler after reconnecting to a controller
        nvmet: Fix controller serial number inconsistency
        nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
        nvmet-rdma: Correctly handle RDMA device hot removal
        nvme-rdma: Make sure to shutdown the controller if we can
        nvme-loop: Remove duplicate call to nvme_remove_namespaces
        nvme-rdma: Free the I/O tags when we delete the controller
        nvme-rdma: Remove duplicate call to nvme_remove_namespaces
        nvme-rdma: Fix device removal handling
        nvme-rdma: Queue ns scanning after a sucessful reconnection
        nvme-rdma: Don't leak uninitialized memory in connect request private data
      a1e21033
  8. 13 8月, 2016 7 次提交
    • G
      h8300: Add missing include file to asm/io.h · 2b05980d
      Guenter Roeck 提交于
      h8300 builds fail with
      
      arch/h8300/include/asm/io.h:9:15: error: unknown type name ‘u8’
      arch/h8300/include/asm/io.h:15:15: error: unknown type name ‘u16’
      arch/h8300/include/asm/io.h:21:15: error: unknown type name ‘u32’
      
      and many related errors.
      
      Fixes: 23c82d41bdf4 ("kexec-allow-architectures-to-override-boot-mapping-fix")
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      2b05980d
    • G
      unicore32: mm: Add missing parameter to arch_vma_access_permitted · 783011b1
      Guenter Roeck 提交于
      unicore32 fails to compile with the following errors.
      
      mm/memory.c: In function ‘__handle_mm_fault’:
      mm/memory.c:3381: error:
      	too many arguments to function ‘arch_vma_access_permitted’
      mm/gup.c: In function ‘check_vma_flags’:
      mm/gup.c:456: error:
      	too many arguments to function ‘arch_vma_access_permitted’
      mm/gup.c: In function ‘vma_permits_fault’:
      mm/gup.c:640: error:
      	too many arguments to function ‘arch_vma_access_permitted’
      
      Fixes: d61172b4 ("mm/core, x86/mm/pkeys: Differentiate instruction fetches")
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Acked-by: NGuan Xuetao <gxt@mprc.pku.edu.cn>
      783011b1
    • L
      Merge tag 'vfio-v4.8-rc2' of git://github.com/awilliam/linux-vfio · f31494bd
      Linus Torvalds 提交于
      Pull VFIO fix from Alex Williamson:
       "Fix oops when dereferencing empty data (Alex Williamson)"
      
      * tag 'vfio-v4.8-rc2' of git://github.com/awilliam/linux-vfio:
        vfio/pci: Fix NULL pointer oops in error interrupt setup handling
      f31494bd
    • L
      Merge tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux · b112324c
      Linus Torvalds 提交于
      Pull nfsd fixes from Bruce Fields:
       "Fixes for the dentry refcounting leak I introduced in 4.8-rc1, and for
        races in the LOCK code which appear to go back to the big nfsd state
        lock removal from 3.17"
      
      * tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux:
        nfsd: don't return an unhashed lock stateid after taking mutex
        nfsd: Fix race between FREE_STATEID and LOCK
        nfsd: fix dentry refcounting on create
      b112324c
    • L
      Merge tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 9710cb66
      Linus Torvalds 提交于
      Pull power management fixes from Rafael Wysocki:
       "Two hibernation fixes allowing it to work with the recently added
        randomization of the kernel identity mapping base on x86-64 and one
        cpufreq driver regression fix.
      
        Specifics:
      
         - Fix the x86 identity mapping creation helpers to avoid the
           assumption that the base address of the mapping will always be
           aligned at the PGD level, as it may be aligned at the PUD level if
           address space randomization is enabled (Rafael Wysocki).
      
         - Fix the hibernation core to avoid executing tracing functions
           before restoring the processor state completely during resume
           (Thomas Garnier).
      
         - Fix a recently introduced regression in the powernv cpufreq driver
           that causes it to crash due to an out-of-bounds array access
           (Akshay Adiga)"
      
      * tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / hibernate: Restore processor state before using per-CPU variables
        x86/power/64: Always create temporary identity mapping correctly
        cpufreq: powernv: Fix crash in gpstate_timer_handler()
      9710cb66
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 01ea4439
      Linus Torvalds 提交于
      Pull x86 fixes from Ingo Molnar:
       "This is bigger than usual - the reason is partly a pent-up stream of
        fixes after the merge window and partly accidental.  The fixes are:
      
         - five patches to fix a boot failure on Andy Lutomirsky's laptop
         - four SGI UV platform fixes
         - KASAN fix
         - warning fix
         - documentation update
         - swap entry definition fix
         - pkeys fix
         - irq stats fix"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()
        x86/efi: Allocate a trampoline if needed in efi_free_boot_services()
        x86/boot: Rework reserve_real_mode() to allow multiple tries
        x86/boot: Defer setup_real_mode() to early_initcall time
        x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly
        x86/boot: Run reserve_bios_regions() after we initialize the memory map
        x86/irq: Do not substract irq_tlb_count from irq_call_count
        x86/mm: Fix swap entry comment and macro
        x86/mm/kaslr: Fix -Wformat-security warning
        x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation
        x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables
        x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems
        x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values
        x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash
        x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous
        x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET
        x86/mm: Disable preemption during CR3 read+write
        x86/mm/KASLR: Increase BRK pages for KASLR memory randomization
        x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization
        x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
      01ea4439
    • L
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3bc6d8c1
      Linus Torvalds 提交于
      Pull timer fixes from Ingo Molnar:
       "Misc fixes: a /dev/rtc regression fix, two APIC timer period
        calibration fixes, an ARM clocksource driver fix and a NOHZ
        power use regression fix"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup
        x86/timers/apic: Inform TSC deadline clockevent device about recalibration
        x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
        timers: Fix get_next_timer_interrupt() computation
        clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered
      3bc6d8c1