1. 19 8月, 2015 12 次提交
    • B
      xfs: add helper to conditionally remove items from the AIL · 146e54b7
      Brian Foster 提交于
      Several areas of code duplicate a pattern where we take the AIL lock,
      check whether an item is in the AIL and remove it if so. Create a new
      helper for this pattern and use it where appropriate.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      146e54b7
    • B
      xfs: fix btree cursor error cleanups · f307080a
      Brian Foster 提交于
      The btree cursor cleanup function takes an error parameter that
      affects how buffers are released from the cursor. All buffers are
      released in the event of error. Several callers do not specify the
      XFS_BTREE_ERROR flag in the event of error, however. This can cause
      buffers to hang around locked or with an elevated hold count and
      thus lead to umount hangs in the event of errors.
      
      Fix up the xfs_btree_del_cursor() callers to pass XFS_BTREE_ERROR if
      the cursor is being torn down due to error.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f307080a
    • B
      xfs: clean up root inode properly on mount failure · 0ae120f8
      Brian Foster 提交于
      The root inode is read as part of the xfs_mountfs() sequence and the
      reference is dropped in the event of failure after we grab the
      inode.  The reference drop doesn't necessarily free the inode,
      however. It marks it for reclaim and potentially kicks off the
      reclaim workqueue.  The workqueue is destroyed further up the error
      path, which means we are subject to crash if the workqueue job runs
      after this point or a memory leak which is identified if the
      xfs_inode_zone is destroyed (e.g., on module removal). Both of these
      outcomes are reproducible via manual instrumentation of a mount
      error after the root inode xfs_iget() call in xfs_mountfs().
      
      Update the xfs_mountfs() error path to cancel any potential reclaim
      work items and to run a synchronous inode reclaim if the root inode
      is marked for reclaim. This ensures that no jobs remain on the queue
      before it is destroyed and that the root inode is freed before the
      reclaim mechanism is torn down.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      0ae120f8
    • B
      xfs: checksum log record ext headers based on record size · a3f20014
      Brian Foster 提交于
      The first 4 bytes of every basic block in the physical log is stamped
      with the current lsn. To support this mechanism, the log record header
      (first block of each new log record) contains space for the original
      first byte of each log record block before it is replaced with the lsn.
      The log record header has space for 32k worth of blocks. The version 2
      log adds new extended record headers for each additional 32k worth of
      blocks beyond what is supported by the record header.
      
      The log record checksum incorporates the log record header, the extended
      headers and the record payload. xlog_cksum() checksums the extended
      headers based on log->l_iclog_heads, which specifies the number of
      extended headers in a log record based on the log buffer size mount
      option. The log buffer size is variable, however, and thus means the
      checksum can be calculated differently based on how a filesystem is
      mounted. This is problematic if a filesystem crashes and recovery occurs
      on a subsequent mount using a different log buffer size. For example,
      crash an active filesystem that is mounted with the default (32k)
      logbsize, attempt remount/recovery using '-o logbsize=64k' and the mount
      fails on or warns about log checksum failures.
      
      To avoid this problem, update xlog_cksum() to calculate the checksum
      based on the size of the log buffer according to the log record. The
      size is already included in the h_size field of the log record header
      and thus is available at log recovery time. Extended log record headers
      are also only written when the log record is large enough to require
      them. This makes checksum calculation of log records consistent with the
      extended record header mechanism as well as how on-disk records are
      checksummed with various log buffer size mount options.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a3f20014
    • B
      xfs: fix broken icreate log item cancellation · fc0d1656
      Brian Foster 提交于
      Inode cluster buffers are invalidated and cancelled when inode chunks
      are freed to notify log recovery that previous logged updates to the
      metadata buffer should be skipped. This ensures that log recovery does
      not overwrite buffers that might have already been reused.
      
      On v4 filesystems, inode chunk allocation and inode updates are logged
      via the cluster buffers and thus cancellation is easily detected via
      buffer cancellation items. v5 filesystems use the new icreate
      transaction, which uses logical logging and ordered buffers to log a
      full inode chunk allocation at once. The resulting icreate item often
      spans multiple inode cluster buffers.
      
      Log recovery checks for cancelled buffers when processing icreate log
      items, but it has a couple problems. First, it uses the full length of
      the inode chunk rather than the cluster size. Second, it uses the length
      in FSB units rather than BB units. Either of these problems prevent
      icreate recovery from identifying cancelled buffers and thus inode
      initialization proceeds unconditionally.
      
      Update xlog_recover_do_icreate_pass2() to iterate the icreate range in
      cluster sized increments and check each increment for cancellation.
      Since icreate is currently only used for the minimum atomic inode chunk
      allocation, we expect that either all or none of the buffers will be
      cancelled. Cancel the icreate if at least one buffer is cancelled to
      avoid making a bad situation worse by initializing a partial inode
      chunk, but detect such anomalies and warn the user.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      fc0d1656
    • B
      xfs: icreate log item recovery and cancellation tracepoints · 78d57e45
      Brian Foster 提交于
      Various log items have recovery tracepoints to identify whether a
      particular log item is recovered or cancelled. Add the equivalent
      tracepoints for the icreate transaction.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      78d57e45
    • B
      xfs: don't leave EFIs on AIL on mount failure · f0b2efad
      Brian Foster 提交于
      Log recovery occurs in two phases at mount time. In the first phase,
      EFIs and EFDs are processed and potentially cancelled out. EFIs without
      EFD objects are inserted into the AIL for processing and recovery in the
      second phase. xfs_mountfs() runs various other operations between the
      phases and is thus subject to failure. If failure occurs after the first
      phase but before the second, pending EFIs sit on the AIL, pin it and
      cause the mount to hang.
      
      Update the mount sequence to ensure that pending EFIs are cancelled in
      the event of failure. Add a recovery cancellation mechanism to iterate
      the AIL and cancel all EFI items when requested. Plumb cancellation
      support through the log mount finish helper and update xfs_mountfs() to
      invoke cancellation in the event of failure after recovery has started.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f0b2efad
    • B
      xfs: use EFI refcount consistently in log recovery · e32a1d1f
      Brian Foster 提交于
      The EFI is initialized with a reference count of 2. One for the EFI to
      ensure the item makes it to the AIL and one for the subsequently created
      EFD to release the EFI once the EFD is committed. Log recovery uses the
      EFI in a similar manner, but implements a hack to remove both references
      in one call once the EFD is handled.
      
      Update log recovery to use EFI reference counting in a manner consistent
      with the log. When an EFI is encountered during recovery, an EFI item is
      allocated and inserted to the AIL directly. Since the EFI reference is
      typically dropped when the EFI is unpinned and this is analogous with
      AIL insertion, drop the EFI reference at this point.
      
      When a corresponding EFD is encountered in the log, this indicates that
      the extents were freed, no processing is required and the EFI can be
      dropped. Update xlog_recover_efd_pass2() to simply drop the EFD
      reference at this point rather than open code the AIL removal and EFI
      free.
      
      Remaining EFIs (i.e., with no corresponding EFD) are processed in
      xlog_recover_finish(). An EFD transaction is allocated and the extents
      are freed, which transfers ownership of the EFI reference to the EFD
      item in the log.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e32a1d1f
    • B
      xfs: ensure EFD trans aborts on log recovery extent free failure · 6bc43af3
      Brian Foster 提交于
      Log recovery attempts to free extents with leftover EFIs in the AIL
      after initial processing. If the extent free fails (e.g., due to
      unrelated fs corruption), the transaction is cancelled, though it
      might not be dirtied at the time. If this is the case, the EFD does
      not abort and thus does not release the EFI. This can lead to hangs
      as the EFI pins the AIL.
      
      Update xlog_recover_process_efi() to log the EFD in the transaction
      before xfs_free_extent() errors are handled to ensure the
      transaction is dirty, aborts the EFD and releases the EFI on error.
      Since this is a requirement for EFD processing (and consistent with
      xfs_bmap_finish()), update the EFD logging helper to do the extent
      free and unconditionally log the EFD. This encodes the required EFD
      logging behavior into the helper and reduces the likelihood of
      errors down the road.
      
      [dchinner: re-add xfs_alloc.h to xfs_log_recover.c to fix build
       failure.]
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      6bc43af3
    • B
      xfs: fix efi/efd error handling to avoid fs shutdown hangs · 8d99fe92
      Brian Foster 提交于
      Freeing an extent in XFS involves logging an EFI (extent free
      intention), freeing the actual extent, and logging an EFD (extent
      free done). The EFI object is created with a reference count of 2:
      one for the current transaction and one for the subsequently created
      EFD. Under normal circumstances, the first reference is dropped when
      the EFI is unpinned and the second reference is dropped when the EFD
      is committed to the on-disk log.
      
      In event of errors or filesystem shutdown, there are various
      potential cleanup scenarios depending on the state of the EFI/EFD.
      The cleanup scenarios are confusing and racy, as demonstrated by the
      following test sequence:
      
      	# mount $dev $mnt
      	# fsstress -d $mnt -n 99999 -p 16 -z -f fallocate=1 \
      		-f punch=1 -f creat=1 -f unlink=1 &
      	# sleep 5
      	# killall -9 fsstress; wait
      	# godown -f $mnt
      	# umount
      
      ... in which the final umount can hang due to the AIL being pinned
      indefinitely by one or more EFI items. This can occur due to several
      conditions. For example, if the shutdown occurs after the EFI is
      committed to the on-disk log and the EFD committed to the CIL, but
      before the EFD committed to the log, the EFD iop_committed() abort
      handler does not drop its reference to the EFI. Alternatively,
      manual error injection in the xfs_bmap_finish() codepath shows that
      if an error occurs after the EFI transaction is committed but before
      the EFD is constructed and logged, the EFI is never released from
      the AIL.
      
      Update the EFI/EFD item handling code to use a more straightforward
      and reliable approach to error handling. If an error occurs after
      the EFI transaction is committed and before the EFD is constructed,
      release the EFI explicitly from xfs_bmap_finish(). If the EFI
      transaction is cancelled, release the EFI in the unlock handler.
      
      Once the EFD is constructed, it is responsible for releasing the EFI
      under any circumstances (including whether the EFI item aborts due
      to log I/O error). Update the EFD item handlers to release the EFI
      if the transaction is cancelled or aborts due to log I/O error.
      Finally, update xfs_bmap_finish() to log at least one EFD extent to
      the transaction before xfs_free_extent() errors are handled to
      ensure the transaction is dirty and EFD item error handling is
      triggered.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8d99fe92
    • B
      xfs: return committed status from xfs_trans_roll() · d43ac29b
      Brian Foster 提交于
      Some callers need to make error handling decisions based on whether
      the current transaction successfully committed or not. Rename
      xfs_trans_roll(), add a new parameter and provide a wrapper to
      preserve existing callers.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      d43ac29b
    • B
      xfs: disentagle EFI release from the extent count · 5e4b5386
      Brian Foster 提交于
      Release of the EFI either occurs based on the reference count or the
      extent count. The extent count used is either the count tracked in
      the EFI or EFD, depending on the particular situation. In either
      case, the count is initialized to the final value and thus always
      matches the current efi_next_extent value once the EFI is completely
      constructed.  For example, the EFI extent count is increased as the
      extents are logged in xfs_bmap_finish() and the full free list is
      always completely processed. Therefore, the count is guaranteed to
      be complete once the EFI transaction is committed. The EFD uses the
      efd_nextents counter to release the EFI. This counter is initialized
      to the count of the EFI when the EFD is created. Thus the EFD, as
      currently used, has no concept of partial EFI release based on
      extent count.
      
      Given that the EFI extent count is always released in whole, use of
      the extent count for reference counting is unnecessary. Remove this
      level of the API and release the EFI based on the core reference
      count. The efi_next_extent counter remains because it is still used
      to track the slot to log the next extent to free.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5e4b5386
  2. 13 7月, 2015 7 次提交
    • L
      Linux 4.2-rc2 · bc0195aa
      Linus Torvalds 提交于
      bc0195aa
    • L
      Revert "drm/i915: Use crtc_state->active in primary check_plane func" · 01e2d062
      Linus Torvalds 提交于
      This reverts commit dec4f799.
      
      Jörg Otte reports a NULL pointder dereference due to this commit, as
      'crtc_state' very much can be NULL:
      
              crtc_state = state->base.state ?
                      intel_atomic_get_crtc_state(state->base.state, intel_crtc) : NULL;
      
      So the change to test 'crtc_state->base.active' cannot possibly be
      correct as-is.
      
      There may be some other minimal fix (like just checking crtc_state for
      NULL), but I'm just reverting it now for the rc2 release, and people
      like Daniel Vetter who actually know this code will figure out what the
      right solution is in the longer term.
      Reported-and-bisected-by: NJörg Otte <jrg.otte@gmail.com>
      Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      CC: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01e2d062
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · c83727a6
      Linus Torvalds 提交于
      Pull VFS fixes from Al Viro:
       "Fixes for this cycle regression in overlayfs and a couple of
        long-standing (== all the way back to 2.6.12, at least) bugs"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        freeing unlinked file indefinitely delayed
        fix a braino in ovl_d_select_inode()
        9p: don't leave a half-initialized inode sitting around
      c83727a6
    • L
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 7fbb58a0
      Linus Torvalds 提交于
      Pull MIPS fixes from Ralf Baechle:
       "A fair number of 4.2 fixes also because Markos opened the flood gates.
      
         - Patch up the math used calculate the location for the page bitmap.
      
         - The FDC (Not what you think, FDC stands for Fast Debug Channel) IRQ
           around was causing issues on non-Malta platforms, so move the code
           to a Malta specific location.
      
         - A spelling fix replicated through several files.
      
         - Fix to the emulation of an R2 instruction for R6 cores.
      
         - Fix the JR emulation for R6.
      
         - Further patching of mindless 64 bit issues.
      
         - Ensure the kernel won't crash on CPUs with L2 caches with >= 8
           ways.
      
         - Use compat_sys_getsockopt for O32 ABI on 64 bit kernels.
      
         - Fix cache flushing for multithreaded cores.
      
         - A build fix"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: O32: Use compat_sys_getsockopt.
        MIPS: c-r4k: Extend way_string array
        MIPS: Pistachio: Support CDMM & Fast Debug Channel
        MIPS: Malta: Make GIC FDC IRQ workaround Malta specific
        MIPS: c-r4k: Fix cache flushing for MT cores
        Revert "MIPS: Kconfig: Disable SMP/CPS for 64-bit"
        MIPS: cps-vec: Use macros for various arithmetics and memory operations
        MIPS: kernel: cps-vec: Replace KSEG0 with CKSEG0
        MIPS: kernel: cps-vec: Use ta0-ta3 pseudo-registers for 64-bit
        MIPS: kernel: cps-vec: Replace mips32r2 ISA level with mips64r2
        MIPS: kernel: cps-vec: Replace 'la' macro with PTR_LA
        MIPS: kernel: smp-cps: Fix 64-bit compatibility errors due to pointer casting
        MIPS: Fix erroneous JR emulation for MIPS R6
        MIPS: Fix branch emulation for BLTC and BGEC instructions
        MIPS: kernel: traps: Fix broken indentation
        MIPS: bootmem: Don't use memory holes for page bitmap
        MIPS: O32: Do not handle require 32 bytes from the stack to be readable.
        MIPS, CPUFREQ: Fix spelling of Institute.
        MIPS: Lemote 2F: Fix build caused by recent mass rename.
      7fbb58a0
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1daa1cfb
      Linus Torvalds 提交于
      Pull x86 fixes from Thomas Gleixner:
      
       - the high latency PIT detection fix, which slipped through the cracks
         for rc1
      
       - a regression fix for the early printk mechanism
      
       - the x86 part to plug irq/vector related hotplug races
      
       - move the allocation of the espfix pages on cpu hotplug to non atomic
         context.  The current code triggers a might_sleep() warning.
      
       - a series of KASAN fixes addressing boot crashes and usability
      
       - a trivial typo fix for Kconfig help text
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/kconfig: Fix typo in the CONFIG_CMDLINE_BOOL help text
        x86/irq: Retrieve irq data after locking irq_desc
        x86/irq: Use proper locking in check_irq_vectors_for_cpu_disable()
        x86/irq: Plug irq vector hotplug race
        x86/earlyprintk: Allow early_printk() to use console style parameters like '115200n8'
        x86/espfix: Init espfix on the boot CPU side
        x86/espfix: Add 'cpu' parameter to init_espfix_ap()
        x86/kasan: Move KASAN_SHADOW_OFFSET to the arch Kconfig
        x86/kasan: Add message about KASAN being initialized
        x86/kasan: Fix boot crash on AMD processors
        x86/kasan: Flush TLBs after switching CR3
        x86/kasan: Fix KASAN shadow region page tables
        x86/init: Clear 'init_level4_pgt' earlier
        x86/tsc: Let high latency PIT fail fast in quick_pit_calibrate()
      1daa1cfb
    • L
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7b732169
      Linus Torvalds 提交于
      Pull timer fixes from Thomas Gleixner:
       "This update from the timer departement contains:
      
         - A series of patches which address a shortcoming in the tick
           broadcast code.
      
           If the broadcast device is not available or an hrtimer emulated
           broadcast device, some of the original assumptions lead to boot
           failures.  I rather plugged all of the corner cases instead of only
           addressing the issue reported, so the change got a little larger.
      
           Has been extensivly tested on x86 and arm.
      
         - Get rid of the last holdouts using do_posix_clock_monotonic_gettime()
      
         - A regression fix for the imx clocksource driver
      
         - An update to the new state callbacks mechanism for clockevents.
           This is required to simplify the conversion, which will take place
           in 4.3"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tick/broadcast: Prevent NULL pointer dereference
        time: Get rid of do_posix_clock_monotonic_gettime
        cris: Replace do_posix_clock_monotonic_gettime()
        tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
        tick/broadcast: Handle spurious interrupts gracefully
        tick/broadcast: Check for hrtimer broadcast active early
        tick/broadcast: Return busy when IPI is pending
        tick/broadcast: Return busy if periodic mode and hrtimer broadcast
        tick/broadcast: Move the check for periodic mode inside state handling
        tick/broadcast: Prevent deep idle if no broadcast device available
        tick/broadcast: Make idle check independent from mode and config
        tick/broadcast: Sanity check the shutdown of the local clock_event
        tick/broadcast: Prevent hrtimer recursion
        clockevents: Allow set-state callbacks to be optional
        clocksource/imx: Define clocksource for mx27
      7b732169
    • L
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c4bc680c
      Linus Torvalds 提交于
      Pull irq fix from Thomas Gleixner:
       "A single fix for a cpu hotplug race vs. interrupt descriptors:
      
        Prevent irq setup/teardown across the cpu starting/dying parts of cpu
        hotplug so that the starting/dying cpu has a stable view of the
        descriptor space.  This has been an issue for all architectures in the
        cpu dying phase, where interrupts are migrated away from the dying
        cpu.  In the starting phase its mostly a x86 issue vs the vector space
        update"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        hotplug: Prevent alloc/free of irq descriptors during cpu up/down
      c4bc680c
  3. 12 7月, 2015 11 次提交
    • A
      freeing unlinked file indefinitely delayed · 75a6f82a
      Al Viro 提交于
      	Normally opening a file, unlinking it and then closing will have
      the inode freed upon close() (provided that it's not otherwise busy and
      has no remaining links, of course).  However, there's one case where that
      does *not* happen.  Namely, if you open it by fhandle with cold dcache,
      then unlink() and close().
      
      	In normal case you get d_delete() in unlink(2) notice that dentry
      is busy and unhash it; on the final dput() it will be forcibly evicted from
      dcache, triggering iput() and inode removal.  In this case, though, we end
      up with *two* dentries - disconnected (created by open-by-fhandle) and
      regular one (used by unlink()).  The latter will have its reference to inode
      dropped just fine, but the former will not - it's considered hashed (it
      is on the ->s_anon list), so it will stay around until the memory pressure
      will finally do it in.  As the result, we have the final iput() delayed
      indefinitely.  It's trivial to reproduce -
      
      void flush_dcache(void)
      {
              system("mount -o remount,rw /");
      }
      
      static char buf[20 * 1024 * 1024];
      
      main()
      {
              int fd;
              union {
                      struct file_handle f;
                      char buf[MAX_HANDLE_SZ];
              } x;
              int m;
      
              x.f.handle_bytes = sizeof(x);
              chdir("/root");
              mkdir("foo", 0700);
              fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
              close(fd);
              name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
              flush_dcache();
              fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
              unlink("foo/bar");
              write(fd, buf, sizeof(buf));
              system("df .");			/* 20Mb eaten */
              close(fd);
              system("df .");			/* should've freed those 20Mb */
              flush_dcache();
              system("df .");			/* should be the same as #2 */
      }
      
      will spit out something like
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 303843      1131 100% /
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 303843      1131 100% /
      Filesystem     1K-blocks   Used Available Use% Mounted on
      /dev/root         322023 283282     21692  93% /
      - inode gets freed only when dentry is finally evicted (here we trigger
      than by remount; normally it would've happened in response to memory
      pressure hell knows when).
      
      Cc: stable@vger.kernel.org # v2.6.38+; earlier ones need s/kill_it/unhash_it/
      Acked-by: NJ. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      75a6f82a
    • A
      fix a braino in ovl_d_select_inode() · 9391dd00
      Al Viro 提交于
      when opening a directory we want the overlayfs inode, not one from
      the topmost layer.
      Reported-By: NAndrey Jr. Melnikov <temnota.am@gmail.com>
      Tested-By: NAndrey Jr. Melnikov <temnota.am@gmail.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9391dd00
    • A
      9p: don't leave a half-initialized inode sitting around · 0a73d0a2
      Al Viro 提交于
      Cc: stable@vger.kernel.org # all branches
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0a73d0a2
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm · 59c3cb55
      Linus Torvalds 提交于
      Pull libnvdimm fixes from Dan Williams:
       "1) Fixes for a handful of smatch reports (Thanks Dan C.!) and minor
           bug fixes (patches 1-6)
      
        2) Correctness fixes to the BLK-mode nvdimm driver (patches 7-10).
      
           Granted these are slightly large for a -rc update.  They have been
           out for review in one form or another since the end of May and were
           deferred from the merge window while we settled on the "PMEM API"
           for the PMEM-mode nvdimm driver (ie memremap_pmem, memcpy_to_pmem,
           and wmb_pmem).
      
           Now that those apis are merged we implement them in the BLK driver
           to guarantee that mmio aperture moves stay ordered with respect to
           incoming read/write requests, and that writes are flushed through
           those mmio-windows and platform-buffers to be persistent on media.
      
        These pass the sub-system unit tests with the updates to
        tools/testing/nvdimm, and have received a successful build-report from
        the kbuild robot (468 configs).
      
        With acks from Rafael for the touches to drivers/acpi/"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm:
        nfit: add support for NVDIMM "latch" flag
        nfit: update block I/O path to use PMEM API
        tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test
        tools/testing/nvdimm: fix return code for unimplemented commands
        tools/testing/nvdimm: mock ioremap_wt
        pmem: add maintainer for include/linux/pmem.h
        nfit: fix smatch "use after null check" report
        nvdimm: Fix return value of nvdimm_bus_init() if class_create() fails
        libnvdimm: smatch cleanups in __nd_ioctl
        sparse: fix misplaced __pmem definition
      59c3cb55
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · e4925198
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "Mostly slight adjusments for new drivers, but also one core fix for
        which finally the dependencies are now available as well"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: Mark instantiated device nodes with OF_POPULATE
        i2c: jz4780: Fix return value if probe fails
        i2c: xgene-slimpro: Fix missing mbox_free_channel call in probe error path
        i2c: I2C_MT65XX should depend on HAS_DMA
      e4925198
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 8a7b8ff4
      Linus Torvalds 提交于
      Pull input fixes from Dmitry Torokhov:
       "A fix (revert) for a recent regression in Synaptics driver and a fix
        for Elan i2c touchpad driver"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Revert "Input: synaptics - allocate 3 slots to keep stability in image sensors"
        Input: elan_i2c - change the hover event from MT to ST
      8a7b8ff4
    • L
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 4322f028
      Linus Torvalds 提交于
      Pull clk fixes from Stephen Boyd:
       "A small set of fixes for problems found by smatch in new drivers that
        we added this rc and a handful of driver fixes that came in during the
        merge window"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        drivers: clk: st: Incorrect register offset used for lock_status
        clk: mediatek: mt8173: Fix enabling of critical clocks
        drivers: clk: st: Fix mux bit-setting for Cortex A9 clocks
        drivers: clk: st: Add CLK_GET_RATE_NOCACHE flag to clocks
        drivers: clk: st: Fix flexgen lock init
        drivers: clk: st: Fix FSYN channel values
        drivers: clk: st: Remove unused code
        clk: qcom: Use parent rate when set rate to pixel RCG clock
        clk: at91: do not leak resources
        clk: stm32: Fix out-by-one error path in the index lookup
        clk: iproc: fix bit manipulation arithmetic
        clk: iproc: fix memory leak from clock name
      4322f028
    • L
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 9cb1680c
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "A bunch of fixes for radeon, intel, omap and one amdkfd fix.
      
        Radeon fixes are all over, but it does fix some cursor corruption
        across suspend/resume.  i915 should fix the second warn you were
        seeing, so let us know if not.  omap is a bunch of small fixes"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (28 commits)
        drm/radeon: disable vce init on cayman (v2)
        drm/amdgpu: fix timeout calculation
        drm/radeon: check if BO_VA is set before adding it to the invalidation list
        drm/radeon: allways add the VM clear duplicate
        Revert "Revert "drm/radeon: dont switch vt on suspend""
        drm/radeon: Fold radeon_set_cursor() into radeon_show_cursor()
        drm/radeon: unpin cursor BOs on suspend and pin them again on resume (v2)
        drm/radeon: Clean up reference counting and pinning of the cursor BOs
        drm/amdkfd: validate pdd where it acquired first
        Revert "drm/i915: Allocate context objects from stolen"
        drm/i915: Declare the swizzling unknown for L-shaped configurations
        drm/radeon: fix underflow in r600_cp_dispatch_texture()
        drm/radeon: default to 2048 MB GART size on SI+
        drm/radeon: fix HDP flushing
        drm/radeon: use RCU query for GEM_BUSY syscall
        drm/amdgpu: Handle irqs only based on irq ring, not irq status regs.
        drm/radeon: Handle irqs only based on irq ring, not irq status regs.
        drm/i915: Use crtc_state->active in primary check_plane func
        drm/i915: Check crtc->active in intel_crtc_disable_planes
        drm/i915: Restore all GGTT VMAs on resume
        ...
      9cb1680c
    • L
      Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 2278cb0b
      Linus Torvalds 提交于
      Pull selinux fixes from James Morris.
      
      * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        selinux: fix mprotect PROT_EXEC regression caused by mm change
        selinux: don't waste ebitmap space when importing NetLabel categories
      2278cb0b
    • L
      Merge branch 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 31b7a57c
      Linus Torvalds 提交于
      Pull btrfs fixes from Chris Mason:
       "This is an assortment of fixes.  Most of the commits are from Filipe
        (fsync, the inode allocation cache and a few others).  Mark kicked in
        a series fixing corners in the extent sharing ioctls, and everyone
        else fixed up on assorted other problems"
      
      * 'for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix wrong check for btrfs_force_chunk_alloc()
        Btrfs: fix warning of bytes_may_use
        Btrfs: fix hang when failing to submit bio of directIO
        Btrfs: fix a comment in inode.c:evict_inode_truncate_pages()
        Btrfs: fix memory corruption on failure to submit bio for direct IO
        btrfs: don't update mtime/ctime on deduped inodes
        btrfs: allow dedupe of same inode
        btrfs: fix deadlock with extent-same and readpage
        btrfs: pass unaligned length to btrfs_cmp_data()
        Btrfs: fix fsync after truncate when no_holes feature is enabled
        Btrfs: fix fsync xattr loss in the fast fsync path
        Btrfs: fix fsync data loss after append write
        Btrfs: fix crash on close_ctree() if cleaner starts new transaction
        Btrfs: fix race between caching kthread and returning inode to inode cache
        Btrfs: use kmem_cache_free when freeing entry in inode cache
        Btrfs: fix race between balance and unused block group deletion
        btrfs: add error handling for scrub_workers_get()
        btrfs: cleanup noused initialization of dev in btrfs_end_bio()
        btrfs: qgroup: allow user to clear the limitation on qgroup
      31b7a57c
    • L
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 84e3e9d0
      Linus Torvalds 提交于
      Pull ARM SoC fixes from Kevin Hilman:
       "A fairly random colletion of fixes based on -rc1 for OMAP, sunxi and
        prima2 as well as a few arm64-specific DT fixes.
      
        This series also includes a late to support a new Allwinner (sunxi)
        SoC, but since it's rather simple and isolated to the
        platform-specific code, it's included it for this -rc"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        arm64: dts: add device tree for ARM SMM-A53x2 on LogicTile Express 20MG
        arm: dts: vexpress: add missing CCI PMU device node to TC2
        arm: dts: vexpress: describe all PMUs in TC2 dts
        GICv3: Add ITS entry to THUNDER dts
        arm64: dts: Add poweroff button device node for APM X-Gene platform
        ARM: dts: am4372.dtsi: disable rfbi
        ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2
        ARM: dts: am4372: Add emif node
        Revert "ARM: dts: am335x-boneblack: disable RTC-only sleep"
        ARM: sunxi: Enable simplefb in the defconfig
        ARM: Remove deprecated symbol from defconfig files
        ARM: sunxi: Add Machine support for A33
        ARM: sunxi: Introduce Allwinner H3 support
        Documentation: sunxi: Update Allwinner SoC documentation
        ARM: prima2: move to use REGMAP APIs for rtciobrg
        ARM: dts: atlas7: add pinctrl and gpio descriptions
        ARM: OMAP2+: Remove unnessary return statement from the void function, omap2_show_dma_caps
        memory: omap-gpmc: Fix parsing of devices
      84e3e9d0
  4. 11 7月, 2015 10 次提交
    • T
      tick/broadcast: Prevent NULL pointer dereference · c4d029f2
      Thomas Gleixner 提交于
      Dan reported that the recent changes to the broadcast code introduced
      a potential NULL dereference.
      
      Add the proper check.
      
      Fixes: e0454311 "tick/broadcast: Sanity check the shutdown of the local clock_event"
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      c4d029f2
    • L
      Merge branch 'parisc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · b9243b5a
      Linus Torvalds 提交于
      Pull parisc fixes from Helge Deller:
       "We have one important patch from Dave Anglin and myself which fixes
        PTE/TLB race conditions which caused random segmentation faults on our
        debian buildd servers, and one patch from Alex Ivanov which speeds up
        the graphical text console on the STI framebuffer driver"
      
      * 'parisc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results
        stifb: Implement hardware accelerated copyarea
      b9243b5a
    • J
    • S
      selinux: fix mprotect PROT_EXEC regression caused by mm change · 892e8cac
      Stephen Smalley 提交于
      commit 66fc1303 ("mm: shmem_zero_setup
      skip security check and lockdep conflict with XFS") caused a regression
      for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
      shared anonymous mappings.  However, even before that regression, the
      checking on such mprotect PROT_EXEC calls was inconsistent with the
      checking on a mmap PROT_EXEC call for a shared anonymous mapping.  On a
      mmap, the security hook is passed a NULL file and knows it is dealing
      with an anonymous mapping and therefore applies an execmem check and no
      file checks.  On a mprotect, the security hook is passed a vma with a
      non-NULL vm_file (as this was set from the internally-created shmem
      file during mmap) and therefore applies the file-based execute check
      and no execmem check.  Since the aforementioned commit now marks the
      shmem zero inode with the S_PRIVATE flag, the file checks are disabled
      and we have no checking at all on mprotect PROT_EXEC.  Add a test to
      the mprotect hook logic for such private inodes, and apply an execmem
      check in that case.  This makes the mmap and mprotect checking
      consistent for shared anonymous mappings, as well as for /dev/zero and
      ashmem.
      
      Cc: <stable@vger.kernel.org> # 4.1.x
      Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      892e8cac
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 1604f871
      Linus Torvalds 提交于
      Pull arm64 fixes and clean-up from Catalin Marinas:
       - ACPI fix when checking the validity of the GICC MADT subtable
       - handle debug exceptions in the el*_inv exception entries
       - remove pointless register assignment in two compat syscall wrappers
       - unnecessary include path
       - defconfig update
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: entry32: remove pointless register assignment
        arm64: entry: handle debug exceptions in el*_inv
        arm64: Keep the ARM64 Kconfig selects sorted
        ACPI / ARM64 : use the new BAD_MADT_GICC_ENTRY macro
        ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro
        arm64: defconfig: Add Ceva ahci to the defconfig
        arm64: remove another unnecessary libfdt include path
      1604f871
    • J
      parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results · 01ab6057
      John David Anglin 提交于
      The increased use of pdtlb/pitlb instructions seemed to increase the
      frequency of random segmentation faults building packages. Further, we
      had a number of cases where TLB inserts would repeatedly fail and all
      forward progress would stop. The Haskell ghc package caused a lot of
      trouble in this area. The final indication of a race in pte handling was
      this syslog entry on sibaris (C8000):
      
       swap_free: Unused swap offset entry 00000004
       BUG: Bad page map in process mysqld  pte:00000100 pmd:019bbec5
       addr:00000000ec464000 vm_flags:00100073 anon_vma:0000000221023828 mapping: (null) index:ec464
       CPU: 1 PID: 9176 Comm: mysqld Not tainted 4.0.0-2-parisc64-smp #1 Debian 4.0.5-1
       Backtrace:
        [<0000000040173eb0>] show_stack+0x20/0x38
        [<0000000040444424>] dump_stack+0x9c/0x110
        [<00000000402a0d38>] print_bad_pte+0x1a8/0x278
        [<00000000402a28b8>] unmap_single_vma+0x3d8/0x770
        [<00000000402a4090>] zap_page_range+0xf0/0x198
        [<00000000402ba2a4>] SyS_madvise+0x404/0x8c0
      
      Note that the pte value is 0 except for the accessed bit 0x100. This bit
      shouldn't be set without the present bit.
      
      It should be noted that the madvise system call is probably a trigger for many
      of the random segmentation faults.
      
      In looking at the kernel code, I found the following problems:
      
      1) The pte_clear define didn't take TLB lock when clearing a pte.
      2) We didn't test pte present bit inside lock in exception support.
      3) The pte and tlb locks needed to merged in order to ensure consistency
      between page table and TLB. This also has the effect of serializing TLB
      broadcasts on SMP systems.
      
      The attached change implements the above and a few other tweaks to try
      to improve performance. Based on the timing code, TLB purges are very
      slow (e.g., ~ 209 cycles per page on rp3440). Thus, I think it
      beneficial to test the split_tlb variable to avoid duplicate purges.
      Probably, all PA 2.0 machines have combined TLBs.
      
      I dropped using __flush_tlb_range in flush_tlb_mm as I realized all
      applications and most threads have a stack size that is too large to
      make this useful. I added some comments to this effect.
      
      Since implementing 1 through 3, I haven't had any random segmentation
      faults on mx3210 (rp3440) in about one week of building code and running
      as a Debian buildd.
      Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # v3.18+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      01ab6057
    • A
      stifb: Implement hardware accelerated copyarea · cb908ed3
      Alex Ivanov 提交于
      This patch adds hardware assisted scrolling. The code is based upon the
      following investigation: https://parisc.wiki.kernel.org/index.php/NGLE#Blitter
      
      A simple 'time ls -la /usr/bin' test shows 1.6x speed increase over soft
      copy and 2.3x increase over FBINFO_READS_FAST (prefer soft copy over
      screen redraw) on Artist framebuffer.
      Signed-off-by: NAlex Ivanov <lausgans@gmail.com>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      cb908ed3
    • L
      Merge tag 'powerpc-4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 3cdeb9d1
      Linus Torvalds 提交于
      Pull powerpc fixes from Michael Ellerman:
       - opal-prd mmap fix from Vaidy
       - set kernel taint for MCEs from Daniel
       - alignment exception description from Anton
       - ppc4xx_hsta_msi build fix from Daniel
       - opal-elog interrupt fix from Alistair
       - core_idle_state race fix from Shreyas
       - hv-24x7 lockdep fix from Sukadev
       - multiple cxl fixes from Daniel, Ian, Mikey & Maninder
       - update MAINTAINERS to point at shared tree
      
      * tag 'powerpc-4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        cxl: Check if afu is not null in cxl_slbia
        powerpc: Update MAINTAINERS to point at shared tree
        powerpc/perf/24x7: Fix lockdep warning
        cxl: Fix off by one error allowing subsequent mmap page to be accessed
        cxl: Fail mmap if requested mapping is larger than assigned problem state area
        cxl: Fix refcounting in kernel API
        powerpc/powernv: Fix race in updating core_idle_state
        powerpc/powernv: Fix opal-elog interrupt handler
        powerpc/ppc4xx_hsta_msi: Include ppc-pci.h to fix reference to hose_list
        powerpc: Add plain English description for alignment exception oopses
        cxl: Test the correct mmio space before unmapping
        powerpc: Set the correct kernel taint on machine check errors
        cxl/vphb.c: Use phb pointer after NULL check
        powerpc/powernv: Fix vma page prot flags in opal-prd driver
      3cdeb9d1
    • R
      nfit: add support for NVDIMM "latch" flag · f0f2c072
      Ross Zwisler 提交于
      Add support in the NFIT BLK I/O path for the "latch" flag
      defined in the "Get Block NVDIMM Flags" _DSM function:
      
      http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
      
      This flag requires the driver to read back the command register after it
      is written in the block I/O path.  This ensures that the hardware has
      fully processed the new command and moved the aperture appropriately.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      f0f2c072
    • R
      nfit: update block I/O path to use PMEM API · c2ad2954
      Ross Zwisler 提交于
      Update the nfit block I/O path to use the new PMEM API and to adhere to
      the read/write flows outlined in the "NVDIMM Block Window Driver
      Writer's Guide":
      
      http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
      
      This includes adding support for targeted NVDIMM flushes called "flush
      hints" in the ACPI 6.0 specification:
      
      http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
      
      For performance and media durability the mapping for a BLK aperture is
      moved to a write-combining mapping which is consistent with
      memcpy_to_pmem() and wmb_blk().
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      c2ad2954