1. 03 7月, 2013 2 次提交
    • J
      vfs: export lseek_execute() to modules · 46a1c2c7
      Jie Liu 提交于
      For those file systems(btrfs/ext4/ocfs2/tmpfs) that support
      SEEK_DATA/SEEK_HOLE functions, we end up handling the similar
      matter in lseek_execute() to update the current file offset
      to the desired offset if it is valid, ceph also does the
      simliar things at ceph_llseek().
      
      To reduce the duplications, this patch make lseek_execute()
      public accessible so that we can call it directly from the
      underlying file systems.
      
      Thanks Dave Chinner for this suggestion.
      
      [AV: call it vfs_setpos(), don't bring the removed 'inode' argument back]
      
      v2->v1:
      - Add kernel-doc comments for lseek_execute()
      - Call lseek_execute() in ceph->llseek()
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Josef Bacik <jbacik@fusionio.com>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: Ted Tso <tytso@mit.edu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      46a1c2c7
    • D
      sync: don't block the flusher thread waiting on IO · 7747bd4b
      Dave Chinner 提交于
      When sync does it's WB_SYNC_ALL writeback, it issues data Io and
      then immediately waits for IO completion. This is done in the
      context of the flusher thread, and hence completely ties up the
      flusher thread for the backing device until all the dirty inodes
      have been synced. On filesystems that are dirtying inodes constantly
      and quickly, this means the flusher thread can be tied up for
      minutes per sync call and hence badly affect system level write IO
      performance as the page cache cannot be cleaned quickly.
      
      We already have a wait loop for IO completion for sync(2), so cut
      this out of the flusher thread and delegate it to wait_sb_inodes().
      Hence we can do rapid IO submission, and then wait for it all to
      complete.
      
      Effect of sync on fsmark before the patch:
      
      FSUse%        Count         Size    Files/sec     App Overhead
      .....
           0       640000         4096      35154.6          1026984
           0       720000         4096      36740.3          1023844
           0       800000         4096      36184.6           916599
           0       880000         4096       1282.7          1054367
           0       960000         4096       3951.3           918773
           0      1040000         4096      40646.2           996448
           0      1120000         4096      43610.1           895647
           0      1200000         4096      40333.1           921048
      
      And a single sync pass took:
      
        real    0m52.407s
        user    0m0.000s
        sys     0m0.090s
      
      After the patch, there is no impact on fsmark results, and each
      individual sync(2) operation run concurrently with the same fsmark
      workload takes roughly 7s:
      
        real    0m6.930s
        user    0m0.000s
        sys     0m0.039s
      
      IOWs, sync is 7-8x faster on a busy filesystem and does not have an
      adverse impact on ongoing async data write operations.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7747bd4b
  2. 01 7月, 2013 2 次提交
    • T
      jbd2: invalidate handle if jbd2_journal_restart() fails · 41a5b913
      Theodore Ts'o 提交于
      If jbd2_journal_restart() fails the handle will have been disconnected
      from the current transaction.  In this situation, the handle must not
      be used for for any jbd2 function other than jbd2_journal_stop().
      Enforce this with by treating a handle which has a NULL transaction
      pointer as an aborted handle, and issue a kernel warning if
      jbd2_journal_extent(), jbd2_journal_get_write_access(),
      jbd2_journal_dirty_metadata(), etc. is called with an invalid handle.
      
      This commit also fixes a bug where jbd2_journal_stop() would trip over
      a kernel jbd2 assertion check when trying to free an invalid handle.
      
      Also move the responsibility of setting current->journal_info to
      start_this_handle(), simplifying the three users of this function.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NYounger Liu <younger.liu@huawei.com>
      Cc: Jan Kara <jack@suse.cz>
      41a5b913
    • T
      ext4: translate flag bits to strings in tracepoints · 21ddd568
      Theodore Ts'o 提交于
      Translate the bitfields used in various flags argument to strings to
      make the tracepoint output more human-readable.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      21ddd568
  3. 29 6月, 2013 21 次提交
  4. 28 6月, 2013 2 次提交
    • J
      xen: Convert printks to pr_<level> · 283c0972
      Joe Perches 提交于
      Convert printks to pr_<level> (excludes printk(KERN_DEBUG...)
      to be more consistent throughout the xen subsystem.
      
      Add pr_fmt with KBUILD_MODNAME or "xen:" KBUILD_MODNAME
      Coalesce formats and add missing word spaces
      Add missing newlines
      Align arguments and reflow to 80 columns
      Remove DRV_NAME from formats as pr_fmt adds the same content
      
      This does change some of the prefixes of these messages
      but it also does make them more consistent.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      283c0972
    • T
      cgroup: CGRP_ROOT_SUBSYS_BOUND should be ignored when comparing mount options · 0ce6cba3
      Tejun Heo 提交于
      1672d040 ("cgroup: fix cgroupfs_root early destruction path")
      introduced CGRP_ROOT_SUBSYS_BOUND which is used to mark completion of
      subsys binding on a new root; however, this broke remounts.
      cgroup_remount() doesn't allow changing root options via remount and
      CGRP_ROOT_SUBSYS_BOUND, which is set on all fully initialized roots,
      makes the function reject all remounts.
      
      Fix it by putting the options part in the lower 16 bits of root->flags
      and masking the comparions.  While at it, make cgroup_remount() emit
      an error message explaining why it's rejecting a remount request, so
      that it's less of a mystery.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      0ce6cba3
  5. 27 6月, 2013 7 次提交
  6. 26 6月, 2013 6 次提交
    • D
      mutex: Add w/w mutex slowpath debugging · 23010027
      Daniel Vetter 提交于
      Injects EDEADLK conditions at pseudo-random interval, with
      exponential backoff up to UINT_MAX (to ensure that every lock
      operation still completes in a reasonable time).
      
      This way we can test the wound slowpath even for ww mutex users
      where contention is never expected, and the ww deadlock
      avoidance algorithm is only needed for correctness against
      malicious userspace. An example would be protecting kernel
      modesetting properties, which thanks to single-threaded X isn't
      really expected to contend, ever.
      
      I've looked into using the CONFIG_FAULT_INJECTION
      infrastructure, but decided against it for two reasons:
      
      - EDEADLK handling is mandatory for ww mutex users and should
        never affect the outcome of a syscall. This is in contrast to -ENOMEM
        injection. So fine configurability isn't required.
      
      - The fault injection framework only allows to set a simple
        probability for failure. Now the probability that a ww mutex acquire
        stage with N locks will never complete (due to too many injected
        EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock
        operations for the completely uncontended case would be O(exp(N)).
        The per-acuiqire ctx exponential backoff solution choosen here only
        results in O(log N) overhead due to injection and so O(log N * N)
        lock operations. This way we can fail with high probability (and so
        have good test coverage even for fancy backoff and lock acquisition
        paths) without running into patalogical cases.
      
      Note that EDEADLK will only ever be injected when we managed to
      acquire the lock. This prevents any behaviour changes for users
      which rely on the EALREADY semantics.
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: dri-devel@lists.freedesktop.org
      Cc: linaro-mm-sig@lists.linaro.org
      Cc: rostedt@goodmis.org
      Cc: daniel@ffwll.ch
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patserSigned-off-by: NIngo Molnar <mingo@kernel.org>
      23010027
    • M
      mutex: Add support for wound/wait style locks · 040a0a37
      Maarten Lankhorst 提交于
      Wound/wait mutexes are used when other multiple lock
      acquisitions of a similar type can be done in an arbitrary
      order. The deadlock handling used here is called wait/wound in
      the RDBMS literature: The older tasks waits until it can acquire
      the contended lock. The younger tasks needs to back off and drop
      all the locks it is currently holding, i.e. the younger task is
      wounded.
      
      For full documentation please read Documentation/ww-mutex-design.txt.
      
      References: https://lwn.net/Articles/548909/Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
      Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: NRob Clark <robdclark@gmail.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: dri-devel@lists.freedesktop.org
      Cc: linaro-mm-sig@lists.linaro.org
      Cc: rostedt@goodmis.org
      Cc: daniel@ffwll.ch
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      040a0a37
    • M
      arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not · a41b56ef
      Maarten Lankhorst 提交于
      This will allow me to call functions that have multiple
      arguments if fastpath fails. This is required to support ticket
      mutexes, because they need to be able to pass an extra argument
      to the fail function.
      
      Originally I duplicated the functions, by adding
      __mutex_fastpath_lock_retval_arg. This ended up being just a
      duplication of the existing function, so a way to test if
      fastpath was called ended up being better.
      
      This also cleaned up the reservation mutex patch some by being
      able to call an atomic_set instead of atomic_xchg, and making it
      easier to detect if the wrong unlock function was previously
      used.
      Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: dri-devel@lists.freedesktop.org
      Cc: linaro-mm-sig@lists.linaro.org
      Cc: robclark@gmail.com
      Cc: rostedt@goodmis.org
      Cc: daniel@ffwll.ch
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patserSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a41b56ef
    • M
      driver core: device.h: fix doc compilation warnings · bfd63cd2
      Michael Opdenacker 提交于
      This patch fixes the below 3 warnings running "make htmldocs",
      by adding descriptions for recently added structure members:
      
      DOCPROC Documentation/DocBook/device-drivers.xml
      Warning(/work/git.free-electrons.com/users/michael-opdenacker/linux//include/linux/device.h:116): No description found for parameter 'lock_key'
      Warning(/work/git.free-electrons.com/users/michael-opdenacker/linux//include/linux/device.h:723): No description found for parameter 'cma_area'
      Warning(/work/git.free-electrons.com/users/michael-opdenacker/linux//include/linux/device.h:723): No description found for parameter 'iommu_group'
      
      Don't hesitate to propose better descriptions!
      Signed-off-by: NMichael Opdenacker <michael.opdenacker@free-electrons.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bfd63cd2
    • E
      gre: fix a possible skb leak · bd8a7036
      Eric Dumazet 提交于
      commit 68c33163 ("v4 GRE: Add TCP segmentation offload for GRE")
      added a possible skb leak, because it frees only the head of segment
      list, in case a skb_linearize() call fails.
      
      This patch adds a kfree_skb_list() helper to fix the bug.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd8a7036
    • H
      linux/const.h: Add _BITUL() and _BITULL() · 2fc016c5
      H. Peter Anvin 提交于
      Add macros for single bit definitions of a specific type.  These are
      similar to the BIT() macro that already exists, but with a few
      exceptions:
      
      1. The namespace is such that they can be used in uapi definitions.
      2. The type is set with the _AC() macro to allow it to be used in
         assembly.
      3. The type is explicitly specified to be UL or ULL.
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-nbca8p7cg6jyjoit7klh3o91@git.kernel.org
      2fc016c5