1. 19 12月, 2014 6 次提交
    • J
      ocfs2: fix journal commit deadlock · 136f49b9
      Junxiao Bi 提交于
      For buffer write, page lock will be got in write_begin and released in
      write_end, in ocfs2_write_end_nolock(), before it unlock the page in
      ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), this will ask
      for the read lock of journal->j_trans_barrier.  Holding page lock and
      ask for journal->j_trans_barrier breaks the locking order.
      
      This will cause a deadlock with journal commit threads, ocfs2cmt will
      get write lock of journal->j_trans_barrier first, then it wakes up
      kjournald2 to do the commit work, at last it waits until done.  To
      commit journal, kjournald2 needs flushing data first, it needs get the
      cache page lock.
      
      Since some ocfs2 cluster locks are holding by write process, this
      deadlock may hung the whole cluster.
      
      unlock pages before ocfs2_run_deallocs() can fix the locking order, also
      put unlock before ocfs2_commit_trans() to make page lock is unlocked
      before j_trans_barrier to preserve unlocking order.
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: NWengang Wang <wen.gang.wang@oracle.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      136f49b9
    • J
      ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker · 1e589581
      Joseph Qi 提交于
      Commit ac4fef4d ("ocfs2/dlm: do not purge lockres that is queued for
      assert master") may have the following possible race case:
      
        dlm_dispatch_assert_master       dlm_wq
        ========================================================================
        queue_work(dlm->quedlm_worker,
            &dlm->dispatched_work);
                                       dispatch work,
                                       dlm_lockres_drop_inflight_worker
                                       *BUG_ON(res->inflight_assert_workers == 0)*
        dlm_lockres_grab_inflight_worker
        inflight_assert_workers++
      
      So ensure inflight_assert_workers to be increased first.
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Signed-off-by: NXue jiufei <xuejiufei@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e589581
    • J
      ocfs2: reflink: fix slow unlink for refcounted file · f62f12b3
      Junxiao Bi 提交于
      When running ocfs2 test suite multiple nodes reflink stress test, for a
      4 nodes cluster, every unlink() for refcounted file needs about 700s.
      
      The slow unlink is caused by the contention of refcount tree lock since
      all nodes are unlink files using the same refcount tree.  When the
      unlinking file have many extents(over 1600 in our test), most of the
      extents has refcounted flag set.  In ocfs2_commit_truncate(), it will
      execute the following call trace for every extents.  This means it needs
      get and released refcount tree lock about 1600 times.  And when several
      nodes are do this at the same time, the performance will be very low.
      
        ocfs2_remove_btree_range()
        --  ocfs2_lock_refcount_tree()
        ----  ocfs2_refcount_lock()
        ------  __ocfs2_cluster_lock()
      
      ocfs2_refcount_lock() is costly, move it to ocfs2_commit_truncate() to
      do lock/unlock once can improve a lot performance.
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Wengang <wen.gang.wang@oracle.com>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f62f12b3
    • P
      fs/proc/meminfo.c: include cma info in proc/meminfo · 47f8f929
      Pintu Kumar 提交于
      This patch include CMA info (CMATotal, CMAFree) in /proc/meminfo.
      Currently, in a CMA enabled system, if somebody wants to know the total
      CMA size declared, there is no way to tell, other than the dmesg or
      /var/log/messages logs.
      
      With this patch we are showing the CMA info as part of meminfo, so that it
      can be determined at any point of time.  This will be populated only when
      CMA is enabled.
      
      Below is the sample output from a ARM based device with RAM:512MB and CMA:16MB.
      
        MemTotal:         471172 kB
        MemFree:          111712 kB
        MemAvailable:     271172 kB
        .
        .
        .
        CmaTotal:          16384 kB
        CmaFree:            6144 kB
      
      This patch also fix below checkpatch errors that were found during these changes.
      
        ERROR: space required after that ',' (ctx:ExV)
        199: FILE: fs/proc/meminfo.c:199:
        +       ,atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10)
                ^
      
        ERROR: space required after that ',' (ctx:ExV)
        202: FILE: fs/proc/meminfo.c:202:
        +       ,K(global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
                ^
      
        ERROR: space required after that ',' (ctx:ExV)
        206: FILE: fs/proc/meminfo.c:206:
        +       ,K(totalcma_pages)
                ^
      
        total: 3 errors, 0 warnings, 2 checks, 236 lines checked
      Signed-off-by: NPintu Kumar <pintu.k@samsung.com>
      Signed-off-by: NVishnu Pratap Singh <vishnu.ps@samsung.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      47f8f929
    • S
      hfsplus: fix longname handling · 89ac9b4d
      Sougata Santra 提交于
      Longname is not correctly handled by hfsplus driver.  If an attempt to
      create a longname(>255) file/directory is made, it succeeds by creating a
      file/directory with HFSPLUS_MAX_STRLEN and incorrect catalog key.  Thus
      leaving the volume in an inconsistent state.  This patch fixes this issue.
      
      Although lookup is always called first to create a negative entry, so just
      doing a check in lookup would probably fix this issue.  I choose to
      propagate error to other iops as well.
      
      Please NOTE: I have factored out hfsplus_cat_build_key_with_cnid from
      hfsplus_cat_build_key, to avoid unncessary branching.
      
      Thanks a lot.
      
        TEST:
        ------
        dir="TEST_DIR"
        cdir=`pwd`
        name255="_123456789_123456789_123456789_123456789_123456789_123456789\
        _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
        _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
        _123456789_123456789_123456789_123456789_123456789_1234"
        name256="${name255}5"
      
        mkdir $dir
        cd $dir
        touch $name255
        rm -f $name255
        touch $name256
        ls -la
        cd $cdir
        rm -rf $dir
      
        RESULT:
        -------
        [sougata@ultrabook tmp]$ cdir=`pwd`
        [sougata@ultrabook tmp]$
        name255="_123456789_123456789_123456789_123456789_123456789_123456789\
         > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
         > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
         > _123456789_123456789_123456789_123456789_123456789_1234"
        [sougata@ultrabook tmp]$ name256="${name255}5"
        [sougata@ultrabook tmp]$
        [sougata@ultrabook tmp]$ mkdir $dir
        [sougata@ultrabook tmp]$ cd $dir
        [sougata@ultrabook TEST_DIR]$ touch $name255
        [sougata@ultrabook TEST_DIR]$ rm -f $name255
        [sougata@ultrabook TEST_DIR]$ touch $name256
        [sougata@ultrabook TEST_DIR]$ ls -la
        ls: cannot access
        _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234:
        No such file or directory
        total 0
        drwxrwxr-x 1 sougata sougata 3 Feb 20 19:56 .
        drwxrwxrwx 1 root    root    6 Feb 20 19:56 ..
        -????????? ? ?       ?       ?            ?
        _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
        [sougata@ultrabook TEST_DIR]$ cd $cdir
        [sougata@ultrabook tmp]$ rm -rf $dir
        rm: cannot remove `TEST_DIR': Directory not empty
      
      -ENAMETOOLONG returned from hfsplus_asc2uni was not propaged to iops.
      This allowed hfsplus to create files/directories with HFSPLUS_MAX_STRLEN
      and incorrect keys, leaving the FS in an inconsistent state.  This patch
      fixes this issue.
      Signed-off-by: NSougata Santra <sougata@tuxera.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      89ac9b4d
    • E
      mnt: Fix a memory stomp in umount · c297abfd
      Eric W. Biederman 提交于
      While reviewing the code of umount_tree I realized that when we append
      to a preexisting unmounted list we do not change pprev of the former
      first item in the list.
      
      Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
      the former first item of the list will stomp unmounted.first leaving
      it set to some random mount point which we are likely to free soon.
      
      This isn't likely to hit, but if it does I don't know how anyone could
      track it down.
      
      [ This happened because we don't have all the same operations for
        hlist's as we do for normal doubly-linked lists. In particular,
        list_splice() is easy on our standard doubly-linked lists, while
        hlist_splice() doesn't exist and needs both start/end entries of the
        hlist.  And commit 38129a13 incorrectly open-coded that missing
        hlist_splice().
      
        We should think about making these kinds of "mindless" conversions
        easier to get right by adding the missing hlist helpers   - Linus ]
      
      Fixes: 38129a13 switch mnt_hash to hlist
      Cc: stable@vger.kernel.org
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c297abfd
  2. 18 12月, 2014 20 次提交
  3. 15 12月, 2014 1 次提交
    • J
      isofs: Fix infinite looping over CE entries · f54e18f1
      Jan Kara 提交于
      Rock Ridge extensions define so called Continuation Entries (CE) which
      define where is further space with Rock Ridge data. Corrupted isofs
      image can contain arbitrarily long chain of these, including a one
      containing loop and thus causing kernel to end in an infinite loop when
      traversing these entries.
      
      Limit the traversal to 32 entries which should be more than enough space
      to store all the Rock Ridge data.
      Reported-by: NP J P <ppandit@redhat.com>
      CC: stable@vger.kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      f54e18f1
  4. 14 12月, 2014 13 次提交
    • F
      aio: Skip timer for io_getevents if timeout=0 · 5f785de5
      Fam Zheng 提交于
      In this case, it is basically a polling. Let's not involve timer at all
      because that would hurt performance for application event loops.
      
      In an arbitrary test I've done, io_getevents syscall elapsed time
      reduces from 50000+ nanoseconds to a few hundereds.
      Signed-off-by: NFam Zheng <famz@redhat.com>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      5f785de5
    • P
      aio: Make it possible to remap aio ring · e4a0d3e7
      Pavel Emelyanov 提交于
      There are actually two issues this patch addresses. Let me start with
      the one I tried to solve in the beginning.
      
      So, in the checkpoint-restore project (criu) we try to dump tasks'
      state and restore one back exactly as it was. One of the tasks' state
      bits is rings set up with io_setup() call. There's (almost) no problems
      in dumping them, there's a problem restoring them -- if I dump a task
      with aio ring originally mapped at address A, I want to restore one
      back at exactly the same address A. Unfortunately, the io_setup() does
      not allow for that -- it mmaps the ring at whatever place mm finds
      appropriate (it calls do_mmap_pgoff() with zero address and without
      the MAP_FIXED flag).
      
      To make restore possible I'm going to mremap() the freshly created ring
      into the address A (under which it was seen before dump). The problem is
      that the ring's virtual address is passed back to the user-space as the
      context ID and this ID is then used as search key by all the other io_foo()
      calls. Reworking this ID to be just some integer doesn't seem to work, as
      this value is already used by libaio as a pointer using which this library
      accesses memory for aio meta-data.
      
      So, to make restore work we need to make sure that
      
      a) ring is mapped at desired virtual address
      b) kioctx->user_id matches this value
      
      Having said that, the patch makes mremap() on aio region update the
      kioctx's user_id and mmap_base values.
      
      Here appears the 2nd issue I mentioned in the beginning of this mail.
      If (regardless of the C/R dances I do) someone creates an io context
      with io_setup(), then mremap()-s the ring and then destroys the context,
      the kill_ioctx() routine will call munmap() on wrong (old) address.
      This will result in a) aio ring remaining in memory and b) some other
      vma get unexpectedly unmapped.
      
      What do you think?
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      e4a0d3e7
    • J
      fsnotify: remove destroy_list from fsnotify_mark · 37d469e7
      Jan Kara 提交于
      destroy_list is used to track marks which still need waiting for srcu
      period end before they can be freed.  However by the time mark is added to
      destroy_list it isn't in group's list of marks anymore and thus we can
      reuse fsnotify_mark->g_list for queueing into destroy_list.  This saves
      two pointers for each fsnotify_mark.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      37d469e7
    • J
      fsnotify: unify inode and mount marks handling · 0809ab69
      Jan Kara 提交于
      There's a lot of common code in inode and mount marks handling.  Factor it
      out to a common helper function.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0809ab69
    • H
      fallocate: create FAN_MODIFY and IN_MODIFY events · 820c12d5
      Heinrich Schuchardt 提交于
      The fanotify and the inotify API can be used to monitor changes of the
      file system.  System call fallocate() modifies files.  Hence it should
      trigger the corresponding fanotify (FAN_MODIFY) and inotify (IN_MODIFY)
      events.  The most interesting case is FALLOC_FL_COLLAPSE_RANGE because
      this value allows to create arbitrary file content from random data.
      
      This patch adds the missing call to fsnotify_modify().
      
      The FAN_MODIFY and IN_MODIFY event will be created when fallocate()
      succeeds.  It will even be created if the file length remains unchanged,
      e.g.  when calling fanotify with flag FALLOC_FL_KEEP_SIZE.
      
      This logic was primarily chosen to keep the coding simple.
      
      It resembles the logic of the write() system call.
      
      When we call write() we always create a FAN_MODIFY event, even in the case
      of overwriting with identical data.
      
      Events FAN_MODIFY and IN_MODIFY do not provide any guarantee that data was
      actually changed.
      
      Furthermore even if if the filesize remains unchanged, fallocate() may
      influence whether a subsequent write() will succeed and hence the
      fallocate() call may be considered a modification.
      
      The fallocate(2) man page teaches: After a successful call, subsequent
      writes into the range specified by offset and len are guaranteed not to
      fail because of lack of disk space.
      
      So calling fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len) may result in
      different outcomes of a subsequent write depending on the values of offset
      and len.
      Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      820c12d5
    • F
      fs/affs/file.c: remove obsolete pagesize check · 92cab82b
      Fabian Frederick 提交于
      linux kernel doesn't manage page sizes below 4kb.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      92cab82b
    • F
      fs/affs/file.c: add support to O_DIRECT · 9abb4083
      Fabian Frederick 提交于
      Based on ext2_direct_IO
      
      Tested with O_DIRECT file open and sysbench/mariadb with 1% written
      queries improvement (update_non_index test) on a volume created with
      mkaffs.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9abb4083
    • F
      fs/affs/amigaffs.c: use va_format instead of buffer/vnsprintf · 1ee54b09
      Fabian Frederick 提交于
      -Remove ErrorBuffer and use %pV
      
      -Add __printf to enable argument mistmatch warnings
      
      Original patch by Joe Perches.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1ee54b09
    • F
      fs/affs/file.c: forward declaration clean-up · 7633978b
      Fabian Frederick 提交于
      -Move file_operations to avoid forward declarations.
      
      -Remove unused declarations.
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7633978b
    • D
      syscalls: implement execveat() system call · 51f39a1f
      David Drysdale 提交于
      This patchset adds execveat(2) for x86, and is derived from Meredydd
      Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
      
      The primary aim of adding an execveat syscall is to allow an
      implementation of fexecve(3) that does not rely on the /proc filesystem,
      at least for executables (rather than scripts).  The current glibc version
      of fexecve(3) is implemented via /proc, which causes problems in sandboxed
      or otherwise restricted environments.
      
      Given the desire for a /proc-free fexecve() implementation, HPA suggested
      (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
      an appropriate generalization.
      
      Also, having a new syscall means that it can take a flags argument without
      back-compatibility concerns.  The current implementation just defines the
      AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
      added in future -- for example, flags for new namespaces (as suggested at
      https://lkml.org/lkml/2006/7/11/474).
      
      Related history:
       - https://lkml.org/lkml/2006/12/27/123 is an example of someone
         realizing that fexecve() is likely to fail in a chroot environment.
       - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
         documenting the /proc requirement of fexecve(3) in its manpage, to
         "prevent other people from wasting their time".
       - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
         problem where a process that did setuid() could not fexecve()
         because it no longer had access to /proc/self/fd; this has since
         been fixed.
      
      This patch (of 4):
      
      Add a new execveat(2) system call.  execveat() is to execve() as openat()
      is to open(): it takes a file descriptor that refers to a directory, and
      resolves the filename relative to that.
      
      In addition, if the filename is empty and AT_EMPTY_PATH is specified,
      execveat() executes the file to which the file descriptor refers.  This
      replicates the functionality of fexecve(), which is a system call in other
      UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
      so relies on /proc being mounted).
      
      The filename fed to the executed program as argv[0] (or the name of the
      script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
      (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
      reflecting how the executable was found.  This does however mean that
      execution of a script in a /proc-less environment won't work; also, script
      execution via an O_CLOEXEC file descriptor fails (as the file will not be
      accessible after exec).
      
      Based on patches by Meredydd Luff.
      Signed-off-by: NDavid Drysdale <drysdale@google.com>
      Cc: Meredydd Luff <meredydd@senatehouse.org>
      Cc: Shuah Khan <shuah.kh@samsung.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Rich Felker <dalias@aerifal.cx>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      51f39a1f
    • N
      fat: fix data past EOF resulting from fsx testsuite · c0ef0cc9
      Namjae Jeon 提交于
      When running FSX with direct I/O mode, fsx resulted in DATA past EOF issues.
      
        fsx ./file2 -Z -r 4096 -w 4096
        ...
        ..
        truncating to largest ever: 0x907c
        fallocating to largest ever: 0x11137
        truncating to largest ever: 0x2c6fe
        truncating to largest ever: 0x2cfdf
        fallocating to largest ever: 0x40000
        Mapped Read: non-zero data past EOF (0x18628) page offset 0x629 is 0x2a4e
        ...
        ..
      
      The reason being, it is doing a truncate down, but the zeroing does not
      happen on the last block boundary when offset is not aligned.  Even though
      it calls truncate_setsize()->truncate_inode_pages()->
      truncate_inode_pages_range() and considers the partial zeroout but it
      retrieves the page using find_lock_page() - which only looks the page in
      the cache.  So, zeroing out does not happen in case of direct IO.
      
      Make a truncate page based around block_truncate_page for FAT filesystem
      and invoke that helper to zerout in case the offset is not aligned with
      the blocksize.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NAmit Sahrawat <a.sahrawat@samsung.com>
      Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c0ef0cc9
    • J
      befs: remove dead code · f441ada0
      Jan Kara 提交于
      Coverity id: 1042674
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f441ada0
    • D
      fs, seq_file: fallback to vmalloc instead of oom kill processes · 5cec38ac
      David Rientjes 提交于
      Since commit 058504ed ("fs/seq_file: fallback to vmalloc allocation"),
      seq_buf_alloc() falls back to vmalloc() when the kmalloc() for contiguous
      memory fails.  This was done to address order-4 slab allocations for
      reading /proc/stat on large machines and noticed because
      PAGE_ALLOC_COSTLY_ORDER < 4, so there is no infinite loop in the page
      allocator when allocating new slab for such high-order allocations.
      
      Contiguous memory isn't necessary for caller of seq_buf_alloc(), however.
      Other GFP_KERNEL high-order allocations that are <=
      PAGE_ALLOC_COSTLY_ORDER will simply loop forever in the page allocator and
      oom kill processes as a result.
      
      We don't want to kill processes so that we can allocate contiguous memory
      in situations when contiguous memory isn't necessary.
      
      This patch does the kmalloc() allocation with __GFP_NORETRY for high-order
      allocations.  This still utilizes memory compaction and direct reclaim in
      the allocation path, the only difference is that it will fail immediately
      instead of oom kill processes when out of memory.
      
      [akpm@linux-foundation.org: add comment]
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5cec38ac