1. 27 9月, 2016 1 次提交
  2. 01 9月, 2016 1 次提交
  3. 04 8月, 2016 1 次提交
  4. 03 8月, 2016 2 次提交
  5. 01 8月, 2016 1 次提交
  6. 27 7月, 2016 4 次提交
    • K
    • K
      mm: introduce fault_env · bae473a4
      Kirill A. Shutemov 提交于
      The idea borrowed from Peter's patch from patchset on speculative page
      faults[1]:
      
      Instead of passing around the endless list of function arguments,
      replace the lot with a single structure so we can change context without
      endless function signature changes.
      
      The changes are mostly mechanical with exception of faultaround code:
      filemap_map_pages() got reworked a bit.
      
      This patch is preparation for the next one.
      
      [1] http://lkml.kernel.org/r/20141020222841.302891540@infradead.org
      
      Link: http://lkml.kernel.org/r/1466021202-61880-9-git-send-email-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bae473a4
    • M
      mm: migrate: support non-lru movable page migration · bda807d4
      Minchan Kim 提交于
      We have allowed migration for only LRU pages until now and it was enough
      to make high-order pages.  But recently, embedded system(e.g., webOS,
      android) uses lots of non-movable pages(e.g., zram, GPU memory) so we
      have seen several reports about troubles of small high-order allocation.
      For fixing the problem, there were several efforts (e,g,.  enhance
      compaction algorithm, SLUB fallback to 0-order page, reserved memory,
      vmalloc and so on) but if there are lots of non-movable pages in system,
      their solutions are void in the long run.
      
      So, this patch is to support facility to change non-movable pages with
      movable.  For the feature, this patch introduces functions related to
      migration to address_space_operations as well as some page flags.
      
      If a driver want to make own pages movable, it should define three
      functions which are function pointers of struct
      address_space_operations.
      
      1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
      
      What VM expects on isolate_page function of driver is to return *true*
      if driver isolates page successfully.  On returing true, VM marks the
      page as PG_isolated so concurrent isolation in several CPUs skip the
      page for isolation.  If a driver cannot isolate the page, it should
      return *false*.
      
      Once page is successfully isolated, VM uses page.lru fields so driver
      shouldn't expect to preserve values in that fields.
      
      2. int (*migratepage) (struct address_space *mapping,
      		struct page *newpage, struct page *oldpage, enum migrate_mode);
      
      After isolation, VM calls migratepage of driver with isolated page.  The
      function of migratepage is to move content of the old page to new page
      and set up fields of struct page newpage.  Keep in mind that you should
      indicate to the VM the oldpage is no longer movable via
      __ClearPageMovable() under page_lock if you migrated the oldpage
      successfully and returns 0.  If driver cannot migrate the page at the
      moment, driver can return -EAGAIN.  On -EAGAIN, VM will retry page
      migration in a short time because VM interprets -EAGAIN as "temporal
      migration failure".  On returning any error except -EAGAIN, VM will give
      up the page migration without retrying in this time.
      
      Driver shouldn't touch page.lru field VM using in the functions.
      
      3. void (*putback_page)(struct page *);
      
      If migration fails on isolated page, VM should return the isolated page
      to the driver so VM calls driver's putback_page with migration failed
      page.  In this function, driver should put the isolated page back to the
      own data structure.
      
      4. non-lru movable page flags
      
      There are two page flags for supporting non-lru movable page.
      
      * PG_movable
      
      Driver should use the below function to make page movable under
      page_lock.
      
      	void __SetPageMovable(struct page *page, struct address_space *mapping)
      
      It needs argument of address_space for registering migration family
      functions which will be called by VM.  Exactly speaking, PG_movable is
      not a real flag of struct page.  Rather than, VM reuses page->mapping's
      lower bits to represent it.
      
      	#define PAGE_MAPPING_MOVABLE 0x2
      	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
      
      so driver shouldn't access page->mapping directly.  Instead, driver
      should use page_mapping which mask off the low two bits of page->mapping
      so it can get right struct address_space.
      
      For testing of non-lru movable page, VM supports __PageMovable function.
      However, it doesn't guarantee to identify non-lru movable page because
      page->mapping field is unified with other variables in struct page.  As
      well, if driver releases the page after isolation by VM, page->mapping
      doesn't have stable value although it has PAGE_MAPPING_MOVABLE (Look at
      __ClearPageMovable).  But __PageMovable is cheap to catch whether page
      is LRU or non-lru movable once the page has been isolated.  Because LRU
      pages never can have PAGE_MAPPING_MOVABLE in page->mapping.  It is also
      good for just peeking to test non-lru movable pages before more
      expensive checking with lock_page in pfn scanning to select victim.
      
      For guaranteeing non-lru movable page, VM provides PageMovable function.
      Unlike __PageMovable, PageMovable functions validates page->mapping and
      mapping->a_ops->isolate_page under lock_page.  The lock_page prevents
      sudden destroying of page->mapping.
      
      Driver using __SetPageMovable should clear the flag via
      __ClearMovablePage under page_lock before the releasing the page.
      
      * PG_isolated
      
      To prevent concurrent isolation among several CPUs, VM marks isolated
      page as PG_isolated under lock_page.  So if a CPU encounters PG_isolated
      non-lru movable page, it can skip it.  Driver doesn't need to manipulate
      the flag because VM will set/clear it automatically.  Keep in mind that
      if driver sees PG_isolated page, it means the page have been isolated by
      VM so it shouldn't touch page.lru field.  PG_isolated is alias with
      PG_reclaim flag so driver shouldn't use the flag for own purpose.
      
      [opensource.ganesh@gmail.com: mm/compaction: remove local variable is_lru]
        Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test
      Link: http://lkml.kernel.org/r/1464736881-24886-3-git-send-email-minchan@kernel.orgSigned-off-by: NGioh Kim <gi-oh.kim@profitbricks.com>
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NGanesh Mahendran <opensource.ganesh@gmail.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: John Einar Reitan <john.reitan@foss.arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bda807d4
    • R
      dax: some small updates to dax.txt documentation · 221c7dc8
      Ross Zwisler 提交于
      These are originally from Matthew Wilcox and were part of his huge
      "mm,fs,dax: Change ->pmd_fault to ->huge_fault" patch that was part of
      PUD support.
      
      I'm breaking these small changes out as they stand on their own and add
      useful information to Documentation/filesystems/dax.txt.
      
      Link: http://lkml.kernel.org/r/20160714214049.20075-1-ross.zwisler@linux.intel.comSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      221c7dc8
  7. 25 7月, 2016 1 次提交
  8. 13 7月, 2016 1 次提交
    • D
      pmem: kill __pmem address space · 7a9eb206
      Dan Williams 提交于
      The __pmem address space was meant to annotate codepaths that touch
      persistent memory and need to coordinate a call to wmb_pmem().  Now that
      wmb_pmem() is gone, there is little need to keep this annotation.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      7a9eb206
  9. 10 7月, 2016 1 次提交
  10. 09 7月, 2016 1 次提交
  11. 02 7月, 2016 1 次提交
  12. 01 7月, 2016 1 次提交
  13. 30 6月, 2016 2 次提交
  14. 14 6月, 2016 1 次提交
  15. 06 6月, 2016 1 次提交
    • E
      devpts: Make each mount of devpts an independent filesystem. · eedf265a
      Eric W. Biederman 提交于
      The /dev/ptmx device node is changed to lookup the directory entry "pts"
      in the same directory as the /dev/ptmx device node was opened in.  If
      there is a "pts" entry and that entry is a devpts filesystem /dev/ptmx
      uses that filesystem.  Otherwise the open of /dev/ptmx fails.
      
      The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that
      userspace can now safely depend on each mount of devpts creating a new
      instance of the filesystem.
      
      Each mount of devpts is now a separate and equal filesystem.
      
      Reserved ttys are now available to all instances of devpts where the
      mounter is in the initial mount namespace.
      
      A new vfs helper path_pts is introduced that finds a directory entry
      named "pts" in the directory of the passed in path, and changes the
      passed in path to point to it.  The helper path_pts uses a function
      path_parent_directory that was factored out of follow_dotdot.
      
      In the implementation of devpts:
       - devpts_mnt is killed as it is no longer meaningful if all mounts of
         devpts are equal.
       - pts_sb_from_inode is replaced by just inode->i_sb as all cached
         inodes in the tty layer are now from the devpts filesystem.
       - devpts_add_ref is rolled into the new function devpts_ptmx.  And the
         unnecessary inode hold is removed.
       - devpts_del_ref is renamed devpts_release and reduced to just a
         deacrivate_super.
       - The newinstance mount option continues to be accepted but is now
         ignored.
      
      In devpts_fs.h definitions for when !CONFIG_UNIX98_PTYS are removed as
      they are never used.
      
      Documentation/filesystems/devices.txt is updated to describe the current
      situation.
      
      This has been verified to work properly on openwrt-15.05, centos5,
      centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3,
      ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1,
      slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01.  With the
      caveat that on centos6 and on slackware-14.1 that there wind up being
      two instances of the devpts filesystem mounted on /dev/pts, the lower
      copy does not end up getting used.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Aurelien Jarno <aurelien@aurel32.net>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Jann Horn <jann@thejh.net>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Florian Weimer <fw@deneb.enyo.de>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eedf265a
  16. 28 5月, 2016 1 次提交
  17. 27 5月, 2016 1 次提交
  18. 26 5月, 2016 1 次提交
  19. 24 5月, 2016 1 次提交
  20. 21 5月, 2016 1 次提交
  21. 19 5月, 2016 1 次提交
  22. 03 5月, 2016 4 次提交
    • A
      9cf843e3
    • A
      introduce a parallel variant of ->iterate() · 61922694
      Al Viro 提交于
      New method: ->iterate_shared().  Same arguments as in ->iterate(),
      called with the directory locked only shared.  Once all filesystems
      switch, the old one will be gone.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      61922694
    • A
      parallel lookups: actual switch to rwsem · 9902af79
      Al Viro 提交于
      ta-da!
      
      The main issue is the lack of down_write_killable(), so the places
      like readdir.c switched to plain inode_lock(); once killable
      variants of rwsem primitives appear, that'll be dealt with.
      
      lockdep side also might need more work
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9902af79
    • A
      parallel lookups machinery, part 2 · 84e710da
      Al Viro 提交于
      We'll need to verify that there's neither a hashed nor in-lookup
      dentry with desired parent/name before adding to in-lookup set.
      
      One possible solution would be to hold the parent's ->d_lock through
      both checks, but while the in-lookup set is relatively small at any
      time, dcache is not.  And holding the parent's ->d_lock through
      something like __d_lookup_rcu() would suck too badly.
      
      So we leave the parent's ->d_lock alone, which means that we watch
      out for the following scenario:
      	* we verify that there's no hashed match
      	* existing in-lookup match gets hashed by another process
      	* we verify that there's no in-lookup matches and decide
      that everything's fine.
      
      Solution: per-directory kinda-sorta seqlock, bumped around the times
      we hash something that used to be in-lookup or move (and hash)
      something in place of in-lookup.  Then the above would turn into
      	* read the counter
      	* do dcache lookup
      	* if no matches found, check for in-lookup matches
      	* if there had been none of those either, check if the
      counter has changed; repeat if it has.
      
      The "kinda-sorta" part is due to the fact that we don't have much spare
      space in inode.  There is a spare word (shared with i_bdev/i_cdev/i_pipe),
      so the counter part is not a problem, but spinlock is a different story.
      
      We could use the parent's ->d_lock, and it would be less painful in
      terms of contention, for __d_add() it would be rather inconvenient to
      grab; we could do that (using lock_parent()), but...
      
      Fortunately, we can get serialization on the counter itself, and it
      might be a good idea in general; we can use cmpxchg() in a loop to
      get from even to odd and smp_store_release() from odd to even.
      
      This commit adds the counter and updating logics; the readers will be
      added in the next commit.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      84e710da
  23. 02 5月, 2016 1 次提交
  24. 28 4月, 2016 1 次提交
  25. 11 4月, 2016 1 次提交
  26. 05 4月, 2016 1 次提交
  27. 23 3月, 2016 2 次提交
    • M
      fat: add config option to set UTF-8 mount option by default · 38739380
      Maciej S. Szmigiero 提交于
      FAT has long supported its own default file name encoding config
      setting, separate from CONFIG_NLS_DEFAULT.
      
      However, if UTF-8 encoded file names are desired FAT character set
      should not be set to utf8 since this would make file names case
      sensitive even if case insensitive matching is requested.  Instead,
      "utf8" mount options should be provided to enable UTF-8 file names in
      FAT file system.
      
      Unfortunately, there was no possibility to set the default value of this
      option so on UTF-8 system "utf8" mount option had to be added manually
      to most FAT mounts.
      
      This patch adds config option to set such default value.
      Signed-off-by: NMaciej S. Szmigiero <mail@maciej.szmigiero.name>
      Acked-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      38739380
    • G
      ocfs2: add feature document for online file check · d750c42a
      Gang He 提交于
      This document will describe OCFS2 online file check feature.  OCFS2 is
      often used in high-availaibility systems.  However, OCFS2 usually
      converts the filesystem to read-only when encounters an error.  This may
      not be necessary, since turning the filesystem read-only would affect
      other running processes as well, decreasing availability.
      
      Then, a mount option (errors=continue) is introduced, which would return
      the -EIO errno to the calling process and terminate furhter processing
      so that the filesystem is not corrupted further.  The filesystem is not
      converted to read-only, and the problematic file's inode number is
      reported in the kernel log.  The user can try to check/fix this file via
      online filecheck feature.
      Signed-off-by: NGang He <ghe@suse.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d750c42a
  28. 18 3月, 2016 2 次提交
    • C
      nfsd: add SCSI layout support · f99d4fbd
      Christoph Hellwig 提交于
      This is a simple extension to the block layout driver to use SCSI
      persistent reservations for access control and fencing, as well as
      SCSI VPD pages for device identification.
      
      For this we need to pass the nfs4_client to the proc_getdeviceinfo method
      to generate the reservation key, and add a new fence_client method
      to allow for fence actions in the layout driver.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      f99d4fbd
    • J
      proc: add /proc/<pid>/timerslack_ns interface · 5de23d43
      John Stultz 提交于
      This patch provides a proc/PID/timerslack_ns interface which exposes a
      task's timerslack value in nanoseconds and allows it to be changed.
      
      This allows power/performance management software to set timer slack for
      other threads according to its policy for the thread (such as when the
      thread is designated foreground vs.  background activity)
      
      If the value written is non-zero, slack is set to that value.  Otherwise
      sets it to the default for the thread.
      
      This interface checks that the calling task has permissions to to use
      PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we can ensure
      arbitrary apps do not change the timer slack for other apps.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5de23d43
  29. 12 3月, 2016 1 次提交
  30. 10 3月, 2016 1 次提交