1. 20 9月, 2016 1 次提交
    • S
      mm: fix the page_swap_info() BUG_ON check · c8de641b
      Santosh Shilimkar 提交于
      Commit 62c230bc ("mm: add support for a filesystem to activate
      swap files and use direct_IO for writing swap pages") replaced the
      swap_aops dirty hook from __set_page_dirty_no_writeback() with
      swap_set_page_dirty().
      
      For normal cases without these special SWP flags code path falls back to
      __set_page_dirty_no_writeback() so the behaviour is expected to be the
      same as before.
      
      But swap_set_page_dirty() makes use of the page_swap_info() helper to
      get the swap_info_struct to check for the flags like SWP_FILE,
      SWP_BLKDEV etc as desired for those features.  This helper has
      BUG_ON(!PageSwapCache(page)) which is racy and safe only for the
      set_page_dirty_lock() path.
      
      For the set_page_dirty() path which is often needed for cases to be
      called from irq context, kswapd() can toggle the flag behind the back
      while the call is getting executed when system is low on memory and
      heavy swapping is ongoing.
      
      This ends up with undesired kernel panic.
      
      This patch just moves the check outside the helper to its users
      appropriately to fix kernel panic for the described path.  Couple of
      users of helpers already take care of SwapCache condition so I skipped
      them.
      
      Link: http://lkml.kernel.org/r/1473460718-31013-1-git-send-email-santosh.shilimkar@oracle.comSigned-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[4.7.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c8de641b
  2. 08 8月, 2016 1 次提交
  3. 29 7月, 2016 1 次提交
  4. 08 6月, 2016 2 次提交
  5. 02 5月, 2016 1 次提交
  6. 29 4月, 2016 1 次提交
    • M
      mm: call swap_slot_free_notify() with page lock held · b06bad17
      Minchan Kim 提交于
      Kyeongdon reported below error which is BUG_ON(!PageSwapCache(page)) in
      page_swap_info.  The reason is that page_endio in rw_page unlocks the
      page if read I/O is completed so we need to hold a PG_lock again to
      check PageSwapCache.  Otherwise, the page can be removed from swapcache.
      
        Kernel BUG at c00f9040 [verbose debug info unavailable]
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
        Modules linked in:
        CPU: 4 PID: 13446 Comm: RenderThread Tainted: G        W 3.10.84-g9f14aec-dirty #73
        task: c3b73200 ti: dd192000 task.ti: dd192000
        PC is at page_swap_info+0x10/0x2c
        LR is at swap_slot_free_notify+0x18/0x6c
        pc : [<c00f9040>]    lr : [<c00f5560>]    psr: 400f0113
        sp : dd193d78  ip : c2deb1e4  fp : da015180
        r10: 00000000  r9 : 000200da  r8 : c120fe08
        r7 : 00000000  r6 : 00000000  r5 : c249a6c0  r4 : = c249a6c0
        r3 : 00000000  r2 : 40080009  r1 : 200f0113  r0 : = c249a6c0
        ..<snip> ..
        Call Trace:
          page_swap_info+0x10/0x2c
          swap_slot_free_notify+0x18/0x6c
          swap_readpage+0x90/0x11c
          read_swap_cache_async+0x134/0x1ac
          swapin_readahead+0x70/0xb0
          handle_pte_fault+0x320/0x6fc
          handle_mm_fault+0xc0/0xf0
          do_page_fault+0x11c/0x36c
          do_DataAbort+0x34/0x118
      
      Fixes: 3f2b1a04 ("zram: revive swap_slot_free_notify")
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Tested-by: NKyeongdon Kim <kyeongdon.kim@lge.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b06bad17
  7. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  8. 23 3月, 2016 1 次提交
    • M
      zram: revive swap_slot_free_notify · 3f2b1a04
      Minchan Kim 提交于
      Commit b430e9d1 ("remove compressed copy from zram in-memory")
      applied swap_slot_free_notify call in *end_swap_bio_read* to remove
      duplicated memory between zram and memory.
      
      However, with the introduction of rw_page in zram: 8c7f0102 ("zram:
      implement rw_page operation of zram"), it became void because rw_page
      doesn't need bio.
      
      Memory footprint is really important in embedded platforms which have
      small memory, for example, 512M) recently because it could start to kill
      processes if memory footprint exceeds some threshold by LMK or some
      similar memory management modules.
      
      This patch restores the function for rw_page, thereby eliminating this
      duplication.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: karam.lee <karam.lee@lge.com>
      Cc: <sangseok.lee@lge.com>
      Cc: Chan Jeong <chan.jeong@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3f2b1a04
  9. 18 3月, 2016 1 次提交
  10. 14 8月, 2015 1 次提交
  11. 29 7月, 2015 1 次提交
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  12. 19 5月, 2015 1 次提交
  13. 12 4月, 2015 1 次提交
  14. 26 3月, 2015 1 次提交
  15. 13 3月, 2015 1 次提交
  16. 29 1月, 2015 1 次提交
  17. 15 6月, 2014 1 次提交
    • A
      fix __swap_writepage() compile failure on old gcc versions · 05064084
      Al Viro 提交于
      Tetsuo Handa wrote:
       "Commit 62a8067a ("bio_vec-backed iov_iter") introduced an unnamed
        union inside a struct which gcc-4.4.7 cannot handle.  Name the unnamed
         union as u in order to fix build failure"
      
      Let's do this instead: there is only one place in the entire tree that
      steps into this breakage.  Anon structs and unions work in older gcc
      versions; as the matter of fact, we have those in the tree - see e.g.
      struct ieee80211_tx_info in include/net/mac80211.h
      
      What doesn't work is handling their initializers:
      
      struct {
      	int a;
      	union {
      		int b;
      		char c;
      	};
      } x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}};
      
      is the obvious syntax for initializer, perfectly fine for C11 and
      handled correctly by gcc-4.7 or later.
      
      Earlier versions, though, break on it - declaration is fine and so's
      access to fields (i.e.  x[0].c = 'a'; would produce the right code), but
      members of the anon structs and unions are not inserted into the right
      namespace.  Tellingly, those older versions will not barf on struct {int
      a; struct {int a;};}; - looks like they just have it hacked up somewhere
      around the handling of .  and -> instead of doing the right thing.
      
      The easiest way to deal with that crap is to turn initialization of
      those fields (in the only place where we have such initializer of
      iov_iter) into plain assignment.
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05064084
  18. 05 6月, 2014 1 次提交
  19. 07 5月, 2014 3 次提交
    • A
      bio_vec-backed iov_iter · 62a8067a
      Al Viro 提交于
      New variant of iov_iter - ITER_BVEC in iter->type, backed with
      bio_vec array instead of iovec one.  Primitives taught to deal
      with such beasts, __swap_write() switched to using that kind
      of iov_iter.
      
      Note that bio_vec is just a <page, offset, length> triple - there's
      nothing block-specific about it.  I've left the definition where it
      was, but took it from under ifdef CONFIG_BLOCK.
      
      Next target: ->splice_write()...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      62a8067a
    • A
      start adding the tag to iov_iter · 71d8e532
      Al Viro 提交于
      For now, just use the same thing we pass to ->direct_IO() - it's all
      iovec-based at the moment.  Pass it explicitly to iov_iter_init() and
      account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
      uses.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      71d8e532
    • A
      pass iov_iter to ->direct_IO() · d8d3d94b
      Al Viro 提交于
      unmodified, for now
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d8d3d94b
  20. 24 1月, 2014 1 次提交
  21. 24 11月, 2013 1 次提交
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  22. 30 7月, 2013 1 次提交
    • K
      aio: Kill aio_rw_vect_retry() · 73a7075e
      Kent Overstreet 提交于
      This code doesn't serve any purpose anymore, since the aio retry
      infrastructure has been removed.
      
      This change should be safe because aio_read/write are also used for
      synchronous IO, and called from do_sync_read()/do_sync_write() - and
      there's no looping done in the sync case (the read and write syscalls).
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      73a7075e
  23. 04 7月, 2013 1 次提交
    • M
      mm: remove compressed copy from zram in-memory · b430e9d1
      Minchan Kim 提交于
      Swap subsystem does lazy swap slot free with expecting the page would be
      swapped out again so we can avoid unnecessary write.
      
      But the problem in in-memory swap(ex, zram) is that it consumes memory
      space until vm_swap_full(ie, used half of all of swap device) condition
      meet.  It could be bad if we use multiple swap device, small in-memory
      swap and big storage swap or in-memory swap alone.
      
      This patch makes swap subsystem free swap slot as soon as swap-read is
      completed and make the swapcache page dirty so the page should be
      written out the swap device to reclaim it.  It means we never lose it.
      
      I tested this patch with kernel compile workload.
      
      1. before
      
         compile time : 9882.42
         zram max wasted space by fragmentation: 13471881 byte
         memory space consumed by zram: 174227456 byte
         the number of slot free notify: 206684
      
      2. after
      
         compile time : 9653.90
         zram max wasted space by fragmentation: 11805932 byte
         memory space consumed by zram: 154001408 byte
         the number of slot free notify: 426972
      
      [akpm@linux-foundation.org: tweak comment text]
      [artem.savkov@gmail.com: fix BUG due to non-swapcache pages in end_swap_bio_read()]
      [akpm@linux-foundation.org: invert unlikely() test, augment comment, 80-col cleanup]
      Signed-off-by: NDan Magenheimer <dan.magenheimer@oracle.com>
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NArtem Savkov <artem.savkov@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
      Cc: Shaohua Li <shli@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b430e9d1
  24. 08 5月, 2013 1 次提交
  25. 30 4月, 2013 4 次提交
    • M
      mm: swap: mark swap pages writeback before queueing for direct IO · 0cdc444a
      Mel Gorman 提交于
      As pointed out by Andrew Morton, the swap-over-NFS writeback is not
      setting PageWriteback before it is queued for direct IO.  While swap
      pages do not participate in BDI or process dirty accounting and the IO
      is synchronous, the writeback bit is still required and not setting it
      in this case was an oversight.  swapoff depends on the page writeback to
      synchronoise all pending writes on a swap page before it is reused.
      Swapcache freeing and reuse depend on checking the PageWriteback under
      lock to ensure the page is safe to reuse.
      
      Direct IO handlers and the direct IO handler for NFS do not deal with
      PageWriteback as they are synchronous writes.  In the case of NFS, it
      schedules pages (or a page in the case of swap) for IO and then waits
      synchronously for IO to complete in nfs_direct_write().  It is
      recognised that this is a slowdown from normal swap handling which is
      asynchronous and uses a completion handler.  Shoving PageWriteback
      handling down into direct IO handlers looks like a bad fit to handle the
      swap case although it may have to be dealt with some day if swap is
      converted to use direct IO in general and bmap is finally done away
      with.  At that point it will be necessary to refit asynchronous direct
      IO with completion handlers onto the swap subsystem.
      
      As swapcache currently depends on PageWriteback to protect against
      races, this patch sets PageWriteback under the page lock before queueing
      it for direct IO.  It is cleared when the direct IO handler returns.  IO
      errors are treated similarly to the direct-to-bio case except PageError
      is not set as in the case of swap-over-NFS, it is likely to be a
      transient error.
      
      It was asked what prevents such a page being reclaimed in parallel.
      With this patch applied, such a page will now be skipped (most of the
      time) or blocked until the writeback completes.  Reclaim checks
      PageWriteback under the page lock before calling try_to_free_swap and
      the page lock should prevent the page being requeued for IO before it is
      freed.
      
      This and Jerome's related patch should considered for -stable as far
      back as 3.6 when swap-over-NFS was introduced.
      
      [akpm@linux-foundation.org: use pr_err_ratelimited()]
      [akpm@linux-foundation.org: remove hopefully-unneeded cast in printk]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>	[3.6+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0cdc444a
    • J
      swap: redirty page if page write fails on swap file · 2d30d31e
      Jerome Marchand 提交于
      Since commit 62c230bc ("mm: add support for a filesystem to activate
      swap files and use direct_IO for writing swap pages"), swap_writepage()
      calls direct_IO on swap files.  However, in that case the page isn't
      redirtied if I/O fails, and is therefore handled afterwards as if it has
      been successfully written to the swap file, leading to memory corruption
      when the page is eventually swapped back in.
      
      This patch sets the page dirty when direct_IO() fails.  It fixes a
      memory corruption that happened while using swap-over-NFS.
      Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>	[3.6+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2d30d31e
    • S
      mm: allow for outstanding swap writeback accounting · 1eec6702
      Seth Jennings 提交于
      To prevent flooding the swap device with writebacks, frontswap backends
      need to count and limit the number of outstanding writebacks.  The
      incrementing of the counter can be done before the call to
      __swap_writepage().  However, the caller must receive a notification
      when the writeback completes in order to decrement the counter.
      
      To achieve this functionality, this patch modifies __swap_writepage() to
      take the bio completion callback function as an argument.
      
      end_swap_bio_write(), the normal bio completion function, is also made
      non-static so that code doing the accounting can call it after the
      accounting is done.
      
      There should be no behavioural change to existing code.
      Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: NBob Liu <bob.liu@oracle.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NDan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1eec6702
    • S
      mm: break up swap_writepage() for frontswap backends · 2f772e6c
      Seth Jennings 提交于
      swap_writepage() is currently where frontswap hooks into the swap write
      path to capture pages with the frontswap_store() function.  However, if
      a frontswap backend wants to "resume" the writeback of a page to the
      swap device, it can't call swap_writepage() as the page will simply
      reenter the backend.
      
      This patch separates swap_writepage() into a top and bottom half, the
      bottom half named __swap_writepage() to allow a frontswap backend, like
      zswap, to resume writeback beyond the frontswap_store() hook.
      
      __add_to_swap_cache() is also made non-static so that the page for which
      writeback is to be resumed can be added to the swap cache.
      Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: NBob Liu <bob.liu@oracle.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NDan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f772e6c
  26. 24 3月, 2013 1 次提交
    • K
      block: Remove bi_idx references · 4f2ac93c
      Kent Overstreet 提交于
      For immutable bvecs, all bi_idx usage needs to be audited - so here
      we're removing all the unnecessary uses.
      
      Most of these are places where it was being initialized on a bio that
      was just allocated, a few others are conversions to standard macros.
      Signed-off-by: NKent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      4f2ac93c
  27. 01 8月, 2012 3 次提交
    • M
      mm: add support for direct_IO to highmem pages · 5a178119
      Mel Gorman 提交于
      The patch "mm: add support for a filesystem to activate swap files and use
      direct_IO for writing swap pages" added support for using direct_IO to
      write swap pages but it is insufficient for highmem pages.
      
      To support highmem pages, this patch kmaps() the page before calling the
      direct_IO() handler.  As direct_IO deals with virtual addresses an
      additional helper is necessary for get_kernel_pages() to lookup the struct
      page for a kmap virtual address.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Xiaotian Feng <dfeng@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a178119
    • M
      mm: swap: implement generic handler for swap_activate · a509bc1a
      Mel Gorman 提交于
      The version of swap_activate introduced is sufficient for swap-over-NFS
      but would not provide enough information to implement a generic handler.
      This patch shuffles things slightly to ensure the same information is
      available for aops->swap_activate() as is available to the core.
      
      No functionality change.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Xiaotian Feng <dfeng@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a509bc1a
    • M
      mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages · 62c230bc
      Mel Gorman 提交于
      Currently swapfiles are managed entirely by the core VM by using ->bmap to
      allocate space and write to the blocks directly.  This effectively ensures
      that the underlying blocks are allocated and avoids the need for the swap
      subsystem to locate what physical blocks store offsets within a file.
      
      If the swap subsystem is to use the filesystem information to locate the
      blocks, it is critical that information such as block groups, block
      bitmaps and the block descriptor table that map the swap file were
      resident in memory.  This patch adds address_space_operations that the VM
      can call when activating or deactivating swap backed by a file.
      
        int swap_activate(struct file *);
        int swap_deactivate(struct file *);
      
      The ->swap_activate() method is used to communicate to the file that the
      VM relies on it, and the address_space should take adequate measures such
      as reserving space in the underlying device, reserving memory for mempools
      and pinning information such as the block descriptor table in memory.  The
      ->swap_deactivate() method is called on sys_swapoff() if ->swap_activate()
      returned success.
      
      After a successful swapfile ->swap_activate, the swapfile is marked
      SWP_FILE and swapper_space.a_ops will proxy to
      sis->swap_file->f_mappings->a_ops using ->direct_io to write swapcache
      pages and ->readpage to read.
      
      It is perfectly possible that direct_IO be used to read the swap pages but
      it is an unnecessary complication.  Similarly, it is possible that
      ->writepage be used instead of direct_io to write the pages but filesystem
      developers have stated that calling writepage from the VM is undesirable
      for a variety of reasons and using direct_IO opens up the possibility of
      writing back batches of swap pages in the future.
      
      [a.p.zijlstra@chello.nl: Original patch]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Xiaotian Feng <dfeng@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      62c230bc
  28. 15 5月, 2012 2 次提交
    • K
      frontswap: s/put_page/store/g s/get_page/load · 165c8aed
      Konrad Rzeszutek Wilk 提交于
      Sounds so much more natural.
      Suggested-by: NAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      165c8aed
    • D
      mm: frontswap: core swap subsystem hooks and headers · 38b5faf4
      Dan Magenheimer 提交于
      This patch, 2of4, contains the changes to the core swap subsystem.
      This includes:
      
      (1) makes available core swap data structures (swap_lock, swap_list and
      swap_info) that are needed by frontswap.c but we don't need to expose them
      to the dozens of files that include swap.h so we create a new swapfile.h
      just to extern-ify these and modify their declarations to non-static
      
      (2) adds frontswap-related elements to swap_info_struct.  Frontswap_map
      points to vzalloc'ed one-bit-per-swap-page metadata that indicates
      whether the swap page is in frontswap or in the device and frontswap_pages
      counts how many pages are in frontswap.
      
      (3) adds hooks in the swap subsystem and extends try_to_unuse so that
      frontswap_shrink can do a "partial swapoff".
      
      Note that a failed frontswap_map allocation is safe... failure is noted
      by lack of "FS" in the subsequent printk.
      
      ---
      
      [v14: rebase to 3.4-rc2]
      [v10: no change]
      [v9: akpm@linux-foundation.org: mark some statics __read_mostly]
      [v9: akpm@linux-foundation.org: add clarifying comments]
      [v9: akpm@linux-foundation.org: no need to loop repeating try_to_unuse]
      [v9: error27@gmail.com: remove superfluous check for NULL]
      [v8: rebase to 3.0-rc4]
      [v8: kamezawa.hiroyu@jp.fujitsu.com: change counter to atomic_t to avoid races]
      [v8: kamezawa.hiroyu@jp.fujitsu.com: comment to clarify informational counters]
      [v7: rebase to 3.0-rc3]
      [v7: JBeulich@novell.com: add new swap struct elements only if config'd]
      [v6: rebase to 3.0-rc1]
      [v6: lliubbo@gmail.com: fix null pointer deref if vzalloc fails]
      [v6: konrad.wilk@oracl.com: various checks and code clarifications/comments]
      [v5: no change from v4]
      [v4: rebase to 2.6.39]
      Signed-off-by: NDan Magenheimer <dan.magenheimer@oracle.com>
      Reviewed-by: NKamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NJan Beulich <JBeulich@novell.com>
      Acked-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Rik Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      [v11: Rebased, fixed mm/swapfile.c context change]
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      38b5faf4
  29. 10 3月, 2011 1 次提交
    • J
      block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe 提交于
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      721a9602
  30. 08 8月, 2010 1 次提交
    • C
      block: unify flags for struct bio and struct request · 7b6d91da
      Christoph Hellwig 提交于
      Remove the current bio flags and reuse the request flags for the bio, too.
      This allows to more easily trace the type of I/O from the filesystem
      down to the block driver.  There were two flags in the bio that were
      missing in the requests:  BIO_RW_UNPLUG and BIO_RW_AHEAD.  Also I've
      renamed two request flags that had a superflous RW in them.
      
      Note that the flags are in bio.h despite having the REQ_ name - as
      blkdev.h includes bio.h that is the only way to go for now.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      7b6d91da
  31. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6