1. 04 Nov, 2017 1 commit
  2. 07 Sep, 2017 1 commit
    • H
      mm: test code to write THP to swap device as a whole · 225311a4
      Huang Ying committed
      To support delaying the split of a THP (Transparent Huge Page) until
      after it has been swapped out, we need to enhance the swap writing
      code so that it can write a THP as a whole.  This will improve swap
      write IO performance.
      
      As Ming Lei <ming.lei@redhat.com> pointed out, this should be based on
      multipage bvec support, which hasn't been merged yet.  So this patch is
      only for testing the functionality of the other patches in the series,
      and it will be reimplemented after multipage bvec support is merged.
      
      Link: http://lkml.kernel.org/r/20170724051840.2309-7-ying.huang@intel.com
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Vishal L Verma <vishal.l.verma@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      225311a4
  3. 24 Aug, 2017 1 commit
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig committed
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      74d46992
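The partition remapping described in commit 74d46992 can be pictured with a small user-space sketch. Everything here is an assumption made for illustration: `toy_disk`, `toy_partition`, `toy_bio` and `toy_remap()` are hypothetical names, not the kernel's structures or the code in generic_make_request. The sketch only shows the idea that a disk pointer plus a partition index is enough to translate a partition-relative sector into a whole-disk sector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's structures. */
struct toy_partition {
	uint64_t start_sector;  /* first sector of the partition on the disk */
	uint64_t nr_sectors;    /* partition length in sectors */
};

struct toy_disk {
	struct toy_partition part[4];  /* index 0 models the whole disk */
	size_t nr_parts;
};

struct toy_bio {
	struct toy_disk *disk;  /* models the gendisk pointer */
	unsigned int partno;    /* models the partition index */
	uint64_t sector;        /* sector relative to the partition */
};

/*
 * Remap a partition-relative bio to a whole-disk sector.
 * Returns 0 on success, -1 if the partition index or sector is out of range.
 */
static int toy_remap(struct toy_bio *bio)
{
	const struct toy_partition *p;

	assert(bio->disk != NULL);
	if (bio->partno >= bio->disk->nr_parts)
		return -1;
	p = &bio->disk->part[bio->partno];
	if (bio->sector >= p->nr_sectors)
		return -1;
	bio->sector += p->start_sector;
	bio->partno = 0;  /* the bio now addresses the whole disk */
	return 0;
}
```

With a disk whose partition 1 starts at sector 100, a bio for sector 5 of partition 1 would come out of `toy_remap()` addressing sector 105 of the whole disk.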
  4. 03 Aug, 2017 1 commit
    • T
      mm/page_io.c: fix oops during block io poll in swapin path · b0ba2d0f
      Tetsuo Handa committed
      When a thread is OOM-killed during swap_readpage() operation, an oops
      occurs because end_swap_bio_read() is calling wake_up_process() based on
      an assumption that the thread which called swap_readpage() is still
      alive.
      
        Out of memory: Kill process 525 (polkitd) score 0 or sacrifice child
        Killed process 525 (polkitd) total-vm:528128kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
        oom_reaper: reaped process 525 (polkitd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
        Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp ppdev pcspkr vmw_balloon sg shpchp vmw_vmci parport_pc parport i2c_piix4 ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi vmwgfx ahci libahci drm_kms_helper ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi scsi_transport_spi ttm e1000 mptscsih drm mptbase i2c_core libata serio_raw
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-rc2-next-20170725 #129
        Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
        task: ffffffffb7c16500 task.stack: ffffffffb7c00000
        RIP: 0010:__lock_acquire+0x151/0x12f0
        Call Trace:
         <IRQ>
         lock_acquire+0x59/0x80
         _raw_spin_lock_irqsave+0x3b/0x4f
         try_to_wake_up+0x3b/0x410
         wake_up_process+0x10/0x20
         end_swap_bio_read+0x6f/0xf0
         bio_endio+0x92/0xb0
         blk_update_request+0x88/0x270
         scsi_end_request+0x32/0x1c0
         scsi_io_completion+0x209/0x680
         scsi_finish_command+0xd4/0x120
         scsi_softirq_done+0x120/0x140
         __blk_mq_complete_request_remote+0xe/0x10
         flush_smp_call_function_queue+0x51/0x120
         generic_smp_call_function_single_interrupt+0xe/0x20
         smp_trace_call_function_single_interrupt+0x22/0x30
         smp_call_function_single_interrupt+0x9/0x10
         call_function_single_interrupt+0xa7/0xb0
         </IRQ>
        RIP: 0010:native_safe_halt+0x6/0x10
         default_idle+0xe/0x20
         arch_cpu_idle+0xa/0x10
         default_idle_call+0x1e/0x30
         do_idle+0x187/0x200
         cpu_startup_entry+0x6e/0x70
         rest_init+0xd0/0xe0
         start_kernel+0x456/0x477
         x86_64_start_reservations+0x24/0x26
         x86_64_start_kernel+0xf7/0x11a
         secondary_startup_64+0xa5/0xa5
        Code: c3 49 81 3f 20 9e 0b b8 41 bc 00 00 00 00 44 0f 45 e2 83 fe 01 0f 87 62 ff ff ff 89 f0 49 8b 44 c7 08 48 85 c0 0f 84 52 ff ff ff <f0> ff 80 98 01 00 00 8b 3d 5a 49 c4 01 45 8b b3 18 0c 00 00 85
        RIP: __lock_acquire+0x151/0x12f0 RSP: ffffa01f39e03c50
        ---[ end trace 6c441db499169b1e ]---
        Kernel panic - not syncing: Fatal exception in interrupt
        Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
        ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      Fix it by holding a reference to the thread.
      
      [akpm@linux-foundation.org: add comment]
      Fixes: 23955622 ("swap: add block io poll in swapin path")
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: Shaohua Li <shli@fb.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b0ba2d0f
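The fix in commit b0ba2d0f ("holding a reference to the thread") follows a general pattern: pin the waiter before handing it to an asynchronous completion path, and drop the pin only after the wakeup. The sketch below models that pattern in user space with a plain reference count. All names (`toy_task`, `toy_io`, `toy_submit()`, `toy_complete()`) are hypothetical and single-threaded for simplicity; the commit message does not state which kernel helpers the patch uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct toy_task {
	int refcount;  /* the object is freed when this reaches zero */
	int woken;     /* set by the completion path */
};

struct toy_io {
	struct toy_task *waiter;  /* task to wake when the read completes */
};

static struct toy_task *toy_task_new(void)
{
	struct toy_task *t = calloc(1, sizeof(*t));

	assert(t != NULL);
	t->refcount = 1;  /* reference owned by the task itself */
	return t;
}

static void toy_task_get(struct toy_task *t)
{
	t->refcount++;
}

/* Drop one reference; returns 1 if the object was freed. */
static int toy_task_put(struct toy_task *t)
{
	assert(t->refcount > 0);
	if (--t->refcount == 0) {
		free(t);
		return 1;
	}
	return 0;
}

/* Submission side: pin the waiter so it outlives the I/O. */
static void toy_submit(struct toy_io *io, struct toy_task *t)
{
	toy_task_get(t);
	io->waiter = t;
}

/* Completion side: wake the waiter, then drop the pin.
 * Returns 1 if this dropped the last reference. */
static int toy_complete(struct toy_io *io)
{
	struct toy_task *t = io->waiter;

	io->waiter = NULL;
	t->woken = 1;  /* safe: our reference keeps t alive */
	return toy_task_put(t);
}
```

Even if the task drops its own reference before the I/O finishes (the analogue of being OOM-killed), `toy_complete()` still touches valid memory because the I/O holds the second reference.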
  5. 11 Jul, 2017 1 commit
    • S
      swap: add block io poll in swapin path · 23955622
      Shaohua Li committed
      For fast flash disk, async IO could introduce overhead because of
      context switch.  block-mq now supports IO poll, which improves
      performance and latency a lot.  swapin is a good place to use this
      technique, because the task is waiting for the swapin page to continue
      execution.
      
      In my virtual machine, directly reading 4k of data from an NVMe device
      with iopoll is about 60% better than without polling.  With iopoll
      support in the swapin path, my microbenchmark (a task doing random
      memory writes) is about 10%~25% faster.  CPU utilization increases a
      lot though, to 2x and even 3x.  This will depend on disk speed.
      
      While iopoll in swapin isn't intended for all use cases, it's a win
      for latency-sensitive workloads with a high-speed swap disk.  The block
      layer has a knob to control polling at runtime.  If polling isn't enabled
      in the block layer, there should be no noticeable change in swapin.
      
      I got a chance to run the same test in a NVMe with DRAM as the media.
      In simple fio IO test, blkpoll boosts 50% performance in single thread
      test and ~20% in 8 threads test.  So this is the base line.  In above
      swap test, blkpoll boosts ~27% performance in single thread test.
      blkpoll uses 2x CPU time though.
      
      If we enable hybrid polling, the performance gain drops very slightly,
      but CPU time is only 50% worse than without blkpoll.  We can also
      adjust the hybrid poll parameter to reduce the CPU time penalty
      further.  In the 8-thread test, blkpoll doesn't help though.  The
      performance is similar to that without blkpoll, but CPU utilization is
      similar too.  There is lock contention in the swap path.  The CPU time
      spent on blkpoll isn't high.  So overall, blkpoll swapin isn't worse
      than swapin without it.
      
      The swapin readahead might read several pages in at the same time and
      form a big IO request.  Since the IO will take longer, it doesn't
      make sense to poll, so the patch only does iopoll for single page
      swapin.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/070c3c3e40b711e7b1390002c991e86a-b5408f0@7511894063d3764ff01ea8111f5a004d7dd700ed078797c204a24e620ddb965c
      Signed-off-by: Shaohua Li <shli@fb.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      23955622
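The policy in commit 23955622 (poll instead of sleeping, but only for single page swapin and only when polling is enabled) can be sketched as a small user-space model. This is an illustration under stated assumptions, not the patch itself: `toy_dev`, `toy_poll()`, `toy_should_poll()` and `toy_wait_for_read()` are hypothetical names, and the "device" is a counter that completes after a fixed number of polls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fake device: completes after a fixed number of polls. */
struct toy_dev {
	int ticks_left;  /* polls needed before the fake I/O completes */
	bool done;
};

/* One poll of the fake device; returns true once the I/O has completed. */
static bool toy_poll(struct toy_dev *dev)
{
	if (!dev->done && --dev->ticks_left <= 0)
		dev->done = true;
	return dev->done;
}

/* Poll only for single-page reads, and only when polling is enabled. */
static bool toy_should_poll(int nr_pages, bool poll_enabled)
{
	return poll_enabled && nr_pages == 1;
}

/*
 * Returns the number of polls spent busy-waiting, or -1 when the caller
 * should fall back to sleeping until the completion handler runs.
 */
static int toy_wait_for_read(struct toy_dev *dev, int nr_pages,
			     bool poll_enabled)
{
	int polls = 0;

	assert(nr_pages > 0);
	if (!toy_should_poll(nr_pages, poll_enabled))
		return -1;
	while (!dev->done) {
		toy_poll(dev);
		polls++;
	}
	return polls;
}
```

The busy loop is what trades CPU time for latency: a readahead request spanning several pages returns -1 here and would take the sleeping path, matching the reasoning in the commit message.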
  6. 09 Jun, 2017 1 commit
  7. 03 Nov, 2016 1 commit
  8. 08 Oct, 2016 1 commit
  9. 20 Sep, 2016 1 commit
    • S
      mm: fix the page_swap_info() BUG_ON check · c8de641b
      Santosh Shilimkar committed
      Commit 62c230bc ("mm: add support for a filesystem to activate
      swap files and use direct_IO for writing swap pages") replaced the
      swap_aops dirty hook from __set_page_dirty_no_writeback() with
      swap_set_page_dirty().
      
      For normal cases without these special SWP flags code path falls back to
      __set_page_dirty_no_writeback() so the behaviour is expected to be the
      same as before.
      
      But swap_set_page_dirty() makes use of the page_swap_info() helper to
      get the swap_info_struct to check for the flags like SWP_FILE,
      SWP_BLKDEV etc as desired for those features.  This helper has
      BUG_ON(!PageSwapCache(page)) which is racy and safe only for the
      set_page_dirty_lock() path.
      
      For the set_page_dirty() path which is often needed for cases to be
      called from irq context, kswapd() can toggle the flag behind the back
      while the call is getting executed when system is low on memory and
      heavy swapping is ongoing.
      
      This ends up with undesired kernel panic.
      
      This patch just moves the check outside the helper to its users
      appropriately to fix kernel panic for the described path.  Couple of
      users of helpers already take care of SwapCache condition so I skipped
      them.
      
      Link: http://lkml.kernel.org/r/1473460718-31013-1-git-send-email-santosh.shilimkar@oracle.com
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[4.7.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c8de641b
  10. 08 Aug, 2016 1 commit
  11. 29 Jul, 2016 1 commit
  12. 08 Jun, 2016 2 commits
  13. 02 May, 2016 1 commit
  14. 29 Apr, 2016 1 commit
    • M
      mm: call swap_slot_free_notify() with page lock held · b06bad17
      Minchan Kim committed
      Kyeongdon reported below error which is BUG_ON(!PageSwapCache(page)) in
      page_swap_info.  The reason is that page_endio in rw_page unlocks the
      page if read I/O is completed so we need to hold a PG_lock again to
      check PageSwapCache.  Otherwise, the page can be removed from swapcache.
      
        Kernel BUG at c00f9040 [verbose debug info unavailable]
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
        Modules linked in:
        CPU: 4 PID: 13446 Comm: RenderThread Tainted: G        W 3.10.84-g9f14aec-dirty #73
        task: c3b73200 ti: dd192000 task.ti: dd192000
        PC is at page_swap_info+0x10/0x2c
        LR is at swap_slot_free_notify+0x18/0x6c
        pc : [<c00f9040>]    lr : [<c00f5560>]    psr: 400f0113
        sp : dd193d78  ip : c2deb1e4  fp : da015180
        r10: 00000000  r9 : 000200da  r8 : c120fe08
        r7 : 00000000  r6 : 00000000  r5 : c249a6c0  r4 : c249a6c0
        r3 : 00000000  r2 : 40080009  r1 : 200f0113  r0 : c249a6c0
        ..<snip> ..
        Call Trace:
          page_swap_info+0x10/0x2c
          swap_slot_free_notify+0x18/0x6c
          swap_readpage+0x90/0x11c
          read_swap_cache_async+0x134/0x1ac
          swapin_readahead+0x70/0xb0
          handle_pte_fault+0x320/0x6fc
          handle_mm_fault+0xc0/0xf0
          do_page_fault+0x11c/0x36c
          do_DataAbort+0x34/0x118
      
      Fixes: 3f2b1a04 ("zram: revive swap_slot_free_notify")
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Tested-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b06bad17
  15. 05 Apr, 2016 1 commit
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  16. 23 Mar, 2016 1 commit
    • M
      zram: revive swap_slot_free_notify · 3f2b1a04
      Minchan Kim committed
      Commit b430e9d1 ("remove compressed copy from zram in-memory")
      applied swap_slot_free_notify call in *end_swap_bio_read* to remove
      duplicated memory between zram and memory.
      
      However, with the introduction of rw_page in zram: 8c7f0102 ("zram:
      implement rw_page operation of zram"), it became void because rw_page
      doesn't need bio.
      
      Memory footprint is really important on embedded platforms with
      small memory (for example, 512M), because LMK or a similar memory
      management module could start to kill processes if the memory footprint
      exceeds some threshold.
      
      This patch restores the function for rw_page, thereby eliminating this
      duplication.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: karam.lee <karam.lee@lge.com>
      Cc: <sangseok.lee@lge.com>
      Cc: Chan Jeong <chan.jeong@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3f2b1a04
  17. 18 Mar, 2016 1 commit
  18. 14 Aug, 2015 1 commit
  19. 29 Jul, 2015 1 commit
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig committed
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      4246a0b6
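The motivation in commit 4246a0b6, an errno stored in the bio itself so that it persists and can travel from child to parent in a chain, can be sketched with a miniature model. The structure and helpers below (`toy_bio`, `toy_bio_init()`, `toy_bio_endio()`) are hypothetical and written for user space; only the `error` field is meant to mirror the role of the `bi_error` field named in the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature bio; not the kernel's struct bio. */
struct toy_bio {
	int error;               /* 0 or a negative errno (models bi_error) */
	int remaining;           /* outstanding completions, including its own */
	int done;                /* set once the bio has fully completed */
	struct toy_bio *parent;  /* chained parent, or NULL */
};

/* Initialise a bio; chaining to a parent adds one pending completion there. */
static void toy_bio_init(struct toy_bio *bio, struct toy_bio *parent)
{
	bio->error = 0;
	bio->remaining = 1;
	bio->done = 0;
	bio->parent = parent;
	if (parent)
		parent->remaining++;
}

/*
 * Complete a bio.  The stored error stays in the bio, and the first error
 * seen is handed to the parent before the parent's own completion is
 * accounted for.
 */
static void toy_bio_endio(struct toy_bio *bio)
{
	while (bio) {
		struct toy_bio *parent = bio->parent;

		assert(bio->remaining > 0);
		if (--bio->remaining > 0)
			return;
		bio->done = 1;
		if (parent && bio->error && !parent->error)
			parent->error = bio->error;
		bio = parent;
	}
}
```

Because the error lives in the structure rather than in a callback argument or a single flag, a specific value such as -ENOSPC survives queueing and reaches whoever inspects the parent.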
  20. 19 May, 2015 1 commit
  21. 12 Apr, 2015 1 commit
  22. 26 Mar, 2015 1 commit
  23. 13 Mar, 2015 1 commit
  24. 29 Jan, 2015 1 commit
  25. 15 Jun, 2014 1 commit
    • A
      fix __swap_writepage() compile failure on old gcc versions · 05064084
      Al Viro committed
      Tetsuo Handa wrote:
       "Commit 62a8067a ("bio_vec-backed iov_iter") introduced an unnamed
        union inside a struct which gcc-4.4.7 cannot handle.  Name the unnamed
         union as u in order to fix build failure"
      
      Let's do this instead: there is only one place in the entire tree that
      steps into this breakage.  Anon structs and unions work in older gcc
      versions; as the matter of fact, we have those in the tree - see e.g.
      struct ieee80211_tx_info in include/net/mac80211.h
      
      What doesn't work is handling their initializers:
      
      struct {
      	int a;
      	union {
      		int b;
      		char c;
      	};
      } x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}};
      
      is the obvious syntax for initializer, perfectly fine for C11 and
      handled correctly by gcc-4.7 or later.
      
      Earlier versions, though, break on it - declaration is fine and so's
      access to fields (i.e.  x[0].c = 'a'; would produce the right code), but
      members of the anon structs and unions are not inserted into the right
      namespace.  Tellingly, those older versions will not barf on struct {int
      a; struct {int a;};}; - looks like they just have it hacked up somewhere
      around the handling of .  and -> instead of doing the right thing.
      
      The easiest way to deal with that crap is to turn initialization of
      those fields (in the only place where we have such initializer of
      iov_iter) into plain assignment.
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      05064084
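The workaround described in commit 05064084, replacing an initializer that names members of an anonymous union with plain assignment, looks roughly like the following. The struct here is a hypothetical stand-in (`toy_iter`), not the kernel's iov_iter; it only reuses the shape of the example in the commit message.

```c
#include <assert.h>

/* Hypothetical struct with an anonymous union, shaped like the example
 * in the commit message; not the kernel's iov_iter. */
struct toy_iter {
	int type;
	union {
		int count;
		char tag;
	};
};

/*
 * Build a toy_iter without naming an anonymous-union member in the
 * initializer.  The problematic form would be:
 *
 *     struct toy_iter it = { .type = type, .count = count };
 *
 * Here only the ordinary member is initialized, and the union member is
 * filled in by plain assignment, which the commit message says older
 * gcc versions handle correctly.
 */
static struct toy_iter toy_iter_make(int type, int count)
{
	struct toy_iter it = { .type = type };

	assert(type >= 0);
	it.count = count;
	return it;
}
```

The declaration and the field access are unchanged; only the place where the union member receives its value moves out of the initializer.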
  26. 05 Jun, 2014 1 commit
  27. 07 May, 2014 3 commits
    • A
      bio_vec-backed iov_iter · 62a8067a
      Al Viro committed
      New variant of iov_iter - ITER_BVEC in iter->type, backed with
      bio_vec array instead of iovec one.  Primitives taught to deal
      with such beasts, __swap_write() switched to using that kind
      of iov_iter.
      
      Note that bio_vec is just a <page, offset, length> triple - there's
      nothing block-specific about it.  I've left the definition where it
      was, but took it from under ifdef CONFIG_BLOCK.
      
      Next target: ->splice_write()...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      62a8067a
    • A
      start adding the tag to iov_iter · 71d8e532
      Al Viro committed
      For now, just use the same thing we pass to ->direct_IO() - it's all
      iovec-based at the moment.  Pass it explicitly to iov_iter_init() and
      account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO()
      uses.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      71d8e532
    • A
      pass iov_iter to ->direct_IO() · d8d3d94b
      Al Viro committed
      unmodified, for now
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d8d3d94b
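Commit 62a8067a above notes that a bio_vec is just a <page, offset, length> triple with nothing block-specific about it. A user-space sketch of walking an array of such triples might look like this. The names (`toy_bvec`, `toy_copy_from_bvecs()`) are hypothetical, and a plain byte pointer stands in for the page, so this is not the kernel's iov_iter code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical <page, offset, length> triple; a byte pointer models the page. */
struct toy_bvec {
	const unsigned char *page;
	size_t offset;  /* where the data starts inside the page */
	size_t len;     /* number of valid bytes from that offset */
};

/*
 * Copy at most n bytes out of an array of segments into dst, walking the
 * segments in order.  Returns the number of bytes copied.
 */
static size_t toy_copy_from_bvecs(unsigned char *dst, size_t n,
				  const struct toy_bvec *bv, size_t nr)
{
	size_t copied = 0;

	for (size_t i = 0; i < nr && copied < n; i++) {
		size_t chunk = bv[i].len;

		assert(bv[i].page != NULL);
		if (chunk > n - copied)
			chunk = n - copied;  /* last segment may be cut short */
		memcpy(dst + copied, bv[i].page + bv[i].offset, chunk);
		copied += chunk;
	}
	return copied;
}
```

Nothing in the loop depends on a block device, which is the point the commit message makes about taking the definition out from under CONFIG_BLOCK.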
  28. 24 Jan, 2014 1 commit
  29. 24 Nov, 2013 1 commit
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet committed
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
      4f024f37
  30. 30 Jul, 2013 1 commit
    • K
      aio: Kill aio_rw_vect_retry() · 73a7075e
      Kent Overstreet committed
      This code doesn't serve any purpose anymore, since the aio retry
      infrastructure has been removed.
      
      This change should be safe because aio_read/write are also used for
      synchronous IO, and called from do_sync_read()/do_sync_write() - and
      there's no looping done in the sync case (the read and write syscalls).
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      73a7075e
  31. 04 Jul, 2013 1 commit
    • M
      mm: remove compressed copy from zram in-memory · b430e9d1
      Minchan Kim committed
      Swap subsystem does lazy swap slot free with expecting the page would be
      swapped out again so we can avoid unnecessary write.
      
      But the problem in in-memory swap(ex, zram) is that it consumes memory
      space until vm_swap_full(ie, used half of all of swap device) condition
      meet.  It could be bad if we use multiple swap device, small in-memory
      swap and big storage swap or in-memory swap alone.
      
      This patch makes swap subsystem free swap slot as soon as swap-read is
      completed and make the swapcache page dirty so the page should be
      written out the swap device to reclaim it.  It means we never lose it.
      
      I tested this patch with kernel compile workload.
      
      1. before
      
         compile time : 9882.42
         zram max wasted space by fragmentation: 13471881 byte
         memory space consumed by zram: 174227456 byte
         the number of slot free notify: 206684
      
      2. after
      
         compile time : 9653.90
         zram max wasted space by fragmentation: 11805932 byte
         memory space consumed by zram: 154001408 byte
         the number of slot free notify: 426972
      
      [akpm@linux-foundation.org: tweak comment text]
      [artem.savkov@gmail.com: fix BUG due to non-swapcache pages in end_swap_bio_read()]
      [akpm@linux-foundation.org: invert unlikely() test, augment comment, 80-col cleanup]
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Artem Savkov <artem.savkov@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>
      Cc: Shaohua Li <shli@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b430e9d1
  32. 08 May, 2013 1 commit
  33. 30 Apr, 2013 4 commits
    • M
      mm: swap: mark swap pages writeback before queueing for direct IO · 0cdc444a
      Mel Gorman committed
      As pointed out by Andrew Morton, the swap-over-NFS writeback is not
      setting PageWriteback before it is queued for direct IO.  While swap
      pages do not participate in BDI or process dirty accounting and the IO
      is synchronous, the writeback bit is still required and not setting it
      in this case was an oversight.  swapoff depends on the page writeback to
      synchronise all pending writes on a swap page before it is reused.
      Swapcache freeing and reuse depend on checking the PageWriteback under
      lock to ensure the page is safe to reuse.
      
      Direct IO handlers and the direct IO handler for NFS do not deal with
      PageWriteback as they are synchronous writes.  In the case of NFS, it
      schedules pages (or a page in the case of swap) for IO and then waits
      synchronously for IO to complete in nfs_direct_write().  It is
      recognised that this is a slowdown from normal swap handling which is
      asynchronous and uses a completion handler.  Shoving PageWriteback
      handling down into direct IO handlers looks like a bad fit to handle the
      swap case although it may have to be dealt with some day if swap is
      converted to use direct IO in general and bmap is finally done away
      with.  At that point it will be necessary to refit asynchronous direct
      IO with completion handlers onto the swap subsystem.
      
      As swapcache currently depends on PageWriteback to protect against
      races, this patch sets PageWriteback under the page lock before queueing
      it for direct IO.  It is cleared when the direct IO handler returns.  IO
      errors are treated similarly to the direct-to-bio case except PageError
      is not set as in the case of swap-over-NFS, it is likely to be a
      transient error.
      
      It was asked what prevents such a page being reclaimed in parallel.
      With this patch applied, such a page will now be skipped (most of the
      time) or blocked until the writeback completes.  Reclaim checks
      PageWriteback under the page lock before calling try_to_free_swap and
      the page lock should prevent the page being requeued for IO before it is
      freed.
      
      This and Jerome's related patch should be considered for -stable as far
      back as 3.6 when swap-over-NFS was introduced.
      
      [akpm@linux-foundation.org: use pr_err_ratelimited()]
      [akpm@linux-foundation.org: remove hopefully-unneeded cast in printk]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>	[3.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      swap: redirty page if page write fails on swap file · 2d30d31e
      Jerome Marchand committed
      Since commit 62c230bc ("mm: add support for a filesystem to activate
      swap files and use direct_IO for writing swap pages"), swap_writepage()
      calls direct_IO on swap files.  However, in that case the page isn't
      redirtied if I/O fails, and is therefore handled afterwards as if it has
      been successfully written to the swap file, leading to memory corruption
      when the page is eventually swapped back in.
      
      This patch sets the page dirty when direct_IO() fails.  It fixes a
      memory corruption that happened while using swap-over-NFS.
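
      A minimal sketch of why the redirty matters, written as a userspace
      model rather than kernel code.  struct toy_page, toy_finish_swap_write()
      and toy_can_discard() are hypothetical names; the model assumes the
      caller cleared the dirty flag before starting the write, and that
      reclaim discards any page that is clean afterwards.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096L

/* Toy stand-in for struct page: only the dirty flag is modelled. */
struct toy_page {
    bool dirty;     /* true: memory holds data that is not safely on swap */
};

/*
 * Toy version of the swap-file branch of the write path.  `written` is
 * whatever a (hypothetical) direct-IO call returned: the byte count on
 * success, or a negative error code.
 */
int toy_finish_swap_write(struct toy_page *page, long written)
{
    assert(!page->dirty);       /* model assumption: cleared before the IO */

    if (written == TOY_PAGE_SIZE)
        return 0;               /* the data reached the swap file */

    page->dirty = true;         /* the fix: memory is still the only copy */
    return written < 0 ? (int)written : -5;  /* short write treated as -EIO */
}

/* Reclaim may only discard a page whose contents are safely on swap. */
bool toy_can_discard(const struct toy_page *page)
{
    return !page->dirty;
}
```

      Without the page->dirty = true line, a failed write would leave the
      page looking clean, toy_can_discard() would return true, and the only
      valid copy of the data would be thrown away.
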
      Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>	[3.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • S
      mm: allow for outstanding swap writeback accounting · 1eec6702
      Seth Jennings committed
      To prevent flooding the swap device with writebacks, frontswap backends
      need to count and limit the number of outstanding writebacks.  The
      incrementing of the counter can be done before the call to
      __swap_writepage().  However, the caller must receive a notification
      when the writeback completes in order to decrement the counter.
      
      To achieve this functionality, this patch modifies __swap_writepage() to
      take the bio completion callback function as an argument.
      
      end_swap_bio_write(), the normal bio completion function, is also made
      non-static so that code doing the accounting can call it after the
      accounting is done.
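
      The callback arrangement can be sketched in userspace as follows.  This
      is an illustration under stated assumptions, not the kernel interface:
      struct sk_bio, the sk_* functions, the limit of two outstanding writes
      and the explicit sk_complete_io() call that stands in for IO completion
      are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct sk_bio;
typedef void (*sk_end_io_fn)(struct sk_bio *bio, int err);

/* Toy stand-in for a bio: remembers its completion callback. */
struct sk_bio {
    sk_end_io_fn end_io;
    bool done;
    int err;
};

enum { SK_MAX_OUTSTANDING = 2 };    /* arbitrary limit for the sketch */
int sk_outstanding;                 /* writebacks submitted, not yet done */

/* Plays the role of the normal completion function. */
void sk_end_swap_bio_write(struct sk_bio *bio, int err)
{
    bio->err = err;
    bio->done = true;
}

/* Plays the role of the bottom half: the caller picks the callback. */
int sk_swap_writepage_bottom(struct sk_bio *bio, sk_end_io_fn end_write_func)
{
    bio->done = false;
    bio->end_io = end_write_func;
    return 0;                       /* IO is now "in flight" */
}

/* Stands in for the block layer signalling that the IO has finished. */
void sk_complete_io(struct sk_bio *bio, int err)
{
    bio->end_io(bio, err);
}

/* Backend callback: do the accounting, then chain to the normal handler. */
void sk_backend_end_write(struct sk_bio *bio, int err)
{
    assert(sk_outstanding > 0);
    sk_outstanding--;
    sk_end_swap_bio_write(bio, err);
}

/* Backend submission path: count first, refuse when over the limit. */
int sk_backend_writeback(struct sk_bio *bio)
{
    if (sk_outstanding >= SK_MAX_OUTSTANDING)
        return -11;                 /* stands in for -EAGAIN */
    sk_outstanding++;
    return sk_swap_writepage_bottom(bio, sk_backend_end_write);
}
```

      Existing callers would pass the normal completion function directly,
      which is why the change is not expected to alter their behaviour.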
      
      There should be no behavioural change to existing code.
      Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • S
      mm: break up swap_writepage() for frontswap backends · 2f772e6c
      Seth Jennings committed
      swap_writepage() is currently where frontswap hooks into the swap write
      path to capture pages with the frontswap_store() function.  However, if
      a frontswap backend wants to "resume" the writeback of a page to the
      swap device, it can't call swap_writepage() as the page will simply
      reenter the backend.
      
      This patch separates swap_writepage() into a top and bottom half, the
      bottom half named __swap_writepage() to allow a frontswap backend, like
      zswap, to resume writeback beyond the frontswap_store() hook.
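
      The split can be illustrated with a small userspace model.  The names
      below (struct fs_page, the fs_* functions, fs_backend_has_room) are
      hypothetical stand-ins, and the backend's accept/reject policy is
      invented for the sketch; only the top-half/bottom-half shape is taken
      from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: records where its contents currently live. */
struct fs_page {
    bool in_backend;    /* captured by the frontswap backend */
    bool on_device;     /* written to the swap device */
};

bool fs_backend_has_room = true;    /* invented accept/reject policy */

/* Stands in for the store hook: 0 means the backend took the page. */
int fs_frontswap_store(struct fs_page *page)
{
    if (!fs_backend_has_room)
        return -1;
    page->in_backend = true;
    return 0;
}

/* Bottom half: goes to the swap device without consulting the backend. */
int fs_swap_writepage_bottom(struct fs_page *page)
{
    page->on_device = true;
    return 0;
}

/* Top half: offer the page to the backend first, fall through otherwise. */
int fs_swap_writepage(struct fs_page *page)
{
    if (fs_frontswap_store(page) == 0)
        return 0;
    return fs_swap_writepage_bottom(page);
}

/* Backend resuming writeback: must use the bottom half, not the top. */
int fs_backend_resume_writeback(struct fs_page *page)
{
    assert(page->in_backend);
    page->in_backend = false;
    return fs_swap_writepage_bottom(page);
}
```

      Calling fs_swap_writepage() from the backend would hand the page
      straight back to fs_frontswap_store(); calling the bottom half is what
      lets the page actually reach the device.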
      
      __add_to_swap_cache() is also made non-static so that the page for which
      writeback is to be resumed can be added to the swap cache.
      Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  34. 24 Mar, 2013 1 commit
    • K
      block: Remove bi_idx references · 4f2ac93c
      Kent Overstreet committed
      For immutable bvecs, all bi_idx usage needs to be audited - so here
      we're removing all the unnecessary uses.
      
      Most of these are places where it was being initialized on a bio that
      was just allocated, a few others are conversions to standard macros.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>