1. 28 Sep, 2017 1 commit
  2. 25 Sep, 2017 1 commit
    • fs: Fix page cache inconsistency when mixing buffered and AIO DIO · 332391a9
      Lukas Czerner committed
      Currently when mixing buffered reads and asynchronous direct writes it
      is possible to end up in a situation where we have stale data in the
      page cache while the new data is already written to disk. This
      persists until the affected pages are flushed away. Although mixing
      buffered and direct IO is ill-advised, it does pose a threat to data
      integrity, is unexpected, and should be fixed.
      
      Fix this by deferring completion of asynchronous direct writes to a
      process context when the inode has mapped pages. Then, before
      completing in dio_complete(), invalidate the pages in question. This
      ensures that after the completion the pages in the written area are
      either unmapped, or populated with up-to-date data. Also do the same
      for the iomap case, which uses iomap_dio_complete() instead.
      
      This has the side effect of deferring the completion to a process context
      for every AIO DIO that happens on an inode that has pages mapped. However,
      since the consensus is that this is an ill-advised practice, the
      performance implication should not be a problem.
      
      This was based on a proposal from Jeff Moyer, thanks!
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
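The ordering problem described in this commit message can be modelled in user space. The sketch below is a toy model, not kernel code: `toy_inode`, `buffered_read`, `dio_write`, `has_cached_pages` and `dio_complete_invalidate` are hypothetical names invented for illustration, and a single byte stands in for a whole page. It only aims to show why invalidating the cached pages at completion time removes the stale data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Toy inode: one byte stands in for a whole page. All names are hypothetical. */
struct toy_inode {
    char disk[NPAGES];    /* data on stable storage */
    char cache[NPAGES];   /* page cache contents */
    bool cached[NPAGES];  /* is page i present in the cache? */
};

/* Buffered read: served from the cache, populating it from disk on a miss. */
static char buffered_read(struct toy_inode *ino, size_t idx)
{
    assert(idx < NPAGES);
    if (!ino->cached[idx]) {
        ino->cache[idx] = ino->disk[idx];
        ino->cached[idx] = true;
    }
    return ino->cache[idx];
}

/* Direct write: goes straight to disk and bypasses the cache. */
static void dio_write(struct toy_inode *ino, size_t idx, char val)
{
    assert(idx < NPAGES);
    ino->disk[idx] = val;
}

/*
 * Models the "does the inode have mapped pages?" check that decides
 * whether completion has to be deferred to process context.
 */
static bool has_cached_pages(const struct toy_inode *ino)
{
    for (size_t i = 0; i < NPAGES; i++)
        if (ino->cached[i])
            return true;
    return false;
}

/* Completion step modelled on the fix: drop cached pages in [first, last]. */
static void dio_complete_invalidate(struct toy_inode *ino,
                                    size_t first, size_t last)
{
    assert(first <= last && last < NPAGES);
    for (size_t i = first; i <= last; i++)
        ino->cached[i] = false;
}
```

In this model, a buffered read that cached a page before the direct write keeps returning the old byte until `dio_complete_invalidate()` runs. The next buffered read then misses and reloads the new data from disk.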
  3. 15 Sep, 2017 1 commit
    • sched/wait: Introduce wakeup bookmark in wake_up_page_bit · 11a19c7b
      Tim Chen committed
      Now that we have added breaks in the wait queue scan and allow a bookmark
      on the scan position, we put this logic in the wake_up_page_bit function.
      
      We can have a very long page wait list on a large system where multiple
      pages share the same wait list. We break the wake up walk here to give
      other cpus a chance to access the list, and to avoid disabling interrupts
      for too long while traversing the list.  This reduces the interrupt and
      rescheduling latency, and excessive page wait queue lock hold time.
      
      [ v2: Remove bookmark_wake_function ]
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
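The idea of breaking up a long wake-up walk with a bookmark can be sketched in user space as follows. This is an illustration only, not the kernel implementation: the array-backed `toy_waitqueue`, the helpers `wake_batch` and `wake_all_with_bookmark`, and the batch size `WALK_BREAK_CNT` are assumptions made for the example. The "lock" is modelled by a counter so the number of separate lock holds can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WALK_BREAK_CNT 64  /* hypothetical batch size for this sketch */

struct toy_waiter {
    bool woken;
};

struct toy_waitqueue {
    struct toy_waiter *waiters;
    size_t count;
    unsigned lock_acquisitions;  /* how many times the "lock" was taken */
};

/*
 * Wake at most WALK_BREAK_CNT waiters starting at *bookmark, then record
 * where the walk stopped. Returns true if waiters remain after the bookmark.
 */
static bool wake_batch(struct toy_waitqueue *q, size_t *bookmark)
{
    assert(*bookmark <= q->count);

    size_t end = *bookmark + WALK_BREAK_CNT;
    if (end > q->count)
        end = q->count;

    for (size_t i = *bookmark; i < end; i++)
        q->waiters[i].woken = true;

    *bookmark = end;
    return end < q->count;
}

/* Walk the whole queue in batches, releasing the "lock" between batches. */
static void wake_all_with_bookmark(struct toy_waitqueue *q)
{
    size_t bookmark = 0;
    bool more;

    do {
        q->lock_acquisitions++;  /* lock taken, interrupts would be off */
        more = wake_batch(q, &bookmark);
        /* lock released here: other CPUs and interrupts get a turn */
    } while (more);
}
```

With 150 waiters and a batch size of 64, the walk is split into three short lock holds instead of one long one, and the bookmark lets each batch resume where the previous one stopped.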
  4. 14 Sep, 2017 2 commits
    • mm, page_owner: skip unnecessary stack_trace entries · 5f48f0bd
      Prakash Gupta committed
      The page_owner stacktrace always begins as follows:
      
        [<ffffff987bfd48f4>] save_stack+0x40/0xc8
        [<ffffff987bfd4da8>] __set_page_owner+0x3c/0x6c
      
      These two entries do not provide any useful information and limit the
      available stacktrace depth.  The page_owner stacktrace used to skip the
      caller functions from the stack entries, but this was lost with commit
      f2ca0b55 ("mm/page_owner: use stackdepot to store stacktrace").
      
      Example page_owner entry after the patch:
      
        Page allocated via order 0, mask 0x8(ffffff80085fb714)
        PFN 654411 type Movable Block 639 type CMA Flags 0x0(ffffffbe5c7f12c0)
        [<ffffff9b64989c14>] post_alloc_hook+0x70/0x80
        ...
        [<ffffff9b651216e8>] msm_comm_try_state+0x5f8/0x14f4
        [<ffffff9b6512486c>] msm_vidc_open+0x5e4/0x7d0
        [<ffffff9b65113674>] msm_v4l2_open+0xa8/0x224
      
      Link: http://lkml.kernel.org/r/1504078343-28754-2-git-send-email-guptap@codeaurora.org
      Fixes: f2ca0b55 ("mm/page_owner: use stackdepot to store stacktrace")
      Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
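The effect of skipping the leading bookkeeping frames when the stored depth is limited can be shown with a small user-space sketch. It is not the kernel code: `save_frames` and its parameters are hypothetical, and frames are represented as plain strings. The frame names in the usage note are taken from the traces quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy frames[skip..] into out, storing at most max_out entries.
 * Returns the number of entries stored. Hypothetical helper for illustration.
 */
static size_t save_frames(const char *const *frames, size_t nframes,
                          size_t skip, const char **out, size_t max_out)
{
    assert(frames != NULL && out != NULL);

    size_t n = 0;
    for (size_t i = skip; i < nframes && n < max_out; i++)
        out[n++] = frames[i];
    return n;
}
```

Given the frames `save_stack`, `__set_page_owner`, `post_alloc_hook`, `msm_v4l2_open` and room for only two stored entries, a skip of 0 stores just the two uninformative entries, while a skip of 2 stores `post_alloc_hook` and `msm_v4l2_open`.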
    • mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4
      Michal Hocko committed
      GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
      and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  Its
      primary motivation was to allow users to tell that an allocation is
      short lived and so the allocator can try to place such allocations close
      together and prevent long term fragmentation.  As much as this sounds
      like a reasonable semantic it becomes much less clear when to use the
      high-level GFP_TEMPORARY allocation flag.  How long is temporary? Can the
      context holding that memory sleep? Can it take locks? It seems there is
      no good answer for those questions.
      
      The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
      __GFP_RECLAIMABLE which in itself is tricky because basically none of
      the existing callers provide a way to reclaim the allocated memory.  So
      this is rather misleading and hard to evaluate for any benefits.
      
      I have checked some random users and none of them has added the flag
      with a specific justification.  I suspect most of them just copied from
      other existing users and others just thought it might be a good idea to
      use without any measuring.  This suggests that GFP_TEMPORARY just
      encourages cargo cult usage without any reasoning.
      
      I believe that our gfp flags are quite complex already and especially
      those with high-level semantics should be clearly defined to prevent
      confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
      converting all existing users to simply use GFP_KERNEL.  Please note that
      SLAB users with shrinkers will still get the __GFP_RECLAIMABLE heuristic
      and so they will be placed properly for memory fragmentation prevention.
      
      I can see reasons we might want some gfp flag to reflect short-term
      allocations but I propose starting from a clear semantic definition and
      only then adding users with proper justification.
      
      This was brought up before LSF this year by Matthew [1] and it
      turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
      seems to be a heuristic without any measured advantage for most (if not
      all) of its current users.  The follow-up discussion revealed that
      opinions on what might be a temporary allocation differ a lot between
      developers.  So rather than trying to tweak existing users into a
      semantic which they haven't expected, I propose to simply remove the flag
      and start from scratch if we really need a semantic for short term
      allocations.
      
      [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
      
      [akpm@linux-foundation.org: fix typo]
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: drm/i915: fix up]
        Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 12 Sep, 2017 1 commit
  6. 09 Sep, 2017 34 commits