1. 26 Sep, 2017 1 commit
  2. 25 Sep, 2017 1 commit
    • L
      fs: Fix page cache inconsistency when mixing buffered and AIO DIO · 332391a9
      Lukas Czerner committed
      Currently when mixing buffered reads and asynchronous direct writes it
      is possible to end up with the situation where we have stale data in the
      page cache while the new data is already written to disk. This is
      permanent until the affected pages are flushed away. Although mixing
      buffered and direct IO is ill-advised, this poses a threat to data
      integrity, is unexpected, and should be fixed.
      
      Fix this by deferring completion of asynchronous direct writes to a
      process context in the case that there are mapped pages to be found in
      the inode. Later before the completion in dio_complete() invalidate
      the pages in question. This ensures that after the completion the pages
      in the written area are either unmapped, or populated with up-to-date
      data. Also do the same for the iomap case which uses
      iomap_dio_complete() instead.
      
      This has the side effect of deferring the completion to a process context
      for every AIO DIO that happens on an inode that has pages mapped. However,
      since the consensus is that this is an ill-advised practice, the
      performance implication should not be a problem.
      
      This was based on a proposal from Jeff Moyer, thanks!
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      332391a9
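The deferral rule described in this commit can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct demo_dio`, its fields, and `demo_should_defer()` are hypothetical names, not the kernel's actual structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the state of one direct I/O request. */
struct demo_dio {
    bool is_write;         /* the request writes data to disk */
    bool defer_requested;  /* deferral was already requested for another reason */
    size_t mapped_pages;   /* pages of this inode currently in the page cache */
};

/*
 * Return true when completion should run in process context, where the
 * written range of the page cache can be invalidated before completion.
 */
static bool demo_should_defer(const struct demo_dio *dio)
{
    assert(dio != NULL);
    return dio->defer_requested || (dio->is_write && dio->mapped_pages > 0);
}
```

In this model a direct write to an inode with no cached pages completes without deferral, which matches the commit's point that only inodes with mapped pages pay the extra cost.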
  3. 23 Sep, 2017 2 commits
  4. 21 Sep, 2017 5 commits
  5. 20 Sep, 2017 5 commits
  6. 19 Sep, 2017 1 commit
  7. 18 Sep, 2017 3 commits
  8. 16 Sep, 2017 1 commit
    • J
      fs/proc: Report eip/esp in /prod/PID/stat for coredumping · fd7d5627
      John Ogness committed
      Commit 0a1eb2d4 ("fs/proc: Stop reporting eip and esp in
      /proc/PID/stat") stopped reporting eip/esp because it is
      racy and dangerous for executing tasks. The comment adds:
      
          As far as I know, there are no use programs that make any
          material use of these fields, so just get rid of them.
      
      However, existing userspace core-dump-handler applications (for
      example, minicoredumper) are using these fields since they
      provide an excellent cross-platform interface to these valuable
      pointers. So that commit introduced a user space visible
      regression.
      
      Partially revert the change and make the readout possible for
      tasks with the proper permissions and only if the target task
      has the PF_DUMPCORE flag set.
      
      Fixes: 0a1eb2d4 ("fs/proc: Stop reporting eip and esp in /proc/PID/stat")
      Reported-by: Marco Felsch <marco.felsch@preh.de>
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Reviewed-by: Andy Lutomirski <luto@kernel.org>
      Cc: Tycho Andersen <tycho.andersen@canonical.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: stable@vger.kernel.org
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linux API <linux-api@vger.kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/87poatfwg6.fsf@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      fd7d5627
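A core-dump handler that relies on these fields might extract them as in the sketch below. It assumes the field numbering documented in proc(5), where field 29 is kstkesp and field 30 is kstkeip; the function name `demo_parse_esp_eip()` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract fields 29 (kstkesp) and 30 (kstkeip) from one /proc/PID/stat line.
 * Returns 0 on success, -1 if the line is malformed or too short.
 */
static int demo_parse_esp_eip(const char *line,
                              unsigned long long *esp,
                              unsigned long long *eip)
{
    assert(line != NULL && esp != NULL && eip != NULL);

    /* comm (field 2) may contain spaces or ')', so resume after the last ')'. */
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;
    p++;

    int field = 2;
    while (*p != '\0') {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;
        field++;

        char *end;
        if (field == 29) {
            *esp = strtoull(p, &end, 10);
            if (end == p)
                return -1;
        } else if (field == 30) {
            *eip = strtoull(p, &end, 10);
            return end == p ? -1 : 0;
        }
        /* Skip the rest of the current token. */
        while (*p != '\0' && *p != ' ')
            p++;
    }
    return -1;
}
```

A caller would read one line from `/proc/<pid>/stat` (for example with `fgets()`) and pass it in. Per the commit message, meaningful values should only be expected when the reader has the proper permissions and the target task has PF_DUMPCORE set.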
  9. 15 Sep, 2017 9 commits
  10. 14 Sep, 2017 3 commits
    • M
      mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4
      Michal Hocko committed
      GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
      and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  Its
      primary motivation was to allow users to tell that an allocation is
      short lived and so the allocator can try to place such allocations close
      together and prevent long term fragmentation.  As much as this sounds
      like a reasonable semantic it becomes much less clear when to use the
      high-level GFP_TEMPORARY allocation flag.  How long is temporary? Can the
      context holding that memory sleep? Can it take locks? It seems there is
      no good answer for those questions.
      
      The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
      __GFP_RECLAIMABLE which in itself is tricky because basically none of
      the existing callers provide a way to reclaim the allocated memory.  So
      this is rather misleading and hard to evaluate for any benefits.
      
      I have checked some random users and none of them has added the flag
      with a specific justification.  I suspect most of them just copied from
      other existing users and others just thought it might be a good idea to
      use without any measuring.  This suggests that GFP_TEMPORARY just
      encourages cargo-cult usage without any reasoning.
      
      I believe that our gfp flags are quite complex already and especially
      those with high-level semantics should be clearly defined to prevent
      confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
      replacing all existing users with plain GFP_KERNEL.  Please note that
      SLAB users with shrinkers will still get the __GFP_RECLAIMABLE heuristic
      and so they will be placed properly for memory fragmentation prevention.
      
      I can see reasons we might want some gfp flag to reflect short-term
      allocations but I propose starting from a clear semantic definition and
      only then adding users with proper justification.
      
      This was brought up before LSF this year by Matthew [1] and it
      turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
      seems to be a heuristic without any measured advantage for most (if not
      all) of its current users.  The follow up discussion has revealed that
      opinions on what might be temporary allocation differ a lot between
      developers.  So rather than trying to tweak existing users into a
      semantic which they haven't expected I propose to simply remove the flag
      and start from scratch if we really need a semantic for short term
      allocations.
      
      [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
      
      [akpm@linux-foundation.org: fix typo]
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: drm/i915: fix up]
      Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0ee931c4
    • A
      fscache: fix fscache_objlist_show format processing · ebfddb3d
      Arnd Bergmann committed
      gcc points out a minor bug in the handling of unknown cookie types,
      which could result in a string overflow when the integer is copied into
      a 3-byte string:
      
        fs/fscache/object-list.c: In function 'fscache_objlist_show':
        fs/fscache/object-list.c:265:19: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
         sprintf(_type, "%02u", cookie->def->type);
                        ^~~~~~
        fs/fscache/object-list.c:265:4: note: 'sprintf' output between 3 and 4 bytes into a destination of size 3
      
      This is currently harmless as no code sets a type other than 0 or 1, but
      it makes sense to use snprintf() here to avoid overflowing the array if
      that changes.
      
      Link: http://lkml.kernel.org/r/20170714120720.906842-22-arnd@arndb.de
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ebfddb3d
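The overflow gcc warns about, and the bounded alternative, can be reproduced in user space. The sketch below is not the kernel code: `demo_format_type()` is a hypothetical helper that formats an integer with "%02u" into a caller-supplied buffer and reports whether the output had to be truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format 'type' as at least two digits into 'out' (capacity 'size' bytes).
 * snprintf() never writes more than 'size' bytes including the terminating
 * NUL, so a three-digit value is truncated rather than overflowing 'out'.
 * Returns true if the output was truncated.
 */
static bool demo_format_type(char *out, size_t size, unsigned int type)
{
    assert(out != NULL && size > 0);
    int needed = snprintf(out, size, "%02u", type);
    return needed < 0 || (size_t)needed >= size;
}
```

With a 3-byte buffer, a type of 1 yields "01", while a type of 100 yields the truncated string "10" and a true return value. An unbounded `sprintf()` would instead write four bytes into the three-byte array.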
    • A
      procfs: remove unused variable · 6dec0dd4
      Arnd Bergmann committed
      In NOMMU configurations, we get a warning about a variable that has become
      unused:
      
        fs/proc/task_nommu.c: In function 'nommu_vma_show':
        fs/proc/task_nommu.c:148:28: error: unused variable 'priv' [-Werror=unused-variable]
      
      Link: http://lkml.kernel.org/r/20170911200231.3171415-1-arnd@arndb.de
      Fixes: 1240ea0d ("fs, proc: remove priv argument from is_stack")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6dec0dd4
  11. 13 Sep, 2017 5 commits
  12. 12 Sep, 2017 4 commits
    • A
      ovl: fix false positive ESTALE on lookup · 939ae4ef
      Amir Goldstein committed
      Commit b9ac5c27 ("ovl: hash overlay non-dir inodes by copy up origin")
      verifies that the origin lower inode stored in the overlayfs inode matched
      the inode of a copy up origin dentry found by lookup.
      
      There is a false positive result in that check when lower fs does not
      support file handles and copy up origin cannot be followed by file handle
      at lookup time.
      
      The false positive happens when finding an overlay inode in cache on a
      copied up overlay dentry lookup. The overlay inode still 'remembers' the
      copy up origin inode, but the copy up origin dentry is not available for
      verification.
      
      Relax the check in case copy up origin dentry is not available.
      
      Fixes: b9ac5c27 ("ovl: hash overlay non-dir inodes by copy up...")
      Cc: <stable@vger.kernel.org> # v4.13
      Reported-by: Jordi Pujol <jordipujolp@gmail.com>
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      939ae4ef
    • M
      fuse: getattr cleanup · 5b97eeac
      Miklos Szeredi committed
      The refreshed argument isn't used by any caller, get rid of it.
      
      Use a helper for just updating the inode (no need to fill in a kstat).
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      5b97eeac
    • M
      fuse: honor iocb sync flags on write · e1c0eecb
      Miklos Szeredi committed
      If the IOCB_DSYNC flag is set a sync is not being performed by
      fuse_file_write_iter.
      
      Honor IOCB_DSYNC/IOCB_SYNC by setting O_DSYNC/O_SYNC respectively in the
      flags field of the write request.
      
      We don't need to sync data or metadata, since fuse_perform_write() does
      write-through and the filesystem is responsible for updating file times.
      
      Original patch by Vitaly Zolotusky.
      Reported-by: Nate Clark <nate@neworld.us>
      Cc: Vitaly Zolotusky <vitaly@unitc.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      e1c0eecb
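The flag translation described in this commit can be sketched in user space as follows. The `DEMO_IOCB_*` bit values and the function `demo_write_request_flags()` are assumptions made for illustration only; `O_DSYNC` and `O_SYNC` are the standard `<fcntl.h>` constants.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>

/* Hypothetical stand-ins for the kernel's IOCB_DSYNC / IOCB_SYNC bits. */
#define DEMO_IOCB_DSYNC (1u << 0)
#define DEMO_IOCB_SYNC  (1u << 1)

/*
 * Build the flags value sent with a write request: start from the file's
 * open flags and add O_DSYNC / O_SYNC when the per-I/O sync bits are set.
 */
static unsigned int demo_write_request_flags(unsigned int iocb_flags,
                                             unsigned int open_flags)
{
    /* This sketch only understands the two demo bits defined above. */
    assert((iocb_flags & ~(DEMO_IOCB_DSYNC | DEMO_IOCB_SYNC)) == 0);

    unsigned int flags = open_flags;
    if (iocb_flags & DEMO_IOCB_DSYNC)
        flags |= (unsigned int)O_DSYNC;
    if (iocb_flags & DEMO_IOCB_SYNC)
        flags |= (unsigned int)O_SYNC;
    return flags;
}
```

The design point taken from the commit message is that the sync request is forwarded to the filesystem server in the request flags rather than handled by a separate sync step in the kernel.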
    • M
      fuse: allow server to run in different pid_ns · 5d6d3a30
      Miklos Szeredi committed
      Commit 0b6e9ea0 ("fuse: Add support for pid namespaces") broke
      Sandstorm.io development tools, which have been sending FUSE file
      descriptors across PID namespace boundaries since early 2014.
      
      The above patch added a check that prevented I/O on the fuse device file
      descriptor if the pid namespace of the reader/writer was different from the
      pid namespace of the mounter.  With this change passing the device file
      descriptor to a different pid namespace simply doesn't work.  The check was
      added because pids are transferred to/from the fuse userspace server in the
      namespace registered at mount time.
      
      To fix this regression, remove the checks and do the following:
      
      1) the pid in the request header (the pid of the task that initiated the
      filesystem operation) is translated to the reader's pid namespace.  If a
      mapping doesn't exist for this pid, then a zero pid is used.  Note: even if
      a mapping would exist between the initiator task's pid namespace and the
      reader's pid namespace the pid will be zero if either mapping from
      initiator's to mounter's namespace or mapping from mounter's to reader's
      namespace doesn't exist.
      
      2) The lk.pid value in setlk/setlkw requests and getlk reply is left alone.
      Userspace should not interpret this value anyway.  Also allow the
      setlk/setlkw operations if the pid of the task cannot be represented in the
      mounter's namespace (pid being zero in that case).
      Reported-by: Kenton Varda <kenton@sandstorm.io>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: 0b6e9ea0 ("fuse: Add support for pid namespaces")
      Cc: <stable@vger.kernel.org> # v4.12+
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Seth Forshee <seth.forshee@canonical.com>
      5d6d3a30
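The two-step pid translation in point 1 can be modelled with plain lookup tables. This is a toy sketch, not the kernel implementation: `struct demo_pid_map`, `demo_lookup()`, and `demo_request_pid()` are hypothetical, and each table simply lists how a pid in one namespace appears in another.

```c
#include <assert.h>
#include <stddef.h>

/* One hypothetical mapping entry: pid 'from' is visible as pid 'to'. */
struct demo_pid_map {
    int from;
    int to;
};

/* Return the mapped pid, or 0 when no mapping exists. */
static int demo_lookup(const struct demo_pid_map *map, size_t n, int pid)
{
    for (size_t i = 0; i < n; i++) {
        if (map[i].from == pid)
            return map[i].to;
    }
    return 0;
}

/*
 * Translate the initiator's pid first into the mounter's namespace and then
 * into the reader's namespace. If either step has no mapping, the pid placed
 * in the request header is 0.
 */
static int demo_request_pid(const struct demo_pid_map *initiator_to_mounter,
                            size_t n_im,
                            const struct demo_pid_map *mounter_to_reader,
                            size_t n_mr,
                            int initiator_pid)
{
    assert(initiator_pid > 0);
    int in_mounter = demo_lookup(initiator_to_mounter, n_im, initiator_pid);
    if (in_mounter == 0)
        return 0;
    return demo_lookup(mounter_to_reader, n_mr, in_mounter);
}
```

Because the translation goes through the mounter's namespace, the model returns 0 whenever either table lacks an entry, even if a direct initiator-to-reader mapping could in principle exist, which is the caveat noted in the commit message.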