1. 18 December 2019, 1 commit
    • mm, thp, proc: report THP eligibility for each vma · c76adee3
      Authored by Michal Hocko
      [ Upstream commit 7635d9cbe8327e131a1d3d8517dc186c2796ce2e ]
      
      Userspace falls short when trying to find out whether a specific memory
      range is eligible for THP.  There are use cases that would like to know
      that
      http://lkml.kernel.org/r/alpine.DEB.2.21.1809251248450.50347@chino.kir.corp.google.com
      : This is used to identify heap mappings that should be able to fault thp
      : but do not, and they normally point to a low-on-memory or fragmentation
      : issue.
      
      The only way to deduce this now is to query for the hg resp. nh flags
      and confront that state with the global setting.  Except that there is
      also PR_SET_THP_DISABLE that might change the picture.  So the final
      logic is not trivial.  Moreover the eligibility of the vma depends on
      the type of VMA as well.  In the past we supported only anonymous
      memory VMAs, but things have changed and shmem based vmas are
      supported as well these days, so the query logic gets even more
      complicated because the eligibility depends on the mount option and
      another global configuration knob.
      
      Simplify the current state and report the THP eligibility in
      /proc/<pid>/smaps for each existing vma.  Reuse
      transparent_hugepage_enabled for this purpose.  The original
      implementation of this function assumes that the caller knows that the
      vma itself is supported for THP, so move the core checks into
      __transparent_hugepage_enabled and use it for existing callers.
      __show_smap just uses the new transparent_hugepage_enabled, which also
      checks the vma support status (please note that this one has to be out
      of line due to include dependency issues).
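
      The decision the patch centralizes can be sketched in plain C. This is an illustrative model, not the kernel code: the function and parameter names below are made up, and the real checks live in __transparent_hugepage_enabled. It shows the combination of inputs userspace previously had to merge by hand.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical model of THP eligibility: per-VMA type support,
       * the hg/nh madvise flags, the global THP mode, and the
       * per-process PR_SET_THP_DISABLE override. */
      enum thp_global { THP_ALWAYS, THP_MADVISE, THP_NEVER };

      static bool thp_eligible(enum thp_global global,
                               bool vma_type_supported, /* anon or eligible shmem */
                               bool vma_hg,             /* MADV_HUGEPAGE set */
                               bool vma_nh,             /* MADV_NOHUGEPAGE set */
                               bool prctl_thp_disable)  /* PR_SET_THP_DISABLE */
      {
          if (!vma_type_supported || prctl_thp_disable || vma_nh)
              return false;
          switch (global) {
          case THP_ALWAYS:  return true;
          case THP_MADVISE: return vma_hg;    /* needs per-vma opt-in */
          default:          return false;     /* THP_NEVER */
          }
      }

      int main(void)
      {
          /* anon vma, global "always", no overrides: eligible */
          assert(thp_eligible(THP_ALWAYS, true, false, false, false));
          /* global "madvise" requires MADV_HUGEPAGE on the vma */
          assert(!thp_eligible(THP_MADVISE, true, false, false, false));
          assert(thp_eligible(THP_MADVISE, true, true, false, false));
          /* PR_SET_THP_DISABLE wins regardless of everything else */
          assert(!thp_eligible(THP_ALWAYS, true, true, false, true));
          return 0;
      }
      ```

      With the patch, this whole combination collapses into one per-vma "THPeligible" line in smaps.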
      
      [mhocko@kernel.org: fix oops with NULL ->f_mapping]
        Link: http://lkml.kernel.org/r/20181224185106.GC16738@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20181211143641.3503-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Paul Oppenheimer <bepvte@gmail.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c76adee3
  2. 21 September 2019, 1 commit
  3. 26 May 2019, 1 commit
    • dcache: sort the freeing-without-RCU-delay mess for good. · c939121b
      Authored by Al Viro
      commit 5467a68cbf6884c9a9d91e2a89140afb1839c835 upstream.
      
      For lockless accesses to dentries we don't have pinned we rely
      (among other things) upon having an RCU delay between dropping
      the last reference and actually freeing the memory.
      
      On the other hand, for things like pipes and sockets we neither
      do that kind of lockless access, nor want to deal with the
      overhead of an RCU delay every time a socket gets closed.
      
      So the delay was made optional - setting DCACHE_RCUACCESS in ->d_flags
      made sure it would happen.  We tried to avoid setting it unless we
      knew we needed it.  Unfortunately, that has led to a recurring class
      of bugs in which we missed the need to set it.
      
      We only really need it for dentries that are created by
      d_alloc_pseudo(), so let's not bother with trying to be smart -
      just make having an RCU delay the default.  The ones that do
      *not* get it set the replacement flag (DCACHE_NORCU) and we'd
      better use that sparingly.  d_alloc_pseudo() is the only
      such user right now.
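
      The flag flip can be sketched as plain C (an illustrative model, not the dcache code; the flag values here are made up). The point is that the old scheme was opt-in and silently unsafe when the flag was forgotten, while the new scheme is opt-out and safe by default:

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical flag bits for illustration only. */
      #define DCACHE_RCUACCESS 0x1  /* old: opt IN to the RCU delay */
      #define DCACHE_NORCU     0x2  /* new: opt OUT of the RCU delay */

      /* Old scheme: delay only if someone remembered to set the flag. */
      static bool old_needs_rcu_delay(unsigned flags)
      {
          return flags & DCACHE_RCUACCESS;
      }

      /* New scheme: delay by default; skip only for DCACHE_NORCU,
       * which only d_alloc_pseudo()-style dentries set. */
      static bool new_needs_rcu_delay(unsigned flags)
      {
          return !(flags & DCACHE_NORCU);
      }

      int main(void)
      {
          /* A dentry whose creator forgot the opt-in flag: the old
           * scheme skips the delay (the recurring bug class), the
           * new scheme stays safe. */
          assert(!old_needs_rcu_delay(0));
          assert(new_needs_rcu_delay(0));
          /* Only explicit opt-out (pipes/sockets path) skips it now. */
          assert(!new_needs_rcu_delay(DCACHE_NORCU));
          return 0;
      }
      ```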
      
      FWIW, the race that finally prompted that switch had been
      between __lock_parent() of immediate subdirectory of what's
      currently the root of a disconnected tree (e.g. from
      open-by-handle in progress) racing with d_splice_alias()
      elsewhere picking another alias for the same inode, either
      on outright corrupted fs image, or (in case of open-by-handle
      on NFS) that subdirectory having been just moved on server.
      It's not easy to hit, so the sky is not falling, but that's
      not the first race on similar missed cases, and the logic
      for setting DCACHE_RCUACCESS has gotten ridiculously
      convoluted.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c939121b
  4. 26 January 2019, 1 commit
  5. 21 November 2018, 1 commit
    • ovl: automatically enable redirect_dir on metacopy=on · be677259
      Authored by Miklos Szeredi
      commit d47748e5ae5af6572e520cc9767bbe70c22ea498 upstream.
      
      Current behavior is to automatically disable metacopy if redirect_dir is
      not enabled and proceed with the mount.
      
      If the "metacopy=on" mount option was given, this behavior can confuse
      the user: there is no mount failure, yet metacopy is disabled.
      
      This patch makes metacopy=on imply redirect_dir=on.
      
      The converse is also true: turning off full redirect with redirect_dir=
      {off|follow|nofollow} will disable metacopy.
      
      If both metacopy=on and redirect_dir={off|follow|nofollow} is specified,
      then mount will fail, since there's no way to correctly resolve the
      conflict.
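
      The three rules above can be sketched as a small resolution function. This is an illustrative model of the described behavior, not ovl's actual option parser; all names are hypothetical:

      ```c
      #include <assert.h>
      #include <stdbool.h>

      enum redirect_mode {
          REDIR_UNSET, REDIR_ON, REDIR_OFF, REDIR_FOLLOW, REDIR_NOFOLLOW
      };

      struct opts {
          bool metacopy_on;
          enum redirect_mode redirect;
      };

      /* Returns 0 on success (opts updated), -1 when the mount must fail. */
      static int resolve(struct opts *o)
      {
          if (o->metacopy_on && o->redirect != REDIR_ON) {
              if (o->redirect == REDIR_UNSET) {
                  /* metacopy=on implies redirect_dir=on */
                  o->redirect = REDIR_ON;
                  return 0;
              }
              /* explicit metacopy=on vs redirect_dir={off|follow|nofollow}:
               * unresolvable conflict, fail the mount */
              return -1;
          }
          if (o->redirect == REDIR_OFF || o->redirect == REDIR_FOLLOW ||
              o->redirect == REDIR_NOFOLLOW)
              o->metacopy_on = false;   /* no full redirect: disable metacopy */
          return 0;
      }

      int main(void)
      {
          struct opts a = { true, REDIR_UNSET };        /* implied enable */
          assert(resolve(&a) == 0 && a.redirect == REDIR_ON);

          struct opts b = { true, REDIR_FOLLOW };       /* explicit conflict */
          assert(resolve(&b) == -1);

          struct opts c = { false, REDIR_OFF };         /* converse rule */
          assert(resolve(&c) == 0 && !c.metacopy_on);
          return 0;
      }
      ```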
      Reported-by: Daniel Walsh <dwalsh@redhat.com>
      Fixes: d5791044 ("ovl: Provide a mount option metacopy=on/off...")
      Cc: <stable@vger.kernel.org> # v4.19
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      be677259
  6. 14 November 2018, 1 commit
  7. 30 August 2018, 2 commits
  8. 23 August 2018, 1 commit
  9. 22 August 2018, 1 commit
  10. 18 August 2018, 1 commit
    • fs/seq_file.c: simplify seq_file iteration code and interface · 1f4aace6
      Authored by NeilBrown
      The documentation for seq_file suggests that it is necessary to be able
      to move the iterator to a given offset, however that is not the case.
      If the iterator is stored in the private data and is stable from one
      read() syscall to the next, it is only necessary to support first/next
      interactions.  Implementing this in a client is a little clumsy.
      
       - if ->start() is given a pos of zero, it should go to start of
         sequence.
      
       - if ->start() is given the same pos that was given to the most
         recent next() or start(), it should restore the iterator to the
         state just before that last call

       - if ->start() is given any other number, it should set the iterator
         to one beyond the start just before the last ->start or ->next
         call.
      
      Also, the documentation says that the implementation can interpret the
      pos however it likes (other than zero meaning start), but seq_file
      increments the pos sometimes which does impose on the implementation.
      
      This patch simplifies the interface for first/next iteration and
      simplifies the code, while maintaining complete backward compatibility.
      Now:
      
       - if ->start() is given a pos of zero, it should return an iterator
         placed at the start of the sequence
      
       - if ->start() is given a non-zero pos, it should return the iterator
         in the same state it was after the last ->start or ->next.
      
      This is particularly useful for iterators which walk the multiple
      chains in a hash table, e.g.  using rhashtable_walk*.  See
      fs/gfs2/glock.c and drivers/staging/lustre/lustre/llite/vvp_dev.c
      
      A large part of achieving this is to *always* call ->next after ->show
      has successfully stored all of an entry in the buffer.  Never just
      increment the index instead.  Also:
      
       - always pass &m->index to ->start() and ->next(), never a temp
         variable
      
       - don't clear ->from when ->count is zero, as ->from is dead when
         ->count is zero.
      
      Some ->next functions do not increment *pos when they return NULL.  To
      maintain compatibility with this, we still need to increment m->index in
      one place, if ->next didn't increment it.  Note that such ->next
      functions are buggy and should be fixed.  A simple demonstration is
      
         dd if=/proc/swaps bs=1000 skip=1
      
      Choose any block size larger than the size of /proc/swaps.  This will
      always show the whole last line of /proc/swaps.
      
      This patch doesn't work around buggy next() functions for this case.
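
      The simplified contract can be modelled in userspace C. This is a sketch, not the real seq_file API: the struct and function names are made up, and two read() syscalls are simulated over a static array. The key behaviors shown are that ->start(0) restarts, ->start(non-zero) trusts the stored iterator instead of recomputing from pos, and ->next is called after every successful show:

      ```c
      #include <assert.h>
      #include <stddef.h>

      static const int items[] = { 10, 20, 30 };
      #define NITEMS (sizeof(items) / sizeof(items[0]))

      /* The iterator lives in "private data" across read() calls. */
      struct iter { size_t i; };

      static const int *my_start(struct iter *it, long long pos)
      {
          if (pos == 0)
              it->i = 0;     /* pos 0: restart from the beginning */
          /* non-zero pos: resume the stored state, do not derive it->i
           * from pos */
          return it->i < NITEMS ? &items[it->i] : NULL;
      }

      static const int *my_next(struct iter *it, long long *pos)
      {
          ++*pos;            /* the core always advances *pos with ->next */
          ++it->i;
          return it->i < NITEMS ? &items[it->i] : NULL;
      }

      int main(void)
      {
          struct iter it = { 0 };
          long long pos = 0;

          /* first read(): buffer holds two entries; ->next runs after
           * each successful show, including the last one that fit */
          const int *v = my_start(&it, pos);
          assert(*v == 10);
          v = my_next(&it, &pos);
          assert(*v == 20);
          v = my_next(&it, &pos);    /* advance past 20; buffer now full */

          /* second read(): non-zero pos resumes the stored state */
          v = my_start(&it, pos);
          assert(*v == 30);
          assert(my_next(&it, &pos) == NULL);   /* end of sequence */
          return 0;
      }
      ```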
      
      [neilb@suse.com: ensure ->from is valid]
        Link: http://lkml.kernel.org/r/87601ryb8a.fsf@notabene.neil.brown.name
      Signed-off-by: NeilBrown <neilb@suse.com>
      Acked-by: Jonathan Corbet <corbet@lwn.net>	[docs]
      Tested-by: Jann Horn <jannh@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f4aace6
  11. 14 August 2018, 1 commit
    • f2fs: support fault_type mount option · d494500a
      Authored by Chao Yu
      Previously, once fault injection was on, all kinds of faults would be
      injected into f2fs by default.  If we want to trigger a single fault
      type, or a specific combination of types, during a test, we need to
      configure a sysfs entry, which is a little inconvenient to integrate
      into a testsuite such as xfstests.

      So this patch introduces a new mount option 'fault_type' to assist the
      old option 'fault_injection'; with these two mount options, we can
      specify any fault rate/type at mount time.
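
      A usage sketch, assuming the option names from the commit; the device path and the fault_type bitmask value below are illustrative (the real bit assignments are in the f2fs documentation):

      ```shell
      # Old style: fault_injection=<rate> alone enables every fault type;
      # narrowing the set needed a follow-up sysfs write.
      mount -t f2fs -o fault_injection=1 /dev/sdb1 /mnt/f2fs

      # New style: fault_type=<bitmask> picks the fault kinds at mount
      # time, so a testsuite needs no sysfs step.
      mount -t f2fs -o fault_injection=1,fault_type=0x2 /dev/sdb1 /mnt/f2fs
      ```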
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      d494500a
  12. 08 August 2018, 2 commits
  13. 30 July 2018, 16 commits
  14. 27 July 2018, 1 commit
  15. 23 July 2018, 2 commits
  16. 20 July 2018, 2 commits
  17. 18 July 2018, 2 commits
  18. 12 July 2018, 2 commits
  19. 11 July 2018, 1 commit