1. 13 Feb, 2015 1 commit
    • V
      list_lru: introduce list_lru_shrink_{count,walk} · 503c358c
      Committed by Vladimir Davydov
      Kmem accounting of memcg is unusable now, because it lacks slab shrinker
      support.  That means when we hit the limit we will get ENOMEM w/o any
      chance to recover.  What we should do then is to call shrink_slab, which
      would reclaim old inode/dentry caches from this cgroup.  This is what
      this patch set is intended to do.
      
      Basically, it does two things.  First, it introduces the notion of
      per-memcg slab shrinker.  A shrinker that wants to reclaim objects per
      cgroup should mark itself as SHRINKER_MEMCG_AWARE.  Then it will be
      passed the memory cgroup to scan from in shrink_control->memcg.  For
      such shrinkers shrink_slab iterates over the whole cgroup subtree under
      the target cgroup and calls the shrinker for each kmem-active memory
      cgroup.
      
      Secondly, this patch set makes the list_lru structure per-memcg.  It's
      done transparently to list_lru users - everything they have to do is to
      tell list_lru_init that they want memcg-aware list_lru.  Then the
      list_lru will automatically distribute objects among per-memcg lists
      based on which cgroup the object is accounted to.  This way, to make FS
      shrinkers (icache, dcache) memcg-aware we only need to make them use
      memcg-aware list_lru, and this is what this patch set does.
      
      As before, this patch set only enables per-memcg kmem reclaim when the
      pressure goes from memory.limit, not from memory.kmem.limit.  Handling
      memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
      it is still unclear whether we will have this knob in the unified
      hierarchy.
      
      This patch (of 9):
      
      NUMA aware slab shrinkers use the list_lru structure to distribute
      objects coming from different NUMA nodes to different lists.  Whenever
      such a shrinker needs to count or scan objects from a particular node,
      it issues commands like this:
      
              count = list_lru_count_node(lru, sc->nid);
              freed = list_lru_walk_node(lru, sc->nid, isolate_func,
                                         isolate_arg, &sc->nr_to_scan);
      
      where sc is an instance of the shrink_control structure passed to it
      from vmscan.
      
      To simplify this, let's add special list_lru functions to be used by
      shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
      consolidate the nid and nr_to_scan arguments in the shrink_control
      structure.
      
      This will also allow us to avoid patching shrinkers that use list_lru
      when we make shrink_slab() per-memcg - all we will have to do is extend
      the shrink_control structure to include the target memcg and make
      list_lru_shrink_{count,walk} handle this appropriately.
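The consolidation described above can be sketched in user space. This is only a toy model, not the kernel code: `struct list_lru` is reduced to a per-node counter, `struct shrink_control` carries just the `nid` and `nr_to_scan` fields the message mentions, the isolate callback and its argument are left out, and `MAX_NODES` is a made-up constant.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4 /* hypothetical node count for this model */

/* Toy stand-in for list_lru: one object counter per NUMA node. */
struct list_lru {
    unsigned long nr_items[MAX_NODES];
};

/* Toy stand-in for shrink_control: only the two fields discussed above. */
struct shrink_control {
    int nid;                  /* node to count or scan */
    unsigned long nr_to_scan; /* remaining scan budget */
};

/* Per-node primitives, modelled loosely on the calls quoted in the message. */
unsigned long list_lru_count_node(struct list_lru *lru, int nid)
{
    return lru->nr_items[nid];
}

unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
                                 unsigned long *nr_to_walk)
{
    unsigned long freed = 0;

    /* "Free" objects until the budget or the list is exhausted. */
    while (*nr_to_walk > 0 && lru->nr_items[nid] > 0) {
        lru->nr_items[nid]--;
        (*nr_to_walk)--;
        freed++;
    }
    return freed;
}

/* Shrinker-facing wrappers: nid and nr_to_scan both come from sc. */
unsigned long list_lru_shrink_count(struct list_lru *lru,
                                    struct shrink_control *sc)
{
    return list_lru_count_node(lru, sc->nid);
}

unsigned long list_lru_shrink_walk(struct list_lru *lru,
                                   struct shrink_control *sc)
{
    return list_lru_walk_node(lru, sc->nid, &sc->nr_to_scan);
}
```

Because callers only ever pass `sc`, a later field such as the target memcg could be added to `struct shrink_control` and handled inside the wrappers without touching the shrinkers themselves.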
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Suggested-by: Dave Chinner <david@fromorbit.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      503c358c
  2. 11 Feb, 2015 1 commit
  3. 10 Feb, 2015 1 commit
  4. 06 Feb, 2015 1 commit
  5. 05 Feb, 2015 2 commits
  6. 02 Feb, 2015 12 commits
  7. 30 Jan, 2015 1 commit
  8. 28 Jan, 2015 1 commit
    • J
      quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff
      Committed by Jan Kara
      Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
      tracks space limits and usage in 512-byte blocks. However VFS quotas
      track usage in bytes (as some filesystems require that) and we need to
      somehow pass this information. Up to now it wasn't a problem because we
      didn't do any unit conversion (thus VFS quota routines happily stuck
      the number of bytes into the d_bcount field of struct fs_disk_quota).
      Only if you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or
      Q_GETQUOTA / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly
      anyone tried this but reportedly some Samba users hit the problem in
      practice. So if we want the interfaces to be compatible, we need to
      fix this.
      
      We bite the bullet and define another quota structure used for passing
      information from/to ->get_dqblk()/->set_dqblk(). It's somewhat sad we have
      to have more conversion routines in fs/quota/quota.c and another copying
      of quota structure slows down getting of quota information by about 2%
      but it seems cleaner than overloading e.g. units of d_bcount to bytes.
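The unit mismatch described above comes down to a factor of 512. The sketch below is illustrative only: the helper names and `BB_SHIFT` are hypothetical, and rounding partial blocks up is an assumption rather than something the message states.

```c
#include <assert.h>
#include <stdint.h>

#define BB_SHIFT 9 /* 2^9 = 512 bytes per block; the name is hypothetical */

/* Convert a byte count to 512-byte blocks, rounding partial blocks up. */
uint64_t bytes_to_blocks(uint64_t bytes)
{
    return (bytes + (1ULL << BB_SHIFT) - 1) >> BB_SHIFT;
}

/* Convert a count of 512-byte blocks back to bytes. */
uint64_t blocks_to_bytes(uint64_t blocks)
{
    return blocks << BB_SHIFT;
}
```

A usage of 4096 bytes is 8 blocks. Storing the raw value 4096 in a field that a reader interprets as blocks would report usage 512 times too large, which is the kind of bogus result the message refers to.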
      
      CC: stable@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      14bf61ff
  9. 22 Jan, 2015 11 commits
    • B
      xfs: remove incorrect error negation in attr_multi ioctl · 4d949021
      Committed by Brian Foster
      xfs_compat_attrmulti_by_handle() calls memdup_user() which returns a
      negative error code. The error code is negated by the caller and thus
      incorrectly converted to a positive error code.
      
      Remove the error negation such that the negative error is passed
      correctly back up to userspace.
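The double negation can be illustrated with a small user-space model. The function names here are hypothetical stand-ins, and the helper simply returns a negative errno on failure, as the message describes; it does not reproduce the real ioctl code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper following the kernel convention: 0 or a negative errno. */
int copy_ops_from_user(int should_fail)
{
    return should_fail ? -ENOMEM : 0;
}

/* Buggy caller: negates an error that is already negative. */
int attrmulti_buggy(int should_fail)
{
    int error = copy_ops_from_user(should_fail);

    if (error)
        return -error; /* -(-ENOMEM) == ENOMEM, a positive value */
    return 0;
}

/* Fixed caller: passes the negative error through unchanged. */
int attrmulti_fixed(int should_fail)
{
    int error = copy_ops_from_user(should_fail);

    if (error)
        return error;
    return 0;
}
```

Callers further up expect a negative value on failure, so the positive result from the buggy variant would not be recognised as an error code.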
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      4d949021
    • D
      xfs: set superblock buffer type correctly · 3443a3bc
      Committed by Dave Chinner
      When the superblock is modified in a transaction, the commonly
      modified fields are not actually copied to the superblock buffer to
      avoid the buffer lock becoming a serialisation point. However, there
      are some other operations that modify the superblock fields within
      the transaction that don't directly log to the superblock but rely
      on the changes to be applied during the transaction commit (to
      minimise the buffer lock hold time).
      
      When we do this, we fail to mark the buffer log item as being a
      superblock buffer and that can lead to the buffer not being marked
      with the correct type in the log and hence causing recovery issues.
      Fix it by setting the type correctly, similar to xfs_mod_sb()...
      
      cc: <stable@vger.kernel.org> # 3.10 to current
      Tested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      3443a3bc
    • D
      xfs: set buf types when converting extent formats · fe22d552
      Committed by Dave Chinner
      Conversion from local to extent format does not set the buffer type
      correctly on the new extent buffer when symlink data is moved out
      of line.
      
      Fix the symlink code and leave a comment in the generic bmap code
      reminding us that the format-specific data copy needs to set the
      destination buffer type appropriately.
      
      cc: <stable@vger.kernel.org> # 3.10 to current
      Tested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      fe22d552
    • D
      xfs: inode unlink does not set AGI buffer type · f19b872b
      Committed by Dave Chinner
      This leads to log recovery throwing errors like:
      
      XFS (md0): Mounting V5 Filesystem
      XFS (md0): Starting recovery (logdev: internal)
      XFS (md0): Unknown buffer type 0!
      XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1
      ffff8800ffc53800: 58 41 47 49 .....
      
      Which is the AGI buffer magic number.
      
      Ensure that we set the type appropriately in both unlink list
      addition and removal.
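The four bytes in the hex dump above, `58 41 47 49`, are the ASCII characters "XAGI". A minimal sketch of reading them as a big-endian 32-bit magic value follows; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Combine four bytes, most significant first, into a 32-bit value. */
uint32_t be32_from_bytes(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

Applied to the dumped bytes, this gives `0x58414749`, so the block contents identify an AGI even though the log item recorded buffer type 0.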
      
      cc: <stable@vger.kernel.org> # 3.10 to current
      Tested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      f19b872b
    • D
      xfs: ensure buffer types are set correctly · 0d612fb5
      Committed by Dave Chinner
      Jan Kara reported that log recovery was finding buffers with invalid
      types in them. This should not happen, and indicates a bug in the
      logging of buffers. To catch this, add asserts to the buffer
      formatting code to ensure that the buffer type is in range when the
      transaction is committed.
      
      We don't set a type on buffers being marked stale - they are not
      going to get replayed, the format item exists only for recovery to
      be able to prevent replay of the buffer, so the type does not
      matter. Hence that needs special casing here.
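The check described above might look roughly like this in a user-space model. The enum values, the struct, and the function name are all hypothetical; only the rule itself (type must be in range unless the buffer is stale) comes from the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer types: 0 means "unknown", MAX is one past the last. */
enum buf_type {
    BUF_TYPE_UNKNOWN = 0,
    BUF_TYPE_AGI,
    BUF_TYPE_SB,
    BUF_TYPE_SYMLINK,
    BUF_TYPE_MAX
};

/* Hypothetical, heavily reduced buffer log item. */
struct buf_log_item {
    enum buf_type type;
    bool stale;
};

/* True if the item is acceptable to format at transaction commit. */
bool buf_type_ok(const struct buf_log_item *bip)
{
    if (bip->stale)
        return true; /* stale buffers are never replayed, type is irrelevant */
    return bip->type > BUF_TYPE_UNKNOWN && bip->type < BUF_TYPE_MAX;
}
```

A formatting routine in this model would then call `assert(buf_type_ok(bip))` before writing the item, so an unset type is caught at commit time rather than during recovery.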
      
      cc: <stable@vger.kernel.org> # 3.10 to current
      Reported-by: Jan Kara <jack@suse.cz>
      Tested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      0d612fb5
    • D
      xfs: sanitise sb_bad_features2 handling · 074e427b
      Committed by Dave Chinner
      We currently have to ensure that every time we update sb_features2
      that we update sb_bad_features2. Now that we log and format the
      superblock in its entirety we actually don't have to care because
      we can simply update the sb_bad_features2 when we format it into the
      buffer. This removes the need for anything but the mount and
      superblock formatting code to care about sb_bad_features2, and
      hence removes the possibility that we forget to update bad_features2
      when necessary in the future.
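The idea above can be sketched as follows, assuming a reduced model in which the in-core superblock holds only `features2` and the formatted copy holds both fields. The struct and function names are hypothetical, and byte-order conversion is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-core superblock: only the field that callers modify. */
struct sb_incore {
    uint32_t features2;
};

/* Hypothetical formatted copy: both fields exist in the buffer. */
struct sb_disk {
    uint32_t features2;
    uint32_t bad_features2;
};

/* Format the whole superblock; bad_features2 is derived here, nowhere else. */
void sb_format(struct sb_disk *to, const struct sb_incore *from)
{
    to->features2 = from->features2;
    to->bad_features2 = from->features2;
}
```

Since the duplicate field is filled in at a single point, code that changes `features2` has nothing extra to remember.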
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      074e427b
    • D
      xfs: consolidate superblock logging functions · 61e63ecb
      Committed by Dave Chinner
      We now have several superblock logging functions that are identical
      except for the transaction reservation and whether it should be a
      synchronous transaction or not. Consolidate these all into a single
      function, a single reservation and a sync flag and call it
      xfs_sync_sb().
      
      Also, xfs_mod_sb() is not really a modification function - it's the
      operation of logging the superblock buffer. Hence change the name of
      it to reflect this.
      
      Note that we have to change the mp->m_update_flags that are passed
      around at mount time to a boolean simply to indicate a superblock
      update is needed.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      61e63ecb
    • D
      xfs: remove bitfield based superblock updates · 4d11a402
      Committed by Dave Chinner
      When we log changes to the superblock, we first have to write them
      to the on-disk buffer, and then log that. Right now we have a
      complex bitfield based arrangement to only write the modified field
      to the buffer before we log it.
      
      This used to be necessary as a performance optimisation because we
      logged the superblock buffer in every extent or inode allocation or
      freeing, and so performance was extremely important. We haven't done
      this for years, however, ever since the lazy superblock counters
      pulled the superblock logging out of the transaction commit
      fast path.
      
      Hence we have a bunch of complexity that is not necessary that makes
      writing the in-core superblock to disk much more complex than it
      needs to be. We only need to log the superblock now during
      management operations (e.g. during mount, unmount or quota control
      operations) so it is not a performance critical path anymore.
      
      As such, remove the complex field based logging mechanism and
      replace it with a simple conversion function similar to what we use
      for all other on-disk structures.
      
      This means we always log the entirety of the superblock, but again
      because we rarely modify the superblock this is not an issue for log
      bandwidth or CPU time. Indeed, if we do log the superblock
      frequently, delayed logging will minimise the impact of this
      overhead.
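A "simple conversion function" of the kind described above can be sketched as one routine that writes every field of the in-core structure into the on-disk buffer, with no per-field bitmask. The three-field layout, the 16-byte size, and all names below are hypothetical; the sketch assumes big-endian on-disk fields.

```c
#include <assert.h>
#include <stdint.h>

#define SB_DISK_SIZE 16 /* hypothetical size of this reduced layout */

/* Hypothetical, heavily reduced in-core superblock (host byte order). */
struct sb_incore {
    uint32_t magic;
    uint32_t blocksize;
    uint64_t dblocks;
};

/* Store a 32-bit value most significant byte first. */
void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Store a 64-bit value most significant byte first. */
void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/* Convert the whole structure unconditionally: no "which fields changed". */
void sb_to_disk(uint8_t out[SB_DISK_SIZE], const struct sb_incore *sb)
{
    put_be32(out, sb->magic);
    put_be32(out + 4, sb->blocksize);
    put_be64(out + 8, sb->dblocks);
}
```

The trade-off matches the message: every call rewrites all fields, which costs a little more per update but removes the bookkeeping needed to track which fields were modified.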
      
      [Fixed gquota/pquota inode sharing regression noticed by bfoster.]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      4d11a402
    • J
      xfs: Remove some pointless quota checks · a3942700
      Committed by Jan Kara
      xfs_fs_get_xstate() and xfs_fs_get_xstatev() check whether there's quota
      running before calling xfs_qm_scall_getqstat() or
      xfs_qm_scall_getqstatv(). Thus we are certain that superblock supports
      quota and xfs_sb_version_hasquota() check is pointless. Similarly we
      know that when quota is running, mp->m_quotainfo will be allocated.
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      a3942700
    • J
      xfs: Remove some useless flags tests · fbf64b3d
      Committed by Jan Kara
      'flags' have XFS_ALL_QUOTA_ACCT cleared immediately on function entry.
      There's no point in checking these bits later in the function. Also
      because we check something is going to change, we know some enforcement
      bits are being added and thus there's no point in testing that later.
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      fbf64b3d
    • J
      xfs: Remove useless test · 8a2fdd4a
      Committed by Jan Kara
      Q_XQUOTARM is never passed to xfs_fs_set_xstate() so remove the test.
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      8a2fdd4a
  10. 09 Jan, 2015 7 commits
  11. 24 Dec, 2014 2 commits