- 13 2月, 2015 1 次提交
-
-
由 Vladimir Davydov 提交于
Kmem accounting of memcg is unusable now, because it lacks slab shrinker support. That means when we hit the limit we will get ENOMEM w/o any chance to recover. What we should do then is to call shrink_slab, which would reclaim old inode/dentry caches from this cgroup. This is what this patch set is intended to do. Basically, it does two things. First, it introduces the notion of per-memcg slab shrinker. A shrinker that wants to reclaim objects per cgroup should mark itself as SHRINKER_MEMCG_AWARE. Then it will be passed the memory cgroup to scan from in shrink_control->memcg. For such shrinkers shrink_slab iterates over the whole cgroup subtree under the target cgroup and calls the shrinker for each kmem-active memory cgroup. Secondly, this patch set makes the list_lru structure per-memcg. It's done transparently to list_lru users - everything they have to do is to tell list_lru_init that they want memcg-aware list_lru. Then the list_lru will automatically distribute objects among per-memcg lists basing on which cgroup the object is accounted to. This way to make FS shrinkers (icache, dcache) memcg-aware we only need to make them use memcg-aware list_lru, and this is what this patch set does. As before, this patch set only enables per-memcg kmem reclaim when the pressure goes from memory.limit, not from memory.kmem.limit. Handling memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and it is still unclear whether we will have this knob in the unified hierarchy. This patch (of 9): NUMA aware slab shrinkers use the list_lru structure to distribute objects coming from different NUMA nodes to different lists. Whenever such a shrinker needs to count or scan objects from a particular node, it issues commands like this: count = list_lru_count_node(lru, sc->nid); freed = list_lru_walk_node(lru, sc->nid, isolate_func, isolate_arg, &sc->nr_to_scan); where sc is an instance of the shrink_control structure passed to it from vmscan. To simplify this, let's add special list_lru functions to be used by shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which consolidate the nid and nr_to_scan arguments in the shrink_control structure. This will also allow us to avoid patching shrinkers that use list_lru when we make shrink_slab() per-memcg - all we will have to do is extend the shrink_control structure to include the target memcg and make list_lru_shrink_{count,walk} handle this appropriately. Signed-off-by: NVladimir Davydov <vdavydov@parallels.com> Suggested-by: NDave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Greg Thelen <gthelen@google.com> Cc: Glauber Costa <glommer@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 2月, 2015 1 次提交
-
-
由 Kirill A. Shutemov 提交于
Nobody uses it anymore. [akpm@linux-foundation.org: fix filemap_xip.c] Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 2月, 2015 1 次提交
-
-
由 Dave Chinner 提交于
The commit 2d3d0c53 ("xfs: lobotomise xfs_trans_read_buf_map()") left a landmine in the tracing code: trace_xfs_trans_buf_read() is now call on all buffers that are read through this interface rather than just buffers in transactions. For buffers outside transaction context, bp->b_fspriv is null, and so the buf log item tracing functions cannot be called. This causes a NULL pointer dereference in the trace_xfs_trans_buf_read() function when tracing is turned on. cc: <stable@vger.kernel.org> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 06 2月, 2015 1 次提交
-
-
由 Eric Sandeen 提交于
Normally, a statfs syscall reports m_maxicount as f_files (total file nodes in file system) because it is supposed to be the upper limit for dynamically-allocated inodes. It's possible, however, to overshoot imaxpct / m_maxicount. If this happens, we should report the actual number of allocated inodes, which is contained in sb_icount. Add one more adjustment to the statfs code to make this happen. Reported-by: NAlexander Tsvetkov <alexander.tsvetkov@oracle.com> Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 05 2月, 2015 2 次提交
-
-
由 kbuild test robot 提交于
fs/xfs/xfs_ioctl.c:1146:1: sparse: symbol 'xfs_ioctl_setattr_check_projid' was not declared. Should it be static? Also fix xfs_ioctl_setattr_check_extsize at the same time. Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Growfs updates the secondary superblocks using synchronous unlogged buffer writes after committing the updates to the primary superblock. Mark the transaction to the primary superblock as synchronous so that we guarantee it is committed to disk before we update the secondary superblocks. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 02 2月, 2015 12 次提交
-
-
由 Iustin Pop 提交于
Currently, the ioctl handling code for XFS_IOC_FSSETXATTR treats all targets as regular files: it refuses to change the extent size if extents are allocated. This is wrong for directories, as there the extent size is only used as a default for children. The patch fixes this issue and improves validation of flag combinations: - only disallow extent size changes after extents have been allocated for regular files - only allow XFS_XFLAG_EXTSIZE for regular files - only allow XFS_XFLAG_EXTSZINHERIT for directories - automatically clear the flags if the extent size is zero Thanks to Dave Chinner for guidance on the proper fix for this issue. [dchinner: ported changes onto cleanup series. Makes changes clear and obvious.] [dchinner: added comments documenting validity checking rules.] Signed-off-by: NIustin Pop <iustin@k1024.org> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The project ID change checking is one of the few remaining open coded checks in xfs_ioctl_setattr(). Factor it into a helper function so that the setattr code mostly becomes a flow of check and action helpers, making it easier to read and follow. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The extent size hint change checking is fairly complex, so isolate that into it's own function. This simplifies the logic flow of the setattr code, making it easier to read. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Currently XFS_IOCTL_SETXATTR will fail if run in a user namespace as it it not allowed to change project IDs. The current code, however, also prevents any other change being made as well, so things like extent size hints cannot be set in user namespaces. This is wrong, so only disallow access to project IDs and related flags from inside the init namespace. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Now there is only one caller to xfs_ioctl_setattr that uses all the functionality of the function we can kill the behviour mask and start cleaning up the code. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
xfs_ioctl_setxflags doesn't need all of the functionailty in xfs_ioctl_setattr() and now we have separate helper functions that share the checks and modifications that xfs_ioctl_setxflags requires. Hence disaggregate it from xfs_ioctl_setattr() to allow further work to be done on xfs_ioctl_setattr. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The setup of the transaction is done after a random smattering of checks and before another bunch of ioperations specific validity checks. Pull all the preamble out into a helper function that returns a transaction or error. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The setting of the extended flags is down through two separate interfaces, but they are munged together into xfs_ioctl_setattr and make that function far more complex than it needs to be. Separate it out into a helper function along with all the other common inode changes and transaction manipulations in xfs_ioctl_setattr(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
It is set if the filp is set ot non-blocking, but the flag is not used anywhere. Hence we can kill it. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Back in the days when the direct I/O ->end_io callback could be called from interrupt context for AIO we needed a structure to hand off to the workqueue, and reused the ioend structure for this purpose. These days ->end_io is always called from user or workqueue context, which allows us to avoid this memory allocation and simplify the code significantly. [dchinner: removed now unused xfs_finish_ioend_sync() function after Brian Foster did an initial review. ] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Wang, Yalin 提交于
Change kmem_free to use kvfree() generic function, remove the duplicated code. Signed-off-by: NYalin Wang <yalin.wang@sonymobile.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
This logic is duplicated in xfs_file_fallocate and xfs_ioc_space, and we'll need another copy of it for pNFS block support. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 30 1月, 2015 1 次提交
-
-
由 Jan Kara 提交于
Split ->set_xstate callback into two callbacks - one for turning quotas on (->quota_enable) and one for turning quotas off (->quota_disable). That way we don't have to pass quotactl command into the callback which seems cleaner. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 28 1月, 2015 1 次提交
-
-
由 Jan Kara 提交于
Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which tracks space limits and usage in 512-byte blocks. However VFS quotas track usage in bytes (as some filesystems require that) and we need to somehow pass this information. Upto now it wasn't a problem because we didn't do any unit conversion (thus VFS quota routines happily stuck number of bytes into d_bcount field of struct fd_disk_quota). Only if you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone tried this but reportedly some Samba users hit the problem in practice. So when we want interfaces compatible we need to fix this. We bite the bullet and define another quota structure used for passing information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have to have more conversion routines in fs/quota/quota.c and another copying of quota structure slows down getting of quota information by about 2% but it seems cleaner than overloading e.g. units of d_bcount to bytes. CC: stable@vger.kernel.org Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 22 1月, 2015 11 次提交
-
-
由 Brian Foster 提交于
xfs_compat_attrmulti_by_handle() calls memdup_user() which returns a negative error code. The error code is negated by the caller and thus incorrectly converted to a positive error code. Remove the error negation such that the negative error is passed correctly back up to userspace. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
When the superblock is modified in a transaction, the commonly modified fields are not actually copied to the superblock buffer to avoid the buffer lock becoming a serialisation point. However, there are some other operations that modify the superblock fields within the transaction that don't directly log to the superblock but rely on the changes to be applied during the transaction commit (to minimise the buffer lock hold time). When we do this, we fail to mark the buffer log item as being a superblock buffer and that can lead to the buffer not being marked with the corect type in the log and hence causing recovery issues. Fix it by setting the type correctly, similar to xfs_mod_sb()... cc: <stable@vger.kernel.org> # 3.10 to current Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Conversion from local to extent format does not set the buffer type correctly on the new extent buffer when a symlink data is moved out of line. Fix the symlink code and leave a comment in the generic bmap code reminding us that the format-specific data copy needs to set the destination buffer type appropriately. cc: <stable@vger.kernel.org> # 3.10 to current Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
This leads to log recovery throwing errors like: XFS (md0): Mounting V5 Filesystem XFS (md0): Starting recovery (logdev: internal) XFS (md0): Unknown buffer type 0! XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1 ffff8800ffc53800: 58 41 47 49 ..... Which is the AGI buffer magic number. Ensure that we set the type appropriately in both unlink list addition and removal. cc: <stable@vger.kernel.org> # 3.10 to current Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Jan Kara reported that log recovery was finding buffers with invalid types in them. This should not happen, and indicates a bug in the logging of buffers. To catch this, add asserts to the buffer formatting code to ensure that the buffer type is in range when the transaction is committed. We don't set a type on buffers being marked stale - they are not going to get replayed, the format item exists only for recovery to be able to prevent replay of the buffer, so the type does not matter. Hence that needs special casing here. cc: <stable@vger.kernel.org> # 3.10 to current Reported-by: NJan Kara <jack@suse.cz> Tested-by: NJan Kara <jack@suse.cz> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We currently have to ensure that every time we update sb_features2 that we update sb_bad_features2. Now that we log and format the superblock in it's entirety we actually don't have to care because we can simply update the sb_bad_features2 when we format it into the buffer. This removes the need for anything but the mount and superblock formatting code to care about sb_bad_features2, and hence removes the possibility that we forget to update bad_features2 when necessary in the future. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We now have several superblock loggin functions that are identical except for the transaction reservation and whether it shoul dbe a synchronous transaction or not. Consolidate these all into a single function, a single reserveration and a sync flag and call it xfs_sync_sb(). Also, xfs_mod_sb() is not really a modification function - it's the operation of logging the superblock buffer. hence change the name of it to reflect this. Note that we have to change the mp->m_update_flags that are passed around at mount time to a boolean simply to indicate a superblock update is needed. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
When we log changes to the superblock, we first have to write them to the on-disk buffer, and then log that. Right now we have a complex bitfield based arrangement to only write the modified field to the buffer before we log it. This used to be necessary as a performance optimisation because we logged the superblock buffer in every extent or inode allocation or freeing, and so performance was extremely important. We haven't done this for years, however, ever since the lazy superblock counters pulled the superblock logging out of the transaction commit fast path. Hence we have a bunch of complexity that is not necessary that makes writing the in-core superblock to disk much more complex than it needs to be. We only need to log the superblock now during management operations (e.g. during mount, unmount or quota control operations) so it is not a performance critical path anymore. As such, remove the complex field based logging mechanism and replace it with a simple conversion function similar to what we use for all other on-disk structures. This means we always log the entirity of the superblock, but again because we rarely modify the superblock this is not an issue for log bandwidth or CPU time. Indeed, if we do log the superblock frequently, delayed logging will minimise the impact of this overhead. [Fixed gquota/pquota inode sharing regression noticed by bfoster.] Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Jan Kara 提交于
xfs_fs_get_xstate() and xfs_fs_get_xstatev() check whether there's quota running before calling xfs_qm_scall_getqstat() or xfs_qm_scall_getqstatv(). Thus we are certain that superblock supports quota and xfs_sb_version_hasquota() check is pointless. Similarly we know that when quota is running, mp->m_quotainfo will be allocated. Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
'flags' have XFS_ALL_QUOTA_ACCT cleared immediately on function entry. There's no point in checking these bits later in the function. Also because we check something is going to change, we know some enforcement bits are being added and thus there's no point in testing that later. Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
Q_XQUOTARM is never passed to xfs_fs_set_xstate() so remove the test. Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 09 1月, 2015 7 次提交
-
-
由 Nicholas Mc Guire 提交于
try_wait_for_completion returns bool so the wrapper function xfs_dqflock_nowait should probably also return bool and not int. Signed-off-by: NNicholas Mc Guire <der.herr@hofr.at> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
The code is already ready for it, and the pnfs layout commit code expects to be able to pass a larger than 32-bit argument. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
xfsbufd_centisecs and age_buffer_centisecs were due for removal in 3.14. We forgot to do that - it's now well past time to remove these deprecated, unused sysctls. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
This function is used libxfs code, but is implemented separately in userspace. Move the function prototype to xfs_bmap.h so that the prototype is shared even if the implementations aren't. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
It no long is used for stack splits, so strip the kernel workqueue bits from it and push it back into libxfs/xfs_bmap.h so that it can be shared with the userspace code. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The types used by the core XFS code are common between kernel and userspace. xfs_types.h is duplicated in both kernel and userspace, so move it to libxfs along with all the other shared code. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Ioctl API definitions are shared with userspace, so move the header file that defines them all to libxfs along with all the other code shared with userspace. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 24 12月, 2014 2 次提交
-
-
由 Jan Kara 提交于
Currently when we modify sb_features2, we store the same value also in sb_bad_features2. However in most places we forget to mark field sb_bad_features2 for logging and thus it can happen that a change to it is lost. This results in an inconsistent sb_features2 and sb_bad_features2 fields e.g. after xfstests test xfs/187. Fix the problem by changing XFS_SB_FEATURES2 to actually mean both sb_features2 and sb_bad_features2 fields since this is always what we want to log. This isn't ideal because the fact that XFS_SB_FEATURES2 means two fields could cause some problem in future however the code is hopefully less error prone that it is now. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
xfs_warn() and friends add a newline by default, but some messages add another one. Particularly for the failing write message below, this can waste a lot of console real estate! Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-