• R
    writeback, cgroup: do not switch inodes with I_WILL_FREE flag · 4ade5867
    Roman Gushchin 提交于
    Patch series "cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups", v9.
    
    When an inode is getting dirty for the first time it's associated with a
    wb structure (see __inode_attach_wb()).  It can later be switched to
    another wb (if e.g.  some other cgroup is writing a lot of data to the
    same inode), but otherwise stays attached to the original wb until being
    reclaimed.
    
    The problem is that the wb structure holds a reference to the original
    memory and blkcg cgroups.  So if an inode has been dirty once and later is
    actively used in read-only mode, it has a good chance to pin down the
    original memory and blkcg cgroups forever.  This is often the case with
    services bringing data for other services, e.g.  updating some rpm
    packages.
    
    In the real life it becomes a problem due to a large size of the memcg
    structure, which can easily be 1000x larger than an inode.  Also a really
    large number of dying cgroups can raise different scalability issues, e.g.
    making the memory reclaim costly and less effective.
    
    To solve the problem inodes should be eventually detached from the
    corresponding writeback structure.  It's inefficient to do it after every
    writeback completion.  Instead it can be done whenever the original memory
    cgroup is offlined and writeback structure is getting killed.  Scanning
    over a (potentially long) list of inodes and detach them from the
    writeback structure can take quite some time.  To avoid scanning all
    inodes, attached inodes are kept on a new list (b_attached).  To make it
    less noticeable to a user, the scanning and switching is performed from a
    work context.
    
    Big thanks to Jan Kara, Dennis Zhou, Hillf Danton and Tejun Heo for their
    ideas and contribution to this patchset.
    
    This patch (of 8):
    
    If an inode's state has I_WILL_FREE flag set, the inode will be freed
    soon, so there is no point in trying to switch the inode to a different
    cgwb.
    
    I_WILL_FREE was ignored since the introduction of the inode switching, so
    it looks like it doesn't lead to any noticeable issues for a user.  This
    is why the patch is not intended for a stable backport.
    
    Link: https://lkml.kernel.org/r/20210608230225.2078447-1-guro@fb.com
    Link: https://lkml.kernel.org/r/20210608230225.2078447-2-guro@fb.comSigned-off-by: NRoman Gushchin <guro@fb.com>
    Suggested-by: NJan Kara <jack@suse.cz>
    Acked-by: NTejun Heo <tj@kernel.org>
    Reviewed-by: NJan Kara <jack@suse.cz>
    Acked-by: NDennis Zhou <dennis@kernel.org>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Dave Chinner <dchinner@redhat.com>
    Cc: Jan Kara <jack@suse.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    4ade5867
fs-writeback.c 76.1 KB