• Y
    mm, fadvise: improve the expensive remote LRU cache draining after FADV_DONTNEED · eb1d7a65
    Yafang Shao 提交于
    Our users reported that there're some random latency spikes when their RT
    process is running.  Finally we found that latency spike is caused by
    FADV_DONTNEED.  Which may call lru_add_drain_all() to drain LRU cache on
    remote CPUs, and then waits the per-cpu work to complete.  The wait time
    is uncertain, which may be tens millisecond.
    
    That behavior is unreasonable, because this process is bound to a specific
    CPU and the file is only accessed by itself, IOW, there should be no
    pagecache pages on a per-cpu pagevec of a remote CPU.  That unreasonable
    behavior is partially caused by the wrong comparation of the number of
    invalidated pages and the number of the target.  For example,
    
            if (count < (end_index - start_index + 1))
    
    The count above is how many pages were invalidated in the local CPU, and
    (end_index - start_index + 1) is how many pages should be invalidated.
    The usage of (end_index - start_index + 1) is incorrect, because they are
    virtual addresses, which may not mapped to pages.  Besides that, there may
    be holes between start and end.  So we'd better check whether there are
    still pages on per-cpu pagevec after drain the local cpu, and then decide
    whether or not to call lru_add_drain_all().
    
    After I applied it with a hotfix to our production environment, most of
    the lru_add_drain_all() can be avoided.
    Suggested-by: NMel Gorman <mgorman@suse.de>
    Signed-off-by: NYafang Shao <laoar.shao@gmail.com>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Acked-by: NMel Gorman <mgorman@suse.de>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Link: https://lkml.kernel.org/r/20200923133318.14373-1-laoar.shao@gmail.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    eb1d7a65
truncate.c 27.3 KB