    memcg: accounting for objects allocated for new netdevice · 425b9c7f
    Committed by Vasily Averin
    Creating a new netdevice allocates at least ~50Kb of memory for various
    kernel objects, but only ~5Kb of it is accounted to memcg. As a result,
    creating an unlimited number of netdevices inside a memcg-limited
    container is not constrained by the memcg limits, consumes a significant
    part of the host's memory, and can cause a global OOM that leads to
    random kills of host processes.
    
    The main consumers of non-accounted memory are:
     ~10Kb   80+ kernfs nodes
     ~6Kb    ipv6_add_dev() allocations
      6Kb    __register_sysctl_table() allocations
      4Kb    neigh_sysctl_register() allocations
      4Kb    __devinet_sysctl_register() allocations
      4Kb    __addrconf_sysctl_register() allocations
    
    Accounting these objects raises the share of memcg-accounted memory
    to 60-70% (~38Kb accounted vs ~54Kb total for a dummy netdevice on a
    typical VM with the default Fedora 35 kernel), which should be enough
    to give the host some protection from misuse inside a container.
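
    As a rough cross-check of these shares, the rounded figures quoted in
    this message can be tallied as below. This is only an illustrative
    sketch: the struct and the helper names (newly_accounted_kb,
    share_percent) are hypothetical, the sizes are the approximate values
    from the list above, and rounded inputs will not reproduce the measured
    ~38Kb exactly.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate per-netdevice sizes (in Kb) as quoted in this commit message. */
struct consumer {
    const char *name;
    unsigned int kb;
    int is_kernfs;  /* kernfs accounting is not enabled by this patch */
};

static const struct consumer consumers[] = {
    { "kernfs nodes",                 10, 1 },
    { "ipv6_add_dev()",                6, 0 },
    { "__register_sysctl_table()",     6, 0 },
    { "neigh_sysctl_register()",       4, 0 },
    { "__devinet_sysctl_register()",   4, 0 },
    { "__addrconf_sysctl_register()",  4, 0 },
};

/* Sum the listed consumers, optionally leaving kernfs nodes out. */
static unsigned int newly_accounted_kb(int include_kernfs)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < sizeof(consumers) / sizeof(consumers[0]); i++) {
        if (consumers[i].is_kernfs && !include_kernfs)
            continue;
        sum += consumers[i].kb;
    }
    return sum;
}

/* Integer percentage of the total, rounded down. */
static unsigned int share_percent(unsigned int accounted_kb,
                                  unsigned int total_kb)
{
    assert(total_kb > 0);
    return accounted_kb * 100 / total_kb;
}
```

    With the ~5Kb that is already accounted, share_percent(5 +
    newly_accounted_kb(1), 54) gives roughly 72% when kernfs nodes are
    counted, and share_percent(5 + newly_accounted_kb(0), 54) gives roughly
    53% when they are not.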
    
    Other related objects are quite small and can be left unaccounted
    to minimize the expected performance degradation.
    
    The ~300 bytes of percpu allocation for struct ipstats_mib in
    snmp6_alloc_dev() deserve a separate mention: on huge multi-CPU nodes
    it can become the main consumer of memory.
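
    To illustrate how this scales, the sketch below multiplies the quoted
    ~300 bytes by a CPU count. The helper name percpu_total_bytes and the
    CPU counts used are hypothetical; the real footprint depends on the
    kernel's percpu allocator and on the number of possible CPUs.

```c
#include <assert.h>
#include <stddef.h>

/* ~300 bytes per CPU for struct ipstats_mib, as quoted in this message. */
#define IPSTATS_MIB_PERCPU_BYTES 300u

/* Total bytes consumed by one percpu object across nr_cpus CPUs. */
static size_t percpu_total_bytes(size_t bytes_per_cpu, unsigned int nr_cpus)
{
    assert(nr_cpus > 0);
    return bytes_per_cpu * nr_cpus;
}
```

    For example, percpu_total_bytes(IPSTATS_MIB_PERCPU_BYTES, 4) is 1200
    bytes, while an assumed 256-CPU node reaches 76800 bytes (75 KiB) per
    netdevice, more than the ~54Kb total measured above.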
    
    This patch does not enable kernfs accounting, as that affects
    other parts of the kernel and should be discussed separately.
    However, even without kernfs, this patch significantly improves the
    current situation and accounts for more than half of all netdevice
    allocations.
    
    Signed-off-by: Vasily Averin <vvs@openvz.org>
    Acked-by: Luis Chamberlain <mcgrof@kernel.org>
    Link: https://lore.kernel.org/r/354a0a5f-9ec3-a25c-3215-304eab2157bc@openvz.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
proc_sysctl.c 47.9 KB