• M
    netvm: prevent a stream-specific deadlock · c76562b6
    Mel Gorman 提交于
    This patch series is based on top of "Swap-over-NBD without deadlocking
    v15" as it depends on the same reservation of PF_MEMALLOC reserves logic.
    
    When a user or administrator requires swap for their application, they
    create a swap partition and file, format it with mkswap and activate it
    with swapon.  In diskless systems this is not an option so if swap if
    required then swapping over the network is considered.  The two likely
    scenarios are when blade servers are used as part of a cluster where the
    form factor or maintenance costs do not allow the use of disks and thin
    clients.
    
    The Linux Terminal Server Project recommends the use of the Network Block
    Device (NBD) for swap but this is not always an option.  There is no
    guarantee that the network attached storage (NAS) device is running Linux
    or supports NBD.  However, it is likely that it supports NFS so there are
    users that want support for swapping over NFS despite any performance
    concern.  Some distributions currently carry patches that support swapping
    over NFS but it would be preferable to support it in the mainline kernel.
    
    Patch 1 avoids a stream-specific deadlock that potentially affects TCP.
    
    Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC
    	reserves.
    
    Patch 3 adds three helpers for filesystems to handle swap cache pages.
    	For example, page_file_mapping() returns page->mapping for
    	file-backed pages and the address_space of the underlying
    	swap file for swap cache pages.
    
    Patch 4 adds two address_space_operations to allow a filesystem
    	to pin all metadata relevant to a swapfile in memory. Upon
    	successful activation, the swapfile is marked SWP_FILE and
    	the address space operation ->direct_IO is used for writing
    	and ->readpage for reading in swap pages.
    
    Patch 5 notes that patch 3 is bolting
    	filesystem-specific-swapfile-support onto the side and that
    	the default handlers have different information to what
    	is available to the filesystem. This patch refactors the
    	code so that there are generic handlers for each of the new
    	address_space operations.
    
    Patch 6 adds an API to allow a vector of kernel addresses to be
    	translated to struct pages and pinned for IO.
    
    Patch 7 adds support for using highmem pages for swap by kmapping
    	the pages before calling the direct_IO handler.
    
    Patch 8 updates NFS to use the helpers from patch 3 where necessary.
    
    Patch 9 avoids setting PF_private on PG_swapcache pages within NFS.
    
    Patch 10 implements the new swapfile-related address_space operations
    	for NFS and teaches the direct IO handler how to manage
    	kernel addresses.
    
    Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
    	where appropriate.
    
    Patch 12 fixes a NULL pointer dereference that occurs when using
    	swap-over-NFS.
    
    With the patches applied, it is possible to mount a swapfile that is on an
    NFS filesystem.  Swap performance is not great with a swap stress test
    taking roughly twice as long to complete than if the swap device was
    backed by NBD.
    
    This patch: netvm: prevent a stream-specific deadlock
    
    It could happen that all !SOCK_MEMALLOC sockets have buffered so much data
    that we're over the global rmem limit.  This will prevent SOCK_MEMALLOC
    buffers from receiving data, which will prevent userspace from running,
    which is needed to reduce the buffered data.
    
    Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.  Once
    this change it applied, it is important that sockets that set
    SOCK_MEMALLOC do not clear the flag until the socket is being torn down.
    If this happens, a warning is generated and the tokens reclaimed to avoid
    accounting errors until the bug is fixed.
    
    [davem@davemloft.net: Warning about clearing SOCK_MEMALLOC]
    Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: NMel Gorman <mgorman@suse.de>
    Acked-by: NDavid S. Miller <davem@davemloft.net>
    Acked-by: NRik van Riel <riel@redhat.com>
    Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
    Cc: Neil Brown <neilb@suse.de>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Mike Christie <michaelc@cs.wisc.edu>
    Cc: Eric B Munson <emunson@mgebm.net>
    Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
    Cc: Mel Gorman <mgorman@suse.de>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    c76562b6
tcp_input.c 173.9 KB