- 04 4月, 2009 1 次提交
-
-
由 Kumar Gala 提交于
Commit f4112de6 ("mm: introduce debug_kmap_atomic") broke PPC builds with CONFIG_HIGHMEM=y: CC init/main.o In file included from include/linux/highmem.h:25, from include/linux/pagemap.h:11, from include/linux/mempolicy.h:63, from init/main.c:53: arch/powerpc/include/asm/highmem.h: In function 'kmap_atomic_prot': arch/powerpc/include/asm/highmem.h:98: error: implicit declaration of function 'debug_kmap_atomic' In file included from include/linux/pagemap.h:11, from include/linux/mempolicy.h:63, from init/main.c:53: include/linux/highmem.h: At top level: include/linux/highmem.h:196: warning: conflicting types for 'debug_kmap_atomic' include/linux/highmem.h:196: error: static declaration of 'debug_kmap_atomic' follows non-static declaration include/asm/highmem.h:98: error: previous implicit declaration of 'debug_kmap_atomic' was here make[1]: *** [init/main.o] Error 1 make: *** [init] Error 2 Signed-off-by: NKumar Gala <galak@kernel.crashing.org> Acked-by: NAkinobu Mita <akinobu.mita@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 4月, 2009 39 次提交
-
-
由 David Howells 提交于
nfs_readpage_async() needs to be non-static so that it can be used as a fallback for the local on-disk caching should an EIO crop up when reading the cache. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Add some new NFS I/O counters for FS-Cache doing things for NFS. A new line is emitted into /proc/pid/mountstats if caching is enabled that looks like: fsc: <rok> <rfl> <wok> <wfl> <unc> Where <rok> is the number of pages read successfully from the cache, <rfl> is the number of failed page reads against the cache, <wok> is the number of successful page writes to the cache, <wfl> is the number of failed page writes to the cache, and <unc> is the number of NFS pages that have been disconnected from the cache. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Bind data storage objects in the local cache to NFS inodes. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Define and create superblock-level cache index objects (as managed by nfs_server structs). Each superblock object is created in a server level index object and is itself an index into which inode-level objects are inserted. Ideally there would be one superblock-level object per server, and the former would be folded into the latter; however, since the "nosharecache" option exists this isn't possible. The superblock object key is a sequence consisting of: (1) Certain superblock s_flags. (2) Various connection parameters that serve to distinguish superblocks for sget(). (3) The volume FSID. (4) The security flavour. (5) The uniquifier length. (6) The uniquifier text. This is normally an empty string, unless the fsc=xyz mount option was used to explicitly specify a uniquifier. The key blob is of variable length, depending on the length of (6). The superblock object is given no coherency data to carry in the auxiliary data permitted by the cache. It is assumed that the superblock is always coherent. This patch also adds uniquification handling such that two otherwise identical superblocks, at least one of which is marked "nosharecache", won't end up trying to share the on-disk cache. It will be possible to manually provide a uniquifier through a mount option with a later patch to avoid the error otherwise produced. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Define and create server-level cache index objects (as managed by nfs_client structs). Each server object is created in the NFS top-level index object and is itself an index into which superblock-level objects are inserted. Ideally there would be one superblock-level object per server, and the former would be folded into the latter; however, since the "nosharecache" option exists this isn't possible. The server object key is a sequence consisting of: (1) NFS version (2) Server address family (eg: AF_INET or AF_INET6) (3) Server port. (4) Server IP address. The key blob is of variable length, depending on the length of (4). The server object is given no coherency data to carry in the auxiliary data permitted by the cache. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Add FS-Cache option bit to nfs_server struct. This is set to indicate local on-disk caching is enabled for a particular superblock. Also add debug bit for local caching operations. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Add a function to install a monitor on the page lock waitqueue for a particular page, thus allowing the page being unlocked to be detected. This is used by CacheFiles to detect read completion on a page in the backing filesystem so that it can then copy the data to the waiting netfs page. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Implement the data I/O part of the FS-Cache netfs API. The documentation and API header file were added in a previous patch. This patch implements the following functions for the netfs to call: (*) fscache_attr_changed(). Indicate that the object has changed its attributes. The only attribute currently recorded is the file size. Only pages within the set file size will be stored in the cache. This operation is submitted for asynchronous processing, and will return immediately. It will return -ENOMEM if an out of memory error is encountered, -ENOBUFS if the object is not actually cached, or 0 if the operation is successfully queued. (*) fscache_read_or_alloc_page(). (*) fscache_read_or_alloc_pages(). Request data be fetched from the disk, and allocate internal metadata to track the netfs pages and reserve disk space for unknown pages. These operations perform semi-asynchronous data reads. Upon returning they will indicate which pages they think can be retrieved from disk, and will have set in progress attempts to retrieve those pages. These will return, in order of preference, -ENOMEM on memory allocation error, -ERESTARTSYS if a signal interrupted proceedings, -ENODATA if one or more requested pages are not yet cached, -ENOBUFS if the object is not actually cached or if there isn't space for future pages to be cached on this object, or 0 if successful. In the case of the multipage function, the pages for which reads are set in progress will be removed from the list and the page count decreased appropriately. If any read operations should fail, the completion function will be given an error, and will also be passed contextual information to allow the netfs to fall back to querying the server for the absent pages. For each successful read, the page completion function will also be called. Any pages subsequently tracked by the cache will have PG_fscache set upon them on return. fscache_uncache_page() must be called for such pages. If supplied by the netfs, the mark_pages_cached() cookie op will be invoked for any pages now tracked. (*) fscache_alloc_page(). Allocate internal metadata to track a netfs page and reserve disk space. This will return -ENOMEM on memory allocation error, -ERESTARTSYS on signal, -ENOBUFS if the object isn't cached, or there isn't enough space in the cache, or 0 if successful. Any pages subsequently tracked by the cache will have PG_fscache set upon them on return. fscache_uncache_page() must be called for such pages. If supplied by the netfs, the mark_pages_cached() cookie op will be invoked for any pages now tracked. (*) fscache_write_page(). Request data be stored to disk. This may only be called on pages that have been read or alloc'd by the above three functions and have not yet been uncached. This will return -ENOMEM on memory allocation error, -ERESTARTSYS on signal, -ENOBUFS if the object isn't cached, or there isn't immediately enough space in the cache, or 0 if successful. On a successful return, this operation will have queued the page for asynchronous writing to the cache. The page will be returned with PG_fscache_write set until the write completes one way or another. The caller will not be notified if the write fails due to an I/O error. If that happens, the object will become available and all pending writes will be aborted. Note that the cache may batch up page writes, and so it may take a while to get around to writing them out. The caller must assume that until PG_fscache_write is cleared the page is use by the cache. Any changes made to the page may be reflected on disk. The page may even be under DMA. (*) fscache_uncache_page(). Indicate that the cache should stop tracking a page previously read or alloc'd from the cache. If the page was alloc'd only, but unwritten, it will not appear on disk. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Implement the cookie management part of the FS-Cache netfs client API. The documentation and API header file were added in a previous patch. This patch implements the following three functions: (1) fscache_acquire_cookie(). Acquire a cookie to represent an object to the netfs. If the object in question is a non-index object, then that object and its parent indices will be created on disk at this point if they don't already exist. Index creation is deferred because an index may reside in multiple caches. (2) fscache_relinquish_cookie(). Retire or release a cookie previously acquired. At this point, the object on disk may be destroyed. (3) fscache_update_cookie(). Update the in-cache representation of a cookie. This is used to update the auxiliary data for coherency management purposes. With this patch it is possible to have a netfs instruct a cache backend to look up, validate and create metadata on disk and to destroy it again. The ability to actually store and retrieve data in the objects so created is added in later patches. Note that these functions will never return an error. _All_ errors are handled internally to FS-Cache. The worst that can happen is that fscache_acquire_cookie() may return a NULL pointer - which is considered a negative cookie pointer and can be passed back to any function that takes a cookie without harm. A negative cookie pointer merely suppresses caching at that level. The stub in linux/fscache.h will detect inline the negative cookie pointer and abort the operation as fast as possible. This means that the compiler doesn't have to set up for a call in that case. See the documentation in Documentation/filesystems/caching/netfs-api.txt for more information. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Add functions to register and unregister a network filesystem or other client of the FS-Cache service. This allocates and releases the cookie representing the top-level index for a netfs, and makes it available to the netfs. If the FS-Cache facility is disabled, then the calls are optimised away at compile time. Note that whilst this patch may appear to work with FS-Cache enabled and a netfs attempting to use it, it will leak the cookie it allocates for the netfs as fscache_relinquish_cookie() is implemented in a later patch. This will cause the slab code to emit a warning when the module is removed. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Implement two features of FS-Cache: (1) The ability to request and release cache tags - names by which a cache may be known to a netfs, and thus selected for use. (2) An internal function by which a cache is selected by consulting the netfs, if the netfs wishes to be consulted. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Make FS-Cache create its /proc interface and present various statistical information through it. Also provide the functions for updating this information. These features are enabled by: CONFIG_FSCACHE_PROC CONFIG_FSCACHE_STATS CONFIG_FSCACHE_HISTOGRAM The /proc directory for FS-Cache is also exported so that caching modules can add their own statistics there too. The FS-Cache module is loadable at this point, and the statistics files can be examined by userspace: cat /proc/fs/fscache/stats cat /proc/fs/fscache/histogram Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Add the API for a generic facility (FS-Cache) by which caches may declare them selves open for business, and may obtain work to be done from network filesystems. The header file is included by: #include <linux/fscache-cache.h> Documentation for the API is also added to: Documentation/filesystems/caching/backend-api.txt This API is not usable without the implementation of the utility functions which will be added in further patches. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Add the API for a generic facility (FS-Cache) by which filesystems (such as AFS or NFS) may call on local caching capabilities without having to know anything about how the cache works, or even if there is a cache: +---------+ | | +--------------+ | NFS |--+ | | | | | +-->| CacheFS | +---------+ | +----------+ | | /dev/hda5 | | | | | +--------------+ +---------+ +-->| | | | | | |--+ | AFS |----->| FS-Cache | | | | |--+ +---------+ +-->| | | | | | | +--------------+ +---------+ | +----------+ | | | | | | +-->| CacheFiles | | ISOFS |--+ | /var/cache | | | +--------------+ +---------+ General documentation and documentation of the netfs specific API are provided in addition to the header files. As this patch stands, it is possible to build a filesystem against the facility and attempt to use it. All that will happen is that all requests will be immediately denied as if no cache is present. Further patches will implement the core of the facility. The facility will transfer requests from networking filesystems to appropriate caches if possible, or else gracefully deny them. If this facility is disabled in the kernel configuration, then all its operations will trivially reduce to nothing during compilation. WHY NOT I_MAPPING? ================== I have added my own API to implement caching rather than using i_mapping to do this for a number of reasons. These have been discussed a lot on the LKML and CacheFS mailing lists, but to summarise the basics: (1) Most filesystems don't do hole reportage. Holes in files are treated as blocks of zeros and can't be distinguished otherwise, making it difficult to distinguish blocks that have been read from the network and cached from those that haven't. (2) The backing inode must be fully populated before being exposed to userspace through the main inode because the VM/VFS goes directly to the backing inode and does not interrogate the front inode's VM ops. Therefore: (a) The backing inode must fit entirely within the cache. (b) All backed files currently open must fit entirely within the cache at the same time. (c) A working set of files in total larger than the cache may not be cached. (d) A file may not grow larger than the available space in the cache. (e) A file that's open and cached, and remotely grows larger than the cache is potentially stuffed. (3) Writes go to the backing filesystem, and can only be transferred to the network when the file is closed. (4) There's no record of what changes have been made, so the whole file must be written back. (5) The pages belong to the backing filesystem, and all metadata associated with that page are relevant only to the backing filesystem, and not anything stacked atop it. OVERVIEW ======== FS-Cache provides (or will provide) the following facilities: (1) Caches can be added / removed at any time, even whilst in use. (2) Adds a facility by which tags can be used to refer to caches, even if they're not available yet. (3) More than one cache can be used at once. Caches can be selected explicitly by use of tags. (4) The netfs is provided with an interface that allows either party to withdraw caching facilities from a file (required for (1)). (5) A netfs may annotate cache objects that belongs to it. This permits the storage of coherency maintenance data. (6) Cache objects will be pinnable and space reservations will be possible. (7) The interface to the netfs returns as few errors as possible, preferring rather to let the netfs remain oblivious. (8) Cookies are used to represent indices, files and other objects to the netfs. The simplest cookie is just a NULL pointer - indicating nothing cached there. (9) The netfs is allowed to propose - dynamically - any index hierarchy it desires, though it must be aware that the index search function is recursive, stack space is limited, and indices can only be children of indices. (10) Indices can be used to group files together to reduce key size and to make group invalidation easier. The use of indices may make lookup quicker, but that's cache dependent. (11) Data I/O is effectively done directly to and from the netfs's pages. The netfs indicates that page A is at index B of the data-file represented by cookie C, and that it should be read or written. The cache backend may or may not start I/O on that page, but if it does, a netfs callback will be invoked to indicate completion. The I/O may be either synchronous or asynchronous. (12) Cookies can be "retired" upon release. At this point FS-Cache will mark them as obsolete and the index hierarchy rooted at that point will get recycled. (13) The netfs provides a "match" function for index searches. In addition to saying whether a match was made or not, this can also specify that an entry should be updated or deleted. FS-Cache maintains a virtual index tree in which all indices, files, objects and pages are kept. Bits of this tree may actually reside in one or more caches. FSDEF | +------------------------------------+ | | NFS AFS | | +--------------------------+ +-----------+ | | | | homedir mirror afs.org redhat.com | | | +------------+ +---------------+ +----------+ | | | | | | 00001 00002 00007 00125 vol00001 vol00002 | | | | | +---+---+ +-----+ +---+ +------+------+ +-----+----+ | | | | | | | | | | | | | PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak | | PG0 +-------+ | | 00001 00003 | +---+---+ | | | PG0 PG1 PG2 In the example above, two netfs's can be seen to be backed: NFS and AFS. These have different index hierarchies: (*) The NFS primary index will probably contain per-server indices. Each server index is indexed by NFS file handles to get data file objects. Each data file objects can have an array of pages, but may also have further child objects, such as extended attributes and directory entries. Extended attribute objects themselves have page-array contents. (*) The AFS primary index contains per-cell indices. Each cell index contains per-logical-volume indices. Each of volume index contains up to three indices for the read-write, read-only and backup mirrors of those volumes. Each of these contains vnode data file objects, each of which contains an array of pages. The very top index is the FS-Cache master index in which individual netfs's have entries. Any index object may reside in more than one cache, provided it only has index children. Any index with non-index object children will be assumed to only reside in one cache. The FS-Cache overview can be found in: Documentation/filesystems/caching/fscache.txt The netfs API to FS-Cache can be found in: Documentation/filesystems/caching/netfs-api.txt Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Recruit a page flag to aid in cache management. The following extra flag is defined: (1) PG_fscache (PG_private_2) The marked page is backed by a local cache and is pinning resources in the cache driver. If PG_fscache is set, then things that checked for PG_private will now also check for that. This includes things like truncation and page invalidation. The function page_has_private() had been added to make the checks for both PG_private and PG_private_2 at the same time. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
The attached patch causes read_cache_pages() to release page-private data on a page for which add_to_page_cache() fails. If the filler function fails, then the problematic page is left attached to the pagecache (with appropriate flags set, one presumes) and the remaining to-be-attached pages are invalidated and discarded. This permits pages with caching references associated with them to be cleaned up. The invalidatepage() address space op is called (indirectly) to do the honours. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Document the slow work thread pool. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Make the slow work pool configurable through /proc/sys/kernel/slow-work. (*) /proc/sys/kernel/slow-work/min-threads The minimum number of threads that should be in the pool as long as it is in use. This may be anywhere between 2 and max-threads. (*) /proc/sys/kernel/slow-work/max-threads The maximum number of threads that should in the pool. This may be anywhere between min-threads and 255 or NR_CPUS * 2, whichever is greater. (*) /proc/sys/kernel/slow-work/vslow-percentage The percentage of active threads in the pool that may be used to execute very slow work items. This may be between 1 and 99. The resultant number is bounded to between 1 and one fewer than the number of active threads. This ensures there is always at least one thread that can process very slow work items, and always at least one thread that won't. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 David Howells 提交于
Create a dynamically sized pool of threads for doing very slow work items, such as invoking mkdir() or rmdir() - things that may take a long time and may sleep, holding mutexes/semaphores and hogging a thread, and are thus unsuitable for workqueues. The number of threads is always at least a settable minimum, but more are started when there's more work to do, up to a limit. Because of the nature of the load, it's not suitable for a 1-thread-per-CPU type pool. A system with one CPU may well want several threads. This is used by FS-Cache to do slow caching operations in the background, such as looking up, creating or deleting cache objects. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Acked-by: NSteve Dickson <steved@redhat.com> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Tested-by: NDaire Byrne <Daire.Byrne@framestore.com>
-
由 Theodore Ts'o 提交于
In data=writeback mode, start an asynchronous flush when closing a file which had been previously truncated down to zero. This lowers the probability of data loss in the case of applications that attempt to replace a file using truncate. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Robin Holt 提交于
Pass the original flags to rwlock arch-code, so that it can re-enable interrupts if implemented for that architecture. Initially, make __raw_read_lock_flags and __raw_write_lock_flags stubs which just do the same thing as non-flags variants. Signed-off-by: NPetr Tesarik <ptesarik@suse.cz> Signed-off-by: NRobin Holt <holt@sgi.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Robin Holt 提交于
SGI has observed that on large systems, interrupts are not serviced for a long period of time when waiting for a rwlock. The following patch series re-enables irqs while waiting for the lock, resembling the code which is already there for spinlocks. I only made the ia64 version, because the patch adds some overhead to the fast path. I assume there is currently no demand to have this for other architectures, because the systems are not so large. Of course, the possibility to implement raw_{read|write}_lock_flags for any architecture is still there. This patch: The new macro LOCK_CONTENDED_FLAGS expands to the correct implementation depending on the config options, so that IRQ's are re-enabled when possible, but they remain disabled if CONFIG_LOCKDEP is set. Signed-off-by: NPetr Tesarik <ptesarik@suse.cz> Signed-off-by: NRobin Holt <holt@sgi.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gerd Hoffmann 提交于
This patch adds preadv and pwritev system calls. These syscalls are a pretty straightforward combination of pread and readv (same for write). They are quite useful for doing vectored I/O in threaded applications. Using lseek+readv instead opens race windows you'll have to plug with locking. Other systems have such system calls too, for example NetBSD, check here: http://www.daemon-systems.org/man/preadv.2.html The application-visible interface provided by glibc should look like this to be compatible to the existing implementations in the *BSD family: ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset); ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset); This prototype has one problem though: On 32bit archs is the (64bit) offset argument unaligned, which the syscall ABI of several archs doesn't allow to do. At least s390 needs a wrapper in glibc to handle this. As we'll need a wrappers in glibc anyway I've decided to push problem to glibc entriely and use a syscall prototype which works without arch-specific wrappers inside the kernel: The offset argument is explicitly splitted into two 32bit values. The patch sports the actual system call implementation and the windup in the x86 system call tables. Other archs follow as separate patches. Signed-off-by: NGerd Hoffmann <kraxel@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Neil Horman 提交于
It would be nice to be able to extract the dmesg log from a vmcore file without needing to keep the debug symbols for the running kernel handy all the time. We have a facility to do this in /proc/vmcore. This patch adds the log_buf and log_end symbols to the vmcoreinfo area so that tools (like makedumpfile) can easily extract the dmesg logs from a vmcore image. [akpm@linux-foundation.org: several fixes and cleanups] [akpm@linux-foundation.org: fix unused log_buf_kexec_setup()] [akpm@linux-foundation.org: build fix] Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Cc: Simon Horman <horms@verge.net.au> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Simon Horman <horms@verge.net.au> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Harry Ciao 提交于
Add the PCI Device ID of the PCI Bridge Controller on AMD8111 chip. Signed-off-by: NHarry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
We are wasting 2 words in signal_struct without any reason to implement task_pgrp_nr() and task_session_nr(). task_session_nr() has no callers since 2e2ba22e, we can remove it. task_pgrp_nr() is still (I believe wrongly) used in fs/autofsX and fs/coda. This patch reimplements task_pgrp_nr() via task_pgrp_nr_ns(), and kills __pgrp/__session and the related helpers. The change in drivers/char/tty_io.c is cosmetic, but hopefully makes sense anyway. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: Alan Cox <number6@the-village.bc.nu> [tty parts] Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Inho, the safety rules for vnr/nr_ns helpers are horrible and buggy. task_pid_nr_ns(task) needs rcu/tasklist depending on task == current. As for "special" pids, vnr/nr_ns helpers always need rcu. However, if task != current, they are unsafe even under rcu lock, we can't trust task->group_leader without the special checks. And almost every helper has a callsite which needs a fix. Also, it is a bit annoying that the implementations of, say, task_pgrp_vnr() and task_pgrp_nr_ns() are not "symmetrical". This patch introduces the new helper, __task_pid_nr_ns(), which is always safe to use, and turns all other helpers into the trivial wrappers. After this I'll send another patch which converts task_tgid_xxx() as well, they're are a bit special. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Even if task == current, it is not safe to dereference the result of task_pgrp/task_session. We can race with another thread which changes the special pid via setpgid/setsid. Document this. The next 2 patches give an example of the unsafe usage, we have more bad users. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Fulghum 提交于
Add support for x8 asynchronous sample rate and ability to specify base clock frequency. Signed-off-by: NPaul Fulghum <paulkf@microgate.com> Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
Signed-off-by: NKirill A. Shutemov <kirill@shutemov.name> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
cpuhotplug_mutex_lock() is not used, remove it. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: NGautham R Shenoy <ego@in.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Now that task_detached() is exported, change tracehook_notify_death() to use this helper, nobody else checks ->exit_signal == -1 by hand. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Metzger, Markus T" <markus.t.metzger@intel.com> Acked-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
By discussion with Roland. - Rename ptrace_exit() to exit_ptrace(), and change it to do all the necessary work with ->ptraced list by its own. - Move this code from exit.c to ptrace.c - Update the comment in ptrace_detach() to explain the rechecking of the child->ptrace. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Metzger, Markus T" <markus.t.metzger@intel.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
When ptrace_detach() takes tasklist, the tracee can be SIGKILL'ed. If it has already passed exit_notify() we can leak a zombie, because a) ptracing disables the auto-reaping logic, and b) ->real_parent was not notified about the child's death. ptrace_detach() should follow the ptrace_exit's logic, change the code accordingly. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Roland McGrath <roland@redhat.com> Tested-by: NDenys Vlasenko <dvlasenk@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Container-init must behave like global-init to processes within the container and hence it must be immune to unhandled fatal signals from within the container (i.e SIG_DFL signals that terminate the process). But the same container-init must behave like a normal process to processes in ancestor namespaces and so if it receives the same fatal signal from a process in ancestor namespace, the signal must be processed. Implementing these semantics requires that send_signal() determine pid namespace of the sender but since signals can originate from workqueues/ interrupt-handlers, determining pid namespace of sender may not always be possible or safe. This patchset implements the design/simplified semantics suggested by Oleg Nesterov. The simplified semantics for container-init are: - container-init must never be terminated by a signal from a descendant process. - container-init must never be immune to SIGKILL from an ancestor namespace (so a process in parent namespace must always be able to terminate a descendant container). - container-init may be immune to unhandled fatal signals (like SIGUSR1) even if they are from ancestor namespace. SIGKILL/SIGSTOP are the only reliable signals to a container-init from ancestor namespace. This patch: Based on an earlier patch submitted by Oleg Nesterov and comments from Roland McGrath (http://lkml.org/lkml/2008/11/19/258). The handler parameter is currently unused in the tracehook functions. Besides, the tracehook functions are called with siglock held, so the functions can check the handler if they later need to. Removing the parameter simiplifies changes to sig_ignored() in a follow-on patch. Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
The cpuset_zone_allowed() variants are actually only a function of the zone's node. Cc: Paul Menage <menage@google.com> Acked-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Li Zefan 提交于
We need to pass some data to test_task() or process_task() in some cases. Will be used later. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
Try to use CSS ID for records in swap_cgroup. By this, on 64bit machine, size of swap_cgroup goes down to 2 bytes from 8bytes. This means, when 2GB of swap is equipped, (assume the page size is 4096bytes) From size of swap_cgroup = 2G/4k * 8 = 4Mbytes. To size of swap_cgroup = 2G/4k * 2 = 1Mbytes. Reduction is large. Of course, there are trade-offs. This CSS ID will add overhead to swap-in/swap-out/swap-free. But in general, - swap is a resource which the user tend to avoid use. - If swap is never used, swap_cgroup area is not used. - Reading traditional manuals, size of swap should be proportional to size of memory. Memory size of machine is increasing now. I think reducing size of swap_cgroup makes sense. Note: - ID->CSS lookup routine has no locks, it's under RCU-Read-Side. - memcg can be obsolete at rmdir() but not freed while refcnt from swap_cgroup is available. Changelog v4->v5: - reworked on to memcg-charge-swapcache-to-proper-memcg.patch Changlog ->v4: - fixed not configured case. - deleted unnecessary comments. - fixed NULL pointer bug. - fixed message in dmesg. [nishimura@mxp.nes.nec.co.jp: css_tryget can be called twice in !PageCgroupUsed case] Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
commit 4f98a2fe (vmscan: split LRU lists into anon & file sets) removed mem_cgroup_reclaim_imbalance(), but there are some leftovers in memcontrol.h. Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-