- 10 1月, 2009 6 次提交
-
-
由 Takashi Sato 提交于
Currently, ext3 in mainline Linux doesn't have the freeze feature which suspends write requests. So, we cannot take a backup which keeps the filesystem's consistency with the storage device's features (snapshot and replication) while it is mounted. In many case, a commercial filesystem (e.g. VxFS) has the freeze feature and it would be used to get the consistent backup. If Linux's standard filesystem ext3 has the freeze feature, we can do it without a commercial filesystem. So I have implemented the ioctls of the freeze feature. I think we can take the consistent backup with the following steps. 1. Freeze the filesystem with the freeze ioctl. 2. Separate the replication volume or create the snapshot with the storage device's feature. 3. Unfreeze the filesystem with the unfreeze ioctl. 4. Take the backup from the separated replication volume or the snapshot. This patch: VFS: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that they can return an error. Rename write_super_lockfs and unlockfs of the super block operation freeze_fs and unfreeze_fs to avoid a confusion. ext3, ext4, xfs, gfs2, jfs: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that write_super_lockfs returns an error if needed, and unlockfs always returns 0. reiserfs: Changed the type of write_super_lockfs and unlockfs from "void" to "int" so that they always return 0 (success) to keep a current behavior. Signed-off-by: NTakashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: NMasayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com> Cc: <xfs-masters@oss.sgi.com> Cc: <linux-ext4@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Brownell 提交于
Kernels that don't support ELF coredumps at all surely can't be supporting new partial-segment flavored ELF coredumps ... don't make folk answer Kconfig questions about that flavor. Signed-off-by: NDavid Brownell <dbrownell@users.sourceforge.net> Acked-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arjan van de Ven 提交于
let the core of this one bake in -next as well, but leave some of the infrastructure in place. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
-
由 Artem Bityutskiy 提交于
'rb_prev()', 'rb_next()' and 'rb_replace_node()' are declared in include/linux/rbtree.h, no need for JFFS2 to re-declare them. I believe these are left-overs from the old days when the common RB tree code did not have those call and JFFS2 had private implementation. Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
由 Neil Brown 提交于
Neil writes: Hi Jens, I've found a little bug for you. It was introduced by a6f23657 block: add one-hit cache for disk partition lookup and has the effect of killing my machine whenever I try to assemble an md array :-( One of the devices in the array has partitions, and mdadm always deletes partitions before putting a whole-device in an array (as it can cause confusion). The next IO to that device locks the machine. I don't really understand exactly why it locks up, but it happens in disk_map_sector_rcu(). This patch fixes it. Which is due to a missing clear of the (now) stale partition lookup data. So clear that when we delete a partition. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Chris Mason 提交于
Each subvolume has an extent_state_tree used to mark metadata that needs to be sent to disk while syncing the tree. This is used in addition to the dirty bits on the pages themselves so that a single subvolume can be sent to disk efficiently in disk order. Normally this marking happens in btrfs_alloc_free_block, which also does special recording of dirty tree blocks for the tree log roots. Yan Zheng noticed that when the root of the log tree is allocated, it is added to the wrong writeback list. The fix used here is to explicitly set it dirty as part of tree log creation. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 09 1月, 2009 22 次提交
-
-
由 NeilBrown 提交于
Currently md devices, once created, never disappear until the module is unloaded. This is essentially because the gendisk holds a reference to the mddev, and the mddev holds a reference to the gendisk, this a circular reference. If we drop the reference from mddev to gendisk, then we need to ensure that the mddev is destroyed when the gendisk is destroyed. However it is not possible to hook into the gendisk destruction process to enable this. So we drop the reference from the gendisk to the mddev and destroy the gendisk when the mddev gets destroyed. However this has a complication. Between the call __blkdev_get->get_gendisk->kobj_lookup->md_probe and the call __blkdev_get->md_open there is no obvious way to hold a reference on the mddev any more, so unless something is done, it will disappear and gendisk will be destroyed prematurely. Also, once we decide to destroy the mddev, there will be an unlockable moment before the gendisk is unlinked (blk_unregister_region) during which a new reference to the gendisk can be created. We need to ensure that this reference can not be used. i.e. the ->open must fail. So: 1/ in md_probe we set a flag in the mddev (hold_active) which indicates that the array should be treated as active, even though there are no references, and no appearance of activity. This is cleared by md_release when the device is closed if it is no longer needed. This ensures that the gendisk will survive between md_probe and md_open. 2/ In md_open we check if the mddev we expect to open matches the gendisk that we did open. If there is a mismatch we return -ERESTARTSYS and modify __blkdev_get to retry from the top in that case. In the -ERESTARTSYS sys case we make sure to wait until the old gendisk (that we succeeded in opening) is really gone so we loop at most once. Some udev configurations will always open an md device when it first appears. If we allow an md device that was just created by an open to disappear on an immediate close, then this can race with such udev configurations and result in an infinite loop the device being opened and closed, then re-open due to the 'ADD' even from the first open, and then close and so on. So we make sure an md device, once created by an open, remains active at least until some md 'ioctl' has been made on it. This means that all normal usage of md devices will allow them to disappear promptly when not needed, but the worst that an incorrect usage will do it cause an inactive md device to be left in existence (it can easily be removed). As an array can be stopped by writing to a sysfs attribute echo clear > /sys/block/mdXXX/md/array_state we need to use scheduled work for deleting the gendisk and other kobjects. This allows us to wait for any pending gendisk deletion to complete by simply calling flush_scheduled_work(). Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Coly Li 提交于
When I review ocfs2 code, find there are 2 typos to "successfull". After doing grep "successfull " in kernel tree, 22 typos found totally -- great minds always think alike :) This patch fixes all the similar typos. Thanks for Randy's ack and comments. Signed-off-by: NColy Li <coyli@suse.de> Acked-by: NRandy Dunlap <randy.dunlap@oracle.com> Acked-by: NRoland Dreier <rolandd@cisco.com> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wu Fengguang 提交于
Use the new generic implementation. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wu Fengguang 提交于
Use the new generic implementation. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wu Fengguang 提交于
Use the new generic implementation. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Fernando Carrijo 提交于
Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: NTheodore Ts'o <tytso@mit.edu> Acked-by: NMark Fasheh <mfasheh@suse.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Acked-by: NCasey Schaufler <casey@schaufler-ca.com> Acked-by: NTakashi Iwai <tiwai@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 roel kluin 提交于
romfs_strnlen() returns int unsigned X >= 0 is always true [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Nroel kluin <roel.kluin@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Magnus Damm 提交于
Remove the saved_max_pfn check from the /proc/vmcore function read_from_oldmem(). No need to verify, we should be able to just trust that "elfcorehdr=" is correctly passed to the crash kernel on the kernel command line like we do with other parameters. The read_from_oldmem() function in fs/proc/vmcore.c is quite similar to read_from_oldmem() in drivers/char/mem.c, but only in the latter it makes sense to use saved_max_pfn. For oldmem it is used to determine when to stop reading. For vmcore we already have the elf header info pointing out the physical memory regions, no need to pass the end-of- old-memory twice. Removing the saved_max_pfn check from vmcore makes it possible for architectures to skip oldmem but still support crash dump through vmcore - without the need for the old saved_max_pfn cruft. Architectures that want to play safe can do the saved_max_pfn check in copy_oldmem_page(). Not sure why anyone would want to do that, but that's even safer than today - the saved_max_pfn check in vmcore removed by this patch only checks the first page. Signed-off-by: NMagnus Damm <damm@igel.co.jp> Acked-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NSimon Horman <horms@verge.net.au> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
While discussing[1] the need for glibc to have access to random bytes during program load, it seems that an earlier attempt to implement AT_RANDOM got stalled. This implements a random 16 byte string, available to every ELF program via a new auxv AT_RANDOM vector. [1] http://sourceware.org/ml/libc-alpha/2008-10/msg00006.html Ulrich said: glibc needs right after startup a bit of random data for internal protections (stack canary etc). What is now in upstream glibc is that we always unconditionally open /dev/urandom, read some data, and use it. For every process startup. That's slow. ... The solution is to provide a limited amount of random data to the starting process in the aux vector. I suggested 16 bytes and this is what the patch implements. If we need only 16 bytes or less we use the data directly. If we need more we'll use the 16 bytes to see a PRNG. This avoids the costly /dev/urandom use and it allows the kernel to use the most adequate source of random data for this purpose. It might not be the same pool as that for /dev/urandom. Concerns were expressed about the depletion of the randomness pool. But this patch doesn't make the situation worse, it doesn't deplete entropy more than happens now. Signed-off-by: NKees Cook <kees.cook@canonical.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
A big patch for changing memcg's LRU semantics. Now, - page_cgroup is linked to mem_cgroup's its own LRU (per zone). - LRU of page_cgroup is not synchronous with global LRU. - page and page_cgroup is one-to-one and statically allocated. - To find page_cgroup is on what LRU, you have to check pc->mem_cgroup as - lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc); - SwapCache is handled. And, when we handle LRU list of page_cgroup, we do following. pc = lookup_page_cgroup(page); lock_page_cgroup(pc); .....................(1) mz = page_cgroup_zoneinfo(pc); spin_lock(&mz->lru_lock); .....add to LRU spin_unlock(&mz->lru_lock); unlock_page_cgroup(pc); But (1) is spin_lock and we have to be afraid of dead-lock with zone->lru_lock. So, trylock() is used at (1), now. Without (1), we can't trust "mz" is correct. This is a trial to remove this dirty nesting of locks. This patch changes mz->lru_lock to be zone->lru_lock. Then, above sequence will be written as spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU mem_cgroup_add/remove/etc_lru() { pc = lookup_page_cgroup(page); mz = page_cgroup_zoneinfo(pc); if (PageCgroupUsed(pc)) { ....add to LRU } spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU This is much simpler. (*) We're safe even if we don't take lock_page_cgroup(pc). Because.. 1. When pc->mem_cgroup can be modified. - at charge. - at account_move(). 2. at charge the PCG_USED bit is not set before pc->mem_cgroup is fixed. 3. at account_move() the page is isolated and not on LRU. Pros. - easy for maintenance. - memcg can make use of laziness of pagevec. - we don't have to duplicated LRU/Active/Unevictable bit in page_cgroup. - LRU status of memcg will be synchronized with global LRU's one. - # of locks are reduced. - account_move() is simplified very much. Cons. - may increase cost of LRU rotation. (no impact if memcg is not configured.) Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
do_set_dqblk() allowed SETDQBLK quotactl to set user's grace time even if user was not above his softlimit. This does not make much sence and by coincidence causes quota code to omit softlimit warning when user really exceeds softlimit. This patch makes do_set_dqblk() reset user's grace time if he has not exceeded softlimit. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Richard A. Holden III 提交于
Fix fs/coda/sysctl.c:14: warning: 'fs_table_header' defined but not used fs/coda/sysctl.c:44: warning: 'fs_table' defined but not used these are only used when CONFIG_SYSCTL is defined. Signed-off-by: NRichard A. Holden III <aciddeath@gmail.com> Cc: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Randy Dunlap 提交于
Remove excess kernel-doc from fs/jbd/transaction.c: Warning(linux-2.6.28-git5//fs/jbd/transaction.c:764): Excess function parameter 'credits' description in 'journal_get_write_access' Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Duane Griffin 提交于
At the moment there are few restrictions on which flags may be set on which inodes. Specifically DIRSYNC may only be set on directories and IMMUTABLE and APPEND may not be set on links. Tighten that to disallow TOPDIR being set on non-directories and only NODUMP and NOATIME to be set on non-regular file, non-directories. Introduces a flags masking function which masks flags based on mode and use it during inode creation and when flags are set via the ioctl to facilitate future consistency. Signed-off-by: NDuane Griffin <duaneg@dghda.com> Acked-by: NAndreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Duane Griffin 提交于
At present INDEX is the only flag that new ext3 inodes do NOT inherit from their parent. In addition prevent the flags DIRTY, ECOMPR, IMAGIC and TOPDIR from being inherited. List inheritable flags explicitly to prevent future flags from accidentally being inherited. This fixes the TOPDIR flag inheritance bug reported at http://bugzilla.kernel.org/show_bug.cgi?id=9866. Signed-off-by: NDuane Griffin <duaneg@dghda.com> Acked-by: NAndreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pekka Enberg 提交于
As spotted by kmemtrace, struct ext3_sb_info is 17152 bytes on 64-bit which makes it a very bad fit for SLAB allocators. The culprit of the wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when NR_CPUS >= 32. To fix that, allocate ->s_blockgroup_lock, which fits nicely in a order 2 page in the worst case, separately. This shinks down struct ext3_sb_info enough to fit a 1 KB slab cache so now we allocate 16 KB + 1 KB instead of 32 KB saving 15 KB of memory. Acked-by: NAndreas Dilger <adilger@sun.com> Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Josef Bacik 提交于
There is a flaw with the way jbd handles fsync batching. If we fsync() a file and we were not the last person to run fsync() on this fs then we automatically sleep for 1 jiffie in order to wait for new writers to join into the transaction before forcing the commit. The problem with this is that with really fast storage (ie a Clariion) the time it takes to commit a transaction to disk is way faster than 1 jiffie in most cases, so sleeping means waiting longer with nothing to do than if we just committed the transaction and kept going. Ric Wheeler noticed this when using fs_mark with more than 1 thread, the throughput would plummet as he added more threads. This patch attempts to fix this problem by recording the average time in nanoseconds that it takes to commit a transaction to disk, and what time we started the transaction. If we run an fsync() and we have been running for less time than it takes to commit the transaction to disk, we sleep for the delta amount of time and then commit to disk. We acheive sub-jiffie sleeping using schedule_hrtimeout. This means that the wait time is auto-tuned to the speed of the underlying disk, instead of having this static timeout. I weighted the average according to somebody's comments (Andreas Dilger I think) in order to help normalize random outliers where we take way longer or way less time to commit than the average. I also have a min() check in there to make sure we don't sleep longer than a jiffie in case our storage is super slow, this was requested by Andrew. I unfortunately do not have access to a Clariion, so I had to use a ramdisk to represent a super fast array. I tested with a SATA drive with barrier=1 to make sure there was no regression with local disks, I tested with a 4 way multipathed Apple Xserve RAID array and of course the ramdisk. I ran the following command fs_mark -d /mnt/ext3-test -s 4096 -n 2000 -D 64 -t $i where $i was 2, 4, 8, 16 and 32. I mkfs'ed the fs each time. Here are my results type threads with patch without patch sata 2 24.6 26.3 sata 4 49.2 48.1 sata 8 70.1 67.0 sata 16 104.0 94.1 sata 32 153.6 142.7 xserve 2 246.4 222.0 xserve 4 480.0 440.8 xserve 8 829.5 730.8 xserve 16 1172.7 1026.9 xserve 32 1816.3 1650.5 ramdisk 2 2538.3 1745.6 ramdisk 4 2942.3 661.9 ramdisk 8 2882.5 999.8 ramdisk 16 2738.7 1801.9 ramdisk 32 2541.9 2394.0 Signed-off-by: NJosef Bacik <jbacik@redhat.com> Cc: Andreas Dilger <adilger@sun.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ric Wheeler <rwheeler@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Duane Griffin 提交于
At the moment there are few restrictions on which flags may be set on which inodes. Specifically DIRSYNC may only be set on directories and IMMUTABLE and APPEND may not be set on links. Tighten that to disallow TOPDIR being set on non-directories and only NODUMP and NOATIME to be set on non-regular file, non-directories. Introduces a flags masking function which masks flags based on mode and use it during inode creation and when flags are set via the ioctl to facilitate future consistency. Signed-off-by: NDuane Griffin <duaneg@dghda.com> Acked-by: NAndreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Duane Griffin 提交于
At present BTREE/INDEX is the only flag that new ext2 inodes do NOT inherit from their parent. In addition prevent the flags DIRTY, ECOMPR, INDEX, IMAGIC and TOPDIR from being inherited. List inheritable flags explicitly to prevent future flags from accidentally being inherited. This fixes the TOPDIR flag inheritance bug reported at http://bugzilla.kernel.org/show_bug.cgi?id=9866. Signed-off-by: NDuane Griffin <duaneg@dghda.com> Acked-by: NAndreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pekka J Enberg 提交于
As spotted by kmemtrace, struct ext2_sb_info is 17024 bytes on 64-bit which makes it a very bad fit for SLAB allocators. The culprit of the wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when NR_CPUS >= 32. To fix that, allocate ->s_blockgroup_lock, which fits nicely in a order 2 page in the worst case, separately. This shinks down struct ext2_sb_info enough to fit a 1 KB slab cache so now we allocate 16 KB + 1 KB instead of 32 KB saving 15 KB of memory. Acked-by: NAndreas Dilger <adilger@sun.com> Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Qinghuang Feng 提交于
There is no argument named @chain in ext2_splice_branch, remove references to it. Signed-off-by: NQinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Kleikamp 提交于
sync_filesystems() shouldn't be calling async_synchronize_full_special while holding a spinlock. The second while loop in that function is the right place for this anyway. Signed-off-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Reported-by: NGrissiom <chaos.proton@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 1月, 2009 12 次提交
-
-
由 David Howells 提交于
Stop the FLAT binfmt from attempting to expand the userspace stack and brk segments to fill the space actually allocated for it. The space allocated may be rounded up by mmap(), and may be wasted. However, finding out how much space we actually obtained uses the contentious kobjsize() function which we'd like to get rid of as it doesn't necessarily work for all slab allocators. Signed-off-by: NDavid Howells <dhowells@redhat.com> Tested-by: NMike Frysinger <vapier.adi@gmail.com> Acked-by: NPaul Mundt <lethal@linux-sh.org>
-
由 David Howells 提交于
Stop the ELF-FDPIC binfmt from attempting to expand the userspace stack and brk segments to fill the space actually allocated for it. The space allocated may be rounded up by mmap(), and may be wasted. However, finding out how much space we actually obtained uses the contentious kobjsize() function which we'd like to get rid of as it doesn't necessarily work for all slab allocators. Signed-off-by: NDavid Howells <dhowells@redhat.com> Tested-by: NMike Frysinger <vapier.adi@gmail.com> Acked-by: NPaul Mundt <lethal@linux-sh.org>
-
由 David Howells 提交于
Improve procfs output using per-MM VMAs for process memory accounting. Signed-off-by: NDavid Howells <dhowells@redhat.com> Tested-by: NMike Frysinger <vapier.adi@gmail.com> Acked-by: NPaul Mundt <lethal@linux-sh.org>
-
由 David Howells 提交于
Make VMAs per mm_struct as for MMU-mode linux. This solves two problems: (1) In SYSV SHM where nattch for a segment does not reflect the number of shmat's (and forks) done. (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an exec'ing process when VM_EXECUTABLE is specified, regardless of the fact that a VMA might be shared and already have its vm_mm assigned to another process or a dead process. A new struct (vm_region) is introduced to track a mapped region and to remember the circumstances under which it may be shared and the vm_list_struct structure is discarded as it's no longer required. This patch makes the following additional changes: (1) Regions are now allocated with alloc_pages() rather than kmalloc() and with no recourse to __GFP_COMP, so the pages are not composite. Instead, each page has a reference on it held by the region. Anything else that is interested in such a page will have to get a reference on it to retain it. When the pages are released due to unmapping, each page is passed to put_page() and will be freed when the page usage count reaches zero. (2) Excess pages are trimmed after an allocation as the allocation must be made as a power-of-2 quantity of pages. (3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may end up with overlapping VMAs within the tree, the VMA struct address is appended to the sort key. (4) Non-anonymous VMAs are now added to the backing inode's prio list. (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of the backing region. The VMA and region structs will be split if necessary. (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory segment instead of all the attachments at that addresss. Multiple shmat()'s return the same address under NOMMU-mode instead of different virtual addresses as under MMU-mode. (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode. (8) /proc/maps is now the global list of mapped regions, and may list bits that aren't actually mapped anywhere. (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount of RAM currently allocated by mmap to hold mappable regions that can't be mapped directly. These are copies of the backing device or file if not anonymous. These changes make NOMMU mode more similar to MMU mode. The downside is that NOMMU mode requires some extra memory to track things over NOMMU without this patch (VMAs are no longer shared, and there are now region structs). Signed-off-by: NDavid Howells <dhowells@redhat.com> Tested-by: NMike Frysinger <vapier.adi@gmail.com> Acked-by: NPaul Mundt <lethal@linux-sh.org>
-
由 David Howells 提交于
Fix cleanup handling in ramfs_nommu_get_umapped_area() by only freeing the number of pages that find_get_pages() said it had returned (nr) rather than attempting to free the number of pages we asked for (lpages) - thus avoiding the situation whereby put_page() may be handed NULL pointers if find_get_pages() returned fewer pages that were requested. Also avoid a warning about nr being uninitialised and the need for an if-statement in the cleanup path by using appropriate gotos. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 Chris Mason 提交于
This is already in the arch specific directories in mainline and shouldn't be copied into btrfs. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Benny Halevy 提交于
refactor the nfs4 server lock code to use last_byte_offset to compute the last byte covered by the lock. Check for overflow so that the last byte is set to NFS4_MAX_UINT64 if offset + len wraps around. Also, use NFS4_MAX_UINT64 for ~(u64)0 where appropriate. Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Marc Eshel 提交于
Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Benny Halevy 提交于
There's no use for nfs4_cb_null_ops's declaration in fs/nfsd/nfs4callback.c Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Benny Halevy 提交于
Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Dean Hildebrand 提交于
Signed-off-by: NDean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Steve Dickson 提交于
When determining the fsid_type in fh_compose(), the setting of the FID via fsid= export option needs to take precedence over using the UUID device id. Signed-off-by: NSteve Dickson <steved@redhat.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-