- 19 10月, 2011 1 次提交
-
-
由 Thomas Meyer 提交于
When running a Fedora 15 (x86) on an x86_64 kernel, in the boot process plymouthd complains about those two missing ioctls: [ 2.581783] ioctl32(plymouthd:186): Unknown cmd fd(10) cmd(00005457){t:'T';sz:0} arg(ffb6a5d0) on /dev/tty1 [ 2.581803] ioctl32(plymouthd:186): Unknown cmd fd(10) cmd(00005456){t:'T';sz:0} arg(ffb6a680) on /dev/tty1 both ioctl functions work on the 'struct termios' resp. 'struct termios2', which has the same size (36 bytes resp. 44 bytes) on x86 and x86_64, so it's just a matter of converting the pointer from userland. Signed-off-by: NThomas Meyer <thomas@m3y3r.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: NAndrew Morton <akpm@google.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 23 9月, 2011 1 次提交
-
-
由 Søren Holm 提交于
The EFR (Enhenced-Features-Register) is located at a different offset than the other devices supporting UART_CAP_EFR. This change add a special setup quick to set UPF_EXAR_EFR on the port. UPF_EXAR_EFR is then used to the port type to PORT_XR17D15X since it is for sure a XR17D15X uart. Signed-off-by: NSøren Holm <sgh@sgh.dk> Acked-by: NAlan Cox <alan@linux.intel.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 26 8月, 2011 1 次提交
-
-
由 Jiri Slaby 提交于
We need this helper to fix system stalls. The issue is that the rest of the system TTYs wait for us to finish waiting. This wasn't an issue with BKL. BKL used to unlock implicitly. This is based on the Arnd suggestion. Signed-off-by: NJiri Slaby <jslaby@suse.cz> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 25 8月, 2011 2 次提交
-
-
由 Bernhard Roth 提交于
By default the atmel_serial driver in RS485 mode disables receiving data until all data in the send buffer has been sent. This flag allows to receive data even whilst sending data. Signed-off-by: NBernhard Roth <br@pwrnet.de> Signed-off-by: NClaudio Scordino <claudio@evidence.eu.com> Acked-by: NAlan Cox <alan@linux.intel.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Greg Kroah-Hartman 提交于
This reverts commit 6b1a98d1. It causes a build error that needs to be resolved differently. Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jamie Iles <jamie@jamieiles.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 24 8月, 2011 6 次提交
-
-
由 Jamie Iles 提交于
The Synopsys DesignWare 8250 is an 8250 that has an extra interrupt that gets raised when writing to the LCR when busy. To handle this we need special serial_out, serial_in and handle_irq methods. Add a new function serial8250_use_designware_io() that configures a uart_port with these accessors. Cc: Alan Cox <alan@linux.intel.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NJamie Iles <jamie@jamieiles.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Jamie Iles 提交于
Now that platforms can override the port IRQ handler and the only user of these UPIO modes has been converted over, kill off UPIO_DWAPB and UPIO_DWAPB32. Acked-by: NAlan Cox <alan@linux.intel.com> Signed-off-by: NJamie Iles <jamie@jamieiles.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Jamie Iles 提交于
Some ports (e.g. Synopsys DesignWare 8250) have special requirements for handling the interrupts. Allow these platforms to specify their own interrupt handler that will override the default. serial8250_handle_irq() is provided so that platforms can extend the IRQ handler rather than completely replacing it. Signed-off-by: NJamie Iles <jamie@jamieiles.com> Acked-by: NAlan Cox <alan@linux.intel.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Jamie Iles 提交于
Some serial ports may have unusal requirements for interrupt handling (e.g. the Synopsys DesignWare 8250-alike port and it's busy detect interrupt). Add a .handle_irq callback that can be used for platforms to override the interrupt behaviour in a similar fashion to the .serial_out and .serial_in callbacks. Signed-off-by: NJamie Iles <jamie@jamieiles.com> Acked-by: NAlan Cox <alan@linux.intel.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Jiri Slaby 提交于
We used it really only serial and ami_serial. The rest of the callsites were BUG/WARN_ONs to check if BTM is held. Now that we pruned tty_locked from both of the real users, we can get rid of tty_lock along with __big_tty_mutex_owner. Signed-off-by: NJiri Slaby <jslaby@suse.cz> Acked-by: NArnd Bergmann <arnd@arndb.de> Cc: Alan Cox <alan@linux.intel.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Jiri Slaby 提交于
tty_wakeup can be called from any context. So there is no need to have an extra tasklet for calling that. Hence save some space and remove the tasklet completely. Signed-off-by: NJiri Slaby <jslaby@suse.cz> Cc: Alan Cox <alan@linux.intel.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 18 8月, 2011 2 次提交
-
-
由 Ian Campbell 提交于
This allows the cast in lowmem_page_address (introduced as a warning fixup to 33dd4e0e "mm: make some struct page's const") to be removed. Propagate const'ness to page_to_section() as well since it is required by __page_to_pfn. Signed-off-by: NIan Campbell <ian.campbell@citrix.com> Acked-by: NRik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ian Campbell 提交于
Followup to 33dd4e0e "mm: make some struct page's const" which missed the HASHED_PAGE_VIRTUAL case. Signed-off-by: NIan Campbell <ian.campbell@citrix.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 8月, 2011 1 次提交
-
-
由 Jeff Moyer 提交于
Commit ae1b1539, block: reimplement FLUSH/FUA to support merge, introduced a performance regression when running any sort of fsyncing workload using dm-multipath and certain storage (in our case, an HP EVA). The test I ran was fs_mark, and it dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out that dm-multipath always advertised flush+fua support, and passed commands on down the stack, where those flags used to get stripped off. The above commit changed that behavior: static inline struct request *__elv_next_request(struct request_queue *q) { struct request *rq; while (1) { - while (!list_empty(&q->queue_head)) { + if (!list_empty(&q->queue_head)) { rq = list_entry_rq(q->queue_head.next); - if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) || - (rq->cmd_flags & REQ_FLUSH_SEQ)) - return rq; - rq = blk_do_flush(q, rq); - if (rq) - return rq; + return rq; } Note that previously, a command would come in here, have REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush: struct request *blk_do_flush(struct request_queue *q, struct request *rq) { unsigned int fflags = q->flush_flags; /* may change, cache it */ bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA; bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH); bool do_postflush = has_flush && !has_fua && (rq->cmd_flags & REQ_FUA); unsigned skip = 0; ... if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) { rq->cmd_flags &= ~REQ_FLUSH; if (!has_fua) rq->cmd_flags &= ~REQ_FUA; return rq; } So, the flush machinery was bypassed in such cases (q->flush_flags == 0 && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)). Now, however, we don't get into the flush machinery at all. Instead, __elv_next_request just hands a request with flush and fua bits set to the scsi_request_fn, even if the underlying request_queue does not support flush or fua. The agreed upon approach is to fix the flush machinery to allow stacking. While this isn't used in practice (since there is only one request-based dm target, and that target will now reflect the flush flags of the underlying device), it does future-proof the solution, and make it function as designed. In order to make this work, I had to add a field to the struct request, inside the flush structure (to store the original req->end_io). Shaohua had suggested overloading the union with rb_node and completion_data, but the completion data is used by device mapper and can also be used by other drivers. So, I didn't see a way around the additional field. I tested this patch on an HP EVA with both ext4 and xfs, and it recovers the lost performance. Comments and other testers, as always, are appreciated. Cheers, Jeff Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 14 8月, 2011 2 次提交
-
-
由 Rafael J. Wysocki 提交于
Function genpd_queue_power_off_work() is not defined for CONFIG_PM_RUNTIME, so pm_genpd_poweroff_unused() causes a build error to happen in that case. Fix the problem by making pm_genpd_poweroff_unused() depend on CONFIG_PM_RUNTIME too. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 Jaehoon Chung 提交于
"mmc: dw_mmc: Fix DDR mode support" removed the last user. Signed-off-by: NJaehoon Chung <jh80.chung@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Signed-off-by: NChris Ball <cjb@laptop.org>
-
- 12 8月, 2011 1 次提交
-
-
由 Vasiliy Kulikov 提交于
The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC check in set_user() to check for NPROC exceeding via setuid() and similar functions. Before the check there was a possibility to greatly exceed the allowed number of processes by an unprivileged user if the program relied on rlimit only. But the check created new security threat: many poorly written programs simply don't check setuid() return code and believe it cannot fail if executed with root privileges. So, the check is removed in this patch because of too often privilege escalations related to buggy programs. The NPROC can still be enforced in the common code flow of daemons spawning user processes. Most of daemons do fork()+setuid()+execve(). The check introduced in execve() (1) enforces the same limit as in setuid() and (2) doesn't create similar security issues. Neil Brown suggested to track what specific process has exceeded the limit by setting PF_NPROC_EXCEEDED process flag. With the change only this process would fail on execve(), and other processes' execve() behaviour is not changed. Solar Designer suggested to re-check whether NPROC limit is still exceeded at the moment of execve(). If the process was sleeping for days between set*uid() and execve(), and the NPROC counter step down under the limit, the defered execve() failure because NPROC limit was exceeded days ago would be unexpected. If the limit is not exceeded anymore, we clear the flag on successful calls to execve() and fork(). The flag is also cleared on successful calls to set_user() as the limit was exceeded for the previous user, not the current one. Similar check was introduced in -ow patches (without the process flag). v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user(). Reviewed-by: NJames Morris <jmorris@namei.org> Signed-off-by: NVasiliy Kulikov <segoon@openwall.com> Acked-by: NNeilBrown <neilb@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 8月, 2011 2 次提交
-
-
由 Namhyung Kim 提交于
Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or FUA follows WRITE, use the same 'F' flag for both cases and distinguish them by their (relative) position. The end results look like (other flags might be shown also): - WRITE: W - WRITE_FLUSH: FW - WRITE_FUA: WF - WRITE_FLUSH_FUA: FWF Note that we reuse TC_BARRIER due to lack of bit space of act_mask so that the older versions of blktrace tools will report flush requests as barriers from now on. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Matthew Wilcox 提交于
REQ_SECURE, REQ_FLUSH and REQ_FUA may all be set on a bio as well as on a request, so relocate them to the shared part of the enum. Signed-off-by: NMatthew Wilcox <willy@linux.intel.com> Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 10 8月, 2011 1 次提交
-
-
由 Stephen Warren 提交于
The patch adds empty function of_get_property for non-dt build, so that drivers migrating to dt can save some '#ifdef CONFIG_OF'. This also fixes the current Tegra compile problem in linux-next. Signed-off-by: NStephen Warren <swarren@nvidia.com> Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
-
- 09 8月, 2011 3 次提交
-
-
由 Peter Zijlstra 提交于
In commit 2efaca92 ("mm/futex: fix futex writes on archs with SW tracking of dirty & young") we forgot about MMU=n. This patch fixes that. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NDavid Howells <dhowells@redhat.com> Link: http://lkml.kernel.org/r/1311761831.24752.413.camel@twinsSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Avoid annoying warnings from these functions ("discards qualifiers") because they assign 'current_cred()' to a non-const pointer. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Howells 提交于
Commit 32955148 ("fix rcu annotations noise in cred.h") accidentally dropped the const of current->cred inside current_cred() by the insertion of a cast to deal with an RCU annotation loss warning from sparce. Use an appropriate RCU wrapper instead so as not to lose the const. Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 8月, 2011 2 次提交
-
-
由 David S. Miller 提交于
Currently userland will barf when including linux/netlink.h unless it precisely includes sys/socket.h first. The issue is where the definition of "sa_family_t" comes from. We've been back and forth on how to fix this issue in the past, see: http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/622621 http://thread.gmane.org/gmane.linux.network/143380 Ben Hutchings suggested we take a hint from how we handle the sockaddr_storage type. First we define a "__kernel_sa_family_t" to linux/socket.h that is always defined. Then if __KERNEL__ is defined, we also define "sa_family_t" as equal to "__kernel_sa_family_t". Then in places like linux/netlink.h we use __kernel_sa_family_t in user visible datastructures. Reported-by: NMichel Machado <michel@digirati.com.br> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Al Viro 提交于
task->cred is declared as __rcu, and access to other tasks' ->cred is, indeed, protected. Access to current->cred does not need rcu_dereference() at all, since only the task itself can change its ->cred. sparse, of course, has no way of knowing that... Add force-cast in current_cred(), make current_fsuid() et.al. use it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 8月, 2011 5 次提交
-
-
由 Linus Torvalds 提交于
The inode structure layout is largely random, and some of the vfs paths really do care. The path lookup in particular is already quite D$ intensive, and profiles show that accessing the 'inode->i_op->xyz' fields is quite costly. We already optimized the dcache to not unnecessarily load the d_op structure for members that are often NULL using the DCACHE_OP_xyz bits in dentry->d_flags, and this does something very similar for the inode ops that are used during pathname lookup. It also re-orders the fields so that the fields accessed by 'stat' are together at the beginning of the inode structure, and roughly in the order accessed. The effect of this seems to be in the 1-2% range for an empty kernel "make -j" run (which is fairly kernel-intensive, mostly in filename lookup), so it's visible. The numbers are fairly noisy, though, and likely depend a lot on exact microarchitecture. So there's more tuning to be done. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Gcc tends to generate better code with small integers, including the DCACHE_xyz flag tests - so move the common ones to be first in the list. Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and DCACHE_AUTOFS_PENDING values, their users no longer exists in the source tree. And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the common case to be a nice straight-line fall-through. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David S. Miller 提交于
Computers have become a lot faster since we compromised on the partial MD4 hash which we use currently for performance reasons. MD5 is a much safer choice, and is inline with both RFC1948 and other ISS generators (OpenBSD, Solaris, etc.) Furthermore, only having 24-bits of the sequence number be truly unpredictable is a very serious limitation. So the periodic regeneration and 8-bit counter have been removed. We compute and use a full 32-bit sequence number. For ipv6, DCCP was found to use a 32-bit truncated initial sequence number (it needs 43-bits) and that is fixed here as well. Reported-by: NDan Kaminsky <dan@doxpara.com> Tested-by: NWilly Tarreau <w@1wt.eu> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
We are going to use this for TCP/IP sequence number and fragment ID generation. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Mandeep Singh Baines 提交于
For ChromiumOS, we use SHA-1 to verify the integrity of the root filesystem. The speed of the kernel sha-1 implementation has a major impact on our boot performance. To improve boot performance, we investigated using the heavily optimized sha-1 implementation used in git. With the git sha-1 implementation, we see a 11.7% improvement in boot time. 10 reboots, remove slowest/fastest. Before: Mean: 6.58 seconds Stdev: 0.14 After (with git sha-1, this patch): Mean: 5.89 seconds Stdev: 0.07 The other cool thing about the git SHA-1 implementation is that it only needs 64 bytes of stack for the workspace while the original kernel implementation needed 320 bytes. Signed-off-by: NMandeep Singh Baines <msb@chromium.org> Cc: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Cc: Nicolas Pitre <nico@cam.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David S. Miller <davem@davemloft.net> Cc: linux-crypto@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 8月, 2011 1 次提交
-
-
由 Andy Lutomirski 提交于
I suspect that this works on T410. Signed-off-by: NAndy Lutomirski <luto@mit.edu> Signed-off-by: NMatthew Garrett <mjg@redhat.com>
-
- 05 8月, 2011 1 次提交
-
-
由 Boaz Harrosh 提交于
exofs file system wants to use pnfs_osd_xdr.h file instead of redefining pnfs-objects types in it's private "pnfs.h" headr. Before we do the switch we must make sure pnfs_osd_xdr.h is compilable also under NFS versions smaller than 4.1. Since now it is needed regardless of version, by the exofs code. nfs4_string is not the only nfs4 type out in the global scope. Ack-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
-
- 04 8月, 2011 8 次提交
-
-
由 Grant Likely 提交于
This reverts commit 750f463a. of_alias_* still needs work to be generalized for 'promtree' dt platforms, and to no implicitly create entries for available ids. Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
-
由 Hugh Dickins 提交于
We have already acknowledged that swapoff of a tmpfs file is slower than it was before conversion to the generic radix_tree: a little slower there will be acceptable, if the hotter paths are faster. But it was a shock to find swapoff of a 500MB file 20 times slower on my laptop, taking 10 minutes; and at that rate it significantly slows down my testing. Now, most of that turned out to be overhead from PROVE_LOCKING and PROVE_RCU: without those it was only 4 times slower than before; and more realistic tests on other machines don't fare as badly. I've tried a number of things to improve it, including tagging the swap entries, then doing lookup by tag: I'd expected that to halve the time, but in practice it's erratic, and often counter-productive. The only change I've so far found to make a consistent improvement, is to short-circuit the way we go back and forth, gang lookup packing entries into the array supplied, then shmem scanning that array for the target entry. Scanning in place doubles the speed, so it's now only twice as slow as before (or three times slower when the PROVEs are on). So, add radix_tree_locate_item() as an expedient, once-off, single-caller hack to do the lookup directly in place. #ifdef it on CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited applicability as save space in other configurations. And, sadly, #include sched.h for cond_resched(). Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
But we've not yet removed the old swp_entry_t i_direct[16] from shmem_inode_info. That's because it was still being shared with the inline symlink. Remove it now (saving 64 or 128 bytes from shmem inode size), and use kmemdup() for short symlinks, say, those up to 128 bytes. I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode() rather than shmem_evict_inode(), where we usually do such freeing? I guess it doesn't matter, and I'm not into NUMA mpol testing right now. Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NRik van Riel <riel@redhat.com> Reviewed-by: NPekka Enberg <penberg@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Remove mem_cgroup_shmem_charge_fallback(): it was only required when we had to move swappage to filecache with GFP_NOWAIT. Remove the GFP_NOWAIT special case from mem_cgroup_cache_charge(), by moving its call out from shmem_add_to_page_cache() to two of thats three callers. But leave it doing mem_cgroup_uncharge_cache_page() on error: although asymmetrical, it's easier for all 3 callers to handle. These two changes would also be appropriate if anyone were to start using shmem_read_mapping_page_gfp() with GFP_NOWAIT. Remove mem_cgroup_get_shmem_target(): mc_handle_file_pte() can test radix_tree_exceptional_entry() to get what it needs for itself. Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
While it's at its least, make a number of boring nitpicky cleanups to shmem.c, mostly for consistency of variable naming. Things like "swap" instead of "entry", "pgoff_t index" instead of "unsigned long idx". And since everything else here is prefixed "shmem_", better change init_tmpfs() to shmem_init(). Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
The maximum size of a shmem/tmpfs file has been limited by the maximum size of its triple-indirect swap vector. With 4kB page size, maximum filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of that on a 64-bit kernel. (With 8kB page size, maximum filesize was just over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel, MAX_LFS_FILESIZE being then more restrictive than swap vector layout.) It's a shame that tmpfs should be more restrictive than ramfs, and this limitation has now been noticed. Add another level to the swap vector? No, it became obscure and hard to maintain, once I complicated it to make use of highmem pages nine years ago: better choose another way. Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then tmpfs would never have invented its own peculiar radix tree: we would have fitted swap entries into the common radix tree instead, in much the same way as we fit swap entries into page tables. And why should each file have a separate radix tree for its pages and for its swap entries? The swap entries are required precisely where and when the pages are not. We want to put them together in a single radix tree: which can then avoid much of the locking which was needed to prevent them from being exchanged underneath us. This also avoids the waste of memory devoted to swap vectors, first in the shmem_inode itself, then at least two more pages once a file grew beyond 16 data pages (pages accounted by df and du, but not by memcg). Allocated upfront, to avoid allocation when under swapping pressure, but pure waste when CONFIG_SWAP is not set - I have never spattered around the ifdefs to prevent that, preferring this move to sharing the common radix tree instead. There are three downsides to sharing the radix tree. One, that it binds tmpfs more tightly to the rest of mm, either requiring knowledge of swap entries in radix tree there, or duplication of its code here in shmem.c. I believe that the simplications and memory savings (and probable higher performance, not yet measured) justify that. Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix nodes that cannot be freed under memory pressure - whereas before it was the less precious highmem swap vector pages that could not be freed. I'm hoping that 64-bit has now been accessible for long enough, that the highmem argument has grown much less persuasive. Three, that swapoff is slower than it used to be on tmpfs files, since it's using a simple generic mechanism not tailored to it: I find this noticeable, and shall want to improve, but maybe nobody else will notice. So... now remove most of the old swap vector code from shmem.c. But, for the moment, keep the simple i_direct vector of 16 pages, with simple accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation to help mark where swap needs to be handled in subsequent patches. Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
If swap entries are to be stored along with struct page pointers in a radix tree, they need to be distinguished as exceptional entries. Most of the handling of swap entries in radix tree will be contained in shmem.c, but a few functions in filemap.c's common code need to check for their appearance: find_get_page(), find_lock_page(), find_get_pages() and find_get_pages_contig(). So as not to slow their fast paths, tuck those checks inside the existing checks for unlikely radix_tree_deref_slot(); except for find_lock_page(), where it is an added test. And make it a BUG in find_get_pages_tag(), which is not applied to tmpfs files. A part of the reason for eliminating shmem_readpage() earlier, was to minimize the places where common code would need to allow for swap entries. The swp_entry_t known to swapfile.c must be massaged into a slightly different form when stored in the radix tree, just as it gets massaged into a pte_t when stored in page tables. In an i386 kernel this limits its information (type and page offset) to 30 bits: given 32 "types" of swapfile and 4kB pagesize, that's a maximum swapfile size of 128GB. Which is less than the 512GB we previously allowed with X86_PAE (where the swap entry can occupy the entire upper 32 bits of a pte_t), but not a new limitation on 32-bit without PAE; and there's not a new limitation on 64-bit (where swap filesize is already limited to 16TB by a 32-bit page offset). Thirty areas of 128GB is probably still enough swap for a 64GB 32-bit machine. Provide swp_to_radix_entry() and radix_to_swp_entry() conversions, and enforce filesize limit in read_swap_header(), just as for ptes. Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
A patchset to extend tmpfs to MAX_LFS_FILESIZE by abandoning its peculiar swap vector, instead keeping a file's swap entries in the same radix tree as its struct page pointers: thus saving memory, and simplifying its code and locking. This patch: The radix_tree is used by several subsystems for different purposes. A major use is to store the struct page pointers of a file's pagecache for memory management. But what if mm wanted to store something other than page pointers there too? The low bit of a radix_tree entry is already used to denote an indirect pointer, for internal use, and the unlikely radix_tree_deref_retry() case. Define the next bit as denoting an exceptional entry, and supply inline functions radix_tree_exception() to return non-0 in either unlikely case, and radix_tree_exceptional_entry() to return non-0 in the second case. If a subsystem already uses radix_tree with that bit set, no problem: it does not affect internal workings at all, but is defined for the convenience of those storing well-aligned pointers in the radix_tree. The radix_tree_gang_lookups have an implicit assumption that the caller can deduce the offset of each entry returned e.g. by the page->index of a struct page. But that may not be feasible for some kinds of item to be stored there. radix_tree_gang_lookup_slot() allow for an optional indices argument, output array in which to return those offsets. The same could be added to other radix_tree_gang_lookups, but for now keep it to the only one for which we need it. Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-