- 05 Aug, 2010 14 commits
-
-
Committed by Yinghai Lu
This exposes memblock_debug and the associated memblock_dbg() macro, along with memblock_can_resize, so that x86 can use these when ported to use memblock. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
The former is now strict: it will fail if it cannot honor the allocation within the node, while the latter implements the previous semantics, which fall back to allocating anywhere. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
We now provide a default (weak) implementation of memblock_nid_range() which uses the early_pfn_map[] if CONFIG_ARCH_POPULATES_NODE_MAP is set. Sparc still needs to use its own method due to the way the pages can be scattered between nodes. This implementation is inefficient because our main algorithm and callback construct want to work on an ascending-address basis, while early_pfn_map[] would rather work with nids (it is unsorted at that stage). But it should work, and we can look into improving it subsequently, possibly using arch compile options to choose a different algorithm altogether. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
Some archs such as ARM want to avoid coalescing across things such as the lowmem/highmem boundary or similar. This provides the option to control it via an arch callback, for which a weak default is provided that always allows coalescing. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
This is in preparation for having resizable arrays. Note that we still allocate one more than needed; this is unchanged from the previous implementation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
Right now, both the "memory" and "reserved" memblock_type structures have a "size" member. It represents the calculated memory size in the former case and is unused in the latter. This moves it out to the main memblock structure instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
Let's not waste space and cycles on archs that don't support >32-bit physical address space. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact server ppc64, though I hijack it on embedded ppc64 for similar purposes). It represents the area of memory that can be accessed in real mode (aka with MMU off) or, on embedded, from the exception vectors (which are bolted in the TLB), which pretty much boils down to the same thing. We take that out of the generic MEMBLOCK data structure and move it into arch/powerpc where it belongs, renaming it to "RMA" while at it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
This introduces memblock.current_limit, which is used to limit allocations from memblock_alloc() or memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE). The old MEMBLOCK_ALLOC_ANYWHERE changes value from 0 to ~(u64)0 and can still be used with memblock_alloc_base() to allocate really anywhere. It is no longer cropped to MEMBLOCK_REAL_LIMIT, which disappears. Note to archs: I'm leaving the default limit at MEMBLOCK_ALLOC_ANYWHERE. I strongly recommend that you set an appropriate limit during boot in order to guarantee that a memblock_alloc() at any time results in something that is accessible with a simple __va(). The reason is that a subsequent patch will introduce the ability for the array to resize itself by reallocating itself. The MEMBLOCK core will honor the current limit when performing those allocations. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
Nobody uses it anymore. Its semantics were ... weird. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 04 Aug, 2010 4 commits
-
-
Committed by Benjamin Herrenschmidt
Walk memblocks using for_each_memblock() and use memblock_region_base/end_pfn() for getting to PFNs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
To make it fast, we steal ARM's binary search for memblock_is_memory() and we use that to also replace the existing implementation of memblock_is_reserved(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
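The binary-search idea mentioned above can be sketched in userspace C. This is only an illustration, assuming the regions are sorted by base address and do not overlap; `struct region`, `region_search()` and `addr_is_in_regions()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one region: the half-open range [base, base + size). */
struct region {
    uint64_t base;
    uint64_t size;
};

/*
 * Return the index of the region containing addr, or -1 if no region does.
 * Assumes regions[] is sorted by base and the regions do not overlap.
 */
int region_search(const struct region *regions, size_t count, uint64_t addr)
{
    size_t left = 0, right = count;

    assert(regions != NULL || count == 0);
    while (left < right) {
        size_t mid = left + (right - left) / 2;

        if (addr < regions[mid].base)
            right = mid;                 /* addr lies below this region */
        else if (addr >= regions[mid].base + regions[mid].size)
            left = mid + 1;              /* addr lies above this region */
        else
            return (int)mid;             /* addr falls inside this region */
    }
    return -1;
}

/* Membership test built on the same search, so two lookups can share one helper. */
int addr_is_in_regions(const struct region *regions, size_t count, uint64_t addr)
{
    return region_search(regions, count, addr) != -1;
}
```

Sharing one search helper between two membership tests is the design point here: each lookup costs O(log n) comparisons instead of a linear walk over the array.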
-
Committed by Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 02 Aug, 2010 2 commits
-
-
Committed by Fang Wenqi
Found with make headers_check: include/linux/virtio_9p.h:15: found __[us]{8,16,32,64} type without #include <linux/types.h> Signed-off-by: Fang Wenqi <antonf@turbolinux.com.cn> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
-
Committed by Trond Myklebust
nfs_commit_inode() needs to be defined irrespective of whether or not we are supporting NFSv3 and NFSv4. Allow the compiler to optimise away code in the NFSv2-only case by converting it into an inlined stub function. Reported-and-tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 31 Jul, 2010 2 commits
-
-
Committed by Russell King
Some platforms gate the pclk (APB - the bus - clock) to the peripherals for power saving, along with the functional clock. When devices are accessed without pclk enabled, the kernel will oops. This gives them two options:

1. Leave all clocks on all the time.
2. Attempt to gate pclk along with the functional clock. (With some hardware, pclk and the functional clock are gated by a single bit in a register.)

(1) has the disadvantage that it causes increased power usage, which is bad news for battery operated devices. (2) can lead to kernel oops if registers are accessed without the functional clock being enabled.

So, introduce the apb_pclk signal in such a way that existing drivers don't need to be updated. Essentially, this means we guarantee that:

1. pclk will be enabled whenever the driver is bound to a device - from probe() to remove() time.
2. pclk will also be enabled when reading the primecell IDs from the device.

In order to allow drivers to be incrementally updated to achieve greater power savings, we provide two additional calls to allow drivers to manage the pclk - amba_pclk_enable()/amba_pclk_disable().

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Committed by Trond Myklebust
See https://bugzilla.kernel.org/show_bug.cgi?id=16056 If other processes are blocked waiting for kswapd to free up some memory so that they can make progress, then we cannot allow kswapd to block on those processes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
-
- 30 Jul, 2010 2 commits
-
-
Committed by David Howells
Fix __task_cred()'s lockdep check by removing the following validation condition:

    lockdep_tasklist_lock_is_held()

as commit_creds() does not take the tasklist_lock, and nor do most of the functions that call it, so this check is pointless and it can prevent detection of the RCU lock not being held if the tasklist_lock is held.

Instead, add the following validation condition:

    task->exit_state >= 0

to permit the access if the target task is dead and therefore unable to change its own credentials.

Fix __task_cred()'s comment to:

(1) discard the bit that says that the caller must prevent the target task from being deleted. That shouldn't need saying.
(2) Add a comment indicating the result of __task_cred() should not be passed directly to get_cred(), but rather that get_task_cred() should be used instead.

Also put a note into the documentation to enforce this point there too.

Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by David Howells
It's possible for get_task_cred() as it currently stands to 'corrupt' a set of credentials by incrementing their usage count after their replacement by the task being accessed.

What happens is that get_task_cred() can race with commit_creds():

    TASK_1                          TASK_2                          RCU_CLEANER
    -->get_task_cred(TASK_2)
    rcu_read_lock()
    __cred = __task_cred(TASK_2)
                                    -->commit_creds()
                                    old_cred = TASK_2->real_cred
                                    TASK_2->real_cred = ...
                                    put_cred(old_cred)
                                      call_rcu(old_cred)
        [__cred->usage == 0]
    get_cred(__cred)
        [__cred->usage == 1]
    rcu_read_unlock()
                                                                    -->put_cred_rcu()
                                                                    [__cred->usage == 1]
                                                                    panic()

However, since a task's credentials are generally not changed very often, we can reasonably make use of a loop involving reading the creds pointer and using atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.

If successful, we can safely return the credentials in the knowledge that, even if the task we're accessing has released them, they haven't gone to the RCU cleanup code.

We then change task_state() in procfs to use get_task_cred() rather than calling get_cred() on the result of __task_cred(), as that suffers from the same problem.

Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be tripped when it is noticed that the usage count is not zero as it ought to be, for example:

    kernel BUG at kernel/cred.c:168!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/kernel/mm/ksm/run
    CPU 0
    Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex 745
    RIP: 0010:[<ffffffff81069881>] [<ffffffff81069881>] __put_cred+0xc/0x45
    RSP: 0018:ffff88019e7e9eb8 EFLAGS: 00010202
    RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
    RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
    RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
    R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
    FS: 00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
    Stack:
     ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
    <0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
    <0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
    Call Trace:
     [<ffffffff810698cd>] put_cred+0x13/0x15
     [<ffffffff81069b45>] commit_creds+0x16b/0x175
     [<ffffffff8106aace>] set_current_groups+0x47/0x4e
     [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
     [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
    Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00 48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b 04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
    RIP [<ffffffff81069881>] __put_cred+0xc/0x45
     RSP <ffff88019e7e9eb8>
    ---[ end trace df391256a100ebdd ]---

Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
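The "increment unless already zero" loop described in this message can be sketched with C11 atomics. This is a userspace illustration only: `struct cred_like`, `inc_not_zero()` and `get_ref()` are hypothetical names, and the sketch leaves out the RCU read-side protection that real code would need to keep the object's memory valid while it is inspected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted credentials object. */
struct cred_like {
    atomic_int usage;
};

/* Increment *v unless it is zero; return true only if the increment happened. */
bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        /* On failure, old is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}

/*
 * Take a reference on whatever *slot currently points to. If the count has
 * already dropped to zero, the object is being torn down, so re-read the
 * pointer and try again with its replacement.
 */
struct cred_like *get_ref(struct cred_like *_Atomic *slot)
{
    struct cred_like *c;

    assert(slot != NULL);
    do {
        c = atomic_load(slot);
    } while (!inc_not_zero(&c->usage));
    return c;
}
```

The key property is that a count that has reached zero is never raised again, so an object already handed to cleanup cannot be resurrected by a late reader.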
-
- 29 Jul, 2010 1 commit
-
-
Committed by Rabin Vincent
Platforms may have some external power control which needs to be controlled from board-specific code. Rename the translate_vdd() callback to vdd_handler() and pass it the power mode. Acked-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Rabin Vincent <rabin.vincent@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
- 28 Jul, 2010 1 commit
-
-
Committed by Anuj Aggarwal
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> In TPS6507x, depending on the status of the DEFDCDC{2,3} pin, either the DEFDCDC{2,3}_LOW or the DEFDCDC{2,3}_HIGH register needs to be read or programmed to change the output voltage. The current driver assumes the DEFDCDC{2,3} pins are always tied low and thus operates only on the DEFDCDC{2,3}_LOW register. This need not always be the case (as is found on OMAP-L138 EVM). Unfortunately, software cannot read the status of the DEFDCDC{2,3} pins. So, this information is passed through platform data depending on how the board is wired. Signed-off-by: Anuj Aggarwal <anuj.aggarwal@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
-
- 27 Jul, 2010 4 commits
-
-
Committed by Linus Walleij
Implementation of the ST-Ericsson baudrate extension in the PL011 block. In this modified variant it is possible to change the sampling factor from 16 to 8, and thanks to this we can get higher baudrates while still using the same peripheral clock. Also use DIV_ROUND_CLOSEST() rather than a simple integer division to determine the baud divisor. Cc: Alessandro Rubini <rubini@unipv.it> Cc: Jerzy Kasenberg <jerzy.kasenberg@tieto.com> Signed-off-by: Marcin Mielczarczyk <marcin.mielczarczyk@tieto.com> Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
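The effect of the sampling factor and of round-to-nearest division can be sketched as follows. This assumes the usual PL011 arrangement of a divisor with a 6-bit fractional part, i.e. a value expressed in units of 1/64 and equal to clk / (factor * baud). `baud_divisor_x64()` and `DIV_ROUND_CLOSEST_U()` are hypothetical userspace names, not the driver's code.

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest division for unsigned operands (same idea as DIV_ROUND_CLOSEST()). */
#define DIV_ROUND_CLOSEST_U(n, d) (((n) + (d) / 2) / (d))

/*
 * Baud divisor in units of 1/64: clk_hz * 64 / (oversample * baud), rounded
 * to the nearest integer. The low 6 bits are the fractional part, the rest
 * is the integer part.
 */
uint32_t baud_divisor_x64(uint32_t clk_hz, uint32_t baud, uint32_t oversample)
{
    uint64_t num, den;

    assert(baud != 0);
    assert(oversample == 8 || oversample == 16);
    num = (uint64_t)clk_hz * 64;
    den = (uint64_t)oversample * baud;
    return (uint32_t)DIV_ROUND_CLOSEST_U(num, den);
}
```

With a 48 MHz clock and 115200 baud at factor 16, the exact value is 1666.67, so rounding gives 1667 where truncation would give 1666. At 4 Mbaud, factor 16 would need a divisor below 1.0 (48/64), while factor 8 yields 96/64 = 1.5, which illustrates how halving the sampling factor reaches higher rates from the same clock.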
-
Committed by Linus Walleij
In the ST-Ericsson version of the PL011 the TX and RX have different control registers. Cc: Alessandro Rubini <rubini@unipv.it> Signed-off-by: Marcin Mielczarczyk <marcin.mielczarczyk@tieto.com> Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Committed by Russell King
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Committed by Christoph Hellwig
Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been committed. That means the aio_complete call needs to be moved into the ->end_io callback so that the filesystem can control exactly when to call it. This makes a bit of a mess out of dio_complete and makes the ->end_io callback prototype even more complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Alex Elder <aelder@sgi.com>
-
- 25 Jul, 2010 1 commit
-
-
Committed by Rafael J. Wysocki
Commit 2a6b6976 (ACPI: Store NVS state even when entering suspend to RAM) caused the ACPI suspend code to save the NVS area during suspend and restore it during resume unconditionally, although it is known that some systems need to use acpi_sleep=s4_nonvs for hibernation to work. To allow the affected systems to avoid saving and restoring the NVS area during suspend to RAM and resume, introduce kernel command line option acpi_sleep=nonvs and make acpi_sleep=s4_nonvs work as its alias temporarily (add acpi_sleep=s4_nonvs to the feature removal file). Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16396 . Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-and-tested-by: tomas m <tmezzadra@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>
-
- 23 Jul, 2010 1 commit
-
-
Committed by Herbert Xu
Mark Wagner reported OOM symptoms when sending UDP traffic over a macvtap link to a kvm receiver. This appears to be caused by the fact that macvtap packet queues are unlimited in length. This means that if the receiver can't keep up with the rate of flow, then we will hit OOM. Of course it gets worse if the OOM killer then decides to kill the receiver. This patch imposes a cap on the packet queue length, in the same way as the tuntap driver, using the device TX queue length. Please note that macvtap currently has no way of giving congestion notification; that means the software device TX queue cannot be used and packets will always be dropped once the macvtap driver queue fills up. This shouldn't be a great problem for the scenario where macvtap is used to feed a kvm receiver, as the traffic is most likely external in origin so congestion notification can't be applied anyway. Of course, if anybody decides to complain about guest-to-guest UDP packet loss down the track, then we may have to revisit this. Incidentally, this patch also fixes a real memory leak when macvtap_get_queue fails. Chris Wright noticed that for this patch to work, we need a non-zero TX queue length. This patch includes his work to change the default macvtap TX queue length to 500. Reported-by: Mark Wagner <mwagner@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
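The capping behaviour described here, dropping once the queue is full rather than growing without bound, can be sketched with a simple counter-based model. `struct pkt_queue`, `pkt_enqueue()` and `pkt_dequeue()` are hypothetical names for illustration; the model tracks only lengths, not packet data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical length-only model of a capped packet queue. */
struct pkt_queue {
    size_t len;      /* packets currently queued */
    size_t limit;    /* cap, playing the role of the TX queue length */
    size_t dropped;  /* packets refused because the queue was full */
};

/* Queue one packet, or drop it (and count the drop) if the cap is reached. */
bool pkt_enqueue(struct pkt_queue *q)
{
    assert(q != NULL);
    if (q->len >= q->limit) {
        q->dropped++;
        return false;
    }
    q->len++;
    return true;
}

/* Remove one packet; returns false if the queue was empty. */
bool pkt_dequeue(struct pkt_queue *q)
{
    assert(q != NULL);
    if (q->len == 0)
        return false;
    q->len--;
    return true;
}
```

Note that with `limit == 0` every enqueue is refused, which matches the observation in the message that the cap only works with a non-zero queue length.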
-
- 22 Jul, 2010 1 commit
-
-
Committed by Jason Wessel
The kdb code should not toggle the sysrq state in case an end user wants to try and resume the normal kernel execution. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
- 21 Jul, 2010 2 commits
-
-
Committed by Doug Goldstein
vgaarb.h was missing the #define that pairs with the #ifndef at the top, so the include guard did not prevent multiple #includes from causing re-define errors. Signed-off-by: Doug Goldstein <cardoe@gentoo.org> Cc: Dave Airlie <airlied@redhat.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
-
Committed by Paul E. McKenney
If a single-threaded process does a file-descriptor operation, and some other process accesses that same file descriptor via /proc, the current rcu_dereference_check_fdtable() can give a false-positive RCU-lockdep splat due to the reference count being increased by the /proc access after the reference-count check in fget_light() but before the check in rcu_dereference_check_fdtable(). This commit prevents this false positive by checking for a single-threaded process. To avoid #include hell, this commit uses the wrapper for thread_group_empty(current) defined by rcu_my_thread_group_empty() provided in a separate commit. Located-by: Miles Lane <miles.lane@gmail.com> Located-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 20 Jul, 2010 1 commit
-
-
Committed by Dan Carpenter
If the kzalloc() fails we should return NULL. All the places that call alloc_apertures() check for this already. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: James Simmons <jsimmons@infradead.org> Acked-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
-
- 19 Jul, 2010 1 commit
-
-
Committed by Dave Chinner
The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
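The embedding pattern described above can be sketched in userspace C. The callback signature is deliberately simplified and `struct shrinker_like`, `struct fs_cache` and `fs_cache_shrink()` are hypothetical names; only the `container_of()` idea is taken from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified shrinker: the callback receives the shrinker itself. */
struct shrinker_like {
    int (*shrink)(struct shrinker_like *s, int nr_to_scan);
};

/* Hypothetical per-filesystem cache that embeds its own shrinker. */
struct fs_cache {
    int nr_objects;
    struct shrinker_like shrinker;
};

/* Free up to nr_to_scan objects from the cache that owns s; return what remains. */
int fs_cache_shrink(struct shrinker_like *s, int nr_to_scan)
{
    struct fs_cache *cache = container_of(s, struct fs_cache, shrinker);

    assert(nr_to_scan >= 0);
    if (nr_to_scan > cache->nr_objects)
        nr_to_scan = cache->nr_objects;
    cache->nr_objects -= nr_to_scan;
    return cache->nr_objects;
}
```

Because the callback derives its context from the shrinker pointer, two caches can register the same function and each call acts only on its own instance, with no global state involved.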
-
- 17 Jul, 2010 2 commits
-
-
Committed by Benjamin Herrenschmidt
This moves the various known Marvell PHY IDs to include/linux/marvell_phy.h along with dev_flags definitions for use by the driver. I then added a flag that changes the PHY init code to set up the LED config to the values needed to operate a dns323 rev C1 NAS. I moved the existing "resistance" flag to the .h as well, though I've been unable to find whoever sets this to convert it to use that constant. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
-
Committed by Bjorn Helgaas
If we fail to assign resources to a PCI BAR, this patch makes us try the original address from BIOS rather than leaving it disabled. Linux tries to make sure all PCI device BARs are inside the upstream PCI host bridge or P2P bridge apertures, reassigning BARs if necessary. Windows does similar reassignment. Before this patch, if we could not move a BAR into an aperture, we left the resource unassigned, i.e., at address zero. Windows leaves such BARs at the original BIOS addresses, and this patch makes Linux do the same. This is a bit ugly because we disable the resource long before we try to reassign it, so we have to keep track of the BIOS BAR address somewhere. For lack of a better place, I put it in the struct pci_dev. I think it would be cleaner to attempt the assignment immediately when the claim fails, so we could easily remember the original address. But we currently claim motherboard resources in the middle, after attempting to claim PCI resources and before assigning new PCI resources, and changing that is a fairly big job. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16263 Reported-by: Andrew <nitr0@seti.kr.ua> Tested-by: Andrew <nitr0@seti.kr.ua> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
-
- 16 Jul, 2010 1 commit
-
-
Committed by Jan Kara
OCFS2 uses the t_commit trigger to compute and store the checksum of the just committed blocks. When a buffer has b_frozen_data, the checksum is computed for it instead of b_data, but this can result in an old checksum being written to the filesystem in the following scenario:

1) transaction1 is opened
2) handle1 is opened
3) journal_access(handle1, bh) - This sets jh->b_transaction to transaction1
4) modify(bh)
5) journal_dirty(handle1, bh)
6) handle1 is closed
7) start committing transaction1, opening transaction2
8) handle2 is opened
9) journal_access(handle2, bh) - This copies off b_frozen_data to make it safe for transaction1 to commit. jh->b_next_transaction is set to transaction2.
10) jbd2_journal_write_metadata() checksums b_frozen_data
11) the journal correctly writes b_frozen_data to the disk journal
12) handle2 is closed - There was no dirty call for the bh on handle2, so it is never queued for any more journal operation
13) Checkpointing finally happens, and it just spools the bh via normal buffer writeback. This will write b_data, which was never triggered on and thus contains a wrong (old) checksum.

This patch fixes the problem by calling the trigger at the moment data is frozen for journal commit - i.e., either when b_frozen_data is created by do_get_write_access or just before we write a buffer to the log if b_frozen_data does not exist. We also rename the trigger to t_frozen as that better describes when it is called.

Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
-