- 16 Jun 2011, 2 commits
-
-
Committed by KOSAKI Motohiro
Currently, memcg reclaim can disable the swap token even if the swap token mm doesn't belong to its memory cgroup. That is slightly risky: if an admin creates a very small mem-cgroup and a silly guy runs a contentious, heavy memory-pressure workload, every task is going to lose the swap token and the system may become unresponsive. That's bad. This patch adds a 'memcg' parameter to disable_swap_token(); if the parameter doesn't match the swap token, the VM doesn't disable it.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
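A minimal sketch of the resulting check, based only on the description above; `token_memcg_matches()` is a hypothetical helper standing in for the real "does the swap-token mm belong to this memcg" test:

```c
/* Sketch, not the literal patch. swap_token_mm and put_swap_token() are
 * the existing swap-token globals/helpers from <linux/swap.h>. */
static inline void disable_swap_token(struct mem_cgroup *memcg)
{
	/* Global reclaim (memcg == NULL) may always drop the token;
	 * memcg reclaim may only drop a token owned by that memcg. */
	if (!memcg || token_memcg_matches(swap_token_mm, memcg))
		put_swap_token(swap_token_mm);
}
```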
-
Committed by Michael Hennerich
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 12 Jun 2011, 2 commits
-
-
Committed by Jiri Pirko
Testing of VLAN_FLAG_REORDER_HDR does not belong in vlan_untag but rather in vlan_do_receive. Otherwise the vlan header will not be properly put on the packet in the case of vlan header acceleration. As we remove the check from vlan_check_reorder_header, rename it to vlan_reorder_header to keep the naming clean. Fix up the skb->pkt_type early so we don't look at the packet after adding the vlan tag, which guarantees we don't goof and look at the wrong field. Use a simple if statement instead of a complicated switch statement to decide that we need to increment rx_stats for a multicast packet. Hopefully at some point we will just declare the case where VLAN_FLAG_REORDER_HDR is cleared as unsupported and remove the code. Until then this keeps it working correctly.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by David Howells
It uses cpu_relax(), and so needs <asm/processor.h>. Without this patch, I see:
  CC      arch/mn10300/kernel/asm-offsets.s
  In file included from include/linux/time.h:8,
                   from include/linux/timex.h:56,
                   from include/linux/sched.h:57,
                   from arch/mn10300/kernel/asm-offsets.c:7:
  include/linux/seqlock.h: In function 'read_seqbegin':
  include/linux/seqlock.h:91: error: implicit declaration of function 'cpu_relax'
whilst building asb2364_defconfig on MN10300.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
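The fix is just the missing include; a rough sketch of why read_seqbegin() needs it (the spin-wait below is simplified from the real header):

```c
#include <asm/processor.h>	/* the missing include: declares cpu_relax() */

static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
{
	unsigned ret;

repeat:
	ret = sl->sequence;
	smp_rmb();
	if (unlikely(ret & 1)) {
		cpu_relax();	/* the call that requires <asm/processor.h> */
		goto repeat;
	}
	return ret;
}
```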
-
- 10 Jun 2011, 2 commits
-
-
Committed by Jamie Iles
include/linux/basic_mmio_gpio.h uses a spinlock_t without including any of the spinlock headers, resulting in this compile failure:
  include/linux/basic_mmio_gpio.h:51:2: error: expected specifier-qualifier-list before 'spinlock_t'
Explicitly include linux/spinlock_types.h to fix it.
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
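An abridged sketch of the header after the fix; only the include and the offending member matter here, other fields of the real struct are elided:

```c
/* include/linux/basic_mmio_gpio.h (abridged sketch) */
#include <linux/gpio.h>
#include <linux/spinlock_types.h>	/* the fix: defines spinlock_t */

struct bgpio_chip {
	struct gpio_chip gc;
	/* ... register pointers and accessors elided ... */
	spinlock_t lock;	/* the member that previously broke the build */
};
```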
-
Committed by Yegor Yefremov
Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Jun 2011, 1 commit
-
-
Committed by Linus Torvalds
This tries to make the 'struct inode' accesses denser in the data cache by moving a commonly accessed field (i_security) closer to other fields that are accessed often. It also makes 'i_state' just an 'unsigned int' rather than 'unsigned long', since we only use a few bits of that field, and moves it next to the existing 'i_flags' so that we potentially get better structure layout (although depending on config options, i_flags may already have packed into the same word as i_lock, so this improves packing only for the case of spinlock debugging). Our 'struct inode' is still way too big, and we should probably move some other fields around too (the acl fields in particular) for better data cache access density. Other fields (like the inode hash) are likely to be entirely irrelevant under most loads.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 08 Jun 2011, 1 commit
-
-
Committed by Alan Stern
Some USB mass-storage devices have bugs that cause them not to handle the first READ(10) command they receive correctly. The Corsair Padlock v2 returns completely bogus data for its first read (possibly it returns the data in encrypted form even though the device is supposed to be unlocked). The Feiya SD/SDHC card reader fails to complete the first READ(10) command after it is plugged in or after a new card is inserted, returning a status code that indicates it thinks the command was invalid, which prevents the kernel from retrying the read. Since the first read of a new device or a new medium is for the partition sector, the kernel is unable to retrieve the device's partition table. Users have to manually issue an "hdparm -z" or "blockdev --rereadpt" command before they can access the device. This patch (as1470) works around the problem. It adds a new quirk flag, US_FL_INVALID_READ10, indicating that the first READ(10) should always be retried immediately, as should any failing READ(10) commands (provided the preceding READ(10) command succeeded, to avoid getting stuck in a loop). The patch also adds appropriate unusual_devs entries containing the new flag.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Sven Geggus <sven-usbst@geggus.net>
Tested-by: Paul Hartman <paul.hartman+linux@gmail.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
CC: <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
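For illustration, an unusual_devs-style entry carrying the new flag; the vendor/product IDs and strings below are placeholders, not the real Corsair or Feiya values added by the patch:

```c
/* Placeholder IDs -- illustrative only, not the entries from the patch. */
UNUSUAL_DEV(0x1234, 0x5678, 0x0000, 0x9999,
		"Example Vendor",
		"Example SD Card Reader",
		USB_SC_DEVICE, USB_PR_DEVICE, NULL,
		US_FL_INVALID_READ10),
```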
-
- 07 Jun 2011, 3 commits
-
-
Committed by Eric Dumazet
In 2.6.27, commit 393e52e3 (packet: deliver VLAN TCI to userspace) added a small information leak. Add a padding field and make sure it is zeroed before the copy to user space.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
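A hedged sketch of the fix's shape: the implicit structure hole after the 16-bit TCI is made an explicit padding field, which is then cleared before copying to user space (field name per my reading of the patch, so treat it as approximate):

```c
struct tpacket_auxdata {
	__u32	tp_status;
	__u32	tp_len;
	__u32	tp_snaplen;
	__u32	tp_mac;
	__u32	tp_net;
	__u16	tp_vlan_tci;
	__u16	tp_padding;	/* added: makes the implicit hole explicit */
};

/* ...and in the recvmsg path, before copying the auxdata to user space: */
aux.tp_vlan_tci = vlan_tx_tag_get(skb);
aux.tp_padding = 0;		/* never leak uninitialized kernel bytes */
```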
-
Committed by David S. Miller
This interface uses a temporary buffer, but for no real reason. And it can now generate warnings like:
  net/sched/sch_generic.c: In function 'dev_watchdog'
  net/sched/sch_generic.c:254:10: warning: unused variable 'drivername'
Just return driver->name directly, or "".
Reported-by: Connor Hansen <cmdkhh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
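A sketch of the simplified helper after the change; this is close to, but not necessarily verbatim, the final code:

```c
const char *netdev_drivername(const struct net_device *dev)
{
	const struct device_driver *driver;
	const struct device *parent;
	const char *empty = "";

	parent = dev->dev.parent;
	if (!parent)
		return empty;

	driver = parent->driver;
	if (driver && driver->name)
		return driver->name;	/* no temporary buffer needed */
	return empty;
}
```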
-
Committed by FUJITA Tomonori
By default io_tlb_nslabs is set to zero, and gets set to whatever value is passed in via the swiotlb_init_with_tbl function. The default value passed in is 64MB. However, if the user provides 'swiotlb=<nslabs>', the default value is ignored and the value provided by the user is used... except when the SWIOTLB is used under Xen - there the default value of 64MB is used, and the Xen-SWIOTLB has no mechanism to get 'io_tlb_nslabs' filled out by the setup_io_tlb_npages function. This patch provides a function for the Xen-SWIOTLB to call to see if io_tlb_nslabs is set and, if so, use that value.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
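The accessor described above boils down to a one-liner; a sketch (name and return type per my recollection of the swiotlb code, so treat as approximate):

```c
/* Let Xen-SWIOTLB pick up a user-supplied swiotlb=<nslabs> value. */
unsigned long swiotlb_nr_tbl(void)
{
	return io_tlb_nslabs;
}
EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
```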
-
- 06 Jun 2011, 1 commit
-
-
Committed by Eric Dumazet
The following error is raised (among other similar ones):
  net/ipv4/netfilter/nf_nat_standalone.c: In function ‘nf_nat_fn’:
  net/ipv4/netfilter/nf_nat_standalone.c:119:2: warning: case value ‘4’ not in enumerated type ‘enum ip_conntrack_info’
gcc barfs on adding two enum values and getting a result that is not an enumerated value:
  case IP_CT_RELATED+IP_CT_IS_REPLY:
Add the missing enum values.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
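The added enumerators make the previously computed case labels real members of the enum, which matches the description above (abridged; trailing members of the real enum elided):

```c
enum ip_conntrack_info {
	IP_CT_ESTABLISHED,	/* 0 */
	IP_CT_RELATED,		/* 1 */
	IP_CT_NEW,		/* 2 */
	IP_CT_IS_REPLY,		/* 3: offset added for the reply direction */
	IP_CT_ESTABLISHED_REPLY	= IP_CT_ESTABLISHED + IP_CT_IS_REPLY,
	IP_CT_RELATED_REPLY	= IP_CT_RELATED + IP_CT_IS_REPLY,
	IP_CT_NEW_REPLY		= IP_CT_NEW + IP_CT_IS_REPLY,
	/* ... */
};
```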
-
- 04 Jun 2011, 4 commits
-
-
Committed by Vince Weaver
Fix the include/linux/perf_event.h comments to be consistent with the actual #define names. This is trivial, but it can be a bit confusing when first reading through the file.
Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106031757090.29381@cl320.eecs.utk.edu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Al Viro
Caching "we have already removed suid/caps" was overenthusiastic as merged. On network filesystems we might have had suid/caps set on another client, silently picked up by this client on revalidate, all of that *without* clearing the S_NOSEC flag. AFAICS, the only reasonably sane way to deal with that is:
* a new superblock flag; unless it is set, S_NOSEC is not going to be set.
* local block filesystems set it in their ->mount() (more accurately, mount_bdev() does, as does the btrfs ->mount(); users of mount_bdev() other than local block ones clear it).
* if any network filesystem (or a cluster one) wants to use S_NOSEC, it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when inode attribute changes are picked up from other clients.
It's not an earth-shattering hole (anybody that can set suid on another client will almost certainly be able to write to the file before doing that anyway), but it's a bug that needs fixing. A sketch of the two sides of this follows below.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Linus Torvalds
This reverts commit b1c43f82. It was broken in so many ways, and results in random odd pty issues. It re-introduced the buggy schedule_work() in flush_to_ldisc() that can cause endless work-loops (see commit a5660b41: "tty: fix endless work loop when the buffer fills up"). It also used an "unsigned int" return value for the ->receive_buf() function, but then made multiple functions return a negative error code, and didn't actually check for the error in the caller. And it didn't actually work at all. BenH bisected down odd tty behavior to it:
"It looks like the patch is causing some major malfunctions of the X server for me, possibly related to PTYs. For example, cat'ing a large file in a gnome terminal hangs the kernel for -minutes- in a loop of what looks like flush_to_ldisc/workqueue code, (some ftrace data in the quoted bits further down). ... Some more data: It -looks- like what happens is that the flush_to_ldisc work queue entry constantly re-queues itself (because the PTY is full?) and the workqueue thread will basically loop forever calling it without ever scheduling, thus starving the consumer process that could have emptied the PTY."
which is pretty much exactly the problem we fixed in a5660b41. Milton Miller pointed out the 'unsigned int' issue.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: Milton Miller <miltonm@bga.com>
Cc: Stefan Bigler <stefan.bigler@keymile.com>
Cc: Toby Gray <toby.gray@realvnc.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Chris Metcalf
On an architecture without CMPXCHG_LOCAL but with DEBUG_VM enabled, the VM_BUG_ON() in __pcpu_double_call_return_bool() will cause an early panic during boot unless we always align cpu_slab properly. In principle we could remove the alignment-testing VM_BUG_ON() for architectures that don't have CMPXCHG_LOCAL, but leaving it in means that new code will tend not to break x86 even if it is introduced on another platform, and it's low cost to require alignment.
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
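The fix amounts to always requesting double-word alignment for the per-cpu slab structure, which is what the cmpxchg-double sanity check expects; roughly:

```c
static inline int alloc_kmem_cache_cpus(struct kmem_cache *s)
{
	/* Align unconditionally: it is cheap, and it keeps the DEBUG_VM
	 * alignment check in __pcpu_double_call_return_bool() happy even
	 * on architectures without CMPXCHG_LOCAL. */
	s->cpu_slab = __alloc_percpu(sizeof(struct kmem_cache_cpu),
				     2 * sizeof(void *));
	return s->cpu_slab != NULL;
}
```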
-
- 03 Jun 2011, 1 commit
-
-
The detection of spurious interrupts is currently limited to the first-level handler. In force-threaded mode we never notice if the threaded irq does not feel responsible. This patch catches the return value of the threaded handler and forwards it to the spurious detector. If the primary handler returns only IRQ_WAKE_THREAD then the spurious detector ignores it, because it gets called again from the threaded handler.
[ tglx: Report the erroneous return value early and bail out ]
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Link: http://lkml.kernel.org/r/1306824972-27067-2-git-send-email-sebastian@breakpoint.cc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
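A rough sketch of the idea inside the irq thread: whatever the threaded handler returns is fed to the same spurious-interrupt detector the hardirq path uses (structure simplified and collapsed into one function; not the literal patch):

```c
/* Simplified: the irq thread runs the handler and reports its verdict. */
static irqreturn_t irq_thread_fn(struct irq_desc *desc,
				 struct irqaction *action)
{
	irqreturn_t ret;

	ret = action->thread_fn(action->irq, action->dev_id);
	if (!noirqdebug)
		note_interrupt(action->irq, desc, ret);	/* spurious detector */
	return ret;
}
```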
-
- 02 Jun 2011, 2 commits
-
-
Committed by Ben Greear
Currently, user-space cannot determine whether a zero tp_vlan_tci means there is no VLAN tag or the VLAN ID was zero. Add a flag to make this explicit. User-space can check for TP_STATUS_VLAN_VALID || tp_vlan_tci > 0, which will be backwards compatible. Older code would have just checked for tp_vlan_tci, so it will work no worse than before.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
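A user-space sketch of the backwards-compatible check described above:

```c
#include <stdbool.h>
#include <linux/if_packet.h>

/* Returns true if the received frame carried a VLAN tag. */
static bool frame_has_vlan(const struct tpacket_auxdata *aux)
{
	/* New kernels set TP_STATUS_VLAN_VALID even for VLAN ID 0;
	 * the tci != 0 fallback keeps working on older kernels. */
	return (aux->tp_status & TP_STATUS_VLAN_VALID) || aux->tp_vlan_tci;
}
```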
-
Committed by Eliad Peller
Commit 0a35d36d ("cfg80211: Use capability info to detect mesh beacons") assumed that a probe response with both the ESS and IBSS bits cleared means the frame was sent by a mesh STA. However, these capabilities are also being used in the p2p_find phase, and the mesh validation broke it. Rename the WLAN_CAPABILITY_IS_MBSS macro, and verify that mesh IEs exist before assuming the frame was sent by a mesh STA.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
-
- 01 Jun 2011, 2 commits
-
-
Committed by Youquan Song
There are no externally-visible changes with this. In the loop in the internal __domain_mapping() function, we simply detect if we are mapping:
- size >= 2MiB, and
- virtual address aligned to 2MiB, and
- physical address aligned to 2MiB, and
- on hardware that supports superpages
(and likewise for larger superpages). We automatically use a superpage for such mappings. We never have to worry about *breaking* superpages, since we trust that we will always *unmap* the same range that was mapped. So all we need to do is ensure that dma_pte_clear_range() will also cope with superpages. Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so it can return a PTE at the appropriate level rather than always extending the page tables all the way down to level 1. Again, this is simplified by the fact that we should never encounter existing small pages when we're creating a mapping; any old mapping that used the same virtual range will have been entirely removed and its obsolete page tables freed. Provide an 'intel_iommu=sp_off' argument on the command line as a chicken bit. Not that it should ever be required.
==
The original commit seen in iommu-2.6.git was Youquan's implementation (and completion) of my own half-baked code which I'd typed into an email. It was followed by half a dozen subsequent 'fixes'. I've taken the unusual step of rewriting history and collapsing the original commits, in order to keep the main history simpler and make life easier for the people who are going to have to backport this to older kernels. And also so I can give it a more coherent commit comment which (hopefully) gives a better explanation of what's going on. The original sequence of commits leading to identical code was:
Youquan Song (3):
  intel-iommu: super page support
  intel-iommu: Fix superpage alignment calculation error
  intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()
David Woodhouse (4):
  intel-iommu: Precalculate superpage support for dmar_domain
  intel-iommu: Fix hardware_largepage_caps()
  intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
  intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
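For intuition only, a hypothetical helper showing the alignment test described above for the 2MiB case; the real code generalizes this over multiple superpage levels and the hardware capability bits:

```c
#define SP_PAGES_2M	(1UL << 9)	/* 512 x 4KiB pages = 2MiB */

/* Hypothetical helper, not from the patch: can this chunk of the mapping
 * be covered by a single 2MiB superpage PTE? */
static inline bool can_use_2m_superpage(unsigned long iov_pfn,
					unsigned long phys_pfn,
					unsigned long nr_pages,
					bool hw_supports_2m)
{
	return hw_supports_2m &&
	       !(iov_pfn & (SP_PAGES_2M - 1)) &&
	       !(phys_pfn & (SP_PAGES_2M - 1)) &&
	       nr_pages >= SP_PAGES_2M;
}
```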
-
Committed by Randy Dunlap
Fix build warnings in physmap.h:
  include/linux/mtd/physmap.h:25: warning: 'struct platform_device' declared inside parameter list
  include/linux/mtd/physmap.h:25: warning: its scope is only this definition or declaration, which is probably not what you want
  include/linux/mtd/physmap.h:26: warning: 'struct platform_device' declared inside parameter list
  include/linux/mtd/physmap.h:27: warning: 'struct platform_device' declared inside parameter list
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
-
- 31 May 2011, 2 commits
-
-
Committed by Peter Zijlstra
While looking over the code I found that with the ttwu rework the nr_wakeups_migrate test broke, since we now switch cpus prior to calling ttwu_stat(); hence the test is always true. Cure this by passing the migration state in wake_flags. Also move the whole test under CONFIG_SMP; it's hard to migrate tasks on UP :-)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-pwwxl7gdqs5676f1d4cx6pj7@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Namhyung Kim
Since those macro-defined functions require an additional semicolon from the caller, they could cause potential syntax errors when used in if-else statements.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
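The general pattern being fixed, with made-up names for illustration (these are not the actual stubs touched by the patch): a brace-block macro plus the caller's semicolon breaks if/else nesting, whereas a static inline behaves like a normal function call.

```c
/* Broken form: the expansion plus the caller's ';' yields "{ } ;", so
 *     if (cond)
 *             example_stub_macro(q);
 *     else
 *             do_other(q);
 * fails to compile ("else without a previous if"). */
#define example_stub_macro(q)	{ }

/* Fixed form: a static inline participates in statement syntax normally. */
static inline void example_stub(struct request_queue *q)
{
}
```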
-
- 30 May 2011, 11 commits
-
-
Committed by Jens Axboe
It was not a good idea to start dereferencing disk->queue from the fs sysfs strategy for displaying discard alignment. We first ran into a NULL pointer dereference, and after fixing that we sometimes see invalid disk->queue pointer values. Since discard is the only one of the bunch actually looking into the queue, just revert the change. This reverts commit 23ceb5b7.
Conflicts:
  fs/partitions/check.c
-
Committed by Michael S. Tsirkin
Add an API that tells the other side that callbacks should be delayed until a lot of work has been done. Implement it using the new event_idx feature. Note: it might seem advantageous to let drivers ask for a callback after a specific capacity has been reached. However, as a single head can free many entries in the descriptor table, we don't really have a clue about capacity until get_buf is called. The API is the simplest to implement at the moment; we'll see what kind of hints drivers can pass when there's more than one user of the feature.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
-
Committed by Michael S. Tsirkin
With the new used_event and avail_event features, both host and guest need similar logic to check whether events are enabled, so it helps to put the common code in the header. Note that Xen has similar logic for notification hold-off in include/xen/interface/io/ring.h, with req_event and req_prod corresponding to event_idx + 1 and new_idx respectively. The +1 comes from the fact that req_event and req_prod in Xen start at 1, while the event index in virtio starts at 0.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
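The shared helper is essentially a wrap-safe "did new_idx cross event_idx since old" test; as I recall it from virtio_ring.h (roughly):

```c
static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
{
	/* True iff event_idx lies in the half-open window (old, new_idx],
	 * computed with unsigned 16-bit wrap-around arithmetic. */
	return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
}
```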
-
Committed by Michael S. Tsirkin
Define a new feature bit for the guest and host to utilize an event index (like Xen) instead of a flag bit to enable/disable interrupts and kicks.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
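The feature bit, and the trick of reusing each ring's otherwise-unused tail word for the event index, look roughly like this (bit number per my recollection; verify against virtio_ring.h):

```c
/* The guest publishes the event index it wants for used buffers, and the
 * host publishes the one it wants for available buffers. */
#define VIRTIO_RING_F_EVENT_IDX	29

/* Both indices live in spare words at the end of the opposite ring. */
#define vring_used_event(vr)	((vr)->avail->ring[(vr)->num])
#define vring_avail_event(vr)	(*(__u16 *)&(vr)->used->ring[(vr)->num])
```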
-
Committed by Rusty Russell
It's unclear to me if it's important, but it's obviously causing my technical colleagues some headaches, and I'd hate such imprecision to slow virtio adoption. I've emailed this to all non-trivial contributors for approval, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Ryan Harper <ryanh@us.ibm.com>
Acked-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Eric Van Hensbergen <ericvh@gmail.com>
Acked-by: john cooper <john.cooper@redhat.com>
Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
-
Committed by Linus Torvalds
Thomas Gleixner reports that we now have a boot crash triggered by CONFIG_CPUMASK_OFFSTACK=y:
  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [<c11ae035>] find_next_bit+0x55/0xb0
  Call Trace:
   [<c11addda>] cpumask_any_but+0x2a/0x70
   [<c102396b>] flush_tlb_mm+0x2b/0x80
   [<c1022705>] pud_populate+0x35/0x50
   [<c10227ba>] pgd_alloc+0x9a/0xf0
   [<c103a3fc>] mm_init+0xec/0x120
   [<c103a7a3>] mm_alloc+0x53/0xd0
which was introduced by commit de03c72c ("mm: convert mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of mm_init() vs mm_init_cpumask(). Thomas wrote a patch to just fix the ordering of initialization, but I hate the new double allocation in the fork path, so I ended up instead doing some more radical surgery to clean it all up.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Benny Halevy
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
-
Committed by Benny Halevy
NFSv4.1 LAYOUTRETURN implementation. Currently, it does not support layout-type payload encoding.
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
[call pnfs_return_layout right before pnfs_destroy_layout]
[remove assert_spin_locked from pnfs_clear_lseg_list]
[remove wait parameter from the layoutreturn path]
[remove return_type field from nfs4_layoutreturn_args]
[remove range from nfs4_layoutreturn_args]
[no need to send layoutcommit from _pnfs_return_layout]
[don't wait on sync layoutreturn]
[fix layout stateid in layoutreturn args]
[fixed NULL deref in _pnfs_return_layout]
[removed reclaim member of nfs4_layoutreturn_args]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
-
Committed by Benny Halevy
Non-RPC layout drivers, such as those for objects and blocks, implement their own I/O path and error handling logic. Therefore bypass NFS-based error handling for these layout drivers.
[fix lseg ref-count bugs, and null de-refs]
[fall out from: non-rpc layout drivers]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[get rid of PNFS_USE_RPC_CODE]
[get rid of __nfs4_write_done_cb]
[revert useless change in nfs4_write_done_cb]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
-
Committed by Benny Halevy
* Add the pnfs_osd_xdr.h header.
* Define the pnfs_osd_layout structure, including all its sub-types and constants.
* Declare the pnfs_osd_xdr_decode_layout API plus all needed inline helpers.
* Define the pnfs_osd_deviceaddr structure and all its subtypes and constants.
* Declare the API for decoding a pnfs_osd_deviceaddr from an XDR stream.
* Define the pnfs_osd_ioerr structure, its substructures and constants.
* Declare the API for encoding a pnfs_osd_ioerr into an XDR stream.
* Define the pnfs_osd_layoutupdate structure and its substructures.
* Declare the API for encoding a pnfs_osd_layoutupdate into an XDR stream.
[Remove server definitions]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
-
Committed by Benny Halevy
Initialize xdr_stream and xdr_buf using an array of page pointers and the length of the buffer.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
-
- 29 May 2011, 6 commits
-
-
Committed by Mikulas Patocka
Return the client directly from dm_kcopyd_client_create, not through a parameter, making it consistent with dm_io_client_create.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Mikulas Patocka
Reserve just the minimum number of pages needed to process one job. Because we allocate pages from the page allocator, we don't need to reserve a large number of pages. The maximum job size is SUB_JOB_SIZE, and we calculate the number of reserved pages based on this.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Mikulas Patocka
Replace the arbitrary calculation of the initial io struct mempool size with a constant. The code calculated the number of reserved structures based on the request size and used a "magic" multiplication constant of 4. This patch changes it to reserve a fixed number - itself still chosen quite arbitrarily. Further testing might show if there is a better number to choose. Note that if there is no memory pressure, we can still allocate an arbitrary number of "struct io" structures. One structure is enough to process the whole request.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Mike Snitzer
Permit a target to support discards regardless of whether or not all its underlying devices do.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Heiko Carstens
page_get_storage_key() and page_set_storage_key() expect a page address and not its page frame number. This got inconsistent with 2d42552d "[S390] merge page_test_dirty and page_clear_dirty". The result is that we read/write storage keys from random pages and do not have working dirty bit tracking at all. E.g. SetPageUpdate() doesn't clear the dirty bit of requested pages, which for example ext4 doesn't like very much and panics after a while:
  Unable to handle kernel paging request at virtual user address (null)
  Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in:
  CPU: 1 Not tainted 2.6.39-07551-g139f37f5-dirty #152
  Process flush-94:0 (pid: 1576, task: 000000003eb34538, ksp: 000000003c287b70)
  Krnl PSW : 0704c00180000000 0000000000316b12 (jbd2_journal_file_inode+0x10e/0x138)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
  Krnl GPRS: 0000000000000000 0000000000000000 0000000000000000 0700000000000000
             0000000000316a62 000000003eb34cd0 0000000000000025 000000003c287b88
             0000000000000001 000000003c287a70 000000003f1ec678 000000003f1ec000
             0000000000000000 000000003e66ec00 0000000000316a62 000000003c287988
  Krnl Code: 0000000000316b04: f0a0000407f4  srp   4(11,%r0),2036,0
             0000000000316b0a: b9020022      ltgr  %r2,%r2
             0000000000316b0e: a7740015      brc   7,316b38
            >0000000000316b12: e3d0c0000024  stg   %r13,0(%r12)
             0000000000316b18: 4120c010      la    %r2,16(%r12)
             0000000000316b1c: 4130d060      la    %r3,96(%r13)
             0000000000316b20: e340d0600004  lg    %r4,96(%r13)
             0000000000316b26: c0e50002b567  brasl %r14,36d5f4
  Call Trace:
  ([<0000000000316a62>] jbd2_journal_file_inode+0x5e/0x138)
   [<00000000002da13c>] mpage_da_map_and_submit+0x2e8/0x42c
   [<00000000002daac2>] ext4_da_writepages+0x2da/0x504
   [<00000000002597e8>] writeback_single_inode+0xf8/0x268
   [<0000000000259f06>] writeback_sb_inodes+0xd2/0x18c
   [<000000000025a700>] writeback_inodes_wb+0x80/0x168
   [<000000000025aa92>] wb_writeback+0x2aa/0x324
   [<000000000025abde>] wb_do_writeback+0xd2/0x274
   [<000000000025ae3a>] bdi_writeback_thread+0xba/0x1c4
   [<00000000001737be>] kthread+0xa6/0xb0
   [<000000000056c1da>] kernel_thread_starter+0x6/0xc
   [<000000000056c1d4>] kernel_thread_starter+0x0/0xc
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   [<0000000000316a8a>] jbd2_journal_file_inode+0x86/0x138
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
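An approximate reconstruction of the repaired calling pattern, not the literal patch; the wrapper name below is hypothetical and only illustrates the pfn-to-address conversion the fix requires:

```c
/* Hypothetical wrapper: callers in the s390 pgtable helpers held a pfn,
 * but page_set_storage_key() wants the page's address, so shift by
 * PAGE_SHIFT before calling it. */
static inline void pfn_set_storage_key(unsigned long pfn,
				       unsigned char key, int mapped)
{
	page_set_storage_key(pfn << PAGE_SHIFT, key, mapped);	/* was: pfn */
}
```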
-
Committed by Tim Chen
Thanks to the reviews and comments by Rafael, James, Mark and Andi. Here's version 2 of the patch, incorporating your comments and also some updates to my previous patch comments. I noticed that before entering an idle state, the menu idle governor will look up the current pm_qos target value according to the list of qos requests received. This lookup currently needs the acquisition of a lock to access the list of qos requests to find the qos target value, slowing down the entrance into the idle state due to contention by multiple cpus accessing this list. The contention is severe when there are a lot of cpus waking and going into idle. For example, for a simple workload that has 32 pairs of processes ping-ponging messages to each other, where 64 cpu cores are active in the test system, I see the following profile with 37.82% of cpu cycles spent in contention on pm_qos_lock:
  - 37.82% swapper [kernel.kallsyms] [k] _raw_spin_lock_irqsave
     - _raw_spin_lock_irqsave
        - 95.65% pm_qos_request
             menu_select
             cpuidle_idle_call
           - cpu_idle
                99.98% start_secondary
A better approach is to cache the updated pm_qos target value so reading it does not require lock acquisition, as in the patch below. With this patch the contention for pm_qos_lock is removed, and I saw a 2.2X increase in throughput for my message passing workload.
cc: stable@kernel.org
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Acked-by: mark gross <markgross@thegnar.org>
Signed-off-by: Len Brown <len.brown@intel.com>
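A sketch of the caching idea: maintain the aggregated target next to the request list and publish it on every update, so the idle path reads it without taking pm_qos_lock. Structure and names here follow the description loosely; the real code may store the cached value differently.

```c
/* Sketch only -- other members (request list, lock, etc.) elided. */
struct pm_qos_object {
	atomic_t target_value;		/* cached aggregate of all requests */
};

/* Slow path: add/update/remove a request under pm_qos_lock, recompute
 * the extreme value, then publish it here. */
static void pm_qos_set_cached_value(struct pm_qos_object *o, s32 value)
{
	atomic_set(&o->target_value, value);
}

/* Hot path (menu idle governor): lock-free read of the cached target. */
static s32 pm_qos_get_cached_value(struct pm_qos_object *o)
{
	return atomic_read(&o->target_value);
}
```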
-