- 10 2月, 2014 3 次提交
-
-
由 Oleg Nesterov 提交于
Cosmetic. This doesn't really matter because a) device->mutex is the only user of __lockdep_no_validate__ and b) this class should be never reported as the source of problem, but if something goes wrong "&dev->mutex" looks better than "&__lockdep_no_validate__" as the name of the lock. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140120182016.GA26512@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
The "int check" argument of lock_acquire() and held_lock->check are misleading. This is actually a boolean: 2 means "true", everything else is "false". And there is no need to pass 1 or 0 to lock_acquire() depending on CONFIG_PROVE_LOCKING, __lock_acquire() checks prove_locking at the start and clears "check" if !CONFIG_PROVE_LOCKING. Note: probably we can simply kill this member/arg. The only explicit user of check => 0 is rcu_lock_acquire(), perhaps we can change it to use lock_acquire(trylock =>, read => 2). __lockdep_no_validate means check => 0 implicitly, but we can change validate_chain() to check hlock->instance->key instead. Not to mention it would be nice to get rid of lockdep_set_novalidate_class(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140120182006.GA26495@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Tim Chen 提交于
This patch allows each architecture to add its specific assembly optimized arch_mcs_spin_lock_contended and arch_mcs_spinlock_uncontended for MCS lock and unlock functions. Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: AswinChandramouleeswaran <aswin@hp.com> Cc: George Spelvin <linux@horizon.com> Cc: Rik vanRiel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: MichelLespinasse <walken@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Figo.zhang" <figo1802@gmail.com> Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESKSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 2月, 2014 1 次提交
-
-
由 Johannes Berg 提交于
This warning seems to show up a lot now, since ___wait_event() is (indirectly) used inside wait_event_timeout(), which also has a variable called __ret. Rename the one in ___wait_event() to ___ret (another leading underscore) to suppress the warning. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1391704121.12789.20.camel@jlt4.sipsolutions.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 01 2月, 2014 1 次提交
-
-
由 Stephan Springl 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 31 1月, 2014 4 次提交
-
-
由 Minchan Kim 提交于
Add my copyright to the zsmalloc source code which I maintain. Signed-off-by: NMinchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
This patch moves zsmalloc under mm directory. Before that, description will explain why we have needed custom allocator. Zsmalloc is a new slab-based memory allocator for storing compressed pages. It is designed for low fragmentation and high allocation success rate on large object, but <= PAGE_SIZE allocations. zsmalloc differs from the kernel slab allocator in two primary ways to achieve these design goals. zsmalloc never requires high order page allocations to back slabs, or "size classes" in zsmalloc terms. Instead it allows multiple single-order pages to be stitched together into a "zspage" which backs the slab. This allows for higher allocation success rate under memory pressure. Also, zsmalloc allows objects to span page boundaries within the zspage. This allows for lower fragmentation than could be had with the kernel slab allocator for objects between PAGE_SIZE/2 and PAGE_SIZE. With the kernel slab allocator, if a page compresses to 60% of it original size, the memory savings gained through compression is lost in fragmentation because another object of the same size can't be stored in the leftover space. This ability to span pages results in zsmalloc allocations not being directly addressable by the user. The user is given an non-dereferencable handle in response to an allocation request. That handle must be mapped, using zs_map_object(), which returns a pointer to the mapped region that can be used. The mapping is necessary since the object data may reside in two different noncontigious pages. The zsmalloc fulfills the allocation needs for zram perfectly [sjenning@linux.vnet.ibm.com: borrow Seth's quote] Signed-off-by: NMinchan Kim <minchan@kernel.org> Acked-by: NNitin Gupta <ngupta@vflare.org> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Pekka Enberg <penberg@kernel.org> Cc: Rik van Riel <riel@redhat.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Make smp_call_function_single and friends more efficient by using a lockless list. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yinghai Lu 提交于
Now we have memblock_virt_alloc_low to replace original bootmem api in swiotlb. But we should not use BOOTMEM_LOW_LIMIT for arch that does not support CONFIG_NOBOOTMEM, as old api take 0. | #define alloc_bootmem_low(x) \ | __alloc_bootmem_low(x, SMP_CACHE_BYTES, 0) |#define alloc_bootmem_low_pages_nopanic(x) \ | __alloc_bootmem_low_nopanic(x, PAGE_SIZE, 0) and we have #define BOOTMEM_LOW_LIMIT __pa(MAX_DMA_ADDRESS) for CONFIG_NOBOOTMEM. Restore goal to 0 to fix ia64 crash, that Tony found. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Reported-by: NTony Luck <tony.luck@gmail.com> Tested-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 1月, 2014 4 次提交
-
-
由 Heiko Carstens 提交于
Commit d5dc77bf ("consolidate compat lookup_dcookie()") coverted all architectures to the new compat_sys_lookup_dcookie() syscall. The "len" paramater of the new compat syscall must have the type compat_size_t in order to enforce zero extension for architectures where the ABI requires that the caller of a function performed zero and/or sign extension to 64 bit of all parameters. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <stable@vger.kernel.org> [v3.10+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heiko Carstens 提交于
We got a report that the pwritev syscall does not work correctly in compat mode on s390. It turned out that with commit 72ec3516 ("switch compat readv/writev variants to COMPAT_SYSCALL_DEFINE") we lost the zero extension of a couple of syscall parameters because the some parameter types haven't been converted from unsigned long to compat_ulong_t. This is needed for architectures where the ABI requires that the caller of a function performed zero and/or sign extension to 64 bit of all parameters. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <stable@vger.kernel.org> [v3.10+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
The VM is currently heavily tuned to avoid swapping. Whether that is good or bad is a separate discussion, but as long as the VM won't swap to make room for dirty cache, we can not consider anonymous pages when calculating the amount of dirtyable memory, the baseline to which dirty_background_ratio and dirty_ratio are applied. A simple workload that occupies a significant size (40+%, depending on memory layout, storage speeds etc.) of memory with anon/tmpfs pages and uses the remainder for a streaming writer demonstrates this problem. In that case, the actual cache pages are a small fraction of what is considered dirtyable overall, which results in an relatively large portion of the cache pages to be dirtied. As kswapd starts rotating these, random tasks enter direct reclaim and stall on IO. Only consider free pages and file pages dirtyable. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reported-by: NTejun Heo <tj@kernel.org> Tested-by: NTejun Heo <tj@kernel.org> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jean Delvare 提交于
Signed-off-by: NJean Delvare <khali@linux-fr.org>
-
- 29 1月, 2014 2 次提交
-
-
由 Jan Kara 提交于
The event returned from fsnotify_add_notify_event() cannot ever be used safely as the event may be freed by the time the function returns (after dropping notification_mutex). So change the prototype to just return whether the event was added or merged into some existing event. Reported-and-tested-by: NJiri Kosina <jkosina@suse.cz> Reported-and-tested-by: NDave Jones <davej@fedoraproject.org> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Josef Bacik 提交于
Btrfs needs a simple way to know if it needs to let go of it's read lock on a rwsem. Introduce rwsem_is_contended to check to see if there are any waiters on this rwsem currently. This is just a hueristic, it is meant to be light and not 100% accurate and called by somebody already holding on to the rwsem in either read or write. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <clm@fb.com> Acked-by: NIngo Molnar <mingo@kernel.org>
-
- 28 1月, 2014 25 次提交
-
-
由 Jose Alonso 提交于
I observed that there are for_each macros that do an extra memory access beyond the defined area. Normally this does not cause problems. But, this can cause exceptions. For example: if the area is allocated at the end of a page and the next page is not accessible. For correctness, I suggest changing the arguments of the 'for loop' like others 'for_each' do in the kernel. Signed-off-by: NJose Alonso <joalonsof@gmail.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Will Deacon 提交于
When contended, architectures may be able to reduce the polling overhead in ways which aren't expressible using a simple relax() primitive. This patch allows architectures to hook into the mcs_{lock,unlock} functions for the contended cases only. Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1390347370.3138.65.camel@schen9-DESKSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jason Low 提交于
Remove unnecessary operation to assign locked status to 1 if lock is acquired without contention. Lock status will not be checked by lock holder again once it is acquired and any lock contenders will not be looking at the lock holder's lock status. Make the cmpxchg(lock, node, NULL) == node check in mcs_spin_unlock() likely() as it is likely that a race did not occur most of the time. Also add in more comments describing how the local node is used in MCS locks. Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NJason Low <jason.low2@hp.com> Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1390347365.3138.64.camel@schen9-DESKSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Tim Chen 提交于
We will need the MCS lock code for doing optimistic spinning for rwsem and queued rwlock. Extracting the MCS code from mutex.c and put into its own file allow us to reuse this code easily. We also inline mcs_spin_lock and mcs_spin_unlock functions for better efficiency. Note that using the smp_load_acquire/smp_store_release pair used in mcs_lock and mcs_unlock is not sufficient to form a full memory barrier across cpus for many architectures (except x86). For applications that absolutely need a full barrier across multiple cpus with mcs_unlock and mcs_lock pair, smp_mb__after_unlock_lock() should be used after mcs_lock. Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1390347360.3138.63.camel@schen9-DESKSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Joe Perches 提交于
Reduce data size a little. Reduce checkpatch noise. $ size kernel/softirq.o* text data bss dec hex filename 11554 6013 4008 21575 5447 kernel/softirq.o.new 11474 6093 4008 21575 5447 kernel/softirq.o.old Signed-off-by: NJoe Perches <joe@perches.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xiao Guangrong 提交于
@splice_desc.total_len is 32 bit(unsigned int) which is used to store the size passed from userspace which is 64 bit(size_t) so that the size is unexpectedly truncated That means vmsplice can not work if the size passed from userspace is >= 4G, for example, we noticed in vmsplice, splice-reader does not do anything and splice-writer is waiting for available buffer forever if the size is 4G Fix it by extending @splice_desc.total_len to 64 bits as well Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
This field is only used to reset the ids seq number if it exceeds the smaller of INT_MAX/SEQ_MULTIPLIER and USHRT_MAX, and can therefore be moved out of the structure and into its own macro. Since each ipc_namespace contains a table of 3 pointers to struct ipc_ids we can save space in instruction text: text data bss dec hex filename 56232 2348 24 58604 e4ec ipc/built-in.o 56216 2348 24 58588 e4dc ipc/built-in.o-after Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com> Reviewed-by: NJonathan Gonzalez <jgonzalez@linets.cl> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: NManfred Spraul <manfred@colorfullife.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Manfred Spraul 提交于
The ipc code does not adhere the typical linux coding style. This patch fixes lots of simple whitespace errors. - mostly autogenerated by scripts/checkpatch.pl -f --fix \ --types=pointer_location,spacing,space_before_tab - one manual fixup (keep structure members tab-aligned) - removal of additional space_before_tab that were not found by --fix Tested with some of my msg and sem test apps. Andrew: Could you include it in -mm and move it towards Linus' tree? Signed-off-by: NManfred Spraul <manfred@colorfullife.com> Suggested-by: NLi Bin <huawei.libin@huawei.com> Cc: Joe Perches <joe@perches.com> Acked-by: NRafael Aquini <aquini@redhat.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rafael Aquini 提交于
struct kern_ipc_perm.deleted is meant to be used as a boolean toggle, and the changes introduced by this patch are just to make the case explicit. Signed-off-by: NRafael Aquini <aquini@redhat.com> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Greg Thelen <gthelen@google.com> Acked-by: NDavidlohr Bueso <davidlohr@hp.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yinghai Lu 提交于
The new memblock_virt APIs are used to replaced old bootmem API. We need to allocate page below 4G for swiotlb. That should fix regression on Andrew's system that is using swiotlb. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NSantosh Shilimkar <santosh.shilimkar@ti.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sachin Kamat 提交于
Commit c02cecb9 ("ARM: orion: move platform_data definitions") moved the files to the current location but forgot to remove the pointer to its previous location. Clean it up. While at it also change the header file protection macros appropriately. Signed-off-by: NSachin Kamat <sachin.kamat@linaro.org> Signed-off-by: NBryan Wu <cooloney@gmail.com>
-
由 Alexander Shiyan 提交于
LED platform data are overwhelmed by excessive field "max_cur" which just replicates few bits of "led_control" field. This patch removes this field and adds a definition for the current settings in the header. Signed-off-by: NAlexander Shiyan <shc_work@mail.ru> Signed-off-by: NBryan Wu <cooloney@gmail.com>
-
由 Ilya Dryomov 提交于
Announce our (limited, see previous commit) support for CACHEPOOL feature. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Follow redirect replies from osds, for details see ceph.git commit fbbe3ad1220799b7bb00ea30fce581c5eadaf034. v1 (current) version of redirect reply consists of oloc and oid, which expands to pool, key, nspace, hash and oid. However, server-side code that would populate anything other than pool doesn't exist yet, and hence this commit adds support for pool redirects only. To make sure that future server-side updates don't break us, we decode all fields and, if any of key, nspace, hash or oid have a non-default value, error out with "corrupt osd_op_reply ..." message. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before introducing r_target_{oloc,oid} needed for redirects. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and write_tier for write and read+write ops (aka basic tiering support). {read,write}_tier are part of pg_pool_t since v9. This commit bumps our pg_pool_t decode compat version from v7 to v9, all new fields except for {read,write}_tier are ignored. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
"Lookup pool info by ID" function is hidden in osdmap.c. Expose it to the rest of libceph. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Update CEPH_OSD_FLAG_* enum. (We need CEPH_OSD_FLAG_IGNORE_OVERLAY to support tiering). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename it to ceph_oloc_oid_to_pg() to make its purpose more clear. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
In preparation for tiering support, which would require having two (base and target) object names for each osd request and also copying those names around, introduce struct ceph_object_id (oid) and a couple helpers to facilitate those copies and encapsulate the fact that object name is not necessarily a NUL-terminated string. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
In preparation for adding oid abstraction, rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Move ceph_file_layout helper macros and inline functions to ceph_fs.h. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Instead of relying on pool fields in ceph_file_layout (for mapping) and ceph_pg (for enconding), start using ceph_object_locator (oloc) abstraction. Note that userspace oloc currently consists of pool, key, nspace and hash fields, while this one contains only a pool. This is OK, because at this point we only send (i.e. encode) olocs and never have to receive (i.e. decode) them. This makes keeping a copy of ceph_file_layout in every osd request unnecessary, so ceph_osd_request::r_file_layout field is nuked. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Chen Gang 提交于
For some assemblers, they use another character as newline in a macro (e.g. arc uses '`'), so for generic assembly code, need use ASM_NL (a macro) instead of ';' for it. Signed-off-by: NChen Gang <gang.chen.5i5j@gmail.com> Acked-by: NVineet Gupta <vgupta@synopsys.com> Signed-off-by: NMichal Marek <mmarek@suse.cz>
-
由 Jeff Layton 提交于
There is a possible race in how the nfs_invalidate_mapping function is handled. Currently, we go and invalidate the pages in the file and then clear NFS_INO_INVALID_DATA. The problem is that it's possible for a stale page to creep into the mapping after the page was invalidated (i.e., via readahead). If another writer comes along and sets the flag after that happens but before invalidate_inode_pages2 returns then we could clear the flag without the cache having been properly invalidated. So, we must clear the flag first and then invalidate the pages. Doing this however, opens another race: It's possible to have two concurrent read() calls that end up in nfs_revalidate_mapping at the same time. The first one clears the NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping. Just before calling that though, the other task races in, checks the flag and finds it cleared. At that point, it trusts that the mapping is good and gets the lock on the page, allowing the read() to be satisfied from the cache even though the data is no longer valid. These effects are easily manifested by running diotest3 from the LTP test suite on NFS. That program does a series of DIO writes and buffered reads. The operations are serialized and page-aligned but the existing code fails the test since it occasionally allows a read to come out of the cache incorrectly. While mixing direct and buffered I/O isn't recommended, I believe it's possible to hit this in other ways that just use buffered I/O, though that situation is much harder to reproduce. The problem is that the checking/clearing of that flag and the invalidation of the mapping really need to be atomic. Fix this by serializing concurrent invalidations with a bitlock. At the same time, we also need to allow other places that check NFS_INO_INVALID_DATA to check whether we might be in the middle of invalidating the file, so fix up a couple of places that do that to look for the new NFS_INO_INVALIDATING flag. Doing this requires us to be careful not to set the bitlock unnecessarily, so this code only does that if it believes it will be doing an invalidation. Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-