- 08 5月, 2012 1 次提交
-
-
由 Sage Weil 提交于
This was an ill-conceived feature that has been removed from Ceph. Do this gracefully: - reject attempts to specify a preferred_osd via the ioctl - stop exposing this information via virtual xattrs - always fill in -1 for requests, in case we talk to an older server - don't calculate preferred_osd placements/pgids Reviewed-by: NAlex Elder <elder@inktank.com> Signed-off-by: NSage Weil <sage@inktank.com>
-
- 14 12月, 2011 1 次提交
-
-
由 Sage Weil 提交于
Commit 06222e49 got the if wrong so that it always evaluates as true. This is semantically harmless, but makes SEEK_CUR and SEEK_SET needlessly query the server. Rewrite the if to explicitly enumerate the cases we DO need a valid i_size to make this code less fragile. Reported-by: NRoel Kluin <roel.kluin@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 08 12月, 2011 1 次提交
-
-
由 Sage Weil 提交于
We have been using i_lock to protect all kinds of data structures in the ceph_inode_info struct, including lists of inodes that we need to iterate over while avoiding races with inode destruction. That requires grabbing a reference to the inode with the list lock protected, but igrab() now takes i_lock to check the inode flags. Changing the list lock ordering would be a painful process. However, using a ceph-specific i_ceph_lock in the ceph inode instead of i_lock is a simple mechanical change and avoids the ordering constraints imposed by igrab(). Reported-by: NAmon Ott <a.ott@m-privacy.de> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 27 7月, 2011 6 次提交
-
-
由 Sage Weil 提交于
d_parent is protected by d_lock: use it when looking up a dentry's parent directory inode. Also take a reference and drop it in the caller to avoid a use-after-free. Reported-by: NAl Viro <viro@ZenIV.linux.org.uk> Reviewed-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We weren't properly calling lookup_instantiate_filp when setting up the lookup intent, which could lead to file leakage on errors. So: - use separate helper for the hidden snapdir translation, immediately following the mds request - use ceph_finish_lookup for the final dentry/return value dance in the exit path - lookup_instantiate_filp on success Reported-by: NAl Viro <viro@ZenIV.linux.org.uk> Reviewed-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We only need to put these on the directory unsafe list if they have side effects that fsync(2) should flush out. Reviewed-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We were always getting NULL here because the intent file f_dentry is always NULL at this point, which means we were always passing NULL to ceph_mdsc_do_request. In reality, this was fine, since this isn't currently ever a write operation that needs to get strung on the dir's unsafe list. Use the dir explicitly, and only pass it if this open has side-effects that a dir fsync should flush. Reviewed-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The generic_file_aio_write call may block on balance_dirty_pages while we flush data to the OSDs. If we hold a reference to the FILE_WR cap during that interval revocation by the MDS (e.g., to do a stat(2)) may be very slow. Reviewed-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
This allows us to force IO through the sync path which you normally only get when multiple clients are reading/writing to the same file or by mounting with -o sync. Among other things, this lets test programs verify correctness with a single mount. Reviewed-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 21 7月, 2011 1 次提交
-
-
由 Josef Bacik 提交于
This converts everybody to handle SEEK_HOLE/SEEK_DATA properly. In some cases we just return -EINVAL, in others we do the normal generic thing, and in others we're simply making sure that the properly due-dilligence is done. For example in NFS/CIFS we need to make sure the file size is update properly for the SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself that is all we have to do. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 20 7月, 2011 1 次提交
-
-
由 Al Viro 提交于
->create() instances are much happier that way... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 14 6月, 2011 2 次提交
-
-
由 Sage Weil 提交于
We were iterating across stripe boundaries properly, but not moving the write buffer pointer forward. This caused us to rewrite the same data after the break. Fix by adjusting the data pointer forward, and recalculating the io and buffer alignment after the break. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
dd if=/dev/urandom of=/mnt/fs_depot/dd10 bs=500 seek=8388 count=1 dd if=/mnt/fs_depot/dd10 of=/root/dd10out bs=500 skip=8388 count=1 Reported-by: NHenry C Chang <henry.cy.chang@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 08 6月, 2011 3 次提交
-
-
由 Sage Weil 提交于
Getting ENOENT is equivalent to reading 0 bytes. Make that correction before setting up the hit_stripe and was_short flags. Fixes the following case: dd if=/dev/zero of=/mnt/fs_depot/dd3 bs=1 seek=1048576 count=0 dd if=/mnt/fs_depot/dd3 of=/root/ddout1 skip=8 bs=500 count=2 iflag=direct Reported-by: NHenry C Chang <henry.cy.chang@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
If we get a short read from the OSD because the object is small, we need to zero the remainder of the buffer. For O_DIRECT reads, the attempted range is not trimmed to i_size by the VFS, so we were actually looping indefinitely. Fix by trimming by i_size, and the unconditionally zeroing the trailing range. Reported-by: NJeff Wu <cpwu@tnsoft.com.cn> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We should use ihold whenever we already have a stable inode ref, even when we aren't holding i_lock. This avoids adding new and unnecessary locking dependencies. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 05 5月, 2011 1 次提交
-
-
由 Sage Weil 提交于
The __mark_dirty_inode helper now takes i_lock as of 250df6ed. Fix the one ceph callers that held i_lock (__ceph_mark_dirty_caps) to return the flags value so that the callers can do it outside of i_lock. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 22 3月, 2011 2 次提交
-
-
由 Henry C Chang 提交于
In sync_write_wait(), we assume that the newest request is at the tail of unsafe write list. We should maintain the semantics here. Signed-off-by: NHenry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Henry C Chang 提交于
This fixes the list corruption warning like this: ------------[ cut here ]------------ WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81() Hardware name: X8DTU list_add corruption. prev->next should be next (ffff880618931250), but was (null). (prev=ffff880c188b9130). Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan] Pid: 10977, comm: smbd Tainted: G W 2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1 Call Trace: [<ffffffff8105753c>] warn_slowpath_common+0x7c/0x94 [<ffffffff810575ab>] warn_slowpath_fmt+0x41/0x43 [<ffffffff812351a3>] __list_add+0x68/0x81 [<ffffffffa014799d>] ceph_aio_write+0x614/0x8a2 [ceph] [<ffffffff8111d2a0>] do_sync_write+0xe8/0x125 [<ffffffff81075a1f>] ? autoremove_wake_function+0x0/0x39 [<ffffffff811f21ec>] ? selinux_file_permission+0x5c/0xb3 [<ffffffff811e8521>] ? security_file_permission+0x16/0x18 [<ffffffff8111d864>] vfs_write+0xae/0x10b [<ffffffff8111d91b>] sys_pwrite64+0x5a/0x76 [<ffffffff81012d32>] system_call_fastpath+0x16/0x1b ---[ end trace 08573eb9f07ff6f4 ]--- Signed-off-by: NHenry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 18 12月, 2010 1 次提交
-
-
由 Henry C Chang 提交于
For read operation, we have to set the argument _write_ of get_user_pages to 1 since we will write data to pages. Also, we need to SetPageDirty before releasing these pages. Signed-off-by: NHenry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 16 12月, 2010 1 次提交
-
-
由 Henry C Chang 提交于
The user buffer may be 512-byte aligned, not page-aligned. We were assuming the buffer was page-aligned and only accounting for non-page-aligned io offsets. Signed-off-by: NHenry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 10 11月, 2010 2 次提交
-
-
由 Sage Weil 提交于
We used to infer alignment of IOs within a page based on the file offset, which assumed they matched. This broke with direct IO that was not aligned to pages (e.g., 512-byte aligned IO). We were also trusting the alignment specified in the OSD reply, which could have been adjusted by the server. Explicitly specify the page alignment when setting up OSD IO requests. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The offset/length arguments aren't used. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 08 11月, 2010 1 次提交
-
-
由 Sage Weil 提交于
Normally when we open a file we already have a cap, and simply update the wanted set. However, if we open a file for write, but don't have an auth cap, that doesn't work; we need to open a new cap with the auth MDS. Only reuse existing caps if we are opening for read or the existing cap is auth. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 21 10月, 2010 1 次提交
-
-
由 Yehuda Sadeh 提交于
This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 07 10月, 2010 1 次提交
-
-
由 Henry C Chang 提交于
Fix argument order. Signed-off-by: NHenry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 04 8月, 2010 1 次提交
-
-
由 Sage Weil 提交于
Signed-off-by: NSage Weil <sage@newdream.net>
-
- 03 8月, 2010 1 次提交
-
-
由 Greg Farnum 提交于
Implement flock inode operation to support advisory file locking. All lock/unlock operations are synchronous with the MDS. Lock state is sent when reconnecting to a recovering MDS to restore the shared lock state. Signed-off-by: NGreg Farnum <gregf@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 02 8月, 2010 3 次提交
-
-
由 Yehuda Sadeh 提交于
Mainly fixing minor issues reported by sparse. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
If the file mode is marked as "lazy," perform cached/buffered reads when the caps permit it. Adjust the rdcache_gen and invalidation logic accordingly so that we manage our cache based on the FILE_CACHE -or- FILE_LAZYIO cap bits. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
If we have marked a file as "lazy" (using the ceph ioctl), perform buffered writes when the MDS caps allow it. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 28 7月, 2010 1 次提交
-
-
由 Yehuda Sadeh 提交于
This fixes an issue triggered by running concurrent syncs. One of the syncs would go through while the other would just hang indefinitely. In any case, we never actually want to wake a single waiter, so the *_all functions should be used. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 30 5月, 2010 1 次提交
-
-
由 Julia Lawall 提交于
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of the returned value is the same as the type of the enclosing function. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: NJulia Lawall <julia@diku.dk> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 22 5月, 2010 1 次提交
-
-
由 Christoph Hellwig 提交于
Now that the last user passing a NULL file pointer is gone we can remove the redundant dentry argument and associated hacks inside vfs_fsynmc_range. The next step will be removig the dentry argument from ->fsync, but given the luck with the last round of method prototype changes I'd rather defer this until after the main merge window. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 18 5月, 2010 4 次提交
-
-
由 Yehuda Sadeh 提交于
This is essential, as for the rados block device we'll need to run in different contexts that would need flags that are other than GFP_NOFS. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Returning ERR_PTR(-ENOMEM) is useless extra work. Return NULL on failure instead, and fix up the callers (about half of which were wrong anyway). Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Cheng Renquan 提交于
ceph_sb_to_client and ceph_client are really identical, we need to dump one; while function ceph_client is confusing with "struct ceph_client", ceph_sb_to_client's definition is more clear; so we'd better switch all call to ceph_sb_to_client. -static inline struct ceph_client *ceph_client(struct super_block *sb) -{ - return sb->s_fs_info; -} Signed-off-by: NCheng Renquan <crquan@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Yehuda Sadeh 提交于
Following Nick Piggin patches in btrfs, pagecache pages should be allocated with __page_cache_alloc, so they obey pagecache memory policies. Also, using add_to_page_cache_lru instead of using a private pagevec where applicable. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 04 5月, 2010 1 次提交
-
-
由 Sage Weil 提交于
truncate_inode_pages_range wants the end offset to align with the last byte in a page. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 30 3月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: NTejun Heo <tj@kernel.org> Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
-