- 05 4月, 2014 1 次提交
-
-
由 Yan, Zheng 提交于
flock and posix lock should use fl->fl_file instead of process ID as owner identifier. (posix lock uses fl->fl_owner. fl->fl_owner is usually equal to fl->fl_file, but it also can be a customized value). The process ID of who holds the lock is just for F_GETLK fcntl(2). The fix is rename the 'pid' fields of struct ceph_mds_request_args and struct ceph_filelock to 'owner', rename 'pid_namespace' fields to 'pid'. Assign fl->fl_file to the 'owner' field of lock messages. We also set the most significant bit of the 'owner' field. MDS can use that bit to distinguish between old and new clients. The MDS counterpart of this patch modifies the flock code to not take the 'pid_namespace' into consideration when checking conflict locks. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 18 2月, 2014 1 次提交
-
-
由 Sage Weil 提交于
Make the 'acl' option dependent on having ACL support compiled in. Make the 'noacl' option work even without it so that one can always ask it to be off and not error out on mount when it is not supported. Signed-off-by: NGuangliang Zhao <lucienchao@gmail.com> Signed-off-by: NSage Weil <sage@inktank.com>
-
- 01 1月, 2014 2 次提交
-
-
由 Ilya Dryomov 提交于
In preparation for ceph_features.h update, change all features fields from unsigned int/u32 to u64. (ceph.git has ~40 feature bits at this point.) Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Guangliang Zhao 提交于
Signed-off-by: NGuangliang Zhao <lucienchao@gmail.com> Reviewed-by: NLi Wang <li.wang@ubuntykylin.com> Reviewed-by: NZheng Yan <zheng.z.yan@intel.com>
-
- 14 12月, 2013 1 次提交
-
-
由 Yan, Zheng 提交于
Positve dentry and corresponding inode are always accompanied in MDS reply. So no need to keep inode in the cache after dropping all its aliases. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 07 9月, 2013 1 次提交
-
-
由 Milosz Tanski 提交于
Adding support for fscache to the Ceph filesystem. This would bring it to on par with some of the other network filesystems in Linux (like NFS, AFS, etc...) In order to mount the filesystem with fscache the 'fsc' mount option must be passed. Signed-off-by: NMilosz Tanski <milosz@adfin.com> Signed-off-by: NSage Weil <sage@inktank.com>
-
- 04 7月, 2013 1 次提交
-
-
由 Sasha Levin 提交于
when mounting ceph with a dev name that starts with a slash, ceph would attempt to access the character before that slash. Since we don't actually own that byte of memory, we would trigger an invalid access: [ 43.499934] BUG: unable to handle kernel paging request at ffff880fa3a97fff [ 43.500984] IP: [<ffffffff818f3884>] parse_mount_options+0x1a4/0x300 [ 43.501491] PGD 743b067 PUD 10283c4067 PMD 10282a6067 PTE 8000000fa3a97060 [ 43.502301] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 43.503006] Dumping ftrace buffer: [ 43.503596] (ftrace buffer empty) [ 43.504046] CPU: 0 PID: 10879 Comm: mount Tainted: G W 3.10.0-sasha #1129 [ 43.504851] task: ffff880fa625b000 ti: ffff880fa3412000 task.ti: ffff880fa3412000 [ 43.505608] RIP: 0010:[<ffffffff818f3884>] [<ffffffff818f3884>] parse_mount_options$ [ 43.506552] RSP: 0018:ffff880fa3413d08 EFLAGS: 00010286 [ 43.507133] RAX: ffff880fa3a98000 RBX: ffff880fa3a98000 RCX: 0000000000000000 [ 43.507893] RDX: ffff880fa3a98001 RSI: 000000000000002f RDI: ffff880fa3a98000 [ 43.508610] RBP: ffff880fa3413d58 R08: 0000000000001f99 R09: ffff880fa3fe64c0 [ 43.509426] R10: ffff880fa3413d98 R11: ffff880fa38710d8 R12: ffff880fa3413da0 [ 43.509792] R13: ffff880fa3a97fff R14: 0000000000000000 R15: ffff880fa3413d90 [ 43.509792] FS: 00007fa9c48757e0(0000) GS:ffff880fd2600000(0000) knlGS:000000000000$ [ 43.509792] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 43.509792] CR2: ffff880fa3a97fff CR3: 0000000fa3bb9000 CR4: 00000000000006b0 [ 43.509792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 43.509792] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 43.509792] Stack: [ 43.509792] 0000e5180000000e ffffffff85ca1900 ffff880fa38710d8 ffff880fa3413d98 [ 43.509792] 0000000000000120 0000000000000000 ffff880fa3a98000 0000000000000000 [ 43.509792] ffffffff85cf32a0 0000000000000000 ffff880fa3413dc8 ffffffff818f3c72 [ 43.509792] Call Trace: [ 43.509792] [<ffffffff818f3c72>] ceph_mount+0xa2/0x390 [ 43.509792] [<ffffffff81226314>] ? pcpu_alloc+0x334/0x3c0 [ 43.509792] [<ffffffff81282f8d>] mount_fs+0x8d/0x1a0 [ 43.509792] [<ffffffff812263d0>] ? __alloc_percpu+0x10/0x20 [ 43.509792] [<ffffffff8129f799>] vfs_kern_mount+0x79/0x100 [ 43.509792] [<ffffffff812a224d>] do_new_mount+0xcd/0x1c0 [ 43.509792] [<ffffffff812a2e8d>] do_mount+0x15d/0x210 [ 43.509792] [<ffffffff81220e55>] ? strndup_user+0x45/0x60 [ 43.509792] [<ffffffff812a2fdd>] SyS_mount+0x9d/0xe0 [ 43.509792] [<ffffffff83fd816c>] tracesys+0xdd/0xe2 [ 43.509792] Code: 4c 8b 5d c0 74 0a 48 8d 50 01 49 89 14 24 eb 17 31 c0 48 83 c9 ff $ [ 43.509792] RIP [<ffffffff818f3884>] parse_mount_options+0x1a4/0x300 [ 43.509792] RSP <ffff880fa3413d08> [ 43.509792] CR2: ffff880fa3a97fff [ 43.509792] ---[ end trace 22469cd81e93af51 ]--- Signed-off-by: NSasha Levin <sasha.levin@oracle.com> Reviewed-by: NSage Weil <sage@inktan.com>
-
- 02 5月, 2013 1 次提交
-
-
由 Alex Elder 提交于
In create_fs_client() a memory pool is set up be used for arrays of pages that might be needed in ceph_writepages_start() if memory is tight. There are two problems with the way it's initialized: - The size provided is the number of pages we want in the array, but it should be the number of bytes required for that many page pointers. - The number of pages computed can end up being 0, while we will always need at least one page. This patch fixes both of these problems. This resolves the two simple problems defined in: http://tracker.ceph.com/issues/4603Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NJosh Durgin <josh.durgin@inktank.com>
-
- 04 3月, 2013 1 次提交
-
-
由 Eric W. Biederman 提交于
Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Acked-by: NKees Cook <keescook@chromium.org> Reported-by: NKees Cook <keescook@google.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 23 2月, 2013 1 次提交
-
-
由 Sage Weil 提交于
Different versions of glibc are broken in different ways, but the short of it is that for the time being, frsize should == bsize, and be used as the multiple for the blocks, free, and available fields. This mirrors what is done for NFS. The previous reporting of the page size for frsize meant that newer glibc and df would report a very small value for the fs size. Fixes http://tracker.ceph.com/issues/3793. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NGreg Farnum <greg@inktank.com>
-
- 13 12月, 2012 2 次提交
-
-
由 Joe Perches 提交于
__printf is useful to verify format and arguments. Signed-off-by: NJoe Perches <joe@perches.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
This would reset a connection with any OSD that had an outstanding request that was taking more than N seconds. The idea was that if the OSD was buggy, the client could compensate by resending the request. In reality, this only served to hide server bugs, and we haven't actually seen such a bug in quite a while. Moreover, the userspace client code never did this. More importantly, often the request is taking a long time because the OSD is trying to recover, or overloaded, and killing the connection and retrying would only make the situation worse by giving the OSD more work to do. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
- 03 10月, 2012 1 次提交
-
-
由 Kirill A. Shutemov 提交于
There's no reason to call rcu_barrier() on every deactivate_locked_super(). We only need to make sure that all delayed rcu free inodes are flushed before we destroy related cache. Removing rcu_barrier() from deactivate_locked_super() affects some fast paths. E.g. on my machine exit_group() of a last process in IPC namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 02 10月, 2012 1 次提交
-
-
由 Alex Elder 提交于
A recent change to /sbin/mountall causes any trailing '/' character in the "device" (or fs_spec) field in /etc/fstab to be stripped. As a result, an entry for a ceph mount that intends to mount the root of the name space ends up with now path portion, and the ceph mount option processing code rejects this. That is, an entry in /etc/fstab like: cephserver:port:/ /mnt ceph defaults 0 0 provides to the ceph code just "cephserver:port:" as the "device," and that gets rejected. Although this is a bug in /sbin/mountall, we can have the ceph mount code support an empty/nonexistent path, interpreting it to mean the root of the name space. RFC 5952 offers recommendations for how to express IPv6 addresses, and recommends the usage found in RFC 3986 (which specifies the format for URI's) for representing both IPv4 and IPv6 addresses that include port numbers. (See in particular the definition of "authority" found in the Appendix of RFC 3986.) According to those standards, no host specification will ever contain a '/' character. As a result, it is sufficient to scan a provided "device" from an /etc/fstab entry for the first '/' character, and if it's found, treat that as the beginning of the path. If no '/' character is present, we can treat the entire string as the monitor host specification(s), and assume the path to be the root of the name space. We'll still require a ':' to separate the host portion from the (possibly empty) path portion. This means that we can more formally define how ceph will interpret the "device" it's provided when processing a mount request: "device" will look like: <server_spec>[,<server_spec>...]:[<path>] where <server_spec> is <ip>[:<port>] <path> is optional, but if present must begin with '/' This addresses http://tracker.newdream.net/issues/2919Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NDan Mick <dan.mick@inktank.com>
-
- 31 7月, 2012 1 次提交
-
-
由 Sage Weil 提交于
This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com> Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
-
- 14 7月, 2012 1 次提交
-
-
由 David Howells 提交于
Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 3月, 2012 3 次提交
-
-
由 Alex Elder 提交于
Many ceph-related Boolean options offer the ability to both enable and disable a feature. For all those that don't offer this, add a new option so that they do. Note that ceph_show_options()--which reports mount options currently in effect--only reports the option if it is different from the default value. Signed-off-by: NAlex Elder <elder@dreamhost.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Alex Elder 提交于
ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious. Signed-off-by: NAlex Elder <elder@dreamhost.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Alex Elder 提交于
All names defined in the directory and file virtual extended attribute tables are constant, and the size of each is known at compile time. So there's no need to compute their length every time any file's attribute is listed. Record the length of each string and use it when needed to determine the space need to represent them. In addition, compute the aggregate size of strings in each table just once at initialization time. Signed-off-by: NAlex Elder <elder@dreamhost.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 21 3月, 2012 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 13 1月, 2012 1 次提交
-
-
由 Sage Weil 提交于
Enable/disable use of the dentry dir 'complete' flag via a mount option. This lets the admin control whether ceph uses the dcache to satisfy negative lookups or readdir when it has the entire directory contents in its cache. This is purely a performance optimization; correctness is guaranteed whether it is enabled or not. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 12 1月, 2012 1 次提交
-
-
由 Alex Elder 提交于
When open_root_dentry() gets a dentry via d_obtain_alias() it does not get initialized. If the dentry obtained came from the cache, this is OK. But if not, the result is an improperly initialized dentry. To fix this, call ceph_init_dentry() regardless of which path produced the dentry. That function returns immediately for a dentry that is already initialized, it is safe to use either way. (Credit to Sage, who suggested this fix.) Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 10 1月, 2012 1 次提交
-
-
由 Al Viro 提交于
... and ceph_init_dentry(NULL) will oops Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 07 1月, 2012 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 03 12月, 2011 1 次提交
-
-
由 Sage Weil 提交于
Fix typo. Reported-by: Nmowang da <whooya.xxl@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 12 11月, 2011 1 次提交
-
-
由 Sage Weil 提交于
Set up d_fsdata on the root dentry. This fixes a NULL pointer dereference in ceph_d_prune on umount. It also means we can eventually strip out all of the conditional checks on d_fsdata because it is now set unconditionally (prior to setting up the d_ops). Fix the ceph_d_prune debug print while we're here. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 06 11月, 2011 1 次提交
-
-
由 H Hartley Sweeten 提交于
Quiet the sparse noise: warning: symbol 'create_fs_client' was not declared. Should it be static? warning: symbol 'destroy_fs_client' was not declared. Should it be static? Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com> Cc: Sage Weil <sage@newdream.net> ceph-devel@vger.kernel.org Signed-off-by: NSage Weil <sage@newdream.net>
-
- 26 10月, 2011 3 次提交
-
-
由 Noah Watkins 提交于
Trivial formatting fix. Signed-off-by: NNoah Watkins <noahwatkins@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
This simplifies the init/shutdown paths, and makes client->msgr available during the rest of the setup process. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
It controls readahead. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 23 8月, 2011 1 次提交
-
-
由 Noah Watkins 提交于
kfree does not clean up indirect allocations in ceph_fs_client and ceph_options (e.g. snapdir_name). Signed-off-by: NNoah Watkins <noahwatkins@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 27 7月, 2011 2 次提交
-
-
由 Yehuda Sadeh 提交于
This should improve the default read performance, as without it readahead is practically disabled. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net>
-
由 Greg Farnum 提交于
Reviewed-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NGreg Farnum <gregory.farnum@dreamhost.com>
-
- 30 3月, 2011 1 次提交
-
-
由 Tommi Virtanen 提交于
This makes the base64 logic be contained in mount option parsing, and prepares us for replacing the homebew key management with the kernel key retention service. Signed-off-by: NTommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 22 3月, 2011 2 次提交
-
-
由 Sage Weil 提交于
Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Yehuda Sadeh 提交于
The ino32 mount option forces the ceph fs to report 32 bit ino values. This is useful for 64 bit kernels with 32 bit userspace. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net>
-
- 20 1月, 2011 1 次提交
-
-
由 Sage Weil 提交于
These were initialized to 0 instead of the default, fallout from the RBD refactor in 3d14c5d2. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 13 1月, 2011 2 次提交
-
-
由 Tejun Heo 提交于
fsc->*_wq's aren't depended upon during memory reclaim. Convert to alloc_workqueue() w/o WQ_MEM_RECLAIM. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Sage Weil <sage@newdream.net> Cc: ceph-devel@vger.kernel.org Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
This implements the DIRLAYOUTHASH protocol feature, which passes the dir layout over the wire from the MDS. This gives the client knowledge of the correct hash function to use for mapping dentries among dir fragments. Note that if this feature is _not_ present on the client but is on the MDS, the client may misdirect requests. This will result in a forward and degrade performance. It may also result in inaccurate NFS filehandle generation, which will prevent fh resolution when the inode is not present in the client cache and the parent directories have been fragmented. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 29 10月, 2010 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-