Commit fc7f99cf, authored by Linus Torvalds

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (205 commits)
  ceph: update for write_inode API change
  ceph: reset osd after relevant messages timed out
  ceph: fix flush_dirty_caps race with caps migration
  ceph: include migrating caps in issued set
  ceph: fix osdmap decoding when pools include (removed) snaps
  ceph: return EBADF if waiting for caps on closed file
  ceph: set osd request message front length correctly
  ceph: reset front len on return to msgpool; BUG on mismatched front iov
  ceph: fix snaptrace decoding on cap migration between mds
  ceph: use single osd op reply msg
  ceph: reset bits on connection close
  ceph: remove bogus mds forward warning
  ceph: remove fragile __map_osds optimization
  ceph: fix connection fault STANDBY check
  ceph: invalidate_authorizer without con->mutex held
  ceph: don't clobber write return value when using O_SYNC
  ceph: fix client_request_forward decoding
  ceph: drop messages on unregistered mds sessions; cleanup
  ceph: fix comments, locking in destroy_inode
  ceph: move dereference after NULL test
  ...

Fix trivial conflicts in Documentation/ioctl/ioctl-number.txt

Ceph Distributed File System
============================

Ceph is a distributed network file system designed to provide good
performance, reliability, and scalability.

Basic features include:

* POSIX semantics
* Seamless scaling from 1 to many thousands of nodes
* High availability and reliability. No single point of failure.
* N-way replication of data across storage nodes
* Fast recovery from node failures
* Automatic rebalancing of data on node addition/removal
* Easy deployment: most FS components are userspace daemons

Also,
* Flexible snapshots (on any directory)
* Recursive accounting (nested files, directories, bytes)

In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely
on symmetric access by all clients to shared block devices, Ceph
separates data and metadata management into independent server
clusters, similar to Lustre. Unlike Lustre, however, metadata and
storage nodes run entirely as user space daemons. Storage nodes
utilize btrfs to store data objects, leveraging its advanced features
(checksumming, metadata replication, etc.). File data is striped
across storage nodes in large chunks to distribute workload and
facilitate high throughputs. When storage nodes fail, data is
re-replicated in a distributed fashion by the storage nodes themselves
(with some minimal coordination from a cluster monitor), making the
system extremely efficient and scalable.

Metadata servers effectively form a large, consistent, distributed
in-memory cache above the file namespace that is extremely scalable,
dynamically redistributes metadata in response to workload changes,
and can tolerate arbitrary (well, non-Byzantine) node failures. The
metadata server takes a somewhat unconventional approach to metadata
storage to significantly improve performance for common workloads. In
particular, inodes with only a single link are embedded in
directories, allowing entire directories of dentries and inodes to be
loaded into its cache with a single I/O operation. The contents of
extremely large directories can be fragmented and managed by
independent metadata servers, allowing scalable concurrent access.

The system offers automatic data rebalancing/migration when scaling
from a small cluster of just a few nodes to many hundreds, without
requiring an administrator to carve the data set into static volumes
or go through the tedious process of migrating data between servers.
When the file system approaches full, new nodes can be easily added
and things will "just work."

Ceph includes a flexible snapshot mechanism that allows a user to
create a snapshot on any subdirectory (and its nested contents) in the
system. Snapshot creation and deletion are as simple as
'mkdir .snap/foo' and 'rmdir .snap/foo'.
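
Because snapshots are managed with ordinary mkdir(2) and rmdir(2) calls on the hidden .snap directory, a program needs nothing Ceph-specific to create or delete one. A minimal C sketch, assuming the directory lives on a mounted Ceph file system; snap_path(), snap_create() and snap_remove() are hypothetical helpers, not part of any Ceph API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<dir>/.snap/<name>" into buf; returns 0, or -1 if it would be truncated. */
static int snap_path(char *buf, size_t size, const char *dir, const char *name)
{
	int n = snprintf(buf, size, "%s/.snap/%s", dir, name);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Equivalent of 'mkdir <dir>/.snap/<name>' (hypothetical helper). */
static int snap_create(const char *dir, const char *name)
{
	char path[4096];

	if (snap_path(path, sizeof(path), dir, name) < 0) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return mkdir(path, 0755);
}

/* Equivalent of 'rmdir <dir>/.snap/<name>' (hypothetical helper). */
static int snap_remove(const char *dir, const char *name)
{
	char path[4096];

	if (snap_path(path, sizeof(path), dir, name) < 0) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return rmdir(path);
}
```

On a file system other than Ceph, snap_create() simply fails with ENOENT, since no .snap directory exists there.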

Ceph also provides some recursive accounting on directories for nested
files and bytes. That is, a 'getfattr -d foo' on any directory in the
system will reveal the total number of nested regular files and
subdirectories, and a summation of all nested file sizes. This makes
the identification of large disk space consumers relatively quick, as
no 'du' or similar recursive scan of the file system is required.
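
The recursive statistics are exposed as virtual extended attributes, so a program can read them with getxattr(2) instead of walking the tree. A hedged sketch: the attribute name (ceph.dir.rbytes is used as an example) and the unterminated decimal value format are assumptions to check against your client, for instance with 'getfattr -d -m - dir'; parse_u64() and ceph_dir_stat() are hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Parse an unterminated decimal string of length len; -1 on bad input or overflow. */
static int parse_u64(const char *s, size_t len, uint64_t *out)
{
	uint64_t v = 0;
	size_t i;

	if (len == 0)
		return -1;
	for (i = 0; i < len; i++) {
		uint64_t d;

		if (s[i] < '0' || s[i] > '9')
			return -1;
		d = (uint64_t)(s[i] - '0');
		if (v > (UINT64_MAX - d) / 10)
			return -1;	/* v * 10 + d would overflow */
		v = v * 10 + d;
	}
	*out = v;
	return 0;
}

/* Read one recursive statistic, e.g. "ceph.dir.rbytes" (assumed name). */
static int ceph_dir_stat(const char *path, const char *attr, uint64_t *out)
{
	char buf[32];
	ssize_t n = getxattr(path, attr, buf, sizeof(buf));

	if (n < 0)
		return -errno;	/* e.g. ENODATA or ENOTSUP off Ceph */
	if (parse_u64(buf, (size_t)n, out) < 0)
		return -EINVAL;
	return 0;
}
```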

Mount Syntax
============

The basic mount syntax is:

 # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt

You only need to specify a single monitor, as the client will get the
full list when it connects. (However, if the monitor you specify
happens to be down, the mount won't succeed.) The port can be left
off if the monitor is using the default. So if the monitor is at
1.2.3.4,

 # mount -t ceph 1.2.3.4:/ /mnt/ceph

is sufficient. If /sbin/mount.ceph is installed, a hostname can be
used instead of an IP address.
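
The same mount can be issued from C through mount(2): the monitor list and path form the source string, and the mount options go in the data argument as a comma-separated string. A minimal sketch; ceph_mount_source() and ceph_mount() are hypothetical helpers, the call needs CAP_SYS_ADMIN, and no hostname resolution is done (that is what /sbin/mount.ceph adds).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Build "mon1[,mon2...]:/subdir" from an array of "ip[:port]" strings.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int ceph_mount_source(char *buf, size_t size, const char *const *mons,
			     size_t nmons, const char *subdir)
{
	size_t used = 0, i;
	int n;

	for (i = 0; i < nmons; i++) {
		n = snprintf(buf + used, size - used, "%s%s",
			     i ? "," : "", mons[i]);
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}
	n = snprintf(buf + used, size - used, ":/%s", subdir ? subdir : "");
	if (n < 0 || (size_t)n >= size - used)
		return -1;
	return 0;
}

/* Equivalent of 'mount -t ceph <source> <target> -o <opts>' (needs root). */
static int ceph_mount(const char *source, const char *target, const char *opts)
{
	return mount(source, target, "ceph", 0, opts);
}
```

For example, with mons = {"1.2.3.4", "5.6.7.8:6789"} and subdir "data", the source string is "1.2.3.4,5.6.7.8:6789:/data".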

Mount Options
=============

  ip=A.B.C.D[:N]
	Specify the IP and/or port the client should bind to locally.
	There is normally not much reason to do this. If the IP is not
	specified, the client's IP address is determined by looking at the
	address its connection to the monitor originates from.

  wsize=X
	Specify the maximum write size in bytes. By default there is no
	maximum. Ceph will normally size writes based on the file stripe
	size.

  rsize=X
	Specify the maximum readahead.

  mount_timeout=X
	Specify the timeout value for mount (in seconds), in the case
	of a non-responsive Ceph file system. The default is 30
	seconds.

  rbytes
	When stat() is called on a directory, set st_size to 'rbytes',
	the summation of file sizes over all files nested beneath that
	directory. This is the default.

  norbytes
	When stat() is called on a directory, set st_size to the
	number of entries in that directory.

  nocrc
	Disable CRC32C calculation for data writes. If set, the OSD
	must rely on TCP's own checksum to detect corruption in the
	data payload.

  noasyncreaddir
	Disable the client's use of its local cache to satisfy readdir
	requests. (This does not change correctness; the client uses
	cached metadata only when a lease or capability ensures it is
	valid.)
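
To make the defaults above concrete, here is a hedged sketch of how a comma-separated option string could map onto client settings. It is not the kernel client's parser: struct mount_opts and parse_opts() are hypothetical, only a subset of the options is handled (ip= is rejected as unknown here), and rsize is simply stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of the client settings described above. */
struct mount_opts {
	int wsize;		/* 0 = no maximum */
	int rsize;		/* 0 = unset */
	int mount_timeout;	/* seconds */
	bool rbytes;		/* directory st_size = recursive bytes */
	bool nocrc;
	bool noasyncreaddir;
};

static int parse_opts(const char *s, struct mount_opts *o)
{
	/* Documented defaults: no wsize limit, 30 s timeout, rbytes on. */
	*o = (struct mount_opts){ .mount_timeout = 30, .rbytes = true };

	while (*s) {
		char tok[64];
		size_t n = strcspn(s, ",");

		if (n >= sizeof(tok))
			return -1;
		memcpy(tok, s, n);
		tok[n] = '\0';
		s += n;
		if (*s == ',')
			s++;
		if (n == 0)
			continue;	/* tolerate empty items such as "a,,b" */

		if (!strncmp(tok, "wsize=", 6))
			o->wsize = atoi(tok + 6);
		else if (!strncmp(tok, "rsize=", 6))
			o->rsize = atoi(tok + 6);
		else if (!strncmp(tok, "mount_timeout=", 14))
			o->mount_timeout = atoi(tok + 14);
		else if (!strcmp(tok, "rbytes"))
			o->rbytes = true;
		else if (!strcmp(tok, "norbytes"))
			o->rbytes = false;
		else if (!strcmp(tok, "nocrc"))
			o->nocrc = true;
		else if (!strcmp(tok, "noasyncreaddir"))
			o->noasyncreaddir = true;
		else
			return -1;	/* unknown (or unhandled) option */
	}
	return 0;
}
```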

More Information
================

For more information on Ceph, see the home page at
	http://ceph.newdream.net/

The Linux kernel client source tree is available at
	git://ceph.newdream.net/linux-ceph-client.git

and the source for the full system is at
	git://ceph.newdream.net/ceph.git
@@ -291,6 +291,7 @@ Code Seq#(hex) Include File Comments
0x92 00-0F drivers/usb/mon/mon_bin.c
0x93 60-7F linux/auto_fs.h
0x94 all fs/btrfs/ioctl.h
0x97 00-7F fs/ceph/ioctl.h Ceph file system
0x99 00-0F 537-Addinboard driver
<mailto:buk@buks.ipn.de>
0xA0 all linux/sdp/sdp.h Industrial Device Project
@@ -1441,6 +1441,15 @@ F: arch/powerpc/include/asm/spu*.h
F: arch/powerpc/oprofile/*cell*
F: arch/powerpc/platforms/cell/
CEPH DISTRIBUTED FILE SYSTEM CLIENT
M: Sage Weil <sage@newdream.net>
L: ceph-devel@lists.sourceforge.net
W: http://ceph.newdream.net/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
S: Supported
F: Documentation/filesystems/ceph.txt
F: fs/ceph
CERTIFIED WIRELESS USB (WUSB) SUBSYSTEM:
M: David Vrabel <david.vrabel@csr.com>
L: linux-usb@vger.kernel.org
@@ -235,6 +235,7 @@ config NFS_COMMON
source "net/sunrpc/Kconfig"
source "fs/smbfs/Kconfig"
source "fs/ceph/Kconfig"
source "fs/cifs/Kconfig"
source "fs/ncpfs/Kconfig"
source "fs/coda/Kconfig"
@@ -125,3 +125,4 @@ obj-$(CONFIG_OCFS2_FS) += ocfs2/
obj-$(CONFIG_BTRFS_FS) += btrfs/
obj-$(CONFIG_GFS2_FS) += gfs2/
obj-$(CONFIG_EXOFS_FS) += exofs/
obj-$(CONFIG_CEPH_FS) += ceph/

config CEPH_FS
	tristate "Ceph distributed file system (EXPERIMENTAL)"
	depends on INET && EXPERIMENTAL
	select LIBCRC32C
	select CRYPTO_AES
	help
	  Choose Y or M here to include support for mounting the
	  experimental Ceph distributed file system. Ceph is an extremely
	  scalable file system designed to provide high performance,
	  reliable access to petabytes of storage.

	  More information at http://ceph.newdream.net/.

	  If unsure, say N.

config CEPH_FS_PRETTYDEBUG
	bool "Include file:line in ceph debug output"
	depends on CEPH_FS
	default n
	help
	  If you say Y here, debug output will include a filename and
	  line to aid debugging. This increases kernel size and slows
	  execution slightly when debug call sites are enabled (e.g.,
	  via CONFIG_DYNAMIC_DEBUG).

	  If unsure, say N.

#
# Makefile for CEPH filesystem.
#

ifneq ($(KERNELRELEASE),)

obj-$(CONFIG_CEPH_FS) += ceph.o

ceph-objs := super.o inode.o dir.o file.o addr.o ioctl.o \
	export.o caps.o snap.o xattr.o \
	messenger.o msgpool.o buffer.o pagelist.o \
	mds_client.o mdsmap.o \
	mon_client.o \
	osd_client.o osdmap.o crush/crush.o crush/mapper.o crush/hash.o \
	debugfs.o \
	auth.o auth_none.o \
	crypto.o armor.o \
	auth_x.o \
	ceph_fs.o ceph_strings.o ceph_hash.o ceph_frag.o

else
# Otherwise we were called directly from the command
# line; invoke the kernel build system.

KERNELDIR ?= /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

default: all

all:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) CONFIG_CEPH_FS=m modules

modules_install:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) CONFIG_CEPH_FS=m modules_install

clean:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) clean

endif
#
# The following files are shared by (and manually synchronized
# between) the Ceph userland and kernel client.
#
# userland kernel
src/include/ceph_fs.h fs/ceph/ceph_fs.h
src/include/ceph_fs.cc fs/ceph/ceph_fs.c
src/include/msgr.h fs/ceph/msgr.h
src/include/rados.h fs/ceph/rados.h
src/include/ceph_strings.cc fs/ceph/ceph_strings.c
src/include/ceph_frag.h fs/ceph/ceph_frag.h
src/include/ceph_frag.cc fs/ceph/ceph_frag.c
src/include/ceph_hash.h fs/ceph/ceph_hash.h
src/include/ceph_hash.cc fs/ceph/ceph_hash.c
src/crush/crush.c fs/ceph/crush/crush.c
src/crush/crush.h fs/ceph/crush/crush.h
src/crush/mapper.c fs/ceph/crush/mapper.c
src/crush/mapper.h fs/ceph/crush/mapper.h
src/crush/hash.h fs/ceph/crush/hash.h
src/crush/hash.c fs/ceph/crush/hash.c

#include <linux/errno.h>

/*
 * base64 encode/decode.
 */

const char *pem_key = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

static int encode_bits(int c)
{
	return pem_key[c];
}

static int decode_bits(char c)
{
	if (c >= 'A' && c <= 'Z')
		return c - 'A';
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 26;
	if (c >= '0' && c <= '9')
		return c - '0' + 52;
	if (c == '+')
		return 62;
	if (c == '/')
		return 63;
	if (c == '=')
		return 0; /* just non-negative, please */
	return -EINVAL;
}

int ceph_armor(char *dst, const char *src, const char *end)
{
	int olen = 0;
	int line = 0;

	while (src < end) {
		unsigned char a, b, c;

		a = *src++;
		*dst++ = encode_bits(a >> 2);
		if (src < end) {
			b = *src++;
			*dst++ = encode_bits(((a & 3) << 4) | (b >> 4));
			if (src < end) {
				c = *src++;
				*dst++ = encode_bits(((b & 15) << 2) |
						     (c >> 6));
				*dst++ = encode_bits(c & 63);
			} else {
				*dst++ = encode_bits((b & 15) << 2);
				*dst++ = '=';
			}
		} else {
			*dst++ = encode_bits(((a & 3) << 4));
			*dst++ = '=';
			*dst++ = '=';
		}
		olen += 4;
		line += 4;
		if (line == 64) {
			line = 0;
			*(dst++) = '\n';
			olen++;
		}
	}
	return olen;
}

int ceph_unarmor(char *dst, const char *src, const char *end)
{
	int olen = 0;

	while (src < end) {
		int a, b, c, d;

		/*
		 * skip the line breaks ceph_armor() inserts, including one
		 * at the very end of the input
		 */
		if (src[0] == '\n') {
			src++;
			continue;
		}
		if (src + 4 > end)
			return -EINVAL;
		a = decode_bits(src[0]);
		b = decode_bits(src[1]);
		c = decode_bits(src[2]);
		d = decode_bits(src[3]);
		if (a < 0 || b < 0 || c < 0 || d < 0)
			return -EINVAL;

		*dst++ = (a << 2) | (b >> 4);
		if (src[2] == '=')
			return olen + 1;
		*dst++ = ((b & 15) << 4) | (c >> 2);
		if (src[3] == '=')
			return olen + 2;
		*dst++ = ((c & 3) << 6) | d;
		olen += 3;
		src += 4;
	}
	return olen;
}
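
Callers of ceph_armor() and ceph_unarmor() must size dst themselves. Reading the loops above, each group of up to 3 input bytes becomes 4 output characters and a '\n' follows every 64th character, which gives the bounds below. A small sketch; armor_len() and unarmor_max_len() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Exact number of bytes ceph_armor() writes for n input bytes. */
static size_t armor_len(size_t n)
{
	size_t chars = 4 * ((n + 2) / 3);	/* 4 chars per (partial) 3-byte group */

	return chars + chars / 64;		/* plus one '\n' per 64 chars */
}

/* Upper bound on bytes ceph_unarmor() writes for n armored bytes. */
static size_t unarmor_max_len(size_t n)
{
	return 3 * (n / 4);			/* at most 3 bytes per 4-char group */
}
```

For instance, a 48-byte input armors to exactly 64 characters plus a trailing '\n' (65 bytes), so a decoder must tolerate a newline at the very end of its input.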

#include "ceph_debug.h"

#include <linux/module.h>
#include <linux/err.h>

#include "types.h"
#include "auth_none.h"
#include "auth_x.h"
#include "decode.h"
#include "super.h"

#include "messenger.h"

/*
 * get protocol handler
 */
static u32 supported_protocols[] = {
	CEPH_AUTH_NONE,
	CEPH_AUTH_CEPHX
};

int ceph_auth_init_protocol(struct ceph_auth_client *ac, int protocol)
{
	switch (protocol) {
	case CEPH_AUTH_NONE:
		return ceph_auth_none_init(ac);
	case CEPH_AUTH_CEPHX:
		return ceph_x_init(ac);
	default:
		return -ENOENT;
	}
}

/*
 * setup, teardown.
 */
struct ceph_auth_client *ceph_auth_init(const char *name, const char *secret)
{
	struct ceph_auth_client *ac;
	int ret;

	/* never write the secret key to the debug log */
	dout("auth_init name '%s'\n", name);

	ret = -ENOMEM;
	ac = kzalloc(sizeof(*ac), GFP_NOFS);
	if (!ac)
		goto out;

	ac->negotiating = true;
	if (name)
		ac->name = name;
	else
		ac->name = CEPH_AUTH_NAME_DEFAULT;
	dout("auth_init name %s\n", ac->name);
	ac->secret = secret;
	return ac;

out:
	return ERR_PTR(ret);
}

void ceph_auth_destroy(struct ceph_auth_client *ac)
{
	dout("auth_destroy %p\n", ac);
	if (ac->ops)
		ac->ops->destroy(ac);
	kfree(ac);
}

/*
 * Reset occurs when reconnecting to the monitor.
 */
void ceph_auth_reset(struct ceph_auth_client *ac)
{
	dout("auth_reset %p\n", ac);
	if (ac->ops && !ac->negotiating)
		ac->ops->reset(ac);
	ac->negotiating = true;
}

int ceph_entity_name_encode(const char *name, void **p, void *end)
{
	int len = strlen(name);

	if (*p + 2*sizeof(u32) + len > end)
		return -ERANGE;
	ceph_encode_32(p, CEPH_ENTITY_TYPE_CLIENT);
	ceph_encode_32(p, len);
	ceph_encode_copy(p, name, len);
	return 0;
}

/*
 * Initiate protocol negotiation with monitor. Include entity name
 * and list supported protocols.
 */
int ceph_auth_build_hello(struct ceph_auth_client *ac, void *buf, size_t len)
{
	struct ceph_mon_request_header *monhdr = buf;
	void *p = monhdr + 1, *end = buf + len, *lenp;
	int i, num;
	int ret;

	dout("auth_build_hello\n");
	monhdr->have_version = 0;
	monhdr->session_mon = cpu_to_le16(-1);
	monhdr->session_mon_tid = 0;

	ceph_encode_32(&p, 0); /* no protocol, yet */

	lenp = p;
	p += sizeof(u32);

	ceph_decode_need(&p, end, 1 + sizeof(u32), bad);
	ceph_encode_8(&p, 1);
	num = ARRAY_SIZE(supported_protocols);
	ceph_encode_32(&p, num);
	ceph_decode_need(&p, end, num * sizeof(u32), bad);
	for (i = 0; i < num; i++)
		ceph_encode_32(&p, supported_protocols[i]);

	ret = ceph_entity_name_encode(ac->name, &p, end);
	if (ret < 0)
		return ret;
	ceph_decode_need(&p, end, sizeof(u64), bad);
	ceph_encode_64(&p, ac->global_id);

	ceph_encode_32(&lenp, p - lenp - sizeof(u32));
	return p - buf;

bad:
	return -ERANGE;
}
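
ceph_auth_build_hello() reserves a u32 (lenp) in front of the variable-length part of the message and patches it once the size is known; the entity name inside it is written the way ceph_entity_name_encode() does it (u32 type, u32 length, raw bytes). The sketch below reproduces that reserve-then-patch pattern in userspace for the same field order. It omits ceph_mon_request_header; the protocol ids 1 and 2 and the client entity type 0x08 are assumed values, and build_hello() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static void put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Encode: u32 protocol (0), u32 len, then len bytes of
 * { u8 struct_v, u32 n, u32 protos[n], u32 type, u32 nlen, name, u64 gid }.
 * Returns total bytes written, or -1 if buf is too small.
 */
static int build_hello(uint8_t *buf, size_t size, const uint32_t *protos,
		       uint32_t n, const char *name, uint64_t gid)
{
	size_t nlen = strlen(name);
	size_t need = 4 + 4 + 1 + 4 + 4 * (size_t)n + 8 + nlen + 8;
	uint8_t *p = buf, *lenp;

	if (need > size)
		return -1;
	put_le32(p, 0);			/* no protocol chosen yet */
	p += 4;
	lenp = p;			/* reserve the length word ... */
	p += 4;
	*p++ = 1;			/* struct_v */
	put_le32(p, n);
	p += 4;
	for (uint32_t i = 0; i < n; i++, p += 4)
		put_le32(p, protos[i]);
	put_le32(p, 0x08);		/* assumed CEPH_ENTITY_TYPE_CLIENT */
	put_le32(p + 4, (uint32_t)nlen);
	memcpy(p + 8, name, nlen);
	p += 8 + nlen;
	put_le64(p, gid);
	p += 8;
	/* ... and patch it with the number of bytes that follow it */
	put_le32(lenp, (uint32_t)(p - lenp - 4));
	return (int)(p - buf);
}
```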

int ceph_build_auth_request(struct ceph_auth_client *ac,
			    void *msg_buf, size_t msg_len)
{
	struct ceph_mon_request_header *monhdr = msg_buf;
	void *p = monhdr + 1;
	void *end = msg_buf + msg_len;
	int ret;

	monhdr->have_version = 0;
	monhdr->session_mon = cpu_to_le16(-1);
	monhdr->session_mon_tid = 0;

	ceph_encode_32(&p, ac->protocol);

	ret = ac->ops->build_request(ac, p + sizeof(u32), end);
	if (ret < 0) {
		pr_err("error %d building request\n", ret);
		return ret;
	}
	dout(" built request %d bytes\n", ret);
	ceph_encode_32(&p, ret);
	return p + ret - msg_buf;
}

/*
 * Handle auth message from monitor.
 */
int ceph_handle_auth_reply(struct ceph_auth_client *ac,
			   void *buf, size_t len,
			   void *reply_buf, size_t reply_len)
{
	void *p = buf;
	void *end = buf + len;
	int protocol;
	s32 result;
	u64 global_id;
	void *payload, *payload_end;
	int payload_len;
	char *result_msg;
	int result_msg_len;
	int ret = -EINVAL;

	dout("handle_auth_reply %p %p\n", p, end);
	ceph_decode_need(&p, end, sizeof(u32) * 3 + sizeof(u64), bad);
	protocol = ceph_decode_32(&p);
	result = ceph_decode_32(&p);
	global_id = ceph_decode_64(&p);
	payload_len = ceph_decode_32(&p);
	payload = p;
	p += payload_len;
	ceph_decode_need(&p, end, sizeof(u32), bad);
	result_msg_len = ceph_decode_32(&p);
	result_msg = p;
	p += result_msg_len;
	if (p != end)
		goto bad;

	dout(" result %d '%.*s' gid %llu len %d\n", result, result_msg_len,
	     result_msg, global_id, payload_len);

	payload_end = payload + payload_len;

	if (global_id && ac->global_id != global_id) {
		dout(" set global_id %lld -> %lld\n", ac->global_id, global_id);
		ac->global_id = global_id;
	}

	if (ac->negotiating) {
		/* server does not support our protocols? */
		if (!protocol && result < 0) {
			ret = result;
			goto out;
		}
		/* set up (new) protocol handler? */
		if (ac->protocol && ac->protocol != protocol) {
			ac->ops->destroy(ac);
			ac->protocol = 0;
			ac->ops = NULL;
		}
		if (ac->protocol != protocol) {
			ret = ceph_auth_init_protocol(ac, protocol);
			if (ret) {
				pr_err("error %d on auth protocol %d init\n",
				       ret, protocol);
				goto out;
			}
		}

		ac->negotiating = false;
	}

	ret = ac->ops->handle_reply(ac, result, payload, payload_end);
	if (ret == -EAGAIN) {
		return ceph_build_auth_request(ac, reply_buf, reply_len);
	} else if (ret) {
		pr_err("authentication error %d\n", ret);
		return ret;
	}
	return 0;

bad:
	pr_err("failed to decode auth msg\n");
out:
	return ret;
}
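
ceph_handle_auth_reply() expects a u32 protocol, an s32 result, a u64 global_id, a length-prefixed payload and a length-prefixed result message, ending exactly at the end of the buffer. A userspace decoder sketch for that layout; struct auth_reply and decode_auth_reply() are hypothetical, and this version checks each length against the remaining bytes before advancing past it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct auth_reply {
	uint32_t protocol;
	int32_t result;
	uint64_t global_id;
	const uint8_t *payload;
	uint32_t payload_len;
	const uint8_t *msg;
	uint32_t msg_len;
};

/* Read an nbytes-wide little-endian integer. */
static uint64_t get_le(const uint8_t *p, int nbytes)
{
	uint64_t v = 0;

	while (nbytes--)
		v = (v << 8) | p[nbytes];
	return v;
}

/* Returns 0 on success, -1 if the buffer is short or has trailing bytes. */
static int decode_auth_reply(const uint8_t *p, size_t len, struct auth_reply *r)
{
	const uint8_t *end = p + len;

	if (len < 4 + 4 + 8 + 4)
		return -1;
	r->protocol = (uint32_t)get_le(p, 4);
	r->result = (int32_t)(uint32_t)get_le(p + 4, 4);
	r->global_id = get_le(p + 8, 8);
	r->payload_len = (uint32_t)get_le(p + 16, 4);
	p += 20;
	if ((size_t)(end - p) < r->payload_len)
		return -1;
	r->payload = p;
	p += r->payload_len;
	if ((size_t)(end - p) < 4)
		return -1;
	r->msg_len = (uint32_t)get_le(p, 4);
	p += 4;
	if ((size_t)(end - p) != r->msg_len)
		return -1;	/* short, or trailing garbage */
	r->msg = p;
	return 0;
}
```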

int ceph_build_auth(struct ceph_auth_client *ac,
		    void *msg_buf, size_t msg_len)
{
	if (!ac->protocol)
		return ceph_auth_build_hello(ac, msg_buf, msg_len);
	BUG_ON(!ac->ops);
	if (!ac->ops->is_authenticated(ac))
		return ceph_build_auth_request(ac, msg_buf, msg_len);
	return 0;
}

int ceph_auth_is_authenticated(struct ceph_auth_client *ac)
{
	if (!ac->ops)
		return 0;
	return ac->ops->is_authenticated(ac);
}

#ifndef _FS_CEPH_AUTH_H
#define _FS_CEPH_AUTH_H

#include "types.h"
#include "buffer.h"

/*
 * Abstract interface for communicating with the authenticate module.
 * There is some handshake that takes place between us and the monitor
 * to acquire the necessary keys. These are used to generate an
 * 'authorizer' that we use when connecting to a service (mds, osd).
 */

struct ceph_auth_client;
struct ceph_authorizer;

struct ceph_auth_client_ops {
	/*
	 * true if we are authenticated and can connect to
	 * services.
	 */
	int (*is_authenticated)(struct ceph_auth_client *ac);

	/*
	 * build requests and process replies during monitor
	 * handshake. if handle_reply returns -EAGAIN, we build
	 * another request.
	 */
	int (*build_request)(struct ceph_auth_client *ac, void *buf, void *end);
	int (*handle_reply)(struct ceph_auth_client *ac, int result,
			    void *buf, void *end);

	/*
	 * Create authorizer for connecting to a service, and verify
	 * the response to authenticate the service.
	 */
	int (*create_authorizer)(struct ceph_auth_client *ac, int peer_type,
				 struct ceph_authorizer **a,
				 void **buf, size_t *len,
				 void **reply_buf, size_t *reply_len);
	int (*verify_authorizer_reply)(struct ceph_auth_client *ac,
				       struct ceph_authorizer *a, size_t len);
	void (*destroy_authorizer)(struct ceph_auth_client *ac,
				   struct ceph_authorizer *a);
	void (*invalidate_authorizer)(struct ceph_auth_client *ac,
				      int peer_type);

	/* reset when we (re)connect to a monitor */
	void (*reset)(struct ceph_auth_client *ac);

	void (*destroy)(struct ceph_auth_client *ac);
};

struct ceph_auth_client {
	u32 protocol;           /* CEPH_AUTH_* */
	void *private;          /* for use by protocol implementation */
	const struct ceph_auth_client_ops *ops;  /* null iff protocol==0 */

	bool negotiating;       /* true if negotiating protocol */
	const char *name;       /* entity name */
	u64 global_id;          /* our unique id in system */
	const char *secret;     /* our secret key */
	unsigned want_keys;     /* which services we want */
};

extern struct ceph_auth_client *ceph_auth_init(const char *name,
					       const char *secret);
extern void ceph_auth_destroy(struct ceph_auth_client *ac);

extern void ceph_auth_reset(struct ceph_auth_client *ac);

extern int ceph_auth_build_hello(struct ceph_auth_client *ac,
				 void *buf, size_t len);
extern int ceph_handle_auth_reply(struct ceph_auth_client *ac,
				  void *buf, size_t len,
				  void *reply_buf, size_t reply_len);
extern int ceph_entity_name_encode(const char *name, void **p, void *end);

extern int ceph_build_auth(struct ceph_auth_client *ac,
			   void *msg_buf, size_t msg_len);

extern int ceph_auth_is_authenticated(struct ceph_auth_client *ac);

#endif
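
struct ceph_auth_client_ops is a per-protocol vtable, and the contract documented above is that handle_reply() returning -EAGAIN means "build another request". The following is a hypothetical, self-contained model of that loop with a toy protocol that needs three replies; struct client, the toy_* functions and negotiate() are illustrative only and are not the monitor client's actual driver.

```c
#include <assert.h>
#include <errno.h>

struct client;

struct client_ops {
	int (*is_authenticated)(struct client *c);
	int (*build_request)(struct client *c, char *buf, int len);
	int (*handle_reply)(struct client *c, int result);
};

struct client {
	const struct client_ops *ops;
	int rounds_left;	/* toy protocol state */
	int requests_sent;
};

static int toy_is_authenticated(struct client *c)
{
	return c->rounds_left == 0;
}

static int toy_build_request(struct client *c, char *buf, int len)
{
	(void)buf;
	(void)len;
	c->requests_sent++;
	return 0;		/* empty payload */
}

static int toy_handle_reply(struct client *c, int result)
{
	if (result < 0)
		return result;
	if (--c->rounds_left > 0)
		return -EAGAIN;	/* need another request */
	return 0;
}

static const struct client_ops toy_ops = {
	.is_authenticated = toy_is_authenticated,
	.build_request = toy_build_request,
	.handle_reply = toy_handle_reply,
};

/* Feed (simulated) replies until handle_reply() stops asking for more. */
static int negotiate(struct client *c, int max_rounds)
{
	char buf[64];
	int ret;

	while (max_rounds-- > 0) {
		ret = c->ops->handle_reply(c, 0);
		if (ret != -EAGAIN)
			return ret;
		ret = c->ops->build_request(c, buf, sizeof(buf));
		if (ret < 0)
			return ret;
	}
	return -ETIMEDOUT;
}
```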

#include "ceph_debug.h"

#include <linux/err.h>
#include <linux/module.h>
#include <linux/random.h>

#include "auth_none.h"
#include "auth.h"
#include "decode.h"

static void reset(struct ceph_auth_client *ac)
{
	struct ceph_auth_none_info *xi = ac->private;

	xi->starting = true;
	xi->built_authorizer = false;
}

static void destroy(struct ceph_auth_client *ac)
{
	kfree(ac->private);
	ac->private = NULL;
}

static int is_authenticated(struct ceph_auth_client *ac)
{
	struct ceph_auth_none_info *xi = ac->private;

	return !xi->starting;
}

/*
 * nothing to send beyond the initial hello, but ceph_build_auth()
 * calls build_request() whenever is_authenticated() is false (e.g.,
 * after a reset), so provide an empty payload instead of a NULL op.
 */
static int build_request(struct ceph_auth_client *ac, void *buf, void *end)
{
	return 0;
}

/*
 * the generic auth code decodes the global_id, and we carry no actual
 * authenticate state, so nothing happens here.
 */
static int handle_reply(struct ceph_auth_client *ac, int result,
			void *buf, void *end)
{
	struct ceph_auth_none_info *xi = ac->private;

	xi->starting = false;
	return result;
}

/*
 * build an 'authorizer' with our entity_name and global_id. we can
 * reuse a single static copy since it is identical for all services
 * we connect to.
 */
static int ceph_auth_none_create_authorizer(
	struct ceph_auth_client *ac, int peer_type,
	struct ceph_authorizer **a,
	void **buf, size_t *len,
	void **reply_buf, size_t *reply_len)
{
	struct ceph_auth_none_info *ai = ac->private;
	struct ceph_none_authorizer *au = &ai->au;
	void *p, *end;
	int ret;

	if (!ai->built_authorizer) {
		p = au->buf;
		end = p + sizeof(au->buf);
		ceph_encode_8(&p, 1);
		/* leave room for the u64 global_id */
		ret = ceph_entity_name_encode(ac->name, &p, end - 8);
		if (ret < 0)
			goto bad;
		ceph_decode_need(&p, end, sizeof(u64), bad2);
		ceph_encode_64(&p, ac->global_id);
		au->buf_len = p - (void *)au->buf;
		ai->built_authorizer = true;
		dout("built authorizer len %d\n", au->buf_len);
	}

	*a = (struct ceph_authorizer *)au;
	*buf = au->buf;
	*len = au->buf_len;
	*reply_buf = au->reply_buf;
	*reply_len = sizeof(au->reply_buf);
	return 0;

bad2:
	ret = -ERANGE;
bad:
	return ret;
}

static void ceph_auth_none_destroy_authorizer(struct ceph_auth_client *ac,
					      struct ceph_authorizer *a)
{
	/* nothing to do */
}

static const struct ceph_auth_client_ops ceph_auth_none_ops = {
	.reset = reset,
	.destroy = destroy,
	.is_authenticated = is_authenticated,
	.build_request = build_request,
	.handle_reply = handle_reply,
	.create_authorizer = ceph_auth_none_create_authorizer,
	.destroy_authorizer = ceph_auth_none_destroy_authorizer,
};

int ceph_auth_none_init(struct ceph_auth_client *ac)
{
	struct ceph_auth_none_info *xi;

	dout("ceph_auth_none_init %p\n", ac);
	xi = kzalloc(sizeof(*xi), GFP_NOFS);
	if (!xi)
		return -ENOMEM;

	xi->starting = true;
	xi->built_authorizer = false;

	ac->protocol = CEPH_AUTH_NONE;
	ac->private = xi;
	ac->ops = &ceph_auth_none_ops;
	return 0;
}

#ifndef _FS_CEPH_AUTH_NONE_H
#define _FS_CEPH_AUTH_NONE_H

#include "auth.h"

/*
 * null security mode.
 *
 * we use a single static authorizer that simply encodes our entity name
 * and global id.
 */

struct ceph_none_authorizer {
	char buf[128];
	int buf_len;
	char reply_buf[0];
};

struct ceph_auth_none_info {
	bool starting;
	bool built_authorizer;
	struct ceph_none_authorizer au;   /* we only need one; it's static */
};

extern int ceph_auth_none_init(struct ceph_auth_client *ac);

#endif

#include "ceph_debug.h"

#include <linux/err.h>
#include <linux/module.h>
#include <linux/random.h>

#include "auth_x.h"
#include "auth_x_protocol.h"
#include "crypto.h"
#include "auth.h"
#include "decode.h"

struct kmem_cache *ceph_x_ticketbuf_cachep;

#define TEMP_TICKET_BUF_LEN	256

static void ceph_x_validate_tickets(struct ceph_auth_client *ac, int *pneed);

static int ceph_x_is_authenticated(struct ceph_auth_client *ac)
{
	struct ceph_x_info *xi = ac->private;
	int need;

	ceph_x_validate_tickets(ac, &need);
	dout("ceph_x_is_authenticated want=%d need=%d have=%d\n",
	     ac->want_keys, need, xi->have_keys);
	return (ac->want_keys & xi->have_keys) == ac->want_keys;
}

static int ceph_x_encrypt(struct ceph_crypto_key *secret,
			  void *ibuf, int ilen, void *obuf, size_t olen)
{
	struct ceph_x_encrypt_header head = {
		.struct_v = 1,
		.magic = cpu_to_le64(CEPHX_ENC_MAGIC)
	};
	size_t len = olen - sizeof(u32);
	int ret;

	ret = ceph_encrypt2(secret, obuf + sizeof(u32), &len,
			    &head, sizeof(head), ibuf, ilen);
	if (ret)
		return ret;
	ceph_encode_32(&obuf, len);
	return len + sizeof(u32);
}

static int ceph_x_decrypt(struct ceph_crypto_key *secret,
			  void **p, void *end, void *obuf, size_t olen)
{
	struct ceph_x_encrypt_header head;
	size_t head_len = sizeof(head);
	int len, ret;

	/* the length prefix itself must be in bounds before we read it */
	if (*p + sizeof(u32) > end)
		return -EINVAL;
	len = ceph_decode_32(p);
	if (len < 0 || *p + len > end)
		return -EINVAL;

	dout("ceph_x_decrypt len %d\n", len);
	ret = ceph_decrypt2(secret, &head, &head_len, obuf, &olen,
			    *p, len);
	if (ret)
		return ret;
	if (head.struct_v != 1 || le64_to_cpu(head.magic) != CEPHX_ENC_MAGIC)
		return -EPERM;
	*p += len;
	return olen;
}

/*
 * get existing (or insert new) ticket handler
 */
struct ceph_x_ticket_handler *get_ticket_handler(struct ceph_auth_client *ac,
						 int service)
{
	struct ceph_x_ticket_handler *th;
	struct ceph_x_info *xi = ac->private;
	struct rb_node *parent = NULL, **p = &xi->ticket_handlers.rb_node;

	while (*p) {
		parent = *p;
		th = rb_entry(parent, struct ceph_x_ticket_handler, node);
		if (service < th->service)
			p = &(*p)->rb_left;
		else if (service > th->service)
			p = &(*p)->rb_right;
		else
			return th;
	}

	/* add it */
	th = kzalloc(sizeof(*th), GFP_NOFS);
	if (!th)
		return ERR_PTR(-ENOMEM);
	th->service = service;
	rb_link_node(&th->node, parent, p);
	rb_insert_color(&th->node, &xi->ticket_handlers);
	return th;
}
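
get_ticket_handler() uses the kernel rbtree lookup-or-insert idiom: it descends holding a pointer to the child link (struct rb_node **p), so on a miss the new node is linked exactly where the search ended. A userspace sketch of the same idiom on a plain, unbalanced binary tree (the rb_insert_color() rebalancing step is deliberately omitted); struct handler, get_handler() and free_handlers() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct handler {
	int service;
	struct handler *left, *right;
};

/* Return the handler for service, creating it if absent; NULL on allocation failure. */
static struct handler *get_handler(struct handler **root, int service)
{
	struct handler **p = root;

	while (*p) {
		if (service < (*p)->service)
			p = &(*p)->left;
		else if (service > (*p)->service)
			p = &(*p)->right;
		else
			return *p;		/* found an existing handler */
	}
	*p = calloc(1, sizeof(**p));		/* link at the empty slot we reached */
	if (*p)
		(*p)->service = service;
	return *p;
}

static void free_handlers(struct handler *h)
{
	if (!h)
		return;
	free_handlers(h->left);
	free_handlers(h->right);
	free(h);
}
```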

static void remove_ticket_handler(struct ceph_auth_client *ac,
				  struct ceph_x_ticket_handler *th)
{
	struct ceph_x_info *xi = ac->private;

	dout("remove_ticket_handler %p %d\n", th, th->service);
	rb_erase(&th->node, &xi->ticket_handlers);
	ceph_crypto_key_destroy(&th->session_key);
	if (th->ticket_blob)
		ceph_buffer_put(th->ticket_blob);
	kfree(th);
}

static int ceph_x_proc_ticket_reply(struct ceph_auth_client *ac,
				    struct ceph_crypto_key *secret,
				    void *buf, void *end)
{
	struct ceph_x_info *xi = ac->private;
	int num;
	void *p = buf;
	int ret;
	char *dbuf;
	char *ticket_buf;
	u8 struct_v;

	dbuf = kmem_cache_alloc(ceph_x_ticketbuf_cachep, GFP_NOFS);
	if (!dbuf)
		return -ENOMEM;

	ret = -ENOMEM;
	ticket_buf = kmem_cache_alloc(ceph_x_ticketbuf_cachep, GFP_NOFS);
	if (!ticket_buf)
		goto out_dbuf;

	ceph_decode_need(&p, end, 1 + sizeof(u32), bad);
	struct_v = ceph_decode_8(&p);
	if (struct_v != 1)
		goto bad;
	num = ceph_decode_32(&p);
	dout("%d tickets\n", num);
	while (num--) {
		int type;
		u8 struct_v;
		struct ceph_x_ticket_handler *th;
		void *dp, *dend;
		int dlen;
		char is_enc;
		struct timespec validity;
		struct ceph_crypto_key old_key;
		void *tp, *tpend;

		ceph_decode_need(&p, end, sizeof(u32) + 1, bad);

		type = ceph_decode_32(&p);
		dout(" ticket type %d %s\n", type, ceph_entity_type_name(type));

		struct_v = ceph_decode_8(&p);
		if (struct_v != 1)
			goto bad;

		th = get_ticket_handler(ac, type);
		if (IS_ERR(th)) {
			ret = PTR_ERR(th);
			goto out;
		}

		/* blob for me */
		dlen = ceph_x_decrypt(secret, &p, end, dbuf,
				      TEMP_TICKET_BUF_LEN);
		if (dlen <= 0) {
			ret = dlen;
			goto out;
		}
		dout(" decrypted %d bytes\n", dlen);
		dend = dbuf + dlen;
		dp = dbuf;

		struct_v = ceph_decode_8(&dp);
		if (struct_v != 1)
			goto bad;

		memcpy(&old_key, &th->session_key, sizeof(old_key));
		ret = ceph_crypto_key_decode(&th->session_key, &dp, dend);
		if (ret)
			goto out;

		ceph_decode_copy(&dp, &th->validity, sizeof(th->validity));
		ceph_decode_timespec(&validity, &th->validity);
		th->expires = get_seconds() + validity.tv_sec;
		th->renew_after = th->expires - (validity.tv_sec / 4);
		dout(" expires=%lu renew_after=%lu\n", th->expires,
		     th->renew_after);

		/* ticket blob for service */
		ceph_decode_8_safe(&p, end, is_enc, bad);
		tp = ticket_buf;
		if (is_enc) {
			/* encrypted */
			dout(" encrypted ticket\n");
			dlen = ceph_x_decrypt(&old_key, &p, end, ticket_buf,
					      TEMP_TICKET_BUF_LEN);
			if (dlen < 0) {
				ret = dlen;
				goto out;
			}
			dlen = ceph_decode_32(&tp);
			/* the inner length must fit after its own prefix */
			if (dlen < 0 ||
			    dlen > (int)(TEMP_TICKET_BUF_LEN - sizeof(u32)))
				goto bad;
		} else {
			/* unencrypted */
			ceph_decode_32_safe(&p, end, dlen, bad);
			/* don't let a bogus length overflow ticket_buf */
			if (dlen < 0 || dlen > TEMP_TICKET_BUF_LEN)
				goto bad;
			ceph_decode_need(&p, end, dlen, bad);
			ceph_decode_copy(&p, ticket_buf, dlen);
		}
		tpend = tp + dlen;
		dout(" ticket blob is %d bytes\n", dlen);
		ceph_decode_need(&tp, tpend, 1 + sizeof(u64), bad);
		struct_v = ceph_decode_8(&tp);
		th->secret_id = ceph_decode_64(&tp);
		ret = ceph_decode_buffer(&th->ticket_blob, &tp, tpend);
		if (ret)
			goto out;
		dout(" got ticket service %d (%s) secret_id %lld len %d\n",
		     type, ceph_entity_type_name(type), th->secret_id,
		     (int)th->ticket_blob->vec.iov_len);
		xi->have_keys |= th->service;
	}

	ret = 0;
out:
	kmem_cache_free(ceph_x_ticketbuf_cachep, ticket_buf);
out_dbuf:
	kmem_cache_free(ceph_x_ticketbuf_cachep, dbuf);
	return ret;

bad:
	ret = -EINVAL;
	goto out;
}

static int ceph_x_build_authorizer(struct ceph_auth_client *ac,
				   struct ceph_x_ticket_handler *th,
				   struct ceph_x_authorizer *au)
{
	int len;
	struct ceph_x_authorize_a *msg_a;
	struct ceph_x_authorize_b msg_b;
	void *p, *end;
	int ret;
	int ticket_blob_len =
		(th->ticket_blob ? th->ticket_blob->vec.iov_len : 0);

	dout("build_authorizer for %s %p\n",
	     ceph_entity_type_name(th->service), au);

	len = sizeof(*msg_a) + sizeof(msg_b) + sizeof(u32) +
		ticket_blob_len + 16;
	dout(" need len %d\n", len);
	if (au->buf && au->buf->alloc_len < len) {
		ceph_buffer_put(au->buf);
		au->buf = NULL;
	}
	if (!au->buf) {
		au->buf = ceph_buffer_new(len, GFP_NOFS);
		if (!au->buf)
			return -ENOMEM;
	}
	au->service = th->service;

	msg_a = au->buf->vec.iov_base;
	msg_a->struct_v = 1;
	msg_a->global_id = cpu_to_le64(ac->global_id);
	msg_a->service_id = cpu_to_le32(th->service);
	msg_a->ticket_blob.struct_v = 1;
	msg_a->ticket_blob.secret_id = cpu_to_le64(th->secret_id);
	msg_a->ticket_blob.blob_len = cpu_to_le32(ticket_blob_len);
	if (ticket_blob_len) {
		memcpy(msg_a->ticket_blob.blob, th->ticket_blob->vec.iov_base,
		       th->ticket_blob->vec.iov_len);
	}
	dout(" th %p secret_id %lld %lld\n", th, th->secret_id,
	     le64_to_cpu(msg_a->ticket_blob.secret_id));

	p = msg_a + 1;
	p += ticket_blob_len;
	end = au->buf->vec.iov_base + au->buf->vec.iov_len;

	get_random_bytes(&au->nonce, sizeof(au->nonce));
	msg_b.struct_v = 1;
	msg_b.nonce = cpu_to_le64(au->nonce);
	ret = ceph_x_encrypt(&th->session_key, &msg_b, sizeof(msg_b),
			     p, end - p);
	if (ret < 0)
		goto out_buf;
	p += ret;
	au->buf->vec.iov_len = p - au->buf->vec.iov_base;
	dout(" built authorizer nonce %llx len %d\n", au->nonce,
	     (int)au->buf->vec.iov_len);
	return 0;

out_buf:
	ceph_buffer_put(au->buf);
	au->buf = NULL;
	return ret;
}

static int ceph_x_encode_ticket(struct ceph_x_ticket_handler *th,
				void **p, void *end)
{
	ceph_decode_need(p, end, 1 + sizeof(u64), bad);
	ceph_encode_8(p, 1);
	ceph_encode_64(p, th->secret_id);
	if (th->ticket_blob) {
		const char *buf = th->ticket_blob->vec.iov_base;
		u32 len = th->ticket_blob->vec.iov_len;

		ceph_encode_32_safe(p, end, len, bad);
		ceph_encode_copy_safe(p, end, buf, len, bad);
	} else {
		ceph_encode_32_safe(p, end, 0, bad);
	}

	return 0;
bad:
	return -ERANGE;
}

static void ceph_x_validate_tickets(struct ceph_auth_client *ac, int *pneed)
{
	int want = ac->want_keys;
	struct ceph_x_info *xi = ac->private;
	int service;

	*pneed = ac->want_keys & ~(xi->have_keys);

	for (service = 1; service <= want; service <<= 1) {
		struct ceph_x_ticket_handler *th;

		if (!(ac->want_keys & service))
			continue;

		if (*pneed & service)
			continue;

		th = get_ticket_handler(ac, service);

		/* get_ticket_handler() returns ERR_PTR() on failure, never NULL */
		if (IS_ERR(th)) {
			*pneed |= service;
			continue;
		}

		if (get_seconds() >= th->renew_after)
			*pneed |= service;
		if (get_seconds() >= th->expires)
			xi->have_keys &= ~service;
	}
}
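
Ticket lifetimes are computed in ceph_x_proc_ticket_reply() (expires = now + validity and renew_after = expires - validity/4, so renewal starts in the last quarter of the lifetime) and consumed by ceph_x_validate_tickets(), which walks want_keys one bit at a time. A userspace model of both rules; struct ticket_times, compute_ticket_times() and need_mask() are hypothetical, and the loop scans a fixed 8 bits instead of looping while service <= want.

```c
#include <assert.h>

#define MAX_SERVICE_BITS 8

/* Per-service ticket times (stand-in for ceph_x_ticket_handler). */
struct ticket_times {
	unsigned long expires;
	unsigned long renew_after;
};

/* Same arithmetic as ceph_x_proc_ticket_reply(). */
static struct ticket_times compute_ticket_times(unsigned long now,
						unsigned long validity)
{
	struct ticket_times t;

	t.expires = now + validity;
	t.renew_after = t.expires - validity / 4;	/* last quarter */
	return t;
}

/*
 * Same rules as ceph_x_validate_tickets(): t[i] describes service bit
 * (1 << i). Returns the need mask and clears expired services from *have.
 */
static unsigned need_mask(unsigned want, unsigned *have, unsigned long now,
			  const struct ticket_times *t)
{
	unsigned need = want & ~*have;
	int i;

	for (i = 0; i < MAX_SERVICE_BITS; i++) {
		unsigned service = 1u << i;

		if (!(want & service) || (need & service))
			continue;
		if (now >= t[i].renew_after)
			need |= service;	/* ask for a fresh ticket */
		if (now >= t[i].expires)
			*have &= ~service;	/* stale key is no longer usable */
	}
	return need;
}
```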
static int ceph_x_build_request(struct ceph_auth_client *ac,
void *buf, void *end)
{
struct ceph_x_info *xi = ac->private;
int need;
struct ceph_x_request_header *head = buf;
int ret;
struct ceph_x_ticket_handler *th =
get_ticket_handler(ac, CEPH_ENTITY_TYPE_AUTH);
ceph_x_validate_tickets(ac, &need);
dout("build_request want %x have %x need %x\n",
ac->want_keys, xi->have_keys, need);
if (need & CEPH_ENTITY_TYPE_AUTH) {
struct ceph_x_authenticate *auth = (void *)(head + 1);
void *p = auth + 1;
struct ceph_x_challenge_blob tmp;
char tmp_enc[40];
u64 *u;
if (p > end)
return -ERANGE;
dout(" get_auth_session_key\n");
head->op = cpu_to_le16(CEPHX_GET_AUTH_SESSION_KEY);
/* encrypt and hash */
get_random_bytes(&auth->client_challenge, sizeof(u64));
tmp.client_challenge = auth->client_challenge;
tmp.server_challenge = cpu_to_le64(xi->server_challenge);
ret = ceph_x_encrypt(&xi->secret, &tmp, sizeof(tmp),
tmp_enc, sizeof(tmp_enc));
if (ret < 0)
return ret;
auth->struct_v = 1;
auth->key = 0;
for (u = (u64 *)tmp_enc; u + 1 <= (u64 *)(tmp_enc + ret); u++)
auth->key ^= *u;
dout(" server_challenge %llx client_challenge %llx key %llx\n",
xi->server_challenge, le64_to_cpu(auth->client_challenge),
le64_to_cpu(auth->key));
/* now encode the old ticket if exists */
ret = ceph_x_encode_ticket(th, &p, end);
if (ret < 0)
return ret;
return p - buf;
}
if (need) {
void *p = head + 1;
struct ceph_x_service_ticket_request *req;
if (p > end)
return -ERANGE;
head->op = cpu_to_le16(CEPHX_GET_PRINCIPAL_SESSION_KEY);
BUG_ON(IS_ERR(th));
ret = ceph_x_build_authorizer(ac, th, &xi->auth_authorizer);
if (ret)
return ret;
ceph_encode_copy(&p, xi->auth_authorizer.buf->vec.iov_base,
xi->auth_authorizer.buf->vec.iov_len);
req = p;
req->keys = cpu_to_le32(need);
p += sizeof(*req);
return p - buf;
}
return 0;
}
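In ceph_x_build_request(), auth->key is derived by XOR-folding every complete 64-bit word of the encrypted challenge; a trailing partial word does not contribute. A userspace sketch of just that folding step follows. The encryption is omitted, `xor_fold64()` is a hypothetical helper, and memcpy replaces the `(u64 *)` cast to avoid unaligned loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * XOR together each whole 8-byte word of buf[0..len), in host byte
 * order, as the loop over tmp_enc does. Leftover len % 8 bytes are ignored.
 */
static uint64_t xor_fold64(const unsigned char *buf, size_t len)
{
	uint64_t key = 0;
	size_t off;

	for (off = 0; off + sizeof(uint64_t) <= len; off += sizeof(uint64_t)) {
		uint64_t word;

		memcpy(&word, buf + off, sizeof(word));
		key ^= word;
	}
	return key;
}
```

For example, folding 8 bytes of 0xAA with 8 bytes of 0x0F yields 0xA5 in every byte, whatever the host byte order.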
static int ceph_x_handle_reply(struct ceph_auth_client *ac, int result,
void *buf, void *end)
{
struct ceph_x_info *xi = ac->private;
struct ceph_x_reply_header *head = buf;
struct ceph_x_ticket_handler *th;
int len = end - buf;
int op;
int ret;
if (result)
return result; /* XXX hmm? */
if (xi->starting) {
/* it's a hello */
struct ceph_x_server_challenge *sc = buf;
if (len != sizeof(*sc))
return -EINVAL;
xi->server_challenge = le64_to_cpu(sc->server_challenge);
dout("handle_reply got server challenge %llx\n",
xi->server_challenge);
xi->starting = false;
xi->have_keys &= ~CEPH_ENTITY_TYPE_AUTH;
return -EAGAIN;
}
op = le16_to_cpu(head->op);
result = le32_to_cpu(head->result);
dout("handle_reply op %d result %d\n", op, result);
switch (op) {
case CEPHX_GET_AUTH_SESSION_KEY:
/* verify auth key */
ret = ceph_x_proc_ticket_reply(ac, &xi->secret,
buf + sizeof(*head), end);
break;
case CEPHX_GET_PRINCIPAL_SESSION_KEY:
th = get_ticket_handler(ac, CEPH_ENTITY_TYPE_AUTH);
BUG_ON(IS_ERR(th));
ret = ceph_x_proc_ticket_reply(ac, &th->session_key,
buf + sizeof(*head), end);
break;
default:
return -EINVAL;
}
if (ret)
return ret;
if (ac->want_keys == xi->have_keys)
return 0;
return -EAGAIN;
}
static int ceph_x_create_authorizer(
struct ceph_auth_client *ac, int peer_type,
struct ceph_authorizer **a,
void **buf, size_t *len,
void **reply_buf, size_t *reply_len)
{
struct ceph_x_authorizer *au;
struct ceph_x_ticket_handler *th;
int ret;
th = get_ticket_handler(ac, peer_type);
if (IS_ERR(th))
return PTR_ERR(th);
au = kzalloc(sizeof(*au), GFP_NOFS);
if (!au)
return -ENOMEM;
ret = ceph_x_build_authorizer(ac, th, au);
if (ret) {
kfree(au);
return ret;
}
*a = (struct ceph_authorizer *)au;
*buf = au->buf->vec.iov_base;
*len = au->buf->vec.iov_len;
*reply_buf = au->reply_buf;
*reply_len = sizeof(au->reply_buf);
return 0;
}
static int ceph_x_verify_authorizer_reply(struct ceph_auth_client *ac,
struct ceph_authorizer *a, size_t len)
{
struct ceph_x_authorizer *au = (void *)a;
struct ceph_x_ticket_handler *th;
int ret = 0;
struct ceph_x_authorize_reply reply;
void *p = au->reply_buf;
void *end = p + sizeof(au->reply_buf);
th = get_ticket_handler(ac, au->service);
if (IS_ERR(th))
return PTR_ERR(th);
ret = ceph_x_decrypt(&th->session_key, &p, end, &reply, sizeof(reply));
if (ret < 0)
return ret;
if (ret != sizeof(reply))
return -EPERM;
if (au->nonce + 1 != le64_to_cpu(reply.nonce_plus_one))
ret = -EPERM;
else
ret = 0;
dout("verify_authorizer_reply nonce %llx got %llx ret %d\n",
au->nonce, le64_to_cpu(reply.nonce_plus_one), ret);
return ret;
}
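ceph_x_verify_authorizer_reply() accepts a reply only if the decrypted nonce_plus_one equals the nonce the client sent, plus one; that proves the peer could decrypt the authorizer with the session key. A minimal sketch of the final comparison, assuming the reply has already been decrypted; `get_le64()` and `authorizer_reply_ok()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode an 8-byte little-endian field, as a __le64 appears on the wire. */
static uint64_t get_le64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* True if the reply proves knowledge of the session key. */
static bool authorizer_reply_ok(uint64_t sent_nonce,
				const unsigned char wire_nonce_plus_one[8])
{
	/* unsigned arithmetic: UINT64_MAX + 1 wraps to 0, as in the kernel */
	return sent_nonce + 1 == get_le64(wire_nonce_plus_one);
}
```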
static void ceph_x_destroy_authorizer(struct ceph_auth_client *ac,
struct ceph_authorizer *a)
{
struct ceph_x_authorizer *au = (void *)a;
ceph_buffer_put(au->buf);
kfree(au);
}
static void ceph_x_reset(struct ceph_auth_client *ac)
{
struct ceph_x_info *xi = ac->private;
dout("reset\n");
xi->starting = true;
xi->server_challenge = 0;
}
static void ceph_x_destroy(struct ceph_auth_client *ac)
{
struct ceph_x_info *xi = ac->private;
struct rb_node *p;
dout("ceph_x_destroy %p\n", ac);
ceph_crypto_key_destroy(&xi->secret);
while ((p = rb_first(&xi->ticket_handlers)) != NULL) {
struct ceph_x_ticket_handler *th =
rb_entry(p, struct ceph_x_ticket_handler, node);
remove_ticket_handler(ac, th);
}
kmem_cache_destroy(ceph_x_ticketbuf_cachep);
kfree(ac->private);
ac->private = NULL;
}
static void ceph_x_invalidate_authorizer(struct ceph_auth_client *ac,
int peer_type)
{
struct ceph_x_ticket_handler *th;
th = get_ticket_handler(ac, peer_type);
if (th && !IS_ERR(th))
remove_ticket_handler(ac, th);
}
static const struct ceph_auth_client_ops ceph_x_ops = {
.is_authenticated = ceph_x_is_authenticated,
.build_request = ceph_x_build_request,
.handle_reply = ceph_x_handle_reply,
.create_authorizer = ceph_x_create_authorizer,
.verify_authorizer_reply = ceph_x_verify_authorizer_reply,
.destroy_authorizer = ceph_x_destroy_authorizer,
.invalidate_authorizer = ceph_x_invalidate_authorizer,
.reset = ceph_x_reset,
.destroy = ceph_x_destroy,
};
int ceph_x_init(struct ceph_auth_client *ac)
{
struct ceph_x_info *xi;
int ret;
dout("ceph_x_init %p\n", ac);
xi = kzalloc(sizeof(*xi), GFP_NOFS);
if (!xi)
return -ENOMEM;
ret = -ENOMEM;
ceph_x_ticketbuf_cachep = kmem_cache_create("ceph_x_ticketbuf",
TEMP_TICKET_BUF_LEN, 8,
(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD),
NULL);
if (!ceph_x_ticketbuf_cachep)
goto done_nomem;
ret = -EINVAL;
if (!ac->secret) {
pr_err("no secret set (for auth_x protocol)\n");
goto done_nomem;
}
ret = ceph_crypto_key_unarmor(&xi->secret, ac->secret);
if (ret)
goto done_nomem;
xi->starting = true;
xi->ticket_handlers = RB_ROOT;
ac->protocol = CEPH_AUTH_CEPHX;
ac->private = xi;
ac->ops = &ceph_x_ops;
return 0;
done_nomem:
kfree(xi);
if (ceph_x_ticketbuf_cachep)
kmem_cache_destroy(ceph_x_ticketbuf_cachep);
return ret;
}
#ifndef _FS_CEPH_AUTH_X_H
#define _FS_CEPH_AUTH_X_H
#include <linux/rbtree.h>
#include "crypto.h"
#include "auth.h"
#include "auth_x_protocol.h"
/*
* Handle ticket for a single service.
*/
struct ceph_x_ticket_handler {
struct rb_node node;
unsigned service;
struct ceph_crypto_key session_key;
struct ceph_timespec validity;
u64 secret_id;
struct ceph_buffer *ticket_blob;
unsigned long renew_after, expires;
};
struct ceph_x_authorizer {
struct ceph_buffer *buf;
unsigned service;
u64 nonce;
char reply_buf[128]; /* big enough for encrypted blob */
};
struct ceph_x_info {
struct ceph_crypto_key secret;
bool starting;
u64 server_challenge;
unsigned have_keys;
struct rb_root ticket_handlers;
struct ceph_x_authorizer auth_authorizer;
};
extern int ceph_x_init(struct ceph_auth_client *ac);
#endif
#ifndef __FS_CEPH_AUTH_X_PROTOCOL
#define __FS_CEPH_AUTH_X_PROTOCOL
#define CEPHX_GET_AUTH_SESSION_KEY 0x0100
#define CEPHX_GET_PRINCIPAL_SESSION_KEY 0x0200
#define CEPHX_GET_ROTATING_KEY 0x0400
/* common bits */
struct ceph_x_ticket_blob {
__u8 struct_v;
__le64 secret_id;
__le32 blob_len;
char blob[];
} __attribute__ ((packed));
/* common request/reply headers */
struct ceph_x_request_header {
__le16 op;
} __attribute__ ((packed));
struct ceph_x_reply_header {
__le16 op;
__le32 result;
} __attribute__ ((packed));
/* authenticate handshake */
/* initial hello (no reply header) */
struct ceph_x_server_challenge {
__u8 struct_v;
__le64 server_challenge;
} __attribute__ ((packed));
struct ceph_x_authenticate {
__u8 struct_v;
__le64 client_challenge;
__le64 key;
/* ticket blob */
} __attribute__ ((packed));
struct ceph_x_service_ticket_request {
__u8 struct_v;
__le32 keys;
} __attribute__ ((packed));
struct ceph_x_challenge_blob {
__le64 server_challenge;
__le64 client_challenge;
} __attribute__ ((packed));
/* authorize handshake */
/*
* The authorizer consists of two pieces:
* a - service id, ticket blob
* b - encrypted with session key
*/
struct ceph_x_authorize_a {
__u8 struct_v;
__le64 global_id;
__le32 service_id;
struct ceph_x_ticket_blob ticket_blob;
} __attribute__ ((packed));
struct ceph_x_authorize_b {
__u8 struct_v;
__le64 nonce;
} __attribute__ ((packed));
struct ceph_x_authorize_reply {
__u8 struct_v;
__le64 nonce_plus_one;
} __attribute__ ((packed));
/*
* encryption bundle
*/
#define CEPHX_ENC_MAGIC 0xff009cad8826aa55ull
struct ceph_x_encrypt_header {
__u8 struct_v;
__le64 magic;
} __attribute__ ((packed));
#endif
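Because every struct in this header is packed, the wire size is the plain sum of the field sizes. For example, the initial hello, struct ceph_x_server_challenge, is 1 + 8 = 9 bytes, and ceph_x_handle_reply() rejects any other length. A userspace decoding sketch follows; `decode_server_challenge()` is a hypothetical helper, and the -1 return stands in for the kernel's -EINVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode __u8 struct_v followed by __le64 server_challenge (no padding).
 * Returns 0 on success, -1 if len is not exactly 9 bytes.
 */
static int decode_server_challenge(const unsigned char *buf, size_t len,
				   uint8_t *struct_v, uint64_t *challenge)
{
	uint64_t v = 0;
	int i;

	if (len != 1 + 8)
		return -1;
	*struct_v = buf[0];
	/* bytes 1..8 hold the challenge, least significant byte first */
	for (i = 8; i >= 1; i--)
		v = (v << 8) | buf[i];
	*challenge = v;
	return 0;
}
```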
#include "ceph_debug.h"
#include "buffer.h"
#include "decode.h"
struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp)
{
struct ceph_buffer *b;
b = kmalloc(sizeof(*b), gfp);
if (!b)
return NULL;
b->vec.iov_base = kmalloc(len, gfp | __GFP_NOWARN);
if (b->vec.iov_base) {
b->is_vmalloc = false;
} else {
b->vec.iov_base = __vmalloc(len, gfp, PAGE_KERNEL);
if (!b->vec.iov_base) {
kfree(b);
return NULL;
}
b->is_vmalloc = true;
}
kref_init(&b->kref);
b->alloc_len = len;
b->vec.iov_len = len;
dout("buffer_new %p\n", b);
return b;
}
void ceph_buffer_release(struct kref *kref)
{
struct ceph_buffer *b = container_of(kref, struct ceph_buffer, kref);
dout("buffer_release %p\n", b);
if (b->vec.iov_base) {
if (b->is_vmalloc)
vfree(b->vec.iov_base);
else
kfree(b->vec.iov_base);
}
kfree(b);
}
int ceph_buffer_alloc(struct ceph_buffer *b, int len, gfp_t gfp)
{
b->vec.iov_base = kmalloc(len, gfp | __GFP_NOWARN);
if (b->vec.iov_base) {
b->is_vmalloc = false;
} else {
b->vec.iov_base = __vmalloc(len, gfp, PAGE_KERNEL);
b->is_vmalloc = true;
}
if (!b->vec.iov_base)
return -ENOMEM;
b->alloc_len = len;
b->vec.iov_len = len;
return 0;
}
int ceph_decode_buffer(struct ceph_buffer **b, void **p, void *end)
{
size_t len;
ceph_decode_need(p, end, sizeof(u32), bad);
len = ceph_decode_32(p);
dout("decode_buffer len %d\n", (int)len);
ceph_decode_need(p, end, len, bad);
*b = ceph_buffer_new(len, GFP_NOFS);
if (!*b)
return -ENOMEM;
ceph_decode_copy(p, (*b)->vec.iov_base, len);
return 0;
bad:
return -EINVAL;
}
#ifndef __FS_CEPH_BUFFER_H
#define __FS_CEPH_BUFFER_H
#include <linux/kref.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/types.h>
#include <linux/uio.h>
/*
* a simple reference-counted buffer.
*
* try kmalloc first (any size); fall back to vmalloc if that
* fails, which in practice happens mostly for large sizes.
*/
struct ceph_buffer {
struct kref kref;
struct kvec vec;
size_t alloc_len;
bool is_vmalloc;
};
extern struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp);
extern void ceph_buffer_release(struct kref *kref);
static inline struct ceph_buffer *ceph_buffer_get(struct ceph_buffer *b)
{
kref_get(&b->kref);
return b;
}
static inline void ceph_buffer_put(struct ceph_buffer *b)
{
kref_put(&b->kref, ceph_buffer_release);
}
extern int ceph_decode_buffer(struct ceph_buffer **b, void **p, void *end);
#endif
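The lifetime rules of struct ceph_buffer (one reference from creation, ceph_buffer_get() to share, release on the last ceph_buffer_put()) can be sketched in userspace as below. This is only an analogue under stated assumptions: `struct ref_buf` and its functions are hypothetical, a plain int replaces the atomic struct kref (so it is not thread-safe), and malloc replaces the kmalloc/__vmalloc fallback.

```c
#include <assert.h>
#include <stdlib.h>

struct ref_buf {
	int refs;	/* stand-in for struct kref */
	size_t len;
	void *data;
};

static struct ref_buf *ref_buf_new(size_t len)
{
	struct ref_buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = malloc(len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->refs = 1;	/* like kref_init() */
	b->len = len;
	return b;
}

static struct ref_buf *ref_buf_get(struct ref_buf *b)
{
	b->refs++;	/* like kref_get() */
	return b;
}

/* Drop one reference; frees the buffer and returns 1 on the last put. */
static int ref_buf_put(struct ref_buf *b)
{
	assert(b->refs > 0);
	if (--b->refs == 0) {	/* like kref_put() invoking the release fn */
		free(b->data);
		free(b);
		return 1;
	}
	return 0;
}
```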
#ifndef _FS_CEPH_DEBUG_H
#define _FS_CEPH_DEBUG_H
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#ifdef CONFIG_CEPH_FS_PRETTYDEBUG
/*
* wrap pr_debug to include a filename:lineno prefix on each line.
* this incurs some overhead (kernel size and execution time) due to
* the extra function call at each call site.
*/
# if defined(DEBUG) || defined(CONFIG_DYNAMIC_DEBUG)
extern const char *ceph_file_part(const char *s, int len);
# define dout(fmt, ...) \
pr_debug(" %12.12s:%-4d : " fmt, \
ceph_file_part(__FILE__, sizeof(__FILE__)), \
__LINE__, ##__VA_ARGS__)
# else
/* faux printk call just to see any compiler warnings. */
# define dout(fmt, ...) do { \
if (0) \
printk(KERN_DEBUG fmt, ##__VA_ARGS__); \
} while (0)
# endif
#else
/*
* or, just wrap pr_debug
*/
# define dout(fmt, ...) pr_debug(" " fmt, ##__VA_ARGS__)
#endif
#endif
/*
* Ceph 'frag' type
*/
#include "types.h"
int ceph_frag_compare(__u32 a, __u32 b)
{
unsigned va = ceph_frag_value(a);
unsigned vb = ceph_frag_value(b);
if (va < vb)
return -1;
if (va > vb)
return 1;
va = ceph_frag_bits(a);
vb = ceph_frag_bits(b);
if (va < vb)
return -1;
if (va > vb)
return 1;
return 0;
}
#ifndef _FS_CEPH_FRAG_H
#define _FS_CEPH_FRAG_H
/*
* "Frags" are a way to describe a subset of a 32-bit number space,
* using a mask and a value to match against that mask. Any given frag
* (subset of the number space) can be partitioned into 2^n sub-frags.
*
* Frags are encoded into a 32-bit word:
* 8 upper bits = "bits"
* 24 lower bits = "value"
* (We could go to 5+27 bits, but who cares.)
*
* We use the _most_ significant bits of the 24 bit value. This makes
* values logically sort.
*
* Unfortunately, because the "bits" field is still in the high bits, we
* can't sort encoded frags numerically. However, it does allow you
* to feed encoded frags as values into frag_contains_value.
*/
static inline __u32 ceph_frag_make(__u32 b, __u32 v)
{
return (b << 24) |
(v & (0xffffffu << (24-b)) & 0xffffffu);
}
static inline __u32 ceph_frag_bits(__u32 f)
{
return f >> 24;
}
static inline __u32 ceph_frag_value(__u32 f)
{
return f & 0xffffffu;
}
static inline __u32 ceph_frag_mask(__u32 f)
{
return (0xffffffu << (24-ceph_frag_bits(f))) & 0xffffffu;
}
static inline __u32 ceph_frag_mask_shift(__u32 f)
{
return 24 - ceph_frag_bits(f);
}
static inline int ceph_frag_contains_value(__u32 f, __u32 v)
{
return (v & ceph_frag_mask(f)) == ceph_frag_value(f);
}
static inline int ceph_frag_contains_frag(__u32 f, __u32 sub)
{
/* is sub as specific as us, and contained by us? */
return ceph_frag_bits(sub) >= ceph_frag_bits(f) &&
(ceph_frag_value(sub) & ceph_frag_mask(f)) == ceph_frag_value(f);
}
static inline __u32 ceph_frag_parent(__u32 f)
{
return ceph_frag_make(ceph_frag_bits(f) - 1,
ceph_frag_value(f) & (ceph_frag_mask(f) << 1));
}
static inline int ceph_frag_is_left_child(__u32 f)
{
return ceph_frag_bits(f) > 0 &&
(ceph_frag_value(f) & (0x1000000 >> ceph_frag_bits(f))) == 0;
}
static inline int ceph_frag_is_right_child(__u32 f)
{
return ceph_frag_bits(f) > 0 &&
(ceph_frag_value(f) & (0x1000000 >> ceph_frag_bits(f))) != 0;
}
static inline __u32 ceph_frag_sibling(__u32 f)
{
return ceph_frag_make(ceph_frag_bits(f),
ceph_frag_value(f) ^ (0x1000000 >> ceph_frag_bits(f)));
}
static inline __u32 ceph_frag_left_child(__u32 f)
{
return ceph_frag_make(ceph_frag_bits(f)+1, ceph_frag_value(f));
}
static inline __u32 ceph_frag_right_child(__u32 f)
{
return ceph_frag_make(ceph_frag_bits(f)+1,
ceph_frag_value(f) | (0x1000000 >> (1+ceph_frag_bits(f))));
}
static inline __u32 ceph_frag_make_child(__u32 f, int by, int i)
{
int newbits = ceph_frag_bits(f) + by;
return ceph_frag_make(newbits,
ceph_frag_value(f) | (i << (24 - newbits)));
}
static inline int ceph_frag_is_leftmost(__u32 f)
{
return ceph_frag_value(f) == 0;
}
static inline int ceph_frag_is_rightmost(__u32 f)
{
return ceph_frag_value(f) == ceph_frag_mask(f);
}
static inline __u32 ceph_frag_next(__u32 f)
{
return ceph_frag_make(ceph_frag_bits(f),
ceph_frag_value(f) + (0x1000000 >> ceph_frag_bits(f)));
}
/*
* comparator to sort frags logically, as when traversing the
* number space in ascending order...
*/
int ceph_frag_compare(__u32 a, __u32 b);
#endif
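A worked example of the frag encoding: the root frag is bits=0, value=0. Its right child has bits=1 and the top bit of the 24-bit value set (0x800000), so it encodes as 0x01800000 and contains every value whose top bit is set. The sketch below copies a few of the helpers above into userspace (uint32_t for __u32) and adds a right-child test that checks the child bit against non-zero. It also shows why encoded frags cannot be compared numerically.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t frag_make(uint32_t b, uint32_t v)
{
	return (b << 24) | (v & (0xffffffu << (24 - b)) & 0xffffffu);
}
static uint32_t frag_bits(uint32_t f)  { return f >> 24; }
static uint32_t frag_value(uint32_t f) { return f & 0xffffffu; }
static uint32_t frag_mask(uint32_t f)
{
	return (0xffffffu << (24 - frag_bits(f))) & 0xffffffu;
}
static int frag_contains_value(uint32_t f, uint32_t v)
{
	return (v & frag_mask(f)) == frag_value(f);
}
/* the child bit is bit (24 - bits) counted from the top of the value */
static int frag_is_right_child(uint32_t f)
{
	return frag_bits(f) > 0 &&
	       (frag_value(f) & (0x1000000u >> frag_bits(f))) != 0;
}
static uint32_t frag_right_child(uint32_t f)
{
	return frag_make(frag_bits(f) + 1,
			 frag_value(f) | (0x1000000u >> (1 + frag_bits(f))));
}
/* logical order: by value, then less specific (fewer bits) first */
static int frag_compare(uint32_t a, uint32_t b)
{
	uint32_t va = frag_value(a), vb = frag_value(b);

	if (va != vb)
		return va < vb ? -1 : 1;
	va = frag_bits(a);
	vb = frag_bits(b);
	if (va != vb)
		return va < vb ? -1 : 1;
	return 0;
}
```

For instance, 0x01800000 (value 0x800000) is numerically smaller than 0x02400000 (value 0x400000), yet it sorts after it logically, because the "bits" field occupies the high byte.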
/*
* Some non-inline ceph helpers
*/
#include "types.h"
/*
* return true if @layout appears to be valid
*/
int ceph_file_layout_is_valid(const struct ceph_file_layout *layout)
{
__u32 su = le32_to_cpu(layout->fl_stripe_unit);
__u32 sc = le32_to_cpu(layout->fl_stripe_count);
__u32 os = le32_to_cpu(layout->fl_object_size);
/* stripe unit, object size must be non-zero, 64k increment */
if (!su || (su & (CEPH_MIN_STRIPE_UNIT-1)))
return 0;
if (!os || (os & (CEPH_MIN_STRIPE_UNIT-1)))
return 0;
/* object size must be a multiple of stripe unit */
if (os < su || os % su)
return 0;
/* stripe count must be non-zero */
if (!sc)
return 0;
return 1;
}
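To make the layout rules concrete, the same checks can be run on host-order values. The sketch assumes CEPH_MIN_STRIPE_UNIT is 65536, based only on the "64k increment" comment; `layout_is_valid()` is a hypothetical userspace copy.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_STRIPE_UNIT 65536u	/* assumed from the "64k increment" comment */

static int layout_is_valid(uint32_t su, uint32_t sc, uint32_t os)
{
	/* stripe unit and object size: non-zero multiples of 64k */
	if (!su || (su & (MIN_STRIPE_UNIT - 1)))
		return 0;
	if (!os || (os & (MIN_STRIPE_UNIT - 1)))
		return 0;
	/* object size must be a whole number of stripe units */
	if (os < su || os % su)
		return 0;
	/* stripe count must be non-zero */
	return sc != 0;
}
```

A 4 MiB stripe unit with a 4 MiB object and a stripe count of 1 passes. A 128 KiB stripe unit with a 192 KiB object fails, even though both are 64k multiples, because 192 KiB is not a multiple of 128 KiB.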
int ceph_flags_to_mode(int flags)
{
#ifdef O_DIRECTORY /* fixme */
if ((flags & O_DIRECTORY) == O_DIRECTORY)
return CEPH_FILE_MODE_PIN;
#endif
#ifdef O_LAZY
if (flags & O_LAZY)
return CEPH_FILE_MODE_LAZY;
#endif
if ((flags & O_APPEND) == O_APPEND)
flags |= O_WRONLY;
flags &= O_ACCMODE;
if ((flags & O_RDWR) == O_RDWR)
return CEPH_FILE_MODE_RDWR;
if ((flags & O_WRONLY) == O_WRONLY)
return CEPH_FILE_MODE_WR;
return CEPH_FILE_MODE_RD;
}
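ceph_flags_to_mode() checks in a fixed order: a directory open only pins the inode, O_APPEND forces write access, and the O_ACCMODE bits then pick read, write, or read/write. A userspace sketch with POSIX <fcntl.h> flags follows. The `enum file_mode` values are hypothetical stand-ins for CEPH_FILE_MODE_*, and O_LAZY is omitted.

```c
#include <assert.h>
#include <fcntl.h>

enum file_mode { MODE_PIN, MODE_RD, MODE_WR, MODE_RDWR };

static enum file_mode flags_to_mode(int flags)
{
#ifdef O_DIRECTORY
	if ((flags & O_DIRECTORY) == O_DIRECTORY)
		return MODE_PIN;	/* directories only pin the inode */
#endif
	if ((flags & O_APPEND) == O_APPEND)
		flags |= O_WRONLY;	/* appending implies writing */
	flags &= O_ACCMODE;
	if ((flags & O_RDWR) == O_RDWR)
		return MODE_RDWR;
	if ((flags & O_WRONLY) == O_WRONLY)
		return MODE_WR;
	return MODE_RD;
}
```

One consequence of this order is that O_RDONLY | O_APPEND maps to the write-only mode.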
int ceph_caps_for_mode(int mode)
{
switch (mode) {
case CEPH_FILE_MODE_PIN:
return CEPH_CAP_PIN;
case CEPH_FILE_MODE_RD:
return CEPH_CAP_PIN | CEPH_CAP_FILE_SHARED |
CEPH_CAP_FILE_RD | CEPH_CAP_FILE_CACHE;
case CEPH_FILE_MODE_RDWR:
return CEPH_CAP_PIN | CEPH_CAP_FILE_SHARED |
CEPH_CAP_FILE_EXCL |
CEPH_CAP_FILE_RD | CEPH_CAP_FILE_CACHE |
CEPH_CAP_FILE_WR | CEPH_CAP_FILE_BUFFER |
CEPH_CAP_AUTH_SHARED | CEPH_CAP_AUTH_EXCL |
CEPH_CAP_XATTR_SHARED | CEPH_CAP_XATTR_EXCL;
case CEPH_FILE_MODE_WR:
return CEPH_CAP_PIN | CEPH_CAP_FILE_SHARED |
CEPH_CAP_FILE_EXCL |
CEPH_CAP_FILE_WR | CEPH_CAP_FILE_BUFFER |
CEPH_CAP_AUTH_SHARED | CEPH_CAP_AUTH_EXCL |
CEPH_CAP_XATTR_SHARED | CEPH_CAP_XATTR_EXCL;
}
return 0;
}
#include "types.h"
/*
* Robert Jenkins' hash function.
* http://burtleburtle.net/bob/hash/evahash.html
* This is in the public domain.
*/
#define mix(a, b, c) \
do { \
a = a - b; a = a - c; a = a ^ (c >> 13); \
b = b - c; b = b - a; b = b ^ (a << 8); \
c = c - a; c = c - b; c = c ^ (b >> 13); \
a = a - b; a = a - c; a = a ^ (c >> 12); \
b = b - c; b = b - a; b = b ^ (a << 16); \
c = c - a; c = c - b; c = c ^ (b >> 5); \
a = a - b; a = a - c; a = a ^ (c >> 3); \
b = b - c; b = b - a; b = b ^ (a << 10); \
c = c - a; c = c - b; c = c ^ (b >> 15); \
} while (0)
unsigned ceph_str_hash_rjenkins(const char *str, unsigned length)
{
const unsigned char *k = (const unsigned char *)str;
__u32 a, b, c; /* the internal state */
__u32 len; /* how many key bytes still need mixing */
/* Set up the internal state */
len = length;
a = 0x9e3779b9; /* the golden ratio; an arbitrary value */
b = a;
c = 0; /* variable initialization of internal state */
/* handle most of the key */
while (len >= 12) {
a = a + (k[0] + ((__u32)k[1] << 8) + ((__u32)k[2] << 16) +
((__u32)k[3] << 24));
b = b + (k[4] + ((__u32)k[5] << 8) + ((__u32)k[6] << 16) +
((__u32)k[7] << 24));
c = c + (k[8] + ((__u32)k[9] << 8) + ((__u32)k[10] << 16) +
((__u32)k[11] << 24));
mix(a, b, c);
k = k + 12;
len = len - 12;
}
/* handle the last 11 bytes */
c = c + length;
switch (len) { /* all the case statements fall through */
case 11:
c = c + ((__u32)k[10] << 24);
case 10:
c = c + ((__u32)k[9] << 16);
case 9:
c = c + ((__u32)k[8] << 8);
/* the first byte of c is reserved for the length */
case 8:
b = b + ((__u32)k[7] << 24);
case 7:
b = b + ((__u32)k[6] << 16);
case 6:
b = b + ((__u32)k[5] << 8);
case 5:
b = b + k[4];
case 4:
a = a + ((__u32)k[3] << 24);
case 3:
a = a + ((__u32)k[2] << 16);
case 2:
a = a + ((__u32)k[1] << 8);
case 1:
a = a + k[0];
/* case 0: nothing left to add */
}
mix(a, b, c);
return c;
}
/*
* linux dcache hash
*/
unsigned ceph_str_hash_linux(const char *str, unsigned length)
{
unsigned long hash = 0;
unsigned char c;
while (length--) {
c = *str++;
hash = (hash + (c << 4) + (c >> 4)) * 11;
}
return hash;
}
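A worked example of the dcache hash: for the one-byte string "a" (97), the loop computes (0 + (97 << 4) + (97 >> 4)) * 11 = (1552 + 6) * 11 = 17138. Appending "b" (98) gives (17138 + 1568 + 6) * 11 = 205832. The userspace copy below, a hypothetical `str_hash_linux()`, reproduces those values.

```c
#include <assert.h>

static unsigned str_hash_linux(const char *str, unsigned length)
{
	unsigned long hash = 0;
	unsigned char c;

	while (length--) {
		c = *str++;
		/* add the byte shifted left and right by 4, then scale by 11 */
		hash = (hash + (c << 4) + (c >> 4)) * 11;
	}
	return hash;	/* truncated to unsigned, as in the kernel version */
}
```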
unsigned ceph_str_hash(int type, const char *s, unsigned len)
{
switch (type) {
case CEPH_STR_HASH_LINUX:
return ceph_str_hash_linux(s, len);
case CEPH_STR_HASH_RJENKINS:
return ceph_str_hash_rjenkins(s, len);
default:
return -1;
}
}
const char *ceph_str_hash_name(int type)
{
switch (type) {
case CEPH_STR_HASH_LINUX:
return "linux";
case CEPH_STR_HASH_RJENKINS:
return "rjenkins";
default:
return "unknown";
}
}
#ifndef _FS_CEPH_HASH_H
#define _FS_CEPH_HASH_H
#define CEPH_STR_HASH_LINUX 0x1 /* linux dcache hash */
#define CEPH_STR_HASH_RJENKINS 0x2 /* robert jenkins' */
extern unsigned ceph_str_hash_linux(const char *s, unsigned len);
extern unsigned ceph_str_hash_rjenkins(const char *s, unsigned len);
extern unsigned ceph_str_hash(int type, const char *s, unsigned len);
extern const char *ceph_str_hash_name(int type);
#endif
/*
* Ceph string constants
*/
#include "types.h"
const char *ceph_entity_type_name(int type)
{
switch (type) {
case CEPH_ENTITY_TYPE_MDS: return "mds";
case CEPH_ENTITY_TYPE_OSD: return "osd";
case CEPH_ENTITY_TYPE_MON: return "mon";
case CEPH_ENTITY_TYPE_CLIENT: return "client";
case CEPH_ENTITY_TYPE_ADMIN: return "admin";
case CEPH_ENTITY_TYPE_AUTH: return "auth";
default: return "unknown";
}
}
const char *ceph_osd_op_name(int op)
{
switch (op) {
case CEPH_OSD_OP_READ: return "read";
case CEPH_OSD_OP_STAT: return "stat";
case CEPH_OSD_OP_MASKTRUNC: return "masktrunc";
case CEPH_OSD_OP_WRITE: return "write";
case CEPH_OSD_OP_DELETE: return "delete";
case CEPH_OSD_OP_TRUNCATE: return "truncate";
case CEPH_OSD_OP_ZERO: return "zero";
case CEPH_OSD_OP_WRITEFULL: return "writefull";
case CEPH_OSD_OP_APPEND: return "append";
case CEPH_OSD_OP_STARTSYNC: return "startsync";
case CEPH_OSD_OP_SETTRUNC: return "settrunc";
case CEPH_OSD_OP_TRIMTRUNC: return "trimtrunc";
case CEPH_OSD_OP_TMAPUP: return "tmapup";
case CEPH_OSD_OP_TMAPGET: return "tmapget";
case CEPH_OSD_OP_TMAPPUT: return "tmapput";
case CEPH_OSD_OP_GETXATTR: return "getxattr";
case CEPH_OSD_OP_GETXATTRS: return "getxattrs";
case CEPH_OSD_OP_SETXATTR: return "setxattr";
case CEPH_OSD_OP_SETXATTRS: return "setxattrs";
case CEPH_OSD_OP_RESETXATTRS: return "resetxattrs";
case CEPH_OSD_OP_RMXATTR: return "rmxattr";
case CEPH_OSD_OP_PULL: return "pull";
case CEPH_OSD_OP_PUSH: return "push";
case CEPH_OSD_OP_BALANCEREADS: return "balance-reads";
case CEPH_OSD_OP_UNBALANCEREADS: return "unbalance-reads";
case CEPH_OSD_OP_SCRUB: return "scrub";
case CEPH_OSD_OP_WRLOCK: return "wrlock";
case CEPH_OSD_OP_WRUNLOCK: return "wrunlock";
case CEPH_OSD_OP_RDLOCK: return "rdlock";
case CEPH_OSD_OP_RDUNLOCK: return "rdunlock";
case CEPH_OSD_OP_UPLOCK: return "uplock";
case CEPH_OSD_OP_DNLOCK: return "dnlock";
case CEPH_OSD_OP_CALL: return "call";
case CEPH_OSD_OP_PGLS: return "pgls";
}
return "???";
}
const char *ceph_mds_state_name(int s)
{
switch (s) {
/* down and out */
case CEPH_MDS_STATE_DNE: return "down:dne";
case CEPH_MDS_STATE_STOPPED: return "down:stopped";
/* up and out */
case CEPH_MDS_STATE_BOOT: return "up:boot";
case CEPH_MDS_STATE_STANDBY: return "up:standby";
case CEPH_MDS_STATE_STANDBY_REPLAY: return "up:standby-replay";
case CEPH_MDS_STATE_CREATING: return "up:creating";
case CEPH_MDS_STATE_STARTING: return "up:starting";
/* up and in */
case CEPH_MDS_STATE_REPLAY: return "up:replay";
case CEPH_MDS_STATE_RESOLVE: return "up:resolve";
case CEPH_MDS_STATE_RECONNECT: return "up:reconnect";
case CEPH_MDS_STATE_REJOIN: return "up:rejoin";
case CEPH_MDS_STATE_CLIENTREPLAY: return "up:clientreplay";
case CEPH_MDS_STATE_ACTIVE: return "up:active";
case CEPH_MDS_STATE_STOPPING: return "up:stopping";
}
return "???";
}
const char *ceph_session_op_name(int op)
{
switch (op) {
case CEPH_SESSION_REQUEST_OPEN: return "request_open";
case CEPH_SESSION_OPEN: return "open";
case CEPH_SESSION_REQUEST_CLOSE: return "request_close";
case CEPH_SESSION_CLOSE: return "close";
case CEPH_SESSION_REQUEST_RENEWCAPS: return "request_renewcaps";
case CEPH_SESSION_RENEWCAPS: return "renewcaps";
case CEPH_SESSION_STALE: return "stale";
case CEPH_SESSION_RECALL_STATE: return "recall_state";
}
return "???";
}
const char *ceph_mds_op_name(int op)
{
switch (op) {
case CEPH_MDS_OP_LOOKUP: return "lookup";
case CEPH_MDS_OP_LOOKUPHASH: return "lookuphash";
case CEPH_MDS_OP_LOOKUPPARENT: return "lookupparent";
case CEPH_MDS_OP_GETATTR: return "getattr";
case CEPH_MDS_OP_SETXATTR: return "setxattr";
case CEPH_MDS_OP_SETATTR: return "setattr";
case CEPH_MDS_OP_RMXATTR: return "rmxattr";
case CEPH_MDS_OP_READDIR: return "readdir";
case CEPH_MDS_OP_MKNOD: return "mknod";
case CEPH_MDS_OP_LINK: return "link";
case CEPH_MDS_OP_UNLINK: return "unlink";
case CEPH_MDS_OP_RENAME: return "rename";
case CEPH_MDS_OP_MKDIR: return "mkdir";
case CEPH_MDS_OP_RMDIR: return "rmdir";
case CEPH_MDS_OP_SYMLINK: return "symlink";
case CEPH_MDS_OP_CREATE: return "create";
case CEPH_MDS_OP_OPEN: return "open";
case CEPH_MDS_OP_LOOKUPSNAP: return "lookupsnap";
case CEPH_MDS_OP_LSSNAP: return "lssnap";
case CEPH_MDS_OP_MKSNAP: return "mksnap";
case CEPH_MDS_OP_RMSNAP: return "rmsnap";
}
return "???";
}
const char *ceph_cap_op_name(int op)
{
switch (op) {
case CEPH_CAP_OP_GRANT: return "grant";
case CEPH_CAP_OP_REVOKE: return "revoke";
case CEPH_CAP_OP_TRUNC: return "trunc";
case CEPH_CAP_OP_EXPORT: return "export";
case CEPH_CAP_OP_IMPORT: return "import";
case CEPH_CAP_OP_UPDATE: return "update";
case CEPH_CAP_OP_DROP: return "drop";
case CEPH_CAP_OP_FLUSH: return "flush";
case CEPH_CAP_OP_FLUSH_ACK: return "flush_ack";
case CEPH_CAP_OP_FLUSHSNAP: return "flushsnap";
case CEPH_CAP_OP_FLUSHSNAP_ACK: return "flushsnap_ack";
case CEPH_CAP_OP_RELEASE: return "release";
case CEPH_CAP_OP_RENEW: return "renew";
}
return "???";
}
const char *ceph_lease_op_name(int o)
{
switch (o) {
case CEPH_MDS_LEASE_REVOKE: return "revoke";
case CEPH_MDS_LEASE_RELEASE: return "release";
case CEPH_MDS_LEASE_RENEW: return "renew";
case CEPH_MDS_LEASE_REVOKE_ACK: return "revoke_ack";
}
return "???";
}
const char *ceph_snap_op_name(int o)
{
switch (o) {
case CEPH_SNAP_OP_UPDATE: return "update";
case CEPH_SNAP_OP_CREATE: return "create";
case CEPH_SNAP_OP_DESTROY: return "destroy";
case CEPH_SNAP_OP_SPLIT: return "split";
}
return "???";
}
#ifdef __KERNEL__
# include <linux/slab.h>
#else
# include <stdlib.h>
# include <assert.h>
# define kfree(x) do { if (x) free(x); } while (0)
# define BUG_ON(x) assert(!(x))
#endif
#include "crush.h"
const char *crush_bucket_alg_name(int alg)
{
switch (alg) {
case CRUSH_BUCKET_UNIFORM: return "uniform";
case CRUSH_BUCKET_LIST: return "list";
case CRUSH_BUCKET_TREE: return "tree";
case CRUSH_BUCKET_STRAW: return "straw";
default: return "unknown";
}
}
/**
* crush_get_bucket_item_weight - Get weight of an item in given bucket
* @b: bucket pointer
* @p: item index in bucket
*/
int crush_get_bucket_item_weight(struct crush_bucket *b, int p)
{
if (p >= b->size)
return 0;
switch (b->alg) {
case CRUSH_BUCKET_UNIFORM:
return ((struct crush_bucket_uniform *)b)->item_weight;
case CRUSH_BUCKET_LIST:
return ((struct crush_bucket_list *)b)->item_weights[p];
case CRUSH_BUCKET_TREE:
if (p & 1)
return ((struct crush_bucket_tree *)b)->node_weights[p];
return 0;
case CRUSH_BUCKET_STRAW:
return ((struct crush_bucket_straw *)b)->item_weights[p];
}
return 0;
}
/**
* crush_calc_parents - Calculate parent vectors for the given crush map.
* @map: crush_map pointer
*/
void crush_calc_parents(struct crush_map *map)
{
int i, b, c;
for (b = 0; b < map->max_buckets; b++) {
if (map->buckets[b] == NULL)
continue;
for (i = 0; i < map->buckets[b]->size; i++) {
c = map->buckets[b]->items[i];
BUG_ON(c >= map->max_devices ||
c < -map->max_buckets);
if (c >= 0)
map->device_parents[c] = map->buckets[b]->id;
else
map->bucket_parents[-1-c] = map->buckets[b]->id;
}
}
}
void crush_destroy_bucket_uniform(struct crush_bucket_uniform *b)
{
kfree(b->h.perm);
kfree(b->h.items);
kfree(b);
}
void crush_destroy_bucket_list(struct crush_bucket_list *b)
{
kfree(b->item_weights);
kfree(b->sum_weights);
kfree(b->h.perm);
kfree(b->h.items);
kfree(b);
}
void crush_destroy_bucket_tree(struct crush_bucket_tree *b)
{
kfree(b->h.perm);
kfree(b->h.items);
kfree(b->node_weights);
kfree(b);
}
void crush_destroy_bucket_straw(struct crush_bucket_straw *b)
{
kfree(b->straws);
kfree(b->item_weights);
kfree(b->h.perm);
kfree(b->h.items);
kfree(b);
}
void crush_destroy_bucket(struct crush_bucket *b)
{
switch (b->alg) {
case CRUSH_BUCKET_UNIFORM:
crush_destroy_bucket_uniform((struct crush_bucket_uniform *)b);
break;
case CRUSH_BUCKET_LIST:
crush_destroy_bucket_list((struct crush_bucket_list *)b);
break;
case CRUSH_BUCKET_TREE:
crush_destroy_bucket_tree((struct crush_bucket_tree *)b);
break;
case CRUSH_BUCKET_STRAW:
crush_destroy_bucket_straw((struct crush_bucket_straw *)b);
break;
}
}
/**
* crush_destroy - Destroy a crush_map
* @map: crush_map pointer
*/
void crush_destroy(struct crush_map *map)
{
int b;
/* buckets */
if (map->buckets) {
for (b = 0; b < map->max_buckets; b++) {
if (map->buckets[b] == NULL)
continue;
crush_destroy_bucket(map->buckets[b]);
}
kfree(map->buckets);
}
/* rules */
if (map->rules) {
for (b = 0; b < map->max_rules; b++)
kfree(map->rules[b]);
kfree(map->rules);
}
kfree(map->bucket_parents);
kfree(map->device_parents);
kfree(map);
}
#ifndef _CRUSH_CRUSH_H
#define _CRUSH_CRUSH_H
#include <linux/types.h>
/*
* CRUSH is a pseudo-random data distribution algorithm that
* efficiently distributes input values (typically, data objects)
* across a heterogeneous, structured storage cluster.
*
* The algorithm was originally described in detail in this paper
* (although the algorithm has evolved somewhat since then):
*
* http://www.ssrc.ucsc.edu/Papers/weil-sc06.pdf
*
* LGPL2
*/
#define CRUSH_MAGIC 0x00010000ul /* for detecting algorithm revisions */
#define CRUSH_MAX_DEPTH 10 /* max crush hierarchy depth */
#define CRUSH_MAX_SET 10 /* max size of a mapping result */
/*
* CRUSH uses user-defined "rules" to describe how inputs should be
* mapped to devices. A rule consists of a sequence of steps to perform
* to generate the set of output devices.
*/
struct crush_rule_step {
__u32 op;
__s32 arg1;
__s32 arg2;
};
/* step op codes */
enum {
CRUSH_RULE_NOOP = 0,
CRUSH_RULE_TAKE = 1, /* arg1 = value to start with */
CRUSH_RULE_CHOOSE_FIRSTN = 2, /* arg1 = num items to pick */
/* arg2 = type */
CRUSH_RULE_CHOOSE_INDEP = 3, /* same */
CRUSH_RULE_EMIT = 4, /* no args */
CRUSH_RULE_CHOOSE_LEAF_FIRSTN = 6,
CRUSH_RULE_CHOOSE_LEAF_INDEP = 7,
};
/*
* for specifying choose num (arg1) relative to the max parameter
* passed to do_rule
*/
#define CRUSH_CHOOSE_N 0
#define CRUSH_CHOOSE_N_MINUS(x) (-(x))
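The comment above implies that a non-positive arg1 counts relative to the maximum result size passed to do_rule. For example, with a max of 3, CRUSH_CHOOSE_N picks 3 items and CRUSH_CHOOSE_N_MINUS(1) picks 2. The mapper is not part of this section, so the sketch below is only an assumption about how such a count would be resolved. `resolve_choose_num()` is hypothetical, and it returns 0 when nothing remains to pick.

```c
#include <assert.h>

#define CRUSH_CHOOSE_N 0
#define CRUSH_CHOOSE_N_MINUS(x) (-(x))

static int resolve_choose_num(int arg1, int result_max)
{
	int n = arg1;

	if (n <= 0) {
		n += result_max;	/* relative to the max */
		if (n <= 0)
			return 0;	/* nothing left to choose */
	}
	return n;			/* positive arg1 used as-is */
}
```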
/*
* The rule mask is used to describe what the rule is intended for.
* Given a ruleset and size of output set, we search through the
* rule list for a matching rule_mask.
*/
struct crush_rule_mask {
__u8 ruleset;
__u8 type;
__u8 min_size;
__u8 max_size;
};
struct crush_rule {
__u32 len;
struct crush_rule_mask mask;
struct crush_rule_step steps[0];
};
#define crush_rule_size(len) (sizeof(struct crush_rule) + \
(len)*sizeof(struct crush_rule_step))
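A rule is a header plus a trailing array of steps, allocated in one block sized by crush_rule_size(). The sketch below builds a three-step rule: take a starting bucket, choose leaves under distinct items of a given type, then emit. The structs are userspace copies that use a C99 flexible array. `make_example_rule()` and all the mask and argument values are hypothetical, and reading arg2 as the item type for the CHOOSE_LEAF step assumes it follows the CHOOSE_FIRSTN convention noted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct crush_rule_step { uint32_t op; int32_t arg1; int32_t arg2; };
struct crush_rule_mask { uint8_t ruleset, type, min_size, max_size; };
struct crush_rule {
	uint32_t len;
	struct crush_rule_mask mask;
	struct crush_rule_step steps[];	/* steps[0] in the kernel header */
};

enum { CRUSH_RULE_TAKE = 1, CRUSH_RULE_EMIT = 4,
       CRUSH_RULE_CHOOSE_LEAF_FIRSTN = 6 };

#define crush_rule_size(len) (sizeof(struct crush_rule) + \
			      (len) * sizeof(struct crush_rule_step))

static struct crush_rule *make_example_rule(int32_t root_id, int32_t item_type)
{
	struct crush_rule *r = malloc(crush_rule_size(3));

	if (!r)
		return NULL;
	r->len = 3;
	r->mask = (struct crush_rule_mask){ 0, 1, 1, 10 };	/* illustrative */
	/* start at bucket root_id (bucket ids are negative) */
	r->steps[0] = (struct crush_rule_step){ CRUSH_RULE_TAKE, root_id, 0 };
	/* arg1 = 0: pick as many as the caller's max; arg2 = item type */
	r->steps[1] = (struct crush_rule_step){
		CRUSH_RULE_CHOOSE_LEAF_FIRSTN, 0, item_type };
	r->steps[2] = (struct crush_rule_step){ CRUSH_RULE_EMIT, 0, 0 };
	return r;
}
```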
/*
* A bucket is a named container of other items (either devices or
* other buckets). Items within a bucket are chosen using one of a
* few different algorithms. The table summarizes how the speed of
* each option measures up against mapping stability when items are
* added or removed.
*
* Bucket Alg Speed Additions Removals
* ------------------------------------------------
* uniform O(1) poor poor
* list O(n) optimal poor
* tree O(log n) good good
* straw O(n) optimal optimal
*/
enum {
CRUSH_BUCKET_UNIFORM = 1,
CRUSH_BUCKET_LIST = 2,
CRUSH_BUCKET_TREE = 3,
CRUSH_BUCKET_STRAW = 4
};
extern const char *crush_bucket_alg_name(int alg);
struct crush_bucket {
__s32 id; /* this'll be negative */
__u16 type; /* non-zero; type=0 is reserved for devices */
__u8 alg; /* one of CRUSH_BUCKET_* */
__u8 hash; /* which hash function to use, CRUSH_HASH_* */
__u32 weight; /* 16-bit fixed point */
__u32 size; /* num items */
__s32 *items;
/*
* cached random permutation: used for uniform bucket and for
* the linear search fallback for the other bucket types.
*/
__u32 perm_x; /* @x for which *perm is defined */
__u32 perm_n; /* num elements of *perm that are permuted/defined */
__u32 *perm;
};
struct crush_bucket_uniform {
struct crush_bucket h;
__u32 item_weight; /* 16-bit fixed point; all items equally weighted */
};
struct crush_bucket_list {
struct crush_bucket h;
__u32 *item_weights; /* 16-bit fixed point */
__u32 *sum_weights; /* 16-bit fixed point. element i is sum
of weights 0..i, inclusive */
};
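The sum_weights array of a list bucket is a running prefix sum of item_weights. Assuming "16-bit fixed point" means 16 fractional bits (so 0x10000 is a weight of 1.0), weights of 1.0, 0.5 and 2.0 give sums of 0x10000, 0x18000 and 0x38000. A sketch of filling the array; `list_bucket_sums()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIXED_ONE 0x10000u	/* 1.0, assuming 16 fractional bits */

/* sum[i] = w[0] + ... + w[i], inclusive */
static void list_bucket_sums(const uint32_t *w, uint32_t *sum, size_t n)
{
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		acc += w[i];
		sum[i] = acc;
	}
}
```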
struct crush_bucket_tree {
struct crush_bucket h; /* note: h.size is _tree_ size, not number of
actual items */
__u8 num_nodes;
__u32 *node_weights;
};
struct crush_bucket_straw {
struct crush_bucket h;
__u32 *item_weights; /* 16-bit fixed point */
__u32 *straws; /* 16-bit fixed point */
};
/*
* CRUSH map includes all buckets, rules, etc.
*/
struct crush_map {
struct crush_bucket **buckets;
struct crush_rule **rules;
/*
* Parent pointers to identify the parent bucket of a device or
* bucket in the hierarchy. If an item appears more than
* once, this is the _last_ time it appeared (where buckets
* are processed in bucket id order, from -1 on down to
* -max_buckets).
*/
__u32 *bucket_parents;
__u32 *device_parents;
__s32 max_buckets;
__u32 max_rules;
__s32 max_devices;
};
/* crush.c */
extern int crush_get_bucket_item_weight(struct crush_bucket *b, int pos);
extern void crush_calc_parents(struct crush_map *map);
extern void crush_destroy_bucket_uniform(struct crush_bucket_uniform *b);
extern void crush_destroy_bucket_list(struct crush_bucket_list *b);
extern void crush_destroy_bucket_tree(struct crush_bucket_tree *b);
extern void crush_destroy_bucket_straw(struct crush_bucket_straw *b);
extern void crush_destroy_bucket(struct crush_bucket *b);
extern void crush_destroy(struct crush_map *map);
#endif
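The `bucket_parents`/`device_parents` comment above is easier to follow with a concrete walk. What follows is a minimal userspace sketch of how such parent arrays could be filled. It is not the kernel's `crush_calc_parents()`, whose body is in the collapsed crush.c. It assumes hypothetical trimmed-down structs `demo_bucket`/`demo_map`, stores bucket id `b` at index `-1 - b`, and uses signed `int32_t` parent entries (the header declares `__u32`) so that negative bucket ids read naturally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for crush_bucket / crush_map. */
struct demo_bucket {
	int32_t id;             /* bucket ids are negative: -1, -2, ... */
	uint32_t size;          /* number of items */
	const int32_t *items;   /* >= 0: device id, < 0: bucket id */
};

struct demo_map {
	struct demo_bucket **buckets;   /* bucket id b lives at index -1 - b */
	int32_t max_buckets;
	int32_t max_devices;
	int32_t *bucket_parents;        /* indexed by -1 - bucket id */
	int32_t *device_parents;        /* indexed by device id */
};

/*
 * Visit buckets in id order (-1, -2, ..., -max_buckets) and record each
 * item's parent. An item listed under several buckets keeps the parent
 * seen last, because later buckets overwrite earlier entries.
 * Returns 0, or -1 if an item id is out of range.
 */
static int demo_calc_parents(struct demo_map *map)
{
	for (int32_t b = 0; b < map->max_buckets; b++) {
		const struct demo_bucket *bkt = map->buckets[b];

		if (bkt == NULL)
			continue;
		for (uint32_t i = 0; i < bkt->size; i++) {
			int32_t c = bkt->items[i];

			if (c >= map->max_devices || c < -map->max_buckets)
				return -1;
			if (c >= 0)
				map->device_parents[c] = bkt->id;
			else
				map->bucket_parents[-1 - c] = bkt->id;
		}
	}
	return 0;
}
```

Because buckets are visited from -1 downwards and each assignment overwrites the previous one, a device listed under both bucket -1 and bucket -2 ends up with parent -2, which matches the "last time it appeared" rule.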
#include <linux/types.h>
#include "hash.h"
/*
* Robert Jenkins' function for mixing 32-bit values
* http://burtleburtle.net/bob/hash/evahash.html
* a, b = random bits, c = input and output
*/
#define crush_hashmix(a, b, c) do { \
a = a-b; a = a-c; a = a^(c>>13); \
b = b-c; b = b-a; b = b^(a<<8); \
c = c-a; c = c-b; c = c^(b>>13); \
a = a-b; a = a-c; a = a^(c>>12); \
b = b-c; b = b-a; b = b^(a<<16); \
c = c-a; c = c-b; c = c^(b>>5); \
a = a-b; a = a-c; a = a^(c>>3); \
b = b-c; b = b-a; b = b^(a<<10); \
c = c-a; c = c-b; c = c^(b>>15); \
} while (0)
#define crush_hash_seed 1315423911
static __u32 crush_hash32_rjenkins1(__u32 a)
{
__u32 hash = crush_hash_seed ^ a;
__u32 b = a;
__u32 x = 231232;
__u32 y = 1232;
crush_hashmix(b, x, hash);
crush_hashmix(y, a, hash);
return hash;
}
static __u32 crush_hash32_rjenkins1_2(__u32 a, __u32 b)
{
__u32 hash = crush_hash_seed ^ a ^ b;
__u32 x = 231232;
__u32 y = 1232;
crush_hashmix(a, b, hash);
crush_hashmix(x, a, hash);
crush_hashmix(b, y, hash);
return hash;
}
static __u32 crush_hash32_rjenkins1_3(__u32 a, __u32 b, __u32 c)
{
__u32 hash = crush_hash_seed ^ a ^ b ^ c;
__u32 x = 231232;
__u32 y = 1232;
crush_hashmix(a, b, hash);
crush_hashmix(c, x, hash);
crush_hashmix(y, a, hash);
crush_hashmix(b, x, hash);
crush_hashmix(y, c, hash);
return hash;
}
static __u32 crush_hash32_rjenkins1_4(__u32 a, __u32 b, __u32 c, __u32 d)
{
__u32 hash = crush_hash_seed ^ a ^ b ^ c ^ d;
__u32 x = 231232;
__u32 y = 1232;
crush_hashmix(a, b, hash);
crush_hashmix(c, d, hash);
crush_hashmix(a, x, hash);
crush_hashmix(y, b, hash);
crush_hashmix(c, x, hash);
crush_hashmix(y, d, hash);
return hash;
}
static __u32 crush_hash32_rjenkins1_5(__u32 a, __u32 b, __u32 c, __u32 d,
__u32 e)
{
__u32 hash = crush_hash_seed ^ a ^ b ^ c ^ d ^ e;
__u32 x = 231232;
__u32 y = 1232;
crush_hashmix(a, b, hash);
crush_hashmix(c, d, hash);
crush_hashmix(e, x, hash);
crush_hashmix(y, a, hash);
crush_hashmix(b, x, hash);
crush_hashmix(y, c, hash);
crush_hashmix(d, x, hash);
crush_hashmix(y, e, hash);
return hash;
}
__u32 crush_hash32(int type, __u32 a)
{
switch (type) {
case CRUSH_HASH_RJENKINS1:
return crush_hash32_rjenkins1(a);
default:
return 0;
}
}
__u32 crush_hash32_2(int type, __u32 a, __u32 b)
{
switch (type) {
case CRUSH_HASH_RJENKINS1:
return crush_hash32_rjenkins1_2(a, b);
default:
return 0;
}
}
__u32 crush_hash32_3(int type, __u32 a, __u32 b, __u32 c)
{
switch (type) {
case CRUSH_HASH_RJENKINS1:
return crush_hash32_rjenkins1_3(a, b, c);
default:
return 0;
}
}
__u32 crush_hash32_4(int type, __u32 a, __u32 b, __u32 c, __u32 d)
{
switch (type) {
case CRUSH_HASH_RJENKINS1:
return crush_hash32_rjenkins1_4(a, b, c, d);
default:
return 0;
}
}
__u32 crush_hash32_5(int type, __u32 a, __u32 b, __u32 c, __u32 d, __u32 e)
{
switch (type) {
case CRUSH_HASH_RJENKINS1:
return crush_hash32_rjenkins1_5(a, b, c, d, e);
default:
return 0;
}
}
const char *crush_hash_name(int type)
{
switch (type) {
case CRUSH_HASH_RJENKINS1:
return "rjenkins1";
default:
return "unknown";
}
}
#ifndef _CRUSH_HASH_H
#define _CRUSH_HASH_H
#define CRUSH_HASH_RJENKINS1 0
#define CRUSH_HASH_DEFAULT CRUSH_HASH_RJENKINS1
extern const char *crush_hash_name(int type);
extern __u32 crush_hash32(int type, __u32 a);
extern __u32 crush_hash32_2(int type, __u32 a, __u32 b);
extern __u32 crush_hash32_3(int type, __u32 a, __u32 b, __u32 c);
extern __u32 crush_hash32_4(int type, __u32 a, __u32 b, __u32 c, __u32 d);
extern __u32 crush_hash32_5(int type, __u32 a, __u32 b, __u32 c, __u32 d,
__u32 e);
#endif
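hash.c builds every CRUSH hash from Robert Jenkins' 96-bit mix, seeded with `crush_hash_seed` and padded with the constants 231232 and 1232. Below is a userspace sketch of the two-input variant, assuming `uint32_t` in place of `__u32`. The `demo_` names are hypothetical; the mixing steps and constants are copied from the code above.

```c
#include <stdint.h>

/* Same mixing steps as crush_hashmix() above, on uint32_t lvalues. */
#define demo_hashmix(a, b, c) do {			\
	a = a - b; a = a - c; a = a ^ (c >> 13);	\
	b = b - c; b = b - a; b = b ^ (a << 8);		\
	c = c - a; c = c - b; c = c ^ (b >> 13);	\
	a = a - b; a = a - c; a = a ^ (c >> 12);	\
	b = b - c; b = b - a; b = b ^ (a << 16);	\
	c = c - a; c = c - b; c = c ^ (b >> 5);		\
	a = a - b; a = a - c; a = a ^ (c >> 3);		\
	b = b - c; b = b - a; b = b ^ (a << 10);	\
	c = c - a; c = c - b; c = c ^ (b >> 15);	\
} while (0)

#define DEMO_HASH_SEED 1315423911u	/* same value as crush_hash_seed */
#define DEMO_HASH_RJENKINS1 0		/* same value as CRUSH_HASH_RJENKINS1 */

/* Mirrors crush_hash32_2(): unknown hash types yield 0. */
static uint32_t demo_hash32_2(int type, uint32_t a, uint32_t b)
{
	uint32_t hash, x = 231232, y = 1232;

	if (type != DEMO_HASH_RJENKINS1)
		return 0;
	hash = DEMO_HASH_SEED ^ a ^ b;
	demo_hashmix(a, b, hash);
	demo_hashmix(x, a, hash);
	demo_hashmix(b, y, hash);
	return hash;
}
```

Like `crush_hash32_2()`, the sketch returns 0 for an unsupported hash type instead of reporting an error, so a caller that passes a bad `type` silently gets a constant.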
#ifndef _CRUSH_MAPPER_H
#define _CRUSH_MAPPER_H
/*
* CRUSH functions to find rules and then map an input to an
* output set.
*
* LGPL2
*/
#include "crush.h"
extern int crush_find_rule(struct crush_map *map, int pool, int type, int size);
extern int crush_do_rule(struct crush_map *map,
int ruleno,
int x, int *result, int result_max,
int forcefeed, /* -1 for none */
__u32 *weights);
#endif
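mapper.h only declares `crush_find_rule()` and `crush_do_rule()`. The bucket selection logic lives in the collapsed mapper.c. To illustrate how `struct crush_bucket_straw` is meant to be used, here is a sketch of straw-style selection as described for CRUSH. Each item draws 16 bits of hash from (x, item, r), the draw is scaled by the item's straw length, and the longest scaled draw wins. The hash is passed in as a function pointer. `demo_hash3_fn` and `demo_toy_hash` are hypothetical, and in CRUSH the draw would come from `crush_hash32_3()` with the bucket's `hash` type.

```c
#include <stdint.h>

/* Hypothetical hook standing in for crush_hash32_3(type, x, item, r). */
typedef uint32_t (*demo_hash3_fn)(uint32_t x, uint32_t item, uint32_t r);

/* Toy hash for illustration only; not a CRUSH hash. */
static uint32_t demo_toy_hash(uint32_t x, uint32_t item, uint32_t r)
{
	return x * 2654435761u ^ item * 40503u ^ r;
}

/*
 * Straw-style choice: each item draws 16 hash bits, scaled by its straw
 * length (16.16 fixed point); the longest draw wins. Ties keep the
 * earlier item. size must be at least 1.
 */
static int32_t demo_straw_choose(const int32_t *items, const uint32_t *straws,
				 uint32_t size, demo_hash3_fn hash,
				 uint32_t x, uint32_t r)
{
	uint32_t high = 0;
	uint64_t high_draw = 0;

	for (uint32_t i = 0; i < size; i++) {
		uint64_t draw = hash(x, (uint32_t)items[i], r) & 0xffff;

		draw *= straws[i];
		if (i == 0 || draw > high_draw) {
			high = i;
			high_draw = draw;
		}
	}
	return items[high];
}
```

The 64-bit `draw` keeps a 16-bit hash multiplied by a 32-bit straw from overflowing.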
#ifndef _FS_CEPH_CRYPTO_H
#define _FS_CEPH_CRYPTO_H
#include "types.h"
#include "buffer.h"
/*
* cryptographic secret
*/
struct ceph_crypto_key {
int type;
struct ceph_timespec created;
int len;
void *key;
};
static inline void ceph_crypto_key_destroy(struct ceph_crypto_key *key)
{
kfree(key->key);
}
extern int ceph_crypto_key_encode(struct ceph_crypto_key *key,
void **p, void *end);
extern int ceph_crypto_key_decode(struct ceph_crypto_key *key,
void **p, void *end);
extern int ceph_crypto_key_unarmor(struct ceph_crypto_key *key, const char *in);
/* crypto.c */
extern int ceph_decrypt(struct ceph_crypto_key *secret,
void *dst, size_t *dst_len,
const void *src, size_t src_len);
extern int ceph_encrypt(struct ceph_crypto_key *secret,
void *dst, size_t *dst_len,
const void *src, size_t src_len);
extern int ceph_decrypt2(struct ceph_crypto_key *secret,
void *dst1, size_t *dst1_len,
void *dst2, size_t *dst2_len,
const void *src, size_t src_len);
extern int ceph_encrypt2(struct ceph_crypto_key *secret,
void *dst, size_t *dst_len,
const void *src1, size_t src1_len,
const void *src2, size_t src2_len);
/* armor.c */
extern int ceph_armor(char *dst, const void *src, const void *end);
extern int ceph_unarmor(void *dst, const char *src, const char *end);
#endif
#ifndef __CEPH_DECODE_H
#define __CEPH_DECODE_H
#include <asm/unaligned.h>
#include <linux/time.h>
#include "types.h"
/*
* in all cases,
* void **p pointer to position pointer
* void *end pointer to end of buffer (last byte + 1)
*/
static inline u64 ceph_decode_64(void **p)
{
u64 v = get_unaligned_le64(*p);
*p += sizeof(u64);
return v;
}
static inline u32 ceph_decode_32(void **p)
{
u32 v = get_unaligned_le32(*p);
*p += sizeof(u32);
return v;
}
static inline u16 ceph_decode_16(void **p)
{
u16 v = get_unaligned_le16(*p);
*p += sizeof(u16);
return v;
}
static inline u8 ceph_decode_8(void **p)
{
u8 v = *(u8 *)*p;
(*p)++;
return v;
}
static inline void ceph_decode_copy(void **p, void *pv, size_t n)
{
memcpy(pv, *p, n);
*p += n;
}
/*
* bounds check input.
*/
#define ceph_decode_need(p, end, n, bad) \
do { \
if (unlikely(*(p) + (n) > (end))) \
goto bad; \
} while (0)
#define ceph_decode_64_safe(p, end, v, bad) \
do { \
ceph_decode_need(p, end, sizeof(u64), bad); \
v = ceph_decode_64(p); \
} while (0)
#define ceph_decode_32_safe(p, end, v, bad) \
do { \
ceph_decode_need(p, end, sizeof(u32), bad); \
v = ceph_decode_32(p); \
} while (0)
#define ceph_decode_16_safe(p, end, v, bad) \
do { \
ceph_decode_need(p, end, sizeof(u16), bad); \
v = ceph_decode_16(p); \
} while (0)
#define ceph_decode_8_safe(p, end, v, bad) \
do { \
ceph_decode_need(p, end, sizeof(u8), bad); \
v = ceph_decode_8(p); \
} while (0)
#define ceph_decode_copy_safe(p, end, pv, n, bad) \
do { \
ceph_decode_need(p, end, n, bad); \
ceph_decode_copy(p, pv, n); \
} while (0)
/*
* struct ceph_timespec <-> struct timespec
*/
static inline void ceph_decode_timespec(struct timespec *ts,
const struct ceph_timespec *tv)
{
ts->tv_sec = le32_to_cpu(tv->tv_sec);
ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
}
static inline void ceph_encode_timespec(struct ceph_timespec *tv,
const struct timespec *ts)
{
tv->tv_sec = cpu_to_le32(ts->tv_sec);
tv->tv_nsec = cpu_to_le32(ts->tv_nsec);
}
/*
* sockaddr_storage <-> ceph_sockaddr
*/
static inline void ceph_encode_addr(struct ceph_entity_addr *a)
{
a->in_addr.ss_family = htons(a->in_addr.ss_family);
}
static inline void ceph_decode_addr(struct ceph_entity_addr *a)
{
a->in_addr.ss_family = ntohs(a->in_addr.ss_family);
WARN_ON(a->in_addr.ss_family == 512); /* AF_INET (2) byte-swapped: address decoded twice on a little-endian host */
}
/*
* encoders
*/
static inline void ceph_encode_64(void **p, u64 v)
{
put_unaligned_le64(v, (__le64 *)*p);
*p += sizeof(u64);
}
static inline void ceph_encode_32(void **p, u32 v)
{
put_unaligned_le32(v, (__le32 *)*p);
*p += sizeof(u32);
}
static inline void ceph_encode_16(void **p, u16 v)
{
put_unaligned_le16(v, (__le16 *)*p);
*p += sizeof(u16);
}
static inline void ceph_encode_8(void **p, u8 v)
{
*(u8 *)*p = v;
(*p)++;
}
static inline void ceph_encode_copy(void **p, const void *s, int len)
{
memcpy(*p, s, len);
*p += len;
}
/*
* filepath, string encoders
*/
static inline void ceph_encode_filepath(void **p, void *end,
u64 ino, const char *path)
{
u32 len = path ? strlen(path) : 0;
BUG_ON(*p + 1 + sizeof(ino) + sizeof(len) + len > end); /* version byte, ino, len, path */
ceph_encode_8(p, 1);
ceph_encode_64(p, ino);
ceph_encode_32(p, len);
if (len)
memcpy(*p, path, len);
*p += len;
}
static inline void ceph_encode_string(void **p, void *end,
const char *s, u32 len)
{
BUG_ON(*p + sizeof(len) + len > end);
ceph_encode_32(p, len);
if (len)
memcpy(*p, s, len);
*p += len;
}
#define ceph_encode_need(p, end, n, bad) \
do { \
if (unlikely(*(p) + (n) > (end))) \
goto bad; \
} while (0)
#define ceph_encode_64_safe(p, end, v, bad) \
do { \
ceph_encode_need(p, end, sizeof(u64), bad); \
ceph_encode_64(p, v); \
} while (0)
#define ceph_encode_32_safe(p, end, v, bad) \
do { \
ceph_encode_need(p, end, sizeof(u32), bad); \
ceph_encode_32(p, v); \
} while (0)
#define ceph_encode_16_safe(p, end, v, bad) \
do { \
ceph_encode_need(p, end, sizeof(u16), bad); \
ceph_encode_16(p, v); \
} while (0)
#define ceph_encode_copy_safe(p, end, pv, n, bad) \
do { \
ceph_encode_need(p, end, n, bad); \
ceph_encode_copy(p, pv, n); \
} while (0)
#endif
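decode.h walks a cursor (`void **p`) through a little-endian buffer, and the `_safe` macros jump to a caller-supplied label when fewer than `n` bytes remain. Below is a userspace sketch of the same pattern, applied to a length-prefixed string as written by `ceph_encode_string()`. The `demo_` helpers are hypothetical stand-ins. Byte-wise shifts replace `get_unaligned_le32()`/`put_unaligned_le32()`, and `char *` casts replace `void *` arithmetic, which is a GNU extension.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for ceph_encode_32() / ceph_decode_32(). */
static void demo_encode_32(void **p, uint32_t v)
{
	unsigned char *b = *p;

	for (int i = 0; i < 4; i++)
		b[i] = (unsigned char)(v >> (8 * i));	/* little-endian */
	*p = b + 4;
}

static uint32_t demo_decode_32(void **p)
{
	const unsigned char *b = *p;
	uint32_t v = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
		     (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;

	*p = (char *)*p + 4;
	return v;
}

/* Same role as ceph_decode_need(): jump to `bad` unless n bytes remain. */
#define demo_decode_need(p, end, n, bad)				\
	do {								\
		if ((size_t)((char *)(end) - (char *)*(p)) < (n))	\
			goto bad;					\
	} while (0)

/* Write a u32 length followed by the bytes, like ceph_encode_string(). */
static void demo_encode_string(void **p, const char *s, uint32_t len)
{
	demo_encode_32(p, len);
	memcpy(*p, s, len);
	*p = (char *)*p + len;
}

/*
 * Read a length-prefixed string into dst (capacity cap, NUL-terminated).
 * Returns the length, or -EINVAL if the buffer is truncated or the
 * string does not fit in dst.
 */
static int demo_decode_string(void **p, void *end, char *dst, size_t cap)
{
	uint32_t len;

	demo_decode_need(p, end, sizeof(uint32_t), bad);
	len = demo_decode_32(p);
	demo_decode_need(p, end, len, bad);
	if (len >= cap)
		goto bad;
	memcpy(dst, *p, len);
	dst[len] = '\0';
	*p = (char *)*p + len;
	return (int)len;
bad:
	return -EINVAL;
}
```

This sketch compares the remaining byte count instead of computing `*p + n`, so it never forms a pointer beyond `end`.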
#ifndef FS_CEPH_IOCTL_H
#define FS_CEPH_IOCTL_H
#include <linux/ioctl.h>
#include <linux/types.h>
#define CEPH_IOCTL_MAGIC 0x97
/* just use u64 to align sanely on all archs */
struct ceph_ioctl_layout {
__u64 stripe_unit, stripe_count, object_size;
__u64 data_pool;
__s64 preferred_osd;
};
#define CEPH_IOC_GET_LAYOUT _IOR(CEPH_IOCTL_MAGIC, 1, \
struct ceph_ioctl_layout)
#define CEPH_IOC_SET_LAYOUT _IOW(CEPH_IOCTL_MAGIC, 2, \
struct ceph_ioctl_layout)
/*
* Extract the identity and address of the OSD, and the object,
* storing a given file offset.
*/
struct ceph_ioctl_dataloc {
__u64 file_offset; /* in+out: file offset */
__u64 object_offset; /* out: offset in object */
__u64 object_no; /* out: object # */
__u64 object_size; /* out: object size */
char object_name[64]; /* out: object name */
__u64 block_offset; /* out: offset in block */
__u64 block_size; /* out: block length */
__s64 osd; /* out: osd # */
struct sockaddr_storage osd_addr; /* out: osd address */
};
#define CEPH_IOC_GET_DATALOC _IOWR(CEPH_IOCTL_MAGIC, 3, \
struct ceph_ioctl_dataloc)
#endif
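From userspace, the layout ioctl defined above is issued on an open file on a ceph mount. Below is a minimal sketch. It assumes the in-kernel ioctl.h is not installed as a userspace header, so the struct and request code are copied locally, with `uint64_t`/`int64_t` standing in for `__u64`/`__s64`. `demo_print_layout()` is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local copy of the definitions from the header above. */
struct ceph_ioctl_layout {
	uint64_t stripe_unit, stripe_count, object_size;
	uint64_t data_pool;
	int64_t preferred_osd;
};

#define CEPH_IOCTL_MAGIC 0x97
#define CEPH_IOC_GET_LAYOUT _IOR(CEPH_IOCTL_MAGIC, 1, struct ceph_ioctl_layout)

/*
 * Print the striping layout of a file. Returns 0 on success or a
 * negative errno (files outside a ceph mount are expected to fail).
 */
static int demo_print_layout(const char *path)
{
	struct ceph_ioctl_layout l;
	int fd, ret = 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, CEPH_IOC_GET_LAYOUT, &l) < 0) {
		ret = -errno;
	} else {
		printf("stripe_unit %" PRIu64 " stripe_count %" PRIu64
		       " object_size %" PRIu64 " pool %" PRIu64
		       " preferred_osd %" PRId64 "\n",
		       l.stripe_unit, l.stripe_count, l.object_size,
		       l.data_pool, l.preferred_osd);
	}
	close(fd);
	return ret;
}
```

On a file outside a ceph mount the ioctl is expected to fail (typically with `ENOTTY`), and the sketch reports that as a negative errno.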
#ifndef _FS_CEPH_MDSMAP_H
#define _FS_CEPH_MDSMAP_H
#include "types.h"
/*
* mds map - describe servers in the mds cluster.
*
* we limit fields to those the client actually cares about
*/
struct ceph_mds_info {
u64 global_id;
struct ceph_entity_addr addr;
s32 state;
int num_export_targets;
u32 *export_targets;
};
struct ceph_mdsmap {
u32 m_epoch, m_client_epoch, m_last_failure;
u32 m_root;
u32 m_session_timeout; /* seconds */
u32 m_session_autoclose; /* seconds */
u64 m_max_file_size;
u32 m_max_mds; /* size of m_info array */
struct ceph_mds_info *m_info;
/* which object pools file data can be stored in */
int m_num_data_pg_pools;
u32 *m_data_pg_pools;
u32 m_cas_pg_pool;
};
static inline struct ceph_entity_addr *
ceph_mdsmap_get_addr(struct ceph_mdsmap *m, int w)
{
if (w >= m->m_max_mds)
return NULL;
return &m->m_info[w].addr;
}
static inline int ceph_mdsmap_get_state(struct ceph_mdsmap *m, int w)
{
BUG_ON(w < 0);
if (w >= m->m_max_mds)
return CEPH_MDS_STATE_DNE;
return m->m_info[w].state;
}
extern int ceph_mdsmap_get_random_mds(struct ceph_mdsmap *m);
extern struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end);
extern void ceph_mdsmap_destroy(struct ceph_mdsmap *m);
#endif
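`ceph_mdsmap_get_random_mds()` is declared above, but its body is collapsed. The following sketches one way such a choice could work, under these assumptions: the per-rank states come from `m_info[i].state`, and an MDS counts as eligible when its state is positive (taken here to be the "up" values of `CEPH_MDS_STATE_*`). The caller also supplies the random value so the result is reproducible. `demo_pick_random_mds()` is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the rank of a randomly chosen MDS whose
 * state is positive, or -1 if there is none. ASSUMPTION: positive
 * states are the ones worth contacting. `rnd` is any random value.
 */
static int demo_pick_random_mds(const int32_t *states, uint32_t max_mds,
				uint32_t rnd)
{
	uint32_t n = 0;

	for (uint32_t i = 0; i < max_mds; i++)
		if (states[i] > 0)
			n++;
	if (n == 0)
		return -1;

	n = rnd % n;	/* index among the eligible ranks */
	for (uint32_t i = 0; i < max_mds; i++) {
		if (states[i] <= 0)
			continue;
		if (n-- == 0)
			return (int)i;
	}
	return -1;	/* not reached */
}
```

Taking the random value as unsigned keeps `rnd % n` non-negative.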