提交 29552b14 编写于 作者: L Linus Torvalds
......@@ -12,10 +12,14 @@ cifs.txt
- description of the CIFS filesystem
coda.txt
- description of the CODA filesystem.
configfs/
- directory containing configfs documentation and example code.
cramfs.txt
- info on the cram filesystem for small storage (ROMs etc)
devfs/
- directory containing devfs documentation.
dlmfs.txt
- info on the userspace interface to the OCFS2 DLM.
ext2.txt
- info, mount options and specifications for the Ext2 filesystem.
hpfs.txt
......@@ -30,6 +34,8 @@ ntfs.txt
- info and mount options for the NTFS filesystem (Windows NT).
proc.txt
- info on Linux's /proc filesystem.
ocfs2.txt
- info and mount options for the OCFS2 clustered filesystem.
romfs.txt
- Description of the ROMFS filesystem.
smbfs.txt
......
configfs - Userspace-driven kernel object configuation.
Joel Becker <joel.becker@oracle.com>
Updated: 31 March 2005
Copyright (c) 2005 Oracle Corporation,
Joel Becker <joel.becker@oracle.com>
[What is configfs?]
configfs is a ram-based filesystem that provides the converse of
sysfs's functionality. Where sysfs is a filesystem-based view of
kernel objects, configfs is a filesystem-based manager of kernel
objects, or config_items.
With sysfs, an object is created in kernel (for example, when a device
is discovered) and it is registered with sysfs. Its attributes then
appear in sysfs, allowing userspace to read the attributes via
readdir(3)/read(2). It may allow some attributes to be modified via
write(2). The important point is that the object is created and
destroyed in kernel, the kernel controls the lifecycle of the sysfs
representation, and sysfs is merely a window on all this.
A configfs config_item is created via an explicit userspace operation:
mkdir(2). It is destroyed via rmdir(2). The attributes appear at
mkdir(2) time, and can be read or modified via read(2) and write(2).
As with sysfs, readdir(3) queries the list of items and/or attributes.
symlink(2) can be used to group items together. Unlike sysfs, the
lifetime of the representation is completely driven by userspace. The
kernel modules backing the items must respond to this.
Both sysfs and configfs can and should exist together on the same
system. One is not a replacement for the other.
[Using configfs]
configfs can be compiled as a module or into the kernel. You can access
it by doing
mount -t configfs none /config
The configfs tree will be empty unless client modules are also loaded.
These are modules that register their item types with configfs as
subsystems. Once a client subsystem is loaded, it will appear as a
subdirectory (or more than one) under /config. Like sysfs, the
configfs tree is always there, whether mounted on /config or not.
An item is created via mkdir(2). The item's attributes will also
appear at this time. readdir(3) can determine what the attributes are,
read(2) can query their default values, and write(2) can store new
values. Like sysfs, attributes should be ASCII text files, preferably
with only one value per file. The same efficiency caveats from sysfs
apply. Don't mix more than one attribute in one attribute file.
Like sysfs, configfs expects write(2) to store the entire buffer at
once. When writing to configfs attributes, userspace processes should
first read the entire file, modify the portions they wish to change, and
then write the entire buffer back. Attribute files have a maximum size
of one page (PAGE_SIZE, 4096 on i386).
When an item needs to be destroyed, remove it with rmdir(2). An
item cannot be destroyed if any other item has a link to it (via
symlink(2)). Links can be removed via unlink(2).
[Configuring FakeNBD: an Example]
Imagine there's a Network Block Device (NBD) driver that allows you to
access remote block devices. Call it FakeNBD. FakeNBD uses configfs
for its configuration. Obviously, there will be a nice program that
sysadmins use to configure FakeNBD, but somehow that program has to tell
the driver about it. Here's where configfs comes in.
When the FakeNBD driver is loaded, it registers itself with configfs.
readdir(3) sees this just fine:
# ls /config
fakenbd
A fakenbd connection can be created with mkdir(2). The name is
arbitrary, but likely the tool will make some use of the name. Perhaps
it is a uuid or a disk name:
# mkdir /config/fakenbd/disk1
# ls /config/fakenbd/disk1
target device rw
The target attribute contains the IP address of the server FakeNBD will
connect to. The device attribute is the device on the server.
Predictably, the rw attribute determines whether the connection is
read-only or read-write.
# echo 10.0.0.1 > /config/fakenbd/disk1/target
# echo /dev/sda1 > /config/fakenbd/disk1/device
# echo 1 > /config/fakenbd/disk1/rw
That's it. That's all there is. Now the device is configured, via the
shell no less.
[Coding With configfs]
Every object in configfs is a config_item. A config_item reflects an
object in the subsystem. It has attributes that match values on that
object. configfs handles the filesystem representation of that object
and its attributes, allowing the subsystem to ignore all but the
basic show/store interaction.
Items are created and destroyed inside a config_group. A group is a
collection of items that share the same attributes and operations.
Items are created by mkdir(2) and removed by rmdir(2), but configfs
handles that. The group has a set of operations to perform these tasks
A subsystem is the top level of a client module. During initialization,
the client module registers the subsystem with configfs, the subsystem
appears as a directory at the top of the configfs filesystem. A
subsystem is also a config_group, and can do everything a config_group
can.
[struct config_item]
struct config_item {
char *ci_name;
char ci_namebuf[UOBJ_NAME_LEN];
struct kref ci_kref;
struct list_head ci_entry;
struct config_item *ci_parent;
struct config_group *ci_group;
struct config_item_type *ci_type;
struct dentry *ci_dentry;
};
void config_item_init(struct config_item *);
void config_item_init_type_name(struct config_item *,
const char *name,
struct config_item_type *type);
struct config_item *config_item_get(struct config_item *);
void config_item_put(struct config_item *);
Generally, struct config_item is embedded in a container structure, a
structure that actually represents what the subsystem is doing. The
config_item portion of that structure is how the object interacts with
configfs.
Whether statically defined in a source file or created by a parent
config_group, a config_item must have one of the _init() functions
called on it. This initializes the reference count and sets up the
appropriate fields.
All users of a config_item should have a reference on it via
config_item_get(), and drop the reference when they are done via
config_item_put().
By itself, a config_item cannot do much more than appear in configfs.
Usually a subsystem wants the item to display and/or store attributes,
among other things. For that, it needs a type.
[struct config_item_type]
struct configfs_item_operations {
void (*release)(struct config_item *);
ssize_t (*show_attribute)(struct config_item *,
struct configfs_attribute *,
char *);
ssize_t (*store_attribute)(struct config_item *,
struct configfs_attribute *,
const char *, size_t);
int (*allow_link)(struct config_item *src,
struct config_item *target);
int (*drop_link)(struct config_item *src,
struct config_item *target);
};
struct config_item_type {
struct module *ct_owner;
struct configfs_item_operations *ct_item_ops;
struct configfs_group_operations *ct_group_ops;
struct configfs_attribute **ct_attrs;
};
The most basic function of a config_item_type is to define what
operations can be performed on a config_item. All items that have been
allocated dynamically will need to provide the ct_item_ops->release()
method. This method is called when the config_item's reference count
reaches zero. Items that wish to display an attribute need to provide
the ct_item_ops->show_attribute() method. Similarly, storing a new
attribute value uses the store_attribute() method.
[struct configfs_attribute]
struct configfs_attribute {
char *ca_name;
struct module *ca_owner;
mode_t ca_mode;
};
When a config_item wants an attribute to appear as a file in the item's
configfs directory, it must define a configfs_attribute describing it.
It then adds the attribute to the NULL-terminated array
config_item_type->ct_attrs. When the item appears in configfs, the
attribute file will appear with the configfs_attribute->ca_name
filename. configfs_attribute->ca_mode specifies the file permissions.
If an attribute is readable and the config_item provides a
ct_item_ops->show_attribute() method, that method will be called
whenever userspace asks for a read(2) on the attribute. The converse
will happen for write(2).
[struct config_group]
A config_item cannot live in a vaccum. The only way one can be created
is via mkdir(2) on a config_group. This will trigger creation of a
child item.
struct config_group {
struct config_item cg_item;
struct list_head cg_children;
struct configfs_subsystem *cg_subsys;
struct config_group **default_groups;
};
void config_group_init(struct config_group *group);
void config_group_init_type_name(struct config_group *group,
const char *name,
struct config_item_type *type);
The config_group structure contains a config_item. Properly configuring
that item means that a group can behave as an item in its own right.
However, it can do more: it can create child items or groups. This is
accomplished via the group operations specified on the group's
config_item_type.
struct configfs_group_operations {
struct config_item *(*make_item)(struct config_group *group,
const char *name);
struct config_group *(*make_group)(struct config_group *group,
const char *name);
int (*commit_item)(struct config_item *item);
void (*drop_item)(struct config_group *group,
struct config_item *item);
};
A group creates child items by providing the
ct_group_ops->make_item() method. If provided, this method is called from mkdir(2) in the group's directory. The subsystem allocates a new
config_item (or more likely, its container structure), initializes it,
and returns it to configfs. Configfs will then populate the filesystem
tree to reflect the new item.
If the subsystem wants the child to be a group itself, the subsystem
provides ct_group_ops->make_group(). Everything else behaves the same,
using the group _init() functions on the group.
Finally, when userspace calls rmdir(2) on the item or group,
ct_group_ops->drop_item() is called. As a config_group is also a
config_item, it is not necessary for a seperate drop_group() method.
The subsystem must config_item_put() the reference that was initialized
upon item allocation. If a subsystem has no work to do, it may omit
the ct_group_ops->drop_item() method, and configfs will call
config_item_put() on the item on behalf of the subsystem.
IMPORTANT: drop_item() is void, and as such cannot fail. When rmdir(2)
is called, configfs WILL remove the item from the filesystem tree
(assuming that it has no children to keep it busy). The subsystem is
responsible for responding to this. If the subsystem has references to
the item in other threads, the memory is safe. It may take some time
for the item to actually disappear from the subsystem's usage. But it
is gone from configfs.
A config_group cannot be removed while it still has child items. This
is implemented in the configfs rmdir(2) code. ->drop_item() will not be
called, as the item has not been dropped. rmdir(2) will fail, as the
directory is not empty.
[struct configfs_subsystem]
A subsystem must register itself, ususally at module_init time. This
tells configfs to make the subsystem appear in the file tree.
struct configfs_subsystem {
struct config_group su_group;
struct semaphore su_sem;
};
int configfs_register_subsystem(struct configfs_subsystem *subsys);
void configfs_unregister_subsystem(struct configfs_subsystem *subsys);
A subsystem consists of a toplevel config_group and a semaphore.
The group is where child config_items are created. For a subsystem,
this group is usually defined statically. Before calling
configfs_register_subsystem(), the subsystem must have initialized the
group via the usual group _init() functions, and it must also have
initialized the semaphore.
When the register call returns, the subsystem is live, and it
will be visible via configfs. At that point, mkdir(2) can be called and
the subsystem must be ready for it.
[An Example]
The best example of these basic concepts is the simple_children
subsystem/group and the simple_child item in configfs_example.c It
shows a trivial object displaying and storing an attribute, and a simple
group creating and destroying these children.
[Hierarchy Navigation and the Subsystem Semaphore]
There is an extra bonus that configfs provides. The config_groups and
config_items are arranged in a hierarchy due to the fact that they
appear in a filesystem. A subsystem is NEVER to touch the filesystem
parts, but the subsystem might be interested in this hierarchy. For
this reason, the hierarchy is mirrored via the config_group->cg_children
and config_item->ci_parent structure members.
A subsystem can navigate the cg_children list and the ci_parent pointer
to see the tree created by the subsystem. This can race with configfs'
management of the hierarchy, so configfs uses the subsystem semaphore to
protect modifications. Whenever a subsystem wants to navigate the
hierarchy, it must do so under the protection of the subsystem
semaphore.
A subsystem will be prevented from acquiring the semaphore while a newly
allocated item has not been linked into this hierarchy. Similarly, it
will not be able to acquire the semaphore while a dropping item has not
yet been unlinked. This means that an item's ci_parent pointer will
never be NULL while the item is in configfs, and that an item will only
be in its parent's cg_children list for the same duration. This allows
a subsystem to trust ci_parent and cg_children while they hold the
semaphore.
[Item Aggregation Via symlink(2)]
configfs provides a simple group via the group->item parent/child
relationship. Often, however, a larger environment requires aggregation
outside of the parent/child connection. This is implemented via
symlink(2).
A config_item may provide the ct_item_ops->allow_link() and
ct_item_ops->drop_link() methods. If the ->allow_link() method exists,
symlink(2) may be called with the config_item as the source of the link.
These links are only allowed between configfs config_items. Any
symlink(2) attempt outside the configfs filesystem will be denied.
When symlink(2) is called, the source config_item's ->allow_link()
method is called with itself and a target item. If the source item
allows linking to target item, it returns 0. A source item may wish to
reject a link if it only wants links to a certain type of object (say,
in its own subsystem).
When unlink(2) is called on the symbolic link, the source item is
notified via the ->drop_link() method. Like the ->drop_item() method,
this is a void function and cannot return failure. The subsystem is
responsible for responding to the change.
A config_item cannot be removed while it links to any other item, nor
can it be removed while an item links to it. Dangling symlinks are not
allowed in configfs.
[Automatically Created Subgroups]
A new config_group may want to have two types of child config_items.
While this could be codified by magic names in ->make_item(), it is much
more explicit to have a method whereby userspace sees this divergence.
Rather than have a group where some items behave differently than
others, configfs provides a method whereby one or many subgroups are
automatically created inside the parent at its creation. Thus,
mkdir("parent) results in "parent", "parent/subgroup1", up through
"parent/subgroupN". Items of type 1 can now be created in
"parent/subgroup1", and items of type N can be created in
"parent/subgroupN".
These automatic subgroups, or default groups, do not preclude other
children of the parent group. If ct_group_ops->make_group() exists,
other child groups can be created on the parent group directly.
A configfs subsystem specifies default groups by filling in the
NULL-terminated array default_groups on the config_group structure.
Each group in that array is populated in the configfs tree at the same
time as the parent group. Similarly, they are removed at the same time
as the parent. No extra notification is provided. When a ->drop_item()
method call notifies the subsystem the parent group is going away, it
also means every default group child associated with that parent group.
As a consequence of this, default_groups cannot be removed directly via
rmdir(2). They also are not considered when rmdir(2) on the parent
group is checking for children.
[Committable Items]
NOTE: Committable items are currently unimplemented.
Some config_items cannot have a valid initial state. That is, no
default values can be specified for the item's attributes such that the
item can do its work. Userspace must configure one or more attributes,
after which the subsystem can start whatever entity this item
represents.
Consider the FakeNBD device from above. Without a target address *and*
a target device, the subsystem has no idea what block device to import.
The simple example assumes that the subsystem merely waits until all the
appropriate attributes are configured, and then connects. This will,
indeed, work, but now every attribute store must check if the attributes
are initialized. Every attribute store must fire off the connection if
that condition is met.
Far better would be an explicit action notifying the subsystem that the
config_item is ready to go. More importantly, an explicit action allows
the subsystem to provide feedback as to whether the attibutes are
initialized in a way that makes sense. configfs provides this as
committable items.
configfs still uses only normal filesystem operations. An item is
committed via rename(2). The item is moved from a directory where it
can be modified to a directory where it cannot.
Any group that provides the ct_group_ops->commit_item() method has
committable items. When this group appears in configfs, mkdir(2) will
not work directly in the group. Instead, the group will have two
subdirectories: "live" and "pending". The "live" directory does not
support mkdir(2) or rmdir(2) either. It only allows rename(2). The
"pending" directory does allow mkdir(2) and rmdir(2). An item is
created in the "pending" directory. Its attributes can be modified at
will. Userspace commits the item by renaming it into the "live"
directory. At this point, the subsystem recieves the ->commit_item()
callback. If all required attributes are filled to satisfaction, the
method returns zero and the item is moved to the "live" directory.
As rmdir(2) does not work in the "live" directory, an item must be
shutdown, or "uncommitted". Again, this is done via rename(2), this
time from the "live" directory back to the "pending" one. The subsystem
is notified by the ct_group_ops->uncommit_object() method.
/*
* vim: noexpandtab ts=8 sts=0 sw=8:
*
* configfs_example.c - This file is a demonstration module containing
* a number of configfs subsystems.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*
* Based on sysfs:
* sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
*
* configfs Copyright (C) 2005 Oracle. All rights reserved.
*/
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/configfs.h>
/*
* 01-childless
*
* This first example is a childless subsystem. It cannot create
* any config_items. It just has attributes.
*
* Note that we are enclosing the configfs_subsystem inside a container.
* This is not necessary if a subsystem has no attributes directly
* on the subsystem. See the next example, 02-simple-children, for
* such a subsystem.
*/
struct childless {
struct configfs_subsystem subsys;
int showme;
int storeme;
};
struct childless_attribute {
struct configfs_attribute attr;
ssize_t (*show)(struct childless *, char *);
ssize_t (*store)(struct childless *, const char *, size_t);
};
static inline struct childless *to_childless(struct config_item *item)
{
return item ? container_of(to_configfs_subsystem(to_config_group(item)), struct childless, subsys) : NULL;
}
static ssize_t childless_showme_read(struct childless *childless,
char *page)
{
ssize_t pos;
pos = sprintf(page, "%d\n", childless->showme);
childless->showme++;
return pos;
}
static ssize_t childless_storeme_read(struct childless *childless,
char *page)
{
return sprintf(page, "%d\n", childless->storeme);
}
static ssize_t childless_storeme_write(struct childless *childless,
const char *page,
size_t count)
{
unsigned long tmp;
char *p = (char *) page;
tmp = simple_strtoul(p, &p, 10);
if (!p || (*p && (*p != '\n')))
return -EINVAL;
if (tmp > INT_MAX)
return -ERANGE;
childless->storeme = tmp;
return count;
}
static ssize_t childless_description_read(struct childless *childless,
char *page)
{
return sprintf(page,
"[01-childless]\n"
"\n"
"The childless subsystem is the simplest possible subsystem in\n"
"configfs. It does not support the creation of child config_items.\n"
"It only has a few attributes. In fact, it isn't much different\n"
"than a directory in /proc.\n");
}
static struct childless_attribute childless_attr_showme = {
.attr = { .ca_owner = THIS_MODULE, .ca_name = "showme", .ca_mode = S_IRUGO },
.show = childless_showme_read,
};
static struct childless_attribute childless_attr_storeme = {
.attr = { .ca_owner = THIS_MODULE, .ca_name = "storeme", .ca_mode = S_IRUGO | S_IWUSR },
.show = childless_storeme_read,
.store = childless_storeme_write,
};
static struct childless_attribute childless_attr_description = {
.attr = { .ca_owner = THIS_MODULE, .ca_name = "description", .ca_mode = S_IRUGO },
.show = childless_description_read,
};
static struct configfs_attribute *childless_attrs[] = {
&childless_attr_showme.attr,
&childless_attr_storeme.attr,
&childless_attr_description.attr,
NULL,
};
static ssize_t childless_attr_show(struct config_item *item,
struct configfs_attribute *attr,
char *page)
{
struct childless *childless = to_childless(item);
struct childless_attribute *childless_attr =
container_of(attr, struct childless_attribute, attr);
ssize_t ret = 0;
if (childless_attr->show)
ret = childless_attr->show(childless, page);
return ret;
}
static ssize_t childless_attr_store(struct config_item *item,
struct configfs_attribute *attr,
const char *page, size_t count)
{
struct childless *childless = to_childless(item);
struct childless_attribute *childless_attr =
container_of(attr, struct childless_attribute, attr);
ssize_t ret = -EINVAL;
if (childless_attr->store)
ret = childless_attr->store(childless, page, count);
return ret;
}
static struct configfs_item_operations childless_item_ops = {
.show_attribute = childless_attr_show,
.store_attribute = childless_attr_store,
};
static struct config_item_type childless_type = {
.ct_item_ops = &childless_item_ops,
.ct_attrs = childless_attrs,
.ct_owner = THIS_MODULE,
};
static struct childless childless_subsys = {
.subsys = {
.su_group = {
.cg_item = {
.ci_namebuf = "01-childless",
.ci_type = &childless_type,
},
},
},
};
/* ----------------------------------------------------------------- */
/*
* 02-simple-children
*
* This example merely has a simple one-attribute child. Note that
* there is no extra attribute structure, as the child's attribute is
* known from the get-go. Also, there is no container for the
* subsystem, as it has no attributes of its own.
*/
struct simple_child {
struct config_item item;
int storeme;
};
static inline struct simple_child *to_simple_child(struct config_item *item)
{
return item ? container_of(item, struct simple_child, item) : NULL;
}
static struct configfs_attribute simple_child_attr_storeme = {
.ca_owner = THIS_MODULE,
.ca_name = "storeme",
.ca_mode = S_IRUGO | S_IWUSR,
};
static struct configfs_attribute *simple_child_attrs[] = {
&simple_child_attr_storeme,
NULL,
};
static ssize_t simple_child_attr_show(struct config_item *item,
struct configfs_attribute *attr,
char *page)
{
ssize_t count;
struct simple_child *simple_child = to_simple_child(item);
count = sprintf(page, "%d\n", simple_child->storeme);
return count;
}
static ssize_t simple_child_attr_store(struct config_item *item,
struct configfs_attribute *attr,
const char *page, size_t count)
{
struct simple_child *simple_child = to_simple_child(item);
unsigned long tmp;
char *p = (char *) page;
tmp = simple_strtoul(p, &p, 10);
if (!p || (*p && (*p != '\n')))
return -EINVAL;
if (tmp > INT_MAX)
return -ERANGE;
simple_child->storeme = tmp;
return count;
}
static void simple_child_release(struct config_item *item)
{
kfree(to_simple_child(item));
}
static struct configfs_item_operations simple_child_item_ops = {
.release = simple_child_release,
.show_attribute = simple_child_attr_show,
.store_attribute = simple_child_attr_store,
};
static struct config_item_type simple_child_type = {
.ct_item_ops = &simple_child_item_ops,
.ct_attrs = simple_child_attrs,
.ct_owner = THIS_MODULE,
};
static struct config_item *simple_children_make_item(struct config_group *group, const char *name)
{
struct simple_child *simple_child;
simple_child = kmalloc(sizeof(struct simple_child), GFP_KERNEL);
if (!simple_child)
return NULL;
memset(simple_child, 0, sizeof(struct simple_child));
config_item_init_type_name(&simple_child->item, name,
&simple_child_type);
simple_child->storeme = 0;
return &simple_child->item;
}
static struct configfs_attribute simple_children_attr_description = {
.ca_owner = THIS_MODULE,
.ca_name = "description",
.ca_mode = S_IRUGO,
};
static struct configfs_attribute *simple_children_attrs[] = {
&simple_children_attr_description,
NULL,
};
static ssize_t simple_children_attr_show(struct config_item *item,
struct configfs_attribute *attr,
char *page)
{
return sprintf(page,
"[02-simple-children]\n"
"\n"
"This subsystem allows the creation of child config_items. These\n"
"items have only one attribute that is readable and writeable.\n");
}
static struct configfs_item_operations simple_children_item_ops = {
.show_attribute = simple_children_attr_show,
};
/*
* Note that, since no extra work is required on ->drop_item(),
* no ->drop_item() is provided.
*/
static struct configfs_group_operations simple_children_group_ops = {
.make_item = simple_children_make_item,
};
static struct config_item_type simple_children_type = {
.ct_item_ops = &simple_children_item_ops,
.ct_group_ops = &simple_children_group_ops,
.ct_attrs = simple_children_attrs,
};
static struct configfs_subsystem simple_children_subsys = {
.su_group = {
.cg_item = {
.ci_namebuf = "02-simple-children",
.ci_type = &simple_children_type,
},
},
};
/* ----------------------------------------------------------------- */
/*
* 03-group-children
*
* This example reuses the simple_children group from above. However,
* the simple_children group is not the subsystem itself, it is a
* child of the subsystem. Creation of a group in the subsystem creates
* a new simple_children group. That group can then have simple_child
* children of its own.
*/
struct simple_children {
struct config_group group;
};
static struct config_group *group_children_make_group(struct config_group *group, const char *name)
{
struct simple_children *simple_children;
simple_children = kmalloc(sizeof(struct simple_children),
GFP_KERNEL);
if (!simple_children)
return NULL;
memset(simple_children, 0, sizeof(struct simple_children));
config_group_init_type_name(&simple_children->group, name,
&simple_children_type);
return &simple_children->group;
}
static struct configfs_attribute group_children_attr_description = {
.ca_owner = THIS_MODULE,
.ca_name = "description",
.ca_mode = S_IRUGO,
};
static struct configfs_attribute *group_children_attrs[] = {
&group_children_attr_description,
NULL,
};
static ssize_t group_children_attr_show(struct config_item *item,
struct configfs_attribute *attr,
char *page)
{
return sprintf(page,
"[03-group-children]\n"
"\n"
"This subsystem allows the creation of child config_groups. These\n"
"groups are like the subsystem simple-children.\n");
}
static struct configfs_item_operations group_children_item_ops = {
.show_attribute = group_children_attr_show,
};
/*
* Note that, since no extra work is required on ->drop_item(),
* no ->drop_item() is provided.
*/
static struct configfs_group_operations group_children_group_ops = {
.make_group = group_children_make_group,
};
static struct config_item_type group_children_type = {
.ct_item_ops = &group_children_item_ops,
.ct_group_ops = &group_children_group_ops,
.ct_attrs = group_children_attrs,
};
static struct configfs_subsystem group_children_subsys = {
.su_group = {
.cg_item = {
.ci_namebuf = "03-group-children",
.ci_type = &group_children_type,
},
},
};
/* ----------------------------------------------------------------- */
/*
* We're now done with our subsystem definitions.
* For convenience in this module, here's a list of them all. It
* allows the init function to easily register them. Most modules
* will only have one subsystem, and will only call register_subsystem
* on it directly.
*/
static struct configfs_subsystem *example_subsys[] = {
&childless_subsys.subsys,
&simple_children_subsys,
&group_children_subsys,
NULL,
};
static int __init configfs_example_init(void)
{
int ret;
int i;
struct configfs_subsystem *subsys;
for (i = 0; example_subsys[i]; i++) {
subsys = example_subsys[i];
config_group_init(&subsys->su_group);
init_MUTEX(&subsys->su_sem);
ret = configfs_register_subsystem(subsys);
if (ret) {
printk(KERN_ERR "Error %d while registering subsystem %s\n",
ret,
subsys->su_group.cg_item.ci_namebuf);
goto out_unregister;
}
}
return 0;
out_unregister:
for (; i >= 0; i--) {
configfs_unregister_subsystem(example_subsys[i]);
}
return ret;
}
static void __exit configfs_example_exit(void)
{
int i;
for (i = 0; example_subsys[i]; i++) {
configfs_unregister_subsystem(example_subsys[i]);
}
}
module_init(configfs_example_init);
module_exit(configfs_example_exit);
MODULE_LICENSE("GPL");
dlmfs
==================
A minimal DLM userspace interface implemented via a virtual file
system.
dlmfs is built with OCFS2 as it requires most of its infrastructure.
Project web page: http://oss.oracle.com/projects/ocfs2
Tools web page: http://oss.oracle.com/projects/ocfs2-tools
OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
All code copyright 2005 Oracle except when otherwise noted.
CREDITS
=======
Some code taken from ramfs which is Copyright (C) 2000 Linus Torvalds
and Transmeta Corp.
Mark Fasheh <mark.fasheh@oracle.com>
Caveats
=======
- Right now it only works with the OCFS2 DLM, though support for other
DLM implementations should not be a major issue.
Mount options
=============
None
Usage
=====
If you're just interested in OCFS2, then please see ocfs2.txt. The
rest of this document will be geared towards those who want to use
dlmfs for easy to setup and easy to use clustered locking in
userspace.
Setup
=====
dlmfs requires that the OCFS2 cluster infrastructure be in
place. Please download ocfs2-tools from the above url and configure a
cluster.
You'll want to start heartbeating on a volume which all the nodes in
your lockspace can access. The easiest way to do this is via
ocfs2_hb_ctl (distributed with ocfs2-tools). Right now it requires
that an OCFS2 file system be in place so that it can automatically
find it's heartbeat area, though it will eventually support heartbeat
against raw disks.
Please see the ocfs2_hb_ctl and mkfs.ocfs2 manual pages distributed
with ocfs2-tools.
Once you're heartbeating, DLM lock 'domains' can be easily created /
destroyed and locks within them accessed.
Locking
=======
Users may access dlmfs via standard file system calls, or they can use
'libo2dlm' (distributed with ocfs2-tools) which abstracts the file
system calls and presents a more traditional locking api.
dlmfs handles lock caching automatically for the user, so a lock
request for an already acquired lock will not generate another DLM
call. Userspace programs are assumed to handle their own local
locking.
Two levels of locks are supported - Shared Read, and Exlcusive.
Also supported is a Trylock operation.
For information on the libo2dlm interface, please see o2dlm.h,
distributed with ocfs2-tools.
Lock value blocks can be read and written to a resource via read(2)
and write(2) against the fd obtained via your open(2) call. The
maximum currently supported LVB length is 64 bytes (though that is an
OCFS2 DLM limitation). Through this mechanism, users of dlmfs can share
small amounts of data amongst their nodes.
mkdir(2) signals dlmfs to join a domain (which will have the same name
as the resulting directory)
rmdir(2) signals dlmfs to leave the domain
Locks for a given domain are represented by regular inodes inside the
domain directory. Locking against them is done via the open(2) system
call.
The open(2) call will not return until your lock has been granted or
an error has occurred, unless it has been instructed to do a trylock
operation. If the lock succeeds, you'll get an fd.
open(2) with O_CREAT to ensure the resource inode is created - dlmfs does
not automatically create inodes for existing lock resources.
Open Flag Lock Request Type
--------- -----------------
O_RDONLY Shared Read
O_RDWR Exclusive
Open Flag Resulting Locking Behavior
--------- --------------------------
O_NONBLOCK Trylock operation
You must provide exactly one of O_RDONLY or O_RDWR.
If O_NONBLOCK is also provided and the trylock operation was valid but
could not lock the resource then open(2) will return ETXTBUSY.
close(2) drops the lock associated with your fd.
Modes passed to mkdir(2) or open(2) are adhered to locally. Chown is
supported locally as well. This means you can use them to restrict
access to the resources via dlmfs on your local node only.
The resource LVB may be read from the fd in either Shared Read or
Exclusive modes via the read(2) system call. It can be written via
write(2) only when open in Exclusive mode.
Once written, an LVB will be visible to other nodes who obtain Read
Only or higher level locks on the resource.
See Also
========
http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf
For more information on the VMS distributed locking API.
OCFS2 filesystem
==================
OCFS2 is a general purpose extent based shared disk cluster file
system with many similarities to ext3. It supports 64 bit inode
numbers, and has automatically extending metadata groups which may
also make it attractive for non-clustered use.
You'll want to install the ocfs2-tools package in order to at least
get "mount.ocfs2" and "ocfs2_hb_ctl".
Project web page: http://oss.oracle.com/projects/ocfs2
Tools web page: http://oss.oracle.com/projects/ocfs2-tools
OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
All code copyright 2005 Oracle except when otherwise noted.
CREDITS:
Lots of code taken from ext3 and other projects.
Authors in alphabetical order:
Joel Becker <joel.becker@oracle.com>
Zach Brown <zach.brown@oracle.com>
Mark Fasheh <mark.fasheh@oracle.com>
Kurt Hackel <kurt.hackel@oracle.com>
Sunil Mushran <sunil.mushran@oracle.com>
Manish Singh <manish.singh@oracle.com>
Caveats
=======
Features which OCFS2 does not support yet:
- sparse files
- extended attributes
- shared writeable mmap
- loopback is supported, but data written will not
be cluster coherent.
- quotas
- cluster aware flock
- Directory change notification (F_NOTIFY)
- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
- POSIX ACLs
- readpages / writepages (not user visible)
Mount options
=============
OCFS2 supports the following mount options:
(*) == default
barrier=1 This enables/disables barriers. barrier=0 disables it,
barrier=1 enables it.
errors=remount-ro(*) Remount the filesystem read-only on an error.
errors=panic Panic and halt the machine if an error occurs.
intr (*) Allow signals to interrupt cluster operations.
nointr Do not allow signals to interrupt cluster
operations.
......@@ -554,6 +554,11 @@ W: http://us1.samba.org/samba/Linux_CIFS_client.html
T: git kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6.git
S: Supported
CONFIGFS
P: Joel Becker
M: Joel Becker <joel.becker@oracle.com>
S: Supported
CIRRUS LOGIC GENERIC FBDEV DRIVER
P: Jeff Garzik
M: jgarzik@pobox.com
......@@ -1898,6 +1903,15 @@ M: ajoshi@shell.unixbox.com
L: linux-nvidia@lists.surfsouth.com
S: Maintained
ORACLE CLUSTER FILESYSTEM 2 (OCFS2)
P: Mark Fasheh
M: mark.fasheh@oracle.com
P: Kurt Hackel
M: kurt.hackel@oracle.com
L: ocfs2-devel@oss.oracle.com
W: http://oss.oracle.com/projects/ocfs2/
S: Supported
OLYMPIC NETWORK DRIVER
P: Peter De Shrijver
M: p2@ace.ulyssis.student.kuleuven.ac.be
......
......@@ -213,7 +213,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec,
struct address_space_operations *aops = mapping->a_ops;
pgoff_t index;
unsigned offset, bv_offs;
int len, ret = 0;
int len, ret;
down(&mapping->host->i_sem);
index = pos >> PAGE_CACHE_SHIFT;
......@@ -232,9 +232,15 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec,
page = grab_cache_page(mapping, index);
if (unlikely(!page))
goto fail;
if (unlikely(aops->prepare_write(file, page, offset,
offset + size)))
ret = aops->prepare_write(file, page, offset,
offset + size);
if (unlikely(ret)) {
if (ret == AOP_TRUNCATED_PAGE) {
page_cache_release(page);
continue;
}
goto unlock;
}
transfer_result = lo_do_transfer(lo, WRITE, page, offset,
bvec->bv_page, bv_offs, size, IV);
if (unlikely(transfer_result)) {
......@@ -251,9 +257,15 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec,
kunmap_atomic(kaddr, KM_USER0);
}
flush_dcache_page(page);
if (unlikely(aops->commit_write(file, page, offset,
offset + size)))
ret = aops->commit_write(file, page, offset,
offset + size);
if (unlikely(ret)) {
if (ret == AOP_TRUNCATED_PAGE) {
page_cache_release(page);
continue;
}
goto unlock;
}
if (unlikely(transfer_result))
goto unlock;
bv_offs += size;
......@@ -264,6 +276,7 @@ static int do_lo_send_aops(struct loop_device *lo, struct bio_vec *bvec,
unlock_page(page);
page_cache_release(page);
}
ret = 0;
out:
up(&mapping->host->i_sem);
return ret;
......
......@@ -154,7 +154,7 @@ static int ramdisk_commit_write(struct file *file, struct page *page,
/*
* ->writepage to the the blockdev's mapping has to redirty the page so that the
* VM doesn't go and steal it. We return WRITEPAGE_ACTIVATE so that the VM
* VM doesn't go and steal it. We return AOP_WRITEPAGE_ACTIVATE so that the VM
* won't try to (pointlessly) write the page again for a while.
*
* Really, these pages should not be on the LRU at all.
......@@ -165,7 +165,7 @@ static int ramdisk_writepage(struct page *page, struct writeback_control *wbc)
make_page_uptodate(page);
SetPageDirty(page);
if (wbc->for_reclaim)
return WRITEPAGE_ACTIVATE;
return AOP_WRITEPAGE_ACTIVATE;
unlock_page(page);
return 0;
}
......
......@@ -70,6 +70,7 @@ config FS_XIP
config EXT3_FS
tristate "Ext3 journalling file system support"
select JBD
help
This is the journaling version of the Second extended file system
(often called ext3), the de facto standard Linux file system
......@@ -138,23 +139,20 @@ config EXT3_FS_SECURITY
extended attributes for file security labels, say N.
config JBD
# CONFIG_JBD could be its own option (even modular), but until there are
# other users than ext3, we will simply make it be the same as CONFIG_EXT3_FS
# dep_tristate ' Journal Block Device support (JBD for ext3)' CONFIG_JBD $CONFIG_EXT3_FS
tristate
default EXT3_FS
help
This is a generic journaling layer for block devices. It is
currently used by the ext3 file system, but it could also be used to
add journal support to other file systems or block devices such as
RAID or LVM.
currently used by the ext3 and OCFS2 file systems, but it could
also be used to add journal support to other file systems or block
devices such as RAID or LVM.
If you are using the ext3 file system, you need to say Y here. If
you are not using ext3 then you will probably want to say N.
If you are using the ext3 or OCFS2 file systems, you need to
say Y here. If you are not using ext3 OCFS2 then you will probably
want to say N.
To compile this device as a module, choose M here: the module will be
called jbd. If you are compiling ext3 into the kernel, you cannot
compile this code as a module.
called jbd. If you are compiling ext3 or OCFS2 into the kernel,
you cannot compile this code as a module.
config JBD_DEBUG
bool "JBD (ext3) debugging support"
......@@ -326,6 +324,38 @@ config FS_POSIX_ACL
source "fs/xfs/Kconfig"
config OCFS2_FS
tristate "OCFS2 file system support (EXPERIMENTAL)"
depends on NET && EXPERIMENTAL
select CONFIGFS_FS
select JBD
select CRC32
select INET
help
OCFS2 is a general purpose extent based shared disk cluster file
system with many similarities to ext3. It supports 64 bit inode
numbers, and has automatically extending metadata groups which may
also make it attractive for non-clustered use.
You'll want to install the ocfs2-tools package in order to at least
get "mount.ocfs2".
Project web page: http://oss.oracle.com/projects/ocfs2
Tools web page: http://oss.oracle.com/projects/ocfs2-tools
OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
Note: Features which OCFS2 does not support yet:
- extended attributes
- shared writeable mmap
- loopback is supported, but data written will not
be cluster coherent.
- quotas
- cluster aware flock
- Directory change notification (F_NOTIFY)
- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
- POSIX ACLs
- readpages / writepages (not user visible)
config MINIX_FS
tristate "Minix fs support"
help
......@@ -841,6 +871,20 @@ config RELAYFS_FS
If unsure, say N.
config CONFIGFS_FS
tristate "Userspace-driven configuration filesystem (EXPERIMENTAL)"
depends on EXPERIMENTAL
help
configfs is a ram-based filesystem that provides the converse
of sysfs's functionality. Where sysfs is a filesystem-based
view of kernel objects, configfs is a filesystem-based manager
of kernel objects, or config_items.
Both sysfs and configfs can and should exist together on the
same system. One is not a replacement for the other.
If unsure, say N.
endmenu
menu "Miscellaneous filesystems"
......
......@@ -101,3 +101,5 @@ obj-$(CONFIG_BEFS_FS) += befs/
obj-$(CONFIG_HOSTFS) += hostfs/
obj-$(CONFIG_HPPFS) += hppfs/
obj-$(CONFIG_DEBUG_FS) += debugfs/
obj-$(CONFIG_CONFIGFS_FS) += configfs/
obj-$(CONFIG_OCFS2_FS) += ocfs2/
#
# Makefile for the configfs virtual filesystem
#
obj-$(CONFIG_CONFIGFS_FS) += configfs.o
configfs-objs := inode.o file.o dir.o symlink.o mount.o item.o
/* -*- mode: c; c-basic-offset:8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* configfs_internal.h - Internal stuff for configfs
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*
* Based on sysfs:
* sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
*
* configfs Copyright (C) 2005 Oracle. All rights reserved.
*/
#include <linux/slab.h>
#include <linux/list.h>
struct configfs_dirent {
atomic_t s_count;
struct list_head s_sibling;
struct list_head s_children;
struct list_head s_links;
void * s_element;
int s_type;
umode_t s_mode;
struct dentry * s_dentry;
};
#define CONFIGFS_ROOT 0x0001
#define CONFIGFS_DIR 0x0002
#define CONFIGFS_ITEM_ATTR 0x0004
#define CONFIGFS_ITEM_LINK 0x0020
#define CONFIGFS_USET_DIR 0x0040
#define CONFIGFS_USET_DEFAULT 0x0080
#define CONFIGFS_USET_DROPPING 0x0100
#define CONFIGFS_NOT_PINNED (CONFIGFS_ITEM_ATTR)
extern struct vfsmount * configfs_mount;
extern int configfs_is_root(struct config_item *item);
extern struct inode * configfs_new_inode(mode_t mode);
extern int configfs_create(struct dentry *, int mode, int (*init)(struct inode *));
extern int configfs_create_file(struct config_item *, const struct configfs_attribute *);
extern int configfs_make_dirent(struct configfs_dirent *,
struct dentry *, void *, umode_t, int);
extern int configfs_add_file(struct dentry *, const struct configfs_attribute *, int);
extern void configfs_hash_and_remove(struct dentry * dir, const char * name);
extern const unsigned char * configfs_get_name(struct configfs_dirent *sd);
extern void configfs_drop_dentry(struct configfs_dirent *sd, struct dentry *parent);
extern int configfs_pin_fs(void);
extern void configfs_release_fs(void);
extern struct rw_semaphore configfs_rename_sem;
extern struct super_block * configfs_sb;
extern struct file_operations configfs_dir_operations;
extern struct file_operations configfs_file_operations;
extern struct file_operations bin_fops;
extern struct inode_operations configfs_dir_inode_operations;
extern struct inode_operations configfs_symlink_inode_operations;
extern int configfs_symlink(struct inode *dir, struct dentry *dentry,
const char *symname);
extern int configfs_unlink(struct inode *dir, struct dentry *dentry);
struct configfs_symlink {
struct list_head sl_list;
struct config_item *sl_target;
};
extern int configfs_create_link(struct configfs_symlink *sl,
struct dentry *parent,
struct dentry *dentry);
static inline struct config_item * to_item(struct dentry * dentry)
{
struct configfs_dirent * sd = dentry->d_fsdata;
return ((struct config_item *) sd->s_element);
}
static inline struct configfs_attribute * to_attr(struct dentry * dentry)
{
struct configfs_dirent * sd = dentry->d_fsdata;
return ((struct configfs_attribute *) sd->s_element);
}
static inline struct config_item *configfs_get_config_item(struct dentry *dentry)
{
struct config_item * item = NULL;
spin_lock(&dcache_lock);
if (!d_unhashed(dentry)) {
struct configfs_dirent * sd = dentry->d_fsdata;
if (sd->s_type & CONFIGFS_ITEM_LINK) {
struct configfs_symlink * sl = sd->s_element;
item = config_item_get(sl->sl_target);
} else
item = config_item_get(sd->s_element);
}
spin_unlock(&dcache_lock);
return item;
}
static inline void release_configfs_dirent(struct configfs_dirent * sd)
{
if (!(sd->s_type & CONFIGFS_ROOT))
kfree(sd);
}
static inline struct configfs_dirent * configfs_get(struct configfs_dirent * sd)
{
if (sd) {
WARN_ON(!atomic_read(&sd->s_count));
atomic_inc(&sd->s_count);
}
return sd;
}
static inline void configfs_put(struct configfs_dirent * sd)
{
WARN_ON(!atomic_read(&sd->s_count));
if (atomic_dec_and_test(&sd->s_count))
release_configfs_dirent(sd);
}
此差异已折叠。
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* file.c - operations for regular (text) files.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*
* Based on sysfs:
* sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
*
* configfs Copyright (C) 2005 Oracle. All rights reserved.
*/
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/dnotify.h>
#include <linux/slab.h>
#include <asm/uaccess.h>
#include <asm/semaphore.h>
#include <linux/configfs.h>
#include "configfs_internal.h"
struct configfs_buffer {
size_t count;
loff_t pos;
char * page;
struct configfs_item_operations * ops;
struct semaphore sem;
int needs_read_fill;
};
/**
* fill_read_buffer - allocate and fill buffer from item.
* @dentry: dentry pointer.
* @buffer: data buffer for file.
*
* Allocate @buffer->page, if it hasn't been already, then call the
* config_item's show() method to fill the buffer with this attribute's
* data.
* This is called only once, on the file's first read.
*/
static int fill_read_buffer(struct dentry * dentry, struct configfs_buffer * buffer)
{
struct configfs_attribute * attr = to_attr(dentry);
struct config_item * item = to_item(dentry->d_parent);
struct configfs_item_operations * ops = buffer->ops;
int ret = 0;
ssize_t count;
if (!buffer->page)
buffer->page = (char *) get_zeroed_page(GFP_KERNEL);
if (!buffer->page)
return -ENOMEM;
count = ops->show_attribute(item,attr,buffer->page);
buffer->needs_read_fill = 0;
BUG_ON(count > (ssize_t)PAGE_SIZE);
if (count >= 0)
buffer->count = count;
else
ret = count;
return ret;
}
/**
* flush_read_buffer - push buffer to userspace.
* @buffer: data buffer for file.
* @userbuf: user-passed buffer.
* @count: number of bytes requested.
* @ppos: file position.
*
* Copy the buffer we filled in fill_read_buffer() to userspace.
* This is done at the reader's leisure, copying and advancing
* the amount they specify each time.
* This may be called continuously until the buffer is empty.
*/
static int flush_read_buffer(struct configfs_buffer * buffer, char __user * buf,
size_t count, loff_t * ppos)
{
int error;
if (*ppos > buffer->count)
return 0;
if (count > (buffer->count - *ppos))
count = buffer->count - *ppos;
error = copy_to_user(buf,buffer->page + *ppos,count);
if (!error)
*ppos += count;
return error ? -EFAULT : count;
}
/**
* configfs_read_file - read an attribute.
* @file: file pointer.
* @buf: buffer to fill.
* @count: number of bytes to read.
* @ppos: starting offset in file.
*
* Userspace wants to read an attribute file. The attribute descriptor
* is in the file's ->d_fsdata. The target item is in the directory's
* ->d_fsdata.
*
* We call fill_read_buffer() to allocate and fill the buffer from the
* item's show() method exactly once (if the read is happening from
* the beginning of the file). That should fill the entire buffer with
* all the data the item has to offer for that attribute.
* We then call flush_read_buffer() to copy the buffer to userspace
* in the increments specified.
*/
static ssize_t
configfs_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos)
{
struct configfs_buffer * buffer = file->private_data;
ssize_t retval = 0;
down(&buffer->sem);
if (buffer->needs_read_fill) {
if ((retval = fill_read_buffer(file->f_dentry,buffer)))
goto out;
}
pr_debug("%s: count = %d, ppos = %lld, buf = %s\n",
__FUNCTION__,count,*ppos,buffer->page);
retval = flush_read_buffer(buffer,buf,count,ppos);
out:
up(&buffer->sem);
return retval;
}
/**
* fill_write_buffer - copy buffer from userspace.
* @buffer: data buffer for file.
* @userbuf: data from user.
* @count: number of bytes in @userbuf.
*
* Allocate @buffer->page if it hasn't been already, then
* copy the user-supplied buffer into it.
*/
static int
fill_write_buffer(struct configfs_buffer * buffer, const char __user * buf, size_t count)
{
int error;
if (!buffer->page)
buffer->page = (char *)get_zeroed_page(GFP_KERNEL);
if (!buffer->page)
return -ENOMEM;
if (count > PAGE_SIZE)
count = PAGE_SIZE;
error = copy_from_user(buffer->page,buf,count);
buffer->needs_read_fill = 1;
return error ? -EFAULT : count;
}
/**
* flush_write_buffer - push buffer to config_item.
* @file: file pointer.
* @buffer: data buffer for file.
*
* Get the correct pointers for the config_item and the attribute we're
* dealing with, then call the store() method for the attribute,
* passing the buffer that we acquired in fill_write_buffer().
*/
static int
flush_write_buffer(struct dentry * dentry, struct configfs_buffer * buffer, size_t count)
{
struct configfs_attribute * attr = to_attr(dentry);
struct config_item * item = to_item(dentry->d_parent);
struct configfs_item_operations * ops = buffer->ops;
return ops->store_attribute(item,attr,buffer->page,count);
}
/**
* configfs_write_file - write an attribute.
* @file: file pointer
* @buf: data to write
* @count: number of bytes
* @ppos: starting offset
*
* Similar to configfs_read_file(), though working in the opposite direction.
* We allocate and fill the data from the user in fill_write_buffer(),
* then push it to the config_item in flush_write_buffer().
* There is no easy way for us to know if userspace is only doing a partial
* write, so we don't support them. We expect the entire buffer to come
* on the first write.
* Hint: if you're writing a value, first read the file, modify only the
* the value you're changing, then write entire buffer back.
*/
static ssize_t
configfs_write_file(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
{
struct configfs_buffer * buffer = file->private_data;
down(&buffer->sem);
count = fill_write_buffer(buffer,buf,count);
if (count > 0)
count = flush_write_buffer(file->f_dentry,buffer,count);
if (count > 0)
*ppos += count;
up(&buffer->sem);
return count;
}
static int check_perm(struct inode * inode, struct file * file)
{
struct config_item *item = configfs_get_config_item(file->f_dentry->d_parent);
struct configfs_attribute * attr = to_attr(file->f_dentry);
struct configfs_buffer * buffer;
struct configfs_item_operations * ops = NULL;
int error = 0;
if (!item || !attr)
goto Einval;
/* Grab the module reference for this attribute if we have one */
if (!try_module_get(attr->ca_owner)) {
error = -ENODEV;
goto Done;
}
if (item->ci_type)
ops = item->ci_type->ct_item_ops;
else
goto Eaccess;
/* File needs write support.
* The inode's perms must say it's ok,
* and we must have a store method.
*/
if (file->f_mode & FMODE_WRITE) {
if (!(inode->i_mode & S_IWUGO) || !ops->store_attribute)
goto Eaccess;
}
/* File needs read support.
* The inode's perms must say it's ok, and we there
* must be a show method for it.
*/
if (file->f_mode & FMODE_READ) {
if (!(inode->i_mode & S_IRUGO) || !ops->show_attribute)
goto Eaccess;
}
/* No error? Great, allocate a buffer for the file, and store it
* it in file->private_data for easy access.
*/
buffer = kmalloc(sizeof(struct configfs_buffer),GFP_KERNEL);
if (buffer) {
memset(buffer,0,sizeof(struct configfs_buffer));
init_MUTEX(&buffer->sem);
buffer->needs_read_fill = 1;
buffer->ops = ops;
file->private_data = buffer;
} else
error = -ENOMEM;
goto Done;
Einval:
error = -EINVAL;
goto Done;
Eaccess:
error = -EACCES;
module_put(attr->ca_owner);
Done:
if (error && item)
config_item_put(item);
return error;
}
static int configfs_open_file(struct inode * inode, struct file * filp)
{
return check_perm(inode,filp);
}
static int configfs_release(struct inode * inode, struct file * filp)
{
struct config_item * item = to_item(filp->f_dentry->d_parent);
struct configfs_attribute * attr = to_attr(filp->f_dentry);
struct module * owner = attr->ca_owner;
struct configfs_buffer * buffer = filp->private_data;
if (item)
config_item_put(item);
/* After this point, attr should not be accessed. */
module_put(owner);
if (buffer) {
if (buffer->page)
free_page((unsigned long)buffer->page);
kfree(buffer);
}
return 0;
}
struct file_operations configfs_file_operations = {
.read = configfs_read_file,
.write = configfs_write_file,
.llseek = generic_file_llseek,
.open = configfs_open_file,
.release = configfs_release,
};
int configfs_add_file(struct dentry * dir, const struct configfs_attribute * attr, int type)
{
struct configfs_dirent * parent_sd = dir->d_fsdata;
umode_t mode = (attr->ca_mode & S_IALLUGO) | S_IFREG;
int error = 0;
down(&dir->d_inode->i_sem);
error = configfs_make_dirent(parent_sd, NULL, (void *) attr, mode, type);
up(&dir->d_inode->i_sem);
return error;
}
/**
* configfs_create_file - create an attribute file for an item.
* @item: item we're creating for.
* @attr: atrribute descriptor.
*/
int configfs_create_file(struct config_item * item, const struct configfs_attribute * attr)
{
BUG_ON(!item || !item->ci_dentry || !attr);
return configfs_add_file(item->ci_dentry, attr,
CONFIGFS_ITEM_ATTR);
}
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* inode.c - basic inode and dentry operations.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*
* Based on sysfs:
* sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
*
* configfs Copyright (C) 2005 Oracle. All rights reserved.
*
* Please see Documentation/filesystems/configfs.txt for more information.
*/
#undef DEBUG
#include <linux/pagemap.h>
#include <linux/namei.h>
#include <linux/backing-dev.h>
#include <linux/configfs.h>
#include "configfs_internal.h"
extern struct super_block * configfs_sb;
static struct address_space_operations configfs_aops = {
.readpage = simple_readpage,
.prepare_write = simple_prepare_write,
.commit_write = simple_commit_write
};
static struct backing_dev_info configfs_backing_dev_info = {
.ra_pages = 0, /* No readahead */
.capabilities = BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_WRITEBACK,
};
struct inode * configfs_new_inode(mode_t mode)
{
struct inode * inode = new_inode(configfs_sb);
if (inode) {
inode->i_mode = mode;
inode->i_uid = 0;
inode->i_gid = 0;
inode->i_blksize = PAGE_CACHE_SIZE;
inode->i_blocks = 0;
inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
inode->i_mapping->a_ops = &configfs_aops;
inode->i_mapping->backing_dev_info = &configfs_backing_dev_info;
}
return inode;
}
int configfs_create(struct dentry * dentry, int mode, int (*init)(struct inode *))
{
int error = 0;
struct inode * inode = NULL;
if (dentry) {
if (!dentry->d_inode) {
if ((inode = configfs_new_inode(mode))) {
if (dentry->d_parent && dentry->d_parent->d_inode) {
struct inode *p_inode = dentry->d_parent->d_inode;
p_inode->i_mtime = p_inode->i_ctime = CURRENT_TIME;
}
goto Proceed;
}
else
error = -ENOMEM;
} else
error = -EEXIST;
} else
error = -ENOENT;
goto Done;
Proceed:
if (init)
error = init(inode);
if (!error) {
d_instantiate(dentry, inode);
if (S_ISDIR(mode) || S_ISLNK(mode))
dget(dentry); /* pin link and directory dentries in core */
} else
iput(inode);
Done:
return error;
}
/*
* Get the name for corresponding element represented by the given configfs_dirent
*/
const unsigned char * configfs_get_name(struct configfs_dirent *sd)
{
struct attribute * attr;
if (!sd || !sd->s_element)
BUG();
/* These always have a dentry, so use that */
if (sd->s_type & (CONFIGFS_DIR | CONFIGFS_ITEM_LINK))
return sd->s_dentry->d_name.name;
if (sd->s_type & CONFIGFS_ITEM_ATTR) {
attr = sd->s_element;
return attr->name;
}
return NULL;
}
/*
* Unhashes the dentry corresponding to given configfs_dirent
* Called with parent inode's i_sem held.
*/
void configfs_drop_dentry(struct configfs_dirent * sd, struct dentry * parent)
{
struct dentry * dentry = sd->s_dentry;
if (dentry) {
spin_lock(&dcache_lock);
if (!(d_unhashed(dentry) && dentry->d_inode)) {
dget_locked(dentry);
__d_drop(dentry);
spin_unlock(&dcache_lock);
simple_unlink(parent->d_inode, dentry);
} else
spin_unlock(&dcache_lock);
}
}
void configfs_hash_and_remove(struct dentry * dir, const char * name)
{
struct configfs_dirent * sd;
struct configfs_dirent * parent_sd = dir->d_fsdata;
down(&dir->d_inode->i_sem);
list_for_each_entry(sd, &parent_sd->s_children, s_sibling) {
if (!sd->s_element)
continue;
if (!strcmp(configfs_get_name(sd), name)) {
list_del_init(&sd->s_sibling);
configfs_drop_dentry(sd, dir);
configfs_put(sd);
break;
}
}
up(&dir->d_inode->i_sem);
}
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* item.c - library routines for handling generic config items
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*
* Based on kobject:
* kobject is Copyright (c) 2002-2003 Patrick Mochel
*
* configfs Copyright (C) 2005 Oracle. All rights reserved.
*
* Please see the file Documentation/filesystems/configfs.txt for
* critical information about using the config_item interface.
*/
#include <linux/string.h>
#include <linux/module.h>
#include <linux/stat.h>
#include <linux/slab.h>
#include <linux/configfs.h>
static inline struct config_item * to_item(struct list_head * entry)
{
return container_of(entry,struct config_item,ci_entry);
}
/* Evil kernel */
static void config_item_release(struct kref *kref);
/**
* config_item_init - initialize item.
* @item: item in question.
*/
void config_item_init(struct config_item * item)
{
kref_init(&item->ci_kref);
INIT_LIST_HEAD(&item->ci_entry);
}
/**
* config_item_set_name - Set the name of an item
* @item: item.
* @name: name.
*
* If strlen(name) >= CONFIGFS_ITEM_NAME_LEN, then use a
* dynamically allocated string that @item->ci_name points to.
* Otherwise, use the static @item->ci_namebuf array.
*/
int config_item_set_name(struct config_item * item, const char * fmt, ...)
{
int error = 0;
int limit = CONFIGFS_ITEM_NAME_LEN;
int need;
va_list args;
char * name;
/*
* First, try the static array
*/
va_start(args,fmt);
need = vsnprintf(item->ci_namebuf,limit,fmt,args);
va_end(args);
if (need < limit)
name = item->ci_namebuf;
else {
/*
* Need more space? Allocate it and try again
*/
limit = need + 1;
name = kmalloc(limit,GFP_KERNEL);
if (!name) {
error = -ENOMEM;
goto Done;
}
va_start(args,fmt);
need = vsnprintf(name,limit,fmt,args);
va_end(args);
/* Still? Give up. */
if (need >= limit) {
kfree(name);
error = -EFAULT;
goto Done;
}
}
/* Free the old name, if necessary. */
if (item->ci_name && item->ci_name != item->ci_namebuf)
kfree(item->ci_name);
/* Now, set the new name */
item->ci_name = name;
Done:
return error;
}
EXPORT_SYMBOL(config_item_set_name);
void config_item_init_type_name(struct config_item *item,
const char *name,
struct config_item_type *type)
{
config_item_set_name(item, name);
item->ci_type = type;
config_item_init(item);
}
EXPORT_SYMBOL(config_item_init_type_name);
void config_group_init_type_name(struct config_group *group, const char *name,
struct config_item_type *type)
{
config_item_set_name(&group->cg_item, name);
group->cg_item.ci_type = type;
config_group_init(group);
}
EXPORT_SYMBOL(config_group_init_type_name);
struct config_item * config_item_get(struct config_item * item)
{
if (item)
kref_get(&item->ci_kref);
return item;
}
/**
* config_item_cleanup - free config_item resources.
* @item: item.
*/
void config_item_cleanup(struct config_item * item)
{
struct config_item_type * t = item->ci_type;
struct config_group * s = item->ci_group;
struct config_item * parent = item->ci_parent;
pr_debug("config_item %s: cleaning up\n",config_item_name(item));
if (item->ci_name != item->ci_namebuf)
kfree(item->ci_name);
item->ci_name = NULL;
if (t && t->ct_item_ops && t->ct_item_ops->release)
t->ct_item_ops->release(item);
if (s)
config_group_put(s);
if (parent)
config_item_put(parent);
}
static void config_item_release(struct kref *kref)
{
config_item_cleanup(container_of(kref, struct config_item, ci_kref));
}
/**
* config_item_put - decrement refcount for item.
* @item: item.
*
* Decrement the refcount, and if 0, call config_item_cleanup().
*/
void config_item_put(struct config_item * item)
{
if (item)
kref_put(&item->ci_kref, config_item_release);
}
/**
* config_group_init - initialize a group for use
* @k: group
*/
void config_group_init(struct config_group *group)
{
config_item_init(&group->cg_item);
INIT_LIST_HEAD(&group->cg_children);
}
/**
* config_group_find_obj - search for item in group.
* @group: group we're looking in.
* @name: item's name.
*
* Lock group via @group->cg_subsys, and iterate over @group->cg_list,
* looking for a matching config_item. If matching item is found
* take a reference and return the item.
*/
struct config_item * config_group_find_obj(struct config_group * group, const char * name)
{
struct list_head * entry;
struct config_item * ret = NULL;
/* XXX LOCKING! */
list_for_each(entry,&group->cg_children) {
struct config_item * item = to_item(entry);
if (config_item_name(item) &&
!strcmp(config_item_name(item), name)) {
ret = config_item_get(item);
break;
}
}
return ret;
}
EXPORT_SYMBOL(config_item_init);
EXPORT_SYMBOL(config_group_init);
EXPORT_SYMBOL(config_item_get);
EXPORT_SYMBOL(config_item_put);
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* mount.c - operations for initializing and mounting configfs.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*
* Based on sysfs:
* sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
*
* configfs Copyright (C) 2005 Oracle. All rights reserved.
*/
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/mount.h>
#include <linux/pagemap.h>
#include <linux/init.h>
#include <linux/configfs.h>
#include "configfs_internal.h"
/* Random magic number */
#define CONFIGFS_MAGIC 0x62656570
struct vfsmount * configfs_mount = NULL;
struct super_block * configfs_sb = NULL;
static int configfs_mnt_count = 0;
static struct super_operations configfs_ops = {
.statfs = simple_statfs,
.drop_inode = generic_delete_inode,
};
static struct config_group configfs_root_group = {
.cg_item = {
.ci_namebuf = "root",
.ci_name = configfs_root_group.cg_item.ci_namebuf,
},
};
int configfs_is_root(struct config_item *item)
{
return item == &configfs_root_group.cg_item;
}
static struct configfs_dirent configfs_root = {
.s_sibling = LIST_HEAD_INIT(configfs_root.s_sibling),
.s_children = LIST_HEAD_INIT(configfs_root.s_children),
.s_element = &configfs_root_group.cg_item,
.s_type = CONFIGFS_ROOT,
};
static int configfs_fill_super(struct super_block *sb, void *data, int silent)
{
struct inode *inode;
struct dentry *root;
sb->s_blocksize = PAGE_CACHE_SIZE;
sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
sb->s_magic = CONFIGFS_MAGIC;
sb->s_op = &configfs_ops;
configfs_sb = sb;
inode = configfs_new_inode(S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO);
if (inode) {
inode->i_op = &configfs_dir_inode_operations;
inode->i_fop = &configfs_dir_operations;
/* directory inodes start off with i_nlink == 2 (for "." entry) */
inode->i_nlink++;
} else {
pr_debug("configfs: could not get root inode\n");
return -ENOMEM;
}
root = d_alloc_root(inode);
if (!root) {
pr_debug("%s: could not get root dentry!\n",__FUNCTION__);
iput(inode);
return -ENOMEM;
}
config_group_init(&configfs_root_group);
configfs_root_group.cg_item.ci_dentry = root;
root->d_fsdata = &configfs_root;
sb->s_root = root;
return 0;
}
static struct super_block *configfs_get_sb(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
return get_sb_single(fs_type, flags, data, configfs_fill_super);
}
static struct file_system_type configfs_fs_type = {
.owner = THIS_MODULE,
.name = "configfs",
.get_sb = configfs_get_sb,
.kill_sb = kill_litter_super,
};
int configfs_pin_fs(void)
{
return simple_pin_fs("configfs", &configfs_mount,
&configfs_mnt_count);
}
void configfs_release_fs(void)
{
simple_release_fs(&configfs_mount, &configfs_mnt_count);
}
static decl_subsys(config, NULL, NULL);
static int __init configfs_init(void)
{
int err;
kset_set_kset_s(&config_subsys, kernel_subsys);
err = subsystem_register(&config_subsys);
if (err)
return err;
err = register_filesystem(&configfs_fs_type);
if (err) {
printk(KERN_ERR "configfs: Unable to register filesystem!\n");
subsystem_unregister(&config_subsys);
}
return err;
}
static void __exit configfs_exit(void)
{
unregister_filesystem(&configfs_fs_type);
subsystem_unregister(&config_subsys);
}
MODULE_AUTHOR("Oracle");
MODULE_LICENSE("GPL");
MODULE_VERSION("0.0.1");
MODULE_DESCRIPTION("Simple RAM filesystem for user driven kernel subsystem configuration.");
module_init(configfs_init);
module_exit(configfs_exit);
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* symlink.c - operations for configfs symlinks.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*
* Based on sysfs:
* sysfs is Copyright (C) 2001, 2002, 2003 Patrick Mochel
*
* configfs Copyright (C) 2005 Oracle. All rights reserved.
*/
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/namei.h>
#include <linux/configfs.h>
#include "configfs_internal.h"
static int item_depth(struct config_item * item)
{
struct config_item * p = item;
int depth = 0;
do { depth++; } while ((p = p->ci_parent) && !configfs_is_root(p));
return depth;
}
static int item_path_length(struct config_item * item)
{
struct config_item * p = item;
int length = 1;
do {
length += strlen(config_item_name(p)) + 1;
p = p->ci_parent;
} while (p && !configfs_is_root(p));
return length;
}
static void fill_item_path(struct config_item * item, char * buffer, int length)
{
struct config_item * p;
--length;
for (p = item; p && !configfs_is_root(p); p = p->ci_parent) {
int cur = strlen(config_item_name(p));
/* back up enough to print this bus id with '/' */
length -= cur;
strncpy(buffer + length,config_item_name(p),cur);
*(buffer + --length) = '/';
}
}
static int create_link(struct config_item *parent_item,
struct config_item *item,
struct dentry *dentry)
{
struct configfs_dirent *target_sd = item->ci_dentry->d_fsdata;
struct configfs_symlink *sl;
int ret;
ret = -ENOMEM;
sl = kmalloc(sizeof(struct configfs_symlink), GFP_KERNEL);
if (sl) {
sl->sl_target = config_item_get(item);
/* FIXME: needs a lock, I'd bet */
list_add(&sl->sl_list, &target_sd->s_links);
ret = configfs_create_link(sl, parent_item->ci_dentry,
dentry);
if (ret) {
list_del_init(&sl->sl_list);
config_item_put(item);
kfree(sl);
}
}
return ret;
}
static int get_target(const char *symname, struct nameidata *nd,
struct config_item **target)
{
int ret;
ret = path_lookup(symname, LOOKUP_FOLLOW|LOOKUP_DIRECTORY, nd);
if (!ret) {
if (nd->dentry->d_sb == configfs_sb) {
*target = configfs_get_config_item(nd->dentry);
if (!*target) {
ret = -ENOENT;
path_release(nd);
}
} else
ret = -EPERM;
}
return ret;
}
int configfs_symlink(struct inode *dir, struct dentry *dentry, const char *symname)
{
int ret;
struct nameidata nd;
struct config_item *parent_item;
struct config_item *target_item;
struct config_item_type *type;
ret = -EPERM; /* What lack-of-symlink returns */
if (dentry->d_parent == configfs_sb->s_root)
goto out;
parent_item = configfs_get_config_item(dentry->d_parent);
type = parent_item->ci_type;
if (!type || !type->ct_item_ops ||
!type->ct_item_ops->allow_link)
goto out_put;
ret = get_target(symname, &nd, &target_item);
if (ret)
goto out_put;
ret = type->ct_item_ops->allow_link(parent_item, target_item);
if (!ret)
ret = create_link(parent_item, target_item, dentry);
config_item_put(target_item);
path_release(&nd);
out_put:
config_item_put(parent_item);
out:
return ret;
}
int configfs_unlink(struct inode *dir, struct dentry *dentry)
{
struct configfs_dirent *sd = dentry->d_fsdata;
struct configfs_symlink *sl;
struct config_item *parent_item;
struct config_item_type *type;
int ret;
ret = -EPERM; /* What lack-of-symlink returns */
if (!(sd->s_type & CONFIGFS_ITEM_LINK))
goto out;
if (dentry->d_parent == configfs_sb->s_root)
BUG();
sl = sd->s_element;
parent_item = configfs_get_config_item(dentry->d_parent);
type = parent_item->ci_type;
list_del_init(&sd->s_sibling);
configfs_drop_dentry(sd, dentry->d_parent);
dput(dentry);
configfs_put(sd);
/*
* drop_link() must be called before
* list_del_init(&sl->sl_list), so that the order of
* drop_link(this, target) and drop_item(target) is preserved.
*/
if (type && type->ct_item_ops &&
type->ct_item_ops->drop_link)
type->ct_item_ops->drop_link(parent_item,
sl->sl_target);
/* FIXME: Needs lock */
list_del_init(&sl->sl_list);
/* Put reference from create_link() */
config_item_put(sl->sl_target);
kfree(sl);
config_item_put(parent_item);
ret = 0;
out:
return ret;
}
static int configfs_get_target_path(struct config_item * item, struct config_item * target,
char *path)
{
char * s;
int depth, size;
depth = item_depth(item);
size = item_path_length(target) + depth * 3 - 1;
if (size > PATH_MAX)
return -ENAMETOOLONG;
pr_debug("%s: depth = %d, size = %d\n", __FUNCTION__, depth, size);
for (s = path; depth--; s += 3)
strcpy(s,"../");
fill_item_path(target, path, size);
pr_debug("%s: path = '%s'\n", __FUNCTION__, path);
return 0;
}
static int configfs_getlink(struct dentry *dentry, char * path)
{
struct config_item *item, *target_item;
int error = 0;
item = configfs_get_config_item(dentry->d_parent);
if (!item)
return -EINVAL;
target_item = configfs_get_config_item(dentry);
if (!target_item) {
config_item_put(item);
return -EINVAL;
}
down_read(&configfs_rename_sem);
error = configfs_get_target_path(item, target_item, path);
up_read(&configfs_rename_sem);
config_item_put(item);
config_item_put(target_item);
return error;
}
static void *configfs_follow_link(struct dentry *dentry, struct nameidata *nd)
{
int error = -ENOMEM;
unsigned long page = get_zeroed_page(GFP_KERNEL);
if (page) {
error = configfs_getlink(dentry, (char *)page);
if (!error) {
nd_set_link(nd, (char *)page);
return (void *)page;
}
}
nd_set_link(nd, ERR_PTR(error));
return NULL;
}
static void configfs_put_link(struct dentry *dentry, struct nameidata *nd,
void *cookie)
{
if (cookie) {
unsigned long page = (unsigned long)cookie;
free_page(page);
}
}
struct inode_operations configfs_symlink_inode_operations = {
.follow_link = configfs_follow_link,
.readlink = generic_readlink,
.put_link = configfs_put_link,
};
......@@ -721,7 +721,7 @@ mpage_writepages(struct address_space *mapping,
&last_block_in_bio, &ret, wbc,
page->mapping->a_ops->writepage);
}
if (unlikely(ret == WRITEPAGE_ACTIVATE))
if (unlikely(ret == AOP_WRITEPAGE_ACTIVATE))
unlock_page(page);
if (ret || (--(wbc->nr_to_write) <= 0))
done = 1;
......
EXTRA_CFLAGS += -Ifs/ocfs2
EXTRA_CFLAGS += -DCATCH_BH_JBD_RACES
obj-$(CONFIG_OCFS2_FS) += ocfs2.o
ocfs2-objs := \
alloc.o \
aops.o \
buffer_head_io.o \
dcache.o \
dir.o \
dlmglue.o \
export.o \
extent_map.o \
file.o \
heartbeat.o \
inode.o \
journal.o \
localalloc.o \
mmap.o \
namei.o \
slot_map.o \
suballoc.o \
super.o \
symlink.o \
sysfile.o \
uptodate.o \
ver.o \
vote.o
obj-$(CONFIG_OCFS2_FS) += cluster/
obj-$(CONFIG_OCFS2_FS) += dlm/
此差异已折叠。
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* alloc.h
*
* Function prototypes
*
* Copyright (C) 2002, 2004 Oracle. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*/
#ifndef OCFS2_ALLOC_H
#define OCFS2_ALLOC_H
struct ocfs2_alloc_context;
int ocfs2_insert_extent(struct ocfs2_super *osb,
struct ocfs2_journal_handle *handle,
struct inode *inode,
struct buffer_head *fe_bh,
u64 blkno,
u32 new_clusters,
struct ocfs2_alloc_context *meta_ac);
int ocfs2_num_free_extents(struct ocfs2_super *osb,
struct inode *inode,
struct ocfs2_dinode *fe);
/* how many new metadata chunks would an allocation need at maximum? */
static inline int ocfs2_extend_meta_needed(struct ocfs2_dinode *fe)
{
/*
* Rather than do all the work of determining how much we need
* (involves a ton of reads and locks), just ask for the
* maximal limit. That's a tree depth shift. So, one block for
* level of the tree (current l_tree_depth), one block for the
* new tree_depth==0 extent_block, and one block at the new
* top-of-the tree.
*/
return le16_to_cpu(fe->id2.i_list.l_tree_depth) + 2;
}
int ocfs2_truncate_log_init(struct ocfs2_super *osb);
void ocfs2_truncate_log_shutdown(struct ocfs2_super *osb);
void ocfs2_schedule_truncate_log_flush(struct ocfs2_super *osb,
int cancel);
int ocfs2_flush_truncate_log(struct ocfs2_super *osb);
int ocfs2_begin_truncate_log_recovery(struct ocfs2_super *osb,
int slot_num,
struct ocfs2_dinode **tl_copy);
int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb,
struct ocfs2_dinode *tl_copy);
struct ocfs2_truncate_context {
struct inode *tc_ext_alloc_inode;
struct buffer_head *tc_ext_alloc_bh;
int tc_ext_alloc_locked; /* is it cluster locked? */
/* these get destroyed once it's passed to ocfs2_commit_truncate. */
struct buffer_head *tc_last_eb_bh;
};
int ocfs2_prepare_truncate(struct ocfs2_super *osb,
struct inode *inode,
struct buffer_head *fe_bh,
struct ocfs2_truncate_context **tc);
int ocfs2_commit_truncate(struct ocfs2_super *osb,
struct inode *inode,
struct buffer_head *fe_bh,
struct ocfs2_truncate_context *tc);
#endif /* OCFS2_ALLOC_H */
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* Copyright (C) 2002, 2004 Oracle. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*/
#include <linux/fs.h>
#include <linux/slab.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <asm/byteorder.h>
#define MLOG_MASK_PREFIX ML_FILE_IO
#include <cluster/masklog.h>
#include "ocfs2.h"
#include "alloc.h"
#include "aops.h"
#include "dlmglue.h"
#include "extent_map.h"
#include "file.h"
#include "inode.h"
#include "journal.h"
#include "super.h"
#include "symlink.h"
#include "buffer_head_io.h"
static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
int err = -EIO;
int status;
struct ocfs2_dinode *fe = NULL;
struct buffer_head *bh = NULL;
struct buffer_head *buffer_cache_bh = NULL;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
void *kaddr;
mlog_entry("(0x%p, %llu, 0x%p, %d)\n", inode,
(unsigned long long)iblock, bh_result, create);
BUG_ON(ocfs2_inode_is_fast_symlink(inode));
if ((iblock << inode->i_sb->s_blocksize_bits) > PATH_MAX + 1) {
mlog(ML_ERROR, "block offset > PATH_MAX: %llu",
(unsigned long long)iblock);
goto bail;
}
status = ocfs2_read_block(OCFS2_SB(inode->i_sb),
OCFS2_I(inode)->ip_blkno,
&bh, OCFS2_BH_CACHED, inode);
if (status < 0) {
mlog_errno(status);
goto bail;
}
fe = (struct ocfs2_dinode *) bh->b_data;
if (!OCFS2_IS_VALID_DINODE(fe)) {
mlog(ML_ERROR, "Invalid dinode #%"MLFu64": signature = %.*s\n",
fe->i_blkno, 7, fe->i_signature);
goto bail;
}
if ((u64)iblock >= ocfs2_clusters_to_blocks(inode->i_sb,
le32_to_cpu(fe->i_clusters))) {
mlog(ML_ERROR, "block offset is outside the allocated size: "
"%llu\n", (unsigned long long)iblock);
goto bail;
}
/* We don't use the page cache to create symlink data, so if
* need be, copy it over from the buffer cache. */
if (!buffer_uptodate(bh_result) && ocfs2_inode_is_new(inode)) {
u64 blkno = le64_to_cpu(fe->id2.i_list.l_recs[0].e_blkno) +
iblock;
buffer_cache_bh = sb_getblk(osb->sb, blkno);
if (!buffer_cache_bh) {
mlog(ML_ERROR, "couldn't getblock for symlink!\n");
goto bail;
}
/* we haven't locked out transactions, so a commit
* could've happened. Since we've got a reference on
* the bh, even if it commits while we're doing the
* copy, the data is still good. */
if (buffer_jbd(buffer_cache_bh)
&& ocfs2_inode_is_new(inode)) {
kaddr = kmap_atomic(bh_result->b_page, KM_USER0);
if (!kaddr) {
mlog(ML_ERROR, "couldn't kmap!\n");
goto bail;
}
memcpy(kaddr + (bh_result->b_size * iblock),
buffer_cache_bh->b_data,
bh_result->b_size);
kunmap_atomic(kaddr, KM_USER0);
set_buffer_uptodate(bh_result);
}
brelse(buffer_cache_bh);
}
map_bh(bh_result, inode->i_sb,
le64_to_cpu(fe->id2.i_list.l_recs[0].e_blkno) + iblock);
err = 0;
bail:
if (bh)
brelse(bh);
mlog_exit(err);
return err;
}
static int ocfs2_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
int err = 0;
u64 p_blkno, past_eof;
mlog_entry("(0x%p, %llu, 0x%p, %d)\n", inode,
(unsigned long long)iblock, bh_result, create);
if (OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE)
mlog(ML_NOTICE, "get_block on system inode 0x%p (%lu)\n",
inode, inode->i_ino);
if (S_ISLNK(inode->i_mode)) {
/* this always does I/O for some reason. */
err = ocfs2_symlink_get_block(inode, iblock, bh_result, create);
goto bail;
}
/* this can happen if another node truncs after our extend! */
spin_lock(&OCFS2_I(inode)->ip_lock);
if (iblock >= ocfs2_clusters_to_blocks(inode->i_sb,
OCFS2_I(inode)->ip_clusters))
err = -EIO;
spin_unlock(&OCFS2_I(inode)->ip_lock);
if (err)
goto bail;
err = ocfs2_extent_map_get_blocks(inode, iblock, 1, &p_blkno,
NULL);
if (err) {
mlog(ML_ERROR, "Error %d from get_blocks(0x%p, %llu, 1, "
"%"MLFu64", NULL)\n", err, inode,
(unsigned long long)iblock, p_blkno);
goto bail;
}
map_bh(bh_result, inode->i_sb, p_blkno);
if (bh_result->b_blocknr == 0) {
err = -EIO;
mlog(ML_ERROR, "iblock = %llu p_blkno = %"MLFu64" "
"blkno=(%"MLFu64")\n", (unsigned long long)iblock,
p_blkno, OCFS2_I(inode)->ip_blkno);
}
past_eof = ocfs2_blocks_for_bytes(inode->i_sb, i_size_read(inode));
mlog(0, "Inode %lu, past_eof = %"MLFu64"\n", inode->i_ino, past_eof);
if (create && (iblock >= past_eof))
set_buffer_new(bh_result);
bail:
if (err < 0)
err = -EIO;
mlog_exit(err);
return err;
}
static int ocfs2_readpage(struct file *file, struct page *page)
{
struct inode *inode = page->mapping->host;
loff_t start = (loff_t)page->index << PAGE_CACHE_SHIFT;
int ret, unlock = 1;
mlog_entry("(0x%p, %lu)\n", file, (page ? page->index : 0));
ret = ocfs2_meta_lock_with_page(inode, NULL, NULL, 0, page);
if (ret != 0) {
if (ret == AOP_TRUNCATED_PAGE)
unlock = 0;
mlog_errno(ret);
goto out;
}
down_read(&OCFS2_I(inode)->ip_alloc_sem);
/*
* i_size might have just been updated as we grabed the meta lock. We
* might now be discovering a truncate that hit on another node.
* block_read_full_page->get_block freaks out if it is asked to read
* beyond the end of a file, so we check here. Callers
* (generic_file_read, fault->nopage) are clever enough to check i_size
* and notice that the page they just read isn't needed.
*
* XXX sys_readahead() seems to get that wrong?
*/
if (start >= i_size_read(inode)) {
char *addr = kmap(page);
memset(addr, 0, PAGE_SIZE);
flush_dcache_page(page);
kunmap(page);
SetPageUptodate(page);
ret = 0;
goto out_alloc;
}
ret = ocfs2_data_lock_with_page(inode, 0, page);
if (ret != 0) {
if (ret == AOP_TRUNCATED_PAGE)
unlock = 0;
mlog_errno(ret);
goto out_alloc;
}
ret = block_read_full_page(page, ocfs2_get_block);
unlock = 0;
ocfs2_data_unlock(inode, 0);
out_alloc:
up_read(&OCFS2_I(inode)->ip_alloc_sem);
ocfs2_meta_unlock(inode, 0);
out:
if (unlock)
unlock_page(page);
mlog_exit(ret);
return ret;
}
/* Note: Because we don't support holes, our allocation has
* already happened (allocation writes zeros to the file data)
* so we don't have to worry about ordered writes in
* ocfs2_writepage.
*
* ->writepage is called during the process of invalidating the page cache
* during blocked lock processing. It can't block on any cluster locks
* to during block mapping. It's relying on the fact that the block
* mapping can't have disappeared under the dirty pages that it is
* being asked to write back.
*/
static int ocfs2_writepage(struct page *page, struct writeback_control *wbc)
{
int ret;
mlog_entry("(0x%p)\n", page);
ret = block_write_full_page(page, ocfs2_get_block, wbc);
mlog_exit(ret);
return ret;
}
/*
* ocfs2_prepare_write() can be an outer-most ocfs2 call when it is called
* from loopback. It must be able to perform its own locking around
* ocfs2_get_block().
*/
int ocfs2_prepare_write(struct file *file, struct page *page,
unsigned from, unsigned to)
{
struct inode *inode = page->mapping->host;
int ret;
mlog_entry("(0x%p, 0x%p, %u, %u)\n", file, page, from, to);
ret = ocfs2_meta_lock_with_page(inode, NULL, NULL, 0, page);
if (ret != 0) {
mlog_errno(ret);
goto out;
}
down_read(&OCFS2_I(inode)->ip_alloc_sem);
ret = block_prepare_write(page, from, to, ocfs2_get_block);
up_read(&OCFS2_I(inode)->ip_alloc_sem);
ocfs2_meta_unlock(inode, 0);
out:
mlog_exit(ret);
return ret;
}
/* Taken from ext3. We don't necessarily need the full blown
* functionality yet, but IMHO it's better to cut and paste the whole
* thing so we can avoid introducing our own bugs (and easily pick up
* their fixes when they happen) --Mark */
static int walk_page_buffers( handle_t *handle,
struct buffer_head *head,
unsigned from,
unsigned to,
int *partial,
int (*fn)( handle_t *handle,
struct buffer_head *bh))
{
struct buffer_head *bh;
unsigned block_start, block_end;
unsigned blocksize = head->b_size;
int err, ret = 0;
struct buffer_head *next;
for ( bh = head, block_start = 0;
ret == 0 && (bh != head || !block_start);
block_start = block_end, bh = next)
{
next = bh->b_this_page;
block_end = block_start + blocksize;
if (block_end <= from || block_start >= to) {
if (partial && !buffer_uptodate(bh))
*partial = 1;
continue;
}
err = (*fn)(handle, bh);
if (!ret)
ret = err;
}
return ret;
}
struct ocfs2_journal_handle *ocfs2_start_walk_page_trans(struct inode *inode,
struct page *page,
unsigned from,
unsigned to)
{
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
struct ocfs2_journal_handle *handle = NULL;
int ret = 0;
handle = ocfs2_start_trans(osb, NULL, OCFS2_INODE_UPDATE_CREDITS);
if (!handle) {
ret = -ENOMEM;
mlog_errno(ret);
goto out;
}
if (ocfs2_should_order_data(inode)) {
ret = walk_page_buffers(handle->k_handle,
page_buffers(page),
from, to, NULL,
ocfs2_journal_dirty_data);
if (ret < 0)
mlog_errno(ret);
}
out:
if (ret) {
if (handle)
ocfs2_commit_trans(handle);
handle = ERR_PTR(ret);
}
return handle;
}
static int ocfs2_commit_write(struct file *file, struct page *page,
unsigned from, unsigned to)
{
int ret, extending = 0, locklevel = 0;
loff_t new_i_size;
struct buffer_head *di_bh = NULL;
struct inode *inode = page->mapping->host;
struct ocfs2_journal_handle *handle = NULL;
mlog_entry("(0x%p, 0x%p, %u, %u)\n", file, page, from, to);
/* NOTE: ocfs2_file_aio_write has ensured that it's safe for
* us to sample inode->i_size here without the metadata lock:
*
* 1) We're currently holding the inode alloc lock, so no
* nodes can change it underneath us.
*
* 2) We've had to take the metadata lock at least once
* already to check for extending writes, hence insuring
* that our current copy is also up to date.
*/
new_i_size = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
if (new_i_size > i_size_read(inode)) {
extending = 1;
locklevel = 1;
}
ret = ocfs2_meta_lock_with_page(inode, NULL, &di_bh, locklevel, page);
if (ret != 0) {
mlog_errno(ret);
goto out;
}
ret = ocfs2_data_lock_with_page(inode, 1, page);
if (ret != 0) {
mlog_errno(ret);
goto out_unlock_meta;
}
if (extending) {
handle = ocfs2_start_walk_page_trans(inode, page, from, to);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
handle = NULL;
goto out_unlock_data;
}
/* Mark our buffer early. We'd rather catch this error up here
* as opposed to after a successful commit_write which would
* require us to set back inode->i_size. */
ret = ocfs2_journal_access(handle, inode, di_bh,
OCFS2_JOURNAL_ACCESS_WRITE);
if (ret < 0) {
mlog_errno(ret);
goto out_commit;
}
}
/* might update i_size */
ret = generic_commit_write(file, page, from, to);
if (ret < 0) {
mlog_errno(ret);
goto out_commit;
}
if (extending) {
loff_t size = (u64) i_size_read(inode);
struct ocfs2_dinode *di =
(struct ocfs2_dinode *)di_bh->b_data;
/* ocfs2_mark_inode_dirty is too heavy to use here. */
inode->i_blocks = ocfs2_align_bytes_to_sectors(size);
inode->i_ctime = inode->i_mtime = CURRENT_TIME;
di->i_size = cpu_to_le64(size);
di->i_ctime = di->i_mtime =
cpu_to_le64(inode->i_mtime.tv_sec);
di->i_ctime_nsec = di->i_mtime_nsec =
cpu_to_le32(inode->i_mtime.tv_nsec);
ret = ocfs2_journal_dirty(handle, di_bh);
if (ret < 0) {
mlog_errno(ret);
goto out_commit;
}
}
BUG_ON(extending && (i_size_read(inode) != new_i_size));
out_commit:
if (handle)
ocfs2_commit_trans(handle);
out_unlock_data:
ocfs2_data_unlock(inode, 1);
out_unlock_meta:
ocfs2_meta_unlock(inode, locklevel);
out:
if (di_bh)
brelse(di_bh);
mlog_exit(ret);
return ret;
}
static sector_t ocfs2_bmap(struct address_space *mapping, sector_t block)
{
sector_t status;
u64 p_blkno = 0;
int err = 0;
struct inode *inode = mapping->host;
mlog_entry("(block = %llu)\n", (unsigned long long)block);
/* We don't need to lock journal system files, since they aren't
* accessed concurrently from multiple nodes.
*/
if (!INODE_JOURNAL(inode)) {
err = ocfs2_meta_lock(inode, NULL, NULL, 0);
if (err) {
if (err != -ENOENT)
mlog_errno(err);
goto bail;
}
down_read(&OCFS2_I(inode)->ip_alloc_sem);
}
err = ocfs2_extent_map_get_blocks(inode, block, 1, &p_blkno,
NULL);
if (!INODE_JOURNAL(inode)) {
up_read(&OCFS2_I(inode)->ip_alloc_sem);
ocfs2_meta_unlock(inode, 0);
}
if (err) {
mlog(ML_ERROR, "get_blocks() failed, block = %llu\n",
(unsigned long long)block);
mlog_errno(err);
goto bail;
}
bail:
status = err ? 0 : p_blkno;
mlog_exit((int)status);
return status;
}
/*
* TODO: Make this into a generic get_blocks function.
*
* From do_direct_io in direct-io.c:
* "So what we do is to permit the ->get_blocks function to populate
* bh.b_size with the size of IO which is permitted at this offset and
* this i_blkbits."
*
* This function is called directly from get_more_blocks in direct-io.c.
*
* called like this: dio->get_blocks(dio->inode, fs_startblk,
* fs_count, map_bh, dio->rw == WRITE);
*/
static int ocfs2_direct_IO_get_blocks(struct inode *inode, sector_t iblock,
unsigned long max_blocks,
struct buffer_head *bh_result, int create)
{
int ret;
u64 vbo_max; /* file offset, max_blocks from iblock */
u64 p_blkno;
int contig_blocks;
unsigned char blocksize_bits;
if (!inode || !bh_result) {
mlog(ML_ERROR, "inode or bh_result is null\n");
return -EIO;
}
blocksize_bits = inode->i_sb->s_blocksize_bits;
/* This function won't even be called if the request isn't all
* nicely aligned and of the right size, so there's no need
* for us to check any of that. */
vbo_max = ((u64)iblock + max_blocks) << blocksize_bits;
spin_lock(&OCFS2_I(inode)->ip_lock);
if ((iblock + max_blocks) >
ocfs2_clusters_to_blocks(inode->i_sb,
OCFS2_I(inode)->ip_clusters)) {
spin_unlock(&OCFS2_I(inode)->ip_lock);
ret = -EIO;
goto bail;
}
spin_unlock(&OCFS2_I(inode)->ip_lock);
/* This figures out the size of the next contiguous block, and
* our logical offset */
ret = ocfs2_extent_map_get_blocks(inode, iblock, 1, &p_blkno,
&contig_blocks);
if (ret) {
mlog(ML_ERROR, "get_blocks() failed iblock=%llu\n",
(unsigned long long)iblock);
ret = -EIO;
goto bail;
}
map_bh(bh_result, inode->i_sb, p_blkno);
/* make sure we don't map more than max_blocks blocks here as
that's all the kernel will handle at this point. */
if (max_blocks < contig_blocks)
contig_blocks = max_blocks;
bh_result->b_size = contig_blocks << blocksize_bits;
bail:
return ret;
}
/*
* ocfs2_dio_end_io is called by the dio core when a dio is finished. We're
* particularly interested in the aio/dio case. Like the core uses
* i_alloc_sem, we use the rw_lock DLM lock to protect io on one node from
* truncation on another.
*/
static void ocfs2_dio_end_io(struct kiocb *iocb,
loff_t offset,
ssize_t bytes,
void *private)
{
struct inode *inode = iocb->ki_filp->f_dentry->d_inode;
/* this io's submitter should not have unlocked this before we could */
BUG_ON(!ocfs2_iocb_is_rw_locked(iocb));
ocfs2_iocb_clear_rw_locked(iocb);
up_read(&inode->i_alloc_sem);
ocfs2_rw_unlock(inode, 0);
}
static ssize_t ocfs2_direct_IO(int rw,
struct kiocb *iocb,
const struct iovec *iov,
loff_t offset,
unsigned long nr_segs)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file->f_dentry->d_inode->i_mapping->host;
int ret;
mlog_entry_void();
ret = blockdev_direct_IO_no_locking(rw, iocb, inode,
inode->i_sb->s_bdev, iov, offset,
nr_segs,
ocfs2_direct_IO_get_blocks,
ocfs2_dio_end_io);
mlog_exit(ret);
return ret;
}
struct address_space_operations ocfs2_aops = {
.readpage = ocfs2_readpage,
.writepage = ocfs2_writepage,
.prepare_write = ocfs2_prepare_write,
.commit_write = ocfs2_commit_write,
.bmap = ocfs2_bmap,
.sync_page = block_sync_page,
.direct_IO = ocfs2_direct_IO
};
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* Copyright (C) 2002, 2004, 2005 Oracle. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*/
#ifndef OCFS2_AOPS_H
#define OCFS2_AOPS_H
int ocfs2_prepare_write(struct file *file, struct page *page,
unsigned from, unsigned to);
struct ocfs2_journal_handle *ocfs2_start_walk_page_trans(struct inode *inode,
struct page *page,
unsigned from,
unsigned to);
/* all ocfs2_dio_end_io()'s fault */
#define ocfs2_iocb_is_rw_locked(iocb) \
test_bit(0, (unsigned long *)&iocb->private)
#define ocfs2_iocb_set_rw_locked(iocb) \
set_bit(0, (unsigned long *)&iocb->private)
#define ocfs2_iocb_clear_rw_locked(iocb) \
clear_bit(0, (unsigned long *)&iocb->private)
#endif /* OCFS2_FILE_H */
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* io.c
*
* Buffer cache handling
*
* Copyright (C) 2002, 2004 Oracle. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*/
#include <linux/fs.h>
#include <linux/types.h>
#include <linux/slab.h>
#include <linux/highmem.h>
#include <cluster/masklog.h>
#include "ocfs2.h"
#include "alloc.h"
#include "inode.h"
#include "journal.h"
#include "uptodate.h"
#include "buffer_head_io.h"
int ocfs2_write_block(struct ocfs2_super *osb, struct buffer_head *bh,
struct inode *inode)
{
int ret = 0;
mlog_entry("(bh->b_blocknr = %llu, inode=%p)\n",
(unsigned long long)bh->b_blocknr, inode);
BUG_ON(bh->b_blocknr < OCFS2_SUPER_BLOCK_BLKNO);
BUG_ON(buffer_jbd(bh));
/* No need to check for a soft readonly file system here. non
* journalled writes are only ever done on system files which
* can get modified during recovery even if read-only. */
if (ocfs2_is_hard_readonly(osb)) {
ret = -EROFS;
goto out;
}
down(&OCFS2_I(inode)->ip_io_sem);
lock_buffer(bh);
set_buffer_uptodate(bh);
/* remove from dirty list before I/O. */
clear_buffer_dirty(bh);
get_bh(bh); /* for end_buffer_write_sync() */
bh->b_end_io = end_buffer_write_sync;
submit_bh(WRITE, bh);
wait_on_buffer(bh);
if (buffer_uptodate(bh)) {
ocfs2_set_buffer_uptodate(inode, bh);
} else {
/* We don't need to remove the clustered uptodate
* information for this bh as it's not marked locally
* uptodate. */
ret = -EIO;
brelse(bh);
}
up(&OCFS2_I(inode)->ip_io_sem);
out:
mlog_exit(ret);
return ret;
}
int ocfs2_read_blocks(struct ocfs2_super *osb, u64 block, int nr,
struct buffer_head *bhs[], int flags,
struct inode *inode)
{
int status = 0;
struct super_block *sb;
int i, ignore_cache = 0;
struct buffer_head *bh;
mlog_entry("(block=(%"MLFu64"), nr=(%d), flags=%d, inode=%p)\n",
block, nr, flags, inode);
if (osb == NULL || osb->sb == NULL || bhs == NULL) {
status = -EINVAL;
mlog_errno(status);
goto bail;
}
if (nr < 0) {
mlog(ML_ERROR, "asked to read %d blocks!\n", nr);
status = -EINVAL;
mlog_errno(status);
goto bail;
}
if (nr == 0) {
mlog(ML_BH_IO, "No buffers will be read!\n");
status = 0;
goto bail;
}
sb = osb->sb;
if (flags & OCFS2_BH_CACHED && !inode)
flags &= ~OCFS2_BH_CACHED;
if (inode)
down(&OCFS2_I(inode)->ip_io_sem);
for (i = 0 ; i < nr ; i++) {
if (bhs[i] == NULL) {
bhs[i] = sb_getblk(sb, block++);
if (bhs[i] == NULL) {
if (inode)
up(&OCFS2_I(inode)->ip_io_sem);
status = -EIO;
mlog_errno(status);
goto bail;
}
}
bh = bhs[i];
ignore_cache = 0;
if (flags & OCFS2_BH_CACHED &&
!ocfs2_buffer_uptodate(inode, bh)) {
mlog(ML_UPTODATE,
"bh (%llu), inode %"MLFu64" not uptodate\n",
(unsigned long long)bh->b_blocknr,
OCFS2_I(inode)->ip_blkno);
ignore_cache = 1;
}
/* XXX: Can we ever get this and *not* have the cached
* flag set? */
if (buffer_jbd(bh)) {
if (!(flags & OCFS2_BH_CACHED) || ignore_cache)
mlog(ML_BH_IO, "trying to sync read a jbd "
"managed bh (blocknr = %llu)\n",
(unsigned long long)bh->b_blocknr);
continue;
}
if (!(flags & OCFS2_BH_CACHED) || ignore_cache) {
if (buffer_dirty(bh)) {
/* This should probably be a BUG, or
* at least return an error. */
mlog(ML_BH_IO, "asking me to sync read a dirty "
"buffer! (blocknr = %llu)\n",
(unsigned long long)bh->b_blocknr);
continue;
}
lock_buffer(bh);
if (buffer_jbd(bh)) {
#ifdef CATCH_BH_JBD_RACES
mlog(ML_ERROR, "block %llu had the JBD bit set "
"while I was in lock_buffer!",
(unsigned long long)bh->b_blocknr);
BUG();
#else
unlock_buffer(bh);
continue;
#endif
}
clear_buffer_uptodate(bh);
get_bh(bh); /* for end_buffer_read_sync() */
bh->b_end_io = end_buffer_read_sync;
if (flags & OCFS2_BH_READAHEAD)
submit_bh(READA, bh);
else
submit_bh(READ, bh);
continue;
}
}
status = 0;
for (i = (nr - 1); i >= 0; i--) {
bh = bhs[i];
/* We know this can't have changed as we hold the
* inode sem. Avoid doing any work on the bh if the
* journal has it. */
if (!buffer_jbd(bh))
wait_on_buffer(bh);
if (!buffer_uptodate(bh)) {
/* Status won't be cleared from here on out,
* so we can safely record this and loop back
* to cleanup the other buffers. Don't need to
* remove the clustered uptodate information
* for this bh as it's not marked locally
* uptodate. */
status = -EIO;
brelse(bh);
bhs[i] = NULL;
continue;
}
if (inode)
ocfs2_set_buffer_uptodate(inode, bh);
}
if (inode)
up(&OCFS2_I(inode)->ip_io_sem);
mlog(ML_BH_IO, "block=(%"MLFu64"), nr=(%d), cached=%s\n", block, nr,
(!(flags & OCFS2_BH_CACHED) || ignore_cache) ? "no" : "yes");
bail:
mlog_exit(status);
return status;
}
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* ocfs2_buffer_head.h
*
* Buffer cache handling functions defined
*
* Copyright (C) 2002, 2004 Oracle. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*/
#ifndef OCFS2_BUFFER_HEAD_IO_H
#define OCFS2_BUFFER_HEAD_IO_H
#include <linux/buffer_head.h>
void ocfs2_end_buffer_io_sync(struct buffer_head *bh,
int uptodate);
static inline int ocfs2_read_block(struct ocfs2_super *osb,
u64 off,
struct buffer_head **bh,
int flags,
struct inode *inode);
int ocfs2_write_block(struct ocfs2_super *osb,
struct buffer_head *bh,
struct inode *inode);
int ocfs2_read_blocks(struct ocfs2_super *osb,
u64 block,
int nr,
struct buffer_head *bhs[],
int flags,
struct inode *inode);
#define OCFS2_BH_CACHED 1
#define OCFS2_BH_READAHEAD 8 /* use this to pass READA down to submit_bh */
static inline int ocfs2_read_block(struct ocfs2_super * osb, u64 off,
struct buffer_head **bh, int flags,
struct inode *inode)
{
int status = 0;
if (bh == NULL) {
printk("ocfs2: bh == NULL\n");
status = -EINVAL;
goto bail;
}
status = ocfs2_read_blocks(osb, off, 1, bh,
flags, inode);
bail:
return status;
}
#endif /* OCFS2_BUFFER_HEAD_IO_H */
obj-$(CONFIG_OCFS2_FS) += ocfs2_nodemanager.o
ocfs2_nodemanager-objs := heartbeat.o masklog.o sys.o nodemanager.o \
quorum.o tcp.o ver.o
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* Copyright (C) 2005 Oracle. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*/
#ifndef OCFS2_CLUSTER_ENDIAN_H
#define OCFS2_CLUSTER_ENDIAN_H
static inline void be32_add_cpu(__be32 *var, u32 val)
{
*var = cpu_to_be32(be32_to_cpu(*var) + val);
}
#endif /* OCFS2_CLUSTER_ENDIAN_H */
此差异已折叠。
/* -*- mode: c; c-basic-offset: 8; -*-
* vim: noexpandtab sw=8 ts=8 sts=0:
*
* heartbeat.h
*
* Function prototypes
*
* Copyright (C) 2004 Oracle. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public
* License as published by the Free Software Foundation; either
* version 2 of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* General Public License for more details.
*
* You should have received a copy of the GNU General Public
* License along with this program; if not, write to the
* Free Software Foundation, Inc., 59 Temple Place - Suite 330,
* Boston, MA 021110-1307, USA.
*
*/
#ifndef O2CLUSTER_HEARTBEAT_H
#define O2CLUSTER_HEARTBEAT_H
#include "ocfs2_heartbeat.h"
#define O2HB_REGION_TIMEOUT_MS 2000
/* number of changes to be seen as live */
#define O2HB_LIVE_THRESHOLD 2
/* number of equal samples to be seen as dead */
extern unsigned int o2hb_dead_threshold;
#define O2HB_DEFAULT_DEAD_THRESHOLD 7
/* Otherwise MAX_WRITE_TIMEOUT will be zero... */
#define O2HB_MIN_DEAD_THRESHOLD 2
#define O2HB_MAX_WRITE_TIMEOUT_MS (O2HB_REGION_TIMEOUT_MS * (o2hb_dead_threshold - 1))
#define O2HB_CB_MAGIC 0x51d1e4ec
/* callback stuff */
enum o2hb_callback_type {
O2HB_NODE_DOWN_CB = 0,
O2HB_NODE_UP_CB,
O2HB_NUM_CB
};
struct o2nm_node;
typedef void (o2hb_cb_func)(struct o2nm_node *, int, void *);
struct o2hb_callback_func {
u32 hc_magic;
struct list_head hc_item;
o2hb_cb_func *hc_func;
void *hc_data;
int hc_priority;
enum o2hb_callback_type hc_type;
};
struct config_group *o2hb_alloc_hb_set(void);
void o2hb_free_hb_set(struct config_group *group);
void o2hb_setup_callback(struct o2hb_callback_func *hc,
enum o2hb_callback_type type,
o2hb_cb_func *func,
void *data,
int priority);
int o2hb_register_callback(struct o2hb_callback_func *hc);
int o2hb_unregister_callback(struct o2hb_callback_func *hc);
void o2hb_fill_node_map(unsigned long *map,
unsigned bytes);
void o2hb_init(void);
int o2hb_check_node_heartbeating(u8 node_num);
int o2hb_check_node_heartbeating_from_callback(u8 node_num);
int o2hb_check_local_node_heartbeating(void);
void o2hb_stop_all_regions(void);
#endif /* O2CLUSTER_HEARTBEAT_H */
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
EXTRA_CFLAGS += -Ifs/ocfs2
obj-$(CONFIG_OCFS2_FS) += ocfs2_dlm.o ocfs2_dlmfs.o
ocfs2_dlm-objs := dlmdomain.o dlmdebug.o dlmthread.o dlmrecovery.o \
dlmmaster.o dlmast.o dlmconvert.o dlmlock.o dlmunlock.o dlmver.o
ocfs2_dlmfs-objs := userdlm.o dlmfs.o dlmfsver.o
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
#ifndef OCFS2_MMAP_H
#define OCFS2_MMAP_H
int ocfs2_mmap(struct file *file, struct vm_area_struct *vma);
#endif /* OCFS2_MMAP_H */
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册