- 19 4月, 2008 23 次提交
-
-
由 Dave Hansen 提交于
Originally from: Herbert Poetzl <herbert@13thfloor.at> This is the core of the read-only bind mount patch set. Note that this does _not_ add a "ro" option directly to the bind mount operation. If you require such a mount, you must first do the bind, then follow it up with a 'mount -o remount,ro' operation: If you wish to have a r/o bind mount of /foo on bar: mount --bind /foo /bar mount -o remount,ro /bar Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
This is the real meat of the entire series. It actually implements the tracking of the number of writers to a mount. However, it causes scalability problems because there can be hundreds of cpus doing open()/close() on files on the same mnt at the same time. Even an atomic_t in the mnt has massive scalaing problems because the cacheline gets so terribly contended. This uses a statically-allocated percpu variable. All want/drop operations are local to a cpu as long that cpu operates on the same mount, and there are no writer count imbalances. Writer count imbalances happen when a write is taken on one cpu, and released on another, like when an open/close pair is performed on two Upon a remount,ro request, all of the data from the percpu variables is collected (expensive, but very rare) and we determine if there are any outstanding writers to the mount. I've written a little benchmark to sit in a loop for a couple of seconds in several cpus in parallel doing open/write/close loops. http://sr71.net/~dave/linux/openbench.c The code in here is a a worst-possible case for this patch. It does opens on a _pair_ of files in two different mounts in parallel. This should cause my code to lose its "operate on the same mount" optimization completely. This worst-case scenario causes a 3% degredation in the benchmark. I could probably get rid of even this 3%, but it would be more complex than what I have here, and I think this is getting into acceptable territory. In practice, I expect writing more than 3 bytes to a file, as well as disk I/O to mask any effects that this has. (To get rid of that 3%, we could have an #defined number of mounts in the percpu variable. So, instead of a CPU getting operate only on percpu data when it accesses only one mount, it could stay on percpu data when it only accesses N or fewer mounts.) [AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
If we depend on the inodes for writeability, we will not catch the r/o mounts when implemented. This patches uses __mnt_want_write(). It does not guarantee that the mount will stay writeable after the check. But, this is OK for one of the checks because it is just for a printk(). The other two are probably unnecessary and duplicate existing checks in the VFS. This won't make them better checks than before, but it will make them detect r/o mounts. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
Elevate the write count during the xfs m/ctime updates. XFS has to do it's own timestamp updates due to an unfortunate VFS design limitation, so it will have to track writers by itself aswell. [hch: split out from the touch_atime patch as it's not related to it at all] Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
It is OK to let access() go without using a mnt_want/drop_write() pair because it doesn't actually do writes to the filesystem, and it is inherently racy anyway. This is a rare case when it is OK to use __mnt_is_readonly() directly. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
chown/chmod,etc... don't call permission in the same way that the normal "open for write" calls do. They still write to the filesystem, so bump the write count during these operations. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
This is the first really tricky patch in the series. It elevates the writer count on a mount each time a non-special file is opened for write. We used to do this in may_open(), but Miklos pointed out that __dentry_open() is used as well to create filps. This will cover even those cases, while a call in may_open() would not have. There is also an elevated count around the vfs_create() call in open_namei(). See the comments for more details, but we need this to fix a 'create, remount, fail r/w open()' race. Some filesystems forego the use of normal vfs calls to create struct files. Make sure that these users elevate the mnt writer count because they will get __fput(), and we need to make sure they're balanced. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
Some ioctl()s can cause writes to the filesystem. Take these, and make them use mnt_want/drop_write() instead. [AV: updated] Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
Now includes fix for oops seen by akpm. "never let a libc developer write your kernel code" - hch "nor, apparently, a kernel developer" - akpm Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
Remove handling of NULL mnt while we are at it - that can't happen these days. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
This basically audits the callers of xattr_permission(), which calls permission() and can perform writes to the filesystem. [AV: add missing parts - removexattr() and nfsd posix acls, plug for a leak spotted by Miklos] Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
This also uses the little helper in the NFS code to make an if() a little bit less ugly. We introduced the helper at the beginning of the series. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
[AV: add missing nfsd pieces] Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
This takes care of all of the direct callers of vfs_mknod(). Since a few of these cases also handle normal file creation as well, this also covers some calls to vfs_create(). So that we don't have to make three mnt_want/drop_write() calls inside of the switch statement, we move some of its logic outside of the switch and into a helper function suggested by Christoph. This also encapsulates a fix for mknod(S_IFREG) that Miklos found. [AV: merged mkdir handling, added missing nfsd pieces] Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
Elevate the write count during the vfs_rmdir() and vfs_unlink(). [AV: merged rmdir and unlink parts, added missing pieces in nfsd] Acked-by: NSerge Hallyn <serue@us.ibm.com> Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
The emergency remount code forcibly removes FMODE_WRITE from filps. The r/o bind mount code notices that this was done without a proper mnt_drop_write() and properly gives a warning. This patch does a mnt_drop_write() to keep everything balanced. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
If someone decides to demote a file from r/w to just r/o, they can use this same code as __fput(). NFS does just that, and will use this in the next patch. AV: drop write access in __fput() only after we evict from file list. Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Cc: Erez Zadok <ezk@cs.sunysb.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J Bruce Fields" <bfields@fieldses.org> Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
This patch adds two function mnt_want_write() and mnt_drop_write(). These are used like a lock pair around and fs operations that might cause a write to the filesystem. Before these can become useful, we must first cover each place in the VFS where writes are performed with a want/drop pair. When that is complete, we can actually introduce code that will safely check the counts before allowing r/w<->r/o transitions to occur. Acked-by: NSerge Hallyn <serue@us.ibm.com> Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
open_namei() will, in the future, need to take mount write counts over its creation and truncation (via may_open()) operations. It needs to keep these write counts until any potential filp that is created gets __fput()'d. This gets complicated in the error handling and becomes very murky as to how far open_namei() actually got, and whether or not that mount write count was taken. That makes it a bad interface. All that the current do_filp_open() really does is allocate the nameidata on the stack, then call open_namei(). So, this merges those two functions and moves filp_open() over to namei.c so it can be close to its buddy: do_filp_open(). It also gets a kerneldoc comment in the process. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
My end goal here is to make sure all users of may_open() return filps. This will ensure that we properly release mount write counts which were taken for the filp in may_open(). This patch moves the sys_open flags to namei flags calculation into fs/namei.c. We'll shortly be moving the nameidata_to_filp() calls into namei.c, and this gets the sys_open flags to a place where we can get at them when we need them. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 18 4月, 2008 17 次提交
-
-
由 Sunil Mushran 提交于
This patch exposes o2net information via debugfs. The information includes the list of sockets (sock_containers) as well as the list of outstanding messages (send_tracking). Useful for o2dlm debugging. (This patch is derived from an earlier one written by Zach Brown that exposed the same information via /proc.) [Mark: checkpatch fixes] Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Reviewed-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Mark Fasheh 提交于
fs/ocfs2/dlm/ocfs2_dlm.ko and fs/ocfs2/dlm/ocfs2_dlmfs.ko get built if CONFIG_FS_OCFS2 is specified. This isn't quite how it should happen any more - the "o2cb" dlm modules should only be built if CONFIG_FS_OCFS2_O2CB is set, so update the dlm Makefile accordingly. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Acked-by: NRandy Dunlap <randy.dunlap@oracle.com> Acked-by: NJoel Becker <joel.becker@oracle.com>
-
由 Jeff Mahoney 提交于
We keep seeing bug reports related to NULL pointer derefs in o2net_set_nn_state(). When I originally wrote up the configurable timeout patch, I had tried to plan for multiple clusters. This was silly. The timeout routines all use o2nm_single_cluster so there's no point in passing an argument at all. This patch removes the arguments and kills those bugs dead. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Julia Lawall 提交于
if (...) BUG(); should be replaced with BUG_ON(...) when the test has no side-effects to allow a definition of BUG_ON that drops the code completely. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @ disable unlikely @ expression E,f; @@ ( if (<... f(...) ...>) { BUG(); } | - if (unlikely(E)) { BUG(); } + BUG_ON(E); ) @@ expression E,f; @@ ( if (<... f(...) ...>) { BUG(); } | - if (E) { BUG(); } + BUG_ON(E); ) // </smpl> Signed-off-by: NJulia Lawall <julia@diku.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Andi Kleen 提交于
As far as I can see there is nothing in ocfs2_ioctl that requires the BKL, so use unlocked_ioctl Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Jan Kara 提交于
ocfs2_rename() was being too aggressive with the rename lock - we only need it for certain forms of directory rename. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Julia Lawall 提交于
The function ocfs2_start_trans always returns either a valid pointer or a value made with ERR_PTR, so its result should be tested with IS_ERR, not with a test for 0. Signed-off-by: NJulia Lawall <julia@diku.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
Inode allocation is modified to look in other nodes allocators during extreme out of space situations. We retry our own slot when space is freed back to the global bitmap, or whenever we've allocated more than 1024 inodes from another slot. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
In inode stealing, we no longer restrict the allocation to happen in the local node. So it is neccessary for us to add a new member in ocfs2_alloc_context to indicate which slot we are using for allocation. We also modify the process of local alloc so that this member can be used there also. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
In some cases(Inode stealing from other nodes), we may not want ocfs2_reserve_suballoc_bits to allocate new groups from the global_bitmap since it may already be full. So add a new parameter for this. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
In ocfs2_figure_merge_contig_type, we judge whether there exists a cross extent block merge and enable it by setting CONTIG_LEFT and CONTIG_RIGHT accordingly. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
In ocfs2_merge_rec_left, when we find the merge extent is "CONTIG_RIGHT" with the first extent record of the next extent block, we will merge it to the next extent block and change all the related extent blocks accordingly. In ocfs2_merge_rec_right, when we find the merge extent is "CONTIG_LEFT" with the last extent record of the previous extent block, we will merge it to the prevoius extent block and change all the related extent blocks accordingly. As for CONTIG_LEFTRIGHT, we will handle CONTIG_RIGHT first so that when the index is zero, the merge process will be more efficient and easier. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Mark Fasheh 提交于
/sys/fs is where we really want file system specific sysfs objects. Ocfs2-tools has been updated to look in /sys/fs/o2cb. We can maintain backwards compatibility with old ocfs2-tools by using a sysfs symlink. After some time (2 years), the symlink can be safely removed. This patch also adds documentation to make it easier for people to figure out what /sys/fs/o2cb is used for. Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Mark Fasheh 提交于
Allow callers of sysfs_remove_link() to pass a NULL kobj, in which case sysfs_root will be used as the parent directory. This allows us to tear down top level symlinks created via sysfs_create_link(), which already has similar handling of a NULL parent object. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Tao Ma 提交于
Currently, o2net connects to a node on hb_up and disconnects on hb_down and net timeout. It disconnects on net timeout is ok, but it should attempt to reconnect back. This is because sometimes nodes get overloaded enough that the network connection breaks but the disk hb does not. And if we get into that situation, we either fence (unnecessarily) or wait for its disk hb to die (and sometimes hang in the process). So in this updated scheme, when the network disconnects, we keep attempting to reconnect till we succeed or we get a disk hb down event. If the other node is really dead, then we will eventually get a node down event. If not, we should be able to connect again and continue. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Sunil Mushran 提交于
A previous patch added KERN_NOTICE to printks printing the lockres that cluttered the output. This patch removes the log level. For people concerned with syslog clutter, please note we now use this facility to print lockres only during an error. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Sunil Mushran 提交于
__dlm_print_one_lock_resource was printing lockname incorrectly. Also, we now use printk directly instead of mlog as the latter prints the line context which is not useful for this print. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-