- 01 Nov 2011, 2 commits
-
-
Committed by Paul Gortmaker

Some files were using the complete module.h infrastructure without actually including the header at all. Fix them up in advance so that once the implicit presence is removed, we won't get failures like this:

  CC [M]  fs/nfsd/nfssvc.o
  fs/nfsd/nfssvc.c: In function 'nfsd_create_serv':
  fs/nfsd/nfssvc.c:335: error: 'THIS_MODULE' undeclared (first use in this function)
  fs/nfsd/nfssvc.c:335: error: (Each undeclared identifier is reported only once
  fs/nfsd/nfssvc.c:335: error: for each function it appears in.)
  fs/nfsd/nfssvc.c: In function 'nfsd':
  fs/nfsd/nfssvc.c:555: error: implicit declaration of function 'module_put_and_exit'
  make[3]: *** [fs/nfsd/nfssvc.o] Error 1

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
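For illustration, the shape of the fix in an affected file is just the explicit include (a minimal sketch; fs/nfsd/nfssvc.c is taken from the error log above):

  /* fs/nfsd/nfssvc.c -- make the dependency explicit so THIS_MODULE and
   * module_put_and_exit() remain visible once module.h is no longer
   * pulled in implicitly */
  #include <linux/module.h>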
-
Committed by Paul Gortmaker

These files were getting <linux/module.h> via an implicit include path, but we want to crush those out of existence, since they cost time during compiles, processing thousands of lines of headers for no reason. Give them the lightweight header that just contains the EXPORT_SYMBOL infrastructure.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
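A minimal sketch of the swap for a file that only exports symbols (fs_helper is a hypothetical example function; the lightweight header is the one this series relies on):

  -#include <linux/module.h>  /* heavyweight: drags in the whole module core */
  +#include <linux/export.h>  /* lightweight: just the EXPORT_SYMBOL machinery */

   int fs_helper(void)
   {
           return 0;
   }
   EXPORT_SYMBOL(fs_helper);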
-
- 28 Oct 2011, 20 commits
-
-
Committed by J. Bruce Fields

In setlease, we use i_writecount to decide whether we can give out a read lease. In open, we break leases before incrementing i_writecount. There is therefore a window between the lease break and the i_writecount increment in which setlease could add a new read lease. This would leave us with a simultaneous write open and read lease, which shouldn't happen.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Andi Kleen

This makes NFS follow the standard generic_file_llseek locking scheme.

Cc: Trond.Myklebust@netapp.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Andi Kleen

This gives ext4 the benefits of unlocked llseek.

Cc: tytso@mit.edu
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
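A hedged sketch of what the ext4 side looks like after this series, assuming the generic_file_llseek_size() variant added in the commit below (the maxbytes split mirrors ext4's extents/non-extents limits):

  /* fs/ext4/file.c (sketch) */
  static loff_t ext4_llseek(struct file *file, loff_t offset, int origin)
  {
          struct inode *inode = file->f_mapping->host;
          loff_t maxbytes;

          if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)))
                  maxbytes = EXT4_SB(inode->i_sb)->s_bitmap_maxbytes;
          else
                  maxbytes = inode->i_sb->s_maxbytes;

          /* no i_mutex here: the generic helper is now lock-free except
           * for SEEK_CUR, which serializes on file->f_lock */
          return generic_file_llseek_size(file, offset, origin, maxbytes);
  }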
-
Committed by Andi Kleen

Add a generic_file_llseek variant to the VFS that allows passing in the maximum file size of the file system, instead of always using maxbytes from the superblock. This can be used to eliminate some cut'n'paste seek code in ext4.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Andi Kleen

The i_mutex lock use of generic_file_llseek hurts. Independent processes accessing the same file synchronize over a single lock, even though they have no need for synchronization at all. Under high utilization this can cause llseek to scale very poorly on larger systems.

This patch does some rethinking of the llseek locking model:

First, the 64-bit f_pos is not necessarily atomic without locks on 32-bit systems. This can already cause races with read() today. This was discussed on linux-kernel in the past and deemed acceptable. The patch does not change that.

Let's look at the different seek variants:

SEEK_SET: Doesn't really need any locking. If there's a race, one writer wins, the other loses. For 32-bit, the non-atomic update races against read() stay the same. Without a lock they can also happen against write() now. The read() race was deemed acceptable in past discussions, and I think if it's ok for read it's ok for write too.
=> Don't need a lock.

SEEK_END: This behaves like SEEK_SET plus it reads the maximum size too. Reading the maximum size would have the 32-bit atomicity problem. But luckily we already have a way to read the maximum size without locking (i_size_read), so we can just use that instead. Without i_mutex there is no synchronization with write() anymore; however, since the write() update is atomic on 64-bit, it just behaves like another racy SEEK_SET. On non-atomic 32-bit it's the same as SEEK_SET.
=> Don't need a lock, but need to use i_size_read().

SEEK_CUR: This has a read-modify-write race window on the same file. One could argue that any application doing unsynchronized seeks on the same file is already broken. But for the sake of not adding a regression here I'm using the file->f_lock to synchronize this. Using this lock is much better than the inode mutex because it doesn't synchronize between processes.
=> Still need a lock, but can use f_lock.

This patch implements this new scheme in generic_file_llseek. I dropped generic_file_llseek_unlocked and changed all callers.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
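Condensed into code, the scheme reads roughly as follows. This is a sketch, not the verbatim kernel function: llseek_set_pos() is a hypothetical helper for the validate-and-store step, and maxsize stands in for the per-filesystem limit from the previous commit.

  static loff_t llseek_set_pos(struct file *file, loff_t offset, loff_t maxsize)
  {
          if (offset < 0 || offset > maxsize)
                  return -EINVAL;
          if (offset != file->f_pos)
                  file->f_pos = offset;
          return offset;
  }

  loff_t generic_file_llseek_sketch(struct file *file, loff_t offset,
                                    int origin, loff_t maxsize)
  {
          struct inode *inode = file->f_mapping->host;

          switch (origin) {
          case SEEK_END:
                  /* lock-free and 32-bit-safe size read */
                  offset += i_size_read(inode);
                  break;
          case SEEK_CUR:
                  if (offset == 0)
                          return file->f_pos;
                  /* protect the f_pos read-modify-write without
                   * serializing unrelated processes on i_mutex */
                  spin_lock(&file->f_lock);
                  offset = llseek_set_pos(file, file->f_pos + offset,
                                          maxsize);
                  spin_unlock(&file->f_lock);
                  return offset;
          }
          /* SEEK_SET (and SEEK_END once the size is added): a plain,
           * possibly racy store, which past discussion deemed acceptable */
          return llseek_set_pos(file, offset, maxsize);
  }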
-
Committed by Andi Kleen

This doesn't change anything for the compiler, but hch thought it would make the code clearer. I moved the reference counting into its own little inline.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Andi Kleen

Add inlines to all the submission path functions. While this increases code size, it also gives gcc a lot of optimization opportunities in this critical hotpath. In particular -- together with some other changes -- this allows gcc to get rid of the unnecessary clearing of sdio at the beginning and to optimize the messy parameter passing. Any non-inlining of a function which takes an sdio parameter would break this optimization, because it cannot be done once the address of a structure is taken.

Note that benefits are only seen with CONFIG_OPTIMIZE_INLINING and CONFIG_CC_OPTIMIZE_FOR_SIZE both set to off.

This gives about 2.2% improvement on a large database benchmark with a high IOPS rate.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Andi Kleen

Only a single b_private field in the map_bh buffer head is needed after the submission path. Move map_bh separately to avoid storing this information in the long-term slab. This avoids the weird 104-byte hole in struct dio_submit which also needed to be memset early.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Andi Kleen

A direct slab call is slightly faster than kmalloc and can be better cached per CPU. It also avoids rounding to the next kmalloc slab. In addition, this enforces cache line alignment for struct dio to avoid any false sharing.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
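A hedged sketch of the pattern (a dedicated slab plus explicit alignment); names follow fs/direct-io.c conventions, but treat the details as illustrative:

  /* cacheline alignment keeps two dio objects from sharing a line */
  struct dio {
          int flags;
          /* ... long-lived completion state ... */
  } ____cacheline_aligned_in_smp;

  static struct kmem_cache *dio_cache __read_mostly;

  static int __init dio_init(void)
  {
          /* KMEM_CACHE() sizes the slab from the type, preserving its
           * declared alignment; SLAB_PANIC makes boot fail loudly */
          dio_cache = KMEM_CACHE(dio, SLAB_PANIC);
          return 0;
  }
  module_init(dio_init);

  /* allocation sites then become:
   *     dio = kmem_cache_alloc(dio_cache, GFP_KERNEL);
   *     ...
   *     kmem_cache_free(dio_cache, dio);
   */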
-
Committed by Andi Kleen

Fix most problems reported by pahole. There is still a weird 104-byte hole after map_bh. I'm not sure what causes this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Andi Kleen

There's nothing on the stack, even before my changes.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Andi Kleen

This large, but largely mechanical, patch moves all fields in struct dio that are only used in the submission path into a separate on-stack data structure. This has the advantage that the memory is very likely cache hot, which is not guaranteed for memory fresh out of kmalloc. This also gives gcc more optimization potential, because it can more easily determine that there are no external aliases for these variables.

The sdio initialization is an initialization now instead of a memset. This allows gcc to break sdio into individual fields and optimize away unnecessary zeroing (after all the functions are inlined).

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
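A minimal sketch of the split, with illustrative field choices (the real structures carry many more members):

  /* long-lived: heap-allocated per I/O, survives until completion */
  struct dio {
          int flags;
          struct inode *inode;
          /* ... completion state ... */
  };

  /* submission-only: lives on the caller's stack, cache hot, and
   * visible to gcc as free of external aliases */
  struct dio_submit {
          struct bio *bio;        /* bio under assembly */
          unsigned blkbits;       /* fs block size bits */
          sector_t block_in_file; /* current offset in the file, in blocks */
  };

  /* in the entry point: a designated initializer instead of a memset,
   * so dead stores can be eliminated once everything is inlined */
  struct dio_submit sdio = {
          .blkbits = blkbits,
  };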
-
Committed by Christoph Hellwig

We need to move the inode to the end of the list to actually make the spinning prevention explained in the comment above it work. With a plain list_move it will simply stay in place, as we're always reclaiming from the head of the list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
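In code, the one-liner is roughly this (a sketch; list and field names per the 2011-era fs/inode.c, with the surrounding reclaim logic elided):

  /* the inode can't be freed on this pass: rotate it to the TAIL so the
   * next iteration starts on a different head entry -- a plain
   * list_move() left it at the head, respinning on the same inode */
  list_move_tail(&inode->i_lru, &sb->s_inode_lru);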
-
Committed by Andreas Gruenbacher

Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Andreas Gruenbacher

Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Andreas Gruenbacher

Acked-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruen@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Eric W. Biederman

This was found by inspection while tracking a similar bug in compat_statfs64 that has been fixed in mainline since December.

- This fixes a bug where not all of the f_spare fields were cleared on mips and s390.
- Add the f_flags field to struct compat_statfs.
- Copy f_flags to userspace in case someone cares.
- Use __clear_user to copy the f_spare field to userspace, to ensure that all of the elements of f_spare are cleared (see the sketch after this message).

On some architectures f_spare has 5 ints and on others it has only 4, which makes the previous technique of clearing each int individually broken.

I don't expect anyone actually uses the old statfs system call anymore, but if they do, let them benefit from having the compat and the native version working the same.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
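A hedged sketch of the copy-out pattern described above (field list trimmed; the real put_compat_statfs() copies every member):

  static int put_compat_statfs_sketch(struct compat_statfs __user *ubuf,
                                      struct kstatfs *kbuf)
  {
          if (!access_ok(VERIFY_WRITE, ubuf, sizeof(*ubuf)) ||
              __put_user(kbuf->f_type, &ubuf->f_type) ||
              __put_user(kbuf->f_flags, &ubuf->f_flags) || /* newly copied out */
              /* ... remaining fields ... */
              /* sizeof() covers every spare slot, whether the arch
               * defines 4 ints or 5 */
              __clear_user(ubuf->f_spare, sizeof(ubuf->f_spare)))
                  return -EFAULT;
          return 0;
  }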
-
Committed by Bryan Schumaker

nfsiostat was failing to find mounted filesystems on kernels after 2.6.38 because of changes to show_vfsstat() by commit c7f404b4. This patch adds back the "device" tag before the nfs server entry so scripts can parse the mountstats file correctly.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@kernel.org [>=2.6.39]
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Wang Sheng-Hui

The patch is against 3.1-rc3.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Steve French

Samba supports a setfs info level to negotiate encrypted shares. This patch adds the defines so we recognize this info level. Later patches will add the enablement for it.

Acked-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
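For flavor, the kind of definition this implies in the CIFS headers; the name follows the Samba convention, but treat the exact value as an unverified recollection rather than authoritative:

  /* fs/cifs/cifspdu.h (sketch) -- setfs info level used by Samba to
   * negotiate transport encryption on a share */
  #define SMB_REQUEST_TRANSPORT_ENCRYPTION 0x0204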
-
- 27 Oct 2011, 1 commit
-
-
Committed by Boaz Harrosh

In my last patch I made a stupid mistake and broke the exofs compilation completely. Fix it ASAP. Instead of obj-y I did obj-$(y).

Really, really sorry. Me totally blushing :-{|

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
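The one-character diff, reconstructed from the message above (the exact file and trailing comment are assumptions based on the related exofs commits below):

  # fs/Makefile -- $(y) expands the empty variable "y", so the line
  # became "obj- += exofs/" and the directory was never descended into
  -obj-$(y)	+= exofs/ # Multiple modules
  +obj-y	+= exofs/ # Multiple modules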
-
- 26 Oct 2011, 11 commits
-
-
Committed by Sage Weil

ceph_release_page_vector() kfrees the vector; we shouldn't do it here too.

Reported-by: Jeff Wu <cpwu@tnsoft.com.cn>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Amon Ott

Fix 32-bit ino generation to not always be 1.

Signed-off-by: Amon Ott <a.ott@m-privacy.de>
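The fix folds the full 64-bit ceph inode number down to 32 bits instead of effectively always yielding 1; a sketch from memory of fs/ceph/super.h, so treat the details as approximate:

  static inline u32 ceph_ino_to_ino32(__u64 vino)
  {
          u32 ino = vino & 0xffffffff;

          ino ^= vino >> 32;      /* mix in the high bits rather than drop them */
          if (!ino)
                  ino = 1;        /* 0 is not a valid inode number */
          return ino;
  }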
-
Committed by Greg Farnum

Previously we were validating the passed-in stripe unit, object size, and stripe count against each other (and not testing most other stuff). Instead, make sure that the composed previous layout and new values are valid, and only send the new values to the MDS. This lets users change the pool without setting the whole layout, for instance.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
-
Committed by Sage Weil

This reverts commit c9af9fb6. We need to block and truncate all pages in order to reliably invalidate them. Otherwise, we could:

 - have some uptodate pages in the cache
 - queue an invalidate
 - write(2) locks some pages
 - invalidate_work skips them
 - write(2) only overwrites part of the page
 - page now dirty and uptodate
 -> partial leakage of invalidated data

It's not entirely clear why we started skipping locked pages in the first place. I just ran this through fsx and didn't see any problems.

Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Noah Watkins

Trivial formatting fix.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil

The pool allocation failures are masked by the pool; there is no need to spam the console about them. (That's the whole point of having the pool in the first place.) Mark msg allocations whose failure is safely handled as such.

Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil

This simplifies the init/shutdown paths, and makes client->msgr available during the rest of the setup process.

Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil

...after some prodding by Christoph.

Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil

The 'rsize' mount option limits the maximum size of an individual read(ahead) operation that is sent off to an OSD. This is distinct from 'rasize', which controls the size of the readahead window.

Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil

It controls readahead.

Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil

When we get a ->readpages() aop, submit async reads for all page ranges in the provided page list. Lock the pages immediately, so that the VFS/MM will block until the reads complete.

Signed-off-by: Sage Weil <sage@newdream.net>
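A condensed sketch of the flow (start_read() stands in for the helper that peels one contiguous run off the list, locks those pages, and submits a single async OSD read for the whole run):

  static int ceph_readpages_sketch(struct file *file,
                                   struct address_space *mapping,
                                   struct list_head *page_list,
                                   unsigned nr_pages)
  {
          int rc = 0;

          /* pages are locked before submission, so any VFS/MM waiter
           * blocks on the page lock until the OSD read completes */
          while (!list_empty(page_list)) {
                  rc = start_read(mapping->host, page_list, nr_pages);
                  if (rc < 0)
                          break;  /* reads already in flight still finish */
          }
          return rc;
  }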
-
- 25 Oct 2011, 6 commits
-
-
Committed by Eric W. Biederman

In commit 8a9ea323 ("Merge git://.../davem/net-next"), where my sysfs changes from the net tree merged with the sysfs rbtree changes from Mikulas Patocka, the conflict resolution failed to preserve the simplified property that was the point of my changes.

That is, sysfs_find_dirent can now say something is a match if and only if s_name and s_ns match what we are looking for, and sysfs_readdir can simply return all of the directory entries where s_ns matches the directory that we should be returning.

Now that we are back to exact matches, we can tweak sysfs_find_dirent and the name rb_tree to order sysfs_dirents by s_ns and s_name, and remove the second loop in sysfs_find_dirent. However, that change seems a bit much for a conflict resolution, so it can come later.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
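The restored exact-match rule, written out (a sketch; field names per the 2011-era struct sysfs_dirent):

  /* a dirent matches iff BOTH the namespace tag and the name match;
   * no fallback cases -- this is what keeps sysfs_readdir a plain
   * filter on s_ns */
  static bool sysfs_dirent_matches(struct sysfs_dirent *sd,
                                   const void *ns, const char *name)
  {
          return sd->s_ns == ns && strcmp(sd->s_name, name) == 0;
  }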
-
Committed by Boaz Harrosh

Now that we support raid5, enable it at mount. Raid6 will come next; raid4 is not in demand, so it will probably not be enabled (until someone wants it).

NOTE: mkfs.exofs has had support for raid5/6 for a long time. (Making an empty raidX FS is just as easy as raid0 ;-} )

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
-
Committed by Boaz Harrosh

The ORE needs the filesystem to supply an r4w_get_page/r4w_put_page API so it can get cache pages to read into when writing partial stripes.

Also, I commented out and NULLed the .writepage (singular) vector, because it produces a terrible write pattern for raid and is apparently not needed. Even in OOM conditions the system copes (even better) without it.

TODO: How do we tell write_cache_pages() to start at, or include, a certain page?

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
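A sketch of what such an operations vector plausibly looks like (modeled on the ORE headers of this series; check ore.h for the exact names and signatures):

  struct _ore_r4w_op {
          /* return the cache page at @index for the inode behind @priv;
           * *uptodate tells the ORE whether it must still read the page
           * from the OSDs before it can enter the XOR calculation */
          struct page *(*get_page)(void *priv, u64 index, bool *uptodate);

          /* release the reference taken by get_page */
          void (*put_page)(void *priv, struct page *page);
  };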
-
Committed by Boaz Harrosh

This is finally the RAID5 write support.

The bigger part of this patch is not the XOR engine itself, but the read4write logic, which is a complete mini prepare_for_striping reading engine that can read scattered pages of a stripe into cache so they can be used for the XOR calculation. That is, if the write was not stripe aligned.

The main algorithm behind the XOR engine is the 2-dimensional array: struct __stripe_pages_2d. A drawing might save 1000 words:

---
__stripe_pages_2d
   |                       n = pages_in_stripe_unit;
   |                       w = group_width - parity;
   |                                 |        pages array presented to the XOR lib
   |                                 |                   |
   V                                 |                   V
 __1_page_stripe[0].pages --> [c0][c1]..[cw][c_par] <---|
   |                                 |
 __1_page_stripe[1].pages --> [c0][c1]..[cw][c_par] <---
   |                                 |
  ...                                |                  ...
   |                                 |
 __1_page_stripe[n].pages --> [c0][c1]..[cw][c_par]
                                     ^
                                     |
                      data added columns first then row
---

The pages are put on this array columns first, i.e.: p0-of-c0, p1-of-c0, ... pn-of-c0, p0-of-c1, ... So we are doing a corner turn of the pages.

Note that pages will zigzag down and left, but are put sequentially in growing order. So when the time comes to XOR the stripe, only the beginning and end of the array need be checked. We scan the array, and any NULL spot will be filled by pages-to-be-read.

The FS that wants to support RAID5 needs to supply an operations-vector that searches for a given page in cache and specifies if the page is uptodate or needs reading. All these pages-to-be-read are put on a slave ore_io_state and synchronously read. All the pages of a stripe are read in one IO, using the scatter-gather mechanism.

In write we constrain our IO to only be incomplete on a single stripe. Meaning either the complete IO is within a single stripe, so we might have pages to read from both the beginning and the end of the strip; or we have some reading to do at the beginning, but end at a strip boundary. The leftover pages are pushed to the next IO by the API already established by previous work, where an IO offset/length combination presented to the ORE might get the length truncated and the user must re-submit the leftover pages. (Both exofs and NFS support this.)

But any ORE user should make its best effort to align its IO beforehand and avoid complications. A cached ore_layout->stripe_size member can be used for that calculation. (NOTE: the ORE demands that stripe_size not be bigger than 32 bits.)

What else? Well, read it and tell me.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
-
Committed by Boaz Harrosh

This patch introduces the first stage of RAID5 support, mainly the skip-over-raid-units when reading. For writes it inserts BLANK units into where XOR blocks should be calculated and written to.

It introduces the new "general raid maths", and the main additional parameters and components needed for raid5. Since at this stage it could corrupt future versions that actually do support raid5, the enablement of raid5 mounting and the setting of parity-count > 0 are disabled, so the raid5 code will never be used. Mounting of raid5 is only enabled later, once the basic XOR write is also in. But if the patch "enable RAID5" is applied, this code has been tested to properly read raid5 volumes and is according to standard.

Also, it has been tested that the new maths still properly support RAID0 and the grouping code, just as before. (BTW: I have found more bugs in the pnfs-obj RAID math, fixed here.)

The ore.c file is getting too big, so new ore_raid.[hc] files are added that will contain the special raid stuff that is not used in striping and mirrors. In future write support these will get bigger.

When adding ore_raid.c to the Kbuild file I was forced to rename ore.ko to libore.ko. Is it possible to keep the source file, say ore.c, and the module file ore.ko named the same even if there are multiple files inside ore.ko?

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
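For reference, the Kbuild idiom behind the forced rename: a composite module cannot share its name with one of its constituent objects, so having ore.o inside the module rules out ore.ko as the module name. A sketch consistent with the description above (treat the exact lines as an assumption):

  # fs/exofs/Kbuild (sketch)
  libore-y := ore.o ore_raid.o      # objects composing the module
  obj-$(CONFIG_ORE) += libore.o     # resulting module: libore.ko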
-
Committed by Boaz Harrosh

The fs/exofs directory has multiple targets now, of which ore.ko will be needed by the pnfs-objects layout driver (fs/nfs/objlayout).

As suggested by Michal Marek <mmarek@suse.cz>, convert the inclusion of exofs/ from obj-$(CONFIG_EXOFS_FS) => obj-$(y), so the ORE can be selected also from fs/nfs/Kconfig.

CC: Michal Marek <mmarek@suse.cz>
CC: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
-