- 27 5月, 2011 1 次提交
-
-
由 Chris Mason 提交于
This will detect small random writes into files and queue the up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 24 5月, 2011 1 次提交
-
-
由 Julia Lawall 提交于
Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression struct btrfs_path * x; expression ra,rb; position p1,p2; @@ x = btrfs_alloc_path@p1(...) ... when != btrfs_free_path(x,...) when != if (...) { ... btrfs_free_path(x,...) ...} when != x = ra if(...) { ... when != x = rb when forall when != btrfs_free_path(x,...) \(return <+...x...+>; \| return@p2...; \) } @script:python@ p1 << r.p1; p2 << r.p2; @@ cocci.print_main("alloc",p1) cocci.print_secs("return",p2) // </smpl> Signed-off-by: NJulia Lawall <julia@diku.dk> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 21 5月, 2011 1 次提交
-
-
由 Miao Xie 提交于
Changelog V5 -> V6: - Fix oom when the memory load is high, by storing the delayed nodes into the root's radix tree, and letting btrfs inodes go. Changelog V4 -> V5: - Fix the race on adding the delayed node to the inode, which is spotted by Chris Mason. - Merge Chris Mason's incremental patch into this patch. - Fix deadlock between readdir() and memory fault, which is reported by Itaru Kitayama. Changelog V3 -> V4: - Fix nested lock, which is reported by Itaru Kitayama, by updating space cache inode in time. Changelog V2 -> V3: - Fix the race between the delayed worker and the task which does delayed items balance, which is reported by Tsutomu Itoh. - Modify the patch address David Sterba's comment. - Fix the bug of the cpu recursion spinlock, reported by Chris Mason Changelog V1 -> V2: - break up the global rb-tree, use a list to manage the delayed nodes, which is created for every directory and file, and used to manage the delayed directory name index items and the delayed inode item. - introduce a worker to deal with the delayed nodes. Compare with Ext3/4, the performance of file creation and deletion on btrfs is very poor. the reason is that btrfs must do a lot of b+ tree insertions, such as inode item, directory name item, directory name index and so on. If we can do some delayed b+ tree insertion or deletion, we can improve the performance, so we made this patch which implemented delayed directory name index insertion/deletion and delayed inode update. Implementation: - introduce a delayed root object into the filesystem, that use two lists to manage the delayed nodes which are created for every file/directory. One is used to manage all the delayed nodes that have delayed items. And the other is used to manage the delayed nodes which is waiting to be dealt with by the work thread. - Every delayed node has two rb-tree, one is used to manage the directory name index which is going to be inserted into b+ tree, and the other is used to manage the directory name index which is going to be deleted from b+ tree. - introduce a worker to deal with the delayed operation. This worker is used to deal with the works of the delayed directory name index items insertion and deletion and the delayed inode update. When the delayed items is beyond the lower limit, we create works for some delayed nodes and insert them into the work queue of the worker, and then go back. When the delayed items is beyond the upper bound, we create works for all the delayed nodes that haven't been dealt with, and insert them into the work queue of the worker, and then wait for that the untreated items is below some threshold value. - When we want to insert a directory name index into b+ tree, we just add the information into the delayed inserting rb-tree. And then we check the number of the delayed items and do delayed items balance. (The balance policy is above.) - When we want to delete a directory name index from the b+ tree, we search it in the inserting rb-tree at first. If we look it up, just drop it. If not, add the key of it into the delayed deleting rb-tree. Similar to the delayed inserting rb-tree, we also check the number of the delayed items and do delayed items balance. (The same to inserting manipulation) - When we want to update the metadata of some inode, we cached the data of the inode into the delayed node. the worker will flush it into the b+ tree after dealing with the delayed insertion and deletion. - We will move the delayed node to the tail of the list after we access the delayed node, By this way, we can cache more delayed items and merge more inode updates. - If we want to commit transaction, we will deal with all the delayed node. - the delayed node will be freed when we free the btrfs inode. - Before we log the inode items, we commit all the directory name index items and the delayed inode update. I did a quick test by the benchmark tool[1] and found we can improve the performance of file creation by ~15%, and file deletion by ~20%. Before applying this patch: Create files: Total files: 50000 Total time: 1.096108 Average time: 0.000022 Delete files: Total files: 50000 Total time: 1.510403 Average time: 0.000030 After applying this patch: Create files: Total files: 50000 Total time: 0.932899 Average time: 0.000019 Delete files: Total files: 50000 Total time: 1.215732 Average time: 0.000024 [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3 Many thanks for Kitayama-san's help! Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Reviewed-by: NDavid Sterba <dave@jikos.cz> Tested-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Tested-by: NItaru Kitayama <kitayama@cl.bb4u.ne.jp> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 13 5月, 2011 1 次提交
-
-
由 Arne Jansen 提交于
this function won't be used here anymore, so move it super.c where it is used for df-calculation
-
- 02 5月, 2011 1 次提交
-
-
由 David Sterba 提交于
reported by gcc -Wshadow: page_index, page_offset, new_inode, dev_name Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 12 4月, 2011 1 次提交
-
-
由 Xin Zhong 提交于
We create two subvolumes (meego_root and meego_home) in btrfs root directory. And set meego_root as default mount subvolume. After we remount btrfs, meego_root is mounted to top directory by default. Then when we try to mount meego_home (subvol=meego_home) to a subdirectory, it failed. The problem is when default mount subvolume is set to meego_root, we search meego_home in meego_root but can not find it. So the solution is to add a new mount option (subvolrootid) to specify subvol id of root and search subvol name in it. For our case, now we can use "-o subvolrootid=0,subvol=meego_home) to mount meego_home. Detail information can be found in meego bugzilla: https://bugs.meego.com/show_bug.cgi?id=15055Signed-off-by: NZhong, Xin <xin.zhong@intel.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 05 4月, 2011 1 次提交
-
-
由 Tsutomu Itoh 提交于
Some mount options are not displayed by /proc/mounts. This patch displays the option such as compress_type by /proc/mounts. Ex. [before] $ mount | grep sdc2 /dev/sdc2 on /test12 type btrfs (rw,space_cache,compress=lzo) $ cat /proc/mounts | grep sdc2 /dev/sdc2 /test12 btrfs rw,relatime,compress 0 0 [after] $ mount | grep sdc2 /dev/sdc2 on /test12 type btrfs (rw,space_cache,compress=lzo) $ cat /proc/mounts | grep sdc2 /dev/sdc2 /test12 btrfs rw,relatime,compress=lzo,space_cache 0 0 Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 28 3月, 2011 1 次提交
-
-
由 liubo 提交于
Tracepoints can provide insight into why btrfs hits bugs and be greatly helpful for debugging, e.g dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0 dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0 btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0) btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0) btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8 flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0) flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0) flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0) btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0) Here is what I have added: 1) ordere_extent: btrfs_ordered_extent_add btrfs_ordered_extent_remove btrfs_ordered_extent_start btrfs_ordered_extent_put These provide critical information to understand how ordered_extents are updated. 2) extent_map: btrfs_get_extent extent_map is used in both read and write cases, and it is useful for tracking how btrfs specific IO is running. 3) writepage: __extent_writepage btrfs_writepage_end_io_hook Pages are cirtical resourses and produce a lot of corner cases during writeback, so it is valuable to know how page is written to disk. 4) inode: btrfs_inode_new btrfs_inode_request btrfs_inode_evict These can show where and when a inode is created, when a inode is evicted. 5) sync: btrfs_sync_file btrfs_sync_fs These show sync arguments. 6) transaction: btrfs_transaction_commit In transaction based filesystem, it will be useful to know the generation and who does commit. 7) back reference and cow: btrfs_delayed_tree_ref btrfs_delayed_data_ref btrfs_delayed_ref_head btrfs_cow_block Btrfs natively supports back references, these tracepoints are helpful on understanding btrfs's COW mechanism. 8) chunk: btrfs_chunk_alloc btrfs_chunk_free Chunk is a link between physical offset and logical offset, and stands for space infomation in btrfs, and these are helpful on tracing space things. 9) reserved_extent: btrfs_reserved_extent_alloc btrfs_reserved_extent_free These can show how btrfs uses its space. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 17 2月, 2011 1 次提交
-
-
由 Chris Mason 提交于
ENOSPC in btrfs is getting to the point where the extra debugging isn't required. I've put it under mount -o enospc_debug just in case someone is having difficult problems. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 01 2月, 2011 1 次提交
-
-
由 Tsutomu Itoh 提交于
The error check of btrfs_start_transaction() is added, and the mistake of the error check on several places is corrected. Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 27 1月, 2011 2 次提交
-
-
由 Tero Roponen 提交于
We must save and free the original kstrdup()'ed pointer because strsep() modifies its first argument. Signed-off-by: NTero Roponen <tero.roponen@gmail.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
由 Ian Kent 提交于
We missed a memory deallocation in commit 450ba0ea. If an existing super block is found at mount and there is no error condition then the pre-allocated tree_root and fs_info are no not used and are not freeded. Signed-off-by: NIan Kent <raven@themaw.net> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
- 18 1月, 2011 1 次提交
-
-
由 liubo 提交于
This patch comes from "Forced readonly mounts on errors" ideas. As we know, this is the first step in being more fault tolerant of disk corruptions instead of just using BUG() statements. The major content: - add a framework for generating errors that should result in filesystems going readonly. - keep FS state in disk super block. - make sure that all of resource will be freed and released at umount time. - make sure that fter FS is forced readonly on error, there will be no more disk change before FS is corrected. For this, we should stop write operation. After this patch is applied, the conversion from BUG() to such a framework can happen incrementally. Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 17 1月, 2011 2 次提交
-
-
由 Miao Xie 提交于
When we store data by raid profile in btrfs with two or more different size disks, df command shows there is some free space in the filesystem, but the user can not write any data in fact, df command shows the wrong free space information of btrfs. # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10 # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 28.00KB devid 1 size 5.01GB used 2.03GB path /dev/sda9 devid 2 size 10.00GB used 2.01GB path /dev/sda10 # btrfs device scan /dev/sda9 /dev/sda10 # mount /dev/sda9 /mnt # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999 (fill the filesystem) # sync # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 5.4G 62% /mnt # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 3.99GB devid 1 size 5.01GB used 5.01GB path /dev/sda9 devid 2 size 10.00GB used 4.99GB path /dev/sda10 It is because btrfs cannot allocate chunks when one of the pairing disks has no space, the free space on the other disks can not be used for ever, and should be subtracted from the total space, but btrfs doesn't subtract this space from the total. It is strange to the user. This patch fixes it by calcing the free space that can be used to allocate chunks. Implementation: 1. get all the devices free space, and align them by stripe length. 2. sort the devices by the free space. 3. check the free space of the devices, 3.1. if it is not zero, and then check the number of the devices that has more free space than this device, if the number of the devices is beyond the min stripe number, the free space can be used, and add into total free space. if the number of the devices is below the min stripe number, we can not use the free space, the check ends. 3.2. if the free space is zero, check the next devices, goto 3.1 This implementation is just likely fake chunk allocation. After appling this patch, df can show correct space information: # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 0 100% /mnt Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Miao Xie 提交于
Josef has implemented mixed data/metadata chunks, we must add those chunks' space just like data chunks. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Reviewed-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 13 1月, 2011 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 12月, 2010 2 次提交
-
-
由 Li Zefan 提交于
Lzo is a much faster compression algorithm than gzib, so would allow more users to enable transparent compression, and some users can choose from compression ratio and speed for different applications Usage: # mount -t btrfs -o compress[=<zlib,lzo>] dev /mnt or # mount -t btrfs -o compress-force[=<zlib,lzo>] dev /mnt "-o compress" without argument is still allowed for compatability. Compatibility: If we mount a filesystem with lzo compression, it will not be able be mounted in old kernels. One reason is, otherwise btrfs will directly dump compressed data, which sits in inline extent, to user. Performance: The test copied a linux source tarball (~400M) from an ext4 partition to the btrfs partition, and then extracted it. (time in second) lzo zlib nocompress copy: 10.6 21.7 14.9 extract: 70.1 94.4 66.6 (data size in MB) lzo zlib nocompress copy: 185.87 108.69 394.49 extract: 193.80 132.36 381.21 Changelog: v1 -> v2: - Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig. - Add incompability flag. - Fix error handling in compress code. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
由 Li Zefan 提交于
Make the code aware of compression type, instead of always assuming zlib compression. Also make the zlib workspace function as common code for all compression types. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
-
- 11 12月, 2010 1 次提交
-
-
由 Li Zefan 提交于
We should drop dentry before deactivating the superblock, otherwise we can hit this bug: BUG: Dentry f349a690{i=100,n=/} still in use (1) [unmount of btrfs loop1] ... Steps to reproduce the bug: # mount /dev/loop1 /mnt # mkdir save # btrfs subvolume snapshot /mnt save/snap1 # umount /mnt # mount -o subvol=save/snap1 /dev/loop1 /mnt (crash) Reported-by: NMichael Niederle <mniederle@gmx.at> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 28 11月, 2010 2 次提交
-
-
由 Josef Bacik 提交于
There is a problem with how we use sget, it searches through the list of supers attached to the fs_type looking for a super with the same fs_devices as what we're trying to mount. This depends on sb->s_fs_info being filled, but we don't fill that in until we get to btrfs_fill_super, so we could hit supers on the fs_type super list that have a null s_fs_info. In order to fix that we need to go ahead and setup a blank root with a blank fs_info to hold fs_devices, that way our test will work out right and then we can set s_fs_info in btrfs_set_super, and then open_ctree will simply use our pre-allocated root and fs_info when setting everything up. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Ian Kent 提交于
When mounting a btrfs file system btrfs_test_super() may attempt to use sb->s_fs_info, the btrfs root, of a super block that is going away and that has had the btrfs root set to NULL in its ->put_super(). But if the super block is going away it cannot be an existing super block so we can return false in this case. Signed-off-by: NIan Kent <raven@themaw.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 22 11月, 2010 1 次提交
-
-
由 Josef Bacik 提交于
When I added the clear_cache option I screwed up and took the break out of the space_cache case statement, so whenever you mount with space_cache you also get clear_cache, which does you no good if you say set space_cache in fstab so it always gets set. This patch adds the break back in properly. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 30 10月, 2010 3 次提交
-
-
由 Sage Weil 提交于
Add a mount option user_subvol_rm_allowed that allows users to delete a (potentially non-empty!) subvol when they would otherwise we allowed to do an rmdir(2). We duplicate the may_delete() checks from the core VFS code to implement identical security checks (minus the directory size check). We additionally require that the user has write+exec permission on the subvol root inode. Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Andi Kleen 提交于
These are all the cases where a variable is set, but not read which are not bugs as far as I can see, but simply leftovers. Still needs more review. Found by gcc 4.6's new warnings Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Julia Lawall 提交于
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: NJulia Lawall <julia@diku.dk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 29 10月, 2010 3 次提交
-
-
由 Josef Bacik 提交于
If something goes wrong with the free space cache we need a way to make sure it's not loaded on mount and that it's cleared for everybody. When you pass the clear_cache option it will make it so all block groups are setup to be cleared, which keeps them from being loaded and then they will be truncated when the transaction is committed. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Josef Bacik 提交于
In order to save free space cache, we need an inode to hold the data, and we need a special item to point at the right inode for the right block group. So first, create a special item that will point to the right inode, and the number of extent entries we will have and the number of bitmaps we will have. We truncate and pre-allocate space everytime to make sure it's uptodate. This feature will be turned on as soon as you mount with -o space_cache, however it is safe to boot into old kernels, they will just generate the cache the old fashion way. When you boot back into a newer kernel we will notice that we modified and not the cache and automatically discard the cache. Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 23 10月, 2010 2 次提交
-
-
由 Josef Bacik 提交于
If we failed to find the root subvol id, or the subvol=<name>, we would deactivate the locked super and close the devices. The problem is at this point we have gotten the SB all setup, which includes setting super_operations, so when we'd deactiveate the super, we'd do a close_ctree() which closes the devices, so we'd end up closing the devices twice. So if you do something like this mount /dev/sda1 /mnt/test1 mount /dev/sda1 /mnt/test2 -o subvol=xxx umount /mnt/test1 it would blow up (if subvol xxx doesn't exist). This patch fixes that problem. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
The new ENOSPC stuff breaks out the raid types which breaks the way we were reporting df to the system. This fixes it back so that Available is the total space available to data and used is the actual bytes used by the filesystem. This means that Available is Total - data used - all of the metadata space. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 15 10月, 2010 1 次提交
-
-
由 Arnd Bergmann 提交于
All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
-
- 10 8月, 2010 1 次提交
-
-
由 Al Viro 提交于
NB: do we want btrfs_wait_ordered_range() on eviction of inodes with positive i_nlink on subvolume with zero root_refs? If not, btrfs_evict_inode() can be simplified by unconditionally bailing out in case of i_nlink > 0 in the very beginning... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 12 6月, 2010 2 次提交
-
-
由 Dan Carpenter 提交于
btrfs_iget() returns an ERR_PTR() on failure and not null. Signed-off-by: NDan Carpenter <error27@gmail.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Dan Carpenter 提交于
If btrfs_lookup_dir_item() fails, we should can just let the mount fail with an error. Signed-off-by: NDan Carpenter <error27@gmail.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 26 5月, 2010 1 次提交
-
-
由 Kay Sievers 提交于
This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 25 5月, 2010 3 次提交
-
-
由 Yan, Zheng 提交于
reserve metadata space for handling orphan inodes Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
Besides simplify the code, this change makes sure all metadata reservation for normal metadata operations are released after committing transaction. Changes since V1: Add code that check if unlink and rmdir will free space. Add ENOSPC handling for clone ioctl. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
The size of reserved space is stored in space_info. If block groups of different raid types are linked to separate space_info, changing allocation profile will corrupt reserved space accounting. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 31 3月, 2010 1 次提交
-
-
由 Josef Bacik 提交于
As Yan pointed out, theres not much reason for all this complicated math to account for file extents being split up into max_extent chunks, since they are likely to all end up in the same leaf anyway. Since there isn't much reason to use max_extent, just remove the option altogether so we have one less thing we need to test. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 30 3月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: NTejun Heo <tj@kernel.org> Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
-