- 16 March 2009, 3 commits
-
-
Committed by Artem Bityutskiy
This patch introduces a helpful @c->idx_leb_size variable. The patch also fixes some spelling issues and makes comments use "LEB" instead of "eraseblock", which is more correct. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
Committed by Artem Bityutskiy
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
Committed by Artem Bityutskiy
When writing lprop nodes, do not forget to set @from to 0 when switching the LEB. This fixes the following bug:

	UBIFS error (pid 27768): ubifs_leb_write: writing -15456 bytes at 16:15880, error -22
	UBIFS error (pid 27768): do_commit: commit failed, error -22
	UBIFS warning (pid 27768): ubifs_ro_mode: switched to read-only mode, error -22
	Pid: 27768, comm: freespace Not tainted 2.6.29-rc4-ubifs-2.6 #43
	Call Trace:
	 [<ffffffffa00c46d6>] ubifs_ro_mode+0x54/0x56 [ubifs]
	 [<ffffffffa00cfa16>] do_commit+0x4f5/0x50a [ubifs]
	 [<ffffffffa00cfae7>] ubifs_run_commit+0xbc/0xdb [ubifs]
	 [<ffffffffa00d42b9>] ubifs_budget_space+0x742/0x9ed [ubifs]
	 [<ffffffff812daf45>] ? __mutex_lock_common+0x361/0x3ae
	 [<ffffffffa00bc437>] ? ubifs_write_begin+0x18d/0x44c [ubifs]
	 [<ffffffffa00bc5cb>] ubifs_write_begin+0x321/0x44c [ubifs]
	 [<ffffffff8106222b>] ? trace_hardirqs_on_caller+0x1f/0x14d
	 [<ffffffff81097ce2>] generic_file_buffered_write+0x12f/0x2d9
	 [<ffffffff8109828d>] __generic_file_aio_write_nolock+0x261/0x295
	 [<ffffffff81098aff>] generic_file_aio_write+0x69/0xc5
	 [<ffffffffa00bb914>] ubifs_aio_write+0x14c/0x19e [ubifs]
	 [<ffffffff810c8f42>] do_sync_write+0xe7/0x12d
	 [<ffffffff81055378>] ? autoremove_wake_function+0x0/0x38
	 [<ffffffff81149edc>] ? security_file_permission+0x11/0x13
	 [<ffffffff810c9827>] vfs_write+0xab/0x105
	 [<ffffffff810c9945>] sys_write+0x47/0x6f
	 [<ffffffff8100c35b>] system_call_fastpath+0x16/0x1b

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
- 15 March 2009, 1 commit
-
-
Committed by Artem Bityutskiy
Empty journal head LEBs are accounted as taken empty as well, so the GC LEB does not have to be the only taken empty LEB when mounting/remounting. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
- 14 March 2009, 1 commit
-
-
Committed by Adrian Hunter
The UBIFS fast path in write_begin may mark a page up to date and then discover that there may not be enough space to do the write, so it falls back to a slow path. The slow path tries harder, but may still find no space - leaving the page marked up to date when it is not. This patch ensures that the page is marked not up to date in that case. The bug that this patch fixes becomes evident when the write is into a hole (sparse file) or is at the end of the file and a subsequent read is off the end of the file. In both cases, the file system should return zeros but was instead returning the page that had not been written because the file system was out of space. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
- 08 March 2009, 2 commits
-
-
Committed by Artem Bityutskiy
... which should be uint32_t, not int. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
Committed by Artem Bityutskiy
Make 'ubifs_find_free_space()' return the offset where free space starts, rather than the amount of free space. This is simply more appropriate for its caller. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
- 17 February 2009, 1 commit
-
-
Committed by Eric Sesterhenn
Trivial cleanup: list_del() followed by list_add{,_tail}() is equivalent to list_move{,_tail}(). A semantic patch for Coccinelle can be found at www.cccmz.de/~snakebyte/list_move_tail.spatch Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
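A minimal sketch of the transformation this cleanup applies; 'entry' and 'target_list' are illustrative names, not identifiers from the UBIFS code:

	#include <linux/list.h>

	/* Before: remove the entry from its current list, then add it to another. */
	list_del(&entry->list);
	list_add_tail(&entry->list, &target_list);

	/* After: one equivalent call from include/linux/list.h. */
	list_move_tail(&entry->list, &target_list);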
-
- 09 February 2009, 2 commits
-
-
Committed by Adrian Hunter
The debugging function dbg_chk_lpt_sz() was not working correctly for small min_io_unit sizes, e.g. on NOR flash. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
Committed by Cornelia Huck
Rename the async_*_special() functions to async_*_domain(), which describes the purpose of these functions much better. [Broke up long lines to silence checkpatch] Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
-
- 07 February 2009, 5 commits
-
-
Committed by Tyler Hicks
The addition of filename encryption caused a regression in unencrypted filename symlink support. ecryptfs_copy_filename() is used when dealing with unencrypted filenames, and it reported that the new, copied filename was one character longer than it should have been. This caused the return value of readlink() to count the NULL byte of the symlink target. Most applications don't care about the extra NULL byte, but a version control system (bzr) helped in discovering the bug. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Roland McGrath
The elf_core_dump() code does its work with set_fs(KERNEL_DS) in force, so vma_dump_size() needs to switch back with set_fs(USER_DS) to safely use get_user() for a normal user-space address. Checking for VM_READ optimizes out the case where get_user() would fail anyway. The vm_file check here was already superfluous given the control flow earlier in the function, so that is a cleanup/optimization unrelated to the other changes, but an obvious and trivial one. Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Roland McGrath <roland@redhat.com>
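A hedged sketch of the address-limit dance described above, using the 2.6.29-era set_fs() API; the surrounding logic, the 'addr' variable and the examine() helper are illustrative assumptions, not the exact fs/binfmt_elf.c code:

	unsigned long word;
	mm_segment_t fs = get_fs();	/* currently KERNEL_DS, set by elf_core_dump() */

	if (vma->vm_flags & VM_READ) {
		set_fs(USER_DS);	/* switch back before touching a user address */
		if (get_user(word, (unsigned long __user *)addr) == 0)
			examine(word);	/* hypothetical helper */
		set_fs(fs);		/* restore KERNEL_DS for the rest of the dump */
	}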
-
Committed by David Howells
The patch: commit a6f76f23 "CRED: Make execve() take advantage of copy-on-write credentials" moved the place in which the 'safeness' of a SUID/SGID exec is determined to before de_thread() is called. This means that LSM_UNSAFE_SHARE is now calculated incorrectly. This flag is set if any of the usage counts for fs_struct, files_struct and sighand_struct are greater than 1 at the time the determination is made. All of which are true for threads created by the pthread library. However, since we wish to make the security calculation before irrevocably damaging the process so that we can return an error code in the case where we decide to reject the exec request on this basis, we have to make the determination before calling de_thread(). So, instead, we count up the number of threads (CLONE_THREAD) that are sharing our fs_struct (CLONE_FS), files_struct (CLONE_FILES) and sighand_structs (CLONE_SIGHAND/CLONE_THREAD) with us. These will be killed by de_thread() and so can be discounted by check_unsafe_exec(). We do have to be careful because CLONE_THREAD does not imply FS or FILES. We _assume_ that there will be no extra references to these structs held by the threads we're going to kill. This can be tested with the attached pair of programs. Build the two programs using the Makefile supplied, and run ./test1 as a non-root user. If successful, you should see something like:

	[dhowells@andromeda tmp]$ ./test1
	--TEST1--
	uid=4043, euid=4043 suid=4043
	exec ./test2
	--TEST2--
	uid=4043, euid=0 suid=0
	SUCCESS - Correct effective user ID

and if unsuccessful, something like:

	[dhowells@andromeda tmp]$ ./test1
	--TEST1--
	uid=4043, euid=4043 suid=4043
	exec ./test2
	--TEST2--
	uid=4043, euid=4043 suid=4043
	ERROR - Incorrect effective user ID!

The non-root user ID you see will depend on the user you run as.

[test1.c]
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <pthread.h>

	static void *thread_func(void *arg)
	{
		while (1) {}
	}

	int main(int argc, char **argv)
	{
		pthread_t tid;
		uid_t uid, euid, suid;

		printf("--TEST1--\n");
		getresuid(&uid, &euid, &suid);
		printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);

		if (pthread_create(&tid, NULL, thread_func, NULL) < 0) {
			perror("pthread_create");
			exit(1);
		}

		printf("exec ./test2\n");
		execlp("./test2", "test2", NULL);
		perror("./test2");
		_exit(1);
	}

[test2.c]
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		uid_t uid, euid, suid;

		getresuid(&uid, &euid, &suid);
		printf("--TEST2--\n");
		printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);

		if (euid != 0) {
			fprintf(stderr, "ERROR - Incorrect effective user ID!\n");
			exit(1);
		}
		printf("SUCCESS - Correct effective user ID\n");
		exit(0);
	}

[Makefile]
	CFLAGS = -D_GNU_SOURCE -Wall -Werror -Wunused

	all: test1 test2

	test1: test1.c
		gcc $(CFLAGS) -o test1 test1.c -lpthread

	test2: test2.c
		gcc $(CFLAGS) -o test2 test2.c
		sudo chown root.root test2
		sudo chmod +s test2

Reported-by: David Smith <dsmith@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: David Smith <dsmith@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
-
Committed by Dave Kleikamp
This is a modification of a patch by Bill Pemberton <wfp5p@virginia.edu>. nobh_write_end() could call attach_nobh_buffers() with head == NULL. This would result in a trap when attach_nobh_buffers() attempted to access bh->b_this_page. This can be illustrated by running the writev01 testcase from LTP on jfs. This error was introduced by commit 5b41e74a "vfs: fix data leak in nobh_write_end()". That patch did not take into account that if PageMappedToDisk() is true upon entry to nobh_write_begin(), then no buffers will be allocated for the page. In that case, we won't have to worry about a failed write leaving uninitialized data in the page. Of course, head != NULL implies !page_has_buffers(page), so there is no need to test both. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Dmitri Monakhov <dmonakhov@openvz.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
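A minimal sketch of the guard this describes, assuming the nobh_write_end() short-copy path; the condition shown illustrates the idea (only reattach buffers when some were actually allocated), not necessarily the exact fs/buffer.c change:

	/* Only reattach buffers if nobh_write_begin() allocated any; when
	 * head is NULL (the page was already mapped to disk), there is
	 * nothing to attach and attach_nobh_buffers() must not be called. */
	if (unlikely(copied < len) && head)
		attach_nobh_buffers(page, head);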
-
Committed by Chris Mason
The S_ISGID check in btrfs_new_inode caused an oops during subvol creation because sometimes the dir is NULL. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
- 06 February 2009, 3 commits
-
-
Committed by Al Viro
... and yes, gcc is insane enough to eat that without complaint. We probably want sparse to scream on those... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Alexey Dobriyan
lseek() beyond the length of the file will leave a stale ->index (second-to-last during iteration). The next seq_read() will not notice that ->f_pos is big enough to return 0, but will print the last item as if ->f_pos were pointing to it. Introduced in commit cb510b81, aka "seq_file: more atomicity in traverse()". Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Eric Biederman
In 2.6.25 some /proc files were converted to use the seq_file infrastructure. But seq_files do not correctly support pread(), which broke some userspace applications. To handle pread correctly, we can't assume that f_pos is where we left it in seq_read. So move traverse() so that we can eventually use it in seq_read and thus some day support pread(). Signed-off-by: Eric Biederman <ebiederm@xmission.com> Cc: Paul Turner <pjt@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 February 2009, 2 commits
-
-
Committed by Chris Mason
The code wasn't doing a kfree() on the sorted array. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Mark Fasheh
This reverts commit 0e033342. I committed this by accident - Joel and Louis are working with the lockdep maintainer to provide a better solution than just turning lockdep off. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
-
- 04 February 2009, 19 commits
-
-
Committed by Chris Mason
On fast devices that go from congested to uncongested very quickly, pdflush is waiting too often in congestion_wait, and the FS is backing off too easily in write_cache_pages. For now, fix this on the btrfs side by only checking congestion after some bios have already gone down. Longer term, a real fix is needed for pdflush, but that is a larger project. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
Whenever an item deletion is done, we need to balance all the nodes in the tree to make sure we don't end up with an empty node if a pointer is deleted. This balance prep happens from the root of the tree down, so we can drop our locks as we go. reada_for_balance was triggering read-ahead on neighboring nodes even when no balancing was required. This adds an extra check to avoid calling balance_level() and to avoid reada_for_balance() when a balance won't be required. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
btrfs_unlock_up_safe would break out at the first NULL node entry or unlocked node it found in the path. Some of the callers have missing nodes at the lower levels of the path, so this commit fixes things to check all the nodes in the path before returning. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
btrfs_del_leaf does two things: first it removes the pointer in the parent, and then it frees the block that holds the leaf. It had the parent node locked for both operations, but it only needs the parent locked while it is deleting the pointer. After that it can safely free the block without the parent locked. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
btrfs_truncate_inode_items is set up to stop doing btree searches when it has finished removing the items for the inode. It used to detect the end of the inode by looking for an objectid that didn't match the one we were searching for, but this would result in an extra search through the btree, which adds extra balancing and cow costs to the operation. This commit adds a check to see if we found the inode item, which means we can stop searching early. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
The compression code had some checks to make sure we were only compressing bytes inside of i_size, but it wasn't catching every case. To make things worse, some incorrect math about the number of bytes remaining would make it try to compress more pages than the file really had. The fix used here is to fall back to the non-compression code in this case, which does all the proper cleanup of delalloc and other accounting. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Josef Bacik
With selinux on, we end up calling __btrfs_setxattr when we create an inode, which calls btrfs_start_transaction(). The problem is we've already called that in btrfs_new_inode, and in btrfs_start_transaction we end up doing a wait_current_trans(). If btrfs-transaction has started committing, it will wait for all handles to finish, while the other process is waiting for the transaction to commit. This is fixed by using btrfs_join_transaction, which won't wait for the transaction to commit. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com>
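A minimal sketch of the change this describes for the xattr path; the call below illustrates the intent (attach to the already-running transaction rather than waiting on a new one), and the exact argument list in the 2009 btrfs code may differ:

	/* Do not wait for the running transaction to commit - the inode is
	 * being created inside a handle held by btrfs_new_inode(), so join
	 * that transaction instead of calling btrfs_start_transaction(). */
	trans = btrfs_join_transaction(root, 1);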
-
Committed by Chris Ball
Before this patch, new files/dirs would ignore the SGID bit on their parent directory and always be owned by the creating user's uid/gid. Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
Every transaction in btrfs creates a new snapshot, and then schedules the snapshot from the last transaction for deletion. Snapshot deletion works by walking down the btree and dropping the reference counts on each btree block during the walk. If a given leaf or node has a reference count greater than one, the reference count is decremented and the subtree pointed to by that node is ignored. If the reference count is one, walking continues down into that node or leaf, and the references of everything it points to are decremented. The old code would try to work in small pieces, walking down the tree until it found the lowest leaf or node to free and then returning. This was very friendly to the rest of the FS because it didn't have a huge impact on other operations, but it wouldn't always keep up with the rate that new commits added new snapshots for deletion, and it wasn't very optimal for the extent allocation tree because it wasn't finding leaves that were close together on disk and processing them at the same time. This changes things to walk down to a level 1 node and then process it in bulk. All the leaf pointers are sorted and the leaves are dropped in order based on their extent number. The extent allocation tree and commit code are now fast enough for this kind of bulk processing to work without slowing the rest of the FS down. Overall it does less IO and is better able to keep up with snapshot deletions under high load. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
Most of the btrfs metadata operations can be protected by a spinlock, but some operations still need to schedule. So far, btrfs has been using a mutex along with a trylock loop; most of the time it is able to avoid going for the full mutex, so the trylock loop is a big performance gain. This commit is step one toward getting rid of the blocking locks entirely. btrfs_tree_lock takes a spinlock, and the code explicitly switches to a blocking lock when it starts an operation that can schedule. We'll be able to get rid of the blocking locks in smaller pieces over time. Tracing allows us to find the most common cause of blocking, so we can start with the hot spots first.

The basic idea is:

btrfs_tree_lock() returns with the spin lock held.

btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in the extent buffer flags, and then drops the spin lock. The buffer is still considered locked by all of the btrfs code. If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops the spin lock and waits on a wait queue for the blocking bit to go away. Much of the code that needs to set the blocking bit finishes without actually blocking a good percentage of the time, so an adaptive spin is still used against the blocking bit to avoid very high context switch rates.

btrfs_clear_lock_blocking() clears the blocking bit and returns with the spinlock held again.

btrfs_tree_unlock() can be called on either blocking or spinning locks; it does the right thing based on the blocking bit.

ctree.c has a helper function to set/clear all the locked buffers in a path as blocking. Signed-off-by: Chris Mason <chris.mason@oracle.com>
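A minimal sketch of the locking pattern described above; the lock functions are the ones named in the commit text, while the wrapper and the sleeping helper are illustrative assumptions rather than actual ctree.c code:

	static void modify_block(struct extent_buffer *eb)
	{
		btrfs_tree_lock(eb);		/* returns with the spinlock held */

		/* About to do something that may schedule, so convert the lock
		 * to a blocking lock (sets EXTENT_BUFFER_BLOCKING and drops the
		 * spinlock; the buffer is still logically locked). */
		btrfs_set_lock_blocking(eb);

		do_work_that_may_sleep(eb);	/* hypothetical helper */

		/* Back to a spinning lock for short, non-sleeping updates. */
		btrfs_clear_lock_blocking(eb);

		btrfs_tree_unlock(eb);		/* handles either lock state */
	}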
-
Committed by Chris Mason
Before metadata is written to disk, it is updated to reflect that writeout has begun. Once this update is done, the block must be cow'd before it can be modified again. This update was originally synchronized by using a per-fs spinlock. Today the buffers for the metadata blocks are locked before writeout begins, and everyone that tests the flag has the buffer locked as well. So, the per-fs spinlock (called hash_lock for no good reason) is no longer required. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
extent_io.c has debugging code to report and free leaked extent_state and extent_buffer objects at rmmod time. This helps track down leaks and it saves you from rebooting just to properly remove the kmem_cache object. But, the code runs under a fairly expensive spinlock and the checks to see if it is currently enabled are not entirely consistent. Some use #ifdef and some #if. This changes everything to #if and disables the leak checking. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Chris Mason
When a block goes through cow, we update the reference counts of everything that block points to. The internal pointers of the block can be in just about any order, and it is likely to have clusters of things that are close together and clusters of things that are not. To help reduce the seeks that come with updating all of these reference counts, sort them by byte number before the actual updates are done. Signed-off-by: Chris Mason <chris.mason@oracle.com>
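A minimal sketch of the sorting idea, assuming the kernel's sort() helper from linux/sort.h; the 'refsort' structure and the surrounding collection step are illustrative, not necessarily the exact ctree.c code:

	#include <linux/sort.h>

	/* One entry per pointer found in the node being cow'd. */
	struct refsort {
		u64 bytenr;	/* disk byte the pointer references */
		u32 slot;	/* slot in the node it came from */
	};

	static int refsort_cmp(const void *a, const void *b)
	{
		const struct refsort *ra = a, *rb = b;

		if (ra->bytenr < rb->bytenr)
			return -1;
		if (ra->bytenr > rb->bytenr)
			return 1;
		return 0;
	}

	/* ...fill sorted[0..nr-1] from the node's pointers, then update the
	 * reference counts in ascending disk order to cut down on seeks: */
	sort(sorted, nr, sizeof(struct refsort), refsort_cmp, NULL);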
-
Committed by Chris Mason
Tracing shows the delay between when an async thread goes to sleep and when more work is added is often very short. This commit adds a little bit of delay and extra checking to the code right before we schedule out. It allows more work to be added to the worker without requiring notifications from other procs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Jim Owens
Add a call to LSM security initialization and save the resulting security xattr for new inodes. Add xattr support to the symlink inode ops. Set inode->i_op for existing special files. Signed-off-by: jim owens <jowens@hp.com>
-
Committed by Christian Hesse
This patch adds a menu entry to kconfig to enable ACLs for btrfs. This allows you to enable FS_POSIX_ACL at kernel compile time. (updated by Jeff Mahoney to make the changes in fs/btrfs/Kconfig instead) Signed-off-by: Christian Hesse <mail@earthworm.de> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
-
Committed by Chris Mason
The async bio submission thread was missing some bios that were added after it had decided there was no work left to do. Signed-off-by: Chris Mason <chris.mason@oracle.com>
-
Committed by Felix Blyakher
Until the VFS can correctly support read-only remount without racing, use WARN_ON instead of BUG_ON when detecting a transaction in flight after quiescing the filesystem. Signed-off-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Committed by Dave Chinner
Before trying to obtain, read or write a buffer, check that the buffer length is actually valid. If it is not valid, then something read in the recovery process has been corrupted and we should abort recovery. Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Tested-by: Eric Sesterhenn <snakebyte@gmx.de> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
-
- 03 February 2009, 1 commit
-
-
Committed by Mark Fasheh
We weren't reclaiming the clusters which get freed from this function, so any user punching holes in a file would still have those bytes accounted against him/her. Add the call to vfs_dq_free_space_nodirty() to fix this. Interestingly enough, the journal credits calculation already took this into account. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Jan Kara <jack@suse.cz>
-