- 17 12月, 2010 2 次提交
-
-
由 Christoph Lameter 提交于
__this_cpu_inc can create a single instruction with the same effect as the _get_cpu_var(..)++ construct in buffer.c. Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Christoph Hellwig <hch@lst.de> Acked-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Christoph Lameter 提交于
Optimize various per cpu area operations through these new percpu operations. These operations avoid address calculations through the use of segment prefixes and multiple memory references through RMW instructions etc. Reduces code size: Before: christoph@linux-2.6$ size fs/buffer.o text data bss dec hex filename 19169 80 28 19277 4b4d fs/buffer.o After: christoph@linux-2.6$ size fs/buffer.o text data bss dec hex filename 19138 80 28 19246 4b2e fs/buffer.o V3->V4: - Move the use of this_cpu_inc_return into a later patch so that this one can go in without percpu infrastructure changes. Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Christoph Hellwig <hch@lst.de> Acked-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 16 12月, 2010 3 次提交
-
-
由 Ryusuke Konishi 提交于
On 2.6.37-rc1, garbage collection ioctl of nilfs was broken due to the commit 263d90ce ("nilfs2: remove own inode hash used for GC"), and leading to filesystem corruption. The patch doesn't queue gc-inodes for log writer if they are reused through the vfs inode cache. Here, gc-inode is the inode which buffers blocks to be relocated on GC. That patch queues gc-inodes in nilfs_init_gcinode() function, but this function is not called when they don't have I_NEW flag. Thus, some of live blocks are wrongly overrode without being moved to new logs. This resolves the problem by moving the gc-inode queueing to an outer function to ensure it's done right. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Tavis Ormandy 提交于
The install_special_mapping routine (used, for example, to setup the vdso) skips the security check before insert_vm_struct, allowing a local attacker to bypass the mmap_min_addr security restriction by limiting the available pages for special mappings. bprm_mm_init() also skips the check, and although I don't think this can be used to bypass any restrictions, I don't see any reason not to have the security check. $ uname -m x86_64 $ cat /proc/sys/vm/mmap_min_addr 65536 $ cat install_special_mapping.s section .bss resb BSS_SIZE section .text global _start _start: mov eax, __NR_pause int 0x80 $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o $ ./install_special_mapping & [1] 14303 $ cat /proc/14303/maps 0000f000-00010000 r-xp 00000000 00:00 0 [vdso] 00010000-00011000 r-xp 00001000 00:19 2453665 /home/taviso/install_special_mapping 00011000-ffffe000 rwxp 00000000 00:00 0 [stack] It's worth noting that Red Hat are shipping with mmap_min_addr set to 4096. Signed-off-by: NTavis Ormandy <taviso@google.com> Acked-by: NKees Cook <kees@ubuntu.com> Acked-by: NRobert Swiecki <swiecki@google.com> [ Changed to not drop the error code - akpm ] Reviewed-by: NJames Morris <jmorris@namei.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Paris 提交于
The fanotify_event_metadata now has a field which is supposed to indicate the length of the metadata portion of the event. Fill in that field as well. Based-in-part-on-patch-by: NAlexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: NEric Paris <eparis@redhat.com>
-
- 15 12月, 2010 2 次提交
-
-
由 Aaro Koskinen 提交于
There should be a check for the NUL character instead of '0'. Fortunately the only thing that cares about this is NFS serving, which is why we didn't notice this in the merge window testing. Reported-by: NPhil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: NAaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Jon Nelson has found a test case which causes postgresql to fail with the error: psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581 Under memory pressure, it looks like part of a file can end up getting replaced by zero's. Until we can figure out the cause, we'll roll back the change and use block_write_full_page() instead of ext4_bio_write_page(). The new, more efficient writing function can be used via the mount option mblk_io_submit, so we can test and fix the new page I/O code. To reproduce the problem, install postgres 8.4 or 9.0, and pin enough memory such that the system just at the end of triggering writeback before running the following sql script: begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 10000000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; If the temporary table is created on a hard drive partition which is encrypted using dm_crypt, then under memory pressure, approximately 30-40% of the time, pgsql will issue the above failure. This patch should fix this problem, and the problem will come back if the file system is mounted with the mblk_io_submit mount option. Reported-by: NJon Nelson <jnelson@jamponi.net> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 14 12月, 2010 3 次提交
-
-
由 Chris Mason 提交于
The extent allocator has code that allows us to fill allocations from any available block group, even if it doesn't match the raid level we've requested. This was put in because adding a new drive to a filesystem made with the default mkfs options actually upgrades the metadata from single spindle dup to full RAID1. But, the code also allows us to allocate from a raid0 chunk when we really want a raid1 or raid10 chunk. This can cause big trouble because mkfs creates a small (4MB) raid0 chunk for data and metadata which then goes unused for raid1/raid10 installs. The allocator will happily wander in and allocate from that chunk when things get tight, which is not correct. The fix here is to make sure that we provide duplication when the caller has asked for it. It does all the dups to be any raid level, which preserves the dup->raid1 upgrade abilities. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
When we mount in RAID degraded mode without adding a new device to replace the failed one, we can end up using the wrong RAID flags for allocations. This results in strange combinations of block groups (raid1 in a raid10 filesystem) and corruptions when we try to allocate blocks from single spindle chunks on drives that are actually missing. The first device has two small 4MB chunks in it that mkfs creates and these are usually unused in a raid1 or raid10 setup. But, in -o degraded, the allocator will fall back to these because the mask of desired raid groups isn't correct. The fix here is to count the missing devices as we build up the list of devices in the system. This count is used when picking the raid level to make sure we continue using the same levels that were in place before we lost a drive. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
If we just get a plain IO error when we read tree roots, the code wasn't properly sending that error up the chain. This allowed mounts to continue when they should failed, and allowed operations on partially setup root structs. The end result was usually oopsen on spinlocks that hadn't been spun up correctly. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 11 12月, 2010 8 次提交
-
-
由 Jan Beulich 提交于
... regarding an unused function when !MIGRATION, and regarding a printk() format string vs argument mismatch. Signed-off-by: NJan Beulich <jbeulich@novell.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
If we had reserved some bytes in struct btrfs_ioctl_vol_args, we wouldn't have to create a new structure for async snapshot creation. Here we convert async snapshot ioctl to use a more generic ABI, as we'll add more ioctls for snapshots/subvolumes in the future, readonly snapshots for example. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Xin Zhong 提交于
This problem is found in meego testing: http://bugs.meego.com/show_bug.cgi?id=6672 A file in btrfs is mmaped and the mmaped buffer is passed to pwrite to write to the same page of the same file. In btrfs_file_aio_write(), the pages is locked by prepare_pages(). So when btrfs_copy_from_user() is called, page fault happens and the same page needs to be locked again in filemap_fault(). The fix is to move iov_iter_fault_in_readable() before prepage_pages() to make page fault happen before pages are locked. And also disable page fault in critical region in btrfs_copy_from_user(). Reviewed-by: Yan, Zheng<zheng.z.yan@intel.com> Signed-off-by: NZhong, Xin <xin.zhong@intel.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
We should drop dentry before deactivating the superblock, otherwise we can hit this bug: BUG: Dentry f349a690{i=100,n=/} still in use (1) [unmount of btrfs loop1] ... Steps to reproduce the bug: # mount /dev/loop1 /mnt # mkdir save # btrfs subvolume snapshot /mnt save/snap1 # umount /mnt # mount -o subvol=save/snap1 /dev/loop1 /mnt (crash) Reported-by: NMichael Niederle <mniederle@gmx.at> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sage Weil 提交于
We were incorrectly taking the async path even for the sync ioctls by passing in &transid unconditionally. There's ample room for further cleanup here, but this keeps the fix simple. Signed-off-by: NSage Weil <sage@newdream.net> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
"start + num_bytes >= actual_end" can happen when compressed page writeback races with file truncation. In that case we need unlock and release pages past the end of file. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
Not being able to delete an orphan item isn't a horrible thing. The worst that happens is the next time around we try and do the orphan cleanup and we can't find the referenced object and just delete the item and move on. Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Chuck Lever 提交于
After a few unsuccessful NFS mount attempts in which the client and server cannot agree on an authentication flavor both support, the client panics. nfs_umount() is invoked in the kernel in this case. Turns out nfs_umount()'s UMNT RPC invocation causes the RPC client to write off the end of the rpc_clnt's iostat array. This is because the mount client's nrprocs field is initialized with the count of defined procedures (two: MNT and UMNT), rather than the size of the client's proc array (four). The fix is to use the same initialization technique used by most other upper layer clients in the kernel. Introduced by commit 0b524123, which failed to update nrprocs when support was added for UMNT in the kernel. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=24302 BugLink: http://bugs.launchpad.net/bugs/683938Reported-by: NStefan Bader <stefan.bader@canonical.com> Tested-by: NStefan Bader <stefan.bader@canonical.com> Cc: stable@kernel.org # >= 2.6.32 Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 10 12月, 2010 5 次提交
-
-
由 Christoph Hellwig 提交于
Now that we don't mark VFS inodes dirty anymore for internal timestamp changes, but rely on the transaction subsystem to push them out, we need to explicitly log the source inode in rename after updating it's timestamps to make sure the changes actually get forced out by sync/fsync or an AIL push. We already account for the fourth inode in the log reservation, as a rename of directories needs to update the nlink field, so just adding the xfs_trans_log_inode call is enough. This fixes the xfsqa 065 regression introduced by: "xfs: don't use vfs writeback for pure metadata modifications" Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Josef Bacik 提交于
If the orphan item doesn't exist, we return 1, which doesn't make any sense to the callers. Instead return -ENOENT if we didn't find the item. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
Since the fast caching uses normal tree locking, we can possibly deadlock if we get to the caching via a btrfs_search_slot() on the tree_root. So just check to see if the root we are on is the tree root, and just don't do the fast caching. Reported-by: NSage Weil <sage@newdream.net> Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
Currently if the space cache inode generation number doesn't match the generation number in the space cache header we will just fail to load the space cache, but we won't mark the space cache as an error, so we'll keep getting that error each time somebody tries to cache that block group until we actually clear the thing. Fix this by marking the space cache as having an error so we only get the message once. This patch also makes it so that we don't try and setup space cache for a block group that isn't cached, since we won't be able to write it out anyway. None of these problems are actual problems, they are just annoying and sub-optimal. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
This fixes a bug where we use dip after we have freed it. Instead just use the file_offset that was passed to the function. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 09 12月, 2010 2 次提交
-
-
由 Suresh Jayaraman 提交于
As the FIXME points out correctly, now filldir() itself returns -EOVERFLOW if it not possible to represent the inode number supplied by the filesystem in the field provided by userspace. Signed-off-by: NSuresh Jayaraman <sjayaraman@suse.de> Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Neil Brown 提交于
If vfs_getattr in fill_post_wcc returns an error, we don't set fh_post_change. For NFSv4, this can result in set_change_info triggering a BUG_ON. i.e. fh_post_saved being zero isn't really a bug. So: - instead of BUGging when fh_post_saved is zero, just clear ->atomic. - if vfs_getattr fails in fill_post_wcc, take a copy of i_ctime anyway. This will be used i seg_change_info, but not overly trusted. - While we are there, remove the pointless 'if' statements in set_change_info. There is no harm setting all the values. Signed-off-by: NNeilBrown <neilb@suse.de> Cc: stable@kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 08 12月, 2010 15 次提交
-
-
由 Trond Myklebust 提交于
When a nfs_page is freed, nfs_free_request is called which also calls nfs_clear_request to clean out the lock and open contexts and free the pagecache page. However, a couple of places in the nfs code call nfs_clear_request themselves. What happens here if the refcount on the request is still high? We'll be releasing contexts and freeing pointers while the request is possibly still in use. Remove those bare calls to nfs_clear_context. That should only be done when the request is being freed. Note that when doing this, we need to watch out for tests of req->wb_page. Previously, nfs_set_page_tag_locked() and nfs_clear_page_tag_locked() would check the value of req->wb_page to figure out if the page is mapped into the nfsi->nfs_page_tree. We now indicate the page is mapped using the new bit PG_MAPPED in req->wb_flags . Reported-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Mi Jinlong 提交于
When nfs client(kernel) don't support NFSv4, maybe user build kernel without NFSv4, there is a problem. Using command "mount SERVER-IP:/nfsv3 /mnt/" to mount NFSv3 filesystem, mount should should success, but fail and get error: "mount.nfs: an incorrect mount option was specified" System call mount "nfs"(not "nfs4") with "vers=4", if CONFIG_NFS_V4 is not defined, the "vers=4" will be parsed as invalid argument and kernel return EINVAL to nfs-utils. About that, we really want get EPROTONOSUPPORT rather than EINVAL. This path make sure kernel parses argument success, and return EPROTONOSUPPORT at nfs_validate_mount_data(). Signed-off-by: NMi Jinlong <mijinlong@cn.fujitsu.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Sergey Vlasov 提交于
The commit 129a84de (locks: fix F_GETLK regression (failure to find conflicts)) fixed the posix_test_lock() function by itself, however, its usage in NFS changed by the commit 9d6a8c5c (locks: give posix_test_lock same interface as ->lock) remained broken - subsequent NFS-specific locking code received F_UNLCK instead of the user-specified lock type. To fix the problem, fl->fl_type needs to be saved before the posix_test_lock() call and restored if no local conflicts were reported. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=23892Tested-by: NAlexander Morozov <amorozov@etersoft.ru> Signed-off-by: NSergey Vlasov <vsu@altlinux.ru> Cc: <stable@kernel.org> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Aneesh Kumar K.V 提交于
An update of mode bits can result in ACL value being changed. We need to mark the acl cache invalid when we update mode. Similarly we need to update file attribute when we change ACL value Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Lino Sanfilippo 提交于
We should not try to open a file descriptor for the overflow event since this will always fail. Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: NEric Paris <eparis@redhat.com>
-
由 Eric Paris 提交于
If fanotify_init is unable to allocate a new fsnotify group it will return but will not drop its reference on the associated user struct. Drop that reference on error. Reported-by: NVegard Nossum <vegard.nossum@gmail.com> Signed-off-by: NEric Paris <eparis@redhat.com>
-
由 Eric Paris 提交于
If inotify_init is unable to allocate a new file for the new inotify group we leak the new group. This patch drops the reference on the group on file allocation failure. Reported-by: NVegard Nossum <vegard.nossum@gmail.com> cc: stable@kernel.org Signed-off-by: NEric Paris <eparis@redhat.com>
-
由 Lino Sanfilippo 提交于
When fanotify_release() is called, there may still be processes waiting for access permission. Currently only processes for which an event has already been queued into the groups access list will be woken up. Processes for which no event has been queued will continue to sleep and thus cause a deadlock when fsnotify_put_group() is called. Furthermore there is a race allowing further processes to be waiting on the access wait queue after wake_up (if they arrive before clear_marks_by_group() is called). This patch corrects this by setting a flag to inform processes that the group is about to be destroyed and thus not to wait for access permission. [additional changelog from eparis] Lets think about the 4 relevant code paths from the PoV of the 'operator' 'listener' 'responder' and 'closer'. Where operator is the process doing an action (like open/read) which could require permission. Listener is the task (or in this case thread) slated with reading from the fanotify file descriptor. The 'responder' is the thread responsible for responding to access requests. 'Closer' is the thread attempting to close the fanotify file descriptor. The 'operator' is going to end up in: fanotify_handle_event() get_response_from_access() (THIS BLOCKS WAITING ON USERSPACE) The 'listener' interesting code path fanotify_read() copy_event_to_user() prepare_for_access_response() (THIS CREATES AN fanotify_response_event) The 'responder' code path: fanotify_write() process_access_response() (REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator') The 'closer': fanotify_release() (SUPPOSED TO CLEAN UP THE REST OF THIS MESS) What we have today is that in the closer we remove all of the fanotify_response_events and set a bit so no more response events are ever created in prepare_for_access_response(). The bug is that we never wake all of the operators up and tell them to move along. You fix that in fanotify_get_response_from_access(). You also fix other operators which haven't gotten there yet. So I agree that's a good fix. [/additional changelog from eparis] [remove additional changes to minimize patch size] [move initialization so it was inside CONFIG_FANOTIFY_PERMISSION] Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: NEric Paris <eparis@redhat.com>
-
由 Lino Sanfilippo 提交于
In mark_remove_from_mask() we destroy marks that have their event mask cleared. Thus we should not allow the creation of those marks in the first place. With this patch we check if the mask given from user is 0 in case of FAN_MARK_ADD. If so we return an error. Same for FAN_MARK_REMOVE since this does not have any effect. Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: NEric Paris <eparis@redhat.com>
-
由 Lino Sanfilippo 提交于
If adding a mount or inode mark failed fanotify_free_mark() is called explicitly. But at this time the mark has already been put into the destroy list of the fsnotify_mark kernel thread. If the thread is too slow it will try to decrease the reference of a mark, that has already been freed by fanotify_free_mark(). (If its fast enough it will only decrease the marks ref counter from 2 to 1 - note that the counter has been increased to 2 in add_mark() - which has practically no effect.) This patch fixes the ref counting by not calling free_mark() explicitly, but decreasing the ref counter and rely on the fsnotify_mark thread to cleanup in case adding the mark has failed. Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: NEric Paris <eparis@redhat.com>
-
由 Lino Sanfilippo 提交于
Unsetting FMODE_NONOTIFY in fsnotify_open() is too late, since fsnotify_perm() is called before. If FMODE_NONOTIFY is set fsnotify_perm() will skip permission checks, so a user can still disable permission checks by setting this flag in an open() call. This patch corrects this by unsetting the flag before fsnotify_perm is called. Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: NEric Paris <eparis@redhat.com>
-
由 Eric Paris 提交于
If no event was sent to userspace we cannot expect userspace to respond to permissions requests. Today such requests just hang forever. This patch will deny any permissions event which was unable to be sent to userspace. Reported-by: NTvrtko Ursulin <tvrtko.ursulin@sophos.com> Signed-off-by: NEric Paris <eparis@redhat.com>
-
由 Jeff Layton 提交于
It's possible that cifs_mount will call cifs_build_path_to_root on a newly instantiated cifs_sb. In that case, it's likely that the master_tlink pointer has not yet been instantiated. Fix this by having cifs_build_path_to_root take a cifsTconInfo pointer as well, and have the caller pass that in. Reported-and-Tested-by: NRobbert Kouprie <robbert@exx.nl> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Jeff Layton 提交于
This function will return 0 if everything went ok. Commit 9d002df4 however added a block of code after the following check for rc == -EREMOTE. With that change and when rc == 0, doing the "goto mount_fail_check" here skips that code, leaving the tlink_tree and master_tlink pointer unpopulated. That causes an oops later in cifs_root_iget. Reported-and-Tested-by: NRobbert Kouprie <robbert@exx.nl> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Trond Myklebust 提交于
No functional changes, but clarify the code. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-