- 19 11月, 2012 4 次提交
-
-
由 Eric W. Biederman 提交于
Track the number of pids in the proc hash table. When the number of pids goes to 0 schedule work to unmount the kernel mount of proc. Move the mount of proc into alloc_pid when we allocate the pid for init. Remove the surprising calls of pid_ns_release proc in fork and proc_flush_task. Those code paths really shouldn't know about proc namespace implementation details and people have demonstrated several times that finding and understanding those code paths is difficult and non-obvious. Because of the call path detach pid is alwasy called with the rtnl_lock held free_pid is not allowed to sleep, so the work to unmounting proc is moved to a work queue. This has the side benefit of not blocking the entire world waiting for the unnecessary rcu_barrier in deactivate_locked_super. In the process of making the code clear and obvious this fixes a bug reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns succeeded and copy_net_ns failed. Acked-by: N"Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
The expressions tsk->nsproxy->pid_ns and task_active_pid_ns aka ns_of_pid(task_pid(tsk)) should have the same number of cache line misses with the practical difference that ns_of_pid(task_pid(tsk)) is released later in a processes life. Furthermore by using task_active_pid_ns it becomes trivial to write an unshare implementation for the the pid namespace. So I have used task_active_pid_ns everywhere I can. In fork since the pid has not yet been attached to the process I use ns_of_pid, to achieve the same effect. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Now that we have s_fs_info pointing to our pid namespace the original reason for the proc root inode having a struct pid is gone. Caching a pid in the root inode has led to some complicated code. Now that we don't need the struct pid, just remove it. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
I had visions at one point of splitting proc into two filesystems. If that had happened proc/self being the the part of proc that actually deals with pids would have been a nice cleanup. As it is proc/self requires a lot of unnecessary infrastructure for a single file. The only user visible change is that a mounted /proc for a pid namespace that is dead now shows a broken proc symlink, instead of being completely invisible. I don't think anyone will notice or care. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 15 11月, 2012 2 次提交
-
-
由 Eric W. Biederman 提交于
Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data. The connection between between a fuse filesystem and a fuse daemon is established when a fuse filesystem is mounted and provided with a file descriptor the fuse daemon created by opening /dev/fuse. For now restrict the communication of uids and gids between the fuse filesystem and the fuse daemon to the initial user namespace. Enforce this by verifying the file descriptor passed to the mount of fuse was opened in the initial user namespace. Ensuring the mount happens in the initial user namespace is not necessary as mounts from non-initial user namespaces are not yet allowed. In fuse_req_init_context convert the currrent fsuid and fsgid into the initial user namespace for the request that will be sent to the fuse daemon. In fuse_fill_attr convert the uid and gid passed from the fuse daemon from the initial user namespace into kuids and kgids. In iattr_to_fattr called from fuse_setattr convert kuids and kgids into the uids and gids in the initial user namespace before passing them to the fuse filesystem. In fuse_change_attributes_common called from fuse_dentry_revalidate, fuse_permission, fuse_geattr, and fuse_setattr, and fuse_iget convert the uid and gid from the fuse daemon into a kuid and a kgid to store on the fuse inode. By default fuse mounts are restricted to task whose uid, suid, and euid matches the fuse user_id and whose gid, sgid, and egid matches the fuse group id. Convert the user_id and group_id mount options into kuids and kgids at mount time, and use uid_eq and gid_eq to compare the in fuse_allow_task. Cc: Miklos Szeredi <miklos@szeredi.hu> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Use kuid_t and kgid_t in struct autofs_info and struct autofs_wait_queue. When creating directories and symlinks default the uid and gid of the mount requester to the global root uid and gid. autofs4_wait will update these fields when a mount is requested. When generating autofsv5 packets report the uid and gid of the mount requestor in user namespace of the process that opened the pipe, reporting unmapped uids and gids as overflowuid and overflowgid. In autofs_dev_ioctl_requester return the uid and gid of the last mount requester converted into the calling processes user namespace. When the uid or gid don't map return overflowuid and overflowgid as appropriate, allowing failure to find a mount requester to be distinguished from failure to map a mount requester. The uid and gid mount options specifying the user and group of the root autofs inode are converted into kuid and kgid as they are parsed defaulting to the current uid and current gid of the process that mounts autofs. Mounting of autofs for the present remains confined to processes in the initial user namespace. Cc: Ian Kent <raven@themaw.net> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 29 10月, 2012 1 次提交
-
-
由 Mikulas Patocka 提交于
Functions generic_file_splice_read and generic_file_splice_write access the pagecache directly. For block devices these functions must be locked so that block size is not changed while they are in progress. This patch is an additional fix for commit b87570f5 ("Fix a crash when block device is read and block size is changed at the same time") that locked aio_read, aio_write and mmap against block size change. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 10月, 2012 1 次提交
-
-
由 Linus Torvalds 提交于
In commit 800179c9 ("This adds symlink and hardlink restrictions to the Linux VFS"), the new link protections were enabled by default, in the hope that no actual application would care, despite it being technically against legacy UNIX (and documented POSIX) behavior. However, it does turn out to break some applications. It's rare, and it's unfortunate, but it's unacceptable to break existing systems, so we'll have to default to legacy behavior. In particular, it has broken the way AFD distributes files, see http://www.dwd.de/AFD/ along with some legacy scripts. Distributions can end up setting this at initrd time or in system scripts: if you have security problems due to link attacks during your early boot sequence, you have bigger problems than some kernel sysctl setting. Do: echo 1 > /proc/sys/fs/protected_symlinks echo 1 > /proc/sys/fs/protected_hardlinks to re-enable the link protections. Alternatively, we may at some point introduce a kernel config option that sets these kinds of "more secure but not traditional" behavioural options automatically. Reported-by: NNick Bowler <nbowler@elliptictech.com> Reported-by: NHolger Kiehl <Holger.Kiehl@dwd.de> Cc: Kees Cook <keescook@chromium.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org # v3.6 Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 10月, 2012 13 次提交
-
-
由 Kees Cook 提交于
The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check while converting ioctl arguments. This could lead to leaking kernel stack contents into userspace. Patch extracted from existing fix in grsecurity. Signed-off-by: NKees Cook <keescook@chromium.org> Cc: David Miller <davem@davemloft.net> Cc: Brad Spengler <spender@grsecurity.net> Cc: PaX Team <pageexec@freemail.hu> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
flush_old_exec() clears PF_KTHREAD but forgets about PF_NOFREEZE. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NTejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Josef Bacik 提交于
We BUG if we fail to commit the transaction when creating a snapshot, which is just obnoxious. Remove the BUG_ON(). Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
After cloning root's node, we forgot to dec the src's ref which can lead to a memory leak. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
On a really full file system I was getting ENOSPC back from btrfs_update_inode when trying to update the parent inode when creating a snapshot. Just use the fallback method so we can update the inode and not have to worry about having a delayed ref. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Alex Lyakas 提交于
This patch also requires a change in the user-space part of "receive". We need to use "lchown" instead of "chown". We will do this in the following patch. Signed-off-by: NAlex Lyakas <alex.btrfs@zadarastorage.com> if (S_ISREG(sctx->cur_inode_mode)) {
-
由 Miao Xie 提交于
Steps to reproduce: # mkfs.btrfs -m raid1 <disk1> <disk2> # btrfstune -S 1 <disk1> # mount <disk1> <mnt> # btrfs device add <disk3> <disk4> <mnt> # mount -o remount,rw <mnt> # dd if=/dev/zero of=<mnt>/tmpfile bs=1M count=1 Deadlock happened. It is because of the nested chunk allocation. When we wrote the data into the filesystem, we would allocate the data chunk because there was no data chunk in the filesystem. At the end of the data chunk allocation, we should insert the metadata of the data chunk into the extent tree, but there was no raid1 chunk, so we tried to lock the chunk allocation mutex to allocate the new chunk, but we had held the mutex, the deadlock happened. By rights, we would allocate the raid1 chunk when we added the second device because the profile of the seed filesystem is raid1 and we had two devices. But we didn't do that in fact. It is because the last step of the first device insertion didn't commit the transaction. So when we added the second device, we didn't cow the tree, and just inserted the relative metadata into the leaves which were generated by the first device insertion, and its profile was dup. So, I fix this problem by commiting the transaction at the end of the first device insertion. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
-
由 Lukas Czerner 提交于
Currently if len argument in btrfs_ioctl_fitrim() is smaller than one FSB we will continue and finally return 0 bytes discarded. However if the length to discard is smaller then file system block we should really return EINVAL. Signed-off-by: NLukas Czerner <lczerner@redhat.com>
-
由 Tsutomu Itoh 提交于
We should free quota_root before returning from the error handling code. Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
-
由 Arne Jansen 提交于
When sending a device file, the stream was missing the mode. Also the rdev was encoded wrongly. Signed-off-by: NArne Jansen <sensille@gmx.net>
-
由 Jan Schmidt 提交于
This adds support for the new extended inode refs to btrfs send. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
-
由 Stefan Behrens 提交于
gcc says "warning: comparison of unsigned expression >= 0 is always true" because i is an unsigned long. And gcc is right this time. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
-
由 Gabriel de Perthuis 提交于
To see the problem, create many hardlinks to the same file (120 should do it), then look up paths by inode with: ls -i btrfs inspect inode-resolve -v $ino /mnt/btrfs I noticed the memory layout of the fspath->val data had some irregularities (some unnecessary gaps that stop appearing about halfway), so I'm not sure there aren't any bugs left in it.
-
- 25 10月, 2012 1 次提交
-
-
由 Geert Uytterhoeven 提交于
The warning check for duplicate sysfs entries can cause a buffer overflow when printing the warning, as strcat() doesn't check buffer sizes. Use strlcat() instead. Since strlcat() doesn't return a pointer to the passed buffer, unlike strcat(), I had to convert the nested concatenation in sysfs_add_one() to an admittedly more obscure comma operator construct, to avoid emitting code for the concatenation if CONFIG_BUG is disabled. Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Cc: stable@vger.kernel.org Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 24 10月, 2012 6 次提交
-
-
由 Trond Myklebust 提交于
The current code is clearing it in all cases _except_ when zero. Reported-by: NStanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
-
由 Trond Myklebust 提交于
Commit e9406db2 (lockd: per-net NSM client creation and destruction helpers introduced) contains a nasty race on initialisation of the per-net NSM client because it doesn't check whether or not the client is set after grabbing the nsm_create_mutex. Reported-by: NNix <nix@esperi.org.uk> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
-
由 Jan Schmidt 提交于
Emphasis the way tree_mod_log_insert_move avoids adding MOD_LOG_KEY_REMOVE_WHILE_MOVING operations, depending on the direction of the move operation. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
-
由 Jan Schmidt 提交于
In get_old_root we grab a lock on the extent buffer before we obtain a reference on that buffer. That order is changed now. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
-
由 Jan Schmidt 提交于
In btrfs_find_all_roots' termination condition, we compare the level of the old buffer we got from btrfs_search_old_slot to the level of the current root node. We'd better compare it to the level of the rewinded root node. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
-
由 Jan Schmidt 提交于
Tree mod log treated old root buffers as always empty buffers when starting the rewind operations. However, the old root may still be part of the current tree at a lower level, with still some valid entries. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
-
- 23 10月, 2012 3 次提交
-
-
由 Jan Schmidt 提交于
Avoid the implicit free by tree_mod_log_set_root_pointer, which is wrong in two places. Where needed, we call tree_mod_log_free_eb explicitly now. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
-
由 Jan Schmidt 提交于
Independant of the check (push_items < src_items) tree_mod_log_eb_copy did log the removal of the old data entries from the source buffer. Therefore, we must not call tree_mod_log_eb_move if the check evaluates to true, as that would log the removal twice, finally resulting in (rewinded) buffers with wrong values for header_nritems. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
-
由 Lukas Czerner 提交于
Currently if len argument in ext4_trim_fs() is smaller than one block, the 'end' variable underflow. Avoid that by returning EINVAL if len is smaller than file system block. Also remove useless unlikely(). Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 22 10月, 2012 2 次提交
-
-
由 Dmitry Torokhov 提交于
In certain cases (for example when a cdev structure is embedded into another object whose lifetime is controlled by a separate kobject) it is beneficial to tie lifetime of another object to the lifetime of character device so that related object is not freed until after char_dev object is freed. To achieve this let's pin kobject's parent when doing cdev_add() and unpin when last reference to cdev structure is being released. Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tao Ma 提交于
In mke2fs, we only checksum the whole bitmap block and it is right. While in the kernel, we use EXT4_BLOCKS_PER_GROUP to indicate the size of the checksumed bitmap which is wrong when we enable bigalloc. The right size should be EXT4_CLUSTERS_PER_GROUP and this patch fixes it. Also as every caller of ext4_block_bitmap_csum_set and ext4_block_bitmap_csum_verify pass in EXT4_BLOCKS_PER_GROUP(sb)/8, we'd better removes this parameter and sets it in the function itself. Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NLukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org
-
- 20 10月, 2012 1 次提交
-
-
由 KAMEZAWA Hiroyuki 提交于
/proc/<pid>/numa_maps scans vma and show mempolicy under mmap_sem. It sometimes accesses task->mempolicy which can be freed without mmap_sem and numa_maps can show some garbage while scanning. This patch tries to take reference count of task->mempolicy at reading numa_maps before calling get_vma_policy(). By this, task->mempolicy will not be freed until numa_maps reaches its end. V2->v3 - updated comments to be more verbose. - removed task_lock() in numa_maps code. V1->V2 - access task->mempolicy only once and remember it. Becase kernel/exit.c can overwrite it. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 10月, 2012 1 次提交
-
-
由 David Rientjes 提交于
Commit 38f38657 ("xattr: extract simple_xattr code from tmpfs") moved some code from tmpfs but introduced a subtle bug along the way. If the name passed to simple_xattr_remove() does not exist in the list of xattrs, then it is possible to call kfree(new_xattr) when new_xattr is actually initialized to itself on the stack via uninitialized_var(). This causes a BUG() since the memory was not allocated via the slab allocator and was not bypassed through to the page allocator because it was too large. Initialize the local variable to NULL so the kfree() never takes place. Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Acked-by: NHugh Dickins <hughd@google.com> Acked-by: NAristeu Rozanski <aris@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 10月, 2012 5 次提交
-
-
由 Lukas Czerner 提交于
Currently when 'range->start' is beyond the end of file system nothing is done and that fact is ignored, where in fact we should return EINVAL. The same problem is when 'range.len' is smaller than file system block. Fix this by adding check for such conditions and return EINVAL appropriately. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Acked-by: NTino Reichardt <milky-kernel@mcmilk.de> Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
-
由 Trond Myklebust 提交于
If the filehandle is stale, or open access is denied for some reason, nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh or nlm4_failed. These get passed right through nlm_lookup_file(), and so when nlmsvc_retrieve_args() calls the latter, it needs to filter the result through the cast_status() machinery. Failure to do so, will trigger the BUG_ON() in encode_nlm_stat... Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Reported-by: NLarry McVoy <lm@bitmover.com> Cc: stable@kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 David Rientjes 提交于
When reading /proc/pid/numa_maps, it's possible to return the contents of the stack where the mempolicy string should be printed if the policy gets freed from beneath us. This happens because mpol_to_str() may return an error the stack-allocated buffer is then printed without ever being stored. There are two possible error conditions in mpol_to_str(): - if the buffer allocated is insufficient for the string to be stored, and - if the mempolicy has an invalid mode. The first error condition is not triggered in any of the callers to mpol_to_str(): at least 50 bytes is always allocated on the stack and this is sufficient for the string to be written. A future patch should convert this into BUILD_BUG_ON() since we know the maximum strlen possible, but that's not -rc material. The second error condition is possible if a race occurs in dropping a reference to a task's mempolicy causing it to be freed during the read(). The slab poison value is then used for the mode and mpol_to_str() returns -EINVAL. This race is only possible because get_vma_policy() believes that mm->mmap_sem protects task->mempolicy, which isn't true. The exit path does not hold mm->mmap_sem when dropping the reference or setting task->mempolicy to NULL: it uses task_lock(task) instead. Thus, it's required for the caller of a task mempolicy to hold task_lock(task) while grabbing the mempolicy and reading it. Callers with a vma policy store their mempolicy earlier and can simply increment the reference count so it's guaranteed not to be freed. Reported-by: NDave Jones <davej@redhat.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Al Viro 提交于
replace_fd() began with "eats a reference, tries to insert into descriptor table" semantics; at some point I'd switched it to much saner current behaviour ("try to insert into descriptor table, grabbing a new reference if inserted; caller should do fput() in any case"), but forgot to update the callers. Mea culpa... [Spotted by Pavel Roskin, who has really weird system with pipe-fed coredumps as part of what he considers a normal boot ;-)] Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Trond Myklebust 提交于
returning PTR_ERR(cb_info->task) just after we have set it to NULL looks like a typo... Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
-