1. 26 Mar, 2011 1 commit
  2. 25 Mar, 2011 24 commits
  3. 24 Mar, 2011 15 commits
    • UBIFS: fix assertion warning and refine comments · 6ed09c34
      Artem Bityutskiy committed
      This patch fixes the following UBIFS assertion warning:
      
      UBIFS assert failed in do_readpage at 115 (pid 199)
      [<b00321b8>] (unwind_backtrace+0x0/0xdc) from [<af025118>]
      (do_readpage+0x108/0x594 [ubifs])
      [<af025118>] (do_readpage+0x108/0x594 [ubifs]) from [<af025764>]
      (ubifs_write_end+0x1c0/0x2e8 [ubifs])
      [<af025764>] (ubifs_write_end+0x1c0/0x2e8 [ubifs]) from
      [<b00a0164>] (generic_file_buffered_write+0x18c/0x270)
      [<b00a0164>] (generic_file_buffered_write+0x18c/0x270) from
      [<b00a08d4>] (__generic_file_aio_write+0x478/0x4c0)
      [<b00a08d4>] (__generic_file_aio_write+0x478/0x4c0) from
      [<b00a0984>] (generic_file_aio_write+0x68/0xc8)
      [<b00a0984>] (generic_file_aio_write+0x68/0xc8) from
      [<af024a78>] (ubifs_aio_write+0x178/0x1d8 [ubifs])
      [<af024a78>] (ubifs_aio_write+0x178/0x1d8 [ubifs]) from
      [<b00d104c>] (do_sync_write+0xb0/0x100)
      [<b00d104c>] (do_sync_write+0xb0/0x100) from [<b00d1abc>]
      (vfs_write+0xac/0x154)
      [<b00d1abc>] (vfs_write+0xac/0x154) from [<b00d1c10>]
      (sys_write+0x3c/0x68)
      [<b00d1c10>] (sys_write+0x3c/0x68) from [<b002d9a0>]
      (ret_fast_syscall+0x0/0x2c)
      
      The 'PG_checked' flag is used to indicate that the page supposedly
      does not exist on the media (e.g., a hole or a page beyond the
      inode size), so it requires a slightly bigger budget, because we
      have to account for the indexing size increase. In other words,
      this flag says that the budget for this page has to be the "new
      page budget", which is slightly bigger than the "existing page
      budget".
      
      The 'do_readpage()' function has the following assertion, which
      is sometimes hit: 'ubifs_assert(!PageChecked(page))'. Obviously,
      the meaning of this assertion is: "I should not be asked to read
      a page which does not exist on the media".
      
      However, in 'ubifs_write_begin()' we have a small "trick". Note
      that VFS may write pages which have not been read yet, so the page
      data have not been loaded from the media to the page cache yet. If
      VFS tells us that it is going to change only some part of the page,
      we obviously have to load it from the media. However, if VFS tells
      us that it is going to change the whole page, we do not read it
      from the media, as an optimization.

      However, since we do not read it, we do not know if it exists on
      the media or not (a hole, etc). So we set the 'PG_checked' flag
      on this page to force the bigger budget, just in case.
      
      So 'ubifs_write_begin()' sets 'PG_checked'. Then we are in
      'ubifs_write_end()'. And VFS tells us: "hey, for some reason I
      changed my mind and did not change the whole page". Frankly, I do
      not know why this happens, but I hit this somehow on an ARM
      platform. And this is extremely rare.
      
      So in this case UBIFS does the following:
      
      1. Cancels allocated budget.
      2. Loads the page from the media by calling 'do_readpage()'.
      3. Asks VFS to repeat the whole write operation from the very
         beginning (call '->write_begin()' again, etc).
      
      And the assertion warning is hit at step 2: remember we have
      'PG_checked' set for this page, and 'do_readpage()' does not
      like this. So this patch fixes the problem by adding step 1.5,
      which clears 'PG_checked' before calling 'do_readpage()'.
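
The sequence above can be modeled in user space. The following is a minimal sketch, not the UBIFS code: `struct page_model`, `model_do_readpage()` and `model_write_end_short_copy()` are hypothetical stand-ins, and budgeting and the VFS retry are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the page state that matters here. */
struct page_model {
	bool checked;   /* models PG_checked: budgeted as a "new page" */
	bool uptodate;  /* models page contents loaded from the media */
};

/*
 * Models do_readpage(): returns false where the real function would
 * trip ubifs_assert(!PageChecked(page)).
 */
static bool model_do_readpage(struct page_model *page)
{
	if (page->checked)
		return false;
	page->uptodate = true;
	return true;
}

/*
 * Models the rare path in ubifs_write_end() where VFS copied less than
 * a whole page. With apply_fix set, step 1.5 clears the flag first.
 */
static bool model_write_end_short_copy(struct page_model *page, bool apply_fix)
{
	/* Step 1: cancel the budget (not modeled). */
	if (apply_fix)
		page->checked = false;      /* Step 1.5 */
	return model_do_readpage(page);     /* Step 2 */
	/* Step 3: the caller asks VFS to retry the write (not modeled). */
}
```

In this model, a page that still has `checked` set fails the read without step 1.5 and passes it with step 1.5.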
      
      All in all, this patch does not fix any functionality issue, but it
      silences a false-positive UBIFS warning which may happen in very
      rare cases.

      While at it, this patch also improves the comment which explains
      the reasons for setting the 'PG_checked' flag for the page. The old
      comment was a bit difficult to understand.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
    • UBIFS: kill CONFIG_UBIFS_FS_DEBUG_CHKS · 9d523caf
      Artem Bityutskiy committed
      Simplify the UBIFS configuration menu and kill the option to enable
      self-checks at compile time. We do not really need it because we can
      enable them at run time using the module parameters or the
      corresponding sysfs interfaces. And there is value in simplifying the
      kernel configuration menu, which is becoming increasingly large.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
    • UBIFS: use GFP_NOFS properly · fc5e58c0
      Artem Bityutskiy committed
      This patch fixes a brown-paperbag bug which was introduced by me:
      I used incorrect "GFP_KERNEL | GFP_NOFS" allocation flags to make
      sure my allocations do not cause write-back. But the correct form
      is "GFP_NOFS".
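
Why the OR-ed form is wrong can be shown with a small bitmask model. This is a sketch under stated assumptions: the `MODEL_*` names and numeric values are illustrative stand-ins (the real definitions live in the kernel's gfp.h), and `may_enter_fs()` is a hypothetical helper. The only property relied on is that GFP_KERNEL is GFP_NOFS plus the "may call into the file system" bit.

```c
/* Illustrative bit values, not copied from the kernel headers. */
#define MODEL_GFP_WAIT 0x10u  /* allocation may sleep */
#define MODEL_GFP_IO   0x40u  /* allocation may start physical I/O */
#define MODEL_GFP_FS   0x80u  /* allocation may call into the file system */

#define MODEL_GFP_NOFS   (MODEL_GFP_WAIT | MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

/* Returns nonzero if an allocation with these flags may re-enter the FS. */
static int may_enter_fs(unsigned int flags)
{
	return (flags & MODEL_GFP_FS) != 0;
}
```

Because the flags are OR-ed together, adding GFP_NOFS to GFP_KERNEL cannot remove the file-system bit: in this model the combination equals plain GFP_KERNEL.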
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
    • userns: rename is_owner_or_cap to inode_owner_or_capable · 2e149670
      Serge E. Hallyn committed
      And give it a kernel-doc comment.
      
      [akpm@linux-foundation.org: btrfs changed in linux-next]
      Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userns: userns: check user namespace for task->file uid equivalence checks · e795b717
      Serge E. Hallyn committed
      Cheat for now and say all files belong to init_user_ns.  Next step will be
      to let superblocks belong to a user_ns, and derive inode_userns(inode)
      from inode->i_sb->s_user_ns.  Finally we'll introduce more flexible
      arrangements.
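
A user-space sketch of what such a namespace-aware ownership check could look like. All types and names here (`user_ns_model`, `task_model`, `inode_model`, `owner_or_capable_model()`) are hypothetical, and the capability test is reduced to a boolean. It only illustrates the idea of comparing uids within one namespace, not the kernel implementation.

```c
#include <stdbool.h>

struct user_ns_model { int id; };

struct task_model {
	const struct user_ns_model *user_ns;
	unsigned int fsuid;
	bool cap_fowner;  /* models holding CAP_FOWNER over the inode's namespace */
};

struct inode_model { unsigned int i_uid; };

static const struct user_ns_model init_user_ns_model = { 0 };

/* "Cheat for now": every inode is treated as owned by the initial namespace. */
static const struct user_ns_model *
inode_userns_model(const struct inode_model *inode)
{
	(void)inode;
	return &init_user_ns_model;
}

/* Equal uids only count when the task lives in the inode's namespace. */
static bool owner_or_capable_model(const struct task_model *task,
				   const struct inode_model *inode)
{
	const struct user_ns_model *ns = inode_userns_model(inode);

	if (task->user_ns == ns && task->fsuid == inode->i_uid)
		return true;
	return task->cap_fowner;
}
```

In this model, a task in a child namespace whose uid happens to be numerically equal to the file's owner is not treated as the owner.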
      
      Changelog:
      	Feb 15: make is_owner_or_cap take const struct inode
      	Feb 23: make is_owner_or_cap bool
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • procfs: kill the global proc_mnt variable · 52e9fc76
      Oleg Nesterov committed
      After the previous cleanup in proc_get_sb() the global proc_mnt has no
      reason to exist, so kill it.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Serge E. Hallyn <serge@hallyn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pidns: call pid_ns_prepare_proc() from create_pid_namespace() · 4308eebb
      Eric W. Biederman committed
      Reorganize proc_get_sb() so it can be called before the struct pid of the
      first process is allocated.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Serge E. Hallyn <serge@hallyn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sysctl: add some missing input constraint checks · cb16e95f
      Petr Holasek committed
      Add boundaries of allowed input ranges for: dirty_expire_centisecs,
      drop_caches, overcommit_memory, page-cluster and panic_on_oom.
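
The kind of boundary check described here can be sketched in user space as follows. `store_int_minmax()` is a hypothetical helper, not a kernel function, and any range passed to it is chosen by the caller. The commit message does not state the actual limits, so none are assumed here.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parses a decimal integer and stores it only if it lies in [min, max].
 * Returns 0 on success and -EINVAL otherwise, leaving *value untouched.
 */
static int store_int_minmax(const char *input, int min, int max, int *value)
{
	char *end;
	long parsed;

	errno = 0;
	parsed = strtol(input, &end, 10);
	if (errno != 0 || end == input || *end != '\0')
		return -EINVAL;         /* not a clean decimal number */
	if (parsed < min || parsed > max)
		return -EINVAL;         /* outside the allowed range */
	*value = (int)parsed;
	return 0;
}
```

The point of the design is that a rejected write leaves the previous setting in place instead of storing a value the rest of the code does not expect.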
      Signed-off-by: Petr Holasek <pholasek@redhat.com>
      Acked-by: Dave Young <hidave.darkstar@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: protect mm start_code/end_code in /proc/pid/stat · 5883f57c
      Kees Cook committed
      While mm->start_stack was protected from cross-uid viewing (commit
      f83ce3e6 ("proc: avoid information leaks to non-privileged
      processes")), the start_code and end_code values were not.  This would
      allow the text location of a PIE binary to leak, defeating ASLR.
      
      Note that the value "1" is used instead of "0" for a protected value since
      "ps", "killall", and likely other readers of /proc/pid/stat, take
      start_code of "0" to mean a kernel thread and will misbehave.  Thanks to
      Brad Spengler for pointing this out.
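
The choice of placeholder can be sketched as a small pure function. `reported_code_addr()` and its parameters are hypothetical names used only for illustration. The sketch just encodes the rule stated above: 0 stays reserved for kernel threads, and 1 marks a hidden value.

```c
/*
 * Value to report for start_code or end_code.
 * has_mm:    zero for a kernel thread (no user address space)
 * permitted: nonzero if the reader may see the task's addresses
 */
static unsigned long reported_code_addr(int has_mm, int permitted,
					unsigned long real_addr)
{
	if (!has_mm)
		return 0;   /* readers take 0 to mean "kernel thread" */
	/* 1 hides the address without looking like a kernel thread. */
	return permitted ? real_addr : 1;
}
```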
      
      Addresses CVE-2011-0726
      Signed-off-by: Kees Cook <kees.cook@canonical.com>
      Cc: <stable@kernel.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Eugene Teo <eugeneteo@kernel.sg>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: make struct proc_dir_entry::namelen unsigned int · 312ec7e5
      Alexey Dobriyan committed
      1. namelen is declared "unsigned short" which hints for "maybe space savings".
         Indeed in 2.4 struct proc_dir_entry looked like:
      
              struct proc_dir_entry {
                      unsigned short low_ino;
                      unsigned short namelen;
      
         Now low_ino is "unsigned int", so all the savings have been gone for
         a long time. There are not so many instances of "struct
         proc_dir_entry" that its size is worth worrying about, anyway.
      
      2. converting from unsigned short to int/unsigned int can only create
         problems, we better play it safe.
      
      Space is not really conserved, because of natural alignment for the next
      field.  sizeof(struct proc_dir_entry) remains the same.
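
The alignment argument can be checked with two cut-down layouts. This is a sketch, assuming that `namelen` sits between an `unsigned int` and a pointer, and an ABI where pointers are aligned to at least 4 bytes (for example x86-64). `pde_short`, `pde_int` and `same_size()` are hypothetical names, and the real structure has many more fields.

```c
#include <stddef.h>

/* Cut-down layout with the old 16-bit length field. */
struct pde_short {
	unsigned int low_ino;
	unsigned short namelen;  /* followed by padding up to the pointer */
	const char *name;
};

/* Same layout with the length widened to unsigned int. */
struct pde_int {
	unsigned int low_ino;
	unsigned int namelen;    /* uses the bytes that were padding before */
	const char *name;
};

/* Returns 1 when both layouts occupy the same number of bytes. */
static int same_size(void)
{
	return sizeof(struct pde_short) == sizeof(struct pde_int);
}
```

On such an ABI the wider field only takes over bytes that the compiler was already inserting as padding, so the structure does not grow.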
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • procfs: fix some wrong error code usage · fc3d8767
      Jovi Zhang committed
      [root@wei 1]# cat /proc/1/mem
      cat: /proc/1/mem: No such process
      
      The error code -ESRCH is wrong in this situation.  Return -EPERM instead.
      Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • procfs: fix /proc/<pid>/maps heap check · 0db0c01b
      Aaro Koskinen committed
      The current code fails to print the "[heap]" marking if the heap is split
      into multiple mappings.
      
      Fix the check so that the marking is displayed in all possible cases:
      	1. vma matches exactly the heap
      	2. the heap vma is merged e.g. with bss
      	3. the heap vma is split e.g. due to locked pages
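
The difference between the three cases can be sketched with plain address ranges. This is an illustration, not the kernel code. `struct range`, `covers_heap()` and `intersects_heap()` are hypothetical, and the sketch assumes that the failing check required a single mapping to contain the whole heap, while a check that handles all three cases only needs the mapping to overlap it.

```c
#include <stdbool.h>

/* Half-open address range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/* Containment check: the mapping must cover the whole heap. */
static bool covers_heap(struct range vma, struct range heap)
{
	return vma.start <= heap.start && vma.end >= heap.end;
}

/* Overlap check: the mapping shares at least one byte with the heap. */
static bool intersects_heap(struct range vma, struct range heap)
{
	return vma.start < heap.end && vma.end > heap.start;
}
```

With a heap spanning two adjacent mappings, neither mapping contains the whole heap, so only the overlap check marks both of them.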
      
      Test cases. In all cases, the process should have mapping(s) with
      [heap] marking:
      
      	(1) vma matches exactly the heap
      
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/types.h>
      
      	int main (void)
      	{
      		if (sbrk(4096) != (void *)-1) {
      			printf("check /proc/%d/maps\n", (int)getpid());
      			while (1)
      				sleep(1);
      		}
      		return 0;
      	}
      
      	# ./test1
      	check /proc/553/maps
      	[1] + Stopped                    ./test1
      	# cat /proc/553/maps | head -4
      	00008000-00009000 r-xp 00000000 01:00 3113640    /test1
      	00010000-00011000 rw-p 00000000 01:00 3113640    /test1
      	00011000-00012000 rw-p 00000000 00:00 0          [heap]
      	4006f000-40070000 rw-p 00000000 00:00 0
      
      	(2) the heap vma is merged
      
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/types.h>
      
      	char foo[4096] = "foo";
      	char bar[4096];
      
      	int main (void)
      	{
      		if (sbrk(4096) != (void *)-1) {
      			printf("check /proc/%d/maps\n", (int)getpid());
      			while (1)
      				sleep(1);
      		}
      		return 0;
      	}
      
      	# ./test2
      	check /proc/556/maps
      	[2] + Stopped                    ./test2
      	# cat /proc/556/maps | head -4
      	00008000-00009000 r-xp 00000000 01:00 3116312    /test2
      	00010000-00012000 rw-p 00000000 01:00 3116312    /test2
      	00012000-00014000 rw-p 00000000 00:00 0          [heap]
      	4004a000-4004b000 rw-p 00000000 00:00 0
      
      	(3) the heap vma is split (this fails without the patch)
      
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/mman.h>
      	#include <sys/types.h>
      
      	int main (void)
      	{
      		if ((sbrk(4096) != (void *)-1) && !mlockall(MCL_FUTURE) &&
      		    (sbrk(4096) != (void *)-1)) {
      			printf("check /proc/%d/maps\n", (int)getpid());
      			while (1)
      				sleep(1);
      		}
      		return 0;
      	}
      
      	# ./test3
      	check /proc/559/maps
      	[1] + Stopped                    ./test3
      	# cat /proc/559/maps|head -4
      	00008000-00009000 r-xp 00000000 01:00 3119108    /test3
      	00010000-00011000 rw-p 00000000 01:00 3119108    /test3
      	00011000-00012000 rw-p 00000000 00:00 0          [heap]
      	00012000-00013000 rw-p 00000000 00:00 0          [heap]
      
      It looks like the bug has been there forever, and since it only results in
      some information missing from a procfile, it does not fulfil the -stable
      "critical issue" criteria.
      Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: hide kernel addresses via %pK in /proc/<pid>/stack · 51e03149
      Konstantin Khlebnikov committed
      This file is readable by the task owner.  Hide kernel addresses from
      unprivileged users, but leave them the function names and offsets.
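
The intended effect can be sketched in user space. `format_stack_entry()` is a hypothetical helper, and the output layout and symbol text are illustrative assumptions. The sketch only shows the idea that the address is zeroed for an unprivileged reader while the symbol text is kept.

```c
#include <stdio.h>

/*
 * Formats one stack entry as "[<address>] symbol+offset".
 * For an unprivileged reader the address is replaced by zeros,
 * while the symbol text is kept unchanged.
 */
static int format_stack_entry(char *buf, size_t size, unsigned long addr,
			      const char *symbol, int privileged)
{
	return snprintf(buf, size, "[<%016lx>] %s",
			privileged ? addr : 0UL, symbol);
}
```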
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: Kees Cook <kees.cook@canonical.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bitops: remove minix bitops from asm/bitops.h · 61f2e7b0
      Akinobu Mita committed
      The minix bit operations are only used by the minix filesystem and are
      useless to other modules, because the byte order of the inode and block
      bitmaps differs between architectures, as shown below:
      
      m68k:
      	big-endian 16bit indexed bitmaps
      
      h8300, microblaze, s390, sparc, m68knommu:
      	big-endian 32 or 64bit indexed bitmaps
      
      m32r, mips, sh, xtensa:
      	big-endian 32 or 64bit indexed bitmaps for big-endian mode
      	little-endian bitmaps for little-endian mode
      
      Others:
      	little-endian bitmaps
      
      In order to move minix bit operations from asm/bitops.h to architecture
      independent code in minix filesystem, this provides two config options.
      
      CONFIG_MINIX_FS_BIG_ENDIAN_16BIT_INDEXED is only selected by m68k.
      CONFIG_MINIX_FS_NATIVE_ENDIAN is selected by the architectures which use
      native byte order bitmaps (h8300, microblaze, s390, sparc, m68knommu,
      m32r, mips, sh, xtensa).  The architectures which always use little-endian
      bitmaps do not select these options.
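
How the same bit number maps to different bytes under two of these conventions can be sketched with byte-wise helpers. `test_bit_le()` and `test_bit_be16()` are hypothetical user-space functions written for this illustration, not the minix or kernel bit operations.

```c
#include <stddef.h>

/* Little-endian bitmap: bit nr lives in byte nr / 8, at position nr % 8. */
static int test_bit_le(size_t nr, const unsigned char *map)
{
	return (map[nr >> 3] >> (nr & 7)) & 1;
}

/*
 * Big-endian 16-bit indexed bitmap: the map is an array of 16-bit words
 * stored high byte first, so the two bytes of each word swap places
 * compared with the little-endian layout.
 */
static int test_bit_be16(size_t nr, const unsigned char *map)
{
	return (map[(nr >> 3) ^ 1] >> (nr & 7)) & 1;
}
```

For example, if only the lowest bit of the first byte on disk is set, the little-endian view sees bit 0 set, while the 16-bit big-endian view sees bit 8 set. This is why the two kinds of bitmaps cannot be read with the same helper.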
      
      Finally, we can remove minix bit operations from asm/bitops.h for all
      architectures.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Greg Ungerer <gerg@uclinux.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bitops: remove ext2 non-atomic bitops from asm/bitops.h · f312eff8
      Akinobu Mita committed
      As a result of the conversions, there are no users of the ext2
      non-atomic bit operations except for the ext2 filesystem itself.  Now we
      can put them into architecture independent code in the ext2 filesystem,
      and remove them from asm/bitops.h for all architectures.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>