1. 11 1月, 2012 17 次提交
    • D
      GFS2: fail mount if journal recovery fails · 376d3778
      David Teigland 提交于
      If the first mounter fails to recover one of the journals
      during mount, the mount should fail.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      376d3778
    • D
      GFS2: let spectator mount do read only recovery · e8ca5cc5
      David Teigland 提交于
      Previously, a spectator mount would not even attempt to do
      journal recovery for a failed node.  This meant that if all
      mounted nodes were spectators, everyone would be stuck after
      a node failed, all waiting for recovery to be performed.
      This is unnecessary since the failed node had a clean journal.
      
      Instead, allow a spectator mount to do a partial "read only"
      recovery, which means it will check if the failed journal is
      clean, and if so, report a successful recovery.  If the failed
      journal is not clean, it reports that journal recovery failed.
      This makes it work the same as a read only mount on a read only
      block device.
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      e8ca5cc5
    • B
      GFS2: Fix a use-after-free that coverity spotted · 49528b4e
      Bob Peterson 提交于
      In function gfs2_inplace_release it was trying to unlock a gfs2_holder
      structure associated with a reservation, after said reservation was
      freed. The problem is that the statements have the wrong order.
      This patch corrects the order so that the reservation is freed after
      the gfs2_holder is unlocked.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      49528b4e
    • D
      GFS2: dlm based recovery coordination · e0c2a9aa
      David Teigland 提交于
      This new method of managing recovery is an alternative to
      the previous approach of using the userland gfs_controld.
      
      - use dlm slot numbers to assign journal id's
      - use dlm recovery callbacks to initiate journal recovery
      - use a dlm lock to determine the first node to mount fs
      - use a dlm lock to track journals that need recovery
      Signed-off-by: NDavid Teigland <teigland@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      e0c2a9aa
    • V
      procfs: add hidepid= and gid= mount options · 0499680a
      Vasiliy Kulikov 提交于
      Add support for mount options to restrict access to /proc/PID/
      directories.  The default backward-compatible "relaxed" behaviour is left
      untouched.
      
      The first mount option is called "hidepid" and its value defines how much
      info about processes we want to be available for non-owners:
      
      hidepid=0 (default) means the old behavior - anybody may read all
      world-readable /proc/PID/* files.
      
      hidepid=1 means users may not access any /proc/<pid>/ directories, but
      their own.  Sensitive files like cmdline, sched*, status are now protected
      against other users.  As permission checking done in proc_pid_permission()
      and files' permissions are left untouched, programs expecting specific
      files' modes are not confused.
      
      hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
      users.  It doesn't mean that it hides whether a process exists (it can be
      learned by other means, e.g.  by kill -0 $PID), but it hides process' euid
      and egid.  It compicates intruder's task of gathering info about running
      processes, whether some daemon runs with elevated privileges, whether
      another user runs some sensitive program, whether other users run any
      program at all, etc.
      
      gid=XXX defines a group that will be able to gather all processes' info
      (as in hidepid=0 mode).  This group should be used instead of putting
      nonroot user in sudoers file or something.  However, untrusted users (like
      daemons, etc.) which are not supposed to monitor the tasks in the whole
      system should not be added to the group.
      
      hidepid=1 or higher is designed to restrict access to procfs files, which
      might reveal some sensitive private information like precise keystrokes
      timings:
      
      http://www.openwall.com/lists/oss-security/2011/11/05/3
      
      hidepid=1/2 doesn't break monitoring userspace tools.  ps, top, pgrep, and
      conky gracefully handle EPERM/ENOENT and behave as if the current user is
      the only user running processes.  pstree shows the process subtree which
      contains "pstree" process.
      
      Note: the patch doesn't deal with setuid/setgid issues of keeping
      preopened descriptors of procfs files (like
      https://lkml.org/lkml/2011/2/7/368).  We rely on that the leaked
      information like the scheduling counters of setuid apps doesn't threaten
      anybody's privacy - only the user started the setuid program may read the
      counters.
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Theodore Tso <tytso@MIT.EDU>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0499680a
    • V
      procfs: parse mount options · 97412950
      Vasiliy Kulikov 提交于
      Add support for procfs mount options.  Actual mount options are coming in
      the next patches.
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Theodore Tso <tytso@MIT.EDU>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97412950
    • P
      procfs: introduce the /proc/<pid>/map_files/ directory · 640708a2
      Pavel Emelyanov 提交于
      This one behaves similarly to the /proc/<pid>/fd/ one - it contains
      symlinks one for each mapping with file, the name of a symlink is
      "vma->vm_start-vma->vm_end", the target is the file.  Opening a symlink
      results in a file that point exactly to the same inode as them vma's one.
      
      For example the ls -l of some arbitrary /proc/<pid>/map_files/
      
       | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -> /lib64/libc-2.5.so
       | lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -> /lib64/libselinux.so.1
       | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -> /lib64/libacl.so.1.1.0
       | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -> /lib64/librt-2.5.so
       | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -> /lib64/ld-2.5.so
      
      This *helps* checkpointing process in three ways:
      
      1. When dumping a task mappings we do know exact file that is mapped
         by particular region.  We do this by opening
         /proc/$pid/map_files/$address symlink the way we do with file
         descriptors.
      
      2. This also helps in determining which anonymous shared mappings are
         shared with each other by comparing the inodes of them.
      
      3. When restoring a set of processes in case two of them has a mapping
         shared, we map the memory by the 1st one and then open its
         /proc/$pid/map_files/$address file and map it by the 2nd task.
      
      Using /proc/$pid/maps for this is quite inconvenient since it brings
      repeatable re-reading and reparsing for this text file which slows down
      restore procedure significantly.  Also as being pointed in (3) it is a way
      easier to use top level shared mapping in children as
      /proc/$pid/map_files/$address when needed.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [gorcunov@openvz.org: make map_files depend on CHECKPOINT_RESTORE]
      Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Reviewed-by: NVasiliy Kulikov <segoon@openwall.com>
      Reviewed-by: N"Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      640708a2
    • C
      procfs: make proc_get_link to use dentry instead of inode · 7773fbc5
      Cyrill Gorcunov 提交于
      Prepare the ground for the next "map_files" patch which needs a name of a
      link file to analyse.
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7773fbc5
    • F
      reiserfs: don't lock root inode searching · 9b467e6e
      Frederic Weisbecker 提交于
      Nothing requires that we lock the filesystem until the root inode is
      provided.
      
      Also iget5_locked() triggers a warning because we are holding the
      filesystem lock while allocating the inode, which result in a lockdep
      suspicion that we have a lock inversion against the reclaim path:
      
      [ 1986.896979] =================================
      [ 1986.896990] [ INFO: inconsistent lock state ]
      [ 1986.896997] 3.1.1-main #8
      [ 1986.897001] ---------------------------------
      [ 1986.897007] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      [ 1986.897016] kswapd0/16 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [ 1986.897023]  (&REISERFS_SB(s)->lock){+.+.?.}, at: [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
      [ 1986.897044] {RECLAIM_FS-ON-W} state was registered at:
      [ 1986.897050]   [<c014a5b9>] mark_held_locks+0xae/0xd0
      [ 1986.897060]   [<c014aab3>] lockdep_trace_alloc+0x7d/0x91
      [ 1986.897068]   [<c0190ee0>] kmem_cache_alloc+0x1a/0x93
      [ 1986.897078]   [<c01e7728>] reiserfs_alloc_inode+0x13/0x3d
      [ 1986.897088]   [<c01a5b06>] alloc_inode+0x14/0x5f
      [ 1986.897097]   [<c01a5cb9>] iget5_locked+0x62/0x13a
      [ 1986.897106]   [<c01e99e0>] reiserfs_fill_super+0x410/0x8b9
      [ 1986.897114]   [<c01953da>] mount_bdev+0x10b/0x159
      [ 1986.897123]   [<c01e764d>] get_super_block+0x10/0x12
      [ 1986.897131]   [<c0195b38>] mount_fs+0x59/0x12d
      [ 1986.897138]   [<c01a80d1>] vfs_kern_mount+0x45/0x7a
      [ 1986.897147]   [<c01a83e3>] do_kern_mount+0x2f/0xb0
      [ 1986.897155]   [<c01a987a>] do_mount+0x5c2/0x612
      [ 1986.897163]   [<c01a9a72>] sys_mount+0x61/0x8f
      [ 1986.897170]   [<c044060c>] sysenter_do_call+0x12/0x32
      [ 1986.897181] irq event stamp: 7509691
      [ 1986.897186] hardirqs last  enabled at (7509691): [<c0190f34>] kmem_cache_alloc+0x6e/0x93
      [ 1986.897197] hardirqs last disabled at (7509690): [<c0190eea>] kmem_cache_alloc+0x24/0x93
      [ 1986.897209] softirqs last  enabled at (7508896): [<c01294bd>] __do_softirq+0xee/0xfd
      [ 1986.897222] softirqs last disabled at (7508859): [<c01030ed>] do_softirq+0x50/0x9d
      [ 1986.897234]
      [ 1986.897235] other info that might help us debug this:
      [ 1986.897242]  Possible unsafe locking scenario:
      [ 1986.897244]
      [ 1986.897250]        CPU0
      [ 1986.897254]        ----
      [ 1986.897257]   lock(&REISERFS_SB(s)->lock);
      [ 1986.897265] <Interrupt>
      [ 1986.897269]     lock(&REISERFS_SB(s)->lock);
      [ 1986.897276]
      [ 1986.897277]  *** DEADLOCK ***
      [ 1986.897278]
      [ 1986.897286] no locks held by kswapd0/16.
      [ 1986.897291]
      [ 1986.897292] stack backtrace:
      [ 1986.897299] Pid: 16, comm: kswapd0 Not tainted 3.1.1-main #8
      [ 1986.897306] Call Trace:
      [ 1986.897314]  [<c0439e76>] ? printk+0xf/0x11
      [ 1986.897324]  [<c01482d1>] print_usage_bug+0x20e/0x21a
      [ 1986.897332]  [<c01479b8>] ? print_irq_inversion_bug+0x172/0x172
      [ 1986.897341]  [<c014855c>] mark_lock+0x27f/0x483
      [ 1986.897349]  [<c0148d88>] __lock_acquire+0x628/0x1472
      [ 1986.897358]  [<c0149fae>] lock_acquire+0x47/0x5e
      [ 1986.897366]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
      [ 1986.897384]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
      [ 1986.897397]  [<c043b5ef>] mutex_lock_nested+0x35/0x26f
      [ 1986.897409]  [<c01f8bd4>] ? reiserfs_write_lock+0x20/0x2a
      [ 1986.897421]  [<c01f8bd4>] reiserfs_write_lock+0x20/0x2a
      [ 1986.897433]  [<c01e2edd>] map_block_for_writepage+0xc9/0x590
      [ 1986.897448]  [<c01b1706>] ? create_empty_buffers+0x33/0x8f
      [ 1986.897461]  [<c0121124>] ? get_parent_ip+0xb/0x31
      [ 1986.897472]  [<c043ef7f>] ? sub_preempt_count+0x81/0x8e
      [ 1986.897485]  [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
      [ 1986.897496]  [<c0121124>] ? get_parent_ip+0xb/0x31
      [ 1986.897508]  [<c01e355d>] reiserfs_writepage+0x1b9/0x3e7
      [ 1986.897521]  [<c0173b40>] ? clear_page_dirty_for_io+0xcb/0xde
      [ 1986.897533]  [<c014a6e3>] ? trace_hardirqs_on_caller+0x108/0x138
      [ 1986.897546]  [<c014a71e>] ? trace_hardirqs_on+0xb/0xd
      [ 1986.897559]  [<c0177b38>] shrink_page_list+0x34f/0x5e2
      [ 1986.897572]  [<c01780a7>] shrink_inactive_list+0x172/0x22c
      [ 1986.897585]  [<c0178464>] shrink_zone+0x303/0x3b1
      [ 1986.897597]  [<c043cae0>] ? _raw_spin_unlock+0x27/0x3d
      [ 1986.897611]  [<c01788c9>] kswapd+0x3b7/0x5f2
      
      The deadlock shouldn't happen since we are doing that allocation in the
      mount path, the filesystem is not available for any reclaim.  Still the
      warning is annoying.
      
      To solve this, acquire the lock later only where we need it, right before
      calling reiserfs_read_locked_inode() that wants to lock to walk the tree.
      Reported-by: NKnut Petersen <Knut_Petersen@t-online.de>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9b467e6e
    • F
      reiserfs: don't lock journal_init() · 37c69b98
      Frederic Weisbecker 提交于
      journal_init() doesn't need the lock since no operation on the filesystem
      is involved there.  journal_read() and get_list_bitmap() have yet to be
      reviewed carefully though before removing the lock there.  Just keep the
      it around these two calls for safety.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      37c69b98
    • F
      reiserfs: delay reiserfs lock until journal initialization · f32485be
      Frederic Weisbecker 提交于
      In the mount path, transactions that are made before journal
      initialization don't involve the filesystem.  We can delay the reiserfs
      lock until we play with the journal.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f32485be
    • D
      reiserfs: delete comments referring to the BKL · b18c1c6e
      Davidlohr Bueso 提交于
      Signed-off-by: NDavidlohr Bueso <dave@gnu.org>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b18c1c6e
    • D
      fs: binfmt_elf: create Kconfig variable for PIE randomization · e39f5602
      David Daney 提交于
      Randomization of PIE load address is hard coded in binfmt_elf.c for X86
      and ARM.  Create a new Kconfig variable
      (CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE) for this and use it instead.  Thus
      architecture specific policy is pushed out of the generic binfmt_elf.c and
      into the architecture Kconfig files.
      
      X86 and ARM Kconfigs are modified to select the new variable so there is
      no change in behavior.  A follow on patch will select it for MIPS too.
      Signed-off-by: NDavid Daney <david.daney@cavium.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e39f5602
    • K
      tracepoint: add tracepoints for debugging oom_score_adj · 43d2b113
      KAMEZAWA Hiroyuki 提交于
      oom_score_adj is used for guarding processes from OOM-Killer.  One of
      problem is that it's inherited at fork().  When a daemon set oom_score_adj
      and make children, it's hard to know where the value is set.
      
      This patch adds some tracepoints useful for debugging. This patch adds
      3 trace points.
        - creating new task
        - renaming a task (exec)
        - set oom_score_adj
      
      To debug, users need to enable some trace pointer. Maybe filtering is useful as
      
      # EVENT=/sys/kernel/debug/tracing/events/task/
      # echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
      # echo "oom_score_adj != 0" > $EVENT/task_rename/filter
      # echo 1 > $EVENT/enable
      # EVENT=/sys/kernel/debug/tracing/events/oom/
      # echo 1 > $EVENT/enable
      
      output will be like this.
      # grep oom /sys/kernel/debug/tracing/trace
      bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
      bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
      ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
      bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
      grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43d2b113
    • J
      btrfs: pass __GFP_WRITE for buffered write page allocations · e3a41a5b
      Johannes Weiner 提交于
      Tell the page allocator that pages allocated for a buffered write are
      expected to become dirty soon.
      Signed-off-by: NJohannes Weiner <jweiner@redhat.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e3a41a5b
    • K
      mm: account reaped page cache on inode cache pruning · 5f8aefd4
      Konstantin Khlebnikov 提交于
      Inode cache pruning indirectly reclaims page-cache by invalidating mapping
      pages.  Let's account them into reclaim-state to notice this progress in
      memory reclaimer.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5f8aefd4
    • X
      ext4: fix undefined behavior in ext4_fill_flex_info() · d50f2ab6
      Xi Wang 提交于
      Commit 503358ae ("ext4: avoid divide by
      zero when trying to mount a corrupted file system") fixes CVE-2009-4307
      by performing a sanity check on s_log_groups_per_flex, since it can be
      set to a bogus value by an attacker.
      
      	sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
      	groups_per_flex = 1 << sbi->s_log_groups_per_flex;
      
      	if (groups_per_flex < 2) { ... }
      
      This patch fixes two potential issues in the previous commit.
      
      1) The sanity check might only work on architectures like PowerPC.
      On x86, 5 bits are used for the shifting amount.  That means, given a
      large s_log_groups_per_flex value like 36, groups_per_flex = 1 << 36
      is essentially 1 << 4 = 16, rather than 0.  This will bypass the check,
      leaving s_log_groups_per_flex and groups_per_flex inconsistent.
      
      2) The sanity check relies on undefined behavior, i.e., oversized shift.
      A standard-confirming C compiler could rewrite the check in unexpected
      ways.  Consider the following equivalent form, assuming groups_per_flex
      is unsigned for simplicity.
      
      	groups_per_flex = 1 << sbi->s_log_groups_per_flex;
      	if (groups_per_flex == 0 || groups_per_flex == 1) {
      
      We compile the code snippet using Clang 3.0 and GCC 4.6.  Clang will
      completely optimize away the check groups_per_flex == 0, leaving the
      patched code as vulnerable as the original.  GCC keeps the check, but
      there is no guarantee that future versions will do the same.
      Signed-off-by: NXi Wang <xi.wang@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      d50f2ab6
  2. 10 1月, 2012 23 次提交