1. 27 3月, 2013 1 次提交
    • E
      userns: Restrict when proc and sysfs can be mounted · 87a8ebd6
      Eric W. Biederman 提交于
      Only allow unprivileged mounts of proc and sysfs if they are already
      mounted when the user namespace is created.
      
      proc and sysfs are interesting because they have content that is
      per namespace, and so fresh mounts are needed when new namespaces
      are created while at the same time proc and sysfs have content that
      is shared between every instance.
      
      Respect the policy of who may see the shared content of proc and sysfs
      by only allowing new mounts if there was an existing mount at the time
      the user namespace was created.
      
      In practice there are only two interesting cases: proc and sysfs are
      mounted at their usual places, proc and sysfs are not mounted at all
      (some form of mount namespace jail).
      
      Cc: stable@vger.kernel.org
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      87a8ebd6
  2. 23 3月, 2013 1 次提交
    • L
      vfs,proc: guarantee unique inodes in /proc · 51f0885e
      Linus Torvalds 提交于
      Dave Jones found another /proc issue with his Trinity tool: thanks to
      the namespace model, we can have multiple /proc dentries that point to
      the same inode, aliasing directories in /proc/<pid>/net/ for example.
      
      This ends up being a total disaster, because it acts like hardlinked
      directories, and causes locking problems.  We rely on the topological
      sort of the inodes pointed to by dentries, and if we have aliased
      directories, that odering becomes unreliable.
      
      In short: don't do this.  Multiple dentries with the same (directory)
      inode is just a bad idea, and the namespace code should never have
      exposed things this way.  But we're kind of stuck with it.
      
      This solves things by just always allocating a new inode during /proc
      dentry lookup, instead of using "iget_locked()" to look up existing
      inodes by superblock and number.  That actually simplies the code a bit,
      at the cost of potentially doing more inode [de]allocations.
      
      That said, the inode lookup wasn't free either (and did a lot of locking
      of inodes), so it is probably not that noticeable.  We could easily keep
      the old lookup model for non-directory entries, but rather than try to
      be excessively clever this just implements the minimal and simplest
      workaround for the problem.
      Reported-and-tested-by: NDave Jones <davej@redhat.com>
      Analyzed-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      51f0885e
  3. 09 3月, 2013 1 次提交
    • E
      proc: Use nd_jump_link in proc_ns_follow_link · db04dc67
      Eric W. Biederman 提交于
      Update proc_ns_follow_link to use nd_jump_link instead of just
      manually updating nd.path.dentry.
      
      This fixes the BUG_ON(nd->inode != parent->d_inode) reported by Dave
      Jones and reproduced trivially with mkdir /proc/self/ns/uts/a.
      
      Sigh it looks like the VFS change to require use of nd_jump_link
      happend while proc_ns_follow_link was baking and since the common case
      of proc_ns_follow_link continued to work without problems the need for
      making this change was overlooked.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      db04dc67
  4. 28 2月, 2013 3 次提交
  5. 26 2月, 2013 4 次提交
  6. 24 2月, 2013 2 次提交
  7. 23 2月, 2013 1 次提交
  8. 19 2月, 2013 2 次提交
  9. 28 1月, 2013 1 次提交
    • F
      cputime: Use accessors to read task cputime stats · 6fac4829
      Frederic Weisbecker 提交于
      This is in preparation for the full dynticks feature. While
      remotely reading the cputime of a task running in a full
      dynticks CPU, we'll need to do some extra-computation. This
      way we can account the time it spent tickless in userspace
      since its last cputime snapshot.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      6fac4829
  10. 19 1月, 2013 1 次提交
    • J
      tty: Added a CONFIG_TTY option to allow removal of TTY · 4f73bc4d
      Joe Millenbach 提交于
      The option allows you to remove TTY and compile without errors. This
      saves space on systems that won't support TTY interfaces anyway.
      bloat-o-meter output is below.
      
      The bulk of this patch consists of Kconfig changes adding "depends on
      TTY" to various serial devices and similar drivers that require the TTY
      layer.  Ideally, these dependencies would occur on a common intermediate
      symbol such as SERIO, but most drivers "select SERIO" rather than
      "depends on SERIO", and "select" does not respect dependencies.
      
      bloat-o-meter output comparing our previous minimal to new minimal by
      removing TTY.  The list is filtered to not show removed entries with awk
      '$3 != "-"' as the list was very long.
      
      add/remove: 0/226 grow/shrink: 2/14 up/down: 6/-35356 (-35350)
      function                                     old     new   delta
      chr_dev_init                                 166     170      +4
      allow_signal                                  80      82      +2
      static.__warned                              143     142      -1
      disallow_signal                               63      62      -1
      __set_special_pids                            95      94      -1
      unregister_console                           126     121      -5
      start_kernel                                 546     541      -5
      register_console                             593     588      -5
      copy_from_user                                45      40      -5
      sys_setsid                                   128     120      -8
      sys_vhangup                                   32      19     -13
      do_exit                                     1543    1526     -17
      bitmap_zero                                   60      40     -20
      arch_local_irq_save                          137     117     -20
      release_task                                 674     652     -22
      static.spin_unlock_irqrestore                308     260     -48
      Signed-off-by: NJoe Millenbach <jmillenbach@gmail.com>
      Reviewed-by: NJamey Sharp <jamey@minilop.net>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f73bc4d
  11. 03 1月, 2013 1 次提交
  12. 26 12月, 2012 1 次提交
    • E
      proc: Allow proc_free_inum to be called from any context · dfb2ea45
      Eric W. Biederman 提交于
      While testing the pid namespace code I hit this nasty warning.
      
      [  176.262617] ------------[ cut here ]------------
      [  176.263388] WARNING: at /home/eric/projects/linux/linux-userns-devel/kernel/softirq.c:160 local_bh_enable_ip+0x7a/0xa0()
      [  176.265145] Hardware name: Bochs
      [  176.265677] Modules linked in:
      [  176.266341] Pid: 742, comm: bash Not tainted 3.7.0userns+ #18
      [  176.266564] Call Trace:
      [  176.266564]  [<ffffffff810a539f>] warn_slowpath_common+0x7f/0xc0
      [  176.266564]  [<ffffffff810a53fa>] warn_slowpath_null+0x1a/0x20
      [  176.266564]  [<ffffffff810ad9ea>] local_bh_enable_ip+0x7a/0xa0
      [  176.266564]  [<ffffffff819308c9>] _raw_spin_unlock_bh+0x19/0x20
      [  176.266564]  [<ffffffff8123dbda>] proc_free_inum+0x3a/0x50
      [  176.266564]  [<ffffffff8111d0dc>] free_pid_ns+0x1c/0x80
      [  176.266564]  [<ffffffff8111d195>] put_pid_ns+0x35/0x50
      [  176.266564]  [<ffffffff810c608a>] put_pid+0x4a/0x60
      [  176.266564]  [<ffffffff8146b177>] tty_ioctl+0x717/0xc10
      [  176.266564]  [<ffffffff810aa4d5>] ? wait_consider_task+0x855/0xb90
      [  176.266564]  [<ffffffff81086bf9>] ? default_spin_lock_flags+0x9/0x10
      [  176.266564]  [<ffffffff810cab0a>] ? remove_wait_queue+0x5a/0x70
      [  176.266564]  [<ffffffff811e37e8>] do_vfs_ioctl+0x98/0x550
      [  176.266564]  [<ffffffff810b8a0f>] ? recalc_sigpending+0x1f/0x60
      [  176.266564]  [<ffffffff810b9127>] ? __set_task_blocked+0x37/0x80
      [  176.266564]  [<ffffffff810ab95b>] ? sys_wait4+0xab/0xf0
      [  176.266564]  [<ffffffff811e3d31>] sys_ioctl+0x91/0xb0
      [  176.266564]  [<ffffffff810a95f0>] ? task_stopped_code+0x50/0x50
      [  176.266564]  [<ffffffff81939199>] system_call_fastpath+0x16/0x1b
      [  176.266564] ---[ end trace 387af88219ad6143 ]---
      
      It turns out that spin_unlock_bh(proc_inum_lock) is not safe when
      put_pid is called with another spinlock held and irqs disabled.
      
      For now take the easy path and use spin_lock_irqsave(proc_inum_lock)
      in proc_free_inum and spin_loc_irq in proc_alloc_inum(proc_inum_lock).
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      dfb2ea45
  13. 21 12月, 2012 2 次提交
    • X
      proc: fix inconsistent lock state · ee297209
      Xiaotian Feng 提交于
      Lockdep found an inconsistent lock state when rcu is processing delayed
      work in softirq.  Currently, kernel is using spin_lock/spin_unlock to
      protect proc_inum_ida, but proc_free_inum is called by rcu in softirq
      context.
      
      Use spin_lock_bh/spin_unlock_bh fix following lockdep warning.
      
        =================================
        [ INFO: inconsistent lock state ]
        3.7.0 #36 Not tainted
        ---------------------------------
        inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
        swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
         (proc_inum_lock){+.?...}, at: proc_free_inum+0x1c/0x50
        {SOFTIRQ-ON-W} state was registered at:
           __lock_acquire+0x8ae/0xca0
           lock_acquire+0x199/0x200
           _raw_spin_lock+0x41/0x50
           proc_alloc_inum+0x4c/0xd0
           alloc_mnt_ns+0x49/0xc0
           create_mnt_ns+0x25/0x70
           mnt_init+0x161/0x1c7
           vfs_caches_init+0x107/0x11a
           start_kernel+0x348/0x38c
           x86_64_start_reservations+0x131/0x136
           x86_64_start_kernel+0x103/0x112
        irq event stamp: 2993422
        hardirqs last  enabled at (2993422):  _raw_spin_unlock_irqrestore+0x55/0x80
        hardirqs last disabled at (2993421):  _raw_spin_lock_irqsave+0x29/0x70
        softirqs last  enabled at (2993394):  _local_bh_enable+0x13/0x20
        softirqs last disabled at (2993395):  call_softirq+0x1c/0x30
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(proc_inum_lock);
          <Interrupt>
            lock(proc_inum_lock);
      
         *** DEADLOCK ***
      
        no locks held by swapper/1/0.
      
        stack backtrace:
        Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36
        Call Trace:
         <IRQ>  [<ffffffff810a40f1>] ? vprintk_emit+0x471/0x510
          print_usage_bug+0x2a5/0x2c0
          mark_lock+0x33b/0x5e0
          __lock_acquire+0x813/0xca0
          lock_acquire+0x199/0x200
          _raw_spin_lock+0x41/0x50
          proc_free_inum+0x1c/0x50
          free_pid_ns+0x1c/0x50
          put_pid_ns+0x2e/0x50
          put_pid+0x4a/0x60
          delayed_put_pid+0x12/0x20
          rcu_process_callbacks+0x462/0x790
          __do_softirq+0x1b4/0x3b0
          call_softirq+0x1c/0x30
          do_softirq+0x59/0xd0
          irq_exit+0x54/0xd0
          smp_apic_timer_interrupt+0x95/0xa3
          apic_timer_interrupt+0x72/0x80
          cpuidle_enter_tk+0x10/0x20
          cpuidle_enter_state+0x17/0x50
          cpuidle_idle_call+0x287/0x520
          cpu_idle+0xba/0x130
          start_secondary+0x2b3/0x2bc
      Signed-off-by: NXiaotian Feng <dannyfeng@tencent.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ee297209
    • M
      procfs: drop vmtruncate · 46f69557
      Marco Stornelli 提交于
      Removed vmtruncate
      Signed-off-by: NMarco Stornelli <marco.stornelli@gmail.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      46f69557
  14. 18 12月, 2012 7 次提交
  15. 13 12月, 2012 2 次提交
  16. 12 12月, 2012 1 次提交
  17. 11 12月, 2012 1 次提交
  18. 29 11月, 2012 1 次提交
    • F
      cputime: Rename thread_group_times to thread_group_cputime_adjusted · e80d0a1a
      Frederic Weisbecker 提交于
      We have thread_group_cputime() and thread_group_times(). The naming
      doesn't provide enough information about the difference between
      these two APIs.
      
      To lower the confusion, rename thread_group_times() to
      thread_group_cputime_adjusted(). This name better suggests that
      it's a version of thread_group_cputime() that does some stabilization
      on the raw cputime values. ie here: scale on top of CFS runtime
      stats and bound lower value for monotonicity.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      e80d0a1a
  19. 27 11月, 2012 1 次提交
  20. 20 11月, 2012 6 次提交