1. 23 3月, 2011 1 次提交
  2. 10 3月, 2011 1 次提交
  3. 08 3月, 2011 1 次提交
    • A
      unfuck proc_sysctl ->d_compare() · dfef6dcd
      Al Viro 提交于
      a) struct inode is not going to be freed under ->d_compare();
      however, the thing PROC_I(inode)->sysctl points to just might.
      Fortunately, it's enough to make freeing that sucker delayed,
      provided that we don't step on its ->unregistering, clear
      the pointer to it in PROC_I(inode) before dropping the reference
      and check if it's NULL in ->d_compare().
      
      b) I'm not sure that we *can* walk into NULL inode here (we recheck
      dentry->seq between verifying that it's still hashed / fetching
      dentry->d_inode and passing it to ->d_compare() and there's no
      negative hashed dentries in /proc/sys/*), but if we can walk into
      that, we really should not have ->d_compare() return 0 on it!
      Said that, I really suspect that this check can be simply killed.
      Nick?
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      dfef6dcd
  4. 03 3月, 2011 1 次提交
  5. 15 2月, 2011 1 次提交
  6. 02 2月, 2011 1 次提交
    • L
      security/selinux: fix /proc/sys/ labeling · 8e6c9693
      Lucian Adrian Grijincu 提交于
      This fixes an old (2007) selinux regression: filesystem labeling for
      /proc/sys returned
           -r--r--r-- unknown                          /proc/sys/fs/file-nr
      instead of
           -r--r--r-- system_u:object_r:sysctl_fs_t:s0 /proc/sys/fs/file-nr
      
      Events that lead to breaking of /proc/sys/ selinux labeling:
      
      1) sysctl was reimplemented to route all calls through /proc/sys/
      
          commit 77b14db5
          [PATCH] sysctl: reimplement the sysctl proc support
      
      2) proc_dir_entry was removed from ctl_table:
      
          commit 3fbfa981
          [PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables
      
      3) selinux still walked the proc_dir_entry tree to apply
         labeling. Because ctl_tables don't have a proc_dir_entry, we did
         not label /proc/sys/ inodes any more. To achieve this the /proc/sys/
         inodes were marked private and private inodes were ignored by
         selinux.
      
          commit bbaca6c2
          [PATCH] selinux: enhance selinux to always ignore private inodes
      
          commit 86a71dbd
          [PATCH] sysctl: hide the sysctl proc inodes from selinux
      
      Access control checks have been done by means of a special sysctl hook
      that was called for read/write accesses to any /proc/sys/ entry.
      
      We don't have to do this because, instead of walking the
      proc_dir_entry tree we can walk the dentry tree (as done in this
      patch). With this patch:
      * we don't mark /proc/sys/ inodes as private
      * we don't need the sysclt security hook
      * we walk the dentry tree to find the path to the inode.
      
      We have to strip the PID in /proc/PID/ entries that have a
      proc_dir_entry because selinux does not know how to label paths like
      '/1/net/rpc/nfsd.fh' (and defaults to 'proc_t' labeling). Selinux does
      know of '/net/rpc/nfsd.fh' (and applies the 'sysctl_rpc_t' label).
      
      PID stripping from the path was done implicitly in the previous code
      because the proc_dir_entry tree had the root in '/net' in the example
      from above. The dentry tree has the root in '/1'.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NLucian Adrian Grijincu <lucian.grijincu@gmail.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      8e6c9693
  7. 26 1月, 2011 1 次提交
    • T
      console: rename acquire/release_console_sem() to console_lock/unlock() · ac751efa
      Torben Hohn 提交于
      The -rt patches change the console_semaphore to console_mutex.  As a
      result, a quite large chunk of the patches changes all
      acquire/release_console_sem() to acquire/release_console_mutex()
      
      This commit makes things use more neutral function names which dont make
      implications about the underlying lock.
      
      The only real change is the return value of console_trylock which is
      inverted from try_acquire_console_sem()
      
      This patch also paves the way to switching console_sem from a semaphore to
      a mutex.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert]
      Signed-off-by: NTorben Hohn <torbenh@gmx.de>
      Cc: Thomas Gleixner <tglx@tglx.de>
      Cc: Greg KH <gregkh@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac751efa
  8. 21 1月, 2011 1 次提交
    • D
      kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT · 6a108a14
      David Rientjes 提交于
      The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
      is used to configure any non-standard kernel with a much larger scope than
      only small devices.
      
      This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
      references to the option throughout the kernel.  A new CONFIG_EMBEDDED
      option is added that automatically selects CONFIG_EXPERT when enabled and
      can be used in the future to isolate options that should only be
      considered for embedded systems (RISC architectures, SLOB, etc).
      
      Calling the option "EXPERT" more accurately represents its intention: only
      expert users who understand the impact of the configuration changes they
      are making should enable it.
      Reviewed-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NDavid Woodhouse <david.woodhouse@intel.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Greg KH <gregkh@suse.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Robin Holt <holt@sgi.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6a108a14
  9. 14 1月, 2011 13 次提交
  10. 07 1月, 2011 7 次提交
    • N
      fs: provide rcu-walk aware permission i_ops · b74c79e9
      Nick Piggin 提交于
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b74c79e9
    • N
      fs: rcu-walk aware d_revalidate method · 34286d66
      Nick Piggin 提交于
      Require filesystems be aware of .d_revalidate being called in rcu-walk
      mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
      -ECHILD from all implementations.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      34286d66
    • N
      fs: dcache reduce branches in lookup path · fb045adb
      Nick Piggin 提交于
      Reduce some branches and memory accesses in dcache lookup by adding dentry
      flags to indicate common d_ops are set, rather than having to check them.
      This saves a pointer memory access (dentry->d_op) in common path lookup
      situations, and saves another pointer load and branch in cases where we
      have d_op but not the particular operation.
      
      Patched with:
      
      git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fb045adb
    • N
      fs: rcu-walk for path lookup · 31e6b01f
      Nick Piggin 提交于
      Perform common cases of path lookups without any stores or locking in the
      ancestor dentry elements. This is called rcu-walk, as opposed to the current
      algorithm which is a refcount based walk, or ref-walk.
      
      This results in far fewer atomic operations on every path element,
      significantly improving path lookup performance. It also avoids cacheline
      bouncing on common dentries, significantly improving scalability.
      
      The overall design is like this:
      * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
      * Take the RCU lock for the entire path walk, starting with the acquiring
        of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
        not required for dentry persistence.
      * synchronize_rcu is called when unregistering a filesystem, so we can
        access d_ops and i_ops during rcu-walk.
      * Similarly take the vfsmount lock for the entire path walk. So now mnt
        refcounts are not required for persistence. Also we are free to perform mount
        lookups, and to assume dentry mount points and mount roots are stable up and
        down the path.
      * Have a per-dentry seqlock to protect the dentry name, parent, and inode,
        so we can load this tuple atomically, and also check whether any of its
        members have changed.
      * Dentry lookups (based on parent, candidate string tuple) recheck the parent
        sequence after the child is found in case anything changed in the parent
        during the path walk.
      * inode is also RCU protected so we can load d_inode and use the inode for
        limited things.
      * i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
      * i_op can be loaded.
      
      When we reach the destination dentry, we lock it, recheck lookup sequence,
      and increment its refcount and mountpoint refcount. RCU and vfsmount locks
      are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
      not match, we can not drop rcu-walk gracefully at the current point in the
      lokup, so instead return -ECHILD (for want of a better errno). This signals the
      path walking code to re-do the entire lookup with a ref-walk.
      
      Aside from the final dentry, there are other situations that may be encounted
      where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
      a reference on the last good dentry) and continue with a ref-walk. Again, if
      we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
      using ref-walk. But it is very important that we can continue with ref-walk
      for most cases, particularly to avoid the overhead of double lookups, and to
      gain the scalability advantages on common path elements (like cwd and root).
      
      The cases where rcu-walk cannot continue are:
      * NULL dentry (ie. any uncached path element)
      * parent with d_inode->i_op->permission or ACLs
      * dentries with d_revalidate
      * Following links
      
      In future patches, permission checks and d_revalidate become rcu-walk aware. It
      may be possible eventually to make following links rcu-walk aware.
      
      Uncached path elements will always require dropping to ref-walk mode, at the
      very least because i_mutex needs to be grabbed, and objects allocated.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      31e6b01f
    • N
      fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin 提交于
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downsides of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fa0d7e3d
    • N
      fs: change d_compare for rcu-walk · 621e155a
      Nick Piggin 提交于
      Change d_compare so it may be called from lock-free RCU lookups. This
      does put significant restrictions on what may be done from the callback,
      however there don't seem to have been any problems with in-tree fses.
      If some strange use case pops up that _really_ cannot cope with the
      rcu-walk rules, we can just add new rcu-unaware callbacks, which would
      cause name lookup to drop out of rcu-walk mode.
      
      For in-tree filesystems, this is just a mechanical change.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      621e155a
    • N
      fs: change d_delete semantics · fe15ce44
      Nick Piggin 提交于
      Change d_delete from a dentry deletion notification to a dentry caching
      advise, more like ->drop_inode. Require it to be constant and idempotent,
      and not take d_lock. This is how all existing filesystems use the callback
      anyway.
      
      This makes fine grained dentry locking of dput and dentry lru scanning
      much simpler.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fe15ce44
  11. 06 12月, 2010 1 次提交
    • E
      Revert "vfs: show unreachable paths in getcwd and proc" · 7b2a69ba
      Eric W. Biederman 提交于
      Because it caused a chroot ttyname regression in 2.6.36.
      
      As of 2.6.36 ttyname does not work in a chroot.  It has already been
      reported that screen breaks, and for me this breaks an automated
      distribution testsuite, that I need to preserve the ability to run the
      existing binaries on for several more years.  glibc 2.11.3 which has a
      fix for this is not an option.
      
      The root cause of this breakage is:
      
          commit 8df9d1a4
          Author: Miklos Szeredi <mszeredi@suse.cz>
          Date:   Tue Aug 10 11:41:41 2010 +0200
      
          vfs: show unreachable paths in getcwd and proc
      
          Prepend "(unreachable)" to path strings if the path is not reachable
          from the current root.
      
          Two places updated are
           - the return string from getcwd()
           - and symlinks under /proc/$PID.
      
          Other uses of d_path() are left unchanged (we know that some old
          software crashes if /proc/mounts is changed).
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      
      So remove the nice sounding, but ultimately ill advised change to how
      /proc/fd symlinks work.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b2a69ba
  12. 30 11月, 2010 2 次提交
    • M
      sched: Add 'autogroup' scheduling feature: automated per session task groups · 5091faa4
      Mike Galbraith 提交于
      A recurring complaint from CFS users is that parallel kbuild has
      a negative impact on desktop interactivity.  This patch
      implements an idea from Linus, to automatically create task
      groups.  Currently, only per session autogroups are implemented,
      but the patch leaves the way open for enhancement.
      
      Implementation: each task's signal struct contains an inherited
      pointer to a refcounted autogroup struct containing a task group
      pointer, the default for all tasks pointing to the
      init_task_group.  When a task calls setsid(), a new task group
      is created, the process is moved into the new task group, and a
      reference to the preveious task group is dropped.  Child
      processes inherit this task group thereafter, and increase it's
      refcount.  When the last thread of a process exits, the
      process's reference is dropped, such that when the last process
      referencing an autogroup exits, the autogroup is destroyed.
      
      At runqueue selection time, IFF a task has no cgroup assignment,
      its current autogroup is used.
      
      Autogroup bandwidth is controllable via setting it's nice level
      through the proc filesystem:
      
        cat /proc/<pid>/autogroup
      
      Displays the task's group and the group's nice level.
      
        echo <nice level> > /proc/<pid>/autogroup
      
      Sets the task group's shares to the weight of nice <level> task.
      Setting nice level is rate limited for !admin users due to the
      abuse risk of task group locking.
      
      The feature is enabled from boot by default if
      CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
      the boot option noautogroup, and can also be turned on/off on
      the fly via:
      
        echo [01] > /proc/sys/kernel/sched_autogroup_enabled
      
      ... which will automatically move tasks to/from the root task group.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5091faa4
    • M
      ARM: 6485/5: proc/vmcore - allow archs to override vmcore_elf_check_arch() · 9833c394
      Mika Westerberg 提交于
      Allow architectures to redefine this macro if needed. This is useful for
      example in architectures where 64-bit ELF vmcores are not supported.
      Specifying zero vmcore_elf64_check_arch() allows compiler to optimize
      away unnecessary parts of parse_crash_elf64_headers().
      
      We also rename the macro to vmcore_elf64_check_arch() to reflect that it
      is used for 64-bit vmcores only.
      Signed-off-by: NMika Westerberg <mika.westerberg@iki.fi>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      9833c394
  13. 25 11月, 2010 1 次提交
    • N
      pagemap: set pagemap walk limit to PMD boundary · ea251c1d
      Naoya Horiguchi 提交于
      Currently one pagemap_read() call walks in PAGEMAP_WALK_SIZE bytes (== 512
      pages.) But there is a corner case where walk_pmd_range() accidentally
      runs over a VMA associated with a hugetlbfs file.
      
      For example, when a process has mappings to VMAs as shown below:
      
        # cat /proc/<pid>/maps
        ...
        3a58f6d000-3a58f72000 rw-p 00000000 00:00 0
        7fbd51853000-7fbd51855000 rw-p 00000000 00:00 0
        7fbd5186c000-7fbd5186e000 rw-p 00000000 00:00 0
        7fbd51a00000-7fbd51c00000 rw-s 00000000 00:12 8614   /hugepages/test
      
      then pagemap_read() goes into walk_pmd_range() path and walks in the range
      0x7fbd51853000-0x7fbd51a53000, but the hugetlbfs VMA should be handled by
      walk_hugetlb_range().  Otherwise PMD for the hugepage is considered bad
      and cleared, which causes undesirable results.
      
      This patch fixes it by separating pagemap walk range into one PMD.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea251c1d
  14. 18 11月, 2010 1 次提交
  15. 17 11月, 2010 1 次提交
    • J
      console: add /proc/consoles · 23308ba5
      Jiri Slaby 提交于
      It allows users to see what consoles are currently known to the system
      and with what flags.
      
      It is based on Werner's patch, the part about traversing fds was
      removed, the code was moved to kernel/printk.c, where consoles are
      handled and it makes more sense to me.
      
      Signed-off-by: Jiri Slaby <jslaby@suse.cz> [cleanups]
      Signed-off-by: N"Dr. Werner Fink" <werner@suse.de>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      23308ba5
  16. 29 10月, 2010 2 次提交
  17. 28 10月, 2010 4 次提交