1. 12 5月, 2009 1 次提交
    • H
      x86: add extension fields for bootloader type and version · 5031296c
      H. Peter Anvin 提交于
      A long ago, in days of yore, it all began with a god named Thor.
      There were vikings and boats and some plans for a Linux kernel
      header.  Unfortunately, a single 8-bit field was used for bootloader
      type and version.  This has generally worked without *too* much pain,
      but we're getting close to flat running out of ID fields.
      
      Add extension fields for both type and version.  The type will be
      extended if it the old field is 0xE; the version is a simple MSB
      extension.
      
      Keep /proc/sys/kernel/bootloader_type containing
      (type << 4) + (ver & 0xf) for backwards compatiblity, but also add
      /proc/sys/kernel/bootloader_version which contains the full version
      number.
      
      [ Impact: new feature to support more bootloaders ]
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      5031296c
  2. 14 4月, 2009 1 次提交
  3. 07 4月, 2009 2 次提交
    • P
      mm: add /proc controls for pdflush threads · fafd688e
      Peter W Morreale 提交于
      Add /proc entries to give the admin the ability to control the minimum and
      maximum number of pdflush threads.  This allows finer control of pdflush
      on both large and small machines.
      
      The rationale is simply one size does not fit all.  Admins on large and/or
      small systems may want to tune the min/max pdflush thread count to best
      suit their needs.  Right now the min/max is hardcoded to 2/8.  While
      probably a fair estimate for smaller machines, large machines with large
      numbers of CPUs and large numbers of filesystems/block devices may benefit
      from larger numbers of threads working on different block devices.
      
      Even if the background flushing algorithm is radically changed, it is
      still likely that multiple threads will be involved and admins would still
      desire finer control on the min/max other than to have to recompile the
      kernel.
      
      The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
      '/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.
      
      The minimum value for nr_pdflush_threads_min is 1 and the maximum value is
      the current value of nr_pdflush_threads_max.  This minimum is required
      since additional thread creation is performed in a pdflush thread itself.
      
      The minimum value for nr_pdflush_threads_max is the current value of
      nr_pdflush_threads_min and the maximum value can be 1000.
      
      Documentation/sysctl/vm.txt is also updated.
      
      [akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
      Signed-off-by: NPeter W Morreale <pmorreale@novell.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fafd688e
    • L
      kernel/sysctl.c: avoid annoying warnings · cd5f9a4c
      Linus Torvalds 提交于
      Some of the limit constants are used only depending on some complex
      configuration dependencies, yet it's not worth making the simple
      variables depend on those configuration details.  Just mark them as
      perhaps not being unused, and avoid the warning.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cd5f9a4c
  4. 03 4月, 2009 2 次提交
  5. 01 4月, 2009 1 次提交
  6. 13 2月, 2009 1 次提交
  7. 12 2月, 2009 1 次提交
  8. 16 1月, 2009 2 次提交
  9. 14 1月, 2009 2 次提交
  10. 08 1月, 2009 1 次提交
    • P
      NOMMU: Make mmap allocation page trimming behaviour configurable. · dd8632a1
      Paul Mundt 提交于
      NOMMU mmap allocates a piece of memory for an mmap that's rounded up in size to
      the nearest power-of-2 number of pages.  Currently it then discards the excess
      pages back to the page allocator, making that memory available for use by other
      things.  This can, however, cause greater amount of fragmentation.
      
      To counter this, a sysctl is added in order to fine-tune the trimming
      behaviour.  The default behaviour remains to trim pages aggressively, while
      this can either be disabled completely or set to a higher page-granular
      watermark in order to have finer-grained control.
      
      vm region vm_top bits taken from an earlier patch by David Howells.
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NMike Frysinger <vapier.adi@gmail.com>
      dd8632a1
  11. 07 1月, 2009 1 次提交
    • D
      mm: add dirty_background_bytes and dirty_bytes sysctls · 2da02997
      David Rientjes 提交于
      This change introduces two new sysctls to /proc/sys/vm:
      dirty_background_bytes and dirty_bytes.
      
      dirty_background_bytes is the counterpart to dirty_background_ratio and
      dirty_bytes is the counterpart to dirty_ratio.
      
      With growing memory capacities of individual machines, it's no longer
      sufficient to specify dirty thresholds as a percentage of the amount of
      dirtyable memory over the entire system.
      
      dirty_background_bytes and dirty_bytes specify quantities of memory, in
      bytes, that represent the dirty limits for the entire system.  If either
      of these values is set, its value represents the amount of dirty memory
      that is needed to commence either background or direct writeback.
      
      When a `bytes' or `ratio' file is written, its counterpart becomes a
      function of the written value.  For example, if dirty_bytes is written to
      be 8096, 8K of memory is required to commence direct writeback.
      dirty_ratio is then functionally equivalent to 8K / the amount of
      dirtyable memory:
      
      	dirtyable_memory = free pages + mapped pages + file cache
      
      	dirty_background_bytes = dirty_background_ratio * dirtyable_memory
      		-or-
      	dirty_background_ratio = dirty_background_bytes / dirtyable_memory
      
      		AND
      
      	dirty_bytes = dirty_ratio * dirtyable_memory
      		-or-
      	dirty_ratio = dirty_bytes / dirtyable_memory
      
      Only one of dirty_background_bytes and dirty_background_ratio may be
      specified at a time, and only one of dirty_bytes and dirty_ratio may be
      specified.  When one sysctl is written, the other appears as 0 when read.
      
      The `bytes' files operate on a page size granularity since dirty limits
      are compared with ZVC values, which are in page units.
      
      Prior to this change, the minimum dirty_ratio was 5 as implemented by
      get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
      written value between 0 and 100.  This restriction is maintained, but
      dirty_bytes has a lower limit of only one page.
      
      Also prior to this change, the dirty_background_ratio could not equal or
      exceed dirty_ratio.  This restriction is maintained in addition to
      restricting dirty_background_bytes.  If either background threshold equals
      or exceeds that of the dirty threshold, it is implicitly set to half the
      dirty threshold.
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andrea Righi <righi.andrea@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2da02997
  12. 18 12月, 2008 1 次提交
    • S
      trace: add a way to enable or disable the stack tracer · f38f1d2a
      Steven Rostedt 提交于
      Impact: enhancement to stack tracer
      
      The stack tracer currently is either on when configured in or
      off when it is not. It can not be disabled when it is configured on.
      (besides disabling the function tracer that it uses)
      
      This patch adds a way to enable or disable the stack tracer at
      run time. It defaults off on bootup, but a kernel parameter 'stacktrace'
      has been added to enable it on bootup.
      
      A new sysctl has been added "kernel.stack_tracer_enabled" to let
      the user enable or disable the stack tracer at run time.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f38f1d2a
  13. 05 12月, 2008 1 次提交
    • D
      sparc64: Add tsb-ratio sysctl. · 0871420f
      David S. Miller 提交于
      Add a sysctl to tweak the RSS limit used to decide when to grow
      the TSB for an address space.
      
      In order to avoid expensive divides and multiplies only simply
      positive and negative powers of two are supported.
      
      The function computed takes the number of TSB translations that will
      fit at one time in the TSB of a given size, and either adds or
      subtracts a percentage of entries.  This final value is the
      RSS limit.
      
      See tsb_size_to_rss_limit().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0871420f
  14. 02 12月, 2008 1 次提交
    • D
      epoll: introduce resource usage limits · 7ef9964e
      Davide Libenzi 提交于
      It has been thought that the per-user file descriptors limit would also
      limit the resources that a normal user can request via the epoll
      interface.  Vegard Nossum reported a very simple program (a modified
      version attached) that can make a normal user to request a pretty large
      amount of kernel memory, well within the its maximum number of fds.  To
      solve such problem, default limits are now imposed, and /proc based
      configuration has been introduced.  A new directory has been created,
      named /proc/sys/fs/epoll/ and inside there, there are two configuration
      points:
      
        max_user_instances = Maximum number of devices - per user
      
        max_user_watches   = Maximum number of "watched" fds - per user
      
      The current default for "max_user_watches" limits the memory used by epoll
      to store "watches", to 1/32 of the amount of the low RAM.  As example, a
      256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
      That should be enough to not break existing heavy epoll users.  The
      default value for "max_user_instances" is set to 128, that should be
      enough too.
      
      This also changes the userspace, because a new error code can now come out
      from EPOLL_CTL_ADD (-ENOSPC).  The EMFILE from epoll_create() was already
      listed, so that should be ok.
      
      [akpm@linux-foundation.org: use get_current_user()]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: <stable@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Reported-by: NVegard Nossum <vegardno@ifi.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ef9964e
  15. 14 11月, 2008 1 次提交
  16. 04 11月, 2008 1 次提交
  17. 27 10月, 2008 1 次提交
    • S
      ftrace: ftrace dump on oops control · 944ac425
      Steven Rostedt 提交于
      Impact: add (default-off) dump-trace-on-oops flag
      
      Currently, ftrace is set up to dump its contents to the console if the
      kernel panics or oops. This can be annoying if you have trace data in
      the buffers and you experience an oops, but the trace data is old or
      static.
      
      Usually when you want ftrace to dump its contents is when you are debugging
      your system and you have set up ftrace to trace the events leading to
      an oops.
      
      This patch adds a control variable called "ftrace_dump_on_oops" that will
      enable the ftrace dump to console on oops. This variable is default off
      but a developer can enable it either through the kernel command line
      by adding "ftrace_dump_on_oops" or at run time by setting (or disabling)
      /proc/sys/kernel/ftrace_dump_on_oops.
      
      v2:
      
         Replaced /** with /* as Randy explained that kernel-doc does
          not yet handle variables.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      944ac425
  18. 21 10月, 2008 1 次提交
  19. 20 10月, 2008 2 次提交
  20. 17 10月, 2008 3 次提交
  21. 10 10月, 2008 1 次提交
  22. 30 9月, 2008 1 次提交
    • T
      Configure out file locking features · bfcd17a6
      Thomas Petazzoni 提交于
      This patch adds the CONFIG_FILE_LOCKING option which allows to remove
      support for advisory locks. With this patch enabled, the flock()
      system call, the F_GETLK, F_SETLK and F_SETLKW operations of fcntl()
      and NFS support are disabled. These features are not necessarly needed
      on embedded systems. It allows to save ~11 Kb of kernel code and data:
      
         text          data     bss     dec     hex filename
      1125436        118764  212992 1457192  163c28 vmlinux.old
      1114299        118564  212992 1445855  160fdf vmlinux
       -11137    -200       0  -11337   -2C49 +/-
      
      This patch has originally been written by Matt Mackall
      <mpm@selenic.com>, and is part of the Linux Tiny project.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: matthew@wil.cx
      Cc: linux-fsdevel@vger.kernel.org
      Cc: mpm@selenic.com
      Cc: akpm@linux-foundation.org
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      bfcd17a6
  23. 12 9月, 2008 2 次提交
  24. 05 9月, 2008 1 次提交
  25. 28 7月, 2008 1 次提交
  26. 27 7月, 2008 5 次提交
    • A
      [PATCH] sanitize ->permission() prototype · e6305c43
      Al Viro 提交于
      * kill nameidata * argument; map the 3 bits in ->flags anybody cares
        about to new MAY_... ones and pass with the mask.
      * kill redundant gfs2_iop_permission()
      * sanitize ecryptfs_permission()
      * fix remaining places where ->permission() instances might barf on new
        MAY_... found in mask.
      
      The obvious next target in that direction is permission(9)
      
      folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e6305c43
    • A
      [PATCH] sanitize proc_sysctl · 9043476f
      Al Viro 提交于
      * keep references to ctl_table_head and ctl_table in /proc/sys inodes
      * grab the former during operations, use the latter for access to
        entry if that succeeds
      * have ->d_compare() check if table should be seen for one who does lookup;
        that allows us to avoid flipping inodes - if we have the same name resolve
        to different things, we'll just keep several dentries and ->d_compare()
        will reject the wrong ones.
      * have ->lookup() and ->readdir() scan the table of our inode first, then
        walk all ctl_table_header and scan ->attached_by for those that are
        attached to our directory.
      * implement ->getattr().
      * get rid of insane amounts of tree-walking
      * get rid of the need to know dentry in ->permission() and of the contortions
        induced by that.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9043476f
    • A
      [PATCH] sysctl: keep track of tree relationships · ae7edecc
      Al Viro 提交于
      In a sense, that's the heart of the series.  It's based on the following
      property of the trees we are actually asked to add: they can be split into
      stem that is already covered by registered trees and crown that is entirely
      new.  IOW, if a/b and a/c/d are introduced by our tree, then a/c is also
      introduced by it.
      
      That allows to associate tree and table entry with each node in the union;
      while directory nodes might be covered by many trees, only one will cover
      the node by its crown.  And that will allow much saner logics for /proc/sys
      in the next patches.  This patch introduces the data structures needed to
      keep track of that.
      
      When adding a sysctl table, we find a "parent" one.  Which is to say,
      find the deepest node on its stem that already is present in one of the
      tables from our table set or its ancestor sets.  That table will be our
      parent and that node in it - attachment point.  Add our table to list
      anchored in parent, have it refer the parent and contents of attachment
      point.  Also remember where its crown lives.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ae7edecc
    • A
      [PATCH] allow delayed freeing of ctl_table_header · f7e6ced4
      Al Viro 提交于
      Refcount the sucker; instead of freeing it by the end of unregistration
      just drop the refcount and free only when it hits zero.  Make sure that
      we _always_ make ->unregistering non-NULL in start_unregistering().
      
      That allows anybody to get a reference to such puppy, preventing its
      freeing and reuse.  It does *not* block unregistration.  Anybody who
      holds such a reference can
      	* try to grab a "use" reference (ctl_head_grab()); that will
      succeeds if and only if it hadn't entered unregistration yet.  If it
      succeeds, we can use it in all normal ways until we release the "use"
      reference (with ctl_head_finish()).  Note that this relies on having
      ->unregistering become non-NULL in all cases when one starts to unregister
      the sucker.
      	* keep pointers to ctl_table entries; they *can* be freed if
      the entire thing is unregistered.  However, if ctl_head_grab() succeeds,
      we know that unregistration had not happened (and will not happen until
      ctl_head_finish()) and such pointers can be used safely.
      
      IOW, now we can have inodes under /proc/sys keep references to ctl_table
      entries, protecting them with references to ctl_table_header and
      grabbing the latter for the duration of operations that require access
      to ctl_table.  That won't cause deadlocks, since unregistration will not
      be stopped by mere keeping a reference to ctl_table_header.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f7e6ced4
    • A
      [PATCH] beginning of sysctl cleanup - ctl_table_set · 73455092
      Al Viro 提交于
      New object: set of sysctls [currently - root and per-net-ns].
      Contains: pointer to parent set, list of tables and "should I see this set?"
      method (->is_seen(set)).
      Current lists of tables are subsumed by that; net-ns contains such a beast.
      ->lookup() for ctl_table_root returns pointer to ctl_table_set instead of
      that to ->list of that ctl_table_set.
      
      [folded compile fixes by rdd for configs without sysctl]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      73455092
  27. 26 7月, 2008 1 次提交
    • D
      printk ratelimiting rewrite · 717115e1
      Dave Young 提交于
      All ratelimit user use same jiffies and burst params, so some messages
      (callbacks) will be lost.
      
      For example:
      a call printk_ratelimit(5 * HZ, 1)
      b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will
      will be supressed.
      
      - rewrite __ratelimit, and use a ratelimit_state as parameter.  Thanks for
        hints from andrew.
      
      - Add WARN_ON_RATELIMIT, update rcupreempt.h
      
      - remove __printk_ratelimit
      
      - use __ratelimit in net_ratelimit
      Signed-off-by: NDave Young <hidave.darkstar@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: Dave Young <hidave.darkstar@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      717115e1
  28. 25 7月, 2008 1 次提交