1. 17 Jul, 2007: 4 commits
    • Remove duplicate comments from sysctl.c · 7144521f
      Committed by Linus Torvalds
      Randy Dunlap noticed that the recent comment clarifications from Andrew
      had somehow gotten duplicated.  Quoth Andrew: "hm, that could have been
      some late-night reject-fixing."
      
      Fix it up.
      
      Cc: From: Andrew Morton <akpm@linux-foundation.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sysctl.c: add text telling people to use CTL_UNNUMBERED · 2be7fe07
      Committed by Andrew Morton
      Hopefully this will help people to understand the new regime.
      
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vdso: print fatal signals · 45807a1d
      Committed by Ingo Molnar
      Add the print-fatal-signals=1 boot option and the
      /proc/sys/kernel/print-fatal-signals runtime switch.
      
      This feature prints some minimal information about userspace segfaults to
      the kernel console.  This is useful for finding early-bootup bugs, where
      userspace debugging is very hard.
      
      Defaults to off.
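      A minimal Python sketch of flipping the runtime switch from userspace, assuming the usual mapping of a dotted sysctl name onto a file under /proc/sys (the convention sysctl(8) follows). The helpers sysctl_path, write_sysctl and read_sysctl and their root parameter are hypothetical; root exists only so the code can be exercised against a scratch directory. Writing the real file needs root privileges.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name such as "kernel.print-fatal-signals"
    onto its file under /proc/sys (hypothetical helper)."""
    return os.path.join(root, *name.split("."))


def write_sysctl(name, value, root="/proc/sys"):
    """Write an integer, the way `echo 1 > /proc/sys/...` would."""
    with open(sysctl_path(name, root), "w") as f:
        f.write(f"{value}\n")


def read_sysctl(name, root="/proc/sys"):
    """Read the current value back as an integer."""
    with open(sysctl_path(name, root)) as f:
        return int(f.read().strip())


# On a live system, as root, this enables the feature at runtime;
# print-fatal-signals=1 on the kernel command line does it at boot:
#   write_sysctl("kernel.print-fatal-signals", 1)
```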
      
      [akpm@linux-foundation.org: Don't add new sysctl numbers]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • change zonelist order: zonelist order selection logic · f0c0b2b8
      Committed by KAMEZAWA Hiroyuki
      Make zonelist creation policy selectable from sysctl/boot option v6.
      
      This patch makes the order of NUMA's zonelist (of pgdat) selectable.
      The available orders are Default (automatic), Node-based and Zone-based.
      
      [Default Order]
      The kernel selects Node-based or Zone-based order automatically.
      
      [Node-based Order]
      This policy treats the locality of memory as the most important parameter.
      The zonelist is ordered by each zone's locality. This means lower zones
      (e.g. ZONE_DMA) can be used before higher zones (e.g. ZONE_NORMAL) are
      exhausted. In other words, ZONE_DMA will be in the middle of the zonelist.
      The current 2.6.21 kernel uses this order.
      
      Pros.
       * Users can expect local memory to be used as much as possible.
      Cons.
       * Lower zones may be exhausted before higher zones. This may cause OOM_KILL.

      This may be suitable if ZONE_DMA is relatively big, you never see OOM_KILL
      caused by ZONE_DMA exhaustion, and you need the best locality.
      
      (example)
      Assume a 2-node NUMA system: node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.
      
      *node(0)'s memory allocation order:
      
       node(0)'s NORMAL -> node(0)'s DMA -> node(1)'s NORMAL.
      
      *node(1)'s memory allocation order:
      
       node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
      
      [Zone-based order]
      This policy treats the zone type as the most important parameter.
      The zonelist is ordered by zone type. This means a lower zone is never
      used before the higher zones are exhausted. In other words, ZONE_DMA is
      always at the tail of the zonelist.
      
      Pros.
       * OOM_KILL (because of lower-zone exhaustion) occurs only if all zones are exhausted.
      Cons.
       * memory locality may not be best.
      
      (example)
      Assume a 2-node NUMA system: node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.
      
      *node(0)'s memory allocation order:
      
       node(0)'s NORMAL -> node(1)'s NORMAL -> node(0)'s DMA.
      
      *node(1)'s memory allocation order:
      
       node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
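      The two policies can be sketched as a pair of nested loops that differ only in which loop is outer. This is a toy Python model, not the kernel's zonelist-building code: the topology dict, the helper names, and the use of ascending node id as a stand-in for NUMA distance are all assumptions. With the 2-node topology described above it reproduces both example orders.

```python
# Zone types from highest to lowest (a real kernel has more, e.g. HIGHMEM).
ZONE_TYPES = ["NORMAL", "DMA"]


def nodes_by_distance(local, nodes):
    """Local node first, then the others. Ascending node id stands in for
    NUMA distance here (assumption made for this sketch)."""
    return [local] + sorted(n for n in nodes if n != local)


def node_based(local, nodes):
    """Node-based order: walk nodes nearest-first and, within each node,
    take its zones from highest to lowest."""
    return [(n, z) for n in nodes_by_distance(local, nodes)
            for z in ZONE_TYPES if z in nodes[n]]


def zone_based(local, nodes):
    """Zone-based order: walk zone types highest-first and, within each
    type, take nodes nearest-first, so DMA always lands at the tail."""
    return [(n, z) for z in ZONE_TYPES
            for n in nodes_by_distance(local, nodes) if z in nodes[n]]


# The 2-node example: node 0 has DMA and NORMAL, node 1 has NORMAL only.
topology = {0: {"DMA", "NORMAL"}, 1: {"NORMAL"}}
```

Swapping the loop nesting is the whole difference: locality wins in node_based, zone type wins in zone_based.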
      
      Both the boot option "numa_zonelist_order=" and the /proc/sys/vm/numa_zonelist_order sysctl are supported.
      
      command:
      %echo N > /proc/sys/vm/numa_zonelist_order
      
      This rebuilds the zonelist in Node-based order.
      
      command:
      %echo Z > /proc/sys/vm/numa_zonelist_order
      
      This rebuilds the zonelist in Zone-based order.
      
      Thanks to Lee Schermerhorn, who gave me much help and code.
      
      [Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "jesse.barnes@intel.com" <jesse.barnes@intel.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 12 Jul, 2007: 1 commit
    • security: Protection for exploiting null dereference using mmap · ed032189
      Committed by Eric Paris
      Add a new security check on mmap operations to see if the user is attempting
      to mmap into the low area of the address space.  The amount of space protected
      is indicated by the new proc tunable /proc/sys/vm/mmap_min_addr, which defaults
      to 0, preserving existing behavior.
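      The check can be modeled as a single comparison against the tunable. In this Python sketch, mmap_check and its may_map_low flag are hypothetical: the flag stands in for whatever the security module decides (for SELinux, a permission in the new memprotect class), and the -EACCES return value is illustrative. With the default of 0 no address falls below the floor, which is why existing behavior is preserved.

```python
import errno


def mmap_check(addr, mmap_min_addr, may_map_low=False):
    """Return 0 if a mapping at `addr` is allowed, else -EACCES.

    `may_map_low` is a hypothetical stand-in for the security module's
    verdict on mappings below the protected floor.
    """
    if addr < mmap_min_addr and not may_map_low:
        return -errno.EACCES
    return 0
```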
      
      This patch uses a new SELinux security class, "memprotect."  Policy already
      contains a number of allow rules like a_t self:process * (unconfined_t being
      one of them), which means that putting this check in the process class (its
      best current fit) would make it useless, as all user processes, which we also
      want to protect against, would be allowed. Naming the new class memprotect
      will also make it possible for us to move some of the other memory
      protection permissions out of 'process' and into the new class the next time
      we bump the policy version number (which I also think is a good future idea).
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
  3. 10 Jul, 2007: 1 commit
  4. 17 May, 2007: 1 commit
  5. 10 May, 2007: 1 commit
  6. 09 May, 2007: 1 commit
    • proc: maps protection · 5096add8
      Committed by Kees Cook
      The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
      information about the memory location and usage of processes.  Issues:
      
      - maps should not be world-readable, especially if programs expect any
        kind of ASLR protection from local attackers.
      - maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
        check the maps when %n is in a *printf call, and a setuid(getuid())
        process wouldn't be able to read its own maps file.  (For reference
        see http://lkml.org/lkml/2006/1/22/150)
      - a system-wide toggle is needed to allow prior behavior in the case of
        non-root applications that depend on access to the maps contents.
      
      This change implements a check using "ptrace_may_attach" before allowing
      access to read the maps contents.  To control this protection, the new knob
      /proc/sys/kernel/maps_protect has been added, with corresponding updates to
      the procfs documentation.
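      A rough Python model of the access decision, assuming the check only applies while maps_protect is non-zero. may_read_maps is a hypothetical helper, and ptrace_may_attach is approximated as "same uid, or root"; the real function also weighs dumpability, capabilities and security hooks.

```python
def may_read_maps(maps_protect, reader_uid, target_uid, reader_is_root=False):
    """Decide whether a reader may see /proc/<pid>/maps (toy model).

    ptrace_may_attach() is approximated as "same uid, or root".
    """
    if not maps_protect:
        # Knob off: prior behaviour, the contents are readable by anyone.
        return True
    # Knob on: only readers that could ptrace the target get the contents.
    return reader_is_root or reader_uid == target_uid
```

The same-uid branch is what keeps the setuid(getuid()) case working: a process can still read its own maps.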
      
      [akpm@linux-foundation.org: build fixes]
      [akpm@linux-foundation.org: New sysctl numbers are old hat]
      Signed-off-by: Kees Cook <kees@outflux.net>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 24 Apr, 2007: 1 commit
  8. 05 Mar, 2007: 1 commit
  9. 02 Mar, 2007: 1 commit
  10. 15 Feb, 2007: 11 commits
  11. 12 Feb, 2007: 5 commits
    • [PATCH] _proc_do_string(): fix short reads · 8d060877
      Committed by Oleg Nesterov
      If you try to read things like /proc/sys/kernel/osrelease with single-byte
      reads, you get just one byte and then EOF.  This is because _proc_do_string()
      assumes that the caller is read()ing into a buffer which is large enough to
      fit the whole string in a single hit.
      
      Fix.
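      The failing case is easy to reproduce from userspace by issuing one-byte read() calls, as sketched here in Python. read_bytewise is a hypothetical helper; proc_string_read is a simplified model (not the kernel function) of what the fix amounts to: honour the file offset and serve only the requested slice of the string plus its trailing newline, rather than assuming the whole value fits in one buffer.

```python
import os


def read_bytewise(path):
    """Read a file with one-byte read() calls, like the failing case in
    the commit message (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            b = os.read(fd, 1)
            if not b:  # an empty read means EOF
                break
            chunks.append(b)
        return b"".join(chunks)
    finally:
        os.close(fd)


def proc_string_read(value, pos, count):
    """Simplified model of the fixed behaviour: serve at most `count`
    bytes of the value plus its trailing newline, starting at `pos`."""
    data = value + b"\n"
    return data[pos:pos + count]
```

Before the fix, read_bytewise("/proc/sys/kernel/osrelease") would stop after the first byte; afterwards it should return the full string.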
      
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Michael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] sysctl warning fix · cb799b89
      Committed by Andrew Morton
      kernel/sysctl.c:2816: warning: 'sysctl_ipc_data' defined but not used
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] Add TAINT_USER and ability to set taint flags from userspace · 34f5a398
      Committed by Theodore Ts'o
      Allow taint flags to be set from userspace by writing to
      /proc/sys/kernel/tainted, and add a new taint flag, TAINT_USER, to be used
      when userspace has potentially done something dangerous that might
      compromise the kernel.  This will allow support personnel to ask further
      questions about what may have caused the user taint flag to have been set.
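      A small Python model of the write path, under two assumptions worth checking against the kernel source: that TAINT_USER is bit 6 (value 64) in kernels of this era, and that a value written to /proc/sys/kernel/tainted is OR-ed into the existing mask, so flags can be added but not cleared. write_tainted and user_tainted are hypothetical names.

```python
# Assumption: TAINT_USER is bit 6 in kernels of this era (check
# include/linux/kernel.h for the tree you run).
TAINT_USER = 1 << 6


def write_tainted(current, value):
    """Model a write of `value` to /proc/sys/kernel/tainted, assuming the
    written bits are OR-ed into the existing mask (set, never cleared)."""
    return current | value


def user_tainted(mask):
    """True if the TAINT_USER flag is set in a tainted mask."""
    return bool(mask & TAINT_USER)
```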
      
      For example, they might examine the logs of the realtime JVM to see if the
      Java program has used the really silly, stupid, dangerous, and
      completely-non-portable direct access to physical memory feature which MUST
      be implemented according to the Real-Time Specification for Java (RTSJ).
      Sigh.  What were those silly people at Sun thinking?
      
      [akpm@osdl.org: build fix]
      [bunk@stusta.de: cleanup]
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] sysctl_{,ms_}jiffies: fix oldlen semantics · 3ee75ac3
      Committed by Alexey Dobriyan
      Currently the semantics are:
      1) if *oldlenp == 0,
      	don't writeback anything
      
      2) if *oldlenp >= table->maxlen,
      	don't writeback more than table->maxlen bytes and rewrite *oldlenp
      	don't look at underlying type granularity
      
      3) if 0 < *oldlenp < table->maxlen,
      		*cough*
      	string sysctls don't writeback more than *oldlenp bytes.
      	OK, that's because sizeof(char) == 1
      
      	int sysctls writeback anything in (0, table->maxlen] range
      	Though accept integers divisible by sizeof(int) for writing.
      
      sysctl_jiffies and sysctl_ms_jiffies don't writeback anything but
      sizeof(int), which violates 1) and 2).
      
      So, make sysctl_jiffies and sysctl_ms_jiffies accept
      a) *oldlenp == 0, not doing writeback
      b) *oldlenp >= sizeof(int), writing one integer.
      
      -EINVAL is still returned for *oldlenp == 1, 2, 3.
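      Rules a) and b) can be written as a tiny Python model. jiffies_writeback is a hypothetical helper that returns (status, bytes written back, new *oldlenp); it only captures the length handling, not the jiffies conversion, and struct.calcsize("i") plays the role of sizeof(int).

```python
import errno
import struct

# sizeof(int) on the host ABI; 4 on common Linux platforms.
INT_SIZE = struct.calcsize("i")


def jiffies_writeback(oldlen, value):
    """Model the fixed oldlen rule for sysctl_jiffies/sysctl_ms_jiffies.

    Returns (status, bytes written back, new *oldlenp).
    """
    if oldlen == 0:
        # a) caller asked for nothing: no writeback, not an error
        return 0, b"", 0
    if oldlen < INT_SIZE:
        # too small to hold one int (1, 2 or 3 bytes when int is 4 bytes)
        return -errno.EINVAL, b"", oldlen
    # b) room for at least one int: write exactly one and report its size
    return 0, struct.pack("i", value), INT_SIZE
```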
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [PATCH] make reading /proc/sys/kernel/cap-bound not require CAP_SYS_MODULE · 6ff1b442
      Committed by Eric Paris
      Reading /proc/sys/kernel/cap-bound requires CAP_SYS_MODULE.  (see
      proc_dointvec_bset in kernel/sysctl.c)
      
      sysctl appears to roam all over proc, reading everything it can get its
      hands on, and complains when it is denied access to read
      cap-bound.  Clearly writing to cap-bound should be a sensitive operation,
      but requiring CAP_SYS_MODULE to read cap-bound seems a bit too strong.  I
      believe the information could with reasonable certainty be obtained by
      looking at a bunch of the output of /proc/pid/status which has very low
      security protection, so at best we are just getting a little obfuscation of
      information.
      
      Currently SELinux policy has to 'dontaudit' capability checks for
      CAP_SYS_MODULE for things like sysctl which just want to read cap-bound.
      As a byproduct, doing so also hides warnings of potential exploits, such
      as sysctl actually trying to load a module at some point.  Would anyone
      have a problem with opening cap-bound up for reading by anyone?
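      The proposed change boils down to moving the capability check from "any access" to "writes only". Here is a Python sketch of the before/after decision; both function names are hypothetical and -errno.EPERM stands in for the kernel's error return.

```python
import errno


def cap_bound_access_before(write, has_cap_sys_module):
    """Old rule: reads and writes alike need CAP_SYS_MODULE
    (`write` is ignored, which is exactly the complaint)."""
    return 0 if has_cap_sys_module else -errno.EPERM


def cap_bound_access_after(write, has_cap_sys_module):
    """Proposed rule: only writes are privileged; anyone may read."""
    if write and not has_cap_sys_module:
        return -errno.EPERM
    return 0
```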
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 14 Dec, 2006: 1 commit
  13. 11 Dec, 2006: 3 commits
  14. 09 Dec, 2006: 5 commits
  15. 08 Dec, 2006: 3 commits
    • [PATCH] struct seq_operations and struct file_operations constification · 15ad7cdc
      Committed by Helge Deller
       - move some file_operations structs into the .rodata section
      
       - move static strings from policy_types[] array into the .rodata section
      
       - fix generic seq_operations usages, so that those structs may be defined
         as "const" as well
      
      [akpm@osdl.org: couple of fixes]
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sysctl: string length calculated is wrong if it contains negative numbers · bd9b0bac
      Committed by BP, Praveen
      In the functions do_proc_dointvec() and do_proc_doulongvec_minmax(),
      the string length appears to be calculated incorrectly if the string
      contains a negative integer.
      
      The console log below illustrates the bug. Setting negative values
      may not be the right thing to do for the console log level, but the test
      below can still be used to demonstrate the bug in the code.
      
      # echo "-1 -1 -1 -123456" > /proc/sys/kernel/printk
      # cat /proc/sys/kernel/printk
      -1      -1      -1      -1234
      #
      # echo "-1 -1 -1 123456" > /proc/sys/kernel/printk
      # cat /proc/sys/kernel/printk
      -1      -1      -1      1234
      #
      
      (akpm: the bug is that 123456 gets truncated)
      
      It works as expected if the string contains only positive integers:
      
      # echo "1 2 3 4" > /proc/sys/kernel/printk
      # cat /proc/sys/kernel/printk
      1       2       3       4
      #
      
      The patch given below fixes the issue.
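      For reference, the intended parsing behavior can be modeled in Python: the sign is consumed as part of the token, so it never reduces the number of digits read after it. parse_int_vector is a hypothetical helper and a simplification, not a port of do_proc_dointvec() or of the patch.

```python
def parse_int_vector(buf, maxcount):
    """Scan up to `maxcount` whitespace-separated signed integers from a
    write buffer, consuming the sign and every digit of each token."""
    values, i, n = [], 0, len(buf)
    while len(values) < maxcount:
        while i < n and buf[i].isspace():
            i += 1  # skip separators
        if i >= n:
            break
        start = i
        if buf[i] == "-":
            i += 1  # the sign belongs to the token, not to the length budget
        while i < n and buf[i].isdigit():
            i += 1
        token = buf[start:i]
        if token in ("", "-"):
            raise ValueError(f"bad integer at offset {start}")
        values.append(int(token))
    return values
```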
      Signed-off-by: Praveen BP <praveenbp@ti.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] new scheme to preempt swap token · 7602bdf2
      Committed by Ashwin Chaugule
      The new swap token patches replace the current token traversal algorithm.  The
      old algorithm had a crude timeout parameter that was used to hand over the token
      from one task to another.  This algorithm transfers the token to the tasks that
      need it.  The urgency for the token is based on the number of times a task is
      required to swap in pages.  Accordingly, the priority of a task is incremented
      if it has been badly affected by swap-outs.  To ensure that the token doesn't
      bounce around rapidly, the token holders are given a priority boost.  The
      priority of a task is also decremented if its rate of swap-ins keeps
      decreasing.  This way, deciding whether to preempt the swap token is a matter
      of comparing two tasks' priority fields.
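      The scheme can be illustrated with a toy Python model. Everything concrete here is an assumption: the Task and SwapToken classes, the +1/-1 priority steps, the rate_increasing input, and HOLDER_BOOST = 2 are illustrative, and the boost is applied once when the token changes hands. What it does reflect is the commit's central point: preemption is a plain comparison of two priority fields.

```python
class Task:
    def __init__(self, name):
        self.name = name
        self.priority = 0


class SwapToken:
    """Toy model of priority-based swap-token preemption."""

    HOLDER_BOOST = 2  # hypothetical boost given to a new holder

    def __init__(self):
        self.holder = None

    def on_swap_in(self, task, rate_increasing):
        # Tasks hit harder and harder by swap-outs gain urgency; tasks whose
        # swap-in rate keeps dropping lose it (never below zero here).
        if rate_increasing:
            task.priority += 1
        elif task.priority > 0:
            task.priority -= 1
        # Preemption is a plain comparison of the two priority fields.
        if task is not self.holder and (
                self.holder is None or task.priority > self.holder.priority):
            self.holder = task
            task.priority += self.HOLDER_BOOST  # damps rapid bouncing
        return self.holder
```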
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>