1. 28 6月, 2006 1 次提交
    • I
      [PATCH] vdso: randomize the i386 vDSO by moving it into a vma · e6e5494c
      Ingo Molnar 提交于
      Move the i386 VDSO down into a vma and thus randomize it.
      
      Besides the security implications, this feature also helps debuggers, which
      can COW a vma-backed VDSO just like a normal DSO and can thus do
      single-stepping and other debugging features.
      
      It's good for hypervisors (Xen, VMWare) too, which typically live in the same
      high-mapped address space as the VDSO, hence whenever the VDSO is used, they
      get lots of guest pagefaults and have to fix such guest accesses up - which
      slows things down instead of speeding things up (the primary purpose of the
      VDSO).
      
      There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
      for older glibcs that still rely on a prelinked high-mapped VDSO.  Newer
      distributions (using glibc 2.3.3 or later) can turn this option off.  Turning
      it off is also recommended for security reasons: attackers cannot use the
      predictable high-mapped VDSO page as syscall trampoline anymore.
      
      There is a new vdso=[0|1] boot option as well, and a runtime
      /proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
      on/off.
      
      (This version of the VDSO-randomization patch also has working ELF
      coredumping, the previous patch crashed in the coredumping code.)
      
      This code is a combined work of the exec-shield VDSO randomization
      code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
      started this patch and i completed it.
      
      [akpm@osdl.org: cleanups]
      [akpm@osdl.org: compile fix]
      [akpm@osdl.org: compile fix 2]
      [akpm@osdl.org: compile fix 3]
      [akpm@osdl.org: revernt MAXMEM change]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjan@infradead.org>
      Cc: Gerd Hoffmann <kraxel@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e6e5494c
  2. 27 6月, 2006 1 次提交
  3. 26 6月, 2006 1 次提交
  4. 23 6月, 2006 2 次提交
  5. 20 6月, 2006 1 次提交
  6. 24 3月, 2006 3 次提交
  7. 09 3月, 2006 1 次提交
    • D
      [PATCH] fix file counting · 529bf6be
      Dipankar Sarma 提交于
      I have benchmarked this on an x86_64 NUMA system and see no significant
      performance difference on kernbench.  Tested on both x86_64 and powerpc.
      
      The way we do file struct accounting is not very suitable for batched
      freeing.  For scalability reasons, file accounting was
      constructor/destructor based.  This meant that nr_files was decremented
      only when the object was removed from the slab cache.  This is susceptible
      to slab fragmentation.  With RCU based file structure, consequent batched
      freeing and a test program like Serge's, we just speed this up and end up
      with a very fragmented slab -
      
      llm22:~ # cat /proc/sys/fs/file-nr
      587730  0       758844
      
      At the same time, I see only a 2000+ objects in filp cache.  The following
      patch I fixes this problem.
      
      This patch changes the file counting by removing the filp_count_lock.
      Instead we use a separate percpu counter, nr_files, for now and all
      accesses to it are through get_nr_files() api.  In the sysctl handler for
      nr_files, we populate files_stat.nr_files before returning to user.
      
      Counting files as an when they are created and destroyed (as opposed to
      inside slab) allows us to correctly count open files with RCU.
      Signed-off-by: NDipankar Sarma <dipankar@in.ibm.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      529bf6be
  8. 03 3月, 2006 1 次提交
  9. 01 3月, 2006 1 次提交
  10. 21 2月, 2006 2 次提交
  11. 18 2月, 2006 1 次提交
  12. 02 2月, 2006 2 次提交
  13. 19 1月, 2006 1 次提交
  14. 15 1月, 2006 1 次提交
  15. 12 1月, 2006 1 次提交
  16. 09 1月, 2006 2 次提交
    • R
      [PATCH] Make high and batch sizes of per_cpu_pagelists configurable · 8ad4b1fb
      Rohit Seth 提交于
      As recently there has been lot of traffic on the right values for batch and
      high water marks for per_cpu_pagelists.  This patch makes these two
      variables configurable through /proc interface.
      
      A new tunable /proc/sys/vm/percpu_pagelist_fraction is added.  This entry
      controls the fraction of pages at most in each zone that are allocated for
      each per cpu page list.  The min value for this is 8.  It means that we
      don't allow more than 1/8th of pages in each zone to be allocated in any
      single per_cpu_pagelist.
      
      The batch value of each per cpu pagelist is also updated as a result.  It
      is set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8)
      Signed-off-by: NRohit Seth <rohit.seth@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8ad4b1fb
    • A
      [PATCH] drop-pagecache · 9d0243bc
      Andrew Morton 提交于
      Add /proc/sys/vm/drop_caches.  When written to, this will cause the kernel to
      discard as much pagecache and/or reclaimable slab objects as it can.  THis
      operation requires root permissions.
      
      It won't drop dirty data, so the user should run `sync' first.
      
      Caveats:
      
      a) Holds inode_lock for exorbitant amounts of time.
      
      b) Needs to be taught about NUMA nodes: propagate these all the way through
         so the discarding can be controlled on a per-node basis.
      
      This is a debugging feature: useful for getting consistent results between
      filesystem benchmarks.  We could possibly put it under a config option, but
      it's less than 300 bytes.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9d0243bc
  17. 07 1月, 2006 1 次提交
  18. 05 1月, 2006 2 次提交
  19. 01 1月, 2006 1 次提交
  20. 31 12月, 2005 2 次提交
    • Y
      [PATCH] Fix false old value return of sysctl · 82c9df82
      Yi Yang 提交于
      For the sysctl syscall, if the user wants to get the old value of a
      sysctl entry and set a new value for it in the same syscall, the old
      value is always overwritten by the new value if the sysctl entry is of
      string type and if the user sets its strategy to sysctl_string.  This
      issue lies in the strategy being run twice if the strategy is set to
      sysctl_string, the general strategy sysctl_string always returns 0 if
      success.
      
      Such strategy routines as sysctl_jiffies and sysctl_jiffies_ms return 1
      because they do read and write for the sysctl entry.
      
      The strategy routine sysctl_string return 0 although it actually read
      and write the sysctl entry.
      
      According to my analysis, if a strategy routine do read and write, it
      should return 1, if it just does some necessary check but not read and
      write, it should return 0, for example sysctl_intvec.
      Signed-off-by: NYi Yang <yang.y.yi@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      82c9df82
    • L
      sysctl: don't overflow the user-supplied buffer with '\0' · 8febdd85
      Linus Torvalds 提交于
      If the string was too long to fit in the user-supplied buffer,
      the sysctl layer would zero-terminate it by writing past the
      end of the buffer. Don't do that.
      
      Noticed by Yi Yang <yang.y.yi@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8febdd85
  21. 09 11月, 2005 1 次提交
    • A
      [PATCH] Fix sysctl unregistration oops (CVE-2005-2709) · 330d57fb
      Al Viro 提交于
      You could open the /proc/sys/net/ipv4/conf/<if>/<whatever> file, then
      wait for interface to go away, try to grab as much memory as possible in
      hope to hit the (kfreed) ctl_table.  Then fill it with pointers to your
      function.  Then do read from file you've opened and if you are lucky,
      you'll get it called as ->proc_handler() in kernel mode.
      
      So this is at least an Oops and possibly more.  It does depend on an
      interface going away though, so less of a security risk than it would
      otherwise be.
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      330d57fb
  22. 07 11月, 2005 2 次提交
    • R
      [PATCH] more kernel-doc cleanups, additions · 1e5d5331
      Randy Dunlap 提交于
      Various core kernel-doc cleanups:
      - add missing function parameters in ipc, irq/manage, kernel/sys,
        kernel/sysctl, and mm/slab;
      - move description to just above function for kernel_restart()
      Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1e5d5331
    • Z
      [PATCH] aio: remove aio_max_nr accounting race · d55b5fda
      Zach Brown 提交于
      AIO was adding a new context's max requests to the global total before
      testing if that resulting total was over the global limit.  This let
      innocent tasks get their new limit tested along with a racing guilty task
      that was crossing the limit.  This serializes the _nr accounting with a
      spinlock It also switches to using unsigned long for the global totals.
      Individual contexts are still limited to an unsigned int's worth of
      requests by the syscall interface.
      
      The problem and fix were verified with a simple program that spun creating
      and destroying a context while holding on to another long lived context.
      Before the patch a task creating a tiny context could get a spurious EAGAIN
      if it raced with a task creating a very large context that overran the
      limit.
      Signed-off-by: NZach Brown <zach.brown@oracle.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d55b5fda
  23. 30 8月, 2005 1 次提交
  24. 28 7月, 2005 1 次提交
    • M
      [PATCH] s390: spin lock retry · 951f22d5
      Martin Schwidefsky 提交于
      Split spin lock and r/w lock implementation into a single try which is done
      inline and an out of line function that repeatedly tries to get the lock
      before doing the cpu_relax().  Add a system control to set the number of
      retries before a cpu is yielded.
      
      The reason for the spin lock retry is that the diagnose 0x44 that is used to
      give up the virtual cpu is quite expensive.  For spin locks that are held only
      for a short period of time the costs of the diagnoses outweights the savings
      for spin locks that are held for a longer timer.  The default retry count is
      1000.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      951f22d5
  25. 14 7月, 2005 1 次提交
  26. 13 7月, 2005 1 次提交
    • R
      [PATCH] inotify · 0eeca283
      Robert Love 提交于
      inotify is intended to correct the deficiencies of dnotify, particularly
      its inability to scale and its terrible user interface:
      
              * dnotify requires the opening of one fd per each directory
                that you intend to watch. This quickly results in too many
                open files and pins removable media, preventing unmount.
              * dnotify is directory-based. You only learn about changes to
                directories. Sure, a change to a file in a directory affects
                the directory, but you are then forced to keep a cache of
                stat structures.
              * dnotify's interface to user-space is awful.  Signals?
      
      inotify provides a more usable, simple, powerful solution to file change
      notification:
      
              * inotify's interface is a system call that returns a fd, not SIGIO.
      	  You get a single fd, which is select()-able.
              * inotify has an event that says "the filesystem that the item
                you were watching is on was unmounted."
              * inotify can watch directories or files.
      
      Inotify is currently used by Beagle (a desktop search infrastructure),
      Gamin (a FAM replacement), and other projects.
      
      See Documentation/filesystems/inotify.txt.
      Signed-off-by: NRobert Love <rml@novell.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0eeca283
  27. 26 6月, 2005 1 次提交
  28. 24 6月, 2005 1 次提交
    • A
      [PATCH] setuid core dump · d6e71144
      Alan Cox 提交于
      Add a new `suid_dumpable' sysctl:
      
      This value can be used to query and set the core dump mode for setuid
      or otherwise protected/tainted binaries. The modes are
      
      0 - (default) - traditional behaviour.  Any process which has changed
          privilege levels or is execute only will not be dumped
      
      1 - (debug) - all processes dump core when possible.  The core dump is
          owned by the current user and no security is applied.  This is intended
          for system debugging situations only.  Ptrace is unchecked.
      
      2 - (suidsafe) - any binary which normally would not be dumped is dumped
          readable by root only.  This allows the end user to remove such a dump but
          not access it directly.  For security reasons core dumps in this mode will
          not overwrite one another or other files.  This mode is appropriate when
          adminstrators are attempting to debug problems in a normal environment.
      
      (akpm:
      
      > > +EXPORT_SYMBOL(suid_dumpable);
      >
      > EXPORT_SYMBOL_GPL?
      
      No problem to me.
      
      > >  	if (current->euid == current->uid && current->egid == current->gid)
      > >  		current->mm->dumpable = 1;
      >
      > Should this be SUID_DUMP_USER?
      
      Actually the feedback I had from last time was that the SUID_ defines
      should go because its clearer to follow the numbers. They can go
      everywhere (and there are lots of places where dumpable is tested/used
      as a bool in untouched code)
      
      > Maybe this should be renamed to `dump_policy' or something.  Doing that
      > would help us catch any code which isn't using the #defines, too.
      
      Fair comment. The patch was designed to be easy to maintain for Red Hat
      rather than for merging. Changing that field would create a gigantic
      diff because it is used all over the place.
      
      )
      Signed-off-by: NAlan Cox <alan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d6e71144
  29. 01 5月, 2005 1 次提交
  30. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4