1. 15 Feb, 2008 13 commits
  2. 14 Feb, 2008 9 commits
  3. 13 Feb, 2008 3 commits
  4. 12 Feb, 2008 5 commits
    • mempolicy: silently restrict nodemask to allowed nodes · 31f1de46
      KOSAKI Motohiro committed
      Kosaki Motohiro noted that "numactl --interleave=all ..." failed in the
      presence of memoryless nodes.  This patch attempts to fix that problem.
      
      Some background:
      
      numactl --interleave=all calls set_mempolicy(2) with a fully populated
      [out to MAXNUMNODES] nodemask.  set_mempolicy() [in do_set_mempolicy()]
      calls contextualize_policy() which requires that the nodemask be a
      subset of the current task's mems_allowed; else EINVAL will be returned.
      
      A task's mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]
      i.e., nodes with memory.  So, a fully populated nodemask will be
      declared invalid if it includes memoryless nodes.
      
        NOTE:  the same thing will occur when running in a cpuset
               with restricted mems_allowed--for the same reason:
               node mask contains dis-allowed nodes.
      
      mbind(2), on the other hand, just masks off any nodes in the nodemask
      that are not included in the caller's mems_allowed.
      
      In each case [mbind() and set_mempolicy()], mpol_check_policy() will
      complain [again, resulting in EINVAL] if the nodemask contains any
      memoryless nodes.  This is somewhat redundant as mpol_new() will remove
      memoryless nodes for interleave policy, as will bind_zonelist()--called
      by mpol_new() for BIND policy.
      
      Proposed fix:
      
      1) modify contextualize_policy logic to:
         a) remember whether the incoming node mask is empty.
         b) if not, restrict the nodemask to allowed nodes, as is
            currently done in-line for mbind().  This guarantees
            that the resulting mask includes only nodes with memory.
      
            NOTE:  this is a [benign, IMO] change in behavior for
                   set_mempolicy().  Dis-allowed nodes will be
                   silently ignored, rather than returning an error.
      
         c) fold this code into mpol_check_policy(), replace the 2 calls to
            contextualize_policy() with direct calls to mpol_check_policy(),
            and remove contextualize_policy().
      
      2) In existing mpol_check_policy() logic, after "contextualization":
         a) MPOL_DEFAULT:  require that the incoming mask "was_empty"
         b) MPOL_{BIND|INTERLEAVE}:  require that the contextualized nodemask
            contains at least one node.
         c) add a case for MPOL_PREFERRED:  if the incoming mask was not empty
            and the resulting mask IS empty, the user specified invalid nodes.
            Return EINVAL.
         d) remove the now redundant check for memoryless nodes
      
      3) remove the now redundant masking of policy nodes for interleave
         policy from mpol_new().
      
      4) Now that mpol_check_policy() contextualizes the nodemask, remove
         the in-line nodes_and() from sys_mbind().  I believe that this
         restores mbind() to the behavior before the memoryless-nodes
         patch series.  E.g., we'll no longer treat an invalid nodemask
         with MPOL_PREFERRED as local allocation.
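
The checking order in steps 1 and 2 above can be sketched in user-space C. This is a simplified illustration under stated assumptions, not the kernel code: a nodemask is modeled as a 64-bit integer (bit i stands for node i), and the names check_policy, POL_* and the mems_allowed parameter are hypothetical stand-ins invented for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's MPOL_* modes. */
enum policy_mode { POL_DEFAULT, POL_PREFERRED, POL_BIND, POL_INTERLEAVE };

/*
 * Model of the reworked check: contextualize the mask first, then
 * validate it per mode. Returns 0 on success or -EINVAL, and narrows
 * *nodes in place.
 */
int check_policy(enum policy_mode mode, uint64_t *nodes, uint64_t mems_allowed)
{
    int was_empty = (*nodes == 0);   /* 1a: remember before masking */

    if (!was_empty)
        *nodes &= mems_allowed;      /* 1b: silently drop disallowed nodes */

    switch (mode) {
    case POL_DEFAULT:
        /* 2a: the default policy must be given an empty mask */
        return was_empty ? 0 : -EINVAL;
    case POL_BIND:
    case POL_INTERLEAVE:
        /* 2b: at least one allowed node must remain */
        return *nodes ? 0 : -EINVAL;
    case POL_PREFERRED:
        /* 2c: an empty request passes this check; a request that
         * named only disallowed nodes is an error */
        return (!was_empty && *nodes == 0) ? -EINVAL : 0;
    }
    return -EINVAL;
}
```

With every bit set in the request and only nodes 0 and 2 allowed (mask 0x5), an interleave request succeeds in this model and is narrowed to 0x5, which mirrors the "numactl --interleave=all" case from the report.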
      
      [ Patch history:
      
        v1 -> v2:
         - Communicate whether or not incoming node mask was empty to
           mpol_check_policy() for better error checking.
         - As suggested by David Rientjes, remove the now unused
           cpuset_nodes_subset_current_mems_allowed() from cpuset.h
      
        v2 -> v3:
         - As suggested by Kosaki Motohiro, fold the "contextualization"
           of policy nodemask into mpol_check_policy().  Looks a little
           cleaner. ]
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Make topology fallback macros reference their arguments. · 271cad6d
      Andi Kleen committed
      This avoids warnings with unreferenced variables in the !NUMA case.
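
A minimal sketch of the idea, assuming a build without NUMA support. The macro names here are hypothetical and are not taken from the patch; the point is only that a fallback macro which drops its argument makes a variable passed to it look unused, while one that evaluates and discards the argument does not.

```c
/* Hypothetical fallback macros for a build without NUMA support. */

/* Before: the argument vanishes, so a variable used only here
 * looks unused to the compiler. */
#define node_of_cpu_old(cpu)  (0)

/* After: evaluate the argument, discard it, and yield the constant. */
#define node_of_cpu(cpu)      ((void)(cpu), 0)

int first_node(void)
{
    int cpu = 3;              /* only ever passed to the macro */
    return node_of_cpu(cpu);  /* counts as a use of 'cpu' */
}
```

Swapping node_of_cpu_old() into first_node() and building with warnings enabled would typically flag 'cpu' as unused.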
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [IA64] Fix build for sim_defconfig · 10d0aa3c
      Tony Luck committed
      Commit bdc80787 broke the build
      for this config because the sim_defconfig selects CONFIG_HZ=250
      but include/asm-ia64/param.h has an ifdef for the simulator to
      force HZ to 32.  So we ended up with a kernel/timeconst.h set
      for HZ=250 ... which then failed the check for the right HZ
      value and died with:
      
      Drop the #ifdef magic from param.h and force CONFIG_HZ=32
      directly for the simulator.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • [SCSI] update SG_ALL to avoid causing chaining · 4660c8ed
      James Bottomley committed
      Since the sg chaining patches went in, our current value of 255 for
      SG_ALL excites chaining on some drivers which cannot support it (and
      would thus oops).  Redefine SG_ALL to mean no sg table size
      preference, but use the single allocation (non chained) limit.  This
      also helps for drivers that use it to size an internal table.
      
      Later we'll add an opt-in system where drivers that truly support
      chaining can define their sg_tablesize to be anything up to
      SCSI_MAX_SG_CHAIN_ELEMENTS.
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • Prevent IDE boot oops on NUMA system · 1f07e988
      Andi Kleen committed
      Without this patch an Opteron test system here oopses at boot with
      current git.
      
      Calling to_pci_dev() on a NULL pointer gives a negative value so the
      following NULL pointer check never triggers and then an illegal address
      is referenced.  Check the unadjusted original device pointer for NULL
      instead.
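
The failure mode can be illustrated outside the kernel. In this sketch the struct layout and function names are hypothetical, and the container_of()-style conversion is modeled with integer arithmetic so that the example never performs pointer arithmetic on NULL. It assumes, as on common platforms, that a NULL pointer converts to the integer 0.

```c
#include <stddef.h>
#include <stdint.h>

struct device { int id; };

/* Hypothetical wrapper: the generic device is not the first member. */
struct pci_like_dev {
    char header[64];
    struct device dev;
};

/* Integer model of container_of(): subtract the member offset. */
uintptr_t outer_addr(uintptr_t member_addr)
{
    return member_addr - offsetof(struct pci_like_dev, dev);
}

/* Buggy order: convert first, then test the converted value.
 * For a NULL input the result is "0 minus the offset", never 0. */
int has_device_buggy(const struct device *dev)
{
    return outer_addr((uintptr_t)dev) != 0;
}

/* Fixed order: test the original pointer before converting. */
int has_device_fixed(const struct device *dev)
{
    return dev != NULL;
}
```

Passing NULL to has_device_buggy() reports a device as present because the adjusted address is nonzero, which is the check that "never triggers"; has_device_fixed() tests the unadjusted pointer instead.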
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 11 Feb, 2008 4 commits
    • ide-disk: fix flush requests (take 2) · 395d8ef5
      Bartlomiej Zolnierkiewicz committed
      commit 813a0eb2
      Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Date:   Fri Jan 25 22:17:10 2008 +0100
      
          ide: switch idedisk_prepare_flush() to use REQ_TYPE_ATA_TASKFILE requests
      
      ...
      
      broke flush requests.
      
      Allocating IDE command structure on the stack for flush requests is not
      a very brilliant idea:
      
      - idedisk_prepare_flush() only prepares the request and it doesn't wait
        for it to be completed
      
      - there can be multiple flush requests queued in the queue
      
      Fix the problem (per hints from James Bottomley) by:
      - dynamically allocating ide_task_t instance using kmalloc(..., GFP_ATOMIC)
      - adding new taskfile flag (IDE_TFLAG_DYN)
      - calling kfree() in ide_end_drive_command() if IDE_TFLAG_DYN is set
        (while at it rename 'args' to 'task' and fix whitespace damage)
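
The ownership pattern in the three bullets above can be sketched in user-space C. This is an illustration only: malloc()/free() stand in for kmalloc()/kfree(), and struct task, TFLAG_DYN, prepare_flush() and end_command() are hypothetical names modeled on the description, not the driver's actual definitions.

```c
#include <stdlib.h>

#define TFLAG_DYN 0x1u   /* hypothetical: "this task was heap-allocated" */

struct task {
    unsigned int flags;
    int command;
};

static int tasks_freed;  /* counts frees, for demonstration only */

/* Prepare a request whose task must outlive this function, so it
 * cannot live on this function's stack. */
struct task *prepare_flush(int command)
{
    struct task *t = malloc(sizeof(*t));

    if (!t)
        return NULL;
    t->flags = TFLAG_DYN;
    t->command = command;
    return t;            /* ownership passes to the request queue */
}

/* Completion path: release only what was dynamically allocated. */
void end_command(struct task *t)
{
    if (t->flags & TFLAG_DYN) {
        free(t);
        tasks_freed++;
    }
}

int freed_count(void)
{
    return tasks_freed;
}
```

Because each prepared request owns its own allocation, several flush requests can sit in the queue at once, and tasks without the flag (for example, ones owned by a caller that waits for completion) are left alone by the completion path.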
      
      [ This will be fixed properly before 2.6.25 but this bug is rather
        critical and the proper solution requires some more work + testing. ]
      
      Thanks to Sebastian Siewior and Christoph Hellwig for reporting the
      problem and testing patches (extra thanks to Sebastian for bisecting
      it to the guilty commit).
      Tested-by: Sebastian Siewior <ide-bug@ml.breakpoint.cc>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
    • ide: introduce CONFIG_BLK_DEV_IDEDMA_SFF option · 8e882ba1
      Sergei Shtylyov committed
      Introduce a new option, CONFIG_BLK_DEV_IDEDMA_SFF, for non-PCI SFF-8038i compatible
      bus mastering IDE controllers (of which a few are known), thus fixing a hack
      made for the Palmchip BK3710 controller...
      Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Anton Salnikov <asalnikov@ru.mvista.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
    • nfsd: clean up svc_reserve_auth() · fbb7878c
      J. Bruce Fields committed
      This is a void function attempting to return the return value from
      another void function, which seems harmless but extremely weird, and
      apparently makes some compilers complain.
      
      While we're there, clean up a little (e.g. the switch statement had a
      minor style problem and seemed overkill as long as there's only one
      case).
      
      Thanks to Trond for noticing this.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
    • Change pci_raw_ops to pci_raw_read/write · b6ce068a
      Matthew Wilcox committed
      We want to allow different implementations of pci_raw_ops for standard
      and extended config space on x86.  Rather than clutter generic code with
      knowledge of this, we make pci_raw_ops private to x86 and use it to
      implement the new raw interface -- raw_pci_read() and raw_pci_write().
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 10 Feb, 2008 6 commits
    • hrtimer: fix *rmtp handling in hrtimer_nanosleep() · 080344b9
      Oleg Nesterov committed
      Spotted by Pavel Emelyanov and Alexey Dobriyan.
      
      hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
      the local variable which lives in the caller's stack frame. This means that
      if sys_restart_syscall() actually happens and it is interrupted as well, we
      don't update the user-space variable, but write into the already dead stack
      frame.
      
      Introduced by commit 04c22714
      hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier
      
      Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
      hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.
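
The shape of the fix can be sketched in user-space C. All names here are hypothetical: struct restart_state models the restart block, and report_remaining() stands in for the copy_to_user() step. The point of the sketch is that the saved pointer refers to caller-owned storage, and the remaining time is copied out by value from a local that is never referenced after the function returns.

```c
#include <stddef.h>

struct ts { long sec; long nsec; };

/* Hypothetical restart state: holds the caller-owned destination. */
struct restart_state { struct ts *rmtp; };

/* Stand-in for the copy_to_user() step: write the remaining time
 * to dst, if the caller asked for it. */
static int report_remaining(struct ts *dst, long remaining_ns)
{
    struct ts rem = { remaining_ns / 1000000000L,
                      remaining_ns % 1000000000L };

    if (dst == NULL)
        return 0;       /* caller did not ask for the remaining time */
    *dst = rem;         /* copy the value out; &rem is never saved */
    return 0;
}

/* First interruption: save the caller's pointer, not a local's. */
int sleep_interrupted(struct restart_state *rs, struct ts *user_rmtp,
                      long remaining_ns)
{
    rs->rmtp = user_rmtp;
    return report_remaining(user_rmtp, remaining_ns);
}

/* Restarted sleep interrupted again: the saved pointer is still valid
 * because it never referred to a dead stack frame. */
int restart_interrupted(struct restart_state *rs, long remaining_ns)
{
    return report_remaining(rs->rmtp, remaining_ns);
}
```

In this model a second interruption updates the same caller-owned variable that the first one did, which is the behavior the patch restores.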
      
      A small problem remains: man 2 nanosleep states that *rmtp should be written if
      nanosleep() was interrupted (it says nothing whether it is OK to update *rmtp
      if nanosleep returns 0), but (with or without this patch) we can dirty *rem
      even if nanosleep() returns 0.
      
      NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
      bugs. Fixed by the next patch.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Pavel Emelyanov <xemul@sw.ru>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Toyo Abe <toyoa@mvista.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
       include/linux/hrtimer.h |    2 -
       kernel/hrtimer.c        |   51 +++++++++++++++++++++++++-----------------------
       kernel/posix-timers.c   |   14 +------------
       3 files changed, 30 insertions(+), 37 deletions(-)
    • ntp: correct inconsistent interval/tick_length usage · e13a2e61
      john stultz committed
      clocksource initialization and error accumulation.  This corrects a 280ppm
      drift seen on some systems using acpi_pm, and affects other clocksources as
      well (likely to a lesser degree).
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • [SPARC]: Merge asm-sparc{,64}/a.out.h · 344e53f5
      David S. Miller committed
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ext4: Add new "development flag" to the ext4 filesystem · 469108ff
      Theodore Tso committed
      This flag is simply a generic "this is a crash/burn test filesystem"
      marker.  If it is set, then filesystem code which is "in development"
      will be allowed to mount the filesystem.  Filesystem code which is not
      considered ready for prime-time will check for this flag, and if it is
      not set, it will refuse to touch the filesystem.
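
The gate described above can be sketched as a simple check at mount time. This is an illustration under stated assumptions: the flag name and bit value, the superblock layout, and the choice of -EINVAL as the refusal code are all hypothetical and are not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

#define FLAG_TEST_FILESYS 0x0004u  /* hypothetical name and bit value */

/* Hypothetical, pared-down superblock carrying only the flags word. */
struct superblock_like { uint32_t flags; };

/*
 * Code that is still "in development" may only touch filesystems that
 * were explicitly marked as crash/burn test filesystems.
 * Returns 0 if the mount may proceed, or a negative errno otherwise.
 */
int mount_check(const struct superblock_like *sb, int code_in_development)
{
    if (code_in_development && !(sb->flags & FLAG_TEST_FILESYS))
        return -EINVAL;   /* assumed error code: refuse to touch it */
    return 0;
}
```

Code that is considered production-ready passes code_in_development = 0 in this model and is unaffected by the flag.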
      
      As we start rolling ext4 out to distros like Fedora, et al., this makes
      it less likely that a user might accidentally start using ext4 on a
      production filesystem; a bad thing, since that will essentially make it
      unfsckable until e2fsprogs catches up.
      Signed-off-by: Theodore Tso <tytso@MIT.EDU>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    • x86: introduce page pool in cpa · 76ebd054
      Thomas Gleixner committed
      DEBUG_PAGEALLOC was not possible on 64-bit due to its early-bootup
      hardcoded reliance on PSE pages, and the unrobustness of the runtime
      splitup of large pages. The splitup ended in recursive calls to
      alloc_pages() when a page for a pte split was requested.
      
      Avoid the recursion with a preallocated page pool, which is used to
      split up large mappings and gets refilled in the return path of
      kernel_map_pages after the split has been done. The size of the page
      pool is adjusted to the available memory.
      
      This part just implements the page pool and the initialization w/o
      using it yet.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: construct 32-bit boot time page tables in native format. · 551889a6
      Ian Campbell committed
      Specifically the boot time page tables in a CONFIG_X86_PAE=y enabled
      kernel are in PAE format.
      
      early_ioremap is updated to use the standard page table accessors.
      
      Clear any mappings beyond max_low_pfn from the boot page tables in
      native_pagetable_setup_start because the initial mappings can extend
      beyond the range of physical memory and into the vmalloc area.
      
      Derived from patches by Eric Biederman and H. Peter Anvin.
      
      [ jeremy@goop.org: PAE swapper_pg_dir needs to be page-sized fix ]
      Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>