1. 15 2月, 2008 7 次提交
  2. 14 2月, 2008 8 次提交
  3. 13 2月, 2008 2 次提交
  4. 12 2月, 2008 2 次提交
    • K
      mempolicy: silently restrict nodemask to allowed nodes · 31f1de46
      KOSAKI Motohiro 提交于
      Kosaki Motohito noted that "numactl --interleave=all ..." failed in the
      presence of memoryless nodes.  This patch attempts to fix that problem.
      
      Some background:
      
      numactl --interleave=all calls set_mempolicy(2) with a fully populated
      [out to MAXNUMNODES] nodemask.  set_mempolicy() [in do_set_mempolicy()]
      calls contextualize_policy() which requires that the nodemask be a
      subset of the current task's mems_allowed; else EINVAL will be returned.
      
      A task's mems_allowed will always be a subset of node_states[N_HIGH_MEMORY]
      i.e., nodes with memory.  So, a fully populated nodemask will be
      declared invalid if it includes memoryless nodes.
      
        NOTE:  the same thing will occur when running in a cpuset
               with restricted mem_allowed--for the same reason:
               node mask contains dis-allowed nodes.
      
      mbind(2), on the other hand, just masks off any nodes in the nodemask
      that are not included in the caller's mems_allowed.
      
      In each case [mbind() and set_mempolicy()], mpol_check_policy() will
      complain [again, resulting in EINVAL] if the nodemask contains any
      memoryless nodes.  This is somewhat redundant as mpol_new() will remove
      memoryless nodes for interleave policy, as will bind_zonelist()--called
      by mpol_new() for BIND policy.
      
      Proposed fix:
      
      1) modify contextualize_policy logic to:
         a) remember whether the incoming node mask is empty.
         b) if not, restrict the nodemask to allowed nodes, as is
            currently done in-line for mbind().  This guarantees
            that the resulting mask includes only nodes with memory.
      
            NOTE:  this is a [benign, IMO] change in behavior for
                   set_mempolicy().  Dis-allowed nodes will be
                   silently ignored, rather than returning an error.
      
         c) fold this code into mpol_check_policy(), replace 2 calls to
            contextualize_policy() to call mpol_check_policy() directly
            and remove contextualize_policy().
      
      2) In existing mpol_check_policy() logic, after "contextualization":
         a) MPOL_DEFAULT:  require that in coming mask "was_empty"
         b) MPOL_{BIND|INTERLEAVE}:  require that contextualized nodemask
            contains at least one node.
         c) add a case for MPOL_PREFERRED:  if in coming was not empty
            and resulting mask IS empty, user specified invalid nodes.
            Return EINVAL.
         c) remove the now redundant check for memoryless nodes
      
      3) remove the now redundant masking of policy nodes for interleave
         policy from mpol_new().
      
      4) Now that mpol_check_policy() contextualizes the nodemask, remove
         the in-line nodes_and() from sys_mbind().  I believe that this
         restores mbind() to the behavior before the memoryless-nodes
         patch series.  E.g., we'll no longer treat an invalid nodemask
         with MPOL_PREFERRED as local allocation.
      
      [ Patch history:
      
        v1 -> v2:
         - Communicate whether or not incoming node mask was empty to
           mpol_check_policy() for better error checking.
         - As suggested by David Rientjes, remove the now unused
           cpuset_nodes_subset_current_mems_allowed() from cpuset.h
      
        v2 -> v3:
         - As suggested by Kosaki Motohito, fold the "contextualization"
           of policy nodemask into mpol_check_policy().  Looks a little
           cleaner. ]
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Tested-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31f1de46
    • A
      Prevent IDE boot ops on NUMA system · 1f07e988
      Andi Kleen 提交于
      Without this patch a Opteron test system here oopses at boot with
      current git.
      
      Calling to_pci_dev() on a NULL pointer gives a negative value so the
      following NULL pointer check never triggers and then an illegal address
      is referenced.  Check the unadjusted original device pointer for NULL
      instead.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f07e988
  5. 11 2月, 2008 4 次提交
    • B
      ide-disk: fix flush requests (take 2) · 395d8ef5
      Bartlomiej Zolnierkiewicz 提交于
      commit 813a0eb2
      Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Date:   Fri Jan 25 22:17:10 2008 +0100
      
          ide: switch idedisk_prepare_flush() to use REQ_TYPE_ATA_TASKFILE requests
      
      ...
      
      broke flush requests.
      
      Allocating IDE command structure on the stack for flush requests is not
      a very brilliant idea:
      
      - idedisk_prepare_flush() only prepares the request and it doesn't wait
        for it to be completed
      
      - there are can be multiple flush requests queued in the queue
      
      Fix the problem (per hints from James Bottomley) by:
      - dynamically allocating ide_task_t instance using kmalloc(..., GFP_ATOMIC)
      - adding new taskfile flag (IDE_TFLAG_DYN)
      - calling kfree() in ide_end_drive_command() if IDE_TFLAG_DYN is set
        (while at it rename 'args' to 'task' and fix whitespace damage)
      
      [ This will be fixed properly before 2.6.25 but this bug is rather
        critical and the proper solution requires some more work + testing. ]
      
      Thanks to Sebastian Siewior and Christoph Hellwig for reporting the
      problem and testing patches (extra thanks to Sebastian for bisecting
      it to the guilty commmit).
      Tested-by: NSebastian Siewior <ide-bug@ml.breakpoint.cc>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      395d8ef5
    • S
      ide: introduce CONFIG_BLK_DEV_IDEDMA_SFF option · 8e882ba1
      Sergei Shtylyov 提交于
      Introduce new option CONFIG_BLK_DEV_IDEDMA_SFF for non-PCI SFF-8038i compatible
      bus mastering IDE controllers (which there are a few known), thus fixing a hack
      made for Palmchip BK3710 controller...
      Signed-off-by: NSergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Anton Salnikov <asalnikov@ru.mvista.com>
      Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      8e882ba1
    • J
      nfsd: clean up svc_reserve_auth() · fbb7878c
      J. Bruce Fields 提交于
      This is a void function attempting to return the return value from
      another void function, which seems harmless but extremely weird, and
      apparently makes some compilers complain.
      
      While we're there, clean up a little (e.g. the switch statement had a
      minor style problem and seemed overkill as long as there's only one
      case).
      
      Thanks to Trond for noticing this.
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      fbb7878c
    • M
      Change pci_raw_ops to pci_raw_read/write · b6ce068a
      Matthew Wilcox 提交于
      We want to allow different implementations of pci_raw_ops for standard
      and extended config space on x86.  Rather than clutter generic code with
      knowledge of this, we make pci_raw_ops private to x86 and use it to
      implement the new raw interface -- raw_pci_read() and raw_pci_write().
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6ce068a
  6. 10 2月, 2008 7 次提交
  7. 09 2月, 2008 10 次提交
    • L
      ACPI: thermal: buildfix for CONFIG_THERMAL=n · a0dd25b2
      Len Brown 提交于
      This fixes the build, but acpi_fan_add() still needs
      to be updated to handle thermal_cooling_device_register()
      returning NULL as a non-fatal condition.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      a0dd25b2
    • K
      DCA: convert struct class_device to struct device. · 765cdb6c
      Kay Sievers 提交于
      Thanks to Kay for keeping us honest.
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
      Cc: "Williams, Dan J" <dan.j.williams@intel.com>
      Acked-by: NGreg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      765cdb6c
    • J
      IB/mlx4: Use multiple WQ blocks to post smaller send WQEs · ea54b10c
      Jack Morgenstein 提交于
      ConnectX HCA supports shrinking WQEs, so that a single work request
      can be made of multiple units of wqe_shift.  This way, WRs can differ
      in size, and do not have to be a power of 2 in size, saving memory and
      speeding up send WR posting.  Unfortunately, if we do this then the
      wqe_index field in CQEs can't be used to look up the WR ID anymore, so
      our implementation does this only if selective signaling is off.
      
      Further, on 32-bit platforms, we can't use vmap() to make the QP
      buffer virtually contigious. Thus we have to use constant-sized WRs to
      make sure a WR is always fully within a single page-sized chunk.
      
      Finally, we use WRs with the NOP opcode to avoid wrapping around the
      queue buffer in the middle of posting a WR, and we set the
      NoErrorCompletion bit to avoid getting completions with error for NOP
      WRs.  However, NEC is only supported starting with firmware 2.2.232,
      so we use constant-sized WRs for older firmware.  And, since MLX QPs
      only support SEND, we use constant-sized WRs in this case.
      
      When stamping during NOP posting, do stamping following setting of the
      NOP WQE valid bit.
      Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      ea54b10c
    • M
      CONFIG_HIGHPTE vs. sub-page page tables. · 2f569afd
      Martin Schwidefsky 提交于
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of them is dirty and reference bit tracking.
      To avoid wasting memory the standard pte table allocation should return
      1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
      
      Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
      the s390 version for pte_alloc_one cannot return a pointer to a struct
      page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
      cannot return a pointer to a pte either, since that would require more than
      32 bit for the return value of pte_alloc_one (and the pte * would not be
      accessible since its not kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
       To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f569afd
    • R
      IRQ_NOPROBE helper functions · 46f4f8f6
      Ralf Baechle 提交于
      Probing non-ISA interrupts using the handle_percpu_irq as their handle_irq
      method may crash the system because handle_percpu_irq does not check
      IRQ_WAITING.  This for example hits the MIPS Qemu configuration.
      
      This patch provides two helper functions set_irq_noprobe and set_irq_probe to
      set rsp.  clear the IRQ_NOPROBE flag.  The only current caller is MIPS code
      but this really belongs into generic code.
      
      As an aside, interrupt probing these days has become a mostly obsolete if not
      dangerous art.  I think Linux interrupts should be changed to default to
      non-probing but that's subject of this patch.
      Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-and-tested-by: NRob Landley <rob@landley.net>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      46f4f8f6
    • D
      fs/char_dev.c: chrdev_open marked static and removed from fs.h · 922f9cfa
      Denis Cheng 提交于
      There is an outdated comment in serial_core.c also fixed.
      Signed-off-by: NDenis Cheng <crquan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      922f9cfa
    • P
      preemptible RCU: sparse annotations · b55ab616
      Patrick McHardy 提交于
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b55ab616
    • Y
      Add new string functions strict_strto* and convert kernel params to use them · 06b2a76d
      Yi Yang 提交于
      Currently, for every sysfs node, the callers will be responsible for
      implementing store operation, so many many callers are doing duplicate
      things to validate input, they have the same mistakes because they are
      calling simple_strtol/ul/ll/uul, especially for module params, they are
      just numeric, but you can echo such values as 0x1234xxx, 07777888 and
      1234aaa, for these cases, module params store operation just ignores
      succesive invalid char and converts prefix part to a numeric although input
      is acctually invalid.
      
      This patch tries to fix the aforementioned issues and implements
      strict_strtox serial functions, kernel/params.c uses them to strictly
      validate input, so module params will reject such values as 0x1234xxxx and
      returns an error:
      
      write error: Invalid argument
      
      Any modules which export numeric sysfs node can use strict_strtox instead of
      simple_strtox to reject any invalid input.
      
      Here are some test results:
      
      Before applying this patch:
      
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]#
      
      After applying this patch:
      
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
      -bash: echo: write error: Invalid argument
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
      -bash: echo: write error: Invalid argument
      [root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
      [root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
      -bash: echo: write error: Invalid argument
      [root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
      -bash: echo: write error: Invalid argument
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
      [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
      4096
      [root@yangyi-dev /]#
      
      [akpm@linux-foundation.org: fix compiler warnings]
      [akpm@linux-foundation.org: fix off-by-one found by tiwai@suse.de]
      Signed-off-by: NYi Yang <yi.y.yang@intel.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      06b2a76d
    • M
      use __u32 in linux/reiserfs_fs.h · 13d8bcd2
      Mike Frysinger 提交于
      Since this header is exported to userspace and all the other types in the
      header have been scrubbed, this brings the last straggler in line.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      13d8bcd2
    • P
      NBD: remove limit on max number of nbd devices · 20a8143e
      Paul Clements 提交于
      Remove the arbitrary 128 device limit for NBD.  nbds_max can now be set to
      any number.  In certain scenarios where devices are used sparsely we have
      run into the 128 device limit.
      Signed-off-by: NPaul Clements <paul.clements@steeleye.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20a8143e