1. 12 Mar, 2006 5 commits
  2. 10 Mar, 2006 2 commits
    • [PATCH] slab: Node rotor for freeing alien caches and remote per cpu pages. · 8fce4d8e
      Christoph Lameter committed
      The cache reaper currently tries to free all alien caches and all remote
      per cpu pages in each pass of cache_reap.  For machines with a large number
      of nodes (such as Altix) this may lead to sporadic delays of around 10ms.
      Interrupts are disabled while reclaiming, creating unacceptable delays.
      
      This patch changes that behavior by adding a per cpu reap_node variable.
      Instead of attempting to free all caches, we free only one alien cache and
      the per cpu pages from one remote node.  That reduces the time spent in
      cache_reap.  However, doing so will lengthen the time it takes to
      completely drain all remote per cpu pagesets and all alien caches.  The
      time needed will grow with the number of nodes in the system.  All caches
      are drained when they overflow their respective capacity.  So the drawback
      here is only that a bit of memory may be wasted for a while longer.
      
      Details:
      
      1. Rename drain_remote_pages to drain_node_pages to allow the specification
         of the node to drain of pcp pages.
      
      2. Add additional functions init_reap_node, next_reap_node for NUMA
         that manage a per cpu reap_node counter.
      
      3. Add a reap_alien function that reaps only from the current reap_node.
      
      For us this seems to be a critical issue.  Holdoffs of an average of ~7ms
      cause some HPC benchmarks to slow down significantly.  For example, NAS
      parallel slows down dramatically: it has a 12-16 second runtime without the
      rotor compared to 5.8 secs with the rotor patches.  It gets down to 5.05
      secs with the additional interrupt holdoff reductions.
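
The rotor idea can be sketched in plain userspace C. This is only an illustration under stated assumptions: MAX_NODES, MAX_CPUS, the array-based per cpu storage and cache_reap_pass are hypothetical, and the actual draining calls are left as a comment. Only the names reap_node, init_reap_node and next_reap_node come from the message above.

```c
#define MAX_NODES 8 /* hypothetical node count for this sketch */
#define MAX_CPUS  4 /* hypothetical CPU count for this sketch */

/* One rotor per CPU: the node this CPU will look at in its next pass. */
static int reap_node[MAX_CPUS];

/* Start the rotor at the node following the CPU's home node. */
static void init_reap_node(int cpu, int home_node)
{
    reap_node[cpu] = (home_node + 1) % MAX_NODES;
}

/* Advance the rotor by one node, wrapping around at the end. */
static void next_reap_node(int cpu)
{
    reap_node[cpu] = (reap_node[cpu] + 1) % MAX_NODES;
}

/*
 * One reaper pass: handle a single node instead of all of them,
 * then advance the rotor. Returns the node that was visited.
 */
static int cache_reap_pass(int cpu, int home_node)
{
    int node = reap_node[cpu];

    if (node != home_node) {
        /* real code would drain one alien cache and this node's pcp pages */
    }
    next_reap_node(cpu);
    return node;
}
```

With 8 nodes, a CPU needs 8 passes to visit every node once, which is the trade-off described above: each pass is short, but a full drain takes longer as the node count grows.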
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mtd: 64 bit fixes · 0ef675d4
      Atsushi Nemoto committed
      Fix some bugs in mtd/jffs2 on 64-bit platforms.
      
      The MEMGETBADBLOCK/MEMSETBADBLOCK ioctls are not listed in compat_ioctl.h.
      
      And some variables in jffs2 are declared as uint32_t but used to hold
      size_t values.
      Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 09 Mar, 2006 4 commits
    • [PATCH] fix file counting · 529bf6be
      Dipankar Sarma committed
      I have benchmarked this on an x86_64 NUMA system and see no significant
      performance difference on kernbench.  Tested on both x86_64 and powerpc.
      
      The way we do file struct accounting is not very suitable for batched
      freeing.  For scalability reasons, file accounting was
      constructor/destructor based.  This meant that nr_files was decremented
      only when the object was removed from the slab cache.  This is susceptible
      to slab fragmentation.  With RCU based file structure, consequent batched
      freeing and a test program like Serge's, we just speed this up and end up
      with a very fragmented slab -
      
      llm22:~ # cat /proc/sys/fs/file-nr
      587730  0       758844
      
      At the same time, I see only 2000+ objects in the filp cache.  The
      following patch fixes this problem.
      
      This patch changes the file counting by removing the filp_count_lock.
      Instead we use a separate percpu counter, nr_files, for now and all
      accesses to it are through get_nr_files() api.  In the sysctl handler for
      nr_files, we populate files_stat.nr_files before returning to user.
      
      Counting files as and when they are created and destroyed (as opposed to
      inside slab) allows us to correctly count open files with RCU.
      Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] rcu batch tuning · 21a1ea9e
      Dipankar Sarma committed
      This patch adds new tunables for RCU queue and finished batches.  There are
      two types of controls - number of completed RCU updates invoked in a batch
      (blimit) and monitoring for high rate of incoming RCUs on a cpu (qhimark,
      qlowmark).
      
      By default, the per-cpu batch limit is set to a small value.  If the input
      RCU rate exceeds the high watermark, we do two things - force quiescent
      state on all cpus and set the batch limit of the CPU to INTMAX.  Setting
      batch limit to INTMAX forces all finished RCUs to be processed in one shot.
      If we have more than INTMAX RCUs queued up, then we have bigger problems
      anyway.  Once the incoming queued RCUs fall below the low watermark, the
      batch limit is set to the default.
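
The watermark logic described above can be sketched as a small userspace state machine. Everything here is an assumption made for illustration: the struct, the function names and the numeric values of the three tunables are hypothetical, and INT_MAX from <limits.h> stands in for the "INTMAX" of the message. Forcing a quiescent state is reduced to a return value.

```c
#include <limits.h>

/* Hypothetical per-CPU RCU bookkeeping for this sketch. */
struct rcu_cpu_state {
    long qlen;   /* callbacks currently queued on this CPU */
    int  blimit; /* max callbacks invoked in one batch */
};

/* Illustrative tunable values; assumptions, not taken from the patch text. */
static const int  default_blimit = 10;
static const long qhimark = 10000;
static const long qlowmark = 100;

/*
 * Queue one callback. Returns 1 when the high watermark is exceeded,
 * meaning the caller should force a quiescent state on all cpus.
 */
static int rcu_enqueue(struct rcu_cpu_state *s)
{
    s->qlen++;
    if (s->qlen > qhimark) {
        s->blimit = INT_MAX; /* process everything in one shot */
        return 1;
    }
    return 0;
}

/* Invoke up to blimit finished callbacks; returns how many were processed. */
static long rcu_process_batch(struct rcu_cpu_state *s)
{
    long n = s->qlen < s->blimit ? s->qlen : s->blimit;

    s->qlen -= n;
    /* Once the queue is back under the low watermark, restore the default. */
    if (s->blimit == INT_MAX && s->qlen <= qlowmark)
        s->blimit = default_blimit;
    return n;
}
```

Under normal load each batch stays small, which keeps latency low. Only a flood of callbacks lifts the limit, and the limit drops back by itself once the queue has drained.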
      Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] percpu_counter_sum() · e2bab3d9
      Andrew Morton committed
      Implement percpu_counter_sum().  This is a more accurate but slower version of
      percpu_counter_read_positive().
      
      We need this for Alex's speedup-ext3_statfs patch and for the nr_file
      accounting fix.  Otherwise these things would be too inaccurate on large CPU
      counts.
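
To see why a summing read is more accurate but slower, here is a minimal userspace sketch of a batched per cpu counter. It is not the kernel implementation: the struct layout, NR_CPUS_SKETCH, BATCH and all sketch_* names are hypothetical, and locking is omitted entirely.

```c
#define NR_CPUS_SKETCH 4 /* hypothetical CPU count */
#define BATCH 32         /* hypothetical fold threshold */

struct sketch_counter {
    long count;                 /* global value, only approximately current */
    long local[NR_CPUS_SKETCH]; /* per-CPU deltas not yet folded in */
};

/* Add to the CPU-local delta; fold into the global count once it gets large. */
static void sketch_counter_mod(struct sketch_counter *c, int cpu, long amount)
{
    long v = c->local[cpu] + amount;

    if (v >= BATCH || v <= -BATCH) {
        c->count += v;
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = v;
    }
}

/* Fast read: global count only, clamped so it is never negative. */
static long sketch_counter_read_positive(const struct sketch_counter *c)
{
    return c->count > 0 ? c->count : 0;
}

/* Slow read: also walks every CPU's unfolded delta. */
static long sketch_counter_sum(const struct sketch_counter *c)
{
    long sum = c->count;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += c->local[cpu];
    return sum > 0 ? sum : 0;
}
```

In this sketch the fast read can be off by up to BATCH - 1 per CPU, so the possible error grows with the CPU count, which matches the concern about large CPU counts above.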
      
      Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
      Cc: Alex Tomas <alex@clusterfs.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Mark the pipe file operations static · a19cbd4b
      Linus Torvalds committed
      They aren't used (nor even really usable) outside of pipe.c anyway
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 07 Mar, 2006 4 commits
  5. 06 Mar, 2006 1 commit
    • [PATCH] libata: implement ata_dev_revalidate() · 623a3128
      Tejun Heo committed
      ata_dev_revalidate() re-reads IDENTIFY PAGE of the given device and
      makes sure it's the same device as the configured one.  Once it's
      verified that it's the same device, @dev is configured according to
      newly read IDENTIFY PAGE.  Note that revalidation currently doesn't
      invoke transfer mode reconfiguration.
      
      Criteria for 'same device'
      
      * same class (of course)
      * same model string
      * same serial string
      * if ATA, same n_sectors (to catch geometry parameter changes)
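
The criteria listed above amount to a field-by-field comparison. The following is a hedged userspace sketch, not libata code: struct dev_ident, enum dev_class and is_same_device are hypothetical names, and the string buffer sizes assume the 40-character model and 20-character serial fields of the ATA IDENTIFY data.

```c
#include <stdint.h>
#include <string.h>

enum dev_class { DEV_ATA, DEV_ATAPI }; /* hypothetical device classes */

/* Hypothetical summary of the fields that matter for revalidation. */
struct dev_ident {
    enum dev_class class;
    char model[41];     /* NUL-terminated model string */
    char serial[21];    /* NUL-terminated serial string */
    uint64_t n_sectors; /* only meaningful for ATA devices */
};

/* Returns 1 if the freshly read identity matches the configured one. */
static int is_same_device(const struct dev_ident *old_id,
                          const struct dev_ident *new_id)
{
    if (old_id->class != new_id->class)
        return 0;
    if (strcmp(old_id->model, new_id->model) != 0)
        return 0;
    if (strcmp(old_id->serial, new_id->serial) != 0)
        return 0;
    /* Capacity is compared for ATA only, to catch geometry changes. */
    if (old_id->class == DEV_ATA && old_id->n_sectors != new_id->n_sectors)
        return 0;
    return 1;
}
```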
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  6. 04 Mar, 2006 3 commits
  7. 03 Mar, 2006 1 commit
  8. 01 Mar, 2006 2 commits
  9. 28 Feb, 2006 1 commit
  10. 25 Feb, 2006 1 commit
    • [PATCH] flags parameter for linkat · c04030e1
      Ulrich Drepper committed
      I'm currently at the POSIX meeting and one thing covered was the
      incompatibility of Linux's link() with the POSIX definition.  The issue is
      how the name is resolved: Linux does not follow symlinks, POSIX requires it
      does.
      
      Even if somebody thinks this is a good default behavior we cannot change this
      because it would break the ABI.  But the fact remains that some application
      might want this behavior.
      
      We have one chance to help implementing this without breaking the behavior.
      For this we could use the new linkat interface which would need a new
      flags parameter.  If the new parameter is AT_SYMLINK_FOLLOW the new
      behavior could be invoked.
      
      I do not want to introduce such a patch now.  But we could add the
      parameter now, just don't use it.  The patch below would do this.  Can we
      get this late patch applied before the release more or less fixes the
      syscall API?
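
The patch itself only reserves the parameter. On systems where linkat() does honor AT_SYMLINK_FOLLOW, the difference between the two behaviors can be observed from userspace as in the sketch below. The helper name link_and_classify is hypothetical; linkat(), lstat(), AT_FDCWD and AT_SYMLINK_FOLLOW are the standard interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a hard link newpath -> oldpath with the given linkat() flags and
 * report what the new name refers to.
 * Returns 1 if it is a symlink, 0 if it is something else, -1 on error.
 */
static int link_and_classify(const char *oldpath, const char *newpath, int flags)
{
    struct stat st;

    if (linkat(AT_FDCWD, oldpath, AT_FDCWD, newpath, flags) != 0)
        return -1;
    if (lstat(newpath, &st) != 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

If oldpath is a symlink, flags of 0 link the symlink itself (the traditional Linux link() behavior), while AT_SYMLINK_FOLLOW links the file the symlink points to (the behavior POSIX asked for).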
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 23 Feb, 2006 2 commits
  12. 22 Feb, 2006 1 commit
  13. 21 Feb, 2006 5 commits
  14. 20 Feb, 2006 1 commit
  15. 18 Feb, 2006 2 commits
    • [PATCH] Provide an interface for getting the current tick length · 726c14bf
      Paul Mackerras committed
      This provides an interface for arch code to find out how many
      nanoseconds are going to be added on to xtime by the next call to
      do_timer.  The value returned is a fixed-point number in 52.12 format
      in nanoseconds.  The reason for this format is that it gives the
      full precision that the timekeeping code is using internally.
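
As an illustration of the 52.12 format mentioned above, the sketch below packs and unpacks such a value in userspace and shows a simple change check of the kind an arch timer interrupt might perform. All names here (tick_from_parts, tick_whole_ns, tick_frac, tick_length_changed) are hypothetical helpers and not the interface added by the patch.

```c
#include <stdint.h>

#define TICK_FRAC_BITS 12 /* 52.12: 12 fractional bits */
#define TICK_FRAC_MASK ((UINT64_C(1) << TICK_FRAC_BITS) - 1)

/* Build a 52.12 value from whole nanoseconds and a fraction in 1/4096 ns. */
static uint64_t tick_from_parts(uint64_t ns, uint32_t frac)
{
    return (ns << TICK_FRAC_BITS) | (frac & TICK_FRAC_MASK);
}

/* Whole nanoseconds of a 52.12 tick length. */
static uint64_t tick_whole_ns(uint64_t tick)
{
    return tick >> TICK_FRAC_BITS;
}

/* Fractional part, in units of 1/4096 ns. */
static uint32_t tick_frac(uint64_t tick)
{
    return (uint32_t)(tick & TICK_FRAC_MASK);
}

/*
 * Remember the last tick length and report whether it changed, i.e.
 * whether derived values such as a scale factor need recomputing.
 */
static int tick_length_changed(uint64_t *last, uint64_t now)
{
    if (*last == now)
        return 0;
    *last = now;
    return 1;
}
```

For example, a nominal 1 ms tick lengthened by half a nanosecond would be tick_from_parts(1000000, 2048), since 2048/4096 is 0.5.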
      
      The motivation for this is to fix a problem that has arisen on 32-bit
      powerpc in that the value returned by do_gettimeofday drifts apart
      from xtime if NTP is being used.  PowerPC is now using a lockless
      do_gettimeofday based on reading the timebase register and performing
      some simple arithmetic.  (This method of getting the time is also
      exported to userspace via the VDSO.)  However, the factor and offset
      it uses were calculated based on the nominal tick length and weren't
      being adjusted when NTP varied the tick length.
      
      Note that 64-bit powerpc has had the lockless do_gettimeofday for a
      long time now.  It also had an extremely hairy routine that got called
      from the 32-bit compat routine for adjtimex, which adjusted the
      factor and offset according to what it thought the timekeeping code
      was going to do.  Not only was this only called if a 32-bit task did
      adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
      duplicating computations from kernel/timer.c and it wasn't clear that
      it was (still) correct.
      
      The simple solution is to ask the timekeeping code how long the
      current jiffy will be on each timer interrupt, after calling
      do_timer.  If this jiffy will be a different length from the last one,
      we then need to compute new values for the factor and offset used in
      the lockless do_gettimeofday.  In this way we can keep xtime and
      do_gettimeofday in sync, even when NTP is varying the tick length.
      
      Note that when adjtimex varies the tick length, it almost always
      introduces the variation from the next tick on.  The only case I could
      see where adjtimex would vary the length of the current tick is when
      an old-style adjtime adjustment is being cancelled.  (It's not clear
      to me why the adjustment has to be cancelled immediately rather than
      from the next tick on.)  Thus I don't see any real need for a hook in
      adjtimex; the rare case of an old-style adjustment being cancelled can
      be fixed up at the next tick.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] x86_64: Add boot option to disable randomized mappings and cleanup · a62eaf15
      Andi Kleen committed
      AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
      installation it's easiest if it's a boot time option.
      
      Also I moved the variable to a more appropriate place and made
      it independent from sysctl.
      
      And marked it __read_mostly, which it is.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 16 Feb, 2006 4 commits
  17. 15 Feb, 2006 1 commit
    • [NETFILTER]: Fix xfrm lookup after SNAT · ee68cea2
      Patrick McHardy committed
      To find out if a packet needs to be handled by IPsec after SNAT, packets
      are currently rerouted in POST_ROUTING and a new xfrm lookup is done. This
      breaks SNAT of non-unicast packets to non-local addresses because the
      packet is routed as an incoming packet and no neighbour entry is bound to the
      dst_entry. In general, it seems to be a bad idea to replace the dst_entry
      after the packet was already sent to the output routine because its state
      might not match what's expected.
      
      This patch changes the xfrm lookup in POST_ROUTING to re-use the original
      dst_entry without routing the packet again. This means no policy routing
      can be used for transport mode transforms (which keep the original route)
      when packets are SNATed to match the policy, but it looks like the best
      we can do for now.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>