1. 21 Feb 2006 (1 commit)
  2. 18 Feb 2006 (2 commits)
    • [PATCH] Provide an interface for getting the current tick length · 726c14bf
      Paul Mackerras committed
      This provides an interface for arch code to find out how many
      nanoseconds are going to be added on to xtime by the next call to
      do_timer.  The value returned is a fixed-point number in 52.12 format
      in nanoseconds.  The reason for this format is that it gives the
      full precision that the timekeeping code is using internally.
      
      The motivation for this is to fix a problem that has arisen on 32-bit
      powerpc in that the value returned by do_gettimeofday drifts apart
      from xtime if NTP is being used.  PowerPC is now using a lockless
      do_gettimeofday based on reading the timebase register and performing
      some simple arithmetic.  (This method of getting the time is also
      exported to userspace via the VDSO.)  However, the factor and offset
      it uses were calculated based on the nominal tick length and weren't
      being adjusted when NTP varied the tick length.
      
      Note that 64-bit powerpc has had the lockless do_gettimeofday for a
      long time now.  It also had an extremely hairy routine that got called
      from the 32-bit compat routine for adjtimex, which adjusted the
      factor and offset according to what it thought the timekeeping code
      was going to do.  Not only was this only called if a 32-bit task did
      adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
      duplicating computations from kernel/timer.c and it wasn't clear that
      it was (still) correct.
      
      The simple solution is to ask the timekeeping code how long the
      current jiffy will be on each timer interrupt, after calling
      do_timer.  If this jiffy will be a different length from the last one,
      we then need to compute new values for the factor and offset used in
      the lockless do_gettimeofday.  In this way we can keep xtime and
      do_gettimeofday in sync, even when NTP is varying the tick length.
      
      Note that when adjtimex varies the tick length, it almost always
      introduces the variation from the next tick on.  The only case I could
      see where adjtimex would vary the length of the current tick is when
      an old-style adjtime adjustment is being cancelled.  (It's not clear
      to me why the adjustment has to be cancelled immediately rather than
      from the next tick on.)  Thus I don't see any real need for a hook in
      adjtimex; the rare case of an old-style adjustment being cancelled can
      be fixed up at the next tick.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
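
The 52.12 fixed-point format described in this commit message can be illustrated with a minimal sketch. The helper names below (`tick_from_ns`, `tick_whole_ns`, `tick_frac`, `tick_length_changed`) are hypothetical and are not taken from the patch; they only show how a value with 12 fractional bits might be encoded, decoded, and compared between timer interrupts.

```c
#include <stdint.h>

/* Number of fractional bits in the 52.12 fixed-point format. */
#define TICK_FRAC_BITS 12

/* Encode a whole number of nanoseconds as a 52.12 value (hypothetical helper). */
static inline uint64_t tick_from_ns(uint64_t ns)
{
    return ns << TICK_FRAC_BITS;
}

/* Whole nanoseconds contained in a 52.12 tick length. */
static inline uint64_t tick_whole_ns(uint64_t tick_fp)
{
    return tick_fp >> TICK_FRAC_BITS;
}

/* Fractional part of a 52.12 tick length, in units of 1/4096 ns. */
static inline uint32_t tick_frac(uint64_t tick_fp)
{
    return (uint32_t)(tick_fp & ((1u << TICK_FRAC_BITS) - 1));
}

/*
 * Returns nonzero when this tick differs in length from the previous one,
 * i.e. when arch code would need to recompute its factor and offset.
 */
static inline int tick_length_changed(uint64_t prev_fp, uint64_t cur_fp)
{
    return prev_fp != cur_fp;
}
```

Comparing the raw fixed-point values, rather than the truncated whole nanoseconds, is what lets sub-nanosecond NTP adjustments be noticed.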
    • [PATCH] x86_64: Add boot option to disable randomized mappings and cleanup · a62eaf15
      Andi Kleen committed
      AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
      installation it's easiest if it's a boot time option.
      
      Also I moved the variable to a more appropriate place and made
      it independent of sysctl.
      
      And marked it __read_mostly, which it is.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 16 Feb 2006 (4 commits)
  4. 15 Feb 2006 (4 commits)
    • [NETFILTER]: Fix xfrm lookup after SNAT · ee68cea2
      Patrick McHardy committed
      To find out if a packet needs to be handled by IPsec after SNAT, packets
      are currently rerouted in POST_ROUTING and a new xfrm lookup is done. This
      breaks SNAT of non-unicast packets to non-local addresses because the
      packet is routed as incoming packet and no neighbour entry is bound to the
      dst_entry. In general, it seems to be a bad idea to replace the dst_entry
      after the packet was already sent to the output routine because its state
      might not match what's expected.
      
      This patch changes the xfrm lookup in POST_ROUTING to re-use the original
      dst_entry without routing the packet again. This means no policy routing
      can be used for transport mode transforms (which keep the original route)
      when packets are SNATed to match the policy, but it looks like the best
      we can do for now.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [PATCH] sched: revert "filter affine wakeups" · d6077cb8
      Chen, Kenneth W committed
      Revert commit d7102e95:
      
          [PATCH] sched: filter affine wakeups
      
      It apparently caused a performance regression of more than 10% on the aim7
      benchmark.  The setup in use is a 16-CPU HP rx8620 with 64GB of memory and
      12 MSA1000s with 144 disks.  Each disk is 72GB with a single ext3 filesystem
      (courtesy of HP, who supplied benchmark results).
      
      The problem is, for aim7, the wake-up pattern is random, but it still needs
      load balancing action in the wake-up path to achieve best performance.  With
      the above commit, lack of load balancing hurts that workload.
      
      However, for workloads like database transaction processing, the requirement
      is exactly the opposite.  In the wake-up path, best performance is achieved
      with absolutely zero load balancing.  We simply wake up the process on the
      CPU where it previously ran.  Worst performance is obtained when we do load
      balancing at wake-up.
      
      There isn't an easy way to auto-detect the workload characteristics.  Ingo's
      earlier patch, which detects an idle CPU and decides whether to load balance
      or not, doesn't perform well with aim7 either, since all CPUs are busy (it
      causes an even bigger perf. regression).
      
      Revert commit d7102e95, which causes more
      than 10% performance regression with aim7.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] NLM: Fix the NLM_GRANTED callback checks · 5ac5f9d1
      Trond Myklebust committed
      If 2 threads attached to the same process are blocking on different locks on
      different files (maybe even on different servers) but have the same lock
      arguments (i.e.  same offset+length - actually quite common, since most
      processes try to lock the entire file) then the first GRANTED call that wakes
      one up will also wake the other.
      
      Currently when the NLM_GRANTED callback comes in, lockd walks the list of
      blocked locks in search of a match to the lock that the NLM server has
      granted.  Although it checks the lock pid, start and end, it fails to check
      the filehandle and the server address.
      
      By checking the filehandle and server IP address, we ensure that this only
      happens if the locks truly are referencing the same file.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
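
The stricter match described in this commit message can be sketched as follows. This is an illustration only: `struct lock_key`, `FH_MAX`, and `granted_matches` are hypothetical names, and the field layout is an assumption rather than lockd's actual data structures. The point is that pid, start, and end alone are not enough; the file handle and server address must agree as well.

```c
#include <stdint.h>
#include <string.h>

#define FH_MAX 64  /* assumed upper bound on file handle size */

/* Hypothetical description of a blocked or granted lock. */
struct lock_key {
    int           pid;          /* lock owner */
    uint64_t      start, end;   /* byte range */
    uint32_t      server_addr;  /* server IP address (IPv4 assumed) */
    size_t        fh_len;       /* bytes used in fh, at most FH_MAX */
    unsigned char fh[FH_MAX];   /* opaque file handle */
};

/* Returns 1 only if the granted lock refers to the same lock on the same file. */
static int granted_matches(const struct lock_key *blocked,
                           const struct lock_key *granted)
{
    /* The checks that were already performed: owner and range. */
    if (blocked->pid != granted->pid)
        return 0;
    if (blocked->start != granted->start || blocked->end != granted->end)
        return 0;
    /* The additional checks: same server and same file handle. */
    if (blocked->server_addr != granted->server_addr)
        return 0;
    if (blocked->fh_len != granted->fh_len)
        return 0;
    return memcmp(blocked->fh, granted->fh, blocked->fh_len) == 0;
}
```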
    • [PATCH] jbd: revert checkpoint list changes · 7c8903f6
      Mark Fasheh committed
      This patch reverts commit f93ea411:
        [PATCH] jbd: split checkpoint lists
      
      This broke journal_flush() for OCFS2, which is its method of being sure
      that metadata is sent to disk for another node.
      
      And two related commits 8d3c7fce and
      43c3e6f5 with the subjects:
        [PATCH] jbd: log_do_checkpoint fix
        [PATCH] jbd: remove_transaction fix
      
      These seem to be incremental bugfixes on the original patch and as such are
      no longer needed.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Jan Kara <jack@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  5. 12 Feb 2006 (3 commits)
    • [PATCH] nvidiafb: Add support for Geforce4 MX 4000 · bc7fc060
      Antonino A. Daplas committed
      Add support for Geforce4 MX 4000 (0x185)
      Signed-off-by: Antonino Daplas <adaplas@pol.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] select: fix returned timeval · 643a6545
      Andrew Morton committed
      With David Woodhouse <dwmw2@infradead.org>
      
      select() presently has a habit of increasing the value of the user's
      `timeout' argument on return.
      
      We were writing back a timeout larger than the original.  We _deliberately_
      round up, since we know we must wait at _least_ as long as the caller asks
      us to.
      
      The patch adds a couple of helper functions for magnitude comparison of
      timespecs and of timevals, and uses them to prevent the various poll and
      select functions from returning a timeout which is larger than the one which
      was passed in.
      
      The patch also fixes a bug in compat_sys_pselect7(): it was adding the new
      timeout value to the old one and was returning that.  It should just return
      the new timeout value.
      
      (We have various handy timespec/timeval-to-from-nsec conversion functions in
      time.h.  But this code open-codes it all).
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: george anzinger <george@mvista.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
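
A magnitude comparison of the kind this commit message describes might look like the sketch below. The names `struct tv`, `tv_compare`, and `clamp_remaining` are hypothetical stand-ins, not the helpers the patch adds; the sketch assumes both values are normalized (`usec` in the range 0 to 999999).

```c
/* Hypothetical stand-in for a timeval: seconds plus microseconds. */
struct tv {
    long sec;
    long usec;
};

/* Returns -1, 0, or 1 when a is smaller than, equal to, or larger than b. */
static int tv_compare(const struct tv *a, const struct tv *b)
{
    if (a->sec < b->sec)
        return -1;
    if (a->sec > b->sec)
        return 1;
    if (a->usec < b->usec)
        return -1;
    if (a->usec > b->usec)
        return 1;
    return 0;
}

/*
 * Never report more remaining time than the caller originally asked for,
 * even if internal rounding-up made the computed remainder larger.
 */
static struct tv clamp_remaining(struct tv remaining, struct tv original)
{
    return tv_compare(&remaining, &original) > 0 ? original : remaining;
}
```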
    • [PATCH] fstatat64 support · cff2b760
      Ulrich Drepper committed
      The *at patches introduced fstatat and, due to insufficient research, I
      used the newfstat functions generally as the guideline.  The result is that
      on 32-bit platforms we don't have all the information needed to implement
      fstatat64.
      
      This patch modifies the code to pass up 64-bit information if
      __ARCH_WANT_STAT64 is defined.  I renamed the syscall entry point to make
      this clear.  Other archs will continue to use the existing code.  On x86-64
      the compat code is implemented using a new sys32_ function.  This is what
      is done for the other stat syscalls as well.
      
      This patch might break some other archs (those which define
      __ARCH_WANT_STAT64 and which already wired up the syscall).  Yet others
      might need changes to accommodate the compatibility mode.  I really don't
      want to do that work because all this stat handling is a mess (more so in
      glibc, but the kernel is also affected).  It should be done by the arch
      maintainers.  I'll provide some stand-alone test shortly.  Those who are
      eager could compile glibc and run 'make check' (no installation needed).
      
      The patch below has been tested on x86 and x86-64.
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  6. 11 Feb 2006 (3 commits)
    • [PATCH] tty buffering stall fix · 8977d929
      Paul Fulghum committed
      Prevent stalled processing of received data when a driver allocates tty
      buffer space but does not immediately follow the allocation with more data
      and a call to schedule receive tty processing.  (example: hvc_console) This
      bug was introduced by the first locking patch for the new tty buffering.
      Signed-off-by: Paul Fulghum <paulkf@microgate.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] x86: don't initialise cpu_possible_map to all ones · 7a8ef1cb
      Andrew Morton committed
      Initialising cpu_possible_map to all-ones with CONFIG_HOTPLUG_CPU means that
      
      a) All for_each_cpu() loops will iterate across all NR_CPUS CPUs, rather
         than over possible ones.  That can be quite expensive.
      
      b) Soon we'll be allocating per-cpu areas only for possible CPUs.  So with
         CPU_MASK_ALL, we'll be wasting memory.
      
      I also switched voyager over to not use CPU_MASK_ALL in the non-CPU-hotplug
      case.  Should be OK..
      
      I note that parisc is also using CPU_MASK_ALL.  Suggest that it stop doing
      that.
      
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] kexec: fix in free initrd when overlapped with crashkernel region · 9c15e852
      Haren Myneni committed
      The reserved crashkernel region can overlap with the initrd, since the
      bootloader sets the initrd location.  When the initrd region is freed,
      the second kernel's memory is no longer contiguous.  kexec_load can then
      cause an oops, since there is no contiguous memory to write the second
      kernel into, or this memory could be used by the first kernel itself and
      may not be part of the dump.  For example, on powerpc, the initrd is
      located at 36MB and the crashkernel starts at 32MB.  kexec_load caused a
      panic because it wrote into non-allocated memory (after 36MB).  A similar
      issue could occur on other archs.
      
      One possibility is to move the initrd outside of the crashkernel region.
      But the initrd region will be freed anyway before the system is up.  This
      patch fixes the issue by freeing only those regions that are not part of
      the crashkernel memory when the two overlap.
      Signed-off-by: Haren Myneni <haren@us.ibm.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
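
The idea of freeing only the parts of the initrd that lie outside the crashkernel region amounts to an interval difference. The sketch below is a stand-alone illustration, not the patch itself: `struct range` and `ranges_outside` are hypothetical names, ranges are assumed to be half-open byte intervals, and the crashkernel size used in any example is an assumption because the commit message gives only the start address.

```c
#include <stdint.h>

/* Half-open physical address range [start, end). Hypothetical type. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Computes the pieces of `initrd` that do not overlap `crash`.
 * Writes at most two ranges to `out` and returns how many were written.
 * Only these pieces would be safe to free.
 */
static int ranges_outside(struct range initrd, struct range crash,
                          struct range out[2])
{
    int n = 0;

    /* No overlap at all: the whole initrd may be freed. */
    if (initrd.end <= crash.start || initrd.start >= crash.end) {
        if (initrd.start < initrd.end)
            out[n++] = initrd;
        return n;
    }
    /* Piece of the initrd below the crashkernel region. */
    if (initrd.start < crash.start) {
        out[n].start = initrd.start;
        out[n].end = crash.start;
        n++;
    }
    /* Piece of the initrd above the crashkernel region. */
    if (initrd.end > crash.end) {
        out[n].start = crash.end;
        out[n].end = initrd.end;
        n++;
    }
    return n;
}
```

With an initrd at 36MB and a crashkernel region starting at 32MB that extends past the end of the initrd, the function returns zero ranges, so nothing inside the reserved region is freed.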
  7. 10 Feb 2006 (2 commits)
    • [NETLINK]: Fix a severe bug · a70ea994
      Alexey Kuznetsov committed
      Netlink overrun handling was broken during the netlink improvements.
      The destination socket is used where the source socket was meant, so
      overrun is now never sent to user netlink sockets when it should be, and
      it can even be set on a kernel socket, which results in a complete
      deadlock of rtnetlink.
      
      The suggested fix is to restore the status quo by passing the source
      socket as an additional argument to netlink_attachskb().
      
      A little explanation: overrun is set on a socket when it failed to
      receive some message and the sender of that message does not handle, or
      has no way to handle, this error.  This happens in two cases:
      1. when the kernel sends something.  The kernel never retransmits and
         cannot wait for buffer space.
      2. when a user sends a broadcast and the message was not delivered
         to some recipients.
      Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [PATCH] do_sigaction: cleanup ->sa_mask manipulation · 9ac95f2f
      Oleg Nesterov committed
      Clear unblockable signals beforehand.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 09 Feb 2006 (1 commit)
  9. 08 Feb 2006 (8 commits)
  10. 07 Feb 2006 (2 commits)
  11. 06 Feb 2006 (5 commits)
    • [PATCH] jbd: fix transaction batching · fe1dcbc4
      Andrew Morton committed
      Ben points out that:
      
        When writing files out using O_SYNC, jbd's 1 jiffy delay results in a
        significant drop in throughput as the disk sits idle.  The patch below
        results in a 4-5x performance improvement (from 6.5MB/s to ~24-30MB/s on my
        IDE test box) when writing out files using O_SYNC.
      
      So optimise the batching code by omitting it entirely if the process which is
      doing a sync write is the same as the one which did the most recent sync
      write.  If that's true, we're unlikely to get any other processes joining the
      transaction.
      
      (Has been in -mm for ages - it took me a long time to get on to performance
      testing it)
      
      Numbers, on write-cache-disabled IDE:
      
      /usr/bin/time -p synctest -n 10 -uf -t 1 -p 1 dir-name
      
      Unpatched:
      	40 seconds
      Patched:
      	35 seconds
      Batching disabled:
      	35 seconds
      
      This is the problematic single-process-doing-fsync case.  With multiple
      fsyncing processes the numbers are AFAICT unaltered by the patch.
      
      Aside: performance testing and instrumentation shows that the transaction
      batching almost doesn't help (testing with synctest -n 1 -uf -t 100 -p 10
      dir-name on non-writeback-caching IDE).  This is because by the time one
      process is running a synchronous commit, a bunch of other processes already
      have a transaction handle open, so they're all going to batch into the same
      transaction anyway.
      
      The batching seems to offer maybe 5-10% speedup with this workload, but I'm
      pretty sure it was more important than that when it was first developed 4-odd
      years ago...
      
      Cc: "Stephen C. Tweedie" <sct@redhat.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
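
The batching heuristic described in this commit message (skip the wait when the current sync writer is the same process as the previous one) can be sketched as below. `struct journal_state`, `last_sync_writer`, and `should_wait_for_batch` are hypothetical names chosen for the illustration, not jbd's actual fields.

```c
/* Hypothetical per-journal state: pid of the most recent sync writer. */
struct journal_state {
    int last_sync_writer;   /* -1 when no sync write has happened yet */
};

/*
 * Decide whether a sync writer should pause to let other processes join
 * the transaction. If the same process issued the previous sync write,
 * nobody else is likely to join, so the delay is skipped.
 * Returns 1 to wait (batch), 0 to commit immediately.
 */
static int should_wait_for_batch(struct journal_state *j, int pid)
{
    int same_writer = (j->last_sync_writer == pid);

    j->last_sync_writer = pid;  /* remember for the next sync write */
    return !same_writer;
}
```

A single process writing with O_SYNC in a loop therefore pays the delay at most once, which corresponds to the single-process case measured above.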
    • [PATCH] reiserfs_get_acl() build fix · bc5e483d
      Andrew Morton committed
      With CONFIG_REISERFS_FS_XATTR=y, CONFIG_REISERFS_FS_POSIX_ACL=n:
      
      fs/reiserfs/xattr.c: In function `reiserfs_check_acl':
      fs/reiserfs/xattr.c:1330: called object is not a function
      
      Cc: Chris Mason <mason@suse.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pktcdvd: Allow larger packets · 5c55ac9b
      Phillip Susi committed
      The pktcdvd driver uses a compile time macro constant to define the maximum
      supported packet length.  I changed this from 32 sectors to 128 sectors
      because that allows over 100 MB of additional usable space on a 700 MB cdrw,
      and increases throughput.
      
      Note that you need a modified cdrwtool program that can format a CDRW disc
      with larger packets to benefit from this change.
      Signed-off-by: Peter Osterlund <petero2@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pktcdvd: Don't waste kernel memory · e1bc89bc
      Peter Osterlund committed
      Allocate memory for read-gathering at open time, when it is known just how
      much memory is needed.  This avoids wasting kernel memory when the real packet
      size is smaller than the maximum packet size supported by the driver.  This is
      always the case when using DVD discs.
      Signed-off-by: Peter Osterlund <petero2@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pktcdvd: Fix overflow for discs with large packets · a460ad62
      Phillip Susi committed
      The pktcdvd driver was using an 8 bit field to store the packet length
      obtained from the disc track info.  This causes it to overflow packet length
      values of 128KB or more.  I changed the field to 32 bits to fix this.
      
      The pktcdvd driver defaulted to its maximum allowed packet length when it
      detected a 0 in the track info field.  I changed this to fail the operation
      and refuse to access the media.  This seems more sane than attempting to
      access it with a value that almost certainly will not work.
      Signed-off-by: Peter Osterlund <petero2@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
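
The overflow described in this commit message is easy to reproduce in isolation. The sketch below assumes the packet length is stored in 512-byte units, which is consistent with the stated 128KB threshold (128KB / 512 = 256, one more than an 8-bit field can hold) but is not confirmed by the message itself. All names are hypothetical.

```c
#include <stdint.h>

#define UNIT_BYTES 512u  /* assumed storage unit for the packet length */

/* Old behaviour: the length is truncated to 8 bits when stored. */
static uint8_t store_len_u8(uint32_t bytes)
{
    return (uint8_t)(bytes / UNIT_BYTES);
}

/* New behaviour: a 32-bit field holds the length without truncation. */
static uint32_t store_len_u32(uint32_t bytes)
{
    return bytes / UNIT_BYTES;
}

/*
 * Reject a zero packet length instead of substituting a default.
 * Returns 0 when the length is usable, -1 when the media should be refused.
 */
static int validate_packet_len(uint32_t units)
{
    return units == 0 ? -1 : 0;
}
```

Under this assumption a 128KB packet wraps to 0 in the 8-bit field, which the old code would then have replaced with its maximum allowed packet length.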
  12. 05 Feb 2006 (2 commits)
  13. 04 Feb 2006 (3 commits)