1. 24 Jun, 2005 40 commits
    • [PATCH] add check to /proc/devices read routines · ac20427e
      Authored by Neil Horman
      Patch to add a check to get_chrdev_list and get_blkdev_list to prevent reads
      of /proc/devices from spilling over the provided page if more than 4096
      bytes of string data are generated from all the registered character and
      block devices in a system.
      Signed-off-by: Neil Horman <nhorman@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: <viro@parcelfarce.linux.theplanet.co.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ac20427e
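The overflow described in the commit above can be illustrated with a minimal userspace sketch. It is not the kernel code: `list_devices`, `struct dev_entry` and `PAGE_BUF_SIZE` are hypothetical names, and `snprintf` stands in for whatever formatting the kernel routines use. The idea shown is only "stop appending once the next entry would no longer fit in the page".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_BUF_SIZE 4096  /* assumed page size, matching the 4096 bytes in the commit text */

/* Hypothetical device table entry; not the kernel's structure. */
struct dev_entry {
    unsigned int major;
    const char *name;
};

/*
 * Append "major name\n" lines to buf, stopping before the buffer overflows.
 * Returns the number of bytes written (excluding the terminating NUL).
 */
size_t list_devices(char *buf, size_t size, const struct dev_entry *devs, size_t n)
{
    size_t len = 0;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t room = size - len;
        int w = snprintf(buf + len, room, "%3u %s\n", devs[i].major, devs[i].name);

        if (w < 0 || (size_t)w >= room) {
            /* Entry does not fit: drop the partial line and stop. */
            buf[len] = '\0';
            break;
        }
        len += (size_t)w;
    }
    return len;
}
```

Called with a buffer of `PAGE_BUF_SIZE` bytes, the function never writes past the end of the page; entries that do not fit are simply omitted.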
    • [PATCH] remove redundant vm_flags clearing from madvise.c · 3bc1ee3e
      Authored by Pekka Enberg
      This patch removes the redundant VM_ClearReadHint from mm/madvise.c which was
      left there by Prasanna's patch.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3bc1ee3e
    • [PATCH] preempt_count is int - remove cast and don't assign to unsigned type · be5b4fbd
      Authored by Jesper Juhl
      In kernel/sched.c the return value from preempt_count() is cast to an int.
      That made sense when preempt_count was defined as different types on
      different archs, but it is no longer needed and should go away.  The patch
      removes the cast.
      
      In kernel/timer.c the return value from preempt_count() is assigned to a
      variable of type u32 and then that unsigned value is later compared to
      preempt_count().  Since preempt_count() returns an int, an int is what
      should be used to store its return value.  Storing the result in an
      unsigned 32bit integer made a tiny bit of sense back when preempt_count was
      different types on different archs, but no more - let's not play signed vs
      unsigned comparison games when we don't have to.  The patch modifies the
      code to use an int to hold the value.  While I was around that bit of code
      I also made two changes to a nearby (related) printk() - I modified it to
      specify the loglevel explicitly and also broke the line into a few pieces
      to avoid it being longer than 80 chars and clarified the text a bit.
      Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      be5b4fbd
    • [PATCH] streamline preempt_count type across archs · dcd497f9
      Authored by Jesper Juhl
      The preempt_count member of struct thread_info is currently either defined
      as int, unsigned int or __s32 depending on arch.  This patch makes the type
      of preempt_count an int on all archs.
      
      Having preempt_count be an unsigned type prevents the catching of
      preempt_count < 0 bugs, and using int on some archs and __s32 on others is
      not exactly "neat" - much nicer when it's just int all over.
      
      A previous version of this patch was already ACK'ed by Robert Love, and the
      only change in this version of the patch compared to the one he ACK'ed is
      that this one also makes sure the preempt_count member is consistently
      commented.
      Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      dcd497f9
    • [PATCH] optimise loop driver a bit · 35a82d1a
      Authored by Nick Piggin
      Looks like locking can be optimised quite a lot.  Increase lock widths
      slightly so lo_lock is taken fewer times per request.  Also it was quite
      trivial to cover lo_pending with that lock, and remove the atomic
      requirement.  This also makes memory ordering explicitly correct, which is
      nice (not that I particularly saw any mem ordering bugs).
      
      Test was reading 4 250MB files in parallel on ext2-on-tmpfs filesystem (1K
      block size, 4K page size).  System is 2 socket Xeon with HT (4 thread).
      
      intel:/home/npiggin# umount /dev/loop0 ; mount /dev/loop0 /mnt/loop ; /usr/bin/time ./mtloop.sh
      
      Before:
      0.24user 5.51system 0:02.84elapsed 202%CPU (0avgtext+0avgdata 0maxresident)k
      0.19user 5.52system 0:02.88elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
      0.19user 5.57system 0:02.89elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
      0.22user 5.51system 0:02.90elapsed 197%CPU (0avgtext+0avgdata 0maxresident)k
      0.19user 5.44system 0:02.91elapsed 193%CPU (0avgtext+0avgdata 0maxresident)k
      
      After:
      0.07user 2.34system 0:01.68elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k
      0.06user 2.37system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
      0.06user 2.39system 0:01.68elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k
      0.06user 2.36system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
      0.06user 2.42system 0:01.68elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      35a82d1a
    • [PATCH] CON_CONSDEV bit not set correctly on last console · ab4af03a
      Authored by Greg Edwards
      According to include/linux/console.h, CON_CONSDEV flag should be set on
      the last console specified on the boot command line:
      
           #define CON_PRINTBUFFER (1)
           #define CON_CONSDEV     (2) /* Last on the command line */
           #define CON_ENABLED     (4)
           #define CON_BOOT        (8)
      
      This does not currently happen if there is more than one console specified
      on the boot commandline.  Instead, it gets set on the first console on the
      command line.  This can cause problems for things like kdb that look for
      the CON_CONSDEV flag to see if the console is valid.
      
      Additionally, it doesn't look like CON_CONSDEV is reassigned to the next
      preferred console at unregister time if the console being unregistered
      currently has that bit set.
      
      Example (from sn2 ia64):
      
      elilo vmlinuz root=<dev> console=ttyS0 console=ttySG0
      
      in this case, the flags on ttySG console struct will be 0x4 (should be
      0x6).
      
      Attached patch against bk fixes both issues for the cases I looked at.  It
      uses selected_console (which gets incremented for each console specified on
      the command line) as the indicator of which console to set CON_CONSDEV on.
      When adding the console to the list, if the previous one had CON_CONSDEV
      set, it masks it out.  Tested on ia64 and x86.
      
      The problem with the current behavior is it breaks overriding the default from
      the boot line.  In the ia64 case, there may be a global append line defining
      console=a in elilo.conf.  Then you want to boot your kernel, and want to
      override the default by passing console=b on the boot line.  elilo constructs
      the kernel cmdline by starting with the value of the global append line, then
      tacks on whatever else you specify, which puts console=b last.
      Signed-off-by: Greg Edwards <edwardsg@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ab4af03a
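The flag handling described in the commit above can be sketched in plain C. This is an illustration of the idea only, not the patch: `struct toy_console`, `toy_register_preferred` and `toy_unregister` are hypothetical names, and the list handling is far simpler than the kernel's. The flag values are the ones quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values quoted in the commit message from include/linux/console.h. */
#define CON_PRINTBUFFER (1)
#define CON_CONSDEV     (2) /* Last on the command line */
#define CON_ENABLED     (4)
#define CON_BOOT        (8)

/* Hypothetical, stripped-down console record; not the kernel's struct console. */
struct toy_console {
    const char *name;
    unsigned int flags;
    struct toy_console *next;
};

/* Head of the registered-console list; the head is the preferred console. */
static struct toy_console *toy_consoles;

/* Register a console as the new preferred one and move CON_CONSDEV to it. */
void toy_register_preferred(struct toy_console *con)
{
    assert(con != NULL);
    if (toy_consoles)
        toy_consoles->flags &= ~(unsigned int)CON_CONSDEV; /* mask it out on the previous holder */
    con->flags |= CON_ENABLED | CON_CONSDEV;
    con->next = toy_consoles;
    toy_consoles = con;
}

/* Unregister a console; if it held CON_CONSDEV, hand the flag to the new head. */
void toy_unregister(struct toy_console *con)
{
    struct toy_console **pp;

    assert(con != NULL);
    for (pp = &toy_consoles; *pp; pp = &(*pp)->next) {
        if (*pp == con) {
            *pp = con->next;
            break;
        }
    }
    if ((con->flags & CON_CONSDEV) && toy_consoles)
        toy_consoles->flags |= CON_CONSDEV;
    con->flags &= ~(unsigned int)(CON_ENABLED | CON_CONSDEV);
    con->next = NULL;
}
```

Registering `ttyS0` and then `ttySG0` in this sketch leaves `ttySG0` with flags 0x6 and `ttyS0` with 0x4, which is the outcome the commit message says the boot line `console=ttyS0 console=ttySG0` should produce.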
    • [PATCH] kstrdup: convert a few existing implementations · dfe52244
      Authored by Robert Love
      Convert a bunch of strdup() implementations and their callers to the new
      kstrdup().  A few remain, for example see sound/core, and there are tons of
      open coded strdup()'s around.  Sigh.  But this is a start.
      Signed-off-by: Robert Love <rml@novell.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      dfe52244
    • [PATCH] create a kstrdup library function · 543537bd
      Authored by Paulo Marques
      This patch creates a new kstrdup library function and changes the "local"
      implementations in several places to use this function.
      
      Most of the changes come from the sound and net subsystems.  The sound part
      had already been acknowledged by Takashi Iwai and the net part by David S.
      Miller.
      
      I left UML alone for now because I would need more time to read the code
      carefully before making changes there.
      Signed-off-by: Paulo Marques <pmarques@grupopie.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      543537bd
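For readers unfamiliar with the pattern being consolidated by the kstrdup commits above, here is a minimal userspace sketch of a string-duplication helper. It is an assumption-laden stand-in: `sketch_strdup` is a hypothetical name, `malloc()` replaces the kernel allocator, and the allocation-flags argument a kernel helper would take is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace stand-in for the kstrdup() idea: duplicate a NUL-terminated
 * string into freshly allocated memory. Returns NULL if s is NULL or if
 * the allocation fails; the caller owns and frees the result.
 */
char *sketch_strdup(const char *s)
{
    size_t len;
    char *copy;

    if (s == NULL)
        return NULL;

    len = strlen(s) + 1;          /* include the terminating NUL */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

Having one shared helper of this shape is what lets the open-coded "allocate, then copy" sequences mentioned in the commit messages be replaced by a single call.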
    • [PATCH] fix for prune_icache()/forced final iput() races · 991114c6
      Authored by Alexander Viro
      Based on analysis and a patch from Russ Weight <rweight@us.ibm.com>
      
      There is a race condition that can occur if an inode is allocated and then
      released (using iput) during the ->fill_super functions.  The race
      condition is between kswapd and mount.
      
      For most filesystems this can only happen in an error path when kswapd is
      running concurrently.  For isofs, however, the error can occur in a more
      common code path (which is how the bug was found).
      
      The logic here is "we want final iput() to free inode *now* instead of
      letting it sit in cache if fs is going down or had not quite come up".  The
      problem is with kswapd seeing such inodes in the middle of being killed and
      happily taking over.
      
      The clean solution would be to tell kswapd to leave those inodes alone and
      let our final iput deal with them.  I.e.  add a new flag
      (I_FORCED_FREEING), set it before write_inode_now() there and make
      prune_icache() leave those alone.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      991114c6
    • [PATCH] posix-timers: use try_to_del_timer_sync() · f972be33
      Authored by Oleg Nesterov
      sys_timer_settime/sys_timer_delete needs to delete k_itimer->real.timer
      synchronously while holding ->it_lock, which is also locked in
      posix_timer_fn.
      
      This patch removes timer_active/set_timer_inactive which plays with
      timer_list's internals in favour of using try_to_del_timer_sync(), which
      was introduced in the previous patch.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f972be33
    • [PATCH] timers: introduce try_to_del_timer_sync() · fd450b73
      Authored by Oleg Nesterov
      This patch splits del_timer_sync() into 2 functions.  The new one,
      try_to_del_timer_sync(), returns -1 when it hits an executing timer.

      It can be used in interrupt context, or when the caller holds locks which
      can prevent completion of the timer's handler.
      
      NOTE.  Currently it can't be used in interrupt context in UP case, because
      ->running_timer is used only with CONFIG_SMP.
      
      Should the need arise, it is possible to kill #ifdef CONFIG_SMP in
      set_running_timer(), it is cheap.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      fd450b73
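The contract described in the commit above (return -1 instead of waiting when the handler is executing) can be modelled with a single-threaded toy. Everything here is hypothetical: `struct toy_base`, `struct toy_timer` and `toy_try_to_del_timer_sync` are illustration names, there is no locking, and the 0/1 results for the non-running case are an assumption modelled on the usual "was it pending?" convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; the real timer_list and base structures differ. */
struct toy_timer;

struct toy_base {
    struct toy_timer *running_timer;   /* timer whose handler is executing, if any */
};

struct toy_timer {
    struct toy_base *base;
    int pending;                       /* 1 while the timer is queued */
};

/*
 * Try to deactivate a timer without waiting.
 * Returns -1 if the handler is currently executing (the caller must retry
 * or back off), otherwise 1 if a pending timer was removed, 0 if not pending.
 */
int toy_try_to_del_timer_sync(struct toy_timer *t)
{
    int was_pending;

    assert(t != NULL && t->base != NULL);
    if (t->base->running_timer == t)
        return -1;

    was_pending = t->pending;
    t->pending = 0;
    return was_pending;
}
```

A caller that holds a lock the handler also needs can drop the lock and retry when it sees -1, rather than spinning while holding it; a waiting variant could be built by retrying until the result is non-negative.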
    • [PATCH] timers fixes/improvements · 55c888d6
      Authored by Oleg Nesterov
      This patch tries to solve the following problems:

      1. del_timer_sync() is racy. The timer can be fired again after
         del_timer_sync() has checked all cpus and before it rechecks
         timer_pending().
      
      2. It has scalability problems. All cpus are scanned to determine
         if the timer is running on that cpu.
      
         With this patch del_timer_sync is O(1) and no slower than plain
         del_timer(pending_timer), unless it has to actually wait for
         completion of the currently running timer.
      
         The only restriction is that the recurring timer should not use
         add_timer_on().
      
      3. The timers are not serialized with respect to themselves.

         If CPU_0 does mod_timer(jiffies+1) while the timer is currently
         running on CPU_1, it is quite possible that a local interrupt on
         CPU_0 will start that timer before it has finished on CPU_1.
      
      4. The timers locking is suboptimal. __mod_timer() takes 3 locks
         at once and still requires wmb() in del_timer/run_timers.
      
         The new implementation takes 2 locks sequentially and does not
         need memory barriers.
      
      Currently ->base != NULL means that the timer is pending. In that case
      ->base.lock is used to lock the timer. __mod_timer also takes timer->lock
      because ->base can be == NULL.
      
      This patch uses timer->entry.next != NULL as indication that the timer is
      pending. So it does __list_del(), entry->next = NULL instead of list_del()
      when the timer is deleted.
      
      The ->base field is used for hashed locking only, it is initialized
      in init_timer() which sets ->base = per_cpu(tvec_bases). When the
      tvec_bases.lock is locked, it means that all timers which are tied
      to this base via timer->base are locked, and the base itself is locked
      too.
      
      So __run_timers/migrate_timers can safely modify all timers which could
      be found on ->tvX lists (pending timers).
      
      When the timer's base is locked, and the timer removed from ->entry list
      (which means that __run_timers/migrate_timers can't see this timer), it is
      possible to set timer->base = NULL and drop the lock: the timer remains
      locked.
      
      This patch adds lock_timer_base() helper, which waits for ->base != NULL,
      locks the ->base, and checks it is still the same.
      
      __mod_timer() schedules the timer on the local CPU and changes its base.
      However, it does not lock both old and new bases at once. It locks the
      timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and
      unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base,
      and adds this timer. This simplifies the code, because AB-BA deadlock is not
      possible. __mod_timer() also ensures that the timer's base is not changed
      while the timer's handler is running on the old base.
      
      __run_timers(), del_timer() do not change ->base anymore, they only clear
      pending flag.
      
      So del_timer_sync() can test timer->base->running_timer == timer to detect
      whether it is running or not.
      
      We don't need timer_list->lock anymore, this patch kills it.
      
      We also don't need barriers. del_timer() and __run_timers() used smp_wmb()
      before clearing timer's pending flag. It was needed because __mod_timer()
      did not lock old_base if the timer is not pending, so __mod_timer()->list_add()
      could race with del_timer()->list_del(). With this patch these functions are
      serialized through base->lock.
      
      One problem. TIMER_INITIALIZER can't use per_cpu(tvec_bases). So this patch
      adds global
      
              struct timer_base_s {
                      spinlock_t lock;
                      struct timer_list *running_timer;
              } __init_timer_base;
      
      which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s
      struct are replaced by struct timer_base_s t_base.
      
      It is indeed ugly. But this can't have scalability problems. The global
      __init_timer_base.lock is used only when __mod_timer() is called for the first
      time AND the timer was compile time initialized. After that the timer migrates
      to the local CPU.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Renaud Lienhart <renaud.lienhart@free.fr>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      55c888d6
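The timers commit above uses `timer->entry.next != NULL` as the "pending" indication and clears `next` after unlinking. A minimal sketch of that convention follows, with no locking and with hypothetical names throughout (`struct toy_list`, `struct toy_timer`, `toy_timer_detach`); it only shows why a plain unlink followed by `next = NULL` is enough to make the pending test work.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, modelled loosely on the kernel's list_head. */
struct toy_list {
    struct toy_list *next, *prev;
};

/* Hypothetical timer: only the list linkage matters for this sketch. */
struct toy_timer {
    struct toy_list entry;
};

void toy_list_init(struct toy_list *head)
{
    head->next = head;
    head->prev = head;
}

/* A freshly initialised timer is not queued, so entry.next is NULL. */
void toy_timer_init(struct toy_timer *t)
{
    t->entry.next = NULL;
    t->entry.prev = NULL;
}

/* The timer is pending exactly when it is linked into some list. */
int toy_timer_pending(const struct toy_timer *t)
{
    return t->entry.next != NULL;
}

/* Queue the timer at the tail of a list. */
void toy_timer_add(struct toy_timer *t, struct toy_list *head)
{
    assert(!toy_timer_pending(t));
    t->entry.prev = head->prev;
    t->entry.next = head;
    head->prev->next = &t->entry;
    head->prev = &t->entry;
}

/* Unlink the timer and clear entry.next so it no longer reads as pending. */
void toy_timer_detach(struct toy_timer *t)
{
    assert(toy_timer_pending(t));
    t->entry.prev->next = t->entry.next;
    t->entry.next->prev = t->entry.prev;
    t->entry.next = NULL;
}
```

Because the pending state lives in the list linkage, the base pointer is free to serve purely as the lock selector, which is the separation the commit message describes.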
    • [PATCH] blk: unplug later · bdd646a4
      Authored by Nick Piggin
      get_request_wait needn't unplug the device immediately.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      bdd646a4
    • [PATCH] blk: branch hints · fde6ad22
      Authored by Nick Piggin
      Sprinkle around a few branch hints in the block layer.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      fde6ad22
    • [PATCH] blk: no memory barrier · 250dccc0
      Authored by Nick Piggin
      This memory barrier is not needed because the waitqueue will only get waiters
      on it in the following situations:
      
      rq->count has exceeded the threshold - however all manipulations of ->count
      are performed under the runqueue lock, and so we will correctly pick up any
      waiter.
      
      Memory allocation for the request fails.  In this case, there is no additional
      help provided by the memory barrier.  We are guaranteed to eventually wake up
      waiters because the request allocation mempool guarantees that if the mem
      allocation for a request fails, there must be some requests in flight.  They
      will wake up waiters when they are retired.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      250dccc0
    • [PATCH] blk: cleanup generic tag support error messages · 040c928c
      Authored by Tejun Heo
      Add KERN_ERR and __FUNCTION__ to generic tag error messages, and add a comment
      in blk_queue_end_tag() which explains the silent failure path.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Acked-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      040c928c
    • [PATCH] blk: remove BLK_TAGS_{PER_LONG|MASK} · f7d37d02
      Authored by Tejun Heo
      Replace BLK_TAGS_PER_LONG with BITS_PER_LONG and remove unused BLK_TAGS_MASK.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Acked-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f7d37d02
    • [PATCH] blk: remove blk_queue_tag->real_max_depth optimization · fa72b903
      Authored by Tejun Heo
      blk_queue_tag->real_max_depth was used to optimize out unnecessary
      allocations/frees on tag resize.  However, the whole thing was very broken -
      tag_map was never allocated to real_max_depth resulting in access beyond the
      end of the map, bits in [max_depth..real_max_depth] were set when initializing
      a map and copied when resizing resulting in pre-occupied tags.
      
      As the gain of the optimization is very small, well, almost nil, remove the
      whole thing.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Acked-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      fa72b903
    • [PATCH] blk: use find_first_zero_bit() in blk_queue_start_tag() · 2bf0fdad
      Authored by Tejun Heo
      blk_queue_start_tag() hand-coded searching for the first zero bit in the tag
      map.  Replace it with find_first_zero_bit().  With this patch,
      blk_queue_start_tag() doesn't need to fill remains of tag map with 1, thus
      allowing it to work properly with the next remove_real_max_depth patch.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Acked-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2bf0fdad
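The search that the commit above delegates to find_first_zero_bit() can be sketched in portable userspace C. This is a naive bit-by-bit version written for clarity, not the kernel's implementation: `toy_find_first_zero_bit` and `TOY_BITS_PER_LONG` are hypothetical names, and returning `nbits` when no clear bit exists is the convention assumed here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return the index of the first clear bit in map, scanning nbits bits,
 * or nbits if every scanned bit is set.
 */
unsigned long toy_find_first_zero_bit(const unsigned long *map, unsigned long nbits)
{
    unsigned long i;

    assert(map != NULL);
    for (i = 0; i < nbits; i++) {
        unsigned long word = map[i / TOY_BITS_PER_LONG];

        if (!(word & (1UL << (i % TOY_BITS_PER_LONG))))
            return i;
    }
    return nbits;
}
```

Because the scan is bounded by `nbits`, a tag allocator using it only needs to compare the result against its depth limit; it does not have to pre-fill the unused tail of the map with ones.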
    • [PATCH] ptrace_h8300: condition bugfix · 15d20bfd
      Authored by Domen Puncer
      Assignment doesn't make much sense here as condition would always be true.
      Signed-off-by: Domen Puncer <domen@coderock.org>
      Signed-off-by: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      15d20bfd
    • [PATCH] xen: x86_64: use more usermode macro · 76381fee
      Authored by Vincent Hanquez
      Make use of the user_mode macro where it's possible.  This is useful for Xen
      because it will only need to redefine the macro as a hypervisor call.
      Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      76381fee
    • [PATCH] xen: x86_64: Add macro for debugreg · e9129e56
      Authored by Vincent Hanquez
      Add 2 macros to set and get debugreg on x86_64.  This is useful for Xen
      because it will need only to redefine each macro to a hypervisor call.
      Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e9129e56
    • [PATCH] xen: x86: Use more usermode macro · 717b594a
      Authored by Vincent Hanquez
      Use the user_mode macro where it's possible.
      Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      717b594a
    • [PATCH] xen: x86: Rename usermode macro · fa1e1bdf
      Authored by Vincent Hanquez
      Rename user_mode to user_mode_vm and add a user_mode macro similar to the
      x86-64 one.

      This is useful for Xen because the Linux Xen kernel does not run at the same
      privilege level as a vanilla Linux kernel, and with this we just need to
      redefine user_mode().
      Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      fa1e1bdf
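To make the user_mode / user_mode_vm split in the commit above concrete, here is a hedged sketch of what two such macros might look like. The names (`toy_user_mode`, `toy_user_mode_vm`, `struct toy_regs`, `TOY_VM_MASK`) are hypothetical and the definitions are assumptions for illustration, not the kernel's: one macro tests the privilege bits of the saved code segment selector, the other additionally treats virtual-8086 mode as user mode.

```c
#include <assert.h>

#define TOY_VM_MASK 0x00020000UL   /* EFLAGS.VM bit (virtual-8086 mode) */

/* Hypothetical subset of a saved register frame. */
struct toy_regs {
    unsigned long eflags;
    unsigned long xcs;             /* saved code segment selector */
};

/* Non-zero when the saved CS carries a non-zero privilege level. */
#define toy_user_mode(regs)    (((regs)->xcs & 3UL) != 0)

/* As above, but a frame saved in virtual-8086 mode also counts as user mode. */
#define toy_user_mode_vm(regs) \
    ((((regs)->eflags & TOY_VM_MASK) != 0) || toy_user_mode(regs))
```

Keeping the privilege test behind one macro is what would let a port that runs the kernel at a different privilege level change the test in a single place.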
    • [PATCH] xen: x86: Use new macro for debugreg · 1cc6f12e
      Authored by Vincent Hanquez
      Make use of the 2 new macros set_debugreg and get_debugreg.
      Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1cc6f12e
    • [PATCH] xen: x86: add macro for debugreg · f5012310
      Authored by Vincent Hanquez
      Add 2 macros to set and get debugreg on x86.  This is useful for Xen because
      it will need only to redefine each macro to a hypervisor call.
      Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f5012310
    • [PATCH] x86_64: avoid wasting IRQs · 701067c4
      Authored by Natalie Protasevich
      I suggest changing the way IRQs are handed out to PCI devices.
      
      Currently, each I/O APIC pin gets associated with an IRQ, no matter if the
      pin is used or not.  It is expected that each pin can potentially be
      engaged by a device inserted into the corresponding PCI slot.  However,
      this imposes a severe limitation on systems whose designs employ many
      I/O APICs while utilizing only a couple of lines of each, such as those
      based on the P64H2 chipset.

      It is used in ES7000, and currently, there is no way to boot the system
      with more than 9 I/O APICs.

      The simple change below allows booting a system with, say, 64 (or more)
      I/O APICs, each providing 1 slot, which is otherwise impossible because of
      the IRQ gaps created for unused lines on each I/O APIC.  It does not
      resolve the problem of the number of devices exceeding the number of
      possible IRQs, but it eases the pressure on IRQs on any large system with
      a potentially large number of devices.
      
      I only implemented this for the ACPI boot, since if the system is this big
      and using newer chipsets it is probably (better be!) an ACPI based system
      :).  The change is completely "mechanical" and does not alter any internal
      structures or interrupt model/implementation.  The patch works for both
      i386 and x86_64 archs.  It works with MSIs just fine, and should not
      interfere with implementations like shared vectors, when they get worked
      out and incorporated.
      
      To illustrate, below is the interrupt distribution for 2-cell ES7000 with
      20 I/O APICs, and an Ethernet card in the last slot, which should be eth1
      and which was not configured because its IRQ exceeded allowable number (it
      actually turned out huge - 480!):
      
      zorro-tb2:~ # cat /proc/interrupts
                 CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
        0:      65716      30012      30007      30002      30009      30010      30010      30010    IO-APIC-edge  timer
        4:        373          0        725        280          0          0          0          0    IO-APIC-edge  serial
        8:          0          0          0          0          0          0          0          0    IO-APIC-edge  rtc
        9:          0          0          0          0          0          0          0          0   IO-APIC-level  acpi
       14:         39          3          0          0          0          0          0          0    IO-APIC-edge  ide0
       16:        108         13          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb1
       18:          0          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb3
       19:         15          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
       23:          3          0          0          0          0          0          0          0   IO-APIC-level  ehci_hcd:usb4
       96:       4240        397         18          0          0          0          0          0   IO-APIC-level  aic7xxx
       97:         15          0          0          0          0          0          0          0   IO-APIC-level  aic7xxx
      192:        847          0          0          0          0          0          0          0   IO-APIC-level  eth0
      NMI:          0          0          0          0          0          0          0          0
      LOC:     273423     274528     272829     274228     274092     273761     273827     273694
      ERR:          7
      MIS:          0
      
      Even though the system doesn't have that many devices, some don't get
      enabled only because of IRQ numbering model.
      
      This is the IRQ picture after the patch was applied:
      
      zorro-tb2:~ # cat /proc/interrupts
                 CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
        0:      44169      10004      10004      10001      10004      10003      10004       6135    IO-APIC-edge  timer
        4:        345          0          0          0          0        244          0          0    IO-APIC-edge  serial
        8:          0          0          0          0          0          0          0          0    IO-APIC-edge  rtc
        9:          0          0          0          0          0          0          0          0   IO-APIC-level  acpi
       14:         39          0          3          0          0          0          0          0    IO-APIC-edge  ide0
       17:       4425          0          9          0          0          0          0          0   IO-APIC-level  aic7xxx
       18:         15          0          0          0          0          0          0          0   IO-APIC-level  aic7xxx, uhci_hcd:usb3
       21:        231          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb1
       22:         26          0          0          0          0          0          0          0   IO-APIC-level  uhci_hcd:usb2
       23:          3          0          0          0          0          0          0          0   IO-APIC-level  ehci_hcd:usb4
       24:        348          0          0          0          0          0          0          0   IO-APIC-level  eth0
       25:          6        192          0          0          0          0          0          0   IO-APIC-level  eth1
      NMI:          0          0          0          0          0          0          0          0
      LOC:     107981     107636     108899     108698     108489     108326     108331     108254
      ERR:          7
      MIS:          0
      
      Not only do we see the card in the last I/O APIC, but we are not even close to
      using up available IRQs, since we didn't waste any.
      Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
      Acked-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      701067c4
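The numbering idea in the commit above, handing out an IRQ only when a pin is actually used, can be sketched as a small lookup table. This is an illustration under stated assumptions, not the patch itself: `toy_irq_for_gsi`, `TOY_MAX_GSI` and `TOY_LEGACY_IRQS` are hypothetical, and keeping the first 16 IRQs identity-mapped is an assumption of this sketch.

```c
#include <assert.h>

#define TOY_MAX_GSI     512   /* hypothetical upper bound on pin numbers */
#define TOY_LEGACY_IRQS 16    /* IRQs 0-15 keep their identity mapping in this sketch */

static int toy_gsi_to_irq[TOY_MAX_GSI];      /* 0 means "not assigned yet" */
static int toy_next_irq = TOY_LEGACY_IRQS;   /* next IRQ number to hand out */

/*
 * Return the IRQ for an I/O APIC pin, assigning the next free IRQ the first
 * time a device actually uses that pin. Unused pins never consume a number.
 */
int toy_irq_for_gsi(int gsi)
{
    assert(gsi >= 0 && gsi < TOY_MAX_GSI);
    if (gsi < TOY_LEGACY_IRQS)
        return gsi;
    if (toy_gsi_to_irq[gsi] == 0)
        toy_gsi_to_irq[gsi] = toy_next_irq++;
    return toy_gsi_to_irq[gsi];
}
```

With this scheme a device on a pin numbered 480, like the unconfigured Ethernet card in the commit message, gets a small IRQ number, because only pins that are in use take up slots.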
    • [PATCH] eliminate duplicate rdpmc definition · 32ecd42b
      Authored by Jan Beulich
      Eliminate duplicate definition of rdpmc in x86-64's mtrr.h.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      32ecd42b
    • [PATCH] x86_64: never block forced SIGSEGV · 0928d6ef
      Authored by Roland McGrath
      This is the x86_64 version of the signal fix I just posted for i386.
      
      This problem was first noticed on PPC and has already been fixed there.
      But the exact same issue applies to other platforms in the same way.  The
      signal blocking for sa_mask and the handled signal takes place after the
      handler setup.  When the stack is bogus, the handler setup forces a
      SIGSEGV.  But then this will be blocked, and returning to user mode will
      fault again and iterate.  This patch fixes the problem by checking whether
      signal handler setup failed, and not doing the signal-blocking if so.  This
      copies what was done in the ppc code.  I think all architectures' signal
      handler setup code follows this pattern and needs the change.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0928d6ef
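The control-flow change described in the commit above (skip the signal blocking when handler setup failed) can be shown with a toy model. All names are hypothetical (`struct toy_task`, `toy_setup_frame`, `toy_handle_signal`) and signal sets are reduced to a bitmask; the sketch only demonstrates the ordering fix, not real signal delivery.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SIGSEGV 11

/* Hypothetical per-task signal state, reduced to bitmasks. */
struct toy_task {
    unsigned long blocked;         /* bit n-1 set => signal n is blocked */
    unsigned long pending_forced;  /* signals forced on the task */
};

static unsigned long toy_sigmask(int sig)
{
    return 1UL << (sig - 1);
}

/* Hypothetical frame setup: fails and forces SIGSEGV when the stack is unusable. */
static int toy_setup_frame(struct toy_task *t, int stack_ok)
{
    if (!stack_ok) {
        t->pending_forced |= toy_sigmask(TOY_SIGSEGV);
        return -1;
    }
    return 0;
}

/* Deliver a signal; block sa_mask and the signal itself only if setup succeeded. */
int toy_handle_signal(struct toy_task *t, int sig, unsigned long sa_mask, int stack_ok)
{
    int ret;

    assert(t != NULL && sig >= 1 && sig <= 32);
    ret = toy_setup_frame(t, stack_ok);
    if (ret == 0)
        t->blocked |= sa_mask | toy_sigmask(sig);
    return ret;
}
```

If the blocking were applied unconditionally, a SIGSEGV handler invoked on a bogus stack would leave the forced SIGSEGV blocked, which is the fault loop the commit message describes.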
    • [PATCH] x86_64: fix hpet for systems that don't support legacy replacement · a3a00751
      Authored by john stultz
      Currently the x86-64 HPET code assumes the entire HPET implementation from
      the spec is present.  This breaks on boxes that do not implement the
      optional legacy timer replacement functionality portion of the spec.
      
      This patch fixes this issue, allowing x86-64 systems that cannot use the
      HPET for the timer interrupt and RTC to still use the HPET as a time
      source.  I've tested this patch on systems without HPET, with HPET
      but without legacy timer replacement, as well as HPET with legacy timer
      replacement.
      
      This version adds a minor check to cap the HPET counter value in
      gettimeoffset_hpet to avoid possible time inconsistencies.  Please ignore
      the A2 version I sent to you earlier.
      Acked-by: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a3a00751
    • A
      [PATCH] x86_64: i8259.c iso99 structure initialization · c0a88c98
      Alexander Nyberg committed
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c0a88c98
    • A
      [PATCH] mtrr size-and-base debugging · c92c6ffd
      Andrew Morton committed
      Consolidate the mtrr sanity checking, add a dump_stack().
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c92c6ffd
    • A
      [PATCH] x86: cpu_khz type fix · a3a255e7
      Andrew Morton committed
      x86_64's cpu_khz is unsigned int and there is no reason why x86 needs to use
      unsigned long.
      
      So make cpu_khz unsigned int on x86 as well.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a3a255e7
    • A
      [PATCH] Remove i386_ksyms.c, almost. · 129f6946
      Alexey Dobriyan committed
      * EXPORT_SYMBOL's moved to other files
      * #include <linux/config.h>, <linux/module.h> where needed
      * #include's in i386_ksyms.c cleaned up
      * After copy-paste, preprocessor directives made redundant by Makefile
        rules were removed:
      
      	#ifdef CONFIG_FOO
      	EXPORT_SYMBOL(foo);
      	#endif
      
      	obj-$(CONFIG_FOO) += foo.o
      
      * Tiny reformat to fit in 80 columns
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      129f6946
    • A
      [PATCH] x86: #include asm/uaccess.h in asm/checksum.h · a9ed8817
      Alexey Dobriyan committed
      csum_and_copy_to_user is static inline and uses VERIFY_WRITE.  This patch
      allows asm/uaccess.h to be removed from i386_ksyms.c without dependency
      surprises.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a9ed8817
    • A
      [PATCH] VIA 82C586B IRQ routing fix · 80bb82af
      Aleksey Gorelov committed
      According to the VIA 82C586B datasheet (still available from
      http://gkernel.sourceforge.net/specs/via/586b.pdf.bz2) this chip needs a
      special PIRQ mapping.
      Signed-off-by: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Aleksey Gorelov <aleksey_gorelov@phoenix.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      80bb82af
    • N
      [PATCH] x86: avoid wasting IRQs for PCI devices · c434b7a6
      Natalie Protasevich committed
      I have submitted the patch for x86_64; this is the submission for i386.
      
      The patch changes the way IRQs are handed out to PCI devices.  Currently,
      each I/O APIC pin gets associated with an IRQ, no matter if the pin is used
      or not.  This imposes a severe limitation on systems whose designs employ
      many I/O APICs but utilize only a couple of lines on each, such as those
      using the P64H2 chipset.  It is used in the ES7000, and currently there is
      no way to boot that system with more than 9 I/O APICs.
      
      The simple change below allows booting a system with, say, 64 (or more) I/O
      APICs, each providing 1 slot, which is otherwise impossible because of the
      IRQ gaps created for unused lines on each I/O APIC.  It does not resolve the
      problem of the number of devices exceeding the number of possible IRQs, but
      it eases the pressure on IRQs on any large system with a potentially large
      number of devices.
      Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c434b7a6
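
The difference between the two allocation schemes in this commit can be illustrated with a small counting model. This is a sketch under stated assumptions, not the patch itself: `struct ioapic_model` and both functions are hypothetical, each I/O APIC is assumed to have at most 32 pins, and "used" simply means a bit is set in a mask. The first function counts one IRQ per pin (the behaviour described as current), the second counts IRQs only for pins that are actually wired.

```c
#include <stddef.h>

/* Hypothetical model of one I/O APIC (assumes at most 32 pins). */
struct ioapic_model {
    unsigned int npins;       /* total input pins on this I/O APIC */
    unsigned long used_mask;  /* bit n set => pin n has a device attached */
};

/* Per-pin scheme: every pin consumes an IRQ number, used or not. */
unsigned int irqs_needed_per_pin(const struct ioapic_model *apics, size_t n)
{
    unsigned int total = 0;

    for (size_t i = 0; i < n; i++)
        total += apics[i].npins;
    return total;
}

/* Compact scheme: only pins that are actually in use consume an IRQ number. */
unsigned int irqs_needed_compact(const struct ioapic_model *apics, size_t n)
{
    unsigned int total = 0;

    for (size_t i = 0; i < n; i++)
        for (unsigned int pin = 0; pin < apics[i].npins; pin++)
            if (apics[i].used_mask & (1UL << pin))
                total++;
    return total;
}
```

With 64 modelled I/O APICs of 24 pins each (the pin count is an assumption) and one used pin apiece, the per-pin scheme needs 1536 IRQ numbers while the compact scheme needs 64.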
    • C
      [PATCH] ia64: Selectable Timer Interrupt Frequency · b5d23e5b
      Christoph Lameter committed
      It allows a selectable timer interrupt frequency of 100, 250 and 1000 HZ.
      Reducing the timer frequency may have important performance benefits on
      large systems.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b5d23e5b
    • C
      [PATCH] i386: Selectable Frequency of the Timer Interrupt · 59121003
      Christoph Lameter committed
      Make the timer frequency selectable. The timer interrupt may cause bus
      and memory contention in large NUMA systems since the interrupt occurs
      on each processor HZ times per second.
      Signed-off-by: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: Shai Fultheim <shai@scalex86.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      59121003
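
The cost argument in this commit is simple arithmetic: the timer interrupt fires HZ times per second on every processor, so the machine-wide rate grows with both HZ and the CPU count. The helpers below are a hypothetical sketch of that calculation (the function names are made up, and the HZ values 100, 250 and 1000 mentioned afterwards are only illustrative, since this message does not list the selectable values).

```c
/* Timer interrupts raised across the whole machine each second. */
unsigned long timer_irqs_per_second(unsigned int hz, unsigned int ncpus)
{
    return (unsigned long)hz * ncpus;
}

/* Length of one tick in microseconds for a given HZ (hz must be non-zero). */
unsigned int tick_period_us(unsigned int hz)
{
    return 1000000u / hz;
}
```

For a 512-processor system, dropping HZ from 1000 to 100 reduces the modelled rate from 512000 to 51200 timer interrupts per second, at the price of a tick that lasts 10000 microseconds instead of 1000.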
    • J
      [PATCH] allow early printk to use more than 25 lines · 799d19f6
      Jan Beulich committed
      Allow early printk code to take advantage of the full size of the screen, not
      just the first 25 lines.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Acked-by: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      799d19f6