1. 26 Oct 2005, 1 commit
  2. 24 Oct 2005, 1 commit
    • [PATCH] inotify/idr leak fix · 8d3b3591
      Authored by Andrew Morton
      Fix a bug which was reported and diagnosed by
      Stefan Jones <stefan.jones@churchillrandoms.co.uk>
      
      IDR trees include a cache of idr_layer objects.  There's no way to destroy
      this cache, so when we discard an overall idr tree we end up leaking some
      memory.
      
      Add and use idr_destroy() for this.  v9fs and infiniband also need to use
      idr_destroy() to avoid leaks.
      
      Alternatively, we could make the cache global, like radix_tree_preload(),
      which is probably better.  Later.
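      
      A minimal sketch of the intended lifecycle (shown with the modern
      idr_alloc() interface for illustration; the 2005-era API used
      idr_get_new(), but the point is only where idr_destroy() fits):
      
      #include <linux/idr.h>
      
      static DEFINE_IDR(my_idr);
      
      static int example(void *obj)
      {
      	int id = idr_alloc(&my_idr, obj, 0, 0, GFP_KERNEL);
      	if (id < 0)
      		return id;
      	idr_remove(&my_idr, id);  /* frees the slot, not the idr_layer cache */
      	idr_destroy(&my_idr);     /* releases the cached idr_layer objects too */
      	return 0;
      }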
      
      Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org>
      Cc: Roland Dreier <rolandd@cisco.com>
      Cc: Robert Love <rml@novell.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  3. 21 Oct 2005, 1 commit
  4. 20 Oct 2005, 2 commits
    • [PATCH] swiotlb: make sure initial DMA allocations really are in DMA memory · 281dd25c
      Authored by Yasunori Goto
      This introduces a limit parameter to the core bootmem allocator; the new
      parameter indicates that physical memory allocated by the bootmem
      allocator should be within the requested limit.
      
      We also introduce the alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit,
      and alloc_bootmem_low_pages_node_limit APIs, but alloc_bootmem_low_pages_limit
      is the only one used for swiotlb.
      
      The existing alloc_bootmem_low_pages() API could instead have been
      changed to pass the right limit to the core allocator.  But that
      would make the patch more intrusive for 2.6.14, as other arches use
      alloc_bootmem_low_pages().  We may do that post-2.6.14 as a
      cleanup.
      
      With this, swiotlb gets memory within 4G for both x86_64 and ia64
      arches.
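      
      A sketch of the new entry point's shape (the function names come from
      this changelog, but the body and the __alloc_bootmem_limit() core
      helper are illustrative assumptions, not the actual mm/bootmem.c code):
      
      void * __init alloc_bootmem_low_pages_limit(unsigned long size,
      					    unsigned long limit)
      {
      	/* page-aligned low memory whose physical address stays below limit */
      	return __alloc_bootmem_limit(size, PAGE_SIZE, 0, limit);
      }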
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Handle spurious page fault for hugetlb region · 3359b54c
      Authored by Rohit Seth
      The hugetlb pages are currently pre-faulted.  At the time of mmap of
      hugepages, we populate the new PTEs.  It is possible that HW has already
      cached some of the unused PTEs internally.  These stale entries never
      get a chance to be purged in existing control flow.
      
      This patch extends the check in the page fault code for hugepages.  Check
      whether a faulted address falls within the size of the hugetlb file backing
      it.  We return VM_FAULT_MINOR for these cases (assuming that the arch-specific
      page-faulting code purges the stale entry on the archs that
      need it).
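      
      An illustrative sketch of that check (shape only; hugetlb_valid_address()
      is a hypothetical helper standing in for the size test against the
      backing file):
      
      if (is_vm_hugetlb_page(vma)) {
      	/* The region is prefaulted, so a fault here means the hardware
      	 * had a stale translation cached. */
      	if (hugetlb_valid_address(vma, address))
      		return VM_FAULT_MINOR;	/* arch code purges the stale entry */
      	return VM_FAULT_SIGBUS;
      }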
      Signed-off-by: Rohit Seth <rohit.seth@intel.com>
      
      [ This is apparently arguably an ia64 port bug. But the code won't
        hurt, and for now it fixes a real problem on some ia64 machines ]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  5. 18 Oct 2005, 2 commits
  6. 17 Oct 2005, 1 commit
    • [PATCH] list: add missing rcu_dereference on first element · b24d18aa
      Authored by Herbert Xu
      It seems that all the list_*_rcu primitives are missing a memory barrier
      on the very first dereference.  For example,
      
      #define list_for_each_rcu(pos, head) \
      	for (pos = (head)->next; prefetch(pos->next), pos != (head); \
      		pos = rcu_dereference(pos->next))
      
      It will go something like:
      
      	pos = (head)->next
      
      	prefetch(pos->next)
      
      	pos != (head)
      
      	do stuff
      
      We're missing a barrier here.
      
      	pos = rcu_dereference(pos->next)
      
      		fetch pos->next
      
      		barrier given by rcu_dereference(pos->next)
      
      		store pos
      
      Without the missing barrier, the pos->next value may turn out to be stale.
      In fact, if "do stuff" were also dereferencing pos and relying on
      list_for_each_rcu to provide the barrier then it may also break.
      
      So here is a patch to make sure that we have a barrier for the first
      element in the list.
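      
      The shape of the fix (illustrative; the in-tree macros differ in detail
      across the list_*_rcu family) is to fetch the head's first pointer
      through rcu_dereference() as well:
      
      #define list_for_each_rcu(pos, head) \
      	for (pos = rcu_dereference((head)->next); \
      	     prefetch(pos->next), pos != (head); \
      	     pos = rcu_dereference(pos->next))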
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  7. 16 Oct 2005, 1 commit
    • [PATCH]: highest_possible_processor_id() has to be a macro · 688ce17b
      Authored by Al Viro
      	... otherwise, things like alpha and sparc64 break and break
      badly.  They define cpu_possible_map to something else in smp.h
      *AFTER* having included cpumask.h.
      
      	If that puppy is a macro, expansion will happen at the actual
      caller, when we'd already seen #define cpu_possible_map ... and we will
      get the right thing used.
      
      	As an inline helper it will be tokenized before we get to that
      define and that's it; no matter what we define later, it won't affect
      anything.  We get modules with dependency on cpu_possible_map instead
      of the right symbol (phys_cpu_present_map in case of sparc64), or outright
      link errors if they are built-in.
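      
      A toy userspace demonstration of the expansion-order difference (plain
      C with simplified names, not kernel code):
      
      #include <stdio.h>
      
      int cpu_possible_map = 1;      /* generic definition, as in cpumask.h */
      int phys_cpu_present_map = 2;  /* what sparc64 actually wants used */
      
      /* both helpers are defined before the arch header is seen */
      #define highest_macro()  (cpu_possible_map)
      static inline int highest_inline(void) { return cpu_possible_map; }
      
      /* arch smp.h, included afterwards, redirects the name */
      #define cpu_possible_map phys_cpu_present_map
      
      int main(void)
      {
      	printf("%d\n", highest_macro());   /* 2: expands here, after the #define */
      	printf("%d\n", highest_inline());  /* 1: tokenized before the #define */
      	return 0;
      }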
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 15 Oct 2005, 1 commit
  9. 14 Oct 2005, 1 commit
  10. 13 Oct 2005, 1 commit
  11. 11 Oct 2005, 7 commits
  12. 10 Oct 2005, 1 commit
    • [PATCH] x86_64: Set up safe page tables during resume · 3dd08325
      Authored by Rafael J. Wysocki
      The following patch makes swsusp avoid the possible temporary corruption
      of page translation tables during resume on x86-64.  This is achieved by
      creating a copy of the relevant page tables that will not be modified by
      swsusp and can be safely used by it on resume.
      
      The problem is that during resume on x86-64 swsusp may temporarily
      corrupt the page tables used for the direct mapping of RAM.  If that
      happens, a page fault occurs and cannot be handled properly, which leads
      to the solid hang of the affected system.  This leads to the loss of the
      system's state from before suspend and may result in the loss of data or
      the corruption of filesystems, so it is a serious issue.  Also, it
      appears to happen quite often (for me, as often as 50% of the time).
      
      The problem is related to the fact that (at least) one of the PMD
      entries used in the direct memory mapping (starting at PAGE_OFFSET)
      points to a page table the physical address of which is much greater
      than the physical address of the PMD entry itself.  Moreover,
      unfortunately, the physical address of the page table before suspend
      (i.e.  the one stored in the suspend image) happens to be different to
      the physical address of the corresponding page table used during resume
      (i.e.  the one that is valid right before swsusp_arch_resume() in
      arch/x86_64/kernel/suspend_asm.S is executed).  Thus while the image is
      restored, the "offending" PMD entry gets overwritten, so it does not
      point to the right physical address any more (i.e.  there's no page
      table at the address pointed to by it, because it points to the address
      the page table has been at during suspend).  Consequently, if the PMD
      entry is used later on, and it _is_ used in the process of copying the
      image pages, a page fault occurs, but it cannot be handled in the normal
      way and the system hangs.
      
      In principle we can call create_resume_mapping() from
      swsusp_arch_resume() (i.e.  from suspend_asm.S), but then the memory
      allocations in create_resume_mapping(), resume_pud_mapping(), and
      resume_pmd_mapping() must be made carefully so that we use _only_
      NosaveFree pages in them (the other pages are overwritten by the loop in
      swsusp_arch_resume()).  Additionally, we are in atomic context at that
      time, so we cannot use GFP_KERNEL.  Moreover, if one of the allocations
      fails, we should free all of the allocated pages, so we need to track
      them somehow.
      
      All of this is done in the appended patch, except that the functions
      populating the page tables are located in arch/x86_64/kernel/suspend.c
      rather than in init.c.  It may be done in a more elegant way in the
      future, with the help of some swsusp patches that are in the works now.
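      
      An illustrative pattern for those allocation constraints (every name
      below is a sketch, not the actual suspend.c code): remember each page
      backing the temporary tables so a mid-way failure can unwind, and
      allocate with GFP_ATOMIC rather than GFP_KERNEL:
      
      static void *safe_pages[MAX_SAFE_PAGES];	/* assumed bookkeeping array */
      static int nr_safe_pages;
      
      static void *get_safe_page(gfp_t gfp_mask)	/* called with GFP_ATOMIC */
      {
      	void *page = (void *)get_zeroed_page(gfp_mask);
      	if (page)
      		safe_pages[nr_safe_pages++] = page;
      	return page;
      }
      
      static void free_safe_pages(void)		/* unwind on failure */
      {
      	while (nr_safe_pages)
      		free_page((unsigned long)safe_pages[--nr_safe_pages]);
      }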
      
      [AK: move some externs into headers, renamed a function]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 09 Oct 2005, 2 commits
  14. 07 Oct 2005, 1 commit
  15. 05 Oct 2005, 4 commits
  16. 04 Oct 2005, 3 commits
    • [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl · e5ed6399
      Authored by Herbert Xu
      The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
      introduces __in_dev_get_rcu() to cover the second case.
      
      1) RCU with refcnt should use in_dev_get().
      2) RCU without refcnt should use __in_dev_get_rcu().
      3) All others must hold RTNL and use __in_dev_get_rtnl().
      
      There is one exception in net/ipv4/route.c which is in fact a pre-existing
      race condition.  I've marked it as such so that we remember to fix it.
      
      This patch is based on suggestions and prior work by Suzanne Wood and
      Paul McKenney.
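      
      A usage sketch of the three rules above (illustrative calls only; dev
      is an assumed struct net_device *):
      
      struct in_device *in_dev;
      
      /* 1) RCU with refcnt: the reference may outlive the read-side section */
      in_dev = in_dev_get(dev);
      if (in_dev) {
      	/* ... use in_dev, possibly sleeping ... */
      	in_dev_put(in_dev);
      }
      
      /* 2) RCU without refcnt: pointer valid only inside the read side */
      rcu_read_lock();
      in_dev = __in_dev_get_rcu(dev);
      if (in_dev) {
      	/* ... must not sleep, must not escape the section ... */
      }
      rcu_read_unlock();
      
      /* 3) all others must hold the RTNL */
      ASSERT_RTNL();
      in_dev = __in_dev_get_rtnl(dev);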
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [INET]: speedup inet (tcp/dccp) lookups · 81c3d547
      Authored by Eric Dumazet
      Arnaldo and I agreed it could be applied now, because I have other
      pending patches depending on this one (Thank you Arnaldo)
      
      (The other important patch moves skc_refcnt into a separate cache line,
      so that SMP/NUMA performance doesn't suffer from cache line ping-pong.)
      
      1) First, some performance data:
      --------------------------------
      
      tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()
      
      The most time-critical code is:
      
      sk_for_each(sk, node, &head->chain) {
           if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
               goto hit; /* You sunk my battleship! */
      }
      
      The sk_for_each() does use prefetch() hints, but only the beginning of
      "struct sock" is prefetched.
      
      As INET_MATCH()'s first comparison uses inet_sk(__sk)->daddr, which is
      far from the beginning of "struct sock", it has to bring a cold cache
      line into the CPU cache.  Each iteration has to use at least 2 cache
      lines.
      
      This can be problematic if some chains are very long.
      
      2) The goal
      -----------
      
      The idea is to change things so that INET_MATCH() can return
      FALSE in 99% of cases using only the data already in the CPU cache,
      touching one cache line per iteration.
      
      3) Description of the patch
      ---------------------------
      
      Add a new 'unsigned int skc_hash' field in 'struct sock_common',
      filling a 32-bit hole on 64-bit platforms.
      
      struct sock_common {
      	unsigned short		skc_family;
      	volatile unsigned char	skc_state;
      	unsigned char		skc_reuse;
      	int			skc_bound_dev_if;
      	struct hlist_node	skc_node;
      	struct hlist_node	skc_bind_node;
      	atomic_t		skc_refcnt;
      +	unsigned int		skc_hash;
      	struct proto		*skc_prot;
      };
      
      Store in this 32-bit field the full hash, not masked by (ehash_size -
      1).  Using this full hash as the first comparison done in INET_MATCH()
      lets us immediately skip a non-matching element without touching a
      second cache line in case of a miss.
      
      Remove the sk_hashent/tw_hashent fields, since skc_hash (aliased to
      sk_hash and tw_hash) already contains the slot number if we mask with
      (ehash_size - 1).
      
      File include/net/inet_hashtables.h
      
      64-bit platforms:
      #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
           (((__sk)->sk_hash == (__hash))                         &&  \
           ((*((__u64 *)&(inet_sk(__sk)->daddr))) == (__cookie))  &&  \
           ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
           (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
      
      32-bit platforms:
      #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
           (((__sk)->sk_hash == (__hash))                 &&  \
           (inet_sk(__sk)->daddr          == (__saddr))   &&  \
           (inet_sk(__sk)->rcv_saddr      == (__daddr))   &&  \
           (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
      
      
      - Add a prefetch(head->chain.first) in
      __inet_lookup_established()/__tcp_v4_check_established(),
      __inet6_lookup_established()/__tcp_v6_check_established(), and
      __dccp_v4_check_established() to bring the first element of the
      list into cache before taking the {read|write}_lock(&head->lock).
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Fix packet timestamping. · 325ed823
      Authored by Herbert Xu
      I've found the problem in general.  It affects any 64-bit
      architecture.  The problem occurs when you change the system time.
      
      Suppose that when you boot, your system clock is ahead by a day.
      This gets recorded in skb_tv_base.  You then wind the clock back
      by a day.  From that point onwards the offset will be negative, which
      essentially overflows the 32-bit variables it is stored in.
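      
      A toy userspace illustration of that wraparound (plain C, not the
      kernel code; the numbers are made up):
      
      #include <stdio.h>
      #include <stdint.h>
      
      int main(void)
      {
      	int64_t skb_tv_base = 86400;	/* clock was a day fast at boot */
      	int64_t now = 0;		/* clock has been wound back */
      	uint32_t tv_sec = (uint32_t)(now - skb_tv_base); /* negative offset wraps */
      	printf("%u\n", tv_sec);		/* ~4.29 billion: a nonsense timestamp */
      	return 0;
      }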
      
      In fact, why don't we just store the real time stamp in those 32-bit
      variables? After all, we're not going to overflow for quite a while
      yet.
      
      When we do overflow, we'll need a better solution of course.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 02 Oct 2005, 1 commit
  18. 01 Oct 2005, 1 commit
    • [PATCH] aio: remove unlocked task_list test and resulting race · 897f15fb
      Authored by Zach Brown
      Only one of the run or kick path is supposed to put an iocb on the run
      list.  If both of them do it, then one of them can end up referencing a
      freed iocb.  The kick path could delete the task_list item from the wait
      queue before getting the ctx_lock and putting the iocb on the run list.
      The run path was testing the task_list item outside the lock so that it
      could catch ki_retry methods that return -EIOCBRETRY *without* putting the
      iocb on a wait queue and promising to call kick_iocb.  This unlocked check
      could then race with the kick path to cause both to try and put the iocb on
      the run list.
      
      The patch stops the run path from testing task_list by requiring that any
      ki_retry that returns -EIOCBRETRY *must* guarantee that kick_iocb() will be
      called in the future.  aio_p{read,write}, the only in-tree -EIOCBRETRY
      users, are updated.
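      
      A sketch of the contract this imposes on retry methods (every
      identifier below is hypothetical; the point is only the pairing of
      -EIOCBRETRY with a guaranteed future kick_iocb()):
      
      static ssize_t my_retry(struct kiocb *iocb)
      {
      	if (!my_data_ready(iocb)) {
      		/* arrange for the wakeup path to call kick_iocb(iocb);
      		 * only then is returning -EIOCBRETRY legal */
      		my_queue_for_wakeup(iocb);
      		return -EIOCBRETRY;
      	}
      	return my_do_transfer(iocb);
      }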
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
      Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  19. 30 Sep 2005, 2 commits
    • Revert task flag re-ordering, add comments · 4a8342d2
      Authored by Linus Torvalds
      Roland points out that the flags end up having non-obvious dependencies
      elsewhere, so revert aa55a086 and add
      some comments about why things are as they are.
      
      We'll just have to fix up the broken comparisons. Roland has a patch.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction · aa55a086
      Authored by Oleg Nesterov
      do_signal_stop:
      
      	for_each_thread(t) {
      		if (t->state < TASK_STOPPED)
      			++sig->group_stop_count;
      	}
      
      However, TASK_NONINTERACTIVE > TASK_STOPPED, so this loop will not
      count TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE threads.
      
      See also wait_task_stopped(), which checks ->state > TASK_STOPPED.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      
      [ We really probably should always use the appropriate bitmasks to test
        task states, not do it like this. Using something like
      
      	#define TASK_RUNNABLE (TASK_RUNNING | TASK_INTERRUPTIBLE | \
      				TASK_UNINTERRUPTIBLE | TASK_NONINTERACTIVE)
      
        and then doing "if (task->state & TASK_RUNNABLE)" or similar. But the
        ordering of the task states is historical, and keeping the ordering
        does make sense regardless. ]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  20. 29 Sep 2005, 1 commit
    • [PATCH] Keys: Add possessor permissions to keys [try #3] · 664cceb0
      Authored by David Howells
      The attached patch adds extra permission grants to keys for the possessor of a
      key in addition to the owner, group and other permissions bits. This makes
      SUID binaries easier to support without going as far as labelling keys and key
      targets using the LSM facilities.
      
      This patch adds a second "pointer type" to key structures (struct key_ref *)
      that can have the bottom bit of the address set to indicate the possession of
      a key. This is propagated through searches from the keyring to the discovered
      key. It has been made a separate type so that the compiler can spot attempts
      to dereference a potentially incorrect pointer.
      
      The "possession" attribute can't be attached to a key structure directly as
      it's not an intrinsic property of a key.
      
      Pointers to keys have been replaced with struct key_ref *'s wherever
      possession information needs to be passed through.
      
      This does assume that the bottom bit of the pointer will always be zero on
      return from kmem_cache_alloc().
      
      The key reference type has been made into a typedef so that at least it can be
      located in the sources, even though it's basically a pointer to an undefined
      type. I've also renamed the accessor functions to be more useful, and all
      reference variables should now end in "_ref".
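      
      The tagged pointer can be expressed with helpers along these lines (a
      sketch of the shape; the real accessors live in include/linux/key.h):
      
      typedef struct __key_reference_with_attributes *key_ref_t;
      
      /* fold the possession attribute into bit 0 of the pointer */
      static inline key_ref_t make_key_ref(const struct key *key,
      				     unsigned long possession)
      {
      	return (key_ref_t)((unsigned long)key | possession);
      }
      
      /* strip the attribute bit to recover the real key pointer */
      static inline struct key *key_ref_to_ptr(const key_ref_t key_ref)
      {
      	return (struct key *)((unsigned long)key_ref & ~1UL);
      }
      
      static inline unsigned long is_key_possessed(const key_ref_t key_ref)
      {
      	return (unsigned long)key_ref & 1UL;
      }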
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  21. 28 Sep 2005, 3 commits
    • [NET]: Fix GCC4 compile error: sysctl in linux/if_ether.h · 2fab35d7
      Authored by Ben Dooks
      The following is generated when compiling a
      recent (2.6.14-rc2-git5) kernel configured for
      ARM, with GCC4. 
      
        CC      init/main.o
      In file included from include/linux/netdevice.h:29,
                       from include/net/sock.h:48,
                       from init/main.c:50:
      include/linux/if_ether.h:114: error: array type has incomplete element type
      
      It seems that if CONFIG_SYSCTL is not set, the
      compiler will throw an error due to the declaration
      of the ether_table[] array.
      
      Attached is a solution to the problem.
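      
      The kind of fix involved, as a sketch (the actual hunk may differ):
      keep the declaration out of sight when the sysctl machinery, and with
      it the complete struct ctl_table type, is absent:
      
      #ifdef CONFIG_SYSCTL
      extern struct ctl_table ether_table[];
      #endif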
      Signed-off-by: Ben Dooks <ben-linux@fluff.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Add Sun Cassini driver. · 1f26dac3
      Authored by David S. Miller
      Written by Adrian Sun (asun@darksunrising.com).
      Ported to 2.6.x by Tom 'spot' Callaway <tcallawa@redhat.com>.
      Further cleaned up and integrated by David S. Miller.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Reorder some hot fields of struct net_device · 9356b8fc
      Authored by Eric Dumazet
      Place them on separate cache lines in SMP to reduce memory bouncing
      between the multiple CPUs accessing the device.
      
           - One part is mostly used on receive path (including
             eth_type_trans()) (poll_list, poll, quota, weight, last_rx,
             dev_addr, broadcast)
      
           - One part is mostly used on queue transmit path (qdisc)
            (queue_lock, qdisc, qdisc_sleeping, qdisc_list, tx_queue_len)
      
           - One part is mostly used on xmit path (device)
            (xmit_lock, xmit_lock_owner, priv, hard_start_xmit, trans_start)
      
      'features' is placed outside of these hot points, in a location that
      may be shared by all CPUs (because it is mostly read).
      
      name_hlist is moved close to name[IFNAMSIZ] to speed up __dev_get_by_name().
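      
      The general pattern, as an illustrative fragment (not the actual
      net_device layout; ____cacheline_aligned_in_smp starts a field on a
      fresh cache line on SMP builds):
      
      struct example_device {
      	/* mostly-read configuration, shareable by all CPUs */
      	unsigned long		features;
      
      	/* receive-path hot fields get their own cache line */
      	struct list_head	poll_list ____cacheline_aligned_in_smp;
      	unsigned long		last_rx;
      
      	/* queue-transmit-path hot fields on another line */
      	spinlock_t		queue_lock ____cacheline_aligned_in_smp;
      	unsigned long		tx_queue_len;
      
      	/* xmit-path hot fields on a third */
      	spinlock_t		xmit_lock ____cacheline_aligned_in_smp;
      	int			xmit_lock_owner;
      };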
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 27 Sep 2005, 2 commits