1. 28 7月, 2009 1 次提交
    • B
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() · 9e1b32ca
      Benjamin Herrenschmidt 提交于
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
      
      Upcoming paches to support the new 64-bit "BookE" powerpc architecture
      will need to have the virtual address corresponding to PTE page when
      freeing it, due to the way the HW table walker works.
      
      Basically, the TLB can be loaded with "large" pages that cover the whole
      virtual space (well, sort-of, half of it actually) represented by a PTE
      page, and which contain an "indirect" bit indicating that this TLB entry
      RPN points to an array of PTEs from which the TLB can then create direct
      entries. Thus, in order to invalidate those when PTE pages are deleted,
      we need the virtual address to pass to tlbilx or tlbivax instructions.
      
      The old trick of sticking it somewhere in the PTE page struct page sucks
      too much, the address is almost readily available in all call sites and
      almost everybody implemets these as macros, so we may as well add the
      argument everywhere. I added it to the pmd and pud variants for consistency.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
      Acked-by: NNick Piggin <npiggin@suse.de>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e1b32ca
  2. 27 7月, 2009 1 次提交
  3. 25 7月, 2009 1 次提交
  4. 24 7月, 2009 1 次提交
  5. 23 7月, 2009 2 次提交
    • A
      of/mdio: Add support function for Ethernet fixed-link property · 24c30dbb
      Anton Vorontsov 提交于
      Fixed-link support is broken for the ucc_eth, gianfar, and fs_enet
      device drivers.  The "OF MDIO rework" patches removed most of the
      support. Instead of re-adding fixed-link stuff to the drivers, this
      patch adds a support function for parsing the fixed-link property
      and obtaining a dummy phy to match.
      
      Note: the dummy phy handling in arch/powerpc is a bit of a hack and
      needs to be reworked.  This function is being added now to solve the
      regression in the Ethernet drivers, but it should be considered a
      temporary measure until the fixed link handling can be reworked.
      Signed-off-by: NAnton Vorontsov <avorontsov@ru.mvista.com>
      Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      24c30dbb
    • P
      perf_counter: PERF_SAMPLE_ID and inherited counters · 7f453c24
      Peter Zijlstra 提交于
      Anton noted that for inherited counters the counter-id as provided by
      PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID
      because each inherited counter gets its own id.
      
      His suggestion was to always return the parent counter id, since that
      is the primary counter id as exposed. However, these inherited
      counters have a unique identifier so that events like
      PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which
      counter gets modified, which is important when trying to normalize the
      sample streams.
      
      This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD,
      which is more useful anyway, since changing periods became a lot more
      common than initially thought -- rendering PERF_EVENT_PERIOD the less
      useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate
      value, since it reports the value used to trigger the overflow,
      whereas PERF_EVENT_PERIOD simply reports the requested period changed,
      which might only take effect on the next cycle).
      
      This still leaves us PERF_EVENT_THROTTLE to consider, but since that
      _should_ be a rare occurrence, and linking it to a primary id is the
      most useful bit to diagnose the problem, we introduce a
      PERF_SAMPLE_STREAM_ID, for those few cases where the full
      reconstruction is important.
      
      [Does change the ABI a little, but I see no other way out]
      Suggested-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1248095846.15751.8781.camel@twins>
      7f453c24
  6. 22 7月, 2009 3 次提交
    • P
      softirq: introduce tasklet_hrtimer infrastructure · 9ba5f005
      Peter Zijlstra 提交于
      commit ca109491 (hrtimer: removing all ur callback modes) moved all
      hrtimer callbacks into hard interrupt context when high resolution
      timers are active. That breaks code which relied on the assumption
      that the callback happens in softirq context.
      
      Provide a generic infrastructure which combines tasklets and hrtimers
      together to provide an in-softirq hrtimer experience.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: torvalds@linux-foundation.org
      Cc: kaber@trash.net
      Cc: David Miller <davem@davemloft.net>
      LKML-Reference: <1248265724.27058.1366.camel@twins>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      9ba5f005
    • E
      inotify: use GFP_NOFS under potential memory pressure · f44aebcc
      Eric Paris 提交于
      inotify can have a watchs removed under filesystem reclaim.
      
      =================================
      [ INFO: inconsistent lock state ]
      2.6.31-rc2 #16
      ---------------------------------
      inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
      khubd/217 [HC0[0]:SC0[0]:HE1:SE1] takes:
       (iprune_mutex){+.+.?.}, at: [<c10ba899>] invalidate_inodes+0x20/0xe3
      {IN-RECLAIM_FS-W} state was registered at:
        [<c10536ab>] __lock_acquire+0x2c9/0xac4
        [<c1053f45>] lock_acquire+0x9f/0xc2
        [<c1308872>] __mutex_lock_common+0x2d/0x323
        [<c1308c00>] mutex_lock_nested+0x2e/0x36
        [<c10ba6ff>] shrink_icache_memory+0x38/0x1b2
        [<c108bfb6>] shrink_slab+0xe2/0x13c
        [<c108c3e1>] kswapd+0x3d1/0x55d
        [<c10449b5>] kthread+0x66/0x6b
        [<c1003fdf>] kernel_thread_helper+0x7/0x10
        [<ffffffff>] 0xffffffff
      
      Two things are needed to fix this.  First we need a method to tell
      fsnotify_create_event() to use GFP_NOFS and second we need to stop using
      one global IN_IGNORED event and allocate them one at a time.  This solves
      current issues with multiple IN_IGNORED on a queue having tail drop
      problems and simplifies the allocations since we don't have to worry about
      two tasks opperating on the IGNORED event concurrently.
      Signed-off-by: NEric Paris <eparis@redhat.com>
      f44aebcc
    • A
      rfkill: remove too-strict __must_check · e56f0975
      Alan Jenkins 提交于
      Some drivers don't need the return value of rfkill_set_hw_state(),
      so it should not be marked as __must_check.
      Signed-off-by: NAlan Jenkins <alan-jenkins@tuffmail.co.uk>
      Acked-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      e56f0975
  7. 21 7月, 2009 1 次提交
    • T
      genirq: Delegate irq affinity setting to the irq thread · 591d2fb0
      Thomas Gleixner 提交于
      irq_set_thread_affinity() calls set_cpus_allowed_ptr() which might
      sleep, but irq_set_thread_affinity() is called with desc->lock held
      and can be called from hard interrupt context as well. The code has
      another bug as it does not hold a ref on the task struct as required
      by set_cpus_allowed_ptr().
      
      Just set the IRQTF_AFFINITY bit in action->thread_flags. The next time
      the thread runs it migrates itself. Solves all of the above problems
      nicely.
      
      Add kerneldoc to irq_set_thread_affinity() while at it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      591d2fb0
  8. 20 7月, 2009 1 次提交
  9. 18 7月, 2009 2 次提交
    • T
      sched: fix nr_uninterruptible accounting of frozen tasks really · 6301cb95
      Thomas Gleixner 提交于
      commit e3c8ca83 (sched: do not count frozen tasks toward load) broke
      the nr_uninterruptible accounting on freeze/thaw. On freeze the task
      is excluded from accounting with a check for (task->flags &
      PF_FROZEN), but that flag is cleared before the task is thawed. So
      while we prevent that the task with state TASK_UNINTERRUPTIBLE
      is accounted to nr_uninterruptible on freeze we decrement
      nr_uninterruptible on thaw.
      
      Use a separate flag which is handled by the freezing task itself. Set
      it before calling the scheduler with TASK_UNINTERRUPTIBLE state and
      clear it after we return from frozen state.
      
      Cc: <stable@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6301cb95
    • T
      vmlinux.lds.h: restructure BSS linker script macros · 04e448d9
      Tim Abbott 提交于
      The BSS section macros in vmlinux.lds.h currently place the .sbss
      input section outside the bounds of [__bss_start, __bss_end].  On all
      architectures except for microblaze that handle both .sbss and
      __bss_start/__bss_end, this is wrong: the .sbss input section is
      within the range [__bss_start, __bss_end].  Relatedly, the example
      code at the top of the file actually has __bss_start/__bss_end defined
      twice; I believe the right fix here is to define them in the
      BSS_SECTION macro but not in the BSS macro.
      
      Another problem with the current macros is that several
      architectures have an ALIGN(4) or some other small number just before
      __bss_stop in their linker scripts.  The BSS_SECTION macro currently
      hardcodes this to 4; while it should really be an argument.  It also
      ignores its sbss_align argument; fix that.
      
      mn10300 is the only user at present of any of the macros touched by
      this patch.  It looks like mn10300 actually was incorrectly converted
      to use the new BSS() macro (the alignment of 4 prior to conversion was
      a __bss_stop alignment, but the argument to the BSS macro is a start
      alignment).  So fix this as well.
      
      I'd like acks from Sam and David on this one.  Also CCing Paul, since
      he has a patch from me which will need to be updated to use
      BSS_SECTION(0, PAGE_SIZE, 4) once this gets merged.
      Signed-off-by: NTim Abbott <tabbott@ksplice.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      04e448d9
  10. 17 7月, 2009 4 次提交
  11. 16 7月, 2009 1 次提交
    • J
      ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() · 43237b54
      Jan Kara 提交于
      Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to
      be a relict from some old days and setting disksize in this function does not
      make much sence. Currently it was set only by ext3_getblk().  Since the
      parameter has some effect only if create == 1, it is easy to check that the
      three callers which end up calling ext3_getblk() with create == 1 (ext3_append,
      ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves.
      Signed-off-by: NJan Kara <jack@suse.cz>
      43237b54
  12. 15 7月, 2009 3 次提交
  13. 13 7月, 2009 6 次提交
    • L
      tracing/events: Move TRACE_SYSTEM outside of include guard · d0b6e04a
      Li Zefan 提交于
      If TRACE_INCLDUE_FILE is defined, <trace/events/TRACE_INCLUDE_FILE.h>
      will be included and compiled, otherwise it will be
      <trace/events/TRACE_SYSTEM.h>
      
      So TRACE_SYSTEM should be defined outside of #if proctection,
      just like TRACE_INCLUDE_FILE.
      
      Imaging this scenario:
      
       #include <trace/events/foo.h>
          -> TRACE_SYSTEM == foo
       ...
       #include <trace/events/bar.h>
          -> TRACE_SYSTEM == bar
       ...
       #define CREATE_TRACE_POINTS
       #include <trace/events/foo.h>
          -> TRACE_SYSTEM == bar !!!
      
      and then bar.h will be included and compiled.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A5A9CF1.2010007@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d0b6e04a
    • R
      USB: usb.h: fix kernel-doc notation · e376bbbb
      Randy Dunlap 提交于
      Fix usb.h kernel-doc warnings:
      
      Warning(include/linux/usb.h:918): Excess struct/union/enum/typedef member 'nodename' description in 'usb_device_driver'
      Warning(include/linux/usb.h:939): No description found for parameter 'nodename'
      Warning(include/linux/usb.h:1219): No description found for parameter 'sg'
      Warning(include/linux/usb.h:1219): No description found for parameter 'num_sgs'
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      e376bbbb
    • G
      Revert "USB: Add Intel Langwell USB OTG Transceiver Drive" · 1a74826f
      Greg Kroah-Hartman 提交于
      This reverts commit 453f7755.
      
      The driver should not have been accepted as the MSRT code is not
      in the main kernel yet, which this depends on.
      
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Hao Wu <hao.wu@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      1a74826f
    • K
      Driver Core: remove BUS_ID_SIZE · 4ead0a2b
      Kay Sievers 提交于
      The name size limit is gone from the driver-core, this is
      the removal of the last left-over.
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      4ead0a2b
    • A
      headers: smp_lock.h redux · 405f5571
      Alexey Dobriyan 提交于
      * Remove smp_lock.h from files which don't need it (including some headers!)
      * Add smp_lock.h to files which do need it
      * Make smp_lock.h include conditional in hardirq.h
        It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT
      
        This will make hardirq.h inclusion cheaper for every PREEMPT=n config
        (which includes allmodconfig/allyesconfig, BTW)
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      405f5571
    • J
      personality: fix PER_CLEAR_ON_SETID · f9fabcb5
      Julien Tinnes 提交于
      We have found that the current PER_CLEAR_ON_SETID mask on Linux doesn't
      include neither ADDR_COMPAT_LAYOUT, nor MMAP_PAGE_ZERO.
      
      The current mask is READ_IMPLIES_EXEC|ADDR_NO_RANDOMIZE.
      
      We believe it is important to add MMAP_PAGE_ZERO, because by using this
      personality it is possible to have the first page mapped inside a
      process running as setuid root.  This could be used in those scenarios:
      
       - Exploiting a NULL pointer dereference issue in a setuid root binary
       - Bypassing the mmap_min_addr restrictions of the Linux kernel: by
         running a setuid binary that would drop privileges before giving us
         control back (for instance by loading a user-supplied library), we
         could get the first page mapped in a process we control.  By further
         using mremap and mprotect on this mapping, we can then completely
         bypass the mmap_min_addr restrictions.
      
      Less importantly, we believe ADDR_COMPAT_LAYOUT should also be added
      since on x86 32bits it will in practice disable most of the address
      space layout randomization (only the stack will remain randomized).
      Signed-off-by: NJulien Tinnes <jt@cr0.org>
      Signed-off-by: NTavis Ormandy <taviso@sdf.lonestar.org>
      Cc: stable@kernel.org
      Acked-by: NChristoph Hellwig <hch@infradead.org>
      Acked-by: NKees Cook <kees@ubuntu.com>
      Acked-by: NEugene Teo <eugene@redhat.com>
      [ Shortened lines and fixed whitespace as per Christophs' suggestion ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f9fabcb5
  14. 12 7月, 2009 1 次提交
  15. 11 7月, 2009 6 次提交
  16. 10 7月, 2009 3 次提交
    • T
      hrtimer: Fix migration expiry check · 6ff7041d
      Thomas Gleixner 提交于
      The timer migration expiry check should prevent the migration of a
      timer to another CPU when the timer expires before the next event is
      scheduled on the other CPU. Migrating the timer might delay it because
      we can not reprogram the clock event device on the other CPU. But the
      code implementing that check has two flaws:
      
      - for !HIGHRES the check compares the expiry value with the clock
        events device expiry value which is wrong for CLOCK_REALTIME based
        timers.
      
      - the check is racy. It holds the hrtimer base lock of the target CPU,
        but the clock event device expiry value can be modified
        nevertheless, e.g. by an timer interrupt firing.
      
      The !HIGHRES case is easy to fix as we can enqueue the timer on the
      cpu which was selected by the load balancer. It runs the idle
      balancing code once per jiffy anyway. So the maximum delay for the
      timer is the same as when we keep the tick on the current cpu going.
      
      In the HIGHRES case we can get the next expiry value from the hrtimer
      cpu_base of the target CPU and serialize the update with the cpu_base
      lock. This moves the lock section in hrtimer_interrupt() so we can set
      next_event to KTIME_MAX while we are handling the expired timers and
      set it to the next expiry value after we handled the timers under the
      base lock. While the expired timers are processed timer migration is
      blocked because the expiry time of the timer is always <= KTIME_MAX.
      
      Also remove the now useless clockevents_get_next_event() function.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6ff7041d
    • J
      memory barrier: adding smp_mb__after_lock · ad462769
      Jiri Olsa 提交于
      Adding smp_mb__after_lock define to be used as a smp_mb call after
      a lock.
      
      Making it nop for x86, since {read|write|spin}_lock() on x86 are
      full memory barriers.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad462769
    • J
      net: adding memory barrier to the poll and receive callbacks · a57de0b4
      Jiri Olsa 提交于
      Adding memory barrier after the poll_wait function, paired with
      receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper
      to wrap the memory barrier.
      
      Without the memory barrier, following race can happen.
      The race fires, when following code paths meet, and the tp->rcv_nxt
      and __add_wait_queue updates stay in CPU caches.
      
      CPU1                         CPU2
      
      sys_select                   receive packet
        ...                        ...
        __add_wait_queue           update tp->rcv_nxt
        ...                        ...
        tp->rcv_nxt check          sock_def_readable
        ...                        {
        schedule                      ...
                                      if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                              wake_up_interruptible(sk->sk_sleep)
                                      ...
                                   }
      
      If there was no cache the code would work ok, since the wait_queue and
      rcv_nxt are opposit to each other.
      
      Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
      passed the tp->rcv_nxt check and sleeps, or will get the new value for
      tp->rcv_nxt and will return with new data mask.
      In both cases the process (CPU1) is being added to the wait queue, so the
      waitqueue_active (CPU2) call cannot miss and will wake up CPU1.
      
      The bad case is when the __add_wait_queue changes done by CPU1 stay in its
      cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1 will then
      endup calling schedule and sleep forever if there are no more data on the
      socket.
      
      Calls to poll_wait in following modules were ommited:
      	net/bluetooth/af_bluetooth.c
      	net/irda/af_irda.c
      	net/irda/irnet/irnet_ppp.c
      	net/mac80211/rc80211_pid_debugfs.c
      	net/phonet/socket.c
      	net/rds/af_rds.c
      	net/rfkill/core.c
      	net/sunrpc/cache.c
      	net/sunrpc/rpc_pipe.c
      	net/tipc/socket.c
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a57de0b4
  17. 09 7月, 2009 3 次提交