1. 07 Mar, 2012 1 commit
    • tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una · 4648dc97
      Committed by Neal Cardwell
      This commit fixes tcp_shift_skb_data() so that it does not shift
      SACKed data below snd_una.
      
      This fixes an issue whose symptoms exactly match reports showing
      tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
      net/ipv4/tcp_input.c:3418" thread on netdev).
      
      Since 2008 (832d11c5)
      tcp_shift_skb_data() had been shifting SACKed ranges that were below
      snd_una. It checked that the *end* of the skb it was about to shift
      from was above snd_una, but did not check that the end of the actual
      shifted range was above snd_una; this commit adds that check.
      
      Shifting SACKed ranges below snd_una is problematic because for such
      ranges tcp_sacktag_one() short-circuits: it does not declare anything
      as SACKed and does not increase sacked_out.
      
      Before the fixes in commits cc9a672e
      and daef52ba, shifting SACKed ranges
      below snd_una happened to work because tcp_shifted_skb() was always
      (incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
      tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
      tcp_sacktag_one() never short-circuited and always increased
      tp->sacked_out in this case.
      
      After those two fixes, my testing has verified that shifting SACKed
      ranges below snd_una could cause tp->sacked_out to go negative with
      the following sequence of events:
      
      (1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
          then shifts a prefix of that skb that is below snd_una
      
      (2) tcp_shifted_skb() increments the packet count of the
          already-SACKed prev sk_buff
      
      (3) tcp_sacktag_one() sees the end of the new SACKed range is below
          snd_una, so it short-circuits and doesn't increase tp->sacked_out
      
      (5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
          decrements tp->sacked_out by this "inflated" pcount that was
          missing a matching increase in tp->sacked_out, and hence
          tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which, cast
          to s32, is negative.
      
      (6) this leads to the warnings seen in the recent "WARNING: at
          net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
          tcp_input.c:3418  WARN_ON((int)tp->sacked_out < 0);
      
      More generally, I think this bug can be tickled in some cases where
      two or more ACKs from the receiver are lost and then a DSACK arrives
      that is immediately above an existing SACKed skb in the write queue.
      
      This fix changes tcp_shift_skb_data() to abort this sequence at step
      (1) in the scenario above by noticing that the bytes are below snd_una
      and not shifting them.
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
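The underflow described in the tcp_shift_skb_data() commit above can be illustrated with a small userspace sketch. It is not the kernel code: `seq_after()`, `may_shift_range()` and `sacked_out_after_ack()` are hypothetical names chosen for illustration, and the guard only mirrors the idea stated in the commit message (check that the end of the shifted range is above snd_una).

```c
#include <stdint.h>
#include <stdbool.h>

/* Wraparound-safe sequence comparison in the style of the kernel's
 * before()/after() helpers: true if seq1 is later than seq2.
 * Relies on the usual two's-complement conversion to int32_t. */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq2 - seq1) < 0;
}

/* Hypothetical guard: allow a shift only when the end of the range that
 * would actually be shifted lies above snd_una. */
static bool may_shift_range(uint32_t shift_end_seq, uint32_t snd_una)
{
    return seq_after(shift_end_seq, snd_una);
}

/* Models step (5): subtracting a pcount that was never added makes the
 * unsigned counter wrap, which reads as negative once cast to s32. */
static int32_t sacked_out_after_ack(uint32_t sacked_out, uint32_t pcount)
{
    sacked_out -= pcount;
    return (int32_t)sacked_out;
}
```

With a counter of 0 and an unmatched pcount of 1, `sacked_out_after_ack()` yields -1, which is the condition the WARN_ON in step (6) detects.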
  2. 06 Mar, 2012 38 commits
    • tg3: Fix to use multi queue BQL interfaces · 5cb917bc
      Committed by Tom Herbert
      Fix tg3 to use BQL multi queue related netdev interfaces since the
      device supports multi queue.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Reported-by: Christoph Lameter <cl@gentwo.org>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f3969bf7
      Committed by Linus Torvalds
      Pull perf fixes from Ingo Molnar:
       "It contains three cherry-picked fixes from perf/core, which turned out
        to be more urgent than we originally thought."
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf tools: Handle kernels that don't support attr.exclude_{guest,host}
        perf tools: Change perf_guest default back to false
        perf record: No build id option fails
    • Merge tag 'usb-3.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 98e990af
      Committed by Linus Torvalds
      USB: revert a powerpc EHCI patch
      
      There is just one patch in here, a revert of a powerpc EHCI driver
      patch that was reported to cause problems.
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      * tag 'usb-3.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        Revert "powerpc/usb: fix issue of CPU halt when missing USB PHY clock"
    • Merge tag 'tty-3.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 75d7b398
      Committed by Linus Torvalds
      tty: build fix for 3.3-rc6
      
      This contains one build fix for the powerpc udbg driver that was reported.
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      * tag 'tty-3.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty/powerpc: early udbg consoles can't be modules
    • Merge tag 'md-3.3-fixes' of git://neil.brown.name/md · a2e5f13c
      Committed by Linus Torvalds
      Pull md fixes from Neil Brown:
       "Three fixes for md in 3.3-rc: Two relate to the recently added drive
        replacement.  One fixes the problem where a read error in RAID10 would
        sometimes be retried indefinitely."
      
      * tag 'md-3.3-fixes' of git://neil.brown.name/md:
        md/raid10: fix assembling of arrays with replacement devices.
        md/raid10: fix handling of error on last working device in array.
        md/raid1: fix buglet in md_raid1_contested.
    • Merge branch 'akpm' (Andrew's patch bomb) · 3e85fb9c
      Committed by Linus Torvalds
      Merge the emailed series of 19 patches from Andrew Morton
      
      * akpm:
        rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler
        memcg: fix mapcount check in move charge code for anonymous page
        mm: thp: fix BUG on mm->nr_ptes
        alpha: fix 32/64-bit bug in futex support
        memcg: fix GPF when cgroup removal races with last exit
        debugobjects: Fix selftest for static warnings
        floppy/scsi: fix setting of BIO flags
        memcg: fix deadlock by inverting lrucare nesting
        drivers/rtc/rtc-r9701.c: fix crash in r9701_remove()
        c2port: class_create() returns an ERR_PTR
        pps: class_create() returns an ERR_PTR, not NULL
        hung_task: fix the broken rcu_lock_break() logic
        vfork: kill PF_STARTING
        coredump_wait: don't call complete_vfork_done()
        vfork: make it killable
        vfork: introduce complete_vfork_done()
        aio: wake up waiters when freeing unused kiocbs
        kprobes: return proper error code from register_kprobe()
        kmsg_dump: don't run on non-error paths by default
    • rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler · b24823e6
      Committed by Alexandre Bounine
      Fix a bug that causes a kernel panic when the number of received doorbells
      is larger than number of entries in the inbound doorbell queue (current
      default value = 512).
      
      Another possible indication of this bug is a large number of spurious
      doorbells reported by the tsi721 driver after reaching the queue size maximum.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Chul Kim <chul.kim@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: <stable@vger.kernel.org>		[3.2.x+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix mapcount check in move charge code for anonymous page · e6ca7b89
      Committed by Naoya Horiguchi
      Currently the charge on shared anonymous pages is not supposed to be moved in
      task migration.  To implement this, we need to check that mapcount > 1,
      instead of > 2.  So this patch fixes it.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: fix BUG on mm->nr_ptes · 1c641e84
      Committed by Andrea Arcangeli
      Dave Jones reports a few Fedora users hitting the BUG_ON(mm->nr_ptes...)
      in exit_mmap() recently.
      
      Quoting Hugh's discovery and explanation of the SMP race condition:
      
        "mm->nr_ptes had unusual locking: down_read mmap_sem plus
         page_table_lock when incrementing, down_write mmap_sem (or mm_users
         0) when decrementing; whereas THP is careful to increment and
         decrement it under page_table_lock.
      
         Now most of those paths in THP also hold mmap_sem for read or write
         (with appropriate checks on mm_users), but two do not: when
         split_huge_page() is called by hwpoison_user_mappings(), and when
         called by add_to_swap().
      
         It's conceivable that the latter case is responsible for the
         exit_mmap() BUG_ON mm->nr_ptes that has been reported on Fedora."
      
      The simplest way to fix it without having to alter the locking is to make
      split_huge_page() a noop in nr_ptes terms, so by counting the preallocated
      pagetables that exists for every mapped hugepage.  It was an arbitrary
      choice not to count them and either way is not wrong or right, because
      they are not used but they're still allocated.
      Reported-by: Dave Jones <davej@redhat.com>
      Reported-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.0.x, 3.1.x, 3.2.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: fix 32/64-bit bug in futex support · 62aca403
      Committed by Andrew Morton
      Michael Cree said:
      
      : : I have noticed some user space problems (pulseaudio crashes in pthread
      : : code, glibc/nptl test suite failures, java compiler freezes on SMP alpha
      : : systems) that arise when using a 2.6.39 or later kernel on Alpha.
      : : Bisecting between 2.6.38 and 2.6.39 (using glibc/nptl test suite as
      : : criterion for good/bad kernel) eventually leads to:
      : :
      : : 8d7718aa is the first bad commit
      : : commit 8d7718aa
      : : Author: Michel Lespinasse <walken@google.com>
      : : Date:   Thu Mar 10 18:50:58 2011 -0800
      : :
      : :     futex: Sanitize futex ops argument types
      : :
      : :     Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic
      : :     prototypes to use u32 types for the futex as this is the data type the
      : :     futex core code uses all over the place.
      : :
      : : Looking at the commit I see there is a change of the uaddr argument in
      : : the Alpha architecture specific code for futexes from int to u32, but I
      : : don't see why this should cause a problem.
      
      Richard Henderson said:
      
      : futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      :                               u32 oldval, u32 newval)
      : ...
      :         :       "r"(uaddr), "r"((long)oldval), "r"(newval)
      :
      :
      : There is no 32-bit compare instruction.  These are implemented by
      : consistently extending the values to a 64-bit type.  Since the
      : load instruction sign-extends, we want to sign-extend the other
      : quantity as well (despite the fact it's logically unsigned).
      :
      : So:
      :
      : -        :       "r"(uaddr), "r"((long)oldval), "r"(newval)
      : +        :       "r"(uaddr), "r"((long)(int)oldval), "r"(newval)
      :
      : should do the trick.
      
      Michael said:
      
      : This fixes the glibc test suite failures and the pulseaudio related
      : crashes, but it does not fix the java compiler lockups that I was (and
      : am still) observing.  That is some other problem.
      Reported-by: Michael Cree <mcree@orcon.net.nz>
      Tested-by: Michael Cree <mcree@orcon.net.nz>
      Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Reviewed-by: Matt Turner <mattst88@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
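Richard Henderson's point about sign extension in the alpha futex commit above can be shown with a portable sketch. It uses fixed-width types instead of alpha's `long`, and all function names are hypothetical; it only models the comparison, not the inline assembly.

```c
#include <stdint.h>
#include <stdbool.h>

/* Models a 32-bit load that sign-extends into a 64-bit register. */
static int64_t load_sign_extended(uint32_t mem)
{
    return (int64_t)(int32_t)mem;
}

/* Widening as in "(long)oldval": a u32 is zero-extended. */
static int64_t widen_zero_extend(uint32_t oldval)
{
    return (int64_t)oldval;
}

/* Widening as in "(long)(int)oldval": the value is sign-extended. */
static int64_t widen_sign_extend(uint32_t oldval)
{
    return (int64_t)(int32_t)oldval;
}

/* 64-bit compare with the operand widened the old way. */
static bool cmp_before_fix(uint32_t mem, uint32_t oldval)
{
    return load_sign_extended(mem) == widen_zero_extend(oldval);
}

/* 64-bit compare with both sides sign-extended consistently. */
static bool cmp_after_fix(uint32_t mem, uint32_t oldval)
{
    return load_sign_extended(mem) == widen_sign_extend(oldval);
}
```

For values with the top bit clear both variants agree; for a value such as 0x80000000 the zero-extended operand no longer matches the sign-extended load, so an equal futex word would compare as different.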
    • memcg: fix GPF when cgroup removal races with last exit · 7512102c
      Committed by Hugh Dickins
      When moving tasks from old memcg (with move_charge_at_immigrate on new
      memcg), followed by removal of old memcg, hit General Protection Fault in
      mem_cgroup_lru_del_list() (called from release_pages called from
      free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
      exit_mmap from mmput from exit_mm from do_exit).
      
      Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
      been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
      still trying to update its stats, and take page off lru before freeing.
      
      A task, or a charge, or a page on lru: each secures a memcg against
      removal.  In this case, the last task has been moved out of the old memcg,
      and it is exiting: anonymous pages are uncharged one by one from the
      memcg, as they are zapped from its pagetables, so the charge gets down to
      0; but the pages themselves are queued in an mmu_gather for freeing.
      
      Most of those pages will be on lru (and force_empty is careful to
      lru_add_drain_all, to add pages from pagevec to lru first), but not
      necessarily all: perhaps some have been isolated for page reclaim, perhaps
      some isolated for other reasons.  So, force_empty may find no task, no
      charge and no page on lru, and let the removal proceed.
      
      There would still be no problem if these pages were immediately freed; but
      typically (and the put_page_testzero protocol demands it) they have to be
      added back to lru before they are found freeable, then removed from lru
      and freed.  We don't see the issue when adding, because the
      mem_cgroup_iter() loops keep their own reference to the memcg being
      scanned; but when it comes to mem_cgroup_lru_del_list().
      
      I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
      PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
      of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
      38c5d72f ("memcg: simplify LRU handling by new rule") mercifully
      removed those convolutions, but left this General Protection Fault.
      
      But it's surprisingly easy to restore the old behaviour: just check
      PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
      to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
      change?  just going back to how it worked before; testing, and an audit of
      uses of pc->mem_cgroup, show no problem.
      
      And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
      sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
      longer has any purpose, and we can safely revert 4e5f01c2 ("memcg:
      clear pc->mem_cgroup if necessary").
      
      Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
      is not strictly necessary: the lru_lock there, with RCU before memcg
      structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
      without that; but it seems cleaner to rely on one dependency less.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • debugobjects: Fix selftest for static warnings · 9f78ff00
      Committed by Stephen Boyd
      debugobjects is now printing a warning when a fixup for a NOTAVAILABLE
      object is run.  This causes the selftest to fail like:
      
      	ODEBUG: selftest warnings failed 4 != 5
      
      We could just increase the number of warnings that the selftest is
      expecting to see because that is actually what has changed.  But, it turns
      out that fixup_activate() was written with inverted logic and thus a fixup
      for a static object returned 1 indicating the object had been fixed, and 0
      otherwise.  Fix the logic to be correct and update the counts to reflect
      that nothing needed fixing for a static object.
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • floppy/scsi: fix setting of BIO flags · 9354f1b8
      Committed by Muthu Kumar
      Fix setting bio flags in drivers (sd_dif/floppy).
      Signed-off-by: Muthukumar R <muthur@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix deadlock by inverting lrucare nesting · 9ce70c02
      Committed by Hugh Dickins
      We have forgotten the rules of lock nesting: the irq-safe ones must be
      taken inside the non-irq-safe ones, otherwise we are open to deadlock:
      
      CPU0                          CPU1
      ----                          ----
      lock(&(&pc->lock)->rlock);
                                    local_irq_disable();
                                    lock(&(&zone->lru_lock)->rlock);
                                    lock(&(&pc->lock)->rlock);
      <Interrupt>
      lock(&(&zone->lru_lock)->rlock);
      
      To check a different locking issue, I happened to add a spin_lock to
      memcg's bit_spin_lock in lock_page_cgroup(), and lockdep very quickly
      complained about __mem_cgroup_commit_charge_lrucare() (on CPU1 above).
      
      So delete __mem_cgroup_commit_charge_lrucare(), passing a bool lrucare to
      __mem_cgroup_commit_charge() instead, taking zone->lru_lock under
      lock_page_cgroup() in the lrucare case.
      
      The original was using spin_lock_irqsave, but we'd be in more trouble if
      it were ever called at interrupt time: unconditional _irq is enough.  And
      ClearPageLRU before del from lru, SetPageLRU before add to lru: no strong
      reason, but that is the ordering used consistently elsewhere.
      
      Fixes 36b62ad5 ("memcg: simplify corner case handling
      of LRU").
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/rtc/rtc-r9701.c: fix crash in r9701_remove() · 73737b87
      Committed by Anatolij Gustschin
      If probing the RTC didn't succeed due to failed RTC register access, the
      RTC device will be unregistered.  Then, when removing the module
      r9701_remove() causes a kernel crash while trying to unregister a not
      registered RTC device.  Fix this by doing RTC register access test before
      RTC device registration.
      Signed-off-by: Anatolij Gustschin <agust@denx.de>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • c2port: class_create() returns an ERR_PTR · 22ea71d7
      Committed by Dan Carpenter
      class_create() doesn't return a NULL, it only returns ERR_PTRs.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pps: class_create() returns an ERR_PTR, not NULL · 7ad12566
      Committed by Dan Carpenter
      class_create() never returns NULLs only ERR_PTRs.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Rodolfo Giometti <giometti@enneenne.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
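The two class_create() commits above fix the same kind of mistake: testing an error-pointer return against NULL. A minimal userspace sketch of the pattern follows. The helpers `err_ptr()`, `ptr_err()` and `is_err()` are simplified stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and `fake_class_create()` is a hypothetical placeholder, not the real API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses. */
static bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that never returns NULL, only error pointers. */
static void *fake_class_create(bool fail)
{
    static int dummy_class;
    return fail ? err_ptr(-ENOMEM) : (void *)&dummy_class;
}

/* Buggy pattern: a NULL test never fires, so the failure is missed. */
static int init_with_null_check(bool fail)
{
    void *cls = fake_class_create(fail);
    if (cls == NULL)
        return -ENOMEM;
    return 0; /* an error pointer is treated as a valid object here */
}

/* Corrected pattern: test with is_err() and propagate the errno. */
static int init_with_is_err_check(bool fail)
{
    void *cls = fake_class_create(fail);
    if (is_err(cls))
        return (int)ptr_err(cls);
    return 0;
}
```

In the failing case the NULL-checking variant reports success and would go on to use the bogus pointer, while the is_err() variant returns -ENOMEM.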
    • hung_task: fix the broken rcu_lock_break() logic · 6027ce49
      Committed by Oleg Nesterov
      check_hung_uninterruptible_tasks()->rcu_lock_break() introduced by
      "softlockup: check all tasks in hung_task" commit ce9dbe24 looks
      absolutely wrong.
      
      	- rcu_lock_break() does put_task_struct(). If the task has exited
      	  it is not safe to even read its ->state, nothing protects this
      	  task_struct.
      
      	- The TASK_DEAD checks are wrong too. Contrary to the comment, we
      	  can't use it to check if the task was unhashed. It can be unhashed
      	  without TASK_DEAD, or it can be valid with TASK_DEAD.
      
      	  For example, an autoreaping task can do release_task(current)
      	  long before it sets TASK_DEAD in do_exit().
      
      	  Or, a zombie task can have ->state == TASK_DEAD but release_task()
      	  was not called, and in this case we must not break the loop.
      
      Change this code to check pid_alive() instead, and do this before we drop
      the reference to the task_struct.
      
      Note: while_each_thread() under rcu_read_lock() is not really safe, it can
      livelock.  This will be fixed later, but fortunately in this case the
      "max_count" logic saves us anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Mandeep Singh Baines <msb@google.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfork: kill PF_STARTING · 6e27f63e
      Committed by Oleg Nesterov
      Previously it was (ab)used by utrace.  Then it was wrongly used by the
      scheduler code.
      
      Currently it is not used, kill it before it finds the new erroneous user.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump_wait: don't call complete_vfork_done() · 57b59c4a
      Committed by Oleg Nesterov
      Now that CLONE_VFORK is killable, coredump_wait() no longer needs
      complete_vfork_done().  zap_threads() should find and kill all tasks with
      the same ->mm, this includes our parent if ->vfork_done is set.
      
      mm_release() becomes the only caller, unexport complete_vfork_done().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfork: make it killable · d68b46fe
      Committed by Oleg Nesterov
      Make vfork() killable.
      
      Change do_fork(CLONE_VFORK) to do wait_for_completion_killable().  If it
      fails we do not return to the user-mode and never touch the memory shared
      with our child.
      
      However, in this case we should clear child->vfork_done before return, we
      use task_lock() in do_fork()->wait_for_vfork_done() and
      complete_vfork_done() to serialize with each other.
      
      Note: now that we use task_lock() we don't really need completion, we
      could turn task->vfork_done into "task_struct *wake_up_me" but this needs
      some complications.
      
      NOTE: this and the next patches do not affect in-kernel users of
      CLONE_VFORK, kernel threads run with all signals ignored including
      SIGKILL/SIGSTOP.
      
      However this is obviously the user-visible change.  Not only a fatal
      signal can kill the vforking parent, a sub-thread can do execve or
      exit_group() and kill the thread sleeping in vfork().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfork: introduce complete_vfork_done() · c415c3b4
      Committed by Oleg Nesterov
      No functional changes.
      
      Move the clear-and-complete-vfork_done code into the new trivial helper,
      complete_vfork_done().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: wake up waiters when freeing unused kiocbs · 880641bb
      Committed by Jeff Moyer
      Bart Van Assche reported a hung fio process when either hot-removing
      storage or when interrupting the fio process itself.  The (pruned) call
      trace for the latter looks like so:
      
        fio             D 0000000000000001     0  6849   6848 0x00000004
         ffff880092541b88 0000000000000046 ffff880000000000 ffff88012fa11dc0
         ffff88012404be70 ffff880092541fd8 ffff880092541fd8 ffff880092541fd8
         ffff880128b894d0 ffff88012404be70 ffff880092541b88 000000018106f24d
        Call Trace:
          schedule+0x3f/0x60
          io_schedule+0x8f/0xd0
          wait_for_all_aios+0xc0/0x100
          exit_aio+0x55/0xc0
          mmput+0x2d/0x110
          exit_mm+0x10d/0x130
          do_exit+0x671/0x860
          do_group_exit+0x44/0xb0
          get_signal_to_deliver+0x218/0x5a0
          do_signal+0x65/0x700
          do_notify_resume+0x65/0x80
          int_signal+0x12/0x17
      
      The problem lies with the allocation batching code.  It will
      opportunistically allocate kiocbs, and then trim back the list of iocbs
      when there is not enough room in the completion ring to hold all of the
      events.
      
      In the case above, what happens is that the pruning back of events ends
      up freeing up the last active request and the context is marked as dead,
      so it is thus responsible for waking up waiters.  Unfortunately, the
      code does not check for this condition, so we end up with a hung task.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Cc: <stable@kernel.org>		[3.2.x only]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kprobes: return proper error code from register_kprobe() · f986a499
      Committed by Prashanth Nageshappa
      register_kprobe() aborts if the address of the new request falls in a
      prohibited area (such as ftrace pouch, __kprobes annotated functions,
      non-kernel text addresses, jump label text).  We however don't return the
      right error on this abort, resulting in a silent failure - incorrect
      adding/reporting of kprobes ('perf probe do_fork+18' or 'perf probe
      mcount' for instance).
      
      In V2 we are incorporating Masami Hiramatsu's  feedback.
      
      This patch fixes it by returning -EINVAL upon failure.
      
      While we are here, rename the label used for exit to be more appropriate.
      Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Signed-off-by: Prashanth K Nageshappa <prashanth@linux.vnet.ibm.com>
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
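The register_kprobe() commit above describes an abort path that left the return value at success. A minimal sketch of the corrected shape follows, assuming a made-up prohibited address range; `addr_is_prohibited()` and `register_probe_sketch()` are hypothetical names and do not reflect the kernel's actual checks.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "prohibited area" checks; the range is
 * invented purely for illustration. */
static bool addr_is_prohibited(uintptr_t addr)
{
    return addr >= 0x1000 && addr < 0x2000;
}

/* Sketch of a registration routine whose abort path sets an error code
 * before jumping to the common exit label, instead of leaving ret at 0. */
static int register_probe_sketch(uintptr_t addr)
{
    int ret = 0;

    if (addr_is_prohibited(addr)) {
        ret = -EINVAL; /* report the rejection to the caller */
        goto out;
    }

    /* Real registration work would happen here. */

out:
    return ret;
}
```

A caller can then distinguish a rejected address from a successful registration by checking for a negative return value.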
    • kmsg_dump: don't run on non-error paths by default · c22ab332
      Committed by Matthew Garrett
      Since commit 04c6862c ("kmsg_dump: add kmsg_dump() calls to the
      reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets
      run on normal paths including poweroff and reboot.
      
      This is less than ideal given pstore implementations that can only
      represent single backtraces, since a reboot may overwrite a stored oops
      before it's been picked up by userspace.  In addition, some pstore
      backends may have low performance and provide a significant delay in
      reboot as a result.
      
      This patch adds a printk.always_kmsg_dump kernel parameter (which can also
      be changed from userspace).  Without it, the code will only be run on
      failure paths rather than on normal paths.  The option can be enabled in
      environments where there's a desire to attempt to audit whether or not a
      reboot was cleanly requested or not.
      Signed-off-by: Matthew Garrett <mjg@redhat.com>
      Acked-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marco Stornelli <marco.stornelli@gmail.com>
      Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
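The gating behaviour described in the kmsg_dump commit above can be sketched as a small predicate. The enum and function names below are hypothetical and simplified; only the `always_kmsg_dump` flag name comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical, simplified set of dump reasons. */
enum dump_reason {
    DUMP_PANIC,
    DUMP_OOPS,
    DUMP_RESTART,
    DUMP_HALT,
    DUMP_POWEROFF
};

/* Failure paths are the ones that should always be recorded. */
static bool reason_is_failure(enum dump_reason reason)
{
    return reason == DUMP_PANIC || reason == DUMP_OOPS;
}

/* Dump on failure paths unconditionally; dump on normal paths (reboot,
 * halt, poweroff) only when the always_kmsg_dump option is enabled. */
static bool should_dump(enum dump_reason reason, bool always_kmsg_dump)
{
    return always_kmsg_dump || reason_is_failure(reason);
}
```

With the flag off, a clean reboot no longer triggers a dump, so it cannot overwrite an earlier stored oops; with the flag on, the earlier behaviour is retained for auditing.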
    • md/raid10: fix assembling of arrays with replacement devices. · 7a904848
      Committed by NeilBrown
      commit 56a2559b (md/raid10: recognise replacements ...)
      changed 'run' to set ->replacement or ->rdev depending on the
      'Replacement' status of the device, but it didn't remove the
      old unconditional setting of 'rdev'.  So it was largely ineffective.
      
      So remove that now.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • drm, gma500: Fix Cedarview boot failures in 3.3-rc · 055bf38d
      Committed by Alan Cox
      Production GMA3600/3650 hardware turns out to be subtly different to the
      development platforms.  This combined with a minor driver bug is causing
      the kernel to hang on these platforms.
      
      This patch does the following
      
       - turn down a couple of messages that were meant to be debug and are
         causing much confusion
      
       - ensure the hotplug interrupt is disabled on Cedartrail systems.
      
       - fix a bug where gtt roll mode called psbfb_sync, which tries to sync
         the 2D engine. On other devices this is harmless, as the 2D engine
         is present but not in use when in gtt roll mode; on Cedartrail it
         causes a hang.
      
      Without these changes 3.3-rc hangs on boot on Cedartrail based systems.
      Signed-off-by: Alan Cox <alan@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      055bf38d
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · aa139092
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
      1) TCP SACK processing can calculate an incorrect reordering value in
         some cases, fix from Neal Cardwell.
      
      2) tcp_mark_head_lost() can split SKBs in situations where it should
         not, violating send queue invariants expected by other pieces of
         code and thus resulting (eventually) in corrupted retransmit state
         counters.  Also from Neal Cardwell.
      
      3) qla3xxx erroneously calls spin_unlock_irqrestore() with constant
         hw_flags of zero.  Fix from Santosh Nayak.
      
      4) Fix NULL deref in rt2x00, from Gabor Juhos.
      
      5) pch_gbe passes address of wrong typed object to pch_gbe_validate_option
         thus corrupting part of the value.  From Dan Carpenter.
      
      6) We must check the return value of nlmsg_parse() before trying to use
         the results.  From Eric Dumazet.
      
      7) Bridging code fails to check return value of ipv6_dev_get_saddr()
         thus potentially leaving uninitialized garbage in the outgoing ipv6
         header.  From Ulrich Weber.
      
      8) Due to rounding and a reversed operation on jiffies, bridge message
         ages can go backwards instead of forwards, thus breaking STP.  Fixes
         from Joakim Tjernlund.
      
      9) r8169 modifies Config* registers without properly holding the
         Config9346 lock, resulting in corrupted IP fragments on some chips.
         Fix from Francois Romieu.
      
      10) NET_PACKET_ENGINE default wasn't set properly during the network
         driver mega-move.  Fix from Stephen Hemminger.
      
      11) vmxnet3 uses TCP header size where it actually should use the UDP
         header size, fix from Shreyas Bhatewara.
      
      12) Netfilter bridge module autoload is busted in the compat case, fix
         from Florian Westphal.
      
      13) Wireless key removal was not setting multicast bits correctly, thus
         accidentally killing the unicast key 0 and stopping all traffic.
         Fix from Johannes Berg.
      
      14) Fix endless retries of A-MPDU transmissions in brcm80211 driver.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
        qla3xxx: ethernet: Fix bogus interrupt state flag.
        bridge: check return value of ipv6_dev_get_saddr()
        rtnetlink: fix rtnl_calcit() and rtnl_dump_ifinfo()
        bridge: message age needs to increase, not decrease.
        bridge: Adjust min age inc for HZ > 256
        tcp: don't fragment SACKed skbs in tcp_mark_head_lost()
        r8169: corrupted IP fragments fix for large mtu.
        packetengines: fix config default
        vmxnet3: Fix transport header size
        enic: fix an endian bug in enic_probe()
        pch_gbe: memory corruption calling pch_gbe_validate_option()
        tg3: Fix tg3_get_stats64 for 5700 / 5701 devs
        tcp: fix false reordering signal in tcp_shifted_skb
        tcp: fix comment for tp->highest_sack
        netfilter: bridge: fix module autoload in compat case
        brcm80211: smac: only print block-ack timeout message at trace level
        brcm80211: smac: fix endless retry of A-MPDU transmissions
        mac80211: Fix a warning on changing to monitor mode from STA
        mac80211: zero initialize count field in ieee80211_tx_rate
        iwlwifi: fix key removal
        ...
      aa139092
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci · 4f0449e2
      Linus Torvalds committed
      Pull PCI fixes from Jesse Barnes:
       "A couple of fixes for booting specific machines, and one for a minor
        memory leak on pre-_CRS platforms."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci:
        x86/PCI: do not tie MSI MS-7253 use_crs quirk to BIOS version
        x86/PCI: use host bridge _CRS info on MSI MS-7253
        PCI: fix memleak when ACPI _CRS is not used.
      4f0449e2
    • L
      Merge branch 'for-3.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu · 789ce9b9
      Linus Torvalds committed
      Pull per-cpu patches from Tejun Heo:
       "This pull request contains four patches.  One replaces manual clearing
        with bitmap_clear(), two fix generic definition of __this_cpu ops so
        that they don't choose unnecessarily strict arch version.  One makes
        _this_cpu definition use raw_local_irq_*() so that it doesn't end up
        wrecking irq on/off state tracking when used from inside lockdep.
      
        Of the four patches, the raw_local_irq_*() update is the most
        important, so please feel free to cherry pick only that one patch and
        ignore the rest if you want to - commit e920d597 'percpu: use
        raw_local_irq_* in _this_cpu op'."
      
      * 'for-3.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
        percpu: fix __this_cpu_{sub,inc,dec}_return() definition
        percpu: use raw_local_irq_* in _this_cpu op
        percpu: fix generic definition of __this_cpu_add_and_return()
        percpu: use bitmap_clear
      789ce9b9
    • L
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 3a81a6e7
      Linus Torvalds committed
      Pull MIPS fixes from Ralf Baechle:
       "What's in there: a number of MIPS fixes and touchups.  The most
        important change in this pull request is Kautuk Consul's port of
        changes to do_page_fault which fix a hang that affects some
        configurations.  Still not quite ready for a release, there are
        problems with 64-bit platforms."
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: traps.c: Fix typo
        MIPS: PowerTV: Fix defconfigs for coverage builds
        MIPS: Netlogic: Fix defconfigs for coverage builds
        MIPS: ATH79: Avoid a kernel bug on AR913X
        MIPS: PCI: use list_for_each_entry() for bus->devices traversal
        MIPS: fault.c: Port OOM changes to do_page_fault
        MIPS: vmlinux.lds.S: remove duplicate _sdata symbol
        MIPS: Alchemy: Increase minimum timeout for 32kHz timer.
        MIPS: txx9 7segled fix struct device has no member
        MIPS: Alchemy: Update Au1300 inlined GPIO macros
        MIPS: Remove temporary kludge from <asm/page.h>
        MIPS: BMIPS: smp-bmips.c does not need to include version.h
      3a81a6e7
    • A
      flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held · cd2934a3
      Al Viro committed
      All other callers already hold either ->mmap_sem (exclusive) or
      ->page_table_lock.  And we need it because some page table flushing
      instances do work explicitly with page tables.
      
      See e.g.  arch/powerpc/mm/tlb_hash32.c, flush_tlb_range() and
      flush_range() in there.  The same goes for uml, with a lot more
      extensive playing with page tables.
      
      Almost all callers are actually fine - flush_tlb_range() may have no
      need to bother playing with page tables, but it can do so safely; again,
      this caller is the sole exception - everything else either has exclusive
      ->mmap_sem on the mm in question, or mm->page_table_lock is held.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cd2934a3
    • A
      835ee797
    • S
      qla3xxx: ethernet: Fix bogus interrupt state flag. · 9d1dfc06
      Santosh Nayak committed
      In 'ql_adapter_initialize'
      the first call for 'spin_unlock_irqrestore()' is with hw_flags = 0,
      which is as good as 'spin_unlock_irq()' (unconditional interrupt
      enabling). If this is intended, then for better performance
      'spin_unlock_irqrestore()' can be replaced with 'spin_unlock_irq()'
      and 'spin_lock_irqsave()' can be replaced by 'spin_lock_irq()'.
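
      The contract behind the two lock flavours can be modelled in
      userspace. The sketch below is not kernel code: irqs_on is a
      hypothetical stand-in for the CPU interrupt-enable state, and the
      model_* helpers only mimic the save/restore behaviour of the real
      primitives, leaving the lock itself out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable state. */
static bool irqs_on = true;

/* Models spin_lock_irqsave(): remember the current state, then disable. */
static bool model_lock_irqsave(void)
{
	bool saved = irqs_on;

	irqs_on = false;
	return saved;
}

/* Models spin_unlock_irqrestore(): put back whatever was saved. */
static void model_unlock_irqrestore(bool saved)
{
	irqs_on = saved;
}

/* Models spin_lock_irq(): disable unconditionally. */
static void model_lock_irq(void)
{
	irqs_on = false;
}

/* Models spin_unlock_irq(): enable unconditionally. */
static void model_unlock_irq(void)
{
	irqs_on = true;
}
```

      The irqsave/irqrestore pair is only meaningful when the value handed
      to restore came from a matching save; passing a constant bypasses
      that contract, so the plain irq variants state the intent directly
      when interrupts are known to be enabled on entry.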
      Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9d1dfc06
    • U
      bridge: check return value of ipv6_dev_get_saddr() · d1d81d4c
      Ulrich Weber committed
      otherwise source IPv6 address of ICMPV6_MGM_QUERY packet
      might be random junk if IPv6 is disabled on interface or
      link-local address is not yet ready (DAD).
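
      The pattern is the usual "check before use" for an output buffer.
      A minimal sketch, assuming a hypothetical lookup_saddr() that leaves
      its output untouched on failure (as a stand-in for the real lookup)
      and a hypothetical build_query() caller:

```c
#include <string.h>

/* Hypothetical stand-in for an address lookup that can fail. */
static int lookup_saddr(int addr_ready, unsigned char out[16])
{
	if (!addr_ready)
		return -1;	/* 'out' is left untouched on failure */
	memset(out, 0, 16);
	out[0] = 0xfe;		/* fe80::1, a link-local address */
	out[1] = 0x80;
	out[15] = 0x01;
	return 0;
}

/*
 * Fill in the source address of a query only when one is available.
 * Returns 0 on success, -1 if the packet must not be sent.
 */
static int build_query(int addr_ready, unsigned char saddr[16])
{
	if (lookup_saddr(addr_ready, saddr) != 0)
		return -1;	/* drop instead of sending junk */
	return 0;
}
```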
      Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1d81d4c
    • L
      Merge tag 'mmc-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · 550cf00d
      Linus Torvalds committed
      MMC fixes from Chris Ball for 3.3:
       - atmel-mci: oops fix against regression introduced in 3.2
       - core: power saving regression fix against 3.3-rc1
       - core: suspend/resume fix for UHS-I cards
       - esdhc-imx: MMC card regression fix against 3.0
       - mmci: oops fix for ARM systems with large (64k) pages
       - MAINTAINERS update for atmel-mci.
      
      * tag 'mmc-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
        mmc: core: Fixup suspend/resume issues for UHS-I cards
        mmc: mmci: reduce max_blk_count to avoid overflowing max_req_size
        mmc: sdhci-esdhc-imx: fix for mmc cards on i.MX5
        mmc: core: fix regression: set default clock gating delay to 0
        MAINTAINERS: hand over atmel-mci (sd/mmc interface)
        mmc: atmel-mci: don't use dma features when using DMA with no chan available
      550cf00d
    • L
      Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 5d329e24
      Linus Torvalds committed
      Pull from Jiri Kosina:
       "Please pull to receive updates for HID layer.  Nikolai's patch is
        rather important and should still go in for 3.3, as it's a regression
        fix for commit b4b583d4."
      
      * 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: hid-input: allow array fields out of range
        HID: usbhid: Add NOGET quirk for the AIREN Slim+ keyboard
      5d329e24
  3. 05 Mar, 2012 1 commit
    • N
      HID: hid-input: allow array fields out of range · 883e0e36
      Nikolai Kondrashov committed
      Allow array field values out of range as per HID 1.11 specification,
      section 6.2.2.5:
      
      	Rather than returning a single bit for each button in the group, an
      	array returns an index in each field that corresponds to the pressed
      	button (like keyboard scan codes). An out-of range value in and array
      	field is considered no controls asserted.
      
      Apparently, "and" above is a typo and should be "an".
      
      This fixes at least Waltop tablet pen clicks - otherwise BTN_TOUCH is never
      released.
      
      The relevant part of Waltop tablet report descriptors is this:
      
      	0x09, 0x42,         /*          Usage (Tip Switch),         */
      	0x09, 0x44,         /*          Usage (Barrel Switch),      */
      	0x09, 0x46,         /*          Usage (Tablet Pick),        */
      	0x15, 0x01,         /*          Logical Minimum (1),        */
      	0x25, 0x03,         /*          Logical Maximum (3),        */
      	0x75, 0x04,         /*          Report Size (4),            */
      	0x95, 0x01,         /*          Report Count (1),           */
      	0x80,               /*          Input,                      */
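
      For the descriptor above, decoding a single array field can be
      sketched as follows. The helper decode_array_field() and the
      pressed[] table are hypothetical names for this example; the sketch
      only illustrates the rule quoted from the specification, not the
      hid-input code path.

```c
#include <stdbool.h>

enum { LOGICAL_MIN = 1, LOGICAL_MAX = 3, NUM_USAGES = 3 };

/* Button state for Tip Switch, Barrel Switch, Tablet Pick, in that order. */
static bool pressed[NUM_USAGES];

/*
 * Decode one array-field value. An in-range value selects exactly one
 * usage; an out-of-range value means no control is asserted, so every
 * usage is reported as released.
 */
static void decode_array_field(int value)
{
	for (int i = 0; i < NUM_USAGES; i++)
		pressed[i] = false;
	if (value >= LOGICAL_MIN && value <= LOGICAL_MAX)
		pressed[value - LOGICAL_MIN] = true;
}
```

      If a report carrying the out-of-range value 0 were dropped instead
      of decoded, the previously pressed usage would never be cleared,
      which matches the stuck BTN_TOUCH symptom.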
      
      This is a regression fix for commit b4b583d4 ("HID: be more strict when
      ignoring out-of-range fields").
      Signed-off-by: Nikolai Kondrashov <spbnick@gmail.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      883e0e36