1. 14 Jun, 2013 11 commits
    • dm-raid: silence compiler warning on rebuilds_per_group. · 3f6bbd3f
      NeilBrown committed
      This doesn't really need to be initialised, but it doesn't hurt,
      silences the compiler, and as it is a counter it makes sense for it to
      start at zero.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • DM RAID: Fix raid_resume not reviving failed devices in all cases · a4dc163a
      Jonathan Brassow committed
      When a device fails in a RAID array, it is marked as Faulty.  Later,
      md_check_recovery is called which (through the call chain) calls
      'hot_remove_disk' in order to have the personalities remove the device
      from use in the array.
      
      Sometimes, it is possible for the array to be suspended before the
      personalities get their chance to perform 'hot_remove_disk'.  This is
      normally not an issue.  If the array is deactivated, then the failed
      device will be noticed when the array is reinstantiated.  If the
      array is resumed and the disk is still missing, md_check_recovery will
      be called upon resume and 'hot_remove_disk' will be called at that
      time.  However, (for dm-raid) if the device has been restored,
      a resume on the array would cause it to attempt to revive the device
      by calling 'hot_add_disk'.  If 'hot_remove_disk' had not been called,
      a situation is then created where the device is thought to concurrently
      be the replacement and the device to be replaced.  Thus, the device
      is first sync'ed with the rest of the array (because it is the replacement
      device) and then marked Faulty and removed from the array (because
      it is also the device being replaced).
      
      The solution is to check and see if the device had properly been removed
      before the array was suspended.  This is done by seeing whether the
      device's 'raid_disk' field is -1 - a condition that implies that
      'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1)
      -> hot_remove_disk' has been called.  If 'raid_disk' is not -1, then
      'hot_remove_disk' must be called to complete the removal of the previously
      faulty device before it can be revived via 'hot_add_disk'.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
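The ordering rule described in the commit above (finish a pending removal before reviving a device) can be sketched as a small userspace model. This is not the dm-raid code: `struct rdev_sketch`, the two `*_sketch` callbacks and the `slot` parameter are hypothetical stand-ins for the real structures and the personality's 'hot_remove_disk' / 'hot_add_disk' hooks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified device record (not the kernel's struct md_rdev). */
struct rdev_sketch {
    int raid_disk;   /* slot in the array; -1 once removal has completed */
    bool faulty;     /* device is currently marked Faulty */
};

/* Stand-in for the personality's hot_remove_disk: completes the removal. */
static void hot_remove_disk_sketch(struct rdev_sketch *r)
{
    r->raid_disk = -1;
}

/* Stand-in for the personality's hot_add_disk: puts the device back in use. */
static void hot_add_disk_sketch(struct rdev_sketch *r, int slot)
{
    r->raid_disk = slot;
    r->faulty = false;
}

/*
 * Revive a previously failed device on resume.  If raid_disk is not -1,
 * the removal never completed before the suspend, so finish it first.
 * Returns true when a pending removal had to be completed.
 */
bool revive_device_sketch(struct rdev_sketch *r, int slot)
{
    bool removal_was_pending = (r->raid_disk != -1);

    if (removal_was_pending)
        hot_remove_disk_sketch(r);
    hot_add_disk_sketch(r, slot);
    return removal_was_pending;
}
```

The point of the sketch is only the order of the two calls; the real fix also has to deal with locking and device state that are left out here.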
    • DM RAID: Break-up untidy function · f381e71b
      Jonathan Brassow committed
      Clean-up excessive indentation by moving some code in raid_resume()
      into its own function.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • DM RAID: Add ability to restore transiently failed devices on resume · 9092c02d
      Jonathan Brassow committed
      This patch adds code to the resume function to check over the devices
      in the RAID array.  If any are found to be marked as failed and their
      superblocks can be read, an attempt is made to reintegrate them into
      the array.  This allows the user to refresh the array with a simple
      suspend and resume of the array - rather than having to load a
      completely new table, allocate and initialize all the structures and
      throw away the old instantiation.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • Merge tag 'acpi-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 25e33ed9
      Linus Torvalds committed
      Pull ACPI fix from Rafael Wysocki:
       "This is an alternative fix for the regression introduced in 3.9 whose
        previous fix had to be reverted right before 3.10-rc5, because it
        broke one of Tony's machines.
      
        In this one the check is confined to the ACPI video driver (which is
        the only one causing the problem to happen in the first place) and
        Tony's box shouldn't even notice it.
      
         - ACPI fix for an issue causing ACPI video driver to attempt to bind
           to devices it shouldn't touch from Rafael J Wysocki."
      
      * tag 'acpi-3.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / video: Do not bind to device objects with a scan handler
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · cb03dc09
      Linus Torvalds committed
      Pull x86 fixes from Peter Anvin:
       "Another set of fixes, the biggest bit of this is yet another tweak to
        the UEFI anti-bricking code; apparently we finally got some feedback
        from Samsung as to what makes at least their systems fail.  This set
        should actually fix the boot regressions that some other systems (e.g.
        SGI) have exhibited.
      
        Other than that, there is a patch to avoid a panic with particularly
        unhappy memory layouts and two minor protocol fixes which may or may
        not be manifest bugs"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Fix typo in kexec register clearing
        x86, relocs: Move __vvar_page from S_ABS to S_REL
        Modify UEFI anti-bricking code
        x86: Fix adjust_range_size_mask calling position
    • Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu · cb7e9704
      Linus Torvalds committed
      Pull RCU fixes from Paul McKenney:
       "I must confess that this past merge window was not RCU's best showing.
        This series contains three more fixes for RCU regressions:
      
         1.   A fix to __DECLARE_TRACE_RCU() that causes it to act as an
              interrupt from idle rather than as a task switch from idle.
              This change is needed due to the recent use of _rcuidle()
              tracepoints that can be invoked from interrupt handlers as well
              as from idle.  Without this fix, invoking _rcuidle() tracepoints
              from interrupt handlers results in splats and (more seriously)
              confusion on RCU's part as to whether a given CPU is idle or not.
              This confusion can in turn result in too-short grace periods and
              therefore random memory corruption.
      
         2.   A fix to a subtle deadlock that could result due to RCU doing
              a wakeup while holding one of its rcu_node structure's locks.
              Although the probability of occurrence is low, it really
              does happen.  The fix, courtesy of Steven Rostedt, uses
              irq_work_queue() to avoid the deadlock.
      
         3.   A fix to a silent deadlock (invisible to lockdep) due to the
              interaction of timeouts posted by RCU debug code enabled by
              CONFIG_PROVE_RCU_DELAY=y, grace-period initialization, and CPU
              hotplug operations.  This will not occur in production kernels,
              but really does occur in randconfig testing.  Diagnosis courtesy
              of Steven Rostedt"
      
      * 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
        rcu: Fix deadlock with CPU hotplug, RCU GP init, and timer migration
        rcu: Don't call wakeup() with rcu_node structure ->lock held
        trace: Allow idle-safe tracepoints to be called from irq
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · dcae7f2d
      Linus Torvalds committed
      Pull s390 fixes from Martin Schwidefsky:
       "Three kvm related memory management fixes, a fix for show_trace, a fix
        for early console output and a patch from Ben to help prevent compile
        errors in regard to irq functions (or our lack thereof)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/pci: Implement IRQ functions if !PCI
        s390/sclp: fix new line detection
        s390/pgtable: make pgste lock an explicit barrier
        s390/pgtable: Save pgste during modify_prot_start/commit
        s390/dumpstack: fix address ranges for asynchronous and panic stack
        s390/pgtable: Fix guest overindication for change bit
    • Merge tag 'asoc-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound · 509768f7
      Linus Torvalds committed
      Pull ASoC sound updates from Mark Brown:
       "Takashi is travelling at the minute and it'd be good to get the
        MAINTAINERS update in here merged so sending directly.
      
        As well as the usual driver specifics we've got a couple of core fixes
        here, one fixing capabilities for unidirectional streams and the other
        fixing suspend while audio streams are active.
      
        The suspend fix is a little involved but mostly as a result of
        removing some special casing that was doing the wrong thing."
      
      * tag 'asoc-v3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound:
        ASoC: tlv320aic3x: Remove deadlock from snd_soc_dapm_put_volsw_aic3x()
        ASoC: dapm: Treat DAI widgets like AIF widgets for power
        ASoC: arizona: Correct AEC loopback enable
        ASoC: pcm: Require both CODEC and CPU support when declaring stream caps
        MAINTAINERS: Remove myself from Wolfson maintainers
        ASoC: wm8994: Ensure microphone detection state is reset on removal
        ASoC: wm8994: Avoid leaking pm_runtime reference on removed jack race
        ASoC: cs42l52: fix hp_gain_enum shift value.
        ASoC: cs42l52: use correct PCM mixer TLV dB scale to match datasheet.
    • Merge tag 'md-3.10-fixes' of git://neil.brown.name/md · 82ea4be6
      Linus Torvalds committed
      Pull md bugfixes from Neil Brown:
       "A few bugfixes for md
      
        Some tagged for -stable"
      
      * tag 'md-3.10-fixes' of git://neil.brown.name/md:
        md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place
        md/raid1,raid10: use freeze_array in place of raise_barrier in various places.
        md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.
        md: md_stop_writes() should always freeze recovery.
    • turbostat: Increase output buffer size to accommodate C8-C10 · b844db31
      Josh Triplett committed
      On platforms with C8-C10 support, the additional C-states cause
      turbostat to overrun its output buffer of 128 bytes per CPU.  Increase
      this to 256 bytes per CPU.
      
      [ As a bugfix, this should go into 3.10; however, since the C8-C10
        support didn't go in until after 3.9, this need not go into any stable
        kernel. ]
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 13 Jun, 2013 29 commits
    • Merge tag 'efi-urgent' into x86/urgent · 45df901c
      H. Peter Anvin committed
       * More tweaking to the EFI variable anti-bricking algorithm. Quite a
         few users were reporting boot regressions in v3.9. This has now been
         fixed with a more accurate "minimum storage requirement to avoid
         bricking" value from Samsung (5K instead of 50%) and code to trigger
         garbage collection when we near our limit - Matthew Garrett.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place · 5026d7a9
      H. Peter Anvin committed
      There are cases where the kernel will believe that the WRITE SAME
      command is supported by a block device which does not, in fact,
      support WRITE SAME.  This currently happens for SATA drivers behind a
      SAS controller, but there are probably a hundred other ways that can
      happen, including drive firmware bugs.
      
      After receiving an error for WRITE SAME the block layer will retry the
      request as a plain write of zeroes, but mdraid will consider the
      failure as fatal and consider the drive failed.  This has the effect
      that all the mirrors containing a specific set of data are each
      offlined in very rapid succession resulting in data loss.
      
      However, just bouncing the request back up to the block layer isn't
      ideal either, because the whole initial request-retry sequence should
      be inside the write bitmap fence, which probably means that md needs
      to do its own conversion of WRITE SAME to write zero.
      
      Until the failure scenario has been sorted out, disable WRITE SAME for
      raid1, raid5, and raid10.
      
      [neilb: added raid5]
      
      This patch is appropriate for any -stable since 3.7 when write_same
      support was added.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1,raid10: use freeze_array in place of raise_barrier in various places. · e2d59925
      NeilBrown committed
      Various places in raid1 and raid10 are calling raise_barrier when they
      really should call freeze_array.
      The former is only intended to be called from "make_request".
      The latter has extra checks for 'nr_queued' and makes a call to
      flush_pending_writes(), so it is safe to call it from within the
      management thread.
      
      Using raise_barrier will sometimes deadlock.  Using freeze_array
      should not.
      
      As 'freeze_array' currently expects one request to be pending (in
      handle_read_error - the only previous caller), we need to pass
      it the number of pending requests (extra) to ignore.
      
      The deadlock was made particularly noticeable by commits
      050b6615 (raid10) and 6b740b8d (raid1) which
      appeared in 3.4, so the fix is appropriate for any -stable
      kernel since then.
      
      This patch probably won't apply directly to some early kernels and
      will need to be applied by hand.
      
      Cc: stable@vger.kernel.org
      Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1: consider WRITE as successful only if at least one non-Faulty and... · 3056e3ae
      Alex Lyakas committed
      md/raid1: consider WRITE as successful only if at least one non-Faulty and non-rebuilding drive completed it.
      
      Without that fix, the following scenario could happen:
      
      - RAID1 with drives A and B; drive B was freshly-added and is rebuilding
      - Drive A fails
      - WRITE request arrives at the array. It is failed by drive A, so
        r1_bio is marked as R1BIO_WriteError, but the rebuilding drive B
        succeeds in writing it, so the same r1_bio is marked as
        R1BIO_Uptodate.
      - r1_bio arrives at handle_write_finished, badblocks are disabled,
        md_error()->error() does nothing because we don't fail the last drive
        of raid1
      - raid_end_bio_io() calls call_bio_endio()
      - As a result, in call_bio_endio():
              if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
                      clear_bit(BIO_UPTODATE, &bio->bi_flags);
      this code doesn't clear the BIO_UPTODATE flag, and the whole master
      WRITE succeeds, back to the upper layer.
      
      So we returned success to the upper layer, even though we had written
      the data onto the rebuilding drive only. But when we want to read the
      data back, we would not read from the rebuilding drive, so this data
      is lost.
      
      [neilb - applied identical change to raid10 as well]
      
      This bug can result in lost data, so it is suitable for any
      -stable kernel.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
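The rule in the commit title above can be modelled in a few lines of userspace C. This is a simplified sketch, not the raid1 code: `struct mirror_sketch` and `write_counts_as_success()` are hypothetical names, and the real driver tracks this state through r1_bio flags rather than a per-device array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one RAID1 member during a WRITE. */
struct mirror_sketch {
    bool faulty;     /* device has been marked Faulty */
    bool in_sync;    /* false while the device is still rebuilding */
    bool write_ok;   /* this device completed the WRITE without error */
};

/*
 * A WRITE counts as successful only if at least one device that is
 * neither Faulty nor rebuilding completed it.
 */
bool write_counts_as_success(const struct mirror_sketch *m, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (m[i].write_ok && !m[i].faulty && m[i].in_sync)
            return true;
    }
    return false;
}
```

In the scenario from the commit message, drive A fails the write and only the rebuilding drive B completes it, so the function returns false and the error is reported to the upper layer instead of a false success.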
    • md: md_stop_writes() should always freeze recovery. · 6b6204ee
      NeilBrown committed
      __md_stop_writes() will currently sometimes freeze recovery.
      So any caller must be ready for that to happen, and indeed they are.
      
      However if __md_stop_writes() doesn't freeze_recovery, then
      a recovery could start before mddev_suspend() is called, which
      could be awkward.  This can particularly cause problems for dm-raid.
      
      So change __md_stop_writes() to always freeze recovery.  This is safe
      and more predictable.
      Reported-by: Brassow Jonathan <jbrassow@redhat.com>
      Tested-by: Brassow Jonathan <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 26e04462
      Linus Torvalds committed
      Pull networking update from David Miller:
      
       1) Fix dump iterator in nfnl_acct_dump() and ctnl_timeout_dump() to
          dump all objects properly, from Pablo Neira Ayuso.
      
       2) xt_TCPMSS must use the default MSS of 536 when no MSS TCP option is
          present.  Fix from Phil Oester.
      
       3) qdisc_get_rtab() looks for an existing matching rate table and uses
          that instead of creating a new one.  However, its key matching is
          incomplete: it fails to check that the ->data[] array is
          identical too.  Fix from Eric Dumazet.
      
       4) ip_vs_dest_entry isn't fully initialized before copying back to
          userspace, fix from Dan Carpenter.
      
       5) Fix ubuf reference counting regression in vhost_net, from Jason
          Wang.
      
       6) When sock_diag dumps a socket filter back to userspace, we have to
          translate it out of the kernel's internal representation first.
          From Nicolas Dichtel.
      
       7) davinci_mdio holds a spinlock while calling pm_runtime, which
          sleeps.  Fix from Sebastian Siewior.
      
       8) Timeout check in sh_eth_check_reset is off by one, from Sergei
          Shtylyov.
      
       9) If sctp socket init fails, we can NULL deref during cleanup.  Fix
          from Daniel Borkmann.
      
      10) netlink_mmap() does not propagate errors properly, from Patrick
          McHardy.
      
      11) Disable powersave and use minstrel by default in ath9k.  From Sujith
          Manoharan.
      
      12) Fix a regression in that SOCK_ZEROCOPY is not set on tuntap sockets
          which prevents vhost from being able to use zerocopy.  From Jason
          Wang.
      
      13) Fix race between port lookup and TX path in team driver, from Jiri
          Pirko.
      
      14) Missing length checks in bluetooth L2CAP packet parsing, from Johan
          Hedberg.
      
      15) rtlwifi fails to connect to networks using any encryption method
          other than WPA2.  Fix from Larry Finger.
      
      16) Fix iwlegacy build due to incorrect CONFIG_* ifdeffing for power
          management stuff.  From Yijing Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
        b43: stop format string leaking into error msgs
        ath9k: Use minstrel rate control by default
        Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity"
        ath9k: Disable PowerSave by default
        net: wireless: iwlegacy: fix build error for il_pm_ops
        rtlwifi: Fix a false leak indication for PCI devices
        wl12xx/wl18xx: scan all 5ghz channels
        wl12xx: increase minimum singlerole firmware version required
        wl12xx: fix minimum required firmware version for wl127x multirole
        rtlwifi: rtl8192cu: Fix problem in connecting to WEP or WPA(1) networks
        mwifiex: debugfs: Fix out of bounds array access
        Bluetooth: Fix mgmt handling of power on failures
        Bluetooth: Fix missing length checks for L2CAP signalling PDUs
        Bluetooth: btmrvl: support Marvell Bluetooth device SD8897
        Bluetooth: Fix checks for LE support on LE-only controllers
        team: fix checks in team_get_first_port_txable_rcu()
        team: move add to port list before port enablement
        team: check return value of team_get_port_by_index_rcu() for NULL
        tuntap: set SOCK_ZEROCOPY flag during open
        netlink: fix error propagation in netlink_mmap()
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 645a9929
      Linus Torvalds committed
      Pull input layer bugfix from Jiri Kosina:
       "Memory leak regression fix from Benjamin Tissoires"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: multitouch: prevent memleak with the allocated name
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · b2cc9c19
      Linus Torvalds committed
      Pull block layer fixes from Jens Axboe:
       "Outside of bcache (which really isn't super big), these are all
        few-liners.  There are a few important fixes in here:
      
         - Fix blk pm sleeping when holding the queue lock
      
         - A small collection of bcache fixes that have been done and tested
           since bcache was included in this merge window.
      
         - A fix for a raid5 regression introduced with the bio changes.
      
         - Two important fixes for mtip32xx, fixing an oops and potential data
           corruption (or hang) due to wrong bio iteration on stacked devices."
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        scatterlist: sg_set_buf() argument must be in linear mapping
        raid5: Initialize bi_vcnt
        pktcdvd: silence static checker warning
        block: remove refs to XD disks from documentation
        blkpm: avoid sleep when holding queue lock
        mtip32xx: Correctly handle bio->bi_idx != 0 conditions
        mtip32xx: Fix NULL pointer dereference during module unload
        bcache: Fix error handling in init code
        bcache: clarify free/available/unused space
        bcache: drop "select CLOSURES"
        bcache: Fix incompatible pointer type warning
    • Merge branch 'akpm' (updates from Andrew Morton) · a568fa1c
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton:
       "Bunch of fixes and one little addition to math64.h"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
        include/linux/math64.h: add div64_ul()
        mm: memcontrol: fix lockless reclaim hierarchy iterator
        frontswap: fix incorrect zeroing and allocation size for frontswap_map
        kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
        mm: migration: add migrate_entry_wait_huge()
        ocfs2: add missing lockres put in dlm_mig_lockres_handler
        mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
        drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()
        aio: fix io_destroy() regression by using call_rcu()
        rtc-at91rm9200: use shadow IMR on at91sam9x5
        rtc-at91rm9200: add shadow interrupt mask
        rtc-at91rm9200: refactor interrupt-register handling
        rtc-at91rm9200: add configuration support
        rtc-at91rm9200: add match-table compile guard
        fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
        swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
        drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree
        cciss: fix broken mutex usage in ioctl
        audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
        drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel
        ...
    • include/linux/math64.h: add div64_ul() · c2853c8d
      Alex Shi committed
      There is div64_long() to handle s64/long division, but no macro to do
      u64/ul division.  It is necessary in some scenarios, so add this
      function.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: fix lockless reclaim hierarchy iterator · 89dc991f
      Johannes Weiner committed
      The lockless reclaim hierarchy iterator currently has a misplaced
      barrier that can lead to use-after-free crashes.
      
      The reclaim hierarchy iterator consists of a sequence count and a
      position pointer that are read and written locklessly, with memory
      barriers enforcing ordering.
      
      The write side sets the position pointer first, then updates the
      sequence count to "publish" the new position.  Likewise, the read side
      must read the sequence count first, then the position.  If the sequence
      count is up to date, it's guaranteed that the position is up to date as
      well:
      
        writer:                         reader:
        iter->position = position       if iter->sequence == expected:
        smp_wmb()                           smp_rmb()
        iter->sequence = sequence           position = iter->position
      
      However, the read side barrier is currently misplaced, which can lead to
      dereferencing stale position pointers that no longer point to valid
      memory.  Fix this.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: <stable@kernel.org>		[3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
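The writer/reader ordering from the commit above can be sketched in portable C11. This is an analogy rather than the memcontrol code: the release and acquire fences stand in for smp_wmb() and smp_rmb(), and `struct iter_sketch`, `iter_publish()` and `iter_load()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the iterator: a position plus a sequence count. */
struct iter_sketch {
    _Atomic(void *) position;
    atomic_uint sequence;
};

/* Writer: store the position first, then publish it via the sequence count. */
void iter_publish(struct iter_sketch *it, void *pos, unsigned int seq)
{
    atomic_store_explicit(&it->position, pos, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* role of smp_wmb() */
    atomic_store_explicit(&it->sequence, seq, memory_order_relaxed);
}

/*
 * Reader: check the sequence count first, and only then read the position.
 * Returns NULL when the sequence does not match, i.e. the position is stale.
 */
void *iter_load(struct iter_sketch *it, unsigned int expected)
{
    if (atomic_load_explicit(&it->sequence, memory_order_relaxed) != expected)
        return NULL;
    atomic_thread_fence(memory_order_acquire);   /* role of smp_rmb() */
    return atomic_load_explicit(&it->position, memory_order_relaxed);
}
```

The barrier on the read side sits between the sequence check and the position read; placing it anywhere else is the kind of misplacement the commit describes.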
    • frontswap: fix incorrect zeroing and allocation size for frontswap_map · 7b57976d
      Akinobu Mita committed
      The bitmap accessed by bitops must have enough size to hold the required
      numbers of bits rounded up to a multiple of BITS_PER_LONG.  And the
      bitmap must not be zeroed by memset() if the number of bits cleared is
      not a multiple of BITS_PER_LONG.
      
      This fixes incorrect zeroing and allocation size for frontswap_map.  The
      incorrect zeroing part doesn't cause any problem because frontswap_map
      is freed just after zeroing.  But the wrongly calculated allocation size
      may cause problems.
      
      For 32bit systems, the allocation size of frontswap_map is about twice
      as large as the required size.  For 64bit systems, the allocation size is
      smaller than required if the number of bits is not a multiple of
      BITS_PER_LONG.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
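The sizing rule from the commit above (round the bit count up to a whole number of longs) can be illustrated with a short userspace sketch. The helper names are hypothetical; the kernel expresses the same idea with its BITS_TO_LONGS() macro.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long on the build target (32 or 64 in practice). */
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold nbits bits, rounded up. */
size_t bits_to_longs_sketch(size_t nbits)
{
    return (nbits + BITS_PER_LONG_SKETCH - 1) / BITS_PER_LONG_SKETCH;
}

/* Bytes to allocate so that bitops never touch memory past the bitmap. */
size_t bitmap_bytes_sketch(size_t nbits)
{
    return bits_to_longs_sketch(nbits) * sizeof(unsigned long);
}
```

A bitmap of one bit still needs a full long, and a bitmap one bit larger than a long needs two; sizing by bytes or by a fixed divisor gets one of these cases wrong on 32-bit or 64-bit systems.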
    • kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() · 736f3203
      Chen Gang committed
      audit_add_tree_rule() must set 'rule->tree = NULL;' first, to guard
      against the rule itself being freed in kill_rules().
      
      The reason is that by the time it is killed, the 'rule' itself may have
      already been released, so we should not access it.  One example: we add a
      rule to an inode at the same time as another task is deleting this inode.
      
      The work flow for adding a rule:
      
          audit_receive() -> (need audit_cmd_mutex lock)
            audit_receive_skb() ->
              audit_receive_msg() ->
                audit_receive_filter() ->
                  audit_add_rule() ->
                    audit_add_tree_rule() -> (need audit_filter_mutex lock)
                      ...
                      unlock audit_filter_mutex
                      get_tree()
                      ...
                      iterate_mounts() -> (iterate all related inodes)
                        tag_mount() ->
                          tag_trunk() ->
                            create_trunk() -> (assume it is 1st rule)
                              fsnotify_add_mark() ->
                                fsnotify_add_inode_mark() ->  (add mark to inode->i_fsnotify_marks)
                              ...
                              get_tree(); (each inode will get one)
                      ...
                      lock audit_filter_mutex
      
      The work flow for deleting an inode:
      
          __destroy_inode() ->
           fsnotify_inode_delete() ->
             __fsnotify_inode_delete() ->
              fsnotify_clear_marks_by_inode() ->  (get mark from inode->i_fsnotify_marks)
                fsnotify_destroy_mark() ->
                 fsnotify_destroy_mark_locked() ->
                   audit_tree_freeing_mark() ->
                     evict_chunk() ->
                       ...
                       tree->goner = 1
                       ...
                       kill_rules() ->   (assume current->audit_context == NULL)
                         call_rcu() ->   (rule->tree != NULL)
                           audit_free_rule_rcu() ->
                             audit_free_rule()
                       ...
                       audit_schedule_prune() ->  (assume current->audit_context == NULL)
                         kthread_run() ->    (need audit_cmd_mutex and audit_filter_mutex lock)
                           prune_one() ->    (delete it from prune_list)
                             put_tree(); (match the original get_tree above)
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: migration: add migrate_entry_wait_huge() · 30dad309
      Naoya Horiguchi committed
      When we have a page fault for an address which is backed by a hugepage
      under migration, the kernel can't wait correctly and busy-loops on the
      hugepage fault until the migration finishes.  As a result, users who try
      to kick hugepage migration (via soft offlining, for example) occasionally
      experience long delays or soft lockups.
      
      This is because pte_offset_map_lock() can't get a correct migration entry
      or a correct page table lock for hugepage.  This patch introduces
      migration_entry_wait_huge() to solve this.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>	[2.6.35+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • X
      ocfs2: add missing lockres put in dlm_mig_lockres_handler · 27749f2f
      Xue jiufei 提交于
      dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path.
      Signed-off-by: joyce <xuejiufei@huawei.com>
      Reviewed-by: shencanquan <shencanquan@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27749f2f
    • T
      mm/page_alloc.c: fix watermark check in __zone_watermark_ok() · 026b0814
      Authored by Tomasz Stanislawski
      The watermark check consists of two sub-checks.  The first one is:
      
      	if (free_pages <= min + lowmem_reserve)
      		return false;
      
      This check ensures that a minimal amount of RAM stays free in the zone.  If
      CMA is used, free_pages is reduced by the number of free pages in CMA
      prior to the above-mentioned check.
      
      	if (!(alloc_flags & ALLOC_CMA))
      		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
      
      This prevents the zone from being drained of pages available for
      non-movable allocations.
      
      The second check prevents the zone from getting too fragmented.
      
      	for (o = 0; o < order; o++) {
      		free_pages -= z->free_area[o].nr_free << o;
      		min >>= 1;
      		if (free_pages <= min)
      			return false;
      	}
      
      The field z->free_area[o].nr_free is equal to the number of free pages
      including free CMA pages.  Therefore the CMA pages are subtracted twice.
      This may cause __zone_watermark_ok() to fail spuriously if the CMA
      area gets strongly fragmented.  In such a case there are many 0-order
      free pages located in CMA.  Those pages are subtracted twice, so
      they quickly drain free_pages during the check against
      fragmentation.  The test fails even though there are many free non-CMA
      pages in the zone.
      
      This patch fixes the issue by subtracting CMA pages only for the purpose of
      the (free_pages <= min + lowmem_reserve) check.
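      
      The difference between the two behaviours can be sketched in userspace
      as follows.  This is only a model under stated assumptions: struct
      zone_model, order_check(), watermark_ok_old() and watermark_ok_fixed()
      are illustrative names rather than kernel symbols, allow_cma stands in
      for (alloc_flags & ALLOC_CMA), and the other adjustments done by the
      real __zone_watermark_ok() are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ORDERS 11

/* Simplified stand-in for struct zone; every field here is illustrative. */
struct zone_model {
	long nr_free[NR_ORDERS];	/* free blocks per order, CMA included */
	long nr_free_cma;		/* free pages located in CMA areas */
	long lowmem_reserve;
};

/* Fragmentation check shared by both variants (the loop quoted above). */
bool order_check(const struct zone_model *z, int order, long min,
		 long free_pages)
{
	int o;

	assert(order >= 0 && order < NR_ORDERS);
	for (o = 0; o < order; o++) {
		free_pages -= z->nr_free[o] << o;
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}

/*
 * Old behaviour: CMA pages are removed from free_pages for good and are
 * then counted a second time through nr_free[] in the loop.
 */
bool watermark_ok_old(const struct zone_model *z, int order, long min,
		      long free_pages, bool allow_cma)
{
	if (!allow_cma)
		free_pages -= z->nr_free_cma;
	if (free_pages <= min + z->lowmem_reserve)
		return false;
	return order_check(z, order, min, free_pages);
}

/* Fixed behaviour: CMA pages are discounted for the first comparison only. */
bool watermark_ok_fixed(const struct zone_model *z, int order, long min,
			long free_pages, bool allow_cma)
{
	long free_cma = allow_cma ? 0 : z->nr_free_cma;

	if (free_pages - free_cma <= min + z->lowmem_reserve)
		return false;
	return order_check(z, order, min, free_pages);
}
```

      In this model, a zone with 1500 free order-0 CMA pages and 32 free
      order-5 non-CMA blocks (2524 free pages in total) and min = 100 is
      rejected by the old variant for an order-5 request without ALLOC_CMA,
      while the fixed variant accepts it.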
      
      Laura said:
      
        We were observing allocation failures of higher order pages (order 5 =
        128K typically) under tight memory conditions resulting in driver
        failure.  The output from the page allocation failure showed plenty of
        free pages of the appropriate order/type/zone and mostly CMA pages in
        the lower orders.
      
        For full disclosure, we still observed some page allocation failures
        even after applying the patch but the number was drastically reduced and
        those failures were attributed to fragmentation/other system issues.
      Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
      Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
      Tested-by: Laura Abbott <lauraa@codeaurora.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: <stable@vger.kernel.org>	[3.7+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      026b0814
    • D
      drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info() · 282c4c0e
      Authored by Dan Carpenter
      The "info.fill" array isn't initialized so it can leak uninitialized stack
      information to user space.
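      
      The usual remedy for this class of leak is to zero the whole structure
      before filling it in.  A minimal userspace sketch, assuming a
      hypothetical struct config_info layout and using memcpy() as a
      stand-in for copy_to_user() (neither is taken from the driver):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: two real fields plus a reserved array. */
struct config_info {
	int cpus;
	int blades;
	int fill[16];	/* reserved; the handler never assigns it */
};

/*
 * Fills *out the way an ioctl handler would fill a user buffer.
 * memcpy() stands in for copy_to_user().
 */
void get_config_info(struct config_info *out)
{
	struct config_info info;

	assert(out != NULL);

	/* Zero everything first so fill[] and padding carry no stale stack data. */
	memset(&info, 0, sizeof(info));
	info.cpus = 4;		/* example values only */
	info.blades = 1;

	memcpy(out, &info, sizeof(info));
}
```

      Without the memset(), fill[] would hold whatever happened to be on the
      stack, and the copy would hand those bytes to the caller.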
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Robin Holt <holt@sgi.com>
      Acked-by: Dimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      282c4c0e
    • K
      aio: fix io_destroy() regression by using call_rcu() · 4fcc712f
      Authored by Kent Overstreet
      There was a regression introduced by 36f55889 ("aio: refcounting
      cleanup"), reported by Jens Axboe - the refcounting cleanup switched to
      using RCU in the shutdown path, but the synchronize_rcu() was done in
      the context of the io_destroy() syscall greatly increasing the time it
      could block.
      
      This patch switches it to call_rcu() and makes shutdown asynchronous
      (more asynchronous than it was originally; before the refcount changes
      io_destroy() would still wait on pending kiocbs).
      
      Note that there's a global quota on the max outstanding kiocbs, and that
      quota must be manipulated synchronously; otherwise io_setup() could
      return -EAGAIN when there isn't quota available, and userspace won't
      have any way of waiting until shutdown of the old kioctxs has finished
      (besides busy looping).
      
      So we release our quota before kioctx shutdown has finished, which
      should be fine since the quota never corresponded to anything real
      anyways.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Tested-by: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      Tested-by: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4fcc712f
    • J
      rtc-at91rm9200: use shadow IMR on at91sam9x5 · bba00e59
      Authored by Johan Hovold
      Add support for the at91sam9x5-family which must use the shadow
      interrupt mask due to a hardware issue (causing RTC_IMR to always be
      zero).
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bba00e59
    • J
      rtc-at91rm9200: add shadow interrupt mask · e9f08bbe
      Authored by Johan Hovold
      Add shadow interrupt-mask register which can be used on SoCs where the
      actual hardware register is broken.
      
      Note that some care needs to be taken to make sure the shadow mask
      corresponds to the actual hardware state.  The added overhead is not an
      issue for the non-broken SoCs due to the relatively infrequent
      interrupt-mask updates.  We do, however, only use the shadow mask value
      as a fall-back when it is actually needed, as there is still a theoretical
      possibility that the mask is incorrect (see the code for details).
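      
      The idea of a shadow mask behind accessors can be sketched in
      userspace as below.  This is a simulation under assumptions, not the
      driver code: struct rtc_model and the rtc_* functions are illustrative
      names, the imr_broken field exists only to mimic a register that
      always reads zero, and the locking a real driver needs is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated RTC; not a real register layout. */
struct rtc_model {
	bool use_shadow_imr;	/* per-SoC configuration flag */
	bool imr_broken;	/* simulation only: IMR reads back as zero */
	uint32_t hw_imr;	/* what a working IMR would contain */
	uint32_t shadow_imr;	/* software copy maintained by the accessors */
};

/* Enable interrupts: update the hardware and the shadow copy together. */
void rtc_write_ier(struct rtc_model *rtc, uint32_t mask)
{
	rtc->hw_imr |= mask;
	rtc->shadow_imr |= mask;
}

/* Disable interrupts: again keep both views in step. */
void rtc_write_idr(struct rtc_model *rtc, uint32_t mask)
{
	rtc->hw_imr &= ~mask;
	rtc->shadow_imr &= ~mask;
}

/* Read the mask, falling back to the shadow copy only when configured to. */
uint32_t rtc_read_imr(const struct rtc_model *rtc)
{
	assert(rtc);
	if (rtc->use_shadow_imr)
		return rtc->shadow_imr;
	return rtc->imr_broken ? 0 : rtc->hw_imr;
}
```

      Routing every enable and disable through the two write accessors is
      what keeps the shadow copy in step with the hardware; reading the
      real register whenever it works limits the exposure to a stale shadow
      value.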
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9f08bbe
    • J
      rtc-at91rm9200: refactor interrupt-register handling · e304fcd0
      Authored by Johan Hovold
      Add accessors for the interrupt register.
      
      This will allow us to easily add a shadow interrupt-mask register to use
      on SoCs where the interrupt-mask register cannot be used.
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e304fcd0
    • J
      rtc-at91rm9200: add configuration support · de645475
      Authored by Johan Hovold
      Add configuration support which can be used to implement SoC-specific
      workarounds for broken hardware.
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de645475
    • J
      rtc-at91rm9200: add match-table compile guard · 558c61e5
      Authored by Johan Hovold
      The members of Atmel's at91sam9x5 family (9x5) have a broken RTC
      interrupt mask register (AT91_RTC_IMR).  It does not reflect enabled
      interrupts but instead always returns zero.
      
      The kernel's rtc-at91rm9200 driver handles the RTC for the 9x5 family.
      Currently when the date/time is set, an interrupt is generated and this
      driver neglects to handle the interrupt.  The kernel complains about the
      un-handled interrupt and disables it henceforth.  This not only breaks
      the RTC function, but since that interrupt is shared (Atmel's SYS
      interrupt) then other things break as well (e.g.  the debug port no
      longer accepts characters).
      
      Tested on the at91sam9g25.  Bug confirmed by Atmel.
      
      This patch (of 5):
      
      Add missing match-table compile guard.
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      558c61e5
    • G
      fs/ocfs2/namei.c: remove unnecessary ERROR when removing non-empty directory · e0991271
      Authored by Goldwyn Rodrigues
      While removing a non-empty directory, the kernel dumps a message:
      
        (rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39
      
      Suppress the error message so that it is not printed to dmesg and users
      don't panic.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
      Reviewed-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e0991271
    • R
      swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion · cbab0e4e
      Authored by Rafael Aquini
      read_swap_cache_async() can race against get_swap_page(), and stumble
      across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought
      into the swapcache yet.
      
      This swap_map state is expected to be transitory, but the current
      placement of the discard at scan_swap_map() inserts a wait for I/O
      completion, which makes the thread in read_swap_cache_async() loop
      around its -EEXIST case while the other end, at get_swap_page(), is
      scheduled away at scan_swap_map().  This can leave the system deadlocked
      if the I/O completion happens to be waiting on the CPU waitqueue where
      read_swap_cache_async() is busy looping and !CONFIG_PREEMPT.
      
      This patch introduces a cond_resched() call so that the
      read_swap_cache_async() busy loop yields the CPU when necessary,
      thus avoiding the subtle race window.
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cbab0e4e
    • T
      drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree · 24b8256a
      Authored by Tony Lindgren
      When booted in legacy mode, device_init_wakeup() gets called by
      drivers/mfd/twl-core.c when the children are initialized.  However, when
      booted using device tree, the children are created with
      of_platform_populate() instead of add_children().
      
      This means that the RTC driver will not have device_init_wakeup() set,
      and we need to call it from the driver probe like RTC drivers typically
      do.
      
      Without this we cannot test PM wake-up events on omaps for cases where
      there may not be any physical wake-up event.
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Reported-by: Kevin Hilman <khilman@linaro.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Jingoo Han <jg1.han@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      24b8256a
    • S
      cciss: fix broken mutex usage in ioctl · 03f47e88
      Authored by Stephen M. Cameron
      If a new logical drive is added and the CCISS_REGNEWD ioctl is invoked
      (as is normal with the Array Configuration Utility) the process will
      hang as below.  It attempts to acquire the same mutex twice, once in
      do_ioctl() and once in cciss_unlocked_open().  The BKL was recursive,
      the mutex isn't.
      
        Linux version 3.10.0-rc2 (scameron@localhost.localdomain) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri May 24 14:32:12 CDT 2013
        [...]
        acu             D 0000000000000001     0  3246   3191 0x00000080
        Call Trace:
          schedule+0x29/0x70
          schedule_preempt_disabled+0xe/0x10
          __mutex_lock_slowpath+0x17b/0x220
          mutex_lock+0x2b/0x50
          cciss_unlocked_open+0x2f/0x110 [cciss]
          __blkdev_get+0xd3/0x470
          blkdev_get+0x5c/0x1e0
          register_disk+0x182/0x1a0
          add_disk+0x17c/0x310
          cciss_add_disk+0x13a/0x170 [cciss]
          cciss_update_drive_info+0x39b/0x480 [cciss]
          rebuild_lun_table+0x258/0x370 [cciss]
          cciss_ioctl+0x34f/0x470 [cciss]
          do_ioctl+0x49/0x70 [cciss]
          __blkdev_driver_ioctl+0x28/0x30
          blkdev_ioctl+0x200/0x7b0
          block_ioctl+0x3c/0x40
          do_vfs_ioctl+0x89/0x350
          SyS_ioctl+0xa1/0xb0
          system_call_fastpath+0x16/0x1b
      
      This mutex usage was added into the ioctl path when the big kernel lock
      was removed.  As it turns out, these paths are all thread safe anyway
      (or can easily be made so) and we don't want ioctl() to be single
      threaded in any case.
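      
      The difference between a recursive lock and a plain mutex can be
      reproduced in userspace with POSIX threads.  The sketch below is an
      analogy only and does not use kernel mutexes: relock_result() is a
      hypothetical helper, and an error-checking mutex is used in place of a
      normal one so that the second lock attempt reports EDEADLK instead of
      hanging the way the trace above shows.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Locks a mutex of the given type twice from the same thread and
 * returns the result of the second pthread_mutex_lock() call.
 */
int relock_result(int type)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int first, second;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, type);
	pthread_mutex_init(&m, &attr);
	pthread_mutexattr_destroy(&attr);

	first = pthread_mutex_lock(&m);
	assert(first == 0);
	(void)first;

	/* Same thread, same lock: the open-inside-ioctl situation. */
	second = pthread_mutex_lock(&m);

	/* Release once per successful lock before destroying the mutex. */
	if (second == 0)
		pthread_mutex_unlock(&m);
	pthread_mutex_unlock(&m);
	pthread_mutex_destroy(&m);
	return second;
}
```

      relock_result(PTHREAD_MUTEX_RECURSIVE) returns 0, which matches how
      the BKL behaved, while relock_result(PTHREAD_MUTEX_ERRORCHECK) returns
      EDEADLK.  Depending on the toolchain, the program may need to be built
      with -pthread.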
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      03f47e88
    • O
      audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE · f000cfdd
      Authored by Oleg Nesterov
      audit_log_start() does wait_for_auditd() in a loop until
      audit_backlog_wait_time passes or audit_skb_queue has room.
      
      If signal_pending() is true this becomes a busy-wait loop, because
      schedule() in TASK_INTERRUPTIBLE won't block.
      
      Thanks to Guy for fully investigating and explaining the problem.
      
      (akpm: that'll cause the system to lock up on a non-preemptible
      uniprocessor kernel)
      
      (Guy: "Our customer was in fact running a uniprocessor machine, and they
      reported a system hang.")
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Guy Streeter <streeter@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f000cfdd
    • D
      drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel · ebf8d6c8
      Authored by Derek Basehore
      During resume, we call hpet_rtc_timer_init after masking an irq bit in
      hpet.  This will cause the call to hpet_disable_rtc_channel to be undone
      if RTC_AIE is the only bit not masked.
      
      Allowing the cmos interrupt handler to run before resuming caused some
      issues where the timer for the alarm was not removed.  This would cause
      other, later timers to not be cleared, so utilities such as hwclock
      would time out when waiting for the update interrupt.
      
      [akpm@linux-foundation.org: coding-style tweak]
      Signed-off-by: Derek Basehore <dbasehore@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ebf8d6c8