1. 13 Jun, 2013 40 commits
    • Y
      net: sh_eth: fix incorrect RX length error if R8A7740 · dd019897
      Yoshihiro Shimoda committed
      This patch fixes an issue where the driver increments the "RX length error"
      counter on every buffer in sh_eth_rx() on the R8A7740.
      This patch also adds a description of the Receive Frame Status bits.
      Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dd019897
    • E
      ip_tunnel: remove __net_init/exit from exported functions · d3b6f614
      Eric Dumazet committed
      If CONFIG_NET_NS is not set then __net_init is the same as __init and
      __net_exit is the same as __exit. These functions will be removed from
      memory after the module loads or is removed. Functions that are exported
      for use by other functions should never be labeled for removal.
      
      Bug introduced by commit c5441932
      ("GRE: Refactor GRE tunneling code.")
      Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3b6f614
    • M
      drivers: net: davinci_mdio: restore mdio clk divider in mdio resume · cc60ab0a
      Mugunthan V N committed
      During a suspend/resume cycle all register data is lost, so the MDIO
      clock divider value gets reset. This patch restores the clock divider
      value.
      Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cc60ab0a
    • M
      drivers: net: davinci_mdio: moving mdio resume earlier than cpsw ethernet driver · 5033ec3e
      Mugunthan V N committed
      The MDIO driver should resume before the CPSW Ethernet driver so that CPSW
      can connect to the PHY and start transmitting and receiving Ethernet packets.
      To achieve this, change the suspend/resume APIs to suspend_late/resume_early.
      Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5033ec3e
    • S
      net/ipv4: ip_vti clear skb cb before tunneling. · baafc77b
      Saurabh Mohan committed
      If users apply a shaper to a vti tunnel, it causes a kernel crash. The
      problem seems to be due to the vti_tunnel_xmit function not clearing the
      skb->opt field before passing the packet to the xfrm tunneling code.
      Signed-off-by: Saurabh Mohan <saurabh@vyatta.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      baafc77b
    • N
      tg3: Wait for boot code to finish after power on · df465abf
      Nithin Sujir committed
      Some systems that don't need wake-on-lan may choose to power down the
      chip on system standby. Upon resume, the power on causes the boot code
      to start up and initialize the hardware. On one new platform, this is
      causing the device to go into a bad state due to a race between the
      driver and boot code, once every several hundred resumes. The same race
      exists on open since we come up from a power on.
      
      This patch adds a wait for boot code signature at the beginning of
      tg3_init_hw() which is common to both cases. If there has not been a
      power-off or the boot code has already completed, the signature will be
      present and poll_fw() returns immediately. Also return immediately if
      the device does not have firmware.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df465abf
    • G
      l2tp: Fix sendmsg() return value · a6f79d0f
      Guillaume Nault committed
      PPPoL2TP sockets should comply with the standard send*() return values
      (i.e. return the number of bytes sent instead of 0 upon success).
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a6f79d0f
    • G
      l2tp: Fix PPP header erasure and memory leak · 55b92b7a
      Guillaume Nault committed
      Copy user data after the PPP framing header. This prevents erasure of the
      added PPP header and avoids leaking two bytes of uninitialised memory
      at the end of the skb's data buffer.
      Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55b92b7a
    • N
      bonding: fix igmp_retrans type and two related races · 4f5474e7
      Nikolay Aleksandrov committed
      First, the type of igmp_retrans (the actual counter behind the
      igmp_resend parameter) is changed to u8 so that it can store values up
      to 255 (as per the documentation). Two races were hidden there and are
      easy to trigger after that fix. The first is between
      bond_resend_igmp_join_requests and bond_change_active_slave, where
      igmp_retrans is set and can be altered by the periodic function. The
      second is between multiple running instances of the periodic function
      (upon execution it can be scheduled again for immediate execution, which
      can cause the counter to go below 0, which in the unsigned case leads to
      unnecessary IGMP retransmissions).
      Since in bond_change_active_slave bond->lock is held for reading and
      curr_slave_lock for writing, we use curr_slave_lock for mutual
      exclusion. We can't drop them as there're cases where RTNL is not held
      when bond_change_active_slave is called. RCU is unlocked in
      bond_resend_igmp_join_requests before getting curr_slave_lock since we
      don't need it there and it's pointless to delay.
      The decrement is moved inside the "if" block because if we decrement
      unconditionally there's still a possibility for a race condition although
      it is much more difficult to hit (many changes have to happen in
      a very short period in order to trigger) which in the case of 3 parallel
      running instances of this function and igmp_retrans == 1
      (with check bond->igmp_retrans-- > 1) is:
      f1 passes, doesn't re-schedule, but decrements - igmp_retrans = 0
      f2 then passes, doesn't re-schedule, but decrements - igmp_retrans = 255
      f3 does the unnecessary retransmissions.
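      The wraparound in the f1/f2/f3 sequence above can be replayed in plain
      userspace C. This is only a minimal single-threaded sketch: struct
      demo_bond and both helper functions are hypothetical names, uint8_t
      stands in for the kernel's u8, and the locking that the patch relies on
      (curr_slave_lock) is not modelled at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the bond; only the u8 counter type comes from the patch. */
struct demo_bond {
    uint8_t igmp_retrans;
};

/*
 * Check with an unconditional decrement, as in "bond->igmp_retrans-- > 1".
 * Returns 1 when the caller would schedule another retransmission.
 */
static int resend_unconditional(struct demo_bond *bond)
{
    return bond->igmp_retrans-- > 1;
}

/*
 * Decrement moved inside the "if": the counter is only touched while it is
 * still above 1, so it can never wrap around.
 */
static int resend_guarded(struct demo_bond *bond)
{
    if (bond->igmp_retrans > 1) {
        bond->igmp_retrans--;
        return 1;
    }
    return 0;
}

/* Replays f1, f2 and f3 one after another, starting from igmp_retrans == 1. */
void demo_three_instances(void)
{
    struct demo_bond racy = { .igmp_retrans = 1 };
    struct demo_bond safe = { .igmp_retrans = 1 };

    assert(resend_unconditional(&racy) == 0 && racy.igmp_retrans == 0);   /* f1 */
    assert(resend_unconditional(&racy) == 0 && racy.igmp_retrans == 255); /* f2 wraps */
    assert(resend_unconditional(&racy) == 1);                             /* f3 retransmits */

    /* With the guarded decrement, all three calls leave the counter alone. */
    assert(resend_guarded(&safe) == 0 && safe.igmp_retrans == 1);
    assert(resend_guarded(&safe) == 0 && safe.igmp_retrans == 1);
    assert(resend_guarded(&safe) == 0 && safe.igmp_retrans == 1);
}
```

      Calling demo_three_instances() from a test program exercises both
      variants; in the real driver the three calls run in parallel, which is
      why the patch also takes curr_slave_lock.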
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f5474e7
    • N
      bonding: reset master mac on first enslave failure · b8fad459
      Nikolay Aleksandrov committed
      If the bond device is supposed to get the first slave's MAC address and
      the first enslavement fails, we need to reset the master's MAC;
      otherwise it stays the same as that of the failed slave device. We do it
      after err_undo_flags since that is the first place where the MAC can be
      changed. Because that error path is used by multiple locations prior to
      changing the master's MAC address, we check whether this should have
      been the first slave and whether the bond's MAC was set to it.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8fad459
    • D
      packet: packet_getname_spkt: make sure string is always 0-terminated · 2dc85bf3
      Daniel Borkmann committed
      uaddr->sa_data is exactly of size 14, which is hard-coded here and
      passed as a size argument to strncpy(). A device name can be of size
      IFNAMSIZ (== 16), meaning we might leave the destination string
      unterminated. Thus, use strlcpy() and also sizeof() while we're
      at it. We need to memset the data area beforehand, since strlcpy
      does not pad the remaining buffer with zeroes for user space, so
      that we do not possibly leak anything.
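      The truncation problem can be reproduced in userspace with a 14-byte
      buffer and a 15-character name. The sketch below is an illustration
      under assumptions, not the kernel change itself: the DEMO_* constants
      and all helper names are hypothetical, and copy_name_terminated() only
      imitates the "memset, then bounded copy" pattern, because strlcpy() is
      not available in every C library.

```c
#include <assert.h>
#include <string.h>

#define DEMO_SA_DATA_LEN 14   /* size of sa_data, as stated in the message */

/* Old pattern: strncpy() with the full buffer size as the bound. */
static void copy_name_strncpy(char *dst, const char *src)
{
    strncpy(dst, src, DEMO_SA_DATA_LEN);
}

/*
 * Hypothetical helper: zero the whole destination first, then copy at most
 * size - 1 bytes. The result is always NUL-terminated and no stale bytes
 * remain behind the name.
 */
static void copy_name_terminated(char *dst, size_t size, const char *src)
{
    size_t len;

    assert(size > 0);
    memset(dst, 0, size);
    len = strlen(src);
    if (len > size - 1)
        len = size - 1;
    memcpy(dst, src, len);
}

/* Returns 1 if buf contains a NUL byte within its first size bytes. */
static int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

void demo_getname(void)
{
    const char *name = "verylongifname0";   /* 15 characters, 16 bytes with NUL */
    char old_buf[DEMO_SA_DATA_LEN];
    char new_buf[DEMO_SA_DATA_LEN];

    /* Fill both buffers with junk to mimic uninitialised memory. */
    memset(old_buf, 'x', sizeof(old_buf));
    memset(new_buf, 'x', sizeof(new_buf));

    copy_name_strncpy(old_buf, name);
    assert(!is_terminated(old_buf, sizeof(old_buf)));   /* no room for the NUL */

    copy_name_terminated(new_buf, sizeof(new_buf), name);
    assert(is_terminated(new_buf, sizeof(new_buf)));
    assert(strlen(new_buf) == DEMO_SA_DATA_LEN - 1);    /* truncated to 13 chars */
}
```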
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2dc85bf3
    • D
      net: ethernet: stmicro: stmmac: Fix compile error when STMMAC_XMIT_DEBUG used · 631f24a2
      Dinh Nguyen committed
      drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: In function stmmac_xmit:
      drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1902:74: error: expected ) before __func__
      Signed-off-by: Dinh Nguyen <dinguyen@altera.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      631f24a2
    • S
      be2net: Fix 32-bit DMA Mask handling · 0c5fed09
      Somnath Kotur committed
      Set the coherent DMA mask only if dma_set_mask() succeeded, and
      error out if either call fails.
      Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c5fed09
    • D
      Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge · e86c9861
      David S. Miller committed
      Included changes:
      - fix "rtnl locked" concurrent executions by using rtnl_lock instead of
        rtnl_trylock. This fix keeps batman-adv initialisation from failing just
        because another code path somewhere else in the system is holding the rtnl
        lock. It is easy to see the problem when batman-adv is trying to start
        together with other networking components.
      - fix the routing protocol forwarding policy by enhancing the duplicate control
        packet detection. When the right circumstances trigger the issue, some nodes in
        the network become totally unreachable, breaking the mesh connectivity.
      - fix the Bridge Loop Avoidance component by not running the originator address
        change handling routine when the component is disabled. The routine was
        generating useless packets that were sent over the network.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e86c9861
    • J
      xen-netback: don't de-reference vif pointer after having called xenvif_put() · 94f950c4
      Jan Beulich committed
      When putting vif-s on the rx notify list, calling xenvif_put() must be
      deferred until after the removal from the list and the issuing of the
      notification, as both operations dereference the pointer.
      
      Changing this got me to notice that the "irq" variable was effectively
      unused (and was of too narrow type anyway).
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      94f950c4
    • M
      macvlan: don't touch promisc without passthrough · 99ffc3e7
      Michael S. Tsirkin committed
      commit df8ef8f3
      "macvlan: add FDB bridge ops and macvlan flags"
      added a way to control NOPROMISC macvlan flag through netlink.
      
      However, with a non passthrough device we never set promisc on open,
      even if NOPROMISC is off.  As a result:
      
      If userspace clears NOPROMISC on open, then does not clear it on a
      netlink command, promisc counter is not decremented on stop and there
      will be no way to clear it once macvlan is detached.
      
      If userspace does not clear NOPROMISC on open, then sets NOPROMISC on a
      netlink command, the promisc counter will be decremented from 0 and wrap
      to 0xffffffff with no way to clear promisc.
      
      To fix, simply ignore NOPROMISC flag in a netlink command for
      non-passthrough devices, same as we do at open/close.
      
      Since we touch this code anyway - check dev_set_promiscuity return code
      and pass it to users (though an error here is unlikely).
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99ffc3e7
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 26e04462
      Linus Torvalds committed
      Pull networking update from David Miller:
      
       1) Fix dump iterator in nfnl_acct_dump() and ctnl_timeout_dump() to
          dump all objects properly, from Pablo Neira Ayuso.
      
       2) xt_TCPMSS must use the default MSS of 536 when no MSS TCP option is
          present.  Fix from Phil Oester.
      
       3) qdisc_get_rtab() looks for an existing matching rate table and uses
          that instead of creating a new one.  However, its key matching is
          incomplete: it fails to check that the ->data[] array is
          identical too.  Fix from Eric Dumazet.
      
       4) ip_vs_dest_entry isn't fully initialized before copying back to
          userspace, fix from Dan Carpenter.
      
       5) Fix ubuf reference counting regression in vhost_net, from Jason
          Wang.
      
       6) When sock_diag dumps a socket filter back to userspace, we have to
          translate it out of the kernel's internal representation first.
          From Nicolas Dichtel.
      
       7) davinci_mdio holds a spinlock while calling pm_runtime, which
          sleeps.  Fix from Sebastian Siewior.
      
       8) Timeout check in sh_eth_check_reset is off by one, from Sergei
          Shtylyov.
      
       9) If sctp socket init fails, we can NULL deref during cleanup.  Fix
          from Daniel Borkmann.
      
      10) netlink_mmap() does not propagate errors properly, from Patrick
          McHardy.
      
      11) Disable powersave and use minstrel by default in ath9k.  From Sujith
          Manoharan.
      
      12) Fix a regression in that SOCK_ZEROCOPY is not set on tuntap sockets
          which prevents vhost from being able to use zerocopy.  From Jason
          Wang.
      
      13) Fix race between port lookup and TX path in team driver, from Jiri
          Pirko.
      
      14) Missing length checks in bluetooth L2CAP packet parsing, from Johan
          Hedberg.
      
      15) rtlwifi fails to connect to networks using any encryption method
          other than WPA2.  Fix from Larry Finger.
      
      16) Fix iwlegacy build due to incorrect CONFIG_* ifdeffing for power
          management stuff.  From Yijing Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
        b43: stop format string leaking into error msgs
        ath9k: Use minstrel rate control by default
        Revert "ath9k_hw: Update rx gain initval to improve rx sensitivity"
        ath9k: Disable PowerSave by default
        net: wireless: iwlegacy: fix build error for il_pm_ops
        rtlwifi: Fix a false leak indication for PCI devices
        wl12xx/wl18xx: scan all 5ghz channels
        wl12xx: increase minimum singlerole firmware version required
        wl12xx: fix minimum required firmware version for wl127x multirole
        rtlwifi: rtl8192cu: Fix problem in connecting to WEP or WPA(1) networks
        mwifiex: debugfs: Fix out of bounds array access
        Bluetooth: Fix mgmt handling of power on failures
        Bluetooth: Fix missing length checks for L2CAP signalling PDUs
        Bluetooth: btmrvl: support Marvell Bluetooth device SD8897
        Bluetooth: Fix checks for LE support on LE-only controllers
        team: fix checks in team_get_first_port_txable_rcu()
        team: move add to port list before port enablement
        team: check return value of team_get_port_by_index_rcu() for NULL
        tuntap: set SOCK_ZEROCOPY flag during open
        netlink: fix error propagation in netlink_mmap()
        ...
      26e04462
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 645a9929
      Linus Torvalds committed
      Pull input layer bugfix from Jiri Kosina:
       "Memory leak regression fix from Benjamin Tissoires"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: multitouch: prevent memleak with the allocated name
      645a9929
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · b2cc9c19
      Linus Torvalds committed
      Pull block layer fixes from Jens Axboe:
       "Outside of bcache (which really isn't super big), these are all
        few-liners.  There are a few important fixes in here:
      
         - Fix blk pm sleeping when holding the queue lock
      
         - A small collection of bcache fixes that have been done and tested
           since bcache was included in this merge window.
      
         - A fix for a raid5 regression introduced with the bio changes.
      
         - Two important fixes for mtip32xx, fixing an oops and potential data
           corruption (or hang) due to wrong bio iteration on stacked devices."
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        scatterlist: sg_set_buf() argument must be in linear mapping
        raid5: Initialize bi_vcnt
        pktcdvd: silence static checker warning
        block: remove refs to XD disks from documentation
        blkpm: avoid sleep when holding queue lock
        mtip32xx: Correctly handle bio->bi_idx != 0 conditions
        mtip32xx: Fix NULL pointer dereference during module unload
        bcache: Fix error handling in init code
        bcache: clarify free/available/unused space
        bcache: drop "select CLOSURES"
        bcache: Fix incompatible pointer type warning
      b2cc9c19
    • L
      Merge branch 'akpm' (updates from Andrew Morton) · a568fa1c
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton:
       "Bunch of fixes and one little addition to math64.h"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
        include/linux/math64.h: add div64_ul()
        mm: memcontrol: fix lockless reclaim hierarchy iterator
        frontswap: fix incorrect zeroing and allocation size for frontswap_map
        kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
        mm: migration: add migrate_entry_wait_huge()
        ocfs2: add missing lockres put in dlm_mig_lockres_handler
        mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
        drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()
        aio: fix io_destroy() regression by using call_rcu()
        rtc-at91rm9200: use shadow IMR on at91sam9x5
        rtc-at91rm9200: add shadow interrupt mask
        rtc-at91rm9200: refactor interrupt-register handling
        rtc-at91rm9200: add configuration support
        rtc-at91rm9200: add match-table compile guard
        fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
        swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
        drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree
        cciss: fix broken mutex usage in ioctl
        audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
        drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel
        ...
      a568fa1c
    • A
      include/linux/math64.h: add div64_ul() · c2853c8d
      Alex Shi committed
      There is div64_long() to handle s64/long division, but no macro to do
      u64/ul division.  It is necessary in some scenarios, so add this
      function.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2853c8d
    • J
      mm: memcontrol: fix lockless reclaim hierarchy iterator · 89dc991f
      Johannes Weiner committed
      The lockless reclaim hierarchy iterator currently has a misplaced
      barrier that can lead to use-after-free crashes.
      
      The reclaim hierarchy iterator consists of a sequence count and a
      position pointer that are read and written locklessly, with memory
      barriers enforcing ordering.
      
      The write side sets the position pointer first, then updates the
      sequence count to "publish" the new position.  Likewise, the read side
      must read the sequence count first, then the position.  If the sequence
      count is up to date, it's guaranteed that the position is up to date as
      well:
      
        writer:                         reader:
        iter->position = position       if iter->sequence == expected:
        smp_wmb()                           smp_rmb()
        iter->sequence = sequence           position = iter->position
      
      However, the read side barrier is currently misplaced, which can lead to
      dereferencing stale position pointers that no longer point to valid
      memory.  Fix this.
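      The intended writer/reader ordering can be sketched in userspace with
      C11 atomics. This is an illustration under assumptions, not the
      memcontrol code: struct demo_iter and the demo_* functions are
      hypothetical, and release/acquire ordering is used here as a stand-in
      for the smp_wmb()/smp_rmb() pairing shown in the diagram.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical iterator: a position pointer published via a sequence count. */
struct demo_iter {
    _Atomic(void *) position;
    atomic_uint sequence;
};

/* Writer: store the position first, then publish it by updating the sequence. */
static void demo_publish(struct demo_iter *it, void *pos, unsigned int seq)
{
    atomic_store_explicit(&it->position, pos, memory_order_relaxed);
    /* The release store orders the position store before the sequence store. */
    atomic_store_explicit(&it->sequence, seq, memory_order_release);
}

/*
 * Reader: check the sequence first, and only then read the position.
 * Returns NULL when the sequence does not match, i.e. the cached position
 * must not be trusted.
 */
static void *demo_read(struct demo_iter *it, unsigned int expected)
{
    /* The acquire load orders the sequence load before the position load. */
    if (atomic_load_explicit(&it->sequence, memory_order_acquire) != expected)
        return NULL;
    return atomic_load_explicit(&it->position, memory_order_relaxed);
}

void demo_iterator(void)
{
    struct demo_iter it;
    int node;

    atomic_init(&it.position, NULL);
    atomic_init(&it.sequence, 0u);

    demo_publish(&it, &node, 1u);
    assert(demo_read(&it, 1u) == &node);   /* sequence matches: position is valid */
    assert(demo_read(&it, 2u) == NULL);    /* sequence differs: position is ignored */
}
```

      The single-threaded demo only checks the logic; the ordering guarantees
      matter once the writer and the reader run on different CPUs.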
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: <stable@kernel.org>		[3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      89dc991f
    • A
      frontswap: fix incorrect zeroing and allocation size for frontswap_map · 7b57976d
      Akinobu Mita committed
      The bitmap accessed by bitops must have enough size to hold the required
      numbers of bits rounded up to a multiple of BITS_PER_LONG.  And the
      bitmap must not be zeroed by memset() if the number of bits cleared is
      not a multiple of BITS_PER_LONG.
      
      This fixes incorrect zeroing and allocation size for frontswap_map.  The
      incorrect zeroing part doesn't cause any problem because frontswap_map
      is freed just after zeroing.  But the wrongly calculated allocation size
      may cause problems.
      
      For 32bit systems, the allocation size of frontswap_map is about twice
      as large as the required size.  For 64bit systems, the allocation size is
      smaller than required if the number of bits is not a multiple of
      BITS_PER_LONG.
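      The size arithmetic can be checked with a short userspace sketch. The
      message does not quote the old expression, so bitmap_bytes_assumed_old()
      below is an assumption chosen only because it reproduces both symptoms
      described above (dividing the bit count by sizeof(long)). All names are
      hypothetical and DEMO_BITS_PER_LONG mirrors the kernel's BITS_PER_LONG.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Number of longs needed to hold nbits bits, rounded up. */
static size_t demo_bits_to_longs(size_t nbits)
{
    return (nbits + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG;
}

/* Bytes a bitmap of nbits bits must occupy to be safe for word-sized bitops. */
static size_t bitmap_bytes(size_t nbits)
{
    return demo_bits_to_longs(nbits) * sizeof(long);
}

/* ASSUMED shape of the miscalculation: bit count divided by sizeof(long). */
static size_t bitmap_bytes_assumed_old(size_t nbits)
{
    return nbits / sizeof(long);
}

void demo_frontswap_sizes(void)
{
    size_t nbits = 100;   /* not a multiple of 32 or 64 */

    /* The rounded-up size always covers every bit in whole longs. */
    assert(bitmap_bytes(nbits) * CHAR_BIT >= nbits);
    assert(bitmap_bytes(nbits) % sizeof(long) == 0);

    if (sizeof(long) == 8)   /* 64-bit long: 12 bytes allocated, 16 needed */
        assert(bitmap_bytes_assumed_old(nbits) < bitmap_bytes(nbits));
    if (sizeof(long) == 4)   /* 32-bit long: 25 bytes allocated, 16 needed */
        assert(bitmap_bytes_assumed_old(nbits) > bitmap_bytes(nbits));
}
```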
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b57976d
    • C
      kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() · 736f3203
      Chen Gang committed
      audit_add_tree_rule() must first set 'rule->tree = NULL;' to protect
      against the rule itself being freed in kill_rules().
      
      The reason is that when it is killed, the 'rule' itself may already have
      been released, so we should not access it.  One example: we add a rule to
      an inode just as another task is deleting that inode.
      
      The work flow for adding a rule:
      
          audit_receive() -> (need audit_cmd_mutex lock)
            audit_receive_skb() ->
              audit_receive_msg() ->
                audit_receive_filter() ->
                  audit_add_rule() ->
                    audit_add_tree_rule() -> (need audit_filter_mutex lock)
                      ...
                      unlock audit_filter_mutex
                      get_tree()
                      ...
                      iterate_mounts() -> (iterate all related inodes)
                        tag_mount() ->
                          tag_trunk() ->
                            create_trunk() -> (assume it is 1st rule)
                              fsnotify_add_mark() ->
                                fsnotify_add_inode_mark() ->  (add mark to inode->i_fsnotify_marks)
                              ...
                              get_tree(); (each inode will get one)
                      ...
                      lock audit_filter_mutex
      
      The work flow for deleting an inode:
      
          __destroy_inode() ->
           fsnotify_inode_delete() ->
             __fsnotify_inode_delete() ->
              fsnotify_clear_marks_by_inode() ->  (get mark from inode->i_fsnotify_marks)
                fsnotify_destroy_mark() ->
                 fsnotify_destroy_mark_locked() ->
                   audit_tree_freeing_mark() ->
                     evict_chunk() ->
                       ...
                       tree->goner = 1
                       ...
                       kill_rules() ->   (assume current->audit_context == NULL)
                         call_rcu() ->   (rule->tree != NULL)
                           audit_free_rule_rcu() ->
                             audit_free_rule()
                       ...
                       audit_schedule_prune() ->  (assume current->audit_context == NULL)
                         kthread_run() ->    (need audit_cmd_mutex and audit_filter_mutex lock)
                           prune_one() ->    (delete it from prune_list)
                             put_tree(); (match the original get_tree above)
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      736f3203
    • N
      mm: migration: add migrate_entry_wait_huge() · 30dad309
      Naoya Horiguchi committed
      When a page fault occurs at an address that is backed by a hugepage
      under migration, the kernel can't wait correctly and busy-loops on the
      hugepage fault until the migration finishes.  As a result, users who try
      to kick hugepage migration (via soft offlining, for example) occasionally
      experience long delays or soft lockups.
      
      This is because pte_offset_map_lock() can't get a correct migration entry
      or a correct page table lock for hugepage.  This patch introduces
      migration_entry_wait_huge() to solve this.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>	[2.6.35+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      30dad309
    • X
      ocfs2: add missing lockres put in dlm_mig_lockres_handler · 27749f2f
      Xue jiufei committed
      dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path.
      Signed-off-by: joyce <xuejiufei@huawei.com>
      Reviewed-by: shencanquan <shencanquan@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27749f2f
    • T
      mm/page_alloc.c: fix watermark check in __zone_watermark_ok() · 026b0814
      Tomasz Stanislawski committed
      The watermark check consists of two sub-checks.  The first one is:
      
      	if (free_pages <= min + lowmem_reserve)
      		return false;
      
      The check ensures that there is a minimal amount of RAM in the zone.  If
      CMA is used, free_pages is reduced by the number of free pages
      in CMA prior to the above-mentioned check.
      
      	if (!(alloc_flags & ALLOC_CMA))
      		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
      
      This prevents the zone from being drained of pages available for
      non-movable allocations.
      
      The second check prevents the zone from getting too fragmented.
      
      	for (o = 0; o < order; o++) {
      		free_pages -= z->free_area[o].nr_free << o;
      		min >>= 1;
      		if (free_pages <= min)
      			return false;
      	}
      
      The field z->free_area[o].nr_free is equal to the number of free pages
      including free CMA pages.  Therefore the CMA pages are subtracted twice.
      This may cause a false positive fail of __zone_watermark_ok() if the CMA
      area gets strongly fragmented.  In such a case there are many 0-order
      free pages located in CMA.  Those pages are subtracted twice therefore
      they will quickly drain free_pages during the check against
      fragmentation.  The test fails even though there are many free non-cma
      pages in the zone.
      
      This patch fixes this issue by subtracting CMA pages only for the purpose
      of the (free_pages <= min + lowmem_reserve) check.
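      The double subtraction can be reproduced with a small userspace model of
      the two sub-checks. This is a simplified sketch, not the kernel
      function: struct demo_zone, DEMO_MAX_ORDER and the "fixed" switch are
      hypothetical, and the numbers are invented to show a heavily fragmented
      CMA area next to plenty of free non-CMA order-5 blocks.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_ORDER 11

/* Hypothetical, simplified view of a zone's free lists. */
struct demo_zone {
    long nr_free[DEMO_MAX_ORDER];   /* free blocks per order, CMA pages included */
    long nr_free_cma_pages;         /* free pages located in CMA */
};

/*
 * Model of the two sub-checks. With fixed == false, CMA pages are removed
 * from free_pages for good, so the fragmentation loop subtracts them again.
 * With fixed == true, they are subtracted only for the first comparison.
 */
static bool watermark_ok(const struct demo_zone *z, int order, long min,
                         long lowmem_reserve, long free_pages,
                         bool alloc_cma, bool fixed)
{
    long free_cma = alloc_cma ? 0 : z->nr_free_cma_pages;
    int o;

    if (free_pages - free_cma <= min + lowmem_reserve)
        return false;
    if (!fixed)
        free_pages -= free_cma;   /* old behaviour: CMA pages stay subtracted */

    for (o = 0; o < order; o++) {
        free_pages -= z->nr_free[o] << o;
        min >>= 1;
        if (free_pages <= min)
            return false;
    }
    return true;
}

void demo_watermark(void)
{
    /* 2000 free order-0 pages, all in CMA, plus 32 free order-5 blocks. */
    struct demo_zone z = {
        .nr_free = { [0] = 2000, [5] = 32 },
        .nr_free_cma_pages = 2000,
    };
    long free_pages = 2000 + (32L << 5);   /* 3024 pages in total */

    /* Order-5 request that may not use CMA, min watermark of 100 pages. */
    assert(!watermark_ok(&z, 5, 100, 0, free_pages, false, false)); /* false failure */
    assert(watermark_ok(&z, 5, 100, 0, free_pages, false, true));   /* passes when fixed */
}
```

      In the unfixed variant the order-0 step computes 1024 - 2000 and fails,
      although 1024 non-CMA pages in order-5 blocks are still free.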
      
      Laura said:
      
        We were observing allocation failures of higher order pages (order 5 =
        128K typically) under tight memory conditions resulting in driver
        failure.  The output from the page allocation failure showed plenty of
        free pages of the appropriate order/type/zone and mostly CMA pages in
        the lower orders.
      
        For full disclosure, we still observed some page allocation failures
        even after applying the patch but the number was drastically reduced and
        those failures were attributed to fragmentation/other system issues.
      Signed-off-by: Tomasz Stanislawski <t.stanislaws@samsung.com>
      Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
      Tested-by: Laura Abbott <lauraa@codeaurora.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: <stable@vger.kernel.org>	[3.7+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      026b0814
    • D
      drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info() · 282c4c0e
      Dan Carpenter committed
      The "info.fill" array isn't initialized so it can leak uninitialized stack
      information to user space.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Robin Holt <holt@sgi.com>
      Acked-by: Dimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      282c4c0e
    • K
      aio: fix io_destroy() regression by using call_rcu() · 4fcc712f
      Kent Overstreet committed
      There was a regression introduced by 36f55889 ("aio: refcounting
      cleanup"), reported by Jens Axboe - the refcounting cleanup switched to
      using RCU in the shutdown path, but the synchronize_rcu() was done in
      the context of the io_destroy() syscall greatly increasing the time it
      could block.
      
      This patch switches it to call_rcu() and makes shutdown asynchronous
      (more asynchronous than it was originally; before the refcount changes
      io_destroy() would still wait on pending kiocbs).
      
      Note that there's a global quota on the max outstanding kiocbs, and that
      quota must be manipulated synchronously; otherwise io_setup() could
      return -EAGAIN when there isn't quota available, and userspace won't
      have any way of waiting until shutdown of the old kioctxs has finished
      (besides busy looping).
      
      So we release our quota before kioctx shutdown has finished, which
      should be fine since the quota never corresponded to anything real
      anyways.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Tested-by: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      Tested-by: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4fcc712f
    • J
      rtc-at91rm9200: use shadow IMR on at91sam9x5 · bba00e59
      Johan Hovold committed
      Add support for the at91sam9x5-family which must use the shadow
      interrupt mask due to a hardware issue (causing RTC_IMR to always be
      zero).
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bba00e59
    • J
      rtc-at91rm9200: add shadow interrupt mask · e9f08bbe
      Johan Hovold committed
      Add a shadow interrupt-mask register which can be used on SoCs where
      the actual hardware register is broken.
      
      Note that some care needs to be taken to make sure the shadow mask
      corresponds to the actual hardware state.  The added overhead is not an
      issue for the non-broken SoCs due to the relatively infrequent
      interrupt-mask updates.  We do, however, only use the shadow mask value
      as a fall-back when it is actually needed, as there is still a
      theoretical possibility that the mask is incorrect (see the code for
      details).
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9f08bbe
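The shadow-mask idea in this commit can be modelled with a small userspace sketch. Everything here is hypothetical and simulated: `demo_rtc`, its fields and the `demo_*` helpers are not the driver's names, and the "hardware" register is just a struct member whose reads return zero when `imr_broken` is set. The sketch assumes every enable and disable goes through the two helpers, so the software copy tracks what was written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated RTC with a possibly broken interrupt mask register. */
struct demo_rtc {
    uint32_t hw_imr;      /* what the hardware has actually enabled */
    bool     imr_broken;  /* true: reads of the hardware mask return 0 */
    uint32_t shadow_imr;  /* software copy of the enabled interrupts */
};

/* Simulated register read: a broken part always reports zero. */
static uint32_t demo_hw_read_imr(const struct demo_rtc *rtc)
{
    return rtc->imr_broken ? 0 : rtc->hw_imr;
}

/* Enable interrupts and mirror the change in the shadow mask. */
void demo_enable_irqs(struct demo_rtc *rtc, uint32_t mask)
{
    assert(rtc != NULL);
    rtc->hw_imr |= mask;      /* stands in for a write to the enable register */
    rtc->shadow_imr |= mask;
}

/* Disable interrupts and mirror the change in the shadow mask. */
void demo_disable_irqs(struct demo_rtc *rtc, uint32_t mask)
{
    assert(rtc != NULL);
    rtc->hw_imr &= ~mask;     /* stands in for a write to the disable register */
    rtc->shadow_imr &= ~mask;
}

/* Read the mask, falling back to the shadow copy only on broken parts. */
uint32_t demo_read_imr(const struct demo_rtc *rtc)
{
    assert(rtc != NULL);
    if (rtc->imr_broken)
        return rtc->shadow_imr;
    return demo_hw_read_imr(rtc);
}
```

Working parts keep reading the real register, which matches the commit's note that the shadow value is only a fall-back. A real driver would also have to keep the register write and the shadow update consistent with each other, which this sketch ignores.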
    • J
      rtc-at91rm9200: refactor interrupt-register handling · e304fcd0
      Johan Hovold committed
      Add accessors for the interrupt register.
      
      This will allow us to easily add a shadow interrupt-mask register to use
      on SoCs where the interrupt-mask register cannot be used.
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e304fcd0
    • J
      rtc-at91rm9200: add configuration support · de645475
      Johan Hovold committed
      Add configuration support which can be used to implement SoC-specific
      workarounds for broken hardware.
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de645475
    • J
      rtc-at91rm9200: add match-table compile guard · 558c61e5
      Johan Hovold committed
      The members of Atmel's at91sam9x5 family (9x5) have a broken RTC
      interrupt mask register (AT91_RTC_IMR).  It does not reflect enabled
      interrupts but instead always returns zero.
      
      The kernel's rtc-at91rm9200 driver handles the RTC for the 9x5 family.
      Currently when the date/time is set, an interrupt is generated and this
      driver neglects to handle the interrupt.  The kernel complains about the
      un-handled interrupt and disables it henceforth.  This not only breaks
      the RTC function; because that interrupt is shared (Atmel's SYS
      interrupt), other things break as well (e.g.  the debug port no
      longer accepts characters).
      
      Tested on the at91sam9g25.  Bug confirmed by Atmel.
      
      This patch (of 5):
      
      Add missing match-table compile guard.
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Ludovic Desroches <ludovic.desroches@atmel.com>
      Cc: Robert Nelson <Robert.Nelson@digikey.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      558c61e5
    • G
      fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory · e0991271
      Goldwyn Rodrigues committed
      While removing a non-empty directory, the kernel dumps a message:
      
        (rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39
      
      Suppress the error message from being printed in the dmesg so users
      don't panic.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
      Reviewed-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e0991271
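The `status = -39` in the log line above is a negated errno value; on most Linux architectures 39 is `ENOTEMPTY`, the ordinary result of calling rmdir on a directory that still has entries. The userspace sketch below shows that this is an expected condition rather than a fault. The function name `demo_rmdir_nonempty_errno()` and the `/tmp` path template are assumptions made for the example, and POSIX also permits `EEXIST` here, so the result is only what Linux filesystems typically return.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Creates a temporary directory holding one file, tries to rmdir() it,
 * and returns the errno from that attempt (0 if rmdir unexpectedly
 * succeeded, -1 if the setup itself failed).  Cleans up before returning.
 */
int demo_rmdir_nonempty_errno(void)
{
    char dir[] = "/tmp/demo_rmdir_XXXXXX";
    char file[64];
    int fd;
    int err = 0;

    if (mkdtemp(dir) == NULL)
        return -1;

    snprintf(file, sizeof(file), "%s/entry", dir);
    fd = open(file, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    close(fd);

    /* The directory is not empty, so this call is expected to fail. */
    if (rmdir(dir) != 0)
        err = errno;

    /* Remove the entry first; after that rmdir() can succeed. */
    unlink(file);
    rmdir(dir);
    return err;
}
```

A caller that treats this return value as a normal outcome, and reports it to the user without logging an error, follows the same reasoning as the commit.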
    • R
      swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion · cbab0e4e
      Rafael Aquini committed
      read_swap_cache_async() can race against get_swap_page(), and stumble
      across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought
      into the swapcache yet.
      
      This swap_map state is expected to be transitory, but the current
      placement of the discard in scan_swap_map() inserts a wait for I/O
      completion.  The thread in read_swap_cache_async() then loops around
      its -EEXIST case while the other end, in get_swap_page(), is
      scheduled away in scan_swap_map().  This can leave the system deadlocked
      if the I/O completion happens to be waiting on the CPU waitqueue where
      read_swap_cache_async() is busy looping and !CONFIG_PREEMPT.
      
      This patch introduces a cond_resched() call so that the
      read_swap_cache_async() busy loop yields the CPU when necessary,
      thus avoiding the subtle race window.
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cbab0e4e
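The retry-loop pattern this commit adjusts has a loose userspace analogue: a waiter that polls for a state change should give up the CPU on each retry so that whoever must make progress can run. The sketch below uses `sched_yield()` as a stand-in for the kernel's `cond_resched()`; the two are not equivalent, and a preemptive userspace scheduler cannot reproduce the original deadlock. The names `demo_page_ready`, `demo_producer()` and `demo_wait_for_page()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Set to 1 once the other side has finished its work. */
static atomic_int demo_page_ready;

/* Plays the role of the side that completes the transitory state. */
static void *demo_producer(void *arg)
{
    (void)arg;
    atomic_store(&demo_page_ready, 1);
    return NULL;
}

/*
 * Polls until the producer is done, yielding the CPU on every retry
 * instead of spinning.  Returns the number of retries that were needed.
 */
long demo_wait_for_page(void)
{
    pthread_t producer;
    long retries = 0;
    int rc;

    atomic_store(&demo_page_ready, 0);
    rc = pthread_create(&producer, NULL, demo_producer, NULL);
    assert(rc == 0);
    (void)rc;

    while (!atomic_load(&demo_page_ready)) {
        sched_yield();   /* userspace stand-in for cond_resched() */
        retries++;
    }

    pthread_join(producer, NULL);
    return retries;
}
```

On a non-preemptible kernel the yield is what lets the other context run at all; in this userspace model it only shortens the wait.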
    • T
      drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree · 24b8256a
      Tony Lindgren committed
      When booted in legacy mode device_init_wakeup() gets called by
      drivers/mfd/twl-core.c when the children are initialized.  However, when
      booted using device tree, the children are created with
      of_platform_populate() instead of add_children().
      
      This means that the RTC driver will not have device_init_wakeup() set,
      and we need to call it from the driver probe like RTC drivers typically
      do.
      
      Without this we cannot test PM wake-up events on omaps for cases where
      there may not be any physical wake-up event.
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Reported-by: Kevin Hilman <khilman@linaro.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Jingoo Han <jg1.han@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      24b8256a
    • S
      cciss: fix broken mutex usage in ioctl · 03f47e88
      Stephen M. Cameron committed
      If a new logical drive is added and the CCISS_REGNEWD ioctl is invoked
      (as is normal with the Array Configuration Utility) the process will
      hang as below.  It attempts to acquire the same mutex twice, once in
      do_ioctl() and once in cciss_unlocked_open().  The BKL was recursive,
      the mutex isn't.
      
        Linux version 3.10.0-rc2 (scameron@localhost.localdomain) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri May 24 14:32:12 CDT 2013
        [...]
        acu             D 0000000000000001     0  3246   3191 0x00000080
        Call Trace:
          schedule+0x29/0x70
          schedule_preempt_disabled+0xe/0x10
          __mutex_lock_slowpath+0x17b/0x220
          mutex_lock+0x2b/0x50
          cciss_unlocked_open+0x2f/0x110 [cciss]
          __blkdev_get+0xd3/0x470
          blkdev_get+0x5c/0x1e0
          register_disk+0x182/0x1a0
          add_disk+0x17c/0x310
          cciss_add_disk+0x13a/0x170 [cciss]
          cciss_update_drive_info+0x39b/0x480 [cciss]
          rebuild_lun_table+0x258/0x370 [cciss]
          cciss_ioctl+0x34f/0x470 [cciss]
          do_ioctl+0x49/0x70 [cciss]
          __blkdev_driver_ioctl+0x28/0x30
          blkdev_ioctl+0x200/0x7b0
          block_ioctl+0x3c/0x40
          do_vfs_ioctl+0x89/0x350
          SyS_ioctl+0xa1/0xb0
          system_call_fastpath+0x16/0x1b
      
      This mutex usage was added into the ioctl path when the big kernel lock
      was removed.  As it turns out, these paths are all thread safe anyway
      (or can easily be made so) and we don't want ioctl() to be single
      threaded in any case.
      Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      03f47e88
    • O
      audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE · f000cfdd
      Oleg Nesterov committed
      audit_log_start() does wait_for_auditd() in a loop until
      audit_backlog_wait_time passes or audit_skb_queue has room.
      
      If signal_pending() is true this becomes a busy-wait loop, because
      schedule() in TASK_INTERRUPTIBLE won't block.
      
      Thanks to Guy for fully investigating and explaining the problem.
      
      (akpm: that'll cause the system to lock up on a non-preemptible
      uniprocessor kernel)
      
      (Guy: "Our customer was in fact running a uniprocessor machine, and they
      reported a system hang.")
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Guy Streeter <streeter@redhat.com>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f000cfdd
    • D
      drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel · ebf8d6c8
      Derek Basehore committed
      During resume, we call hpet_rtc_timer_init after masking an irq bit in
      hpet.  This will cause the call to hpet_disable_rtc_channel to be undone
      if RTC_AIE is the only bit not masked.
      
      Allowing the cmos interrupt handler to run before resuming caused some
      issues where the timer for the alarm was not removed.  This would cause
      other, later timers to not be cleared, so utilities such as hwclock
      would time out when waiting for the update interrupt.
      
      [akpm@linux-foundation.org: coding-style tweak]
      Signed-off-by: Derek Basehore <dbasehore@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ebf8d6c8