1. 17 Aug, 2013: 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ddea368c
      Committed by Linus Torvalds
      Pull networking fixes from David Miller:
      
       1) Fix SKB leak in 8139cp, from Dave Jones.
      
       2) Fix use of *_PAGES interfaces with mlx5 firmware, from Moshe Lazar.
      
       3) RCU conversion of macvtap introduced two races, fixes by Eric
          Dumazet
      
       4) Synchronize statistic flows in bnx2x driver to prevent corruption,
          from Dmitry Kravkov
      
       5) Undo an optimization in IP tunneling: we were using the inner IP
          header in some cases to inherit the IP ID, but that isn't correct in
          some circumstances.  From Pravin B Shelar
      
       6) Use correct struct size when parsing netlink attributes in
          rtnl_bridge_getlink().  From Asbjoern Sloth Toennesen
      
       7) Length verifications in tun_get_user() are bogus, from Weiping Pan
          and Dan Carpenter
      
       8) Fix bad merge resolution during 3.11 networking development in
          openvswitch, albeit a harmless one which added some unreachable
          code.  From Jesse Gross
      
       9) Wrong size used in flexible array allocation in openvswitch, from
          Pravin B Shelar
      
      10) Clear out firmware capability flags the be2net driver isn't ready to
          handle yet, from Sarveshwar Bandi
      
      11) Revert DMA mapping error checking addition to cxgb3 driver, it's
          buggy.  From Alexey Kardashevskiy
      
      12) Fix regression in packet scheduler rate limiting when working with a
          link layer of ATM.  From Jesper Dangaard Brouer
      
      13) Fix several errors in TCP Cubic congestion control, in particular
          overflow errors in timestamp calculations.  From Eric Dumazet and
          Van Jacobson
      
      14) In ipv6 routing lookups, we need to backtrack if subtree traversal
          doesn't result in a match.  From Hannes Frederic Sowa
      
      15) ipgre_header() returns incorrect packet offset.  Fix from Timo Teräs
      
      16) Get "low latency" out of the new MIB counter names.  From Eliezer
          Tamir
      
      17) State check in ndo_dflt_fdb_del() is inverted, from Sridhar
          Samudrala
      
      18) Handle TCP Fast Open properly in netfilter conntrack, from Yuchung
          Cheng
      
      19) Wrong memcpy length in pcan_usb driver, from Stephane Grosjean
      
      20) Fix deadlock in TIPC, from Wang Weidong and Ding Tianhong
      
      21) call_rcu() call to destroy SCTP transport is done too early and
          might result in an oops.  From Daniel Borkmann
      
      22) Fix races in genetlink family dumps, from Johannes Berg
      
      23) Flags passed into macvlan by the user need to be validated properly,
          from Michael S Tsirkin
      
      24) Fix skge build on 32-bit, from Stephen Hemminger
      
      25) Handle malformed TCP headers properly in xt_TCPMSS, from Pablo Neira
          Ayuso
      
      26) Fix handling of stacked vlans in vlan_dev_real_dev(), from Nikolay
          Aleksandrov
      
      27) Eliminate MTU calculation overflows in esp{4,6}, from Daniel
          Borkmann
      
      28) neigh_parms need to be set up before calling the ->ndo_neigh_setup()
          method.  From Veaceslav Falico
      
      29) Kill out-of-bounds prefetch in fib_trie, from Eric Dumazet
      
      30) Don't dereference MLD query message if the length isn't valid in the
          bridge multicast code, from Linus Lüssing
      
      31) Fix VXLAN IGMP join regression due to an inverted check, from Cong
          Wang
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits)
        net/mlx5_core: Support MANAGE_PAGES and QUERY_PAGES firmware command changes
        tun: signedness bug in tun_get_user()
        qlcnic: Fix diagnostic interrupt test for 83xx adapters
        qlcnic: Fix beacon state return status handling
        qlcnic: Fix set driver version command
        net: tg3: fix NULL pointer dereference in tg3_io_error_detected and tg3_io_slot_reset
        net_sched: restore "linklayer atm" handling
        drivers/net/ethernet/via/via-velocity.c: update napi implementation
        Revert "cxgb3: Check and handle the dma mapping errors"
        be2net: Clear any capability flags that driver is not interested in.
        openvswitch: Reset tunnel key between input and output.
        openvswitch: Use correct type while allocating flex array.
        openvswitch: Fix bad merge resolution.
        tun: compare with 0 instead of total_len
        rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header
        ethernet/arc/arc_emac - fix NAPI "work > weight" warning
        ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id.
        bnx2x: prevent crash in shutdown flow with CNIC
        bnx2x: fix PTE write access error
        bnx2x: fix memory leak in VF
        ...
  2. 16 Aug, 2013: 7 commits
  3. 15 Aug, 2013: 9 commits
    • net_sched: restore "linklayer atm" handling · 8a8e3d84
      Committed by Jesper Dangaard Brouer
      commit 56b765b7 ("htb: improved accuracy at high rates")
      broke the "linklayer atm" handling.
      
       tc class add ... htb rate X ceil Y linklayer atm
      
      The linklayer setting is implemented by modifying the rate table
      which is sent to the kernel.  No direct parameter was
      transferred to the kernel indicating the linklayer setting.
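For context on what the rate-table modification has to capture: ATM carries data in fixed 53-byte cells, each holding 48 bytes of payload. A packet's cost on the wire therefore rises in 53-byte steps rather than byte by byte. The sketch below computes only that per-packet wire size. `atm_wire_bytes()` is a hypothetical helper, and it deliberately ignores AAL5 trailer and encapsulation overhead, which a real configuration would add to `len` first.

```c
#include <stdint.h>

/* ATM framing constants: every cell is 53 bytes on the wire,
 * of which 48 bytes carry payload. */
#define ATM_CELL_SIZE    53u
#define ATM_CELL_PAYLOAD 48u

/* Hypothetical helper: bytes actually transmitted on an ATM link for a
 * packet of 'len' payload bytes. The packet is padded up to a whole
 * number of cells, and each cell costs its full 53 bytes. */
static uint32_t atm_wire_bytes(uint32_t len)
{
    uint32_t cells = (len + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;

    return cells * ATM_CELL_SIZE;
}
```

For example, 48-byte and 49-byte packets cost 53 and 106 wire bytes respectively. A rate table adjusted this way shows flat runs between cell boundaries, which is roughly the pattern that rtab-based detection has to recognize.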
      
      The commit 56b765b7 ("htb: improved accuracy at high rates")
      removed the use of the rate table system.
      
      To keep compatible with older iproute2 utils, this patch detects
      the linklayer by parsing the rate table.  It also supports future
      versions of iproute2 to send this linklayer parameter to the
      kernel directly. This is done by using the __reserved field in
      struct tc_ratespec to convey the chosen linklayer option, using
      only the lower 4 bits of this field.
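The bit-packing scheme described above can be sketched as follows. This is a minimal illustration that assumes an 8-bit field. `LINKLAYER_MASK`, the enum values, and the helper names are hypothetical, so they may not match the constants iproute2 and the kernel actually use.

```c
#include <stdint.h>

/* Only the low 4 bits of the reserved field carry the link layer;
 * the upper bits are preserved. (Hypothetical names and values.) */
#define LINKLAYER_MASK 0x0Fu

enum linklayer {
    LINKLAYER_UNAWARE  = 0,  /* older iproute2: not set, kernel falls back to rtab detection */
    LINKLAYER_ETHERNET = 1,
    LINKLAYER_ATM      = 2,
};

/* Extract the link layer from the reserved field. */
static uint8_t get_linklayer(uint8_t reserved)
{
    return (uint8_t)(reserved & LINKLAYER_MASK);
}

/* Store 'll' in the low 4 bits, leaving the upper 4 bits untouched. */
static uint8_t set_linklayer(uint8_t reserved, uint8_t ll)
{
    return (uint8_t)((reserved & ~LINKLAYER_MASK) | (ll & LINKLAYER_MASK));
}
```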
      
      Linklayer detection is limited to speeds below 100Mbit/s, because
      at high rates the rtab gets so inaccurate that several fields
      contain the same values, thus resembling what ATM detection looks
      for.  Fields even start to contain a "0" time to send, e.g. at
      1000Mbit/s sending a 96-byte packet costs "0", so the rtab has
      been more broken than we first realized.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch · 09a8f031
      Committed by David S. Miller
      Jesse Gross says:
      
      ====================
      Three bug fixes that are fairly small either way but resolve obviously
      incorrect code. For net/3.11.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers/net/ethernet/via/via-velocity.c: update napi implementation · 2fdac010
      Committed by Julia Lawall
      Drivers supporting NAPI should use a NAPI-specific function for receiving
      packets.  Hence netif_rx is changed to netif_receive_skb.
      
      Furthermore, netif_napi_del should be used in the probe and remove functions
      to clean up the NAPI resource information.
      
      Thanks to Francois Romieu, David Shwatrz and Rami Rosen for their help on
      this patch.
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "cxgb3: Check and handle the dma mapping errors" · 728e2cca
      Committed by Alexey Kardashevskiy
      This reverts commit f83331ba.
      
      As tests on PPC64 (powernv platform) show, IOMMU pages leak
      when transferring a large number of small packets (<=64 bytes);
      running "ping -f" and waiting for 15 seconds is the simplest way to confirm the bug.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Santosh Rastapur <santosh@chelsio.com>
      Cc: Jay Fenlason <fenlason@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Divy Le ray <divy@chelsio.com>
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Divy Le Ray <divy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: Clear any capability flags that driver is not interested in. · 3da988c9
      Committed by Sarveshwar Bandi
      Some versions of firmware may advertise capabilities that the driver
      is not ready to handle, which may lead to a controller stall. Since the
      driver is interested only in a subset of the flags, clear the rest.
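Clearing unsupported capability bits amounts to masking the firmware-reported flags with the set the driver understands. Here is a minimal sketch of that idea. The flag names and bit positions are hypothetical and are not be2net's actual definitions.

```c
#include <stdint.h>

/* Hypothetical capability bits reported by firmware. */
#define CAP_RSS        (1u << 0)
#define CAP_VLAN       (1u << 1)
#define CAP_PROMISC    (1u << 2)
#define CAP_FUTURE_FW  (1u << 3)   /* something newer firmware might add */

/* The subset this (hypothetical) driver knows how to handle. */
#define DRIVER_CAP_MASK (CAP_RSS | CAP_VLAN | CAP_PROMISC)

/* Keep only the capabilities the driver supports; anything else the
 * firmware advertises is cleared so it is never acted upon. */
static uint32_t filter_fw_caps(uint32_t fw_caps)
{
    return fw_caps & DRIVER_CAP_MASK;
}
```

Whitelisting the known bits, rather than blacklisting known-bad ones, means flags introduced by future firmware are ignored by default.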
      Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Reset tunnel key between input and output. · 36bf5cc6
      Committed by Jesse Gross
      It doesn't make sense to output a tunnel packet using the same
      parameters that it was received with since that will generally
      just result in the packet going back to us. As a result, userspace
      assumes that the tunnel key is cleared when transitioning through
      the switch. In the majority of cases this doesn't matter since a
      packet is either going to a tunnel port (in which case the key is
      overwritten with new values) or to a non-tunnel port (in which
      case the key is ignored). However, it's theoretically possible that
      userspace could rely on the documented behavior, so this corrects
      it.
      Signed-off-by: Jesse Gross <jesse@nicira.com>
    • openvswitch: Use correct type while allocating flex array. · 42415c90
      Committed by Pravin B Shelar
      The flex array is used to allocate hash buckets of type struct
      hlist_head, but we use `struct hlist_head *` to calculate the
      array size.  Since hlist_head is pointer-sized, it works fine.
      
      This patch uses the correct type.
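The old code happened to work because a bucket is a `struct hlist_head` holding a single pointer, so on common ABIs its size equals the size of a pointer to it. The sketch below uses simplified copies of the kernel's hash-list types (an illustrative assumption, not the kernel headers). It contrasts the two `sizeof` expressions; the function names are hypothetical.

```c
#include <stddef.h>

/* Simplified copies of the kernel's hash-list types. */
struct hlist_node {
    struct hlist_node *next, **pprev;
};

struct hlist_head {
    struct hlist_node *first;   /* a bucket is just one pointer */
};

/* Correct element size for an array of buckets. */
static size_t bucket_size_correct(void)
{
    return sizeof(struct hlist_head);
}

/* What the original code computed: the size of a pointer to a bucket.
 * It only matches because struct hlist_head wraps exactly one pointer. */
static size_t bucket_size_original(void)
{
    return sizeof(struct hlist_head *);
}
```

If `struct hlist_head` ever gained a second field, only the first form would stay correct.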
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
    • openvswitch: Fix bad merge resolution. · 30444e98
      Committed by Jesse Gross
      git silently included an extra hunk in vport_cmd_set() during
      automatic merging. This code is unreachable so it does not actually
      introduce a problem but it is clearly incorrect.
      Signed-off-by: Jesse Gross <jesse@nicira.com>
    • Merge branch 'akpm' (patches from Andrew Morton) · f1d6e17f
      Committed by Linus Torvalds
      Merge a bunch of fixes from Andrew Morton.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        fs/proc/task_mmu.c: fix buffer overflow in add_page_map()
        arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig"
        ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id()
        x86 get_unmapped_area(): use proper mmap base for bottom-up direction
        ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page
        ocfs2: Revert 40bd62eb to avoid regression in extended allocation
        drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit
        hugetlb: fix lockdep splat caused by pmd sharing
        aoe: adjust ref of head for compound page tails
        microblaze: fix clone syscall
        mm: save soft-dirty bits on file pages
        mm: save soft-dirty bits on swapped pages
        memcg: don't initialize kmem-cache destroying work for root caches
  4. 14 Aug, 2013: 23 commits
    • tun: compare with 0 instead of total_len · d9bf5f13
      Committed by Weiping Pan
      Since we set "len = total_len" at the beginning of tun_get_user(),
      we should compare the new len with 0 instead of total_len;
      otherwise the if statement always evaluates to false.
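The check can be illustrated with a simplified model. This is not the actual tun_get_user() code: `strip_header()` and `hdr_len` are hypothetical names. Note that `len` must be a signed type for the `< 0` test to mean anything, which is the subject of the follow-up "tun: signedness bug in tun_get_user()" listed in the merge summary.

```c
#include <errno.h>

/* Hypothetical model: consume a 'hdr_len'-byte header from a buffer of
 * 'total_len' bytes and return how many bytes remain, or -EINVAL if
 * the buffer is too short to hold the header. */
static int strip_header(int total_len, int hdr_len)
{
    int len = total_len;   /* len starts out equal to total_len */

    /* A check of the form "(len -= hdr_len) > total_len" can never be
     * true: subtracting a non-negative hdr_len never makes len exceed
     * total_len, so short buffers would slip through. */
    if ((len -= hdr_len) < 0)   /* fixed: compare against 0 */
        return -EINVAL;

    return len;
}
```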
      Signed-off-by: Weiping Pan <wpan@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header · 3e805ad2
      Committed by Asbjoern Sloth Toennesen
      Fix the iproute2 command `bridge vlan show`, after switching from
      rtgenmsg to ifinfomsg.
      
      Let's start with a little history:
      
      Feb 20:   Vlad Yasevich got his VLAN-aware bridge patchset included in
                the 3.9 merge window.
                In the kernel commit 6cbdceeb, he added attribute support to
                bridge GETLINK requests sent with rtgenmsg.
      
      Mar 6th:  Vlad got this iproute2 reference implementation of the bridge
                vlan netlink interface accepted (iproute2 9eff0e5c)
      
      Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca)
                http://patchwork.ozlabs.org/patch/239602/
                http://marc.info/?t=136680900700007
      
      Apr 28th: Linus released 3.9
      
      Apr 30th: Stephen released iproute2 3.9.0
      
      The `bridge vlan show` command hasn't been working since the switch to
      ifinfomsg, i.e. in any released version of iproute2, because the kernel
      side only supports rtgenmsg, which iproute2 switched away from just
      prior to the iproute2 3.9.0 release.
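The header type matters because `nlmsg_find_attr()` takes the length of the family header that precedes the attributes. Attributes start at that length rounded up to netlink alignment. The userspace sketch below uses the real uapi headers to compute where attributes begin for each header type. `attr_offset()` is a hypothetical helper that mirrors that arithmetic.

```c
#include <stddef.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: byte offset (from the start of the netlink
 * message) at which attributes begin, given the size of the family
 * header that follows struct nlmsghdr. This mirrors how the kernel's
 * nlmsg_find_attr() uses its hdrlen argument. */
static size_t attr_offset(size_t family_hdr_len)
{
    return NLMSG_HDRLEN + NLMSG_ALIGN(family_hdr_len);
}
```

With a 1-byte `struct rtgenmsg` the attributes start at offset 20, while with the 16-byte `struct ifinfomsg` they start at offset 32. Parsing an ifinfomsg request with the rtgenmsg size therefore starts reading attributes 12 bytes too early, inside the ifinfomsg header itself.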
      
      I haven't been able to find any documentation about either rtgenmsg
      or ifinfomsg, or about when to use which, but kernel commit
      88c5b5ce seems to suggest that ifinfomsg should be used.
      
      Fixing this in the kernel will break compatibility, but I doubt that
      anybody has been using it, given this bug in the user space reference
      implementation, at least not without noticing the bug. That said, the
      functionality still works fully in 3.9 if iproute2 commit 63338dca is
      reverted.
      
      This could also be fixed in iproute2, but that would be an ugly patch
      reintroducing rtgenmsg in iproute2, and from searching netdev it seems
      that rtgenmsg usage is discouraged. I'm assuming that the only reason
      Vlad implemented the kernel side with rtgenmsg was that
      iproute2 was using it at the time.
      Signed-off-by: Asbjoern Sloth Toennesen <ast@fiberby.net>
      Reviewed-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fs/proc/task_mmu.c: fix buffer overflow in add_page_map() · 8c829622
      Committed by yonghua zheng
      Recently we hit quite a lot of random kernel panics after enabling
      CONFIG_PROC_PAGE_MONITOR.  After debugging, we found this was related
      to the following bug in pagemap:
      
      In struct pagemapread:
      
        struct pagemapread {
            int pos, len;
            pagemap_entry_t *buffer;
            bool v2;
        };
      
      pos is counted in entries (units of PM_ENTRY_BYTES), but len is the
      size of the buffer in bytes, so comparing pos with len in
      add_page_map() to check whether the buffer is full is a mistake; it
      can lead to a buffer overflow and random kernel panics.
      
      Correct len to be the total number of PM_ENTRY_BYTES-sized entries in the buffer.
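The unit mismatch can be seen in a reduced userspace model. The struct and `add_to_pagemap()` loosely follow the names in the commit message, but the bodies are a hypothetical sketch and not the kernel code. With `len` counted in entries, the full-buffer check fires exactly when the array fills up. Had `len` been set to the buffer size in bytes (8 times larger here), `pos` could run far past the end of the array before the check fired.

```c
#include <stdint.h>

typedef struct { uint64_t pme; } pagemap_entry_t;

#define PM_ENTRY_BYTES sizeof(pagemap_entry_t)

struct pagemapread {
    int pos, len;             /* both counted in entries, not bytes */
    pagemap_entry_t *buffer;
};

/* Append one entry; return 1 when the buffer is now full and must be
 * flushed before the next append, 0 otherwise. */
static int add_to_pagemap(uint64_t pme, struct pagemapread *pm)
{
    pm->buffer[pm->pos++].pme = pme;
    return pm->pos >= pm->len;
}
```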
      
      [akpm@linux-foundation.org: document pagemapread.pos and .len units, fix PM_ENTRY_BYTES definition]
      Signed-off-by: Yonghua Zheng <younghua.zheng@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig" · 57a1a197
      Committed by Chen Gang
      All architectures include "kernel/Kconfig.freezer" except the three
      remaining ones, so make them include it too; otherwise 'allmodconfig'
      reports errors.
      
      The related errors: (with allmodconfig for openrisc):
      
          CC      kernel/cgroup_freezer.o
        kernel/cgroup_freezer.c: In function 'freezer_css_online':
        kernel/cgroup_freezer.c:133:15: error: 'system_freezing_cnt' undeclared (first use in this function)
        kernel/cgroup_freezer.c:133:15: note: each undeclared identifier is reported only once for each function it appears in
        kernel/cgroup_freezer.c: In function 'freezer_css_offline':
        kernel/cgroup_freezer.c:157:15: error: 'system_freezing_cnt' undeclared (first use in this function)
        kernel/cgroup_freezer.c: In function 'freezer_attach':
        kernel/cgroup_freezer.c:200:4: error: implicit declaration of function 'freeze_task'
        kernel/cgroup_freezer.c: In function 'freezer_apply_state':
        kernel/cgroup_freezer.c:371:16: error: 'system_freezing_cnt' undeclared (first use in this function)
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id() · d6394b59
      Committed by Jeff Liu
      Fix a NULL pointer dereference while removing an empty directory, which
      was introduced by commit 3704412b ("[readdir] convert ocfs2").
      
        BUG: unable to handle kernel NULL pointer dereference at (null)
        IP: [<(null)>]           (null)
        PGD 6da85067 PUD 6da89067 PMD 0
        Oops: 0010 [#1] SMP
        CPU: 0 PID: 6564 Comm: rmdir Tainted: G           O 3.11.0-rc1 #4
        RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
        Call Trace:
          ocfs2_dir_foreach+0x49/0x50 [ocfs2]
          ocfs2_empty_dir+0x12c/0x3e0 [ocfs2]
          ocfs2_unlink+0x56e/0xc10 [ocfs2]
          vfs_rmdir+0xd5/0x140
          do_rmdir+0x1cb/0x1e0
          SyS_rmdir+0x16/0x20
          system_call_fastpath+0x16/0x1b
        Code:  Bad RIP value.
        RIP  [<          (null)>]           (null)
        RSP <ffff88006daddc10>
        CR2: 0000000000000000
      
      [dan.carpenter@oracle.com: fix pointer math]
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reported-by: David Weber <wb@munzinger.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86 get_unmapped_area(): use proper mmap base for bottom-up direction · df54d6fa
      Committed by Radu Caragea
      When the stack size limit is set to unlimited, the bottom-up direction is
      used for mmap-ings, but mmap_base is not used, which effectively renders
      ASLR for mmap-ings, along with PIE, useless.
      
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Adrian Sendroiu <molecula2788@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page · c7dd3392
      Committed by Tiger Yang
      Since ocfs2_cow_file_pos invokes ocfs2_refcount_icow with NULL as
      the struct file pointer, this eventually results in a NULL pointer
      dereference in ocfs2_duplicate_clusters_by_page.
      
      This patch replaces the file pointer with an inode pointer in
      cow_duplicate_clusters to fix this issue.
      
      [jeff.liu@oracle.com: rebased patch against linux-next tree]
      Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Tao Ma <tm@tao.ma>
      Tested-by: David Weber <wb@munzinger.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: Revert 40bd62eb to avoid regression in extended allocation · 6115ea28
      Committed by Jie Liu
      Revert commit 40bd62eb ("fs/ocfs2/journal.h: add bits_wanted while
      calculating credits in ocfs2_calc_extend_credits").
      
      Unfortunately this change broke fallocate even when there is sufficient
      disk space for the preallocation, which is a serious problem.
      
        # df -h
        /dev/sda8        22G  1.2G   21G   6% /ocfs2
        # fallocate -o 0 -l 200M /ocfs2/testfile
        fallocate: /ocfs2/test: fallocate failed: No space left on device
      
      and a kernel warning:
      
        CPU: 3 PID: 3656 Comm: fallocate Tainted: G        W  O 3.11.0-rc3 #2
        Call Trace:
          dump_stack+0x77/0x9e
          warn_slowpath_common+0xc4/0x110
          warn_slowpath_null+0x2a/0x40
          start_this_handle+0x6c/0x640 [jbd2]
          jbd2__journal_start+0x138/0x300 [jbd2]
          jbd2_journal_start+0x23/0x30 [jbd2]
          ocfs2_start_trans+0x166/0x300 [ocfs2]
          __ocfs2_extend_allocation+0x38f/0xdb0 [ocfs2]
          ocfs2_allocate_unwritten_extents+0x3c9/0x520
          __ocfs2_change_file_space+0x5e0/0xa60 [ocfs2]
          ocfs2_fallocate+0xb1/0xe0 [ocfs2]
          do_fallocate+0x1cb/0x220
          SyS_fallocate+0x6f/0xb0
          system_call_fastpath+0x16/0x1b
        JBD2: fallocate wants too many credits (51216 > 4381)
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit · 28a0c883
      Committed by Lothar Waßmann
      It's always a bad idea to poll on HW bits without a timeout.
      
      The i.MX28 RTC can be easily brought into a state in which the RTC is
      not running (until after a power-on-reset) and thus the status bits
      which are polled in the driver won't ever change.
      
      This patch prevents the kernel from getting stuck in this case.
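Bounded polling in general looks like the following sketch. The register accessor is abstracted behind a hypothetical callback. `fake_read()` simulates a status bit so that the loop can be exercised outside the kernel. The actual driver's helper and its delay between reads are not shown.

```c
#include <errno.h>

/* Hypothetical accessor standing in for a readl() of the status register. */
typedef unsigned int (*reg_read_fn)(void *ctx);

/* Wait for all bits in 'mask' to clear, giving up after 'max_tries'
 * reads instead of spinning forever on hardware that never responds.
 * A real driver would also delay between reads. */
static int poll_bits_clear(reg_read_fn rd, void *ctx,
                           unsigned int mask, unsigned int max_tries)
{
    while (max_tries--) {
        if (!(rd(ctx) & mask))
            return 0;
    }
    return -ETIMEDOUT;
}

/* Simulated register: the bit reads as set for the first
 * 'reads_until_clear' reads and as clear afterwards; a negative count
 * models a stopped RTC whose bit never changes. */
struct fake_reg {
    unsigned int value;
    int reads_until_clear;
};

static unsigned int fake_read(void *ctx)
{
    struct fake_reg *r = ctx;

    if (r->reads_until_clear == 0)
        r->value = 0;
    else if (r->reads_until_clear > 0)
        r->reads_until_clear--;
    return r->value;
}
```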
      Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
      Acked-by: Wolfram Sang <wsa@the-dreams.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: fix lockdep splat caused by pmd sharing · b610ded7
      Committed by Michal Hocko
      Dave has reported the following lockdep splat:
      
        =================================
        [ INFO: inconsistent lock state ]
        3.11.0-rc1+ #9 Not tainted
        ---------------------------------
        inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
        kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (&mapping->i_mmap_mutex){+.+.?.}, at: [<c114971b>] page_referenced+0x87/0x5e3
        {RECLAIM_FS-ON-W} state was registered at:
           mark_held_locks+0x81/0xe7
           lockdep_trace_alloc+0x5e/0xbc
           __alloc_pages_nodemask+0x8b/0x9b6
           __get_free_pages+0x20/0x31
           get_zeroed_page+0x12/0x14
           __pmd_alloc+0x1c/0x6b
           huge_pmd_share+0x265/0x283
           huge_pte_alloc+0x5d/0x71
           hugetlb_fault+0x7c/0x64a
           handle_mm_fault+0x255/0x299
           __do_page_fault+0x142/0x55c
           do_page_fault+0xd/0x16
           error_code+0x6c/0x74
        irq event stamp: 3136917
        hardirqs last  enabled at (3136917):  _raw_spin_unlock_irq+0x27/0x50
        hardirqs last disabled at (3136916):  _raw_spin_lock_irq+0x15/0x78
        softirqs last  enabled at (3136180):  __do_softirq+0x137/0x30f
        softirqs last disabled at (3136175):  irq_exit+0xa8/0xaa
        other info that might help us debug this:
         Possible unsafe locking scenario:
               CPU0
               ----
          lock(&mapping->i_mmap_mutex);
          <Interrupt>
            lock(&mapping->i_mmap_mutex);
      
        *** DEADLOCK ***
        no locks held by kswapd0/49.
      
        stack backtrace:
        CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9
        Hardware name: Dell Inc.                 Precision WorkStation 490    /0DT031, BIOS A08 04/25/2008
        Call Trace:
          dump_stack+0x4b/0x79
          print_usage_bug+0x1d9/0x1e3
          mark_lock+0x1e0/0x261
          __lock_acquire+0x623/0x17f2
          lock_acquire+0x7d/0x195
          mutex_lock_nested+0x6c/0x3a7
          page_referenced+0x87/0x5e3
          shrink_page_list+0x3d9/0x947
          shrink_inactive_list+0x155/0x4cb
          shrink_lruvec+0x300/0x5ce
          shrink_zone+0x53/0x14e
          kswapd+0x517/0xa75
          kthread+0xa8/0xaa
          ret_from_kernel_thread+0x1b/0x28
      
      This is a false positive caused by the hugetlb pmd sharing code, which
      allocates a new pmd from within mapping->i_mmap_mutex.  If this
      allocation causes reclaim, the lockdep detector complains that we
      might self-deadlock.
      
      This is not correct, though, because hugetlb pages are not reclaimable,
      so their mapping will never be touched from the reclaim path.
      
      The patch tells the lockdep detector that the hugetlb i_mmap_mutex is
      special by assigning it a separate lockdep class, so it won't report
      possible deadlocks on unrelated mappings.
      
      [peterz@infradead.org: comment for annotation]
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aoe: adjust ref of head for compound page tails · fb32975d
      Committed by Ed Cashin
      Fix a BUG which can trigger when direct-IO is used with AOE.
      
      As discussed previously, the fact that some users of the block layer
      provide bios that point to pages with a zero _count means that it is not
      OK for the network layer to do a put_page on the skb frags during an
      skb_linearize, so the aoe driver gets a reference to pages in bios and
      puts the reference before ending the bio.  And because it cannot use
      get_page on a page with a zero _count, it manipulates the value
      directly.
      
      It is not OK to increment the _count of a compound page tail, though,
      since the VM layer will VM_BUG_ON a non-zero _count.  Block users that
      do direct I/O can result in the aoe driver seeing compound page tails in
      bios.  In that case, the same logic works as long as the head of the
      compound page is used instead of the tails.  This patch handles compound
      pages and does not BUG.
      
      It relies on the block layer user leaving the relationship between the
      page tail and its head alone for the duration between the submission of
      the bio and its completion, whether successful or not.
      Signed-off-by: Ed Cashin <ecashin@coraid.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • microblaze: fix clone syscall · dfa9771a
      Committed by Michal Simek
      Fix inadvertent breakage in the clone syscall ABI for Microblaze that
      was introduced in commit f3268edb ("microblaze: switch to generic
      fork/vfork/clone").
      
      The Microblaze syscall ABI for clone takes the parent tid address in the
      4th argument; the third argument slot is used for the stack size.  The
      incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd
      slot.
      
      This commit restores the original ABI so that existing userspace libc
      code will work correctly.
      
      All kernel versions from v3.8-rc1 were affected.
      Signed-off-by: Michal Simek <michal.simek@xilinx.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: save soft-dirty bits on file pages · 41bb3476
      Committed by Cyrill Gorcunov
      Andy reported that if a file page gets reclaimed we lose its soft-dirty
      bit, so save the _PAGE_BIT_SOFT_DIRTY bit when the page address gets
      encoded into the pte entry.  Then, when a #pf happens on such a
      non-present pte, we can restore it.
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: save soft-dirty bits on swapped pages · 179ef71c
      Committed by Cyrill Gorcunov
      Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY bit
      set gets swapped out, the bit is lost and is no longer available when
      the pte is read back.
      
      To resolve this, we introduce a _PTE_SWP_SOFT_DIRTY bit, which is saved in
      the pte entry for the page being swapped out.  When such a page is read
      back from the swap cache, we check whether the bit is present and, if so,
      clear it and restore the former _PAGE_SOFT_DIRTY bit.
      
      One of the problems was finding a place in the pte entry where we can save
      the _PTE_SWP_SOFT_DIRTY bit while the page is in swap.  _PAGE_PSE was
      chosen for that, since it doesn't intersect with the swap entry format
      stored in the pte.
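The save-and-restore idea can be sketched with plain bit operations. The bit positions are illustrative assumptions: bit 7 is where x86 keeps _PAGE_PSE, and the soft-dirty position is hypothetical. The helpers are not the kernel's pte functions. The sketch also assumes that the swap-entry encoding passed in never uses bit 7.

```c
#include <stdint.h>

/* Illustrative bit positions, not authoritative x86 definitions. */
#define PAGE_SOFT_DIRTY     (1ull << 11)  /* soft-dirty bit in a present pte */
#define PTE_SWP_SOFT_DIRTY  (1ull << 7)   /* borrowed bit, unused by swp_entry here */

/* On swap-out: build the non-present swap pte from the encoded swap
 * entry, carrying the soft-dirty state over into the borrowed bit. */
static uint64_t make_swap_pte(uint64_t old_pte, uint64_t swp_entry)
{
    uint64_t swp_pte = swp_entry;

    if (old_pte & PAGE_SOFT_DIRTY)
        swp_pte |= PTE_SWP_SOFT_DIRTY;
    return swp_pte;
}

/* On swap-in: if the borrowed bit was set, mark the new present pte
 * soft-dirty again. */
static uint64_t restore_soft_dirty(uint64_t swp_pte, uint64_t new_pte)
{
    if (swp_pte & PTE_SWP_SOFT_DIRTY)
        new_pte |= PAGE_SOFT_DIRTY;
    return new_pte;
}
```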
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: don't initialize kmem-cache destroying work for root caches · 3e6b11df
      Committed by Andrey Vagin
      struct memcg_cache_params has a union.  Different parts of this union
      are used for root and non-root caches.  The part holding the destroying
      work is used only for non-root caches.
      
      I fixed the same problem in another place (v3.9-rc1-16204-gf101a946), but
      didn't notice this one.
      
      This patch fixes the kernel panic:
      
      [   46.848187] BUG: unable to handle kernel paging request at 000000fffffffeb8
      [   46.849026] IP: [<ffffffff811a484c>] kmem_cache_destroy_memcg_children+0x6c/0xc0
      [   46.849092] PGD 0
      [   46.849092] Oops: 0000 [#1] SMP
      ...
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Cc: Glauber Costa <glommer@openvz.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: <stable@vger.kernel.org>    [3.9.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      ethernet/arc/arc_emac - fix NAPI "work > weight" warning · 9cff866e
      Committed by Alexey Brodkin
      Initially I set the bound on the maximum number of input packets to
      process per NAPI poll ("work") incorrectly, so it could exceed the
      expected amount ("weight").
      
      This was harmless, but seeing WARN_ON_ONCE on every device boot is not
      nice, so here is a trivial fix ("<" instead of "<=").
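The off-by-one can be sketched as a bounded receive loop. This is a hypothetical model, not the arc_emac driver code: sketch_rx_poll() and its integer pending counter stand in for the real descriptor ring. The only detail taken from the commit is the change from "<=" to "<" in the bound on processed packets.

```c
#include <assert.h>

/* Hypothetical RX poll loop: consume up to 'budget' of 'pending' packets
 * and return the work done, as a NAPI poll callback would. */
static int sketch_rx_poll(int pending, int budget)
{
    int work_done;

    /* "<" caps the loop at exactly 'budget' iterations; with "<=" it could
     * run budget + 1 times and report more work than the NAPI weight. */
    for (work_done = 0; work_done < budget; work_done++) {
        if (pending == 0)
            break;
        pending--; /* stand-in for receiving one packet */
    }
    return work_done;
}
```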
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Mischa Jonker <mjonker@synopsys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • L
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 28fbc8b6
      Committed by Linus Torvalds
      Pull scheduler fixes from Ingo Molnar:
       "Docbook fixes that make 99% of the diffstat, plus a oneliner fix"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Ensure update_cfs_shares() is called for parents of continuously-running tasks
        sched: Fix some kernel-doc warnings
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bfd36050
      Committed by Linus Torvalds
      Pull perf fixes from Ingo Molnar:
       "Two small fixlets"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: Add Haswell ULT model number used in Macbook Air and other systems
        perf/x86: Fix intel QPI uncore event definitions
    • S
      perf/arm: Fix armpmu_map_hw_event() · b88a2595
      Committed by Stephen Boyd
      Fix constraint check in armpmu_map_hw_event().
      Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id. · 4221f405
      Committed by Pravin B Shelar
      Using the inner IP ID as the tunnel IP ID is not safe in some rare
      cases: packets coming from multiple sources and entering the same tunnel
      can carry the same ID. On tunnel packet receive we could therefore see
      packets from two different streams with the same source and destination
      IP and the same IP ID, which could confuse IP packet reassembly.
      
      This patch reverts the optimization from commit
      490ab081 ("IP_GRE: Fix IP-Identification.").
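Why an inherited ID is unsafe can be shown with the key IPv4 reassembly uses to group fragments (RFC 791: source, destination, protocol and identification). The sketch below is a hypothetical model: the sketch_* names are invented, and the counter-based ID only illustrates giving each outer packet its own ID; it is not necessarily how the kernel picks tunnel IP IDs after this revert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4 reassembly matches fragments on (src, dst, protocol, id). */
struct sketch_frag_key {
    uint32_t src, dst;
    uint8_t proto;
    uint16_t id;
};

static bool sketch_same_datagram(struct sketch_frag_key a, struct sketch_frag_key b)
{
    return a.src == b.src && a.dst == b.dst &&
           a.proto == b.proto && a.id == b.id;
}

/* Reverted behaviour: copy the inner header's ID into the outer header. */
static uint16_t sketch_outer_id_inherit(uint16_t inner_id)
{
    return inner_id;
}

/* Hypothetical alternative: IDs drawn from a per-tunnel counter, so
 * consecutive outer packets get distinct IDs regardless of the inner ones. */
static uint16_t sketch_outer_id_counter(uint16_t *next_id)
{
    return (*next_id)++;
}
```

With inherited IDs, two inner packets from different hosts that happen to share ID 7 produce outer headers with identical reassembly keys, because both travel between the same tunnel endpoints.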
      
      CC: Jarno Rajahalme <jrajahalme@nicira.com>
      CC: Ansis Atteka <aatteka@nicira.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'bnx2x' · 50f850fd
      Committed by David S. Miller
      Dmitry Kravkov says:
      
      ====================
      Please consider applying the series of bnx2x fixes to net:
      	* statistics may cause FW assert
      	* missing fairness configuration in DCB flow
      	* memory leak in sriov related part
      	* Illegal PTE access
      	* Pagefault crash in shutdown flow with cnic
      v1->v2
      	* fixed sparse error pointed out by Joe Perches
      	* added missing signed-off from Sergei Shtylyov
      v2->v3
      	* added missing signed-off from Sergei Shtylyov
      	* fixed formatting from Sergei Shtylyov
      v3->v4
      	* patch 1/6: fixed declaration order
      	* patch 2/6 replaced with: protect flows using set_bit constraints
      v4->v5
      	* patch 2/6: replace proprietary locking with semaphore
      	* dropped 1/6, since it adds redundant code (from Benjamin Poirier)
      
      The following patchset contains four netfilter fixes, they are:
      
      * Fix possible invalid access and mangling of the TCPMSS option in
        xt_TCPMSS. This was spotted by Julian Anastasov.
      
      * Fix possible off by one access and mangling of the TCP packet in
        xt_TCPOPTSTRIP, also spotted by Julian Anastasov.
      
      * Fix possible information leak due to missing initialization of one
        padding field of several structures that are included in nfqueue and
        nflog netlink messages, from Dan Carpenter.
      
      * Fix TCP window tracking with Fast Open, from Yuchung Cheng.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Y
      bnx2x: prevent crash in shutdown flow with CNIC · 6ef5a92c
      Committed by Yuval Mintz
      A crash might occur during the shutdown flow, as CNIC might try
      to access resources already freed by bnx2x.
      Replace bnx2x_close() with dev_close() in __bnx2x_remove() (shutdown
      flow) to guarantee that CNIC is notified of the device's change of status.
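The ordering problem can be modelled in a few lines. This is a hypothetical sketch, not bnx2x or CNIC code: sketch_generic_close() plays the role of dev_close(), which runs the netdevice notifier chain that listeners such as CNIC rely on, while sketch_driver_close() stands for calling the driver's own close routine directly, which bypasses those notifications.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a listener (standing in for CNIC) must learn that
 * the device went down before the driver frees shared resources. */
struct sketch_dev {
    bool resources_live;      /* driver resources still allocated */
    bool listener_knows_down; /* listener has been told about the change */
};

/* Driver-private close: frees resources but notifies nobody. */
static void sketch_driver_close(struct sketch_dev *d)
{
    d->resources_live = false;
}

static void sketch_notify_down(struct sketch_dev *d)
{
    d->listener_knows_down = true;
}

/* Generic close path: tell listeners, then run the driver's close. */
static void sketch_generic_close(struct sketch_dev *d)
{
    sketch_notify_down(d);
    sketch_driver_close(d);
}

/* A listener only touches shared resources while it believes the device
 * is up, so it is safe if it was notified or the resources still exist. */
static bool sketch_listener_access_is_safe(const struct sketch_dev *d)
{
    return d->listener_knows_down || d->resources_live;
}
```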
      Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
      Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • B
      bnx2x: fix PTE write access error · a6d3a5ba
      Committed by Barak Witkowsky
      A PTE write access error might occur in MF_ALLOWED mode when the IOMMU
      is active. This patch adds an rmmod HSI indication telling the MFW to
      stop running queries that might trigger this failure.
      Signed-off-by: Barak Witkowsky <barak@broadcom.com>
      Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: Ariel Elior <ariele@broadcom.com>
      Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>