1. 02 Oct, 2013 17 commits
    • M
      netfilter: ebt_ulog: fix info leaks · ca0a1067
      Mathias Krause committed
      The ulog messages leak heap bytes by the means of padding bytes and
      incompletely filled string arrays. Fix those by memset(0)'ing the
      whole struct before filling it.
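A minimal sketch of the zero-before-fill pattern described above, written for userspace. The struct `demo_msg` and both helpers are hypothetical and are not the ebt_ulog code; they only show why clearing the whole object first keeps stale bytes out of the unused tail of a string array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message layout; the char/long mix forces padding bytes. */
struct demo_msg {
    char kind;            /* followed by compiler-inserted padding */
    unsigned long len;
    char name[16];        /* usually only partly filled */
};

/* Zero the whole struct first so the padding and the unused tail of
 * name[] do not carry stale memory contents out to the reader. */
static void fill_msg(struct demo_msg *m, char kind, unsigned long len,
                     const char *name)
{
    assert(m != NULL && name != NULL);
    memset(m, 0, sizeof(*m));
    m->kind = kind;
    m->len = len;
    /* Copy at most 15 bytes; the last byte stays zero from the memset. */
    strncpy(m->name, name, sizeof(m->name) - 1);
}

/* Returns 1 if every byte of name[] at or after index `from` is zero. */
static int name_tail_is_zero(const struct demo_msg *m, size_t from)
{
    for (size_t i = from; i < sizeof(m->name); i++)
        if (m->name[i] != 0)
            return 0;
    return 1;
}
```

Filling a buffer that previously held garbage and then calling `name_tail_is_zero(&m, strlen(m.name))` is one way to confirm that nothing leaks past the string terminator.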
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c31eeace
      Linus Torvalds committed
      Pull networking changes from David Miller:
      
       1) Multiply in netfilter IPVS can overflow when calculating destination
          weight.  From Simon Kirby.
      
       2) Use after free fixes in IPVS from Julian Anastasov.
      
       3) SFC driver bug fixes from Daniel Pieczko.
      
       4) Memory leak in pcan_usb_core failure paths, from Alexey Khoroshilov.
      
       5) Locking and encapsulation fixes to serial line CAN driver, from
          Andrew Naujoks.
      
       6) Duplex and VF handling fixes to bnx2x driver from Yaniv Rosner,
          Eilon Greenstein, and Ariel Elior.
      
       7) In lapb, if no other packets are outstanding, T1 timeouts actually
          stall things and no packet gets sent.  Fix from Josselin Costanzi.
      
       8) ICMP redirects should not make it to the socket error queues, from
          Duan Jiong.
      
       9) Fix bugs in skge DMA mapping error handling, from Mikulas Patocka.
      
      10) Fix setting of VLAN priority field on via-rhine driver, from Roger
          Luethi.
      
      11) Fix TX stalls and VLAN promisc programming in be2net driver from
          Ajit Khaparde.
      
      12) Packet padding doesn't get handled correctly in new usbnet SG
          support code, from Ming Lei.
      
      13) Fix races in netdevice teardown wrt.  network namespace closing.
          From Eric W.  Biederman.
      
      14) Fix potential missed initialization of net_secret if no TCP
          connections are opened.  From Eric Dumazet.
      
      15) Cinterion PLXX product ID in qmi_wwan driver is wrong, from
          Aleksander Morgado.
      
      16) skb_cow_head() can change skb->data and thus packet header pointers,
          don't use stale ip_hdr reference in ip_tunnel code.
      
      17) Backend state transition handling fixes in xen-netback, from Paul
          Durrant.
      
      18) Packet offset for AH protocol is handled wrong in flow dissector,
          from Eric Dumazet.
      
      19) Taking down an fq packet scheduler instance can leave stale packets
          in the queues, fix from Eric Dumazet.
      
      20) Fix performance regressions introduced by TCP Small Queues.  From
          Eric Dumazet.
      
      21) IPV6 GRE tunneling code calculates max_headroom incorrectly, from
          Hannes Frederic Sowa.
      
      22) Multicast timer handlers in ipv4 and ipv6 can be the last and final
          reference to the ipv4/ipv6 specific network device state, so use the
          reference put that will check and release the object if the
          reference hits zero.  From Salam Noureddine.
      
      23) Fix memory corruption in ip_tunnel driver, and use skb_push()
          instead of __skb_push() so that similar bugs are less hard to find.
          From Steffen Klassert.
      
      24) Add forgotten hookup of rtnl_ops in SIT and ip6tnl drivers, from
          Nicolas Dichtel.
      
      25) fq scheduler doesn't accurately rate limit in certain circumstances,
          from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
        pkt_sched: fq: rate limiting improvements
        ip6tnl: allow to use rtnl ops on fb tunnel
        sit: allow to use rtnl ops on fb tunnel
        ip_tunnel: Remove double unregister of the fallback device
        ip_tunnel_core: Change __skb_push back to skb_push
        ip_tunnel: Add fallback tunnels to the hash lists
        ip_tunnel: Fix a memory corruption in ip_tunnel_xmit
        qlcnic: Fix SR-IOV configuration
        ll_temac: Reset dma descriptors indexes on ndo_open
        skbuff: size of hole is wrong in a comment
        ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put
        ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put
        ethernet: moxa: fix incorrect placement of __initdata tag
        ipv6: gre: correct calculation of max_headroom
        powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file
        Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file"
        bonding: Fix broken promiscuity reference counting issue
        tcp: TSQ can use a dynamic limit
        dm9601: fix IFF_ALLMULTI handling
        pkt_sched: fq: qdisc dismantle fixes
        ...
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 0b936842
      Linus Torvalds committed
      Pull sparc fix from David Miller:
       "Just a single bug fix to a regression added during some strlcpy()
        conversions"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Fix buggy strlcpy() conversion in ldom_reboot().
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 517bf8fc
      Linus Torvalds committed
      Pull vfs lru leak fix from Al Viro:
       "The fix in "super: fix for destroy lrus" didn't - they need to be
        destroyed, all right, but that's the wrong place..."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs/super.c: fix lru_list leak for real
    • L
      Merge git://git.kernel.org/pub/scm/virt/kvm/kvm · 77c4ad8e
      Linus Torvalds committed
      Pull two KVM fixes from Gleb Natapov.
      
      * git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined
        ARM: kvm: rename cpu_reset to avoid name clash
    • A
      fs/super.c: fix lru_list leak for real · c2d22ecd
      Al Viro committed
      Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong;
      the right place is destroy_super().  As it is, we leak them if sget()
      decides that new superblock it has allocated (and never shown to
      anybody) isn't needed and should be freed.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • E
      pkt_sched: fq: rate limiting improvements · 0eab5eb7
      Eric Dumazet committed
      FQ rate limiting suffers from two problems, reported
      by Steinar :
      
      1) FQ enforces a delay when flow quantum is exhausted in order
      to reduce cpu overhead. But if packets are small, current
      delay computation is slightly wrong, and observed rates can
      be too high.
      
      Steinar had this problem because he disabled TSO and GSO,
      and default FQ quantum is 2*1514.
      
      (Of course, I hope the recent TSO auto-sizing changes will remove the
      need to disable TSO in the first place.)
      
      2) maxrate was not used for forwarded flows (skbs not attached
      to a socket)
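A rough sketch of the delay arithmetic behind problem 1, under stated assumptions: the function name is hypothetical, the rate is expressed in bytes per second, and charging the delay for at least one full quantum is one plausible reading of the fix rather than the exact kernel computation.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Delay, in nanoseconds, before a throttled flow may send again.
 * Assumption: the delay is charged for at least one quantum, so a run
 * of small packets is not under-delayed relative to the bytes sent. */
static uint64_t throttle_delay_ns(uint32_t pkt_len, uint32_t quantum,
                                  uint64_t rate_bytes_per_sec)
{
    assert(rate_bytes_per_sec > 0);
    uint32_t plen = pkt_len > quantum ? pkt_len : quantum;
    return (uint64_t)plen * NSEC_PER_SEC / rate_bytes_per_sec;
}
```

With the default quantum of 2*1514 = 3028 bytes and maxrate 8Mbit (1,000,000 bytes per second), this gives about 3 ms per quantum, the same order of magnitude as the "next packet delay" in the test output that follows.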
      
      Tested:
      
      tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit
      netperf -H lpq84 -l 1000 &
      sleep 10 ; tc -s qdisc show dev eth0
      qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024
       quantum 3028 initial_quantum 15140 maxrate 8000Kbit
       Sent 16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0)
       rate 7831Kbit 653pps backlog 7570b 5p requeues 0
        44 flows (43 inactive, 1 throttled), next packet delay 2977352 ns
        0 gc, 0 highprio, 5545 throttled
      
      lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12
      09:02:52.079484 IP lpq83 > lpq84: . 1389536928:1389538376(1448) ack 3808678021 win 457 <nop,nop,timestamp 961812 572609068>
      09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457 <nop,nop,timestamp 961812 572609068>
      09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384 <nop,nop,timestamp 572609080 961812>
      09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
      09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
      09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384 <nop,nop,timestamp 572609083 961815>
      09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
      09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
      09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384 <nop,nop,timestamp 572609086 961818>
      09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
      09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
      09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384 <nop,nop,timestamp 572609090 961821>
      Reported-by: Steinar H. Gunderson <sesse@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      ip6tnl: allow to use rtnl ops on fb tunnel · bb814094
      Nicolas Dichtel committed
      rtnl ops were introduced by c075b130 ("ip6tnl: advertise tunnel param via
      rtnl"), but I forgot to assign rtnl ops to fb tunnels.
      
      Now that it is done, we must remove the explicit call to
      unregister_netdevice_queue(), because  the fallback tunnel is added to the queue
      in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
      is valid since commit 0bd87628 ("ip6tnl: add x-netns support")).
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      sit: allow to use rtnl ops on fb tunnel · 205983c4
      Nicolas Dichtel committed
      rtnl ops were introduced by ba3e3f50 ("sit: advertise tunnel param via
      rtnl"), but I forgot to assign rtnl ops to fb tunnels.
      
      Now that it is done, we must remove the explicit call to
      unregister_netdevice_queue(), because  the fallback tunnel is added to the queue
      in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
      is valid since commit 5e6700b3 ("sit: add support of x-netns")).
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'ip_tunnel' · 9cb17124
      David S. Miller committed
      ip_tunnel bug fixes from Steffen Klassert.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      ip_tunnel: Remove double unregister of the fallback device · cfe4a536
      Steffen Klassert committed
      When queueing the netdevices for removal, we queue the
      fallback device twice in ip_tunnel_destroy(). The first
      time when we queue all netdevices in the namespace and
      then again explicitly. Fix this by removing the explicit
      queueing of the fallback device.
      
      Bug was introduced when network namespace support was added
      with commit 6c742e71 ("ipip: add x-netns support").
      
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      ip_tunnel_core: Change __skb_push back to skb_push · 78a3694d
      Steffen Klassert committed
      Git commit 0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()")
      moved the IP header installation to iptunnel_xmit() and
      changed skb_push() to __skb_push(). This makes possible
      bugs hard to track down, so change it back to skb_push().
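To illustrate why the checked variant makes bugs easier to find, here is a small userspace model. `struct pkt` and both functions are hypothetical stand-ins, not the sk_buff API; the real skb_push() reports an underflow loudly, whereas this sketch just returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet buffer with headroom. */
struct pkt {
    unsigned char buf[64];
    size_t data_off;      /* offset of the first used byte; bytes before
                           * it are free headroom */
};

/* Unchecked push: trusts the caller to have verified the headroom.
 * Calling it with len > data_off would silently corrupt the offset. */
static unsigned char *push_unchecked(struct pkt *p, size_t len)
{
    assert(p != NULL);
    p->data_off -= len;
    return p->buf + p->data_off;
}

/* Checked push: refuses to move past the start of the buffer, so a
 * headroom bug shows up at the call site instead of as corruption. */
static unsigned char *push_checked(struct pkt *p, size_t len)
{
    assert(p != NULL);
    if (len > p->data_off)
        return NULL;
    p->data_off -= len;
    return p->buf + p->data_off;
}
```

The unchecked form saves one comparison, which is the usual reason to prefer it on hot paths where the headroom has already been verified.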
      
      Cc: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      ip_tunnel: Add fallback tunnels to the hash lists · 67013282
      Steffen Klassert committed
      Currently we can not update the tunnel parameters of
      the fallback tunnels because we don't find them in the
      hash lists. Fix this by adding them on initialization.
      
      Bug was introduced with commit c5441932
      ("GRE: Refactor GRE tunneling code.")
      
      Cc: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      ip_tunnel: Fix a memory corruption in ip_tunnel_xmit · 3e08f4a7
      Steffen Klassert committed
      We might extend the used area of a skb beyond the total
      headroom when we install the ipip header. Fix this by
      calling skb_cow_head() unconditionally.
      
      Bug was introduced with commit c5441932
      ("GRE: Refactor GRE tunneling code.")
      
      Cc: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · e024bdc0
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      The following patchset contains Netfilter/IPVS fixes for your net
      tree, they are:
      
      * Fix BUG_ON splat due to malformed TCP packets seen by synproxy, from
        Patrick McHardy.
      
      * Fix possible weight overflow in lblc and lblcr schedulers due to
        32-bit arithmetic, from Simon Kirby.
      
      * Fix possible memory access race in the lblc and lblcr schedulers,
        introduced when it was converted to use RCU, two patches from
        Julian Anastasov.
      
      * Fix hard dependency on CPU 0 when reading per-cpu stats in the
        rate estimator, from Julian Anastasov.
      
      * Fix race that may lead to object use after release, when invoking
        ipvsadm -C && ipvsadm -R, introduced when adding RCU, from Julian
        Anastasov.
      ====================
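The weight overflow item in the list above comes down to a 32-bit multiplication wrapping before a comparison. The sketch below assumes a cross-multiplied load/weight comparison; the function names and the exact shape of the comparison are illustrative and are not taken from the lblc/lblcr sources.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit version: each product can wrap modulo 2^32 before comparing. */
static int heavier_32(uint32_t load_a, uint32_t weight_b,
                      uint32_t load_b, uint32_t weight_a)
{
    return (uint32_t)(load_a * weight_b) > (uint32_t)(load_b * weight_a);
}

/* Widened version: promote to 64 bits before multiplying, so the
 * product of two 32-bit values always fits. */
static int heavier_64(uint32_t load_a, uint32_t weight_b,
                      uint32_t load_b, uint32_t weight_a)
{
    return (uint64_t)load_a * weight_b > (uint64_t)load_b * weight_a;
}

/* Small self-check: 100000 * 50000 = 5e9 does not fit in 32 bits. */
void overflow_example(void)
{
    assert(heavier_64(100000, 50000, 1000, 1000000) == 1);
    assert(heavier_32(100000, 50000, 1000, 1000000) == 0);
}
```

The two functions disagree exactly when one of the products exceeds 2^32 - 1, which is the situation large weights can create.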
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      qlcnic: Fix SR-IOV configuration · 1ed98ed5
      Manish Chopra committed
      o Interface needs to be brought down and up while configuring SR-IOV.
        Protect interface up/down using rtnl_lock()/rtnl_unlock()
      Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • R
      ll_temac: Reset dma descriptors indexes on ndo_open · 7167cf0e
      Ricardo Ribalda committed
      The dma descriptors indexes are only initialized on the probe function.
      
      If a packet is in the buffer when temac_stop is called, the dma
      descriptor indexes can be left in an incorrect state where no other
      packet can be sent.
      
      So an interface could be left in an unusable state after ifdown/ifup.
      
      This patch makes sure that the descriptor indexes are in a valid
      state when the device is opened.
      Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 01 Oct, 2013 23 commits
    • N
      skbuff: size of hole is wrong in a comment · 45906723
      Nicolas Dichtel committed
      Since commit c93bdd0e ("netvm: allow skb allocation to use PFMEMALLOC
      reserves"), hole size is one bit less than what is written in the comment.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put · 9260d3e1
      Salam Noureddine committed
      It is possible for the timer handlers to run after the call to
      ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
      handler function in order to do proper cleanup when the refcnt
      reaches 0. Otherwise, the refcnt can reach zero without the
      inet6_dev being destroyed and we end up leaking a reference to
      the net_device and see messages like the following,
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Tested on linux-3.4.43.
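The difference between the two put helpers can be modelled in a few lines. This is a single-threaded sketch with hypothetical names; the kernel helpers use atomic operations and free the real inet6_dev, which this model replaces with a `destroyed` flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object. */
struct dev_state {
    int refcnt;
    int destroyed;        /* stands in for freeing the object */
};

/* Drops a reference without looking at the result. Only safe when the
 * caller knows another reference is still held elsewhere. */
static void put_nocheck(struct dev_state *d)
{
    assert(d != NULL && d->refcnt > 0);
    d->refcnt--;
}

/* Drops a reference and releases the object if it was the last one. */
static void put(struct dev_state *d)
{
    assert(d != NULL && d->refcnt > 0);
    if (--d->refcnt == 0)
        d->destroyed = 1;
}
```

If a timer handler holds the final reference and uses the unchecked form, the count reaches zero with nothing left to run the cleanup, which matches the "waiting for eth0 to become free" symptom quoted above.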
      Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put · e2401654
      Salam Noureddine committed
      It is possible for the timer handlers to run after the call to
      ip_mc_down so use in_dev_put instead of __in_dev_put in the handler
      function in order to do proper cleanup when the refcnt reaches 0.
      Otherwise, the refcnt can reach zero without the in_device being
      destroyed and we end up leaking a reference to the net_device and
      see messages like the following,
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Tested on linux-3.4.43.
      Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • B
      ethernet: moxa: fix incorrect placement of __initdata tag · 437a3ae1
      Bartlomiej Zolnierkiewicz committed
      __initdata tag should be placed between the variable name and equal
      sign for the variable to be placed in the intended .init.data section.
      
      In this particular case __initdata is incorrect as moxart_mac_driver
      can be used after the driver gets initialized.
      
      Also while at it static-ize moxart_mac_driver.
      Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • H
      ipv6: gre: correct calculation of max_headroom · 3da812d8
      Hannes Frederic Sowa committed
      gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header,
      so initialize max_headroom to zero. Otherwise the
      
      	if (encap_limit >= 0) {
      		max_headroom += 8;
      		mtu -= 8;
      	}
      
      increments an uninitialized variable before max_headroom was reset.
      
      Found with coverity: 728539
      
      Cc: Dmitry Kozlov <xeb@mail.ru>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file · e58f6f4f
      Aida Mynzhasova committed
      Currently the IEEE 1588 timer reference clock source is determined by a
      hard-coded value in the gianfar_ptp driver. This patch allows selecting
      the ptp clock source through a device tree node.
      
      For instance:
      
      	fsl,cksel = <0>;
      
      for using external (TSEC_TMR_CLK input) high precision timer
      reference clock.
      
      Other acceptable values:
      
      	<1> : eTSEC system clock
      	<2> : eTSEC1 transmit clock
      	<3> : RTC clock input
      
      When this attribute isn't used, eTSEC system clock will serve as
      IEEE 1588 timer reference clock.
      Signed-off-by: Aida Mynzhasova <aida.mynzhasova@skitlab.ru>
      Acked-by: Kumar Gala <galak@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Revert "powerpc/83xx: gianfar_ptp: select 1588 clock source through dts file" · 3f3f0960
      David S. Miller committed
      This reverts commit 894116bd.
      
      I applied the wrong version of this patch, correct
      version coming up.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      bonding: Fix broken promiscuity reference counting issue · 5a0068de
      Neil Horman committed
      Recently grabbed this report:
      https://bugzilla.redhat.com/show_bug.cgi?id=1005567
      
      Of an issue in which the bonding driver, with an attached vlan encountered the
      following errors when bond0 was taken down and back up:
      
      dummy1: promiscuity touches roof, set promiscuity failed. promiscuity feature of
      device might be broken.
      
      The error occurs because, during __bond_release_one, if we release our last
      slave, we take on a random mac address and issue a NETDEV_CHANGEADDR
      notification.  With an attached vlan, the vlan may see that the vlan and bond
      mac address were in sync, but no longer are.  This triggers a call to dev_uc_add
      and dev_set_rx_mode, which enables IFF_PROMISC on the bond device.  Then, when
      we complete __bond_release_one, we use the current state of the bond flags to
      determine if we should decrement the promiscuity of the releasing slave.  But
      since the bond changed promiscuity state during the release operation, we
      incorrectly decrement the slave promisc count when it wasn't in promiscuous mode
      to begin with, causing the above error
      
      Fix is pretty simple, just cache the bonding flags at the start of the function
      and use those when determining the need to set promiscuity.
      
      This is also needed for the ALLMULTI flag
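The fix described above is a snapshot-before-side-effects pattern. The model below is a sketch with hypothetical names (`demo_dev`, `DEMO_PROMISC`, `release_slave`), not the bonding driver code; the `use_cached` switch exists only to contrast the buggy and fixed decisions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PROMISC 0x1u     /* hypothetical flag bit */

struct demo_dev {
    unsigned int flags;
    int promiscuity;          /* reference count of promiscuous users */
};

/* Simulates the notifier side effect: the address change makes the
 * master turn promiscuous in the middle of the release. */
static void notify_addr_change(struct demo_dev *master)
{
    master->flags |= DEMO_PROMISC;
}

/* Releases a slave. With use_cached set, the decision is based on the
 * flags as they were on entry; otherwise on the flags after the
 * notifier has run, which reproduces the bug. */
static void release_slave(struct demo_dev *master, struct demo_dev *slave,
                          int use_cached)
{
    assert(master != NULL && slave != NULL);
    unsigned int old_flags = master->flags;   /* snapshot on entry */

    notify_addr_change(master);

    unsigned int decide = use_cached ? old_flags : master->flags;
    if (decide & DEMO_PROMISC)
        slave->promiscuity--;
}
```

Starting from a master that is not promiscuous, the uncached path drives the slave count to -1, while the cached path leaves it at 0.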
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Mark Wu <wudxw@linux.vnet.ibm.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Reported-by: Mark Wu <wudxw@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      tcp: TSQ can use a dynamic limit · c9eeec26
      Eric Dumazet committed
      When TCP Small Queues was added, we used a sysctl to limit the number of
      packets queued on Qdisc/device queues for a given TCP flow.
      
      Problem is this limit is either too big for low rates, or too small
      for high rates.
      
      Now TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
      auto sizing, it can better control number of packets in Qdisc/device
      queues.
      
      New limit is two packets or at least 1 to 2 ms worth of packets.
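As a sketch of how such a dynamic limit could be computed: the function below takes the larger of two packets and roughly one millisecond of data at the pacing rate. The function name, the use of `rate >> 10` as the "about 1 ms" term, and the byte-based units are assumptions made for illustration; the commit message does not spell out the formula.

```c
#include <assert.h>
#include <stdint.h>

/* Byte budget for one flow's packets sitting in qdisc/device queues.
 * Assumption: "1 to 2 ms worth" is approximated as rate >> 10, that is
 * rate / 1024, slightly under one millisecond of data. */
static uint64_t tsq_limit_bytes(uint32_t pkt_bytes,
                                uint64_t pacing_rate_bytes_per_sec)
{
    assert(pkt_bytes > 0);
    uint64_t two_pkts = 2ULL * pkt_bytes;
    uint64_t about_1ms = pacing_rate_bytes_per_sec >> 10;
    return two_pkts > about_1ms ? two_pkts : about_1ms;
}
```

At low rates the two-packet floor dominates, and at high rates the time-based term takes over, which is the behaviour the following paragraphs describe.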
      
      Low rates flows benefit from this patch by having even smaller
      number of packets in queues, allowing for faster recovery,
      better RTT estimations.
      
      High rates flows benefit from this patch by allowing more than 2 packets
      in flight as we had reports this was a limiting factor to reach line
      rate. [ In particular if TX completion is delayed because of coalescing
      parameters ]
      
      Example for a single flow on a 10Gbps link controlled by FQ/pacing
      
      14 packets in flight instead of 2
      
      $ tc -s -d qd
      qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
      buckets 1024 quantum 3028 initial_quantum 15140
       Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
      requeues 6822476)
       rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
        2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
        2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit
      
      Note that sk_pacing_rate is currently set to twice the actual rate, but
      this might be refined in the future when a flow is in congestion
      avoidance.
      
      Additional change : skb->destructor should be set to tcp_wfree().
      
      A future patch (for linux 3.13+) might remove tcp_limit_output_bytes
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • L
      Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · f9273188
      Linus Torvalds committed
      Pull NFS client bugfixes from Trond Myklebust:
       - Stable fix for Oopses in the pNFS files layout driver
       - Fix a regression when doing a non-exclusive file create on NFSv4.x
       - NFSv4.1 security negotiation fixes when looking up the root
         filesystem
       - Fix a memory ordering issue in the pNFS files layout driver
      
      * tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFS: Give "flavor" an initial value to fix a compile warning
        NFSv4.1: try SECINFO_NO_NAME flavs until one works
        NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
        NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
        NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method
    • P
      dm9601: fix IFF_ALLMULTI handling · bf0ea638
      Peter Korsgaard committed
      Pass-all-multicast is controlled by bit 3 in RX control, not bit 2
      (pass undersized frames).
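The off-by-one bit can be made concrete with two masks. The macro and function names here are hypothetical; only the bit positions (bit 2 for undersized frames, bit 3 for all-multicast) come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; bit positions as stated in the commit message. */
#define RX_CTRL_PASS_RUNT     (1u << 2)   /* pass undersized frames */
#define RX_CTRL_PASS_ALLMULTI (1u << 3)   /* pass all multicast */

/* Returns the RX control value for the requested all-multicast state,
 * leaving the other bits of `base` untouched. */
static uint8_t rx_ctrl_for(int allmulti, uint8_t base)
{
    assert(allmulti == 0 || allmulti == 1);
    return allmulti ? (uint8_t)(base | RX_CTRL_PASS_ALLMULTI) : base;
}
```

Setting bit 2 instead would enable reception of runt frames and leave multicast filtering unchanged.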
      Reported-by: Joseph Chang <joseph_chang@davicom.com.tw>
      Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • L
      Merge branch 'akpm' (fixes from Andrew Morton) · 522d6d38
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
        pidns: fix free_pid() to handle the first fork failure
        ipc,msg: prevent race with rmid in msgsnd,msgrcv
        ipc/sem.c: update sem_otime for all operations
        mm/hwpoison: fix the lack of one reference count against poisoned page
        mm/hwpoison: fix false report on 2nd attempt at page recovery
        mm/hwpoison: fix test for a transparent huge page
        mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
        block: change config option name for cmdline partition parsing
        mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
        mm: avoid reinserting isolated balloon pages into LRU lists
        arch/parisc/mm/fault.c: fix uninitialized variable usage
        include/asm-generic/vtime.h: avoid zero-length file
        nilfs2: fix issue with race condition of competition between segments for dirty blocks
        Documentation/kernel-parameters.txt: replace kernelcore with Movable
        mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
        kernel/kmod.c: check for NULL in call_usermodehelper_exec()
        ipc/sem.c: synchronize the proc interface
        ipc/sem.c: optimize sem_lock()
        ipc/sem.c: fix race in sem_lock()
        mm/compaction.c: periodically schedule when freeing pages
        ...
    • O
      pidns: fix free_pid() to handle the first fork failure · 314a8ad0
      Oleg Nesterov committed
      "case 0" in free_pid() assumes that disable_pid_allocation() should
      clear PIDNS_HASH_ADDING before the last pid goes away.
      
      However this doesn't happen if the first fork() fails to create the
      child reaper which should call disable_pid_allocation().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      ipc,msg: prevent race with rmid in msgsnd,msgrcv · 4271b05a
      Davidlohr Bueso committed
      This fixes a race in both msgrcv() and msgsnd() between finding the msg
      and actually dealing with the queue, as another thread can delete msqid
      underneath us if we are preempted before acquiring the
      kern_ipc_perm.lock.
      
      Manfred illustrates this nicely:
      
      Assume a preemptible kernel that is preempted just after
      
          msq = msq_obtain_object_check(ns, msqid)
      
      in do_msgrcv().  The only lock that is held is rcu_read_lock().
      
      Now the other thread processes IPC_RMID.  When the first task is
      resumed, then it will happily wait for messages on a deleted queue.
      
      Fix this by checking whether the queue has been deleted after taking the
      lock.
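The lookup, lock, then re-validate sequence can be sketched as follows. This is a single-threaded model with hypothetical names (`demo_queue`, `lock_and_validate`); a plain flag stands in for the real lock, and returning -EIDRM for a removed queue is an assumption about the error code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical queue object; `locked` models the per-object lock. */
struct demo_queue {
    int locked;
    int deleted;          /* set by the remover while holding the lock */
};

static void demo_lock(struct demo_queue *q)
{
    assert(!q->locked);
    q->locked = 1;
}

static void demo_unlock(struct demo_queue *q)
{
    q->locked = 0;
}

/* Takes the lock, then re-checks that the queue still exists. Returns 0
 * with the lock held, or -EIDRM with the lock released if the queue was
 * removed between the lookup and the lock. */
static int lock_and_validate(struct demo_queue *q)
{
    demo_lock(q);
    if (q->deleted) {
        demo_unlock(q);
        return -EIDRM;
    }
    return 0;
}
```

The lookup result is only a hint until the lock is held, so the deleted check has to come after the lock, not before it.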
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Reported-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: <stable@vger.kernel.org> 	[3.11]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      ipc/sem.c: update sem_otime for all operations · 0e8c6656
      Manfred Spraul committed
      In commit 0a2b9d4c ("ipc/sem.c: move wake_up_process out of the
      spinlock section"), the update of the semaphore's sem_otime (last semop
      time) was moved to one central position (do_smart_update).
      
      But since do_smart_update() is only called for operations that modify
      the array, this means that wait-for-zero semops do not update sem_otime
      anymore.
      
      The fix is simple:
      Non-alter operations must update sem_otime.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Reported-by: Jia He <jiakernel@gmail.com>
      Tested-by: Jia He <jiakernel@gmail.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • W
      mm/hwpoison: fix the lack of one reference count against poisoned page · fb31ba30
      Wanpeng Li committed
      hwpoison_inject without hwpoison_filter enabled does not take a
      reference on the poisoned page, so hwpoison reports that -1 users still
      reference the page. The count should be 0, apart from the reference the
      poison handler holds after a successful unmap.  This patch fixes it by
      holding one reference on the poisoned page for hwpoison_inject both with
      and without hwpoison_filter enabled.
      
      Before patch:
      
      [   71.902112] Injecting memory failure at pfn 224706
      [   71.902137] MCE 0x224706: dirty LRU page recovery: Failed
      [   71.902138] MCE 0x224706: dirty LRU page still referenced by -1 users
      
      After patch:
      
      [   94.710860] Injecting memory failure at pfn 215b68
      [   94.710885] MCE 0x215b68: dirty LRU page recovery: Recovered
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • W
      mm/hwpoison: fix false report on 2nd attempt at page recovery · 2d421acd
      Wanpeng Li committed
      If the page is poisoned by software injection with the
      MF_COUNT_INCREASED flag, the log falsely reports a 2nd attempt at page
      recovery.
      
      This patch fixes it by reporting the first attempt at free buddy
      page recovery if MF_COUNT_INCREASED is set.
      
      Before patch:
      
      [  346.332041] Injecting memory failure at pfn 200010
      [  346.332189] MCE 0x200010: free buddy, 2nd try page recovery: Delayed
      
      After patch:
      
      [  297.742600] Injecting memory failure at pfn 200010
      [  297.742941] MCE 0x200010: free buddy page recovery: Delayed
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2d421acd
    • W
      mm/hwpoison: fix test for a transparent huge page · e76d30e2
      Wanpeng Li committed
      PageTransHuge() can't guarantee the page is a transparent huge page
      since it returns true for both transparent huge and hugetlbfs pages.
      
      This patch fixes it by also checking that the page is not a hugetlbfs
      page.
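
The combined test can be illustrated with a small userspace model. This is only a sketch: `struct model_page` and the `model_*` helpers are hypothetical stand-ins, not kernel definitions, and the loose predicate is assumed to be true for any huge page, as the message above states.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a page; NOT the kernel's struct page. */
struct model_page {
	bool huge;       /* any huge page: what the loose predicate sees */
	bool hugetlbfs;  /* the huge page belongs to hugetlbfs */
};

/* Stand-in for PageTransHuge(): true for both kinds of huge page. */
static bool model_page_trans_huge(const struct model_page *p)
{
	return p->huge;
}

/* Stand-in for a hugetlbfs check. */
static bool model_page_huge(const struct model_page *p)
{
	return p->hugetlbfs;
}

/* Stricter test: a huge page that is not a hugetlbfs page. */
static bool model_is_thp(const struct model_page *p)
{
	return model_page_trans_huge(p) && !model_page_huge(p);
}
```

With this model, a hugetlbfs page passes the loose predicate but is rejected by `model_is_thp()`, which is the distinction the fix relies on.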
      
      Before patch:
      
      [  121.571128] Injecting memory failure at pfn 23a200
      [  121.571141] MCE 0x23a200: huge page recovery: Delayed
      [  140.355100] MCE: Memory failure is now running on 0x23a200
      
      After patch:
      
      [   94.290793] Injecting memory failure at pfn 23a000
      [   94.290800] MCE 0x23a000: huge page recovery: Delayed
      [  105.722303] MCE: Software-unpoisoned page 0x23a000
      Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e76d30e2
    • W
      mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood · 20cb6cab
      Wanpeng Li committed
      madvise_hwpoison does not check whether the page is a small page or a
      huge page and unconditionally traverses the range at small-page
      granularity, which results in a flood of "MCE xxx: already hardware
      poisoned" printk messages if the page is a huge page.

      This patch fixes it by using compound_order(compound_head(page)) to
      compute the step of the huge page iterator.
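
The effect of stepping by compound order can be sketched in userspace. This is a rough model, not the kernel loop: `model_poison_calls()` is a hypothetical helper, page indices stand in for addresses, and every page in the range is assumed to have the same order.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: count how many times the poison handler would be
 * invoked for a range of nr_small_pages small pages, where every page in
 * the range belongs to a compound page of the given order (0 = small page).
 */
static size_t model_poison_calls(size_t nr_small_pages, unsigned int order,
				 bool step_by_compound)
{
	/* Old behaviour: always step by one small page. */
	size_t step = step_by_compound ? ((size_t)1 << order) : 1;
	size_t calls = 0;

	for (size_t idx = 0; idx < nr_small_pages; idx += step)
		calls++;	/* stands in for one poison-handler call */

	return calls;
}
```

For three order-9 huge pages (3 * 512 small pages), the small-page walk visits every subpage, while the compound-order walk visits each huge page once.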
      
      Testcase:
      
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <sys/mman.h>

      #define PAGES_TO_TEST 3
      /* One huge page: 512 small pages of 4096 bytes. */
      #define PAGE_SIZE	(4096 * 512)

      int main(void)
      {
      	char *mem;

      	/* Anonymous mappings take fd -1. */
      	mem = mmap(NULL, PAGES_TO_TEST * PAGE_SIZE,
      			PROT_READ | PROT_WRITE,
      			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
      	if (mem == MAP_FAILED) {
      		perror("mmap");
      		return -1;
      	}

      	/* Poison the whole huge page range. */
      	if (madvise(mem, PAGES_TO_TEST * PAGE_SIZE, MADV_HWPOISON) == -1)
      		return -1;

      	munmap(mem, PAGES_TO_TEST * PAGE_SIZE);

      	return 0;
      }
      Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      20cb6cab
    • P
      block: change config option name for cmdline partition parsing · 080506ad
      Paul Gortmaker committed
      Recently commit bab55417 ("block: support embedded device command
      line partition") introduced CONFIG_CMDLINE_PARSER.  However, that name
      is too generic and sounds like it enables/disables generic kernel boot
      arg processing, when it really is block specific.
      
      Before this option becomes a part of a full/final release, add the BLK_
      prefix to it so that it is clear in the absence of any other context
      that it is block specific.
      
      In addition, fix up the following less critical items:
       - make the help text actually helpful
       - update the index file for Documentation
       - add the new arg to Documentation/kernel-parameters.txt
       - clarify wording in source comments
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Cai Zhiyong <caizhiyong@huawei.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      080506ad
    • V
      mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration · eadb41ae
      Vlastimil Babka committed
      The function __munlock_pagevec_fill() introduced in commit 7a8010cd
      ("mm: munlock: manual pte walk in fast path instead of
      follow_page_mask()") uses pmd_addr_end() to restrict its operation
      to the current page table.
      
      This is insufficient on architectures/configurations where pmd is folded
      and pmd_addr_end() just returns the end of the full range to be walked.
      In this case, it allows pte++ to walk off the end of a page table
      resulting in unpredictable behaviour.
      
      This patch fixes the function by using pgd_addr_end() and pud_addr_end()
      before pmd_addr_end(), which will yield correct page table boundary on
      all configurations.  This is similar to what existing page walkers do
      when walking each level of the page table.
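
The chained clamping described above can be sketched with a userspace model of a layout where the upper levels are folded. This is an illustration under stated assumptions: the `MODEL_*` constants and `model_*` helpers are hypothetical, and the 4 MiB span per pte table is chosen only for the example.

```c
/* Hypothetical layout: one pte table maps 4 MiB, pud and pmd are folded. */
#define MODEL_PGDIR_SHIFT 22UL
#define MODEL_PGDIR_SIZE  (1UL << MODEL_PGDIR_SHIFT)
#define MODEL_PGDIR_MASK  (~(MODEL_PGDIR_SIZE - 1))

/* Next pgd boundary after addr, or end, whichever comes first. */
static unsigned long model_pgd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + MODEL_PGDIR_SIZE) & MODEL_PGDIR_MASK;

	/* The "- 1" keeps the comparison correct if boundary wraps to 0. */
	return (boundary - 1 < end - 1) ? boundary : end;
}

/* Folded levels have no boundary of their own: they just return end. */
static unsigned long model_pud_addr_end(unsigned long addr, unsigned long end)
{
	(void)addr;
	return end;
}

static unsigned long model_pmd_addr_end(unsigned long addr, unsigned long end)
{
	(void)addr;
	return end;
}

/* Clamp end level by level so it never crosses the current pte table. */
static unsigned long model_pte_table_end(unsigned long addr, unsigned long end)
{
	end = model_pgd_addr_end(addr, end);
	end = model_pud_addr_end(addr, end);
	end = model_pmd_addr_end(addr, end);
	return end;
}
```

In this model, calling only `model_pmd_addr_end()` returns the end of the full range, while the chained version stops at the 4 MiB boundary of the current pte table.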
      
      Additionally, the patch clarifies a comment for the get_locked_pte()
      call in the function.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Cc: Jörn Engel <joern@logfs.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eadb41ae
    • R
      mm: avoid reinserting isolated balloon pages into LRU lists · 117aad1e
      Rafael Aquini committed
      Isolated balloon pages can wrongly end up in LRU lists when
      migrate_pages() finishes its round without draining the whole isolated
      page list.
      
      The same issue can happen when reclaim_clean_pages_from_list() tries to
      reclaim pages from an isolated page list, before migration, in the CMA
      path.  Such a balloon page leak opens a race window against the LRU list
      shrinkers that leads to the following kernel panic:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
        IP: [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
        PGD 3cda2067 PUD 3d713067 PMD 0
        Oops: 0000 [#1] SMP
        CPU: 0 PID: 340 Comm: kswapd0 Not tainted 3.12.0-rc1-22626-g4367597 #87
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        RIP: shrink_page_list+0x24e/0x897
        RSP: 0000:ffff88003da499b8  EFLAGS: 00010286
        RAX: 0000000000000000 RBX: ffff88003e82bd60 RCX: 00000000000657d5
        RDX: 0000000000000000 RSI: 000000000000031f RDI: ffff88003e82bd40
        RBP: ffff88003da49ab0 R08: 0000000000000001 R09: 0000000081121a45
        R10: ffffffff81121a45 R11: ffff88003c4a9a28 R12: ffff88003e82bd40
        R13: ffff88003da0e800 R14: 0000000000000001 R15: ffff88003da49d58
        FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000067d9000 CR3: 000000003ace5000 CR4: 00000000000407b0
        Call Trace:
          shrink_inactive_list+0x240/0x3de
          shrink_lruvec+0x3e0/0x566
          __shrink_zone+0x94/0x178
          shrink_zone+0x3a/0x82
          balance_pgdat+0x32a/0x4c2
          kswapd+0x2f0/0x372
          kthread+0xa2/0xaa
          ret_from_fork+0x7c/0xb0
        Code: 80 7d 8f 01 48 83 95 68 ff ff ff 00 4c 89 e7 e8 5a 7b 00 00 48 85 c0 49 89 c5 75 08 80 7d 8f 00 74 3e eb 31 48 8b 80 18 01 00 00 <48> 8b 74 0d 48 8b 78 30 be 02 00 00 00 ff d2 eb
        RIP  [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
         RSP <ffff88003da499b8>
        CR2: 0000000000000028
        ---[ end trace 703d2451af6ffbfd ]---
        Kernel panic - not syncing: Fatal exception
      
      This patch fixes the issue by ensuring that the proper tests are made in
      putback_movable_pages() and reclaim_clean_pages_from_list(), so that
      isolated balloon pages are not wrongly reinserted into LRU lists.
      
      [akpm@linux-foundation.org: clarify awkward comment text]
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
      Tested-by: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      117aad1e