1. 27 3月, 2018 15 次提交
    • E
      net: fix possible out-of-bound read in skb_network_protocol() · 1dfe82eb
      Eric Dumazet 提交于
      skb mac header is not necessarily set at the time skb_network_protocol()
      is called. Use skb->data instead.
      
      BUG: KASAN: slab-out-of-bounds in skb_network_protocol+0x46b/0x4b0 net/core/dev.c:2739
      Read of size 2 at addr ffff8801b3097a0b by task syz-executor5/14242
      
      CPU: 1 PID: 14242 Comm: syz-executor5 Not tainted 4.16.0-rc6+ #280
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x24d lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report+0x23c/0x360 mm/kasan/report.c:412
       __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:443
       skb_network_protocol+0x46b/0x4b0 net/core/dev.c:2739
       harmonize_features net/core/dev.c:2924 [inline]
       netif_skb_features+0x509/0x9b0 net/core/dev.c:3011
       validate_xmit_skb+0x81/0xb00 net/core/dev.c:3084
       validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3142
       packet_direct_xmit+0x117/0x790 net/packet/af_packet.c:256
       packet_snd net/packet/af_packet.c:2944 [inline]
       packet_sendmsg+0x3aed/0x60b0 net/packet/af_packet.c:2969
       sock_sendmsg_nosec net/socket.c:629 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:639
       ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
       __sys_sendmsg+0xe5/0x210 net/socket.c:2081
      
      Fixes: 19acc327 ("gso: Handle Trans-Ether-Bridging protocol in skb_network_protocol()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Reported-by: NReported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1dfe82eb
    • G
      net-usb: add qmi_wwan if on lte modem wistron neweb d18q1 · d4c4bc11
      Giuseppe Lippolis 提交于
      This modem is embedded on dlink dwr-921 router.
          The oem configuration states:
      
          T:  Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=480 MxCh= 0
          D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
          P:  Vendor=1435 ProdID=0918 Rev= 2.32
          S:  Manufacturer=Android
          S:  Product=Android
          S:  SerialNumber=0123456789ABCDEF
          C:* #Ifs= 7 Cfg#= 1 Atr=80 MxPwr=500mA
          I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
          E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
          E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
          E:  Ad=84(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
          E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
          E:  Ad=86(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
          E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
          E:  Ad=88(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
          E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
          E:  Ad=8a(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
          E:  Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          I:* If#= 6 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
          E:  Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
          E:  Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=125us
      
      Tested on openwrt distribution
      Signed-off-by: NGiuseppe Lippolis <giu.lippolis@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4c4bc11
    • D
      Merge tag 'batadv-net-for-davem-20180326' of git://git.open-mesh.org/linux-merge · d7785b59
      David S. Miller 提交于
      Simon Wunderlich says:
      
      ====================
      Here are some batman-adv bugfixes:
      
       - fix multicast-via-unicast transmissions for AP isolation and gateway
         extension, by Linus Luessing (2 patches)
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7785b59
    • S
      net: dsa: mt7530: fix module autoloading for OF platform drivers · 3c82b372
      Sean Wang 提交于
      It's required to create a modules.alias via MODULE_DEVICE_TABLE helper
      for the OF platform driver. Otherwise, module autoloading cannot work.
      Signed-off-by: NSean Wang <sean.wang@mediatek.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c82b372
    • S
      net: dsa: mt7530: remove redundant MODULE_ALIAS entries · 1c82c9e1
      Sean Wang 提交于
      MODULE_ALIAS exports information to allow the module to be auto-loaded at
      boot for the drivers registered using legacy platform registration.
      
      However, currently the driver is always used by DT-only platform,
      MODULE_ALIAS is redundant and should be removed properly.
      Signed-off-by: NSean Wang <sean.wang@mediatek.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c82c9e1
    • J
      vhost_net: add missing lock nesting notation · aaa3149b
      Jason Wang 提交于
      We try to hold TX virtqueue mutex in vhost_net_rx_peek_head_len()
      after RX virtqueue mutex is held in handle_rx(). This requires an
      appropriate lock nesting notation to calm down deadlock detector.
      
      Fixes: 03088137 ("vhost_net: basic polling support")
      Reported-by: syzbot+7f073540b1384a614e09@syzkaller.appspotmail.com
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aaa3149b
    • T
      net/usb/qmi_wwan.c: Add USB id for lt4120 modem · f3d801ba
      Torsten Hilbrich 提交于
      This is needed to support the modem found in HP EliteBook 820 G3.
      Signed-off-by: NTorsten Hilbrich <torsten.hilbrich@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3d801ba
    • X
      team: move dev_mc_sync after master_upper_dev_link in team_port_add · 982cf3b3
      Xin Long 提交于
      The same fix as in 'bonding: move dev_mc_sync after master_upper_dev_link
      in bond_enslave' is needed for team driver.
      
      The panic can be reproduced easily:
      
        ip link add team1 type team
        ip link set team1 up
        ip link add link team1 vlan1 type vlan id 80
        ip link set vlan1 master team1
      
      Fixes: cb41c997 ("team: team should sync the port's uc/mc addrs when add a port")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      982cf3b3
    • D
      Merge branch 'bond-hwaddr-sync-fixes' · e49c78f4
      David S. Miller 提交于
      Xin Long says:
      
      ====================
      bonding: a bunch of fixes for dev hwaddr sync in bond_enslave
      
      This patchset is mainly to fix a crash when adding vlan as slave of
      bond which is also the parent link in patch 2/3,  and also fix some
      err process problems in bond_enslave in patch 1/3 and 3/3.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e49c78f4
    • X
      bonding: process the err returned by dev_set_allmulti properly in bond_enslave · 9f5a90c1
      Xin Long 提交于
      When dev_set_promiscuity(1) succeeds but dev_set_allmulti(1) fails,
      dev_set_promiscuity(-1) should be done before going to the err path.
      Otherwise, dev->promiscuity will leak.
      
      Fixes: 7e1a1ac1 ("bonding: Check return of dev_set_promiscuity/allmulti")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NAndy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f5a90c1
    • X
      bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave · ae42cc62
      Xin Long 提交于
      Beniamino found a crash when adding vlan as slave of bond which is also
      the parent link:
      
        ip link add bond1 type bond
        ip link set bond1 up
        ip link add link bond1 vlan1 type vlan id 80
        ip link set vlan1 master bond1
      
      The call trace is as below:
      
        [<ffffffffa850842a>] queued_spin_lock_slowpath+0xb/0xf
        [<ffffffffa8515680>] _raw_spin_lock+0x20/0x30
        [<ffffffffa83f6f07>] dev_mc_sync+0x37/0x80
        [<ffffffffc08687dc>] vlan_dev_set_rx_mode+0x1c/0x30 [8021q]
        [<ffffffffa83efd2a>] __dev_set_rx_mode+0x5a/0xa0
        [<ffffffffa83f7138>] dev_mc_sync_multiple+0x78/0x80
        [<ffffffffc084127c>] bond_enslave+0x67c/0x1190 [bonding]
        [<ffffffffa8401909>] do_setlink+0x9c9/0xe50
        [<ffffffffa8403bf2>] rtnl_newlink+0x522/0x880
        [<ffffffffa8403ff7>] rtnetlink_rcv_msg+0xa7/0x260
        [<ffffffffa8424ecb>] netlink_rcv_skb+0xab/0xc0
        [<ffffffffa83fe498>] rtnetlink_rcv+0x28/0x30
        [<ffffffffa8424850>] netlink_unicast+0x170/0x210
        [<ffffffffa8424bf8>] netlink_sendmsg+0x308/0x420
        [<ffffffffa83cc396>] sock_sendmsg+0xb6/0xf0
      
      This is actually a dead lock caused by sync slave hwaddr from master when
      the master is the slave's 'slave'. This dead loop check is actually done
      by netdev_master_upper_dev_link. However, Commit 1f718f0f ("bonding:
      populate neighbour's private on enslave") moved it after dev_mc_sync.
      
      This patch is to fix it by moving dev_mc_sync after master_upper_dev_link,
      so that this loop check would be earlier than dev_mc_sync. It also moves
      if (mode == BOND_MODE_8023AD) into if (!bond_uses_primary) clause as an
      improvement.
      
      Note team driver also has this issue, I will fix it in another patch.
      
      Fixes: 1f718f0f ("bonding: populate neighbour's private on enslave")
      Reported-by: NBeniamino Galvani <bgalvani@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NAndy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae42cc62
    • X
      bonding: fix the err path for dev hwaddr sync in bond_enslave · 5c78f6bf
      Xin Long 提交于
      vlan_vids_add_by_dev is called right after dev hwaddr sync, so on
      the err path it should unsync dev hwaddr. Otherwise, the slave
      dev's hwaddr will never be unsync when this err happens.
      
      Fixes: 1ff412ad ("bonding: change the bond's vlan syncing functions with the standard ones")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NAndy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5c78f6bf
    • J
      net: sched, fix OOO packets with pfifo_fast · eb82a994
      John Fastabend 提交于
      After the qdisc lock was dropped in pfifo_fast we allow multiple
      enqueue threads and dequeue threads to run in parallel. On the
      enqueue side the skb bit ooo_okay is used to ensure all related
      skbs are enqueued in-order. On the dequeue side though there is
      no similar logic. What we observe is with fewer queues than CPUs
      it is possible to re-order packets when two instances of
      __qdisc_run() are running in parallel. Each thread will dequeue
      a skb and then whichever thread calls the ndo op first will
      be sent on the wire. This doesn't typically happen because
      qdisc_run() is usually triggered by the same core that did the
      enqueue. However, drivers will trigger __netif_schedule()
      when queues are transitioning from stopped to awake using the
      netif_tx_wake_* APIs. When this happens netif_schedule() calls
      qdisc_run() on the same CPU that did the netif_tx_wake_* which
      is usually done in the interrupt completion context. This CPU
      is selected with the irq affinity which is unrelated to the
      enqueue operations.
      
      To resolve this we add a RUNNING bit to the qdisc to ensure
      only a single dequeue per qdisc is running. Enqueue and dequeue
      operations can still run in parallel and also on multi queue
      NICs we can still have a dequeue in-flight per qdisc, which
      is typically per CPU.
      
      Fixes: c5ad119f ("net: sched: pfifo_fast use skb_array")
      Reported-by: NJakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb82a994
    • P
      net: qmi_wwan: add BroadMobi BM806U 2020:2033 · 74398925
      Pawel Dembicki 提交于
      BroadMobi BM806U is an Qualcomm MDM9225 based 3G/4G modem.
      Tested hardware BM806U is mounted on D-Link DWR-921-C3 router.
      The USB id is added to qmi_wwan.c to allow QMI communication with
      the BM806U.
      
      Tested on 4.14 kernel and OpenWRT.
      Signed-off-by: NPawel Dembicki <paweldembicki@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74398925
    • S
      Documentation/isdn: check and fix dead links ... · 16e693c5
      Sanjeev Gupta 提交于
      and switch to https where possible.
      
      All links have been eyeballed to verify that the domains have
      not changed, etc.
      Signed-off-by: NSanjeev Gupta <ghane0@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16e693c5
  2. 26 3月, 2018 5 次提交
    • D
      Merge tag 'wireless-drivers-for-davem-2018-03-24' of... · c87308b1
      David S. Miller 提交于
      Merge tag 'wireless-drivers-for-davem-2018-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 4.16
      
      Some fixes for 4.16, only for iwlwifi and brcmfmac this time. All
      pretty small.
      
      iwlwifi
      
      * fix an issue with the multicast queue
      
      * fix IGTK handling
      
      * fix some missing return value checks
      
      * add support for a HW workaround for issues on some platforms
      
      * a couple of fixes for channel-switch
      
      * a few fixes for the aggregation handling code
      
      brcmfmac
      
      * drop Inter-Access Point Protocol packets by default
      
      * fix check for ISO3166 regulatory code
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c87308b1
    • P
      ipv6: the entire IPv6 header chain must fit the first fragment · 10b8a3de
      Paolo Abeni 提交于
      While building ipv6 datagram we currently allow arbitrary large
      extheaders, even beyond pmtu size. The syzbot has found a way
      to exploit the above to trigger the following splat:
      
      kernel BUG at ./include/linux/skbuff.h:2073!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 4230 Comm: syzkaller672661 Not tainted 4.16.0-rc2+ #326
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__skb_pull include/linux/skbuff.h:2073 [inline]
      RIP: 0010:__ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636
      RSP: 0018:ffff8801bc18f0f0 EFLAGS: 00010293
      RAX: ffff8801b17400c0 RBX: 0000000000000738 RCX: ffffffff84f01828
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801b415ac18
      RBP: ffff8801bc18f360 R08: ffff8801b4576844 R09: 0000000000000000
      R10: ffff8801bc18f380 R11: ffffed00367aee4e R12: 00000000000000d6
      R13: ffff8801b415a740 R14: dffffc0000000000 R15: ffff8801b45767c0
      FS:  0000000001535880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000002000b000 CR3: 00000001b4123001 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        ip6_finish_skb include/net/ipv6.h:969 [inline]
        udp_v6_push_pending_frames+0x269/0x3b0 net/ipv6/udp.c:1073
        udpv6_sendmsg+0x2a96/0x3400 net/ipv6/udp.c:1343
        inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:640
        ___sys_sendmsg+0x320/0x8b0 net/socket.c:2046
        __sys_sendmmsg+0x1ee/0x620 net/socket.c:2136
        SYSC_sendmmsg net/socket.c:2167 [inline]
        SyS_sendmmsg+0x35/0x60 net/socket.c:2162
        do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x4404c9
      RSP: 002b:00007ffdce35f948 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004404c9
      RDX: 0000000000000003 RSI: 0000000020001f00 RDI: 0000000000000003
      RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 0000000020000080 R11: 0000000000000217 R12: 0000000000401df0
      R13: 0000000000401e80 R14: 0000000000000000 R15: 0000000000000000
      Code: ff e8 1d 5e b9 fc e9 15 e9 ff ff e8 13 5e b9 fc e9 44 e8 ff ff e8 29
      5e b9 fc e9 c0 e6 ff ff e8 3f f3 80 fc 0f 0b e8 38 f3 80 fc <0f> 0b 49 8d
      87 80 00 00 00 4d 8d 87 84 00 00 00 48 89 85 20 fe
      RIP: __skb_pull include/linux/skbuff.h:2073 [inline] RSP: ffff8801bc18f0f0
      RIP: __ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP:
      ffff8801bc18f0f0
      
      As stated by RFC 7112 section 5:
      
         When a host fragments an IPv6 datagram, it MUST include the entire
         IPv6 Header Chain in the First Fragment.
      
      So this patch addresses the issue dropping datagrams with excessive
      extheader length. It also updates the error path to report to the
      calling socket nonnegative pmtu values.
      
      The issue apparently predates git history.
      
      v1 -> v2: cleanup error path, as per Eric's suggestion
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+91e6f9932ff122fa4410@syzkaller.appspotmail.com
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10b8a3de
    • A
      netlink: make sure nladdr has correct size in netlink_connect() · 78802879
      Alexander Potapenko 提交于
      KMSAN reports use of uninitialized memory in the case when |alen| is
      smaller than sizeof(struct sockaddr_nl), and therefore |nladdr| isn't
      fully copied from the userspace.
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      78802879
    • R
      lan78xx: Set ASD in MAC_CR when EEE is enabled. · e69647a1
      Raghuram Chary J 提交于
      Description:
      EEE does not work with lan7800 when AutoSpeed is not set.
      (This can happen when EEPROM is not populated or configured incorrectly)
      
      Root-Cause:
      When EEE is enabled, the mac config register ASD is not set
      i.e. in default state, causing EEE fail.
      
      Fix:
      Set the register when eeprom is not present.
      
      Fixes: 55d7de9d ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
      Signed-off-by: NRaghuram Chary J <raghuramchary.jallipalli@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e69647a1
    • H
      net/ipv4: disable SMC TCP option with SYN Cookies · bc58a1ba
      Hans Wippel 提交于
      Currently, the SMC experimental TCP option in a SYN packet is lost on
      the server side when SYN Cookies are active. However, the corresponding
      SYNACK sent back to the client contains the SMC option. This causes an
      inconsistent view of the SMC capabilities on the client and server.
      
      This patch disables the SMC option in the SYNACK when SYN Cookies are
      active to avoid this issue.
      
      Fixes: 60e2a778 ("tcp: TCP experimental option for SMC")
      Signed-off-by: NHans Wippel <hwippel@linux.vnet.ibm.com>
      Signed-off-by: NUrsula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc58a1ba
  3. 25 3月, 2018 2 次提交
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · b9ee96b4
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree,
      they are:
      
      1) Don't pick fixed hash implementation for NFT_SET_EVAL sets, otherwise
         userspace hits EOPNOTSUPP with valid rules using the meter statement,
         from Florian Westphal.
      
      2) If you send a batch that flushes the existing ruleset (that contains
         a NAT chain) and the new ruleset definition comes with a new NAT
         chain, don't bogusly hit EBUSY. Also from Florian.
      
      3) Missing netlink policy attribute validation, from Florian.
      
      4) Detach conntrack template from skbuff if IP_NODEFRAG is set on,
         from Paolo Abeni.
      
      5) Cache device names in flowtable object, otherwise we may end up
         walking over devices going aways given no rtnl_lock is held.
      
      6) Fix incorrect net_device ingress with ingress hooks.
      
      7) Fix crash when trying to read more data than available in UDP
         packets from the nf_socket infrastructure, from Subash.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b9ee96b4
    • S
      netfilter: nf_socket: Fix out of bounds access in nf_sk_lookup_slow_v{4,6} · 32c1733f
      Subash Abhinov Kasiviswanathan 提交于
      skb_header_pointer will copy data into a buffer if data is non linear,
      otherwise it will return a pointer in the linear section of the data.
      nf_sk_lookup_slow_v{4,6} always copies data of size udphdr but later
      accesses memory within the size of tcphdr (th->doff) in case of TCP
      packets. This causes a crash when running with KASAN with the following
      call stack -
      
      BUG: KASAN: stack-out-of-bounds in xt_socket_lookup_slow_v4+0x524/0x718
      net/netfilter/xt_socket.c:178
      Read of size 2 at addr ffffffe3d417a87c by task syz-executor/28971
      CPU: 2 PID: 28971 Comm: syz-executor Tainted: G    B   W  O    4.9.65+ #1
      Call trace:
      [<ffffff9467e8d390>] dump_backtrace+0x0/0x428 arch/arm64/kernel/traps.c:76
      [<ffffff9467e8d7e0>] show_stack+0x28/0x38 arch/arm64/kernel/traps.c:226
      [<ffffff946842d9b8>] __dump_stack lib/dump_stack.c:15 [inline]
      [<ffffff946842d9b8>] dump_stack+0xd4/0x124 lib/dump_stack.c:51
      [<ffffff946811d4b0>] print_address_description+0x68/0x258 mm/kasan/report.c:248
      [<ffffff946811d8c8>] kasan_report_error mm/kasan/report.c:347 [inline]
      [<ffffff946811d8c8>] kasan_report.part.2+0x228/0x2f0 mm/kasan/report.c:371
      [<ffffff946811df44>] kasan_report+0x5c/0x70 mm/kasan/report.c:372
      [<ffffff946811bebc>] check_memory_region_inline mm/kasan/kasan.c:308 [inline]
      [<ffffff946811bebc>] __asan_load2+0x84/0x98 mm/kasan/kasan.c:739
      [<ffffff94694d6f04>] __tcp_hdrlen include/linux/tcp.h:35 [inline]
      [<ffffff94694d6f04>] xt_socket_lookup_slow_v4+0x524/0x718 net/netfilter/xt_socket.c:178
      
      Fix this by copying data into appropriate size headers based on protocol.
      
      Fixes: a583636a ("inet: refactor inet[6]_lookup functions to take skb")
      Signed-off-by: NTejaswi Tanikella <tejaswit@codeaurora.org>
      Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      32c1733f
  4. 24 3月, 2018 7 次提交
    • L
      batman-adv: fix packet loss for broadcasted DHCP packets to a server · a752c0a4
      Linus Lüssing 提交于
      DHCP connectivity issues can currently occur if the following conditions
      are met:
      
      1) A DHCP packet from a client to a server
      2) This packet has a multicast destination
      3) This destination has a matching entry in the translation table
         (FF:FF:FF:FF:FF:FF for IPv4, 33:33:00:01:00:02/33:33:00:01:00:03
          for IPv6)
      4) The orig-node determined by TT for the multicast destination
         does not match the orig-node determined by best-gateway-selection
      
      In this case the DHCP packet will be dropped.
      
      The "gateway-out-of-range" check is supposed to only be applied to
      unicasted DHCP packets to a specific DHCP server.
      
      In that case dropping the the unicasted frame forces the client to
      retry via a broadcasted one, but now directed to the new best
      gateway.
      
      A DHCP packet with broadcast/multicast destination is already ensured to
      always be delivered to the best gateway. Dropping a multicasted
      DHCP packet here will only prevent completing DHCP as there is no
      other fallback.
      
      So far, it seems the unicast check was implicitly performed by
      expecting the batadv_transtable_search() to return NULL for multicast
      destinations. However, a multicast address could have always ended up in
      the translation table and in fact is now common.
      
      To fix this potential loss of a DHCP client-to-server packet to a
      multicast address this patch adds an explicit multicast destination
      check to reliably bail out of the gateway-out-of-range check for such
      destinations.
      
      The issue and fix were tested in the following three node setup:
      
      - Line topology, A-B-C
      - A: gateway client, DHCP client
      - B: gateway server, hop-penalty increased: 30->60, DHCP server
      - C: gateway server, code modifications to announce FF:FF:FF:FF:FF:FF
      
      Without this patch, A would never transmit its DHCP Discover packet
      due to an always "out-of-range" condition. With this patch,
      a full DHCP handshake between A and B was possible again.
      
      Fixes: be7af5cf ("batman-adv: refactoring gateway handling code")
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: NSven Eckelmann <sven@narfation.org>
      Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
      a752c0a4
    • L
      batman-adv: fix multicast-via-unicast transmission with AP isolation · f8fb3419
      Linus Lüssing 提交于
      For multicast frames AP isolation is only supposed to be checked on
      the receiving nodes and never on the originating one.
      
      Furthermore, the isolation or wifi flag bits should only be intepreted
      as such for unicast and never multicast TT entries.
      
      By injecting flags to the multicast TT entry claimed by a single
      target node it was verified in tests that this multicast address
      becomes unreachable, leading to packet loss.
      
      Omitting the "src" parameter to the batadv_transtable_search() call
      successfully skipped the AP isolation check and made the target
      reachable again.
      
      Fixes: 1d8ab8d3 ("batman-adv: Modified forwarding behaviour for multicast packets")
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: NSven Eckelmann <sven@narfation.org>
      Signed-off-by: NSimon Wunderlich <sw@simonwunderlich.de>
      f8fb3419
    • E
      ipv6: fix possible deadlock in rt6_age_examine_exception() · 1bfa26ff
      Eric Dumazet 提交于
      syzbot reported a LOCKDEP splat [1] in rt6_age_examine_exception()
      
      rt6_age_examine_exception() is called while rt6_exception_lock is held.
      This lock is the lower one in the lock hierarchy, thus we can not
      call dst_neigh_lookup() function, as it can fallback to neigh_create()
      
      We should instead do a pure RCU lookup. As a bonus we avoid
      a pair of atomic operations on neigh refcount.
      
      [1]
      
      WARNING: possible circular locking dependency detected
      4.16.0-rc4+ #277 Not tainted
      
      syz-executor7/4015 is trying to acquire lock:
       (&ndev->lock){++--}, at: [<00000000416dce19>] __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
      
      but task is already holding lock:
       (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 (&tbl->lock){++-.}:
             __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
             _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
             __neigh_create+0x87e/0x1d90 net/core/neighbour.c:528
             neigh_create include/net/neighbour.h:315 [inline]
             ip6_neigh_lookup+0x9a7/0xba0 net/ipv6/route.c:228
             dst_neigh_lookup include/net/dst.h:405 [inline]
             rt6_age_examine_exception net/ipv6/route.c:1609 [inline]
             rt6_age_exceptions+0x381/0x660 net/ipv6/route.c:1645
             fib6_age+0xfb/0x140 net/ipv6/ip6_fib.c:2033
             fib6_clean_node+0x389/0x580 net/ipv6/ip6_fib.c:1919
             fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1845
             fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1893
             fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1970
             __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1986
             fib6_clean_all net/ipv6/ip6_fib.c:1997 [inline]
             fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2053
             ndisc_netdev_event+0x3c2/0x4a0 net/ipv6/ndisc.c:1781
             notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
             __raw_notifier_call_chain kernel/notifier.c:394 [inline]
             raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
             call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
             call_netdevice_notifiers net/core/dev.c:1725 [inline]
             __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
             dev_change_flags+0xf5/0x140 net/core/dev.c:6994
             devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
             inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
             sock_do_ioctl+0xef/0x390 net/socket.c:957
             sock_ioctl+0x36b/0x610 net/socket.c:1081
             vfs_ioctl fs/ioctl.c:46 [inline]
             do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
             SYSC_ioctl fs/ioctl.c:701 [inline]
             SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> #2 (rt6_exception_lock){+.-.}:
             __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
             _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
             spin_lock_bh include/linux/spinlock.h:315 [inline]
             rt6_flush_exceptions+0x21/0x210 net/ipv6/route.c:1367
             fib6_del_route net/ipv6/ip6_fib.c:1677 [inline]
             fib6_del+0x624/0x12c0 net/ipv6/ip6_fib.c:1761
             __ip6_del_rt+0xc7/0x120 net/ipv6/route.c:2980
             ip6_del_rt+0x132/0x1a0 net/ipv6/route.c:2993
             __ipv6_dev_ac_dec+0x3b1/0x600 net/ipv6/anycast.c:332
             ipv6_dev_ac_dec net/ipv6/anycast.c:345 [inline]
             ipv6_sock_ac_close+0x2b4/0x3e0 net/ipv6/anycast.c:200
             inet6_release+0x48/0x70 net/ipv6/af_inet6.c:433
             sock_release+0x8d/0x1e0 net/socket.c:594
             sock_close+0x16/0x20 net/socket.c:1149
             __fput+0x327/0x7e0 fs/file_table.c:209
             ____fput+0x15/0x20 fs/file_table.c:243
             task_work_run+0x199/0x270 kernel/task_work.c:113
             exit_task_work include/linux/task_work.h:22 [inline]
             do_exit+0x9bb/0x1ad0 kernel/exit.c:865
             do_group_exit+0x149/0x400 kernel/exit.c:968
             get_signal+0x73a/0x16d0 kernel/signal.c:2469
             do_signal+0x90/0x1e90 arch/x86/kernel/signal.c:809
             exit_to_usermode_loop+0x258/0x2f0 arch/x86/entry/common.c:162
             prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
             syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
             do_syscall_64+0x6ec/0x940 arch/x86/entry/common.c:292
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> #1 (&(&tb->tb6_lock)->rlock){+.-.}:
             __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
             _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
             spin_lock_bh include/linux/spinlock.h:315 [inline]
             __ip6_ins_rt+0x56/0x90 net/ipv6/route.c:1007
             ip6_route_add+0x141/0x190 net/ipv6/route.c:2955
             addrconf_prefix_route+0x44f/0x620 net/ipv6/addrconf.c:2359
             fixup_permanent_addr net/ipv6/addrconf.c:3368 [inline]
             addrconf_permanent_addr net/ipv6/addrconf.c:3391 [inline]
             addrconf_notify+0x1ad2/0x2310 net/ipv6/addrconf.c:3460
             notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
             __raw_notifier_call_chain kernel/notifier.c:394 [inline]
             raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
             call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
             call_netdevice_notifiers net/core/dev.c:1725 [inline]
             __dev_notify_flags+0x15d/0x430 net/core/dev.c:6958
             dev_change_flags+0xf5/0x140 net/core/dev.c:6994
             do_setlink+0xa22/0x3bb0 net/core/rtnetlink.c:2357
             rtnl_newlink+0xf37/0x1a50 net/core/rtnetlink.c:2965
             rtnetlink_rcv_msg+0x57f/0xb10 net/core/rtnetlink.c:4641
             netlink_rcv_skb+0x14b/0x380 net/netlink/af_netlink.c:2444
             rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4659
             netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
             netlink_unicast+0x4c4/0x6b0 net/netlink/af_netlink.c:1334
             netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
             sock_sendmsg_nosec net/socket.c:629 [inline]
             sock_sendmsg+0xca/0x110 net/socket.c:639
             ___sys_sendmsg+0x767/0x8b0 net/socket.c:2047
             __sys_sendmsg+0xe5/0x210 net/socket.c:2081
             SYSC_sendmsg net/socket.c:2092 [inline]
             SyS_sendmsg+0x2d/0x50 net/socket.c:2088
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      -> #0 (&ndev->lock){++--}:
             lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
             __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
             _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
             __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
             ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
             pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
             pneigh_ifdown net/core/neighbour.c:695 [inline]
             neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
             rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
             addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
             addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
             notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
             __raw_notifier_call_chain kernel/notifier.c:394 [inline]
             raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
             call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
             call_netdevice_notifiers net/core/dev.c:1725 [inline]
             __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
             dev_change_flags+0xf5/0x140 net/core/dev.c:6994
             devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
             inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
             packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
             sock_do_ioctl+0xef/0x390 net/socket.c:957
             sock_ioctl+0x36b/0x610 net/socket.c:1081
             vfs_ioctl fs/ioctl.c:46 [inline]
             do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
             SYSC_ioctl fs/ioctl.c:701 [inline]
             SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      other info that might help us debug this:
      
      Chain exists of:
        &ndev->lock --> rt6_exception_lock --> &tbl->lock
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&tbl->lock);
                                     lock(rt6_exception_lock);
                                     lock(&tbl->lock);
        lock(&ndev->lock);
      
       *** DEADLOCK ***
      
      2 locks held by syz-executor7/4015:
       #0:  (rtnl_mutex){+.+.}, at: [<00000000a2f16daa>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
       #1:  (&tbl->lock){++-.}, at: [<00000000b5cb1d65>] neigh_ifdown+0x3d/0x250 net/core/neighbour.c:292
      
      stack backtrace:
      CPU: 0 PID: 4015 Comm: syz-executor7 Not tainted 4.16.0-rc4+ #277
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x24d lib/dump_stack.c:53
       print_circular_bug.isra.38+0x2cd/0x2dc kernel/locking/lockdep.c:1223
       check_prev_add kernel/locking/lockdep.c:1863 [inline]
       check_prevs_add kernel/locking/lockdep.c:1976 [inline]
       validate_chain kernel/locking/lockdep.c:2417 [inline]
       __lock_acquire+0x30a8/0x3e00 kernel/locking/lockdep.c:3431
       lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
       __raw_write_lock_bh include/linux/rwlock_api_smp.h:203 [inline]
       _raw_write_lock_bh+0x31/0x40 kernel/locking/spinlock.c:312
       __ipv6_dev_mc_dec+0x45/0x350 net/ipv6/mcast.c:928
       ipv6_dev_mc_dec+0x110/0x1f0 net/ipv6/mcast.c:961
       pndisc_destructor+0x21a/0x340 net/ipv6/ndisc.c:392
       pneigh_ifdown net/core/neighbour.c:695 [inline]
       neigh_ifdown+0x149/0x250 net/core/neighbour.c:294
       rt6_disable_ip+0x537/0x700 net/ipv6/route.c:3874
       addrconf_ifdown+0x14b/0x14f0 net/ipv6/addrconf.c:3633
       addrconf_notify+0x5f8/0x2310 net/ipv6/addrconf.c:3557
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x32/0x70 net/core/dev.c:1707
       call_netdevice_notifiers net/core/dev.c:1725 [inline]
       __dev_notify_flags+0x262/0x430 net/core/dev.c:6960
       dev_change_flags+0xf5/0x140 net/core/dev.c:6994
       devinet_ioctl+0x126a/0x1ac0 net/ipv4/devinet.c:1080
       inet_ioctl+0x184/0x310 net/ipv4/af_inet.c:919
       packet_ioctl+0x1ff/0x310 net/packet/af_packet.c:4066
       sock_do_ioctl+0xef/0x390 net/socket.c:957
       sock_ioctl+0x36b/0x610 net/socket.c:1081
       vfs_ioctl fs/ioctl.c:46 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
       SYSC_ioctl fs/ioctl.c:701 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: c757faa8 ("ipv6: prepare fib6_age() for exception table")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1bfa26ff
    • D
      Merge branch 'mlxsw-GRE-mtu-changes' · 69ebaed0
      David S. Miller 提交于
      Ido Schimmel says:
      
      ====================
      mlxsw: Handle changes to MTU in GRE tunnels
      
      Petr says:
      
      When offloading GRE tunnels, the MTU setting is kept fixed after the
      initial offload even as the slow-path configuration changed. Worse: the
      offloaded MTU setting is actually just a transient value set at the time
      of NETDEV_REGISTER of the tunnel. As of commit ffc2b6ee ("ip_gre:
      fix IFLA_MTU ignored on NEWLINK"), that transient value is zero, and
      unless there's e.g. a VRF migration that prompts re-offload, it stays at
      zero, and all GRE packets end up trapping.
      
      Thus, in patch #1, change the way the MTU is changed post-registration,
      so that the full event protocol is observed. That way the drivers get to
      see the change and have a chance to react.
      
      In the remaining two patches, implement support for MTU change in mlxsw
      driver.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      69ebaed0
    • P
      mlxsw: spectrum_router: Handle MTU change of GRE netdevs · 68c3cd92
      Petr Machata 提交于
      Update MTU of overlay loopback in accordance with the setting on the
      tunnel netdevice.
      
      Fixes: 0063587d ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68c3cd92
    • P
      mlxsw: spectrum_router: Move mlxsw_sp_rif_ipip_lb_op() · 22b99058
      Petr Machata 提交于
      Move the function so that it can be called without forward declaration
      from a function that will be added in a follow-up patch.
      
      Fixes: 0063587d ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22b99058
    • P
      ip_tunnel: Emit events for post-register MTU changes · f6cc9c05
      Petr Machata 提交于
      For tunnels created with IFLA_MTU, MTU of the netdevice is set by
      rtnl_create_link() (called from rtnl_newlink()) before the device is
      registered. However without IFLA_MTU that's not done.
      
      rtnl_newlink() proceeds by calling struct rtnl_link_ops.newlink, which
      via ip_tunnel_newlink() calls register_netdevice(), and that emits
      NETDEV_REGISTER. Thus any listeners that inspect the netdevice get the
      MTU of 0.
      
      After ip_tunnel_newlink() corrects the MTU after registering the
      netdevice, but since there's no event, the listeners don't get to know
      about the MTU until something else happens--such as a NETDEV_UP event.
      That's not ideal.
      
      So instead of setting the MTU directly, go through dev_set_mtu(), which
      takes care of distributing the necessary NETDEV_PRECHANGEMTU and
      NETDEV_CHANGEMTU events.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6cc9c05
  5. 23 3月, 2018 11 次提交
    • L
      Merge branch 'akpm' (patches from Andrew) · f36b7534
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "13 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm, thp: do not cause memcg oom for thp
        mm/vmscan: wake up flushers for legacy cgroups too
        Revert "mm: page_alloc: skip over regions of invalid pfns where possible"
        mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
        mm/thp: do not wait for lock_page() in deferred_split_scan()
        mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
        x86/mm: implement free pmd/pte page interfaces
        mm/vmalloc: add interfaces to free unmapped page table
        h8300: remove extraneous __BIG_ENDIAN definition
        hugetlbfs: check for pgoff value overflow
        lockdep: fix fs_reclaim warning
        MAINTAINERS: update Mark Fasheh's e-mail
        mm/mempolicy.c: avoid use uninitialized preferred_node
      f36b7534
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 8401c72c
      Linus Torvalds 提交于
      Pull libnvdimm fixes from Dan Williams:
       "Two regression fixes, two bug fixes for older issues, two fixes for
        new functionality added this cycle that have userspace ABI concerns,
        and a small cleanup. These have appeared in a linux-next release and
        have a build success report from the 0day robot.
      
         * The 4.16 rework of altmap handling led to some configurations
           leaking page table allocations due to freeing from the altmap
           reservation rather than the page allocator.
      
           The impact without the fix is leaked memory and a WARN() message
           when tearing down libnvdimm namespaces. The rework also missed a
           place where error handling code needed to be removed that can lead
           to a crash if devm_memremap_pages() fails.
      
         * acpi_map_pxm_to_node() had a latent bug whereby it could
           misidentify the closest online node to a given proximity domain.
      
         * Block integrity handling was reworked several kernels back to allow
           calling add_disk() after setting up the integrity profile.
      
           The nd_btt and nd_blk drivers are just now catching up to fix
           automatic partition detection at driver load time.
      
         * The new peristence_domain attribute, a platform indicator of
           whether cpu caches are powerfail protected for example, is meant to
           be a single value enum and not a set of flags.
      
           This oversight was caught while reviewing new userspace code in
           libndctl to communicate the attribute.
      
           Fix this new enabling up so that we are not stuck with an unwanted
           userspace ABI"
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm, nfit: fix persistence domain reporting
        libnvdimm, region: hide persistence_domain when unknown
        acpi, numa: fix pxm to online numa node associations
        x86, memremap: fix altmap accounting at free
        libnvdimm: remove redundant assignment to pointer 'dev'
        libnvdimm, {btt, blk}: do integrity setup before add_disk()
        kernel/memremap: Remove stale devres_free() call
      8401c72c
    • L
      Merge tag 'drm-fixes-for-v4.16-rc7' of git://people.freedesktop.org/~airlied/linux · 9ec7ccc8
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "A bunch of fixes all over the place (core, i915, amdgpu, imx, sun4i,
        ast, tegra, vmwgfx), nothing too serious or worrying at this stage.
      
         - one uapi fix to stop multi-planar images with getfb
      
         - Sun4i error path and clock fixes
      
         - udl driver mmap offset fix
      
         - i915 DP MST and GPU reset fixes
      
         - vmwgfx mutex and black screen fixes
      
         - imx array underflow fix and vblank fix
      
         - amdgpu: display fixes
      
         - exynos devicetree fix
      
         - ast mode fix"
      
      * tag 'drm-fixes-for-v4.16-rc7' of git://people.freedesktop.org/~airlied/linux: (29 commits)
        drm/ast: Fixed 1280x800 Display Issue
        drm: udl: Properly check framebuffer mmap offsets
        drm/i915: Specify which engines to reset following semaphore/event lockups
        drm/vmwgfx: Fix a destoy-while-held mutex problem.
        drm/vmwgfx: Fix black screen and device errors when running without fbdev
        drm: Reject getfb for multi-plane framebuffers
        drm/amd/display: Add one to EDID's audio channel count when passing to DC
        drm/amd/display: We shouldn't set format_default on plane as atomic driver
        drm/amd/display: Fix FMT truncation programming
        drm/amd/display: Allow truncation to 10 bits
        drm/sun4i: hdmi: Fix another error handling path in 'sun4i_hdmi_bind()'
        drm/sun4i: hdmi: Fix an error handling path in 'sun4i_hdmi_bind()'
        drm/i915/dp: Write to SET_POWER dpcd to enable MST hub.
        drm/amd/display: fix dereferencing possible ERR_PTR()
        drm/amd/display: Refine disable VGA
        drm/tegra: Shutdown on driver unbind
        drm/tegra: dsi: Don't disable regulator on ->exit()
        drm/tegra: dc: Detach IOMMU group from domain only once
        dt-bindings: exynos: Document #sound-dai-cells property of the HDMI node
        drm/imx: move arming of the vblank event to atomic_flush
        ...
      9ec7ccc8
    • D
      mm, thp: do not cause memcg oom for thp · 9d3c3354
      David Rientjes 提交于
      Commit 25160354 ("mm, thp: remove __GFP_NORETRY from khugepaged and
      madvised allocations") changed the page allocator to no longer detect
      thp allocations based on __GFP_NORETRY.
      
      It did not, however, modify the mem cgroup try_charge() path to avoid
      oom kill for either khugepaged collapsing or thp faulting.  It is never
      expected to oom kill a process to allocate a hugepage for thp; reclaim
      is governed by the thp defrag mode and MADV_HUGEPAGE, but allocations
      (and charging) should fallback instead of oom killing processes.
      
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803191409420.124411@chino.kir.corp.google.com
      Fixes: 25160354 ("mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations")
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d3c3354
    • A
      mm/vmscan: wake up flushers for legacy cgroups too · 1c610d5f
      Andrey Ryabinin 提交于
      Commit 726d061f ("mm: vmscan: kick flushers when we encounter dirty
      pages on the LRU") added flusher invocation to shrink_inactive_list()
      when many dirty pages on the LRU are encountered.
      
      However, shrink_inactive_list() doesn't wake up flushers for legacy
      cgroup reclaim, so the next commit bbef9384 ("mm: vmscan: remove old
      flusher wakeup from direct reclaim path") removed the only source of
      flusher's wake up in legacy mem cgroup reclaim path.
      
      This leads to premature OOM if there is too many dirty pages in cgroup:
          # mkdir /sys/fs/cgroup/memory/test
          # echo $$ > /sys/fs/cgroup/memory/test/tasks
          # echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
          # dd if=/dev/zero of=tmp_file bs=1M count=100
          Killed
      
          dd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
      
          Call Trace:
           dump_stack+0x46/0x65
           dump_header+0x6b/0x2ac
           oom_kill_process+0x21c/0x4a0
           out_of_memory+0x2a5/0x4b0
           mem_cgroup_out_of_memory+0x3b/0x60
           mem_cgroup_oom_synchronize+0x2ed/0x330
           pagefault_out_of_memory+0x24/0x54
           __do_page_fault+0x521/0x540
           page_fault+0x45/0x50
      
          Task in /test killed as a result of limit of /test
          memory: usage 51200kB, limit 51200kB, failcnt 73
          memory+swap: usage 51200kB, limit 9007199254740988kB, failcnt 0
          kmem: usage 296kB, limit 9007199254740988kB, failcnt 0
          Memory cgroup stats for /test: cache:49632KB rss:1056KB rss_huge:0KB shmem:0KB
                  mapped_file:0KB dirty:49500KB writeback:0KB swap:0KB inactive_anon:0KB
      	    active_anon:1168KB inactive_file:24760KB active_file:24960KB unevictable:0KB
          Memory cgroup out of memory: Kill process 3861 (bash) score 88 or sacrifice child
          Killed process 3876 (dd) total-vm:8484kB, anon-rss:1052kB, file-rss:1720kB, shmem-rss:0kB
          oom_reaper: reaped process 3876 (dd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      
      Wake up flushers in legacy cgroup reclaim too.
      
      Link: http://lkml.kernel.org/r/20180315164553.17856-1-aryabinin@virtuozzo.com
      Fixes: bbef9384 ("mm: vmscan: remove old flusher wakeup from direct reclaim path")
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Tested-by: NShakeel Butt <shakeelb@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c610d5f
    • D
      Revert "mm: page_alloc: skip over regions of invalid pfns where possible" · f59f1caf
      Daniel Vacek 提交于
      This reverts commit b92df1de ("mm: page_alloc: skip over regions of
      invalid pfns where possible").  The commit is meant to be a boot init
      speed up skipping the loop in memmap_init_zone() for invalid pfns.
      
      But given some specific memory mapping on x86_64 (or more generally
      theoretically anywhere but on arm with CONFIG_HAVE_ARCH_PFN_VALID) the
      implementation also skips valid pfns which is plain wrong and causes
      'kernel BUG at mm/page_alloc.c:1389!'
      
        crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
        kernel BUG at mm/page_alloc.c:1389!
        invalid opcode: 0000 [#1] SMP
        --
        RIP: 0010: move_freepages+0x15e/0x160
        --
        Call Trace:
          move_freepages_block+0x73/0x80
          __rmqueue+0x263/0x460
          get_page_from_freelist+0x7e1/0x9e0
          __alloc_pages_nodemask+0x176/0x420
        --
      
        crash> page_init_bug -v | grep RAM
        <struct resource 0xffff88067fffd2f8>          1000 -        9bfff       System RAM (620.00 KiB)
        <struct resource 0xffff88067fffd3a0>        100000 -     430bffff       System RAM (  1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
        <struct resource 0xffff88067fffd410>      4b0c8000 -     4bf9cfff       System RAM ( 14.83 MiB = 15188.00 KiB)
        <struct resource 0xffff88067fffd480>      4bfac000 -     646b1fff       System RAM (391.02 MiB = 400408.00 KiB)
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct resource 0xffff88067fffd640>     100000000 -    67fffffff       System RAM ( 22.00 GiB)
      
        crash> page_init_bug | head -6
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct page 0xffffea0001ede200>   1fffff00000000  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        <struct page 0xffffea0001ede200>       505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
        <struct page 0xffffea0001ed8000>                0  0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA               1       4095
        <struct page 0xffffea0001edffc0>   1fffff00000400  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        BUG, zones differ!
      
        crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
              PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
        ffffea0001e00000  78000000                0        0  0 0
        ffffea0001ed7fc0  7b5ff000                0        0  0 0
        ffffea0001ed8000  7b600000                0        0  0 0       <<<<
        ffffea0001ede1c0  7b787000                0        0  0 0
        ffffea0001ede200  7b788000                0        0  1 1fffff00000000
      
      Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
      Fixes: b92df1de ("mm: page_alloc: skip over regions of invalid pfns where possible")
      Signed-off-by: NDaniel Vacek <neelx@redhat.com>
      Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f59f1caf
    • K
      mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink() · b3cd54b2
      Kirill A. Shutemov 提交于
      shmem_unused_huge_shrink() gets called from reclaim path.  Waiting for
      page lock may lead to deadlock there.
      
      There was a bug report that may be attributed to this:
      
        http://lkml.kernel.org/r/alpine.LRH.2.11.1801242349220.30642@mail.ewheeler.net
      
      Replace lock_page() with trylock_page() and skip the page if we failed
      to lock it.  We will get to the page on the next scan.
      
      We can test for the PageTransHuge() outside the page lock as we only
      need protection against splitting the page under us.  Holding pin oni
      the page is enough for this.
      
      Link: http://lkml.kernel.org/r/20180316210830.43738-1-kirill.shutemov@linux.intel.com
      Fixes: 779750d2 ("shmem: split huge pages beyond i_size under memory pressure")
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: NEric Wheeler <linux-mm@lists.ewheeler.net>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>	[4.8+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3cd54b2
    • K
      mm/thp: do not wait for lock_page() in deferred_split_scan() · fa41b900
      Kirill A. Shutemov 提交于
      deferred_split_scan() gets called from reclaim path.  Waiting for page
      lock may lead to deadlock there.
      
      Replace lock_page() with trylock_page() and skip the page if we failed
      to lock it.  We will get to the page on the next scan.
      
      Link: http://lkml.kernel.org/r/20180315150747.31945-1-kirill.shutemov@linux.intel.com
      Fixes: 9a982250 ("thp: introduce deferred_split_huge_page()")
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fa41b900
    • K
      mm/khugepaged.c: convert VM_BUG_ON() to collapse fail · fece2029
      Kirill A. Shutemov 提交于
      khugepaged is not yet able to convert PTE-mapped huge pages back to PMD
      mapped.  We do not collapse such pages.  See check
      khugepaged_scan_pmd().
      
      But if between khugepaged_scan_pmd() and __collapse_huge_page_isolate()
      somebody managed to instantiate THP in the range and then split the PMD
      back to PTEs we would have a problem --
      VM_BUG_ON_PAGE(PageCompound(page)) will get triggered.
      
      It's possible since we drop mmap_sem during collapse to re-take for
      write.
      
      Replace the VM_BUG_ON() with graceful collapse fail.
      
      Link: http://lkml.kernel.org/r/20180315152353.27989-1-kirill.shutemov@linux.intel.com
      Fixes: b1caa957 ("khugepaged: ignore pmd tables with THP mapped with ptes")
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fece2029
    • T
      x86/mm: implement free pmd/pte page interfaces · 28ee90fe
      Toshi Kani 提交于
      Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which
      clear a given pud/pmd entry and free up lower level page table(s).
      
      The address range associated with the pud/pmd entry must have been
      purged by INVLPG.
      
      Link: http://lkml.kernel.org/r/20180314180155.19492-3-toshi.kani@hpe.com
      Fixes: e61ce6ad ("mm: change ioremap to set up huge I/O mappings")
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Reported-by: NLei Li <lious.lilei@hisilicon.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      28ee90fe
    • T
      mm/vmalloc: add interfaces to free unmapped page table · b6bdb751
      Toshi Kani 提交于
      On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may
      create pud/pmd mappings.  A kernel panic was observed on arm64 systems
      with Cortex-A75 in the following steps as described by Hanjun Guo.
      
       1. ioremap a 4K size, valid page table will build,
       2. iounmap it, pte0 will set to 0;
       3. ioremap the same address with 2M size, pgd/pmd is unchanged,
          then set the a new value for pmd;
       4. pte0 is leaked;
       5. CPU may meet exception because the old pmd is still in TLB,
          which will lead to kernel panic.
      
      This panic is not reproducible on x86.  INVLPG, called from iounmap,
      purges all levels of entries associated with purged address on x86.  x86
      still has memory leak.
      
      The patch changes the ioremap path to free unmapped page table(s) since
      doing so in the unmap path has the following issues:
      
       - The iounmap() path is shared with vunmap(). Since vmap() only
         supports pte mappings, making vunmap() to free a pte page is an
         overhead for regular vmap users as they do not need a pte page freed
         up.
      
       - Checking if all entries in a pte page are cleared in the unmap path
         is racy, and serializing this check is expensive.
      
       - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
         Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB
         purge.
      
      Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
      clear a given pud/pmd entry and free up a page for the lower level
      entries.
      
      This patch implements their stub functions on x86 and arm64, which work
      as workaround.
      
      [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub]
      Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
      Fixes: e61ce6ad ("mm: change ioremap to set up huge I/O mappings")
      Reported-by: NLei Li <lious.lilei@hisilicon.com>
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Wang Xuefeng <wxf.wang@hisilicon.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6bdb751