1. 27 Apr, 2015 4 commits
  2. 26 Apr, 2015 4 commits
    • E
      net: fix crash in build_skb() · 2ea2f62c
      Eric Dumazet committed
      When I added pfmemalloc support in build_skb(), I forgot netlink
      was using build_skb() with a vmalloc() area.
      
      In this patch I introduce __build_skb() for netlink use,
      and build_skb() is a wrapper handling both skb->head_frag and
      skb->pfmemalloc
      
      This means netlink no longer has to hack skb->head_frag
      
      [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26!
      [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      [ 1567.700067] Dumping ftrace buffer:
      [ 1567.700067]    (ftrace buffer empty)
      [ 1567.700067] Modules linked in:
      [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167
      [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000
      [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3))
      [ 1567.700067] RSP: 0018:ffff8802467779d8  EFLAGS: 00010202
      [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c
      [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049
      [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000
      [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000
      [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000
      [ 1567.700067] FS:  00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000
      [ 1567.700067] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0
      [ 1567.700067] Stack:
      [ 1567.700067]  ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000
      [ 1567.700067]  ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08
      [ 1567.700067]  ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821
      [ 1567.700067] Call Trace:
      [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316)
      [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329)
      [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311)
      [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
      [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273)
      [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623)
      [ 1567.774369] sock_write_iter (net/socket.c:823)
      [ 1567.774369] ? sock_sendmsg (net/socket.c:806)
      [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491)
      [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249)
      [ 1567.774369] ? default_llseek (fs/read_write.c:487)
      [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701)
      [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4))
      [ 1567.774369] vfs_write (fs/read_write.c:539)
      [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577)
      [ 1567.774369] ? SyS_read (fs/read_write.c:577)
      [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
      [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636)
      [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
      [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261)
      
      Fixes: 79930f58 ("net: do not deplete pfmemalloc reserve")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2ea2f62c
    • C
      net: eth: altera: Resolve false errors from MSGDMA to TSE · 20d96964
      Chee Nouk Phoon committed
      This patch resolves false errors from MSGDMA in TX mSGDMA MM to ST
      mode, and is a continuation of the patch recently submitted by Andreas
      Oetken. The MSGDMA had a logic bug that masked detection of this issue
      prior to Quartus 14.1/Build 164. When the MSGDMA logic bug was addressed
      in Quartus 14.1/Build 164, the driver problem was exposed.
      
      The problem is corrected by making sure MSGDMA_DESC_CTL_TR_ERR_IRQ is not
      set for any of the transmit DMA descriptors, and only used for receive
      descriptors.
      
      Fixes: 71cd26e7 altera tse: Error-Bit on tx-avalon-stream always set.
      Signed-off-by: Chee Nouk Phoon <cnphoon@altera.com>
      Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com>
      Cc: Andreas Oetken <ennoerlangen@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      20d96964
    • M
      ehea: Fix memory hook reference counting crashes · 3051f392
      Michael Ellerman committed
      The recent commit to only register the EHEA memory hotplug hooks on
      adapter probe has a few problems.
      
      Firstly the reference counting is wrong for multiple adapters, in that
      the hooks are registered multiple times. Secondly the check in the tear
      down path is backward. Finally the error path doesn't decrement the
      count.
      
      The multiple registration of the hooks is the biggest problem, as it
      leads to oopses when the system is rebooted, and/or errors during memory
      hotplug, eg:
      
        $ ./mem-on-off-test.sh -r 2
        ...
        ehea: memory is going offline
        ehea: LPAR memory changed - re-initializing driver
        ehea: re-initializing driver complete
        ehea: memory is going offline
        ehea: LPAR memory changed - re-initializing driver
        ehea: opcode=26c ret=fffffffffffffffc arg1=8000000003000003 arg2=0 arg3=700000060000d600 arg4=3fded0000 arg5=200 arg6=0 arg7=0
        ehea: register_rpage_mr failed
        ehea: registering mr failed
        ehea: register MR failed - driver inoperable!
        ehea: memory is going offline
      
      Fixes: aa183323 ("ehea: Register memory hotplug, reboot and crash hooks on adapter probe")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3051f392
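The three reference-counting problems listed in the ehea commit above (multiple registration, a backward teardown check, and a missing decrement on the error path) can be illustrated with a small user-space model. This is only a sketch: `adapter_probe`, `adapter_remove`, `hook_users` and the other names are hypothetical and are not taken from the driver, and the real code also has to deal with locking.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names come from the ehea driver. */
static int hook_users;        /* probed adapters that rely on the hooks */
static int hooks_registered;  /* times the hooks are registered; must stay 0 or 1 */

static void register_hooks(void)   { hooks_registered++; }
static void unregister_hooks(void) { hooks_registered--; }

/* Probe one adapter; setup_ok simulates whether the rest of probe succeeds. */
int adapter_probe(bool setup_ok)
{
	/* Register only when the user count goes from 0 to 1. */
	if (hook_users++ == 0)
		register_hooks();

	if (!setup_ok) {
		/* Error path: drop the reference taken above. */
		if (--hook_users == 0)
			unregister_hooks();
		return -1;
	}
	return 0;
}

/* Remove one adapter; unregister only when the last user goes away. */
void adapter_remove(void)
{
	if (--hook_users == 0)
		unregister_hooks();
}

int registered_count(void)
{
	return hooks_registered;
}
```

With two adapters probed, the hooks are registered once, and they stay registered until the second adapter is removed.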
    • G
      net/tg3: Release IRQs on permanent error · dfc8f370
      Gavin Shan committed
      When a permanent EEH error occurs, the PCI device is removed
      from the system. In this case, we shouldn't wrongly set
      pcierr_recovery to true, which prevents the driver from releasing
      the allocated interrupts and their handlers. Eventually, we can't
      disable MSI or MSIx successfully because the MSI or MSIx interrupts
      still have associated interrupt actions, which results in the
      following stack dump.
      
      Oops: Exception in kernel mode, sig: 5 [#1]
              :
      [c0000000003b76a8] .free_msi_irqs+0x80/0x1a0 (unreliable)
      [c00000000039f388] .pci_remove_bus_device+0x98/0x110
      [c0000000000790f4] .pcibios_remove_pci_devices+0x9c/0x128
      [c000000000077b98] .handle_eeh_events+0x2d8/0x4b0
      [c0000000000782d0] .eeh_event_handler+0x130/0x1c0
      [c000000000022bd4] .kernel_thread+0x54/0x70
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Acked-by: Prashant Sreedharan <prashant@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dfc8f370
  3. 25 Apr, 2015 1 commit
  4. 24 Apr, 2015 9 commits
    • E
      inet: fix possible panic in reqsk_queue_unlink() · b357a364
      Eric Dumazet committed
      [ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
      [ 3897.931025] IP: [<ffffffffa9f27686>] reqsk_timer_handler+0x1a6/0x243
      
      There is a race when reqsk_timer_handler() and tcp_check_req() call
      inet_csk_reqsk_queue_unlink() on the same req at the same time.
      
      Before commit fa76ce73 ("inet: get rid of central tcp/dccp listener
      timer"), listener spinlock was held and race could not happen.
      
      To solve this bug, we change reqsk_queue_unlink() to not assume req
      must be found, and we return a status, to conditionally release a
      refcount on the request sock.
      
      This also means tcp_check_req() in the non fastopen case might or might
      not consume the req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have
      to properly handle this.
      
      (Same remark for dccp_check_req() and its callers)
      
      inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is
      called 4 times in tcp and 3 times in dccp.
      
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b357a364
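The idea in the reqsk_queue_unlink() commit above, where the unlink reports whether it actually removed the request and only the caller that did so releases a reference, can be sketched as follows. This is a single-threaded toy: `struct req`, `queue_unlink` and `queue_drop` are hypothetical names, and the locking that makes the real race safe is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request sock; not the kernel structure. */
struct req {
	struct req *next;
	int refcnt;
};

/* Remove req from the list at *head. Returns true only if it was found. */
bool queue_unlink(struct req **head, struct req *req)
{
	struct req **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == req) {
			*pp = req->next;
			req->next = NULL;
			return true;
		}
	}
	return false;	/* another path already unlinked it */
}

/* Release the queue's reference only if this caller performed the unlink. */
void queue_drop(struct req **head, struct req *req)
{
	if (queue_unlink(head, req))
		req->refcnt--;
}
```

Calling `queue_drop()` twice on the same request drops the reference once; the second call finds nothing to unlink and leaves the count alone.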
    • J
      rhashtable: don't attempt to grow when at max_size · 1d8dc3d3
      Johannes Berg committed
      The conversion of mac80211's station table to rhashtable had a bug
      that I found by accident in code review, that hadn't been found as
      rhashtable apparently managed to have a maximum hash chain length
      of one (!) in all our testing.
      
      In order to test the bug and verify the fix I set my rhashtable's
      max_size very low (4) in order to force getting hash collisions.
      
      At that point, rhashtable WARNed in rhashtable_insert_rehash() but
      didn't actually reject the hash table insertion. This caused it to
      lose insertions - my master list of stations would have 9 entries,
      but the rhashtable only had 5. This may warrant a deeper look, but
      that WARN_ON() just shouldn't happen.
      
      Fix this by not returning true from rht_grow_above_100() when the
      rhashtable's max_size has been reached - in this case the user is
      explicitly configuring it to be at most that big, so even if it's
      now above 100% it shouldn't attempt to resize.
      
      This fixes the "lost insertion" issue and consequently allows my
      code to display its error (and verify my fix for it.)
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d8dc3d3
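The growth check described in the rhashtable commit above can be modelled in a few lines. This is a sketch under assumptions: `struct table` and its field names are simplified stand-ins, and treating a `max_size` of 0 as "no limit" is an assumption of this model rather than something stated in the commit message.

```c
#include <stdbool.h>

/* Simplified stand-in for the table state; field names are assumptions. */
struct table {
	unsigned int nelems;    /* entries currently stored */
	unsigned int size;      /* number of buckets */
	unsigned int max_size;  /* 0 is treated as "no limit" in this model */
};

/* True if the table is above 100% utilization and is still allowed to grow. */
bool grow_above_100(const struct table *t)
{
	return t->nelems > t->size &&
	       (t->max_size == 0 || t->size < t->max_size);
}
```

With 9 entries in a 4-bucket table capped at 4 buckets, the check returns false, so no resize is attempted even though utilization is above 100%.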
    • R
      bgmac: fix requests for extra polling calls from NAPI · e580267d
      Rafał Miłecki committed
      After d75b1ade ("net: less interrupt masking in NAPI") the polling
      function has to return the whole budget when it wants NAPI to call it again.
      Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
      Cc: Felix Fietkau <nbd@openwrt.org>
      Fixes: eb64e292 ("bgmac: leave interrupts disabled as long as there is work to do")
      Acked-by: Felix Fietkau <nbd@openwrt.org>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e580267d
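The return-value convention mentioned in the bgmac commit above can be shown with a toy poll function. This is not driver code: `struct dev_state`, `irq_status` and `model_poll` are hypothetical, and the sketch only demonstrates that returning the full budget is how a poll function asks to be called again.

```c
#include <stdbool.h>

/* Hypothetical device state for the model. */
struct dev_state {
	int rx_pending;   /* received packets waiting to be handled */
	bool irq_status;  /* new events were flagged while polling */
};

/*
 * Handle up to `budget` packets. If more work is known to exist
 * afterwards, return the whole budget so the caller polls again;
 * otherwise return the number of packets actually handled.
 */
int model_poll(struct dev_state *dev, int budget)
{
	int handled = 0;

	while (handled < budget && dev->rx_pending > 0) {
		dev->rx_pending--;
		handled++;
	}

	if (dev->irq_status || dev->rx_pending > 0)
		return budget;
	return handled;
}
```

A device with two packets and pending events reports the full budget, while an otherwise idle device with two packets reports 2.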
    • E
      tcp: avoid looping in tcp_send_fin() · 845704a5
      Eric Dumazet committed
      The presence of an unbounded loop in tcp_send_fin() has always been hard
      to explain when analyzing crash dumps involving gigantic dying processes
      with millions of sockets.
      
      Let's try a different strategy:
      
      In case of memory pressure, try to add the FIN flag to last packet
      in write queue, even if packet was already sent. TCP stack will
      be able to deliver this FIN after a timeout event. Note that this
      FIN being delivered by a retransmit, it also carries a Push flag
      given our current implementation.
      
      By checking sk_under_memory_pressure(), we anticipate that cooking
      many FIN packets might deplete tcp memory.
      
      If we cannot allocate a packet, even with a __GFP_WAIT
      allocation, then not sending a FIN seems quite reasonable if it allows
      us to get rid of this socket, free memory, and not block the process from
      eventually doing other useful work.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      845704a5
    • L
      ethernet: myri10ge: use arch_phys_wc_add() · e4b6c303
      Luis R. Rodriguez committed
      This driver already uses ioremap_wc() on the same range
      so when write-combining is available that will be used
      instead.
      
      Cc: Hyong-Youb Kim <hykim@myri.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: netdev@vger.kernel.org
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Antonino Daplas <adaplas@gmail.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e4b6c303
    • G
      can: CAN_GRCAN should depend on HAS_DMA · 2fb42aab
      Geert Uytterhoeven committed
      If NO_DMA=y:
      
          drivers/built-in.o: In function `grcan_free_dma_buffers':
          grcan.c:(.text+0x2d7716): undefined reference to `dma_free_coherent'
          drivers/built-in.o: In function `grcan_allocate_dma_buffers':
          grcan.c:(.text+0x2d779c): undefined reference to `dma_alloc_coherent'
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2fb42aab
    • G
      ethernet: arc: ARC_EMAC and EMAC_ROCKCHIP should depend on HAS_DMA · 0357cc1d
      Geert Uytterhoeven committed
      If NO_DMA=y:
      
          drivers/built-in.o: In function `arc_emac_tx_clean':
          emac_main.c:(.text+0x2decde): undefined reference to `dma_unmap_single'
          drivers/built-in.o: In function `arc_emac_rx':
          emac_main.c:(.text+0x2dee1c): undefined reference to `dma_unmap_single'
          emac_main.c:(.text+0x2dee72): undefined reference to `dma_map_single'
          emac_main.c:(.text+0x2dee7e): undefined reference to `dma_mapping_error'
          drivers/built-in.o: In function `arc_emac_probe':
          (.text+0x2df2ee): undefined reference to `dmam_alloc_coherent'
          drivers/built-in.o: In function `arc_emac_open':
          emac_main.c:(.text+0x2df6d8): undefined reference to `dma_map_single'
          emac_main.c:(.text+0x2df6e4): undefined reference to `dma_mapping_error'
          drivers/built-in.o: In function `arc_emac_tx':
          emac_main.c:(.text+0x2df9e4): undefined reference to `dma_map_single'
          emac_main.c:(.text+0x2df9f0): undefined reference to `dma_mapping_error'
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0357cc1d
    • G
      ethernet: amd: AMD_XGBE should depend on HAS_DMA · 61e77d29
      Geert Uytterhoeven committed
      If NO_DMA=y:
      
          drivers/built-in.o: In function `xgbe_probe':
          xgbe-main.c:(.text+0x2def0a): undefined reference to `dma_set_mask'
          xgbe-main.c:(.text+0x2def20): undefined reference to `dma_supported'
          drivers/built-in.o: In function `xgbe_rx_poll':
          xgbe-drv.c:(.text+0x2e0320): undefined reference to `dma_sync_single_for_cpu'
          xgbe-drv.c:(.text+0x2e035e): undefined reference to `dma_sync_single_for_cpu'
          drivers/built-in.o: In function `xgbe_unmap_rdata':
          xgbe-desc.c:(.text+0x2e5fe4): undefined reference to `dma_unmap_page'
          xgbe-desc.c:(.text+0x2e5ffa): undefined reference to `dma_unmap_single'
          xgbe-desc.c:(.text+0x2e604a): undefined reference to `dma_unmap_page'
          xgbe-desc.c:(.text+0x2e6084): undefined reference to `dma_unmap_page'
          drivers/built-in.o: In function `xgbe_alloc_pages':
          xgbe-desc.c:(.text+0x2e6156): undefined reference to `dma_map_page'
          xgbe-desc.c:(.text+0x2e6164): undefined reference to `dma_mapping_error'
          drivers/built-in.o: In function `xgbe_free_ring':
          xgbe-desc.c:(.text+0x2e63d4): undefined reference to `dma_unmap_page'
          xgbe-desc.c:(.text+0x2e640e): undefined reference to `dma_unmap_page'
          xgbe-desc.c:(.text+0x2e644a): undefined reference to `dma_free_coherent'
          drivers/built-in.o: In function `xgbe_init_ring':
          xgbe-desc.c:(.text+0x2e64d4): undefined reference to `dma_alloc_coherent'
          drivers/built-in.o: In function `xgbe_map_tx_skb':
          xgbe-desc.c:(.text+0x2e6628): undefined reference to `dma_map_single'
          xgbe-desc.c:(.text+0x2e6638): undefined reference to `dma_mapping_error'
          xgbe-desc.c:(.text+0x2e66b2): undefined reference to `dma_map_single'
          xgbe-desc.c:(.text+0x2e66c2): undefined reference to `dma_mapping_error'
          xgbe-desc.c:(.text+0x2e6762): undefined reference to `dma_map_page'
          xgbe-desc.c:(.text+0x2e6772): undefined reference to `dma_mapping_error'
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      61e77d29
    • J
      net: unix: garbage: fixed several comment and whitespace style issues · d1ab39f1
      Jason Eastman committed
      fixed several comment and whitespace style issues
      Signed-off-by: Jason Eastman <eastman.jason.linux@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1ab39f1
  5. 23 Apr, 2015 22 commits
    • D
      Merge branch 'tipc-fixes' · 37a06a87
      David S. Miller committed
      Jon Maloy says:
      
      ====================
      tipc: three bug fixes
      
      A set of unrelated corrections; one for the tipc netns implementation,
      one regarding problems with random link resets, and one removing
      an erroneous refcount decrement when reading link statistics via
      netlink.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37a06a87
    • E
      tipc: fix node refcount issue · 73a31737
      Erik Hugne committed
      When link statistics are dumped over netlink, we iterate over
      the list of peer nodes and append each link's statistics to
      the netlink msg. In the case where the dump is resumed after
      filling up a nlmsg, the node refcnt is decremented without
      having been incremented previously which may cause the node
      reference to be freed. When this happens, the following
      info/stacktrace will be generated, followed by a crash or
      undefined behavior.
      We fix this by removing the erroneous call to tipc_node_put
      inside the loop that iterates over nodes.
      
      [  384.312303] INFO: trying to register non-static key.
      [  384.313110] the code is fine but needs lockdep annotation.
      [  384.313290] turning off the locking correctness validator.
      [  384.313290] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.0+ #13
      [  384.313290] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  384.313290]  ffff88003c6d0290 ffff88003cc03ca8 ffffffff8170adf1 0000000000000007
      [  384.313290]  ffffffff82728730 ffff88003cc03d38 ffffffff810a6a6d 00000000001d7200
      [  384.313290]  ffff88003c6d0ab0 ffff88003cc03ce8 0000000000000285 0000000000000001
      [  384.313290] Call Trace:
      [  384.313290]  <IRQ>  [<ffffffff8170adf1>] dump_stack+0x4c/0x65
      [  384.313290]  [<ffffffff810a6a6d>] __lock_acquire+0xf3d/0xf50
      [  384.313290]  [<ffffffff810a7375>] lock_acquire+0xd5/0x290
      [  384.313290]  [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc]
      [  384.313290]  [<ffffffff81712890>] _raw_spin_lock_bh+0x40/0x80
      [  384.313290]  [<ffffffffa0043e8c>] ? link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffffa0043e8c>] link_timeout+0x1c/0x170 [tipc]
      [  384.313290]  [<ffffffff810c4698>] call_timer_fn+0xb8/0x490
      [  384.313290]  [<ffffffff810c45e0>] ? process_timeout+0x10/0x10
      [  384.313290]  [<ffffffff810c5a2c>] run_timer_softirq+0x21c/0x420
      [  384.313290]  [<ffffffffa0043e70>] ? link_state_event+0x4e0/0x4e0 [tipc]
      [  384.313290]  [<ffffffff8105a954>] __do_softirq+0xf4/0x630
      [  384.313290]  [<ffffffff8105afdd>] irq_exit+0x5d/0x60
      [  384.313290]  [<ffffffff8103ade1>] smp_apic_timer_interrupt+0x41/0x50
      [  384.313290]  [<ffffffff817144a0>] apic_timer_interrupt+0x70/0x80
      [  384.313290]  <EOI>  [<ffffffff8100db10>] ? default_idle+0x20/0x210
      [  384.313290]  [<ffffffff8100db0e>] ? default_idle+0x1e/0x210
      [  384.313290]  [<ffffffff8100e61a>] arch_cpu_idle+0xa/0x10
      [  384.313290]  [<ffffffff81099803>] cpu_startup_entry+0x2c3/0x530
      [  384.313290]  [<ffffffff810d2893>] ? clockevents_register_device+0x113/0x200
      [  384.313290]  [<ffffffff81038b0f>] start_secondary+0x13f/0x170
      
      Fixes: 8a0f6ebe ("tipc: involve reference counter for node structure")
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73a31737
    • E
      tipc: fix random link reset problem · 9871b27f
      Erik Hugne committed
      In the function tipc_sk_rcv(), the stack variable 'err'
      is only initialized to TIPC_ERR_NO_PORT for the first
      iteration over the link input queue. If a chain of messages
      is received from a link, failure to look up the socket for
      any but the first message will cause the message to bounce back
      out on a random link.
      We fix this by properly initializing err.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9871b27f
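The initialization issue in the tipc link reset commit above comes down to resetting a per-message result on every loop iteration. The following user-space sketch shows the corrected pattern only; `rcv_chain`, `socket_exists` and the error constants are hypothetical, and the "even ports exist" lookup is a toy rule invented for the example.

```c
#include <stddef.h>

enum { ERR_OK = 0, ERR_NO_PORT = 1 };  /* hypothetical error codes */

/* Toy lookup for the model: pretend only even port numbers have a socket. */
static int socket_exists(int port)
{
	return port % 2 == 0;
}

/*
 * Walk a chain of destination ports and record one result per message.
 * `err` is reset at the top of every iteration, so a failed lookup never
 * inherits the result left over from the previous message.
 */
void rcv_chain(const int *ports, size_t n, int *results)
{
	for (size_t i = 0; i < n; i++) {
		int err = ERR_NO_PORT;	/* initialized per message, not once */

		if (socket_exists(ports[i]))
			err = ERR_OK;
		results[i] = err;
	}
}
```

For the chain of ports 2, 3, 5, the first message succeeds and both later messages are reported as having no port, regardless of what happened before them.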
    • Y
      tipc: fix topology server broken issue · def81f69
      Ying Xue committed
      When a new topology server is launched in a new namespace, its
      listening socket is inserted into the "init ns" namespace's socket
      hash table rather than the one owned by the new namespace. Although
      the socket's namespace is forcibly changed to the new namespace later,
      the socket is still stored in the socket hash table of "init ns"
      namespace. When a client created in the new namespace connects
      to its own topology server, the connection fails as its server's
      socket cannot be found in its own namespace's socket table.
      
      If __sock_create() instead of the original sock_create_kern() is used
      to create the server's socket by specifying the expected namespace,
      the socket will be inserted into the specified namespace's socket
      table, thereby avoiding the broken topology server issue.
      
      Fixes: 76100a8a ("tipc: fix netns refcnt leak")
      Reported-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      def81f69
    • D
      ibmveth: Fix off-by-one error in ibmveth_change_mtu() · 4fce1482
      David Gibson committed
      AFAIK the PAPR document which defines the virtual device interface used by
      the ibmveth driver doesn't specify a specific maximum MTU.  So, in the
      ibmveth driver, the maximum allowed MTU is determined by the maximum
      allocated buffer size of 64k (corresponding to one page in the common case)
      minus the per-buffer overhead IBMVETH_BUFF_OH (which has value 22 for 14
      bytes of ethernet header, plus 8 bytes for an opaque handle).
      
      This suggests a maximum allowable MTU of 65514 bytes, but in fact the
      driver only permits a maximum MTU of 65513.  This is because there is a <
      instead of an <= in ibmveth_change_mtu(), which only permits an MTU which
      is strictly smaller than the buffer size, rather than allowing the buffer
      to be completely filled.
      
      This patch fixes the buglet.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4fce1482
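The arithmetic in the ibmveth commit above (a 64k buffer minus 22 bytes of overhead gives 65514) and the effect of `<` versus `<=` can be checked with a short sketch. The constant and function names here are made up for the example; only the numbers come from the commit message.

```c
#include <stdbool.h>

#define MAX_BUF_SIZE 65536  /* largest buffer: 64k, per the commit message */
#define BUFF_OH      22     /* 14-byte Ethernet header + 8-byte opaque handle */

/* Fixed check: an MTU that exactly fills the largest buffer is allowed. */
bool mtu_fits(int mtu)
{
	return mtu + BUFF_OH <= MAX_BUF_SIZE;
}

/* The pre-fix comparison, kept only to show the off-by-one. */
bool mtu_fits_old(int mtu)
{
	return mtu + BUFF_OH < MAX_BUF_SIZE;
}
```

With the strict comparison the largest accepted MTU is 65513; with `<=` it becomes 65514, and 65515 is still rejected.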
    • J
      netdev_alloc_pcpu_stats: use less common iterator variable · ec65aafb
      Johannes Berg committed
      With the CPU iteration variable called 'i', it's relatively easy
      to have variable shadowing which sparse will warn about. Avoid
      that by renaming the variable to __cpu which is less likely to
      be used in the surrounding context.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ec65aafb
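The shadowing concern in the netdev_alloc_pcpu_stats commit above applies to any macro that declares its own loop variable. The sketch below uses a hypothetical `FILL_EACH` macro (not the kernel macro) whose counter has an unusual name, so a caller variable called `i` that appears in the macro argument still refers to the caller's `i`.

```c
#include <stddef.h>

/*
 * Hypothetical macro that stores `expr` into every slot of an array.
 * The loop counter has a deliberately unusual name so that it cannot
 * shadow (or capture) a caller variable such as `i` used inside `expr`.
 */
#define FILL_EACH(arr, n, expr)                                           \
	do {                                                              \
		for (size_t fill_idx_ = 0; fill_idx_ < (n); fill_idx_++)  \
			(arr)[fill_idx_] = (expr);                        \
	} while (0)

/* Caller with its own `i`: every slot must receive the caller's value. */
int last_after_fill(int i)
{
	int slots[4];

	FILL_EACH(slots, 4, i);
	return slots[3];
}
```

Had the macro named its counter `i`, the argument `i` would have resolved to the loop counter, and the last slot would hold 3 instead of the caller's value.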
    • L
      vxlan: remove the unnecessary codes · 60840429
      Li RongQing committed
      The return value of vxlan_fdb_replace is always greater than or equal to 0
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      60840429
    • B
      net/macb: Factor out one-time assignment from loop · 21d3515c
      Ben Shelton committed
      In 02c958dd (net/macb: add TX multiqueue support for gem), the
      initialization of tx_head and tx_tail in macb_init_rings() was moved
      inside the loop that iterates over each element in the ring.  Since
      tx_head and tx_tail only need to be assigned once, move them back out of
      the loop.
      Signed-off-by: Ben Shelton <ben.shelton@ni.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21d3515c
    • E
      net: do not deplete pfmemalloc reserve · 79930f58
      Eric Dumazet committed
      build_skb() should look at the page pfmemalloc status.
      If set, this means page allocator allocated this page in the
      expectation it would help to free other pages. Networking
      stack can do that only if skb->pfmemalloc is also set.
      
      Also, we must refrain from using high order pages from the pfmemalloc
      reserve, so __page_frag_refill() must also use __GFP_NOMEMALLOC for
      them. Under memory pressure, using order-0 pages is probably the best
      strategy.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      79930f58
    • J
      ip6_gre: use netdev_alloc_pcpu_stats() · 26349c71
      Johannes Berg committed
      The code there just open-codes the same, so use the provided macro instead.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26349c71
    • D
      Merge branch 'mpls' · 0e2d6865
      David S. Miller committed
      Robert Shearman says:
      
      ====================
      mpls: ABI changes for security and correctness
      
      V2:
       - don't treat loopback interfaces specially by enabling mpls by
         default
      
      For security, these changes stop mpls from being enabled by default on
      all interfaces when in use, and ensure that a label not valid as an
      outgoing label cannot be added in mpls routes.
      
      This series contains three ABI/behaviour-affecting changes which have
      been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
      improvements" without any further modification. These changes need to
      be considered for 4.1 otherwise we'll be stuck with the current
      behaviour/ABI forever.
      ====================
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0e2d6865
    • R
      mpls: Prevent use of implicit NULL label as outgoing label · 5a9ab017
      Robert Shearman committed
      The reserved implicit-NULL label isn't allowed to appear in the label
      stack for packets, so make it an error for the control plane to
      specify it as an outgoing label.
      Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a9ab017
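The rule in the implicit NULL commit above can be expressed as a small validation helper. This is a sketch, not the kernel code: `check_outgoing_label` and `LABEL_IMPLICIT_NULL` are hypothetical names, and the value 3 is the implicit NULL label reserved by RFC 3032. Other validity checks on labels are left out.

```c
#include <errno.h>
#include <stdint.h>

#define LABEL_IMPLICIT_NULL 3u  /* reserved label value 3 (RFC 3032) */

/* Returns 0 if `label` may be used as an outgoing label, -EINVAL otherwise. */
int check_outgoing_label(uint32_t label)
{
	/* Implicit NULL must never appear in a packet's label stack. */
	if (label == LABEL_IMPLICIT_NULL)
		return -EINVAL;
	return 0;
}
```

A control-plane request that specifies label 3 as an outgoing label is rejected, while an ordinary label such as 100 is accepted.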
    • R
      mpls: Per-device enabling of packet input · 37bde799
      Robert Shearman committed
      An MPLS network is a single trust domain where the edges must be in
      control of what labels make their way into the core. The simplest way
      of ensuring this is for the edge device to always impose the labels,
      and not to forward labeled traffic from untrusted neighbours. This
      is achieved by allowing a per-device configuration of whether MPLS
      traffic input from that interface should be processed or not.
      
      To be secure by default, the default state is changed to MPLS being
      disabled on all interfaces unless explicitly enabled and no global
      option is provided to change the default. Whilst this differs from
      other protocols (e.g. IPv6), network operators are used to explicitly
      enabling MPLS forwarding on interfaces, and with the number of links
      to the MPLS core typically fairly low this doesn't present too much of
      a burden on operators.
      
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37bde799
    • R
      mpls: Per-device MPLS state · 03c57747
      Robert Shearman committed
      Add per-device MPLS state to supported interfaces. Use the presence of
      this state in mpls_route_add to determine that this is a supported
      interface.
      
      Use the presence of mpls_dev to drop packets that arrived on an
      unsupported interface - previously they were allowed through.
      
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      03c57747
    • Y
      bnx2x: Prevent inner-reload while VFs exist · 909d9faa
      Yuval Mintz committed
      On some feature changes, the driver employs an inner-reload flow where it
      resets the function and re-configures it with the new required set of
      parameters.
      
      Such a flow proves fatal to any VF since those were not intended to be used
      while HW is being reset underneath, causing them [at best] to lose all
      connectivity.
      
      This changes driver behavior to fail all configuration changes [e.g., mtu
      change] requested of the driver in case VFs are active.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      909d9faa
    • D
      Merge branch 'rhashtable-fixes' · a2c3935a
      David S. Miller committed
      Thomas Graf says:
      
      ====================
      rhashtable rehashing fixes
      
      Some rhashtable rehashing bugs found while testing with the
      next rhashtable self-test queued up for the next devel cycle:
      
      https://github.com/tgraf/net-next/commits/rht
      
      v2:
       - Moved schedule_work() call into rhashtable_insert_rehash()
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a2c3935a
    • T
      rhashtable: Do not schedule more than one rehash if we can't grow further · a87b9ebf
      Thomas Graf committed
      The current code only stops inserting rehashes into the
      chain when no resizes are currently scheduled. As long as resizes
      are scheduled and while inserting above the utilization watermark,
      more and more rehashes will be scheduled.
      
      This led to a perfect DoS storm with thousands of rehashes
      scheduled, which led to thousands of spinlocks being taken
      sequentially.
      
      Instead, only allow either a series of resizes or a single rehash.
      Drop any further rehashes and return -EBUSY.
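
      The rule above can be sketched in user space as follows. This is a
      toy model, not the rhashtable implementation: struct toy_table, its
      fields, the 75% growth threshold, and toy_insert_rehash() are all
      assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a table plus a chain of queued future tables. */
struct toy_table {
    unsigned int newest_size; /* size of the newest table in the chain */
    unsigned int max_size;    /* growth limit, 0 means unlimited */
    unsigned int nelems;      /* number of stored elements */
    unsigned int queued;      /* future tables queued behind the active one */
};

/* Growing is possible when utilization is above 75% and below the limit. */
static bool toy_can_grow(const struct toy_table *t)
{
    bool above_75 = t->nelems > t->newest_size / 4 * 3;
    bool below_max = t->max_size == 0 || t->newest_size < t->max_size;

    return above_75 && below_max;
}

/*
 * Queue either another resize (these may chain) or a single same-size
 * rehash. A same-size rehash is dropped with -EBUSY when a table is
 * already queued.
 */
int toy_insert_rehash(struct toy_table *t)
{
    unsigned int size = t->newest_size;

    if (toy_can_grow(t))
        size *= 2;
    else if (t->queued > 0)
        return -EBUSY;

    t->newest_size = size;
    t->queued++;
    return 0;
}
```

      Starting from a 4-bucket table limited to 8 buckets, the first call
      queues a resize to 8; every later call returns -EBUSY instead of
      piling up further rehashes.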
      
      Fixes: ccd57b1b ("rhashtable: Add immediate rehash during insertion")
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      rhashtable: Schedule async resize when sync realloc fails · e2307ed6
      Thomas Graf committed
      When rhashtable_insert_rehash() fails with ENOMEM, this indicates that
      we can't allocate the necessary memory in the current context but the
      limits set by the user would still allow the table to grow.
      
      Thus attempt an async resize in the background where we can allocate
      using GFP_KERNEL which is more likely to succeed. The insertion itself
      will still fail to indicate pressure.
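
      A user-space sketch of this fallback is given here under stated
      assumptions: struct toy_ht, toy_try_grow(), toy_resize_worker(), and
      the allocator hook are hypothetical names, a boolean flag stands in
      for schedule_work(), and plain malloc() stands in for a GFP_KERNEL
      allocation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_ht {
    size_t size;              /* current bucket count */
    bool resize_work_pending; /* stands in for a scheduled work item */
};

/* Hypothetical allocator hook; may return NULL like an atomic allocation. */
typedef void *(*toy_alloc_fn)(size_t bytes);

/* Simulates an allocation that cannot succeed in the current context. */
void *toy_alloc_fail(size_t bytes)
{
    (void)bytes;
    return NULL;
}

/* Synchronous path: on failure, request a background resize and still fail. */
int toy_try_grow(struct toy_ht *ht, toy_alloc_fn atomic_alloc)
{
    size_t new_size = ht->size * 2;
    void *buckets = atomic_alloc(new_size * sizeof(void *));

    if (!buckets) {
        ht->resize_work_pending = true; /* retry later in process context */
        return -ENOMEM;                 /* the insertion still reports pressure */
    }

    free(buckets); /* toy model: only the size is tracked */
    ht->size = new_size;
    return 0;
}

/* Background path: retries with an allocator that is allowed to block. */
void toy_resize_worker(struct toy_ht *ht)
{
    void *buckets;

    if (!ht->resize_work_pending)
        return;

    buckets = malloc(ht->size * 2 * sizeof(void *));
    if (buckets) {
        free(buckets);
        ht->size *= 2;
    }
    ht->resize_work_pending = false;
}
```

      The caller sees -ENOMEM from toy_try_grow(), yet the table is not
      stuck at its old size: the pending worker grows it on its next run.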
      
      This fixes a bug where the table would never continue growing once the
      utilization is above 100%.
      
      Fixes: ccd57b1b ("rhashtable: Add immediate rehash during insertion")
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      tcp: fix possible deadlock in tcp_send_fin() · d83769a5
      Eric Dumazet committed
      Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
      case a huge process is killed by OOM, and tcp_mem[2] is hit.
      
      To be able to free memory we need to make progress, so this
      patch allows FIN packets to not care about tcp_mem[2], if
      skb allocation succeeded.
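
      The idea can be sketched in user space as two ways of charging a
      memory pool. Everything here is hypothetical (struct toy_pool,
      toy_charge(), toy_charge_forced()); the hard limit merely plays the
      role of tcp_mem[2]. Ordinary data is refused once the limit is hit,
      while the buffer carrying the FIN is always charged so that the
      connection can close and release its memory.

```c
#include <stdbool.h>

/* Hypothetical global accounting, in arbitrary units. */
struct toy_pool {
    long allocated;  /* units currently charged */
    long hard_limit; /* plays the role of tcp_mem[2] */
};

/* Normal path: refuse the charge if it would exceed the hard limit. */
bool toy_charge(struct toy_pool *pool, long amount)
{
    if (pool->allocated + amount > pool->hard_limit)
        return false;

    pool->allocated += amount;
    return true;
}

/*
 * FIN path: the buffer was already allocated, so account for it even
 * above the limit. Making progress is what frees memory later.
 */
void toy_charge_forced(struct toy_pool *pool, long amount)
{
    pool->allocated += amount;
}
```

      With the pool already at its limit, toy_charge() fails for regular
      data, whereas toy_charge_forced() lets the FIN through and records
      the overshoot.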
      
      In a follow-up patch, we might abort the tcp_send_fin() infinite loop
      in case TIF_MEMDIE is set on this thread, as the memory allocator
      already did its best to get extra memory.
      
      This patch reverts d22e1537 ("tcp: fix tcp fin memory accounting")
      
      Fixes: d22e1537 ("tcp: fix tcp fin memory accounting")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • L
      Merge tag 'mmc-4.1-rc1' of git://git.linaro.org/people/ulf.hansson/mmc · 5e6c94a9
      Linus Torvalds committed
      Pull MMC fixes from Ulf Hansson:
       "Here are two mmc core fixes for v4.1-rc1:
      
         - fix error code propagation in mmc_pwrseq_simple_alloc()
      
         - revert 'mmc: core: Convert mmc_driver to device_driver'"
      
      * tag 'mmc-4.1-rc1' of git://git.linaro.org/people/ulf.hansson/mmc:
        Revert "mmc: core: Convert mmc_driver to device_driver"
        mmc: pwrseq: Fix error code propagation in mmc_pwrseq_simple_alloc()
    • V
      dmaengine: hsu: don't prompt for hsu_core part · 3cfe2137
      Vinod Koul committed
      HSU_DMA is selected by the HSU_DMA_PCI driver and should not be
      user-selectable on its own, so remove its user prompt.
      Signed-off-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • L
      Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 8b3c8ba3
      Linus Torvalds committed
      Pull ARM SoC late changes from Olof Johansson:
       "We were expecting to sit on this branch through most of the merge
        window since the contents was merged into our tree late, but we ended
        up sitting on all of our contents so it can go in with the rest.
      
        The contents here is:
      
         - a large branch of cleanups of the CM/PRM blocks on OMAP.
      
         - a couple of patches plumbing up CM/PRM on OMAP5 and DRA7.
      
         - a branch with DT updates for Freescale i.MX, including some
           shuffling from .dts to .dtsi (include) files that causes a little
           churn"
      
      * tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (78 commits)
        ARM: OMAP2+: Fix booting with configs that don't have MFD_SYSCON
        ARM: OMAP4+: control: add support for initializing control module via DT
        ARM: dts: dra7: add minimal l4 bus layout with control module support
        ARM: dts: omap5: add minimal l4 bus layout with control module support
        ARM: OMAP4+: control: remove support for legacy pad read/write
        ARM: OMAP4: display: convert display to use syscon for dsi muxing
        ARM: dts: omap4: add minimal l4 bus layout with control module support
        ARM: dts: am4372: add minimal l4 bus layout with control module support
        ARM: dts: am43xx-epos-evm: fix pinmux node layout
        ARM: dts: am33xx: add minimal l4 bus layout with control module support
        ARM: dts: omap3: add minimal l4 bus layout with control module support
        ARM: dts: omap24xx: add minimal l4 bus layout with control module support
        ARM: OMAP2+: control: add syscon support for register accesses
        ARM: OMAP2+: id: cache omap_type value
        ARM: OMAP2+: control: remove API for getting control module base address
        ARM: OMAP2+: clock: add low-level support for regmap
        ARM: OMAP4+: PRM: get rid of cpu_is_omap44xx calls from interrupt init
        ARM: OMAP4+: PRM: setup prm_features from the PRM init time flags
        ARM: OMAP2+: CM: move SoC specific init calls within a generic API
        ARM: OMAP4+: PRM: determine prm_device_inst based on DT compatibility
        ...