1. 21 May, 2016 2 commits
  2. 14 May, 2016 3 commits
  3. 05 May, 2016 1 commit
    • F
      drivers: replace dev->trans_start accesses with dev_trans_start · 4d0e9657
      Florian Westphal committed
      a trans_start struct member exists twice:
      - in struct net_device (legacy)
      - in struct netdev_queue
      
      Instead of open-coding dev->trans_start usage to obtain the current
      trans_start value, use dev_trans_start() instead.
      
      This is not exactly the same, as dev_trans_start also considers
      the trans_start values of the netdev queues owned by the device
      and provides the most recent one.
      
      For legacy devices this doesn't matter as dev_trans_start can cope
      with netdev trans_start values of 0 (they are ignored).
      
      This is a prerequisite to eventual removal of dev->trans_start.
      
      Cc: linux-rdma@vger.kernel.org
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4d0e9657
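
The "most recent value wins, zeros are ignored" behaviour described in this commit message can be modelled in user space. The following is a minimal sketch under stated assumptions, not the kernel implementation: `struct dev_model`, `struct queue_model`, `later_than()` and `dev_trans_start_model()` are hypothetical stand-ins, and the timestamps are plain counters rather than jiffies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-queue structure. */
struct queue_model {
	unsigned long trans_start;	/* time of last transmit, 0 if never used */
};

/* Hypothetical stand-in for a network device. */
struct dev_model {
	unsigned long trans_start;	/* legacy per-device value */
	size_t num_tx_queues;
	struct queue_model *tx;
};

/* Wrap-safe "a is later than b" comparison on free-running counters. */
static int later_than(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Return the most recent transmit time known for the device: start from
 * the legacy per-device value and let any later, non-zero per-queue value
 * override it. Queues that never transmitted (value 0) are ignored.
 */
static unsigned long dev_trans_start_model(const struct dev_model *dev)
{
	unsigned long res;
	size_t i;

	assert(dev != NULL);
	res = dev->trans_start;
	for (i = 0; i < dev->num_tx_queues; i++) {
		unsigned long val = dev->tx[i].trans_start;

		if (val && later_than(val, res))
			res = val;
	}
	return res;
}
```

For a legacy device whose queues all report 0, the model simply returns the per-device value, which matches the "they are ignored" remark in the message.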
  4. 07 Apr, 2016 4 commits
  5. 18 Mar, 2016 1 commit
    • J
      mm: introduce page reference manipulation functions · fe896d18
      Joonsoo Kim committed
      The success of CMA allocation largely depends on the success of
      migration, and a key factor in migration is the page reference count.
      Until now, the page reference count has been manipulated by calling
      atomic functions directly, so we cannot track who manipulates it and
      where.  That makes it hard to find the actual reason for a CMA
      allocation failure.  CMA allocation should be guaranteed to succeed,
      so finding the offending place is really important.
      
      In this patch, call sites where the page reference count is
      manipulated are converted to the newly introduced wrapper functions.
      This is a preparation step for adding a tracepoint to each page
      reference manipulation function.  With this facility, we can easily
      find the reason for a CMA allocation failure.  There is no functional
      change in this patch.
      
      In addition, this patch also converts reference read sites.  This will
      help a second step that renames page._count to something else and
      prevents later attempts to access it directly (suggested by Andrew).
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fe896d18
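
The idea of routing every reference-count access through a small set of wrappers can be illustrated outside the kernel. The sketch below is an assumption-laden user-space model: `struct page_model` and the `*_model` functions are hypothetical names, and C11 atomics stand in for the kernel's atomic helpers. The point it shows is that once all call sites use the wrappers, each wrapper body is a single place where a tracepoint could later be added.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the reference count. */
struct page_model {
	atomic_int refcount;
};

/* Set the count to an absolute value (e.g. when a page is first handed out). */
static void page_ref_set_model(struct page_model *page, int v)
{
	assert(page != NULL);
	atomic_store(&page->refcount, v);
}

/* Read-side wrapper, so readers never touch the field directly either. */
static int page_ref_count_model(struct page_model *page)
{
	assert(page != NULL);
	return atomic_load(&page->refcount);
}

/* Take one reference. */
static void page_ref_inc_model(struct page_model *page)
{
	assert(page != NULL);
	atomic_fetch_add(&page->refcount, 1);
}

/* Drop one reference; returns true when the count reached zero. */
static bool page_ref_dec_and_test_model(struct page_model *page)
{
	assert(page != NULL);
	return atomic_fetch_sub(&page->refcount, 1) == 1;
}
```

Because readers also go through `page_ref_count_model()`, the underlying field could be renamed without touching any caller, which is the second step the message mentions.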
  6. 25 Feb, 2016 6 commits
    • S
      igb: call ndo_stop() instead of dev_close() when running offline selftest · 46eafa59
      Stefan Assmann committed
      Calling dev_close() causes IFF_UP to be cleared, which removes the
      interface's routes and some addresses. That's probably not what the user
      intended when running the offline selftest. Besides, this does not happen
      if the interface is brought down before the test, so the current
      behaviour is inconsistent.
      Instead call the net_device_ops ndo_stop function directly and avoid
      touching IFF_UP at all.
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      46eafa59
    • C
      igb: Fix VLAN tag stripping on Intel i350 · 030f9f52
      Corinna Vinschen committed
      Problem: When switching off VLAN offloading on an i350, the VLAN
      interface gets unusable.  For testing, set up a VLAN on an i350
      and some remote machine, e.g.:
      
        $ ip link add link eth0 name eth0.42 type vlan id 42
        $ ip addr add 192.168.42.1/24 dev eth0.42
        $ ip link set dev eth0.42 up
      
      Offloading is switched on by default:
      
        $ ethtool -k eth0 | grep vlan-offload
        rx-vlan-offload: on
        tx-vlan-offload: on
      
        $ ping -c 3 -I eth0.42 192.168.42.2
        [...works as usual...]
      
      Now switch off VLAN offloading and try again:
      
        $ ethtool -K eth0 rxvlan off
        Actual changes:
        rx-vlan-offload: off
        tx-vlan-offload: off [requested on]
        $ ping -c 3 -I eth0.42 192.168.42.2
        PING 192.168.42.2 (192.168.42.2) from 192.168.42.1 eth0.42: 56(84) bytes of data.
      
        --- 192.168.42.2 ping statistics ---
        3 packets transmitted, 0 received, 100% packet loss, time 1999ms
      
      I can only reproduce it on an i350, the above works fine on a 82580.
      
      While inspecting the igb source, I came across the code in igb_set_vmolr
      which sets the E1000_VMOLR_STRVLAN/E1000_DVMOLR_STRVLAN flags once and
      for all, and in all of the igb code there's no other place where the
      STRVLAN is set or cleared.  Thus, VLAN stripping is enabled in igb
      unconditionally, independently of the offloading setting.
      
      I compared that to the latest Intel igb-5.3.3.5 driver from
      http://sourceforge.net/projects/e1000/ which in fact sets and clears the
      STRVLAN flag independently from igb_set_vmolr in its own function
      igb_set_vf_vlan_strip, depending on the vlan settings.
      
      So I included the STRVLAN handling from the igb-5.3.3.5 driver into our
      current igb driver and tested the above scenario again.  This time ping
      still works after switching off VLAN offloading.
      
      Tested on i350, with and without additional VFs, as well as on 82580
      successfully.
      Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      030f9f52
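
The fix described above boils down to setting or clearing the strip bit according to the current offload setting instead of setting it once. A minimal sketch of that idea follows. It is not the driver code: `STRVLAN_BIT_MODEL` uses a placeholder bit position chosen for illustration, and `update_vlan_strip_model()` is a hypothetical helper operating on a plain variable rather than a device register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit position (an assumption, not taken from a datasheet). */
#define STRVLAN_BIT_MODEL (UINT32_C(1) << 30)

/*
 * Make the strip bit follow the Rx VLAN offload setting. All other bits
 * in the register image are left untouched.
 */
static void update_vlan_strip_model(uint32_t *reg, bool rx_vlan_offload)
{
	assert(reg != NULL);
	if (rx_vlan_offload)
		*reg |= STRVLAN_BIT_MODEL;	/* hardware strips the tag */
	else
		*reg &= ~STRVLAN_BIT_MODEL;	/* tag stays in the frame */
}
```

Calling the helper whenever the offload feature changes keeps the hardware behaviour and the `ethtool -K` setting in agreement, which is what the unconditional setting could not do.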
    • A
      igb: Add support for generic Tx checksums · 6e033700
      Alexander Duyck committed
      This patch adds support for generic Tx checksums to the igb driver.  It
      turns out this is actually pretty easy after going over the datasheet as we
      were doing a number of steps we didn't need to.
      
      In order to perform a Tx checksum for an L4 header we need to fill in the
      following fields in the Tx descriptor:
        MACLEN (maximum of 127), retrieved from:
      		skb_network_offset()
        IPLEN  (maximum of 511), retrieved from:
      		skb_checksum_start_offset() - skb_network_offset()
        TUCMD.L4T indicates offset and if checksum or crc32c, based on:
      		skb->csum_offset
      
      The added advantage to doing this is that we can support inner checksum
      offloads for tunnels and MPLS while still being able to transparently
      insert VLAN tags.
      
      I also took the opportunity to clean-up many of the feature flag
      configuration bits to make them a bit more consistent between drivers.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      6e033700
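
The three descriptor fields listed in this message can be derived from three offsets. The sketch below works on plain integers instead of an skb and is only an illustration: `struct tx_csum_fields`, `enum l4_type_model` and `tx_csum_fields_model()` are hypothetical names. The mapping from checksum offset to L4 type relies on the standard header layouts (checksum at byte 16 of a TCP header, byte 6 of a UDP header, and the CRC32c at byte 8 of an SCTP header). Returning `false` to signal "cannot offload" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MACLEN_MAX 127u	/* limit quoted in the commit message */
#define IPLEN_MAX  511u	/* limit quoted in the commit message */

enum l4_type_model { L4T_TCP, L4T_UDP, L4T_SCTP };

struct tx_csum_fields {
	unsigned int maclen;	/* bytes before the network header */
	unsigned int iplen;	/* bytes from network header to checksum start */
	enum l4_type_model l4t;
};

/*
 * network_offset     models skb_network_offset()
 * csum_start_offset  models skb_checksum_start_offset()
 * csum_offset        models skb->csum_offset
 * Returns false when the values do not fit the descriptor fields.
 */
static bool tx_csum_fields_model(unsigned int network_offset,
				 unsigned int csum_start_offset,
				 unsigned int csum_offset,
				 struct tx_csum_fields *out)
{
	assert(out != NULL);
	if (csum_start_offset < network_offset)
		return false;
	if (network_offset > MACLEN_MAX ||
	    csum_start_offset - network_offset > IPLEN_MAX)
		return false;

	switch (csum_offset) {
	case 16: out->l4t = L4T_TCP;  break;	/* TCP checksum field */
	case 6:  out->l4t = L4T_UDP;  break;	/* UDP checksum field */
	case 8:  out->l4t = L4T_SCTP; break;	/* SCTP CRC32c field */
	default: return false;
	}
	out->maclen = network_offset;
	out->iplen = csum_start_offset - network_offset;
	return true;
}
```

For an untagged Ethernet frame carrying IPv4 without options and TCP, the inputs are 14, 34 and 16, giving a MACLEN of 14 and an IPLEN of 20. Because the fields only depend on offsets, the same computation applies when the "network header" is an inner header of a tunnel.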
    • T
      igb: rename igb define to be more generic · c883de9f
      Todd Fujinaka committed
      E1000_MRQC_ENABLE_RSS_4Q enables 4 and 8 queues depending on the part
      so rename to be generic.
      
      Similarly, E1000_MRQC_ENABLE_VMDQ_RSS_2Q has no numeric meaning so
      rename to be more generic.
      Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      c883de9f
    • T
      igb: enable WoL for OEM devices regardless of EEPROM setting · 5e350b92
      Todd Fujinaka committed
      Override EEPROM settings for specific OEM devices.
      Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      5e350b92
    • T
      igb: When GbE link up, wait for Remote receiver status condition · b72f3f72
      Takuma Ueba committed
      The IPv6 autoconf test on an I210 device sometimes fails
      because the DAD NS for the link-local address is not transmitted.
      The packet is silently dropped.
      The problem is seen only in a GbE environment.
      
      After detecting link up, igb_watchdog_task continues processing.
      The following is observed:
      1. The Remote receiver status bit of the PHY 1000BASE-T Status Register
      is not OK (NG). (The NG status becomes OK after about 200 - 700 ms.)
      2. In this case, the transmitted packet is silently dropped.
      
      1000BASE-T Status register
      [Expected]: 0x3800 or 0x7800
      [problem occurred]: 0x2800 or 0x6800
      Frequency of occurrence: approx 1/10 - 1/40 observed
      
      In order to avoid this problem, wait until the 1000BASE-T Status
      register reports "Remote receiver status OK".
      
      After applying this patch, at least 400 runs succeed with no problems.
      Signed-off-by: Takuma Ueba <t.ueba11@gmail.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      b72f3f72
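
The register values quoted in this message differ only in bit 12 (0x1000): it is set in 0x3800 and 0x7800 and clear in 0x2800 and 0x6800. A bounded polling loop on that bit can be sketched as follows. This is an illustration under assumptions, not the driver change itself: `wait_remote_rx_ok_model()`, the callback type and the `fake_phy` test double are hypothetical, and a real driver would sleep between polls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 12 of the 1000BASE-T Status register: remote receiver status OK. */
#define REMOTE_RX_STATUS_OK UINT16_C(0x1000)

typedef uint16_t (*read_status_fn)(void *ctx);

/*
 * Poll the status register until the remote receiver reports OK or the
 * poll budget is used up. On return, *polls_used (if non-NULL) holds the
 * number of reads performed.
 */
static bool wait_remote_rx_ok_model(read_status_fn read_status, void *ctx,
				    unsigned int max_polls,
				    unsigned int *polls_used)
{
	unsigned int i;

	assert(read_status != NULL);
	for (i = 0; i < max_polls; i++) {
		if (read_status(ctx) & REMOTE_RX_STATUS_OK) {
			if (polls_used)
				*polls_used = i + 1;
			return true;
		}
		/* A real driver would delay here before the next read. */
	}
	if (polls_used)
		*polls_used = max_polls;
	return false;
}

/* Test double: replays a fixed sequence, repeating the last value. */
struct fake_phy {
	const uint16_t *values;
	size_t count;
	size_t pos;
};

static uint16_t fake_phy_read(void *ctx)
{
	struct fake_phy *phy = ctx;
	uint16_t v = phy->values[phy->pos];

	if (phy->pos + 1 < phy->count)
		phy->pos++;
	return v;
}
```

Bounding the number of polls keeps the caller from blocking forever if the link partner never reports a good receiver status.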
  7. 16 Feb, 2016 12 commits
  8. 16 Dec, 2015 1 commit
  9. 13 Dec, 2015 3 commits
    • J
      igb: improve handling of disconnected adapters · 7b06a690
      Jarod Wilson committed
      Clean up array_rd32 so that it uses igb_rd32 the same as rd32, per the
      suggestion of Alexander Duyck, and use io_addr in more places, so that
      we don't have the need to call E1000_REMOVED (which simply looks for a
      null hw_addr) nearly as much.
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Acked-by: Alexander Duyck <aduyck@mirantis.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      7b06a690
    • J
      igb: fix NULL derefs due to skipped SR-IOV enabling · be06998f
      Jan Beulich committed
      The combined effect of commits 6423fc34 ("igb: do not re-init SR-IOV
      during probe") and ceee3450 ("igb: make sure SR-IOV init uses the
      right number of queues") causes VFs no longer getting set up, leading
      to NULL pointer dereferences due to the adapter's ->vf_data being NULL
      while ->vfs_allocated_count is non-zero. The first commit not only
      neglected the side effect of igb_sriov_reinit() that the second commit
      tried to account for, but also that of setting IGB_FLAG_HAS_MSIX,
      without which igb_enable_sriov() is effectively a no-op. Calling
      igb_{,re}set_interrupt_capability() as done here seems to address this,
      but I'm not sure whether this is better than simply reverting the other
      two commits.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      be06998f
    • J
      igb: don't unmap NULL hw_addr · 73bf8048
      Jarod Wilson committed
      I've got a startech thunderbolt dock someone loaned me, which among other
      things, has the following device in it:
      
      08:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
      
      This hotplugs just fine (kernel 4.2.0 plus a patch or two here):
      
      [  863.020315] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
      [  863.020316] igb: Copyright (c) 2007-2014 Intel Corporation.
      [  863.028657] igb 0000:08:00.0: enabling device (0000 -> 0002)
      [  863.062089] igb 0000:08:00.0: added PHC on eth0
      [  863.062090] igb 0000:08:00.0: Intel(R) Gigabit Ethernet Network Connection
      [  863.062091] igb 0000:08:00.0: eth0: (PCIe:2.5Gb/s:Width x1) e8:ea:6a:00:1b:2a
      [  863.062194] igb 0000:08:00.0: eth0: PBA No: 000200-000
      [  863.062196] igb 0000:08:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
      [  863.064889] igb 0000:08:00.0 enp8s0: renamed from eth0
      
      But disconnecting it is another story:
      
      [ 1002.807932] igb 0000:08:00.0: removed PHC on enp8s0
      [ 1002.807944] igb 0000:08:00.0 enp8s0: PCIe link lost, device now detached
      [ 1003.341141] ------------[ cut here ]------------
      [ 1003.341148] WARNING: CPU: 0 PID: 199 at lib/iomap.c:43 bad_io_access+0x38/0x40()
      [ 1003.341149] Bad IO access at port 0x0 ()
      [ 1003.342767] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi igb dca firewire_ohci firewire_core crc_itu_t rfcomm ctr ccm arc4 iwlmvm mac80211 fuse xt_CHECKSUM ipt_MASQUERADE
      nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat
      nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
      nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter bnep dm_mirror dm_region_hash dm_log dm_mod coretemp x86_pkg_temp_thermal intel_powerclamp kvm_intel snd_hda_codec_hdmi kvm
      crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg
      [ 1003.342793]  ansi_cprng aesni_intel hp_wmi aes_x86_64 iTCO_wdt lrw iTCO_vendor_support ppdev gf128mul sparse_keymap glue_helper ablk_helper cryptd snd_hda_codec_realtek snd_hda_codec_generic
      microcode snd_hda_intel uvcvideo iwlwifi snd_hda_codec videobuf2_vmalloc videobuf2_memops snd_hda_core videobuf2_core snd_hwdep btusb v4l2_common btrtl snd_seq btbcm btintel videodev cfg80211
      snd_seq_device rtsx_pci_ms bluetooth pcspkr input_leds i2c_i801 media parport_pc memstick rfkill sg lpc_ich snd_pcm 8250_fintek parport joydev snd_timer snd soundcore hp_accel ie31200_edac
      mei_me lis3lv02d edac_core input_polldev mei hp_wireless shpchp tpm_infineon sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables autofs4 xfs libcrc32c sd_mod sr_mod cdrom
      rtsx_pci_sdmmc mmc_core crc32c_intel serio_raw rtsx_pci
      [ 1003.342822]  nouveau ahci libahci mxm_wmi e1000e xhci_pci hwmon ptp drm_kms_helper pps_core xhci_hcd ttm wmi video ipv6
      [ 1003.342839] CPU: 0 PID: 199 Comm: kworker/0:2 Not tainted 4.2.0-2.el7_UNSUPPORTED.x86_64 #1
      [ 1003.342840] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70 Ver. 01.07 02/26/2015
      [ 1003.342843] Workqueue: pciehp-3 pciehp_power_thread
      [ 1003.342844]  ffffffff81a90655 ffff8804866d3b48 ffffffff8164763a 0000000000000000
      [ 1003.342846]  ffff8804866d3b98 ffff8804866d3b88 ffffffff8107134a ffff8804866d3b88
      [ 1003.342847]  ffff880486f46000 ffff88046c8a8000 ffff880486f46840 ffff88046c8a8098
      [ 1003.342848] Call Trace:
      [ 1003.342852]  [<ffffffff8164763a>] dump_stack+0x45/0x57
      [ 1003.342855]  [<ffffffff8107134a>] warn_slowpath_common+0x8a/0xc0
      [ 1003.342857]  [<ffffffff810713c6>] warn_slowpath_fmt+0x46/0x50
      [ 1003.342859]  [<ffffffff8133719e>] ? pci_disable_msix+0x3e/0x50
      [ 1003.342860]  [<ffffffff812f6328>] bad_io_access+0x38/0x40
      [ 1003.342861]  [<ffffffff812f6567>] pci_iounmap+0x27/0x40
      [ 1003.342865]  [<ffffffffa0b728d7>] igb_remove+0xc7/0x160 [igb]
      [ 1003.342867]  [<ffffffff8132189f>] pci_device_remove+0x3f/0xc0
      [ 1003.342869]  [<ffffffff81433426>] __device_release_driver+0x96/0x130
      [ 1003.342870]  [<ffffffff814334e3>] device_release_driver+0x23/0x30
      [ 1003.342871]  [<ffffffff8131b404>] pci_stop_bus_device+0x94/0xa0
      [ 1003.342872]  [<ffffffff8131b3ad>] pci_stop_bus_device+0x3d/0xa0
      [ 1003.342873]  [<ffffffff8131b3ad>] pci_stop_bus_device+0x3d/0xa0
      [ 1003.342874]  [<ffffffff8131b516>] pci_stop_and_remove_bus_device+0x16/0x30
      [ 1003.342876]  [<ffffffff81333f5b>] pciehp_unconfigure_device+0x9b/0x180
      [ 1003.342877]  [<ffffffff81333a73>] pciehp_disable_slot+0x43/0xb0
      [ 1003.342878]  [<ffffffff81333b6d>] pciehp_power_thread+0x8d/0xb0
      [ 1003.342885]  [<ffffffff810881b2>] process_one_work+0x152/0x3d0
      [ 1003.342886]  [<ffffffff8108854a>] worker_thread+0x11a/0x460
      [ 1003.342887]  [<ffffffff81088430>] ? process_one_work+0x3d0/0x3d0
      [ 1003.342890]  [<ffffffff8108ddd9>] kthread+0xc9/0xe0
      [ 1003.342891]  [<ffffffff8108dd10>] ? kthread_create_on_node+0x180/0x180
      [ 1003.342893]  [<ffffffff8164e29f>] ret_from_fork+0x3f/0x70
      [ 1003.342894]  [<ffffffff8108dd10>] ? kthread_create_on_node+0x180/0x180
      [ 1003.342895] ---[ end trace 65a77e06d5aa9358 ]---
      
      Upon looking at the igb driver, I see that igb_rd32() attempted to read from
      hw_addr and failed, so it set hw->hw_addr to NULL and spit out the message
      in the log output above, "PCIe link lost, device now detached".
      
      Well, now that hw_addr is NULL, the attempt to call pci_iounmap is obviously
      not going to go well. As suggested by Mark Rustad, do something similar to
      what ixgbe does, and save a copy of hw_addr as adapter->io_addr, so we can
      still call pci_iounmap on it on teardown. Additionally, for consistency,
      make the pci_iomap call assignment directly to io_addr, so map and unmap
      match.
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      73bf8048
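
The pattern described in the last paragraph of this message, keeping a second copy of the mapping so teardown still works after the working pointer has been cleared, can be modelled in user space. In the sketch below, `malloc()` and `free()` stand in for `pci_iomap()` and `pci_iounmap()`, and `struct adapter_model` with its helper functions is hypothetical; only the field names `io_addr` and `hw_addr` come from the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the adapter's mapping state. */
struct adapter_model {
	void *io_addr;	/* saved mapping, kept for teardown */
	void *hw_addr;	/* working pointer, cleared when the link is lost */
};

/* Map: assign to io_addr first so map and unmap use the same field. */
static int adapter_map_model(struct adapter_model *a, size_t len)
{
	assert(a != NULL);
	a->io_addr = malloc(len);	/* stands in for pci_iomap() */
	if (!a->io_addr)
		return -1;
	a->hw_addr = a->io_addr;
	return 0;
}

/* Surprise removal: only the working pointer is cleared. */
static void adapter_mark_detached_model(struct adapter_model *a)
{
	assert(a != NULL);
	a->hw_addr = NULL;
}

/* Teardown: release via io_addr, which is still valid after a detach. */
static int adapter_unmap_model(struct adapter_model *a)
{
	assert(a != NULL);
	if (!a->io_addr)
		return -1;
	free(a->io_addr);		/* stands in for pci_iounmap() */
	a->io_addr = NULL;
	a->hw_addr = NULL;
	return 0;
}
```

In this model, a detach followed by teardown releases the mapping exactly once, whereas unmapping through `hw_addr` would have passed NULL.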
  10. 16 Oct, 2015 1 commit
    • J
      drivers/net/intel: use napi_complete_done() · 32b3e08f
      Jesse Brandeburg committed
      As per Eric Dumazet's previous patches:
      (see commit (24d2e4a5) - tg3: use napi_complete_done())
      
      Quoting verbatim:
      Using napi_complete_done() instead of napi_complete() allows
      us to use /sys/class/net/ethX/gro_flush_timeout
      
      GRO layer can aggregate more packets if the flush is delayed a bit,
      without having to set too big coalescing parameters that impact
      latencies.
      </end quote>
      
      Tested
      configuration: low latency via ethtool -C ethx adaptive-rx off
      				rx-usecs 10 adaptive-tx off tx-usecs 15
      workload: streaming rx using netperf TCP_MAERTS
      
      igb:
      MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
      ...
      Interim result:  941.48 10^6bits/s over 1.000 seconds ending at 1440193171.589
      
      Alignment      Offset         Bytes    Bytes       Recvs   Bytes    Sends
      Local  Remote  Local  Remote  Xfered   Per                 Per
      Recv   Send    Recv   Send             Recv (avg)          Send (avg)
          8       8      0       0 1176930056  1475.36    797726   16384.00  71905
      
      MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
      ...
      Interim result:  941.49 10^6bits/s over 0.997 seconds ending at 1440193142.763
      
      Alignment      Offset         Bytes    Bytes       Recvs   Bytes    Sends
      Local  Remote  Local  Remote  Xfered   Per                 Per
      Recv   Send    Recv   Send             Recv (avg)          Send (avg)
          8       8      0       0 1175182320  50476.00     23282   16384.00  71816
      
      i40e:
      Hard to test because the traffic is incoming so fast (24Gb/s) that GRO
      always receives 87kB, even at the highest interrupt rate.
      
      Other drivers were only compile tested.
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      32b3e08f
  11. 05 Oct, 2015 1 commit
  12. 29 Sep, 2015 1 commit
    • S
      igb: assume MSI-X interrupts during initialization · cbfe360a
      Stefan Assmann committed
      In igb_sw_init() the sequence of calls was changed from
      igb_init_queue_configuration()
      igb_init_interrupt_scheme()
      igb_probe_vfs()
      to
      igb_probe_vfs()
      igb_init_queue_configuration()
      igb_init_interrupt_scheme()
      
      This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set
      during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not
      get enabled properly and we run into a NULL pointer if the max_vfs
      module parameter is specified (adapter->vf_data does not get allocated,
      crash on accessing the structure).
      
      [    7.419348] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [    7.419367] IP: [<ffffffffa02161c6>] igb_reset+0xe6/0x5d0 [igb]
      [    7.419370] PGD 0
      [    7.419373] Oops: 0002 [#1] SMP
      [    7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
      [    7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
      [    7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
      [...]
      [    7.419431] Call Trace:
      [    7.419442]  [<ffffffffa0217236>] igb_probe+0x8b6/0x1340 [igb]
      [    7.419447]  [<ffffffff814c7f15>] local_pci_probe+0x45/0xa0
      
      Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling
      igb_probe_vfs(). The real interrupt capabilities will be checked during
      igb_init_interrupt_scheme() so this is safe to do.
      Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      cbfe360a
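
The ordering problem in this message can be reduced to a small model: the SR-IOV setup step does nothing unless a capability flag is already set, so the flag has to be assumed before that step runs. The sketch below is hypothetical throughout (`FLAG_HAS_MSIX_MODEL`, `struct sriov_adapter_model`, `struct vf_model` and both functions are illustrative names, with `calloc()` standing in for the kernel allocation) and deliberately ignores the later check of real interrupt capabilities.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FLAG_HAS_MSIX_MODEL 0x1u	/* placeholder flag value (assumption) */

struct vf_model {
	int vlan;
};

struct sriov_adapter_model {
	unsigned int flags;
	unsigned int vfs_allocated_count;
	struct vf_model *vf_data;
};

/* Models the setup step: a no-op unless the MSI-X flag is already set. */
static void enable_sriov_model(struct sriov_adapter_model *a,
			       unsigned int max_vfs)
{
	assert(a != NULL);
	if (!(a->flags & FLAG_HAS_MSIX_MODEL) || max_vfs == 0)
		return;
	a->vf_data = calloc(max_vfs, sizeof(*a->vf_data));
	if (a->vf_data)
		a->vfs_allocated_count = max_vfs;
}

/* Models the fix: assume MSI-X before probing the VFs. */
static void probe_vfs_model(struct sriov_adapter_model *a,
			    unsigned int max_vfs)
{
	assert(a != NULL);
	a->flags |= FLAG_HAS_MSIX_MODEL;
	enable_sriov_model(a, max_vfs);
}
```

Without the flag, `vf_data` stays NULL, which is the state that later led to the NULL pointer dereference shown in the trace.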
  13. 22 Aug, 2015 1 commit
    • M
      mm: make page pfmemalloc check more robust · 2f064f34
      Michal Hocko committed
      Commit c48a11c7 ("netvm: propagate page->pfmemalloc to skb") added
      checks for page->pfmemalloc to __skb_fill_page_desc():
      
              if (page->pfmemalloc && !page->mapping)
                      skb->pfmemalloc = true;
      
      It assumes page->mapping == NULL implies that page->pfmemalloc can be
      trusted.  However, __delete_from_page_cache() can set page->mapping
      to NULL and leave page->index value alone.  Due to being in union, a
      non-zero page->index will be interpreted as true page->pfmemalloc.
      
      So the assumption is invalid if the networking code can see such a page.
      And it seems it can.  We have encountered this with a NFS over loopback
      setup when such a page is attached to a new skbuf.  There is no copying
      going on in this case so the page confuses __skb_fill_page_desc which
      interprets the index as pfmemalloc flag and the network stack drops
      packets that have been allocated using the reserves unless they are to
      be queued on sockets handling the swapping which is the case here and
      that leads to hangs when the nfs client waits for a response from the
      server which has been dropped and thus never arrives.
      
      The struct page is already heavily packed so rather than finding another
      hole to put it in, let's do a trick instead.  We can reuse the index
      again but define it to an impossible value (-1UL).  This is the page
      index so it should never see the value that large.  Replace all direct
      users of page->pfmemalloc by page_is_pfmemalloc which will hide this
      nastiness from unspoiled eyes.
      
      The information will get lost if somebody wants to use page->index
      obviously but that was the case before and the original code expected
      that the information should be persisted somewhere else if that is
      really needed (e.g.  what SLAB and SLUB do).
      
      [akpm@linux-foundation.org: fix blooper in slub]
      Fixes: c48a11c7 ("netvm: propagate page->pfmemalloc to skb")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Debugged-by: Vlastimil Babka <vbabka@suse.com>
      Debugged-by: Jiri Bohac <jbohac@suse.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>	[3.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2f064f34
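
The sentinel trick described in this message can be shown with a reduced model of the page structure. The sketch below is a user-space illustration under assumptions: `struct pf_page_model` keeps only the two relevant fields, `old_check_model()` reproduces the aliasing problem by treating any non-zero index as the flag, and the `*_model` helpers mirror the accessor approach the message describes without being the kernel functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced page: index doubles as storage for the flag. */
struct pf_page_model {
	void *mapping;
	unsigned long index;
};

/* Old behaviour: any non-zero index looks like a set pfmemalloc flag. */
static bool old_check_model(const struct pf_page_model *page)
{
	assert(page != NULL);
	return page->index != 0 && page->mapping == NULL;
}

/* New behaviour: only the impossible index value -1UL means pfmemalloc. */
static bool page_is_pfmemalloc_model(const struct pf_page_model *page)
{
	assert(page != NULL);
	return page->index == -1UL;
}

static void set_page_pfmemalloc_model(struct pf_page_model *page)
{
	assert(page != NULL);
	page->index = -1UL;
}

static void clear_page_pfmemalloc_model(struct pf_page_model *page)
{
	assert(page != NULL);
	page->index = 0;
}
```

A page that has just left the page cache, with `mapping` set to NULL and a stale non-zero `index`, is misread by the old check but not by the sentinel comparison.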
  14. 19 Aug, 2015 3 commits