1. 09 Oct, 2015 (4 commits)
  2. 08 Oct, 2015 (6 commits)
  3. 07 Oct, 2015 (1 commit)
  4. 06 Oct, 2015 (2 commits)
  5. 05 Oct, 2015 (4 commits)
  6. 03 Oct, 2015 (3 commits)
    • phylib: Add phy_set_max_speed helper · f3a6bd39
      Committed by Simon Horman
      Add a helper to allow ethernet drivers to limit the speed of a phy
      (that they are attached to).
      
      This mainly involves factoring out the business-end of
      of_set_phy_supported() and exporting a new symbol.
      
      This code seems to be open coded in several places, in several different
      variants.
      
      It is envisaged that this will be used in situations where setting the
      "max-speed" property in DT is not appropriate, e.g. because the maximum
      speed is not a property of the phy hardware.
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3a6bd39
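
      A minimal usage sketch of the helper described above, assuming the
      signature int phy_set_max_speed(struct phy_device *phydev, u32 max_speed)
      from the commit text; the surrounding driver function foo_limit_phy() is
      hypothetical, not part of the patch:

          /* Hypothetical MAC driver fragment: the MAC can only do 100 Mbit/s,
           * so cap the attached PHY instead of relying on a DT "max-speed"
           * property (which would wrongly describe the PHY hardware). */
          #include <linux/netdevice.h>
          #include <linux/phy.h>

          static int foo_limit_phy(struct net_device *ndev,
                                   struct phy_device *phydev)
          {
                  int err;

                  err = phy_set_max_speed(phydev, SPEED_100);
                  if (err)
                          netdev_warn(ndev, "could not limit PHY speed: %d\n", err);

                  return err;
          }
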
    • sched, bpf: add helper for retrieving routing realms · c46646d0
      Committed by Daniel Borkmann
      Using routing realms as part of the classifier is quite useful: a realm
      can be viewed as a tag for one or multiple routing entries (think of
      an analogy to the net_cls cgroup for processes), set by user space routing
      daemons or via iproute2 as an indicator for traffic classifiers and
      later on processed in the eBPF program.
      
      Unlike actions, the classifier can inspect device flags and enable
      netif_keep_dst() if necessary. tc actions don't have that possibility,
      but if people know what they are doing, the helper can be used from there
      as well (e.g. via devs that must keep dsts by design anyway).
      
      If a realm is set, the handler returns the non-zero realm. User space
      can set the full 32-bit realm for the dst.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c46646d0
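
      A hedged sketch of how a cls_bpf classifier might consume the realm via
      the new helper. The BPF_FUNC_get_route_realm id comes from this series;
      the section name and the restricted-C helper declaration are assumptions
      about a typical tc/eBPF build, not the exact program from the patch:

          #include <linux/bpf.h>

          #ifndef __section
          # define __section(x) __attribute__((section(x), used))
          #endif

          /* Declare the helper by its function id, as restricted-C tc
           * programs commonly do. */
          static unsigned int (*bpf_get_route_realm)(void *ctx) =
                  (void *) BPF_FUNC_get_route_realm;

          /* cls_bpf classifier: use the dst's routing realm, set by a routing
           * daemon or iproute2, directly as the class id; 0 = unclassified. */
          __section("classifier")
          int cls_realm(struct __sk_buff *skb)
          {
                  return bpf_get_route_realm(skb);
          }

          char __license[] __section("license") = "GPL";
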
    • ebpf: migrate bpf_prog's flags to bitfield · a91263d5
      Committed by Daniel Borkmann
      As we need to add further flags to the bpf_prog structure, let's migrate
      both bools to a bitfield representation. The size of the base structure
      (excluding insns) remains unchanged at 40 bytes.
      
      Also add tags for kmemcheck, so that it doesn't throw false
      positives. Even if gcc were to generate suboptimal code, the bitfield is
      not accessed in performance-critical paths.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a91263d5
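
      The change boils down to packing two bools into single-bit fields. A
      small userspace analogue (not the kernel struct itself; field names
      borrowed from the commit context) shows the idea:

          #include <stdbool.h>
          #include <stdint.h>
          #include <stdio.h>

          /* Before: two bools, one byte each (plus any padding). */
          struct prog_before {
                  uint16_t pages;
                  bool     jited;          /* is the filter JIT'ed? */
                  bool     gpl_compatible; /* is the filter GPL compatible? */
          };

          /* After: the same flags as a bitfield, leaving room to add further
           * flag bits without growing the structure. */
          struct prog_after {
                  uint16_t pages;
                  uint16_t jited:1,
                           gpl_compatible:1;
          };

          int main(void)
          {
                  printf("before: %zu bytes, after: %zu bytes\n",
                         sizeof(struct prog_before), sizeof(struct prog_after));
                  return 0;
          }
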
  7. 02 Oct, 2015 (2 commits)
    • memcg: remove pcp_counter_lock · ef510194
      Committed by Greg Thelen
      Commit 733a572e ("memcg: make mem_cgroup_read_{stat|event}() iterate
      possible cpus instead of online") removed the last use of the per memcg
      pcp_counter_lock but forgot to remove the variable.
      
      Kill the vestigial variable.
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ef510194
    • memcg: fix dirty page migration · 0610c25d
      Committed by Greg Thelen
      The problem starts with a file backed dirty page which is charged to a
      memcg.  Then page migration is used to move oldpage to newpage.
      
      Migration:
       - copies the oldpage's data to newpage
       - clears oldpage.PG_dirty
       - sets newpage.PG_dirty
       - uncharges oldpage from memcg
       - charges newpage to memcg
      
      Clearing oldpage.PG_dirty decrements the charged memcg's dirty page
      count.
      
      However, because newpage is not yet charged, setting newpage.PG_dirty
      does not increment the memcg's dirty page count.  After migration
      completes newpage.PG_dirty is eventually cleared, often in
      account_page_cleaned().  At this time newpage is charged to a memcg so
      the memcg's dirty page count is decremented which causes underflow
      because the count was not previously incremented by migration.  This
      underflow causes balance_dirty_pages() to see a very large unsigned
      number of dirty memcg pages which leads to aggressive throttling of
      buffered writes by processes in a non-root memcg.
      
      This issue:
       - can harm performance of non-root memcg buffered writes.
       - can report too small (even negative) values in
         memory.stat[(total_)dirty] counters of all memcg, including the root.
      
      To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
      page_memcg() and set_page_memcg() helpers.
      
      Test:
          0) setup and enter limited memcg
          mkdir /sys/fs/cgroup/test
          echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
          echo $$ > /sys/fs/cgroup/test/cgroup.procs
      
          1) buffered writes baseline
          dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
          sync
          grep ^dirty /sys/fs/cgroup/test/memory.stat
      
          2) buffered writes with compaction antagonist to induce migration
          yes 1 > /proc/sys/vm/compact_memory &
          rm -rf /data/tmp/foo
          dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
          kill %
          sync
          grep ^dirty /sys/fs/cgroup/test/memory.stat
      
          3) buffered writes without antagonist, should match baseline
          rm -rf /data/tmp/foo
          dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
          sync
          grep ^dirty /sys/fs/cgroup/test/memory.stat
      
                             (speed, dirty residue)
                   unpatched                       patched
          1) 841 MB/s 0 dirty pages          886 MB/s 0 dirty pages
          2) 611 MB/s -33427456 dirty pages  793 MB/s 0 dirty pages
          3) 114 MB/s -33427456 dirty pages  891 MB/s 0 dirty pages
      
          Notice that unpatched baseline performance (1) fell after
          migration (3): 841 -> 114 MB/s.  In the patched kernel, post
          migration performance matches baseline.
      
      Fixes: c4843a75 ("memcg: add per cgroup dirty page accounting")
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Reported-by: Dave Hansen <dave.hansen@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[4.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0610c25d
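
      A hedged sketch of the two helpers named above and of how a migration
      copy path could use them so that newpage's PG_dirty is accounted against
      the same memcg that oldpage's was. The page->mem_cgroup field name and
      the illustrative migrate_dirty_sketch() function are assumptions, not the
      verbatim patch:

          #ifdef CONFIG_MEMCG
          static inline struct mem_cgroup *page_memcg(struct page *page)
          {
                  return page->mem_cgroup;
          }
          static inline void set_page_memcg(struct page *page,
                                            struct mem_cgroup *memcg)
          {
                  page->mem_cgroup = memcg;
          }
          #else
          static inline struct mem_cgroup *page_memcg(struct page *page)
          {
                  return NULL;
          }
          static inline void set_page_memcg(struct page *page,
                                            struct mem_cgroup *memcg)
          {
          }
          #endif

          /* Illustrative migration fragment: associate newpage with the memcg
           * before the dirty flag is transferred, so the memcg dirty counter
           * stays balanced across migration. */
          static void migrate_dirty_sketch(struct page *oldpage,
                                           struct page *newpage)
          {
                  set_page_memcg(newpage, page_memcg(oldpage));
                  /* ... copy data, clear oldpage's PG_dirty,
                   *     set newpage's PG_dirty ... */
          }
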
  8. 30 Sep, 2015 (11 commits)
  9. 29 Sep, 2015 (1 commit)
  10. 26 Sep, 2015 (1 commit)
  11. 25 Sep, 2015 (5 commits)
    • IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY · c6790aa9
      Committed by Sagi Grimberg
      Commit 96249d70 ("IB/core: Guarantee that a local_dma_lkey
      is available") allows ULPs that make use of the local dma key to keep
      working as before by allocating a DMA MR with local permissions, and
      converts these consumers to use the MR associated with the PD
      rather than device->local_dma_lkey.
      
      ConnectIB has some known issues with memory registration
      using the local_dma_lkey (SEND, RDMA and RECV seem to work ok).
      
      Thus don't expose support for it (remove device->local_dma_lkey
      setting), and take advantage of the above commit such that no regression
      is introduced to working systems.
      
      The local_dma_lkey support will be restored in CX4 depending on FW
      capability query.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      c6790aa9
    • phy: add phy_device_remove() · 38737e49
      Committed by Russell King
      Add a phy_device_remove() function to complement phy_device_register(),
      which undoes the effects of phy_device_register() by removing the phy
      device from visibility, but not freeing it.
      
      This allows these details to be moved out of the mdio bus code into
      the phy code where this action belongs.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      38737e49
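
      A hedged sketch of the intended register/remove pairing as seen from a
      bus driver; the example_* wrapper functions are hypothetical and error
      handling is abbreviated:

          /* Create and register a PHY, and later tear it down: removal makes
           * the device invisible again, freeing is a separate, final step. */
          static struct phy_device *example_add_phy(struct mii_bus *bus, int addr)
          {
                  struct phy_device *phydev = get_phy_device(bus, addr, false);

                  if (IS_ERR_OR_NULL(phydev))
                          return NULL;
                  if (phy_device_register(phydev)) {
                          phy_device_free(phydev);
                          return NULL;
                  }
                  return phydev;
          }

          static void example_del_phy(struct phy_device *phydev)
          {
                  phy_device_remove(phydev);  /* undo phy_device_register() */
                  phy_device_free(phydev);    /* drop the reference */
          }
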
    • phy: fix mdiobus module safety · 3e3aaf64
      Committed by Russell King
      Re-implement the mdiobus module refcounting so that we actually
      ensure that the mdiobus module code does not go away while we might
      call into it.
      
      The old scheme using bus->dev.driver was buggy, because bus->dev is a
      class device which never has a struct device_driver associated with it,
      and hence the associated code trying to obtain a refcount did nothing
      useful.
      
      Instead, take the approach that other subsystems do: pass the module
      when calling mdiobus_register(), and record that in the mii_bus struct.
      When we need to increment the module use count in the phy code, use
      this stored pointer.  When the phy is detached, drop the module
      refcount, remembering that the phy device might go away at that point.
      
      This doesn't stop the mii_bus going away while there are in-use phys -
      it merely stops the underlying code vanishing.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3e3aaf64
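
      A hedged sketch of the pattern described: record the providing module at
      registration time and pin it for the duration of a PHY attachment. The
      __mdiobus_register()/owner naming follows the commit's "pass the module
      when calling mdiobus_register()" description, but the exact identifiers
      are assumptions:

          /* Bus drivers keep calling mdiobus_register(); THIS_MODULE is
           * captured for them and stored in the mii_bus (assumed field name
           * "owner"). */
          #define mdiobus_register(bus) __mdiobus_register(bus, THIS_MODULE)

          int __mdiobus_register(struct mii_bus *bus, struct module *owner)
          {
                  bus->owner = owner;
                  /* ... the pre-existing registration work ... */
                  return 0;
          }

          /* phy attach path: make sure the code behind the bus cannot be
           * unloaded while we might call into it. */
          if (!try_module_get(bus->owner))
                  return -EIO;

          /* phy detach path: drop the refcount; the phy_device may go away
           * once the module reference is released. */
          module_put(bus->owner);
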
    • skbuff: Fix skb checksum flag on skb pull · 6ae459bd
      Committed by Pravin B Shelar
      A VXLAN device can receive an skb with a partial checksum. But the
      checksum offset could be in the outer header, which is pulled on receive.
      This results in a negative checksum offset for the skb. Such an skb can
      cause the assert failure in skb_checksum_help(). The following patch
      fixes the bug by setting the checksum to CHECKSUM_NONE while pulling the
      outer header.
      
      The following is the kernel panic message from an old kernel hitting
      the bug.
      
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:1906!
      RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150
      Call Trace:
      <IRQ>
      [<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch]
      [<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch]
      [<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch]
      [<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch]
      [<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
      [<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch]
      [<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch]
      [<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0
      [<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0
      [<ffffffff8157ba7a>] udp_rcv+0x1a/0x20
      [<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280
      [<ffffffff81550128>] ip_local_deliver+0x88/0x90
      [<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370
      [<ffffffff81550365>] ip_rcv+0x235/0x300
      [<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620
      [<ffffffff8151c360>] netif_receive_skb+0x80/0x90
      [<ffffffff81459935>] virtnet_poll+0x555/0x6f0
      [<ffffffff8151cd04>] net_rx_action+0x134/0x290
      [<ffffffff810683d8>] __do_softirq+0xa8/0x210
      [<ffffffff8162fe6c>] call_softirq+0x1c/0x30
      [<ffffffff810161a5>] do_softirq+0x65/0xa0
      [<ffffffff810687be>] irq_exit+0x8e/0xb0
      [<ffffffff81630733>] do_IRQ+0x63/0xe0
      [<ffffffff81625f2e>] common_interrupt+0x6e/0x6e
      Reported-by: Anupam Chanda <achanda@vmware.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6ae459bd
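
      A hedged sketch of the kind of check the fix describes: after the outer
      header has been pulled, a CHECKSUM_PARTIAL skb whose checksum start now
      lies before skb->data can no longer be offloaded, so it is downgraded to
      CHECKSUM_NONE. The helper name here is illustrative, not the verbatim
      patch:

          static inline void example_postpull_csum_check(struct sk_buff *skb)
          {
                  /* A negative checksum start offset means csum_start points
                   * into the outer header that was just pulled off; honouring
                   * CHECKSUM_PARTIAL is impossible, so fall back to no offload
                   * instead of tripping the BUG in skb_checksum_help(). */
                  if (skb->ip_summed == CHECKSUM_PARTIAL &&
                      skb_checksum_start_offset(skb) < 0)
                          skb->ip_summed = CHECKSUM_NONE;
          }
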
    • cgroup, writeback: don't enable cgroup writeback on traditional hierarchies · 9badce00
      Committed by Tejun Heo
      inode_cgwb_enabled() gates cgroup writeback support.  If it returns
      true, each inode is attached to the corresponding memory domain which
      gets mapped to io domain.  It currently only tests whether the
      filesystem and bdi support cgroup writeback; however, cgroup writeback
      support doesn't work on traditional hierarchies and thus it should
      also test whether memcg and iocg are on the default hierarchy.
      
      This caused traditional hierarchy setups to hit the cgroup writeback
      path inadvertently, ending up creating separate writeback domains
      for each memcg and mapping them all to the root iocg, which uncovered
      a couple of issues in the cgroup writeback path.
      
      cgroup writeback was never meant to be enabled on traditional
      hierarchies.  Make inode_cgwb_enabled() test whether both memcg and
      iocg are on the default hierarchy.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Artem Bityutskiy <dedekind1@gmail.com>
      Reported-by: Dexuan Cui <decui@microsoft.com>
      Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
      Link: http://lkml.kernel.org/g/f30d4a6aa8a546ff88f73021d026a453@SIXPR30MB031.064d.mgd.msft.net
      9badce00
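
      A hedged sketch of the strengthened gate: in addition to the existing
      bdi/filesystem checks, cgroup writeback is only reported as enabled when
      both the memory and io controllers sit on the default (v2) hierarchy.
      The exact predicate and capability names are assumptions about the
      kernel of that era, not the verbatim patch:

          static inline bool inode_cgwb_enabled(struct inode *inode)
          {
                  struct backing_dev_info *bdi = inode_to_bdi(inode);

                  /* Only engage cgroup writeback when memcg and iocg are
                   * both on the default hierarchy. */
                  return cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
                         cgroup_subsys_on_dfl(io_cgrp_subsys) &&
                         bdi_cap_account_dirty(bdi) &&
                         (bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
                         (inode->i_sb->s_iflags & SB_I_CGROUPWB);
          }
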