1. 10 10月, 2014 12 次提交
    • C
      mm: introduce check_data_rlimit helper · 9c599024
      Cyrill Gorcunov 提交于
      To eliminate code duplication lets introduce check_data_rlimit helper
      which we will use in brk() and prctl() syscalls.
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Vagin <avagin@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Julien Tinnes <jln@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c599024
    • D
      mm: rename allocflags_to_migratetype for clarity · 43e7a34d
      David Rientjes 提交于
      The page allocator has gfp flags (like __GFP_WAIT) and alloc flags (like
      ALLOC_CPUSET) that have separate semantics.
      
      The function allocflags_to_migratetype() actually takes gfp flags, not
      alloc flags, and returns a migratetype.  Rename it to
      gfpflags_to_migratetype().
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43e7a34d
    • V
      mm, compaction: khugepaged should not give up due to need_resched() · 1f9efdef
      Vlastimil Babka 提交于
      Async compaction aborts when it detects zone lock contention or
      need_resched() is true.  David Rientjes has reported that in practice,
      most direct async compactions for THP allocation abort due to
      need_resched().  This means that a second direct compaction is never
      attempted, which might be OK for a page fault, but khugepaged is intended
      to attempt a sync compaction in such case and in these cases it won't.
      
      This patch replaces "bool contended" in compact_control with an int that
      distinguishes between aborting due to need_resched() and aborting due to
      lock contention.  This allows propagating the abort through all compaction
      functions as before, but passing the abort reason up to
      __alloc_pages_slowpath() which decides when to continue with direct
      reclaim and another compaction attempt.
      
      Another problem is that try_to_compact_pages() did not act upon the
      reported contention (both need_resched() or lock contention) immediately
      and would proceed with another zone from the zonelist.  When
      need_resched() is true, that means initializing another zone compaction,
      only to check again need_resched() in isolate_migratepages() and aborting.
       For zone lock contention, the unintended consequence is that the lock
      contended status reported back to the allocator is detrmined from the last
      zone where compaction was attempted, which is rather arbitrary.
      
      This patch fixes the problem in the following way:
      - async compaction of a zone aborting due to need_resched() or fatal signal
        pending means that further zones should not be tried. We report
        COMPACT_CONTENDED_SCHED to the allocator.
      - aborting zone compaction due to lock contention means we can still try
        another zone, since it has different set of locks. We report back
        COMPACT_CONTENDED_LOCK only if *all* zones where compaction was attempted,
        it was aborted due to lock contention.
      
      As a result of these fixes, khugepaged will proceed with second sync
      compaction as intended, when the preceding async compaction aborted due to
      need_resched().  Page fault compactions aborting due to need_resched()
      will spare some cycles previously wasted by initializing another zone
      compaction only to abort again.  Lock contention will be reported only
      when compaction in all zones aborted due to lock contention, and therefore
      it's not a good idea to try again after reclaim.
      
      In stress-highalloc from mmtests configured to use __GFP_NO_KSWAPD, this
      has improved number of THP collapse allocations by 10%, which shows
      positive effect on khugepaged.  The benchmark's success rates are
      unchanged as it is not recognized as khugepaged.  Numbers of compact_stall
      and compact_fail events have however decreased by 20%, with
      compact_success still a bit improved, which is good.  With benchmark
      configured not to use __GFP_NO_KSWAPD, there is 6% improvement in THP
      collapse allocations, and only slight improvement in stalls and failures.
      
      [akpm@linux-foundation.org: fix warnings]
      Reported-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f9efdef
    • V
      mm, compaction: defer each zone individually instead of preferred zone · 53853e2d
      Vlastimil Babka 提交于
      When direct sync compaction is often unsuccessful, it may become deferred
      for some time to avoid further useless attempts, both sync and async.
      Successful high-order allocations un-defer compaction, while further
      unsuccessful compaction attempts prolong the compaction deferred period.
      
      Currently the checking and setting deferred status is performed only on
      the preferred zone of the allocation that invoked direct compaction.  But
      compaction itself is attempted on all eligible zones in the zonelist, so
      the behavior is suboptimal and may lead both to scenarios where 1)
      compaction is attempted uselessly, or 2) where it's not attempted despite
      good chances of succeeding, as shown on the examples below:
      
      1) A direct compaction with Normal preferred zone failed and set
         deferred compaction for the Normal zone.  Another unrelated direct
         compaction with DMA32 as preferred zone will attempt to compact DMA32
         zone even though the first compaction attempt also included DMA32 zone.
      
         In another scenario, compaction with Normal preferred zone failed to
         compact Normal zone, but succeeded in the DMA32 zone, so it will not
         defer compaction.  In the next attempt, it will try Normal zone which
         will fail again, instead of skipping Normal zone and trying DMA32
         directly.
      
      2) Kswapd will balance DMA32 zone and reset defer status based on
         watermarks looking good.  A direct compaction with preferred Normal
         zone will skip compaction of all zones including DMA32 because Normal
         was still deferred.  The allocation might have succeeded in DMA32, but
         won't.
      
      This patch makes compaction deferring work on individual zone basis
      instead of preferred zone.  For each zone, it checks compaction_deferred()
      to decide if the zone should be skipped.  If watermarks fail after
      compacting the zone, defer_compaction() is called.  The zone where
      watermarks passed can still be deferred when the allocation attempt is
      unsuccessful.  When allocation is successful, compaction_defer_reset() is
      called for the zone containing the allocated page.  This approach should
      approximate calling defer_compaction() only on zones where compaction was
      attempted and did not yield allocated page.  There might be corner cases
      but that is inevitable as long as the decision to stop compacting dues not
      guarantee that a page will be allocated.
      
      Due to a new COMPACT_DEFERRED return value, some functions relying
      implicitly on COMPACT_SKIPPED = 0 had to be updated, with comments made
      more accurate.  The did_some_progress output parameter of
      __alloc_pages_direct_compact() is removed completely, as the caller
      actually does not use it after compaction sets it - it is only considered
      when direct reclaim sets it.
      
      During testing on a two-node machine with a single very small Normal zone
      on node 1, this patch has improved success rates in stress-highalloc
      mmtests benchmark.  The success here were previously made worse by commit
      3a025760 ("mm: page_alloc: spill to remote nodes before waking
      kswapd") as kswapd was no longer resetting often enough the deferred
      compaction for the Normal zone, and DMA32 zones on both nodes were thus
      not considered for compaction.  On different machine, success rates were
      improved with __GFP_NO_KSWAPD allocations.
      
      [akpm@linux-foundation.org: fix CONFIG_COMPACTION=n build]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53853e2d
    • L
      lib/genalloc.c: add genpool range check function · 9efb3a42
      Laura Abbott 提交于
      After allocating an address from a particular genpool, there is no good
      way to verify if that address actually belongs to a genpool.  Introduce
      addr_in_gen_pool which will return if an address plus size falls
      completely within the genpool range.
      Signed-off-by: NLaura Abbott <lauraa@codeaurora.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Reviewed-by: NOlof Johansson <olof@lixom.net>
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: David Riley <davidriley@chromium.org>
      Cc: Ritesh Harjain <ritesh.harjani@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thierry Reding <thierry.reding@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9efb3a42
    • L
      lib/genalloc.c: add power aligned algorithm · 505e3be6
      Laura Abbott 提交于
      One of the more common algorithms used for allocation is to align the
      start address of the allocation to the order of size requested.  Add this
      as an algorithm option for genalloc.
      Signed-off-by: NLaura Abbott <lauraa@codeaurora.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NOlof Johansson <olof@lixom.net>
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: David Riley <davidriley@chromium.org>
      Cc: Ritesh Harjain <ritesh.harjani@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thierry Reding <thierry.reding@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      505e3be6
    • Z
      memory-hotplug: add sysfs valid_zones attribute · ed2f2400
      Zhang Zhen 提交于
      Currently memory-hotplug has two limits:
      
      1. If the memory block is in ZONE_NORMAL, you can change it to
         ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
      
      2. If the memory block is in ZONE_MOVABLE, you can change it to
         ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.
      
      With this patch, we can easy to know a memory block can be onlined to
      which zone, and don't need to know the above two limits.
      
      Updated the related Documentation.
      
      [akpm@linux-foundation.org: use conventional comment layout]
      [akpm@linux-foundation.org: fix build with CONFIG_MEMORY_HOTREMOVE=n]
      [akpm@linux-foundation.org: remove unused local zone_prev]
      Signed-off-by: NZhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed2f2400
    • J
      mm/slab: use percpu allocator for cpu cache · bf0dea23
      Joonsoo Kim 提交于
      Because of chicken and egg problem, initialization of SLAB is really
      complicated.  We need to allocate cpu cache through SLAB to make the
      kmem_cache work, but before initialization of kmem_cache, allocation
      through SLAB is impossible.
      
      On the other hand, SLUB does initialization in a more simple way.  It uses
      percpu allocator to allocate cpu cache so there is no chicken and egg
      problem.
      
      So, this patch try to use percpu allocator in SLAB.  This simplifies the
      initialization step in SLAB so that we could maintain SLAB code more
      easily.
      
      In my testing there is no performance difference.
      
      This implementation relies on percpu allocator.  Because percpu allocator
      uses vmalloc address space, vmalloc address space could be exhausted by
      this change on many cpu system with *32 bit* kernel.  This implementation
      can cover 1024 cpus in worst case by following calculation.
      
      Worst: 1024 cpus * 4 bytes for pointer * 300 kmem_caches *
      	120 objects per cpu_cache = 140 MB
      Normal: 1024 cpus * 4 bytes for pointer * 150 kmem_caches(slab merge) *
      	80 objects per cpu_cache = 46 MB
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jeremiah Mahler <jmmahler@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bf0dea23
    • J
      topology: add support for node_to_mem_node() to determine the fallback node · ad2c8144
      Joonsoo Kim 提交于
      Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that
      on ppc LPARs with memoryless nodes, a large amount of memory was consumed
      by slabs and was marked unreclaimable.  He tracked it down to slab
      deactivations in the SLUB core when we allocate remotely, leading to poor
      efficiency always when memoryless nodes are present.
      
      After much discussion, Joonsoo provided a few patches that help
      significantly.  They don't resolve the problem altogether:
      
       - memory hotplug still needs testing, that is when a memoryless node
         becomes memory-ful, we want to dtrt
       - there are other reasons for going off-node than memoryless nodes,
         e.g., fully exhausted local nodes
      
      Neither case is resolved with this series, but I don't think that should
      block their acceptance, as they can be explored/resolved with follow-on
      patches.
      
      The series consists of:
      
      [1/3] topology: add support for node_to_mem_node() to determine the
            fallback node
      
      [2/3] slub: fallback to node_to_mem_node() node if allocating on
            memoryless node
      
            - Joonsoo's patches to cache the nearest node with memory for each
              NUMA node
      
      [3/3] Partial revert of 81c98869 (""kthread: ensure locality of
            task_struct allocations")
      
       - At Tejun's request, keep the knowledge of memoryless node fallback
         to the allocator core.
      
      This patch (of 3):
      
      We need to determine the fallback node in slub allocator if the allocation
      target node is memoryless node.  Without it, the SLUB wrongly select the
      node which has no memory and can't use a partial slab, because of node
      mismatch.  Introduced function, node_to_mem_node(X), will return a node Y
      with memory that has the nearest distance.  If X is memoryless node, it
      will return nearest distance node, but, if X is normal node, it will
      return itself.
      
      We will use this function in following patch to determine the fallback
      node.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ad2c8144
    • J
      mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller() · 61f47105
      Joonsoo Kim 提交于
      Now, we track caller if tracing or slab debugging is enabled.  If they are
      disabled, we could save one argument passing overhead by calling
      __kmalloc(_node)().  But, I think that it would be marginal.  Furthermore,
      default slab allocator, SLUB, doesn't use this technique so I think that
      it's okay to change this situation.
      
      After this change, we can turn on/off CONFIG_DEBUG_SLAB without full
      kernel build and remove some complicated '#if' defintion.  It looks more
      benefitial to me.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      61f47105
    • J
      mm/slab_common: move kmem_cache definition to internal header · 07f361b2
      Joonsoo Kim 提交于
      We don't need to keep kmem_cache definition in include/linux/slab.h if we
      don't need to inline kmem_cache_size().  According to my code inspection,
      this function is only called at lc_create() in lib/lru_cache.c which may
      be called at initialization phase of something, so we don't need to inline
      it.  Therfore, move it to slab_common.c and move kmem_cache definition to
      internal header.
      
      After this change, we can change kmem_cache definition easily without full
      kernel build.  For instance, we can turn on/off CONFIG_SLUB_STATS without
      full kernel build.
      
      [akpm@linux-foundation.org: export kmem_cache_size() to modules]
      [rdunlap@infradead.org: add header files to fix kmemcheck.c build errors]
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      07f361b2
    • O
      proc/maps: make vm_is_stack() logic namespace-friendly · 58cb6548
      Oleg Nesterov 提交于
      - Rename vm_is_stack() to task_of_stack() and change it to return
        "struct task_struct *" rather than the global (and thus wrong in
        general) pid_t.
      
      - Add the new pid_of_stack() helper which calls task_of_stack() and
        uses the right namespace to report the correct pid_t.
      
        Unfortunately we need to define this helper twice, in task_mmu.c
        and in task_nommu.c. perhaps it makes sense to add fs/proc/util.c
        and move at least pid_of_stack/task_of_stack there to avoid the
        code duplication.
      
      - Change show_map_vma() and show_numa_map() to use the new helper.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58cb6548
  2. 09 10月, 2014 3 次提交
  3. 08 10月, 2014 2 次提交
    • E
      net: better IFF_XMIT_DST_RELEASE support · 02875878
      Eric Dumazet 提交于
      Testing xmit_more support with netperf and connected UDP sockets,
      I found strange dst refcount false sharing.
      
      Current handling of IFF_XMIT_DST_RELEASE is not optimal.
      
      Dropping dst in validate_xmit_skb() is certainly too late in case
      packet was queued by cpu X but dequeued by cpu Y
      
      The logical point to take care of drop/force is in __dev_queue_xmit()
      before even taking qdisc lock.
      
      As Julian Anastasov pointed out, need for skb_dst() might come from some
      packet schedulers or classifiers.
      
      This patch adds new helper to cleanly express needs of various drivers
      or qdiscs/classifiers.
      
      Drivers that need skb_dst() in their ndo_start_xmit() should call
      following helper in their setup instead of the prior :
      
      	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
      ->
      	netif_keep_dst(dev);
      
      Instead of using a single bit, we use two bits, one being
      eventually rebuilt in bonding/team drivers.
      
      The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
      rebuilt in bonding/team. Eventually, we could add something
      smarter later.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02875878
    • P
      net: phy: adjust fixed_phy_register() return value · fd2ef0ba
      Petri Gynther 提交于
      Adjust fixed_phy_register() to return struct phy_device *, so that
      it becomes easy to use fixed PHYs without device tree support:
      
        phydev = fixed_phy_register(PHY_POLL, &fixed_phy_status, NULL);
        fixed_phy_set_link_update(phydev, fixed_phy_link_update);
        phy_connect_direct(netdev, phydev, handler_fn, phy_interface);
      
      This change is a prerequisite for modifying bcmgenet driver to work
      without a device tree on Broadcom's MIPS-based 7xxx platforms.
      Signed-off-by: NPetri Gynther <pgynther@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd2ef0ba
  4. 07 10月, 2014 1 次提交
  5. 06 10月, 2014 1 次提交
  6. 05 10月, 2014 1 次提交
    • V
      net: Cleanup skb cloning by adding SKB_FCLONE_FREE · c8753d55
      Vijay Subramanian 提交于
      SKB_FCLONE_UNAVAILABLE has overloaded meaning depending on type of skb.
      1: If skb is allocated from head_cache, it indicates fclone is not available.
      2: If skb is a companion fclone skb (allocated from fclone_cache), it indicates
      it is available to be used.
      
      To avoid confusion for case 2 above, this patch  replaces
      SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone
      companion skbs, this indicates it is free for use.
      
      SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and
      cannot / will not have a companion fclone.
      Signed-off-by: NVijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8753d55
  7. 04 10月, 2014 7 次提交
  8. 02 10月, 2014 3 次提交
    • P
      d068b02c
    • T
      udp: Generalize skb_udp_segment · 8bce6d7d
      Tom Herbert 提交于
      skb_udp_segment is the function called from udp4_ufo_fragment to
      segment a UDP tunnel packet. This function currently assumes
      segmentation is transparent Ethernet bridging (i.e. VXLAN
      encapsulation). This patch generalizes the function to
      operate on either Ethertype or IP protocol.
      
      The inner_protocol field must be set to the protocol of the inner
      header. This can now be either an Ethertype or an IP protocol
      (in a union). A new flag in the skbuff indicates which type is
      effective. skb_set_inner_protocol and skb_set_inner_ipproto
      helper functions were added to set the inner_protocol. These
      functions are called from the point where the tunnel encapsulation
      is occuring.
      
      When skb_udp_tunnel_segment is called, the function to segment the
      inner packet is selected based on the inner IP or Ethertype. In the
      case of an IP protocol encapsulation, the function is derived from
      inet[6]_offloads. In the case of Ethertype, skb->protocol is
      set to the inner_protocol and skb_mac_gso_segment is called. (GRE
      currently does this, but it might be possible to lookup the protocol
      in offload_base and call the appropriate segmenation function
      directly).
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8bce6d7d
    • E
      net: cleanup and document skb fclone layout · d0bf4a9e
      Eric Dumazet 提交于
      Lets use a proper structure to clearly document and implement
      skb fast clones.
      
      Then, we might experiment more easily alternative layouts.
      
      This patch adds a new skb_fclone_busy() helper, used by tcp and xfrm,
      to stop leaking of implementation details.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0bf4a9e
  9. 01 10月, 2014 3 次提交
    • B
      HID: wacom: implement generic HID handling for pen generic devices · 7704ac93
      Benjamin Tissoires 提交于
      ISDv4 and v5 are plain HID devices. We can directly implement a generic
      HID parsing/handling and remove the need to manually add those PID in
      the list of supported devices.
      
      This patch implements the pen support only. The finger part will come in
      a later patch.
      
      To be properly notified of an .event() and a .report(), we need to force
      hid-core to go through the HID parsing. By default, wacom.ko binds only
      hidraw, so the hid parsing is not done by hid-core. When a true HID device
      is there, we add the flag HID_CLAIMED_DRIVER to hid->claimed which will
      force hid-core to parse the incoming reports.
      (Note that this can be easily backported by directly setting the .claimed
      flag to HID_CLAIMED_DRIVER even if hid-core does not support
      HID_CONNECT_DRIVER)
      Signed-off-by: NBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Acked-by: NJason Gerecke <killertofu@gmail.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      7704ac93
    • M
      net/mlx4_core: New init and exit flow for mlx4_core · e1c00e10
      Majd Dibbiny 提交于
      In the new flow, we separate the pci initialization and teardown
      from the initialization and teardown of the other resources.
      
      __mlx4_init_one handles the pci resources initialization. It then
      calls mlx4_load_one to initialize the remainder of the resources.
      
      When removing a device, mlx4_remove_one is invoked. However, now
      mlx4_remove_one calls mlx4_unload_one to free all the resources except the pci
      resources. When mlx4_unload_one returns, mlx4_remove_one then frees the
      pci resources.
      
      The above separation will allow us to implement 'reset flow' in the future.
      It will also enable more EQs for VFs and is a pre-step to the modern API to
      enable/disable SRIOV.
      
      Also added nvfs; an integer array of size MLX4_MAX_PORTS + 1; to the mlx4_dev
      struct. This new field is used to avoid parsing the num_vfs module parameter
      each time the mlx4_restart_one is called.
      Signed-off-by: NMajd Dibbiny <majd@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1c00e10
    • H
      bcma: register bcma as device tree driver · 2101e533
      Hauke Mehrtens 提交于
      This driver is used by the bcm53xx ARM SoC code. Now it is possible to
      give the address of the chipcommon core in device tree and bcma will
      search for all the other cores.
      Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      2101e533
  10. 30 9月, 2014 7 次提交
    • S
      tty: serial: 8250: use 32bit variable for rpm_tx_active · baeb7ef3
      Sebastian Andrzej Siewior 提交于
      The kbuild test robot wrote me:
      |  make.cross ARCH=powerpc
      |>> ERROR: ".__xchg_called_with_bad_pointer" [drivers/tty/serial/8250/8250.ko] undefined!
      
      The generic implementation of xchg() on arm and x86 works for variables of
      size one bye (char). According to the report powerpc does not support
      xchg() for one byte sized variables and looking further it seems also to
      be the same case for sparc and tile (or for 10 out of 26 architectures
      which provide a custom implementation).
      For that reason I increase the size of the variable from one to four
      bytes to get it work on powerpc (and the others).
      Reported-By: Nkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      baeb7ef3
    • M
      macvlan: add source mode · 79cf79ab
      Michael Braun 提交于
      This patch adds a new mode of operation to macvlan, called "source".
      It allows one to set a list of allowed mac address, which is used
      to match against source mac address from received frames on underlying
      interface.
      This enables creating mac based VLAN associations, instead of standard
      port or tag based. The feature is useful to deploy 802.1x mac based
      behavior, where drivers of underlying interfaces doesn't allows that.
      
      Configuration is done through the netlink interface using e.g.:
       ip link add link eth0 name macvlan0 type macvlan mode source
       ip link add link eth0 name macvlan1 type macvlan mode source
       ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
       ip link set link dev macvlan0 type macvlan macaddr add 00:22:22:22:22:22
       ip link set link dev macvlan0 type macvlan macaddr add 00:33:33:33:33:33
       ip link set link dev macvlan1 type macvlan macaddr add 00:33:33:33:33:33
       ip link set link dev macvlan1 type macvlan macaddr add 00:44:44:44:44:44
      
      This allows clients with MAC addresses 00:11:11:11:11:11,
      00:22:22:22:22:22 to be part of only VLAN associated with macvlan0
      interface. Clients with MAC addresses 00:44:44:44:44:44 with only VLAN
      associated with macvlan1 interface. And client with MAC address
      00:33:33:33:33:33 to be associated with both VLANs.
      
      Based on work of Stefan Gula <steweg@gmail.com>
      
      v8: last version of Stefan Gula for Kernel 3.2.1
      v9: rework onto linux-next 2014-03-12 by Michael Braun
          add MACADDR_SET command, enable to configure mac for source mode
          while creating interface
      v10:
        - reduce indention level
        - rename source_list to source_entry
        - use aligned 64bit ether address
        - use hash_64 instead of addr[5]
      v11:
        - rebase for 3.14 / linux-next 20.04.2014
      v12
        - rebase for linux-next 2014-09-25
      Signed-off-by: NMichael Braun <michael-dev@fami-braun.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79cf79ab
    • M
      ARCNET: add support for multi interfaces on com20020 · c51da42a
      Michael Grzeschik 提交于
      The com20020-pci driver is currently designed to instance
      one netdev with one pci device. This patch adds support to
      instance many cards with one pci device, depending on the device
      data in the private data.
      Signed-off-by: NMichael Grzeschik <m.grzeschik@pengutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c51da42a
    • M
      ARCNET: add com20020 PCI IDs with metadata · 8c14f9c7
      Michael Grzeschik 提交于
      This patch adds metadata for the com20020 to prepare for devices with
      multiple io address areas with multi card interfaces.
      Signed-off-by: NMichael Grzeschik <m.grzeschik@pengutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c14f9c7
    • A
      NFSD: Implement SEEK · 24bab491
      Anna Schumaker 提交于
      This patch adds server support for the NFS v4.2 operation SEEK, which
      returns the position of the next hole or data segment in a file.
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      24bab491
    • A
      NFSD: Add generic v4.2 infrastructure · 87a15a80
      Anna Schumaker 提交于
      It's cleaner to introduce everything at once and have the server reply
      with "not supported" than it would be to introduce extra operations when
      implementing a specific one in the middle of the list.
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      87a15a80
    • E
      net: reorganize sk_buff for faster __copy_skb_header() · b1937227
      Eric Dumazet 提交于
      With proliferation of bit fields in sk_buff, __copy_skb_header() became
      quite expensive, showing as the most expensive function in a GSO
      workload.
      
      __copy_skb_header() performance is also critical for non GSO TCP
      operations, as it is used from skb_clone()
      
      This patch carefully moves all the fields that were not copied in a
      separate zone : cloned, nohdr, fclone, peeked, head_frag, xmit_more
      
      Then I moved all other fields and all other copied fields in a section
      delimited by headers_start[0]/headers_end[0] section so that we
      can use a single memcpy() call, inlined by compiler using long
      word load/stores.
      
      I also tried to make all copies in the natural orders of sk_buff,
      to help hardware prefetching.
      
      I made sure sk_buff size did not change.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1937227