1. 05 Dec, 2012 3 commits
  2. 04 Dec, 2012 2 commits
    • M
      tun: only queue packets on device · 5d097109
      Committed by Michael S. Tsirkin
      Historically tun supported two modes of operation:
      - in default mode, a small number of packets would get queued
        at the device, the rest would be queued in qdisc
      - in one queue mode, all packets would get queued at the device
      
      This might have made sense up to a point where we made the
      queue depth for both modes the same and set it to
      a huge value (500) so unless the consumer
      is stuck the chance of losing packets is small.
      
      Thus in practice both modes behave the same, but the
      default mode has some problems:
      - if packets are never consumed, fragments are never orphaned
        which causes a DoS for a sender using zero copy transmit
      - overrun errors are hard to diagnose: fifo error is incremented
        only once so you can not distinguish between
        userspace that is stuck and a transient failure,
        tcpdump on the device does not show any traffic
      
      Userspace solves this simply by enabling IFF_ONE_QUEUE
      but there seems to be little point in not doing the
      right thing for everyone, by default.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5d097109
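The message above mentions that userspace can work around the problem by enabling IFF_ONE_QUEUE. A minimal sketch of how a program might request that flag when creating a TUN device is shown below. The helper names `tun_fill_ifreq` and `tun_open_one_queue` are hypothetical, the choice of `IFF_TUN | IFF_NO_PI` is an assumption for illustration, and opening the device normally requires CAP_NET_ADMIN.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Hypothetical helper: fill an ifreq that asks for a TUN device
 * in one-queue mode. The flag combination is an assumption. */
void tun_fill_ifreq(struct ifreq *ifr, const char *name)
{
    assert(ifr != NULL && name != NULL);
    memset(ifr, 0, sizeof(*ifr));
    ifr->ifr_flags = IFF_TUN | IFF_NO_PI | IFF_ONE_QUEUE;
    /* The memset above guarantees NUL termination. */
    strncpy(ifr->ifr_name, name, IFNAMSIZ - 1);
}

/* Hypothetical helper: returns an open fd, or -1 on failure. */
int tun_open_one_queue(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0)
        return -1;
    tun_fill_ifreq(&ifr, name);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With this commit applied, the flag should no longer change behaviour, since queueing at the device becomes the default.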
    • M
      sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call · 196d6759
      Committed by Michele Baldessari
      The current SCTP stack is lacking a mechanism to have per association
      statistics. This is an implementation modeled after OpenSolaris'
      SCTP_GET_ASSOC_STATS.
      
      Userspace part will follow on lksctp if/when there is a general ACK on
      this.
      V4:
      - Move ipackets++ before q->immediate.func() for consistency reasons
      - Move sctp_max_rto() at the end of sctp_transport_update_rto() to avoid
        returning bogus RTO values
      - return asoc->rto_min when max_obs_rto value has not changed
      
      V3:
      - Increase ictrlchunks in sctp_assoc_bh_rcv() as well
      - Move ipackets++ to sctp_inq_push()
      - return 0 when no rto updates took place since the last call
      
      V2:
      - Implement partial retrieval of stat struct to cope for future expansion
      - Kill the rtxpackets counter as it cannot be precise anyway
      - Rename outseqtsns to outofseqtsns to make it clearer that these are out
        of sequence unexpected TSNs
      - Move asoc->ipackets++ under a lock to avoid potential miscounts
      - Fold asoc->opackets++ into the already existing asoc check
      - Kill unneeded (q->asoc) test when increasing rtxchunks
      - Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0)
      - Don't count SHUTDOWNs as SACKs
      - Move SCTP_GET_ASSOC_STATS to the private space API
      - Adjust the len check in sctp_getsockopt_assoc_stats() to allow for
        future struct growth
      - Move association statistics in their own struct
      - Update idupchunks when we send a SACK with dup TSNs
      - return min_rto in max_rto when RTO has not changed. Also return the
        transport when max_rto last changed.
      
      Signed-off: Michele Baldessari <michele@acksyn.org>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      196d6759
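The V2 notes above mention partial retrieval of the statistics struct so that it can grow later. The idea can be sketched in plain C as follows. This is not the kernel code: `struct assoc_stats`, its fields, and `copy_stats_partial` are hypothetical stand-ins, and the only behaviour assumed is "reject buffers too small for the association id, otherwise copy as much as fits".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a statistics struct that may grow. */
struct assoc_stats {
    int32_t  assoc_id;   /* input: which association to query */
    uint64_t max_rto;
    uint64_t ipackets;
    uint64_t opackets;
};

/* Copy at most caller_len bytes of *src into dst.
 * Returns the number of bytes copied, or -1 if the caller's
 * buffer cannot even hold the association id. */
long copy_stats_partial(void *dst, size_t caller_len,
                        const struct assoc_stats *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    if (caller_len < sizeof(src->assoc_id))
        return -1;
    /* Callers built against an older, shorter struct get a prefix. */
    n = caller_len < sizeof(*src) ? caller_len : sizeof(*src);
    memcpy(dst, src, n);
    return (long)n;
}
```

A caller compiled against an older layout passes its smaller size and receives only the fields it knows about, while a newer caller receives the whole struct.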
  3. 03 Dec, 2012 5 commits
    • J
      netfilter: nf_nat: Handle routing changes in MASQUERADE target · a0ecb85a
      Committed by Jozsef Kadlecsik
      When the route changes (backup default route, VPNs) which affect a
      masqueraded target, the packets were sent out with the outdated source
      address. The patch addresses the issue by comparing the outgoing interface
      directly with the masqueraded interface in the nat table.
      
      Events are inefficient in this case, because it'd require adding route
      events to the network core and then scanning the whole conntrack table
      and re-checking the route for every entry.
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      a0ecb85a
    • F
      netfilter: kill support for per-af queue backends · 0360ae41
      Committed by Florian Westphal
      We used to have several queueing backends, but nowadays only
      nfnetlink_queue remains.
      
      In light of this there doesn't seem to be a good reason to
      support per-af registering -- just hook up nfnetlink_queue on module
      load and remove it on unload.
      
      This means that the userspace BIND/UNBIND_PF commands are now obsolete;
      the kernel will ignore them.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      0360ae41
    • P
      netfilter: ctnetlink: dump entries from the dying and unconfirmed lists · d871befe
      Committed by Pablo Neira Ayuso
      This patch adds a new operation to dump the content of the dying and
      unconfirmed lists.
      
      Under some situations, the global conntrack counter can be inconsistent
      with the number of entries that we can dump from the conntrack table.
      The way to resolve this is to allow dumping the content of the unconfirmed
      and dying lists; so far it was not possible to look at their content.
      
      This provides some extra instrumentation to resolve problematic situations
      in which anyone suspects memory leaks.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      d871befe
    • P
      netfilter: nf_conntrack: improve nf_conn object traceability · 04dac011
      Committed by Pablo Neira Ayuso
      This patch modifies the conntrack subsystem so that all existing
      allocated conntrack objects can be found in any of the following
      places:
      
      * the hash table, this is the typical place for alive conntrack objects.
      * the unconfirmed list, this is the place for newly created conntrack objects
        that are still traversing the stack.
      * the dying list, this is where you can find conntrack objects that are dying
        or that should die anytime soon (eg. once the destroy event is delivered to
        the conntrackd daemon).
      
      Thus, we make sure that we follow the track for all existing conntrack
      objects. This patch, together with some extension of the ctnetlink interface
      to dump the content of the dying and unconfirmed lists, will help to debug
      suspected nf_conn object leaks.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      04dac011
    • E
      net: fix sparse endianness warnings on sock_common · 077b393d
      Committed by Eric Dumazet
      # make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/inet_hashtables.o
      ...
      net/ipv4/inet_hashtables.c:242:7: warning: restricted __portpair degrades to integer
      net/ipv4/inet_hashtables.c:242:7: warning: restricted __addrpair degrades to integer
      ...
      
      Move __portpair/__addrpair from include/net/inet_hashtables.h
      to include/net/sock.h where we need them in struct sock_common
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ling Ma <ling.ma.program@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      077b393d
  4. 02 Dec, 2012 2 commits
  5. 01 Dec, 2012 2 commits
    • E
      net: move inet_dport/inet_num in sock_common · ce43b03e
      Committed by Eric Dumazet
      commit 68835aba (net: optimize INET input path further)
      moved some fields used for tcp/udp sockets lookup in the first cache
      line of struct sock_common.
      
      This patch moves inet_dport/inet_num as well, filling a 32bit hole
      on 64 bit arches and reducing number of cache line misses in lookups.
      
      Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
      before addresses match, as this check is more discriminant.
      
      Remove the hash check from MATCH() macros because we don't need to
      re validate the hash value after taking a refcount on socket, and
      use likely/unlikely compiler hints, as the sk_hash/hash check
      makes the following conditional tests 100% predicted by cpu.
      
      Introduce skc_addrpair/skc_portpair pair values to better
      document the alignment requirements of the port/addr pairs
      used in the various MATCH() macros, and remove some casts.
      
      The namespace check can also be done at last.
      
      This slightly improves TCP/UDP lookup times.
      
      IP/TCP early demux needs inet->rx_dst_ifindex and
      TCP needs inet->min_ttl, let's group them together in the same cache line.
      
      With help from Ben Hutchings & Joe Perches.
      
      Idea of this patch came after Ling Ma proposal to move skc_hash
      to the beginning of struct sock_common, and should allow him
      to submit a final version of his patch. My tests show an improvement
      doing so.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Ling Ma <ling.ma.program@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce43b03e
    • R
      rtnelink: remove unused parameter from rtnl_create_link(). · c0713563
      Committed by Rami Rosen
      This patch removes an unused parameter (src_net) from rtnl_create_link()
      method and from the method single invocation, in veth.
      This parameter was used in the past when calling
      ops->get_tx_queues(src_net, tb) in rtnl_create_link().
      The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
      get_num_tx_queues() and get_num_rx_queues(), which do not get any
      parameter. This was done in commit d40156aa by
      Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").
      Signed-off-by: Rami Rosen <ramirose@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c0713563
  6. 30 Nov, 2012 1 commit
  7. 28 Nov, 2012 2 commits
  8. 27 Nov, 2012 6 commits
    • M
      Revert "mm: remove __GFP_NO_KSWAPD" · 82b212f4
      Committed by Mel Gorman
      With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
      based on failures" reverted, Zdenek Kabelac reported the following
      
        Hmm,  so it's just took longer to hit the problem and observe
        kswapd0 spinning on my CPU again - it's not as endless like before -
        but still it easily eats minutes - it helps to	turn off  Firefox
        or TB  (memory hungry apps) so kswapd0 stops soon - and restart
        those apps again.  (And I still have like >1GB of cached memory)
      
        kswapd0         R  running task        0    30      2 0x00000000
        Call Trace:
          preempt_schedule+0x42/0x60
          _raw_spin_unlock+0x55/0x60
          put_super+0x31/0x40
          drop_super+0x22/0x30
          prune_super+0x149/0x1b0
          shrink_slab+0xba/0x510
      
      The sysrq+m indicates the system has no swap so it'll never reclaim
      anonymous pages as part of reclaim/compaction.  That is one part of the
      problem but not the root cause as file-backed pages could also be
      reclaimed.
      
      The likely underlying problem is that kswapd is woken up or kept awake
      for each THP allocation request in the page allocator slow path.
      
      If compaction fails for the requesting process then compaction will be
      deferred for a time and direct reclaim is avoided.  However, if there
      are a storm of THP requests that are simply rejected, it will still be
      the the case that kswapd is awake for a prolonged period of time as
      pgdat->kswapd_max_order is updated each time.  This is noticed by the
      main kswapd() loop and it will not call kswapd_try_to_sleep().  Instead
      it will loopp, shrinking a small number of pages and calling
      shrink_slab() on each iteration.
      
      The temptation is to supply a patch that checks if kswapd was woken for
      THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
      backed up by proper testing.  As 3.7 is very close to release and this
      is not a bug we should release with, a safer path is to revert "mm:
      remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing
      out the balance_pgdat() logic in general.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Zdenek Kabelac <zkabelac@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      82b212f4
    • T
      include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID · c5782e9f
      Committed by Tushar Behera
      Commit baf05aa9 ("bug: introduce BUILD_BUG_ON_INVALID() macro")
      introduces this macro only when __CHECKER__ is not defined.  Define a
      silent macro in the else condition to fix the following sparse warning:
      
        mm/filemap.c:395:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
        mm/filemap.c:396:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
        mm/filemap.c:397:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
        include/linux/mm.h:419:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
        include/linux/mm.h:419:9: error: not a function <noident>
      Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
      Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c5782e9f
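The shape of the fix described above can be sketched in userspace C. The sparse-only `__force` annotation used in the kernel header is dropped here, and `count_side_effects` is a hypothetical demonstration function. Treat this as an approximation of the pattern, not a copy of include/linux/bug.h.

```c
#include <assert.h>

#ifdef __CHECKER__
/* Silent variant for static checkers, so the name is always defined. */
#define BUILD_BUG_ON_INVALID(e) (0)
#else
/* The expression is only type-checked: sizeof does not evaluate it. */
#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))
#endif

/* Hypothetical demo: shows that the macro's argument is never run. */
int count_side_effects(void)
{
    int calls = 0;

    BUILD_BUG_ON_INVALID(++calls);  /* compiled and checked, not executed */
    assert(calls == 0);
    return calls;
}
```

Both branches leave `calls` untouched, which is why the macro is safe to use for expressions that must be valid but must produce no code.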
    • B
      sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name · c91f6df2
      Committed by Brian Haley
      Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
      will then require another call like if_indextoname() to get the actual interface
      name, have it return the name directly.
      
      This also matches the existing man page description on socket(7) which mentions
      the argument being an interface name.
      
      If the value has not been set, zero is returned and optlen will be set to zero
      to indicate there is no interface name present.
      
      Added a seqlock to protect this code path, and dev_ifname(), from someone
      changing the device name via dev_change_name().
      
      v2: Added seqlock protection while copying device name.
      
      v3: Fixed word wrap in patch.
      Signed-off-by: Brian Haley <brian.haley@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c91f6df2
    • G
      stmmac: add Rx watchdog support to mitigate the DMA irqs · 62a2ab93
      Committed by Giuseppe CAVALLARO
      GMAC devices newer than databook 3.40 have an embedded timer
      that can be used for mitigating the number of interrupts.
      So this patch adds this optimization.
      
      At any rate, the Rx watchdog can be disabled (on bugged HW) by
      passing the riwt_off field from the platform.
      
      In this implementation the rx timer stored in the Reg9 is fixed
      to the max value. This will be tuned by using ethtool.
      
      V2: added a platform parameter to force to disable the rx-watchdog
      for example on new core where it is bugged.
      
      V3: do not disable NAPI when Rx watchdog is used.
      
      V4: a new extra statistic field has been added to show the early
      receive status in the interrupt handler.
      This patch also adds an extra check to avoid to call
      napi_schedule when the DMA_INTR_ENA_RIE bit is disabled in the
      Interrupt Mask register.
      Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62a2ab93
    • H
      bcma: add more package IDs · 0751f865
      Committed by Hauke Mehrtens
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      0751f865
    • A
      openvswitch: add skb mark matching and set action · 39c7caeb
      Committed by Ansis Atteka
      This patch adds support for skb mark matching and set action.
      Signed-off-by: Ansis Atteka <aatteka@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      39c7caeb
  9. 26 Nov, 2012 11 commits
  10. 24 Nov, 2012 3 commits
  11. 22 Nov, 2012 3 commits