1. 24 Sep, 2012 4 commits
    • P
      netfilter: nfnetlink_queue: add NFQA_CAP_LEN attribute · 6ee584be
      Pablo Neira Ayuso committed
      This patch adds the NFQA_CAP_LEN attribute, which lets user-space learn
      the real packet size (even if we decided to retrieve just a few bytes
      from the packet instead of all of it).
      
      Security software that inspects packets should always check for
      this new attribute to make sure that it is inspecting the entire
      packet.
      
      This also helps to provide a workaround for the problem described
      in: http://marc.info/?l=netfilter-devel&m=134519473212536&w=2
      
      Original idea from Florian Westphal.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
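The check that the message above asks of security software can be sketched roughly as follows. This is only an illustration: `saw_entire_packet` and its parameters are hypothetical names, the attribute is assumed to have been parsed out of the Netlink message already, and it is an assumption (not stated in the commit message) that a missing NFQA_CAP_LEN means nothing was cut off.

```python
from typing import Optional


def saw_entire_packet(payload: bytes, cap_len: Optional[int]) -> bool:
    """Return True if the copied payload covers the whole packet.

    payload -- bytes taken from the NFQA_PAYLOAD attribute
    cap_len -- value of NFQA_CAP_LEN if the message carried one, else None

    ASSUMPTION: a message without NFQA_CAP_LEN was not truncated.
    """
    if cap_len is None:
        return True
    # The attribute reports the real packet size, so a shorter payload
    # means only part of the packet reached user-space.
    return len(payload) >= cap_len
```

An inspecting application would treat a `False` result as "verdict cannot be based on the full packet" and react according to its own policy.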
    • P
      netfilter: nfnetlink_queue: fix maximum packet length to userspace · ba8d3b0b
      Pablo Neira Ayuso committed
      The packets that we send via NFQUEUE are encapsulated in the NFQA_PAYLOAD
      attribute. The length of the packet in userspace is obtained via
      attr->nla_len field. This field contains the size of the Netlink
      attribute header plus the packet length.
      
      If the maximum packet length is specified, i.e. 65535 bytes, and
      packets in the range of (65531,65535] are sent to userspace, the
      attr->nla_len overflows and it reports bogus lengths to the
      application.
      
      To fix this, this patch limits the maximum packet length to 65531
      bytes. If a larger packet length is specified, the packet that we
      send to user-space is truncated to 65531 bytes.
      
      To support 65535-byte packets, we have to revisit the idea of
      a 32-bit Netlink attribute length.
      Reported-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
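The overflow described above is plain 16-bit arithmetic and can be reproduced in a few lines. This is a sketch, not the kernel code: the function names are hypothetical, and the 4-byte header size is taken from the message's own numbers (65535 - 65531).

```python
NLA_HDRLEN = 4      # Netlink attribute header size, in bytes
U16_MAX = 0xFFFF    # nla_len is a 16-bit field


def reported_nla_len(payload_len: int) -> int:
    """Value that ends up in a 16-bit nla_len for a payload of this size."""
    return (NLA_HDRLEN + payload_len) & U16_MAX


def clamp_copy_range(requested: int) -> int:
    """Largest payload that still fits once the header is added."""
    return min(requested, U16_MAX - NLA_HDRLEN)


if __name__ == "__main__":
    for size in (65531, 65532, 65535):
        print(size, "->", reported_nla_len(size))
    print("clamped:", clamp_copy_range(65535))
```

Payloads of up to 65531 bytes are reported correctly; anything larger wraps around to a tiny value, which is why the limit is placed at 65531.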
    • P
      netfilter: nf_ct_ftp: add sequence tracking pickup facility for injected entries · 7be54ca4
      Pablo Neira Ayuso committed
      This patch allows the FTP helper to pick up the sequence tracking from
      the first packet seen. This is useful to fix the breakage of the first
      FTP command after the failover while using conntrackd to synchronize
      states.
      
      The seq_aft_nl_num field in struct nf_ct_ftp_info has been shrunk to
      16 bits (enough for what it does), so we can use the remaining 16 bits
      to store the flags while using the same size for the private FTP helper
      data.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • F
      netfilter: xt_time: add support to ignore day transition · 54eb3df3
      Florian Westphal committed
      Currently, if you want to do something like:
      "match Monday, starting 23:00, for two hours"
      You need two rules, one for Mon 23:00 to 0:00 and one for Tue 0:00-1:00.
      
      The rule: --weekdays Mo --timestart 23:00  --timestop 01:00
      
      looks correct, but it will first match on Monday from midnight to 1 a.m.
      and then again for another hour from 23:00 onwards.
      
      This permits userspace to explicitly ignore the day transition and
      match for a single, continuous time period instead.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
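The difference between the two behaviours can be modelled as below. This is a simplified sketch rather than the xt_time implementation: `matches`, its parameters and the weekday numbering (0 = Monday) are all assumptions made for the example, and times are minutes since midnight.

```python
def matches(weekday, minute, *, days, start, stop, contiguous=False):
    """Decide whether (weekday, minute) falls inside the rule's time window.

    weekday    -- 0 = Monday ... 6 = Sunday (assumed numbering)
    minute     -- minutes since midnight
    days       -- set of weekdays named by the rule
    start/stop -- window bounds in minutes since midnight
    contiguous -- hypothetical flag: ignore the day transition
    """
    if start <= stop:
        # Window does not cross midnight: plain range check.
        return weekday in days and start <= minute <= stop
    # Window crosses midnight: the time must be in either half.
    if not (minute >= start or minute <= stop):
        return False
    if contiguous and minute <= stop:
        # The early-morning half belongs to the period that began the
        # day before, so test the previous weekday instead.
        weekday = (weekday - 1) % 7
    return weekday in days


MON, TUE = 0, 1
rule = dict(days={MON}, start=23 * 60, stop=1 * 60)
```

With `rule` as defined, Monday 00:30 matches only when `contiguous` is off, while Tuesday 00:30 matches only when it is on; Monday 23:30 matches in both cases.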
  2. 23 Sep, 2012 5 commits
  3. 22 Sep, 2012 2 commits
  4. 21 Sep, 2012 12 commits
  5. 20 Sep, 2012 17 commits
    • J
      ixgbevf: scheduling while atomic in reset hw path · 012dc19a
      John Fastabend committed
      In ixgbevf_reset_hw_vf(), msleep() is called while holding mbx_lock,
      resulting in a scheduling-while-atomic bug with the trace below.
      
      This patch uses mdelay instead.
      
      BUG: scheduling while atomic: ip/6539/0x00000002
      2 locks held by ip/6539:
       #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81419cc3>] rtnl_lock+0x17/0x19
       #1:  (&(&adapter->mbx_lock)->rlock){+.+...}, at: [<ffffffffa0030855>] ixgbevf_reset+0x30/0xc1 [ixgbevf]
      Modules linked in: ixgbevf ixgbe mdio libfc scsi_transport_fc 8021q scsi_tgt garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 uinput igb coretemp hwmon crc32c_intel ioatdma i2c_i801 shpchp microcode lpc_ich mfd_core i2c_core joydev dca pcspkr serio_raw pata_acpi ata_generic usb_storage pata_jmicron
      Pid: 6539, comm: ip Not tainted 3.6.0-rc3jk-net-next+ #104
      Call Trace:
       [<ffffffff81072202>] __schedule_bug+0x6a/0x79
       [<ffffffff814bc7e0>] __schedule+0xa2/0x684
       [<ffffffff8108f85f>] ? trace_hardirqs_off+0xd/0xf
       [<ffffffff814bd0c0>] schedule+0x64/0x66
       [<ffffffff814bb5e2>] schedule_timeout+0xa6/0xca
       [<ffffffff810536b9>] ? lock_timer_base+0x52/0x52
       [<ffffffff812629e0>] ? __udelay+0x15/0x17
       [<ffffffff814bb624>] schedule_timeout_uninterruptible+0x1e/0x20
       [<ffffffff810541c0>] msleep+0x1b/0x22
       [<ffffffffa002e723>] ixgbevf_reset_hw_vf+0x90/0xe5 [ixgbevf]
       [<ffffffffa0030860>] ixgbevf_reset+0x3b/0xc1 [ixgbevf]
       [<ffffffffa0032fba>] ixgbevf_open+0x43/0x43e [ixgbevf]
       [<ffffffff81409610>] ? dev_set_rx_mode+0x2e/0x33
       [<ffffffff8140b0f1>] __dev_open+0xa0/0xe5
       [<ffffffff814097ed>] __dev_change_flags+0xbe/0x142
       [<ffffffff8140b01c>] dev_change_flags+0x21/0x56
       [<ffffffff8141a843>] do_setlink+0x2e2/0x7f4
       [<ffffffff81016e36>] ? native_sched_clock+0x37/0x39
       [<ffffffff8141b0ac>] rtnl_newlink+0x277/0x4bb
       [<ffffffff8141aee9>] ? rtnl_newlink+0xb4/0x4bb
       [<ffffffff812217d1>] ? selinux_capable+0x32/0x3a
       [<ffffffff8104fb17>] ? ns_capable+0x4f/0x67
       [<ffffffff81419cc3>] ? rtnl_lock+0x17/0x19
       [<ffffffff81419f28>] rtnetlink_rcv_msg+0x236/0x253
       [<ffffffff81419cf2>] ? rtnetlink_rcv+0x2d/0x2d
       [<ffffffff8142fd42>] netlink_rcv_skb+0x43/0x94
       [<ffffffff81419ceb>] rtnetlink_rcv+0x26/0x2d
       [<ffffffff8142faf1>] netlink_unicast+0xee/0x174
       [<ffffffff81430327>] netlink_sendmsg+0x26a/0x288
       [<ffffffff813fb04f>] ? rcu_read_unlock+0x56/0x67
       [<ffffffff813f5e6d>] __sock_sendmsg_nosec+0x58/0x61
       [<ffffffff813f81b7>] __sock_sendmsg+0x3d/0x48
       [<ffffffff813f8339>] sock_sendmsg+0x6e/0x87
       [<ffffffff81107c9f>] ? might_fault+0xa5/0xac
       [<ffffffff81402a72>] ? copy_from_user+0x2a/0x2c
       [<ffffffff81402e62>] ? verify_iovec+0x54/0xaa
       [<ffffffff813f9834>] __sys_sendmsg+0x206/0x288
       [<ffffffff810694fa>] ? up_read+0x23/0x3d
       [<ffffffff811307e5>] ? fcheck_files+0xac/0xea
       [<ffffffff8113095e>] ? fget_light+0x3a/0xb9
       [<ffffffff813f9a2e>] sys_sendmsg+0x42/0x60
       [<ffffffff814c5ba9>] system_call_fastpath+0x16/0x1b
      
      CC: Eric Dumazet <edumazet@google.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      ixgbevf: Add support for VF API negotiation · 31186785
      Alexander Duyck committed
      This change makes it so that the VF can support the PF/VF API negotiation
      protocol.  Specifically in this case we are adding support for API 1.0
      which will mean that the VF is capable of cleaning up buffers that span
      multiple descriptors without triggering an error.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Sibai Li <sibai.li@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      igb: Support to enable EEE on all eee_supported devices · e5461112
      Akeem G. Abodunrin committed
      The current implementation enables EEE only on the i350 device. This patch
      enables EEE on all eee_supported devices. It also configures the LPI clock
      to keep running before EEE is enabled on i210 and i211 devices.
      Signed-off-by: Akeem G. Abodunrin <akeem.g.abodunrin@intel.com>
      Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      igb: Remove artificial restriction on RQDPC stat reading · ae1c07a6
      Alexander Duyck committed
      For some reason the reading of the RQDPC register was being artificially
      limited to 4K.  Instead of limiting the value we should read the value and
      add the full amount.  Otherwise this can lead to a misleading number of
      dropped packets when the actual value is in fact much higher.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • M
      r8169: use unlimited DMA burst for TX · aee77e4a
      Michal Schmidt committed
      The r8169 driver currently limits the DMA burst for TX to 1024 bytes. I have
      a box where this prevents the interface from using the gigabit line to its full
      potential. This patch solves the problem by setting TX_DMA_BURST to unlimited.
      
      The box has an ASRock B75M motherboard with on-board RTL8168evl/8111evl
      (XID 0c900880). TSO is enabled.
      
      I used netperf (TCP_STREAM test) to measure the dependency of TX throughput
      on MTU. I did it for three different values of TX_DMA_BURST ('5'=512, '6'=1024,
      '7'=unlimited). This chart shows the results:
      http://michich.fedorapeople.org/r8169/r8169-effects-of-TX_DMA_BURST.png
      
      Interesting points:
       - With the current DMA burst limit (1024):
         - at the default MTU=1500 I get only 842 Mbit/s.
         - starting from a small MTU, the performance rises monotonically with
           increasing MTU only up to a peak at MTU=1076 (908 Mbit/s). Then there's
           a sudden drop to 762 Mbit/s from which the throughput rises monotonically
           again with further MTU increases.
       - With a smaller DMA burst limit (512):
         - there's a similar peak at MTU=1076 and another one at MTU=564.
       - With unlimited DMA burst:
         - at the default MTU=1500 I get a nice 940 Mbit/s.
         - the throughput rises monotonically with increasing MTU with no strange
           peaks.
      
      Notice that the peaks occur at MTU sizes that are multiples of the DMA burst
      limit plus 52. Why 52? Because:
        20 (IP header) + 20 (TCP header) + 12 (TCP options) = 52
      
      The Realtek-provided r8168 driver (v8.032.00) uses unlimited TX DMA burst too,
      except for CFG_METHOD_1 where the TX DMA burst is set to 512 bytes.
      CFG_METHOD_1 appears to be the oldest MAC version of "RTL8168B/8111B",
      i.e. RTL_GIGA_MAC_VER_11 in r8169. Not sure if this MAC version really needs
      the smaller burst limit, or if any other versions have similar requirements.
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Acked-by: Francois Romieu <romieu@fr.zoreil.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
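The "multiple of the DMA burst limit plus 52" observation is easy to check numerically. The snippet below is only a worked version of that arithmetic; `expected_peak_mtus` is a hypothetical helper, and the 1500-byte upper bound is the default MTU mentioned in the message.

```python
# 20 (IP header) + 20 (TCP header) + 12 (TCP options), as in the message
HEADER_OVERHEAD = 20 + 20 + 12


def expected_peak_mtus(burst_limit: int, max_mtu: int = 1500) -> list:
    """MTU values up to max_mtu where the TCP payload is an exact
    multiple of the DMA burst limit."""
    peaks = []
    k = 1
    while k * burst_limit + HEADER_OVERHEAD <= max_mtu:
        peaks.append(k * burst_limit + HEADER_OVERHEAD)
        k += 1
    return peaks


if __name__ == "__main__":
    print(1024, expected_peak_mtus(1024))
    print(512, expected_peak_mtus(512))
```

For a limit of 1024 this yields the single peak at 1076, and for 512 it yields 564 and 1076, matching the peaks reported for the two limited settings.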
    • A
      ipv6: unify fragment thresh handling code · 6b102865
      Amerigo Wang committed
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Michal Kubeček <mkubecek@suse.cz>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline · d4915c08
      Amerigo Wang committed
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Michal Kubeček <mkubecek@suse.cz>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      ipv6: unify conntrack reassembly expire code with standard one · b836c99f
      Amerigo Wang committed
      Two years ago, Shan Wei tried to fix this:
      http://patchwork.ozlabs.org/patch/43905/
      
      The problem is that RFC2460 says an ICMP Time
      Exceeded -- Fragment Reassembly Time Exceeded message should be
      sent to the source of that fragment if reassembly
      times out.
      
      "
         If insufficient fragments are received to complete reassembly of a
         packet within 60 seconds of the reception of the first-arriving
         fragment of that packet, reassembly of that packet must be
         abandoned and all the fragments that have been received for that
         packet must be discarded.  If the first fragment (i.e., the one
         with a Fragment Offset of zero) has been received, an ICMP Time
         Exceeded -- Fragment Reassembly Time Exceeded message should be
         sent to the source of that fragment.
      "
      
      As Herbert suggested, we could actually use the standard IPv6
      reassembly code which follows RFC2460.
      
      With this patch applied, I can see ICMP Time Exceeded sent
      from the receiver when the sender sends out only 3 of the 4
      fragments of an IPv6 UDP packet.
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Michal Kubeček <mkubecek@suse.cz>
      Cc: David Miller <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      ipv6: add a new namespace for nf_conntrack_reasm · c038a767
      Amerigo Wang committed
      As pointed out by Michal, it is necessary to add a new
      namespace for the nf_conntrack_reasm code; this prepares
      for the second patch.
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Michal Kubeček <mkubecek@suse.cz>
      Cc: David Miller <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      netpoll: call ->ndo_select_queue() in tx path · 8c4c49df
      Amerigo Wang committed
      In the netpoll tx path, we miss the chance of calling ->ndo_select_queue(),
      which could cause problems when bonding is involved.
      
      This patch makes dev_pick_tx() extern (and renames it to netdev_pick_tx())
      to let netpoll call it in netpoll_send_skb_on_dev().
      Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      netdev: make address const in device address management · 6b6e2725
      stephen hemminger committed
      The internal functions for adding/deleting addresses don't change
      their argument.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      i825xx: znet: fix compiler warnings when building a 64-bit kernel · 1d3ff767
      Mika Westerberg committed
      When building a 64-bit kernel with this driver, we get the following
      warnings from the compiler:
      
      drivers/net/ethernet/i825xx/znet.c: In function ‘hardware_init’:
      drivers/net/ethernet/i825xx/znet.c:863:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      drivers/net/ethernet/i825xx/znet.c:870:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      
      Fix these by calling isa_virt_to_bus() before passing the pointers to
      set_dma_addr().
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      gre: add GSO support · 6b78f16e
      Eric Dumazet committed
      Add GSO support to GRE tunnels.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net: provide a default dev->ethtool_ops · 2c60db03
      Eric Dumazet committed
      Instead of forcing device drivers to provide empty ethtool_ops or tweak
      net/core/ethtool.c again, we could provide a generic ethtool_ops.
      
      This occurred to me when I wanted to add GSO support to GRE tunnels.
      ethtool -k support should be generic for all drivers.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      net: dev: fix incorrect getting net device's name · 828de4f6
      Gao feng committed
      When moving a NIC from net namespace A to net namespace B,
      dev_change_net_namespace() calls __dev_get_by_name() to check
      whether netns B already has a device with the same name.
      
      If netns B already has a device with that name, we call
      dev_get_valid_name() to try to get a valid name for this NIC in
      netns B, but net_device->nd_net still points to netns A at that point.
      
      This patch fixes it.
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • L
      ipv6: recursive check rt->dst.from when call rt6_check_expired · 3fd91fb3
      Li RongQing committed
      If dst cache entry dst_a is copied from dst_b, and dst_b is copied from
      dst_c, then checking whether dst_a is expired should not stop at
      dst_a->dst.from (dst_b); it should go on to check dst_c.
      
      CC: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
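The chain-following check described above can be illustrated with a small model. This is a sketch of the idea only, not the kernel's rt6_check_expired(): the `Route` class, its `origin` field (standing in for rt->dst.from) and the use of a plain number for time are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    # Absolute expiry time, or None if this entry has no expiry of its own.
    expires: Optional[float] = None
    # Entry this one was copied from (stand-in for rt->dst.from).
    origin: Optional["Route"] = None


def is_expired(rt: Route, now: float) -> bool:
    """Follow the copy chain until an entry with its own expiry is found."""
    if rt.expires is not None:
        return now > rt.expires
    if rt.origin is not None:
        # Recurse instead of stopping at the first origin.
        return is_expired(rt.origin, now)
    return False


dst_c = Route(expires=100.0)
dst_b = Route(origin=dst_c)
dst_a = Route(origin=dst_b)
```

Here `is_expired(dst_a, now=150.0)` is decided by `dst_c`, two hops away, which a single-level look at `dst_a.origin` would miss.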
    • E
      net: more accurate network taps in transmit path · b40863c6
      Eric Dumazet committed
      dev_queue_xmit_nit() should be called right before ndo_start_xmit()
      calls, or we might give wrong packet contents to tap users:
      
      The packet checksum can be changed, or the packet can be linearized or
      segmented, with segments only partially sent in the latter case.
      
      Also, a memory allocation can fail, so the packet never really hits the
      driver entry point.
      Reported-by: Jamie Gloudon <jamie.gloudon@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>