1. 05 1月, 2012 1 次提交
  2. 04 1月, 2012 1 次提交
  3. 31 12月, 2011 6 次提交
  4. 28 12月, 2011 2 次提交
  5. 26 12月, 2011 1 次提交
    • J
      KVM: Don't automatically expose the TSC deadline timer in cpuid · 4d25a066
      Jan Kiszka 提交于
      Unlike all of the other cpuid bits, the TSC deadline timer bit is set
      unconditionally, regardless of what userspace wants.
      
      This is broken in several ways:
       - if userspace doesn't use KVM_CREATE_IRQCHIP, and doesn't emulate the TSC
         deadline timer feature, a guest that uses the feature will break
       - live migration to older host kernels that don't support the TSC deadline
         timer will cause the feature to be pulled from under the guest's feet;
         breaking it
       - guests that are broken wrt the feature will fail.
      
      Fix by not enabling the feature automatically; instead report it to userspace.
      Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee
      will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not
      KVM_GET_SUPPORTED_CPUID.
      
      Fixes the Illumos guest kernel, which uses the TSC deadline timer feature.
      
      [avi: add the KVM_CAP + documentation]
      Reported-by: NAlexey Zaytsev <alexey.zaytsev@gmail.com>
      Tested-by: NAlexey Zaytsev <alexey.zaytsev@gmail.com>
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      4d25a066
  6. 25 12月, 2011 3 次提交
    • P
      netfilter: xtables: add nfacct match to support extended accounting · ceb98d03
      Pablo Neira Ayuso 提交于
      This patch adds the match that allows to perform extended
      accounting. It requires the new nfnetlink_acct infrastructure.
      
       # iptables -I INPUT -p tcp --sport 80 -m nfacct --nfacct-name http-traffic
       # iptables -I OUTPUT -p tcp --dport 80 -m nfacct --nfacct-name http-traffic
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ceb98d03
    • P
      netfilter: add extended accounting infrastructure over nfnetlink · 94139027
      Pablo Neira Ayuso 提交于
      We currently have two ways to account traffic in netfilter:
      
      - iptables chain and rule counters:
      
       # iptables -L -n -v
      Chain INPUT (policy DROP 3 packets, 867 bytes)
       pkts bytes target     prot opt in     out     source               destination
          8  1104 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0
      
      - use flow-based accounting provided by ctnetlink:
      
       # conntrack -L
      tcp      6 431999 ESTABLISHED src=192.168.1.130 dst=212.106.219.168 sport=58152 dport=80 packets=47 bytes=7654 src=212.106.219.168 dst=192.168.1.130 sport=80 dport=58152 packets=49 bytes=66340 [ASSURED] mark=0 use=1
      
      While trying to display real-time accounting statistics, we require
      to pool the kernel periodically to obtain this information. This is
      OK if the number of flows is relatively low. However, in case that
      the number of flows is huge, we can spend a considerable amount of
      cycles to iterate over the list of flows that have been obtained.
      
      Moreover, if we want to obtain the sum of the flow accounting results
      that match some criteria, we have to iterate over the whole list of
      existing flows, look for matchings and update the counters.
      
      This patch adds the extended accounting infrastructure for
      nfnetlink which aims to allow displaying real-time traffic accounting
      without the need of complicated and resource-consuming implementation
      in user-space. Basically, this new infrastructure allows you to create
      accounting objects. One accounting object is composed of packet and
      byte counters.
      
      In order to manipulate create accounting objects, you require the
      new libnetfilter_acct library. It contains several examples of use:
      
      libnetfilter_acct/examples# ./nfacct-add http-traffic
      libnetfilter_acct/examples# ./nfacct-get
      http-traffic = { pkts = 000000000000,   bytes = 000000000000 };
      
      Then, you can use one of this accounting objects in several iptables
      rules using the new nfacct match (which comes in a follow-up patch):
      
       # iptables -I INPUT -p tcp --sport 80 -m nfacct --nfacct-name http-traffic
       # iptables -I OUTPUT -p tcp --dport 80 -m nfacct --nfacct-name http-traffic
      
      The idea is simple: if one packet matches the rule, the nfacct match
      updates the counters.
      
      Thanks to Patrick McHardy, Eric Dumazet, Changli Gao for reviewing and
      providing feedback for this contribution.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      94139027
    • E
      rfs: better sizing of dev_flow_table · 60b778ce
      Eric Dumazet 提交于
      Aim of this patch is to provide full range of rps_flow_cnt on 64bit arches.
      
      Theorical limit on number of flows is 2^32
      
      Fix some buggy RPS/RFS macros as well.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Tom Herbert <therbert@google.com>
      CC: Xi Wang <xi.wang@gmail.com>
      CC: Laurent Chavey <chavey@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60b778ce
  7. 24 12月, 2011 1 次提交
  8. 23 12月, 2011 2 次提交
    • P
      netfilter: nf_nat: export NAT definitions to userspace · cbc9f2f4
      Patrick McHardy 提交于
      Export the NAT definitions to userspace. So far userspace (specifically,
      iptables) has been copying the headers files from include/net. Also
      rename some structures and definitions in preparation for IPv6 NAT.
      Since these have never been officially exported, this doesn't affect
      existing userspace code.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      cbc9f2f4
    • P
      netfilter: rework user-space expectation helper support · 3d058d7b
      Pablo Neira Ayuso 提交于
      This partially reworks bc01befd
      which added userspace expectation support.
      
      This patch removes the nf_ct_userspace_expect_list since now we
      force to use the new iptables CT target feature to add the helper
      extension for conntracks that have attached expectations from
      userspace.
      
      A new version of the proof-of-concept code to implement userspace
      helpers from userspace is available at:
      
      http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-POC.tar.bz2
      
      This patch also modifies the CT target to allow to set the
      conntrack's userspace helper status flags. This flag is used
      to tell the conntrack system to explicitly allocate the helper
      extension.
      
      This helper extension is useful to link the userspace expectations
      with the master conntrack that is being tracked from one userspace
      helper.
      
      This feature fixes a problem in the current approach of the
      userspace helper support. Basically, if the master conntrack that
      has got a userspace expectation vanishes, the expectations point to
      one invalid memory address. Thus, triggering an oops in the
      expectation deletion event path.
      
      I decided not to add a new revision of the CT target because
      I only needed to add a new flag for it. I'll document in this
      issue in the iptables manpage. I have also changed the return
      value from EINVAL to EOPNOTSUPP if one flag not supported is
      specified. Thus, in the future adding new features that only
      require a new flag can be added without a new revision.
      
      There is no official code using this in userspace (apart from
      the proof-of-concept) that uses this infrastructure but there
      will be some by beginning 2012.
      Reported-by: NSam Roberts <vieuxtech@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      3d058d7b
  9. 22 12月, 2011 1 次提交
    • S
      VFS: Fix race between CPU hotplug and lglocks · e30e2fdf
      Srivatsa S. Bhat 提交于
      Currently, the *_global_[un]lock_online() routines are not at all synchronized
      with CPU hotplug. Soft-lockups detected as a consequence of this race was
      reported earlier at https://lkml.org/lkml/2011/8/24/185. (Thanks to Cong Meng
      for finding out that the root-cause of this issue is the race condition
      between br_write_[un]lock() and CPU hotplug, which results in the lock states
      getting messed up).
      
      Fixing this race by just adding {get,put}_online_cpus() at appropriate places
      in *_global_[un]lock_online() is not a good option, because, then suddenly
      br_write_[un]lock() would become blocking, whereas they have been kept as
      non-blocking all this time, and we would want to keep them that way.
      
      So, overall, we want to ensure 3 things:
      1. br_write_lock() and br_write_unlock() must remain as non-blocking.
      2. The corresponding lock and unlock of the per-cpu spinlocks must not happen
         for different sets of CPUs.
      3. Either prevent any new CPU online operation in between this lock-unlock, or
         ensure that the newly onlined CPU does not proceed with its corresponding
         per-cpu spinlock unlocked.
      
      To achieve all this:
      (a) We introduce a new spinlock that is taken by the *_global_lock_online()
          routine and released by the *_global_unlock_online() routine.
      (b) We register a callback for CPU hotplug notifications, and this callback
          takes the same spinlock as above.
      (c) We maintain a bitmap which is close to the cpu_online_mask, and once it is
          initialized in the lock_init() code, all future updates to it are done in
          the callback, under the above spinlock.
      (d) The above bitmap is used (instead of cpu_online_mask) while locking and
          unlocking the per-cpu locks.
      
      The callback takes the spinlock upon the CPU_UP_PREPARE event. So, if the
      br_write_lock-unlock sequence is in progress, the callback keeps spinning,
      thus preventing the CPU online operation till the lock-unlock sequence is
      complete. This takes care of requirement (3).
      
      The bitmap that we maintain remains unmodified throughout the lock-unlock
      sequence, since all updates to it are managed by the callback, which takes
      the same spinlock as the one taken by the lock code and released only by the
      unlock routine. Combining this with (d) above, satisfies requirement (2).
      
      Overall, since we use a spinlock (mentioned in (a)) to prevent CPU hotplug
      operations from racing with br_write_lock-unlock, requirement (1) is also
      taken care of.
      
      By the way, it is to be noted that a CPU offline operation can actually run
      in parallel with our lock-unlock sequence, because our callback doesn't react
      to notifications earlier than CPU_DEAD (in order to maintain our bitmap
      properly). And this means, since we use our own bitmap (which is stale, on
      purpose) during the lock-unlock sequence, we could end up unlocking the
      per-cpu lock of an offline CPU (because we had locked it earlier, when the
      CPU was online), in order to satisfy requirement (2). But this is harmless,
      though it looks a bit awkward.
      Debugged-by: NCong Meng <mc@linux.vnet.ibm.com>
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      e30e2fdf
  10. 21 12月, 2011 1 次提交
  11. 20 12月, 2011 4 次提交
  12. 19 12月, 2011 1 次提交
  13. 17 12月, 2011 12 次提交
  14. 16 12月, 2011 2 次提交
  15. 15 12月, 2011 2 次提交