1. 05 May, 2016 9 commits
    • netfilter: conntrack: use a single hashtable for all namespaces · 56d52d48
      Florian Westphal committed
      We already include the netns address in the hash and compare the netns pointers
      during lookup, so even if namespaces have overlapping addresses, entries
      will be spread across the table.
      
      Assuming a table of 64k buckets, this change saves 0.5 MB per namespace on a
      64-bit system.
      
      The NAT bysrc and expectation hashes are still per namespace; those will
      be changed soon, too.
      
      A future patch will also make the conntrack object slab cache global again.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
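The 0.5 MB figure can be checked with a short calculation. The sketch below assumes each bucket head is a single pointer (the struct `bucket_head` and both function names are hypothetical stand-ins, not kernel identifiers). On a 64-bit system a pointer is 8 bytes, so 65536 buckets take 65536 × 8 = 524288 bytes, i.e. 0.5 MiB per table.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hash bucket head: one pointer per bucket. */
struct bucket_head {
    void *first;
};

/* Bytes used by one table with `buckets` bucket heads. */
size_t table_bytes(size_t buckets)
{
    return buckets * sizeof(struct bucket_head);
}

/*
 * Bytes saved when `namespaces` per-namespace tables are replaced by
 * one shared table of the same size: every table but one goes away.
 */
size_t bytes_saved(size_t buckets, size_t namespaces)
{
    if (namespaces == 0)
        return 0;
    return (namespaces - 1) * table_bytes(buckets);
}
```

Calling `bytes_saved(65536, n)` gives the total saving for `n` namespaces under these assumptions; the per-namespace saving is simply `table_bytes(65536)`.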
    • netfilter: conntrack: make netns address part of hash · 1b8c8a9f
      Florian Westphal committed
      Once we place all conntracks into a global hash table we want them to be
      spread across the entire hash table, even if namespaces have overlapping IP
      addresses.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
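A minimal sketch of the idea, assuming a simplified tuple and a generic 32-bit mixing function rather than the hash the kernel actually uses. The names `tuple`, `mix32`, `tuple_hash` and `bucket_of` are hypothetical. The point is only that the namespace pointer is folded into the hash, so identical tuples in different namespaces get different hash values.

```c
#include <stdint.h>

/* Simplified connection tuple; the real structure carries more fields. */
struct tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Generic invertible 32-bit mixer (MurmurHash3 finalizer), used for illustration. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Hash the tuple together with the address of the owning namespace. */
uint32_t tuple_hash(const struct tuple *t, const void *netns, uint32_t seed)
{
    uint64_t ns = (uint64_t)(uintptr_t)netns;
    uint32_t h = seed;

    h = mix32(h ^ t->src_ip);
    h = mix32(h ^ t->dst_ip);
    h = mix32(h ^ (((uint32_t)t->src_port << 16) | t->dst_port));
    h = mix32(h ^ t->proto);
    h = mix32(h ^ (uint32_t)ns);          /* low half of the netns address */
    h = mix32(h ^ (uint32_t)(ns >> 32));  /* high half (zero on 32-bit) */
    return h;
}

/* Map a 32-bit hash onto [0, nbuckets) without a modulo. */
uint32_t bucket_of(uint32_t hash, uint32_t nbuckets)
{
    return (uint32_t)(((uint64_t)hash * nbuckets) >> 32);
}
```

Without the namespace term, two containers that both use the same private address range would pile their entries into the same buckets of the shared table.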
    • netfilter: conntrack: check netns when comparing conntrack objects · e0c7d472
      Florian Westphal committed
      Once we place all conntracks in the same hash table we must also compare
      the netns pointer to skip conntracks that belong to a different namespace.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
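The comparison described here can be sketched as follows. This is an illustration under simplifying assumptions, not the kernel code: `tuple`, `entry`, `key_equal` and `lookup` are hypothetical names, and the bucket is modeled as a plain singly linked chain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified connection tuple. */
struct tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* One entry in a bucket chain of the shared table. */
struct entry {
    struct tuple tuple;
    const void *netns;   /* namespace that owns this entry */
    struct entry *next;
};

static bool tuple_equal(const struct tuple *a, const struct tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* A key matches only if both the tuple and the namespace match. */
bool key_equal(const struct entry *e, const struct tuple *t, const void *netns)
{
    return tuple_equal(&e->tuple, t) && e->netns == netns;
}

/* Walk one bucket chain, skipping entries that belong to other namespaces. */
const struct entry *lookup(const struct entry *chain, const struct tuple *t,
                           const void *netns)
{
    for (const struct entry *e = chain; e != NULL; e = e->next) {
        if (key_equal(e, t, netns))
            return e;
    }
    return NULL;
}
```

Keeping the namespace test inside one helper means every lookup site picks it up, instead of each caller open-coding the check.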
    • netfilter: conntrack: small refactoring of conntrack seq_printf · 245cfdca
      Florian Westphal committed
      The iteration process is lockless, so we test if the conntrack object is
      eligible for printing (e.g. is AF_INET) after obtaining the reference
      count.
      
      Once we put all conntracks into the same hash table we might see more
      entries that need to be skipped.
      
      So add a helper and first perform the test in a lockless fashion
      for fast skip.
      
      Once we obtain the reference count, just repeat the check.
      
      Note that this refactoring also adds a previously missing check for
      unconfirmed conntrack entries: because of slab RCU object reuse they may
      show up during iteration, and they need to be skipped since they are not
      part of the listing.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
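The check, take reference, recheck pattern can be sketched in user-space C11 as below. All names (`entry`, `should_print`, `entry_get_unless_zero`, `entry_put`, `try_get_for_print`) are hypothetical, and the plain field reads stand in for what the kernel does under RCU; this is an illustration of the control flow only.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified conntrack-like object. */
struct entry {
    atomic_int refcnt;
    int family;          /* e.g. AF_INET */
    bool confirmed;      /* unconfirmed entries are not part of the listing */
    const void *netns;
};

/* The eligibility test, usable both before and after taking a reference. */
static bool should_print(const struct entry *e, int family, const void *netns)
{
    return e->family == family && e->confirmed && e->netns == netns;
}

/* Take a reference unless the count already dropped to zero. */
static bool entry_get_unless_zero(struct entry *e)
{
    int old = atomic_load(&e->refcnt);

    while (old != 0) {
        /* On failure, `old` is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

void entry_put(struct entry *e)
{
    atomic_fetch_sub(&e->refcnt, 1);
}

/* Returns true, holding a reference, only if the entry should be printed. */
bool try_get_for_print(struct entry *e, int family, const void *netns)
{
    if (!should_print(e, family, netns))
        return false;                 /* fast skip, no reference taken */
    if (!entry_get_unless_zero(e))
        return false;                 /* object is being destroyed */
    if (!should_print(e, family, netns)) {
        entry_put(e);                 /* object changed under us; drop it */
        return false;
    }
    return true;
}
```

The first test is only an optimization. The second one, made while holding the reference, is the check that counts, because the object may have been recycled between the two.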
    • netfilter: conntrack: use nf_ct_key_equal() in more places · 86804348
      Florian Westphal committed
      This prepares for an upcoming change that places all conntracks into a
      single, global table.  For this to work we will also need to compare the
      net pointer during lookup.  To avoid open-coding such a check, use the
      nf_ct_key_equal helper and then later extend it to also consider net_eq.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: don't attempt to iterate over empty table · 88b68bc5
      Florian Westphal committed
      Once we place all conntracks into the same table, iteration becomes more
      costly because the table contains conntracks that we are not interested
      in (belonging to other netns).
      
      So don't bother scanning if the current namespace has no entries.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: fix lookup race during hash resize · 5e3c61f9
      Florian Westphal committed
      When resizing the conntrack hash table at runtime via
      echo 42 > /sys/module/nf_conntrack/parameters/hashsize, we are racing with
      the conntrack lookup path -- reads can happen in parallel and nothing
      prevents readers from observing the newly allocated hash but the old
      size (or vice versa).
      
      So access to hash[bucket] can trigger OOB read access in case the table got
      expanded and we saw the new size but the old hash pointer (or it got shrunk
      and we got new hash ptr but the size of the old and larger table):
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.6.0-rc2+ #107
      [..]
      Call Trace:
      [<ffffffff822c3d6a>] ? nf_conntrack_tuple_taken+0x12a/0xe90
      [<ffffffff822c3ac1>] ? nf_ct_invert_tuplepr+0x221/0x3a0
      [<ffffffff8230e703>] get_unique_tuple+0xfb3/0x2760
      
      Use a generation counter to obtain the address/length of the same table.
      
      Also add a synchronize_net before freeing the old hash.
      AFAICS, without it we might access ct_hash[bucket] after ct_hash has been
      freed, provided that a lockless reader got delayed by another event:
      
      CPU1			CPU2
      seq_begin
      seq_retry
      <delay>			resize occurs
      			free oldhash
      for_each(oldhash[size])
      
      Note that resize is only supported in init_netns, and it took over 2 minutes
      of constant resizing+flooding to produce the warning, so this isn't a
      big problem in practice.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
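The generation-counter idea can be sketched in user-space C11 as below. This is an illustration, not the kernel implementation: the names `generation`, `cur_buckets`, `cur_size`, `get_table` and `publish_table` are hypothetical, a single writer is assumed, and all atomics use the default sequentially consistent ordering to keep the reasoning simple. Waiting for readers before freeing the old table (the role of synchronize_net above) is not shown.

```c
#include <stdatomic.h>
#include <stddef.h>

/* A consistent snapshot of the table: pointer and size from the same generation. */
struct table_ref {
    void **buckets;
    size_t size;
};

static atomic_uint generation;          /* even: stable, odd: resize in progress */
static _Atomic(void **) cur_buckets;
static atomic_size_t cur_size;

/* Reader: retry until pointer and size were both read within one generation. */
struct table_ref get_table(void)
{
    struct table_ref t;
    unsigned int seq;

    do {
        /* Wait for an even value, i.e. no writer in the middle of an update. */
        do {
            seq = atomic_load(&generation);
        } while (seq & 1u);

        t.buckets = atomic_load(&cur_buckets);
        t.size = atomic_load(&cur_size);
        /* If the counter moved, a resize raced with us: read again. */
    } while (atomic_load(&generation) != seq);

    return t;
}

/* Writer (assumed to be serialized externally): publish a new table. */
void publish_table(void **buckets, size_t size)
{
    atomic_fetch_add(&generation, 1u);  /* becomes odd: update in progress */
    atomic_store(&cur_buckets, buckets);
    atomic_store(&cur_size, size);
    atomic_fetch_add(&generation, 1u);  /* becomes even: update complete */
}
```

A reader that obtains its snapshot through `get_table()` never pairs the new pointer with the old size or the reverse, which is the out-of-bounds case described in the commit message.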
    • netfilter: conntrack: keep BH enabled during lookup · 2cf12348
      Florian Westphal committed
      No need to disable BH here anymore:
      
      stats are switched to the _ATOMIC variant (== this_cpu_inc()), which
      nowadays generates the same code as the non-_ATOMIC NF_STAT, at least on x86.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nftables: add connlabel set support · 1ad8f48d
      Florian Westphal committed
      Conntrack labels are currently sized depending on the iptables
      ruleset, i.e. if we're asked to test or set bits 1, 2, and 65 then we
      would allocate enough room to store at least bit 65.
      
      However, with nft, the input is just a register with arbitrary runtime
      content.
      
      We therefore ask for the upper ceiling we currently have, which is
      enough room to store 128 bits.
      
      Alternatively, we could alter nf_connlabels_replace to increase
      net->ct.label_words at run time, but since 128 bits is not that
      big we'd only save sizeof(long) so it doesn't seem worth it for now.
      
      This follows an approach similar to the one the xtables 'connlabel'
      match uses, so when the user inputs
      
          ct label set bar
      
      then we will set the bit used by the 'bar' label and leave the rest alone.
      
      This is done by passing the sreg content to nf_connlabels_replace
      as both value and mask argument.
      Labels (bits) already set thus cannot be re-set to zero, but
      this is not supported by the xtables connlabel match either.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
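Why passing the same register as both value and mask can set bits but never clear them is easy to see in a small sketch. The helper below is a hypothetical user-space model of a masked replace over 128 bits stored as four 32-bit words; `labels_replace` and `label_set_bit` are illustrative names and do not reproduce the kernel function's signature.

```c
#include <stddef.h>
#include <stdint.h>

#define LABEL_WORDS 4   /* 4 x 32 bits = 128 label bits */

/* Masked replace: bits selected by `mask` take their value from `value`. */
void labels_replace(uint32_t *labels, const uint32_t *value,
                    const uint32_t *mask, size_t words)
{
    for (size_t i = 0; i < words; i++)
        labels[i] = (labels[i] & ~mask[i]) | (value[i] & mask[i]);
}

/* Set one label bit by using the same register as value and mask. */
void label_set_bit(uint32_t *labels, unsigned int bit)
{
    uint32_t reg[LABEL_WORDS] = {0};

    if (bit >= LABEL_WORDS * 32)
        return;                          /* outside the 128-bit range */

    reg[bit / 32] = UINT32_C(1) << (bit % 32);
    /* value == mask, so the update reduces to labels |= reg. */
    labels_replace(labels, reg, reg, LABEL_WORDS);
}
```

With value equal to mask, a zero bit in the register is also a zero bit in the mask, so that position is left alone. Clearing a label would need a mask bit of 1 with a value bit of 0, which this calling convention cannot express.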
  2. 29 April, 2016 1 commit
  3. 25 April, 2016 16 commits
  4. 24 April, 2016 12 commits
  5. 22 April, 2016 2 commits
    • Merge tag 'rtc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 5f44abd0
      Linus Torvalds committed
      Pull RTC fixes from Alexandre Belloni:
       "A few fixes for the RTC subsystem.  The documentation fix already
        missed 4.5 so I think it is worth taking it now:
      
        A documentation fix for s3c and two fixes for the ds1307"
      
      * tag 'rtc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        rtc: ds1307: Use irq when available for wakeup-source device
        rtc: ds1307: ds3231 temperature s16 overflow
        rtc: s3c: Document in binding that only s3c6410 needs a src clk
    • Merge tag 'pm+acpi-4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · f78fe081
      Linus Torvalds committed
      Pull power management fixes from Rafael Wysocki:
       "Two fixes for issues introduced recently, one for an intel_pstate
        driver problem uncovered by the recent switch over from using timers
        and the other one for a potential cpufreq core problem related to
        system suspend/resume.
      
        Specifics:
      
         - Fix an intel_pstate driver problem causing CPUs to get stuck in the
           highest P-state when completely idle uncovered by the recent switch
           over from using timers (Rafael Wysocki).
      
         - Avoid attempts to get the current CPU frequency when all devices
           (like I2C controllers that may be needed for that purpose) have
           been suspended during system suspend/resume (Rafael Wysocki)"
      
      * tag 'pm+acpi-4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set
        intel_pstate: Avoid getting stuck in high P-states when idle