    network: fix dnsmasq/radvd binding to IPv6 on recent kernels · db488c79
    Benjamin Cama authored
    I hit this problem recently when trying to create a bridge with an IPv6
    address on a 3.2 kernel: dnsmasq would not bind to the given address,
    waiting 20s and then giving up with -EADDRNOTAVAIL, and radvd would in
    turn exit immediately with "error parsing or activating the config
    file" (without libvirt noticing it, BTW). This can be reproduced with (I
    think) any kernel >= 2.6.39 and the following XML (to be used with
    "virsh net-create"):
    
            <network>
              <name>test-bridge</name>
              <bridge name='testbr0' />
              <ip family='ipv6' address='fd00::1' prefix='64'>
              </ip>
            </network>
    
    (it also happens when the network has an IPv4 address)
    
    The problem is that since commit [1] (which, ironically, was made to
    “help IPv6 autoconfiguration”) the Linux bridge code makes bridges
    behave like “real” devices regarding carrier detection. As a result, the
    bridges created by libvirt, which are started without any device that is
    up, keep the NO-CARRIER flag set. This prevents DAD (Duplicate Address
    Detection) from happening, leaving the IPv6 address flagged as
    “tentative”. Such addresses cannot be bound to (see RFC 2462), so
    dnsmasq fails to bind to it. radvd, for its part, detects that
    "interface XXX is not RUNNING" and concludes that "interface XXX does
    not exist, ignoring the interface" (sic). It seems that this behavior
    was improved somewhat by commit [2], which avoids setting NO-CARRIER on
    empty bridges, but I couldn't reproduce this on my kernel. In any case,
    with the “dummy tap to set MAC address” trick the bridge is never empty,
    so this wouldn't help.
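
    To check whether an address is stuck in the tentative state, one can
    look at the flags column of /proc/net/if_inet6, where IFA_F_TENTATIVE
    is 0x40. The following is only a minimal Python sketch for diagnosis,
    not part of the patch; the function names are made up for illustration.

```python
IFA_F_TENTATIVE = 0x40  # from <linux/if_addr.h>


def tentative_addresses(if_inet6_text, ifname):
    """Return the hex addresses on `ifname` whose tentative flag is set.

    `if_inet6_text` is expected to have the /proc/net/if_inet6 layout:
    address, ifindex, prefix length, scope, flags, device name.
    """
    found = []
    for line in if_inet6_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        addr, _index, _prefix, _scope, flags, name = fields
        if name == ifname and int(flags, 16) & IFA_F_TENTATIVE:
            found.append(addr)
    return found


def read_tentative(ifname, path="/proc/net/if_inet6"):
    """Convenience wrapper (hypothetical) that reads the live proc file."""
    with open(path) as handle:
        return tentative_addresses(handle.read(), ifname)
```

    A non-empty result from read_tentative("testbr0") after creating the
    network above would indicate that DAD has not completed.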
    
    To fix this, the idea is to bring the bridge's attached device up so
    that DAD can happen (deactivating DAD altogether is not a good idea, I
    think). Currently, libvirt creates a dummy TAP device to set the MAC
    address of the bridge, and keeps it down. But even if we set this device
    up, it stops being RUNNING as soon as the tap file descriptor attached
    to it is closed, which still prevents DAD. So we must modify the API a
    bit, so that we can get the fd, keep the tap device persistent, run the
    daemons, and close the fd after DAD has taken place. After that, the
    bridge will be flagged NO-CARRIER again, but the daemons will be
    running, even if they are not happy about the device's state (we don't
    really care whether the bridge's daemons do anything when no up
    interface is connected to it).
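
    The "get the fd and keep the tap device persistent" part can be
    sketched as follows. This is a Python illustration of the idea only,
    not libvirt's actual C code: the function names are hypothetical, the
    ioctl numbers are the usual Linux values of TUNSETIFF and
    TUNSETPERSIST, and opening the device needs CAP_NET_ADMIN.

```python
import fcntl
import os
import struct

# Values from <linux/if_tun.h> on Linux.
TUNSETIFF = 0x400454CA
TUNSETPERSIST = 0x400454CB
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000


def build_tap_ifreq(name: str) -> bytes:
    """Pack a struct ifreq requesting a TAP device called `name`."""
    encoded = name.encode("ascii")
    if not 0 < len(encoded) < 16:  # IFNAMSIZ is 16, including the NUL
        raise ValueError("interface name must be 1 to 15 bytes long")
    # 16-byte name, 16-bit flags, then zero padding up to 40 bytes.
    return struct.pack("16sH22x", encoded, IFF_TAP | IFF_NO_PI)


def open_persistent_tap(name: str) -> int:
    """Create a persistent TAP device and return the still-open fd.

    The caller keeps the fd open while the device is brought up, attached
    to the bridge and the daemons are started, then closes it.
    """
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_tap_ifreq(name))
        fcntl.ioctl(fd, TUNSETPERSIST, 1)  # device survives the close
    except OSError:
        os.close(fd)
        raise
    return fd
```

    With something like fd = open_persistent_tap("testbr0-nic"), the
    caller would set the device up, start dnsmasq and radvd, wait for DAD
    to finish, and only then call os.close(fd).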
    
    Other solutions that I considered were:
          * Keeping the *-nic interface up: this would waste an fd for each
            bridge for its whole lifetime. This may be acceptable, I don't
            really know.
          * Stop using the dummy tap trick, and set the MAC address directly
            on the bridge: this has been possible for quite some time, it
            seems. However, there is still the problem of the bridge not
            being RUNNING when empty, contrary to what [2] says, so this
            would need fixing (and that fix only landed in 3.1, so it
            wouldn't work for 2.6.39).
          * Using the --interface option of dnsmasq, but I saw somewhere
            that libvirt doesn't use it for backward compatibility. I am
            not sure this would solve the problem anyway, as I don't know
            how dnsmasq binds to the interface with this option.
    
    This is why this patch does what's described earlier.
    
    This patch also makes radvd start even if the interface is
    “missing” (i.e. it is not RUNNING). radvd daemonizes before binding to
    the interface, so it sometimes binds only after we have brought the
    interface down (by closing the tap fd), in which case it used to exit.
    This also stops it from complaining in the logs when the interface goes
    down at a later time.
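
    As an illustration, radvd.conf(5) documents an IgnoreIfMissing
    interface option for this kind of situation. The Python sketch below
    assumes that this is the option involved (the message above does not
    name it) and shows how a generated interface block might include it;
    the helper name and its parameters are hypothetical.

```python
def radvd_interface_block(ifname, prefixes, ignore_if_missing=True):
    """Build a radvd.conf interface block as text.

    ASSUMPTION: IgnoreIfMissing is the option that lets radvd tolerate
    an interface that is not RUNNING; the commit message does not say so.
    """
    lines = ["interface %s" % ifname, "{", "  AdvSendAdvert on;"]
    if ignore_if_missing:
        lines.append("  IgnoreIfMissing on;")
    for prefix in prefixes:
        lines += [
            "  prefix %s" % prefix,
            "  {",
            "    AdvOnLink on;",
            "    AdvAutonomous on;",
            "  };",
        ]
    lines.append("};")
    return "\n".join(lines) + "\n"
```

    For the example network, radvd_interface_block("testbr0",
    ["fd00::/64"]) yields a block for testbr0 that advertises fd00::/64.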
    
    [1]
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=1faa4356a3bd89ea11fb92752d897cff3a20ec0e
    [2]
    http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=b64b73d7d0c480f75684519c6134e79d50c1b341