1. 04 May, 2012 1 commit
  2. 10 Apr, 2012 1 commit
  3. 03 Apr, 2012 2 commits
  4. 29 Mar, 2012 1 commit
  5. 28 Mar, 2012 2 commits
  6. 26 Mar, 2012 3 commits
  7. 24 Mar, 2012 1 commit
    • poll: add poll_requested_events() and poll_does_not_wait() functions · 626cf236
      Hans Verkuil committed
      In some cases the poll() implementation in a driver has to do different
      things depending on the events the caller wants to poll for.  An example
      is when a driver needs to start a DMA engine if the caller polls for
      POLLIN, but doesn't want to do that if POLLIN is not requested but instead
      only POLLOUT or POLLPRI is requested.  This is something that can happen
      in the video4linux subsystem among others.
      
      Unfortunately, the current epoll/poll/select implementation doesn't
      provide that information reliably.  The poll_table_struct does have it: it
      has a key field with the event mask.  But once a poll() call matches one
      or more bits of that mask any following poll() calls are passed a NULL
      poll_table pointer.
      
      Also, the eventpoll implementation always left the key field at ~0 instead
      of using the requested events mask.
      
      This was changed in eventpoll.c so the key field now contains the actual
      events that should be polled for as set by the caller.
      
      The solution to the NULL poll_table pointer is to set the qproc field to
      NULL in poll_table once poll() matches the events, not the poll_table
      pointer itself.  That way drivers can obtain the mask through a new
      poll_requested_events inline.
      
      The poll_table_struct can still be NULL since some kernel code calls it
      internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h).  In
      that case poll_requested_events() returns ~0 (i.e.  all events).
      
      Very rarely drivers might want to know whether poll_wait will actually
      wait.  If another earlier file descriptor in the set already matched the
      events the caller wanted to wait for, then the kernel will return from the
      select() call without waiting.  This might be useful information in order
      to avoid doing expensive work.
      
      A new helper function poll_does_not_wait() is added that drivers can use
      to detect this situation.  This is now used in sock_poll_wait() in
      include/net/sock.h.  This was the only place in the kernel that needed
      this information.
      
      Drivers should no longer access any of the poll_table internals, but use
      the poll_requested_events() and poll_does_not_wait() access functions
      instead.  To enforce this, the poll_table fields are now prefixed
      with an underscore and a comment was added warning against using them
      directly.
      
      This required a change in unix_dgram_poll() in unix/af_unix.c which used
      the key field to get the requested events.  It's been replaced by a call
      to poll_requested_events().
      
      For qproc it was especially important to change its name, because the
      behavior of that field changes with this patch: this function pointer
      can now be NULL, which wasn't possible in the past.
      
      Any driver accessing the qproc or key fields directly will now fail to compile.
      
      Some notes regarding the correctness of this patch: the driver's poll()
      function is called with a 'struct poll_table_struct *wait' argument.  This
      pointer may or may not be NULL; drivers can never rely on it being one or
      the other, as that depends on whether or not an earlier file descriptor in
      the select()'s fdset matched the requested events.
      
      There are only three things a driver can do with the wait argument:
      
      1) obtain the key field:
      
      	events = wait ? wait->key : ~0;
      
         This will still work although it should be replaced with the new
         poll_requested_events() function (which does exactly the same).
         This will now work even better, since wait is no longer set to NULL
         unnecessarily.
      
      2) use the qproc callback. This could be deadly since qproc can now be
         NULL. Renaming qproc should prevent this from happening. There are no
         kernel drivers that actually access this callback directly, BTW.
      
      3) test whether wait == NULL to determine whether poll would return without
         waiting. This is no longer sufficient as the correct test is now
         wait == NULL || wait->_qproc == NULL.
      
         However, the worst that can happen here is a slight performance hit in
         the case where wait != NULL and wait->_qproc == NULL. In that case the
         driver will assume that poll_wait() will actually add the fd to the set
         of waiting file descriptors. Of course, poll_wait() will not do that
         since it tests for wait->_qproc. This will not break anything, though.
      
         There is only one place in the whole kernel where this happens
         (sock_poll_wait() in include/net/sock.h) and that code will be replaced
         by a call to poll_does_not_wait() in the next patch.
      
         Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
         actually waiting. The next file descriptor from the set might match the
         event mask and thus any possible waits will never happen.
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Reviewed-by: Jonathan Corbet <corbet@lwn.net>
      Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
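The two helpers described in this commit can be illustrated with a small user-space model. This is a minimal sketch, not the kernel source: the `model_` names, the event bit values, and the driver state struct are assumptions made for illustration. Only the `_qproc` and `_key` field names and the two tests (`p ? key : ~0` and `wait == NULL || wait->_qproc == NULL`) come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event bits for this model only; names and values are hypothetical. */
#define MODEL_POLLIN  0x0001UL
#define MODEL_POLLPRI 0x0002UL
#define MODEL_POLLOUT 0x0004UL

struct model_poll_table;
typedef void (*model_queue_proc)(struct model_poll_table *pt);

/* Simplified stand-in for poll_table: just the two fields the message names. */
struct model_poll_table {
    model_queue_proc _qproc; /* NULL once an earlier descriptor matched */
    unsigned long _key;      /* event mask requested by the caller */
};

/* A NULL table means "all events", as the message describes. */
static inline unsigned long model_requested_events(const struct model_poll_table *p)
{
    return p ? p->_key : ~0UL;
}

/* The test given in the message: wait == NULL || wait->_qproc == NULL. */
static inline bool model_does_not_wait(const struct model_poll_table *p)
{
    return p == NULL || p->_qproc == NULL;
}

/* Hypothetical driver state. */
struct model_dev {
    bool dma_started;
    bool data_ready;
};

/* Hypothetical driver poll(): start DMA only when POLLIN was requested. */
static unsigned long model_dev_poll(struct model_dev *dev,
                                    const struct model_poll_table *wait)
{
    unsigned long requested = model_requested_events(wait);
    unsigned long ready = MODEL_POLLOUT; /* assumption: writes never block */

    assert(dev != NULL);
    if (requested & MODEL_POLLIN) {
        dev->dma_started = true;
        if (dev->data_ready)
            ready |= MODEL_POLLIN;
    }
    return ready;
}
```

With a table whose key is only `MODEL_POLLOUT`, the model driver leaves the DMA engine alone even when `_qproc` is NULL, which is the case the old NULL-pointer convention could not express.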
  8. 23 Mar, 2012 10 commits
    • bonding: remove entries for master_ip and vlan_ip and query devices instead · eaddcd76
      Andy Gospodarek committed
      The following patch aimed to resolve an issue where secondary, tertiary,
      etc. addresses added to bond interfaces could overwrite the
      bond->master_ip and vlan_ip values.
      
              commit 917fbdb3
              Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
              Date:   Wed Nov 23 23:37:15 2011 +0000
      
                  bonding: only use primary address for ARP
      
      That patch was good because it prevented bonds using ARP monitoring from
      sending frames with an invalid source IP address.  Unfortunately, it
      didn't always work as expected.
      
      When using an ioctl (like ifconfig does) to set the IP address and
      netmask, 2 separate ioctls are actually called to set the IP and netmask
      if the mask chosen doesn't match the standard mask for that class of
      address.  The first ioctl did not have a mask that matched the one in
      the primary address and would still cause the device address to be
      overwritten.  The second ioctl that was called to set the mask would
      then be detected as secondary and ignored, but the damage was already done.
      
      This was not an issue when using an application that used netlink
      sockets as the setting of IP and netmask came down at once.  The
      inconsistent behavior between those two interfaces was something that
      needed to be resolved.
      
      While I was thinking about how I wanted to resolve this, Ralf Zeidler
      came with a patch that resolved this on a RHEL kernel by keeping a full
      shadow of the entries in dev->ifa_list for the bonding device and vlan
      devices in the bonding driver.  I didn't like the duplication of the
      list as I want to see the 'bonding' struct and code shrink rather than
      grow, but liked the general idea.
      
      As the Subject indicates this patch drops the master_ip and vlan_ip
      elements from the 'bonding' and 'vlan_entry' structs, respectively.
      This can be done because a device's address-list is now traversed to
      determine the optimal source IP address for ARP requests and for checks
      to see if the bonding device has a particular IP address.  This code
      could have all been contained inside the bonding driver, but it made more
      sense to me to EXPORT and call inet_confirm_addr since it did exactly
      what was needed.
      
      I tested this and a backported patch and everything works as expected.
      Ralf also helped with verification of the backported patch.
      
      Thanks to Ralf for all his help on this.
      
      v2: Whitespace and organizational changes based on suggestions from Jay
      Vosburgh and Dave Miller.
      
      v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's
      suggestion.
      Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
      CC: Ralf Zeidler <ralf.zeidler@nsn.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
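The idea of querying a device's address list instead of caching a single `master_ip` can be sketched in user space as follows. This is a simplified model under stated assumptions: the `model_` types and functions are hypothetical, and the selection rule (first non-secondary entry) is a stand-in that does not reproduce what `inet_confirm_addr` does in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of a device's address list. */
struct model_ifaddr {
    uint32_t addr;                   /* IPv4 address as a plain integer */
    bool secondary;                  /* true for secondary addresses */
    const struct model_ifaddr *next;
};

/* Check whether the device owns a given address by walking its list. */
static bool model_dev_has_ip(const struct model_ifaddr *list, uint32_t ip)
{
    const struct model_ifaddr *a;

    for (a = list; a != NULL; a = a->next)
        if (a->addr == ip)
            return true;
    return false;
}

/* Pick a source address for ARP: first primary entry, or 0 if none. */
static uint32_t model_arp_source(const struct model_ifaddr *list)
{
    const struct model_ifaddr *a;

    for (a = list; a != NULL; a = a->next)
        if (!a->secondary)
            return a->addr;
    return 0;
}

/* Usage: adding a secondary address does not change the chosen source. */
static void model_bond_demo(void)
{
    struct model_ifaddr secondary = { 0x0A000002u, true, NULL };
    struct model_ifaddr primary = { 0x0A000001u, false, &secondary };

    assert(model_arp_source(&primary) == 0x0A000001u);
    assert(model_dev_has_ip(&primary, 0x0A000002u));
}
```

Because nothing is cached, there is no stored value for a later address change to overwrite, which is the property the commit is after.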
    • netfilter: remove forward module param confusion. · 523f610e
      Rusty Russell committed
      It used to be an int, and it got changed to a bool parameter at least
      7 years ago.  It happens that NF_DROP and NF_ACCEPT are 0 and 1, so
      this works, but it's unclear, and the check that it's in range is not
      required.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
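The confusion this commit removes can be shown with a short sketch. The function names below are hypothetical and the code is a user-space model, not the patched kernel code. The verdict values (drop is 0, accept is 1) match the netfilter definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Verdict values as in netfilter: drop is 0, accept is 1. */
#define MODEL_NF_DROP   0u
#define MODEL_NF_ACCEPT 1u

/* Old style: the integer parameter doubles as the verdict,
 * so it has to be range-checked before use. */
static unsigned int model_policy_from_int(int forward)
{
    assert(forward == 0 || forward == 1);
    return (unsigned int)forward;
}

/* Clearer style: a bool is mapped to a verdict explicitly,
 * and no range check is needed. */
static unsigned int model_policy_from_bool(bool forward)
{
    return forward ? MODEL_NF_ACCEPT : MODEL_NF_DROP;
}
```

Both functions return the same verdict for the same setting. The second one just no longer relies on the numeric coincidence.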
    • netfilter: nf_conntrack: permanently attach timeout policy to conntrack · 60b5f8f7
      Pablo Neira Ayuso committed
      We need to permanently attach the timeout policy to the conntrack,
      otherwise we may apply the custom timeout policy inconsistently.
      
      Without this patch, the following example:
      
       nfct timeout add test inet icmp timeout 100
       iptables -I PREROUTING -t raw -p icmp -s 1.1.1.1 -j CT --timeout test
      
      Will only apply the custom timeout policy to outgoing packets from
      1.1.1.1, but not to reply packets from 2.2.2.2 going to 1.1.1.1.
      
      To fix this issue, this patch modifies the current logic to attach the
      timeout policy when the first packet is seen (which is when the
      conntrack entry is created). Then, we keep using the attached timeout
      policy until the conntrack entry is destroyed.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
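The fix described above, attaching the policy once when the entry is created and reusing it for every later packet, can be modeled in a few lines. This is a sketch under assumptions: the `model_` structures are hypothetical and far simpler than the real conntrack entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified timeout policy and conntrack entry. */
struct model_timeout_policy {
    unsigned int seconds;
};

enum model_dir { MODEL_DIR_ORIGINAL, MODEL_DIR_REPLY };

struct model_conn {
    const struct model_timeout_policy *policy; /* attached once, at creation */
};

/* Called for the first packet of a flow: attach the custom policy if the
 * rule supplied one, otherwise the default. */
static void model_conn_create(struct model_conn *ct,
                              const struct model_timeout_policy *custom,
                              const struct model_timeout_policy *dflt)
{
    assert(ct != NULL && dflt != NULL);
    ct->policy = custom ? custom : dflt;
}

/* Called for every later packet: the direction no longer matters,
 * because the policy lives in the entry itself. */
static unsigned int model_conn_timeout(const struct model_conn *ct,
                                       enum model_dir dir)
{
    (void)dir;
    assert(ct != NULL && ct->policy != NULL);
    return ct->policy->seconds;
}
```

In this model a reply packet sees the same timeout as the packet that created the entry, which is the consistency the commit message asks for.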
    • netfilter: xt_CT: fix assignation of the generic protocol tracker · eeb4cb95
      Pablo Neira Ayuso committed
      `iptables -p all' uses 0 to match all protocols, while the conntrack
      subsystem uses 255. We still need `-p all' to attach the custom
      timeout policies for the generic protocol tracker.
      
      Moreover, we may use `iptables -p sctp' while the SCTP tracker is
      not loaded. In that case, we have to default to the generic protocol
      tracker.
      
      Another possibility is `iptables -p ip' that should be supported
      as well. This patch makes sure we validate all possible scenarios.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
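The selection rule this message describes can be sketched as a small function. The names are hypothetical and the `loaded` table is an assumption standing in for whatever lookup the kernel performs. Only the values 0 (iptables "all") and 255 (conntrack generic) come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_IPT_PROTO_ALL    0u   /* what `iptables -p all' passes */
#define MODEL_CT_PROTO_GENERIC 255u /* generic tracker in conntrack */

/* Choose the tracker for a rule's protocol number.
 * loaded[n] says whether a dedicated tracker for protocol n is available. */
static uint8_t model_select_tracker(uint8_t ipt_proto, const bool loaded[256])
{
    assert(loaded != NULL);
    if (ipt_proto == MODEL_IPT_PROTO_ALL)
        return MODEL_CT_PROTO_GENERIC; /* translate 0 to 255 */
    if (!loaded[ipt_proto])
        return MODEL_CT_PROTO_GENERIC; /* tracker missing: fall back */
    return ipt_proto;
}
```

For example, with only an ICMP tracker marked as loaded, protocol 132 (SCTP) falls back to the generic tracker while protocol 1 (ICMP) keeps its own.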
    • netfilter: xt_CT: missing rcu_read_lock section in timeout assignment · 1ac0bf99
      Pablo Neira Ayuso committed
      Fix a pointer dereference made without rcu_read_lock held.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: cttimeout: fix dependency with l4protocol conntrack module · c1ebd7df
      Pablo Neira Ayuso committed
      This patch introduces nf_conntrack_l4proto_find_get() and
      nf_conntrack_l4proto_put() to fix module dependencies between
      timeout objects and l4-protocol conntrack modules.
      
      Thus, we make sure that the module cannot be removed if it is
      used by any of the cttimeout objects.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
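The get/put pairing that keeps a protocol module pinned while a timeout object uses it can be modeled with a plain reference count. This is a sketch only: the `model_` names are hypothetical and a bare counter stands in for the kernel's module reference handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an l4-protocol tracker module. */
struct model_l4proto {
    const char *name;
    unsigned int refcnt; /* number of timeout objects using it */
    bool registered;
};

/* Look the tracker up and pin it; returns NULL if it is not available. */
static struct model_l4proto *model_l4proto_find_get(struct model_l4proto *p)
{
    assert(p != NULL);
    if (!p->registered)
        return NULL;
    p->refcnt++;
    return p;
}

/* Drop a reference taken by model_l4proto_find_get(). */
static void model_l4proto_put(struct model_l4proto *p)
{
    assert(p != NULL && p->refcnt > 0);
    p->refcnt--;
}

/* Removal is refused while any timeout object still holds a reference. */
static bool model_l4proto_try_unload(struct model_l4proto *p)
{
    assert(p != NULL);
    if (p->refcnt > 0)
        return false;
    p->registered = false;
    return true;
}
```

A timeout object would call the find_get function when it is created and the put function when it is destroyed, so the unload attempt only succeeds once no object is left.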
    • xfrm: Access the replay notify functions via the registered callbacks · 1265fd61
      Steffen Klassert committed
      We call the wrong replay notify function when we use ESN replay
      handling. As a result, no notifications are sent when ESN is in
      use. Fix this by calling the registered callbacks instead
      of xfrm_replay_notify().
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
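Dispatching through a per-state callback table, rather than calling one fixed notify function, can be sketched like this. The structures and function names are hypothetical simplifications. The counters exist only so the example can show which handler ran.

```c
#include <assert.h>
#include <stddef.h>

struct model_state;

/* Hypothetical callback table registered per replay mode. */
struct model_replay_ops {
    void (*notify)(struct model_state *x, int event);
};

struct model_state {
    const struct model_replay_ops *repl; /* chosen when the state is set up */
    int legacy_notified;
    int esn_notified;
};

static void model_notify_legacy(struct model_state *x, int event)
{
    (void)event;
    x->legacy_notified++;
}

static void model_notify_esn(struct model_state *x, int event)
{
    (void)event;
    x->esn_notified++;
}

static const struct model_replay_ops model_legacy_ops = { model_notify_legacy };
static const struct model_replay_ops model_esn_ops = { model_notify_esn };

/* Fixed pattern: always go through the registered callback, so a state
 * using ESN reaches the ESN handler. */
static void model_replay_notify(struct model_state *x, int event)
{
    assert(x != NULL && x->repl != NULL && x->repl->notify != NULL);
    x->repl->notify(x, event);
}
```

Calling `model_notify_legacy()` directly for every state would reproduce the bug in this model: the ESN handler would never run.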
    • xfrm: Remove unused xfrm_state from xfrm_state_check_space · 26b2072e
      Steffen Klassert committed
      The xfrm_state argument is unused in this function, so remove it.
      Also the name xfrm_state_check_space does not really match what this
      function does. It actually checks if we have enough head and tailroom
      on the skb. So we rename the function to xfrm_skb_check_space.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS: use gfp flags from caller in conn_alloc() · f0229eaa
      Dan Carpenter committed
      We should be using the gfp flags the caller specified here, instead of
      GFP_KERNEL.  I think this might be a bugfix, depending on the value of
      "sock->sk->sk_allocation" when we call rds_conn_create_outgoing() in
      rds_sendmsg().  Otherwise, it's just a cleanup.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlabel: use GFP flags from caller instead of GFP_ATOMIC · 64b5fad5
      Dan Carpenter committed
      This function takes GFP flags as a parameter, but they are never used.
      We don't take a lock in this function so there is no reason to prefer
      GFP_ATOMIC over the caller's GFP flags.
      
      There is only one caller, cipso_v4_map_cat_rng_ntoh(), and it passes
      GFP_ATOMIC as the GFP flags so this doesn't change how the code works.
      It's just a cleanup.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
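The cleanup pattern in this commit, honoring the allocation flags the caller passed instead of a hard-coded value, looks roughly like the following in a user-space model. Everything here is an assumption for illustration: the `model_` enum and functions are hypothetical, and `malloc` stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for two allocation modes. */
enum model_gfp { MODEL_GFP_ATOMIC, MODEL_GFP_KERNEL };

/* Records the flags of the most recent allocation so the effect is visible. */
static enum model_gfp model_last_flags;

static void *model_alloc(size_t size, enum model_gfp flags)
{
    assert(size > 0);
    model_last_flags = flags;
    return malloc(size);
}

/* Before: the flags parameter is accepted but ignored. */
static void *model_add_before(size_t size, enum model_gfp flags)
{
    (void)flags;
    return model_alloc(size, MODEL_GFP_ATOMIC);
}

/* After: the caller's flags are passed through unchanged. */
static void *model_add_after(size_t size, enum model_gfp flags)
{
    return model_alloc(size, flags);
}
```

When the only caller passes the atomic flag anyway, as the message says is the case here, both versions behave identically. The difference only shows once a caller passes something else.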
  9. 22 Mar, 2012 19 commits