1. 12 4月, 2012 1 次提交
  2. 06 4月, 2012 5 次提交
    • E
      net: fix a race in sock_queue_err_skb() · 110c4330
      Eric Dumazet 提交于
      As soon as an skb is queued into socket error queue, another thread
      can consume it, so we are not allowed to reference skb anymore, or risk
      use after free.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      110c4330
    • E
      netlink: fix races after skb queueing · 4a7e7c2a
      Eric Dumazet 提交于
      As soon as an skb is queued into socket receive_queue, another thread
      can consume it, so we are not allowed to reference skb anymore, or risk
      use after free.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a7e7c2a
    • S
      phonet: Check input from user before allocating · bcf1b70a
      Sasha Levin 提交于
      A phonet packet is limited to USHRT_MAX bytes, this is never checked during
      tx which means that the user can specify any size he wishes, and the kernel
      will attempt to allocate that size.
      
      In the good case, it'll lead to the following warning, but it may also cause
      the kernel to kick in the OOM and kill a random task on the server.
      
      [ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730()
      [ 8921.749770] Pid: 5081, comm: trinity Tainted: G        W    3.4.0-rc1-next-20120402-sasha #46
      [ 8921.756672] Call Trace:
      [ 8921.758185]  [<ffffffff810b2ba7>] warn_slowpath_common+0x87/0xb0
      [ 8921.762868]  [<ffffffff810b2be5>] warn_slowpath_null+0x15/0x20
      [ 8921.765399]  [<ffffffff8117eae5>] __alloc_pages_slowpath+0x65/0x730
      [ 8921.769226]  [<ffffffff81179c8a>] ? zone_watermark_ok+0x1a/0x20
      [ 8921.771686]  [<ffffffff8117d045>] ? get_page_from_freelist+0x625/0x660
      [ 8921.773919]  [<ffffffff8117f3a8>] __alloc_pages_nodemask+0x1f8/0x240
      [ 8921.776248]  [<ffffffff811c03e0>] kmalloc_large_node+0x70/0xc0
      [ 8921.778294]  [<ffffffff811c4bd4>] __kmalloc_node_track_caller+0x34/0x1c0
      [ 8921.780847]  [<ffffffff821b0e3c>] ? sock_alloc_send_pskb+0xbc/0x260
      [ 8921.783179]  [<ffffffff821b3c65>] __alloc_skb+0x75/0x170
      [ 8921.784971]  [<ffffffff821b0e3c>] sock_alloc_send_pskb+0xbc/0x260
      [ 8921.787111]  [<ffffffff821b002e>] ? release_sock+0x7e/0x90
      [ 8921.788973]  [<ffffffff821b0ff0>] sock_alloc_send_skb+0x10/0x20
      [ 8921.791052]  [<ffffffff824cfc20>] pep_sendmsg+0x60/0x380
      [ 8921.792931]  [<ffffffff824cb4a6>] ? pn_socket_bind+0x156/0x180
      [ 8921.794917]  [<ffffffff824cb50f>] ? pn_socket_autobind+0x3f/0x90
      [ 8921.797053]  [<ffffffff824cb63f>] pn_socket_sendmsg+0x4f/0x70
      [ 8921.798992]  [<ffffffff821ab8e7>] sock_aio_write+0x187/0x1b0
      [ 8921.801395]  [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
      [ 8921.803501]  [<ffffffff8111842c>] ? __lock_acquire+0x42c/0x4b0
      [ 8921.805505]  [<ffffffff821ab760>] ? __sock_recv_ts_and_drops+0x140/0x140
      [ 8921.807860]  [<ffffffff811e07cc>] do_sync_readv_writev+0xbc/0x110
      [ 8921.809986]  [<ffffffff811958e7>] ? might_fault+0x97/0xa0
      [ 8921.811998]  [<ffffffff817bd99e>] ? security_file_permission+0x1e/0x90
      [ 8921.814595]  [<ffffffff811e17e2>] do_readv_writev+0xe2/0x1e0
      [ 8921.816702]  [<ffffffff810b8dac>] ? do_setitimer+0x1ac/0x200
      [ 8921.818819]  [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
      [ 8921.820863]  [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
      [ 8921.823318]  [<ffffffff811e1926>] vfs_writev+0x46/0x60
      [ 8921.825219]  [<ffffffff811e1a3f>] sys_writev+0x4f/0xb0
      [ 8921.827127]  [<ffffffff82658039>] system_call_fastpath+0x16/0x1b
      [ 8921.829384] ---[ end trace dffe390f30db9eb7 ]---
      Signed-off-by: NSasha Levin <levinsasha928@gmail.com>
      Acked-by: NRémi Denis-Courmont <remi.denis-courmont@nokia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcf1b70a
    • E
      tcp: tcp_sendpages() should call tcp_push() once · 35f9c09f
      Eric Dumazet 提交于
      commit 2f533844 (tcp: allow splice() to build full TSO packets) added
      a regression for splice() calls using SPLICE_F_MORE.
      
      We need to call tcp_flush() at the end of the last page processed in
      tcp_sendpages(), or else transmits can be deferred and future sends
      stall.
      
      Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
      with different semantic.
      
      For all sendpage() providers, its a transparent change. Only
      sock_sendpage() and tcp_sendpages() can differentiate the two different
      flags provided by pipe_to_sendpage()
      Reported-by: NTom Herbert <therbert@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: H.K. Jerry Chu <hkchu@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail&gt;com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35f9c09f
    • S
      simple_open: automatically convert to simple_open() · 234e3405
      Stephen Boyd 提交于
      Many users of debugfs copy the implementation of default_open() when
      they want to support a custom read/write function op.  This leads to a
      proliferation of the default_open() implementation across the entire
      tree.
      
      Now that the common implementation has been consolidated into libfs we
      can replace all the users of this function with simple_open().
      
      This replacement was done with the following semantic patch:
      
      <smpl>
      @ open @
      identifier open_f != simple_open;
      identifier i, f;
      @@
      -int open_f(struct inode *i, struct file *f)
      -{
      (
      -if (i->i_private)
      -f->private_data = i->i_private;
      |
      -f->private_data = i->i_private;
      )
      -return 0;
      -}
      
      @ has_open depends on open @
      identifier fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...
      -.open = open_f,
      +.open = simple_open,
      ...
      };
      </smpl>
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      234e3405
  3. 05 4月, 2012 2 次提交
    • R
      ipv6: fix array index in ip6_mc_add_src() · 78d50217
      RongQing.Li 提交于
      Convert array index from the loop bound to the loop index.
      
      And remove the void type conversion to ip6_mc_del1_src() return
      code, seem it is unnecessary, since ip6_mc_del1_src() does not
      use __must_check similar attribute, no compiler will report the
      warning when it is removed.
      
      v2: enrich the commit header
      Signed-off-by: NRongQing.Li <roy.qing.li@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      78d50217
    • T
      sctp: Allow struct sctp_event_subscribe to grow without breaking binaries · acdd5985
      Thomas Graf 提交于
      getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns
      an error if the user provides less bytes than the size of struct
      sctp_event_subscribe.
      
      Struct sctp_event_subscribe needs to be extended by an u8 for every
      new event or notification type that is added.
      
      This obviously makes getsockopt fail for binaries that are compiled
      against an older versions of <net/sctp/user.h> which do not contain
      all event types.
      
      This patch changes getsockopt behaviour to no longer return an error
      if not enough bytes are being provided by the user. Instead, it
      returns as much of sctp_event_subscribe as fits into the provided buffer.
      
      This leads to the new behavior that users see what they have been aware
      of at compile time.
      
      The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acdd5985
  4. 04 4月, 2012 6 次提交
  5. 03 4月, 2012 1 次提交
  6. 02 4月, 2012 4 次提交
  7. 29 3月, 2012 2 次提交
  8. 28 3月, 2012 4 次提交
  9. 27 3月, 2012 3 次提交
  10. 26 3月, 2012 3 次提交
  11. 24 3月, 2012 1 次提交
    • H
      poll: add poll_requested_events() and poll_does_not_wait() functions · 626cf236
      Hans Verkuil 提交于
      In some cases the poll() implementation in a driver has to do different
      things depending on the events the caller wants to poll for.  An example
      is when a driver needs to start a DMA engine if the caller polls for
      POLLIN, but doesn't want to do that if POLLIN is not requested but instead
      only POLLOUT or POLLPRI is requested.  This is something that can happen
      in the video4linux subsystem among others.
      
      Unfortunately, the current epoll/poll/select implementation doesn't
      provide that information reliably.  The poll_table_struct does have it: it
      has a key field with the event mask.  But once a poll() call matches one
      or more bits of that mask any following poll() calls are passed a NULL
      poll_table pointer.
      
      Also, the eventpoll implementation always left the key field at ~0 instead
      of using the requested events mask.
      
      This was changed in eventpoll.c so the key field now contains the actual
      events that should be polled for as set by the caller.
      
      The solution to the NULL poll_table pointer is to set the qproc field to
      NULL in poll_table once poll() matches the events, not the poll_table
      pointer itself.  That way drivers can obtain the mask through a new
      poll_requested_events inline.
      
      The poll_table_struct can still be NULL since some kernel code calls it
      internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h).  In
      that case poll_requested_events() returns ~0 (i.e.  all events).
      
      Very rarely drivers might want to know whether poll_wait will actually
      wait.  If another earlier file descriptor in the set already matched the
      events the caller wanted to wait for, then the kernel will return from the
      select() call without waiting.  This might be useful information in order
      to avoid doing expensive work.
      
      A new helper function poll_does_not_wait() is added that drivers can use
      to detect this situation.  This is now used in sock_poll_wait() in
      include/net/sock.h.  This was the only place in the kernel that needed
      this information.
      
      Drivers should no longer access any of the poll_table internals, but use
      the poll_requested_events() and poll_does_not_wait() access functions
      instead.  In order to enforce that the poll_table fields are now prepended
      with an underscore and a comment was added warning against using them
      directly.
      
      This required a change in unix_dgram_poll() in unix/af_unix.c which used
      the key field to get the requested events.  It's been replaced by a call
      to poll_requested_events().
      
      For qproc it was especially important to change its name since the
      behavior of that field changes with this patch since this function pointer
      can now be NULL when that wasn't possible in the past.
      
      Any driver accessing the qproc or key fields directly will now fail to compile.
      
      Some notes regarding the correctness of this patch: the driver's poll()
      function is called with a 'struct poll_table_struct *wait' argument.  This
      pointer may or may not be NULL, drivers can never rely on it being one or
      the other as that depends on whether or not an earlier file descriptor in
      the select()'s fdset matched the requested events.
      
      There are only three things a driver can do with the wait argument:
      
      1) obtain the key field:
      
      	events = wait ? wait->key : ~0;
      
         This will still work although it should be replaced with the new
         poll_requested_events() function (which does exactly the same).
         This will now even work better, since wait is no longer set to NULL
         unnecessarily.
      
      2) use the qproc callback. This could be deadly since qproc can now be
         NULL. Renaming qproc should prevent this from happening. There are no
         kernel drivers that actually access this callback directly, BTW.
      
      3) test whether wait == NULL to determine whether poll would return without
         waiting. This is no longer sufficient as the correct test is now
         wait == NULL || wait->_qproc == NULL.
      
         However, the worst that can happen here is a slight performance hit in
         the case where wait != NULL and wait->_qproc == NULL. In that case the
         driver will assume that poll_wait() will actually add the fd to the set
         of waiting file descriptors. Of course, poll_wait() will not do that
         since it tests for wait->_qproc. This will not break anything, though.
      
         There is only one place in the whole kernel where this happens
         (sock_poll_wait() in include/net/sock.h) and that code will be replaced
         by a call to poll_does_not_wait() in the next patch.
      
         Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
         actually waiting. The next file descriptor from the set might match the
         event mask and thus any possible waits will never happen.
      Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
      Reviewed-by: NJonathan Corbet <corbet@lwn.net>
      Reviewed-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      626cf236
  12. 23 3月, 2012 8 次提交
    • A
      bonding: remove entries for master_ip and vlan_ip and query devices instead · eaddcd76
      Andy Gospodarek 提交于
      The following patch aimed to resolve an issue where secondary, tertiary,
      etc. addresses added to bond interfaces could overwrite the
      bond->master_ip and vlan_ip values.
      
              commit 917fbdb3
              Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
              Date:   Wed Nov 23 23:37:15 2011 +0000
      
                  bonding: only use primary address for ARP
      
      That patch was good because it prevented bonds using ARP monitoring from
      sending frames with an invalid source IP address.  Unfortunately, it
      didn't always work as expected.
      
      When using an ioctl (like ifconfig does) to set the IP address and
      netmask, 2 separate ioctls are actually called to set the IP and netmask
      if the mask chosen doesn't match the standard mask for that class of
      address.  The first ioctl did not have a mask that matched the one in
      the primary address and would still cause the device address to be
      overwritten.  The second ioctl that was called to set the mask would
      then detect as secondary and ignored, but the damage was already done.
      
      This was not an issue when using an application that used netlink
      sockets as the setting of IP and netmask came down at once.  The
      inconsistent behavior between those two interfaces was something that
      needed to be resolved.
      
      While I was thinking about how I wanted to resolve this, Ralf Zeidler
      came with a patch that resolved this on a RHEL kernel by keeping a full
      shadow of the entries in dev->ifa_list for the bonding device and vlan
      devices in the bonding driver.  I didn't like the duplication of the
      list as I want to see the 'bonding' struct and code shrink rather than
      grow, but liked the general idea.
      
      As the Subject indicates this patch drops the master_ip and vlan_ip
      elements from the 'bonding' and 'vlan_entry' structs, respectively.
      This can be done because a device's address-list is now traversed to
      determine the optimal source IP address for ARP requests and for checks
      to see if the bonding device has a particular IP address.  This code
      could have all be contained inside the bonding driver, but it made more
      sense to me to EXPORT and call inet_confirm_addr since it did exactly
      what was needed.
      
      I tested this and a backported patch and everything works as expected.
      Ralf also helped with verification of the backported patch.
      
      Thanks to Ralf for all his help on this.
      
      v2: Whitespace and organizational changes based on suggestions from Jay
      Vosburgh and Dave Miller.
      
      v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's
      suggestion.
      Signed-off-by: NAndy Gospodarek <andy@greyhouse.net>
      CC: Ralf Zeidler <ralf.zeidler@nsn.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eaddcd76
    • R
      netfilter: remove forward module param confusion. · 523f610e
      Rusty Russell 提交于
      It used to be an int, and it got changed to a bool parameter at least
      7 years ago.  It happens that NF_ACCEPT and NF_DROP are 0 and 1, so
      this works, but it's unclear, and the check that it's in range is not
      required.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      523f610e
    • P
      netfilter: nf_conntrack: permanently attach timeout policy to conntrack · 60b5f8f7
      Pablo Neira Ayuso 提交于
      We need to permanently attach the timeout policy to the conntrack,
      otherwise we may apply the custom timeout policy inconsistently.
      
      Without this patch, the following example:
      
       nfct timeout add test inet icmp timeout 100
       iptables -I PREROUTING -t raw -p icmp -s 1.1.1.1 -j CT --timeout test
      
      Will only apply the custom timeout policy to outgoing packets from
      1.1.1.1, but not to reply packets from 2.2.2.2 going to 1.1.1.1.
      
      To fix this issue, this patch modifies the current logic to attach the
      timeout policy when the first packet is seen (which is when the
      conntrack entry is created). Then, we keep using the attached timeout
      policy until the conntrack entry is destroyed.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      60b5f8f7
    • P
      netfilter: xt_CT: fix assignation of the generic protocol tracker · eeb4cb95
      Pablo Neira Ayuso 提交于
      `iptables -p all' uses 0 to match all protocols, while the conntrack
      subsystem uses 255. We still need `-p all' to attach the custom
      timeout policies for the generic protocol tracker.
      
      Moreover, we may use `iptables -p sctp' while the SCTP tracker is
      not loaded. In that case, we have to default on the generic protocol
      tracker.
      
      Another possibility is `iptables -p ip' that should be supported
      as well. This patch makes sure we validate all possible scenarios.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      eeb4cb95
    • P
      netfilter: xt_CT: missing rcu_read_lock section in timeout assignment · 1ac0bf99
      Pablo Neira Ayuso 提交于
      Fix a dereference to pointer without rcu_read_lock held.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      1ac0bf99
    • P
      netfilter: cttimeout: fix dependency with l4protocol conntrack module · c1ebd7df
      Pablo Neira Ayuso 提交于
      This patch introduces nf_conntrack_l4proto_find_get() and
      nf_conntrack_l4proto_put() to fix module dependencies between
      timeout objects and l4-protocol conntrack modules.
      
      Thus, we make sure that the module cannot be removed if it is
      used by any of the cttimeout objects.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      c1ebd7df
    • S
      xfrm: Access the replay notify functions via the registered callbacks · 1265fd61
      Steffen Klassert 提交于
      We call the wrong replay notify function when we use ESN replay
      handling. This leads to the fact that we don't send notifications
      if we use ESN. Fix this by calling the registered callbacks instead
      of xfrm_replay_notify().
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1265fd61
    • S
      xfrm: Remove unused xfrm_state from xfrm_state_check_space · 26b2072e
      Steffen Klassert 提交于
      The xfrm_state argument is unused in this function, so remove it.
      Also the name xfrm_state_check_space does not really match what this
      function does. It actually checks if we have enough head and tailroom
      on the skb. So we rename the function to xfrm_skb_check_space.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26b2072e