1. 23 Jan, 2008 1 commit
  2. 21 Jan, 2008 2 commits
  3. 10 Jan, 2008 1 commit
    • [NEIGH]: Fix race between neigh_parms_release and neightbl_fill_parms · 9cd40029
      Committed by Pavel Emelyanov
      The neightbl_fill_parms() is called under the write-locked tbl->lock
      and accesses the parms->dev. The neigh_parms_release() calls
      dev_put(parms->dev) without this lock. This creates a tiny race window
      in which the parms contains a potentially stale dev pointer.
      
      To fix this race it's enough to move the dev_put() up under the
      tbl->lock, but note that the parms are held by neighbors and thus can
      live after neigh_parms_release() is called, so we can still have a
      parms with a bad dev pointer.
      
      I didn't find where the neigh->parms->dev is accessed, but I still think
      that the dev should be put in the place where the parms are
      really freed. Am I right with that?
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 09 Jan, 2008 2 commits
    • [NET]: Clone the sk_buff 'iif' field in __skb_clone() · 02f1c89d
      Committed by Paul Moore
      Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
      on the 'iif' field to determine the receiving network interface of
      inbound packets.  Unfortunately, at present this field is not
      preserved across a skb clone operation which can lead to garbage
      values if the cloned skb is sent back through the network stack.  This
      patch corrects this problem by properly copying the 'iif' field in
      __skb_clone() and removing the 'iif' field assignment from
      skb_act_clone() since it is no longer needed.
      
      Also, while we are here, put the assignments in the same order as the
      offsets to reduce cacheline bounces.
      Signed-off-by: Paul Moore <paul.moore@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Stop polling when napi_disable() is pending. · fed17f30
      Committed by David S. Miller
      This finally adds the code in net_rx_action() to break out of the
      ->poll()'ing loop when a napi_disable() is found to be pending.
      
      Now, even if a device is being flooded with packets it can be cleanly
      brought down.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 21 Dec, 2007 2 commits
    • [NET]: Fix function put_cmsg() which may cause usr application memory overflow · 1ac70e7a
      Committed by Wei Yongjun
      When put_cmsg() is used to copy kernel information to user
      application memory and the memory length given by the user application
      is too small, the faulty calculation of msg.msg_controllen may leave it
      holding a large value, such as 0xFFFFFFF0. Subsequent put_cmsg() calls
      can then write data to user application memory even though the user has
      no valid memory to store it. This may overflow user application memory.
      
      int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
      {
          struct cmsghdr __user *cm
              = (__force struct cmsghdr __user *)msg->msg_control;
          struct cmsghdr cmhdr;
          int cmlen = CMSG_LEN(len);
          ~~~~~~~~~~~~~~~~~~~~~
          int err;
      
          if (MSG_CMSG_COMPAT & msg->msg_flags)
              return put_cmsg_compat(msg, level, type, len, data);
      
          if (cm==NULL || msg->msg_controllen < sizeof(*cm)) {
              msg->msg_flags |= MSG_CTRUNC;
              return 0; /* XXX: return error? check spec. */
          }
          if (msg->msg_controllen < cmlen) {
          ~~~~~~~~~~~~~~~~~~~~~~~~
              msg->msg_flags |= MSG_CTRUNC;
              cmlen = msg->msg_controllen;
          }
          cmhdr.cmsg_level = level;
          cmhdr.cmsg_type = type;
          cmhdr.cmsg_len = cmlen;
      
          err = -EFAULT;
          if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
              goto out;
          if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr)))
              goto out;
          cmlen = CMSG_SPACE(len);
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~
          If the MSG_CTRUNC flag is set, msg->msg_controllen is less than 
      CMSG_SPACE(len), so "msg->msg_controllen -= cmlen" will cause the unsigned int 
      type msg->msg_controllen to wrap to a large value.
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~
          msg->msg_control += cmlen;
          msg->msg_controllen -= cmlen;
          ~~~~~~~~~~~~~~~~~~~~~
          err = 0;
      out:
          return err;
      }
      
      The same problem exists in put_cmsg_compat(). This patch fixes both.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
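The unsigned wraparound described in the put_cmsg() commit above can be reproduced in isolation. The following is a minimal user-space sketch, not the kernel patch itself: `struct ctl_buf`, `advance_unchecked()` and `advance_clamped()` are hypothetical names, and clamping the advance to the remaining length is only one plausible way to express the guard.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two msghdr fields involved. */
struct ctl_buf {
    char  *control;     /* next free byte of the control buffer */
    size_t controllen;  /* bytes still available; an unsigned type */
};

/* Old pattern: subtract the padded length unconditionally. */
static void advance_unchecked(struct ctl_buf *b, size_t space)
{
    b->control    += space;
    b->controllen -= space;   /* wraps around when space > controllen */
}

/* Guarded pattern: never consume more than what is left. */
static void advance_clamped(struct ctl_buf *b, size_t space)
{
    if (space > b->controllen)
        space = b->controllen;
    b->control    += space;
    b->controllen -= space;
}
```

With 8 bytes left and a padded message size of 16, the unchecked version leaves `controllen` at a huge value, so a later call would believe there is room to write; the clamped version leaves it at 0.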
    • [NET] net/core/: Spelling fixes · 53ccaae1
      Committed by Joe Perches
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 11 Dec, 2007 1 commit
  7. 26 Nov, 2007 1 commit
    • [SKBUFF]: Free old skb properly in skb_morph · 2d4baff8
      Committed by Herbert Xu
      The skb_morph function only freed the data part of the dst skb, but leaked
      the auxiliary data such as the netfilter fields.  This patch fixes this by
      moving the relevant parts from __kfree_skb to skb_release_all and calling
      it in skb_morph.
      
      It also makes kfree_skbmem static since it's no longer called anywhere else
      and it now no longer does skb_release_data.
      
      Thanks to Yasuyuki KOZAKAI for finding this problem and posting a patch for
      it.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  8. 20 Nov, 2007 1 commit
  9. 15 Nov, 2007 2 commits
    • [INET]: Fix potential kfree on vmalloc-ed area of request_sock_queue · dab6ba36
      Committed by Pavel Emelyanov
      The request_sock_queue's listen_opt is either vmalloc-ed or
      kmalloc-ed depending on the number of table entries. Thus it 
      is expected to be handled properly on free, which is done in 
      the reqsk_queue_destroy().
      
      However the error path in inet_csk_listen_start() calls 
      the lite version of reqsk_queue_destroy, called 
      __reqsk_queue_destroy, which calls the kfree unconditionally. 
      
      Fix this and move the __reqsk_queue_destroy into a .c file as 
      it looks too big to be inline.
      
      As David also noticed, this is an error recovery path only,
      so no locking is required and the lopt is known to be not NULL.
      
      reqsk_queue_yank_listen_sk is also now only used in
      net/core/request_sock.c so we should move it there too.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
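The rule behind the request_sock_queue fix above is that the free path has to make the same size-based choice as the allocation path. Here is a user-space sketch of that pairing, under stated assumptions: `small_alloc()`/`big_alloc()` merely stand in for kmalloc/vmalloc (both are plain `malloc()` here), `PAGE_SZ` is an assumed threshold, and the counters exist only to make the chosen path observable.

```c
#include <stdlib.h>

#define PAGE_SZ 4096u   /* assumed threshold, standing in for the page size */

static int small_frees, big_frees;   /* instrumentation for this sketch only */

/* Stand-ins for two allocators whose free routines are not interchangeable. */
static void *small_alloc(size_t n) { return malloc(n); }
static void *big_alloc(size_t n)   { return malloc(n); }
static void  small_free(void *p)   { small_frees++; free(p); }
static void  big_free(void *p)     { big_frees++;   free(p); }

/* Allocation picks the allocator by size ... */
static void *table_alloc(size_t n)
{
    return n > PAGE_SZ ? big_alloc(n) : small_alloc(n);
}

/* ... so every free path, including error paths, must apply the same test. */
static void table_free(void *p, size_t n)
{
    if (n > PAGE_SZ)
        big_free(p);
    else
        small_free(p);
}
```

Routing all callers through one `table_free()`-style helper avoids a "lite" destroy variant that frees unconditionally with the wrong routine.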
    • [NET]: Remove notifier block from chain when register_netdevice_notifier fails · c67625a1
      Committed by Pavel Emelyanov
      Commit fcc5a03a:
      
      	[NET]: Allow netdev REGISTER/CHANGENAME events to fail
      
      makes the register_netdevice_notifier() handle the error from the
      NETDEV_REGISTER event, sent to the registering block.
      
      The bad news is that in this case the notifier block is 
      not removed from the list, but the error is returned to the 
      caller. In case the caller is in module init function and 
      handles this error this can abort the module loading. The
      notifier block will be then removed from the kernel, but 
      will be left in the list. Oops :(
      
      I think that the notifier block should be removed from the
      chain in case of error, regardless whether this error is 
      handled by the caller or not. In the worst case (the error 
      is _not_ handled) module will not receive the events any 
      longer.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
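The rollback argued for in the notifier commit above can be sketched with a toy singly linked chain. This is an illustration under assumptions, not the kernel's notifier API: `struct notifier`, `notifier_register()` and the two callbacks are hypothetical, and devices are reduced to integer indices.

```c
#include <stddef.h>

struct notifier {
    int (*call)(struct notifier *nb, int dev_index);
    struct notifier *next;
};

static struct notifier *chain;   /* head of the notifier list */

static void chain_add(struct notifier *nb)
{
    nb->next = chain;
    chain = nb;
}

static void chain_del(struct notifier *nb)
{
    struct notifier **pp;

    for (pp = &chain; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == nb) {
            *pp = nb->next;
            nb->next = NULL;
            return;
        }
    }
}

/* Add nb, then replay a "registered" event for each existing device. */
static int notifier_register(struct notifier *nb, int ndevs)
{
    int i, err;

    chain_add(nb);
    for (i = 0; i < ndevs; i++) {
        err = nb->call(nb, i);
        if (err) {
            chain_del(nb);   /* roll back so no stale entry stays linked */
            return err;
        }
    }
    return 0;
}

/* Hypothetical callbacks: one accepts everything, one rejects device 1. */
static int accept_all(struct notifier *nb, int dev_index)
{
    (void)nb; (void)dev_index;
    return 0;
}

static int reject_dev1(struct notifier *nb, int dev_index)
{
    (void)nb;
    return dev_index == 1 ? -1 : 0;
}
```

Without the `chain_del()` call, a caller that reacts to the error by freeing the block (for example a module whose init fails) would leave the list pointing at freed memory.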
  10. 13 Nov, 2007 3 commits
  11. 11 Nov, 2007 3 commits
  12. 07 Nov, 2007 5 commits
    • [NET]: Clean proto_(un)register from in-code ifdefs · b733c007
      Committed by Pavel Emelyanov
      The struct proto has the per-cpu "inuse" counter, which is handled
      with special care. All the handling code hides under ifdef
      CONFIG_SMP, which introduces some code duplication and makes the
      code look worse than it could.
      
      Clean this.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETNS]: Fix compiler error in net_namespace.c · 45a19b0a
      Committed by Johann Felix Soden
      Because net_free is called by copy_net_ns before its declaration, the
      compiler gives an error. This patch puts net_free before copy_net_ns
      to fix this.
      
      The compiler error:
      net/core/net_namespace.c: In function 'copy_net_ns':
      net/core/net_namespace.c:97: error: implicit declaration of function 'net_free'
      net/core/net_namespace.c: At top level:
      net/core/net_namespace.c:104: warning: conflicting types for 'net_free'
      net/core/net_namespace.c:104: error: static declaration of 'net_free' follows non-static declaration
      net/core/net_namespace.c:97: error: previous implicit declaration of 'net_free' was here
      
      The error was introduced by the '[NET]: Hide the dead code in the
      net_namespace.c' patch (6a1a3b9f).
      Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
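The ordering issue in the net_namespace.c commit above comes down to a static function being called before the compiler has seen any declaration of it. A miniature sketch of the fixed layout follows; `struct ns`, `ns_free()` and `ns_copy()` are hypothetical names and do not mirror the real net_free()/copy_net_ns() bodies.

```c
/* Hypothetical miniature of a namespace object. */
struct ns {
    int refs;
};

/*
 * Defined before its first caller, so the compiler already knows the
 * static declaration when it compiles ns_copy(). With the two functions
 * in the opposite order and no forward declaration, the call would be an
 * implicit declaration that later conflicts with the static definition.
 */
static void ns_free(struct ns *n)
{
    n->refs = 0;
}

static int ns_copy(struct ns *n, int fail)
{
    if (fail) {
        ns_free(n);   /* error path releases the namespace */
        return -1;
    }
    n->refs++;
    return 0;
}
```

A forward declaration (`static void ns_free(struct ns *n);`) above the caller would work equally well; moving the definition simply avoids the extra line.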
    • [NET]: Removing duplicit #includes · 40208d71
      Committed by Jiri Olsa
      Removing duplicit #includes for net/
      Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA way. · 286ab3d4
      Committed by Eric Dumazet
      "struct proto" currently uses an array stats[NR_CPUS] to track change on
      'inuse' sockets per protocol.
      
      If NR_CPUS is big, this means we use a big memory area for this.
      Moreover, all this memory area is located on a single node on NUMA
      machines, increasing memory pressure on the boot node.
      
      In this patch, I tried to:
      
      - Keep a fast !CONFIG_SMP implementation
      - Keep a fast CONFIG_SMP implementation for often used protocols
        (tcp, udp, raw, ...)
      - Introduce a NUMA efficient implementation
      
      Some helper macros are defined in include/net/sock.h.
      These macros take CONFIG_SMP into account.
      
      If a "struct proto" is declared without using the DEFINE_PROTO_INUSE /
      REF_PROTO_INUSE macros, it will automatically use a default
      implementation based on a dynamically allocated percpu zone. This
      default implementation will be NUMA efficient, but might use 32/64
      bytes per possible cpu because of the current alloc_percpu()
      implementation. However, it should still be better than the previous
      implementation based on the stats[NR_CPUS] field.
      
      When a "struct proto" is changed to use the new macros, we use a single
      static "int" percpu variable, lowering the memory and cpu costs while
      still preserving NUMA efficiency.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
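The 'inuse' accounting described above follows the usual per-CPU counter idea: each CPU only touches its own slot, and a reader sums all slots. A simplified user-space sketch follows; `NCPUS`, `struct inuse_counter` and both helpers are hypothetical, and the real macros in include/net/sock.h are not reproduced here.

```c
#define NCPUS 4   /* assumed number of possible CPUs for this sketch */

/* One delta per CPU; the kernel would place these in per-cpu storage. */
struct inuse_counter {
    int delta[NCPUS];
};

/* Fast path: the running CPU updates only its own slot, without locking. */
static void inuse_add(struct inuse_counter *c, int cpu, int val)
{
    c->delta[cpu] += val;
}

/* Slow path: the total is the sum over all CPUs. */
static int inuse_total(const struct inuse_counter *c)
{
    int cpu, sum = 0;

    for (cpu = 0; cpu < NCPUS; cpu++)
        sum += c->delta[cpu];
    return sum;
}
```

A single slot can go negative, for instance when a socket is created on one CPU and released on another; only the sum is meaningful.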
    • [NET]: Remove /proc/net/stat/*_arp_cache upon module removal · 3f192b5c
      Committed by Alexey Dobriyan
      neigh_table_init_no_netlink() creates them, but they aren't removed anywhere.
      
      Steps to reproduce:
      
      	modprobe clip
      	rmmod clip
      	cat /proc/net/stat/clip_arp_cache
      
      BUG: unable to handle kernel paging request at virtual address f89d7758
      printing eip: c05a99da *pdpt = 0000000000004001 *pde = 0000000004408067 *pte = 0000000000000000
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: atm af_packet ipv6 binfmt_misc sbs sbshc fan dock battery backlight ac power_supply parport loop rtc_cmos rtc_core rtc_lib serio_raw button k8temp hwmon amd_rng sr_mod cdrom shpchp pci_hotplug ehci_hcd ohci_hcd uhci_hcd usbcore
      Pid: 2082, comm: cat Not tainted (2.6.24-rc1-b1d08ac0-bloat #4)
      EIP: 0060:[<c05a99da>] EFLAGS: 00210256 CPU: 0
      EIP is at neigh_stat_seq_next+0x26/0x3f
      EAX: 00000001 EBX: f89d7600 ECX: c587bf40 EDX: 00000000
      ESI: 00000000 EDI: 00000001 EBP: 00000400 ESP: c587bf1c
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process cat (pid: 2082, ti=c587b000 task=c5984e10 task.ti=c587b000)
      Stack: c06228cc c5313790 c049e5c0 0804f000 c45a7b00 c53137b0 00000000 00000000
             00000082 00000001 00000000 00000000 00000000 fffffffb c58d6780 c049e437
             c45a7b00 c04b1f93 c587bfa0 00000400 0804f000 00000400 0804f000 c04b1f2f
      Call Trace:
       [<c049e5c0>] seq_read+0x189/0x281
       [<c049e437>] seq_read+0x0/0x281
       [<c04b1f93>] proc_reg_read+0x64/0x77
       [<c04b1f2f>] proc_reg_read+0x0/0x77
       [<c048907e>] vfs_read+0x80/0xd1
       [<c0489491>] sys_read+0x41/0x67
       [<c04080fa>] sysenter_past_esp+0x6b/0xc1
       =======================
      Code: e9 ec 8d 05 00 56 8b 11 53 8b 40 70 8b 58 3c eb 29 0f a3 15 80 91 7b c0 19 c0 85 c0 8d 42 01 74 17 89 c6 c1 fe 1f 89 01 89 71 04 <8b> 83 58 01 00 00 f7 d0 8b 04 90 eb 09 89 c2 83 fa 01 7e d2 31
      EIP: [<c05a99da>] neigh_stat_seq_next+0x26/0x3f SS:ESP 0068:c587bf1c
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 02 Nov, 2007 1 commit
    • [SG] Get rid of __sg_mark_end() · c46f2334
      Committed by Jens Axboe
      sg_mark_end() overwrites the page_link information, but all users want
      __sg_mark_end() behaviour where we just set the end bit. That is the most
      natural way to use the sg list, since you'll fill it in and then mark the
      end point.
      
      So change sg_mark_end() to only set the termination bit. Add a sg_magic
      debug check as well, and clear a chain pointer if it is set.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
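The difference between overwriting page_link and only setting the end bit, as discussed in the sg_mark_end() commit above, can be shown with a tagged pointer. This is a sketch under assumptions: `struct sg_entry` and the helpers are hypothetical, and the bit values (0x1 for chain, 0x2 for end) are chosen for illustration and rely on the stored pointer being at least 4-byte aligned.

```c
#include <stdint.h>

#define SG_CHAIN_BIT ((uintptr_t)0x1)   /* assumed: entry links to another list */
#define SG_END_BIT   ((uintptr_t)0x2)   /* assumed: entry terminates the list */
#define SG_FLAG_MASK (SG_CHAIN_BIT | SG_END_BIT)

/* Pointer and flags share one word; the low two bits hold the flags. */
struct sg_entry {
    uintptr_t page_link;
};

/* Clobbering variant: marks the end but loses the stored pointer. */
static void mark_end_clobber(struct sg_entry *e)
{
    e->page_link = SG_END_BIT;
}

/* Preserving variant: set the end bit, clear a stale chain bit. */
static void mark_end_keep(struct sg_entry *e)
{
    e->page_link |= SG_END_BIT;
    e->page_link &= ~SG_CHAIN_BIT;
}

/* Recover the pointer by masking off the flag bits. */
static void *entry_page(const struct sg_entry *e)
{
    return (void *)(e->page_link & ~SG_FLAG_MASK);
}
```

Filling in an entry and then calling the preserving variant keeps the pointer intact, which matches the "fill it in and then mark the end point" usage the commit message describes.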
  14. 01 Nov, 2007 13 commits
  15. 31 Oct, 2007 2 commits
    • [NET]: Fix incorrect sg_mark_end() calls. · 51c739d1
      Committed by David S. Miller
      This fixes scatterlist corruptions added by
      
      	commit 68e3f5dd
      	[CRYPTO] users: Fix up scatterlist conversion errors
      
      The issue is that the code calls sg_mark_end() which clobbers the
      sg_page() pointer of the final scatterlist entry.
      
      The first part of the fix makes skb_to_sgvec() do __sg_mark_end().
      
      After considering all skb_to_sgvec() call sites the most correct
      solution is to call __sg_mark_end() in skb_to_sgvec() since that is
      what all of the callers would end up doing anyways.
      
      I suspect this might have fixed some problems in virtio_net which is
      the sole non-crypto user of skb_to_sgvec().
      
      Other similar sg_mark_end() cases were converted over to
      __sg_mark_end() as well.
      
      Arguably sg_mark_end() is a poorly named function because it doesn't
      just "mark", it clears out the page pointer as a side effect, which is
      what led to these bugs in the first place.
      
      The one remaining plain sg_mark_end() call is in scsi_alloc_sgtable()
      and arguably it could be converted to __sg_mark_end() if only so that
      we can delete this confusing interface from linux/scatterlist.h
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETNS]: fix net released by rcu callback · 310928d9
      Committed by Daniel Lezcano
      When a network namespace reference is held by a network subsystem,
      and when this reference is decremented in a rcu update callback, we
      must ensure that there is no more outstanding rcu update before
      trying to free the network namespace.
      
      In the normal case, the rcu_barrier is called when the network namespace
      is exiting in the cleanup_net function.
      
      But when a network namespace creation fails, and the subsystems are
      undone (like the cleanup), the rcu_barrier is missing.
      
      This patch adds the missing rcu_barrier.
      Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>