1. 02 8月, 2013 1 次提交
  2. 11 7月, 2013 2 次提交
  3. 09 7月, 2013 1 次提交
  4. 26 6月, 2013 1 次提交
    • E
      net: poll/select low latency socket support · 2d48d67f
      Eliezer Tamir 提交于
      select/poll busy-poll support.
      
      Split sysctl value into two separate ones, one for read and one for poll.
      updated Documentation/sysctl/net.txt
      
      Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
      sk_poll_ll if possible. sock_poll sets this flag in its return value
      to indicate to select/poll when a socket that can busy poll is found.
      
      When poll/select have nothing to report, call the low-level
      sock_poll again until we are out of time or we find something.
      
      Once the system call finds something, it stops setting POLL_LL, so it can
      return the result to the user ASAP.
      Signed-off-by: NEliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d48d67f
  5. 18 6月, 2013 2 次提交
  6. 11 6月, 2013 1 次提交
  7. 07 6月, 2013 1 次提交
  8. 29 5月, 2013 1 次提交
    • A
      net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg · 1be374a0
      Andy Lutomirski 提交于
      To: linux-kernel@vger.kernel.org
      Cc: x86@kernel.org, trinity@vger.kernel.org, Andy Lutomirski <luto@amacapital.net>, netdev@vger.kernel.org, "David S.
      	Miller" <davem@davemloft.net>
      Subject: [PATCH 5/5] net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg
      
      MSG_CMSG_COMPAT is (AFAIK) not intended to be part of the API --
      it's a hack that steals a bit to indicate to other networking code
      that a compat entry was used.  So don't allow it from a non-compat
      syscall.
      
      This prevents an oops when running this code:
      
      int main()
      {
      	int s;
      	struct sockaddr_in addr;
      	struct msghdr *hdr;
      
      	char *highpage = mmap((void*)(TASK_SIZE_MAX - 4096), 4096,
      	                      PROT_READ | PROT_WRITE,
      	                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
      	if (highpage == MAP_FAILED)
      		err(1, "mmap");
      
      	s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
      	if (s == -1)
      		err(1, "socket");
      
              addr.sin_family = AF_INET;
              addr.sin_port = htons(1);
              addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
      	if (connect(s, (struct sockaddr*)&addr, sizeof(addr)) != 0)
      		err(1, "connect");
      
      	void *evil = highpage + 4096 - COMPAT_MSGHDR_SIZE;
      	printf("Evil address is %p\n", evil);
      
      	if (syscall(__NR_sendmmsg, s, evil, 1, MSG_CMSG_COMPAT) < 0)
      		err(1, "sendmmsg");
      
      	return 0;
      }
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1be374a0
  9. 23 5月, 2013 1 次提交
  10. 30 4月, 2013 1 次提交
  11. 20 4月, 2013 1 次提交
    • D
      net: socket: move ktime2ts to ktime header api · 6e94d1ef
      Daniel Borkmann 提交于
      Currently, ktime2ts is a small helper function that is only used in
      net/socket.c. Move this helper into the ktime API as a small inline
      function, so that i) it's maintained together with ktime routines,
      and ii) also other files can make use of it. The function is named
      ktime_to_timespec_cond() and placed into the generic part of ktime,
      since we internally make use of ktime_to_timespec(). ktime_to_timespec()
      itself does not check the ktime variable for zero, hence, we name
      this function ktime_to_timespec_cond() for only a conditional
      conversion, and adapt its users to it.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e94d1ef
  12. 15 4月, 2013 1 次提交
  13. 11 4月, 2013 1 次提交
  14. 26 2月, 2013 1 次提交
  15. 23 2月, 2013 1 次提交
    • A
      fs: Preserve error code in get_empty_filp(), part 2 · 39b65252
      Anatol Pomozov 提交于
      Allocating a file structure in function get_empty_filp() might fail because
      of several reasons:
       - not enough memory for file structures
       - operation is not allowed
       - user is over its limit
      
      Currently the function returns NULL in all cases and we loose the exact
      reason of the error. All callers of get_empty_filp() assume that the function
      can fail with ENFILE only.
      
      Return error through pointer. Change all callers to preserve this error code.
      
      [AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
      (things remaining here deal with alloc_file()), removed pipe(2) behaviour change]
      Signed-off-by: NAnatol Pomozov <anatol.pomozov@gmail.com>
      Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      39b65252
  16. 12 2月, 2013 1 次提交
  17. 01 2月, 2013 1 次提交
    • P
      wanrouter: completely decouple obsolete code from kernel. · a786a7c0
      Paul Gortmaker 提交于
      The original suggestion to delete wanrouter started earlier
      with the mainline commit f0d1b3c2
      ("net/wanrouter: Deprecate and schedule for removal") in May 2012.
      
      More importantly, Dan Carpenter found[1] that the driver had a
      fundamental breakage introduced back in 2008, with commit
      7be6065b ("netdevice wanrouter: Convert directly reference of
      netdev->priv").  So we know with certainty that the code hasn't been
      used by anyone willing to at least take the effort to send an e-mail
      report of breakage for at least 4 years.
      
      This commit does a decouple of the wanrouter subsystem, by going
      after the Makefile/Kconfig and similar files, so that these mainline
      files that we are keeping do not have the big wanrouter file/driver
      deletion commit tied into their history.
      
      Once this commit is in place, we then can remove the obsolete cyclomx
      drivers and similar that have a dependency on CONFIG_WAN_ROUTER_DRIVERS.
      
      [1] http://www.spinics.net/lists/netdev/msg218670.htmlOriginally-by: NJoe Perches <joe@perches.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      a786a7c0
  18. 26 10月, 2012 2 次提交
    • D
      cgroup: net_cls: Rework update socket logic · 6a328d8c
      Daniel Wagner 提交于
      The cgroup logic part of net_cls is very similar as the one in
      net_prio. Let's stream line the net_cls logic with the net_prio one.
      
      The net_prio update logic was changed by following commit (note there
      were some changes necessary later on)
      
      commit 406a3c63
      Author: John Fastabend <john.r.fastabend@intel.com>
      Date:   Fri Jul 20 10:39:25 2012 +0000
      
          net: netprio_cgroup: rework update socket logic
      
          Instead of updating the sk_cgrp_prioidx struct field on every send
          this only updates the field when a task is moved via cgroup
          infrastructure.
      
          This allows sockets that may be used by a kernel worker thread
          to be managed. For example in the iscsi case today a user can
          put iscsid in a netprio cgroup and control traffic will be sent
          with the correct sk_cgrp_prioidx value set but as soon as data
          is sent the kernel worker thread isssues a send and sk_cgrp_prioidx
          is updated with the kernel worker threads value which is the
          default case.
      
          It seems more correct to only update the field when the user
          explicitly sets it via control group infrastructure. This allows
          the users to manage sockets that may be used with other threads.
      
      Since classid is now updated when the task is moved between the
      cgroups, we don't have to call sock_update_classid() from various
      places to ensure we always using the latest classid value.
      
      [v2: Use iterate_fd() instead of open coding]
      Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc:  Li Zefan <lizefan@huawei.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <netdev@vger.kernel.org>
      Cc: <cgroups@vger.kernel.org>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a328d8c
    • D
      cgroup: net_cls: Pass in task to sock_update_classid() · fd9a08a7
      Daniel Wagner 提交于
      sock_update_classid() assumes that the update operation always are
      applied on the current task. sock_update_classid() needs to know on
      which tasks to work on in order to be able to migrate task between
      cgroups using the struct cgroup_subsys attach() callback.
      Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <netdev@vger.kernel.org>
      Cc: <cgroups@vger.kernel.org>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd9a08a7
  19. 28 9月, 2012 1 次提交
  20. 27 9月, 2012 2 次提交
  21. 06 9月, 2012 1 次提交
    • M
      Fix order of arguments to compat_put_time[spec|val] · ed6fe9d6
      Mikulas Patocka 提交于
      Commit 644595f8 ("compat: Handle COMPAT_USE_64BIT_TIME in
      net/socket.c") introduced a bug where the helper functions to take
      either a 64-bit or compat time[spec|val] got the arguments in the wrong
      order, passing the kernel stack pointer off as a user pointer (and vice
      versa).
      
      Because of the user address range check, that in turn then causes an
      EFAULT due to the user pointer range checking failing for the kernel
      address.  Incorrectly resuling in a failed system call for 32-bit
      processes with a 64-bit kernel.
      
      On odder architectures like HP-PA (with separate user/kernel address
      spaces), it can be used read kernel memory.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed6fe9d6
  22. 05 9月, 2012 1 次提交
    • M
      net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries · 600e1779
      Masatake YAMATO 提交于
      lsof reports some of socket descriptors as "can't identify protocol" like:
      
          [yamato@localhost]/tmp% sudo lsof | grep dbus | grep iden
          dbus-daem   652          dbus    6u     sock ... 17812 can't identify protocol
          dbus-daem   652          dbus   34u     sock ... 24689 can't identify protocol
          dbus-daem   652          dbus   42u     sock ... 24739 can't identify protocol
          dbus-daem   652          dbus   48u     sock ... 22329 can't identify protocol
          ...
      
      lsof cannot resolve the protocol used in a socket because procfs
      doesn't provide the map between inode number on sockfs and protocol
      type of the socket.
      
      For improving the situation this patch adds an extended attribute named
      'system.sockprotoname' in which the protocol name for
      /proc/PID/fd/SOCKET is stored. So lsof can know the protocol for a
      given /proc/PID/fd/SOCKET with getxattr system call.
      
      A few weeks ago I submitted a patch for the same purpose. The patch
      was introduced /proc/net/sockfs which enumerates inodes and protocols
      of all sockets alive on a system. However, it was rejected because (1)
      a global lock was needed, and (2) the layout of struct socket was
      changed with the patch.
      
      This patch doesn't use any global lock; and doesn't change the layout
      of any structs.
      
      In this patch, a protocol name is stored to dentry->d_name of sockfs
      when new socket is associated with a file descriptor. Before this
      patch dentry->d_name was not used; it was just filled with empty
      string. lsof may use an extended attribute named
      'system.sockprotoname' to retrieve the value of dentry->d_name.
      
      It is nice if we can see the protocol name with ls -l
      /proc/PID/fd. However, "socket:[#INODE]", the name format returned
      from sockfs_dname() was already defined. To keep the compatibility
      between kernel and user land, the extended attribute is used to
      prepare the value of dentry->d_name.
      Signed-off-by: NMasatake YAMATO <yamato@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      600e1779
  23. 16 8月, 2012 1 次提交
  24. 23 7月, 2012 1 次提交
    • J
      net: netprio_cgroup: rework update socket logic · 406a3c63
      John Fastabend 提交于
      Instead of updating the sk_cgrp_prioidx struct field on every send
      this only updates the field when a task is moved via cgroup
      infrastructure.
      
      This allows sockets that may be used by a kernel worker thread
      to be managed. For example in the iscsi case today a user can
      put iscsid in a netprio cgroup and control traffic will be sent
      with the correct sk_cgrp_prioidx value set but as soon as data
      is sent the kernel worker thread isssues a send and sk_cgrp_prioidx
      is updated with the kernel worker threads value which is the
      default case.
      
      It seems more correct to only update the field when the user
      explicitly sets it via control group infrastructure. This allows
      the users to manage sockets that may be used with other threads.
      Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      406a3c63
  25. 21 7月, 2012 1 次提交
    • M
      tun: fix a crash bug and a memory leak · b09e786b
      Mikulas Patocka 提交于
      This patch fixes a crash
      tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel ->
      sock_release -> iput(SOCK_INODE(sock))
      introduced by commit 1ab5ecb9
      
      The problem is that this socket is embedded in struct tun_struct, it has
      no inode, iput is called on invalid inode, which modifies invalid memory
      and optionally causes a crash.
      
      sock_release also decrements sockets_in_use, this causes a bug that
      "sockets: used" field in /proc/*/net/sockstat keeps on decreasing when
      creating and closing tun devices.
      
      This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs
      sock_release to not free the inode and not decrement sockets_in_use,
      fixing both memory corruption and sockets_in_use underflow.
      
      It should be backported to 3.3 an 3.4 stabke.
      Signed-off-by: NMikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: stable@kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b09e786b
  26. 16 5月, 2012 1 次提交
  27. 15 5月, 2012 1 次提交
  28. 22 4月, 2012 1 次提交
  29. 21 4月, 2012 1 次提交
  30. 16 4月, 2012 1 次提交
  31. 06 4月, 2012 1 次提交
    • E
      tcp: tcp_sendpages() should call tcp_push() once · 35f9c09f
      Eric Dumazet 提交于
      commit 2f533844 (tcp: allow splice() to build full TSO packets) added
      a regression for splice() calls using SPLICE_F_MORE.
      
      We need to call tcp_flush() at the end of the last page processed in
      tcp_sendpages(), or else transmits can be deferred and future sends
      stall.
      
      Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
      with different semantic.
      
      For all sendpage() providers, its a transparent change. Only
      sock_sendpage() and tcp_sendpages() can differentiate the two different
      flags provided by pipe_to_sendpage()
      Reported-by: NTom Herbert <therbert@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: H.K. Jerry Chu <hkchu@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail&gt;com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35f9c09f
  32. 12 3月, 2012 1 次提交
  33. 21 2月, 2012 1 次提交
  34. 13 1月, 2012 1 次提交
  35. 06 1月, 2012 1 次提交
    • L
      vfs: fix up ENOIOCTLCMD error handling · 07d106d0
      Linus Torvalds 提交于
      We're doing some odd things there, which already messes up various users
      (see the net/socket.c code that this removes), and it was going to add
      yet more crud to the block layer because of the incorrect error code
      translation.
      
      ENOIOCTLCMD is not an error return that should be returned to user mode
      from the "ioctl()" system call, but it should *not* be translated as
      EINVAL ("Invalid argument").  It should be translated as ENOTTY
      ("Inappropriate ioctl for device").
      
      That EINVAL confusion has apparently so permeated some code that the
      block layer actually checks for it, which is sad.  We continue to do so
      for now, but add a big comment about how wrong that is, and we should
      remove it entirely eventually.  In the meantime, this tries to keep the
      changes localized to just the EINVAL -> ENOTTY fix, and removing code
      that makes it harder to do the right thing.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      07d106d0
  36. 05 1月, 2012 1 次提交