提交 · 08e6d76821557da7c443dc56ea430c7c92b36a6f · openanolis / cloud-kernel

15 1月, 2020 1 次提交

alinux: fs,ext4: remove projid limit when create hard link · 08e6d768

由 zhangliguang 提交于 12月 27, 2018

This is a temporary workaround plan to avoid the limitation when
creating hard link cross two projids.
Signed-off-by: Nzhangliguang <zhangliguang@linux.alibaba.com>
Reviewed-by: NLiu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>

08e6d768

24 8月, 2018 1 次提交

namei: allow restricted O_CREAT of FIFOs and regular files · 30aba665

由 Salvatore Mesoraca 提交于 8月 23, 2018

Disallows open of FIFOs or regular files not owned by the user in world
writable sticky directories, unless the owner is the same as that of the
directory or the file is opened without the O_CREAT flag.  The purpose
is to make data spoofing attacks harder.  This protection can be turned
on and off separately for FIFOs and regular files via sysctl, just like
the symlinks/hardlinks protection.  This patch is based on Openwall's
"HARDEN_FIFO" feature by Solar Designer.

This is a brief list of old vulnerabilities that could have been prevented
by this feature, some of them even allow for privilege escalation:

CVE-2000-1134
CVE-2007-3852
CVE-2008-0525
CVE-2009-0416
CVE-2011-4834
CVE-2015-1838
CVE-2015-7442
CVE-2016-7489

This list is not meant to be complete.  It's difficult to track down all
vulnerabilities of this kind because they were often reported without any
mention of this particular attack vector.  In fact, before
hardlinks/symlinks restrictions, fifos/regular files weren't the favorite
vehicle to exploit them.

[s.mesoraca16@gmail.com: fix bug reported by Dan Carpenter]
  Link: https://lkml.kernel.org/r/20180426081456.GA7060@mwanda
  Link: http://lkml.kernel.org/r/1524829819-11275-1-git-send-email-s.mesoraca16@gmail.com
[keescook@chromium.org: drop pr_warn_ratelimited() in favor of audit changes in the future]
[keescook@chromium.org: adjust commit subjet]
Link: http://lkml.kernel.org/r/20180416175918.GA13494@beastSigned-off-by: NSalvatore Mesoraca <s.mesoraca16@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Suggested-by: NSolar Designer <solar@openwall.com>
Suggested-by: NKees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

30aba665

04 10月, 2017 1 次提交

docs: Update binfmt_misc links · 852f1a21

由 Tom Saeger 提交于 10月 03, 2017

Documentation/binfmt_misc.txt moved to
Documentation/admin-guide/binfmt-misc.rst
Signed-off-by: NTom Saeger <tom.saeger@oracle.com>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

852f1a21

01 10月, 2016 1 次提交

mnt: Add a per mount namespace limit on the number of mounts · d2921684

由 Eric W. Biederman 提交于 9月 28, 2016

CAI Qian <caiqian@redhat.com> pointed out that the semantics
of shared subtrees make it possible to create an exponentially
increasing number of mounts in a mount namespace.

    mkdir /tmp/1 /tmp/2
    mount --make-rshared /
    for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done

Will create create 2^20 or 1048576 mounts, which is a practical problem
as some people have managed to hit this by accident.

As such CVE-2016-6213 was assigned.

Ian Kent <raven@themaw.net> described the situation for autofs users
as follows:

> The number of mounts for direct mount maps is usually not very large because of
> the way they are implemented, large direct mount maps can have performance
> problems. There can be anywhere from a few (likely case a few hundred) to less
> than 10000, plus mounts that have been triggered and not yet expired.
>
> Indirect mounts have one autofs mount at the root plus the number of mounts that
> have been triggered and not yet expired.
>
> The number of autofs indirect map entries can range from a few to the common
> case of several thousand and in rare cases up to between 30000 and 50000. I've
> not heard of people with maps larger than 50000 entries.
>
> The larger the number of map entries the greater the possibility for a large
> number of active mounts so it's not hard to expect cases of a 1000 or somewhat
> more active mounts.

So I am setting the default number of mounts allowed per mount
namespace at 100,000.  This is more than enough for any use case I
know of, but small enough to quickly stop an exponential increase
in mounts.  Which should be perfect to catch misconfigurations and
malfunctioning programs.

For anyone who needs a higher limit this can be changed by writing
to the new /proc/sys/fs/mount-max sysctl.
Tested-by: NCAI Qian <caiqian@redhat.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

d2921684

20 1月, 2016 1 次提交

pipe: limit the per-user amount of pages allocated in pipes · 759c0114

由 Willy Tarreau 提交于 1月 18, 2016

On no-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.

This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.

The limit are controlled by two new sysctls : pipe-user-pages-soft, and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (eg: for splicing).

Reported-by: socketpair@gmail.com
Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NWilly Tarreau <w@1wt.eu>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

759c0114

31 7月, 2012 1 次提交

fs: make dumpable=2 require fully qualified path · 9520628e

由 Kees Cook 提交于 7月 30, 2012

When the suid_dumpable sysctl is set to "2", and there is no core dump
pipe defined in the core_pattern sysctl, a local user can cause core files
to be written to root-writable directories, potentially with
user-controlled content.

This means an admin can unknowningly reintroduce a variation of
CVE-2006-2451, allowing local users to gain root privileges.

  $ cat /proc/sys/fs/suid_dumpable
  2
  $ cat /proc/sys/kernel/core_pattern
  core
  $ ulimit -c unlimited
  $ cd /
  $ ls -l core
  ls: cannot access core: No such file or directory
  $ touch core
  touch: cannot touch `core': Permission denied
  $ OHAI="evil-string-here" ping localhost >/dev/null 2>&1 &
  $ pid=$!
  $ sleep 1
  $ kill -SEGV $pid
  $ ls -l core
  -rw------- 1 root kees 458752 Jun 21 11:35 core
  $ sudo strings core | grep evil
  OHAI=evil-string-here

While cron has been fixed to abort reading a file when there is any
parse error, there are still other sensitive directories that will read
any file present and skip unparsable lines.

Instead of introducing a suid_dumpable=3 mode and breaking all users of
mode 2, this only disables the unsafe portion of mode 2 (writing to disk
via relative path).  Most users of mode 2 (e.g.  Chrome OS) already use
a core dump pipe handler, so this change will not break them.  For the
situations where a pipe handler is not defined but mode 2 is still
active, crash dumps will only be written to fully qualified paths.  If a
relative path is defined (e.g.  the default "core" pattern), dump
attempts will trigger a printk yelling about the lack of a fully
qualified path.
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@linux.intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9520628e

30 7月, 2012 1 次提交

fs: add link restrictions · 800179c9

由 Kees Cook 提交于 7月 25, 2012

This adds symlink and hardlink restrictions to the Linux VFS.

Symlinks:

A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp

The solution is to permit symlinks to only be followed when outside
a sticky world-writable directory, or when the uid of the symlink and
follower match, or when the directory owner matches the symlink's owner.

Some pointers to the history of earlier discussion that I could find:

1996 Aug, Zygo Blaxell
http://marc.info/?l=bugtraq&m=87602167419830&w=2
1996 Oct, Andrew Tridgell
http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
1997 Dec, Albert D Cahalan
http://lkml.org/lkml/1997/12/16/4
2005 Feb, Lorenzo Hernández García-Hierro
http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
2010 May, Kees Cook
https://lkml.org/lkml/2010/5/30/144

Past objections and rebuttals could be summarized as:

- Violates POSIX.
- POSIX didn't consider this situation and it's not useful to follow
a broken specification at the cost of security.
- Might break unknown applications that use this feature.
- Applications that break because of the change are easy to spot and
fix. Applications that are vulnerable to symlink ToCToU by not having
the change aren't. Additionally, no applications have yet been found
that rely on this behavior.
- Applications should just use mkstemp() or O_CREATE|O_EXCL.
- True, but applications are not perfect, and new software is written
all the time that makes these mistakes; blocking this flaw at the
kernel is a single solution to the entire class of vulnerability.
- This should live in the core VFS.
- This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
- This should live in an LSM.
- This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)

Hardlinks:

On systems that have user-writable directories on the same partition
as system files, a long-standing class of security issues is the
hardlink-based time-of-check-time-of-use race, most commonly seen in
world-writable directories like /tmp. The common method of exploitation
of this flaw is to cross privilege boundaries when following a given
hardlink (i.e. a root process follows a hardlink created by another
user). Additionally, an issue exists where users can "pin" a potentially
vulnerable setuid/setgid file so that an administrator will not actually
upgrade a system fully.

The solution is to permit hardlinks to only be created when the user is
already the existing file's owner, or if they already have read/write
access to the existing file.

Many Linux users are surprised when they learn they can link to files
they have no access to, so this change appears to follow the doctrine
of "least surprise". Additionally, this change does not violate POSIX,
which states "the implementation may require that the calling process
has permission to access the existing file"[1].

This change is known to break some implementations of the "at" daemon,
though the version used by Fedora and Ubuntu has been fixed[2] for
a while. Otherwise, the change has been undisruptive while in use in
Ubuntu for the last 1.5 years.

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
[2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279

This patch is based on the patches in Openwall and grsecurity, along with
suggestions from Al Viro. I have added a sysctl to enable the protected
behavior, and documentation.
Signed-off-by: NKees Cook <keescook@chromium.org>
Acked-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

800179c9

01 6月, 2012 1 次提交

mqueue: separate mqueue default value from maximum value · cef0184c

由 KOSAKI Motohiro 提交于 5月 31, 2012

Commit b231cca4 ("message queues: increase range limits") changed
mqueue default value when attr parameter is specified NULL from hard
coded value to fs.mqueue.{msg,msgsize}_max sysctl value.

This made large side effect.  When user need to use two mqueue
applications 1) using !NULL attr parameter and it require big message
size and 2) using NULL attr parameter and only need small size message,
app (1) require to raise fs.mqueue.msgsize_max and app (2) consume large
memory size even though it doesn't need.

Doug Ledford propsed to switch back it to static hard coded value.
However it also has a compatibility problem.  Some applications might
started depend on the default value is tunable.

The solution is to separate default value from maximum value.
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>
Acked-by: NDoug Ledford <dledford@redhat.com>
Acked-by: NJoe Korty <joe.korty@ccur.com>
Cc: Amerigo Wang <amwang@redhat.com>
Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cef0184c

24 5月, 2011 1 次提交

Documentation: update epoll sysctl text · 52307a9e

由 Lucian Adrian Grijincu 提交于 5月 23, 2011

max_user_instances was removed in this commit:

   commit 9df04e1f
   Author: Davide Libenzi <davidel@xmailserver.org>
   Date:   Thu Jan 29 14:25:26 2009 -0800

    epoll: drop max_user_instances and rely only on max_user_watches

but the documentation entry was not removed.

Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: NLucian Adrian Grijincu <lucian.grijincu@gmail.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

52307a9e

17 3月, 2011 1 次提交

Documentation: file handles are now freed · ca3b78aa

由 Federica Teodori 提交于 3月 15, 2011

Since file handles are freed, a little amendment to the documentation
Signed-off-by: NFederica Teodori <federica.teodori@googlemail.com>
Acked-by: Rik van Riel<riel@redhat.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ca3b78aa

24 9月, 2009 1 次提交

Documentation: update stale definition of file-nr in fs.txt · bcadbbd4

由 Xiaotian Feng 提交于 9月 23, 2009

In "documentation: update Documentation/filesystem/proc.txt and
Documentation/sysctls" (commit 760df93e) we merged /proc/sys/fs
documentation in Documentation/sysctl/fs.txt and
Documentation/filesystem/proc.txt, but stale file-nr definition
remained.

This patch adds back the right fs-nr definition for 2.6 kernel.

Signed-off-by: Xiaotian Feng<dfeng@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bcadbbd4

03 4月, 2009 1 次提交

documentation: update Documentation/filesystem/proc.txt and Documentation/sysctls · 760df93e

由 Shen Feng 提交于 4月 02, 2009

Now /proc/sys is described in many places and much information is
redundant.  This patch updates the proc.txt and move the /proc/sys
desciption out to the files in Documentation/sysctls.

Details are:

merge
-  2.1  /proc/sys/fs - File system data
-  2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem
-  2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface
with Documentation/sysctls/fs.txt.

remove
-  2.2  /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
since it's not better then the Documentation/binfmt_misc.txt.

merge
-  2.3  /proc/sys/kernel - general kernel parameters
with Documentation/sysctls/kernel.txt

remove
-  2.5  /proc/sys/dev - Device specific parameters
since it's obsolete the sysfs is used now.

remove
-  2.6  /proc/sys/sunrpc - Remote procedure calls
since it's not better then the Documentation/sysctls/sunrpc.txt

move
-  2.7  /proc/sys/net - Networking stuff
-  2.9  Appletalk
-  2.10 IPX
to newly created Documentation/sysctls/net.txt.

remove
-  2.8  /proc/sys/net/ipv4 - IPV4 settings
since it's not better then the Documentation/networking/ip-sysctl.txt.

add
- Chapter 3 Per-Process Parameters
to descibe /proc/<pid>/xxx parameters.
Signed-off-by: NShen Feng <shen@cn.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

760df93e

07 2月, 2008 1 次提交

get rid of NR_OPEN and introduce a sysctl_nr_open · 9cfe015a

由 Eric Dumazet 提交于 2月 06, 2008

NR_OPEN (historically set to 1024*1024) actually forbids processes to open
more than 1024*1024 handles.

Unfortunatly some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.

Changing NR_OPEN is not considered safe because of vmalloc space potential
exhaust.

This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.

[akpm@linux-foundation.org: export it for sparc64]
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9cfe015a

30 11月, 2006 1 次提交

Fix typos in /Documentation : Misc · 5d3f083d

由 Matt LaPlante 提交于 11月 30, 2006

This patch fixes typos in various Documentation txts. The patch addresses some
misc words.
Signed-off-by: NMatt LaPlante <kernel1@cyberdogtech.com>
Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

5d3f083d

28 8月, 2006 1 次提交

[PATCH] Fix docs for fs.suid_dumpable · a2e0b563

由 Alexey Dobriyan 提交于 8月 27, 2006

Sergey Vlasov noticed that there is not kernel.suid_dumpable, but
fs.suid_dumpable.

How KERN_SETUID_DUMPABLE ended up in fs_table[]? Hell knows...
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a2e0b563

17 4月, 2005 1 次提交

Linux-2.6.12-rc2 · 1da177e4

由 Linus Torvalds 提交于 4月 16, 2005

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

1da177e4

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功