1. 20 Oct, 2016 1 commit
  2. 10 Oct, 2016 1 commit
    • L
      printk: reinstate KERN_CONT for printing continuation lines · 4bcc595c
      Linus Torvalds authored
      Long long ago the kernel log buffer was a buffered stream of bytes, very
      much like stdio in user space.  It supported log levels by scanning the
      stream and noticing the log level markers at the beginning of each line,
      but if you wanted to print a partial line in multiple chunks, you just
      did multiple printk() calls, and it just automatically worked.
      
      Except when it didn't, and you had very confusing output when different
      lines got all mixed up with each other.  Then you got fragment lines
      mixing with each other, or with non-fragment lines, because it was
      traditionally impossible to tell whether a printk() call was a
      continuation or not.
      
      To at least help clarify the issue of continuation lines, we added a
      KERN_CONT marker back in 2007 to mark continuation lines:
      
        47492527 ("printk: add KERN_CONT annotation").
      
      That continuation marker was initially an empty string, and didn't
      actually make any semantic difference.  But it at least made it possible
      to annotate the source code, and have check-patch notice that a printk()
      didn't need or want a log level marker, because it was a continuation of
      a previous line.
      
      To avoid the ambiguity between a continuation line that had that
      KERN_CONT marker, and a printk with no level information at all, we then
      in 2009 made KERN_CONT be a real log level marker which meant that we
      could now reliably tell the difference between the two cases.
      
        5fd29d6c ("printk: clean up handling of log-levels and newlines")
      
      and we could take advantage of that to make sure we didn't mix up
      continuation lines with lines that just didn't have any loglevel at all.
      
      Then, in 2012, the kernel log buffer was changed to be a "record" based
      log, where each line was a record that has a loglevel and a timestamp.
      
      You can see the beginning of that conversion in commits
      
        e11fea92 ("kmsg: export printk records to the /dev/kmsg interface")
        7ff9554b ("printk: convert byte-buffer to variable-length record buffer")
      
      with a number of follow-up commits to fix some painful fallout from that
      conversion.  Over all, it took a couple of months to sort out most of
      it.  But the upside was that you could have concurrent readers (and
      writers) of the kernel log and not have lines with mixed output in them.
      
      And one particular pain-point for the record-based kernel logging was
      exactly the fragmentary lines that are generated in smaller chunks.  In
      order to still log them as one record, the continuation lines need to be
      attached to the previous record properly.
      
      However the explicit continuation record marker that is actually useful
      for this exact case was removed at around the same time by commit
      
        61e99ab8 ("printk: remove the now unnecessary "C" annotation for KERN_CONT")
      
      due to the incorrect belief that KERN_CONT wasn't meaningful.  The
      ambiguity between "is this a continuation line" or "is this a plain
      printk with no log level information" was reintroduced, and in fact
      became an even bigger pain point because there was now the whole
      record-level merging of kernel messages going on.
      
      This patch reinstates the KERN_CONT as a real non-empty string marker,
      so that the ambiguity is fixed once again.
      
      But it's not a plain revert of that original removal: in the four years
      since we made KERN_CONT an empty string again, not only has the format
      of the log level markers changed, we've also had some usage changes in
      this area.
      
      For example, some ACPI code seems to use KERN_CONT _together_ with a log
      level, and now uses both the KERN_CONT marker and (for example) a
      KERN_INFO marker to show that it's an informational continuation of a
      line.
      
      Which is actually not a bad idea - if the continuation line cannot be
      attached to its predecessor, without the log level information we don't
      know what log level to assign to it (and we traditionally just assigned
      it the default loglevel).  So having both a log level and the KERN_CONT
      marker is not necessarily a bad idea, but it does mean that we need to
      actually iterate over potentially multiple markers, rather than just a
      single one.
      
      Also, since KERN_CONT was still conceptually needed, and encouraged, but
      didn't actually _do_ anything, we've also had the reverse problem:
      rather than having too many annotations it has too few, and there is bit
      rot with code that no longer marks the continuation lines with the
      KERN_CONT marker.
      
      So this patch not only re-instates the non-empty KERN_CONT marker, it
      also fixes up the cases of bit-rot I noticed in my own logs.
      
      There are probably other cases where KERN_CONT will be needed to be
      added, either because it is new code that never dealt with the need for
      KERN_CONT, or old code that has bitrotted without anybody noticing.
      
      That said, we should strive to avoid the need for KERN_CONT.  It does
      result in real problems for logging, and should generally not be seen as
      a good feature.  If we some day can get rid of the feature entirely,
      because nobody does any fragmented printk calls, that would be lovely.
      
      But until that point, let's at least mark the code that relies on the hacky
      multi-fragment kernel printk's.  Not only does it avoid the ambiguity,
      it also annotates code as "maybe this would be good to fix some day".
      
      (That said, particularly during single-threaded bootup, the downsides of
      KERN_CONT are very limited.  Things get much hairier when you have
      multiple threads going on and user level reading and writing logs too).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
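The point above about iterating over potentially multiple markers can be sketched in user-space C. This is a minimal illustration, not the kernel's code: it assumes each marker is a start-of-header byte (`\001`) followed by one character (`0` to `7` for a level, `c` for continuation, `d` for default), and the names `parse_prefix`, `marker_at`, `struct parsed_prefix`, and `LEVEL_DEFAULT` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SOH '\001'          /* assumed byte that opens every marker */
#define LEVEL_DEFAULT (-1)  /* hypothetical value for "no level given" */

struct parsed_prefix {
    int level;          /* 0..7, or LEVEL_DEFAULT if no level marker seen */
    bool is_cont;       /* true if a continuation ('c') marker was seen */
    const char *text;   /* first byte after all leading markers */
};

/* Return the marker character at s, or 0 if s does not start with one. */
static char marker_at(const char *s)
{
    if (s[0] == SOH && s[1] != '\0') {
        char c = s[1];
        if ((c >= '0' && c <= '7') || c == 'c' || c == 'd')
            return c;
    }
    return 0;
}

/* Consume every leading marker, so "continuation + level" both register. */
static struct parsed_prefix parse_prefix(const char *s)
{
    struct parsed_prefix p = { LEVEL_DEFAULT, false, s };
    char m;

    while ((m = marker_at(p.text)) != 0) {
        if (m == 'c')
            p.is_cont = true;
        else if (m != 'd' && p.level == LEVEL_DEFAULT)
            p.level = m - '0';   /* the first explicit level wins */
        p.text += 2;             /* each marker is two bytes long */
    }
    return p;
}
```

With this sketch, a string that starts with a continuation marker followed by a level-6 marker yields `is_cont == true` and `level == 6`, while a string with no marker keeps `LEVEL_DEFAULT`, which is exactly the distinction the commit wants to make reliable.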
  3. 08 Oct, 2016 1 commit
  4. 28 Sep, 2016 1 commit
  5. 20 Sep, 2016 1 commit
    • V
      lsm,audit,selinux: Introduce a new audit data type LSM_AUDIT_DATA_FILE · 43af5de7
      Vivek Goyal authored
      Right now the LSM_AUDIT_DATA_PATH type contains a "struct path" in union
      "u" of common_audit_data. This information is used to print the path of
      the file and, at the same time, to get to the dentry and inode. The
      inode information is in turn used to get to the superblock and device
      and to print device information.
      
      This does not work well for layered filesystems like overlay where dentry
      contained in path is overlay dentry and not the real dentry of underlying
      file system. That means inode retrieved from dentry is also overlay
      inode and not the real inode.
      
      SELinux helpers like file_path_has_perm() are doing checks on inode
      retrieved from file_inode(). This returns the real inode and not the
      overlay inode. That means we are doing check on real inode but for audit
      purposes we are printing details of overlay inode and that can be
      confusing while debugging.
      
      Hence, introduce a new type LSM_AUDIT_DATA_FILE which carries file
      information, so that the inode retrieved via file_inode() is the real
      inode. That way the correct AVC denial information is given to the user.
      
      For example, the following is an AVC denial before the patch.
      
        type=AVC msg=audit(1473360868.399:214): avc:  denied  { read open } for
          pid=1765 comm="cat"
          path="/root/.../overlay/container1/merged/readfile"
          dev="overlay" ino=21443
          scontext=unconfined_u:unconfined_r:test_overlay_client_t:s0:c10,c20
          tcontext=unconfined_u:object_r:test_overlay_files_ro_t:s0
          tclass=file permissive=0
      
      It looks as follows after the patch.
      
        type=AVC msg=audit(1473360017.388:282): avc:  denied  { read open } for
          pid=2530 comm="cat"
          path="/root/.../overlay/container1/merged/readfile"
          dev="dm-0" ino=2377915
          scontext=unconfined_u:unconfined_r:test_overlay_client_t:s0:c10,c20
          tcontext=unconfined_u:object_r:test_overlay_files_ro_t:s0
          tclass=file permissive=0
      
      Notice that now dev information points to "dm-0" device instead of
      "overlay" device. This makes it clear that check failed on underlying
      inode and not on the overlay inode.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      [PM: slight tweaks to the description to make checkpatch.pl happy]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
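The difference between auditing through a path and auditing through a file can be modelled with a toy user-space sketch. None of these types are the kernel's: `toy_inode`, `toy_dentry`, `toy_path`, `toy_file`, `toy_audit_data`, and `audit_inode` are hypothetical stand-ins, and the `real_inode` field only imitates what the commit says file_inode() returns.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel objects named in the commit message. */
struct toy_inode  { const char *dev; unsigned long ino; };
struct toy_dentry { struct toy_inode *inode; };   /* overlay dentry -> overlay inode */
struct toy_path   { struct toy_dentry *dentry; };
struct toy_file   {
    struct toy_path path;          /* what the path-based audit type sees */
    struct toy_inode *real_inode;  /* models the result of file_inode() */
};

enum toy_audit_type { TOY_AUDIT_PATH, TOY_AUDIT_FILE };

struct toy_audit_data {
    enum toy_audit_type type;
    union {
        struct toy_path path;      /* models LSM_AUDIT_DATA_PATH */
        struct toy_file *file;     /* models LSM_AUDIT_DATA_FILE */
    } u;
};

/* Pick the inode whose device and number end up in the audit record. */
static const struct toy_inode *audit_inode(const struct toy_audit_data *a)
{
    switch (a->type) {
    case TOY_AUDIT_PATH:
        return a->u.path.dentry->inode;   /* overlay inode on overlayfs */
    case TOY_AUDIT_FILE:
        return a->u.file->real_inode;     /* inode of the underlying filesystem */
    }
    return NULL;
}
```

In this model, a file on an overlay mount audited by path reports the overlay inode (dev="overlay"), while the same file audited by file reports the underlying inode (dev="dm-0"), matching the before and after records quoted above.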
  6. 14 Sep, 2016 1 commit
  7. 31 Aug, 2016 1 commit
  8. 30 Aug, 2016 2 commits
  9. 19 Aug, 2016 1 commit
  10. 10 Aug, 2016 1 commit
  11. 09 Aug, 2016 4 commits
  12. 21 Jul, 2016 1 commit
  13. 28 Jun, 2016 6 commits
  14. 25 Jun, 2016 1 commit
  15. 24 Jun, 2016 1 commit
    • A
      fs: Treat foreign mounts as nosuid · 380cf5ba
      Andy Lutomirski authored
      If a process gets access to a mount from a different user
      namespace, that process should not be able to take advantage of
      setuid files or selinux entrypoints from that filesystem.  Prevent
      this by treating mounts from other mount namespaces and those not
      owned by current_user_ns() or an ancestor as nosuid.
      
      This will make it safer to allow more complex filesystems to be
      mounted in non-root user namespaces.
      
      This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
      setgid, and file capability bits can no longer be abused if code in
      a user namespace were to clear nosuid on an untrusted filesystem,
      but this patch, by itself, is insufficient to protect the system
      from abuse of files that, when execed, would increase MAC privilege.
      
      As a more concrete explanation, any task that can manipulate a
      vfsmount associated with a given user namespace already has
      capabilities in that namespace and all of its descendants.  If they
      can cause a malicious setuid, setgid, or file-caps executable to
      appear in that mount, then that executable will only allow them to
      elevate privileges in exactly the set of namespaces in which they
      are already privileged.
      
      On the other hand, if they can cause a malicious executable to
      appear with a dangerous MAC label, running it could change the
      caller's security context in a way that should not have been
      possible, even inside the namespace in which the task is confined.
      
      As a hardening measure, this would have made CVE-2014-5207 much
      more difficult to exploit.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
      Acked-by: James Morris <james.l.morris@oracle.com>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
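The rule described in this commit message can be sketched as a small user-space predicate. This is a toy model under stated assumptions, not the kernel implementation: `toy_user_ns`, `toy_mount`, `toy_task`, `is_self_or_ancestor`, and `mount_may_suid` are hypothetical names, and mount namespaces are reduced to integer ids.

```c
#include <stdbool.h>
#include <stddef.h>

/* A user namespace only knows its parent; the initial one has none. */
struct toy_user_ns { const struct toy_user_ns *parent; };

struct toy_mount {
    bool nosuid_flag;                 /* explicit nosuid mount option */
    int mnt_ns_id;                    /* mount namespace the mount lives in */
    const struct toy_user_ns *owner;  /* user namespace that owns the mount */
};

struct toy_task {
    int mnt_ns_id;                    /* the task's mount namespace */
    const struct toy_user_ns *user_ns;/* models current_user_ns() */
};

/* True if target is ns itself or one of its ancestors. */
static bool is_self_or_ancestor(const struct toy_user_ns *target,
                                const struct toy_user_ns *ns)
{
    for (; ns != NULL; ns = ns->parent)
        if (ns == target)
            return true;
    return false;
}

/*
 * Setuid bits (and similar) are honoured only if the mount is not nosuid,
 * lives in the task's own mount namespace, and is owned by the task's
 * user namespace or an ancestor of it. Anything else is treated as nosuid.
 */
static bool mount_may_suid(const struct toy_mount *m, const struct toy_task *t)
{
    return !m->nosuid_flag &&
           m->mnt_ns_id == t->mnt_ns_id &&
           is_self_or_ancestor(m->owner, t->user_ns);
}
```

Under this model, a task in the initial user namespace that reaches a mount owned by a child user namespace gets `false`, which is the "foreign mount" case the commit closes off.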
  16. 16 Jun, 2016 1 commit
  17. 09 Jun, 2016 1 commit
  18. 01 Jun, 2016 1 commit
    • S
      selinux: Only apply bounds checking to source types · 7ea59202
      Stephen Smalley authored
      The current bounds checking of both source and target types
      requires allowing any domain that has access to the child
      domain to also have the same permissions to the parent, which
      is undesirable.  Drop the target bounds checking.
      
      KaiGai Kohei originally removed all use of target bounds in
      commit 7d52a155 ("selinux: remove dead code in
      type_attribute_bounds_av()") but this was reverted in
      commit 2ae3ba39 ("selinux: libsepol: remove dead code in
      check_avtab_hierarchy_callback()") because it would have
      required explicitly allowing the parent any permissions
      to the child that the child is allowed to itself.
      
      This change in contrast retains the logic for the case where both
      source and target types are bounded, thereby allowing access
      if the parent of the source is allowed the corresponding
      permissions to the parent of the target.  Further, this change
      reworks the logic such that we only perform a single computation
      for each case and there is no ambiguity as to how to resolve
      a bounds violation.
      
      Under the new logic, if the source type and target types are both
      bounded, then the parent of the source type must be allowed the same
      permissions to the parent of the target type.  If only the source
      type is bounded, then the parent of the source type must be allowed
      the same permissions to the target type.
      
      Examples of the new logic and comparisons with the old logic:
      1. If we have:
      	typebounds A B;
      then:
      	allow B self:process <permissions>;
      will satisfy the bounds constraint iff:
      	allow A self:process <permissions>;
      is also allowed in policy.
      
      Under the old logic, the allow rule on B satisfies the
      bounds constraint if any of the following three are allowed:
      	allow A B:process <permissions>; or
      	allow B A:process <permissions>; or
      	allow A self:process <permissions>;
      However, either of the first two ultimately require the third to
      satisfy the bounds constraint under the old logic, and therefore
      this degenerates to the same result (but is more efficient - we only
      need to perform one compute_av call).
      
      2. If we have:
      	typebounds A B;
      	typebounds A_exec B_exec;
      then:
      	allow B B_exec:file <permissions>;
      will satisfy the bounds constraint iff:
      	allow A A_exec:file <permissions>;
      is also allowed in policy.
      
      This is essentially the same as #1; it is merely included as
      an example of dealing with object types related to a bounded domain
      in a manner that satisfies the bounds relationship.  Note that
      this approach is preferable to leaving B_exec unbounded and having:
      	allow A B_exec:file <permissions>;
      in policy because that would allow B's entrypoints to be used to
      enter A.  Similarly for _tmp or other related types.
      
      3. If we have:
      	typebounds A B;
      and an unbounded type T, then:
      	allow B T:file <permissions>;
      will satisfy the bounds constraint iff:
      	allow A T:file <permissions>;
      is allowed in policy.
      
      The old logic would have been identical for this example.
      
      4. If we have:
      	typebounds A B;
      and an unbounded domain D, then:
      	allow D B:unix_stream_socket <permissions>;
      is not subject to any bounds constraints under the new logic
      because D is not bounded.  This is desirable so that we can
      allow a domain to e.g. connectto a child domain without having
      to allow it to do the same to its parent.
      
      The old logic would have required:
      	allow D A:unix_stream_socket <permissions>;
      to also be allowed in policy.
      Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
      [PM: re-wrapped description to appease checkpatch.pl]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
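The new bounds rule and the four examples above can be condensed into a short user-space sketch. It is only a toy: types are small integers, permissions are bit masks, and `toy_policy`, `bounds_satisfied`, `NO_BOUND`, and the `TYPE_*` constants are hypothetical names. It answers yes or no for a requested permission set, which may differ from how the kernel actually handles a violation.

```c
#include <stdbool.h>

/* Four toy types mirroring the examples: A bounds B; T and D are unbounded. */
enum { TYPE_A, TYPE_B, TYPE_T, TYPE_D, NTYPES };

#define NO_BOUND (-1)

struct toy_policy {
    int bound_of[NTYPES];              /* parent type, or NO_BOUND */
    unsigned allowed[NTYPES][NTYPES];  /* allowed[src][tgt] permission bits */
};

/*
 * New logic: only a bounded source is constrained. Its parent must hold
 * the same permissions to the target's parent if the target is bounded
 * too, otherwise to the target itself.
 */
static bool bounds_satisfied(const struct toy_policy *p,
                             int src, int tgt, unsigned perms)
{
    int psrc = p->bound_of[src];
    int ptgt;

    if (psrc == NO_BOUND)
        return true;                   /* example 4: unbounded source */

    ptgt = p->bound_of[tgt];
    if (ptgt == NO_BOUND)
        ptgt = tgt;                    /* example 3: unbounded target */

    /* Every requested bit must also be granted to the parent pair. */
    return (perms & ~p->allowed[psrc][ptgt]) == 0;
}
```

Example 1 falls out of the same computation: for `allow B self`, the target is B, whose parent is A, so the check becomes "is A allowed these permissions to A", that is, `allow A self`.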
  19. 27 Apr, 2016 2 commits
  20. 21 Apr, 2016 1 commit
    • R
      rtnetlink: add new RTM_GETSTATS message to dump link stats · 10c9ead9
      Roopa Prabhu authored
      This patch adds a new RTM_GETSTATS message to query link stats via netlink
      from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
      returns a lot more than just stats and is expensive in some cases when
      frequent polling for stats from userspace is a common operation.
      
      RTM_GETSTATS is an attempt to provide a lightweight netlink message
      to explicitly query only link stats from the kernel on an interface.
      The idea is to also keep it extensible so that new kinds of stats can be
      added to it in the future.
      
      This patch adds the following attribute for NETDEV stats:
      struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
              [IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
      };
      
      Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
      a single interface or all interfaces with NLM_F_DUMP.
      
      Future possible new types of stat attributes:
      link af stats:
          - IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
          - IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
      extended stats:
          - IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like bridge,
            vlan, vxlan etc)
          - IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
            available via ethtool today)
      
      This patch also declares a filter mask for all stat attributes.
      User has to provide a mask of stats attributes to query. filter mask
      can be specified in the new hdr 'struct if_stats_msg' for stats messages.
      Other important field in the header is the ifindex.
      
      This api can also include attributes for global stats (eg tcp) in the future.
      When global stats are included in a stats msg, the ifindex in the header
      must be zero. A single stats message cannot contain both global and
      netdev specific stats. To easily distinguish them, the names of netdev
      specific stat attributes are prefixed with IFLA_STATS_LINK_.
      
      Without any attributes in the filter_mask, no stats will be returned.
      
      This patch has been tested with modified iproute2 ifstat.
      Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
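A request of the kind described above could be assembled roughly as follows. This is a sketch that only fills in the message and does not open a socket. It assumes Linux uapi headers that already carry this patch and that `struct if_stats_msg` exposes `family`, `ifindex`, and `filter_mask` fields along with an `IFLA_STATS_FILTER_BIT()` helper, as I understand `linux/if_link.h` to do. `struct stats_request` and `build_stats_request` are hypothetical names.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Netlink header followed directly by the stats-specific header. */
struct stats_request {
    struct nlmsghdr nlh;
    struct if_stats_msg ifsm;
};

/*
 * Build an RTM_GETSTATS request asking only for the 64-bit link counters.
 * With dump_all set, NLM_F_DUMP requests every interface and the ifindex
 * is left at zero in this sketch.
 */
static struct stats_request build_stats_request(int ifindex, int dump_all)
{
    struct stats_request req;

    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct if_stats_msg));
    req.nlh.nlmsg_type = RTM_GETSTATS;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | (dump_all ? NLM_F_DUMP : 0);

    req.ifsm.family = AF_UNSPEC;
    req.ifsm.ifindex = dump_all ? 0 : ifindex;
    /* An empty filter mask returns no stats, so select one attribute. */
    req.ifsm.filter_mask = IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64);
    return req;
}
```

The resulting structure would then be written to a NETLINK_ROUTE socket. That part is omitted here, as is parsing the IFLA_STATS_LINK_64 attribute out of the reply.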
  21. 20 Apr, 2016 3 commits
  22. 14 Apr, 2016 1 commit
    • P
      selinux: Change bool variable name to index. · 0fd71a62
      Prarit Bhargava authored
      security_get_bool_value(int bool) argument "bool" conflicts with
      in-kernel macros such as BUILD_BUG().  This patch changes this to
      index which isn't a type.
      
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Andrew Perepechko <anserper@ya.ru>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: selinux@tycho.nsa.gov
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Paul Moore <pmoore@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Acked-by: David Howells <dhowells@redhat.com>
      [PM: wrapped description for checkpatch.pl, use "selinux:..." as subj]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  23. 11 Apr, 2016 2 commits
  24. 06 Apr, 2016 4 commits