1. 10 Jan, 2018 2 commits
    • consistently use the LOCK and UNLOCK macros · c4bc0b1a
      Jens Gustedt committed
      In some places there has been a direct usage of the functions. Use the
      macros consistently everywhere, such that it might be easier later on to
      capture the fast path directly inside the macro and only have the call
      overhead on the slow path.
    • new lock algorithm with state and congestion count in one atomic int · 47d0bcd4
      Jens Gustedt committed
      A variant of this new lock algorithm has been presented at SAC'16, see
      https://hal.inria.fr/hal-01304108. A full version of that paper is
      available at https://hal.inria.fr/hal-01236734.
      
      The main motivation of this is to improve on the safety of the basic lock
      implementation in musl. This is achieved by squeezing a lock flag and a
      congestion count (= threads inside the critical section) into a single
      int. Thereby an unlock operation does exactly one memory
      transfer (a_fetch_add) and never touches the value again, but still
      detects if a waiter has to be woken up.
      
      This is a fix of a use-after-free bug in pthread_detach that had
      temporarily been patched. Therefore this patch also reverts
      
               c1e27367
      
      This is also the only place where internal knowledge of the lock
      algorithm is used.
      
      The main price for the improved safety is slightly larger code.
      
      Under high congestion, the scheduling behavior will be different
      compared to the previous algorithm. In that case, a successful
      put-to-sleep may appear out of order compared to the arrival in the
      critical section.
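
A minimal sketch of the idea described above, assuming C11 atomics and a plain yield loop in place of the futex wait and wake calls. The names `sketch_lock`, `sketch_unlock`, and `LOCKED_ONE` are hypothetical and this is not the code from the commit; it only illustrates how one int can carry both the lock flag and the congestion count.

```c
#include <limits.h>
#include <sched.h>
#include <stdatomic.h>

/* Layout assumed for this sketch: the sign bit is the lock flag, the
 * remaining bits count the threads inside the critical section (the
 * holder plus every thread currently trying to acquire the lock). */
#define LOCKED_ONE (INT_MIN + 1)   /* lock flag set, congestion count 1 */

void sketch_lock(atomic_int *l)
{
	int cur = 0;
	/* Fast path: uncontended, set the flag and count ourselves at once. */
	if (atomic_compare_exchange_strong(l, &cur, LOCKED_ONE)) return;
	/* Slow path: register as congestion first, then wait for the flag. */
	cur = atomic_fetch_add(l, 1) + 1;
	for (;;) {
		if (cur < 0) {
			/* Somebody holds the lock; a real implementation would
			 * sleep on a futex here instead of yielding. */
			sched_yield();
			cur = atomic_load(l);
			continue;
		}
		/* cur > 0 and already includes us, so only the flag is added. */
		if (atomic_compare_exchange_weak(l, &cur, INT_MIN + cur)) return;
	}
}

/* Returns nonzero if other threads were still counted, which is the
 * case where a futex-based implementation would wake one waiter. */
int sketch_unlock(atomic_int *l)
{
	/* One atomic read-modify-write clears the flag and removes our
	 * count; the lock word is never read again afterwards. */
	return atomic_fetch_add(l, -LOCKED_ONE) != LOCKED_ONE;
}
```

Because the unlock path does not touch the lock word after its single atomic operation, the memory holding the lock may be released by another thread immediately afterwards, which is the property the pthread_detach fix relies on.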
  2. 19 Dec, 2017 6 commits
    • fix iconv output of surrogate pairs in ucs2 · 628cf979
      Rich Felker committed
      in the unified code for handling utf-16 and ucs2 output, the check for
      ucs2 wrongly looked at the source charset rather than the destination
      charset.
    • add support for BOM-determined-endian UCS2, UTF-16, and UTF-32 to iconv · 95c6044e
      Rich Felker committed
      previously, the charset names without endianness specified were always
      interpreted as big endian. unicode specifies that UTF-16 and UTF-32
      have BOM-determined endianness if BOM is present, and are otherwise
      big endian. since commit 5b546faa
      added support for stateful encodings, it is now possible to implement
      BOM support via the conversion descriptor state.
      
      for conversions to these charsets, the output is always big endian and
      does not have a BOM.
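
The decoding rule above can be sketched as follows for UTF-16. The enum and the function name `utf16_detect_order` are hypothetical; the actual implementation keeps this information in the conversion descriptor state rather than in a helper like this.

```c
#include <stddef.h>

enum byte_order { ORDER_BE, ORDER_LE };

/* Inspect the start of a UTF-16 stream. If a BOM is present, report its
 * byte order and set *skip to 2 so the caller can step over it;
 * otherwise default to big endian and consume nothing. */
enum byte_order utf16_detect_order(const unsigned char *in, size_t n, size_t *skip)
{
	*skip = 0;
	if (n >= 2) {
		if (in[0] == 0xFF && in[1] == 0xFE) { *skip = 2; return ORDER_LE; }
		if (in[0] == 0xFE && in[1] == 0xFF) { *skip = 2; return ORDER_BE; }
	}
	return ORDER_BE;
}
```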
    • add cp866 (dos cyrillic) to iconv · 9d4d0ee4
      Rich Felker committed
    • update case mappings to unicode 10.0 · 54941edd
      Rich Felker committed
      the mapping tables and code are not automatically generated; they were
      produced by comparing the output of towupper/towlower against the
      mappings in the UCD, ignoring characters that were previously excluded
      from case mappings or from alphabetic status (micro sign and circled
      letters), and adding table entries or code for everything else
      missing.
      
      based very loosely on a patch by Reini Urban.
    • update ctype tables to unicode 10.0 · c72c1c52
      Rich Felker committed
    • reformat ctype tables to be diff-friendly, match tool output · d3f23337
      Rich Felker committed
      the new version of the code used to generate these tables forces a
      newline every 256 entries, whereas at the time these files were
      originally generated and committed, it only wrapped them at 80
      columns. the new behavior ensures that localized changes to the
      tables, if they are ever needed, will produce localized diffs.
      
      commit d060edf6 made the corresponding
      changes to the iconv tables.
  3. 15 Dec, 2017 3 commits
    • use the name UTC instead of GMT for UTC timezone · eb7f93c4
      Natanael Copa committed
      notes by maintainer:
      
      both C and POSIX use the term UTC to specify related functionality,
      despite POSIX defining it as something more like UT1 or historical
      (pre-UTC) GMT without leap seconds. neither specifies the associated
      string for %Z. old choice of "GMT" violated principle of least
      surprise for users and some applications/tests. use "UTC" instead.
    • fix sysconf for infinite rlimits · 3ec82877
      Natanael Copa committed
      sysconf should return -1 for infinity, not LONG_MAX.
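
A small sketch of the conversion this implies, assuming the limit has already been fetched with getrlimit. The helper name `rlimit_to_sysconf` is hypothetical.

```c
#include <limits.h>
#include <sys/resource.h>

/* Map a resource limit to a sysconf-style return value. */
long rlimit_to_sysconf(rlim_t lim)
{
	/* No limit: report -1 rather than a large finite number. */
	if (lim == RLIM_INFINITY) return -1;
	/* Finite limits that do not fit in long are clamped. */
	return lim > LONG_MAX ? LONG_MAX : (long)lim;
}
```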
    • fix data race in at_quick_exit · 64303156
      Rich Felker committed
      aside from theoretical arbitrary results due to UB, this could
      practically cause unbounded overflow of static array if hit, but
      hitting it depends on having more than 32 calls to at_quick_exit and
      having them sufficiently often.
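
The shape of such a fix can be sketched as below: the bounds check and the store into the fixed-size array happen under one lock, so two concurrent callers cannot both pass the check for the last slot. This sketch assumes a pthread mutex and the hypothetical name `sketch_at_quick_exit`; the library itself uses its internal lock primitives.

```c
#include <pthread.h>

#define COUNT 32   /* capacity of the handler array, as in the text */

static void (*funcs[COUNT])(void);
static int count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Register a handler; returns 0 on success, -1 if the array is full. */
int sketch_at_quick_exit(void (*func)(void))
{
	int r = 0;
	pthread_mutex_lock(&lock);
	/* Check and increment are one critical section, so the index
	 * can never run past the end of funcs[]. */
	if (count == COUNT) r = -1;
	else funcs[count++] = func;
	pthread_mutex_unlock(&lock);
	return r;
}
```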
  4. 13 Dec, 2017 1 commit
  5. 12 Dec, 2017 1 commit
    • implement strftime padding specifier extensions · 8a6bd730
      Timo Teräs committed
      notes added by maintainer:
      
      the '-' specifier allows default padding to be suppressed, and '_'
      allows padding with spaces instead of the default (zeros).
      
      these extensions seem to be included in several other implementations
      including FreeBSD and derivatives, and Solaris. while portable
      software should not depend on them, time format strings are often
      exposed to the user for configurable time display. reportedly some
      python programs also use and depend on them.
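
A short usage sketch of the two flags, assuming a libc that implements these extensions. The wrapper name `format_day_variants` is hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* Format the day of the month three ways, separated by '|':
 * default zero padding, no padding ('-'), and space padding ('_'). */
size_t format_day_variants(char *buf, size_t n, const struct tm *tm)
{
	return strftime(buf, n, "%d|%-d|%_d", tm);
}
```

With `tm_mday` set to 5, a libc supporting the flags writes `05|5| 5` into the buffer.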
  6. 07 Dec, 2017 1 commit
    • implement the fopencookie extension to stdio · 06184334
      William Pitcock committed
      notes added by maintainer:
      
      this function is a GNU extension. it was chosen over the similar BSD
      function funopen because the latter depends on fpos_t being an
      arithmetic type as part of its public API, conflicting with our
      definition of fpos_t and with the intent that it be an opaque type. it
      was accepted for inclusion because, despite not being widely used, it
      is usually very difficult to extricate software using it from the
      dependency on it.
      
      calling pattern for the read and write callbacks is not likely to
      match glibc or other implementations, but should work with any
      reasonable callbacks. in particular the read function is never called
      without at least one byte being needed to satisfy its caller, so that
      spurious blocking is not introduced.
      
      contracts for what callbacks called from inside libc/stdio can do are
      always complicated, and at some point still need to be specified
      explicitly. at the very least, the callbacks must return or block
      indefinitely (they cannot perform nonlocal exits) and they should not
      make calls to stdio using their own FILE as an argument.
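
For orientation, a minimal usage sketch of fopencookie with a read-only, memory-backed cookie. The struct `mem_cookie` and the functions `mem_read` and `open_mem_reader` are hypothetical names for this example. The callback follows the constraints listed above: it always returns and never calls stdio on its own FILE.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

struct mem_cookie { const char *data; size_t len, pos; };

/* Read callback: copy up to n bytes from the in-memory buffer. */
static ssize_t mem_read(void *c, char *buf, size_t n)
{
	struct mem_cookie *m = c;
	size_t left = m->len - m->pos;
	if (n > left) n = left;
	memcpy(buf, m->data + m->pos, n);
	m->pos += n;
	return (ssize_t)n;   /* 0 signals end of file */
}

/* Wrap the cookie in a FILE; callbacks left unset are not used. */
FILE *open_mem_reader(struct mem_cookie *m)
{
	cookie_io_functions_t io = { .read = mem_read };
	return fopencookie(m, "r", io);
}
```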
  7. 21 Nov, 2017 2 commits
    • make fgetwc handling of encoding errors consistent with/without buffer · 4000b010
      Rich Felker committed
      previously, fgetwc left all but the first byte of an illegal sequence
      unread (available for subsequent calls) when reading out of the FILE
      buffer, but dropped all bytes contributing to the error when falling
      back to reading a byte at a time. neither behavior was ideal. in the
      buffered case, each malformed character produced one error per byte,
      rather than one per character. in the unbuffered case, consuming the
      last byte that caused the transition from "incomplete" to "invalid"
      state potentially dropped (and produced additional spurious encoding
      errors for) the next valid character.
      
      to handle both cases uniformly without duplicate code, revise the
      buffered case to only cover situations where a complete and valid
      character is present in the buffer, and fall back to byte-at-a-time
      for all other cases. this allows using mbtowc (stateless) instead of
      mbrtowc, which may slightly improve performance too.
      
      when an encoding error has been hit in the byte-at-a-time case, leave
      the final byte that produced the error unread (via ungetc) except in
      the case of single-byte errors (for UTF-8, bytes c0, c1, f5-ff, and
      continuation bytes with no lead byte). single-byte errors are fully
      consumed so as not to leave the caller in an infinite loop repeating
      the same error.
      
      none of these changes are distinguished from a conformance standpoint,
      since the file position is unspecified after encoding errors. they are
      intended merely as QoI/consistency improvements.
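
The single-byte error set named above for UTF-8 can be written as a small predicate. This is only an illustration with a hypothetical name, `is_single_byte_error`, and it applies to a byte seen where a new character is expected to start.

```c
/* Nonzero if b can never begin a valid UTF-8 sequence, so an error on
 * it is a single-byte error that must be consumed rather than unread. */
int is_single_byte_error(unsigned char b)
{
	if (b >= 0x80 && b <= 0xbf) return 1;  /* continuation byte with no lead byte */
	if (b == 0xc0 || b == 0xc1) return 1;  /* could only start an overlong form */
	return b >= 0xf5;                      /* would encode beyond U+10FFFF */
}
```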
    • fix treatment by fgetws of encoding errors as eof · a90d9da1
      Rich Felker committed
      fgetwc does not set the stream's error indicator on encoding errors,
      making ferror insufficient to distinguish between error and eof
      conditions. feof is also insufficient, since it will return true if
      the file ended with a partial character encoding error.
      
      whether fgetwc should be setting the error indicator itself is a
      question with conflicting answers. the POSIX text for the function
      states it as a requirement, but the ISO C text seems to require that
      it not. this may be revisited in the future based on the outcome of
      Austin Group issue #1170.
  8. 19 Nov, 2017 1 commit
  9. 15 Nov, 2017 1 commit
    • add reverse iconv mappings for JIS-based encodings · a223dbd2
      Rich Felker committed
      these encodings are still commonly used in messaging protocols and
      such. the reverse mapping is implemented as a binary search of a list
      of the jis 0208 characters in unicode order; the existing forward
      table is used to perform the comparison in the search.
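
The lookup scheme can be sketched with toy tables: a forward table maps a charset index to a Unicode code point, and a second list holds the same indices sorted by code point, so the binary search compares through the forward table instead of storing code points twice. The table contents and the names `fwd`, `rev`, and `reverse_lookup` are made up for this example.

```c
#include <stddef.h>

/* Forward table: charset index -> Unicode code point (toy data). */
static const unsigned short fwd[] = { 0x3042, 0x00A7, 0x4E00, 0x2010 };
/* Indices into fwd[], ordered by the code point they map to. */
static const unsigned char rev[] = { 1, 3, 0, 2 };

/* Return the charset index for code point c, or -1 if unmapped. */
int reverse_lookup(unsigned c)
{
	size_t lo = 0, hi = sizeof rev / sizeof *rev;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		unsigned u = fwd[rev[mid]];   /* compare via the forward table */
		if (u == c) return rev[mid];
		if (u < c) lo = mid + 1;
		else hi = mid;
	}
	return -1;
}
```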
  10. 14 Nov, 2017 1 commit
    • generalize iconv framework for 8-bit codepages · 105eff9d
      Rich Felker committed
      previously, 8-bit codepages could only remap the high 128 bytes; the
      low range was assumed/forced to agree with ascii. interpretation of
      codepage table headers has been changed so that it's possible to
      represent mappings for up to 256 slots (fewer if the initial portion
      of the map is elided because it coincides with unicode codepoints).
      this requires consuming a bit more of the 10-bit space of characters
      that can be represented in 8-bit codepages, but there's still plenty
      left. the size of the legacy_chars table is actually reduced now by
      eliding the first 256 entries and considering them to map implicitly
      via the identity map.
      
      before these changes, there seem to have been minor bugs/omissions in
      codepage table generation, so it's likely that some actual bug fixes
      are silently included in this commit. round-trip testing of a few
      codepages was performed on the new version of the code, but no
      differential testing against the old version was done.
  11. 11 Nov, 2017 6 commits
    • reformat cjk iconv tables to be diff-friendly, match tool output · d060edf6
      Rich Felker committed
      the new version of the code used to generate these tables forces a
      newline every 256 entries, whereas at the time these files were
      originally generated and committed, it only wrapped them at 80
      columns. the new behavior ensures that localized changes to the
      tables, if they are ever needed, will produce localized diffs. other
      tables including hkscs were already committed in the new format.
      
      binary comparison of the generated object files was performed to
      confirm that no spurious changes slipped in.
    • prevent fork's errno from being clobbered by atfork handlers · c21051e9
      Bobby Bingham committed
      If the syscall fails, errno must be set correctly for the caller.
      There's no guarantee that the handlers registered with pthread_atfork
      won't clobber errno, so we need to ensure it gets set after they are
      called.
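
The ordering issue can be sketched as follows, assuming a raw syscall result that encodes failure as a negated errno value. The names `finish_fork` and `noisy_handler` are hypothetical and stand in for the library internals.

```c
#include <errno.h>

/* Example handler that clobbers errno, as a registered atfork
 * handler is allowed to do. */
void noisy_handler(void)
{
	errno = EINVAL;
}

/* Run the handlers first, then translate the raw result, so that the
 * errno value seen by the caller of fork reflects the syscall. */
long finish_fork(long raw, void (*run_handlers)(void))
{
	run_handlers();
	if (raw < 0) {
		errno = (int)-raw;
		return -1;
	}
	return raw;
}
```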
    • add iso-2022-jp support (decoding only) to iconv · a39f20bf
      Rich Felker committed
      this implementation aims to match the baseline defined by rfc1468 (the
      original mime charset definition) plus the halfwidth katakana
      extension included in the whatwg definition of the charset. rejection
      of si/so controls and newlines in doublebyte state is not currently
      enforced. the jis x 0201 mode is currently interpreted as having the
      yen sign and overline character in place of backslash and tilde; ascii
      mode has the standard ascii characters in those slots.
    • add iconv framework for decoding stateful encodings · 5b546faa
      Rich Felker committed
      assuming pointers obtained from malloc have some nonzero alignment,
      repurpose the low bit of iconv_t as an indicator that the descriptor
      is a stateless value representing the source and destination character
      encodings.
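
A sketch of the tagging trick, with an assumed bit layout: the low bit marks a stateless descriptor and the two charset indices are packed into the bits above it. The type `conv_t`, the 15-bit field widths, and all function names here are hypothetical; only the use of the low bit comes from the text.

```c
#include <stdint.h>

typedef void *conv_t;   /* stand-in for iconv_t */

/* Pack two small charset indices into a pointer-sized value whose low
 * bit is set. An aligned malloc result always has that bit clear. */
conv_t combine_to_from(unsigned to, unsigned from)
{
	return (conv_t)((uintptr_t)to << 16 | (uintptr_t)from << 1 | 1);
}

/* Nonzero if cd is a packed value rather than an allocated object. */
int is_stateless(conv_t cd)
{
	return (int)((uintptr_t)cd & 1);
}

unsigned extract_from(conv_t cd)
{
	return (unsigned)((uintptr_t)cd >> 1 & 0x7fff);
}

unsigned extract_to(conv_t cd)
{
	return (unsigned)((uintptr_t)cd >> 16 & 0x7fff);
}
```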
    • simplify/optimize iconv utf-8 case · 0df5b39a
      Rich Felker committed
      the special case where mbrtowc returns 0 but consumed 1 byte of input
      does not need to be considered, because the short-circuit for low
      bytes already covered that case.
    • handle ascii range individually in each iconv case · 9eb6dd51
      Rich Felker committed
      short-circuiting low bytes before the switch precluded support for
      character encodings that don't coincide with ascii in this range. this
      limitation affected iso-2022 encodings, which use the esc byte to
      introduce a shift sequence, and things like ebcdic.
  12. 10 Nov, 2017 4 commits
    • move iconv_close to its own translation unit · bff59d13
      Rich Felker committed
      this is in preparation to support stateful conversion descriptors,
      which are necessarily allocated and thus must be freed in iconv_close.
      putting it in a separate TU will avoid pulling in free if iconv_close
      is not referenced.
    • refactor iconv conversion descriptor encoding/decoding · 79f49eff
      Rich Felker committed
      this change is made to avoid having assumptions about the encoding
      spread out across the file, and to facilitate future change to a form
      that can accommodate allocated, stateful descriptors when needed.
      
      this commit should not produce any functional changes; with the
      compiler tested the only change to code generation was minor
      reordering of local variables on stack.
    • fix getaddrinfo error code for non-numeric service with AI_NUMERICSERV · 30fdda6c
      A. Wilcox committed
      If AI_NUMERICSERV is specified and a numeric service was not provided,
      POSIX mandates getaddrinfo return EAI_NONAME. EAI_SERVICE is only for
      services that cannot be used on the specified socket type.
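
A usage sketch showing how a caller would observe this. The wrapper name `lookup_numeric_service` is hypothetical; the numeric host and AI_NUMERICHOST are used only so that the call needs no name resolution.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve a service string that is required to be numeric and return
 * the getaddrinfo result code (0 on success). */
int lookup_numeric_service(const char *service)
{
	struct addrinfo hints = {
		.ai_flags = AI_NUMERICSERV | AI_NUMERICHOST,
		.ai_socktype = SOCK_STREAM,
	};
	struct addrinfo *res = NULL;
	int err = getaddrinfo("127.0.0.1", service, &hints, &res);
	if (!err) freeaddrinfo(res);
	return err;
}
```

With the behavior described in this commit, `lookup_numeric_service("http")` yields EAI_NONAME, while `lookup_numeric_service("80")` succeeds.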
    • fix mismatched type of __pthread_tsd_run_dtors weak definition · 67b29947
      Rich Felker committed
      commit a6054e3c changed this function
      not to take an argument, but the weak definition used by timer_create
      was not updated to match.
      
      reported by Pascal Cuoq.
  13. 06 Nov, 2017 1 commit
    • adjust posix_spawn dup2 action behavior to match future requirements · 6fc6ca1a
      Rich Felker committed
      the resolution to Austin Group issue #411 defined new semantics for
      the posix_spawn dup2 file action in the (previously useless) case
      where src and dest fd are equal. future issues will require the dup2
      file action to remove the close-on-exec flag. without this change,
      passing fds to a child with posix_spawn while avoiding fd-leak races
      in a multithreaded parent required a complex dance with temporary fds.
      
      based on patch by Petr Skocik. changes were made to preserve the
      80-column formatting of the function and to remove code that became
      unreachable as a result of the new functionality.
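
The new semantics can be sketched as the step a child would perform for one dup2 file action. The function name `apply_dup2_action` is hypothetical and the sketch uses the public fcntl and dup2 interfaces rather than raw syscalls.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Apply one dup2 file action; returns 0 on success, -1 on error. */
int apply_dup2_action(int srcfd, int fd)
{
	if (srcfd == fd) {
		/* dup2(fd, fd) would change nothing, so clear the
		 * close-on-exec flag explicitly to let the fd be inherited. */
		int flags = fcntl(fd, F_GETFD);
		if (flags < 0) return -1;
		return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
	}
	/* The new descriptor made by dup2 never has FD_CLOEXEC set. */
	return dup2(srcfd, fd) < 0 ? -1 : 0;
}
```

A multithreaded parent can then open every descriptor with O_CLOEXEC and name only the ones it wants inherited in the file actions, without temporary fds.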
  14. 22 Oct, 2017 1 commit
    • fix regression in glob with literal . or .. path component · ec04d122
      Rich Felker committed
      commit 8c4be3e2 was written to
      preclude the GLOB_PERIOD extension from matching these directory
      entries, but also precluded literal matches.
      
      adjust the check that excludes . and .. to check whether the
      GLOB_PERIOD flag is in effect, so that it cannot alter behavior in
      cases governed by the standard, and also don't exclude . or .. in any
      case where normal glob behavior (fnmatch's FNM_PERIOD flag) would have
      included one or both of them (patterns such as ".*").
      
      it's still not clear whether this is the preferred behavior for
      GLOB_PERIOD, but at least it's clear that it can no longer break
      applications which are not relying on quirks of a nonstandard feature.
  15. 20 Oct, 2017 1 commit
    • posix_spawn: use larger stack to cover worst-case in execvpe · 004dc954
      Will Dietz committed
      execvpe stack-allocates a buffer used to hold the full path
      (combination of a PATH entry and the program name)
      while searching through $PATH, so at least
      NAME_MAX+PATH_MAX is needed.
      
      The stack size can be made conditionally smaller
      (the current 1024 appears appropriate)
      should this larger size be burdensome in those situations.
  16. 19 Oct, 2017 1 commit
    • in dns parsing callback, enforce MAXADDRS to preclude overflow · 45ca5d3f
      Rich Felker committed
      MAXADDRS was chosen not to need enforcement, but the logic used to
      compute it assumes the answers received match the RR types of the
      queries. specifically, it assumes that only one reply contains A
      record answers. if the replies to both the A and the AAAA query have
      their answer sections filled with A records, MAXADDRS can be exceeded
      and clobber the stack of the calling function.
      
      this bug was found and reported by Felix Wilhelm.
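
The enforcement amounts to a capacity check in the callback before each store, independent of what the replies were expected to contain. In this sketch the struct layout, the value given to `MAXADDRS`, and the name `store_answer` are illustrative assumptions.

```c
#include <string.h>

#define MAXADDRS 48   /* illustrative capacity for this sketch */

struct addr_slot { int family; unsigned char addr[16]; };
struct parse_ctx { struct addr_slot addrs[MAXADDRS]; int cnt; };

/* Store one address record; returns 0 on success, -1 when rejected. */
int store_answer(struct parse_ctx *ctx, int family, const void *data, size_t len)
{
	/* Never trust the reply contents to stay within the buffer. */
	if (ctx->cnt >= MAXADDRS) return -1;
	if (len > sizeof ctx->addrs[0].addr) return -1;
	ctx->addrs[ctx->cnt].family = family;
	memcpy(ctx->addrs[ctx->cnt].addr, data, len);
	ctx->cnt++;
	return 0;
}
```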
  17. 14 Oct, 2017 3 commits
    • fix incorrect base name offset from nftw when pathname ends in slash(es) · 5b5eb527
      Rich Felker committed
      the rightmost '/' character is not necessarily the delimiter before
      the basename; it could be a spurious trailing character on the
      directory name.
      
      this change does not introduce any normalization of pathnames or
      stripping of trailing slashes, contrary to at least glibc and perhaps
      other implementations; it just prevents their presence from breaking
      things. whether further changes should be made is an open question
      that may depend on conformance and/or application compatibility
      considerations.
      
      based loosely on patch by Joakim Sindholt.
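
One way to compute a base name offset that tolerates trailing slashes is sketched below. The function name `base_offset` is hypothetical, and the sketch leaves the pathname itself unmodified, in line with the note above.

```c
#include <string.h>

/* Offset of the final path component, ignoring trailing slashes. */
size_t base_offset(const char *path)
{
	size_t j = strlen(path);
	/* Step over any trailing '/' characters. */
	while (j > 0 && path[j-1] == '/') j--;
	/* Back up to just after the slash preceding the last component. */
	while (j > 0 && path[j-1] != '/') j--;
	return j;
}
```

For `"a/b/"` this returns 2, the offset of `b`, where taking the position after the rightmost slash would return 4.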
    • fix read-after-free type error in pthread_detach · c1e27367
      Rich Felker committed
      calling __unlock on t->exitlock is not valid because __unlock reads
      the waiters count after making the atomic store that could allow
      pthread_exit to continue and unmap the thread's stack and the object t
      points to. for now, inline the __unlock logic with an unconditional
      futex wake operation so that the waiters count is not needed.
      
      once __lock/__unlock have been made safe for self-synchronized
      destruction, we could switch back to using them.
    • math: rewrite fma with mostly int arithmetics · 90747692
      Szabolcs Nagy committed
      the freebsd fma code failed to raise underflow exception in some
      cases in nearest rounding mode (affects fmal too) e.g.
      
        fma(-0x1p-1000, 0x1.000001p-74, 0x1p-1022)
      
      and the inexact exception may be raised spuriously since the fenv
      is not saved/restored around the exact multiplication algorithm
      (affects x86 fma too).
      
      another issue is that the underflow behaviour when the rounded result
      is the minimal normal number is target dependent, ieee754 allows two
      ways to raise underflow for inexact results: raise if the result before
      rounding is in the subnormal range (e.g. aarch64, arm, powerpc) or if
      the result after rounding with infinite exponent range is in the
      subnormal range (e.g. x86, mips, sh).
      
      to avoid all these issues the algorithm was rewritten with mostly int
      arithmetics and float arithmetics is only used to get correct rounding
      and raise exceptions according to the behaviour of the target without
      any fenv.h dependency. it also unifies x86 and non-x86 fma.
      
      fmaf is not affected; fmal needs to be fixed too.
      
      this algorithm depends on a_clz_64 and it required a few spurious
      instructions to make sure underflow exception is raised in a particular
      corner case. (normally FORCE_EVAL(tiny*tiny) would be used for this,
      but on i386 gcc is broken if the expression is constant
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57245
      and there is no easy portable fix for the macro.)
  18. 13 Oct, 2017 2 commits
    • for executing init array functions, use function type with prototype · b3516058
      Rich Felker committed
      this is for consistency with the way it's done in the dynamic
      linker, avoiding a deprecated C feature (non-prototype function
      types), and improving code generation. GCC unnecessarily uses the
      variadic calling convention (e.g. clearing rax on x86_64) when making
      a call where the argument types are not known for compatibility with
      wrong code which calls variadic functions this way. (C on the other
      hand is clear that such calls have undefined behavior.)
    • fix access by setjmp and longjmp to __hwcap on arm built as thumb2 · e364774d
      Rich Felker committed
      this is a subtle issue with how the assembler/linker work. for the adr
      pseudo-instruction used to find __hwcap, the assembler in thumb mode
      generates a 16-bit thumb add instruction which can only represent
      word-aligned addresses, despite not knowing the alignment of the
      label. if the setjmp function is assigned a non-multiple-of-4 address
      at link time, the load then loads from the wrong address (the last
      instruction rather than the data containing the offset) and ends up
      reading nonsense instead of the value of __hwcap. this in turn causes
      the checks for floating-point/vector register sets (e.g. IWMMX) to
      evaluate incorrectly, crashing when setjmp/longjmp try to save/restore
      those registers.
      
      fix based on bug report by Felix Hädicke.
  19. 07 Sep, 2017 2 commits
    • work around incorrect EPERM from mmap syscall · da438ee1
      Rich Felker committed
      under some conditions, the mmap syscall wrongly fails with EPERM
      instead of ENOMEM when memory is exhausted; this is probably the
      result of the kernel trying to fit the allocation somewhere that
      crosses into the kernel range or below mmap_min_addr. in any case it's
      a conformance bug, so work around it. for now, only handle the case of
      anonymous mappings with no requested address; in other cases EPERM may
      be a legitimate error.
      
      this indirectly fixes the possibility of malloc failing with the wrong
      errno value.
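
The condition for the workaround can be sketched as a pure function over the error code, the requested address, and the mapping flags. The name `fixup_mmap_errno` is hypothetical, and the sketch works on positive errno values rather than raw syscall results.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Translate a spurious EPERM into ENOMEM, but only for anonymous
 * mappings with no requested address and without MAP_FIXED. In all
 * other cases EPERM may be legitimate and is passed through. */
int fixup_mmap_errno(int err, const void *start, int flags)
{
	if (err == EPERM && start == NULL
	    && (flags & MAP_ANONYMOUS) && !(flags & MAP_FIXED))
		return ENOMEM;
	return err;
}
```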
    • fix glob descent into . and .. with GLOB_PERIOD · 8c4be3e2
      Rich Felker committed
      GLOB_PERIOD is a gnu extension, and GNU glob does not seem to honor it
      except in the last path component. it's not clear whether this is a bug
      or intentional, but it seems reasonable that it should exclude the
      special entries . and .. when walking.
      
      changes based on report and analysis by Julien Ramseier.