1. 07 Dec, 2017 1 commit
    • W
      implement the fopencookie extension to stdio · 06184334
      William Pitcock committed
      notes added by maintainer:
      
      this function is a GNU extension. it was chosen over the similar BSD
      function funopen because the latter depends on fpos_t being an
      arithmetic type as part of its public API, conflicting with our
      definition of fpos_t and with the intent that it be an opaque type. it
      was accepted for inclusion because, despite not being widely used, it
      is usually very difficult to extricate software using it from the
      dependency on it.
      
      calling pattern for the read and write callbacks is not likely to
      match glibc or other implementations, but should work with any
      reasonable callbacks. in particular the read function is never called
      without at least one byte being needed to satisfy its caller, so that
      spurious blocking is not introduced.
      
      contracts for what callbacks called from inside libc/stdio can do are
      always complicated, and at some point still need to be specified
      explicitly. at the very least, the callbacks must return or block
      indefinitely (they cannot perform nonlocal exits) and they should not
      make calls to stdio using their own FILE as an argument.
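
A minimal sketch of how a caller might use fopencookie, assuming the GNU interface (`cookie_io_functions_t` and `_GNU_SOURCE`). The `struct membuf` type and the `membuf_*` names are hypothetical and exist only for illustration. The write callback follows the constraints described above: it returns normally and makes no stdio calls on its own FILE.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical cookie: a fixed-size in-memory sink. */
struct membuf {
    char data[64];
    size_t len;
};

/* Write callback: append to the buffer and report how many bytes were
 * accepted. It always returns and never touches the FILE it serves. */
static ssize_t membuf_write(void *cookie, const char *buf, size_t size)
{
    struct membuf *m = cookie;
    size_t room = sizeof m->data - m->len;
    if (size > room) size = room;
    memcpy(m->data + m->len, buf, size);
    m->len += size;
    return size;
}

/* Open a write-only stream backed by m. The read, seek and close
 * callbacks are left as null pointers because this sink needs none. */
static FILE *membuf_open(struct membuf *m)
{
    cookie_io_functions_t io = { .write = membuf_write };
    m->len = 0;
    return fopencookie(m, "w", io);
}
```

Data written with fputs or fprintf reaches `membuf_write` when the stream is flushed or closed, so the buffer contents should only be inspected after fclose or fflush.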
  2. 21 Nov, 2017 2 commits
    • R
      make fgetwc handling of encoding errors consistent with/without buffer · 4000b010
      Rich Felker committed
      previously, fgetwc left all but the first byte of an illegal sequence
      unread (available for subsequent calls) when reading out of the FILE
      buffer, but dropped all bytes contributing to the error when falling
      back to reading a byte at a time. neither behavior was ideal. in the
      buffered case, each malformed character produced one error per byte,
      rather than one per character. in the unbuffered case, consuming the
      last byte that caused the transition from "incomplete" to "invalid"
      state potentially dropped (and produced additional spurious encoding
      errors for) the next valid character.
      
      to handle both cases uniformly without duplicate code, revise the
      buffered case to only cover situations where a complete and valid
      character is present in the buffer, and fall back to byte-at-a-time
      for all other cases. this allows using mbtowc (stateless) instead of
      mbrtowc, which may slightly improve performance too.
      
      when an encoding error has been hit in the byte-at-a-time case, leave
      the final byte that produced the error unread (via ungetc) except in
      the case of single-byte errors (for UTF-8, bytes c0, c1, f5-ff, and
      continuation bytes with no lead byte). single-byte errors are fully
      consumed so as not to leave the caller in an infinite loop repeating
      the same error.
      
      none of these changes are distinguished from a conformance standpoint,
      since the file position is unspecified after encoding errors. they are
      intended merely as QoI/consistency improvements.
    • R
      fix treatment by fgetws of encoding errors as eof · a90d9da1
      Rich Felker committed
      fgetwc does not set the stream's error indicator on encoding errors,
      making ferror insufficient to distinguish between error and eof
      conditions. feof is also insufficient, since it will return true if
      the file ended with a partial character encoding error.
      
      whether fgetwc should be setting the error indicator itself is a
      question with conflicting answers. the POSIX text for the function
      states it as a requirement, but the ISO C text seems to require that
      it not. this may be revisited in the future based on the outcome of
      Austin Group issue #1170.
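
As a caller-side illustration of the problem described above, the sketch below classifies a WEOF result from fgetwc by checking errno for EILSEQ before consulting ferror. This is an assumption about how a caller could tell the cases apart, not the code in this commit; `read_wchar` and the `wread_status` enum are hypothetical names.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

enum wread_status {
    WREAD_OK,
    WREAD_EOF,
    WREAD_ENCODING_ERROR,
    WREAD_IO_ERROR
};

/* Read one wide character and report why reading stopped, if it did.
 * errno is cleared first so that a stale EILSEQ is not misread. */
static enum wread_status read_wchar(FILE *f, wint_t *out)
{
    errno = 0;
    wint_t c = fgetwc(f);
    if (c != WEOF) {
        *out = c;
        return WREAD_OK;
    }
    if (errno == EILSEQ)
        return WREAD_ENCODING_ERROR; /* ferror/feof alone cannot show this */
    if (ferror(f))
        return WREAD_IO_ERROR;
    return WREAD_EOF;
}
```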
  3. 19 Nov, 2017 1 commit
  4. 15 Nov, 2017 1 commit
    • R
      add reverse iconv mappings for JIS-based encodings · a223dbd2
      Rich Felker committed
      these encodings are still commonly used in messaging protocols and
      such. the reverse mapping is implemented as a binary search of a list
      of the jis 0208 characters in unicode order; the existing forward
      table is used to perform the comparison in the search.
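
The lookup described above can be sketched as follows. The tables hold toy data rather than the real jis 0208 mapping, and the names `fwd`, `rev` and `reverse_lookup` are hypothetical. The point is that the reverse list stores only legacy codes, ordered by their unicode value, and each comparison goes through the forward table.

```c
#include <stddef.h>

/* Toy forward table: legacy code -> unicode (illustrative values only). */
static const unsigned short fwd[] = { 0x3042, 0x00A5, 0x4E00, 0x203E, 0x30A2 };

/* Legacy codes listed in increasing order of fwd[code]. */
static const unsigned char rev[] = { 1, 3, 0, 4, 2 };

/* Binary search over rev, comparing via the forward table.
 * Returns the legacy code for unicode value c, or -1 if unmapped. */
static int reverse_lookup(unsigned c)
{
    size_t lo = 0, hi = sizeof rev / sizeof rev[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        unsigned u = fwd[rev[mid]];
        if (u == c) return rev[mid];
        if (u < c) lo = mid + 1;
        else hi = mid;
    }
    return -1;
}
```

Reusing the forward table for comparisons means the reverse direction costs one small index per character instead of a second full table of unicode values.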
  5. 14 Nov, 2017 2 commits
    • R
      generalize iconv framework for 8-bit codepages · 105eff9d
      Rich Felker committed
      previously, 8-bit codepages could only remap the high 128 bytes; the
      low range was assumed/forced to agree with ascii. interpretation of
      codepage table headers has been changed so that it's possible to
      represent mappings for up to 256 slots (fewer if the initial portion
      of the map is elided because it coincides with unicode codepoints).
      this requires consuming a bit more of the 10-bit space of characters
      that can be represented in 8-bit codepages, but there's still plenty
      left. the size of the legacy_chars table is actually reduced now by
      eliding the first 256 entries and considering them to map implicitly
      via the identity map.
      
      before these changes, there seem to have been minor bugs/omissions in
      codepage table generation, so it's likely that some actual bug fixes
      are silently included in this commit. round-trip testing of a few
      codepages was performed on the new version of the code, but no
      differential testing against the old version was done.
    • R
      fix malloc state corruption when ldso rejects loading a second libc · a71b46cf
      Rich Felker committed
      commit c49d3c8a added logic to detect
      attempts to load libc.so via another name and instead redirect to the
      existing libc, rather than loading two and producing dangerously
      inconsistent state. however, the check for and unmapping of the
      duplicate libc happened after reclaim_gaps was already called,
      donating the slack space around the writable segment to malloc.
      subsequent unmapping of the library then invalidated malloc's free
      lists.
      
      fix the issue by moving the call to reclaim_gaps out of map_library
      into load_library, after the duplicate libc check but before the first
      call to calloc, so that the gaps can still be used to satisfy the
      allocation of struct dso. this change also eliminates the need for an
      ugly hack (temporarily setting runtime=1) to avoid reclaim_gaps when
      loading the main program via map_library, which happens when ldso is
      invoked as a command.
      
      only programs/libraries erroneously containing a DT_NEEDED reference
      to libc.so via an absolute pathname or symlink were affected by this
      issue.
  6. 11 Nov, 2017 6 commits
    • R
      reformat cjk iconv tables to be diff-friendly, match tool output · d060edf6
      Rich Felker committed
      the new version of the code used to generate these tables forces a
      newline every 256 entries, whereas at the time these files were
      originally generated and committed, it only wrapped them at 80
      columns. the new behavior ensures that localized changes to the
      tables, if they are ever needed, will produce localized diffs. other
      tables including hkscs were already committed in the new format.
      
      binary comparison of the generated object files was performed to
      confirm that no spurious changes slipped in.
    • B
      prevent fork's errno from being clobbered by atfork handlers · c21051e9
      Bobby Bingham committed
      If the syscall fails, errno must be set correctly for the caller.
      There's no guarantee that the handlers registered with pthread_atfork
      won't clobber errno, so we need to ensure it gets set after they are
      called.
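
The ordering issue can be illustrated with a small stand-in, which is not the musl implementation. It assumes a kernel-style return value (a negative error code on failure); `fake_syscall`, `noisy_handler` and `fork_like` are hypothetical names. errno is assigned only after the handler has run, so whatever the handler does to errno cannot leak to the caller.

```c
#include <errno.h>

/* Hypothetical handler that clobbers errno as a side effect. */
static void noisy_handler(void)
{
    errno = EINVAL;
}

/* Stand-in for a failing syscall: returns a negated error code. */
static int fake_syscall(void)
{
    return -EAGAIN;
}

/* Run the syscall, then the handler, and only then translate the raw
 * result into the errno convention expected by the caller. */
static int fork_like(int (*sys)(void), void (*handler)(void))
{
    int ret = sys();
    handler();              /* may modify errno */
    if (ret < 0) {
        errno = -ret;       /* set last, so the handler cannot overwrite it */
        return -1;
    }
    return ret;
}
```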
    • R
      add iso-2022-jp support (decoding only) to iconv · a39f20bf
      Rich Felker committed
      this implementation aims to match the baseline defined by rfc1468 (the
      original mime charset definition) plus the halfwidth katakana
      extension included in the whatwg definition of the charset. rejection
      of si/so controls and newlines in doublebyte state is not currently
      enforced. the jis x 0201 mode is currently interpreted as having the
      yen sign and overline character in place of backslash and tilde; ascii
      mode has the standard ascii characters in those slots.
    • R
      add iconv framework for decoding stateful encodings · 5b546faa
      Rich Felker committed
      assuming pointers obtained from malloc have some nonzero alignment,
      repurpose the low bit of iconv_t as an indicator that the descriptor
      is a stateless value representing the source and destination character
      encodings.
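
A rough sketch of the low-bit tagging idea, assuming malloc results are at least 2-byte aligned. The bit layout here (source index in the high bits, destination index in bits 1 to 15) and the `cd_*` names are assumptions for illustration, not the layout used by this commit.

```c
#include <stdint.h>
#include <stdlib.h>

typedef void *cd_t; /* stand-in for iconv_t */

/* Pack two small encoding indices into a pointer-sized value whose low
 * bit is set. Both indices must fit in 15 bits in this sketch. */
static cd_t cd_stateless(unsigned from, unsigned to)
{
    uintptr_t v = ((uintptr_t)from << 16) | ((uintptr_t)to << 1) | 1;
    return (cd_t)v;
}

/* A set low bit marks a stateless descriptor; a pointer obtained from
 * malloc has a clear low bit and refers to allocated state. */
static int cd_is_stateless(cd_t cd)
{
    return (int)((uintptr_t)cd & 1);
}

static unsigned cd_from(cd_t cd)
{
    return (unsigned)((uintptr_t)cd >> 16);
}

static unsigned cd_to(cd_t cd)
{
    return (unsigned)(((uintptr_t)cd >> 1) & 0x7fff);
}
```

With this scheme a stateless conversion needs no allocation at all, and a close function only has to free descriptors whose low bit is clear.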
    • R
      simplify/optimize iconv utf-8 case · 0df5b39a
      Rich Felker committed
      the special case where mbrtowc returns 0 but consumed 1 byte of input
      does not need to be considered, because the short-circuit for low
      bytes already covered that case.
    • R
      handle ascii range individually in each iconv case · 9eb6dd51
      Rich Felker committed
      short-circuiting low bytes before the switch precluded support for
      character encodings that don't coincide with ascii in this range. this
      limitation affected iso-2022 encodings, which use the esc byte to
      introduce a shift sequence, and things like ebcdic.
  7. 10 Nov, 2017 4 commits
    • R
      move iconv_close to its own translation unit · bff59d13
      Rich Felker committed
      this is in preparation to support stateful conversion descriptors,
      which are necessarily allocated and thus must be freed in iconv_close.
      putting it in a separate TU will avoid pulling in free if iconv_close
      is not referenced.
    • R
      refactor iconv conversion descriptor encoding/decoding · 79f49eff
      Rich Felker committed
      this change is made to avoid having assumptions about the encoding
      spread out across the file, and to facilitate future change to a form
      that can accommodate allocated, stateful descriptors when needed.
      
      this commit should not produce any functional changes; with the
      compiler tested the only change to code generation was minor
      reordering of local variables on stack.
    • A
      fix getaddrinfo error code for non-numeric service with AI_NUMERICSERV · 30fdda6c
      A. Wilcox committed
      If AI_NUMERICSERV is specified and a numeric service was not provided,
      POSIX mandates getaddrinfo return EAI_NONAME. EAI_SERVICE is only for
      services that cannot be used on the specified socket type.
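
A short sketch of how a caller would observe this behavior. `numeric_service_check` is a hypothetical helper; it uses a numeric host together with AI_NUMERICHOST so that no name resolution is attempted, and returns the getaddrinfo result unchanged.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve 127.0.0.1 with the given service string under AI_NUMERICSERV.
 * Returns 0 on success or an EAI_* error code. */
static int numeric_service_check(const char *service)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

    int rc = getaddrinfo("127.0.0.1", service, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}
```

With this fix in place, `numeric_service_check("http")` is expected to return EAI_NONAME, while a numeric string such as "80" should succeed.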
    • R
      fix mismatched type of __pthread_tsd_run_dtors weak definition · 67b29947
      Rich Felker committed
      commit a6054e3c changed this function
      not to take an argument, but the weak definition used by timer_create
      was not updated to match.
      
      reported by Pascal Cuoq.
  8. 06 Nov, 2017 23 commits