1. 15 May, 2015 1 commit
    • T
      Fix insufficiently-paranoid GB18030 encoding verifier. · a868931f
      Tom Lane committed
      The previous coding effectively only verified that the second byte of a
      multibyte character was in the expected range; moreover, it wasn't careful
      to make sure that the second byte even exists in the buffer before touching
      it.  The latter seems unlikely to cause any real problems in the field
      (in particular, it could never be a problem with null-terminated input),
      but it's still a bug.
      
      Since GB18030 is not a supported backend encoding, the only thing we'd
      really be doing with GB18030 text is converting it to UTF8 in LocalToUtf,
      which would fail anyway on any invalid character for lack of a match in
      its lookup table.  So the only user-visible consequence of this change
      should be that you'll get "invalid byte sequence for encoding" rather than
      "character has no equivalent" for malformed GB18030 input.  However,
      impending changes to the GB18030 conversion code will require these tighter
      up-front checks to avoid producing bogus results.
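The tighter check described in this commit can be illustrated with a minimal sketch. It assumes the usual GB18030 byte layout (one-byte ASCII; two-byte with a lead of 0x81-0xFE and a trail of 0x40-0x7E or 0x80-0xFE; four-byte with lead, 0x30-0x39, 0x81-0xFE, 0x30-0x39). The function name `gb18030_verify_sketch` is hypothetical and this is not the committed code; the point is that every byte is range-checked and no byte is read before confirming it lies inside the buffer.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of a bounds-checked GB18030 verifier.
 * Returns the byte length of the character starting at s, or -1 if the
 * character is malformed or extends past the len bytes available.
 */
int
gb18030_verify_sketch(const unsigned char *s, size_t len)
{
    if (len == 0)
        return -1;
    if (s[0] < 0x80)
        return 1;               /* single-byte ASCII */
    if (s[0] < 0x81 || s[0] > 0xFE)
        return -1;              /* 0x80 and 0xFF are not valid lead bytes */
    if (len < 2)
        return -1;              /* second byte is not in the buffer */

    if (s[1] >= 0x30 && s[1] <= 0x39)
    {
        /* four-byte form: check length first, then bytes 3 and 4 */
        if (len < 4)
            return -1;
        if (s[2] < 0x81 || s[2] > 0xFE)
            return -1;
        if (s[3] < 0x30 || s[3] > 0x39)
            return -1;
        return 4;
    }

    /* two-byte form */
    if ((s[1] >= 0x40 && s[1] <= 0x7E) || (s[1] >= 0x80 && s[1] <= 0xFE))
        return 2;
    return -1;
}
```

A caller would advance through the input by the returned length and report an invalid byte sequence on -1.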
  2. 07 May, 2014 1 commit
    • B
      pgindent run for 9.4 · 0a783200
      Bruce Momjian committed
      This includes removing tabs after periods in C comments, which was
      applied to back branches, so this change should not affect backpatching.
  3. 25 Mar, 2014 1 commit
  4. 19 Jan, 2014 1 commit
    • T
      Make various variables const (read-only). · 0d79c0a8
      Tom Lane committed
      These changes should generally improve correctness/maintainability.
      A nice side benefit is that several kilobytes move from initialized
      data to text segment, allowing them to be shared across processes and
      probably reducing copy-on-write overhead while forking a new backend.
      Unfortunately this doesn't seem to help libpq in the same way (at least
      not when it's compiled with -fpic on x86_64), but we can hope the linker
      at least collects all nominally-const data together even if it's not
      actually part of the text segment.
      
      Also, make pg_encname_tbl[] static in encnames.c, since there seems
      no very good reason for any other code to use it; per a suggestion
      from Wim Lewis, who independently submitted a patch that was mostly
      a subset of this one.
      
      Oskari Saarenmaa, with some editorialization by me
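As a rough illustration of the const change, the sketch below declares a lookup table as `static const`, which lets the toolchain place it in read-only storage and keeps it private to one file. The table contents, ids, and names (`enc_entry_sketch`, `enc_lookup_sketch`) are hypothetical and are not taken from encnames.c. Because the entries contain pointers, a position-independent build generally still needs load-time relocations for them, which may be why the message notes that libpq does not benefit in the same way.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    const char *name;           /* encoding name */
    int         id;             /* hypothetical numeric id */
} enc_entry_sketch;

/* const: never written after load; static: invisible to other files */
static const enc_entry_sketch enc_table_sketch[] = {
    {"EUC_JP", 1},
    {"LATIN1", 2},
    {"UTF8", 3},
};

/* Return the id for name, or -1 if the name is not in the table. */
int
enc_lookup_sketch(const char *name)
{
    size_t      n = sizeof(enc_table_sketch) / sizeof(enc_table_sketch[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(enc_table_sketch[i].name, name) == 0)
            return enc_table_sketch[i].id;
    }
    return -1;
}
```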
  5. 30 May, 2013 1 commit
  6. 17 Dec, 2012 1 commit
  7. 11 Jul, 2012 1 commit
  8. 06 Jul, 2012 1 commit
  9. 05 Jul, 2012 1 commit
    • R
      Add wchar -> mb conversion routines. · 72dd6291
      Robert Haas committed
      This is infrastructure for Alexander Korotkov's work on indexing regular
      expression searches.
      
      Alexander Korotkov, with a bit of further hackery on the MULE conversion
      by me
  10. 04 Jul, 2012 1 commit
    • T
      Improve documentation about MULE encoding. · 09022de1
      Tom Lane committed
      This commit improves the comments in pg_wchar.h and creates #define symbols
      for some formerly hard-coded values.  No substantive code changes.
      
      Tatsuo Ishii and Tom Lane
  11. 11 Jun, 2012 1 commit
  12. 24 Apr, 2012 1 commit
  13. 31 Oct, 2011 1 commit
    • T
      Further improvement of make_greater_string. · eb5834d5
      Tom Lane committed
      Make sure that it considers all the possibilities that the old code did,
      instead of trying only one possibility per character position.  To keep the
      runtime in bounds, instead tweak the character incrementers to not try
      every possible multibyte character code.  Remove unnecessary logic to
      restore the old character value on failure.  Additional comment and
      formatting cleanup.
  14. 30 Oct, 2011 1 commit
    • R
      Improve make_greater_string() with encoding-specific incrementers. · 78d523b6
      Robert Haas committed
      This infrastructure doesn't in any way guarantee that the character
      we produce will sort before the one we incremented; but it does at least
      make it much more likely that we'll end up with something that is a valid
      character, which improves our chances.
      
      Kyotaro Horiguchi, with various adjustments by me.
  15. 06 Sep, 2011 1 commit
    • P
      Improve "invalid byte sequence for encoding" message · a2a5ce68
      Peter Eisentraut committed
      It used to say
      
      ERROR:  invalid byte sequence for encoding "UTF8": 0xdb24
      
      Change this to
      
      ERROR:  invalid byte sequence for encoding "UTF8": 0xdb 0x24
      
      to make it clear that this is a byte sequence and not a code point.
      
      Also fix the adjacent "character has no equivalent" message that has
      the same issue.
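The new message layout prints each byte separately. A minimal sketch of such formatting follows; the helper name `format_byte_sequence_sketch` is hypothetical and the real error-reporting code is not shown in this history.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write bytes as space-separated hex, e.g. "0xdb 0x24", into buf.
 * An item that does not fit in the remaining space is dropped whole.
 */
void
format_byte_sequence_sketch(const unsigned char *bytes, size_t len,
                            char *buf, size_t buflen)
{
    size_t      used = 0;

    if (buflen == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < len; i++)
    {
        int         n = snprintf(buf + used, buflen - used, "%s0x%02x",
                                 i > 0 ? " " : "", (unsigned int) bytes[i]);

        if (n < 0 || (size_t) n >= buflen - used)
        {
            buf[used] = '\0';   /* drop the partially written item */
            break;
        }
        used += (size_t) n;
    }
}
```

Printing bytes one at a time avoids the ambiguity of `0xdb24`, which a reader could mistake for a single code point.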
  16. 21 Sep, 2010 1 commit
  17. 19 Aug, 2010 1 commit
    • T
      Rename utf2ucs() to utf8_to_unicode(), and export it so it can be used · 2d8314bd
      Tom Lane committed
      elsewhere.
      
      Similarly rename the version in mbprint.c, not because this affects anything
      but just to keep the two copies in exact sync.  There was some discussion of
      having only one copy in src/port/ instead, but this function is so small
      and unlikely to change that that seems like overkill.
      
      Slightly editorialized version of a patch by Joseph Adams.  (The bug-fix
      aspect of his patch was applied separately, and back-patched.)
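For readers unfamiliar with what a UTF-8 to Unicode conversion of this kind does, here is a small sketch. It assumes the input has already been validated and holds a complete sequence of at most four bytes; the name `utf8_decode_sketch` and the `0xFFFFFFFF` error value are choices made for this illustration, not a copy of the function the commit renames.

```c
#include <stdint.h>

/*
 * Decode one UTF-8 sequence (1 to 4 bytes) into a code point.
 * Assumes c points at a complete, previously validated sequence.
 * Returns 0xFFFFFFFF if the first byte is not a valid lead byte.
 */
uint32_t
utf8_decode_sketch(const unsigned char *c)
{
    if ((c[0] & 0x80) == 0)
        return c[0];
    if ((c[0] & 0xE0) == 0xC0)
        return ((uint32_t) (c[0] & 0x1F) << 6) |
            (uint32_t) (c[1] & 0x3F);
    if ((c[0] & 0xF0) == 0xE0)
        return ((uint32_t) (c[0] & 0x0F) << 12) |
            ((uint32_t) (c[1] & 0x3F) << 6) |
            (uint32_t) (c[2] & 0x3F);
    if ((c[0] & 0xF8) == 0xF0)
        return ((uint32_t) (c[0] & 0x07) << 18) |
            ((uint32_t) (c[1] & 0x3F) << 12) |
            ((uint32_t) (c[2] & 0x3F) << 6) |
            (uint32_t) (c[3] & 0x3F);
    return 0xFFFFFFFF;          /* continuation byte or unsupported lead */
}
```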
  18. 05 Jan, 2010 1 commit
  19. 11 Jun, 2009 1 commit
  20. 03 Mar, 2009 1 commit
    • T
      When we are in error recursion trouble, arrange to suppress translation and · fd9e2acc
      Tom Lane committed
      encoding conversion of any elog/ereport message being sent to the frontend.
      This generalizes a patch that I put in last October, which suppressed
      translation of only specific messages known to be associated with recursive
      can't-translate-the-message behavior.  As shown in bug #4680, we need a more
      general answer in order to have some hope of coping with broken encoding
      conversion setups.  This approach seems a good deal less klugy anyway.
      
      Patch in all supported branches.
  21. 11 Feb, 2009 2 commits
  22. 30 Jan, 2009 1 commit
    • T
      Replace argument-checking Asserts with regular test-and-elog checks in all · 0d65eea3
      Tom Lane committed
      encoding conversion functions.  These are not can't-happen cases because
      it's possible to create a conversion with the wrong conversion function
      for the specified encoding pair.  That would lead to an Assert crash in
      an Assert-enabled build, or incorrect conversion otherwise, neither of
      which is desirable.  This would be a DOS issue if production databases
      were customarily built with asserts enabled, but fortunately that's not so.
      Per an observation by Heikki.
      
      Back-patch to all supported branches.
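The difference between an assertion and a regular test-and-report check can be sketched as follows. The encoding ids, the function name `check_conversion_args_sketch`, and the message text are all hypothetical. The actual change reports through the backend's elog facility, which this standalone example replaces with a return code and a message buffer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical encoding ids, for illustration only. */
enum
{
    SK_LATIN1 = 1,
    SK_UTF8 = 2,
    SK_EUC_JP = 3
};

/*
 * Argument check for a routine that only handles LATIN1 -> UTF8.
 * Unlike assert(), this test is still compiled into production builds,
 * so a conversion registered with the wrong function is rejected rather
 * than silently producing wrong output.
 * Returns 0 on success, or -1 after writing a message to errbuf.
 */
int
check_conversion_args_sketch(int src, int dest, char *errbuf, size_t errlen)
{
    if (src != SK_LATIN1 || dest != SK_UTF8)
    {
        snprintf(errbuf, errlen,
                 "unexpected encoding ids %d -> %d for conversion",
                 src, dest);
        return -1;
    }
    return 0;
}
```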
  23. 29 Oct, 2008 1 commit
  24. 28 Oct, 2008 1 commit
    • T
      Install a more robust solution for the problem of infinite error-processing · b0169bb1
      Tom Lane committed
      recursion when we are unable to convert a localized error message to the
      client's encoding.  We've been over this ground before, but as reported by
      Ibrar Ahmed, it still didn't work in the case of conversion failures for
      the conversion-failure message itself :-(.  Fix by installing a "circuit
      breaker" that disables attempts to localize this message once we get into
      recursion trouble.
      
      Patch all supported branches, because it is in fact broken in all of them;
      though I had to add some missing translations to the older branches in
      order to expose the failure in the particular test case I was using.
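The "circuit breaker" idea can be shown with a toy model. Everything here is hypothetical: the depth threshold, the names, and a `localize_sketch` stand-in that always fails. The real backend error path does not return normally the way this sketch does. The sketch only shows that once the nesting depth passes a threshold, localization is skipped and the raw message is sent, so the recursion terminates.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_SAFE_DEPTH_SKETCH 2 /* hypothetical threshold */

int         error_depth_sketch = 0;
char        last_sent_sketch[128];

void        report_error_sketch(const char *msg);

/* Stand-in for a broken conversion setup: always fails and reports it. */
static const char *
localize_sketch(const char *msg)
{
    (void) msg;
    report_error_sketch("conversion failed");
    return NULL;
}

void
report_error_sketch(const char *msg)
{
    const char *out = msg;

    error_depth_sketch++;
    if (error_depth_sketch <= MAX_SAFE_DEPTH_SKETCH)
    {
        /* normal path: try to localize, fall back to the raw text */
        const char *localized = localize_sketch(msg);

        if (localized != NULL)
            out = localized;
    }
    /* otherwise the breaker has tripped: send the message untranslated */

    snprintf(last_sent_sketch, sizeof last_sent_sketch, "%s", out);
    error_depth_sketch--;
}
```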
  25. 16 Nov, 2007 1 commit
  26. 16 Oct, 2007 1 commit
  27. 19 Sep, 2007 1 commit
    • A
      Close previously open holes for invalidly encoded data to enter the · 55613bf9
      Andrew Dunstan committed
      database via builtin functions, as recently discussed on -hackers.
      
      chr() now returns a character in the database encoding. For UTF8 encoded databases
      the argument is treated as a Unicode code point. For other multi-byte encodings
      the argument must designate a strict ascii character, or an error is raised,
      as is also the case if the argument is 0.
      
      ascii() is adjusted so that it remains the inverse of chr().
      
      The two argument form of convert() is gone, and the three argument form now
      takes a bytea first argument and returns a bytea. To cover this loss three new
      functions are introduced:
      . convert_from(bytea, name) returns text - converts the first argument from the
        named encoding to the database encoding
      . convert_to(text, name) returns bytea - converts the first argument from the
        database encoding to the named encoding
      . length(bytea, name) returns int - gives the length of the first argument in
        characters in the named encoding
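For a UTF8 database, treating the chr() argument as a Unicode code point amounts to encoding that code point as UTF-8. The sketch below shows such an encoder. The name `codepoint_to_utf8_sketch` is hypothetical. Rejecting 0 follows the commit text, while rejecting surrogates and values above 0x10FFFF is an assumption of this example rather than something the commit states.

```c
#include <stdint.h>

/*
 * Encode cp as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or -1 if cp is rejected.
 */
int
codepoint_to_utf8_sketch(uint32_t cp, unsigned char out[4])
{
    if (cp == 0 || cp > 0x10FFFF)
        return -1;              /* zero is an error; upper bound assumed */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return -1;              /* surrogates rejected (assumption) */

    if (cp < 0x80)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}
```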
  28. 13 Jul, 2007 1 commit
  29. 15 Apr, 2007 1 commit
  30. 26 Mar, 2007 1 commit
  31. 25 Mar, 2007 1 commit
  32. 25 Jan, 2007 1 commit
    • T
      Get pg_utf_mblen(), pg_utf2wchar_with_len(), and utf2ucs() all on the same · 0887fa11
      Tom Lane committed
      page about the maximum UTF8 sequence length we support (4 bytes since 8.1,
      3 before that).  pg_utf2wchar_with_len never got updated to support 4-byte
      characters at all, and in any case had a buffer-overrun risk in that it
      could produce multiple pg_wchars from what mblen claims to be just one UTF8
      character.  The only reason we don't have a major security hole is that most
      callers allocate worst-case output buffers; the sole exception in released
      versions appears to be pre-8.2 iwchareq() (ie, ILIKE), which can be crashed
      due to zeroing out its return address --- but AFAICS that can't be exploited
      for anything more than a crash, due to inability to control what gets written
      there.  Per report from James Russell and Michael Fuhr.
      
      Pre-8.1 the risk is much less, but I still think pg_utf2wchar_with_len's
      behavior given an incomplete final character risks buffer overrun, so
      back-patch that logic change anyway.
      
      This patch also makes sure that UTF8 sequences exceeding the supported
      length (whichever it is) are consistently treated as error cases, rather
      than being treated like a valid shorter sequence in some places.
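The invariant this commit describes is that the length function and the converter agree on a four-byte maximum, so one input character yields exactly one output value and an incomplete final character is an error. The sketch below illustrates that under the stated four-byte limit. The names `utf8_seq_len_sketch` and `utf8_to_codepoints_sketch` are hypothetical and do not reproduce pg_utf_mblen() or pg_utf2wchar_with_len().

```c
#include <stddef.h>
#include <stdint.h>

/* Length implied by a lead byte, or -1 for anything beyond 4 bytes. */
int
utf8_seq_len_sketch(unsigned char first)
{
    if ((first & 0x80) == 0)
        return 1;
    if ((first & 0xE0) == 0xC0)
        return 2;
    if ((first & 0xF0) == 0xE0)
        return 3;
    if ((first & 0xF8) == 0xF0)
        return 4;
    return -1;                  /* continuation byte or 5/6-byte lead */
}

/*
 * Convert len bytes of UTF-8 into code points, one per input character.
 * Returns the number of code points written, or -1 on malformed input,
 * an incomplete final character, or a full output buffer.
 */
long
utf8_to_codepoints_sketch(const unsigned char *src, size_t len,
                          uint32_t *dst, size_t dstcap)
{
    size_t      n = 0;

    while (len > 0)
    {
        int         l = utf8_seq_len_sketch(src[0]);
        uint32_t    cp;

        if (l < 0 || (size_t) l > len)
            return -1;          /* bad lead byte or truncated character */
        if (n >= dstcap)
            return -1;          /* never write past the output buffer */

        if (l == 1)
            cp = src[0];
        else
        {
            cp = src[0] & (0xFF >> (l + 1));    /* payload bits of lead */
            for (int i = 1; i < l; i++)
            {
                if ((src[i] & 0xC0) != 0x80)
                    return -1;  /* not a continuation byte */
                cp = (cp << 6) | (uint32_t) (src[i] & 0x3F);
            }
        }
        dst[n++] = cp;
        src += l;
        len -= (size_t) l;
    }
    return (long) n;
}
```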
  33. 04 Oct, 2006 1 commit
  34. 22 Aug, 2006 2 commits
  35. 22 May, 2006 1 commit
    • T
      Change the backend to reject strings containing invalidly-encoded multibyte · c61a2f58
      Tom Lane committed
      characters in all cases.  Formerly we mostly just threw warnings for invalid
      input, and failed to detect it at all if no encoding conversion was required.
      The tighter check is needed to defend against SQL-injection attacks as per
      CVE-2006-2313 (further details will be published after release).  Embedded
      zero (null) bytes will be rejected as well.  The checks are applied during
      input to the backend (receipt from client or COPY IN), so it no longer seems
      necessary to check in textin() and related routines; any string arriving at
      those functions will already have been validated.  Conversion failure
      reporting (for characters with no equivalent in the destination encoding)
      has been cleaned up and made consistent while at it.
      
      Also, fix a few longstanding errors in little-used encoding conversion
      routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic,
      mic_to_euc_tw were all broken to varying extents.
      
      Patches by Tatsuo Ishii and Tom Lane.  Thanks to Akio Ishida and Yasuo Ohgaki
      for identifying the security issues.
  36. 19 Feb, 2006 1 commit
    • P
      Add support for Windows codepages 1253, 1254, 1255, and 1257 and clean · 1b658473
      Peter Eisentraut committed
      up a bunch of the support utilities.
      
      In src/backend/utils/mb/Unicode remove nearly duplicate copies of the
      UCS_to_XXX perl script and replace with one version to handle all generic
      files.  Update the Makefile so that it knows about all the map files.
      This produces a slight difference in some of the map files, using a
      uniform naming convention and not mapping the null character.
      
      In src/backend/utils/mb/conversion_procs create a master utf8<->win
      codepage function like the ISO 8859 versions instead of having a separate
      handler for each conversion.
      
      There is an externally visible change in the name of the win1258 to utf8
      conversion.  According to the documentation notes, it was named
      incorrectly and this changes it to a standard name.
      
      Running the Unicode mapping perl scripts has shown some additional mapping
      changes in koi8r and iso8859-7.
  37. 10 Feb, 2006 1 commit
    • B
      Allow psql multi-line column values to align in the proper columns · c01999a5
      Bruce Momjian committed
        If the second output column value is 'a\nb', the 'b' should appear
        in the second display column, rather than the first column as it
        does now.
      
      Change libpq's PQdsplen() to return more useful values.
      
      > Note: this changes the PQdsplen function, it can now return zero or
      > minus one which was not possible before. It doesn't appear anyone is
      > actually using the functions other than psql but it is a change. The
      > functions are not actually documented anywhere so it's not like we're
      > breaking a defined interface. The new semantics follow the Unicode
      > standard.
      
      BACKWARD COMPATIBLE CHANGE.
      
      The only user-visible change I saw in the regression tests is that a
      SELECT * on a table where all the columns have been dropped doesn't
      return a blank line like before.  This seems like a step forward.
      
      Martijn van Oosterhout
  38. 27 Dec, 2005 1 commit