Commits · a868931fecdf93f3ceb1c9431bb93757b706269d · Greenplum / Gpdb

15 May, 2015 1 commit

Fix insufficiently-paranoid GB18030 encoding verifier. · a868931f

Authored by Tom Lane on May 15, 2015

The previous coding effectively only verified that the second byte of a
multibyte character was in the expected range; moreover, it wasn't careful
to make sure that the second byte even exists in the buffer before touching
it.  The latter seems unlikely to cause any real problems in the field
(in particular, it could never be a problem with null-terminated input),
but it's still a bug.

Since GB18030 is not a supported backend encoding, the only thing we'd
really be doing with GB18030 text is converting it to UTF8 in LocalToUtf,
which would fail anyway on any invalid character for lack of a match in
its lookup table.  So the only user-visible consequence of this change
should be that you'll get "invalid byte sequence for encoding" rather than
"character has no equivalent" for malformed GB18030 input.  However,
impending changes to the GB18030 conversion code will require these tighter
up-front checks to avoid producing bogus results.

a868931f
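
As an illustration of the tighter up-front checks this commit message describes, here is a minimal sketch in C. It is not the code from the commit: `gb18030_verify_char` is a hypothetical name, and the byte ranges are taken from the GB18030 layout (one byte below 0x80; two bytes with lead 0x81-0xFE and trail 0x40-0x7E or 0x80-0xFE; four bytes with lead 0x81-0xFE, a digit 0x30-0x39, 0x81-0xFE, and another digit). The point is that every byte is range-checked, and that no byte is read before confirming it lies inside the buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only (not the PostgreSQL verifier).
 * Returns the byte length of the GB18030 character starting at s, or -1 if
 * the sequence is malformed or extends past the len bytes available.
 */
static int
gb18030_verify_char(const unsigned char *s, size_t len)
{
    assert(s != NULL);

    if (len == 0)
        return -1;
    if (s[0] < 0x80)
        return 1;               /* single-byte (ASCII) character */
    if (s[0] < 0x81 || s[0] > 0xFE)
        return -1;              /* 0x80 and 0xFF are not valid lead bytes */
    if (len < 2)
        return -1;              /* second byte must exist before it is read */

    if (s[1] >= 0x30 && s[1] <= 0x39)
    {
        /* four-byte form: lead, digit, 0x81-0xFE, digit */
        if (len < 4)
            return -1;
        if (s[2] < 0x81 || s[2] > 0xFE)
            return -1;
        if (s[3] < 0x30 || s[3] > 0x39)
            return -1;
        return 4;
    }

    /* two-byte form: trail byte is 0x40-0x7E or 0x80-0xFE */
    if ((s[1] >= 0x40 && s[1] <= 0x7E) || (s[1] >= 0x80 && s[1] <= 0xFE))
        return 2;

    return -1;
}
```

A caller would walk the input by repeatedly calling the helper on the remaining bytes and stop with an "invalid byte sequence" style error on the first -1.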

07 May, 2014 1 commit

pgindent run for 9.4 · 0a783200

Authored by Bruce Momjian on May 06, 2014

This includes removing tabs after periods in C comments, which was
applied to back branches, so this change should not affect backpatching.

0a783200

25 March, 2014 1 commit

Remove wchar.c Asserts that were stricter than the main code · 5db55c6b

Authored by Bruce Momjian on March 24, 2014

Assert errors were thrown for functions being passed invalid encodings,
while the main code handled it just fine.

Also document that libpq's PQclientEncoding() returns -1 for an encoding
lookup failure.

Per report from Peter Geoghegan

5db55c6b

19 January, 2014 1 commit

Make various variables const (read-only). · 0d79c0a8

Authored by Tom Lane on January 18, 2014

These changes should generally improve correctness/maintainability.
A nice side benefit is that several kilobytes move from initialized
data to text segment, allowing them to be shared across processes and
probably reducing copy-on-write overhead while forking a new backend.
Unfortunately this doesn't seem to help libpq in the same way (at least
not when it's compiled with -fpic on x86_64), but we can hope the linker
at least collects all nominally-const data together even if it's not
actually part of the text segment.

Also, make pg_encname_tbl[] static in encnames.c, since there seems
no very good reason for any other code to use it; per a suggestion
from Wim Lewis, who independently submitted a patch that was mostly
a subset of this one.

Oskari Saarenmaa, with some editorialization by me

0d79c0a8

30 May, 2013 1 commit

pgindent run for release 9.3 · 9af4159f

Authored by Bruce Momjian on May 29, 2013

This is the first run of the Perl-based pgindent script.  Also update
pgindent instructions.

9af4159f

17 December, 2012 1 commit

Tidy up from frontend Assert change. · 3717f083

Authored by Andrew Dunstan on December 16, 2012

Quiet compiler warnings noted by Peter Eisentraut.

3717f083

11 July, 2012 1 commit

Fix ASCII case in pg_wchar2mule_with_len. · 60e9c224

Authored by Tom Lane on July 10, 2012

Also some cosmetic improvements for wchar-to-mblen patch.

60e9c224

06 July, 2012 1 commit

Fix failure of new wchar->mb functions to advance from pointer. · f6a05fd9

Authored by Robert Haas on July 05, 2012

Bug spotted by Tom Lane.

f6a05fd9

05 July, 2012 1 commit

Add wchar -> mb conversion routines. · 72dd6291

Authored by Robert Haas on July 04, 2012

This is infrastructure for Alexander Korotkov's work on indexing regular
expression searches.

Alexander Korotkov, with a bit of further hackery on the MULE conversion
by me

72dd6291

04 July, 2012 1 commit

Improve documentation about MULE encoding. · 09022de1

Authored by Tom Lane on July 04, 2012

This commit improves the comments in pg_wchar.h and creates #define symbols
for some formerly hard-coded values.  No substantive code changes.

Tatsuo Ishii and Tom Lane

09022de1

11 June, 2012 1 commit

Run pgindent on 9.2 source tree in preparation for first 9.3 · 927d61ee

Authored by Bruce Momjian on June 10, 2012

commit-fest.

927d61ee

24 April, 2012 1 commit

Lots of doc corrections. · 5d4b60f2

Authored by Robert Haas on April 23, 2012

Josh Kupershmidt

5d4b60f2

31 October, 2011 1 commit

Further improvement of make_greater_string. · eb5834d5

Authored by Tom Lane on October 30, 2011

Make sure that it considers all the possibilities that the old code did,
instead of trying only one possibility per character position. To keep the
runtime in bounds, instead tweak the character incrementers to not try
every possible multibyte character code. Remove unnecessary logic to
restore the old character value on failure. Additional comment and
formatting cleanup.

eb5834d5

30 October, 2011 1 commit

Improve make_greater_string() with encoding-specific incrementers. · 78d523b6

Authored by Robert Haas on October 29, 2011

This infrastructure doesn't in any way guarantee that the character
we produce will sort after the one we incremented; but it does at least
make it much more likely that we'll end up with something that is a valid
character, which improves our chances.

Kyotaro Horiguchi, with various adjustments by me.

78d523b6

06 September, 2011 1 commit

Improve "invalid byte sequence for encoding" message · a2a5ce68

Authored by Peter Eisentraut on September 05, 2011

It used to say

ERROR:  invalid byte sequence for encoding "UTF8": 0xdb24

Change this to

ERROR:  invalid byte sequence for encoding "UTF8": 0xdb 0x24

to make it clear that this is a byte sequence and not a code point.

Also fix the adjacent "character has no equivalent" message that has
the same issue.

a2a5ce68
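
The message format change above can be sketched as follows. This is a hypothetical formatter written for illustration, not the code from the commit: `format_byte_sequence` and its parameters are assumed names. It prints each offending byte as its own `0xNN` token, so the output reads as a byte sequence and cannot be mistaken for a single code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper for illustration only. Writes each byte as "0xNN",
 * separated by single spaces, into buf. Returns the number of characters
 * written (excluding the terminating NUL), or -1 if buf is too small.
 */
static int
format_byte_sequence(const unsigned char *bytes, size_t nbytes,
                     char *buf, size_t buflen)
{
    size_t      used = 0;
    size_t      i;

    assert(bytes != NULL && buf != NULL);

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    for (i = 0; i < nbytes; i++)
    {
        /* snprintf reports the length it wanted, so truncation is detectable */
        int         n = snprintf(buf + used, buflen - used, "%s0x%02x",
                                 (i > 0) ? " " : "", (unsigned int) bytes[i]);

        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }
    return (int) used;
}
```

Called with the two bytes 0xdb and 0x24, the helper fills the buffer with `0xdb 0x24`, matching the new message shown in the commit.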

21 September, 2010 1 commit

Remove cvs keywords from all files. · 9f2e2113

Authored by Magnus Hagander on September 20, 2010

9f2e2113

19 August, 2010 1 commit

Rename utf2ucs() to utf8_to_unicode(), and export it so it can be used · 2d8314bd

Authored by Tom Lane on August 18, 2010

elsewhere.

Similarly rename the version in mbprint.c, not because this affects anything
but just to keep the two copies in exact sync.  There was some discussion of
having only one copy in src/port/ instead, but this function is so small
and unlikely to change that that seems like overkill.

Slightly editorialized version of a patch by Joseph Adams.  (The bug-fix
aspect of his patch was applied separately, and back-patched.)

2d8314bd

05 January, 2010 1 commit

Remove sometimes inaccurate error hint about source of wrongly encoded data. · fc09fb7b

Authored by Andrew Dunstan on January 04, 2010

fc09fb7b

11 June, 2009 1 commit

8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list · d7471402

Authored by Bruce Momjian on June 11, 2009

provided by Andrew.

d7471402

03 March, 2009 1 commit

When we are in error recursion trouble, arrange to suppress translation and · fd9e2acc

Authored by Tom Lane on March 02, 2009

encoding conversion of any elog/ereport message being sent to the frontend.
This generalizes a patch that I put in last October, which suppressed
translation of only specific messages known to be associated with recursive
can't-translate-the-message behavior. As shown in bug #4680, we need a more
general answer in order to have some hope of coping with broken encoding
conversion setups. This approach seems a good deal less klugy anyway.

Patch in all supported branches.

fd9e2acc

11 February, 2009 2 commits

Support for KOI8U encoding · 8b9dd6b5

Authored by Peter Eisentraut on February 10, 2009

8b9dd6b5

Remove the encoding *numbers* from the comments. They are useless, and · 1cb54c28

Authored by Peter Eisentraut on February 10, 2009

make maintenance harder.

1cb54c28

30 January, 2009 1 commit

Replace argument-checking Asserts with regular test-and-elog checks in all · 0d65eea3

Authored by Tom Lane on January 29, 2009

encoding conversion functions. These are not can't-happen cases because
it's possible to create a conversion with the wrong conversion function
for the specified encoding pair. That would lead to an Assert crash in
an Assert-enabled build, or incorrect conversion otherwise, neither of
which is desirable. This would be a DOS issue if production databases
were customarily built with asserts enabled, but fortunately that's not so.
Per an observation by Heikki.

Back-patch to all supported branches.

0d65eea3

29 October, 2008 1 commit

Unicode escapes in strings and identifiers · 06735e32

Authored by Peter Eisentraut on October 29, 2008

06735e32

28 October, 2008 1 commit

Install a more robust solution for the problem of infinite error-processing · b0169bb1

Authored by Tom Lane on October 27, 2008

recursion when we are unable to convert a localized error message to the
client's encoding. We've been over this ground before, but as reported by
Ibrar Ahmed, it still didn't work in the case of conversion failures for
the conversion-failure message itself :-(. Fix by installing a "circuit
breaker" that disables attempts to localize this message once we get into
recursion trouble.

Patch all supported branches, because it is in fact broken in all of them;
though I had to add some missing translations to the older branches in
order to expose the failure in the particular test case I was using.

b0169bb1

16 November, 2007 1 commit

pgindent run for 8.3. · fdf5a5ef

Authored by Bruce Momjian on November 15, 2007

fdf5a5ef

16 October, 2007 1 commit

Fix pg_wchar_table[] to match revised ordering of the encoding ID enum. · febd60bf

Authored by Tom Lane on October 15, 2007

Add some comments so hopefully the next poor sod doesn't fall into the
same trap.  (Wrong comments are worse than none at all...)

febd60bf

19 September, 2007 1 commit

Close previously open holes for invalidly encoded data to enter the · 55613bf9

Authored by Andrew Dunstan on September 18, 2007

database via builtin functions, as recently discussed on -hackers.

chr() now returns a character in the database encoding. For UTF8 encoded databases
the argument is treated as a Unicode code point. For other multi-byte encodings
the argument must designate a strict ascii character, or an error is raised,
as is also the case if the argument is 0.

ascii() is adjusted so that it remains the inverse of chr().

The two argument form of convert() is gone, and the three argument form now
takes a bytea first argument and returns a bytea. To cover this loss three new
functions are introduced:
. convert_from(bytea, name) returns text - converts the first argument from the
  named encoding to the database encoding
. convert_to(text, name) returns bytea - converts the first argument from the
  database encoding to the named encoding
. length(bytea, name) returns int - gives the length of the first argument in
  characters in the named encoding

55613bf9

13 July, 2007 1 commit

Suppress an integer-overflow warning. · 4dbbef28

Authored by Tom Lane on July 12, 2007

4dbbef28

15 April, 2007 1 commit

Make JOHAB client only encoding per discussions in pgsql-hackers · 6041b922

Authored by Tatsuo Ishii on April 15, 2007

"Server-side support of all encodings" around 2007/3/26.
initdb required.

6041b922

26 March, 2007 1 commit

Fix pg_wchar_table's maxmblen field of EUC_CN, EUC_TW, MULE_INTERNAL · a6fbd2f1

Authored by Tatsuo Ishii on March 26, 2007

and GB18030. patches from ITAGAKI Takahiro.

a6fbd2f1

25 March, 2007 1 commit

Add new encoding EUC_JIS_2004 and SHIFT_JIS_2004, · 75c6519f

Authored by Tatsuo Ishii on March 25, 2007

along with new conversions among EUC_JIS_2004, SHIFT_JIS_2004 and UTF-8.
The catalog version has been bumped.

75c6519f

25 January, 2007 1 commit

Get pg_utf_mblen(), pg_utf2wchar_with_len(), and utf2ucs() all on the same · 0887fa11

Authored by Tom Lane on January 24, 2007

page about the maximum UTF8 sequence length we support (4 bytes since 8.1,
3 before that). pg_utf2wchar_with_len never got updated to support 4-byte
characters at all, and in any case had a buffer-overrun risk in that it
could produce multiple pg_wchars from what mblen claims to be just one UTF8
character. The only reason we don't have a major security hole is that most
callers allocate worst-case output buffers; the sole exception in released
versions appears to be pre-8.2 iwchareq() (ie, ILIKE), which can be crashed
due to zeroing out its return address --- but AFAICS that can't be exploited
for anything more than a crash, due to inability to control what gets written
there. Per report from James Russell and Michael Fuhr.

Pre-8.1 the risk is much less, but I still think pg_utf2wchar_with_len's
behavior given an incomplete final character risks buffer overrun, so
back-patch that logic change anyway.

This patch also makes sure that UTF8 sequences exceeding the supported
length (whichever it is) are consistently treated as error cases, rather
than being treated like a valid shorter sequence in some places.

0887fa11
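
The two properties this commit message asks for (one UTF8 character yields exactly one output value, and sequences that are too long or cut off are errors) can be sketched as below. This is an illustrative sketch, not PostgreSQL's implementation: `utf8_seq_len` and `utf8_decode_one` are hypothetical names, and the sketch deliberately omits checks for overlong forms and surrogate code points.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: sequence length implied by the first byte, or -1 for
 * a continuation byte or a lead byte announcing more than 4 bytes.
 */
static int
utf8_seq_len(unsigned char first)
{
    if ((first & 0x80) == 0x00)
        return 1;
    if ((first & 0xE0) == 0xC0)
        return 2;
    if ((first & 0xF0) == 0xE0)
        return 3;
    if ((first & 0xF8) == 0xF0)
        return 4;
    return -1;
}

/*
 * Hypothetical helper: decode exactly one character from s (len bytes
 * available) into *cp. Returns the number of bytes consumed, or -1 if the
 * lead byte is unsupported, the character is truncated, or a continuation
 * byte is malformed.
 */
static int
utf8_decode_one(const unsigned char *s, size_t len, unsigned int *cp)
{
    int             n;
    int             i;
    unsigned int    value;

    assert(s != NULL && cp != NULL);

    if (len == 0)
        return -1;
    n = utf8_seq_len(s[0]);
    if (n < 0 || (size_t) n > len)
        return -1;              /* too long, or incomplete final character */
    if (n == 1)
    {
        *cp = s[0];
        return 1;
    }

    /* keep only the payload bits of the lead byte: 5, 4 or 3 of them */
    value = s[0] & (0xFFu >> (n + 1));
    for (i = 1; i < n; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
            return -1;          /* not a continuation byte */
        value = (value << 6) | (s[i] & 0x3Fu);
    }
    *cp = value;
    return n;
}
```

Because the length check happens before any continuation byte is read, a truncated final character is rejected instead of being read past the end of the input.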

04 October, 2006 1 commit

pgindent run for 8.2. · f99a569a

Authored by Bruce Momjian on October 04, 2006

f99a569a

22 August, 2006 2 commits

In new "invalid byte sequence" error hint, call it "error", not · a3132359

Authored by Bruce Momjian on August 22, 2006

"failure".

a3132359

Add hint for "invalid byte sequence for encoding" error message, · e11cab65

Authored by Bruce Momjian on August 22, 2006

suggesting review of client_encoding.

e11cab65

22 May, 2006 1 commit

Change the backend to reject strings containing invalidly-encoded multibyte · c61a2f58

Authored by Tom Lane on May 21, 2006

characters in all cases. Formerly we mostly just threw warnings for invalid
input, and failed to detect it at all if no encoding conversion was required.
The tighter check is needed to defend against SQL-injection attacks as per
CVE-2006-2313 (further details will be published after release). Embedded
zero (null) bytes will be rejected as well. The checks are applied during
input to the backend (receipt from client or COPY IN), so it no longer seems
necessary to check in textin() and related routines; any string arriving at
those functions will already have been validated. Conversion failure
reporting (for characters with no equivalent in the destination encoding)
has been cleaned up and made consistent while at it.

Also, fix a few longstanding errors in little-used encoding conversion
routines: win1251_to_iso, win866_to_iso, euc_tw_to_big5, euc_tw_to_mic,
mic_to_euc_tw were all broken to varying extents.

Patches by Tatsuo Ishii and Tom Lane. Thanks to Akio Ishida and Yasuo Ohgaki
for identifying the security issues.

c61a2f58

19 February, 2006 1 commit

Add support for Windows codepages 1253, 1254, 1255, and 1257 and clean · 1b658473

Authored by Peter Eisentraut on February 18, 2006

up a bunch of the support utilities.

In src/backend/utils/mb/Unicode remove nearly duplicate copies of the
UCS_to_XXX perl script and replace with one version to handle all generic
files.  Update the Makefile so that it knows about all the map files.
This produces a slight difference in some of the map files, using a
uniform naming convention and not mapping the null character.

In src/backend/utils/mb/conversion_procs create a master utf8<->win
codepage function like the ISO 8859 versions instead of having a separate
handler for each conversion.

There is an externally visible change in the name of the win1258 to utf8
conversion.  According to the documentation notes, it was named
incorrectly and this changes it to a standard name.

Running the Unicode mapping perl scripts has shown some additional mapping
changes in koi8r and iso8859-7.

1b658473

10 February, 2006 1 commit

Allow psql multi-line column values to align in the proper columns · c01999a5

Authored by Bruce Momjian on February 10, 2006

  If the second output column value is 'a\nb', the 'b' should appear
  in the second display column, rather than the first column as it
  does now.

Change libpq's PQdsplen() to return more useful values.

> Note: this changes the PQdsplen function, it can now return zero or
> minus one which was not possible before. It doesn't appear anyone is
> actually using the functions other than psql but it is a change. The
> functions are not actually documented anywhere so it's not like we're
> breaking a defined interface. The new semantics follow the Unicode
> standard.

BACKWARD COMPATIBLE CHANGE.

The only user-visible change I saw in the regression tests is that a
SELECT * on a table where all the columns have been dropped doesn't
return a blank line like before.  This seems like a step forward.

Martijn van Oosterhout

c01999a5

27 December, 2005 1 commit

More uses of IS_HIGHBIT_SET() macro. · a2384d00

Authored by Bruce Momjian on December 26, 2005

a2384d00