| Register | FAQ | Calendar | Search | Today's Posts | Mark Forums Read |
|
#11
| |||
| |||
| Andrea wrote: > Hi Roedy, > the database (DB2) has this configuration: > ... > Database territory = US > Database code page = 850 > Database code set = IBM-850 > ... > > I've exported to a file the content of a table with a CHAR(N) field > containing the EURO currency character, then I've opened the file with > EncodingRecognizer: if I choose IBM850 I see a strange character (like > a small X), if I choose ISO-8859-1 I see a square. > > I tried a translation with: > > String problematicString = rs.getString(index); > problematicString = new String(problematicString, "IBM850"); // Am I > correct? > > but I still get garbage :-( > > > Thanks, > Andrea Those are quite interesting database configuration parameters. Sounds like a pre-Unicode setup to me... Silvio Bierman |
|
#12
| |||
| |||
| Andrea wrote: > Hi Roedy, > the database (DB2) has this configuration: > ... > Database territory = US > Database code page = 850 > Database code set = IBM-850 > ... > > I've exported to a file the content of a table with a CHAR(N) field > containing the EURO currency character, then I've opened the file with > EncodingRecognizer: if I choose IBM850 I see a strange character (like > a small X), if I choose ISO-8859-1 I see a square. > > I tried a translation with: > > String problematicString = rs.getString(index); > problematicString = new String(problematicString, "IBM850"); // Am I > correct? > > but I still get garbage :-( > > > Thanks, > Andrea You should look at the numeric byte values in problematicString. That could give you an idea of what you are dealing with although it will only disclose what your JDBC driver has made of it. It might have already done an incorrect interpretation of a byte sequence. Things could be even worse, the data could already have been mutilated during insertion in the database when some program (possibly the same program + JDBC driver?) put the data in. If the database encoding does not support all characters that where in the original data then that is what most likely happened. Really, as I said in my previous post you should consider (a) going to a different database that supports Unicode (b) refrain from using/supporting non ASCII characters in your application or (c) do what others have suggested and do your own translation from Unicode -> ASCII -> DB -> ASCII -> Unicode. The latter option is only realistic if you have wrapped all your JDBC code in some generic wrappers (which is usually a good idea) so you can handle this locally. Good luck, Silvio |
|
#13
| |||
| |||
| Hi Silvio, the settings are taken from the DB2 instance of a customer (and I can't change them). The very same code works, of course, without problems with a DB2 instance configured with ISO-8859-1. The problem arises also when a C program stores to DB a string with non-IBM850 valid characters: another C program can read the string without problems while Java can't; so the string is not corrupted when saved to DB but someone (JDBC driver? Java I/O?) looses something when I read the field with Resultset.getString(int index) and I can't convert it correctly (or I haven't found the right way to do it yet, if it exists...). BTW: a test has been made on DB2 with a table with a field declared CHAR(n) FOR BIT DATA and Java code works without problems reading and writing non-IBM850 characters. Having read your feedback (THANK YOU EVERYONE!) I would say that there's no way to read back those characters in Java in my application. Thanks again, Andrea |
|
#14
| |||
| |||
| Hi Silvio, the settings are taken from the DB2 instance of a customer (and I can't change them). The very same code works, of course, without problems with a DB2 instance configured with ISO-8859-1. The problem arises also when a C program stores to DB a string with non-IBM850 valid characters: another C program can read the string without problems while Java can't; so the string is not corrupted when saved to DB but someone (JDBC driver? Java I/O?) looses something when I read the field with Resultset.getString(int index) and I can't convert it correctly (or I haven't found the right way to do it yet, if it exists...). A test has been made on DB2 with a table with a field declared CHAR(n) FOR BIT DATA and Java code works without problems reading and writing non-IBM850 characters. Having read your feedback (THANK YOU EVERYONE!) I would say that there's no way to read back those characters in Java in my application. Thanks again, Andrea |
|
#15
| |||
| |||
| Hello Andrea, Andrea wrote: > Hi Silvio, > the settings are taken from the DB2 instance of a customer (and I > can't change them). The very same code works, of course, without > problems with a DB2 instance configured with ISO-8859-1. > I already expected you could not change this but it was worth suggesting... I do not really understand why a Euro sign would work with 8859-1 since that does not contain that character as far as I am aware of. > The problem arises also when a C program stores to DB a string with > non-IBM850 valid characters: another C program can read the string > without problems while Java can't; so the string is not corrupted when > saved to DB but someone (JDBC driver? Java I/O?) looses something when > I read the field with Resultset.getString(int index) and I can't > convert it correctly (or I haven't found the right way to do it yet, > if it exists...). > It will probably depend on how you access the DB from C (which I have never done with DB2) but it does not sound surprising that the C binding would just pass through byte sequences. As long as you stay inside 8-bit character encoding and always interpret them in the same codepage then this usually works (which is why many programmers are totally unaware of character encoding issues). Since Java programs handle strings as sequences of characters from the Unicode character set all interfaces with external character storage needs to be done encoding aware. This means that when doing input Java String objects are created from byte sequences ALWAYS assuming an encoding, if not explicitly specified defaulting to the platform default. The other way around String objects are converted to byte sequences when doing output as well ALWAYS using an encoding, again defaulting to the platform default. > BTW: a test has been made on DB2 with a table with a field declared > CHAR(n) FOR BIT DATA and Java code works without problems reading and > writing non-IBM850 characters. > That makes sense. What I said before means that if you do not specify different encodings the same default will be used for input and output. If the storage medium leaves the intermediate bytes alone (which any binary database column type would do, just like a binary file would) then output will again match input, again allowing encoding unaware programmers to get away with it. > Having read your feedback (THANK YOU EVERYONE!) I would say that > there's no way to read back those characters in Java in my > application. > The problem is probably that the default encoding in Java turns your Unicode character into something the database is unwilling to store as is in this encoding and therefore mutilates trying to fix something. When queried the database then returns a byte sequence different from what was put in. After that modification the JDBC driver has no chance of restoring the original value. > Thanks again, > Andrea Best regards, Silvio |
|
#16
| |||
| |||
| On Wed, 13 Feb 2008 06:22:22 -0800 (PST), Andrea <tol7481@iperbole.bologna.it> wrote, quoted or indirectly quoted someone who said : >Hi Silvio, >the settings are taken from the DB2 instance of a customer (and I >can't change them). The very same code works, of course, without >problems with a DB2 instance configured with ISO-8859-1. Is is possible to ask the database driver to do the conversions for you? Perhaps internally it is Unicode or some other encoding that can deal with Euros. We have the clue that C++ programs seem to store euro s and get them back out. I am puzzled. I thought JDBC always talked to you in Unicode. -- Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com |
|
#17
| |||
| |||
| On Wed, 13 Feb 2008 16:39:59 +0100, Silvio Bierman <sbierman@jambo-software.com> wrote, quoted or indirectly quoted someone who said : >I do not really understand why a Euro sign would work with 8859-1 since >that does not contain that character as far as I am aware of. You could do an experiment. Try feeding your database all possible unicode chars in a set of 1-char records, and see which ones come back unmangled. This is a kludge, but you could preconvert your Euro to one of those invariant unused chars. -- Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com |
|
#18
| |||
| |||
| Hi everyone, sorry for my previous double-post (a mistake). >Is is possible to ask the database driver to do the conversions for >you? Perhaps internally it is Unicode or some other encoding that can >deal with Euros. I've checked the properties of the JDBC driver I use (http:// publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/ com.ibm.db2.udb.doc/ad/rjvdsprp.htm) but there's nothing concerning encoding conversions. >We have the clue that C++ programs seem to store euro s and get them back out. Yes we have C and COBOL programs that can store and write non-IBM850 chars without problems too. As pointed out by Sabine in her post the reason may be that C programs work with the pure sequences of bytes, without performing any encoding conversion. >>I do not really understand why a Euro sign would work with 8859-1 since >>that does not contain that character as far as I am aware of. SORRY SORRY SORRY SORRY SORRY I tried to insert (through JDBC) the EURO character in a DB2 configured with .... Database territory = C Database code page = 819 Database code set = ISO8859-1 .... and I can't neither write nor read in Java the EURO character correctly :-( A COBOL program works instead correctly. Then I tried the same thing on a SQL-Server 2000 instance with collation compatibility_51_409_30003 (correponding to a 1252 codepage, i.e. Latin 1) and I can store and read the EURO character via Java&JDBC. That doesn't work in Java with Oracle 10g configured with .... NLS_LANGUAGE = AMERICAN NLS_TERRITORY = AMERICA NLS_CHARACTERSET = US7ASCII NLS_LENGTH_SEMANTICS = BYTE .... store&read through COBOL is ok, and in Java I can even write&read accented vowels... even if those characters are outside USASCII7... >You could do an experiment. Try feeding your database all possible >unicode chars in a set of 1-char records, and see which ones come back >unmangled. This is a kludge, but you could preconvert your Euro to >one of those invariant unused chars. The EURO character is just an example and part of the problem, I can't use this type of kludges. The specific problem is much more complex: a password is crypted and stored to DB with a C program but the crypted chars fall outside IBM850 range and in Java I'm unable to read and decrypt back the string... this works if the database is ISO-8859-1 (that's why I though I were able to write another 'weird' char, the euro char, on an ISO-8859-1 DB, sorry...). I've also the more general problem of data entry: I don't know wich characters users will insert so I can't substitute chars. I've found a workaround for my crypting problem but I'm just trying to understand the reason of the problem. Now it's clear to me that with a CHAR field Java performs an encoding conversion using the encodings of the JVM and of the DBMS: if some characters fall outside the destination encoding then they are lost (i.e. converted in something completely different). The only 'mysterious' thing for me now is the behavior on Oracle (JDBC can read&write accented vowels even if they are outside ascii7)... any idea? Maybe the Oracle driver is smarter than the DB2 Universal Driver... Thanks everyone, Andrea |
|
#19
| |||
| |||
| Andrea wrote: >> What is source? How do you create that from the JDBC- >> resultset? > > I tried: > InputStream source = new > ByteArrayInputStream(stringFetchedFromDB.getBytes( )); getBytes() uses the system-encoding for generating the byte-array. Why do you generate an InputStream anyway? What you want to do is OutputStreamWriter osw = new OutputStreamWriter( output, "8859_1"); osw.write(resultset.getString("mycolumn")); BTW: The Euro is not part of ISO-8859-1, so it will get lost that way anyway. Regards, Lothar -- Lothar Kimmeringer E-Mail: spamfang@kimmeringer.de PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81) Always remember: The answer is forty-two, there can only be wrong questions! |
|
#20
| |||
| |||
| Andrea wrote: > Hi everyone, > sorry for my previous double-post (a mistake). > >> Is is possible to ask the database driver to do the conversions for >> you? Perhaps internally it is Unicode or some other encoding that can >> deal with Euros. > I've checked the properties of the JDBC driver I use (http:// > publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/ > com.ibm.db2.udb.doc/ad/rjvdsprp.htm) but there's nothing concerning > encoding conversions. > >> We have the clue that C++ programs seem to store euro s and get them back out. > Yes we have C and COBOL programs that can store and write non-IBM850 > chars without problems too. > As pointed out by Sabine in her post the reason may be that C programs > work with the pure sequences of bytes, without performing any encoding > conversion. > >>> I do not really understand why a Euro sign would work with 8859-1 since >>> that does not contain that character as far as I am aware of. > > SORRY SORRY SORRY SORRY SORRY > I tried to insert (through JDBC) the EURO character in a DB2 > configured with > ... > Database territory = C > Database code page = 819 > Database code set = ISO8859-1 > ... > and I can't neither write nor read in Java the EURO character > correctly :-( > A COBOL program works instead correctly. > > Then I tried the same thing on a SQL-Server 2000 instance with > collation compatibility_51_409_30003 (correponding to a 1252 codepage, > i.e. Latin 1) and I can store and read the EURO character via > Java&JDBC. > > That doesn't work in Java with Oracle 10g configured with > ... > NLS_LANGUAGE = AMERICAN > NLS_TERRITORY = AMERICA > NLS_CHARACTERSET = US7ASCII > NLS_LENGTH_SEMANTICS = BYTE > ... > store&read through COBOL is ok, and in Java I can even write&read > accented vowels... even if those characters are outside USASCII7... > >> You could do an experiment. Try feeding your database all possible >> unicode chars in a set of 1-char records, and see which ones come back >> unmangled. This is a kludge, but you could preconvert your Euro to >> one of those invariant unused chars. > The EURO character is just an example and part of the problem, I can't > use this type of kludges. > The specific problem is much more complex: a password is crypted and > stored to DB with a C program but the crypted chars fall outside > IBM850 range and in Java I'm unable to read and decrypt back the > string... this works if the database is ISO-8859-1 (that's why I > though I were able to write another 'weird' char, the euro char, on an > ISO-8859-1 DB, sorry...). I've also the more general problem of data > entry: I don't know wich characters users will insert so I can't > substitute chars. > I've found a workaround for my crypting problem but I'm just trying to > understand the reason of the problem. > > Now it's clear to me that with a CHAR field Java performs an encoding > conversion using the encodings of the JVM and of the DBMS: if some > characters fall outside the destination encoding then they are lost > (i.e. converted in something completely different). > The only 'mysterious' thing for me now is the behavior on Oracle (JDBC > can read&write accented vowels even if they are outside ascii7)... any > idea? Maybe the Oracle driver is smarter than the DB2 Universal > Driver... > > Thanks everyone, > Andrea Hello Andrea, Even if you set a database encoding to ASCII it is very unlikely that the DB will strip non-ASCII characters. Actually, most databases treat every byte-size (ie 8-bit) encoding almost identically internally. They may sometimes have different default collations but that is about it. The codepage attribute is mostly important for programs interfacing with the DB. As most of those (especially older ones) are encoding unaware also bytes pass in and out inharmed. In the end all 8-bit encodings are equal until actually interpreted to represent characters, aren't they? I have seen application running on cp-1252 platforms using 8859-1 encoded databases for years without anyone noticing. Same for cp-1257 on a cp-1252 database. Nobody realy cares when the same data that was put in comes out again. This is not unlike SMTP which is supposed to be 7-bit only but since the transport encoding passes 8-bit characters freely people are used to sending non-ascii characters in plain-text emails although this is not supported. This all works great until someone from Lithuania sends me an email (I am in the Netherlands). Regards, Silvio |
![]() |
| Thread Tools | |
| Display Modes | |
In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.