|
#21
Hi Lothar,

>> I tried:
>> InputStream source = new
>> ByteArrayInputStream(stringFetchedFromDB.getBytes());

> getBytes() uses the system encoding for generating the
> byte array. Why do you generate an InputStream anyway?

It was just a "desperate" attempt... :-) I tried almost everything...

> What you want to do is
> OutputStreamWriter osw = new OutputStreamWriter(output, "8859_1");
> osw.write(resultset.getString("mycolumn"));

That doesn't work...

Hi Silvio,

> Hello Andrea,
> Even if you set a database encoding to ASCII it is very unlikely that
> the DB will strip non-ASCII characters.
> ....

Yes, now this is clear to me, thanks! I was thinking only about the DB encoding, while the problem is mainly in the JVM encoding (now it's clear to me that Java can't handle characters outside the encoding of the JVM; I wasn't thinking about it, sorry...).

I've made another test: I exported the content of the table with the encrypted password and found that the password I can't decrypt contains characters between 0x80 and 0x9F, which are control characters in ISO-8859-1, and Java
- reads garbage with DB2 configured with IBM850 and the JVM with ISO-8859-1
- reads correctly with both DB2 and the JVM configured with ISO-8859-1

I understand the first behavior, but the last point is strange... Java (with some "magic" :-) is doing the proper conversion if the DB is ISO-8859-1, but I can't understand how... I will test it again and let you know if I find something.

Thanks again everyone!

Andrea
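A minimal sketch of Lothar's suggestion, assuming the goal is simply to turn a string fetched from the database into bytes with a named encoding rather than the platform default. The class name, the method name and the sample string are hypothetical stand-ins; the thread does not show the real column contents or the output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not taken from the thread.
class ExplicitCharsetDemo {

    // Writes the text through an OutputStreamWriter with an explicit
    // charset, so the result does not depend on file.encoding.
    static byte[] encodeLatin1(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for resultSet.getString("mycolumn").
        String fromDb = "caf\u00E9";

        byte[] viaWriter = encodeLatin1(fromDb);
        // Same result without a stream: getBytes with an explicit charset.
        byte[] viaGetBytes = fromDb.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(Arrays.equals(viaWriter, viaGetBytes));
    }
}
```

The no-argument `getBytes()` from the original attempt would instead use whatever default charset the JVM was started with, which is why its result can change between machines.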
|
#22
Andrea wrote:
>
> I've made another test: I exported the content of the table with
> the encrypted password and found that the password I can't decrypt
> contains characters between 0x80 and 0x9F, which are control
> characters in ISO-8859-1, and Java
> - reads garbage with DB2 configured with IBM850 and the JVM with ISO-8859-1
> - reads correctly with both DB2 and the JVM configured with ISO-8859-1
>
> I understand the first behavior, but the last point is strange... Java
> (with some "magic" :-) is doing the proper conversion if the DB is
> ISO-8859-1, but I can't understand how... I will test it again and let
> you know if I find something.
>
> Thanks again everyone!
>
> Andrea
>

I'm guessing, but maybe if the database tells the JDBC driver it's ISO-8859-1 *and* your application tells it the same encoding, it won't bother trying to transform anything...

--
Sabine Dinis Blochberger
Op3racional
www.op3racional.eu
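The two cases Andrea reports can be modelled outside JDBC with plain charset conversions. This is only a sketch of the idea, not a description of what the DB2 driver actually does: it assumes the driver decodes the stored bytes using the database encoding and the application then encodes the resulting string with ISO-8859-1. The class and method names are hypothetical, and it assumes the `IBM850` charset is available in the running JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; a model of the conversion, not of the driver.
class RoundTripDemo {

    // Decodes raw bytes with one charset and re-encodes with another.
    static byte[] roundTrip(byte[] raw, Charset decodeAs, Charset encodeAs) {
        return new String(raw, decodeAs).getBytes(encodeAs);
    }

    public static void main(String[] args) {
        // Two bytes from the 0x80..0x9F range mentioned in the thread.
        byte[] raw = {(byte) 0x80, (byte) 0x9F};

        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset ibm850 = Charset.forName("IBM850"); // assumed to be installed

        // Both sides agree on ISO-8859-1: every byte maps to the code
        // point with the same value and back, so nothing changes.
        boolean sameEncoding = Arrays.equals(raw, roundTrip(raw, latin1, latin1));

        // Decoded as IBM850 but encoded as ISO-8859-1: the bytes change.
        boolean mixedEncoding = Arrays.equals(raw, roundTrip(raw, ibm850, latin1));

        System.out.println("ISO-8859-1 on both sides preserved bytes: " + sameEncoding);
        System.out.println("IBM850 then ISO-8859-1 preserved bytes: " + mixedEncoding);
    }
}
```

ISO-8859-1 is special here because in Java each of its 256 byte values decodes to the code point with the same number, so even the control range 0x80 to 0x9F survives a decode and encode pass.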
|
#23
Andrea wrote:
> I was thinking only about the DB encoding, while the problem is mainly
> in the JVM encoding (now it's clear to me that Java can't handle
> characters outside the encoding of the JVM; I wasn't thinking about
> it, sorry...).

"The encoding of the JVM" is UTF-16 with surrogate pairs; every Unicode character is representable in the JVM, including the Euro character. There is no Unicode character that the JVM cannot represent.

--
Lew
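Lew's point can be checked directly. The sketch below is illustrative only, with a hypothetical class name: it shows the Euro sign held as a single UTF-16 code unit and a character outside the Basic Multilingual Plane held as a surrogate pair.

```java
// Hypothetical demo class illustrating Java's internal UTF-16 strings.
class Utf16Demo {

    // Number of Unicode code points, as opposed to UTF-16 code units.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+20AC EURO SIGN fits in one UTF-16 code unit.
        String euro = "\u20AC";

        // U+1D11E MUSICAL SYMBOL G CLEF needs a surrogate pair.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(euro.length());      // 1
        System.out.println(clef.length());      // 2
        System.out.println(codePoints(clef));   // 1
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
    }
}
```

Problems only appear when such a string is converted to bytes with a charset that lacks the character, which is a property of the I/O step rather than of the string itself.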
|
#24
Hi Sabine,

> I'm guessing, but maybe if the database tells the JDBC driver it's
> ISO-8859-1 *and* your application tells it the same encoding, it won't
> bother trying to transform anything...

Yes, that's what I was thinking too... but I tried changing the encoding of the JVM (Cp850, ...) and it keeps on working...

Hi Lew,

>> I was thinking only about the DB encoding, while the problem is mainly
>> in the JVM encoding (now it's clear to me that Java can't handle
>> characters outside the encoding of the JVM; I wasn't thinking about
>> it, sorry...).

> "The encoding of the JVM" is UTF-16 with surrogate pairs; every Unicode
> character is representable in the JVM, including the Euro character. There is
> no Unicode character that the JVM cannot represent.

With "encoding of the JVM" I was referring to the file.encoding property used by the JVM. If the JVM runs with:
- ISO-8859-1, then I can't read or write the EURO character to the DB (it becomes garbage), and ISO-8859-1 doesn't include that character;
- Cp1252, then I can read and write the EURO character to the DB, and Cp1252 includes that character.

Andrea
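The difference between the two settings can be reproduced without a database. This is a sketch under the assumption that some layer encodes the string with the charset named by file.encoding; the thread does not confirm where exactly that happens in the JDBC path. The class and method names are hypothetical, and `windows-1252` (Java's canonical name for Cp1252) is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; models the encoding step only.
class EuroEncodingDemo {

    // True if the charset has a byte sequence for every char in the text.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset cp1252 = Charset.forName("windows-1252"); // assumed installed

        // ISO-8859-1 has no Euro sign; getBytes substitutes '?' (0x3F).
        byte[] asLatin1 = euro.getBytes(latin1);
        // Cp1252 assigns the Euro sign to byte 0x80.
        byte[] asCp1252 = euro.getBytes(cp1252);

        System.out.printf("ISO-8859-1: canEncode=%b byte=0x%02X%n",
                canEncode(euro, latin1), asLatin1[0] & 0xFF);
        System.out.printf("Cp1252:     canEncode=%b byte=0x%02X%n",
                canEncode(euro, cp1252), asCp1252[0] & 0xFF);
    }
}
```

Note that `getBytes` does not throw for an unmappable character; it silently substitutes the replacement byte, which is consistent with the "garbage" Andrea describes.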
|
#25
Andrea wrote:
> Hi Sabine,
>> I'm guessing, but maybe if the database tells the JDBC driver it's
>> ISO-8859-1 *and* your application tells it the same encoding, it won't
>> bother trying to transform anything...
> Yes, that's what I was thinking too... but I tried changing the
> encoding of the JVM (Cp850, ...) and it keeps on working...
>
> Hi Lew,
>>> I was thinking only about the DB encoding, while the problem is mainly
>>> in the JVM encoding (now it's clear to me that Java can't handle
>>> characters outside the encoding of the JVM; I wasn't thinking about
>>> it, sorry...).
>> "The encoding of the JVM" is UTF-16 with surrogate pairs; every Unicode
>> character is representable in the JVM, including the Euro character. There is
>> no Unicode character that the JVM cannot represent.
> With "encoding of the JVM" I was referring to the file.encoding
> property used by the JVM. If the JVM runs with:
> - ISO-8859-1, then I can't read or write the EURO character to the DB (it
> becomes garbage), and ISO-8859-1 doesn't include that character;
> - Cp1252, then I can read and write the EURO character to the DB, and Cp1252
> includes that character.

I understand my confusion now: it stemmed from the phrase "the encoding of the JVM". The JVM itself uses only one encoding internally; it translates to and from other encodings on I/O. So, to make sure I understood you correctly, were you referring to the encoding specified by the I/O call?

Generally, if the encoding you specify for I/O is different from the encoding in your data store, it will cause trouble. This is not limited to Java. Over in the Postgres newsgroups one finds people having trouble with character encoding from all sorts of platforms, mostly stemming from trying to store characters in a column that are not part of the character encoding specified for the DB. If such things don't match, then problems will hatch.

--
Lew
|
#26
Hi Lew,

> I understand my confusion now: it stemmed from the phrase "the encoding of
> the JVM". The JVM itself uses only one encoding internally; it translates to and from
> other encodings on I/O. So, to make sure I understood you correctly, were you
> referring to the encoding specified by the I/O call?

Maybe I don't understand :-(
In my posts I tried to specify the encoding of the DBMS and the JVM encoding (i.e. the system property file.encoding) in the different cases and, as you stated, the JVM performs the necessary translations. In my JDBC calls I don't specify or force any encoding.

> Generally, if the encoding you specify for I/O is different from the encoding
> in your data store, it will cause trouble. This is not limited to Java. Over
> in the Postgres newsgroups one finds people having trouble with character
> encoding from all sorts of platforms, mostly stemming from trying to store
> characters in a column that are not part of the character encoding specified
> for the DB. If such things don't match, then problems will hatch.

I disagree...
For our application we keep the DBMS with a fixed encoding and the application performs the necessary conversions. For instance, for Polish installations we use an ISO-8859-1 database and an application server configured with ISO-8859-2, and we store Polish characters without problems.

Andrea
|
#27
Andrea wrote:
> Hi Lew,
>
>> I understand my confusion now: it stemmed from the phrase "the encoding of
>> the JVM". The JVM itself uses only one encoding internally; it translates to and from
>> other encodings on I/O. So, to make sure I understood you correctly, were you
>> referring to the encoding specified by the I/O call?
> Maybe I don't understand :-(
> In my posts I tried to specify the encoding of the DBMS and the JVM
> encoding (i.e. the system property file.encoding) in the different
> cases and, as you stated, the JVM performs the necessary translations.
> In my JDBC calls I don't specify or force any encoding.
>
>
>> Generally, if the encoding you specify for I/O is different from the encoding
>> in your data store, it will cause trouble. This is not limited to Java. Over
>> in the Postgres newsgroups one finds people having trouble with character
>> encoding from all sorts of platforms, mostly stemming from trying to store
>> characters in a column that are not part of the character encoding specified
>> for the DB. If such things don't match, then problems will hatch.
> I disagree...
> For our application we keep the DBMS with a fixed encoding and the
> application performs the necessary conversions. For instance, for
> Polish installations we use an ISO-8859-1 database and an application
> server configured with ISO-8859-2, and we store Polish characters
> without problems.
>
> Andrea

Hello Andrea,

Using an incomplete database encoding that does not match the application is a dangerous practice that only works by coincidence. It is only because the encodings you mention are 8-bit complete that you can pass 8-bit values in and out untouched, even though they can mean something different inside the DB from how you interpret them in your application.

With few exceptions, it is not a good idea to use a database encoding other than a complete encoding like UTF-8 (to be honest, you should always use this in a DB if you are left the choice).

If the applications are Java based you will have a perfect match that way. Only when you do things like generating emails, plain-text files or Office documents (shivers), where incomplete encodings (code pages) come into play, do you have to take extra measures. Using the platform encoding is then usually a good idea, unless it is a server application, in which case you have to make an educated guess.

Regards,

Silvio
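Silvio's "works by coincidence" argument can be made concrete with one Polish letter. The sketch below is a model under stated assumptions, not a reproduction of Andrea's setup: it assumes the application encodes with ISO-8859-2 and the database labels the same bytes as ISO-8859-1. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; models mismatched 8-bit encodings.
class PassThroughDemo {

    // Encodes text with one charset and decodes the bytes with another.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset latin2 = Charset.forName("ISO-8859-2");

        // U+0142 LATIN SMALL LETTER L WITH STROKE, byte 0xB3 in ISO-8859-2.
        String polish = "\u0142";

        // What an ISO-8859-1 database believes it stored: U+00B3, superscript three.
        String asSeenByDb = reinterpret(polish, latin2, latin1);

        // Reading back through the same mismatch restores the original,
        // which is why the application appears to work.
        String backInApp = reinterpret(asSeenByDb, latin1, latin2);

        System.out.println(asSeenByDb.equals("\u00B3")); // true
        System.out.println(backInApp.equals(polish));    // true

        // With UTF-8 on both sides there is no reinterpretation at all.
        String viaUtf8 = reinterpret(polish, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(viaUtf8.equals(polish));      // true
    }
}
```

The round trip succeeds, but anything that reads the column as genuine ISO-8859-1 (sorting, case conversion, another client) sees a superscript three rather than a Polish letter.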