Encoding conversion problem

  #21  
Old 02-15-2008, 06:26 AM
Andrea
Guest
 
Default Re: Encoding conversion problem

Hi Lothar,
>> I tried:
>> InputStream source = new
>> ByteArrayInputStream(stringFetchedFromDB.getBytes( ));

>getBytes() uses the system-encoding for generating the
>byte-array. Why do you generate an InputStream anyway?

It was just a "desperate" attempt... :-)
I tried almost everything...

>What you want to do is
>OutputStreamWriter osw = new OutputStreamWriter(output, "8859_1");
>osw.write(resultset.getString("mycolumn"));

doesn't work...
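
For reference, a minimal sketch of what the quoted suggestion does in isolation: an OutputStreamWriter created with an explicit charset encodes the String independently of the platform default. The class name, the helper name and the sample value are made up for illustration, and a ByteArrayOutputStream stands in for the real output. It does not by itself explain why the approach failed in this case.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetWrite {
    // Encodes text with an explicitly named charset, so the result does not
    // depend on the platform default (the file.encoding system property).
    static byte[] encodeLatin1(String text) {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        try (Writer osw = new OutputStreamWriter(output, StandardCharsets.ISO_8859_1)) {
            osw.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return output.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for resultset.getString("mycolumn").
        String fromDb = "caf\u00E9";
        for (byte b : encodeLatin1(fromDb)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

By contrast, getBytes() with no argument encodes with the platform default charset, so its result can change from one machine to another.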

Hi Silvio,
>Hello Andrea,
>Even if you set a database encoding to ASCII it is very unlikely that
>the DB will strip non-ASCII characters.
> ....

Yes now this is clear to me, thanks!

I was thinking only about the DB encoding while the problem is mainly
in the JVM encoding (now it's clear to me that Java can't handle
characters outside the encoding of the JVM, I wasn't thinking about
it, sorry...).

I ran another test: I exported the content of the table with
the encrypted password and found that the password I can't decrypt
back contains characters between 0x80 and 0x9F, which are control
characters in ISO-8859-1 and Java
- reads garbage with DB2 configured with IBM850 and JVM ISO-8859-1
- reads correctly with both DB2 and JVM configured with ISO-8859-1

I understand the first behavior but the last point is strange... Java
(with some "magic" :-) is doing the proper conversion if the db is
iso-8859-1 but I can't understand how... I will test it again and let
you know if I find something.
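
A small sketch of how Java itself treats the bytes 0x80 to 0x9F, which may help when testing this. It assumes the runtime ships the windows-1252 charset (true for common JDKs, but not guaranteed by the specification); the class and method names are invented for the example, and it says nothing about what the DB2 driver does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class C1RangeDemo {
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    // True if decoding and then re-encoding yields the original bytes.
    static boolean roundTrips(byte[] raw, Charset cs) {
        return Arrays.equals(raw, decode(raw, cs).getBytes(cs));
    }

    // All 32 byte values in the range 0x80..0x9F.
    static byte[] c1Bytes() {
        byte[] raw = new byte[0x20];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) (0x80 + i);
        }
        return raw;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252"); // assumed to be installed
        byte[] one = { (byte) 0x80 };
        // ISO-8859-1 maps byte 0x80 to the control character U+0080.
        System.out.printf("ISO-8859-1:   U+%04X%n",
                (int) decode(one, StandardCharsets.ISO_8859_1).charAt(0));
        // windows-1252 maps the same byte to the Euro sign U+20AC.
        System.out.printf("windows-1252: U+%04X%n", (int) decode(one, cp1252).charAt(0));
        // ISO-8859-1 maps every byte to the code point of the same value,
        // so arbitrary binary data survives a decode/encode round trip.
        System.out.println(roundTrips(c1Bytes(), StandardCharsets.ISO_8859_1));
    }
}
```

Because ISO-8859-1 is a one-to-one mapping for all 256 byte values, encrypted bytes pass through unchanged as long as both sides use it; any charset in between that assigns these bytes differently can alter them.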

Thanks again everyone!

Andrea



  #22  
Old 02-15-2008, 07:02 AM
Sabine Dinis Blochberger
Guest
 
Default Re: Encoding conversion problem

Andrea wrote:

>
> I ran another test: I exported the content of the table with
> the encrypted password and found that the password I can't decrypt
> back contains characters between 0x80 and 0x9F, which are control
> characters in ISO-8859-1 and Java
> - reads garbage with DB2 configured with IBM850 and JVM ISO-8859-1
> - reads correctly with both DB2 and JVM configured with ISO-8859-1
>
> I understand the first behavior but the last point is strange... Java
> (with some "magic" :-) is doing the proper conversion if the db is
> iso-8859-1 but I can't understand how... I will test it again and let
> you know if I find something.
>
> Thanks again everyone!
>
> Andrea
>

I'm guessing, but maybe if the database tells the JDBC driver it's
ISO-8859-1 *and* your application tells it the same encoding, it won't
bother trying to transform anything...

--
Sabine Dinis Blochberger

Op3racional
www.op3racional.eu
  #23  
Old 02-15-2008, 08:10 AM
Lew
Guest
 
Default Re: Encoding conversion problem

Andrea wrote:
> I was thinking only about the DB encoding while the problem is mainly
> in the JVM encoding (now it's clear to me that Java can't handle
> characters outside the encoding of the JVM, I wasn't thinking about
> it, sorry...).


"The encoding of the JVM" is UTF-16 with surrogate pairs; every Unicode
character is representable in the JVM, including the Euro character. There is
no Unicode character that the JVM cannot represent.
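
A short sketch to illustrate the point, using only standard String and Character methods (the class and helper names are invented for the example): the Euro sign fits in one UTF-16 code unit, while a character outside the Basic Multilingual Plane occupies a surrogate pair.

```java
class Utf16Demo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String euro = "\u20AC";                              // EURO SIGN, inside the BMP
        String clef = new String(Character.toChars(0x1D11E)); // MUSICAL SYMBOL G CLEF, outside the BMP

        System.out.println(codeUnits(euro) + " unit(s), " + codePoints(euro) + " code point(s)");
        System.out.println(codeUnits(clef) + " unit(s), " + codePoints(clef) + " code point(s)");
        // The two chars of the clef are a high and a low surrogate.
        System.out.println(Character.isHighSurrogate(clef.charAt(0))
                && Character.isLowSurrogate(clef.charAt(1)));
    }
}
```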

--
Lew
  #24  
Old 02-15-2008, 09:51 AM
Andrea
Guest
 
Default Re: Encoding conversion problem

Hi Sabine,
>I'm guessing, but maybe if the database tells the JDBC driver it's
>ISO-8859-1 *and* your application tells it the same encoding, it won't
>bother trying to transform anything...

Yes that's what I was thinking too... but I tried to change the
encoding of the JVM (tried Cp850, ...) but it keeps on working...

Hi Lew,
>> I was thinking only about the DB encoding while the problem is mainly
>> in the JVM encoding (now it's clear to me that Java can't handle
>> characters outside the encoding of the JVM, I wasn't thinking about
>> it, sorry...).

>"The encoding of the JVM" is UTF-16 with surrogate pairs; every Unicode
>character is representable in the JVM, including the Euro character. There is
>no Unicode character that the JVM cannot represent.

With "encoding of the JVM" I was referring to the file.encoding
property used by the JVM. If the JVM runs with:
- ISO-8859-1 then I can't read or write the EURO character to DB (it
becomes garbage) and ISO-8859-1 doesn't include that character;
- Cp1252 then I can read and write the EURO character to DB and Cp1252
includes that character.
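
The two cases above can be reproduced without a database. This is a sketch under the assumption that the windows-1252 charset (Java's Cp1252) is available in the runtime; the charsets are passed explicitly here, whereas in the scenario described they would come from file.encoding. The class and helper names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EuroEncodingDemo {
    static final String EURO = "\u20AC";

    // Encodes with an explicit charset; getBytes(Charset) silently replaces
    // characters the charset cannot represent instead of throwing.
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252"); // assumed to be installed
        System.out.println("platform default: " + Charset.defaultCharset());
        // ISO-8859-1 has no Euro sign, so the encoder substitutes '?' (0x3F).
        System.out.printf("ISO-8859-1:   %02X%n", encode(EURO, StandardCharsets.ISO_8859_1)[0] & 0xFF);
        // windows-1252 places the Euro sign at byte 0x80.
        System.out.printf("windows-1252: %02X%n", encode(EURO, cp1252)[0] & 0xFF);
    }
}
```

The substitution happens silently, which is why the character turns into garbage rather than causing an error.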

Andrea
  #25  
Old 02-15-2008, 10:00 AM
Lew
Guest
 
Default Re: Encoding conversion problem

Andrea wrote:
> Hi Sabine,
>> I'm guessing, but maybe if the database tells the JDBC driver it's
>> ISO-8859-1 *and* your application tells it the same encoding, it won't
>> bother trying to transform anything...

> Yes that's what I was thinking too... but I tried to change the
> encoding of the JVM (tried Cp850, ...) but it keeps on working...
>
> Hi Lew,
>>> I was thinking only about the DB encoding while the problem is mainly
>>> in the JVM encoding (now it's clear to me that Java can't handle
>>> characters outside the encoding of the JVM, I wasn't thinking about
>>> it, sorry...).

>> "The encoding of the JVM" is UTF-16 with surrogate pairs; every Unicode
>> character is representable in the JVM, including the Euro character. There is
>> no Unicode character that the JVM cannot represent.

> With "encoding of the JVM" I was referring to the file.encoding
> property used by the JVM. If the JVM runs with:
> - ISO-8859-1 then I can't read or write the EURO character to DB (it
> becomes garbage) and ISO-8859-1 doesn't include that character;
> - Cp1252 then I can read and write the EURO character to DB and Cp1252
> includes that character.


I understand my confusion now - it stemmed from the phrase "the encoding of
the JVM". The JVM itself only uses one encoding internally; it translates to and from
other encodings on I/O. So to make sure I understood you correctly, were you
referring to the encoding specified by the I/O call?

Generally if the encoding you specify for I/O is different from the encoding
in your data store, it will cause trouble. This is not limited to Java. Over
in the Postgres newsgroups one finds people have trouble with character
encoding from all sorts of platforms, mostly stemming from trying to store
characters in a column that are not part of the specified character encoding
for the DB. If such things don't match, then problems will hatch.
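
One way to catch such a mismatch on the Java side before the data reaches the store is a strict encoder that reports unmappable characters instead of replacing them. This is only a sketch using the standard java.nio.charset API; the class and method names are invented, and the target charset is assumed to be whatever the column or database is declared to use.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictEncode {
    // Returns true if every character of text can be encoded in the target
    // charset; REPORT makes the encoder throw instead of substituting '?'.
    static boolean fits(String text, Charset target) {
        try {
            target.newEncoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String price = "5 \u20AC"; // contains the Euro sign
        System.out.println(fits(price, StandardCharsets.ISO_8859_1)); // Euro is not in ISO-8859-1
        System.out.println(fits(price, StandardCharsets.UTF_8));      // UTF-8 covers all of Unicode
    }
}
```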

--
Lew
  #26  
Old 02-15-2008, 12:02 PM
Andrea
Guest
 
Default Re: Encoding conversion problem

Hi Lew,

> I understand my confusion now - it stemmed from the phrase "the encoding of
> the JVM". The JVM itself only uses one encoding internally; it translates to and from
> other encodings on I/O. So to make sure I understood you correctly, were you
> referring to the encoding specified by the I/O call?

Maybe I don't understand :-(
In my posts I tried to specify the encoding of the DBMS and the JVM
encoding (i.e. the system property file.encoding) in the different
cases and, as you stated, the JVM performs the necessary translations.
In my JDBC calls I don't specify/force any encoding.


> Generally if the encoding you specify for I/O is different from the encoding
> in your data store, it will cause trouble. This is not limited to Java. Over
> in the Postgres newsgroups one finds people have trouble with character
> encoding from all sorts of platforms, mostly stemming from trying to store
> characters in a column that are not part of the specified character encoding
> for the DB. If such things don't match, then problems will hatch.

I disagree...
For our application we keep the DBMS with a fixed encoding and the
application performs the necessary conversions. For instance for
Polish installations we use an ISO-8859-1 database and an application
server configured with ISO-8859-2 where we store Polish characters
without problems.

Andrea
  #27  
Old 02-15-2008, 12:16 PM
Silvio Bierman
Guest
 
Default Re: Encoding conversion problem

Andrea wrote:
> Hi Lew,
>
>> I understand my confusion now - it stemmed from the phrase "the encoding of
>> the JVM". The JVM itself only uses one encoding internally; it translates to and from
>> other encodings on I/O. So to make sure I understood you correctly, were you
>> referring to the encoding specified by the I/O call?

> Maybe I don't understand :-(
> In my posts I tried to specify the encoding of the DBMS and the JVM
> encoding (i.e. the system property file.encoding) in the different
> cases and, as you stated, the JVM performs the necessary translations.
> In my JDBC calls I don't specify/force any encoding.
>
>
>> Generally if the encoding you specify for I/O is different from the encoding
>> in your data store, it will cause trouble. This is not limited to Java. Over
>> in the Postgres newsgroups one finds people have trouble with character
>> encoding from all sorts of platforms, mostly stemming from trying to store
>> characters in a column that are not part of the specified character encoding
>> for the DB. If such things don't match, then problems will hatch.

> I disagree...
> For our application we keep the DBMS with a fixed encoding and the
> application performs the necessary conversions. For instance for
> Polish installations we use an ISO-8859-1 database and an application
> server configured with ISO-8859-2 where we store Polish characters
> without problems.
>
> Andrea


Hello Andrea,

Using an incomplete database encoding that does not match the
application is a dangerous practice that only works by coincidence. It
is only because the encodings you mention are 8-bit complete that you
can pass 8-bit values in and out untouched, even though they may
mean something different inside the DB from how you interpret them in
your application.
With few exceptions, it is never a good idea to use a database encoding
other than a complete encoding like UTF-8 (to be honest, you should
always use this in a DB if you are left the choice).
If the applications are Java based you will have a perfect match that
way, and only when you do stuff like generating emails, plain-text files
or Office documents (shivers), where incomplete encodings (code pages)
come into play, do you have to take extra measures. Using the platform
encoding then is usually a good idea unless it is a server application
in which case you have to make an educated guess.
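
The pass-through effect can be sketched in plain Java, assuming the ISO-8859-2 charset is available in the runtime. The class and method names are invented, and the "database" here is only simulated by decoding the bytes as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PassThroughDemo {
    static final Charset LATIN2 = Charset.forName("ISO-8859-2"); // assumed to be installed

    // What an ISO-8859-1 store "thinks" it holds when the application
    // hands it bytes that were encoded as ISO-8859-2.
    static String asLatin1Sees(String text) {
        return new String(text.getBytes(LATIN2), StandardCharsets.ISO_8859_1);
    }

    // The application reads the same bytes back and decodes them as ISO-8859-2.
    static String roundTrip(String text) {
        String stored = asLatin1Sees(text);
        return new String(stored.getBytes(StandardCharsets.ISO_8859_1), LATIN2);
    }

    public static void main(String[] args) {
        String city = "\u0142\u00F3d\u017A"; // Polish letters l-stroke, o-acute, d, z-acute
        // The bytes survive, but the store sees different characters
        // (for example, l-stroke is byte 0xB3, which is superscript three in ISO-8859-1).
        System.out.println(asLatin1Sees(city).equals(city));
        System.out.println(roundTrip(city).equals(city));
    }
}
```

The round trip succeeds only because both charsets use every byte as a single character; sorting, case conversion or searching inside the database would still operate on the wrong characters.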

Regards,

Silvio