Encoding conversion problem

#1
02-11-2008, 07:03 AM
Andrea
Encoding conversion problem

Hi,
I have a J2EE application which connects to a DB2 configured with code
set IBM-850. The application works with encoding ISO-8859-1.
If I save characters outside the range supported by IBM-850 (i.e. the
euro currency character EURO) then I read garbage...

I tried encoding conversions with InputStreamReader and
OutputStreamWriter:
...
BufferedReader reader = new BufferedReader(
        new InputStreamReader(source, "IBM850"));
BufferedWriter writer = new BufferedWriter(
        new OutputStreamWriter(output, "ISO-8859-1"));
...

but that didn't work...
My JVM Charset.availableCharsets() includes IBM850.

What can I do?

Thanks, in advance,
Andrea

#2
02-12-2008, 02:13 AM
Lothar Kimmeringer
Re: Encoding conversion problem

Andrea wrote:

> I have a J2EE application which connects to a DB2 configured with code
> set IBM-850. The application works with encoding ISO-8859-1.


In general the JDBC driver is aware of the encoding the database
is using and already does the conversion when you access the
column with getString(columnName/index).

> I tried encoding conversions with InputStreamReader and
> OutputStreamWriter:
> ...
> BufferedReader reader = new BufferedReader(new
> InputStreamReader(source, "IBM850"));


What is source? How do you create it from the JDBC ResultSet?

> BufferedWriter writer = new BufferedWriter(new
> OutputStreamWriter(output, "ISO-8859-1"));


That looks OK.


Regards, Lothar
--
Lothar Kimmeringer E-Mail: spamfang@kimmeringer.de
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!

#3
02-12-2008, 03:25 AM
Andrea
Re: Encoding conversion problem

Hi Lothar,

> > I have a J2EE application which connects to a DB2 configured with code
> > set IBM-850. The application works with encoding ISO-8859-1.

>
> In general the JDBC driver is aware of the encoding the database
> is using and already does the conversion when you access the
> column with getString(columnName/index).

Yes, I fetch the string with ResultSet.getString(index). I use the DB2
Universal Driver with a type 4 connection.


> > I tried encoding conversions with InputStreamReader and OutputStreamWriter:
> > ...
> > BufferedReader reader = new BufferedReader(new
> > InputStreamReader(source, "IBM850"));

>
> What is source? How do you create it from the JDBC ResultSet?


I tried:
InputStream source = new ByteArrayInputStream(stringFetchedFromDB.getBytes());

Thanks,
Andrea
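
A minimal sketch, not taken from the thread, of why this approach is unlikely to help: getString() already returns a decoded Java String, so turning it back into bytes and reading those bytes with a different charset can only scramble characters that were correct. The class name RedecodeDemo and the sample value are made up for illustration, and the byte charset is passed explicitly here, whereas the no-argument getBytes() uses the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RedecodeDemo {
    // Mimics the attempt above: encode an already-decoded String to bytes,
    // then read those bytes back through an InputStreamReader.
    static String redecode(String s, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = s.getBytes(encodeWith);
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), decodeWith)) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset ibm850 = Charset.forName("IBM850"); // assumes the JVM provides IBM850
        String fetched = "caf\u00e9";               // stand-in for a value from getString()
        String result = redecode(fetched, StandardCharsets.ISO_8859_1, ibm850);
        // The e-acute byte 0xE9 means U-acute in IBM850, so the text changes.
        System.out.println(result.equals(fetched)); // prints "false"
    }
}
```

The same reasoning applies in the other direction: if a character was already lost or mis-mapped before the String reached the application, no later combination of reader and writer charsets can know what it was meant to be.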


#4
02-12-2008, 04:33 AM
Sabine Dinis Blochberger
Re: Encoding conversion problem

Andrea wrote:

> Hi,
> I have a J2EE application which connects to a DB2 configured with code
> set IBM-850. The application works with encoding ISO-8859-1.
> If I save characters outside the range supported by IBM-850 (i.e. the
> euro currency character EURO) then I read garbage...


Yes, the Euro symbol is not part of the encodings, so your database
can't contain it. If you need it, you would have to change the database's
encoding (ISO-8859-15 includes the Euro symbol).

Otherwise, you have to take care not to write unsupported
characters into string/character fields.

One solution could be to parse all strings and replace the symbol with
the shorthand "EUR", but it might not be acceptable to your client.
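
As a rough illustration of the point about supported characters, the standard CharsetEncoder.canEncode method can report whether a given text is representable in a charset before it is written anywhere. The class and method names below (EuroSupportCheck, canStore) are hypothetical, and the check only covers the Java side; it says nothing about what the DB2 driver does with the value.

```java
import java.nio.charset.Charset;

class EuroSupportCheck {
    // True if every character of the text can be represented in the named charset.
    static boolean canStore(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // the euro sign
        for (String name : new String[] {"IBM850", "ISO-8859-1", "ISO-8859-15"}) {
            // Skip charsets this JVM does not provide instead of failing.
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + canStore(euro, name));
            }
        }
    }
}
```

On a JVM that provides all three charsets, only ISO-8859-15 should report true for the euro sign.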
--
Sabine Dinis Blochberger

Op3racional
www.op3racional.eu

#5
02-12-2008, 06:22 AM
Andrea
Re: Encoding conversion problem

> > ...
> > If I save characters outside the range supported by IBM-850 (i.e. the
> > euro currency character EURO) then I read garbage...

>
> Yes, the Euro symbol is not part of the encodings, so your database
> can't contain it.

I've found a strange thing: C and COBOL applications can write and read
(using embedded SQL) characters outside the accepted range without
problems... So the database can contain those characters without
losing any information, but I can't understand how...

> If you need it, you would have to change the database's
> encoding (ISO-8859-15 includes the Euro symbol).
> Otherwise, you have to take care not to write unsupported
> characters into string/character fields.
>
> One solution could be to parse all strings and replace the symbol with
> the shorthand "EUR", but it might not be acceptable to your client.

Actually the EURO character is just an example, I have more complex
strings to handle (and I can't change the encoding of the database).
If my problem has no solution at all then I'd like to understand why
other languages don't have this problem...

Thanks,
Andrea

#6
02-12-2008, 08:02 AM
Sabine Dinis Blochberger
Re: Encoding conversion problem

Andrea wrote:

> > > ...
> > > If I save characters outside the range supported by IBM-850 (i.e. the
> > > euro currency character EURO) then I read garbage...

> >
> > Yes, the Euro symbol is not part of the encodings, so your database
> > can't contain it.

> I've found a strange thing: C and COBOL applications can write and read
> (using embedded SQL) characters outside the accepted range without
> problems... So the database can contain those characters without
> losing any information, but I can't understand how...
>

Yes, in theory you can store any value (0 - 255 in case of one byte
strings) in a string, but how that is interpreted (i.e. encoding) is
where it gets hairy. Also, multibyte characters would break the
interpretation.

> > If you need it, you would have to change the database's
> > encoding (ISO-8859-15 includes the Euro symbol).
> > Otherwise, you have to take care not to write unsupported
> > characters into string/character fields.
> >
> > One solution could be to parse all strings and replace the symbol with
> > the shorthand "EUR", but it might not be acceptable to your client.

> Actually the EURO character is just an example, I have more complex
> strings to handle (and I can't change the encoding of the database).
> If my problem has no solution at all then I'd like to understand why
> other languages don't have this problem...
>

Ah, there are always hacks around limitations. But they aren't usually
pretty. The problem is to funnel a string with these "unsupported"
characters through the JDBC driver (both ways).

You might get around it by using typeless fields (you can put any byte
sequence there), like BLOBS maybe...
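
A sketch of what the byte-oriented route might look like on the Java side, assuming a binary column is available and that every reader and writer of that column agrees on one charset (UTF-8 is chosen here purely as an example). The class and method names are hypothetical; only the encode/decode step is shown, and the comments mark where the standard JDBC calls PreparedStatement.setBytes and ResultSet.getBytes would come in.

```java
import java.nio.charset.StandardCharsets;

class BinaryColumnSketch {
    // Encode the text yourself, so the database code page is not involved.
    static byte[] toColumnBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset that was used when writing.
    static String fromColumnBytes(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Price: 10 \u20ac";
        byte[] stored = toColumnBytes(original);   // would be passed to PreparedStatement.setBytes(...)
        String restored = fromColumnBytes(stored); // would come from ResultSet.getBytes(...)
        System.out.println(original.equals(restored)); // prints "true"
    }
}
```

The trade-off is that the database no longer knows the column holds text, so other clients see raw bytes and must apply the same charset themselves.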

Or you write a parser that substitutes the impossible characters with
acceptable replacements. Of course, this is most likely not feasible.
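
For what it is worth, such a substitution pass can be sketched in a few lines. This is an illustration under assumptions, not a recommendation: the replacement table is hypothetical and holds a single entry, unknown characters fall back to "?", and the loop works on individual UTF-16 chars, so a character outside the Basic Multilingual Plane would yield two question marks.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.HashMap;
import java.util.Map;

class UnsupportedCharReplacer {
    // Hypothetical replacement table; extend it for the characters in your data.
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u20ac', "EUR"); // euro sign
    }

    // Returns a copy of the text in which every character the target charset
    // cannot encode is replaced by its table entry, or by "?" if none exists.
    static String makeStorable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (encoder.canEncode(c)) {
                out.append(c);
            } else {
                String replacement = REPLACEMENTS.get(c);
                out.append(replacement != null ? replacement : "?");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Charset ibm850 = Charset.forName("IBM850"); // assumes the JVM provides IBM850
        System.out.println(makeStorable("10 \u20ac", ibm850)); // prints "10 EUR"
    }
}
```

The substitution is one-way: once "EUR" is stored, nothing distinguishes it from text that originally contained those three letters.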

But the customer has to be aware that a database with encoding X can
only hold strings encoded in X. If they need UTF-8 for example now, they
will eventually have to change their database. And it would be better to
migrate to a suitable encoding than to hack around it and, in a few
years, have to do it all over again (and then some), when they finally do
want to change the database encoding.

On other languages not having the problem, in C, you can treat a string
just like an array of bytes and use those for whatever you like, the
compiler won't complain. Even interpreting them as memory addresses is
possible, adding and subtracting etc...

> Thanks,
> Andrea


--
Sabine Dinis Blochberger

Op3racional
www.op3racional.eu

#7
02-12-2008, 09:33 AM
Andrea
Re: Encoding conversion problem

Hi Sabine,
thank you for your explanation, now the overall situation is much more
clear to me.

Thanks,
Andrea

#8
02-12-2008, 01:07 PM
Roedy Green
Re: Encoding conversion problem

On Mon, 11 Feb 2008 04:03:47 -0800 (PST), Andrea
<tol7481@iperbole.bologna.it> wrote, quoted or indirectly quoted
someone who said :

>I have a J2EE application which connects to a DB2 configured with code
>set IBM-850. The application works with encoding ISO-8859-1.
>If I save characters outside the range supported by IBM-850 (i.e. the
>euro currency character EURO) then I read garbage...


First, make sure the data are truly encoded in IBM-850.
See http://mindprod.com/applet/encodingrecogniser.html

If there are characters in that file outside the range of IBM-850,
then by definition the file is not encoded in IBM-850 and you SHOULD
expect garbage.

You can write your own translate program to handle the excess chars.

see http://mindprod.com/jgloss/encoding.html

I don't know how to hook it in as an official encoding, but that is
not necessary.
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

#9
02-12-2008, 01:10 PM
Roedy Green
Re: Encoding conversion problem

On Mon, 11 Feb 2008 04:03:47 -0800 (PST), Andrea
<tol7481@iperbole.bologna.it> wrote, quoted or indirectly quoted
someone who said :

>BufferedReader reader = new BufferedReader(new
>InputStreamReader(source, "IBM850"));
>BufferedWriter writer = new BufferedWriter(new
>OutputStreamWriter(output, "ISO-8859-1"));


Your first task is to find out just what you are being handed before
you start fooling around with translations.

Unicode, IBM850, ISO-8859-1, something else?
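
One way to see what you are actually being handed, independent of fonts and console encodings, is to print the Unicode code points of the String that comes back from the driver. A small sketch (the class name CodePointDump is made up):

```java
class CodePointDump {
    // Lists each character of the string as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In the real application the argument would be rs.getString(index).
        System.out.println(dump("10 \u20ac")); // prints "U+0031 U+0030 U+0020 U+20AC"
    }
}
```

If the euro sign had survived the trip, U+20AC would appear in the output; any other value at that position shows which character the driver produced instead.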
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

#10
02-13-2008, 06:22 AM
Andrea
Re: Encoding conversion problem

Hi Roedy,
the database (DB2) has this configuration:
....
Database territory = US
Database code page = 850
Database code set = IBM-850
....

I've exported to a file the content of a table with a CHAR(N) field
containing the EURO currency character, then I've opened the file with
EncodingRecognizer: if I choose IBM850 I see a strange character (like
a small X), if I choose ISO-8859-1 I see a square.

I tried a translation with:

String problematicString = rs.getString(index);
problematicString = new String(problematicString, "IBM850"); // Am I correct?

but I still get garbage :-(
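
A note on the snippet above: java.lang.String has no constructor taking (String, String), so that line would need a byte array as its first argument to compile. If re-decoding is possible at all, it would look roughly like the sketch below. It rests on two assumptions that the thread does not confirm: that the driver decoded the stored bytes as IBM850, and that the other applications actually wrote them in some different single-byte encoding (windows-1252 is used here only as an example). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

class MisdecodedRepair {
    // Undo a wrong decoding: recover the stored bytes using the charset the
    // driver assumed, then decode them with the charset they were written in.
    static String repair(String fromDriver, Charset assumedByDriver, Charset actual) {
        byte[] raw = fromDriver.getBytes(assumedByDriver);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        Charset ibm850 = Charset.forName("IBM850");       // assumes the JVM provides it
        Charset cp1252 = Charset.forName("windows-1252"); // assumed real encoding, example only

        // Simulation: another client stored the euro sign as the windows-1252
        // byte 0x80, and the driver decoded that byte as IBM850.
        String fromDriver = new String(new byte[] {(byte) 0x80}, ibm850);

        System.out.println(repair(fromDriver, ibm850, cp1252).equals("\u20ac")); // prints "true"
    }
}
```

This only works while the wrongly decoded String still maps back to the original bytes, so it is worth checking the actual byte values in the table before relying on it.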


Thanks,
Andrea

All times are GMT -5.