Encoding conversion problem - JDBC JAVA

  1. Encoding conversion problem

    Hi,
    I have a J2EE application which connects to a DB2 configured with code
    set IBM-850. The application works with encoding ISO-8859-1.
    If I save characters outside the range supported by IBM-850 (e.g. the
    euro currency character, €) then I read garbage...

    I tried encoding conversions with InputStreamReader and
    OutputStreamWriter:
    ....
    BufferedReader reader = new BufferedReader(new
    InputStreamReader(source, "IBM850"));
    BufferedWriter writer = new BufferedWriter(new
    OutputStreamWriter(output, "ISO-8859-1"));
    ....

    but that didn't work...
    My JVM Charset.availableCharsets() includes IBM850.

    What can I do?

    Thanks, in advance,
    Andrea

  2. Re: Encoding conversion problem

    Andrea wrote:

    > I have a J2EE application which connects to a DB2 configured with code
    > set IBM-850. The application works with encoding ISO-8859-1.


    In general the JDBC driver knows which encoding the database uses
    and already does the conversion when you access the column with
    getString(columnName/index).

    > I tried encoding conversions with InputStreamReader and
    > OutputStreamWriter:
    > ...
    > BufferedReader reader = new BufferedReader(new
    > InputStreamReader(source, "IBM850"));


    What is source? How do you create that from the JDBC
    ResultSet?

    > BufferedWriter writer = new BufferedWriter(new
    > OutputStreamWriter(output, "ISO-8859-1"));


    That looks OK.


    Regards, Lothar
    --
    Lothar Kimmeringer E-Mail: spamfang@kimmeringer.de
    PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

    Always remember: The answer is forty-two, there can only be wrong
    questions!

  3. Re: Encoding conversion problem

    Hi Lothar,

    > > I have a J2EE application which connects to a DB2 configured with code
    > > set IBM-850. The application works with encoding ISO-8859-1.

    >
    > In general the JDBC driver knows which encoding the database uses
    > and already does the conversion when you access the column with
    > getString(columnName/index).

    Yes, I fetch the string with ResultSet.getString(index). I use the DB2
    Universal Driver with a type 4 connection.


    > > I tried encoding conversions with InputStreamReader and OutputStreamWriter:
    > > ...
    > > BufferedReader reader = new BufferedReader(new
    > > InputStreamReader(source, "IBM850"));

    >
    > What is source? How do you create that from the JDBC
    > ResultSet?


    I tried:
    InputStream source = new
    ByteArrayInputStream(stringFetchedFromDB.getBytes());
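
    A minimal sketch of why a round trip like this cannot bring the
    character back, assuming the String returned by getString() has
    already been decoded by the driver. Encoding a String into a charset
    that lacks a character replaces that character (typically with '?'),
    and decoding the bytes again cannot undo the replacement. The class
    and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LossyRoundTrip {
    // Encode the text into the given charset, then decode the bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        // getBytes substitutes the charset's replacement bytes for unmappable characters.
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // the euro sign
        System.out.println("ISO-8859-1 keeps euro: "
                + roundTrip(euro, StandardCharsets.ISO_8859_1).equals(euro));
        System.out.println("UTF-8 keeps euro:      "
                + roundTrip(euro, StandardCharsets.UTF_8).equals(euro));
        // IBM850 is not guaranteed to exist on every JVM, so check first.
        if (Charset.isSupported("IBM850")) {
            System.out.println("IBM850 keeps euro:     "
                    + roundTrip(euro, Charset.forName("IBM850")).equals(euro));
        }
    }
}
```

    Note that getBytes() without an argument, as used above, encodes
    with the platform default charset, which may be neither IBM850 nor
    ISO-8859-1.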

    Thanks,
    Andrea



  4. Re: Encoding conversion problem

    Andrea wrote:

    > Hi,
    > I have a J2EE application which connects to a DB2 configured with code
    > set IBM-850. The application works with encoding ISO-8859-1.
    > If I save characters outside the range supported by IBM-850 (e.g. the
    > euro currency character, €) then I read garbage...


    Yes, the Euro symbol is not part of either encoding, so your database
    can't contain it. If you need it, you would have to change the database's
    encoding (ISO-8859-15 includes the Euro symbol).

    Otherwise, you have to take care not to write unsupported characters
    into string/character fields.

    One solution could be to parse all strings and replace the symbol with
    the shorthand "EUR", but it might not be acceptable to your client.
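
    A minimal sketch of that replacement, assuming the check is done in
    the application before the string is handed to JDBC. The class and
    method names are hypothetical; the substitution only happens when
    the target charset really cannot encode the euro sign.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class EuroFallback {
    private static final String EURO = "\u20AC";

    // Replace the euro sign with "EUR" only when the target charset cannot encode it.
    static String replaceEuro(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        if (encoder.canEncode(EURO)) {
            return text;
        }
        return text.replace(EURO, "EUR");
    }

    public static void main(String[] args) {
        // prints "Price: 10 EUR"
        System.out.println(replaceEuro("Price: 10 \u20AC", StandardCharsets.ISO_8859_1));
    }
}
```

    The replacement is one-way: once "EUR" is stored, nothing records
    whether the original text had the symbol or the three letters.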
    --
    Sabine Dinis Blochberger

    Op3racional
    www.op3racional.eu

  5. Re: Encoding conversion problem

    > > ...
    > > If I save characters outside the range supported by IBM-850 (e.g. the
    > > euro currency character, €) then I read garbage...

    >
    > Yes, the Euro symbol is not part of either encoding, so your database
    > can't contain it.

    I've found a strange thing: C and COBOL applications can write and read
    (using embedded SQL) characters outside the accepted range without
    problems... So the database can contain those characters without
    losing any information, but I can't understand how...

    > If you need it, you would have to change the database's
    > encoding (ISO-8859-15 includes the Euro symbol).
    > Otherwise, you have to take care not to write unsupported
    > characters into string/character fields.
    >
    > One solution could be to parse all strings and replace the symbol with
    > the shorthand "EUR", but it might not be acceptable to your client.

    Actually the EURO character is just an example, I have more complex
    strings to handle (and I can't change the encoding of the database).
    If my problem has no solution at all then I'd like to understand why
    other languages don't have this problem...

    Thanks,
    Andrea

  6. Re: Encoding conversion problem

    Andrea wrote:

    > > > ...
    > > > If I save characters outside the range supported by IBM-850 (e.g. the
    > > > euro currency character, €) then I read garbage...

    > >
    > > Yes, the Euro symbol is not part of either encoding, so your database
    > > can't contain it.

    > I've found a strange thing: C and COBOL applications can write and read
    > (using embedded SQL) characters outside the accepted range without
    > problems... So the database can contain those characters without
    > losing any information, but I can't understand how...
    >

    Yes, in theory you can store any value (0 - 255 in case of one byte
    strings) in a string, but how that is interpreted (i.e. encoding) is
    where it gets hairy. Also, multibyte characters would break the
    interpretation.

    > > If you need it, you would have to change the database's
    > > encoding (ISO-8859-15 includes the Euro symbol).
    > > Otherwise, you have to take care not to write unsupported
    > > characters into string/character fields.
    > >
    > > One solution could be to parse all strings and replace the symbol with
    > > the shorthand "EUR", but it might not be acceptable to your client.

    > Actually the EURO character is just an example, I have more complex
    > strings to handle (and I can't change the encoding of the database).
    > If my problem has no solution at all then I'd like to understand why
    > other languages don't have this problem...
    >

    Ah, there are always hacks around limitations. But they aren't usually
    pretty. The problem is to funnel a string with these "unsupported"
    characters through the JDBC driver (both ways).

    You might get around it by using typeless fields (you can put any byte
    sequence there), like BLOBS maybe...
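
    A rough sketch of the binary-column idea, assuming a hypothetical
    table NOTES with an integer ID column and a binary BODY column;
    neither name comes from this thread. The application encodes the
    text as UTF-8 itself, so the driver only has to move bytes. Whether
    getBytes()/setBytes() are supported for the chosen column type
    depends on the driver and should be checked.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class BinaryTextColumn {
    // The application, not the database code page, decides the encoding.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String fromBytes(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // NOTES, ID and BODY are hypothetical names used only for this sketch.
    static void save(Connection con, int id, String text) throws SQLException {
        String sql = "INSERT INTO NOTES (ID, BODY) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, id);
            ps.setBytes(2, toBytes(text));
            ps.executeUpdate();
        }
    }

    // Returns null when no row with the given id exists.
    static String load(Connection con, int id) throws SQLException {
        String sql = "SELECT BODY FROM NOTES WHERE ID = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? fromBytes(rs.getBytes(1)) : null;
            }
        }
    }
}
```

    The price is that the column is no longer searchable or sortable as
    text, and every other program reading it must know it holds UTF-8.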

    Or you write a parser that substitutes the impossible characters with
    acceptable replacements. Of course, this is most likely not feasible.

    But the customer has to be aware that a database with encoding X can
    only hold strings encoded in X. If they need UTF-8 for example now, they
    will eventually have to change their database. And it would be better to
    migrate to a suitable encoding than to hack around it and, in a few
    years, have to do it all over again (and then some) when they finally do
    want to change the database encoding.

    On other languages not having the problem, in C, you can treat a string
    just like an array of bytes and use those for whatever you like, the
    compiler won't complain. Even interpreting them as memory addresses is
    possible, adding and subtracting etc...
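
    To illustrate the "same bytes, different interpretation" point, here
    is a small sketch that decodes one byte value under several
    charsets. Byte 0xA4 is the euro sign in ISO-8859-15, the generic
    currency sign in ISO-8859-1 and the letter ñ in IBM-850. That the C
    and COBOL programs actually write this particular byte is only an
    assumption for the example; the class name is made up.

```java
import java.nio.charset.Charset;

final class ByteInterpretation {
    // Decode a single byte value using the named charset.
    static String decode(int byteValue, String charsetName) {
        byte[] raw = { (byte) byteValue };
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String[] names = { "ISO-8859-1", "ISO-8859-15", "IBM850" };
        for (String name : names) {
            // Only ISO-8859-1 is guaranteed on every JVM, so check the others.
            if (Charset.isSupported(name)) {
                String text = decode(0xA4, name);
                System.out.printf("%-12s U+%04X %s%n", name, (int) text.charAt(0), text);
            }
        }
    }
}
```

    A program that never converts can store and read back such a byte
    unchanged, while a JDBC driver decodes it according to the database
    code page and so hands Java a different character.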

    > Thanks,
    > Andrea


    --
    Sabine Dinis Blochberger

    Op3racional
    www.op3racional.eu

  7. Re: Encoding conversion problem

    Hi Sabine,
    thank you for your explanation, now the overall situation is much
    clearer to me.

    Thanks,
    Andrea

  8. Re: Encoding conversion problem

    On Mon, 11 Feb 2008 04:03:47 -0800 (PST), Andrea
    <tol7481@iperbole.bologna.it> wrote, quoted or indirectly quoted
    someone who said :

    >I have a J2EE application which connects to a DB2 configured with code
    >set IBM-850. The application works with encoding ISO-8859-1.
    >If I save characters outside the range supported by IBM-850 (e.g. the
    >euro currency character, €) then I read garbage...


    First, make sure the data are truly encoded in IBM-850.
    See http://mindprod.com/applet/encodingrecogniser.html

    If there are characters in that file outside the range of IBM-850,
    then by definition the file is not encoded in IBM-850 and you SHOULD
    expect garbage.

    You can write your own translate program to handle the excess chars.

    See http://mindprod.com/jgloss/encoding.html

    I don't know how to hook it in as an official encoding, but that is
    not necessary.
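
    One possible shape for such a translate program, as a sketch only:
    every character the target charset cannot encode (and the backslash
    itself) is written as a backslash-u escape with four hex digits
    before storing, and turned back after reading. The class and method
    names are invented here, and the scheme assumes every program that
    reads the column applies the same unescape step.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class UnmappableEscaper {
    // Escape the backslash and every char the target charset cannot encode.
    static String escape(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\\' && encoder.canEncode(c)) {
                out.append(c);
            } else {
                // Surrogates are escaped one char at a time and rejoin on unescape.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    // Reverse of escape(); only meant for strings that escape() produced.
    static String unescape(String stored) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < stored.length()) {
            char c = stored.charAt(i);
            if (c == '\\' && i + 6 <= stored.length() && stored.charAt(i + 1) == 'u') {
                out.append((char) Integer.parseInt(stored.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

    Each escaped character takes six characters in storage, so a
    CHAR(N) field fills up faster than the visible text suggests.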
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com

  9. Re: Encoding conversion problem

    On Mon, 11 Feb 2008 04:03:47 -0800 (PST), Andrea
    <tol7481@iperbole.bologna.it> wrote, quoted or indirectly quoted
    someone who said :

    >BufferedReader reader = new BufferedReader(new
    >InputStreamReader(source, "IBM850"));
    >BufferedWriter writer = new BufferedWriter(new
    >OutputStreamWriter(output, "ISO-8859-1"));


    Your first task is to find out just what you are being handed before
    you start fooling around with translations.

    Unicode, IBM850, ISO-8859-1, something else?
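
    One way to see what you are being handed, sketched with a made-up
    helper class: print the Unicode code points of the String that comes
    back from getString(). A Java String is always Unicode internally,
    so the code points show what the driver decoded, independent of any
    console or font problem.

```java
import java.util.stream.Collectors;

final class StringInspector {
    // Render each code point of the text as U+XXXX, separated by spaces.
    static String codePoints(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // prints "U+0035 U+0020 U+20AC"
        System.out.println(codePoints("5 \u20AC"));
    }
}
```

    If the output shows U+003F ('?') or U+FFFD where the euro sign
    should be, the character was already lost before the String reached
    the application.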
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com

  10. Re: Encoding conversion problem

    Hi Roedy,
    the database (DB2) has this configuration:
    ....
    Database territory = US
    Database code page = 850
    Database code set = IBM-850
    ....

    I've exported to a file the content of a table with a CHAR(N) field
    containing the EURO currency character, then I've opened the file with
    EncodingRecognizer: if I choose IBM850 I see a strange character (like
    a small X), if I choose ISO-8859-1 I see a square.

    I tried a translation with:

    String problematicString = rs.getString(index);
    problematicString = new String(problematicString, "IBM850"); // Am I correct?

    but I still get garbage :-(


    Thanks,
    Andrea
