GetByte adding an extra byte? - CSharp

  1. GetByte adding an extra byte?

    string USMesg;

    USMesg = "¬Credits¬Remaining¬";

    byte[] bArray = Encoding.UTF8.GetBytes(USMesg);

    After I execute the code, the first two bytes are:

    [0] = 0xC2
    [1] = 0xAC //This is the character I was expecting

    Why has an extra byte been inserted after each ¬?

    I tried using Encoding.ASCII.GetBytes() but that translates my ¬ to a
    0x3F (Question mark ?)

    Does anyone have any idea what's happened and how I can get around
    this problem?

    Regards,

    Steven


    *** Sent via Developersdex http://www.developersdex.com ***

  2. Re: GetByte adding an extra byte?

    Hi,

    Unicode characters use two bytes.

  3. Re: GetByte adding an extra byte?


    On Aug 22, 3:27 pm, Steven Blair <steven.bl...@btinternet.com> wrote:

    > USMesg = "¬Credits¬Remaining¬";
    > byte[] bArray = Encoding.UTF8.GetBytes(USMesg);
    > [1] = 0xAC //This is the character I was expecting
    > Why has an extra byte been inserted after each ¬?


    This is how UTF-8 works. The character "¬" is U+00AC, and UTF-8
    encodes every code point from U+0080 to U+07FF as two bytes, so "¬"
    is represented as 0xC2 0xAC.
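
    As a rough sketch of where the 0xC2 comes from: the two-byte UTF-8
    layout is 110xxxxx 10xxxxxx, with the code point's bits filling the
    x positions. The class and method names below are made up for
    illustration; only Encoding.UTF8.GetBytes comes from the thread.

```csharp
using System;
using System.Text;

public static class Utf8TwoByteDemo
{
    // Hypothetical helper: encodes a code point in U+0080..U+07FF by hand,
    // using the two-byte UTF-8 layout 110xxxxx 10xxxxxx.
    public static byte[] EncodeTwoByte(int codePoint)
    {
        if (codePoint < 0x80 || codePoint > 0x7FF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        byte lead = (byte)(0xC0 | (codePoint >> 6));    // marker 110 + top 5 bits
        byte trail = (byte)(0x80 | (codePoint & 0x3F)); // marker 10 + low 6 bits
        return new[] { lead, trail };
    }

    public static void Main()
    {
        byte[] manual = EncodeTwoByte(0xAC);               // U+00AC is the not sign
        byte[] library = Encoding.UTF8.GetBytes("\u00AC"); // same character via the framework

        Console.WriteLine(BitConverter.ToString(manual));  // C2-AC
        Console.WriteLine(BitConverter.ToString(library)); // C2-AC
    }
}
```

    So the 0xC2 is not padding. It carries the top bits of the character
    plus the marker that says "one more byte follows".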

    > I tried using Encoding.ASCII.GetBytes()


    ASCII and UTF-8 are not interchangeable.
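
    A small sketch of the difference, assuming the default
    Encoding.ASCII behaviour of replacing anything outside 0x00-0x7F
    with "?". The Hex helper is hypothetical and only there to print
    the bytes.

```csharp
using System;
using System.Text;

public static class AsciiFallbackDemo
{
    // Hypothetical helper: encodes text and returns the bytes as a hex string.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // "\u00AC" is the not sign; the escape avoids depending on how
        // the source file itself is saved.
        string message = "\u00ACCredits";

        Console.WriteLine(Hex(Encoding.ASCII, message)); // 3F-43-72-65-64-69-74-73
        Console.WriteLine(Hex(Encoding.UTF8, message));  // C2-AC-43-72-65-64-69-74-73
    }
}
```

    The plain letters come out identical in both encodings. Only the
    character that ASCII cannot represent differs, and in the ASCII
    case the original character is lost for good.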

    > Anyone any idea whats happened and how I can get round this problem?


    What is happening is that characters are being converted to bytes,
    using the character encoding you specify. The same character can
    map to different byte sequences under different encodings.

    Perhaps what you want is Encoding.Default.GetBytes()? This will use
    the system default ANSI codepage (in your case probably Windows-1252,
    which is close to, but not identical to, ISO-8859-1, aka "Latin-1").
    Both of those encode "¬" as the single byte 0xAC, but a machine with
    a different default codepage may not.

    However, if you want to write predictable code, you must agree with
    whoever will read the bytes back upon which encoding to use.
    Otherwise, when reading 0xC2AC back into a string, the reader might
    get a tiny picture of a tiny goat instead of the "¬".
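
    To make the "agree on an encoding" point concrete, here is a sketch
    that encodes with one encoding and decodes with another. It assumes
    the other side expects ISO-8859-1 and names it explicitly rather
    than relying on Encoding.Default, which depends on the machine (and
    on .NET Core and later is UTF-8). RoundTrip is a made-up helper.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Hypothetical helper: encodes with one encoding and decodes with another,
    // as happens when sender and receiver have not agreed on an encoding.
    public static string RoundTrip(string text, Encoding writer, Encoding reader)
    {
        return reader.GetString(writer.GetBytes(text));
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1"); // assumed receiver encoding
        string text = "\u00ACCredits";                        // not sign followed by ASCII

        Console.WriteLine(RoundTrip(text, Encoding.UTF8, Encoding.UTF8) == text); // True
        Console.WriteLine(RoundTrip(text, latin1, latin1) == text);               // True
        Console.WriteLine(RoundTrip(text, Encoding.UTF8, latin1) == text);        // False
        Console.WriteLine(RoundTrip(text, Encoding.UTF8, latin1).Length);         // 9
    }
}
```

    In the mismatched case the reader sees nine characters instead of
    eight, because each of the two UTF-8 bytes is decoded as a
    character of its own.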


  4. Re: GetByte adding an extra byte?


    It sounds like you should read up on the UTF-8 format. Using
    Encoding.Default may well give you what you want, but UTF-8 is
    generally a better format these days.

    See http://pobox.com/~skeet/csharp/unicode.html and the referenced
    links there.

    Jon


  5. Re: GetByte adding an extra byte?


    On Aug 22, 4:04 pm, "Ignacio Machin \( .NET/ C# MVP \)" <machin TA
    laceupsolutions.com> wrote:

    > Unicode characters use two bytes.


    O RLY?

    "Unicode" can be anything from UTF7 to UTF32. In this case, it was
    UTF8. Each uses a different number of bits to represent characters.
    Also, UTF8 uses anywhere between 1 and 4 (IIRC) bytes to represent a
    character.
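
    A quick way to see the 1 to 4 byte range, as a sketch. The sample
    characters are my own picks, written as escapes so the result does
    not depend on the source file's encoding. Utf8Length is a made-up
    wrapper around Encoding.UTF8.GetByteCount.

```csharp
using System;
using System.Text;

public static class Utf8LengthDemo
{
    // Hypothetical helper: number of bytes the text occupies once UTF-8 encoded.
    public static int Utf8Length(string s) => Encoding.UTF8.GetByteCount(s);

    public static void Main()
    {
        Console.WriteLine(Utf8Length("A"));          // 1  (U+0041)
        Console.WriteLine(Utf8Length("\u00AC"));     // 2  (U+00AC, not sign)
        Console.WriteLine(Utf8Length("\u20AC"));     // 3  (U+20AC, euro sign)
        Console.WriteLine(Utf8Length("\U0001F600")); // 4  (U+1F600, outside the BMP)
    }
}
```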

    Perhaps you were thinking of "wide characters" from old-school Win32
    programming?


  6. Re: GetByte adding an extra byte?

    On Aug 22, 3:04 pm, "Ignacio Machin \( .NET/ C# MVP \)" <machin TA
    laceupsolutions.com> wrote:
    > Unicode characters use two bytes.


    True of a .NET char (a 16-bit UTF-16 code unit), but irrelevant in
    this case - the important thing is that the UTF-8 encoded version of
    the relevant character takes two bytes. (Other characters can take
    1, 3 or 4.)
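
    For comparison, a sketch of the in-memory side: a .NET char is a
    16-bit UTF-16 code unit, and Encoding.Unicode writes it out as two
    little-endian bytes. The class name and the second sample character
    are assumptions for illustration.

```csharp
using System;
using System.Text;

public static class Utf16Demo
{
    // Hypothetical helper: encodes text and returns the bytes as a hex string.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string notSign = "\u00AC";

        Console.WriteLine(sizeof(char));                   // 2
        Console.WriteLine(Hex(Encoding.Unicode, notSign)); // AC-00 (UTF-16, little-endian)
        Console.WriteLine(Hex(Encoding.UTF8, notSign));    // C2-AC

        // A character outside the BMP needs two chars (a surrogate pair)
        // in a string, and four bytes in UTF-8.
        string outsideBmp = "\U0001F600";
        Console.WriteLine(outsideBmp.Length);                     // 2
        Console.WriteLine(Encoding.UTF8.GetByteCount(outsideBmp)); // 4
    }
}
```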

    Jon


  7. Re: GetByte adding an extra byte?

    Encoding.Default.GetBytes() does the job.

    Thanks for the help.


  8. Re: GetByte adding an extra byte?

    Oops, too early in the morning and not enough coffee.


    "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
    news:1187793164.749592.108680@q4g2000prc.googlegroups.com...
    > On Aug 22, 3:04 pm, "Ignacio Machin \( .NET/ C# MVP \)" <machin TA
    > laceupsolutions.com> wrote:
    >> Unicode characters use two bytes.

