UTF-8 to Unicode in C

Hi, sorry for the long time without new posts.

This is a routine I wrote for decoding Unicode code points from UTF-8 data.

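#include <stdint.h>
typedef uint8_t u8; //Assumption: skip these two typedefs if your SDK already defines u8 and u32.
typedef uint32_t u32;

//Decodes one UTF-8 character starting at src2. Returns its code point and stores the number of bytes consumed in *nbyte.
u32 UTF8_To_Unicode(u8* src2, u8* nbyte){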
u8* src=src2;
*nbyte=1;
if(*src==0)return 0;
if((*src&128)==0)return *src;
u32 value=0;
u8 curbit=7;
*nbyte=0;
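while(curbit>0 && (*src&(1<<curbit))){ //Count the leading 1 bits of the first byte.
(*nbyte)++; //Increment the value, not the pointer.
curbit--;
}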
if(*nbyte==0x02){
value|=(*src&31)<<6;
src++;
if((*src&192)!=128){
*nbyte=(u8)(src-src2);
return 32;
}
value|=(*src&63);
}else if(*nbyte==0x03){
value|=(*src&15)<<12;
src++;
if((*src&192)!=128){
*nbyte=(u8)(src-src2);
return 32;
}
value|=(*src&63)<<6;
src++;
if((*src&192)!=128){
*nbyte=(u8)(src-src2);
return 32;
}
value|=(*src&63);
}else if(*nbyte==0x04){
value|=(*src&7)<<18;
src++;
if((*src&192)!=128){
*nbyte=(u8)(src-src2);
return 32;
}
value|=(*src&63)<<12;
src++;
if((*src&192)!=128){
*nbyte=(u8)(src-src2);
return 32;
}
value|=(*src&63)<<6;
src++;
if((*src&192)!=128){
*nbyte=(u8)(src-src2);
return 32;
}
value|=(*src&63);
}else{
*nbyte=1; //Stray continuation byte or invalid lead byte: skip one byte.
return 32;
}
return value;
}
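
To see what the three-byte branch computes, here is a minimal sketch that decodes only a three-byte sequence with the same masks and shifts as the routine above. The helper name decode_three_bytes is hypothetical, it uses the <stdint.h> types instead of u8/u32, and it does no validation at all.

```c
#include <stdint.h>

/* Hypothetical helper: combines the payload bits of a three-byte
   UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx). No validation is done. */
uint32_t decode_three_bytes(const uint8_t *s)
{
    uint32_t value = 0;
    value |= (uint32_t)(s[0] & 15) << 12; /* low 4 bits of the lead byte */
    value |= (uint32_t)(s[1] & 63) << 6;  /* 6 payload bits of the second byte */
    value |= (uint32_t)(s[2] & 63);       /* 6 payload bits of the third byte */
    return value;
}
```

For the bytes 0xE2 0x82 0xAC this gives 0x2000 | 0x80 | 0x2C = 0x20AC, which is the code point of €.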

To use it just do something like this:

u8 buffer[3]={0xE2, 0x82, 0xAC}; //UTF-8 encoding of €

u8 nbyte; //Receives the byte length of the UTF-8 character; use it to jump to the next character in a UTF-8 sequence.

u32 unicode = UTF8_To_Unicode(buffer, &nbyte); //The Unicode code point of your UTF-8 character.
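
The byte length is what lets you walk through a whole string. Below is a rough sketch of such a loop. To keep it self-contained it uses a compact stand-in decoder, decode_one, which is hypothetical and only assumed to follow the same contract as the routine above (code point returned, bytes consumed in *nbyte, 32 on a malformed sequence). The name decode_string is also made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact decoder: returns the code point, stores the bytes
   consumed in *nbyte, and returns 32 on a malformed sequence. */
static uint32_t decode_one(const uint8_t *src, uint8_t *nbyte)
{
    uint8_t lead = src[0];
    uint8_t len, i;
    uint32_t value;

    *nbyte = 1;
    if (lead < 0x80) return lead;              /* ASCII, including NUL */
    if ((lead & 0xE0) == 0xC0)      { len = 2; value = lead & 31; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; value = lead & 15; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; value = lead & 7; }
    else return 32;                            /* stray continuation or invalid lead byte */

    for (i = 1; i < len; i++) {
        if ((src[i] & 0xC0) != 0x80) {         /* not a continuation byte */
            *nbyte = i;                        /* bytes consumed before the bad one */
            return 32;
        }
        value = (value << 6) | (src[i] & 63);
    }
    *nbyte = len;
    return value;
}

/* Decodes a NUL-terminated UTF-8 string into out[], writing at most max
   code points. Returns the number of code points written. */
size_t decode_string(const uint8_t *src, uint32_t *out, size_t max)
{
    size_t count = 0;
    while (*src != 0 && count < max) {
        uint8_t nbyte;
        out[count++] = decode_one(src, &nbyte);
        src += nbyte;                          /* jump to the next character */
    }
    return count;
}
```

Because the byte count is always at least 1, the loop moves forward even over broken input.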

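That's all. Just one thing: remember that if the sequence is malformed, the function returns 32 (the SPACE character, ' ') and sets nbyte to the number of bytes it read before the bad one. A zero byte at the start returns 0.

Let me know if you find errors or have tips.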
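
Since 32 is also the result for a real space, a caller may want to tell the two cases apart. One possible check, assuming the caller still has the first byte of the sequence: a genuine space is always the single byte 0x20, so a result of 32 from any other first byte means something went wrong. The helper name utf8_result_is_error is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when a result of 32 came from the error
   path rather than from a real SPACE byte (0x20) in the input. */
int utf8_result_is_error(uint8_t first_byte, uint32_t result)
{
    return result == 32 && first_byte != 0x20;
}
```

This also flags overlong encodings of the space such as 0xC0 0xA0, which are not valid UTF-8 anyway.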
