Monthly Archives: November 2009

Find the UTF-8 and UCS value of a character.

I write software that processes Chinese characters, so I have to deal with the UCS and UTF-8 encodings of Chinese characters.  It used to take me quite some time to find or calculate the UTF-8 and UCS values of a Chinese character, until I came across an article about VIM.

I’ve been using VIM for about 8 years, but I didn’t know that I could get the UCS and UTF-8 values of a character directly from VIM.  To get the values:

  • Place the cursor on the character and, in normal (command) mode, type `ga’ for the Unicode value or `g8’ for the UTF-8 value.

VIM saves the file in UTF-8 encoding (assuming its 'fileencoding' option is utf-8).  To see the content of the file in hex:

  • `hd filename’ or `hexdump -C filename’

The bytes shown will be the same as those reported by `g8’.
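As a cross-check outside VIM, the same two values can be computed in a few lines of JavaScript.  This is only a rough sketch: the helper names `ucsHex` and `utf8Hex` are made up for illustration, and it assumes a runtime that provides `TextEncoder` (modern browsers and Node.js do).

```javascript
// Hypothetical helpers for illustration; not part of VIM or any library.

// Code point of the first character as hex (the "Hex" value that ga reports).
function ucsHex(ch) {
    return ch.codePointAt(0).toString(16);
}

// UTF-8 bytes of the string as space-separated hex (what g8 reports).
function utf8Hex(s) {
    var bytes = new TextEncoder().encode(s);
    return Array.from(bytes, function (b) {
        return b.toString(16).padStart(2, "0");
    }).join(" ");
}

var ni = "\u4f60"; // the character 你, written as an escape to avoid source-encoding issues

console.log(ucsHex(ni));  // 4f60
console.log(utf8Hex(ni)); // e4 bd a0
```

The three bytes printed by `utf8Hex` are the ones that `hexdump -C` shows for a file containing only that character (plus a trailing newline, if the editor adds one).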

The first image shows the UTF-8 value of the Chinese character ‘你’.  The second image shows the UCS value of the same character.



My latest conky setup looks great, doesn’t it?  Too bad WordPress does not allow me to upload my conkyrc and 2 helper scripts.

Issue with java.sql.Time + GWT + browsers in Linux and Windows.

Yesterday, I ran into a problem at work.  On a Linux machine, I have a MySQL database table with a column of type TIME.  I use the following Java code to retrieve the value:

Time t = rs.getTime("columnName"); // rs is a ResultSet.

The Time object is sent via HTTP to a GWT web application.  The time is displayed correctly in Firefox on Linux (00:00:00) but not in Firefox or IE on Windows (00:30:00).

No matter what value is in the database, the browsers on Windows display it with an additional 30 minutes.

My search started with Java, the JVM, and tzdata.  After more than a day of testing, trial and error, I finally found that the culprit is the JavaScript engine.  To understand the whole story, I must start with Java.

Based on the Javadoc, java.sql.Time is actually a Date object (it extends java.util.Date).  The date-related fields should be set to 1970-Jan-01 and left untouched.  Only the time part is (or should be) used.  When the value is read from the database, a Time object is created.  It is set to 1970-Jan-01 in the local time zone (Asia/Kuala_Lumpur).  Based on the tzdata, the offset for “Asia/Kuala_Lumpur” on 1970-Jan-01 is only +7:30 instead of +8:00.  Thus, the Time object will have the value -27000000 milliseconds (which means 00:00:00 +7:30 on 1970-Jan-01, or 16:30:00 +0:00 on the previous day).
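The value -27000000 can be checked with plain arithmetic.  The sketch below is in JavaScript rather than Java, the function name `midnightEpochMillis` is hypothetical, and it assumes a fixed offset given in minutes east of UTC instead of looking anything up in tzdata.

```javascript
// Hypothetical helper: epoch milliseconds of 1970-01-01 00:00:00 local time
// for a zone that is offsetMinutes east of UTC (fixed offset, no tzdata).
function midnightEpochMillis(offsetMinutes) {
    var utcMidnight = Date.UTC(1970, 0, 1, 0, 0, 0); // 0 by definition of the epoch
    return utcMidnight - offsetMinutes * 60 * 1000;
}

console.log(midnightEpochMillis(7 * 60 + 30)); // -27000000 (offset +7:30)
console.log(midnightEpochMillis(8 * 60));      // -28800000 (offset +8:00)
```

So a stored time of 00:00:00 only becomes -27000000 when the +7:30 offset is applied; with +8:00 it would have been -28800000.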

When this value is sent to the GWT application, the JavaScript engine in a browser on Linux correctly uses the +7:30 offset.  Thus, -27000000 plus +7:30 is 00:00:00.  But in both Windows browsers, the offset used is +8:00.  Thus, -27000000 plus +8:00 is 00:30:00.
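The 30-minute difference falls out of the same arithmetic.  Here is a minimal sketch, assuming fixed offsets in minutes east of UTC; `wallClock` is a hypothetical helper, not something GWT or the browser provides.

```javascript
// Hypothetical helper: render epoch milliseconds as HH:MM:SS on a clock that
// is offsetMinutes east of UTC (fixed offset, no tzdata lookup).
function wallClock(epochMillis, offsetMinutes) {
    var shifted = new Date(epochMillis + offsetMinutes * 60 * 1000);
    // Reading the UTC fields of the shifted instant keeps the result
    // independent of the time zone of the machine running this script.
    return shifted.toISOString().substring(11, 19);
}

var t = -27000000; // the value carried by the Time object

console.log(wallClock(t, 7 * 60 + 30)); // 00:00:00 (offset +7:30)
console.log(wallClock(t, 8 * 60));      // 00:30:00 (offset +8:00)
```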

To prove this, create an HTML file with the following content and open it with Firefox or Epiphany on Linux, then do the same with IE or Firefox on Windows.  Maybe this is another reason to use Linux instead of Windows:

    <script type="text/javascript">
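        // getTimezoneOffset() is UTC minus local time in minutes (negative east of UTC).
        var d = new Date();
        document.write("Current Date offset (min) = ");
        document.write(d.getTimezoneOffset());
        document.write("<br>1970-Jan-01 offset (min) = ");
        document.write(new Date(1970, 0, 1).getTimezoneOffset());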