Long vowels in Māori on web pages

Test line: [ ā ē ī ō ū Ā Ē Ī Ō Ū ]

Reading pages in Unicode

Some of my pages are written in Unicode, which allows readers to see 'non-standard' characters such as the macrons used to show long vowels in Māori (plus special characters from Eastern European languages, Cyrillic, Greek, and even Arabic or Chinese). It is being used more and more on the Internet as web designers realise that not everyone's language uses the Western European character set! Unfortunately there doesn't seem to be one standard way of doing things yet, so the suggestions below may not work on your system.

Most browsers should show my pages correctly, but if yours doesn't, try these steps (check the test line after each one). This set-up only needs to be done once, and it shouldn't interfere with the reading of pages written in the ordinary Western character set:

  1. Reload / refresh the page - sometimes the browser does not recognise the coding first time.
     
  2. Check that your browser's coding is set to 'UTF-8', 'Unicode', or 'Universal alphabet' - not 'Western', 'ISO-8859-1', or 'Western alphabet'. Use 'Auto-Select' if available. In IE, this is done from the View-Fonts or View-Encoding menu; in Netscape, try View-Encoding. (Don't change the default coding to Unicode; leave that as Western.) IE5 users should look up 'international languages' in the help menu index - there are good instructions there.
     
  3. Check which font your browser uses to display Unicode pages - you should probably set it to one of the 'standard' fonts such as Times (New Roman), Arial/Helvetica, or Lucida Sans for the variable-width font and Courier New or Andale Mono for fixed-width. You may need to try more than one.
    In Netscape, go to Edit-Preferences-Appearance-Fonts, scroll down 'For the encoding' to Unicode, and choose fonts. In IE4, choose View-Internet Options/Preferences-General-Fonts (button at bottom), scroll down 'Character sets' to Universal Alphabet, and choose fonts. In IE5, you will need to set fonts for each language separately, but the program usually does this automatically.
    More instructions on international language configuration are available here.
     
  4. If that doesn't work, you probably need an upgraded version of the font you use to view your web pages, containing all the necessary characters. The latest full versions of many popular fonts can be obtained free here, for both Mac and Windows (look for a WGL4 character set, which contains most of the Latin characters as well as Greek and Cyrillic). James Kass also offers a (larger) shareware demo font called Code2000 with support for lots of non-European alphabets or scripts as well as Latin characters.
     
  5. Failing that, probably the only alternative is upgrading your browser - sorry! I have tried to provide alternative uncoded versions of all my pages, but if I have missed any please let me know. I can always e-mail uncoded versions.

Test line: [ ā ē ī ō ū Ā Ē Ī Ō Ū ]

 

Writing pages in Unicode

It's no harder to write pages in Unicode than in normal Western encoding, although you will probably need to edit your HTML page as a text file and insert the special characters that you want to use by hand.

If you open up the source code of a web page, you will see that the characters not found on your keyboard are represented by numerical or letter codes. For example, if you look at the bottom of this page you will see that the line that displays as '© 1999-2001' is actually coded as &#169;&nbsp;1999-2001. The &#169; is the numerical code for the copyright symbol, while &nbsp; is the letter code for a non-breaking space.

The number codes of the standard Western character set (ISO-8859-1) run from 0 to 255 (some of which aren't used), which only allows for a small number of non-keyboard characters such as &#181; for µ and &#255; for ÿ. Unicode extends this enormously: its first 65,536 code numbers alone give room for accented letters from other languages using the Roman alphabet as well as characters from Greek, Cyrillic, Chinese, Japanese, Arabic etc., and the full Unicode range runs to more than a million. The numbers 0-255 from the Western set are carried over unchanged.

The reader does need special fonts to display many of these languages, but support packs are offered free for both Netscape and Internet Explorer 4.0 (and you don't need the Chinese set, for example, unless you intend to view pages written in Chinese). A range of extended Western fonts for both Windows and Mac can be obtained free as well – look for a WGL4 character set – from Microsoft or this alternative site, although most newer computers probably already have some extended fonts. James Kass also offers a shareware demo font called Code2000 with support for lots of non-European alphabets or scripts as well as Latin characters, although it's not as tidy to look at as the others.
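
To make the number ranges a bit more concrete, here is a small Python sketch (my own illustration, not something the page itself relies on). It looks up the code number of µ, ÿ and ā, and checks whether each one fits in the Western set. The helper name `describe` is made up for this example.

```python
import unicodedata

def describe(ch):
    """Return (code number, fits in ISO-8859-1?, UTF-8 bytes as hex) for one character."""
    code = ord(ch)
    try:
        ch.encode("iso-8859-1")
        fits_western = True
    except UnicodeEncodeError:
        fits_western = False
    return code, fits_western, ch.encode("utf-8").hex()

# Two characters from the Western set and one macronised vowel,
# written as escapes so the script itself stays plain ASCII.
for ch in ("\u00b5", "\u00ff", "\u0101"):
    code, fits, utf8_hex = describe(ch)
    print(f"{unicodedata.name(ch)}: code {code}, in Western set: {fits}, UTF-8 bytes: {utf8_hex}")
```

The micro sign (181) and y with diaeresis (255) both fit in the Western set, while a with macron (257) does not, which is why the long vowels need Unicode.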

When writing your page, you first need to put a META tag within the <HEAD> part of your page to show what kind of coding you are using. Pages written in Western encoding should have the following line, or something similar (although many don't bother with it):

<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

To make your pages display properly in Unicode, you need to change this to:

<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
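
If you want to check which coding a page declares, a rough Python sketch like the following can pull the charset out of the META tag. It assumes the tag is written much like the two examples above, and it uses a simple pattern match instead of a proper HTML parser. The names `META_CHARSET` and `declared_charset` are invented for this example.

```python
import re

# Rough pattern for the charset named in a META tag; a sketch, not a real HTML parser.
META_CHARSET = re.compile(r"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)

def declared_charset(html):
    """Return the charset a page declares in its META tag, lower-cased, or None."""
    match = META_CHARSET.search(html)
    return match.group(1).lower() if match else None

western = '<HEAD>\n<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">'
unicode_page = '<HEAD>\n<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">'

print(declared_charset(western))       # iso-8859-1
print(declared_charset(unicode_page))  # utf-8
```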

The Unicode encoding is then done simply by using the character code number in the same way as the code numbers from the standard character set. For the macronised vowels that you need to display Māori properly, the codes are:

&#256; = Ā
&#274; = Ē
&#298; = Ī
&#332; = Ō
&#362; = Ū
&#257; = ā
&#275; = ē
&#299; = ī
&#333; = ō
&#363; = ū
&#7733; = ḵ (underlined k, not in most fonts)

So, for example, to write the word Māori with a long 'a', you would change the 'a' in the word to the code &#257;, giving M&#257;ori.
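
If you have a lot of text to convert, typing the codes by hand gets tedious. As a minimal sketch, assuming you have Python available, the built-in 'xmlcharrefreplace' option does the same substitution automatically. The function name `to_char_refs` is just my label for it.

```python
import html

def to_char_refs(text):
    """Replace every non-ASCII character with its decimal code, e.g. a-macron -> &#257;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

word = "M\u0101ori"          # 'Maori' with a long a (U+0101), written as an escape
coded = to_char_refs(word)
print(coded)                 # M&#257;ori

# html.unescape turns the codes back into characters, roughly as a browser would.
assert html.unescape(coded) == word
```

Ordinary keyboard characters pass through unchanged, so it should be safe to run over a whole paragraph of text. Don't run it over a page that already contains codes unless you are sure none of them will be affected.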

There are plenty of Unicode resources elsewhere on the Internet:

  • The Unicode organisation itself, for technical information.
  • James Kass's excellent pages, mostly in plain English (a good place to visit if you have problems).
  • Alan Wood's guide to using Unicode, including where to find fonts.
     
  • The official code numbers are available from Unicode; their charts are in hexadecimal, but the glyph charts give graphic representations of all the characters.
  • Here are some charts in normal decimal numbering, also downloadable as a zipped file.
  • A good basic chart of the codes for the various European characters in the Times New Roman font (some other single font charts are also available).
  • My own chart of the characters available in the WGL4 character set fonts.
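
Since the official charts give their numbers in hexadecimal, you need to convert them to decimal before using them in the way described on this page. Here is a small Python sketch of that conversion. The function name `hex_to_char_ref` is made up for the example, and it assumes the chart value is given either bare ('0101') or with a 'U+' prefix.

```python
def hex_to_char_ref(hex_code):
    """Turn a hexadecimal chart value such as '0101' or 'U+0101' into '&#257;'."""
    cleaned = hex_code.strip()
    if cleaned.upper().startswith("U+"):
        cleaned = cleaned[2:]
    # int(..., 16) reads the digits as base 16 and gives the decimal code number.
    return f"&#{int(cleaned, 16)};"

for chart_value in ("0100", "U+0101", "1E35"):
    print(chart_value, "->", hex_to_char_ref(chart_value))
```

HTML 4 also defines a hexadecimal form of the code (such as &#x101;), but I would assume the decimal form is the safer choice for older browsers.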


© 1999-2001
Created: 9 September 1999 
Last modified: 6 July 2001 

Background artwork by Tutu Graphics 
