Why Encode Special Characters in HTML

Why do I need to write the character's entity?

You might be asking yourself: why can't I just paste the symbol itself? Why do I need its HTML entity code? I'll tell you why: the browser may get the encoding wrong. One symbol might show up as another, or as an unknown character. It may even get mixed up with the character before it. I'm sure you wouldn't want that kind of user experience on your site.

Text symbols are stored as numbers (bytes), and browsers turn those numbers back into characters according to some encoding. When you write special characters outside the standard 7-bit ASCII range straight into your HTML, there are things to consider. If you paste multi-byte characters (a symbol like ♪ takes 3 bytes in UTF-8) right into your HTML code without "escaping" them (that is, converting them to their entity codes), you might get into big trouble. An HTML file can be saved in many encodings, but the browser has to know, or guess, which one was used. If it guesses wrong, it may treat your multi-byte symbol as several different symbols. Plain ASCII characters, on the other hand, are read the same way in practically every encoding used on the web. So the safest habit is to write your HTML source code in plain ASCII: instead of writing symbols, write their entities.
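To see what "several different symbols" looks like, here is a minimal sketch in JavaScript. The helper name misreadUtf8AsLatin1 is my own invention for illustration; it only imitates a reader that takes UTF-8 bytes one at a time, the way a single-byte charset such as ISO-8859-1 would.

```javascript
// Hypothetical helper (not part of any library): decode UTF-8 bytes
// one byte at a time, as a single-byte charset like ISO-8859-1 would.
function misreadUtf8AsLatin1(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
  // In ISO-8859-1 every byte value maps directly to one character.
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

const note = "\u266a"; // the eighth-note symbol, 3 bytes in UTF-8
const garbled = misreadUtf8AsLatin1(note);

console.log(note.length, garbled.length); // 1 3
console.log(misreadUtf8AsLatin1("plain ASCII")); // plain ASCII
```

One symbol goes in and three characters come out, while the ASCII text survives untouched. That is the whole argument for entities in a nutshell.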

That's why you should also remember to define a content-type meta tag like <meta http-equiv="content-type" content="text/html; charset=UTF-8">. It specifies the charset. More on this next.

Character set meta-tag

The character set (or charset) meta tag specifies the encoding of the data that your HTTP server sends. I used to think <meta http-equiv="Content-Type"> was a meta element meant to be interpreted by the browser only, just like an ordinary HTML tag. According to the WWW Consortium, it can also help your HTTP server generate an appropriate Content-Type header. If the Content-Type header is missing its "charset=" parameter, browsers usually fall back to a default such as ISO-8859-1. And that may cause serious trouble.
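Here is a rough sketch of how a client could read the charset out of a Content-Type value. The function name charsetOf and the FALLBACK constant are assumptions made for this example; the fallback simply mirrors the default described above, and real browsers may apply other heuristics.

```javascript
// Assumed default for this sketch only; it follows the article's claim,
// not any particular browser's actual behaviour.
const FALLBACK = "iso-8859-1";

// Hypothetical helper: extract the charset parameter from a Content-Type value.
function charsetOf(contentType) {
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : FALLBACK;
}

console.log(charsetOf("text/html; charset=UTF-8")); // utf-8
console.log(charsetOf("text/html"));                // iso-8859-1
```

The second call shows the trouble spot: with no "charset=" parameter, the client is left to its default.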



Encode characters in URLs

Remember that URLs are sent to HTTP servers as ASCII text. You should always encode (escape) special text characters in your URL paths, for the same reason you should escape them in HTML code. In a URL the escaping is done with percent signs rather than entities: é becomes %C3%A9, for example.
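JavaScript has this kind of URL escaping built in, so a quick sketch needs no helpers. The example.com addresses and the sample values are made up for illustration.

```javascript
// encodeURI keeps URL delimiters such as ":", "/", "?" and "&",
// and percent-encodes everything else as UTF-8 bytes.
const url = encodeURI("http://example.com/caf\u00e9/?q=\u266a");
console.log(url); // http://example.com/caf%C3%A9/?q=%E2%99%AA

// encodeURIComponent also encodes the delimiters, so it is the one
// to use for a single parameter value.
const value = encodeURIComponent("rock & roll");
console.log(value); // rock%20%26%20roll
```

Use encodeURI for a whole address and encodeURIComponent for one piece of it; mixing them up either leaves "&" unescaped inside a value or breaks the slashes in the path.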


Several days ago I found out that you should also write the "&" symbol as "&amp;" when you put a URL into a link in your HTML. This is because all HTML entities start with "&". So if you write the path to your PHP file, or whatever, like this: "//example.com/?a=1&b=2", your browser may think that "&b" is the start of an entity for some special character.
This is why you should try to "escape" special characters in URLs.
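A minimal sketch of that ampersand habit, assuming the link is built as a string in JavaScript. The helper name escapeAttribute is hypothetical, not a built-in.

```javascript
// Hypothetical helper: escape the characters that matter inside a
// double-quoted HTML attribute value.
function escapeAttribute(value) {
  return value
    .replace(/&/g, "&amp;") // must run first, or the entities added below get re-escaped
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const href = "//example.com/?a=1&b=2";
const link = '<a href="' + escapeAttribute(href) + '">link</a>';
console.log(link); // <a href="//example.com/?a=1&amp;b=2">link</a>
```

The browser turns "&amp;" back into "&" before it requests the page, so the server still receives a=1&b=2.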



So, in quick summary, the habits you need to form are:

  1. Always define a content-type meta tag.
  2. Always "escape" non-ASCII characters in HTML code.
  3. Always "escape" non-ASCII characters and the ampersand (&) in URLs.

Entity Tool - Escape special HTML & JavaScript character entities

Mainly, this is a JavaScript tool for webmasters who want to put some symbols or non-English text into their website's HTML code or scripts. But anyone who wants to find out a character's decimal or hex entity can use this great tool too.

Enty





Buttons for HTML entities

Enter text with special characters you want to convert.

If you want tags like <b> to be transformed into &lt;b&gt; and existing symbol codes like &#174; to be encoded as well (&amp;#174;), press the button that converts everything.

If you want to keep tags and existing symbol codes, press the button that preserves them. This keeps markup like <div class="bla">&#8359;</div> as it is and finishes partially converted text (for example, it turns &#x266a;♪ into &#x266a;&#x266a;).

There is also a button that keeps already-converted symbol codes as they are, like in the example above, but still converts the "<" and ">" of tags to their entities.

You can also switch from decimal entity codes to hex (base 16) symbol codes. Use the "Decimal / Hex" check box for that.
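For the curious, the conversions described above can be sketched roughly like this. It is my own minimal illustration, not the tool's actual source; the function name toEntities and the option names hex, encodeTags and encodeAmpersands are assumptions.

```javascript
// Hypothetical sketch of an entity converter; names are illustrative only.
function toEntities(text, { hex = false, encodeTags = false, encodeAmpersands = false } = {}) {
  let out = text;
  // Ampersands go first so the entities added afterwards stay intact.
  if (encodeAmpersands) out = out.replace(/&/g, "&amp;");
  if (encodeTags) out = out.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  // The "u" flag makes the regex walk whole code points, so characters
  // beyond U+FFFF become one entity instead of two surrogate halves.
  return out.replace(/[^\x00-\x7f]/gu, (ch) => {
    const code = ch.codePointAt(0);
    return hex ? "&#x" + code.toString(16) + ";" : "&#" + code + ";";
  });
}

// Convert everything, including tags and existing codes:
console.log(toEntities("<b>&#174;</b>", { encodeTags: true, encodeAmpersands: true }));
// &lt;b&gt;&amp;#174;&lt;/b&gt;

// Keep tags and existing codes, finish partially converted text, hex output:
console.log(toEntities("&#x266a;\u266a", { hex: true })); // &#x266a;&#x266a;

// Keep existing codes but still convert the tag brackets:
console.log(toEntities("<i>&#174;</i>", { encodeTags: true })); // &lt;i&gt;&#174;&lt;/i&gt;
```

With no options at all, only the non-ASCII characters change, so toEntities("\u266a") gives &#9834;.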

JavaScript entities

You can switch to converting symbols into JavaScript entities (escape sequences) like "\ufe31\&". Just disable the "HTML / JavaScript" check box for that. I'm not so sure, but I think this works for Adobe Flash's ActionScript too.
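A minimal sketch of the JavaScript flavour, again with a made-up helper name (toJsEscapes) and not taken from the tool itself:

```javascript
// Hypothetical helper: replace every non-ASCII UTF-16 code unit with \uXXXX.
function toJsEscapes(text) {
  // No "u" flag here on purpose: characters beyond U+FFFF are emitted as
  // two surrogate escapes, which is valid in any JavaScript string literal.
  return text.replace(/[^\x00-\x7f]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const escaped = toJsEscapes("caf\u00e9 \ufe31");
console.log(escaped); // caf\u00e9 \ufe31
```

Paste the result between quotes in a script and the engine rebuilds the original characters, whatever encoding the script file is served in.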

Extract symbol codes with Enty

You can use this tool to extract the decimal or hex codes of the symbols you want. There are some ways to type symbols from the keyboard with these codes; you can find some in Keyboard symbols. To get the symbols' codes, copy the symbols into my tool and hit the convert button. In the result, the text locked between "&#" and ";" is the decimal code of the corresponding symbol, and the text between "&#x" and ";" is its hex code. You'll figure it all out fast, I'm sure. ;)

Why do webmasters need it?

Because there are many different encoding systems in this world, you can't just copy and paste special text symbols into your HTML source code and be happy with how your page displays them. You save the page in encoding number one in your text editor, and your user's browser thinks it's encoding number two. So webmasters have to use only the most compatible symbols in their HTML code and scripts.

With the help of this tool you can convert non-ASCII text symbols into compatible ASCII text, using the HTML entities or JavaScript symbol codes of these special characters. From now on there's no need for those ugly HTML entity reference tables. You don't have to search for several minutes for the character you want. And in a large share of cases I'm sure it wasn't even there, 'cause these references are always more about Greek and geek signs, copyright symbols, etc. I know, 'cause I made such a table myself some time ago, but it's no good if you're looking for some cool symbols (by the way, there's a hell of a lot of superb ones in my collection of Facebook Symbols). *Evil laugh* Now it's time for the special character converter. Just copy the symbols you want into the input box and voila! Here you go! In just a few seconds: real kung fu grip!

You can even copy a whole text that contains symbols, and the tool will transform the special characters only, so you can turn UTF-8 text into ASCII-safe HTML or JavaScript. You can even copy the result and convert it once again to get the code of that previous result (you know, it's sometimes useful in JS). Make sure you bookmark this one if you're a web developer, a JavaScript writer, or if you have some kind of website-building hobby, 'cause I find it super handy myself. ;) I use this thing a lot, really.

Article updated on 2010-12-13 19:12:22