Why Encode Special Characters in HTML
Why do I need to write a character as an entity?
You might be asking yourself: why can't you just copy and paste the symbol itself? Why do you need its HTML entity code? Here's why: the browser may get the encoding wrong. One symbol might show up as another, or as an unknown character. It may even merge with the preceding character if the browser misreads it. I'm sure you wouldn't want that kind of user experience on your site.
Text symbols are stored as numbers, and browsers turn those numbers back into characters according to some encoding. When you write special characters outside the 7-bit ASCII range straight into your HTML, there are things to consider. If you paste multi-byte characters (in UTF-8 a non-ASCII character takes two to four bytes) right into your HTML code without "escaping" them (i.e. converting them to their entity codes), you might get into big trouble. Entities are written in plain ASCII, and ASCII bytes mean the same thing in nearly every encoding a browser might assume. Raw UTF-8 symbols do not: a browser that decodes the page with a single-byte encoding such as ISO-8859-1 will treat each byte of a multi-byte symbol as a separate character. So the safest habit is to keep your HTML source in plain ASCII: instead of writing symbols, write their entities.
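To make the failure concrete, here is a small Python sketch (the word "café" is just an illustrative sample). It encodes the text as UTF-8, decodes the bytes as ISO-8859-1 the way a misconfigured browser might, and then shows the ASCII-only entity form, which reads the same either way.

```python
text = "café"

# UTF-8 stores "é" (U+00E9) as two bytes: 0xC3 0xA9.
utf8_bytes = text.encode("utf-8")

# A browser that wrongly assumes ISO-8859-1 shows each byte as its own character.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # cafÃ©

# Replacing every non-ASCII character with a decimal numeric entity leaves
# pure ASCII, which decodes identically under either assumption.
escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)  # caf&#233;
```

Note that 233 is simply the Unicode code point of "é" in decimal; the entity does not depend on how the page is encoded.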
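That is also why you should define a content-type meta tag such as <meta http-equiv="content-type" content="text/html; charset=UTF-8">, which tells the browser which character set the page uses. (Plain ASCII text is also valid UTF-8, so declaring UTF-8 and writing ASCII with entities fit together.) More on this next.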
Character set meta-tag
The character set (charset) meta tag declares the encoding of the HTML document that your HTTP server sends.
I used to think that <meta http-equiv="Content-Type"> was a meta element meant to be interpreted only by the browser, like any ordinary HTML tag. However, the W3C's HTML 4.01 specification says HTTP servers may also use it to generate the matching Content-Type response header. If the "Content-Type" header has no "charset=" parameter, browsers often fall back to a default such as ISO-8859-1 (in practice usually its superset, windows-1252), and the non-ASCII symbols of a UTF-8 page will then display as the wrong characters. That can cause serious trouble.
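The fallback described above can be sketched in Python with the standard library's email.message.Message, which parses MIME-style headers. The helper name charset_from_content_type is hypothetical, and the ISO-8859-1 default only mirrors the behaviour described in the text; real browsers use more elaborate detection.

```python
from email.message import Message

def charset_from_content_type(value: str, default: str = "iso-8859-1") -> str:
    """Hypothetical helper: return the charset= parameter of a Content-Type
    value (lower-cased), or `default` when the parameter is missing."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset(default)

print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # iso-8859-1
```

The second call is the risky case: nothing tells the reader which encoding the page really uses, so it has to guess.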
Encode characters in URLs
Remember that browsers send request URLs to HTTP servers as plain ASCII. You should therefore always encode (escape) special characters in your URL paths; the reasoning is the same as for escaping them in HTML code.
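In URLs the escaping is percent-encoding rather than HTML entities: each byte of the character's UTF-8 form becomes %XX. A minimal Python sketch with urllib.parse.quote, using a made-up file path for illustration:

```python
from urllib.parse import quote, unquote

# Hypothetical path containing a non-ASCII letter and a space.
path = "/files/café menu.html"

# quote() keeps "/" by default and percent-encodes the UTF-8 bytes of
# everything that is not allowed in a URL path.
encoded = quote(path)
print(encoded)  # /files/caf%C3%A9%20menu.html

# Decoding restores the original path.
print(unquote(encoded) == path)  # True
```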
A few days ago I found out that you should also encode the "&" symbol as "&amp;" when you write a URL inside a link (for example, in an href attribute). This is because every HTML character entity starts with "&". So if you write the path to your PHP file (or whatever) like this: "http://example.com/?a=1&b=2", the browser may think that by "&b" you meant the start of a character entity. Written as "http://example.com/?a=1&amp;b=2" inside the href, it is read back as the original URL.
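A short Python sketch of the two separate steps: urllib.parse.urlencode builds the query string with a raw "&", and html.escape then turns it into "&amp;" for the HTML attribute. The link text is only a placeholder.

```python
import html
from urllib.parse import urlencode

# Step 1: build the URL; urlencode joins the key=value pairs with a raw "&".
url = "http://example.com/?" + urlencode({"a": 1, "b": 2})
print(url)  # http://example.com/?a=1&b=2

# Step 2: escape the URL for the HTML attribute, so "&" becomes "&amp;".
link = '<a href="{}">link</a>'.format(html.escape(url))
print(link)  # <a href="http://example.com/?a=1&amp;b=2">link</a>
```

Keeping the steps in this order matters: escaping for HTML first and URL-encoding afterwards would percent-encode the "&amp;" itself.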
This is why you should "escape" special characters in URLs.
In short, the habits to build are:
- Always define a content-type meta tag.
- Always "escape" non-ASCII characters in HTML code.
- Always "escape" non-ASCII characters in URLs, and write the ampersand (&) as "&amp;" in links.
Buttons for HTML entities
Enter the text with the special characters you want to convert.
If you want tags like <b> to be turned into &lt;b&gt; and symbols like ® to be encoded as well (&#174;), press the button.
If you want to keep tags and existing symbol codes, press . This will keep markup like <div class="bla">₧</div> as it is and will convert partially converted text (like turn
You can also use the button to keep converted symbols as in the example while still converting the "<" and ">" of tags to their entities.
You can also switch from decimal entity codes to hex (base 16) codes by checking the "Decimal / Hex" checkbox.
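For readers who want to do the same conversion in their own scripts, here is a minimal Python sketch of the "convert everything" mode. It is not the tool's actual code; the function name to_entities and the hex_codes flag are hypothetical. Markup characters get named entities, and every non-ASCII character becomes a decimal or hex numeric reference.

```python
import html

# Markup characters get their named entities, as in the <b> example above.
NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

def to_entities(text: str, hex_codes: bool = False) -> str:
    """Hypothetical converter: escape markup characters and turn every
    non-ASCII character into a decimal (&#174;) or hex (&#xAE;) reference."""
    out = []
    for ch in text:
        if ch in NAMED:
            out.append(NAMED[ch])
        elif ord(ch) > 127:
            code = ord(ch)
            out.append(f"&#x{code:X};" if hex_codes else f"&#{code};")
        else:
            out.append(ch)
    return "".join(out)

sample = "<b>®</b>"
print(to_entities(sample))                  # &lt;b&gt;&#174;&lt;/b&gt;
print(to_entities(sample, hex_codes=True))  # &lt;b&gt;&#xAE;&lt;/b&gt;
print(html.unescape(to_entities(sample)) == sample)  # True
```

The "keep tags" modes described above would need an HTML-aware pass and are not covered by this sketch.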
Extract symbol codes with Enty
You can use this tool to extract the decimal or hex codes of any symbols you want. Some of these codes can also be used to type symbols from the keyboard; you can find a few ways in Keyboard symbols. To get the codes, paste the symbols into my tool and hit . In the result, the number between "&#" and ";" is the decimal code of the corresponding symbol, and the number between "&#x" and ";" is its hex code. You'll figure it all out fast, I'm sure. ;)
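The same extraction can be sketched in Python with a regular expression. The helper name extract_codes is hypothetical; it reads both forms described above ("&#" + decimal + ";" and "&#x" + hex + ";") and returns the code points as integers.

```python
import re

# Matches decimal (&#9829;) and hex (&#x2665; or &#X2665;) references.
NUMERIC_REF = re.compile(r"&#([xX][0-9A-Fa-f]+|[0-9]+);")

def extract_codes(text: str) -> list[int]:
    """Hypothetical helper: return the code point of every numeric
    character reference found in text."""
    codes = []
    for match in NUMERIC_REF.finditer(text):
        ref = match.group(1)
        codes.append(int(ref[1:], 16) if ref[0] in "xX" else int(ref))
    return codes

codes = extract_codes("I &#9829; HTML &#x2665;")
print(codes)                # [9829, 9829]
print(chr(codes[0]))        # ♥
print(f"&#x{codes[0]:x};")  # &#x2665;
```

Both references in the sample point at the same heart symbol, since 0x2665 in hex is 9829 in decimal.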
Why do webmasters need it?
Because there are many different encodings in this world, you can't just copy and paste special text symbols into HTML source code and be sure your page will display them correctly. If your text editor saves a symbol in encoding number one and your user's browser assumes encoding number two, the symbol comes out wrong. That is why webmasters stick to the most compatible characters (plain ASCII, with entities for everything else) in their HTML code and scripts.