Snippet #6: Converting Internationalized Domain Names with LineRate
Modern browsers convert Internationalized Domain Names (IDNs) to the set of ASCII characters permitted in the Domain Name System before name resolution. The encoding used is called Punycode and is defined in RFC 3492. For example, the UTF-8 name “日本語.jp” is converted to “xn--wgv71a119e.jp”, and the conversion can be reversed. The ASCII form is also what appears in the HTTP Host request header, for example “Host: xn--wgv71a119e.jp”.
The Punycode Node.js module is bundled with LineRate. Its toASCII() method converts a UTF-8 domain name to ASCII, and toUnicode() does the reverse. The module is handy when you want to display readable UTF-8 domain names or compare Host header values against UTF-8 names.
The snippet below converts Host header values to UTF-8 and writes them to the console.
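To see the two methods in isolation, here is a minimal round-trip sketch that should run in plain Node.js as well as in LineRate. It assumes only that require('punycode') resolves to the Punycode module; recent Node.js releases mark the built-in module as deprecated and may print a warning. The variable names are illustrative.

```javascript
'use strict';

var puny = require('punycode');

// UTF-8 name -> ASCII (Punycode) form, as sent in DNS queries and Host headers
var ascii = puny.toASCII('日本語.jp');

// ASCII form -> UTF-8 name, suitable for display
var unicode = puny.toUnicode(ascii);

// Names that are already plain ASCII pass through unchanged
var plain = puny.toASCII('example.com');

console.log(ascii);   // xn--wgv71a119e.jp
console.log(unicode); // 日本語.jp
console.log(plain);   // example.com
```

Only labels containing non-ASCII characters gain the “xn--” prefix, so it is safe to pass every host name through toASCII() regardless of whether it is internationalized.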
'use strict';

var fp = require('lrs/forwardProxyModule');
var puny = require('punycode');

var proc = function(servReq, servResp, cliReq) {
  try {
    var host_puny = servReq.headers['Host'];
    var host_utf8 = puny.toUnicode(host_puny);
    console.log(host_puny + ' -> ' + host_utf8);
  }
  catch(e) {
    // do nothing
  }
  cliReq();
}

fp.on('exist', 'fp', function(fpo) {
  console.log(fpo.id + ' exists.');
  fpo.on('request', proc);
});
Here are some examples:
LROS: xn--wgv71a119e.jp → 日本語.jp
LROS: xn--6krz9fba47sz4d44x8h7asr0c.tw → 國立暨南國際大學.tw
LROS: xn--fhqu4ykwbs65a.cn → 上海大学.cn
LROS: xn--9d0bw1iq6js1kwhq.kr → 전북대학교.kr
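Besides logging, the module can be used to compare a Host header value against names you keep in UTF-8. The following is a sketch under stated assumptions, not part of the original snippet: allowedNames, normalizeHost() and isAllowedHost() are hypothetical names introduced for illustration, and the optional port is stripped with a simple regular expression that does not handle IPv6 literals. Both sides are normalized to lowercase ASCII before comparison.

```javascript
'use strict';

var puny = require('punycode');

// Hypothetical list of names maintained in readable UTF-8 form
var allowedNames = ['日本語.jp', '上海大学.cn'];

// Normalize a host name to lowercase ASCII, dropping an optional ":port"
function normalizeHost(host) {
  var name = String(host).replace(/:\d+$/, '');
  return puny.toASCII(name).toLowerCase();
}

// Return true if the Host header value matches one of the UTF-8 names
function isAllowedHost(hostHeader) {
  if (typeof hostHeader !== 'string') {
    return false; // missing or malformed header
  }
  var target = normalizeHost(hostHeader);
  return allowedNames.some(function(name) {
    return normalizeHost(name) === target;
  });
}

console.log(isAllowedHost('xn--wgv71a119e.jp'));      // true
console.log(isAllowedHost('xn--wgv71a119e.jp:8080')); // true
console.log(isAllowedHost('example.com'));            // false
```

Inside a request handler such as proc above, the same check could be applied to the Host header value before calling cliReq().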