4 ¥ · £ · € · $ · ¢ · ₡ · ₢ · ₣ · ₤ · ₥ · ₦ · ₧ · ₨ · ₩ · ₪ · ₫ · ₭ · ₮ · ₯
7 The Kermit Project - Columbia University <index.html>
9 fdc@columbia.edu <mailto:fdc@columbia.edu>
11 /Last update:/ Sun Jun 12 20:24:10 2005
13 ------------------------------------------------------------------------
18 UTF-8 is an ASCII-preserving encoding method for Unicode <unicode.html>
19 (ISO 10646), the Universal Character Set (UCS). The UCS encodes most of
20 the world's writing systems in a single character set, allowing you to
21 mix languages and scripts within a document without needing any tricks
22 for switching character sets. This web page is encoded directly in UTF-8.
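Here is a minimal Python sketch of what "ASCII-preserving" means. It is an illustration only, not one of the tools used to build this page, and the helper name `utf8_report` is hypothetical. ASCII text encodes to exactly the same bytes in UTF-8. Every byte of a multi-byte sequence has its high bit set, so it can never be mistaken for an ASCII character.

```python
def utf8_report(text: str) -> list[tuple[str, str]]:
    """Pair each character's U+ label with its UTF-8 bytes, shown in hex."""
    return [(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" ")) for ch in text]


# ASCII text is byte-for-byte identical in ASCII and UTF-8.
assert "Kermit".encode("utf-8") == "Kermit".encode("ascii")

# Non-ASCII characters become multi-byte sequences whose bytes are all >= 0x80.
for ch in "£€ᚠ":
    assert all(byte >= 0x80 for byte in ch.encode("utf-8"))

for label, hex_bytes in utf8_report("A£€ᚠ"):
    print(label, hex_bytes)
# U+0041 41
# U+00A3 c2 a3
# U+20AC e2 82 ac
# U+16A0 e1 9a a0
```

Apart from the ASCII dollar sign, the currency symbols in the subtitle above are 2- or 3-byte sequences of this kind.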
24 As shown HERE <glass.html>, Columbia University's Kermit 95 <k95.html>
25 terminal emulation software can display UTF-8 plain text in Windows 95,
98, ME, NT, 2000, or XP when using a monospace Unicode font like Andale
Mono WT J <http://www.monotype.com> or Everson Mono Terminal
<http://www.evertype.com/emono/>, or the less fully populated Courier New,
29 Lucida Console, or Andale Mono. C-Kermit <ckermit.html> can handle it
30 too, if you have a Unicode display
31 <http://www.cl.cam.ac.uk/~mgk25/unicode.html>. As many languages as are
32 representable in your font can be seen on the screen at the same time.
This, however, is a Web page. Some Web browsers can handle UTF-8 and
some can't, and those that can might not have a sufficiently populated
font to work with (some browsers might pick glyphs dynamically from
multiple fonts; Netscape 6 seems to do this). For a survey of Unicode
fonts, see <http://www.alanwood.net/unicode/fonts.html>.
41 The subtitle above shows currency symbols of many lands. If they don't
42 appear as blobs, we're off to a good start!
45 ------------------------------------------------------------------------
48 From the Anglo-Saxon Rune Poem <http://www.ragweedforge.com/poems.html>
51 ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
52 ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ
53 ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬
From Laȝamon's /Brut/ <http://mesl.itd.umich.edu/b/brut/> (/The
Chronicles of England/, Middle English, West Midlands):
58 An preost wes on leoden, Laȝamon was ihoten
59 He wes Leovenaðes sone -- liðe him be Drihten.
60 He wonede at Ernleȝe at æðelen are chirechen,
61 Uppen Sevarne staþe, sel þar him þuhte,
62 Onfest Radestone, þer he bock radde.
64 (The third letter in the author's name is Yogh, missing from many fonts;
65 CLICK HERE <st-erkenwald.html> for another Middle English sample with
66 some explanation of letters and encoding).
68 From the Tagelied of *Wolfram von Eschenbach*
69 <http://gutenberg.spiegel.de/autoren/eschenba.htm> (Middle High German):
71 Sîne klâwen durh die wolken sint geslagen,
72 er stîget ûf mit grôzer kraft,
73 ich sih in grâwen tägelîch als er wil tagen,
74 den tac, der im geselleschaft
75 erwenden wil, dem werden man,
76 den ich mit sorgen în verliez.
77 ich bringe in hinnen, ob ich kan.
78 sîn vil manegiu tugent michz leisten hiez.
80 Some lines of *Odysseus Elytis*
81 <http://users.hol.gr/~artemis/odysseas_elytis.htm> (Greek):
83 Τη γλώσσα μου έδωσαν ελληνική
84 το σπίτι φτωχικό στις αμμουδιές του Ομήρου.
85 Μονάχη έγνοια η γλώσσα μου στις αμμουδιές του Ομήρου.
The first stanza of /The Bronze Horseman/ by *Pushkin*
<http://www.ocf.berkeley.edu/%7Eleong/Russkaya%20Literatura/Aleksandr%20Sergeevich%20Pushkin.htm>
(Russian):
94 На берегу пустынных волн
95 Стоял он, дум великих полн,
96 И вдаль глядел. Пред ним широко
97 Река неслася; бедный чёлн
98 По ней стремился одиноко.
99 По мшистым, топким берегам
100 Чернели избы здесь и там,
101 Приют убогого чухонца;
102 И лес, неведомый лучам
103 В тумане спрятанного солнца,
From /Veṗxis Ṭq̇aosani/ (/The Knight in the Tiger's Skin/) by *Shota Rustaveli*
<http://www.compling.hu-berlin.de/~johannes/mxedruli/> (Georgian):
110 ვეპხის ტყაოსანი შოთა რუსთაველი
112 ღმერთსი შემვედრე, ნუთუ კვლა დამხსნას სოფლისა შრომასა, ცეცხლს, წყალსა
113 და მიწასა, ჰაერთა თანა მრომასა; მომცნეს ფრთენი და აღვფრინდე,
114 მივჰხვდე მას ჩემსა ნდომასა, დღისით და ღამით ვჰხედვიდე მზისა ელვათა
117 Tamil poetry of Cupiramaniya Paarathiyar, சுப்ரமணிய பாரதியார் (1882-1921):
119 யாமறிந்த மொழிகளிலே தமிழ்மொழி போல் இனிதாவது எங்கும் காணோம்,
120 பாமரராய் விலங்குகளாய், உலகனைத்தும் இகழ்ச்சிசொலப் பான்மை கெட்டு,
121 நாமமது தமிழரெனக் கொண்டு இங்கு வாழ்ந்திடுதல் நன்றோ? சொல்லீர்!
124 ------------------------------------------------------------------------
127 And from the sublime to the ridiculous, here is a certain phrase¹
128 <#notes> in an assortment of languages:
130 1. *Sanskrit*: काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥
2. *Sanskrit* /(standard transcription):/ kācaṃ śaknomyattum; nopahinasti mām.
133 3. *Classical Greek*: ὕαλον ϕαγεῖν δύναμαι· τοῦτο οὔ με βλάπτει.
134 4. *Greek*: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
136 5. *Latin*: Vitrum edere possum; mihi non nocet.
137 6. *Old French*: Je puis mangier del voirre. Ne me nuit.
138 7. *French*: Je peux manger du verre, ça ne me fait pas de mal.
139 8. *Provençal / Occitan*: Pòdi manjar de veire, me nafrariá pas.
140 9. *Québécois*: J'peux manger d'la vitre, ça m'fa pas mal.
141 10. *Walloon*: Dji pou magnî do vêre, çoula m' freut nén må.
142 *Champenois*: (NEEDED)
144 11. *Picard*: Ch'peux mingi du verre, cha m'foé mie n'ma.
146 12. *Kreyòl Ayisyen*: Mwen kap manje vè, li pa blese'm.
147 13. *Basque*: Kristala jan dezaket, ez dit minik ematen.
148 14. *Catalan*: Puc menjar vidre que no em fa mal.
149 15. *Spanish*: Puedo comer vidrio, no me hace daño.
16. *Aragones*: Puedo minchar beire, no me'n fa mal.
151 17. *Galician*: Eu podo xantar cristais e non cortarme.
152 18. *Portuguese*: Posso comer vidro, não me faz mal.
19. *Brazilian Portuguese* (7 <#notes>): Posso comer vidro, não me machuca.
155 20. *Caboverdiano*: M' podê cumê vidru, ca ta maguâ-m'.
156 21. *Papiamentu*: Ami por kome glas anto e no ta hasimi daño.
157 22. *Italian*: Posso mangiare il vetro e non mi fa male.
158 23. *Milanese*: Sôn bôn de magnà el véder, el me fa minga mal.
159 24. *Roman*: Me posso magna' er vetro, e nun me fa male.
160 25. *Napoletano*: M' pozz magna' o'vetr, e nun m' fa mal.
161 26. *Sicilian*: Puotsu mangiari u vitru, nun mi fa mali.
162 27. *Venetian*: Mi posso magnare el vetro, no'l me fa mae.
163 28. *Zeneise* /(Genovese):/ Pòsso mangiâ o veddro e o no me fà mâ.
164 *Rheto-Romance / Romansch*: (NEEDED)
165 *Romany / Tsigane*: (NEEDED)
166 29. *Romanian*: Pot să mănânc sticlă și ea nu mă rănește.
167 30. *Esperanto*: Mi povas manĝi vitron, ĝi ne damaĝas min.
170 31. *Cornish*: Mý a yl dybry gwéder hag éf ny wra ow ankenya.
171 32. *Welsh*: Dw i'n gallu bwyta gwydr, 'dyw e ddim yn gwneud dolur i mi.
172 33. *Manx Gaelic*: Foddym gee glonney agh cha jean eh gortaghey mee.
173 34. *Old Irish* /(Ogham):/ ᚛᚛ᚉᚑᚅᚔᚉᚉᚔᚋ ᚔᚈᚔ ᚍᚂᚐᚅᚑ ᚅᚔᚋᚌᚓᚅᚐ᚜
174 35. *Old Irish* /(Latin):/ Con·iccim ithi nglano. Ním·géna.
175 36. *Irish*: Is féidir liom gloinne a ithe. Ní dhéanann sí dochar ar
177 37. *Scottish Gaelic*: S urrainn dhomh gloinne ithe; cha ghoirtich i mi.
178 38. *Anglo-Saxon* /(Runes):/ ᛁᚳ᛫ᛗᚨᚷ᛫ᚷᛚᚨᛋ᛫ᛖᚩᛏᚪᚾ᛫ᚩᚾᛞ᛫ᚻᛁᛏ᛫ᚾᛖ᛫ᚻᛖᚪᚱᛗᛁᚪᚧ᛫ᛗᛖ᛬
179 39. *Anglo-Saxon* /(Latin):/ Ic mæg glæs eotan ond hit ne hearmiað me.
180 40. *Middle English*: Ich canne glas eten and hit hirtiþ me nouȝt.
181 41. *English*: I can eat glass and it doesn't hurt me.
182 42. *English* /(IPA):/ [aɪ kæn iːt glɑːs ænd ɪt dɐz nɒt hɜːt miː]
183 (Received Pronunciation)
184 43. *English* /(Braille):/ ⠊⠀⠉⠁⠝⠀⠑⠁⠞⠀⠛⠇⠁⠎⠎⠀⠁⠝⠙⠀⠊⠞⠀⠙⠕⠑⠎⠝⠞⠀⠓⠥⠗⠞⠀⠍⠑
185 44. *Lalland Scots / Doric*: Ah can eat gless, it disnae hurt us.
186 *Glaswegian*: (NEEDED)
187 45. *Gothic* (4 <#notes>): 𐌼𐌰𐌲 𐌲𐌻𐌴𐍃 𐌹̈𐍄𐌰𐌽, 𐌽𐌹 𐌼𐌹𐍃 𐍅𐌿
189 46. *Old Norse* /(Runes):/ ᛖᚴ ᚷᛖᛏ ᛖᛏᛁ ᚧ ᚷᛚᛖᚱ ᛘᚾ ᚦᛖᛋᛋ ᚨᚧ ᚡᛖ ᚱᚧᚨ ᛋᚨᚱ
190 47. *Old Norse* /(Latin):/ Ek get etið gler án þess að verða sár.
191 48. *Norsk / Norwegian (Nynorsk):* Eg kan eta glas utan å skada meg.
192 49. *Norsk / Norwegian (Bokmål):* Jeg kan spise glass uten å skade meg.
193 *Føroyskt / Faroese*: (NEEDED)
194 50. *Íslenska / Icelandic*: Ég get etið gler án þess að meiða mig.
195 51. *Svenska / Swedish*: Jag kan äta glas utan att skada mig.
196 52. *Dansk / Danish*: Jeg kan spise glas, det gør ikke ondt på mig.
197 53. *Soenderjysk*: Æ ka æe glass uhen at det go mæ naue.
198 54. *Frysk / Frisian*: Ik kin glês ite, it docht me net sear.
199 55. *Nederlands / Dutch*: Ik kan glas eten, het doet mij geen kwaad.
56. *Kirchröadsj/Bôchesserplat*: Iech ken glaas èèse, mer 't deet miech jing pieng.
202 57. *Afrikaans*: Ek kan glas eet, maar dit doen my nie skade nie.
58. *Lëtzebuergescht / Luxemburgish*: Ech kan Glas iessen, daat deet mir nët wei.
205 59. *Deutsch / German*: Ich kann Glas essen, ohne mir weh zu tun.
206 60. *Ruhrdeutsch*: Ich kann Glas verkasematuckeln, ohne dattet mich
61. *Lausitzer Mundart* ("Lusatian"): Ich koann Gloos assn und doas dudd merr ni wii.
210 62. *Odenwälderisch*: Iech konn glaasch voschbachteln ohne dass es mir
212 63. *Sächsisch / Saxon*: 'sch kann Glos essn, ohne dass'sch mer wehtue.
213 64. *Pfälzisch*: Isch konn Glass fresse ohne dasses mer ebbes ausmache
215 65. *Schwäbisch / Swabian*: I kå Glas frässa, ond des macht mr nix!
216 66. *Bayrisch / Bavarian*: I koh Glos esa, und es duard ma ned wei.
217 67. *Allemannisch*: I kaun Gloos essen, es tuat ma ned weh.
218 68. *Schwyzerdütsch*: Ich chan Glaas ässe, das tuet mir nöd weeh.
219 69. *Hungarian*: Meg tudom enni az üveget, nem lesz tőle bajom.
220 70. *Suomi / Finnish*: Voin syödä lasia, se ei vahingoita minua.
221 71. *Sami (Northern)*: Sáhtán borrat lása, dat ii leat bávččas.
222 72. *Erzian*: Мон ярсан суликадо, ды зыян эйстэнзэ а ули.
227 73. *Estonian*: Ma võin klaasi süüa, see ei tee mulle midagi.
228 74. *Latvian*: Es varu ēst stiklu, tas man nekaitē.
75. *Lithuanian*: Aš galiu valgyti stiklą ir jis manęs nežeidžia.
230 *Old Prussian*: (NEEDED)
231 *Sorbian* (Wendish): (NEEDED)
232 76. *Czech*: Mohu jíst sklo, neublíží mi.
233 77. *Slovak*: Môžem jesť sklo. Nezraní ma.
234 78. *Polska / Polish*: Mogę jeść szkło i mi nie szkodzi.
235 79. *Slovenian:* Lahko jem steklo, ne da bi mi škodovalo.
236 80. *Croatian*: Ja mogu jesti staklo i ne boli me.
237 81. *Serbian* /(Latin):/ Mogu jesti staklo a da mi ne škodi.
238 82. *Serbian* /(Cyrillic):/ Могу јести стакло а да ми не шкоди.
239 83. *Macedonian:* Можам да јадам стакло, а не ме штета.
240 84. *Russian*: Я могу есть стекло, оно мне не вредит.
241 85. *Belarusian* /(Cyrillic):/ Я магу есці шкло, яно мне не шкодзіць.
242 86. *Belarusian* /(Lacinka):/ Ja mahu jeści škło, jano mne ne škodzić.
243 87. *Ukrainian*: Я можу їсти шкло, й воно мені не пошкодить.
244 88. *Bulgarian*: Мога да ям стъкло, то не ми вреди.
245 89. *Georgian*: მინას ვჭამ და არა მტკივა.
246 90. *Armenian*: Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։
247 91. *Albanian*: Unë mund të ha qelq dhe nuk më gjen gjë.
248 92. *Turkish*: Cam yiyebilirim, bana zararı dokunmaz.
249 93. *Turkish* /(Ottoman):/ جام ييه بلورم بڭا ضررى طوقونمز
250 94. *Bangla / Bengali*: আমি কাঁচ খেতে পারি, তাতে আমার কোনো ক্ষতি হয় না।
251 95. *Marathi*: मी काच खाऊ शकतो, मला ते दुखत नाही.
252 96. *Hindi*: मैं काँच खा सकता हूँ, मुझे उस से कोई पीडा नहीं होती.
253 97. *Tamil*: நான் கண்ணாடி சாப்பிடுவேன், அதனால் எனக்கு ஒரு கேடும் வராது.
254 98. *Urdu*(2) <#notes>: میں کانچ کھا سکتا ہوں اور مجھے تکلیف نہیں ہوتی ۔
255 99. *Pashto*(2) <#notes>: زه شيشه خوړلې شم، هغه ما نه خوږوي
256 100. *Farsi / Persian*: .من می توانم بدونِ احساس درد شيشه بخورم
257 101. *Arabic*(2) <#notes>: أنا قادر على أكل الزجاج و هذا لا يؤلمني.
259 102. *Hebrew*(2) <#notes>: אני יכול לאכול זכוכית וזה לא מזיק לי.
260 103. *Yiddish*(2) <#notes>: איך קען עסן גלאָז און עס טוט מיר נישט װײ.
261 *Judeo-Arabic*: (NEEDED)
265 104. *Twi*: Metumi awe tumpan, ɜnyɜ me hwee.
266 105. *Hausa* (/Latin/): Inā iya taunar gilāshi kuma in gamā lāfiyā.
267 106. *Hausa* (/Ajami/) (2) <#notes>: إِنا إِىَ تَونَر غِلَاشِ كُمَ إِن غَمَا لَافِىَا
268 107. *Yoruba*(3) <#notes>: Mo lè je̩ dígí, kò ní pa mí lára.
269 108. *(Ki)Swahili*: Naweza kula bilauri na sikunyui.
270 109. *Malay*: Saya boleh makan kaca dan ia tidak mencederakan saya.
271 110. *Tagalog*: Kaya kong kumain nang bubog at hindi ako masaktan.
272 111. *Chamorro*: Siña yo' chumocho krestat, ti ha na'lalamen yo'.
273 112. *Javanese*: Aku isa mangan beling tanpa lara.
275 113. *Vietnamese (quốc ngữ)*: Tôi có thể ăn thủy tinh mà không hại gì.
276 114. *Vietnamese (nôm)* (4 <#notes>): 些 𣎏 世 咹 水 晶 𦓡 空 𣎏 害 咦
279 115. *Thai*: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ
280 116. *Mongolian* /(Cyrillic):/ Би шил идэй чадна, надад хортой биш
281 117. *Mongolian* /(Classic) (5 <#notes>):/ ᠪᠢ ᠰᠢᠯᠢ ᠢᠳᠡᠶᠦ ᠴᠢᠳᠠᠨᠠ ᠂ ᠨᠠᠳᠤᠷ
285 118. *Tibetan*: ཤེལ་སྒོ་ཟ་ནས་ང་ན་གི་མ་རེད།
286 119. *Chinese*: 我能吞下玻璃而不伤身体。
287 120. *Chinese* (Traditional): 我能吞下玻璃而不傷身體。
288 121. *Taiwanese*(6) <#notes>: Góa ē-tàng chia̍h po-lê, mā bē tio̍h-siong.
289 122. *Japanese*: 私はガラスを食べられます。それは私を傷つけません。
290 123. *Korean*: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
291 124. *Bislama*: Mi save kakae glas, hemi no save katem mi.
292 125. *Hawaiian*: Hiki iaʻu ke ʻai i ke aniani; ʻaʻole nō lā au e ʻeha.
293 126. *Marquesan*: E koʻana e kai i te karahi, mea ʻā, ʻaʻe hauhau.
294 127. *Chinook Jargon:* Naika məkmək kakshət labutay, pi weyk ukuk
296 128. *Navajo*: Tsésǫʼ yishą́ągo bííníshghah dóó doo shił neezgai da.
297 *Cherokee* /(and Cree, Ojibwa, Inuktitut, and other Native
298 American languages):/ (NEEDED)
301 129. *Lojban*: mi kakne le nu citka le blaci .iku'i le se go'i na xrani mi
302 130. *Nórdicg*: Ljœr ye caudran créneþ ý jor cẃran.
(/Additions, corrections, and completions gratefully accepted/
<mailto:kermit@columbia.edu>.)
For testing purposes, some of these are repeated in a *monospace font*:
311 2. Greek: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
312 3. Íslenska / Icelandic: Ég get etið gler án þess að meiða mig.
313 4. Polish: Mogę jeść szkło, i mi nie szkodzi.
314 5. Romanian: Pot să mănânc sticlă și ea nu mă rănește.
315 6. Ukrainian: Я можу їсти шкло, й воно мені не пошкодить.
316 7. Armenian: Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։
317 8. Georgian: მინას ვჭამ და არა მტკივა.
318 9. Hindi: मैं काँच खा सकता हूँ, मुझे उस से कोई पीडा नहीं होती.
319 10. Hebrew(2) <#notes>: אני יכול לאכול זכוכית וזה לא מזיק לי.
320 11. Yiddish(2) <#notes>: איך קען עסן גלאָז און עס טוט מיר נישט װײ.
321 12. Arabic(2) <#notes>: أنا قادر على أكل الزجاج و هذا لا يؤلمني.
322 13. Japanese: 私はガラスを食べられます。それは私を傷つけません。
323 14. Thai: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ
1. The "I can eat glass" phrase and initial translations (about 30 of
them) were borrowed from Ethan Mollick's /I Can Eat Glass/
<http://hcs.harvard.edu/~igp/glass.html> page (which disappeared
on or about June 2004) and converted to UTF-8. Since Ethan's
original page is gone, I should mention that his purpose was to offer
travelers a phrase they could use in any country that would
command a certain kind of respect, or at least get attention. See
Credits <#credits> for the many additional contributions since
then. When submitting new entries, please use the word "hurt" (if you
have a choice) in the sense of "cause harm", "do damage", or
"bother", rather than "inflict pain" or "make sad". In this vein
Otto Stolz comments (as do others further down; personally I think
it's better for the purpose of this page to have extra entries
and/or to show a greater repertoire of characters than it is to
enforce a strict interpretation of the word "hurt"!):
343 This is the meaning I have translated to the Swabian dialect.
344 However, I just have noticed that most of the German variants
345 translate the "inflict pain" meaning. The German example
348 "Ich kann Glas essen ohne mir zu schaden."
(The comma fell victim to the 1996 orthographic reform, cf.
http://www.ids-mannheim.de/reform/e3-1.html#P76.)
353 You may wish to contact the contributors of the following
354 translations to correct them:
356 * Lëtzebuergescht / Luxemburgish: Ech kan Glas iessen,
357 daat deet mir nët wei.
358 * Lausitzer Mundart ("Lusatian"): Ich koann Gloos assn und
359 doas dudd merr ni wii.
* Sächsisch / Saxon: 'sch kann Glos essn, ohne dass'sch mer wehtue.
* Bayrisch / Bavarian: I koh Glos esa, und es duard ma ned wei.
364 * Allemannisch: I kaun Gloos essen, es tuat ma ned weh.
365 * Schwyzerdütsch: Ich chan Glaas ässe, das tuet mir nöd weeh.
367 In contrast, I deem the following translations *alright*:
369 * Ruhrdeutsch: Ich kann Glas verkasematuckeln, ohne dattet
371 * Pfälzisch: Isch konn Glass fresse ohne dasses mer ebbes
* Schwäbisch / Swabian: I kå Glas frässa, ond des macht mr nix!
376 (However, you could remove the commas, on account of
377 http://www.ids-mannheim.de/reform/e3-1.html#P76 and
378 http://www.ids-mannheim.de/reform/e3-1.html#P72, respectively.)
380 I guess, also these examples translate the /wrong/ sense of
381 "hurt", though I do not know these languages well enough to
382 assert them definitely:
384 * Nederlands / Dutch: Ik kan glas eten; het doet mij geen
385 pijn. /(This one has been changed)/
386 * Kirchröadsj/Bôchesserplat: Iech ken glaas èèse, mer 't
387 deet miech jing pieng.
389 In the Romanic languages, the variations on "fa male" (it) are
390 probably wrong, whilst the variations on "hace daño" (es) and
391 "damaĝas" (Esperanto) are probably correct; "nocet" (la) is
394 The northern Germanic variants of "skada" are probably right,
395 as are the Slavic variants of "škodi/шкоди" (se); however the
Slavic variants of "boli" (hv) are probably wrong, as
397 "bolena" means "pain/ache", IIRC.
399 The numbering of the samples is arbitrary, done only to keep track
400 of how many there are, and can change any time a new entry is
401 added. The arrangement is also arbitrary but with some attempt to
402 group related examples together. Note: All languages not listed
403 are wanted, not just the ones that say (NEEDED).
405 2. Correct right-to-left display of these languages depends on the
406 capabilities of your browser. The period should appear on the
407 left. In the monospace Yiddish example, the Yiddish digraphs
408 should occupy one character cell.
409 3. Yoruba: The third word is Latin letter small 'j' followed by small
410 'e' with U+0329, Combining Vertical Line Below. This displays
411 correctly only if your Unicode font includes the U+0329 glyph and
412 your browser supports combining diacritical marks. The Indic
413 examples also include combining sequences.
414 4. Includes Unicode 3.1 (or later) characters beyond Plane 0.
415 5. The Classic Mongolian example should be vertical, top-to-bottom
416 and left-to-right. But such display is almost impossible. Also no
417 font yet exists which provides the proper ligatures and positional
418 variants for the characters of this script, which works somewhat
420 6. Taiwanese is also known as Holo or Hoklo, and is related to
421 Southern Min dialects such as Amoy. Contributed by Henry H.
422 Tan-Tenn, who comments, "The above is the romanized version, in a
423 script current among Taiwanese Christians since the mid-19th
424 century. It was invented by British missionaries and saw use in
425 hundreds of published works, mostly of a religious nature. Most
426 Taiwanese did not know Chinese characters then, or at least not
427 well enough to read. More to the point, though, a written standard
428 using Chinese characters has never developed, so a significant
429 minority of words are represented with different candidate
430 characters, depending on one's personal preference or etymological
431 theory. In this sentence, for example, "-tàng", "chia̍h", "mā" and
432 "bē" are problematic using Chinese characters. "Góa" (I/me) and
433 "po-lê" (glass) are as written in other Sinitic languages (e.g.
435 7. Wagner Amaral of Pinese & Amaral Associados notes that the
436 Brazilian Portuguese sentence for "I can eat glass" should be
437 identical to the Portuguese one, as the word "machuca" means
438 "inflict pain", or rather "injuries". The words "faz mal" would
439 more correctly translate as "cause harm".
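Notes 3 and 4 can be checked in a few lines of Python. This sketch is only an illustration, and the helper `describe` is hypothetical. It assumes Python 3.9 or later, whose `unicodedata` module includes the Gothic block. It lists each code point of the Yoruba "je̩" and of Gothic letter AHSA (U+10330, the second letter of the Gothic sample), with its UTF-8 byte count and its number of UTF-16 code units.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, int, int]]:
    """For each code point: U+ label, Unicode name, UTF-8 bytes, UTF-16 code units."""
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            len(ch.encode("utf-8")),
            len(ch.encode("utf-16-le")) // 2,  # 2 units means a surrogate pair
        ))
    return rows


# Yoruba 'j', 'e', U+0329, followed by Gothic AHSA from Plane 1.
for row in describe("je\u0329\U00010330"):
    print(row)
# ('U+006A', 'LATIN SMALL LETTER J', 1, 1)
# ('U+0065', 'LATIN SMALL LETTER E', 1, 1)
# ('U+0329', 'COMBINING VERTICAL LINE BELOW', 2, 1)
# ('U+10330', 'GOTHIC LETTER AHSA', 4, 2)

# The Plane 1 letter in UTF-16: high surrogate D800, low surrogate DF30.
print("\U00010330".encode("utf-16-be").hex(" "))  # d8 00 df 30
```

The combining mark is a code point of its own. That is why it displays correctly only when both the font and the browser handle combining sequences. The Plane 1 letter needs four UTF-8 bytes and a UTF-16 surrogate pair, which is where software that assumes one 16-bit unit per character breaks down.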
442 ------------------------------------------------------------------------
445 The "I can eat glass" sentences do not necessarily show off the
446 orthography of each language to best advantage. In many alphabetic
447 written languages it is possible to include all (or most) letters (or
448 "special" characters) in a single (often nonsense) /pangram/. These were
449 traditionally used in typewriter instruction; now they are useful for
450 stress-testing computer fonts and keyboard input methods. Here are a few
451 examples (SEND MORE):
453 1. *English:* The quick brown fox jumps over the lazy dog.
454 2. *Irish:* "An ḃfuil do ċroí ag bualaḋ ó ḟaitíos an ġrá a ṁeall lena
455 ṗóg éada ó ṡlí do leasa ṫú?" "D'ḟuascail Íosa Úrṁac na hÓiġe
456 Beannaiṫe pór Éava agus Áḋaiṁ."
457 3. *Dutch:* Pa's wijze lynx bezag vroom het fikse aquaduct.
4. *German:* Falsches Üben von Xylophonmusik quält jeden größeren
5. *German:* Im finſteren Jagdſchloß am offenen Felsquellwaſſer
461 patzte der affig-flatterhafte kauzig-höfliche Bäcker über ſeinem
462 verſifften kniffligen C-Xylophon. (2)
463 6. *Swedish:* Flygande bäckasiner söka strax hwila på mjuka tuvor.
464 7. *Czech:* Příliš žluťoučký kůň úpěl ďábelské kódy.
465 8. *Slovak:* Starý kôň na hŕbe kníh žuje tíško povädnuté ruže, na
466 stĺpe sa ďateľ učí kvákať novú ódu o živote.
467 9. *Russian:* В чащах юга жил-был цитрус? Да, но фальшивый экземпляр!
469 10. *Bulgarian:* Жълтата дюля беше щастлива, че пухът, който цъфна,
471 11. *Sami (Northern):* Vuol Ruoŧa geđggiid leat máŋga luosa ja čuovžža.
472 12. *Hungarian:* Árvíztűrő tükörfúrógép.
473 13. *Spanish:* El pingüino Wenceslao hizo kilómetros bajo exhaustiva
474 lluvia y frío, añoraba a su querido cachorro.
475 14. *Portuguese:* O próximo vôo à noite sobre o Atlântico, põe
476 freqüentemente o único médico. (3)
477 15. *French:* Les naïfs ægithales hâtifs pondant à Noël où il gèle
478 sont sûrs d'être déçus et de voir leurs drôles d'œufs abîmés.
479 16. *Esperanto:* Eĥoŝanĝo ĉiuĵaŭde.
480 17. *Hebrew:* זה כיף סתם לשמוע איך תנצח קרפד עץ טוב בגן.
481 18. *Japanese* (Hiragana):
490 1. Other phrases commonly used in Germany include: "Ein wackerer
491 Bayer vertilgt ja bequem zwo Pfund Kalbshaxe" and, more recently,
492 "Franz jagt im komplett verwahrlosten Taxi quer durch Bayern", but
both lack umlauts and eszett (ß). Previously, going for the shortest
494 sentence that has all the umlauts and special characters, I had
495 "Grüße aus Bärenhöfe (und Óechtringen)!" Acute accents are not
496 used in native German words, so I was surprised to discover
497 "Óechtringen" in the Deutsche Bundespost Postleitzahlenbuch
498 <http://www.columbia.edu/~fdc/misc/oechtringen.jpg> (Vorsicht!
499 2.8MB JPG image). It's a small village in eastern Lower Saxony.
500 The "oe" in this case turns out to be the Lower Saxon "lengthening
501 e" (Dehnungs-e), which makes the previous vowel long (used in a
502 number of Lower Saxon place names such as Soest and Itzehoe), not
503 the "e" that indicates umlaut of the preceding vowel. Many thanks
504 to the Óechtringen-Namenschreibungsuntersuchungskomitee (Alex
505 Bochannek, Manfred Erren, Asmus Freytag, Christoph Päper, plus
506 Werner Lemberg who serves as the
507 Óechtringen-Namenschreibungsuntersuchungskomiteerechtschreibungsprüfer)
508 for their relentless pursuit of the facts in this case.
509 Conclusion: the accent almost certainly does not belong on this
510 (or any other native German) word, but neither can it be dismissed
511 as dirt on the page. To add to the mystery, it has been reported
512 that other copies of the same edition of the PLZB do not show the
515 2. From Karl Pentzlin (Kochel am See, Bavaria, Germany): "This German
516 phrase is suited for display by a Fraktur (broken letter) font. It
517 contains: all common three-letter ligatures: ffi ffl fft and all
518 two-letter ligatures required by the Duden for Fraktur
519 typesetting: ch ck ff fi fl ft ll ſch ſi ſſ ſt tz (all in a manner
520 such they are not part of a three-letter ligature), one example of
521 f-l where German typesetting rules prohibit ligating (marked by a
522 ZWNJ), and all German letters a...z, ä,ö,ü,ß, ſ [long s] (all in a
523 manner such that they are not part of a two-letter Fraktur
524 ligature)." Otto Stolz notes that "'Schloß' is now spelled
525 'Schloss', in contrast to 'größer' (example 4) which has kept its
526 'ß'. Fraktur has been banned from general use, in 1942, and long-s
527 (ſ) has ceased to be used with Antiqua (Roman) even earlier (the
528 latest Antiqua-ſ I have seen is from 1913, but then I am no
expert, so there may well be a later instance." Otto later confirmed
this: "Now I've run across a book “Deutsche
531 Rechtschreibung” (edited by Lutz Mackensen) from 1954 (my reprint
532 is from 1956) that has kept the Antiqua-ſ in its dictionary part
533 (but neither in the preface nor in the appendix)."
535 3. Diaeresis is not used in Iberian Portuguese.
537 4. From Yurio Miyazawa: "This poetry contains all the sounds in the
538 Japanese language and used to be the first thing for children to
539 learn in their Japanese class. The Hiragana version is
540 particularly neat because it covers every character in the
541 phonetic Hiragana character set." Yurio also sent the Kanji version:
550 /(This section contributed by Vladimir Marinov.)/
552 In Bulgarian it is desirable, customary, or in some cases required to
write accents over vowels. Unfortunately, no computer character set,
Unicode included, contains precomposed forms of every accented Cyrillic
letter. With Unicode,
555 however, it is possible to combine any Cyrillic letter with any
556 combining accent. The appearance of the result depends on the font and
557 the rendering engine. Here are two examples.
559 1. Той видя бялата коса́ по главата и́ и ко́са на рамото и́, и ре́че да и́
560 рече́: "Пара́та по́ па́ри от па́рата, не ща пари́!", но си поми́сли:
561 "Хей, помисли́ си! А́ и́ река, а́ е скочила в тази река, която щеше да
564 2. По пъ́тя пъту́ват кю́рди и югославя́ни.
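Unicode normalization shows the difference between precomposed and combined accents. In this minimal Python sketch, which uses the standard `unicodedata` module and a hypothetical helper `nfc_length`, NFC composition turns Latin e plus COMBINING ACUTE ACCENT (U+0301) into the single precomposed é. Cyrillic а or и plus the same accent has no precomposed equivalent, so it remains two code points, and placing the accent is left to the font and rendering engine.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def nfc_length(base: str, mark: str) -> int:
    """Count the code points left after NFC-composing base + mark."""
    return len(unicodedata.normalize("NFC", base + mark))


print(nfc_length("e", ACUTE))        # 1: composes to precomposed U+00E9
print(nfc_length("\u0430", ACUTE))   # 2: no precomposed Cyrillic а with acute
print(nfc_length("\u0438", ACUTE))   # 2: no precomposed Cyrillic и with acute
print(unicodedata.combining(ACUTE))  # 230: canonical combining class "Above"
```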
567 ------------------------------------------------------------------------
570 Here is the Russian alphabet (uppercase only) coded in three different
571 ways, which should look identical:
573 1. АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ /(Literal UTF-8)/
2. АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ /(Decimal numeric character references)/
3. АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ /(Hexadecimal numeric character references)/
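The second and third forms can be generated mechanically. The following minimal Python sketch uses hypothetical helper names and relies only on the standard library's `html.unescape` for the round trip. Each character becomes `&#` plus its decimal code point, or `&#x` plus its hexadecimal code point, followed by a semicolon.

```python
import html

# The 32 capital letters shown above occupy U+0410..U+042F contiguously.
ALPHABET = "".join(chr(cp) for cp in range(0x0410, 0x0430))


def decimal_refs(text: str) -> str:
    """Encode every character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def hex_refs(text: str) -> str:
    """Encode every character as a hexadecimal numeric character reference."""
    return "".join(f"&#x{ord(ch):X};" for ch in text)


print(decimal_refs(ALPHABET[:3]))  # &#1040;&#1041;&#1042;
print(hex_refs(ALPHABET[:3]))      # &#x410;&#x411;&#x412;

# A browser (or html.unescape) turns both forms back into the same letters.
assert html.unescape(decimal_refs(ALPHABET)) == ALPHABET
assert html.unescape(hex_refs(ALPHABET)) == ALPHABET
```

Either form maps each character to its code point. The hexadecimal form matches the U+ notation used in the Unicode code charts.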
In another test, we use HTML language tags to distinguish Bulgarian,
Russian, and Serbian <http://www.tiro.com/transfer/Serbian_Rendering.pdf>,
which have different italic forms for lowercase б, г, д, п, and/or т:
584 *Bulgarian*: [ бгдпт ] [ /бгдпт/ ] / Мога да ям стъкло и не
586 *Russian*: [ бгдпт ] [ /бгдпт/ ] /Я могу есть стекло, это мне
588 *Serbian*: [ бгдпт ] [ /бгдпт/ ] /Могу јести стакло а да ми
592 ------------------------------------------------------------------------
593 Credits, Tools, and Commentary
596 The "I can eat glass" phrase and the initial collection of
597 translations: Ethan Mollick
598 <http://hcs.harvard.edu/~igp/glass.html>. Transcription / conversion
599 to UTF-8: Frank da Cruz. *Albanian:* Sindi Keesan. *Afrikaans:*
600 Johan Fourie, Kevin Poalses. *Anglo Saxon:* Frank da Cruz. *Arabic:*
601 Najib Tounsi. *Armenian:* Vaçe Kundakçı. *Belarusian:* Alexey
602 Chernyak. *Bengali:* Somnath Purkayastha, Deepayan Sarkar.
603 *Bislama:* Dan McGarry. *Braille:* Frank da Cruz. *Bulgarian:* Sindi
604 Keesan, Guentcho Skordev, Vladimir Marinov. *Cabo Verde Creole:*
605 Cláudio Alexandre Duarte. *Chinese:* Jack Soo, Wong Pui Lam.
606 *Chinook Jargon:* David Robertson. *Cornish:* Chris Stephens.
607 *Croatian:* Marjan Baće. *Czech:* Stanislav Pecha, Radovan Garabík.
*Dutch:* Peter Gotink, Pim Blokland, Rob Daniel, Rob de Wit.
609 *Erzian:* Jack Rueter. *Esperanto:* Franko Luin, Radovan Garabík.
610 *Estonian:* Meelis Roos. *Farsi/Persian:* Payam Elahi. *Finnish:*
611 Sampsa Toivanen. *French:* Luc Carissimo, Anne Colin du Terrail,
612 Sean M. Burke. *Galician:* Laura Probaos. *Georgian:* Giorgi
613 Lebanidze. *German:* Christoph Päper, Otto Stolz, Karl Pentzlin,
614 Frank da Cruz. *Gothic:* Aurélien Coudurier. *Greek:* Ariel Glenn,
615 Constantine Stathopoulos, Siva Nataraja. *Hebrew:* Jonathan Rosenne,
616 Tal Barnea. *Hausa:* Malami Buba, Tom Gewecke. *Hawaiian:* na
617 Hauʻoli Motta, Anela de Rego, Kaliko Trapp. *Hindi:* Shirish Kalele.
618 *Hungarian:* András Rácz, Mark Holczhammer. *Icelandic:* Andrés
619 Magnússon. *International Phonetic Alphabet (IPA):* Siva Nataraja /
620 Vincent Ramos. *Irish:* Michael Everson, Marion Gunn, James Kass,
621 Curtis Clark. *Italian:* Thomas De Bellis. *Japanese:* Makoto
622 Takahashi, Yurio Miyazawa. *Kirchröadsj:* Roger Stoffers. *Kreyòl:*
623 Sean M. Burke. *Korean:* Jungshik Shin. *Lëtzebuergescht:* Stefaan
624 Eeckels. *Lithuanian:* Gediminas Grigas. *Lojban:* Edward Cherlin.
625 *Lusatian:* Ronald Schaffhirt. *Macedonian:* Sindi Keesan. *Malay:*
626 Zarina Mustapha. *Manx:* Éanna Ó Brádaigh. *Marathi:* Shirish
627 Kalele. *Marquesan:* Kaliko Trapp. *Middle English:* Frank da Cruz.
628 *Milanese:* Marco Cimarosti. *Mongolian:* Tom Gewecke. *Napoletano:*
629 Diego Quintano. *Navajo:* Tom Gewecke. *Nórdicg*
630 <http://www.langmaker.com/db/mdl_nordicg.htm>: Yẃlyan Rott.
631 *Norwegian:* Herman Ranes. *Odenwälderisch:* Alexander Heß. *Old
632 Irish:* Michael Everson. *Old Norse:* Andrés Magnússon.
633 *Papiamentu:* Bianca and Denise Zanardi. *Pashto:* N.R. Liwal.
634 *Pfälzisch:* Dr. Johannes Sander. *Picard:* Philippe Mennecier.
635 *Polish:* Juliusz Chroboczek. *Portuguese:* "Cláudio" Alexandre
636 Duarte, Bianca and Denise Zanardi, Pedro Palhoto Matos, Wagner
637 Amaral. *Québécois:* Laurent Detillieux. *Roman:* Pierpaolo
638 Bernardi. *Romanian:* Juliusz Chroboczek, Ionel Mugurel.
639 *Ruhrdeutsch:* "Timwi". *Russian:* Alexey Chernyak, Serge
640 Nesterovitch. *Sami:* Anne Colin du Terrail, Luc Carissimo.
641 *Sanskrit:* Siva Nataraja / Vincent Ramos. *Sächsisch:* André
642 Müller. *Schwäbisch:* Otto Stolz. *Scots:* Jonathan Riddell.
643 *Serbian:* Sindi Keesan, Ranko Narancic, Boris Daljevic, Szilvia
644 Csorba. *Slovak:* G. Adam Stanislav, Radovan Garabík. *Slovenian:*
645 Albert Kolar. *Spanish:* Aleida Muñoz
646 <http://www.panix.com/~aleida>, Laura Probaos. *Swahili:* Ronald
647 Schaffhirt. *Swedish:* Christian Rose, Bengt Larsson. *Taiwanese:*
648 Henry H. Tan-Tenn. *Tagalog:* Jim Soliven. *Tamil:* Vasee
649 Vaseeharan. *Tibetan:* D. Germano, Tom Gewecke. *Thai:* Alan Wood's
650 wife. *Turkish:* Vaçe Kundakçı, Tom Gewecke, Merlign Olnon.
651 *Ukrainian:* Michael Zajac. *Urdu:* Mustafa Ali. *Vietnamese*
652 <http://nomfoundation.org/>: Dixon Au, [James] Đỗ Bá Phước 杜 伯 福.
653 *Walloon:* Pablo Saratxaga. *Welsh:* Geiriadur Prifysgol Cymru
(Andrew). *Yiddish:* Mark David. *Zeneise:* Angelo Pavese.
656 *Tools Used to Create This Web Page:*
The UTF-8-aware Kermit 95 <k95.html> terminal emulator on Windows, to
658 a Unix host with the EMACS <http://www.gnu.org/directory/emacs.html>
659 text editor. Kermit 95 displays UTF-8 and also allows keyboard entry
660 of arbitrary Unicode BMP characters as 4 hex digits, as shown HERE
661 <glass.html>. Hex codes for Unicode values can be found in The
662 Unicode Standard <http://www.unicode.org/unicode/uni2book/u2.html>
663 (recommended) and the online code charts
664 <http://www.unicode.org/charts/>. When submissions arrive by email
665 encoded in some other character set (Latin-1, Latin-2, KOI, various
666 PC code pages, JEUC, etc), I use the TRANSLATE command of C-Kermit
667 <ckermit.html> on the Unix host (where I read my mail <safe.html>)
668 to convert the character set to UTF-8 (I could also use Kermit 95
669 for this; it has the same TRANSLATE command). That's it -- no "Web
670 authoring" tools, no locales, no "smart" anything. It's just plain
671 text, nothing more. By the way, there's nothing special about EMACS
672 -- any text editor will do, providing it allows entry of arbitrary
673 8-bit bytes as text, including the 0x80-0x9F "C1" range. EMACS 21.1
674 actually supports UTF-8; earlier versions don't know about it and
675 display the octal codes; either way is OK for this purpose.
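The workflow above rests on a few byte-level facts about UTF-8 that are easy to check for yourself. The sketch below is written in Python and uses only Python's built-in codecs; it is not Kermit 95's hex keyboard entry or C-Kermit's TRANSLATE command, and the function names (`bmp_hex_to_utf8`, `has_c1_byte`, `octal_view`, `to_utf8`) are hypothetical, made up for illustration. It shows what ends up in the file when a BMP character is typed as 4 hex digits; why the editor must accept bytes in the 0x80-0x9F "C1" range (every byte of a multibyte UTF-8 sequence is 0x80 or above, and continuation bytes run from 0x80 to 0xBF, so many of them land in C1); how those bytes look as octal codes in an editor that doesn't know UTF-8; and how text arriving in Latin-1 or Latin-2 can be re-encoded as UTF-8.

```python
# A minimal sketch using Python's standard codecs. It is NOT Kermit's
# TRANSLATE command or its hex keyboard entry; the function names below
# are hypothetical, chosen only for illustration.

HEX_DIGITS = "0123456789abcdefABCDEF"
C1_RANGE = range(0x80, 0xA0)  # bytes 0x80-0x9F, the "C1" control range


def bmp_hex_to_utf8(hex4: str) -> bytes:
    """Encode a BMP code point written as exactly 4 hex digits as UTF-8."""
    if len(hex4) != 4 or any(c not in HEX_DIGITS for c in hex4):
        raise ValueError("expected exactly 4 hex digits, e.g. '20AC'")
    code_point = int(hex4, 16)
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    return chr(code_point).encode("utf-8")


def has_c1_byte(data: bytes) -> bool:
    """True if any byte lies in 0x80-0x9F, which an editor must accept."""
    return any(b in C1_RANGE for b in data)


def octal_view(data: bytes) -> str:
    """Show bytes >= 0x80 as backslash-octal escapes, roughly how an
    editor without UTF-8 support displays them."""
    return "".join(chr(b) if b < 0x80 else f"\\{b:03o}" for b in data)


def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode text from a legacy character set (Python codec names
    such as 'latin_1' or 'iso8859_2') as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


if __name__ == "__main__":
    # U+00E9 (e with acute), U+20AC (euro sign), U+6771 (CJK "east")
    for hex4 in ("00E9", "20AC", "6771"):
        utf8 = bmp_hex_to_utf8(hex4)
        note = " (contains a C1-range byte)" if has_c1_byte(utf8) else ""
        print(f"U+{hex4}: {utf8.hex(' ')}  {octal_view(utf8)}{note}")

    # "café" as it might arrive in Latin-1, and "Česky" in Latin-2
    print(to_utf8(b"caf\xe9", "latin_1").decode("utf-8"))
    print(to_utf8(b"\xc8esky", "iso8859_2").decode("utf-8"))
```

For U+00E9 (é) the UTF-8 bytes are `c3 a9`, which an editor without UTF-8 support would show as `\303\251`. The euro sign U+20AC becomes `e2 82 ac`, and its middle byte 0x82 falls in the C1 range, which is exactly why an editor that refuses 0x80-0x9F cannot be used to type UTF-8 by hand.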
Date: Wed, 27 Feb 2002 13:21:59 +0100
From: "Bruno DEDOMINICIS" <b.dedominicis@cite-sciences.fr>
Subject: Je peux manger du verre, cela ne me fait pas mal. (I can eat glass, it doesn't hurt me.)
I just found your website, and it makes me want to propose an
interpretation of the choice of this peculiar phrase.
Glass is transparent and, as everyone knows, can hurt. The relationship
between peoples and civilisations is sometimes effusive and more often
harsh. The concept of breaking frontiers through globalization is, in a
way, also an attempt to deny any difference. Isn't "transparency" the
flag of modernity? Nothing should be hidden any more, authority is
obsolete, and the new powers are supposed to reign through loving and
smiling and no longer through coercion...
Eating glass without pain sounds like a very nice metaphor for this
attempt. That is, frontiers should first become as transparent as glass
and then be denied by being incorporated. Conversely, it shows that
through globalization, frontiers undergo a process of displacement:
when they can no longer be spoken, they are repressed from speech and
are therefore incorporated, and they might become painful symptoms, as
for example what happens when one tries to eat glass.
The frontiers that used to separate bodies from one another tend to
divide bodies from within and make them suffer.... The chosen phrase
then appears as a denial of the symptom that might result from the
dismantling of traditional frontiers.
Bruno De Dominicis, Paris, France
*Other Unicode pages onsite:*
* Peace in All Languages <http://www.columbia.edu/~fdc/pace/>
* Frank's Compulsive Guide to Postal Addresses <postal.html>
  (especially the Index <postal.html#index>)
* Representing Middle English on the Web with UTF-8 <st-erkenwald.html>
* The Kermit Bibliography <biblio.html> (in UTF-8)
* Interchange of Non-English Computer Text <accents.html> (UTF-8
  math and box-drawing)
* Unicode Table <utf8-t1.html> (in UTF-8)
*Unicode samplers offsite:*
* Michael Everson's Bibliography of Typography and Scripts
  <http://www.evertype.com/scriptbib.html>
* Sample Unicode Test Pages and Script Links
  <http://home.att.net/~jameskass/scriptlinks.htm>
* I don't know, I only work here <http://crism.maden.org/dunno.html>
* Anyone can be provincial!
  <http://www.trigeminal.com/samples/provincial.html>
* Transcriptions of "Unicode"
  <http://www.macchiato.com/unicode/Unicode_transcriptions.html>
* Example Unicode Usage for Business Applications
  <http://www.i18nguy.com/unicode-example.html>
* UTF-8 and Unicode FAQ for Unix/Linux
  <http://www.cl.cam.ac.uk/~mgk25/unicode.html#apps>
*Unicode fonts:*
* Unicode Fonts for Windows Computers
  <http://www.alanwood.net/unicode/fonts.html> (Alan Wood)
* Unicode Fonts and Tools for X11
  <http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html> (Markus Kuhn)
* Everson Mono <http://www.evertype.com/emono/> (Michael Everson)
* Agfa Monotype <http://www.monotype.com>
[ Kermit 95 <k95.html> ] [ K95 Screen Shots <glass.html> ] [ C-Kermit
<ckermit.html> ] [ Kermit Home <index.html> ] [ Display Problems?
<http://www.unicode.org/help/display_problems.html> ] [ The Unicode
Consortium <http://www.unicode.org> ]
------------------------------------------------------------------------
UTF-8 Sampler / The Kermit Project <index.html> / Columbia University
<http://www.columbia.edu> / kermit@columbia.edu
<mailto:kermit@columbia.edu>