This form lets you search the 20943 graphs¹ that were in the Unicode Character Database as of 2017-05-14, as published in the resource Unihan.zip (I fetched it from here).
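For readers curious how such data can be loaded, here is a minimal sketch. It assumes the tab-separated layout of the text files inside Unihan.zip (for example Unihan_Readings.txt): a code point such as U+4E0B, a field name such as kDefinition or kMandarin, and the value, with comment lines starting with "#". The function name parse_unihan is hypothetical, this is not the script behind this form, and the sample values are abbreviated for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_unihan(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Collect Unihan fields per character (hypothetical helper).

    Each data line has three tab-separated columns: code point,
    field name, value. Empty lines and '#' comment lines are skipped.
    """
    table: Dict[str, Dict[str, str]] = defaultdict(dict)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t", 2)
        char = chr(int(codepoint[2:], 16))  # 'U+4E0B' -> the character itself
        table[char][field] = value
    return dict(table)


# Illustrative, abbreviated sample lines (not copied verbatim from the release).
sample = [
    "# a comment line",
    "U+4E0B\tkDefinition\tunder, underneath, below; down",
    "U+4E0B\tkMandarin\txià",
    "U+4E0B\tkJapaneseKun\tSHITA SHIMO",
]
table = parse_unihan(sample)
print(table["\u4e0b"]["kMandarin"])  # xià
```

Keying the table by character rather than by code point string makes a lookup by the graph itself (search method 4) a plain dictionary access.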

Currently the following search methods are supported:

  1. Search English definitions.
  2. Search pinyin (with or without tone marks: type no tones:pinyin or n:pinyin to ignore the tone marks).
  3. Search waapuro roomaji² kun-yomi and on-yomi (Japanese readings, for example: zouri).
  4. Search with the graph itself (to look at the readings or the definition³).
  5. Search using the “random” button.
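One plausible way to implement "ignore tones" (an assumption; the form's own code is not shown) is to decompose accented letters with Unicode NFD and drop only the four pinyin tone diacritics, so that the diaeresis of ü survives. The names TONE_MARKS and strip_tones are hypothetical.

```python
import unicodedata

# Combining marks for the four pinyin tones: macron (1st), acute (2nd),
# caron (3rd), grave (4th). The diaeresis of ü (U+0308) is deliberately kept.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin: str) -> str:
    """Remove pinyin tone marks but keep ü, e.g. 'lǜ' -> 'lü'."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)  # recombine u + diaeresis into ü


print(strip_tones("tiān"))  # tian
print(strip_tones("lǜ"))    # lü
```

With this, n:tian would match tiān, tián, tiǎn and tiàn alike, because all four reduce to tian.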

The metacharacters⁴ dot (.), asterisk (*) and plus sign (+) can be used to match parts of words or phrases. Common examples: look for the beginning of a word, like shimo.*; the end, like .*shimo; omit something in the middle, like sh.*mo; or look for part of a word or phrase, like .*take.*.
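A minimal sketch of this matching, assuming (as the shimo.* examples suggest) that a query must match the whole reading, and assuming that "ignored" metacharacters are treated as literal characters. The function names and the sample strings are made up for illustration.

```python
import re


def query_to_regex(query: str) -> "re.Pattern[str]":
    """Build a regex in which only '.', '*' and '+' are special;
    every other character is matched literally (an assumption)."""
    parts = []
    for ch in query:
        if ch in ".*+":
            parts.append(ch)              # keep the three supported metacharacters
        else:
            parts.append(re.escape(ch))   # neutralise (, ?, [ and the like
    return re.compile("".join(parts))


def search(query, readings):
    pattern = query_to_regex(query)
    # fullmatch: the query has to cover the whole reading, which is why
    # 'shimo.*' only finds readings that start with 'shimo'.
    return [r for r in readings if pattern.fullmatch(r)]


readings = ["shimo", "shimoza", "ashimo", "shiromo", "take", "taketori", "otake"]
print(search("shimo.*", readings))   # ['shimo', 'shimoza']
print(search(".*shimo", readings))   # ['shimo', 'ashimo']
print(search("sh.*mo", readings))    # ['shimo', 'shiromo']
print(search(".*take.*", readings))  # ['take', 'taketori', 'otake']
```

Escaping everything except the three supported characters keeps a stray parenthesis or question mark in a query from raising a regex error.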

Beware of trailing spaces⁵; the script does not remove them (that is a feature).
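To see why a space in the query matters, here is a small illustration. Python's re.fullmatch stands in for the form's matcher (an assumption: whole-definition matching), and the definitions are made-up samples.

```python
import re

# Made-up sample definitions, for illustration only.
definitions = ["for a while", "for alms", "for an animal", "good for a laugh"]


def count(query):
    # The query is used exactly as typed: a trailing space is NOT stripped.
    return sum(1 for d in definitions if re.fullmatch(query, d))


print(count(".*for a .*"))    # 2: only phrases where "a" is a separate word
print(count(".*for a.*"))     # 4: also "for alms" and "for an animal"
print(count("for a while "))  # 0: the trailing space has nothing to match
```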

Important command options:

  • Restrict search to Japanese: japanese:word or ja:word (example: ja:tatsu)
  • Restrict search to Japanese kun-yomi: kun:word (example: kun:yu)
  • Restrict search to Japanese on-yomi: on:word (example: on:yu)
  • Restrict search to Chinese pinyin: zh:word or chinese:word (example: zh:yu)
  • Restrict search to English words: en:word or english:word (example: en:yu)
  • Ignore tones: no tones:word, or in short n:word (example: n:tian)
  • Ignore graphs that have no Unihan definition: def only:word or do:word
  • Commands can be stacked; separate them with a space. Examples: n on:den, en ja:soldier, etc.
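A sketch of how stacked options could be separated from the search term. The alias table, the function name parse_query and the rule that commands end at the first colon are assumptions made for illustration, not taken from the form's script.

```python
from typing import List, Tuple

# Hypothetical alias table built from the options listed above.
ALIASES = {
    "japanese": "ja", "ja": "ja",
    "kun": "kun", "on": "on",
    "chinese": "zh", "zh": "zh",
    "english": "en", "en": "en",
    "no tones": "n", "n": "n",
    "def only": "do", "do": "do",
}


def parse_query(query: str) -> Tuple[List[str], str]:
    """Split e.g. 'n on:den' into (['n', 'on'], 'den').

    Everything before the first ':' is read as space-separated commands;
    two-word names such as 'no tones' are tried before single words.
    The search term itself is returned untouched (no stripping).
    """
    if ":" not in query:
        return [], query
    prefix, term = query.split(":", 1)
    words = prefix.split(" ")
    commands: List[str] = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in ALIASES:
            commands.append(ALIASES[pair])
            i += 2
        elif words[i] in ALIASES:
            commands.append(ALIASES[words[i]])
            i += 1
        else:
            # Unknown word before the colon: treat the whole query as a term.
            return [], query
    return commands, term


print(parse_query("n on:den"))       # (['n', 'on'], 'den')
print(parse_query("en ja:soldier"))  # (['en', 'ja'], 'soldier')
print(parse_query("no tones:tian"))  # (['n'], 'tian')
```

Trying two-word names first is what keeps "no tones" and "def only" from being read as the unknown word "no" or "def".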

Please send comments and requests to me.


1. For a listing, see this linked list.

2. Unihan uses modified Hepburn for the consonants and kana spelling for the (long) vowels.

3. Not all graphs have a Unihan definition, and the Unihan definitions are often only approximations. Depending on time and region, a graph can stand for very different words, which the Unihan definition rarely indicates. In practice, some graphs have no meaning on their own, even if the Unihan definition (or a so-called character dictionary) claims otherwise. If you are new to Chinese characters, note that Chinese words are very often written with more than one character. Only about 3000 to 5000 characters are currently in use, while dictionaries contain more than 100,000 words, most of them spelled with two or more characters. For example, there is no definition for “hell” in this list, because that word is spelled 地獄, with two graphs. Conversely, while you can find matches for the word “butterfly”, the graphs with that definition do not actually mean butterfly on their own (consider them meaningless syllables); the actual word for “butterfly” (“húdié”) needs two graphs: 蝴蝶.

4. As in Perl regular expressions. The dot matches any character (letters, spaces, hyphens, parentheses, etc.). The asterisk matches the previous character (or, after a dot, any character) zero or more times. The plus sign does the same, but one or more times. Other metacharacters are ignored for now.

5. To give an example: .*for a .* matches only 55 phrases (those where “a” is a separate word), while .*for a.* matches 76 phrases (also including phrases such as “for alms” and “for an animal”).
