@cdxiao I feel like this entire field is still rapidly evolving, but HanziJS (hanzijs.com/) may give you some options.

Near as I can figure, the CJK Decomposition Data (cjkdecomp.codeplex.com/) can help break a Chinese character into subcomponents, and CC-CEDICT (cc-cedict.org/wiki/) can map a character to its (Mandarin) Pinyin sound.

Neither of these databases is 'official'.
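For the CC-CEDICT side, here's a rough sketch of pulling the Pinyin out of one dictionary line. It assumes the usual `Traditional Simplified [pin1 yin1] /gloss/gloss/` layout with `#` comment lines; the function name `parse_cedict_line` and the sample line are mine, made up for illustration, not copied from the dictionary file.

```python
import re

# Assumed line layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_cedict_line(line):
    """Parse one CC-CEDICT-style line; return None for comments or bad lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = CEDICT_LINE.match(line)
    if match is None:
        return None
    traditional, simplified, pinyin, glosses = match.groups()
    return {
        "traditional": traditional,
        "simplified": simplified,
        "pinyin": pinyin,                # tone-numbered syllables, space separated
        "glosses": glosses.split("/"),   # English definitions
    }


# Illustrative sample line (hypothetical, not a verbatim dictionary entry).
entry = parse_cedict_line("好 好 [hao3] /good/well/")
print(entry["pinyin"])
```

Keep in mind a character can have more than one reading, so a real lookup would probably collect every matching line rather than stop at the first.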

I *think* there's a Hangul decomposition algorithm, covered by the Unicode normalization report (UAX #15): unicode.org/reports/tr15/
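If it helps, here's a minimal sketch of the Hangul case in Python. The standard library's `unicodedata.normalize("NFD", ...)` already splits a precomposed syllable into its jamo; the hand-rolled `decompose_hangul` below (my own name, not a library function) just redoes the arithmetic with the constants from the Unicode Standard so you can see what's going on.

```python
import unicodedata

# Constants for precomposed Hangul syllables, as given in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables in total


def decompose_hangul(syllable: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable, leave untouched
    leading = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trailing = T_BASE + s_index % T_COUNT
    jamo = chr(leading) + chr(vowel)
    if trailing != T_BASE:  # T_BASE itself means "no trailing consonant"
        jamo += chr(trailing)
    return jamo


# The manual arithmetic should agree with the built-in NFD normalization.
for ch in "한글":
    manual = decompose_hangul(ch)
    builtin = unicodedata.normalize("NFD", ch)
    print(ch, [hex(ord(j)) for j in manual], manual == builtin)
```

In practice you'd likely just call `unicodedata.normalize` and skip the manual version. Note this only works for Hangul; Chinese characters have no algorithmic decomposition, which is why you need a database for them.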