New comment by natecull

@cdxiao I feel like this entire field is still rapidly evolving, but HanziJS (http://www.hanzijs.com/) may give you some options.

As near as I can figure, the CJK Decomposition Data (http://cjkdecomp.codeplex.com/) can break a Chinese character into its subcomponents, and CC-CEDICT (https://cc-cedict.org/wiki/) can map a character to its Mandarin Pinyin pronunciation.

Neither of these databases is 'official', though.
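As a rough illustration of the "character to Pinyin" lookup: CC-CEDICT is distributed as a plain-text file with one entry per line, in the form `Traditional Simplified [pin1 yin1] /gloss/gloss/`, preceded by `#` comment lines. Here is a minimal Python sketch, assuming that line format. `parse_cedict` is a hypothetical helper name, and the sample lines are hand-written in the CC-CEDICT format rather than copied from the real file.

```python
import re
from collections import defaultdict

# Assumed CC-CEDICT line format: Traditional Simplified [numbered pinyin] /gloss 1/gloss 2/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_cedict(lines):
    """Map each headword (traditional and simplified form) to its list of numbered-Pinyin readings."""
    readings = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' header comments
        match = CEDICT_LINE.match(line)
        if match is None:
            continue  # tolerate malformed lines instead of aborting
        trad, simp, pinyin, _glosses = match.groups()
        # A set, so a headword whose two forms are identical is recorded only once
        for word in {trad, simp}:
            if pinyin not in readings[word]:
                readings[word].append(pinyin)
    return dict(readings)


# Hand-written sample lines in the CC-CEDICT format (illustrative only)
SAMPLE = [
    "# sample header comment",
    "中國 中国 [Zhong1 guo2] /China/",
    "好 好 [hao3] /good/",
    "好 好 [hao4] /to be fond of/",
]
# parse_cedict(SAMPLE)["好"] -> ['hao3', 'hao4']
```

Each headword maps to a list rather than a single string because one character can carry several readings, as 好 does here.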

For Korean, I *think* there's an algorithmic Hangul syllable decomposition, used by Unicode normalization: http://unicode.org/reports/tr15/
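For what it's worth, precomposed Hangul syllables (U+AC00 to U+D7A3) do decompose purely arithmetically into leading consonant, vowel, and optional trailing consonant jamo. Below is a minimal Python sketch of that arithmetic, using the constants given in Chapter 3 of the Unicode Standard ("Conjoining Jamo Behavior"); `decompose_hangul` is a hypothetical helper name. For these syllables, the result should match Python's built-in `unicodedata.normalize("NFD", ch)`, which is one way to sanity-check it.

```python
# Constants from the Unicode Standard's Hangul syllable decomposition
S_BASE = 0xAC00  # first precomposed syllable, 가
L_BASE = 0x1100  # first leading consonant jamo
V_BASE = 0x1161  # first vowel jamo
T_BASE = 0x11A7  # one before the first trailing consonant jamo
L_COUNT = 19
V_COUNT = 21
T_COUNT = 28  # includes "no trailing consonant"
N_COUNT = V_COUNT * T_COUNT  # 588
S_COUNT = L_COUNT * N_COUNT  # 11172


def decompose_hangul(ch: str) -> str:
    """Split one precomposed Hangul syllable into its conjoining jamo; return other characters unchanged."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch  # not a precomposed Hangul syllable
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    jamo = chr(lead) + chr(vowel)
    if t_index:  # 0 means the syllable has no trailing consonant
        jamo += chr(T_BASE + t_index)
    return jamo


# decompose_hangul("한") -> "\u1112\u1161\u11ab" (ᄒ + ᅡ + ᆫ)
```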