Cotonoha
Howdy. Cotonoha is Paul McCann's Natural Language Processing consultancy based in Tokyo.
Some of my services:
Classification and Information Extraction: Have more bug reports, fact sheets, or other documents than you can read? I can help you sort them into piles and add helpful labels to reduce or eliminate your workload.
NLP Pipeline Support: Have the data you need, but it's in incompatible or hard-to-process formats? I can build a system to work with your existing technology and make your life easier.
Japanese NLP Support: Whether you're trying to add Japanese support to an existing system or already have a Japanese system but are having issues, I can help. Getting rid of mojibake, fixing tokenization issues, and dealing with hyouki-yure (orthographic variation) are all basic tasks that affect any system. Depending on your application, support for improving autocompletion, automatically adding furigana, or other tasks may also be helpful.
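To make two of those Japanese text problems concrete, here is a minimal sketch using only the Python standard library. The function names `repair_mojibake` and `normalize_width` are hypothetical, chosen for illustration. The sketch assumes the mojibake came from UTF-8 bytes being decoded as cp1252, which is only one common cause, and it treats hyouki-yure narrowly as half-width/full-width variation, which NFKC normalization can fold. Other kinds of orthographic variation, such as kanji versus kana spellings, need dictionary-based handling and are not covered here.

```python
import unicodedata


def repair_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Undo UTF-8 text that was mistakenly decoded as `wrong_encoding`.

    Assumption: the damage is a single wrong decode step. If the guess
    does not fit, the text is returned unchanged.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text was never garbled this way or the guess is wrong.
        return text


def normalize_width(text: str) -> str:
    """Fold half-width katakana and full-width ASCII into one form (NFKC)."""
    return unicodedata.normalize("NFKC", text)


# Simulate the damage, then repair it.
garbled = "日本語".encode("utf-8").decode("cp1252")
print(repair_mojibake(garbled))

# Width variants of the same strings collapse to a single spelling.
print(normalize_width("ＮＬＰ１２３"))
print(normalize_width("ｶﾀｶﾅ"))
```

Returning the input unchanged on failure keeps the helper safe to run over text that is already clean. In a real system the wrong encoding usually has to be detected or guessed per source, and NFKC also rewrites some characters you may want to keep, so both steps deserve review against actual data.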
Open Source
I maintain a number of open-source packages, many related to working with Japanese text.
- cutlet, a Japanese-to-romaji converter. Try the online demo!
- fugashi, a MeCab wrapper used in Hugging Face Transformers
- mecab-python3, the most popular Japanese tokenizer package for Python
- posuto, a handy interface to Japanese postal code data
Contact me here. ❧