Howdy. Cotonoha is Paul McCann's Natural Language Processing consultancy based in Tokyo.


Some of my services:

Classification and Information Extraction: Have more bug reports, fact sheets, or other documents than you can read? I can help you sort them into piles and add helpful labels to reduce or eliminate your workload.

NLP Pipeline Support: Have the data you need, but it's in incompatible or hard-to-process formats? I can build a system that works with your existing technology and makes your life easier.

Japanese NLP Support: Whether you're adding Japanese support to an existing system or already have a Japanese system that is having issues, I can help. Fixing mojibake, resolving tokenization issues, and handling hyouki-yure (orthographic variation) are all basic tasks that affect any system. Depending on your application, support for improving autocompletion, automatically adding furigana, or other tasks may also be helpful.
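
To give a feel for two of the problems above, here is a minimal sketch using only Python's standard library. It assumes one specific kind of mojibake (UTF-8 bytes mistakenly read as Latin-1) and one simple kind of orthographic variation (half-width versus full-width characters). The function names are made up for illustration, and real-world cases usually need more than this.

```python
import unicodedata


def make_mojibake(text: str) -> str:
    """Simulate mojibake: UTF-8 bytes wrongly decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled: str) -> str:
    """Undo that one specific mistake (assumes UTF-8 read as Latin-1)."""
    return garbled.encode("latin-1").decode("utf-8")


def normalize_width(text: str) -> str:
    """Fold half-width katakana and full-width ASCII into standard forms.

    NFKC normalization only covers width and compatibility variants,
    not kanji/kana spelling differences.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    garbled = make_mojibake("日本語")
    print(repr(garbled))             # unreadable Latin-1 characters
    print(repair_mojibake(garbled))  # the original text again
    print(normalize_width("ﾃﾞｰﾀ"))    # half-width katakana folded to full-width
```

Width normalization is only the easiest slice of hyouki-yure. Variants such as kanji versus kana spellings generally need a dictionary or a tokenizer rather than a Unicode normalization pass.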

Open Source

I maintain a number of open-source packages, many related to working with Japanese text.

  • cutlet, a Japanese-to-romaji converter. Try the online demo!
  • fugashi, a MeCab wrapper used in Hugging Face Transformers
  • mecab-python3, the most popular Japanese tokenizer package for Python
  • posuto, a handy interface to Japanese postal code data

Contact me here. ❧