Cotonoha

Howdy. Cotonoha GK is a natural language processing (NLP) consultancy based in Tokyo.

This page is also available in Japanese (日本語はこちら).

Some of the services Cotonoha offers:

Data Management: Have hundreds of terabytes of data and no idea what's in it? Need to remove duplicates or specific data? Cotonoha can work with your existing systems and minimal resources to help you get the most out of the data you already have.
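As a rough illustration of the simplest case, exact duplicates can be found by hashing each document's contents and keeping only the first copy. This is a minimal Python sketch, not Cotonoha's actual tooling: the function name `dedupe_exact` is hypothetical, only exact matches (after trimming whitespace) are caught, and at hundreds of terabytes the set of seen hashes would need to live in a database or be sharded rather than held in memory.

```python
import hashlib
from typing import Iterable, Iterator


def dedupe_exact(docs: Iterable[str]) -> Iterator[str]:
    """Yield each document the first time its content is seen.

    Hypothetical helper for illustration. Only exact duplicates (after
    stripping surrounding whitespace) are removed; near-duplicates need a
    fuzzier technique such as MinHash.
    """
    seen: set[bytes] = set()
    for doc in docs:
        # Hash the stripped text so only a fixed-size digest is kept per document.
        digest = hashlib.sha256(doc.strip().encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield doc


if __name__ == "__main__":
    docs = ["hello world", "hello world ", "goodbye", "hello world"]
    print(list(dedupe_exact(docs)))  # prints ['hello world', 'goodbye']
```

Because it is a generator, documents can be streamed from disk one at a time instead of being loaded all at once.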

Classification and Information Extraction: Have more bug reports, fact sheets, or other documents than you can read? Cotonoha can help you sort them into piles and add helpful labels, reducing or eliminating the time you spend reading them.

NLP Pipeline Support: Have the data you need, but it's in incompatible or hard-to-process formats? Cotonoha can build a system that works with your existing technology and makes your life easier.

Japanese NLP Support: Whether you're trying to add Japanese support to an existing system or already have a Japanese system but are having issues, Cotonoha can help. Fixing mojibake (garbled text caused by encoding mismatches), resolving tokenization issues, and handling hyouki-yure (orthographic variation) are basic tasks that come up in any Japanese system. Depending on your application, help with improving autocompletion, automatically adding furigana, or other tasks may also be useful.
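For a taste of the basic tasks above, Python's standard `unicodedata.normalize` with the NFKC form folds width variants (for example, half-width katakana into full-width), one common kind of hyouki-yure. Some mojibake can also be reversed by re-encoding with the codec that was wrongly used and decoding with the right one. This is only a sketch under stated assumptions: the helper names `normalize_ja` and `repair_mojibake` are hypothetical, the repair assumes Shift_JIS bytes that were misread as Latin-1, and NFKC does not unify spelling choices such as kanji versus kana.

```python
import unicodedata


def normalize_ja(text: str) -> str:
    """Fold width variants with NFKC (hypothetical helper).

    NFKC maps half-width katakana to full-width katakana and full-width
    ASCII letters and digits to ordinary ASCII. It does not unify different
    spellings such as 猫 vs ねこ, which need dictionary-based handling.
    """
    return unicodedata.normalize("NFKC", text)


def repair_mojibake(text: str, wrong: str = "latin-1", right: str = "shift_jis") -> str:
    """Undo a decode with the wrong codec (hypothetical helper).

    Assumes `text` came from bytes in `right` that were decoded as `wrong`.
    Latin-1 maps every byte to one character, so that wrong decode is
    lossless and can be reversed; raises UnicodeError if the guess is off.
    """
    return text.encode(wrong).decode(right)


if __name__ == "__main__":
    variants = ["ｶﾀｶﾅ", "カタカナ"]
    print({normalize_ja(v) for v in variants})  # prints {'カタカナ'}

    garbled = "日本語".encode("shift_jis").decode("latin-1")
    print(repair_mojibake(garbled))  # prints 日本語
```

In practice the hard part is detecting which codecs were involved; the round trip itself is the easy step.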


Open Source

Cotonoha maintains a number of open-source packages, many related to working with Japanese text.

  • cutlet, a Japanese-to-romaji converter. Try the online demo!
  • fugashi, a MeCab wrapper used in Hugging Face Transformers
  • mecab-python3, the most popular Japanese tokenizer package for Python
  • posuto, a handy interface to Japanese postal code data

Cotonoha can add these to a project for you, or provide direct guidance to help your team set them up in your existing system.


Think there's something Cotonoha can help with? Get in touch by email. ❧