lmg-anon/jiten-parser-py

Jiten-Parser-Py

A Python port of Jiten's Parser library (commit: 2e3588f8) for Japanese text segmentation.

Installation

Requires Python 3.8+.

pip install git+https://github.com/lmg-anon/jiten-parser-py.git
python -m jiten.setup_deps
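
Since the package requires Python 3.8+, it can be worth confirming the interpreter version before installing. The snippet below is a minimal sketch of such a check; `MIN_VERSION` and `python_is_supported` are hypothetical names used for illustration and are not part of jiten-parser-py.

```python
import sys

# Minimum interpreter version stated in the README (assumed to mean >= 3.8).
MIN_VERSION = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version is new enough to install the package.")
else:
    print("Python 3.8 or newer is required.")
```

Run it with the same interpreter that `pip` installs into, so the check reflects the environment the package will actually use.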

Usage

from jiten.parser import Parser
from jiten.jmdict.jmdict import JmDict

jmdict = JmDict()

text = "美少女がアニメを見ている。"
parsed_words = Parser.parse_text(text)

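for word in parsed_words:
    # Look up the dictionary entry for each parsed word; skip words without one.
    entry = jmdict.get_word_by_id(word.word_id)
    if entry:
        # reading_index picks the matching form from the entry's readings.
        dictionary_form = entry.readings[word.reading_index]
        # Fall back to an empty list when the entry has no definitions.
        meanings = entry.definitions[0].english_meanings if entry.definitions else []
        # Print at most the first two English meanings.
        print(f"'{word.original_text}' -> {dictionary_form} | {meanings[:2]}")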

Output:

'美少女' -> 美少女 | ['beautiful girl']
'が' -> が | ['indicates the subject of a sentence']
'アニメ' -> アニメ | ['animation', 'animated film']
'を' -> を | ['indicates direct object of action']
'見ている' -> 見る | ['to see', 'to look']
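
The formatting step in the loop above can be factored into a small helper, which makes it easier to reuse and to test without loading the dictionary. This is only a sketch: `format_word` is a hypothetical function, and the `SimpleNamespace` objects are stand-ins that mimic the attributes used in the usage example (`original_text`, `reading_index`, `readings`, `definitions`, `english_meanings`), not the library's real classes.

```python
from types import SimpleNamespace


def format_word(word, entry, max_meanings=2):
    """Format one parsed word the way the usage loop prints it.

    Hypothetical helper: it relies only on the attributes shown in the
    usage example and returns None when no dictionary entry was found.
    """
    if not entry:
        return None
    dictionary_form = entry.readings[word.reading_index]
    meanings = entry.definitions[0].english_meanings if entry.definitions else []
    return f"'{word.original_text}' -> {dictionary_form} | {meanings[:max_meanings]}"


# Stand-in objects (assumed shapes) so the helper can be tried without
# the dictionary data installed.
word = SimpleNamespace(original_text="見ている", reading_index=0, word_id=1)
entry = SimpleNamespace(
    readings=["見る"],
    definitions=[SimpleNamespace(english_meanings=["to see", "to look", "to watch"])],
)

print(format_word(word, entry))
```

With the real library, the same helper would presumably be called as `format_word(word, jmdict.get_word_by_id(word.word_id))` inside the loop, skipping any `None` results.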

Interactive GUI Example

The repository includes a simple GUI that mimics Jiten's website.

To run it, first install the additional dependencies:

pip install "jiten-parser[gui] @ git+https://github.com/lmg-anon/jiten-parser-py.git"

Then launch the GUI with:

python -m jiten.app.gui

License & Acknowledgements

This project is licensed under the Apache-2.0 License.

This project is a port and utilizes resources from the following projects:

  • Jiten's Parser by Sirush: The original C# library, deconjugation rules, and data resources.
  • Sudachi: The Japanese morphological analyzer.
  • EDRDG: The JMDict and JMnedict dictionary files, used in conformance with the Group's licence.
