Implement half-width katakana support #21
| hira.push(hira_char); | ||
| previous_kana = Some(hira_char); | ||
| } else if is_char_halfwidth_katakana(input_char) { | ||
| let result = HALFWIDTH_KATAKANA_TO_HIRAGANA_NODE_TREE.get(&chars[index..]); |
I'd prefer a pos variable that gets incremented instead of the previous_read_forward_count.
@PSeitz Can you take another look at this so that I can finish it? 🙇
Sorry for the delay, I will come back shortly to continue the review.
| previous_kana = Some(hira_char); | ||
| } else if is_char_halfwidth_katakana(input_char) { | ||
| let result = HALFWIDTH_KATAKANA_TO_HIRAGANA_NODE_TREE.get(&chars[index..]); | ||
| result.0.chars().for_each(|char| hira.push(char)); |
That's unidiomatic Rust; prefer:
hira.extend(result.0.chars());
Oh, that's simple. Refactored so in f9597f4
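For reference, the two forms discussed in this thread behave the same. The sketch below assumes `hira` is a `String` (the diff does not show its type; the same call also works on a `Vec<char>`), and the helper names are hypothetical, introduced only to put the two forms side by side.

```rust
/// Appends every char of `piece` one at a time (the form in the original diff).
fn append_with_for_each(out: &mut String, piece: &str) {
    piece.chars().for_each(|c| out.push(c));
}

/// Appends every char of `piece` through `Extend<char>` (the suggested form).
fn append_with_extend(out: &mut String, piece: &str) {
    out.extend(piece.chars());
}
```

If `hira` really is a `String`, `hira.push_str(result.0)` would be another option, since `result.0` already appears to be a string slice.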
| assert_eq!(to_hiragana("ダヂヅデド"), "だぢづでど"); | ||
| assert_eq!(to_hiragana("バビブベボ"), "ばびぶべぼ"); | ||
| assert_eq!(to_hiragana("パピプペポ"), "ぱぴぷぺぽ"); | ||
| assert_eq!(to_hiragana("ヴ"), "ゔ"); |
I think the half-width ｰ is handled differently. Can you check?
assert_eq!("ｽｰﾊﾟｰ".to_hiragana(), "スーパー".to_hiragana());
Implemented the long-vowel transformation in fd05bc4.
| let mut count: usize = 0; | ||
| let chars = input.chars().collect::<Vec<_>>(); | ||
| for (index, input_char) in input.chars().enumerate() { |
We can iterate over the chars vec directly and index into it via pos.
Do count += 1 at the end of the loop, and in the half-width case
count += result.1 - 1;
This removes the read-ahead skip check, since the cursor jumps directly past the consumed half-width katakana.
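A minimal sketch of the cursor-based loop suggested here, under stated assumptions: `lookup_halfwidth` is a hypothetical stand-in for `HALFWIDTH_KATAKANA_TO_HIRAGANA_NODE_TREE.get` that covers only a few entries and returns an `Option` (the real call appears to return the tuple directly), and `convert` is not the crate's `to_hiragana`. Half-width characters are written as escapes so the code points stay unambiguous.

```rust
/// Hypothetical stand-in for the node-tree lookup: returns the hiragana for
/// the longest match at the start of `rest` plus the number of chars consumed.
fn lookup_halfwidth(rest: &[char]) -> Option<(&'static str, usize)> {
    match rest {
        ['\u{FF8A}', '\u{FF9F}', ..] => Some(("ぱ", 2)), // half-width HA + handakuten
        ['\u{FF8A}', '\u{FF9E}', ..] => Some(("ば", 2)), // half-width HA + dakuten
        ['\u{FF8A}', ..] => Some(("は", 1)),             // half-width HA alone
        ['\u{FF7D}', ..] => Some(("す", 1)),             // half-width SU
        _ => None,
    }
}

/// Walks `chars` with an explicit cursor instead of `enumerate()`.
fn convert(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    let mut pos = 0; // cursor into `chars`
    while pos < chars.len() {
        match lookup_halfwidth(&chars[pos..]) {
            Some((hira, consumed)) => {
                out.push_str(hira);
                pos += consumed; // jump past everything the match consumed
            }
            None => {
                out.push(chars[pos]); // pass other chars through unchanged
                pos += 1;
            }
        }
    }
    out
}
```

Advancing by `consumed` in the match arm is equivalent to the `count += result.1 - 1` plus trailing `count += 1` described above; because the cursor never lands on an already consumed character, no separate skip check is needed.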
| // the long-vowel transformation below applies uniformly. | ||
| let chars: Vec<char> = input | ||
| .chars() | ||
| .map(|c| if c == 'ｰ' { 'ー' } else { c }) |
Instead of replacing, we should update the method to also cover the half-width version (and rename the method to is_prolonged_sound):
/// Returns true if char is 'ー'
pub fn is_char_long_dash(char: char) -> bool {
char as u32 == PROLONGED_SOUND_MARK
}
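One way the renamed method could look, as a sketch only. It assumes `PROLONGED_SOUND_MARK` is the `u32` code point of 'ー' (U+30FC) and that the additional variant meant here is the half-width mark U+FF70; the second constant name is hypothetical.

```rust
/// Code point of the full-width prolonged sound mark (assumed value, U+30FC).
const PROLONGED_SOUND_MARK: u32 = 0x30FC;
/// Code point of the half-width prolonged sound mark (hypothetical constant, U+FF70).
const HALFWIDTH_PROLONGED_SOUND_MARK: u32 = 0xFF70;

/// Returns true if `c` is the prolonged sound mark in either width.
pub fn is_prolonged_sound(c: char) -> bool {
    matches!(
        c as u32,
        PROLONGED_SOUND_MARK | HALFWIDTH_PROLONGED_SOUND_MARK
    )
}
```

With a check like this, the input would not need to be rewritten up front; the long-vowel branch could accept both marks where it is evaluated.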
5fe53f9 to ad6b77d
Thanks for the PR!
Closes #19
(recreation of #20 due to accidental repo cleanup)
- utils/halfwidth_katakana_to_hiragana...._to_hiragana util can be leveraged in to_hiragana's character loops, so the input is not enumerated multiple times.
- In to_katakana and to_romaji, the util is invoked only when the input contains half-width kana.
- to_halfwidth_katakana or a new option is NOT added, since I don't need one.