
Punctuation model is capitalizing the word after commas and not putting spaces between sentences #21

@EvanKrall


I'm testing FireRedASR2S against some English audio, running this command:

fireredasr2s-cli \
  --asr_model_dir ~/FireRedASR2S/examples_infer/asr_system/pretrained_models/FireRedASR2-AED \
  --asr_use_gpu 0 --asr_use_half 0 --asr_batch_size 16 \
  --beam_size 3 --nbest 1 --decode_max_len 0 --softmax_smoothing 1.25 \
  --aed_length_penalty 0.6 --eos_penalty 1.0 --return_timestamp 1 \
  --enable_vad 0 \
  --vad_model_dir ~/FireRedASR2S/examples_infer/asr_system/pretrained_models/FireRedVAD/VAD \
  --vad_use_gpu 0 --smooth_window_size 5 --speech_threshold 0.5 \
  --min_speech_frame 20 --max_speech_frame 2000 --min_silence_frame 10 \
  --merge_silence_frame 50 --extend_speech_frame 5 --vad_chunk_max_frame 30000 \
  --enable_lid 1 \
  --lid_model_dir ~/FireRedASR2S/examples_infer/asr_system/pretrained_models/FireRedLID \
  --lid_use_gpu 0 \
  --enable_punc 1 \
  --punc_model_dir ~/FireRedASR2S/examples_infer/asr_system/pretrained_models/FireRedPunc \
  --punc_use_gpu 0 --punc_batch_size 32 --punc_with_timestamp 1 --punc_sentence_max_length 25 \
  --write_textgrid 1 --write_srt 1 --save_segment 1 \
  --wav_paths {wav files}

This prints dictionaries like {'uttid': '<path to wav>', 'text': 'Sentence one.Sentence two,With a comma.Sentence three.', ...}

I would expect that text to be formatted as:

Sentence one. Sentence two, with a comma. Sentence three.

That is, there should be a space after each punctuation mark, and only the word following a sentence-ending mark (., ?, or !) should be capitalized, not the word following a comma.
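Until the punctuation model itself is fixed, a post-processing pass over the 'text' field can approximate the expected output. The sketch below is a hypothetical workaround, not part of FireRedASR2S: the function name `fix_punctuation_spacing` and the sample `uttid` value are made up for illustration. It assumes the only problems are a missing space after . ? ! , and a spurious capital after commas. It only inserts a space when a letter follows (so decimals like "3.5" survive) and keeps "I", "I'm"-style contractions and all-caps acronyms unchanged after a comma, but it cannot tell proper nouns ("Paris") from ordinary words and will split abbreviations such as "e.g.". It also does not touch any other fields of the result or the SRT/TextGrid files.

```python
import re


def fix_punctuation_spacing(text: str) -> str:
    """Hypothetical workaround for the formatting described above (assumption, not library code).

    1. Insert a space after . ? ! , when a letter follows immediately.
       Requiring a letter avoids splitting decimals such as "3.5".
    2. Lowercase the first letter of a word that follows a comma,
       leaving sentence-ending marks (. ? !) untouched.
    """
    # Step 1: "one.Sentence" -> "one. Sentence", "two,With" -> "two, With"
    text = re.sub(r"([.?!,])(?=[A-Za-z])", r"\1 ", text)

    def lower_after_comma(m):
        word = m.group(1)
        # Keep the pronoun "I", contractions like "I'm", and acronyms like "ASR".
        # Proper nouns cannot be detected here and will be lowercased.
        if word == "I" or word.startswith("I'") or (len(word) > 1 and word.isupper()):
            return m.group(0)
        return ", " + word[0].lower() + word[1:]

    # Step 2: ", With" -> ", with"
    return re.sub(r", ([A-Z][A-Za-z']*)", lower_after_comma, text)


# Usage on a result shaped like the dictionaries printed by the CLI.
result = {"uttid": "example", "text": "Sentence one.Sentence two,With a comma.Sentence three."}
result["text"] = fix_punctuation_spacing(result["text"])
print(result["text"])  # Sentence one. Sentence two, with a comma. Sentence three.
```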
