Performance issue in input.py (by P3)

Hello! I've found a performance issue in input.py: `dataset.batch(batch_size)` [(line 143)](https://github.com/dsindex/etagger/blob/27143071868fce80d71661ccfb6fd83be7f955f6/input.py#L143) should be called before `dataset.map(parser, num_parallel_calls=tf.data.experimental.AUTOTUNE)` [(line 141)](https://github.com/dsindex/etagger/blob/27143071868fce80d71661ccfb6fd83be7f955f6/input.py#L141). Batching first lets `map` invoke `parser` once per batch instead of once per example, which reduces the per-call overhead and could make your program more efficient.

The [TensorFlow data performance guide](https://tensorflow.google.cn/guide/data_performance?hl=zh_cn#vectorized_mapping) (section on vectorized mapping) supports this.

You will also need to check whether the function `parser` passed to `dataset.map(parser, num_parallel_calls=tf.data.experimental.AUTOTUNE)` is affected by this change, so that the reordered code still works properly. For example, if `parser` expects input of shape (x, y, z) before the fix, it will receive input of shape (batch_size, x, y, z) after the fix.
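To make the shape caveat concrete, here is a minimal sketch of the two orderings. It does not use the real `parser` from input.py: `parse_single` and `parse_batch` are hypothetical stand-ins, and the toy data and `batch_size = 2` are assumptions chosen only for illustration. The point it shows is that an op reducing over axis 0 per example has to reduce over axis 1 once the input carries a leading batch dimension.

```python
import tensorflow as tf

# Toy data: 4 examples, each a vector of 3 features (assumed for illustration).
data = tf.reshape(tf.range(12, dtype=tf.float32), (4, 3))
batch_size = 2

def parse_single(x):
    # Hypothetical per-example parser: x has shape (3,).
    return tf.reduce_sum(x, axis=0)

def parse_batch(x):
    # Hypothetical vectorized parser: x has shape (batch, 3),
    # so the feature axis moves from 0 to 1.
    return tf.reduce_sum(x, axis=1)

# Current order: map over single examples, then batch.
map_then_batch = (
    tf.data.Dataset.from_tensor_slices(data)
    .map(parse_single, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .batch(batch_size)
)

# Proposed order: batch first, then map over whole batches.
batch_then_map = (
    tf.data.Dataset.from_tensor_slices(data)
    .batch(batch_size)
    .map(parse_batch, num_parallel_calls=tf.data.experimental.AUTOTUNE)
)

for a, b in zip(map_then_batch, batch_then_map):
    print(a.numpy(), b.numpy())
```

Both pipelines should yield the same batches here. Whether the real `parser` can be vectorized this easily depends on the ops it uses, which would need to be checked against input.py.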

Looking forward to your reply. I would also be glad to open a PR with the fix if you are too busy.
