Meta-Llama-3 model text-generation example output is unexpected on 2 nodes #1451

### System Info

```shell
deepspeed                 0.14.4+hpu.synapse.v1.18.0
optimum-habana            1.14.0

docker image: vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
```


### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

1. Set up 2 nodes for the test.
2. Run the text-generation example:
```shell
python3 ../gaudi_spawn.py --hostfile hostfile --use_deepspeed --world_size 16 --master_port 29500 \
run_generation.py \
--model_name_or_path /data1/zhixue/Llama-3.1-70B-Instruct/ \
--bf16 \
--batch_size 1 \
--use_hpu_graphs --limit_hpu_graphs \
--max_new_tokens 512
```
3. The generation output looks like:
```
10.233.108.205: Input/outputs:
10.233.108.205: input 1: ('DeepSpeed is a machine learning framework',)
10.233.108.205: output 1: ('DeepSpeed is a machine learning framework!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!![...]!!!!!!!!!!!!!!!!',)
```

(The run of `!` continues unbroken to the end of the generated text; it is shortened here at `[...]`.)
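
The `hostfile` passed in step 2 is not shown above. As a rough sketch only, a DeepSpeed-style hostfile for this kind of setup lists one host per line with a `slots=` count. The hostnames `node-1` and `node-2` and the file name `hostfile.example` are placeholders, not the actual machines or file, and `slots=8` assumes 8 HPUs per node so that the total matches `--world_size 16`. The snippet also sums the slots as a quick sanity check:

```shell
# Hypothetical hostfile: hostnames are placeholders, 8 devices per node is an assumption.
cat > hostfile.example <<'EOF'
node-1 slots=8
node-2 slots=8
EOF

# Sum the slots= values so the total can be compared with --world_size.
total=$(awk 'NF >= 2 { split($2, kv, "="); sum += kv[2] } END { print sum }' hostfile.example)
echo "total slots: $total"
```

If the total differs from the `--world_size` value, the launcher and the hostfile disagree about how many devices take part in the run.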

### Expected behavior

When I run the same test with the Llama-2-7b-hf model, the output is as shown below. I found the issue with the latest Meta-Llama-3 and Meta-Llama-3.1 models when running inference on 2 nodes.
```
10.233.108.205: input 1: ('DeepSpeed is a machine learning framework',)
10.233.108.205: output 1: ('DeepSpeed is a machine learning framework for deep learning. It is designed to be fast and efficient, while also being easy to use. DeepSpeed is based on the TensorFlow framework, and it uses the TensorFlow Lite library to run on mobile devices.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is designed to be easy to use and to provide a high level of performance.\nDeepSpeed is a deep learning framework that is designed to be fast and efficient. It is based on the TensorFlow framework and uses the TensorFlow Lite library to run on mobile devices. DeepSpeed is',)
```
