
LLM (Large Language Model) Sample Program

How to build an LLM (large language model) sample program based on the Llama model
Meta's Llama is an openly available LLM that is easy to use through `transformers`, Hugging Face's Python library.
The sample code below downloads a Llama 3 model and performs a simple text generation task.

Prerequisites

Hugging Face account: Llama models are downloaded from the Hugging Face Hub. Access is gated, so Meta must grant you permission before you can download them.
Hugging Face CLI: Log in to Hugging Face by entering the command below in a terminal.
```
    huggingface-cli login
```
Install the required libraries: Llama 3 needs several libraries, including `transformers` and `torch`.
```
    pip install transformers torch accelerate
```

The `accelerate` library is needed for `device_map="auto"`, which the sample uses to place the model on a GPU when one is available.
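
Before running the sample, it can help to confirm that the libraries are actually installed in the active environment. The snippet below is a minimal sketch that uses only the standard library; the helper name `missing_packages` is made up for this illustration, and the check only tells you whether a package can be found, not whether its version is new enough for Llama 3.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be found as importable top-level packages."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Package names taken from the pip command above.
    required = ["transformers", "torch", "accelerate"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerun the `pip install` command in the same environment that you use to run the script.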

Sample program: text generation with a Llama model

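The code below uses `AutoTokenizer` and `AutoModelForCausalLM` from the `transformers` library to load the Llama 3 8B Instruct model and generate a response to a given prompt. A shorter variant based on the `pipeline` function is included as comments at the end.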

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Llama 3 model name (check it on the Hugging Face Hub)
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # use bfloat16 for memory efficiency
    device_map="auto",           # use an available GPU automatically
)

# Build the prompt as a list of chat messages
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

# Render the chat messages into a single prompt string (not tokenized yet)
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

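# Tokenize the prompt; the chat template already added the begin-of-text token,
# so do not add special tokens a second time
inputs = tokenizer([prompt], return_tensors="pt", add_special_tokens=False).to(model.device)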

# Generate text
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

# Decode only the newly generated tokens (skipping the prompt) and print them
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)

# The same thing can be written more simply with a pipeline
# from transformers import pipeline
# pipe = pipeline("text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16, "device_map": "auto"})
# outputs = pipe(
#     prompt,
#     max_new_tokens=256,
#     do_sample=True,
#     temperature=0.7,
#     top_p=0.9
# )
# print(outputs[0]['generated_text'])
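
To make the `apply_chat_template` step less opaque, the sketch below builds by hand the kind of string that call is expected to return for the `messages` list above. It is an approximation based on the published Llama 3 Instruct prompt format, not the template itself: the authoritative version ships with the tokenizer and may differ in detail, and the function name `render_llama3_prompt` is made up for this illustration.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Approximate the Llama 3 Instruct chat layout as a plain string (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, and an end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ]
    print(render_llama3_prompt(messages))
```

In real code, prefer `tokenizer.apply_chat_template()`, since it always matches the template the model was trained with.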

Code explanation

1. Model loading: `AutoModelForCausalLM.from_pretrained()` automatically downloads the Llama model from the Hugging Face Hub and loads it.

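      `torch_dtype=torch.bfloat16`: Loads the weights in `bfloat16` precision, which roughly halves memory use compared with 32-bit floats.
      `device_map="auto"`: Uses a GPU automatically if one is available and falls back to the CPU otherwise.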

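2. Prompt construction: The Llama 3 Instruct model is optimized for chat-style prompts with `system`, `user`, and `assistant` roles.

3. Tokenizer: `tokenizer.apply_chat_template()` renders the chat messages in the prompt format the model expects. With `tokenize=False` it returns a string, which the `tokenizer(...)` call then converts to token IDs.

4. Text generation: `model.generate()` takes the tokenized input and generates text.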

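      `max_new_tokens`: The maximum number of tokens to generate.
      `do_sample`: When `True`, the next token is sampled from the probability distribution instead of always being the most likely one, which gives more varied, more creative text.
      `temperature` and `top_p`: Parameters that control the randomness of generation. A lower `temperature` makes the output more deterministic, and `top_p` limits sampling to the smallest set of tokens whose cumulative probability reaches that value.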

5. Decoding: `tokenizer.decode()` converts the generated token IDs back into readable text.
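
The effect of `temperature` and `top_p` from step 4 is easier to see on a toy example. The sketch below is a conceptual, pure-Python version of temperature scaling followed by nucleus (top-p) filtering. It is not how `transformers` implements these options internally, and the function names and toy logits are invented for this illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling and top-p filtering."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
    for t in (0.3, 0.7, 1.5):
        probs = softmax_with_temperature(toy_logits, t)
        print(f"temperature={t}:", [round(p, 3) for p in probs])
    print("sampled index:", sample_token(toy_logits))
```

Running it shows the distribution sharpening toward the top-scoring token as the temperature drops, while `top_p` removes the unlikely tail before sampling.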
