[VLM] Running the OpenCLIP Model

찹쌀이네 공부 공간 (Chapssal's Study Space)

Python Deep learning

호떡공돌이 2025. 11. 3. 23:21

OpenCLIP is an open-source reimplementation of OpenAI's CLIP model. (Impressive...)

https://github.com/mlfoundations/open_clip

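OpenCLIP (2022 to present)
Research group: LAION (Large-scale Artificial Intelligence Open Network)

Reproduces the architecture of OpenAI's CLIP
Trained from scratch on the LAION-400M / LAION-2B / LAION-5B datasets
Model weights fully released
In effect, it became the "open version of CLIP"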

It can be installed easily with pip.

pip install open-clip-torch

You can also download the model weights you need.

My GPU is small and precious, so I went with a small model, ViT-B-16. (It runs fine even on a GPU with only 6 GB of VRAM!)
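A quick back-of-the-envelope check of why 6 GB is plenty: the weights alone are small. This sketch assumes ViT-B-16 CLIP (image and text towers together) has roughly 150 million parameters, an approximate figure not taken from this post, and counts only the weights; activations, the CUDA context and framework overhead come on top.

```python
# Back-of-the-envelope estimate of the GPU memory taken by model weights alone.
# ASSUMPTION: ViT-B-16 CLIP (image + text towers) has roughly 150M parameters.
N_PARAMS = 150_000_000

# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Return the weight footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: {weight_memory_gb(N_PARAMS, dtype):.2f} GB")
```

Even in fp32 the weights stay well under 1 GB, which is consistent with single-image inference fitting comfortably on a 6 GB card.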

If you save the model locally as shown below and load it from disk whenever you need it, you can run all kinds of tests without an internet connection.

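python - <<'EOF'
from huggingface_hub import snapshot_download

# Download the whole model repo into ./models/openclip_b16.
snapshot_download(
    repo_id="laion/CLIP-ViT-B-16-laion2B-s34B-b88K",
    local_dir="./models/openclip_b16",
    # Deprecated in recent huggingface_hub versions (files are always
    # copied into local_dir there); kept for older versions.
    local_dir_use_symlinks=False
)
EOF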


# Files such as the following are saved locally:
open_clip_model.safetensors
open_clip_config.json
tokenizer.json
config.json
tokenizer_config.json
vocab.txt
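Since the point of saving the weights locally is to work offline, it may help to confirm the download actually finished before disconnecting. A minimal standard-library sketch: `missing_model_files` is a hypothetical helper (not part of open_clip or huggingface_hub), and treating `open_clip_model.safetensors` and `open_clip_config.json` as the required files is an assumption.

```python
from pathlib import Path


# HYPOTHETICAL helper (not part of open_clip or huggingface_hub):
# reports which expected files are absent from a local model directory.
def missing_model_files(model_dir, required):
    """Return the names in `required` that do not exist as files in `model_dir`."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# ASSUMPTION: these two files are what we want present before going offline.
REQUIRED = ["open_clip_model.safetensors", "open_clip_config.json"]

if __name__ == "__main__":
    missing = missing_model_files("./models/openclip_b16", REQUIRED)
    if missing:
        print("Missing files:", missing)
    else:
        print("All required files are present.")
```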

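Just type in rough text descriptions such as "a cat sitting on a mat" or "a running cat", and the OpenCLIP model works out on its own which one matches the image. (Impressive...)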

The input image is ./image_cat.png, which the script below loads.

import torch
import open_clip
from PIL import Image

# --------------------------------------------------
# Decide the image label with CLIP
# --------------------------------------------------
device = (
    "cuda" if torch.cuda.is_available() else
    "mps" if torch.backends.mps.is_available() else
    "cpu"
)

model_name = "ViT-B-16"
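# "openai" = OpenAI's original CLIP weights.
# "laion2b_s34b_b88k" = the LAION-2B checkpoint (same as the HF repo downloaded above),
# which generally performs better.
pretrained_tag = "openai"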

model, _, preprocess = open_clip.create_model_and_transforms(
    model_name=model_name,
    pretrained=pretrained_tag
)

tokenizer = open_clip.get_tokenizer(model_name)
model = model.to(device).eval()

image_path = r"./image_cat.png"
image_pil = Image.open(image_path).convert("RGB")

# Prepare the CLIP inputs
image_tensor = preprocess(image_pil).unsqueeze(0).to(device)
texts = ['cat sitting on a mat', 'cat running on street']
text_tokens = tokenizer(texts).to(device)

# CLIP inference (mixed precision on CUDA)
with torch.no_grad():
    if device == "cuda":
        with torch.amp.autocast("cuda"):
            image_features = model.encode_image(image_tensor)
            text_features = model.encode_text(text_tokens)
    else:
        image_features = model.encode_image(image_tensor)
        text_features = model.encode_text(text_tokens)

    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

probs_dict = dict(zip(texts, text_probs[0].tolist()))
label = max(probs_dict, key=probs_dict.get)
print(f"CLIP result : {probs_dict} → '{label}' selected")

The result is as follows:

CLIP result : {'cat sitting on a mat': 0.99, 'cat running on street': 0.00} → 'cat sitting on a mat' selected
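The lopsided 0.99 / 0.00 split follows largely from how the score is computed: both embeddings are L2-normalized, so their dot product is a cosine similarity, which is then multiplied by 100 and passed through a softmax over the candidate texts. The sketch below reproduces that final step in plain Python on made-up 3-D vectors; the vectors and helper names are illustrative assumptions (real ViT-B-16 embeddings are 512-D torch tensors). Running it with a scale of 1 instead of 100 shows how much the factor sharpens the distribution.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (same role as features / features.norm())."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_vec, text_vecs, scale=100.0):
    """Cosine similarity between one image and each text, scaled, then softmaxed."""
    img = l2_normalize(image_vec)
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_vecs]
    return softmax([scale * s for s in sims])


# TOY vectors, made up for illustration only.
image = [1.0, 0.2, 0.0]
texts = [
    [0.9, 0.3, 0.1],  # stands in for "cat sitting on a mat"
    [0.2, 1.0, 0.3],  # stands in for "cat running on street"
]

# With scale 100 almost all probability lands on the first text;
# with scale 1 the same similarities give a much softer split.
print([round(p, 4) for p in zero_shot_probs(image, texts, scale=100.0)])
print([round(p, 4) for p in zero_shot_probs(image, texts, scale=1.0)])
```

The 100.0 hardcoded in the script acts as CLIP's temperature; open_clip models also carry the learned value as `model.logit_scale` (apply `.exp()` to get the multiplier).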
