Image anomaly detection: splitting ind_knn_ad into training and inference (PatchCore edition)

Introduction
Prerequisites
Training
1. Preparing for training
2. Running training
Inference
1. Preparing for inference
2. Running inference
Conclusion

Introduction

In the previous post, I explained how to use PaDiM.

This time, I will cover PatchCore.

Prerequisites

The prerequisites are as follows.

Python 3.9
torch == 1.12.1+cu113

The ind_knn_ad GitHub repository is here.

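Training

Preparing for training

Add the following to the end of the fit function of the PatchCore class in indad/models.py. It writes the fitted patch library to disk so that inference can reload it later. numpy must be available as np in models.py, and the ./npy_data directory must already exist, because np.save does not create missing directories.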

# Move the patch library to the CPU and save it as a NumPy array for later inference.
x = self.patch_lib.to('cpu').detach().numpy().copy()
np.save("./npy_data/patch_lib.npy", x)
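As a minimal sketch of the save and load round trip, the example below creates the target directory first and then restores the array as float32. It uses a small random array as a stand-in for the real patch library, and the helper names save_patch_lib and load_patch_lib are hypothetical, not part of ind_knn_ad.

```python
import os
import tempfile

import numpy as np


def save_patch_lib(patch_lib: np.ndarray, path: str) -> None:
    """Save the array, creating the parent directory first if needed."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    np.save(path, patch_lib)


def load_patch_lib(path: str) -> np.ndarray:
    """Load the array back and make sure it is float32."""
    return np.load(path).astype(np.float32)


# Stand-in data: 8 patches with 1536 features each (the real library is far larger).
rng = np.random.default_rng(0)
dummy_lib = rng.standard_normal((8, 1536)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "npy_data", "patch_lib.npy")
    save_patch_lib(dummy_lib, target)
    restored = load_patch_lib(target)

print(restored.shape, restored.dtype)
```

Calling os.makedirs with exist_ok=True before np.save means the first training run does not fail just because the output directory is missing.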

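Running training

Change train.py as follows. The script saves the weights to ./weights/PatchCore.pth, so create the ./weights directory beforehand.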

from indad.models import SPADE, PaDiM, PatchCore
from indad.data import MVTecDataset
import cv2
import torch
import numpy as np
from torchvision import transforms
from torch import tensor

IMAGENET_MEAN = tensor([.485, .456, .406])
IMAGENET_STD = tensor([.229, .224, .225])
SIZE = 224
filename = "./weights/PatchCore.pth"

# model = SPADE(k=25, backbone_name="wide_resnet50_2")
# model = PaDiM(d_reduced=350, backbone_name="wide_resnet50_2")
model = PatchCore(f_coreset=.10, backbone_name="wide_resnet50_2")

# model.to("cuda")
train_ds, test_ds = MVTecDataset("custom", SIZE).get_dataloaders()

# feed healthy dataset
model.fit(train_ds)

# torch.save(model, filename)
torch.save(model.state_dict(), filename)

print("model saved to: ", filename)

Running the script above produces output like the following.

100%|██████████| 1000/1000 [02:39<00:00,  6.26it/s]
Fitting random projections. Start dim = torch.Size([784000, 1536]).
DONE.                 Transformed dim = torch.Size([784000, 335]).
100%|██████████| 78399/78399 [14:59<00:00, 87.14it/s]
model saved to:  ./weights/PatchCore.pth
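The numbers in this log can be checked with a little arithmetic. The sketch below assumes two things that the post does not state: that the first coreset point is chosen before the selection loop starts, and that the projected dimension comes from the Johnson-Lindenstrauss bound with eps = 0.9. Under those assumptions it reproduces the 78399 and 335 shown above.

```python
import math

n_images = 1000        # iterations in the first progress bar
n_patches = 784_000    # first dimension of "Start dim" in the log
f_coreset = 0.10       # value passed to PatchCore in train.py

patches_per_image = n_patches // n_images      # patches extracted per image
coreset_size = round(n_patches * f_coreset)    # patches kept after subsampling

# Assumption: the first coreset point is picked before the loop starts,
# so the second progress bar counts coreset_size - 1 iterations.
loop_iterations = coreset_size - 1

# Assumption: the projected dimension follows the Johnson-Lindenstrauss bound
# 4 * ln(n) / (eps^2 / 2 - eps^3 / 3) with eps = 0.9, truncated to an integer.
eps = 0.9
jl_dim = int(4 * math.log(n_patches) / (eps**2 / 2 - eps**3 / 3))

print(patches_per_image, coreset_size, loop_iterations, jl_dim)
```

The second progress bar dominates the runtime (about 15 minutes here), and its length scales with f_coreset, so a smaller coreset fraction should shorten training at the cost of a smaller patch library.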

Next, I will explain how to run inference.

Inference

Preparing for inference

Add the following standby function to the PatchCore class in indad/models.py.

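def standby(self):
    # Recreate the resize layer for the largest feature map size (28 x 28 here).
    largest_fmap_size = torch.LongTensor([28, 28])
    self.resize = torch.nn.AdaptiveAvgPool2d(largest_fmap_size)
    # Reload the patch library saved at the end of fit() and convert it to a float32 tensor.
    self.patch_lib = np.load("./npy_data/patch_lib.npy")
    self.patch_lib = torch.from_numpy(self.patch_lib.astype(np.float32)).clone()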

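In this case largest_fmap_size is [28, 28]. This agrees with the training log: 784000 patches from 1000 images is 784 = 28 × 28 patches per image. The value must match the one used during fit, so update it if you change SIZE.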
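Instead of hard-coding 28, the side length can be recovered from the training log, as in this minimal sketch. Both helper names are hypothetical, and the stride of 8 in the second helper is an assumption about the backbone layer that produces the largest feature map; it happens to agree with the log for SIZE = 224.

```python
import math


def fmap_side_from_log(n_patches: int, n_images: int) -> int:
    """Recover the side length of a square feature map from the patch count."""
    per_image = n_patches // n_images
    side = math.isqrt(per_image)
    if side * side != per_image:
        raise ValueError("patch count per image is not a perfect square")
    return side


def fmap_side_from_stride(image_size: int, stride: int = 8) -> int:
    """Side length if the largest feature map has the given stride (assumed)."""
    return image_size // stride


side_from_log = fmap_side_from_log(784_000, 1000)
side_from_stride = fmap_side_from_stride(224)
print(side_from_log, side_from_stride)
```

If the two values disagree for your configuration, trust the one derived from the log, since it reflects what fit actually produced.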

Running inference

Change inference.py as follows.

from indad.models import SPADE, PaDiM, PatchCore
import cv2
import torch
import numpy as np
from torchvision import transforms
from torch import tensor

IMAGENET_MEAN = tensor([.485, .456, .406])
IMAGENET_STD = tensor([.229, .224, .225])
SIZE = 224
filename = "./weights/PatchCore.pth"

load_model = PatchCore(f_coreset=.10, backbone_name="wide_resnet50_2")
load_model.load_state_dict(torch.load(filename))
load_model.standby()

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(SIZE, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(SIZE),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD)
])

# get predictions
good_frame = cv2.imread("./good.png")
good_frame = cv2.cvtColor(good_frame,cv2.COLOR_BGR2RGB)
good_x = transform(good_frame)
good_x = good_x.unsqueeze(0)

defect_frame = cv2.imread("./defect.png")
defect_frame = cv2.cvtColor(defect_frame,cv2.COLOR_BGR2RGB)
defect_x = transform(defect_frame)
defect_x = defect_x.unsqueeze(0)

load_model.eval()
with torch.no_grad():
    img_lvl_anom_score, pxl_lvl_anom_score = load_model.predict(good_x)
    print("good frame score is: ", img_lvl_anom_score)
    img_lvl_anom_score, pxl_lvl_anom_score = load_model.predict(defect_x)
    print("defect frame score is: ", img_lvl_anom_score)

    print(pxl_lvl_anom_score.shape)

anom_frame = pxl_lvl_anom_score.numpy().reshape(224,224,1).astype("uint8")
print(anom_frame.shape)

cv2.imshow("frame", anom_frame)
cv2.waitKey(0)
cv2.destroyAllWindows()
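Note that the direct astype("uint8") cast in the script truncates fractional scores, so an anomaly map whose values are small may look almost black. If that happens, one option is to min-max scale the map to the 0 to 255 range before displaying it. The following is a minimal sketch with synthetic data; to_display_image is a hypothetical helper name, not part of ind_knn_ad.

```python
import numpy as np


def to_display_image(anom_map: np.ndarray) -> np.ndarray:
    """Min-max scale a float anomaly map to uint8 values in [0, 255]."""
    anom_map = anom_map.astype(np.float32)
    lo, hi = float(anom_map.min()), float(anom_map.max())
    if hi - lo < 1e-12:
        # Flat map: nothing to highlight, so return all zeros.
        return np.zeros(anom_map.shape, dtype=np.uint8)
    scaled = (anom_map - lo) / (hi - lo) * 255.0
    return scaled.round().astype(np.uint8)


# Synthetic stand-in for pxl_lvl_anom_score.numpy().reshape(224, 224, 1)
demo = np.linspace(0.0, 3.5, 224 * 224, dtype=np.float32).reshape(224, 224, 1)
frame = to_display_image(demo)
print(frame.shape, frame.dtype, frame.min(), frame.max())
```

Because the scaling is done per image, brightness is not comparable between two different images; use the image-level score for that comparison.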

Running the script above produces the following output.