A Guide to Object Detection with Detectron2 in PyTorch

黃爸爸好 2021-10-25


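Object detection is one of the most popular applications of deep learning. Let's start with a real-life example. Many people use Google Photos on their phones, which automatically groups photos by content under its "Things" option. I've attached a snippet below.

You can see that the app recognizes objects in the pictures and uses them to sort the photos into broader categories. This is an example of object detection at work.

In this article, I will perform object detection with Detectron2, a recent and robust library, with the code written in PyTorch.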

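Introducing Detectron2

Facebook AI Research (FAIR) developed this library, which achieves impressive results on object detection and segmentation problems. Detectron2 builds on maskrcnn-benchmark and is implemented in PyTorch. Because of the heavy computation involved, it needs CUDA to run at a practical speed.

It supports several tasks, including bounding box detection, instance segmentation, keypoint detection, and dense pose estimation. It also provides pre-trained models that you can easily load and apply to new images.

I will walk through an example in the next section.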

Installation

The first step is to install the detectron2 library and its dependencies.

import torch
import torchvision
# The wheel index below targets CUDA 10.2 and torch 1.7; check that your versions match
print(torch.__version__, torchvision.__version__)
!pip install detectron2 -f https://dl./detectron2/wheels/cu102/torch1.7/index.html

Next, import detectron2 and its modules.

import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
%matplotlib inline
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.structures import BoxMode

Let's also import the common libraries we need.

import numpy as np
import os, json, cv2, random
import matplotlib.pyplot as plt

Inference with a Pre-trained Model: Code

Many pre-trained Detectron2 models are available in the MODEL_ZOO (https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md). These models have been trained on different datasets and are ready to use.

Even when people train on their own custom datasets, they typically initialize their models with these pre-trained weights, which tends to reduce training time and improve performance. The model we will use is pre-trained on the COCO dataset.
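Model zoo entries are addressed by a config path relative to Detectron2's configs directory. As a rough orientation, the sketch below lists one COCO config per task. The names `ZOO_CONFIGS` and `config_for` are made up for this illustration, and the paths should be checked against the MODEL_ZOO page for the version you have installed.

```python
# Hypothetical lookup table: one COCO model zoo config per task.
# Paths follow the "<Dataset>-<Task>/<model>.yaml" layout of the model zoo;
# verify them against MODEL_ZOO.md for your installed version.
ZOO_CONFIGS = {
    "detection": "COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml",
    "instance_segmentation": "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml",
    "keypoints": "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml",
    "panoptic_segmentation": "COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml",
}


def config_for(task):
    """Return the config path for a task, or raise a readable error."""
    if task not in ZOO_CONFIGS:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(ZOO_CONFIGS)}")
    return ZOO_CONFIGS[task]


print(config_for("instance_segmentation"))
```

The returned string is the kind of value you would hand to model_zoo.get_config_file and model_zoo.get_checkpoint_url.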

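First, we have to define the complete configuration of the object detection model. We imported the get_cfg function from the detectron2.config module earlier, and we will use it now.

I chose the COCO instance segmentation config (a YAML file); other options are available. You also have to set the model's score threshold (usually somewhere between 0.4 and 0.6). The pre-trained weights for this config can be loaded from its checkpoint.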

cfg = get_cfg()
# Model zoo config paths use a hyphen: 'COCO-InstanceSegmentation/...'
cfg.merge_from_file(model_zoo.get_config_file('COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml'))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # discard detections scoring below this
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url('COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml')
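To make the role of the score threshold concrete, here is a minimal sketch of the idea in plain Python. It mirrors the concept only, not Detectron2's internal code; the helper `filter_by_score` and the detection values are invented for illustration.

```python
def filter_by_score(detections, threshold=0.5):
    """Keep only detections whose confidence score exceeds the threshold."""
    return [d for d in detections if d["score"] > threshold]


# Made-up detections standing in for a model's raw predictions
detections = [
    {"label": "person", "score": 0.98},
    {"label": "horse", "score": 0.55},
    {"label": "car", "score": 0.31},
]

for threshold in (0.4, 0.6):
    kept = [d["label"] for d in filter_by_score(detections, threshold)]
    print(threshold, kept)
```

A lower threshold keeps more boxes at the cost of more false positives; a higher one keeps fewer, more confident boxes.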

With the configuration done, we initialize a DefaultPredictor with it.

predictor = DefaultPredictor(cfg)

Now we can start making predictions on images.

Let's try it on a sample image. The code below uses OpenCV to load and read the image.

!wget http://images./val2017/000000439715.jpg -O input.jpg
im = cv2.imread('./input.jpg')
print(im.shape)
plt.figure(figsize=(15,7.5))
plt.imshow(im[..., ::-1])
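OpenCV loads images in BGR channel order while matplotlib expects RGB, which is why the `[..., ::-1]` slice above reverses the last axis before plotting. A tiny sketch on a synthetic array (the 2x2 image is made up for illustration) shows the effect:

```python
import numpy as np

# A 2x2 image that is pure red when interpreted as BGR (channel 2 is red)
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 2] = 255

# Reversing the channel axis converts BGR to RGB (and RGB back to BGR)
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist())

# Reversing twice returns the original order, so a double flip does nothing
round_trip = rgb[..., ::-1]
print(np.array_equal(round_trip, bgr))
```

Keeping track of which order an array is in matters whenever the same image is passed to both OpenCV-style and matplotlib-style functions.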

How do we run detection?

Pass the input image to the predictor we initialized.

# DefaultPredictor expects a BGR image by default (cfg.INPUT.FORMAT), which is what cv2.imread returns
outputs = predictor(im)

The output is a dictionary. Its 'instances' entry holds the predicted boxes, the scores, and the predicted labels. I've attached the output of the code snippet.
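To read the predictions as text rather than as a picture, you can pair each predicted class index with its name and score. The sketch below works on plain lists so it runs without a model; in a real session the inputs would come from `outputs['instances'].pred_classes.tolist()` and `outputs['instances'].scores.tolist()`, and the class names from the dataset metadata's `thing_classes`. The helper name and all values here are invented for illustration.

```python
def summarize_detections(pred_classes, scores, class_names):
    """Pair each predicted class index with its name and score, best first."""
    pairs = [(class_names[c], s) for c, s in zip(pred_classes, scores)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up values standing in for the tensors inside outputs['instances']
class_names = ["person", "horse"]  # hypothetical two-class list
pred_classes = [1, 0, 0]
scores = [0.91, 0.99, 0.75]

summary = summarize_detections(pred_classes, scores, class_names)
for name, score in summary:
    print(f"{name}: {score:.2f}")
```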

Next, use the Visualizer class to see how the detection performed. The Visualizer class has a method for drawing instance predictions.

v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs['instances'].to('cpu'))
plt.figure(figsize=(20,10))
plt.imshow(out.get_image())  # the Visualizer was given RGB, so its output is already RGB

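You can see that the model detected all the people and horses.

I've attached another sample output for a photo.

The car in the background is also detected, with a confidence score of 97%.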

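Detectron2 on a Custom Dataset

So far, we have only run inference with a pre-trained model. In some cases, though, you may need to detect only specific objects, such as cars or people, and you may want to train the model on your own dataset.

Detectron2 provides a simple way to do this too. Let's see how.

Preparing the Dataset

I will use the balloon dataset, where the goal is to detect balloons in images. This is a relatively simple example.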

!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip
!unzip balloon_dataset.zip > /dev/null

Note that Detectron2 expects data in a specific format. To convert our balloon dataset into that format, let's define a helper function.
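As a rough picture of that format, each image is described by one dictionary with a list of per-object annotations. The sketch below shows one such record and a small checker; which keys count as required is my reading of the standard dataset dict format, the helper `missing_keys` is hypothetical, and the file path and numbers are invented.

```python
REQUIRED_RECORD_KEYS = {"file_name", "image_id", "height", "width", "annotations"}
REQUIRED_ANNOTATION_KEYS = {"bbox", "bbox_mode", "category_id"}


def missing_keys(record):
    """List the keys a record (and each of its annotations) is missing."""
    missing = sorted(REQUIRED_RECORD_KEYS - set(record))
    for i, anno in enumerate(record.get("annotations", [])):
        for key in sorted(REQUIRED_ANNOTATION_KEYS - set(anno)):
            missing.append(f"annotations[{i}].{key}")
    return missing


example_record = {
    "file_name": "balloon/train/example.jpg",  # hypothetical path
    "image_id": 0,
    "height": 480,
    "width": 640,
    "annotations": [
        {
            "bbox": [100, 120, 220, 300],   # x_min, y_min, x_max, y_max
            "bbox_mode": "XYXY_ABS",        # stand-in; real code uses BoxMode.XYXY_ABS
            "segmentation": [[100.5, 120.5, 220.5, 120.5, 160.5, 300.5]],
            "category_id": 0,
        }
    ],
}

print(missing_keys(example_record))
```

An empty list means the record has every key this sketch checks for.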

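Our function takes the image directory path as input. It then opens and loads the JSON file. We enumerate the records in the JSON file to get each image path, read each image from its path, and store its height, width, file name, and image ID in a dictionary called "record".

Next, we go through the annotations and store the bounding box details in another dictionary, "obj". Each bounding box dictionary is appended to the list "objs", which in turn is assigned as the value of the "annotations" key in the record dictionary.

At the end of each outer iteration, the record is appended to a list called "dataset_dicts".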

from detectron2.structures import BoxMode
def get_balloon_images(img_folder):
    json_file = os.path.join(img_folder, 'via_region_data.json')
    with open(json_file) as f:
        imgs_anns = json.load(f)
    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}
        filename = os.path.join(img_folder, v['filename'])  # use the function's own argument
        height, width = cv2.imread(filename).shape[:2]
        record['file_name'] = filename
        record['image_id'] = idx
        record['height'] = height
        record['width'] = width
        annos = v['regions']
        objs = []
        for _, anno in annos.items():
            assert not anno['region_attributes']
            anno = anno['shape_attributes']
            px = anno['all_points_x']
            py = anno['all_points_y']
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]
            obj = {
                'bbox': [np.min(px), np.min(py), np.max(px), np.max(py)],
                'bbox_mode': BoxMode.XYXY_ABS,
                'segmentation': [poly],
                'category_id': 0,
            }
            objs.append(obj)
        record['annotations'] = objs
        dataset_dicts.append(record)
    return dataset_dicts
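The inner loop above turns one polygon, given as separate x and y lists, into a flat segmentation list and a bounding box. A standalone sketch of just that step, using made-up coordinates and a hypothetical helper name, looks like this:

```python
def polygon_to_annotation(px, py, category_id=0):
    """Build one annotation dict from polygon x and y coordinate lists."""
    # Shift each vertex by 0.5 (pixel centre) and flatten to [x0, y0, x1, y1, ...]
    poly = [coord for x, y in zip(px, py) for coord in (x + 0.5, y + 0.5)]
    return {
        # The box is the tightest axis-aligned rectangle around the polygon
        "bbox": [min(px), min(py), max(px), max(py)],
        "bbox_mode": "XYXY_ABS",  # stand-in; real code uses BoxMode.XYXY_ABS
        "segmentation": [poly],
        "category_id": category_id,
    }


# A made-up triangle
annotation = polygon_to_annotation([10, 30, 20], [5, 15, 40])
print(annotation["bbox"])
print(annotation["segmentation"])
```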

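The helper function returns a list of dictionaries, one per image, each carrying its annotations. The next step is to register the training and validation datasets. To register a dataset, you use DatasetCatalog.register together with MetadataCatalog.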

for d in ['train', 'val']:
    # d=d binds the current split name so each lambda loads its own folder
    DatasetCatalog.register('balloon_' + d, lambda d=d: get_balloon_images('balloon/' + d))
    MetadataCatalog.get('balloon_' + d).set(thing_classes=['balloon'])
balloon_metadata = MetadataCatalog.get('balloon_train')
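The `lambda d=d:` in the registration loop is easy to overlook, but it matters: a plain `lambda:` would look up `d` only when called, after the loop has finished. The sketch below shows the difference with an ordinary dictionary standing in for the catalog (the `registry` name is made up for illustration).

```python
registry = {}
for split in ["train", "val"]:
    # Late binding: looks up `split` when called, after the loop has ended
    registry["late_" + split] = lambda: "balloon/" + split
    # Default argument: captures the value of `split` at definition time
    registry["bound_" + split] = lambda split=split: "balloon/" + split

late_train = registry["late_train"]()
bound_train = registry["bound_train"]()
print(late_train)
print(bound_train)
```

Without the default argument, both registered loaders would end up reading the last split in the loop.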

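Visualizing the Training Data

We have registered the datasets. Now let's look at the training data. The code below draws three random samples from the balloon training dataset.

To draw the annotated instances, we again use the Visualizer class.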

dataset_dicts = get_balloon_images('balloon/train')
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d['file_name'])
    visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    plt.figure(figsize=(15,7))
    plt.imshow(out.get_image())  # already RGB; the two channel flips cancelled each other

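Training on Custom Data

Let's move on to training. First, import DefaultTrainer from Detectron2's engine module. Then define the datasets and other parameters, such as the number of workers, the batch size, and the number of classes (1 in this case).

We initialize the model with the pre-trained weights and train it further. The right maximum number of iterations depends on the size of the dataset and the complexity of the task.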

from detectron2.engine import DefaultTrainer
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file('COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml'))
cfg.DATASETS.TRAIN = ('balloon_train',)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 6
# Let training initialize from model zoo
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url('COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml')
cfg.SOLVER.IMS_PER_BATCH = 8
cfg.SOLVER.BASE_LR = 0.001
cfg.SOLVER.MAX_ITER = 500
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128 # faster, enough for this dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1 #only one class (balloon)
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

This may take a while to train!
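To get a feel for what the iteration count means, it helps to convert it into passes over the dataset. The arithmetic below is a back-of-the-envelope sketch: the helper names are hypothetical, and the figure of 61 training images is an assumption about the balloon dataset that you should replace with `len(dataset_dicts)` from your own run.

```python
import math


def approx_epochs(max_iter, ims_per_batch, num_images):
    """Roughly how many passes over the dataset a training run makes."""
    return max_iter * ims_per_batch / num_images


def iters_for_epochs(epochs, ims_per_batch, num_images):
    """Iterations needed to cover the dataset the given number of times."""
    return math.ceil(epochs * num_images / ims_per_batch)


NUM_TRAIN_IMAGES = 61  # assumed size of the balloon training split

print(round(approx_epochs(500, 8, NUM_TRAIN_IMAGES), 1))
print(iters_for_epochs(10, 8, NUM_TRAIN_IMAGES))
```

With a small dataset, a few hundred iterations already amount to many passes over the data, which is part of why fine-tuning from pre-trained weights converges quickly here.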

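Results

Whenever you train a deep learning model, make sure its final checkpoint is saved. You can then easily load it to run predictions.

The code snippet below loads the model and initializes the predictor. We draw a few random samples from the validation dataset and pass them to the predictor.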

cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, 'model_final.pth')  # path to the model we trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # set a testing threshold
predictor = DefaultPredictor(cfg)
from detectron2.utils.visualizer import ColorMode
dataset_dicts = get_balloon_images('balloon/val')
for d in random.sample(dataset_dicts, 2):    
    im = cv2.imread(d['file_name'])
    outputs = predictor(im) 
    v = Visualizer(im[:, :, ::-1],
                   metadata=balloon_metadata, 
                   scale=0.5, 
                   instance_mode=ColorMode.IMAGE_BW  
    )
    out = v.draw_instance_predictions(outputs['instances'].to('cpu'))
    plt.figure(figsize=(15,7))
    plt.imshow(out.get_image())  # already RGB; the two channel flips cancelled each other