JJJYmmm Blog

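Faster R-CNN framework diagram

Figure source: deep-learning-for-image-processing/pytorch_object_detection/faster_rcnn at master · WZMIAOMIAO/deep-learning-for-image-processing (github.com)

Main contents of the source code

The Faster R-CNN source-code walkthrough covers the following topics. See the other posts for details: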

DataSet
Network architecture
GeneralizedRCNNTransform
RPN
Predict Header
Positive/negative sample assignment and sampling
Loss function
PostProcess
Change Backbone (with FPN)

Environment setup

Python 3.6/3.7/3.8
PyTorch>=1.6.0
pycocotools
Ubuntu or CentOS
Use a GPU to train the model
See requirements.txt for more details

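File structure

  ├── backbone: feature-extraction networks; choose one to suit your needs
  ├── network_files: the Faster R-CNN network (including the Fast R-CNN and RPN modules)
  ├── train_utils: training and validation utilities (including cocotools)
  ├── my_dataset.py: custom dataset for reading the VOC dataset
  ├── train_mobilenet.py: trains with MobileNetV2 as the backbone
  ├── train_resnet50_fpn.py: trains with ResNet50+FPN as the backbone
  ├── train_multi_GPU.py: for users training on multiple GPUs
  ├── predict.py: simple prediction script that runs inference with trained weights
  ├── validation.py: computes COCO metrics of trained weights on validation/test data and writes record_mAP.txt
  ├── coco.json: COCO dataset label file
  └── pascal_voc_classes.json: PASCAL VOC label file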

Pretrained weights

MobileNetV2 weights (after downloading, rename the file to mobilenet_v2.pth and put it in the backbone folder): https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
ResNet50 weights (after downloading, rename the file to resnet50.pth and put it in the backbone folder): https://download.pytorch.org/models/resnet50-0676ba61.pth
ResNet50+FPN weights: https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth

Remember to update the corresponding model paths and file names in the source code.
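The download-and-rename step can be scripted. Below is a minimal Python sketch that uses only the standard library (`urllib.request.urlretrieve`) and assumes the target is the repository's `backbone` folder. The function name `fetch_backbone_weights` and its `dry_run` flag are hypothetical helpers and are not part of the repository. With `dry_run=True` the function only computes the destination paths.

```python
import urllib.request
from pathlib import Path

# URLs from the list above, keyed by the filename each file should be renamed to
BACKBONE_WEIGHTS = {
    "mobilenet_v2.pth": "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth",
    "resnet50.pth": "https://download.pytorch.org/models/resnet50-0676ba61.pth",
}


def fetch_backbone_weights(backbone_dir="backbone", dry_run=False):
    """Hypothetical helper: download each weight file into backbone_dir
    under its renamed filename and return the destination paths."""
    target = Path(backbone_dir)
    destinations = []
    for filename, url in BACKBONE_WEIGHTS.items():
        dst = target / filename
        destinations.append(dst)
        if dry_run or dst.exists():
            continue  # only report the path, or the file is already there
        target.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dst))
    return destinations


if __name__ == "__main__":
    for path in fetch_backbone_weights(dry_run=True):
        print(path)
```

The instructions above give no new name for the ResNet50+FPN COCO weights, so the sketch leaves that file out.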

Dataset (using PASCAL VOC2012 as an example)

PASCAL VOC2012 train/val download: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
If you are not familiar with the dataset, or want to train on your own data, see: https://b23.tv/F1kSCK
Weights obtained on VOC2012 with ResNet50+FPN and transfer learning: link: https://pan.baidu.com/s/1ifilndFRtAV5RDZINSHj5w extraction code: dsz8

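Training

Make sure the dataset is prepared in advance.
Make sure the corresponding pretrained weights are downloaded in advance.
To train MobileNetV2 + Faster R-CNN, run the train_mobilenet.py script directly.
To train ResNet50 + FPN + Faster R-CNN, run the train_resnet50_fpn.py script directly.
For multi-GPU training, run python -m torch.distributed.launch --nproc_per_node=8 --use_env train_multi_GPU.py, where nproc_per_node is the number of GPUs to use.
To choose specific GPUs, prefix the command with CUDA_VISIBLE_DEVICES. For example, CUDA_VISIBLE_DEVICES=0,3 uses only the 1st and 4th GPUs on the machine:
CUDA_VISIBLE_DEVICES=0,3 python -m torch.distributed.launch --nproc_per_node=2 --use_env train_multi_GPU.py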

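Notes

When running a training script, set --data-path (VOC_root) to the root directory that contains your VOCdevkit folder.
Faster R-CNN with FPN uses a lot of GPU memory. If memory is insufficient (i.e. the batch_size you can fit is below 8), use the default norm_layer in the create_model function by not passing a norm_layer argument. The model then uses FrozenBatchNorm2d, a BN layer whose parameters are not updated, which in practice also works well.
The results.txt file saved during training holds the COCO metrics on the validation set for each epoch. The first 12 values are the COCO metrics, and the last two are the mean training loss and the learning rate.
When using the prediction script, set train_weights to the path of the weights you trained.
When using validation.py, make sure your validation or test set contains objects of every class. Change only --num-classes, --data-path and --weights-path, and avoid modifying the rest of the code.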
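The NumPy sketch below shows why FrozenBatchNorm2d does not depend on the batch. It assumes the folded form used by torchvision's `torchvision.ops.misc.FrozenBatchNorm2d`: scale = weight / sqrt(running_var + eps) and shift = bias - running_mean * scale. In torchvision the weight, bias and running statistics are registered as buffers, not parameters. The optimizer therefore never updates them, and small batches cannot distort the statistics. The function name and the `eps=1e-5` default belong to this sketch only.

```python
import numpy as np


def frozen_batch_norm2d(x, weight, bias, running_mean, running_var, eps=1e-5):
    """Normalize an NCHW array with fixed per-channel statistics.

    Nothing is computed from the batch: all four vectors are constants,
    so the output for one image is the same whatever else is in the batch.
    """
    scale = weight / np.sqrt(running_var + eps)  # per-channel multiplier
    shift = bias - running_mean * scale          # per-channel offset
    # reshape (C,) vectors to (1, C, 1, 1) so they broadcast over N, H, W
    return x * scale[None, :, None, None] + shift[None, :, None, None]


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))
w, b = np.array([1.0, 2.0, 0.5]), np.array([0.0, 1.0, -1.0])
mean, var = np.array([0.1, -0.2, 0.3]), np.array([1.0, 4.0, 0.25])
y = frozen_batch_norm2d(x, w, b, mean, var)
print(y.shape)  # (2, 3, 4, 4)
```

Because the statistics are fixed, running the first image alone gives exactly the same result as running it inside the batch of two.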

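Data representation

In general, string data falls into four types:

Categorical data
Free strings that can be semantically mapped to categories
Structured string data
Text data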

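Bag-of-words representation

This representation discards most of the structure of the input text, such as paragraphs, chapters, sentences and formatting. It only counts how often each word appears in each text.

Computing a bag-of-words representation involves the following steps:

Tokenization: split each document into the words that appear in it, e.g. on whitespace and punctuation.
Vocabulary building: collect a vocabulary of all words that appear in any of the documents.
Encoding: for each document, count how often each vocabulary word appears in it (stored as a sparse matrix).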
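The three steps can be sketched from scratch in plain Python. This is for illustration only. It assumes CountVectorizer's default behavior of lowercasing and its default token pattern `(?u)\b\w\w+\b`. The helper names `tokenize`, `build_vocabulary` and `encode` are hypothetical, and the dict per document stands in for a real sparse matrix.

```python
import re
from collections import Counter

# same default token pattern as CountVectorizer: runs of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    # step 1: lowercase, then split into tokens
    return TOKEN_RE.findall(doc.lower())


def build_vocabulary(docs):
    # step 2: every distinct token gets a column index (alphabetical order)
    words = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {w: i for i, w in enumerate(words)}


def encode(docs, vocab):
    # step 3: sparse counts, one {column_index: count} dict per document
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append({vocab[w]: c for w, c in counts.items() if w in vocab})
    return rows


docs = ["The fool doth think he is wise,",
        "but the wise man knows himself to be a fool"]
vocab = build_vocabulary(docs)
rows = encode(docs, vocab)
print(len(vocab))  # 13
```

The pattern drops single-character tokens such as "a". CountVectorizer drops them as well.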

[Figure: bag_of_words]

CountVectorizer

Basic usage

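from sklearn.feature_extraction.text import CountVectorizer
# a small example corpus (the same name, bards_words, is used below)
bards_words = ["The fool doth think he is wise,",
               "but the wise man knows himself to be a fool"]
vect = CountVectorizer()
vect.fit(bards_words)  # learn the vocabulary (train)
print(len(vect.vocabulary_))  # vocabulary size
print(vect.vocabulary_)  # word -> column index
bag_of_words = vect.transform(bards_words)  # bag-of-words stored as a SciPy sparse matrix
print(bag_of_words.toarray())  # convert to a dense array for inspection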

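Improving token extraction

CountVectorizer tokenizes with the regular expression "\b\w\w+\b", so every token has at least two word characters (letters, digits or underscores). It also lowercases all text by default.
Setting min_df reduces the number of features: only words that appear in at least min_df documents are kept.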

vect = CountVectorizer(min_df=5).fit(text_train)
X_train = vect.transform(text_train)
feature_names = vect.get_feature_names_out()  # feature names (get_feature_names() was removed in scikit-learn 1.2)

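Removing stop words

Set the stop_words parameter. stop_words="english" uses scikit-learn's built-in English stop-word list.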

vect = CountVectorizer(min_df=5,stop_words="english").fit(text_train)
X_train = vect.transform(text_train)
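Here is a toy check of the stop-word filter on a made-up two-sentence corpus. With stop_words="english", CountVectorizer drops every token found in scikit-learn's built-in list. That list is exposed as `ENGLISH_STOP_WORDS`, a frozenset in `sklearn.feature_extraction.text`.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer

# made-up corpus for illustration
docs = ["the cat sat on the mat", "a dog sat by the door"]

plain = CountVectorizer().fit(docs)
filtered = CountVectorizer(stop_words="english").fit(docs)

# tokens that disappeared from the vocabulary
removed = set(plain.vocabulary_) - set(filtered.vocabulary_)
print(sorted(removed))
# every removed token comes from the built-in list
print(removed <= ENGLISH_STOP_WORDS)
```

The built-in list is fixed. For domain-specific stop words, pass a list of strings to stop_words instead.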

TfidfVectorizer

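The tf-idf method

tf-idf stands for term frequency-inverse document frequency. It gives a high weight to terms that appear often in a particular document and a low weight to terms that appear in many documents of the corpus. High-weight terms are therefore more likely to summarize the content of that document.

$$ \text{tfidf}(w,d) = \text{tf}(w,d)\left(\log\frac{N+1}{N_w+1}+1\right) $$

Here N is the number of documents in the training set, N_w is the number of training documents in which word w appears, and tf(w,d) is the number of times w occurs in document d. This is scikit-learn's default (smooth_idf=True, natural log). Unless norm=None is passed, each document's vector is then L2-normalized.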
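The sketch below computes the formula by hand with NumPy on a made-up three-document corpus and compares the result with TfidfVectorizer(norm=None). It assumes scikit-learn's defaults smooth_idf=True and sublinear_tf=False. Under those defaults tf is the raw count and log is the natural logarithm.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# made-up corpus for illustration
docs = ["the cat sat", "the dog sat", "the bird flew"]

counts = CountVectorizer().fit(docs)
tf = counts.transform(docs).toarray()      # tf(w, d): raw counts
N = tf.shape[0]                            # number of documents
N_w = (tf > 0).sum(axis=0)                 # documents containing each word
idf = np.log((N + 1) / (N_w + 1)) + 1      # smoothed idf
manual = tf * idf

tfidf = TfidfVectorizer(norm=None).fit(docs)  # norm=None: no L2 normalization
auto = tfidf.transform(docs).toarray()
print(np.allclose(manual, auto))  # True

# a word found in every document gets the minimum idf, log(1) + 1 = 1
print(idf[counts.vocabulary_["the"]])  # 1.0

# with the default norm="l2", every row has unit length
l2 = TfidfVectorizer().fit_transform(docs).toarray()
print(np.linalg.norm(l2, axis=1))
```

Both vectorizers sort their vocabulary alphabetically, so the columns of the two matrices line up.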

sklearn

Sentiment prediction with LogisticRegression

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# norm=None keeps the raw tf-idf values (no L2 row normalization)
pipe = make_pipeline(TfidfVectorizer(min_df=5, norm=None),
                     LogisticRegression())
param_grid = {'logisticregression__C': [0.001, 0.01, 0.1, 1, 10]}

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(text_train, y_train)
print("Best cross-validation score: {:.2f}".format(grid.best_score_))

Inspecting the weights produced by tf-idf

import numpy as np

vectorizer = grid.best_estimator_.named_steps["tfidfvectorizer"]
# transform the training dataset:
X_train = vectorizer.transform(text_train)
# find maximum value for each of the features over dataset:
max_value = X_train.max(axis=0).toarray().ravel()
sorted_by_tfidf = max_value.argsort()
# get feature names (get_feature_names() was removed in scikit-learn 1.2)
feature_names = np.array(vectorizer.get_feature_names_out())

print("Features with lowest tfidf:\n{}".format(
      feature_names[sorted_by_tfidf[:20]]))

print("Features with highest tfidf: \n{}".format(
      feature_names[sorted_by_tfidf[-20:]]))

Inspecting the coefficients of the logistic regression model

import mglearn

mglearn.tools.visualize_coefficients(
    grid.best_estimator_.named_steps["logisticregression"].coef_,
    feature_names, n_top_features=40)

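n-gram tokenization

Unlike a plain bag of words, n-grams preserve some of a sentence's structure (word order). A sequence of n consecutive tokens forms an n-gram.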
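The plain-Python sketch below shows how word n-grams are formed from a token sequence. It assumes the same default tokenization as CountVectorizer: lowercase text and the pattern `(?u)\b\w\w+\b`. The function name `ngrams` is hypothetical. Its parameters n_min and n_max play the role of ngram_range=(n_min, n_max).

```python
import re


def ngrams(doc, n_min=1, n_max=1):
    """Return every n-gram with n_min <= n <= n_max, tokens joined by spaces."""
    tokens = re.findall(r"(?u)\b\w\w+\b", doc.lower())
    grams = []
    for n in range(n_min, n_max + 1):
        # slide a window of n consecutive tokens over the sequence
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


print(ngrams("The fool doth think he is wise,", 2, 2))
# ['the fool', 'fool doth', 'doth think', 'think he', 'he is', 'is wise']
```

A sentence with k tokens has k - n + 1 n-grams of length n. For the 7-token sentence above, ngram_range=(1, 3) therefore yields 7 + 6 + 5 = 18 n-grams.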

Unigrams

cv = CountVectorizer(ngram_range=(1, 1)).fit(bards_words)
print("Vocabulary size: {}".format(len(cv.vocabulary_)))
print("Vocabulary:\n{}".format(cv.get_feature_names_out()))

Bigrams

cv = CountVectorizer(ngram_range=(2, 2)).fit(bards_words)
print("Vocabulary size: {}".format(len(cv.vocabulary_)))
print("Vocabulary:\n{}".format(cv.get_feature_names_out()))

Unigrams through trigrams (ngram_range=(1, 3))

cv = CountVectorizer(ngram_range=(1, 3)).fit(bards_words)
print("Vocabulary size: {}".format(len(cv.vocabulary_)))
print("Vocabulary:\n{}".format(cv.get_feature_names_out()))

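Heat-map visualization

import matplotlib.pyplot as plt
import mglearn

# grid over both C and ngram_range; 3 ngram_range values, hence reshape(-1, 3)
param_grid = {'logisticregression__C': [0.001, 0.01, 0.1, 1, 10],
              'tfidfvectorizer__ngram_range': [(1, 1), (1, 2), (1, 3)]}
grid = GridSearchCV(pipe, param_grid, cv=5).fit(text_train, y_train)
# extract scores from grid_search (rows: ngram_range, columns: C)
scores = grid.cv_results_['mean_test_score'].reshape(-1, 3).T
# visualize heat map
heatmap = mglearn.tools.heatmap(
    scores, xlabel="C", ylabel="ngram_range", cmap="viridis", fmt="%.3f",
    xticklabels=param_grid['logisticregression__C'],
    yticklabels=param_grid['tfidfvectorizer__ngram_range'])
plt.colorbar(heatmap)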

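Advanced tokenization, stemming and lemmatization

Stemming and lemmatization are both forms of normalization. Here stemming uses nltk's Porter stemmer, and lemmatization uses the spacy package.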

import spacy
import nltk

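# load spaCy's English model (spaCy v3 no longer accepts the 'en' shortcut;
# install the model first with: python -m spacy download en_core_web_sm)
en_nlp = spacy.load('en_core_web_sm')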
# instantiate nltk's Porter stemmer
stemmer = nltk.stem.PorterStemmer()

# define function to compare lemmatization in spacy with stemming in nltk
def compare_normalization(doc):
    # tokenize document in spacy
    doc_spacy = en_nlp(doc)
    # print lemmas found by spacy
    print("Lemmatization:")
    print([token.lemma_ for token in doc_spacy])
    # print tokens found by Porter stemmer
    print("Stemming:")
    print([stemmer.stem(token.norm_.lower()) for token in doc_spacy])

Topic modeling and document clustering

Topic modeling with Latent Dirichlet Allocation (LDA).

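from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# drop words that appear in more than 15% of documents,
# then keep the 10,000 most frequent of the remaining words
vect = CountVectorizer(max_features=10000, max_df=.15)
X = vect.fit_transform(text_train)

# the number of topics is set with n_components (the old n_topics argument was removed)
lda = LatentDirichletAllocation(n_components=10, learning_method="batch",
                                max_iter=25, random_state=0)
# We build the model and transform the data in one step
# Computing transform takes some time,
# and we can save time by doing both at once
document_topics = lda.fit_transform(X)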

The components_ attribute of LDA

print("lda.components_.shape: {}".format(lda.components_.shape))
# lda.components_.shape: (10, 10000), i.e. the importance of each word for each topic
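The small sketch below runs LDA on a made-up four-document corpus and assumes a recent scikit-learn, where the topic count is set with n_components. It checks the two shapes involved. components_ has one row per topic and one column per vocabulary word. Each row of the document-topic matrix is a topic mixture that sums to 1. Summing that matrix over documents, as the bar chart further down does, therefore gives each topic's overall weight.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# made-up corpus with two obvious themes
docs = ["cats purr and cats meow", "dogs bark and dogs fetch",
        "stocks rise and markets rally", "markets fall and stocks drop"]

vect = CountVectorizer()
X = vect.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, learning_method="batch",
                                max_iter=25, random_state=0)
doc_topics = lda.fit_transform(X)

print(lda.components_.shape)   # (2, 13): topics x vocabulary words
print(doc_topics.sum(axis=1))  # each document's topic mixture sums to 1
print(doc_topics.sum(axis=0))  # overall weight of each topic across documents
```

On a real corpus the per-topic totals add up to the number of documents. This is why the bar chart further down uses an x-axis range in the thousands.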

Visualizing the most important words of each topic

import numpy as np
import mglearn

# for each topic (a row in the components_), sort the features (ascending).
# Invert rows with [:, ::-1] to make sorting descending
sorting = np.argsort(lda.components_, axis=1)[:, ::-1]
# get the feature names from the vectorizer:
feature_names = np.array(vect.get_feature_names_out())
# Print out the 10 topics:
mglearn.tools.print_topics(topics=range(10), feature_names=feature_names,
                           sorting=sorting, topics_per_chunk=5, n_words=10)

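Overall weight of each topic

import matplotlib.pyplot as plt

# the chart shows 100 topics, so fit a separate 100-topic model on the same X
lda100 = LatentDirichletAllocation(n_components=100, learning_method="batch",
                                   max_iter=25, random_state=0)
document_topics100 = lda100.fit_transform(X)
sorting = np.argsort(lda100.components_, axis=1)[:, ::-1]

fig, ax = plt.subplots(1, 2, figsize=(10, 10))
# label each topic with its index and its two most important words
topic_names = ["{:>2} ".format(i) + " ".join(words)
               for i, words in enumerate(feature_names[sorting[:, :2]])]
# two column bar chart: topics 0-49 on the left, 50-99 on the right
for col in [0, 1]:
    start = col * 50
    end = (col + 1) * 50
    ax[col].barh(np.arange(50), np.sum(document_topics100, axis=0)[start:end])
    ax[col].set_yticks(np.arange(50))
    ax[col].set_yticklabels(topic_names[start:end], ha="left", va="top")
    ax[col].invert_yaxis()
    ax[col].set_xlim(0, 2000)
    yax = ax[col].get_yaxis()
    yax.set_tick_params(pad=130)
plt.tight_layout()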

