Intensive Reading - MoCo Paper Notes
Momentum Contrast for Unsupervised Visual Representation Learning
Foreword
In contrastive learning, there is a massive demand for negative samples.
In a search scenario, we try to construct as many negative samples as possible.
Using in-batch negatives, a batch of size $N$ yields $N^2 - N$ negative pairs, since each of the $N$ queries treats the other $N - 1$ examples in the batch as negatives.
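As a quick illustration, here is a minimal PyTorch-style sketch (the tensor names and sizes are made up for illustration) showing where the $N^2 - N$ count comes from: the off-diagonal entries of the query-key similarity matrix are the negatives.

```python
import torch

N, d = 4, 128                  # batch size and embedding dimension (illustrative)
queries = torch.randn(N, d)    # one query embedding per example in the batch
keys = torch.randn(N, d)       # the matching key/doc embedding for each example

# N x N similarity matrix: entry (i, j) scores query i against key j.
sim = queries @ keys.t()

# The diagonal holds the N positive pairs; every off-diagonal entry is a negative,
# giving N * N - N negatives per batch.
pos_mask = torch.eye(N, dtype=torch.bool)
num_negatives = (~pos_mask).sum().item()   # N**2 - N = 12 for N = 4
```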
But the problem is that we are limited by GPU memory: the batch size cannot grow indefinitely. Even with multi-machine, multi-GPU distributed training, the batch size on each card has a ceiling, and even if some mechanism is used to aggregate negative-sample scores across cards, the aggregation stage itself becomes a bottleneck. Among the published work the author is aware of, the largest batch size is CLIP's, at 32,768.
Kaiming argues that contrastive learning can, in a sense, be viewed as a dictionary look-up.
The positive sample can be thought of as being mixed in among many negative samples:
given a Query, we look it up against the Keys in the dictionary and expect it to match the correct one, i.e. the positive key (the right Doc, in search terms).
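In the paper's formulation this look-up is scored with the InfoNCE loss: the query $q$ should be similar to its single positive key $k_+$ and dissimilar to every other key, which amounts to a $(K+1)$-way softmax classification over the dictionary:

$$
\mathcal{L}_q = -\log \frac{\exp\left(q \cdot k_{+} / \tau\right)}{\sum_{i=0}^{K} \exp\left(q \cdot k_i / \tau\right)}
$$

Here $\tau$ is a temperature hyper-parameter, and the sum runs over one positive and $K$ negative keys.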
From this perspective, the fundamental difficulty of contrastive learning is to build a dictionary that is both of the following (see the sketch after this list):
- large
- consistent as the key encoder evolves during training
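These two requirements are exactly what MoCo's queue and momentum encoder are designed to satisfy. Below is a loose PyTorch-style sketch in the spirit of the paper's pseudocode (the linear-layer encoders and concrete numbers are stand-ins, not the official implementation): a large FIFO queue of keys decouples the dictionary size from the batch size, while a slowly updated momentum encoder keeps the keys in the queue consistent with one another.

```python
import torch
import torch.nn.functional as F

# Illustrative values; the paper uses a queue of 65,536 keys, m = 0.999, tau = 0.07.
dim, K, m, tau = 128, 65536, 0.999, 0.07

encoder_q = torch.nn.Linear(dim, dim)                 # stand-in for the query encoder
encoder_k = torch.nn.Linear(dim, dim)                 # stand-in for the key (momentum) encoder
encoder_k.load_state_dict(encoder_q.state_dict())     # start from identical weights

queue = F.normalize(torch.randn(dim, K), dim=0)       # the large dictionary of keys

def train_step(x_q, x_k):
    """One contrastive step on two augmented views x_q, x_k of the same images."""
    global queue

    with torch.no_grad():
        # Momentum update: the key encoder drifts slowly behind the query encoder,
        # so keys produced at different steps stay (nearly) consistent.
        for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
            p_k.data.mul_(m).add_(p_q.data, alpha=1 - m)
        k = F.normalize(encoder_k(x_k), dim=1)         # keys carry no gradient

    q = F.normalize(encoder_q(x_q), dim=1)

    # Dictionary look-up: one positive logit per query plus K negatives from the queue.
    l_pos = (q * k).sum(dim=1, keepdim=True)           # N x 1
    l_neg = q @ queue                                   # N x K
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    loss = F.cross_entropy(logits, torch.zeros(q.size(0), dtype=torch.long))

    # FIFO update: enqueue the freshest keys, dequeue the oldest ones.
    queue = torch.cat([queue[:, k.size(0):], k.t()], dim=1)
    return loss
```

The large momentum (0.999 in the paper) is the crucial design choice: the keys already sitting in the queue were produced by an encoder almost identical to the current one, so the dictionary can be large and consistent at the same time.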