Google has released Crossmodal-3600, an image captioning evaluation dataset that serves as a benchmark for multilingual image captioning and allows researchers to study the field more reliably. Crossmodal-3600 covers 36 languages and contains 3,600 photos from around the world, along with 261,375 human-generated reference captions. According to the researchers, the captions are of good quality and maintain a consistent style across languages.
