Datasets are often created using human experts or crowdsourced labelers. However, there are use cases, such as detecting small objects on the surface of the earth, where manual labeling is costly, time-consuming, and does not scale. When sufficient labeled data is available, machine learning models can help reduce the time required to accomplish this task. Here we present a methodology for creating datasets of remotely sensed objects from satellite imagery when the available labeled data is limited. To develop our map of utility-scale solar arrays across India, we first assembled point labels of known solar PV farms and used human-machine interaction, in which a user fine-tunes an unsupervised model, to create weak segmentation labels of the solar farms (labels obtained through weakly supervised learning, ref. 14). Then we paired these weak pixel-wise segmentation labels with geo-located Sentinel 2…
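
As a rough illustration of how a single point label can be expanded into a weak pixel-wise mask, the sketch below clusters the pixels of an image tile and keeps the cluster that contains the labeled point. It is not the pipeline used in this work: plain k-means stands in for the unsupervised model, the number of clusters `k` stands in for the setting a user would adjust interactively, and the function names `kmeans` and `weak_mask_from_point` are hypothetical.

```python
import numpy as np


def kmeans(features, k, n_iter=20, seed=0):
    """Plain k-means on an (n_samples, n_features) array; returns cluster ids."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers from k distinct, randomly chosen samples.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels


def weak_mask_from_point(image, point_rc, k=2):
    """Weak segmentation mask for an (H, W, C) tile from one (row, col) point.

    Assumption: pixels of the labeled object fall into the same spectral
    cluster as the labeled point. `k` is the value a user would tune.
    """
    h, w, c = image.shape
    cluster_map = kmeans(image.reshape(-1, c), k).reshape(h, w)
    row, col = point_rc
    # Keep every pixel that shares a cluster with the labeled point.
    return cluster_map == cluster_map[row, col]
```

For a tile with a bright rectangular object on a dark background, `weak_mask_from_point(tile, (row, col))` returns a boolean array that is true over the object when `(row, col)` lies inside it. On real imagery the selected cluster will usually include spectrally similar pixels elsewhere in the tile, which is one reason such masks are only weak labels.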
