Machine Learning Short-Term Work Opportunity


Project: Develop an image forensics tool to investigate academic fraud by determining if an image has been duplicated from a given (large) dataset, with or without manipulation.

Location: Image and Data Analysis Core, Harvard Medical School, Boston, MA.

Duration: 6 months full-time (40h/week) or 1 year part-time (20h/week).

Requisites: Considerable experience in computer vision and machine learning -- specifically convolutional neural networks.

Compensation: This is a paid opportunity; compensation is in line with similar academic positions in computer vision or machine learning.


To apply, send CV to


The Image and Data Analysis Core and the Office for Academic and Research Integrity at Harvard Medical School have an ongoing collaboration to develop tools that identify the re-use of images in academic publications. As widely publicized by venues such as Retraction Watch, this problem is of increasing relevance. Most of these manipulations are currently detected by subjective human inspection, aided at most by manual or semi-automated tools.

To tackle this problem at scale, our team is currently developing methods to determine whether, given two images, one is a manipulated version of the other. ‘Manipulation’ covers a broad range of common alterations and obfuscations, from simple cropping and rigid transformations to projective and nonlinear transformations, histogram adjustment, and local alterations such as partial erasure or the addition of text and annotations. We therefore apply metric-learning approaches based on convolutional neural networks to produce similarity functions that are invariant to these manipulations. These models are trained on a large database of synthetically altered biological images and tested on real-world cases of image manipulation discovered in peer-reviewed publications.
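For candidates less familiar with metric learning, the general idea can be sketched with a triplet loss, one common choice of training objective. This is only an illustration under stated assumptions: the posting does not specify which loss the team uses, and the names `euclidean` and `triplet_loss`, the margin value, and the toy embeddings are all hypothetical. In practice the embeddings would come from a convolutional network rather than being written by hand.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss (an assumed objective, not necessarily the team's).

    anchor:   embedding of an original image
    positive: embedding of a manipulated copy of the same image
    negative: embedding of an unrelated image
    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, chosen by hand for illustration only.
anchor = [0.0, 0.0]
manipulated_copy = [0.1, 0.0]   # should end up close to the anchor
unrelated_image = [3.0, 4.0]    # should end up far from the anchor

# Well-separated triplet: no penalty.
loss_good = triplet_loss(anchor, manipulated_copy, unrelated_image)

# Roles swapped (the copy is far, the unrelated image is near): large penalty.
loss_bad = triplet_loss(anchor, unrelated_image, manipulated_copy)
```

Minimizing such a loss over many synthetically altered triplets pushes manipulated copies toward their originals in embedding space, which is what makes the resulting distance usable as a manipulation-invariant similarity function.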

The facility is seeking a candidate to work closely with the team to further develop and deploy these forensic tools. Candidates are expected to have solid knowledge of, and experience developing, state-of-the-art machine learning models for image analysis using modern frameworks such as TensorFlow or PyTorch. The selected candidate will be given access to adequate hardware (including multiple GPUs) for the task. Previous experience in closely related topics, such as metric learning, Siamese networks, and image search, will be considered a plus.

The developed tools will be open-sourced, and the selected candidate will be encouraged to publish the results at conferences and in journals in the field.

EEO Statement

We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, sex, gender identity, sexual orientation, religion, creed, national origin, ancestry, age, protected veteran status, disability, genetic information, military service or other protected status.