Member of Technical Staff – Image / Video Data Engineer at Black Forest Labs – Remote | Germany | USA


Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team, which invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, is currently looking for a strong candidate to join us in developing large-scale data pipelines for training frontier models.

Role:

  • Develop and maintain scalable infrastructure for large-scale image and video data acquisition
  • Manage and coordinate data transfers from various licensing partners
  • Implement and deploy state-of-the-art ML models for data cleaning, processing, and preparation
  • Implement scalable and efficient tools to visualize, cluster, and deeply understand the data
  • Optimize and parallelize data processing workflows to handle billion-scale datasets efficiently
  • Ensure data quality, diversity, and proper annotation (including captioning) for training readiness
  • Convert training data from alternative sources, such as user preferences, into a trainable format
  • Work closely in the model development loop to update data as necessitated by the training trajectory

Ideal Experience:

  • Proficiency in Python and various file systems for data-intensive manipulation and analysis
  • Familiarity with cloud computing platforms (AWS, GCP, or Azure) and Slurm/HPC environments for distributed data processing
  • Experience with image and video processing libraries (e.g., OpenCV, FFmpeg)
  • Demonstrated ability to optimize and parallelize data processing workflows across CPUs and GPUs
  • Familiarity with data annotation and captioning processes for ML training datasets
  • Knowledge of machine learning techniques for data cleaning and preprocessing

Nice to have:

  • Background or keen interest in developing large-scale data acquisition systems
  • Experience with natural language processing for image/video captioning
  • Experience with data deduplication techniques at scale
  • Experience with big data processing frameworks (e.g., Apache Spark, Hadoop)
  • Understanding of ethical considerations in data collection and usage
