Media Cloud

Job Announcement: Software Engineer - Data Pipeline

Update: As of October 2020, this position has been filled.

Online media is in a state of flux. Twitter, Facebook, Gab, blogs and so-called fake news are all developments that have radically altered the landscape of news and information online. The Media Cloud project was created to track and understand this online media ecosystem. Come help us build data-centric tools for academic researchers and non-profits that let them investigate and track how speech moves across the internet.

Position Overview

The Media Cloud project is looking for a software engineer to focus on our data pipeline. We are an open source project that conducts primary research about the media ecosystem and helps others do their own research on it. Our archive holds over 1.5 billion news stories and grows by more than 700,000 new stories daily. Pay is competitive, and while the initial contract will be for 4-6 months, we anticipate this role extending long-term.

Responsibilities:

  • work on our server architecture, which collects and processes these stories and lets researchers analyze them via an API; you will spend approximately half your time planning, designing, and building the project's data pipeline, and the other half maintaining and running it;

  • work with senior engineers to establish a technical vision for the project;

  • contribute to and follow a technical roadmap to meet research needs and to complete grant deliverables;

  • collaborate with other developers, designers, and system administrators in implementing the technical roadmap;

  • accurately communicate project status internally and externally to our community of users;

  • maintain, upgrade and build systems within an existing (rather large) codebase to collect, archive, and analyze content from online media;

  • write code that can scale systems to handle ever-expanding data requirements.

Minimum Qualifications:

  • college degree or other domain-specific accreditation, preferably in a computer science or data science related field;

  • at least two years of experience working as a software engineer on big data systems;

  • programming fluency (Python required);

  • some experience with Linux;

  • demonstrated ability to design, build, test, and deploy robust code;

  • demonstrated ability to iterate quickly through prototypes;

  • demonstrated ability to use data to validate architectural decisions;

  • ability to work productively in a virtual environment with remote team members all over the world;

  • interest in working on issues related to hate-speech, democracy, gender, race, or health.

Helpful Skills:

  • experience implementing and maintaining a production ETL pipeline;

  • experience scaling platforms to handle large data sets;

  • experience writing web crawlers or API scrapers;

  • experience writing, maintaining, and optimizing SQL queries against databases;

  • experience working with PostgreSQL and Solr / Lucene in Ubuntu environments;

  • experience working with text-based data systems (e.g., NLP);

  • experience working in a modern dev / systems environment including git and docker.

Our upcoming technical roadmap includes ingesting new platforms into our data pipeline, analyzing images from news stories, and incorporating new sources of audience/readership data, as well as ongoing updates to improve the scalability, performance, and reliability of our existing pipeline.

About Media Cloud

Media Cloud is a joint project between UMass-Amherst, Northeastern University, and the Berkman Klein Center for Internet & Society at Harvard University. This position is with Media Cloud’s nonprofit arm, the Media Ecosystems Analysis Group, and you will work closely with members of the team from all centers.

Our Team

We are a diverse and welcoming community of researchers and technologists who love to engage with hard questions about online media by using a combination of social, computer, and data sciences. You will work with all members of our small team, from senior faculty to junior developers, and thrive in an academic atmosphere that encourages experimentation, constant questioning, and validation at all levels of our platform.

Much of our substantive work focuses on issues of online hate-speech, race, democracy, and health. We strongly encourage women, people of color, and people of any sexual identity to apply.

Our entire team is remote, with team members working all around the world, and we welcome remote workers.

Interview process

We strive to make the interview process smooth and painless for both parties. Should you choose to submit your application for this position, you can expect the following:

  1. Phone screen (30 mins);

  2. Technical interview (1 hour);

  3. Paid coding challenge (1 week to complete);

  4. Team interviews (2 hours);

  5. Final decision.

To Apply

Please email your resume and cover letter to jobs@mediacloud.org.
