Factored was conceived in Palo Alto, California, by Andrew Ng and a team of highly experienced AI researchers, educators, and engineers to help address the global shortage of qualified AI and machine learning engineers. We know that exceptional technical aptitude, intelligence, communication skills, and passion are equally distributed around the world, and we are deeply committed to testing, vetting, and nurturing the most talented engineers for our program and on behalf of our clients.
We are currently looking for an exceptionally talented Data Engineer to join our team. You will take on a wide range of responsibilities, ranging from data aggregation, scraping, validation, and transformation to quality assurance and DataOps administration of both structured and unstructured datasets. Ideally, you will have experience optimizing data architecture, building data pipelines, and wrangling data to suit the needs of our algorithms and application functionality.
Functional Responsibilities:
- Develop and maintain ETL (Extract, Transform, Load) processes using Python.
- Design, build, and optimize large-scale data pipelines on Databricks.
- Write efficient SQL queries to extract, manipulate, and analyze data from various databases.
- Design and develop optimal data processing techniques: automating manual processes, data delivery, data validation, and data augmentation.
- Collaborate with stakeholders to understand data needs and translate them into scalable solutions.
- Design and develop API integrations to feed different data models.
- Architect and implement new features from scratch, partnering with AI/ML engineers to identify data sources, gaps and dependencies.
- Identify bugs and performance issues across the stack, using performance monitoring and testing tools to ensure data integrity and a quality user experience.
- Build a highly scalable infrastructure using SQL and AWS big data technologies.
- Keep data secure and compliant with international data handling rules.
Qualifications:
- 3–5+ years of professional experience shipping high-quality, production-ready code.
- Strong computer science foundations, including data structures and algorithms, operating systems, computer networks, databases, and object-oriented programming.
- Experience with Databricks.
- Experience in Python.
- Experience setting up data pipelines using relational (SQL) and NoSQL databases, such as Postgres, Cassandra, or MongoDB.
- Experience with cloud services for handling data infrastructure, such as Snowflake (preferred), Azure, Databricks, and/or AWS.
- Experience with orchestration tools such as Airflow.
- Proven success manipulating, processing, and extracting value from large datasets.
- Experience with Big Data tools, including Hadoop, Spark, Kafka, etc.
- Expertise with version control systems, such as Git.
- Strong analytic skills related to working with unstructured datasets.
- Excellent verbal and written communication skills in English.