Principal/Staff Data Engineer - ETL/AWS/Python/Apache
10Pearls | Full time | Full day
10Pearls is an end-to-end digital technology services partner helping businesses use technology as a competitive advantage. We help our customers digitalize their existing business, build innovative new products, and augment their existing teams with high-performance team members. Our broad expertise in product management, user experience/design, cloud architecture, software development, data insights and intelligence, cyber security, emerging tech, and quality assurance ensures that we deliver solutions that address business needs. 10Pearls is proud to have a diverse clientele including large enterprises, SMBs, and high-growth startups. We work with clients across industries, including healthcare/life sciences, education, energy, communications/media, financial services, and hi-tech. Our many long-term, successful partnerships are built upon trust, integrity, and successful delivery and execution.
We are looking for a Staff Data Engineer. The ideal candidate should have a Bachelor’s degree in Computer Science and 5–8 years of programming experience with AWS Glue, ETL pipelines, Athena, Apache Spark, and Apache Hudi (Apache Iceberg and Delta Lake are similar technologies). Python is a must.
- Develop, construct, test and maintain optimal data pipeline architecture
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify ways to improve data reliability, efficiency and quality
- Prepare data for predictive and prescriptive modeling
- Use data to discover tasks that can be automated
- Own ETL pipeline build, deployment, and operations, including data quality, monitoring and alerting, CI/CD, governance, and access control
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Monitor process performance and advise on any necessary changes
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Create data tools for analytics that assist in building and optimizing the product into an innovative industry leader
- Work with data and analytics experts to strive for greater functionality in our data systems.
- Demonstrate proficiency in data management and automation on Spark, Hadoop, and HDFS environments
- Knowledge of data science/machine learning (DS/ML), analytics, or data warehousing
- Good communication skills
- Experience with Apache Hadoop, Hive, Spark, Airflow, Apache Livy, Scala, Java
- Experience with AWS EMR (HDFS, S3, HBase), AWS Athena, PySpark, or related technologies
- Experience with streaming technologies such as Kafka
- Experience working with programming languages such as Scala, Java, SQL, Python, and R
- Experience in managing data in relational databases and developing ETL pipelines
- Exposure to enterprise-level services such as Cloudera, Databricks, and AWS
- Exposure to AWS data services and technologies such as EC2, EMR, Kinesis, Lambda, and DynamoDB is nice to have