Data Engineer
Nichefire
About Us: We are a dynamic and innovative company specializing in social media data collection and analysis. Our mission is to harness the power of data to provide actionable insights through state-of-the-art natural language processing (NLP) and large language models (LLMs). We leverage cloud technologies to build scalable and robust data pipelines that process vast amounts of text-based data.
Considerations: We are interested in every qualified candidate who is eligible to work in the United States. However, we are not able to sponsor visas at this time.
Job Description:
We are seeking an experienced Senior Data Engineer with a strong background in building and managing data collection, processing, and modeling pipelines in the cloud. The ideal candidate will have extensive experience with Airflow, Python, Google Cloud Platform (GCP), Git, and database management. You will be responsible for designing, developing, and maintaining data pipelines that support our NLP and LLM models, ensuring data quality, scalability, and reliability.
Key Responsibilities:
● Design and Develop Data Pipelines: Create, manage, and optimize data collection and processing pipelines using Airflow and GCP to handle large volumes of text-based social media data.
● Cloud Infrastructure Management: Implement and maintain cloud infrastructure on GCP, ensuring high availability, scalability, and security of data processing environments.
● Data Integration: Develop robust data integration solutions to aggregate data from various social media platforms and other sources, ensuring data consistency and reliability.
● NLP and LLM Model Support: Work closely with data scientists and machine learning engineers to support the deployment and maintenance of NLP and LLM models in production.
● Database Management: Design, manage, and optimize databases for storage and retrieval of large-scale text data, ensuring efficient data access and query performance.
● Version Control: Utilize Git for version control and collaboration on codebases, ensuring best practices in code management and deployment.
● Performance Tuning: Monitor and improve the performance of data pipelines, identifying and resolving bottlenecks and inefficiencies.
● Documentation: Maintain comprehensive documentation for all data engineering processes, ensuring transparency and knowledge sharing within the team.
● Collaboration: Work collaboratively with cross-functional teams, including data scientists, product managers, and other stakeholders, to understand data requirements and deliver solutions that meet business needs.
Qualifications:
● Experience: Minimum of 5 years of hands-on experience with Airflow, Python, GCP, Git, and database management in a data engineering role.
● Technical Expertise: Strong proficiency in designing and implementing ETL/ELT pipelines using Airflow and GCP services (BigQuery, Cloud Storage, Pub/Sub, etc.).
● Programming Skills: Advanced knowledge of Python for data processing, automation, and integration tasks.
● Database Skills: Proficiency in SQL and experience with relational and NoSQL databases for handling large-scale data.
● Cloud Knowledge: In-depth understanding of cloud infrastructure, particularly GCP, including cost management, security best practices, and scalability strategies.
● Version Control: Strong experience with Git for version control, branching strategies, and collaborative development.
● Analytical Skills: Excellent problem-solving and analytical skills, with the ability to identify and resolve complex data engineering challenges.
● Communication: Strong verbal and written communication skills, with the ability to articulate technical concepts to non-technical stakeholders.
● Team Player: A collaborative mindset, with a willingness to work as part of a team and support colleagues in achieving shared goals.
Preferred Qualifications:
● NLP/LLM Experience: Experience working with NLP and LLM models, particularly in processing and analyzing text-based data.
● Social Media Data: Familiarity with social media data collection and analysis, including APIs and data extraction techniques.
● Certifications: Relevant certifications in GCP or data engineering are a plus.
Benefits:
● Competitive salary and performance-based bonuses.
● Healthcare stipend.
● Flexible working hours and remote work options.
● Professional development opportunities and support for certifications.
● Collaborative and innovative work environment.
● Opportunities to work on cutting-edge technologies and challenging projects.
How to Apply:
Please submit your resume, cover letter, and a portfolio of relevant projects to BOTH careers@rev1ventures.com AND sbrown@nichefire.com. We look forward to hearing from you!