Data Engineer
Looking for data engineers who will torture the data until it confesses what it holds!
EXPERIENCE: Anywhere from a little more than calling yourself a “data engineer” after completing a data science course, to a little less than realizing “I don’t know anything about data engineering yet!” To keep it simple, let’s say between 2 and 6 years!
The data engineer should possess the following skills:
- Minimum 3 years of hands-on experience in a data management & engineering environment.
- Design & development of a data-processing pipeline that can handle millions of rows (see the sketch after this list).
- Design of innovative methodologies to extract information from data.
- Strong knowledge of core software technologies and fundamentals – specifically for large-scale distributed systems – and building highly available services.
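To give a concrete flavour of the row-volume skill above, here is a minimal sketch of a chunked data-processing pipeline. The file name, the customer_id/amount schema, and the choice of pandas are illustrative assumptions, not part of the role’s prescribed stack.

```python
# Minimal sketch of a chunked data-processing pipeline; the source file,
# schema, and aggregation below are hypothetical examples.
import pandas as pd

def process_in_chunks(path: str, chunk_rows: int = 100_000) -> pd.DataFrame:
    """Stream a large CSV in fixed-size chunks so millions of rows
    never have to fit in memory at once."""
    partials = []
    for chunk in pd.read_csv(path, chunksize=chunk_rows):
        # Aggregate each chunk independently...
        partials.append(chunk.groupby("customer_id")["amount"].sum())
    # ...then combine the per-chunk results into the final aggregate.
    return pd.concat(partials).groupby(level=0).sum().to_frame("total_amount")

if __name__ == "__main__":
    print(process_in_chunks("transactions.csv").head())
```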
And will perform the following duties:
- Design and develop scalable ETL routines that extract from business source systems, populate databases, and create aggregates (a sketch follows this list).
- Perform thorough testing and validation to ensure the accuracy of data transformations.
- Suggest & implement best practices for performance tuning when working with large datasets.
- Develop and implement scripts for database maintenance, monitoring, performance tuning, and so on.
- Ensure proper data governance and data quality.
- Define standard data management principles and policies for retention and archival.
- Troubleshoot data issues within and across the business and present solutions to those issues.
- Analyse complex data elements and systems, data flows, dependencies, and relationships to contribute to conceptual, logical, and physical data models.
- Work in an agile environment with defined sprints to deliver assigned work within the stipulated timelines.
- Adhere to software development best practices and coding standards in all work products, and participate in the refinement of those practices and standards.
- Design & develop an event-processing pipeline that can handle millions of events (see the sketch below).
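As a concrete illustration of the ETL duty above, here is a minimal extract-transform-load sketch. The CSV source, the orders schema, and the SQLite target are assumptions chosen to keep the example self-contained, not the team’s actual stack.

```python
# Minimal ETL sketch: extract a CSV export, clean it, build a daily
# aggregate, and load both tables into a database. All names are
# hypothetical.
import sqlite3
import pandas as pd

def run_etl(source_csv: str, db_path: str = "warehouse.db") -> None:
    # Extract: read the raw source-system export.
    raw = pd.read_csv(source_csv)
    # Transform: drop rows missing key fields, then build a daily aggregate.
    raw = raw.dropna(subset=["order_id", "order_date"])
    daily = raw.groupby("order_date", as_index=False)["amount"].sum()
    # Load: populate the base table and the aggregate table.
    with sqlite3.connect(db_path) as conn:
        raw.to_sql("orders", conn, if_exists="replace", index=False)
        daily.to_sql("orders_daily", conn, if_exists="replace", index=False)
```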
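And for the event-processing duty, a minimal streaming sketch; the JSON-lines source and the event “type” field are assumptions, and a production pipeline would more likely read from a queue or stream rather than a file.

```python
# Minimal event-processing sketch: stream events one at a time so
# throughput is bounded by I/O, not memory. Source format and handler
# logic are hypothetical.
import json
from typing import Iterator

def read_events(path: str) -> Iterator[dict]:
    # Yield one parsed event per non-empty JSON line.
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)

def handle(event: dict) -> None:
    # Placeholder handler: route on a hypothetical "type" field.
    if event.get("type") == "order_created":
        pass  # enrich, validate, and forward downstream

def run(path: str) -> None:
    for event in read_events(path):
        handle(event)
```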