Data Engineer

  • Contract
  • Remote

SimulStat

*Long-Term, Remote FSP role
The Data Engineer is responsible for the development and implementation of optimal solutions to transform, integrate, store, secure, process and update large real-world healthcare data assets for use by statistical programmers, data scientists and other data analysts.

Relevant database administration experience includes:
•    Extensive knowledge of developing data pipelines using real-world healthcare data including claims and electronic medical records.
•    Experience with one or more of the following commercial databases: MarketScan, Optum, DRG, Flatiron, JMDC, CPRD.
•    Experience with the OMOP data model and optimization of healthcare data for observational research or epidemiology analysis use cases.
•    Familiarity with medical coding, such as ICD-9, ICD-10, LOINC, NDC, CPT/HCPCS, SNOMED. 
•    Familiarity with big data processing platforms including Hadoop, AWS S3 and Databricks.
•    Experience with enterprise support models for data management, security, database programming, service delivery, performance monitoring, and user support standards.
•    Experience with conversion of raw data in ASCII or other formats into OMOP, Parquet or others, and storage in HDFS or S3.
•    Troubleshooting data errors and developing mitigation plans.
•    Strong communication and documentation skills for describing issues with data and potential remedies. 
•    Experience with database tuning techniques such as normalization, indexing, and parallel processing is desirable, as is experience with scripting languages such as Unix shell and Perl.

Additional responsibilities include the following:
•    Ensuring data are consistent across the database
•    Minimizing redundancy across the database
•    Checking variable values for reasonableness
•    Developing database tools to improve database efficiency and utility
•    Building data pipelines for access to research data
•    Managing vendor relations and communication

Basic Qualifications:
•    Bachelor’s degree in Computer Science, Statistics, Mathematics, Life Sciences or other relevant scientific subject.
•    Minimum of 5 years of relevant data asset curation experience (as described above)
•    Extensive experience using the OMOP common data model and ETLs
•    Excellent SAS programming skills, including the ability to implement complex DATA step logic and SQL.
•    Experience with ETL software
•    Experience with real-world healthcare data, such as MarketScan, Optum, PharMetrics, Medicare and/or EMR databases

Preferred Qualifications:
•    Master’s degree in Epidemiology, Biostatistics, Computer Science, or other subject with high statistical content
•    Eight (8) or more years of relevant data asset curation experience (as described above)
•    Proficiency with Python is highly desired, as is the ability to implement and troubleshoot complex ETL and QC programs.
•    Experience in a regulated environment
•    Vendor relations management
•    Pharmaceutical industry experience
•    Training or experience with the Hadoop database platform and Impala or Hive SQL
•    Experience in software development & design life cycle, ideally using Agile methodology
•    Experience using ODBC (Open Database Connectivity)

Knowledge:
•    Computer programming with SAS, R, Python or other procedural languages
•    Database transformation, testing, cleaning and quality control using SQL
•    Understanding of computing environments, including cloud-based Databricks and UNIX operating systems
•    Software development and design

Key Competencies:
•    Technical excellence
•    Innovation
•    Teamwork
•    Problem solving
•    Attention to detail
•    Oral and written communication
