48 Meridian St. Melrose, Massachusetts 02176 Telephone: 781-665-0517
NORBERT E KREMER, Ph.D.
PROFILE
Cloud Solution Architect with extensive experience designing and building large scale data warehouse and analytics platforms, incorporating structured and unstructured data
Google Cloud Platform – Google Certified Professional Data Engineer, Associate Cloud Engineer
Amazon AWS Certified Solutions Architect – Associate
GCP corporate trainer and curriculum developer
Strategic advisor on cloud migration, hybrid architecture, cloud compute, storage, NoSQL & RDBMS architecture
Expert in design and implementation of scalable MPP, columnar, and cloud databases
Strong knowledge of big data technologies, cloud computing, statistics and machine learning algorithms
Outstanding business analysis, modeling, writing and presentation skills. Process modeling, process mining
Strong domain knowledge of Healthcare (CPHIMS), Life sciences, Financial, Retail, Digital Media sectors
PROFESSIONAL EXPERIENCE
Cloud Technology and Data Science Corporate Trainer (miscellaneous clients) 12/2017 – present
Corporate trainer on Google Cloud Platform Architecture, Big Data Analytics, Machine Learning
Running GCP certification workshop at GDG Cloud Boston meetup, emphasis on BigQuery and data tools
Data Carpentry Certified Instructor, provide training to biomedical researchers at academic medical centers
Pepkor, Cape Town, South Africa
Google Cloud Platform Data Engineer 7/2018-9/2018
Design and build BigQuery data lake using Pub/Sub and Dataflow
Use Dataflow’s Apache Beam Unified Programming Model to build single code base for batch and streaming
Evaluate Cloud Composer and Cloud Functions for orchestration of batch and streaming Dataflow jobs
Pre-process and load to BigQuery 600 historic customer data files containing some 10 billion records
Cloud Training Sabbatical 8/2017 – 11/2017
Full-time training on Google Cloud Platform, emphasis on Big Data, NoSQL databases, Data Engineering
Google Certified Professional Data Engineer Certification
Completed 25 GCP courses on Coursera (required to qualify as a GCP trainer)
Working with Google to become a Google Authorized Trainer on Data Engineering courses
Focus on BigQuery (Dremel), BigTable, DataStore, DataFlow, DataProc (Hadoop), DataPrep (Trifacta), DataStudio (Tableau-like BI tool), DataLab (Jupyter notebooks), TensorFlow, Cloud Machine Learning Engine, Kubernetes (Docker containers). Special focus on integration of BigQuery with DataFlow and DataStudio.
Taco Bell, Irvine, CA
Cloud Solution Architect (with Eon Collective) 4/2017-7/2017
Design AWS cloud architecture for eComm project, network, compute, storage, security, database, perform POC
Evaluate Amazon RedShift MPP database vs SnowFlake cloud-native database
Design Data Vault data model (raw and business vaults), lead DV implementation team using RedShift, Talend & AnalytixDS
Establish Agile process, integrate development team with Taco Bell product manager, use Jira and Confluence
OM1, Cambridge, MA
Data Warehouse Architect (direct client) 1/2017-3/2017
OM1 researchers required a Cohort Selection data mart to discover patient populations fitting diagnosis, treatment, and demographic criteria.
Designed star schema with multi-attribute dimensions and bridge tables, using Aqua Data Studio modeler
Profiled and cleansed source data for consistent results, populated star schema using SQL and python
Tuned RedShift database structures to achieve 10 second query performance goal (250 million patients)
Created Tableau Desktop dashboards to allow selection of patient cohorts
STATE STREET GLOBAL ADVISORS, Boston, MA
Data Warehouse Architect (with Advanti Solutions team) 10/2013 – 09/2016
The Research Models group built an Equity History Research Database to support active trading based on equity risk factors and equity fundamentals. The Netezza MPP analytics appliance (IBM PureData for Analytics) was selected for scalability and ability to run R statistical code in-database
Built 300 relational loaders using Oracle code generator. Loaded and processed terabytes of data from Thomson Reuters and FactSet. Used insert-only bi-temporal data model. Heavy use of Oracle PL/SQL and python.
Prepared data sets for data mining and statistical analysis. Developed Netezza stored procedures for fiscal year alignment. Converted tall-table (EAV) data to wide format using pivot SQL. Participated in Data Quality program.
Extensive work with NZPLSQL Netezza stored procedure and SQL query tuning on 10TB+ data volumes
Worked with quantitative equity analysts to develop R code to run in-database on Netezza (factor-based trading)
CENTERLIGHT HEALTHCARE, Bronx, NY
Data Warehouse Architect (with Caserta Concepts team) 05/2012 – 06/2013
Strategic Assessment for new Data Warehouse to provide analytics for both financial and clinical quality measures.
BEST BUY, INC., Minneapolis, MN
Data Warehouse & Netezza Solution Architect – Independent Consultant 04/2012 – 03/2013
The Athena Customer Data Program built a Customer Analytic Master Database to integrate customer data from multiple sources. A Netezza warehouse appliance provides a scalable analytics platform for marketing campaigns.
Designed data integration strategy, using Message Queues and bulk extracts, balance integrity vs performance
Prepared, defended Solution Architecture Blueprint diagrams at Architectural Review board meetings
Column-level encryption of sensitive customer data (PII) using Netezza SQL Extensions Toolkit
FIDELITY INVESTMENTS, Boston, MA
Senior Netezza Developer. FCAP ART (via IQ Associates) 09/2011 – 03/2012
The Fidelity Cost and Profitability Program’s Allocation Reengineering Team developed a data warehouse to allocate costs of all business functions using complex econometric models on large data volumes
Developed complex set of Netezza stored procedures for ART Activity Based Costing accounting system
Analyzed performance of Netezza NZSQL code generated by metadata-driven costing engine
Developed full-volume data tests to find rare variances in financial results and trace them to data quality issues
FRESENIUS MEDICAL CARE, Lexington, MA Business Intelligence Group
KCNG Data Warehouse Architect (via IQ Associates) 5/2010 – 08/2011
KCNG is a data warehouse for integrated financial and clinical data with low latency, to be used for operational reporting and for patient quality and financial analytics. Fresenius is a $13B/yr provider of dialysis services.
Designed conceptual, logical and physical data models for EDW, established Data Dictionary to define terms
Netezza Performance Tuning – Analysis of query plans for data skew, processing skew, co-located joins. Worked with Netezza support on difficult performance issues. Achieved 50% improvement on key queries
HUMEDICA (now OPTUM), Boston, MA Data Integration Group, Operations
Data Analyst & Data Ingestion Team Lead 6/2009 – 5/2010
Humedica is a Healthcare Informatics startup (now Optum), providing clinical BI in SaaS business model
Team Lead of four developers using Oracle Data Integrator and Intel SOA Expressway for data ingestion
Prepared extensive data quality reports for client data sources, using R, SQL (RJDBC), LaTeX and Sweave
Designed ETL specifications to map EMR systems to clinical data warehouse using PL/SQL – goal was to build 'data factories' for top 10 EMR systems, Meditech, Epic, GE Centricity, Siemens, Allscripts, Cerner
SKILLS AND TECHNOLOGIES
Cloud Computing, Big Data, Statistics, Machine Learning
Expert knowledge of Google Cloud Platform (GCP) architecture and strategies for migration to cloud. App Engine, Compute Engine, Kubernetes, Cloud Functions, Google Cloud Storage, BigQuery, BigTable, DataStore, DataFlow, DataProc, DataPrep (Trifacta), DataStudio, DataLab, TensorFlow, Cloud Machine Learning Engine, IAM. AWS Kinesis, DynamoDB, IoT, EMR (Elastic MapReduce), RedShift, Spectrum, Athena, Aurora. Elasticity, managed services, autoscaling, preemptible instances, all used in concert to provide economical, scalable solutions. Biostatistics, machine learning algorithms, R statistics, Matlab, Octave, python, pandas, scikit-learn, Hadoop, HDFS, HBase, Hive, Spark, Spark SQL, Spark ML, Apache Presto.
Data Warehouse Architecture
Relational and NoSQL data modeling. Data lake and data warehouse design. Third normal form, dimensional modeling, data vault modeling. Bi-temporal and insert-only models. Modeling tools: PowerDesigner, ERWin, ER/Studio, Aqua Data Studio. Data Profiling, Change Data Capture (CDC) strategy. Design and development of code generators to build custom loaders for ELT processing, metadata capture, data governance. Master Data Management (MDM) and reference data solutions.
Database Systems, SQL Language Development
Expert knowledge of MPP databases: Netezza, AWS RedShift, SnowFlake, GCP BigQuery. Development of complex queries for warehouse, BI, and analytic workloads using advanced SQL (correlated subqueries, analytic functions, in-line view, common table expressions). Query optimization and performance tuning on MPP DB. Extensive NZPLSQL Netezza stored procedure development. User-defined extension (UDX, UDF) dev. on MPP databases. Freelance consultant
Data Integration Architecture
Architecture and design of enterprise-scale integration solutions using message queues, integration engines (Interfaceware Iguana and Chameleon), integration hub databases. Use of Enterprise Integration Patterns. Knowledge of HL7 interface formats, FHIR.
PROFESSIONAL DEVELOPMENT, TRAINING & CERTIFICATIONS (selected)
Developing Applications with Google Cloud Platform Coursera Specialization (4 courses) 2018
Machine Learning with TensorFlow on Google Cloud Platform Coursera Specialization (5 courses) 2018
From Data to Insights with Google Cloud Platform Coursera Specialization (4 courses) 2018
Amazon AWS Certified Solutions Architect - Associate certificate 2018
Tableau for Healthcare Professionals - HealthDataViz 2017
Google Certified Professional – Data Engineer 2017
Data Engineering on Google Cloud Platform Coursera Specialization (5 courses) 2017
Architecting with Google Cloud Platform Coursera Specialization (6 courses) 2017
Certified Professional in Health Information Management Systems, CPHIMS 2017
Machine Learning Foundations: A Case Study Approach - Coursera 2016
Big Data, Genes and Medicine – Coursera 2016
Boston Data Festival, Machine Learning, Pandas, Bayesian Statistics 2016
Machine Learning (Stanford, Andrew Ng) - Coursera 2016
Statistics in Medicine (Stanford University HRP258 on-line course) 2013
Bioinformatics: Principles, Methods and Applications - MIT Professional Institute 2001
EDUCATION
Postdoctoral Fellow, Albert Einstein College of Medicine, Bronx, NY 1985-1988
Postdoctoral Fellow, State University of New York, Stony Brook, NY 1983-1985
Purdue University Ph.D. in Neurobiology 1983
Wesleyan University B.A. in Biology and Physics 1976
AFFILIATIONS
GDG Cloud Boston Meetup, Co-organizer
New England HIMSS - CPHIMS certification
Many Boston-area meetups on Data Science & Big Data, e.g. Google Cloud, AWS, Machine Learning, python, R