Lei Ma

Lead Data Scientist

Halo! I’m Lei Ma, a physicist turned data scientist based in Germany. I am a theory-centric data scientist with solid engineering skills. Apart from developing and productionizing deep learning models at work, I regularly organize online discussion groups to keep track of progress in machine learning. On the engineering side, I have worked on data warehouses and data lakes, model building, deployment of statistical and deep learning models, and machine learning APIs for production. Above all, my specialty is bringing the value of deep learning models to various industries.

In my career, I focus on forecasting with deep learning. I have worked on marketplaces for housing, logistics, and fashion ecommerce, building products that optimize marketplaces using data-driven models, which helps us build a more efficient and environmentally friendly future. In general, I love the notion that machine learning improves the quality of our lives.

I am also a maker. I’ve built a bunch of fun stuff in my free time, most of which is open source, as I am deeply attached to the philosophy of Free and Open-Source Software. I am also a reviewer for the Journal of Open Source Software and PyCon DE & PyData.

Location

Cologne, NRW, Germany

Services

Coaching


Skills

Domains

Housing
Marketplace
Logistics
Manufacturing

Foundations

Technologies

Random Forest
Gradient Boosted Trees
Spiking Neural Networks
Complex Networks
Data Scraping
Large Language Model
Time Series Analysis
Data Visualization
ETL
CI/CD
API
Scientific Computing

Programming Languages

Python
SQL
Wolfram Language

Tools

git
pandas
PyTorch
Mathematica
Spark
Scikit Learn
LightGBM
BigQuery
Plotly
Plotly Dash

Languages

Chinese (Native)
English (Fluent)
German (A1)

Work

Lead Data Scientist

April 2024 - Current
Employment type

Full-time

Work model

Hybrid

  • Data science for chemical manufacturing.
  • GenAI applications.
  • Business Intelligence.
Related skills:
Time Series Analysis
Random Forest
LightGBM
PyTorch
Python
Scikit Learn
pandas
Large Language Model
Data Visualization

Applied Scientist

November 2022 - April 2024
Employment type

Full-time

Work model

Hybrid

  • Deep learning based demand forecasting.
  • Forecasting dynamical systems with graph neural networks.
  • End-to-end integration of demand forecasting for ecommerce.
Related skills:
Pricing
Marketplace
Time Series Analysis
Random Forest
LightGBM
PyTorch
Python
pandas
Spark
Large Language Model
Data Visualization

Data Scientist

November 2019 - September 2021
Employment type

Full-time

Work model

On site

  • Realtime pricing algorithms for DHL Freight tenders using machine learning.
  • Machine Learning based pricing for shippers and carriers to enable a transparent and fair freight market.
  • Supply and demand research and optimization of our road freight marketplace.
  • Machine learning solutions for the internal teams such as automating finance tasks using classification models.
  • Data stewardship to make sure the platform database is storing what is expected and ensuring data quality.
  • A Python package tailored for road freight data to boost productivity of the data team.
  • Price dispersion models for road freight marketplace.
Related skills:
Pricing
Marketplace
Time Series Analysis
Random Forest
LightGBM
PyTorch
Python
pandas
Data Visualization
CI/CD

Data Scientist

August 2018 - October 2019
Employment type

Full-time

Work model

On site

  • Built ETL pipelines and maintained the data warehouse for all the data homelike has, ensuring stable and clean data for our data science projects.
  • Designed data-explainable KPIs for the company, built models to understand the driving forces behind the KPIs, and built interactive dashboards that provide insights and support data-driven decisions across all levels of management.
  • Designed and built data products and APIs for geo-location based analysis and forecasting, helping customers on the supply side make data-driven decisions.
  • Built a user profiling API to make the pre-sale team more efficient.
Related skills:
Marketplace
Statistics
Random Forest
ETL
Python
pandas
Plotly Dash
Data Visualization

Projects

Featured


Time Series Forecasting using Deep Learning [WIP]

[WIP] I am writing about time series forecasting using deep learning to help myself go through the related topics. I will cover energy-based models, generative models, contrastive methods, adversarial methods, tree-based models, and some examples of time series forecasting using deep learning.
August 2021 - Current
Category

Data

Company

KausalFlow

Related skills:

E-Book on Statistical Physics

I wrote this free e-book on statistical physics at graduate level. I am diving into the microscopic view of artificial neural networks using the framework of statistical physics.
January 2014 - July 2014
Category

Data

Company

KausalFlow

Related skills:
Statistical Physics

Datumorphism

Datumorphism is my project to improve my data science skills. I write about many different aspects of data. A Today I Learned section is also actively maintained in this project.
August 2018 - Current
Category

Data

Related skills:
Machine Learning
Statistics
ETL
Random Forest
Gradient Boosted Trees
Data Visualization

Project Catalog


EERILY: timE sERIes pLaYground

timE sERIes pLaYground (EERILY) is a package I created to help myself write about time series forecasting. At the current phase, I am focusing on adding stuff that helps me understand time series data and some models.
December 2022 - Current
Category

Data

Company

KausalFlow

Related skills:
Python
Machine Learning
Time Series Analysis

HaferML: HomemAde FramEwoRk for Machine Learning

The HomemAde FramEwoRk for Machine Learning (HAFER ML) is a minimal and unambitious framework for your machine learning projects, with reproducibility in mind.
April 2021 - Current
Category

Data

Related skills:
Python
Machine Learning

NeuronStar

March 2015 - August 2017
Category

NeuronStar is a project I started to explore the connections between computational neuroscience and machine learning.

We have finished several reading clubs and successfully hosted many seminars on topics related to neuroscience and machine learning.

Related skills:
Neuroscience
Complex Networks

Neutrino Research Tools and Knowledge Base

I created this set of tools to help neutrino physics researchers.
January 2014 - January 2017
Category

Physics

Related skills:
Scientific Computing
Data Visualization

Dispersion Relation Gaps and Neutrino Flavor Instabilities in Fast Modes

A research paper on neutrino flavor instabilities and dispersion relations.
April 2018 - Current
Category

Physics

Related skills:

Thesis: Neutrino Flavor Conversions in Dense Media

My research on neutrino oscillations in dense media.
July 2018 - Current
Category

Physics

Related skills:

A Toy Model of the Neutrino Halo Problem

Neutrinos might be the key to solving the supernova explosion mystery. However, neutrino flavor dynamics around supernovae is extremely non-linear and non-local. I wrote this piece of code in C++ to solve the non-local neutrino halo problem in supernova explosions.
March 2018 - October 2018
Category

Physics

Related skills:

Neutrino Flavor Conversions

Neutrinos are abundantly produced in astrophysical environments such as core-collapse supernovae and binary neutron star mergers. Neutrino flavor conversions in the dense media play important roles in the physical and chemical evolutions of the environments. In this book, I present two mechanisms through which neutrinos may change their flavors.
April 2018 - Current
Category

Physics


Hugo Connectome Theme

The Hugo Connectome Theme is a Hugo theme for online notes with backlinks.
February 2022 - Current
Category

Web

Company

KausalFlow

  1. A graph view of all the interconnected notes.
  2. Each article has a section called connectome showing the links to other articles and backlinks from other articles.
  3. The double bracket links (i.e., [[]]) can be previewed in the current article.
  4. Supports multiple notebooks. Articles can be organized into different notebooks, and backlinks also work between two different notebooks.
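
The link and backlink bookkeeping described in the list above can be sketched in a few lines of Python. This is only an illustration, not the theme's actual implementation (the theme is built on Hugo). The note titles, the regular expression, and the function names outgoing_links and build_connectome are assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumption: a link is any text wrapped in double brackets, e.g. [[entropy]].
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def outgoing_links(text):
    """Return the link targets found in one note, in order, without duplicates."""
    seen = []
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip()
        if target and target not in seen:
            seen.append(target)
    return seen


def build_connectome(notes):
    """Map each note title to its outgoing links and its backlinks."""
    links = {title: outgoing_links(body) for title, body in notes.items()}
    backlinks = defaultdict(list)
    for source, targets in links.items():
        for target in targets:
            backlinks[target].append(source)
    return {
        title: {"links": links[title], "backlinks": sorted(backlinks[title])}
        for title in notes
    }


# Hypothetical notes used only for illustration.
notes = {
    "entropy": "See [[partition function]] and [[free energy]].",
    "free energy": "Defined through the [[partition function]].",
    "partition function": "Related to [[entropy]].",
}
connectome = build_connectome(notes)
```

Here connectome["partition function"]["backlinks"] lists the notes that point to that article, which is the information a connectome section would display.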

Many note-taking tools have emerged recently. Many of them reflect the ideas of the digital garden and Zettelkasten. As for me, I have been organizing content using Hugo. I like the idea of backlinks, but I didn’t want to spend too much time moving my notes to a different tool. So I developed this Hugo theme to incorporate these new fancy concepts.

Related skills:
Knowledge Management
Hugo

DIETBox: Data Science Toolbox

The Data scIEnce Toolbox (DIETBox) is a small set of reusable tools for my projects.
July 2021 - July 2021
Category

Data

Related skills:
Python
Data Science

COVID-19/SARS-CoV-2 Dataset for Europe

covid19-eu-data is a dataset repository for COVID-19/SARS-CoV-2 cases in Europe. Data is automatically collected from official government websites regularly using the open-source scripts inside the repository.
March 2020 - December 2021
Category

Data

Related skills:
Python
Data Scraping
Data Visualization
CI/CD

Kirsche: Find connections in your paper references file.

Tools to visualize academic papers are emerging on the market. One of the interesting ideas is to check the connections between papers.
September 2021 - October 2021
Category

Tool

Kirsche is an open-source tool to calculate and visualize connections between a list of papers. I have also created a command line tool to create visualizations with a single command.

Related skills:
Python

Interplanetary Immigration Center

Interplanetary Immigration Center (IIC) is my project to bring hard science to science fiction. IIC is the fictional organization that helps humans relocate to and settle on Mars as well as other planets.
January 2012 - Current
Category

SciFi

Related skills:
Python
SciFi
Hugo

Parametric neutrino flavor conversions and Rabi oscillations

A research paper that explains stimulated neutrino oscillations using Rabi oscillations.
July 2018 - December 2018
Category

Physics

Related skills:

Social Pulse: Am I Dead Online?

Am I still alive? Life is not only measured by biological metrics. It is also measured by our online activities. This dashboard shows my online activities. If I am ever online, a bar will show up in the chart. If this dashboard is showing null activities, it probably means I am dead, at least online.
April 2021 - Current
Category

Web

In the background, GitHub Actions run regularly and update the social activity data. The updated data is then fed into this dashboard.

Related skills:
git
Python

Tram Bot - KVB

Waiting at the tram station in the winter is a painful experience. I built this API and slack bot to help the team in Cologne get accurate tram schedules easily.
December 2019 - December 2019
Category
Related skills:
API
Data Scraping

ts-bolt: nuts and bolts of time series deep learning

[WIP] Some nuts and bolts for time series models.
December 2022 - Current
Category

Data

Related skills:
Python
Time Series Analysis
Machine Learning

Schelling Model

I built this Schelling model app to help myself understand how our society computes its communities and how phase transitions form.
November 2019 - November 2019
Category

Physics

Related skills:
Statistics
Statistical Physics
Python

Research Tools

I have been managing a collection of tools for academic research. It is a community effort with many contributions from a lot of people (see the contributors here and here).
January 2014 - Current
Category

Web

Related skills:
Hugo

ColorTeller: Discover and Share Color Palettes

ColorTeller is a color sharing platform for data visualization. We have a website for discovery and sharing, and a Python package called colorteller for benchmarking colors.
July 2021 - August 2021
Category

Data

Company

KausalFlow

Related skills:
Python
Data Science
Data Visualization

CoMaPack

Documents on modified gravity. I use Mathematica as the editor to show more details on the derivations and the programming techniques.
October 2017 - December 2017
Category

Physics, Open source

Related skills:

ModiGraviDoc

Cosmology Mathematica Pack based on GREAT.m by Tristan Hubsch. A series of packages are hosted here, including some basic general relativity calculations, cosmology background calculation, cosmological perturbation calculation, etc.
October 2017 - December 2017
Category

Physics

Related skills:
Mathematica
Scientific Computing

DataHerb

DataHerb is built to make clean and documented small datasets easy to retrieve. DataHerb is the missing open data listing service for datasets. For Mac users, DataHerb is your “Homebrew for data”.
January 2020 - December 2020
Category

Data

Company

KausalFlow

Related skills:
Python
CI/CD
git

DataHerb Command Line Tool

The DataHerb Python Package is built to manage clean and documented small datasets in the terminal. DataHerb is the missing open data listing service for datasets. For Mac users, the DataHerb Python Package is your “Homebrew for data”.
January 2020 - Current
Category

Data

Company

KausalFlow

Related skills:
Python
Data Science
git

Dockerized OpenStreetMap Data Processor

Download OpenStreetMap data, extract street data, and clean up the data. Some useful data parsing functions are also included.
December 2018 - January 2019
Category

Data

Related skills:
Docker
Python

AudioRepr: Audiolization with Dataframe

December 2020 - December 2020
Category

Data

The end of 2020 is becoming so boring. I started to think about the mapping of data points to different representations. We usually talk about visualization because there are so many elements to be used to represent complicated data. Audiolization, on the other hand, leaves us with very few elements to represent complex data.

It is a lot of fun to play with audio. So I wrote this Python package to map a pandas DataFrame or NumPy ndarray to an audio representation.
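
To give a rough feeling for what mapping data points to audio can look like, here is a minimal sketch using only the Python standard library. It is not the AudioRepr API. The linear value-to-pitch mapping, the 220 to 880 Hz range, and the function names values_to_frequencies, synthesize, and to_pcm_bytes are all assumptions made for this illustration.

```python
import math
import struct


def values_to_frequencies(values, low_hz=220.0, high_hz=880.0):
    """Linearly rescale values onto the frequency range [low_hz, high_hz]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so every point gets the same pitch.
        return [low_hz for _ in values]
    return [low_hz + (v - lo) / (hi - lo) * (high_hz - low_hz) for v in values]


def synthesize(frequencies, note_seconds=0.2, sample_rate=8000):
    """Render one sine tone per frequency as signed 16-bit sample values."""
    samples_per_note = round(note_seconds * sample_rate)
    samples = []
    for freq in frequencies:
        for n in range(samples_per_note):
            amplitude = math.sin(2 * math.pi * freq * n / sample_rate)
            # Scale to half of the 16-bit range to leave some headroom.
            samples.append(int(amplitude * 32767 * 0.5))
    return samples


def to_pcm_bytes(samples):
    """Pack the samples as little-endian 16-bit PCM."""
    return struct.pack(f"<{len(samples)}h", *samples)


# Hypothetical data points: larger values are played at a higher pitch.
data = [1.0, 3.0, 2.0, 5.0]
freqs = values_to_frequencies(data)
pcm = to_pcm_bytes(synthesize(freqs))
```

With only pitch and duration to work with, each data point gets a single tone, which illustrates how few elements audio offers compared to a chart.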

Related skills:
Python
Audiolization

Notes on Intelligence

Some random notes on topics of intelligence such as neuroscience and collective intelligence. Most of these are reading notes or my calculations and derivations on some specific topics.
January 2014 - January 2015
Category

Notes

Related skills:
Neuroscience
Math

Notes on Physics

A set of notes for physics at graduate level. It covers most of the fundamental topics in physics such as quantum mechanics, electrodynamics, statistical mechanics, special and general relativity, astrophysics, and cosmology.
September 2011 - Current
Category
Related skills:

PyTorch Differential Equation Solver

A tiny fun project to solve differential equations using artificial neural networks. The idea is to use supervised learning to approximate the solutions to differential equations with a universal approximator.
April 2015 - April 2015
Category

Physics

Related skills:
Python
PyTorch

Weekend Project: Indego Bike Sharing Data

I spent a weekend on this project to predict riding durations using Indego bike sharing data.
July 2019 - July 2019
Category

Web

Company

KausalFlow

Related skills:
Python
PyTorch

Interactive Timeline of WesternXia

An interactive timeline of Western Xia (the Tangut Empire, or 西夏).
November 2012 - November 2012
Category

Web

Related skills:
HTML

Sunspot

We are like the sunspots of the Sun. To see us, to understand our suffering, one will have to stare at the Sun. We created this website so we can hear each other’s voice and comfort each other, such that the glory of the Sun will not overshadow our vulnerability. Feel free to share. Feel safe to vent. You deserve this moment and this outlet. In humanity we unite.
December 2018 - December 2018
Category

Web

Related skills:
API

Education

PhD in Physics

September 2013 - May 2018
Mode

Full-time

Specialization

Theoretical Physics

Related skills:
Scientific Computing

Bachelor of Science in Physics

September 2006 - June 2010
Mode

Full-time

Specialization

Physics

Related skills:

Interests

Science Fiction
Open Source
Technical Blogging