Lei Ma
Lead Data Scientist
Hello! I’m Lei Ma, a physicist turned data scientist based in Germany. I am a theory-centric data scientist with solid engineering skills. Apart from developing and productionizing deep learning models in my work, I regularly organize online discussion groups to keep track of the progress of machine learning. On the engineering side, I have worked on data warehouses and data lakes, model building, statistical and deep learning model deployment, and machine learning APIs for production. Above all, my specialty is bringing the value of deep learning models to various industries.
In my career, I focus on forecasting with deep learning. I have worked on marketplaces for housing, logistics, and fashion ecommerce. I have worked on products that optimize marketplaces using data-driven models, which help us build a more efficient and environmentally friendly future. In general, I love the notion that machine learning improves the quality of our lives.
I am also a maker. I’ve built a bunch of fun stuff in my free time, most of which is open source, as I am deeply attached to the philosophy of Free and Open-Source Software. I am also a reviewer for the Journal of Open Source Software and PyCon DE & PyData.
- Location
Cologne, NRW, Germany
- Services
Skills
Domains
Foundations
Technologies
Programming Languages
Tools
Languages
Work
- Employment type
Full-time
- Work model
Hybrid
- Data science for chemical manufacturing.
- GenAI applications.
- Business Intelligence.
- Employment type
Full-time
- Work model
Hybrid
- Deep learning based demand forecasting.
- Forecasting dynamical systems with graph neural networks.
- End-to-end integration of demand forecasting for ecommerce.
- Employment type
Full-time
- Work model
On site
- Real-time pricing algorithms for DHL Freight tenders using machine learning.
- Machine-learning-based pricing for shippers and carriers to enable a transparent and fair freight market.
- Supply and demand research and optimization of our road freight marketplace.
- Machine learning solutions for the internal teams such as automating finance tasks using classification models.
- Data stewardship to make sure the platform database is storing what is expected and ensuring data quality.
- A Python package tailored for road freight data to boost productivity of the data team.
- Price dispersion models for road freight marketplace.
- Employment type
Full-time
- Work model
On site
- Built ETL pipelines and maintained the data warehouse for all of Homelike's data, ensuring stable and clean data for our data science projects.
- Designed data-explainable KPIs for the company, built models to understand the driving forces behind the KPIs, and built interactive dashboards to provide insights and support data-driven decisions across all levels of management.
- Designed and built data products and APIs for geo-location based analysis and forecasting for customers on the supply side, helping them make data-driven decisions.
- Built a user profiling API to assist the pre-sale team and make them more efficient.
Education
- Mode
Full-time
- Specialization
Theoretical Physics
- Mode
Full-time
- Specialization
Physics
Selected Publications
Deep learning (DL) is a cutting-edge approach to learning from data. While it has taken the areas of computer vision and natural language processing by storm, its application to time-series forecasting is a more recent phenomenon and remains challenging for both new and experienced practitioners.
To develop the best time series models for a real-world problem, it is essential to have not only a thorough understanding of the time series data but also a solid grasp of DL models themselves. This book investigates time series structures and the DL approaches that can address the variety of challenges they present to practitioners in industry.
In this book, you will gain insights from a variety of perspectives, both from the data and the models. You will learn about the complexities of real-world time series data, explore the different problem settings for time series analysis, touch upon the foundation of DL models for time series, and practice end-to-end time series analysis projects when DL works; the authors believe in choosing the best tool for the problem, so traditional methods are never far from our minds. A GitHub repository with coding examples will be provided to support your journey.
By the end of this book, you will be able to approach almost any time series challenge with an appropriate model that gets you results.
Demand forecasting in the online fashion industry is particularly amenable to global, data-driven forecasting models because of the industry’s set of particular challenges. These include the volume of data, the irregularity, the high amount of turn-over in the catalogue and the fixed inventory assumption. While standard deep learning forecasting approaches cater for many of these, the fixed inventory assumption requires a special treatment via controlling the relationship between price and demand closely. In this case study, we describe the data and our modelling approach for this forecasting problem in detail and present empirical results that highlight the effectiveness of our approach.
Projects
Featured
Time Series Forecasting using Deep Learning [WIP]
- Category
Data
- Company
KausalFlow
- Website
E-Book on Statistical Physics
- Category
Data
- Company
KausalFlow
Datumorphism
- Category
Data
Project Catalog
EERILY: timE sERIes pLaYground
- Category
Data
- Company
KausalFlow
- Website
HaferML: HomemAde FramEwoRk for Machine Learning
- Category
Data
- Website
NeuronStar
- Category
- Website
NeuronStar is a project I started to explore the connections between computational neuroscience and machine learning.
We have finished several reading clubs and successfully hosted many seminars on topics related to neuroscience and machine learning.
Neutrino Research Tools and Knowledge Base
- Category
Physics
- Website
Dispersion Relation Gaps and Neutrino Flavor Instabilities in Fast Modes
- Category
Physics
Thesis: Neutrino Flavor Conversions in Dense Media
- Category
Physics
- Website
A Toy Model of the Neutrino Halo Problem
- Category
Physics
Neutrino Flavor Conversions
- Category
Physics
Hugo Connectome Theme
- Category
Web
- Company
KausalFlow
- A graph view of all the interconnected notes.
- Each article has a section called connectome showing the links to other articles and backlinks from other articles.
- The double bracket links (i.e., [[]]) can be previewed in the current article.
- Support for multiple notebooks. Articles can be organized into different notebooks, and backlinks also work between two different notebooks.
Many note-taking tools have emerged recently. Many of them reflect the ideas of the digital garden and Zettelkasten. I have been organizing my content using Hugo. I like the idea of backlinks, but I didn’t want to spend too much time moving my notes to a different tool. So I developed this Hugo theme to incorporate these new fancy concepts.
DIETBox: Data Science Toolbox
- Category
Data
- Documentation
COVID-19/SARS-CoV-2 Dataset for Europe
- Category
Data
Kirsche: Find connections in your paper references file.
- Category
Tool
- Documentation
Kirsche is an open-source tool to calculate and visualize connections between a list of papers. I have also created a command line tool to create visualizations with a single command.
Interplanetary Immigration Center
- Category
SciFi
- Website
Parametric neutrino flavor conversions and Rabi oscillations
- Category
Physics
Social Pulse: Am I Dead Online?
- Category
Web
- Website
In the background, GitHub Actions run on a regular schedule and update the social activity data. The updated data is then fed into this dashboard.
Tram Bot - KVB
- Category
ts-bolt: nuts and bolts of time series deep learning
- Category
Data
- Website
Schelling Model
- Category
Physics
- Category
Web
ColorTeller: Discover and Share Color Palettes
- Category
Data
- Company
KausalFlow
CoMaPack
- Category
Physics, Open source
ModiGraviDoc
- Category
Physics
DataHerb
- Category
Data
- Company
KausalFlow
- DataHerb Website
- DataHerb Command Line Tool
DataHerb Command Line Tool
- Category
Data
- Company
KausalFlow
- Documentation
Dockerized OpenStreetMap Data Processor
- Category
Data
AudioRepr: Audiolization with Dataframe
- Category
Data
The end of 2020 is becoming so boring. I started to think about the mapping of data points to different representations. We usually talk about visualization because there are so many elements to be used to represent complicated data. Audiolization, on the other hand, leaves us with very few elements to represent complex data.
It is a lot of fun to play with audio. So I wrote this Python package to map a pandas DataFrame or NumPy ndarray to an audio representation.
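To give a rough feel for the idea, here is a minimal sketch of mapping numbers to sound. It does not use AudioRepr's API; the function names `values_to_frequencies` and `sonify`, the frequency range, and the note length are all assumptions made for illustration. Each value is min-max scaled to a pitch, and each pitch is rendered as a short sine tone.

```python
import numpy as np


def values_to_frequencies(values, f_min=220.0, f_max=880.0):
    """Linearly map values onto the frequency range [f_min, f_max] in Hz.

    Hypothetical helper for illustration; not part of AudioRepr.
    """
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # A constant series has no range to scale; use the middle pitch.
        scaled = np.full_like(values, 0.5)
    else:
        scaled = (values - values.min()) / span
    return f_min + scaled * (f_max - f_min)


def sonify(values, sample_rate=8000, note_seconds=0.1):
    """Render each value as one sine tone and concatenate the tones."""
    freqs = values_to_frequencies(values)
    n_samples = int(round(sample_rate * note_seconds))
    t = np.arange(n_samples) / sample_rate
    tones = [np.sin(2 * np.pi * f * t) for f in freqs]
    return np.concatenate(tones)


# Four data points become four 0.1-second tones.
signal = sonify([1.0, 3.0, 2.0, 5.0])
```

A pandas column can be passed in the same way, for example `sonify(df["price"].to_numpy())`, where `df` and `"price"` are placeholder names. The resulting array holds samples in the range [-1, 1] and could be written to a WAV file after scaling to 16-bit integers.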
Notes on Intelligence
- Category
Notes
Notes on Physics
- Category
- Website
PyTorch Differential Equation Solver
- Category
Physics
Weekend Project: Indego Bike Sharing Data
- Category
Web
- Company
KausalFlow
Interactive Timeline of WesternXia
- Category
Web
- Website
Sunspot
- Category
Web