Sofie Van Landeghem - Nanochat's Repo Czar

Belgium

Summary

🤩 Rockstar

🎓 Top School

Top expert in Natural Language Processing and Machine Learning Technologies

Sofie Van Landeghem is an experienced software engineer and open-source maintainer with 12 years of expertise in NLP, ML and backend development, holding a PhD in Bioinformatics and a Master’s in Software Engineering. She founded OxyKodit to deliver tailored NLP and LLM solutions across biomedical, legal and enterprise datasets, combining hands‑on modeling with rigorous testing and production-ready quality practices. As a core contributor to spaCy, Thinc and spacy-transformers she built NER, entity linking and transformer integration features, and today helps maintain FastAPI/Typer and acts as "repo czar" for Andrej Karpathy’s educational nanochat, curating PRs and running experiments to keep the code minimal and teachable. Her work ranges from low-level ML optimizations (e.g., custom layers and shape/padding fixes) to developer UX improvements (docs, test suites, CLI ergonomics), showing a rare blend of research depth and pragmatic engineering. Based in Belgium, she balances independent consulting with high-impact open-source stewardship, often serving as the first line of review that turns community contributions into merge-ready code. An understated strength is her history of scaling academic NLP pipelines to process tens of millions of PubMed articles, which informs her practical, data-centric approach to language AI.

12 years of coding experience

9 years of employment as a software developer

Machine Learning Summer School at University of Cambridge

PhD in Sciences Bioinformatics at Ghent University

French, Dutch, English

GitHub Skills (28)

transformers 10
pytorch 10
python 10
distilbert 10
machine-learning 10
deep-learning 10
spacy 10
bert 10
nlp 10
xnet 10
cli 10
testing 9
data-structure 9
back-end-development 9
algorithm 9

Programming languages (11)

TypeScript, Java, C++, Rust, C, Handlebars, JavaScript, Jupyter Notebook

GitHub contributions (5)

explosion/spaCy

Nov 2018 - Jan 2023

💫 Industrial-strength Natural Language Processing (NLP) in Python

Role in this project:

Data Scientist

Contributions: 5 releases, 1295 reviews, 1044 commits in 4 years 2 months

Contributions summary: Sofie's contributions primarily involve the development of a Named Entity Recognition (NER) system within the spaCy NLP framework. They focused on leveraging entity descriptions and article texts to create input embedding vectors, which were then used for training. Their work includes creating and training a custom knowledge base and constructing training datasets for named entity linking. The code changes also indicate the creation of a model to predict NER labels and the implementation of methods for evaluating the performance of the developed models.

fairness-ml, python, data-preprocessing, language-processing, tokenization

explosion/spacy-transformers

Dec 2019 - Oct 2021

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

Role in this project:

ML Engineer

Contributions: 1 release, 57 reviews, 96 commits in 1 year 10 months

Contributions summary: Sofie primarily contributed to the `spacy-transformers` repository by implementing and refining the integration of transformer models within spaCy. Their work included adding support for specific transformer models like DistilBERT and XLNet, enhancing the functionality of components such as the wordpiecer, text categorizer, and entity recognizer. Further contributions encompassed fixing configuration files and integrating features from the spaCy v3 branch.

natural-language-understanding, xlnet, bert, google, natural-language-processing
