Top expert in Natural Language Processing and Machine Learning Technologies
Sofie Van Landeghem is an experienced software engineer and open-source maintainer with 12 years of expertise in NLP, ML and backend development, holding a PhD in Bioinformatics and a Master’s in Software Engineering. She founded OxyKodit to deliver tailored NLP and LLM solutions across biomedical, legal and enterprise datasets, combining hands‑on modeling with rigorous testing and production-ready quality practices. As a core contributor to spaCy, Thinc and spacy-transformers she built NER, entity linking and transformer integration features, and today helps maintain FastAPI/Typer and acts as "repo czar" for Andrej Karpathy’s educational nanochat, curating PRs and running experiments to keep the code minimal and teachable. Her work ranges from low-level ML optimizations (e.g., custom layers and shape/padding fixes) to developer UX improvements (docs, test suites, CLI ergonomics), showing a rare blend of research depth and pragmatic engineering. Based in Belgium, she balances independent consulting with high-impact open-source stewardship, often serving as the first line of review that turns community contributions into merge-ready code. An understated strength is her history of scaling academic NLP pipelines to process tens of millions of PubMed articles, which informs her practical, data-centric approach to language AI.
12 years of coding experience
9 years of employment as a software developer
Machine Learning Summer School at the University of Cambridge
PhD in Sciences (Bioinformatics) at Ghent University
💫 Industrial-strength Natural Language Processing (NLP) in Python
Role in this project:
Data Scientist
Contributions: 5 releases, 1295 reviews, 1044 commits in 4 years 2 months
Contributions summary: Sofie's contributions primarily involve named entity recognition (NER) and named entity linking within the spaCy NLP framework. She focused on leveraging entity descriptions and article texts to create input embedding vectors, which were then used for training. Her work includes creating and training a custom knowledge base and constructing training datasets for named entity linking. The code changes also include a model that predicts NER labels and methods for evaluating the performance of the resulting models.
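The entity-linking idea summarized above (a knowledge base of candidate entities whose vectors are built from entity descriptions, compared against an encoding of the surrounding article text) can be illustrated with a toy sketch. This is not spaCy's `KnowledgeBase` or `EntityLinker` API: every name below (`ToyKnowledgeBase`, `encode`, `link`, the `E_...` entity IDs, the two-dimensional word vectors, and the 50/50 mix of alias prior and context similarity) is a hypothetical assumption chosen for illustration, and the sketch covers only inference-time scoring, not training.

```python
from __future__ import annotations

import math
import re
from dataclasses import dataclass, field

# Hypothetical 2-d word vectors: dim 0 ~ "place", dim 1 ~ "person/entertainment".
WORD_VECTORS: dict[str, list[float]] = {
    "capital": [1.0, 0.0],
    "city": [1.0, 0.0],
    "france": [0.9, 0.1],
    "actress": [0.0, 1.0],
    "media": [0.0, 1.0],
    "film": [0.1, 0.9],
}


def encode(text: str, word_vectors: dict[str, list[float]]) -> list[float]:
    """Average the vectors of known lowercase words; zero vector if none are known."""
    dim = len(next(iter(word_vectors.values())))
    words = re.findall(r"[a-z]+", text.lower())
    known = [word_vectors[w] for w in words if w in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ToyKnowledgeBase:
    """Hypothetical in-memory KB: entity vectors plus alias -> [(entity_id, prior)]."""
    entity_vectors: dict[str, list[float]] = field(default_factory=dict)
    aliases: dict[str, list[tuple[str, float]]] = field(default_factory=dict)

    def add_entity(self, entity_id: str, description: str) -> None:
        # The entity's vector is an encoding of its textual description.
        self.entity_vectors[entity_id] = encode(description, WORD_VECTORS)

    def add_alias(self, alias: str, candidates: list[tuple[str, float]]) -> None:
        # Every candidate must already exist as an entity in the KB.
        for entity_id, _ in candidates:
            if entity_id not in self.entity_vectors:
                raise KeyError(f"unknown entity: {entity_id}")
        self.aliases[alias] = candidates


def link(kb: ToyKnowledgeBase, mention: str, context: str,
         prior_weight: float = 0.5) -> str | None:
    """Pick the candidate maximizing a mix of alias prior and context similarity."""
    candidates = kb.aliases.get(mention)
    if not candidates:
        return None  # no candidates: the mention cannot be linked
    context_vector = encode(context, WORD_VECTORS)

    def score(candidate: tuple[str, float]) -> float:
        entity_id, prior = candidate
        similarity = cosine(context_vector, kb.entity_vectors[entity_id])
        return prior_weight * prior + (1.0 - prior_weight) * similarity

    return max(candidates, key=score)[0]


kb = ToyKnowledgeBase()
kb.add_entity("E_PARIS_CITY", "Capital city of France")
kb.add_entity("E_PARIS_HILTON", "American actress and media personality")
kb.add_alias("Paris", [("E_PARIS_CITY", 0.7), ("E_PARIS_HILTON", 0.3)])
```

With these toy vectors, `link(kb, "Paris", "She starred in a new film")` returns `"E_PARIS_HILTON"` even though that candidate has the lower prior, because the context similarity outweighs it; `link(kb, "Paris", "The capital of France")` returns `"E_PARIS_CITY"`. A trained linker would learn the encoders and the weighting from data rather than fixing them by hand.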
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Role in this project:
ML Engineer
Contributions: 1 release, 57 reviews, 96 commits in 1 year 10 months
Contributions summary: Sofie primarily contributed to the `spacy-transformers` repository by implementing and refining the integration of transformer models within spaCy. Her work included adding support for specific transformer models such as DistilBERT and XLNet, and enhancing components such as the wordpiecer, text categorizer and entity recognizer. Further contributions included fixing configuration files and integrating features from the spaCy v3 branch.