György Orosz is a seasoned Natural Language Processing engineer with 18 years of experience building applied NLP systems for industry and research, currently working at the European Commission and consulting across banks, AI startups, and NGOs. He led development of Hungarian NLP models (HuSpaCy) and made notable open-source contributions to spaCy by implementing robust Hungarian tokenization and language resources. At Affinitext and prior companies he designed text-processing pipelines that extract, analyze, and visualize information from complex contracts, customer support logs, and news/social streams. His background spans academic research, product engineering, and team leadership, including doctoral studies and lecturing roles in NLP and scientific Python. György combines deep linguistic engineering for less-resourced languages with production-grade software practices, enabling both research advances and practical deployments. Based in Budapest, he’s known for turning language-specific quirks into reliable, reusable tooling that narrows the gap between prototypes and operational systems.
18 years of coding experience
14 years of employment as a software developer
MSc Computer Science at Eötvös Loránd University
PhD Computer Science at Pázmány Péter Katolikus Egyetem
Computer Science at University of Kent
💫 Industrial-strength Natural Language Processing (NLP) in Python
Role in this project:
Back-end Developer
Contributions: 37 commits, 12 PRs, 43 comments in 8 months
Contributions summary: György contributed significantly to Hungarian language support in the spaCy library, focusing primarily on the Hungarian tokenizer. His work involved creating and integrating resource files, defining tokenization rules, and implementing tests to verify abbreviation handling and other tokenization behaviors. The commits show a focus on adapting spaCy to a new language by adding language-specific data, configurations, and test cases.
György Orosz - Natural Language Processing Engineer