György Orosz

Natural Language Processing Engineer

Budapest, Hungary

Summary

🤩 Rockstar
🎓 Top School
György Orosz is a seasoned Natural Language Processing engineer with 18 years of experience building applied NLP systems for industry and research, currently working at the European Commission and consulting across banks, AI startups, and NGOs. He led development of Hungarian NLP models (HuSpaCy) and made notable open-source contributions to spaCy by implementing robust Hungarian tokenization and language resources. At Affinitext and prior companies he designed text-processing pipelines that extract, analyze, and visualize information from complex contracts, customer support logs, and news/social streams. His background spans academic research, product engineering, and team leadership, including a PhD-focused track and lecturing roles in NLP and scientific Python. György combines deep linguistic engineering for less-resourced languages with production-grade software practices, enabling both research advances and practical deployments. Based in Budapest, he’s known for turning language-specific quirks into reliable, reusable tooling that narrows the gap between prototypes and operational systems.
18 years of coding experience
14 years of employment as a software developer
MSc Computer science at Eötvös Loránd University
PhD Computer Science at Pázmány Péter Katolikus Egyetem
Computer science at University of Kent
Languages: English, Italian

Stackoverflow

Stats
123 reputation
6k reached
7 answers
2 questions

Github Skills (18)

tokenize: 10
python: 10
hungarian-algorithm: 10
tokenizer: 10
spacy: 10
natural-language-processing: 10
nlp: 10
testing: 9
cython: 7
pos-tagger: 6
highlighting: 6
smoothing: 6
nltk: 6
part-of-speech: 6
machine-learning: 6

Programming languages (11)

Java, Dockerfile, Shell, TeX, Scala, JavaScript, HTML, Jupyter Notebook

Github contributions (5)

explosion/spaCy

Dec 2016 - Aug 2017

💫 Industrial-strength Natural Language Processing (NLP) in Python
Role in this project: Back-end Developer
Contributions: 37 commits, 12 PRs, 43 comments in 8 months
Contributions summary: György contributed significantly to Hungarian language support in the spaCy library, focusing primarily on the Hungarian tokenizer. His work involved creating and integrating resource files, defining tokenization rules, and implementing tests to verify abbreviation handling and other tokenization behaviors. The commits show spaCy being adapted to a new language through language-specific data, configurations, and test cases.
fairness-ml, python, data-preprocessing, language-processing, tokenization
Materials for the Text Mining workshop held in the HuNLP meetup, June 2017
Contributions: 26 commits, 1 PR, 22 pushes in 10 days
hunlp, python, spacy-models, sentiment-analysis, natural-language-processing