Bryan Perozzi develops scalable neural methods for learning expressive representations of social relationships and natural language, applying them to prediction, pattern discovery, and anomaly detection in large networked datasets. He holds a PhD from Stony Brook, has authored 20+ peer-reviewed papers presented at NeurIPS, KDD, and WWW, and pairs that academic rigor with 4+ years of industry experience building large-scale data analytics systems. An active open-source contributor, he has improved core graph tooling: he added RDF triple conversion and more flexible sampling to TensorFlow GNN, and enhanced DeepWalk with parallel walk generation and disk-backed storage for large graphs. Based in New York, he brings 11 years of experience translating graph ML research into production-ready pipelines and is known as an early practitioner of data science focused on practical, scalable tooling.
Contributions: 13 commits, 2 PRs, 2 pushes in 1 year 7 months
Contributions summary: Bryan significantly enhanced the DeepWalk project, focusing on its core features and usability. He refactored the command-line interface and expanded the input file formats supported by the graph processing module. He also implemented an option to save generated walks to disk, which reduces memory use on large graphs. Finally, he contributed to parallelizing walk generation and added a scoring routine example.
TensorFlow GNN is a library to build Graph Neural Networks on the TensorFlow platform.
Role in this project:
ML Engineer
Contributions: 21 commits, 1 push, 15 comments in 1 year 2 months
Contributions summary: Bryan primarily contributed to the TensorFlow GNN library by modifying the graph sampling components. His work involved implementing a uniform random sampling strategy, enhancing existing sampling methods, and improving the validation checks and error messages related to feature sizes. The changes also included updates to the sampling spec proto definition and adjustments to the schema augmentations, making the sampling pipeline more flexible and easier to use. He also added a triple converter for RDF-style input.