A set of simple Python scripts for pre-processing large files
Role in this project: Data Engineer
Contributions: 43 commits, 2 PRs, 8 pushes in 2 years 4 months
Contributions summary: Zygmunt primarily contributed to the development of data preprocessing scripts. They wrote several Python scripts, including `chunk.py` for splitting files, `sample.py` for sampling lines, `tsv2csv.py` for format conversion, `delete_cols.py` for column deletion, `standardize.py` and `colstats.py` for data normalization, `split.py` for data splitting, `shuffle.py` and `unshuffle.py` for shuffling/unshuffling, and various utility scripts for data transformation. Their contributions focused on data manipulation and preparation for potential downstream analysis or machine learning tasks.
large-files, pre-processing, python, python-scripts
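As a rough illustration of the kind of line-oriented preprocessing summarized above (splitting a file into chunks and sampling lines), here is a minimal sketch. It is not the code from `chunk.py` or `sample.py`; the function names `split_into_chunks` and `sample_lines` and their parameters are hypothetical, chosen only for this example. Both functions stream over an iterable of lines, so a large file never has to fit in memory.

```python
import random


def split_into_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size lines (hypothetical helper)."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def sample_lines(lines, probability, seed=None):
    """Yield each line independently with the given probability (hypothetical helper)."""
    rng = random.Random(seed)
    for line in lines:
        if rng.random() < probability:
            yield line
```

Because both helpers accept any iterable, an open file object can be passed directly, for example `split_into_chunks(open(path), 100000)`, where `path` is a placeholder for the input file.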
Warning: This project does not have any current developer. See below.
Role in this project: ML Engineer
Contributions: 16 commits in 3 months
Contributions summary: Zygmunt primarily contributed to the development of a CSV dataset wrapper for the pylearn2 library. Their work included implementing the `CSVDataset` class, which handles loading and processing data from CSV files, including support for one-hot encoding and handling headers. They also added a unit test for the `CSVDataset` and a simple prediction script. The contributions focus on data loading and model prediction for machine learning tasks.
javascript, typescript
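To make the CSV loading described above more concrete (header handling plus one-hot encoding of labels), here is a minimal standard-library sketch. It does not reproduce the pylearn2 `CSVDataset` class or its interface; the function `load_csv_one_hot` and its parameters `has_header` and `label_column` are hypothetical names used only for illustration, and the sketch assumes every non-label column is numeric.

```python
import csv
import io


def load_csv_one_hot(text, has_header=True, label_column=0):
    """Parse CSV text into features and one-hot targets (hypothetical helper)."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if has_header:
        # Skip the first row when it holds column names.
        rows = rows[1:]

    labels = [row[label_column] for row in rows]
    classes = sorted(set(labels))
    class_index = {name: i for i, name in enumerate(classes)}

    # Every column except the label column is treated as a numeric feature.
    features = [
        [float(value) for j, value in enumerate(row) if j != label_column]
        for row in rows
    ]
    # One row per example, with a single 1 at the position of its class.
    targets = [
        [1 if class_index[label] == i else 0 for i in range(len(classes))]
        for label in labels
    ]
    return features, targets, classes
```

Sorting the class names gives a stable column order for the one-hot targets, so the same input always maps each class to the same position.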