AI - Kedro
An open-source DSML framework that helps data scientists and engineers build, version, and deploy reproducible pipelines through a data catalog, modular pipelines, parallel execution, and integrations.
- Name
- Kedro - https://github.com/kedro-org/kedro
- Last Audited At
About Kedro
Kedro is an open-source Data Science and Machine Learning (DSML) framework developed in Python. The project's mission is to help data scientists and engineers build, version, and deploy reproducible DSML pipelines.
Kedro provides a catalog-based approach for organizing and managing DSML projects, enabling modularity, extensibility, and collaboration among team members. It offers features such as:
- Project organization: Kedro helps manage the project's structure with a clear separation of code, data, and configuration.
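- Nodes and pipelines: A node wraps a plain Python function together with the names of its inputs and outputs. A pipeline is a collection of nodes, and Kedro works out the execution order from the data dependencies between them, which makes components easy to compose and reuse.
- Data Catalog: Kedro includes a built-in Data Catalog, a registry of the datasets a project reads and writes. Each entry records the dataset's type, location, and load/save options, typically in a YAML file such as `conf/base/catalog.yml`, so nodes refer to data by name instead of hard-coding file paths.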
- Modularity and Composability: DSML pipelines can be constructed as modular units that can be reused across multiple projects. This leads to increased code maintainability, testability, and reusability.
- Parallel Execution: Kedro runs nodes sequentially by default, but nodes that do not depend on each other can be run in parallel with its multiprocessing or multithreading runners, which can significantly reduce the overall runtime of DSML workflows.
- Versioning: Pipelines and catalog configuration are ordinary source files, so they can be tracked with Git or another version control system, and team members can work on separate branches. The Data Catalog can additionally version datasets, saving each run's output as a separate timestamped copy so that earlier results stay reproducible.
- Integrations and Extensibility: Kedro integrates with popular data science libraries and tools such as NumPy, pandas, TensorFlow, and scikit-learn. Users can also extend its functionality through custom plugins.
- Community and Partnerships: Kedro has an active community on Slack and GitHub, with over 2,300 members as of March 2023. The project is distributed through PyPI and Anaconda, follows the PEP 517 packaging standard, and is associated with the Core Infrastructure Initiative best-practices program. This ecosystem supports the ongoing development, maintenance, and adoption of Kedro within the DSML community.