Pachyderm 2026: Benefits, Features & Pricing

Software Advice offers objective insights based on verified user reviews and independent product and market research. When our advisors match you to a software provider, we may earn a referral fee.

How Software Advice ensures transparency

Software Advice lists all providers across its website—not just those that pay us—so that users can make informed purchase decisions. Users can talk to our advisors for free to receive software recommendations matching their needs. Software providers pay us for sponsored profiles to reach users interested in their products.

How Software Advice verifies reviews

Software Advice carefully verified over 2 million reviews to bring you authentic software experiences from real users. Our human moderators verify that reviewers are real people and that reviews are authentic. They use leading tech to analyze text quality and to detect plagiarism and generative AI.

Independent research methodology

Researchers at Software Advice use a mix of verified reviews, independent research, and objective methodologies to bring you selection and ranking information you can trust. While we may earn a referral fee when you visit a provider through our links or talk to an advisor, this has no influence on our research or methodology.

Overview

Pachyderm
4.0 (7)

Pricing

Pricing available upon request

About Pachyderm

Company:
Pachyderm is the leader in data versioning and pipelines for MLOps. We provide the data foundation that allows data science teams to automate and scale their machine learning lifecycle while guaranteeing reproducibility. With over $40 million in three rounds of funding from leading investors like Benchmark, Microsoft M12, Y Combinator, and others, Pachyderm, Inc. offers a commercial Pachyderm Enterprise Edition and an open source Pachyderm Community Edition. Pachyderm helps customers get their ML and AI projects to market faster, lower data processing and storage costs, and support strict data governance requirements.

Products:
Pachyderm is for data science teams who want to operationalize the data tasks in their ML lifecycle to iterate on data more quickly and reliably. Pachyderm is the leader in data versioning and pipelines for MLOps, and this data foundation allows data science teams to automate and scale their machine learning lifecycle while guaranteeing reproducibility. Unlike other data versioning and pipeline products, Pachyderm provides data-driven automation, petabyte scalability, and end-to-end reproducibility.

Pachyderm Enterprise Edition:
Pachyderm Enterprise Edition is our commercial offering designed for the largest projects in highly secure environments. Along with world-class support, your team also gets access to our full range of premium features including Pachyderm Console, authentication and access controls (RBAC), no scaling limits, JupyterHub integration, and centralized multiple cluster management.

Pachyderm Community Edition:
Pachyderm Community Edition is our open source version of Pachyderm. With Pachyderm Community Edition, you get the core Data Versioning and Pipeline features of Pachyderm that you can deploy locally or in the cloud of your choosing. If you need help, there’s an entire community of experts ready to offer their assistance.

Pachyderm Pricing and Plans

Free Trial
Free Version

Basic

Pricing available upon request

No plan information available

    Pachyderm Features

    • Popular features found in Big Data
      Access Controls/Permissions
      Collaboration Tools
      Data Blending
      Data Connectors
      Data Security
      Data Visualization
      Data Warehousing
      Forecasting
      High Volume Processing
      Predictive Analytics
      Statistical Analysis
    • More features of Pachyderm
      Activity Dashboard
      API
      Asynchronous Learning
      Compliance Management
      Configurable Workflow
      Data Capture and Transfer
      Data Cleansing
      Data Extraction
      Data Import/Export
      Data Storage Management
      Data Transformation
      Deep Learning
      For eCommerce
      Image Analysis
      Machine Learning
      ML Algorithm Library
      Model Training
      Monitoring
      Multi-Language
      Multiple Data Sources
      Natural Language Processing
      Neural Network Modeling
      Performance Management
      Performance Metrics
      Predictive Modeling
      Process/Workflow Automation
      Real-Time Analytics
      Real-Time Monitoring
      Reporting & Statistics
      Role-Based Permissions
      Sentiment Analysis
      Speech Recognition
      Synchronous Learning
      Third-Party Integrations
      Version Control
      Visualization
      Workflow Management

    Pachyderm User Reviews

    Overall Rating

    4.0

    Ratings Breakdown

    5 stars: 14%
    4 stars: 71%
    3 stars: 14%
    2 stars: 0%
    1 star: 0%

    Secondary Ratings

    Ease of Use: 3.3
    Value for money: 4.0
    Customer support: 4.9
    Functionality: 4.6

    Clayton L.

    Verified reviewer

    Hospital & Health Care

    10000+ employees

    Used daily for less than 2 years

    Reviewed November 2021

    Rethinking Data in AI and ML

    4

    Like any tool, Pachyderm is no silver bullet for the entire AI/ML stack. However, from a data processing and management perspective, it has fulfilled every application requirement I've needed it for and continues to be a flexible tool in meeting additional requirements. For example, after having computed some results from a pipeline, I needed to serve these results to an existing application. Pachyderm made this simple by exposing the data through a built-in S3 REST API. Since the application was already compatible with S3, Pachyderm served as a drop-in replacement for an S3 bucket. For anyone that strives to design clean and straightforward AI/ML architectures, I can definitely recommend Pachyderm as a must for the foundational data component.

    Ratings Breakdown

    Ease of use: 3
    Value for money: 4
    Customer support: 5
    Functionality: 5
    Pros:
    AI/ML production systems typically consist of multiple data processing steps organized as a DAG. Many automation frameworks manage these DAGs as tightly coupled steps ordered by _code execution_. What I like so much about Pachyderm is that it approaches DAG management as loosely coupled steps ordered by _data dependencies_. This alternative way of thinking has enabled me to design AI/ML architectures with data at the center, which has revolutionized the development and production workflows I've participated in. I can confidently store, process, and otherwise manage the data because Pachyderm provides a solid foundation for data provenance, data versioning, data storage patterns, and efficient incremental processing. Since AI/ML models are effectively a form of data, model versioning and management can be built as an extension of Pachyderm's data foundation. Furthermore, I really like that Pachyderm is powered by Kubernetes, because it passes on important architectural properties to Pachyderm, such as high scalability, robustness, efficiency, and portability (i.e. cloud agnosticism). I can containerize my pipelines, quickly test them locally through Docker Desktop or minikube, then scale them up to massive amounts of data in an on-prem or cloud cluster. If autoscaling is supported in a cloud cluster, I can especially reap the benefits of cost efficiency because I only pay for the compute resources I use.
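The idea of ordering steps by data dependencies rather than by code execution can be illustrated with a small sketch. This is not Pachyderm's implementation or API: the step names, repo names, and both helper functions are hypothetical, and the assumption that each step writes to a repo named after itself is made only to keep the example short. It uses Python's standard `graphlib` module to derive a run order from declared inputs and to find which steps sit downstream of a changed repo.

```python
from graphlib import TopologicalSorter

# Hypothetical pipelines: each step lists the data repos it reads.
# Assumption: every step writes to an output repo named after the step.
PIPELINES = {
    "clean": ["raw_images"],
    "features": ["clean"],
    "train": ["features", "labels"],
    "report": ["train"],
}


def run_order(pipelines):
    """Derive an execution order purely from the declared data inputs."""
    order = TopologicalSorter(pipelines).static_order()
    # Source repos such as "raw_images" appear in the order too; keep only steps.
    return [name for name in order if name in pipelines]


def affected_by(changed_repo, pipelines):
    """Return every step that must rerun when `changed_repo` receives new data."""
    affected = set()
    frontier = {changed_repo}
    while frontier:
        # Steps not yet marked that read from anything in the current frontier.
        frontier = {
            name
            for name, inputs in pipelines.items()
            if name not in affected and frontier & set(inputs)
        }
        affected |= frontier
    return affected
```

With these definitions, new data in `labels` would mark only `train` and `report` for reprocessing, while `clean` and `features` are left alone.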
    Cons:
    - In 1.X versions of Pachyderm, there are a few performance pain points, especially around handling very small files when uploading/downloading to/from a repo. These pain points have been significantly improved in Pachyderm 2.X.
    - Also in 1.X, debugging pipeline failures can sometimes be challenging without extra tools or integrating external logging services. Pachyderm 2.X improves upon this as well.
    - When Pachyderm processes data files in a pipeline, it groups the files into logical structures called datums for provenance and data efficiency reasons, and then it invokes the pipeline on each datum. This is necessary for scalability, but the downside is that each invocation of the pipeline incurs an overhead cost of just starting the processing code. The bright side is that there are several straightforward ways to engineer around the problem. It's also important to recognize that the impact of the problem is minimized by the benefits of incremental processing (i.e. only processing data that has changed on future pipeline runs).
    - This isn't necessarily a problem, but prospective buyers should be aware that although compute costs may go down due to incremental processing, storage costs may go up due to storing multiple versions of data.
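The datum trade-off described in these cons can be put into rough numbers. The cost model below is a simplifying assumption, not a measurement of Pachyderm: every datum pays a fixed startup cost, each file takes a fixed time, and a rerun reprocesses in full every datum that contains a changed file. All function names and figures are hypothetical.

```python
import math


def full_run_ms(n_files, files_per_datum, startup_ms, per_file_ms):
    """Estimated time for a first run that processes every file."""
    n_datums = math.ceil(n_files / files_per_datum)
    return n_datums * startup_ms + n_files * per_file_ms


def rerun_ms(n_changed, files_per_datum, startup_ms, per_file_ms):
    """Worst-case rerun: each changed file sits in a different datum,
    and every touched datum is reprocessed in full."""
    return n_changed * (startup_ms + files_per_datum * per_file_ms)


# Hypothetical workload: 10,000 small files, 2 s startup, 50 ms per file.
for size in (1, 100):
    print(size, full_run_ms(10_000, size, 2_000, 50), rerun_ms(5, size, 2_000, 50))
```

Under these assumed numbers, grouping 100 files per datum cuts the first run from roughly 5.7 hours to under 12 minutes, while a rerun after five scattered changes grows from about 10 seconds to 35 seconds. Larger datums amortize startup overhead but make incremental processing coarser.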

    Reasons for choosing Pachyderm

    Although DVC provides data version control features and AI/ML pipeline management, it lacks containerized pipeline orchestration and seems better suited for small teams in startup or research environments. We needed an enterprise-level service.

    Reasons for switching to Pachyderm

    Airflow is mainly geared for pipeline orchestration. My team had to build in a custom data management layer, but there was much to be desired in terms of provenance and versioning. Since Pachyderm already provided these features plus pipeline orchestration, it made more sense to not reinvent the wheel with Airflow.

    Vendor Response

    Thank you for your very thorough review Clayton.

    Replied November 2021

    Cove C.

    Verified reviewer

    Research

    201-500 employees

    Used daily for more than 2 years

    Reviewed November 2021

    Game changer for handling dynamic data

    4

    Pachyderm meets many previously unmet needs for our organization, including complete data provenance, automatic handling of data change, and modular/portable processing architecture, which facilitates the joint development of processing pipelines between software developers and scientists. Pachyderm engineers have been extremely responsive to our issues and development requests, and we plan to work well into the future with this software.

    Ratings Breakdown

    Ease of use: 4
    Customer support: 5
    Functionality: 5
    Pros:
    Perhaps the most important aspect we benefit from operationally is the awareness and automatic handling of data change. Generation of our data products involves multiple processing steps and several sources of data and metadata that enter the processing sequence at various points and may change at any time. Pachyderm automatically knows what has changed and triggers downstream (re)processing, removing the need for error-prone human management.
    Cons:
    In Pachyderm 1.X there was a relatively high amount of overhead associated with processing each datum. Our data typically consists of small but numerous datums, and we needed to artificially combine datums for performance. However, Pachyderm has been working with us on this issue and we expect to see big improvements in 2.0 and beyond.

    Martin L.

    Verified reviewer

    Biotechnology

    51-200 employees

    Used weekly for less than 12 months

    Reviewed October 2021

    Great in theory

    3

    We achieved some of our goals with Pachyderm. However, we were really hoping to spend more time solving the problems directly related to our goal. Instead, we spent a significant amount of time solving problems with Pachyderm and tailoring our problem to it.

    Ratings Breakdown

    Ease of use: 2
    Customer support: 4
    Functionality: 4
    Pros:
    Great concept, really fits what we would like to do. Re-computing only the pieces where the data has changed is super valuable.
    Cons:
    Working with it in practice is very hard. We would also like to use Pachyderm for research, developing research pipelines that can be executed easily on large amounts of data on the cluster. However, during research/development, pipelines naturally crash often. Translating something that works locally into something that works in Pachyderm can fail in several ways. Inspecting those types of errors is incredibly difficult unless you invest a significant amount of time in setting up logging/monitoring manually.

    Vendor Response

    Hello Martin, thank you for your feedback, we truly appreciated it. Pachyderm 2 will have several enhancements around the troubleshooting workflow for pipelines and the new Console (dashboard) will likely be of great help here. However, we're striving to further improve the user experience of Pachyderm with every release. Thank you.

    Replied November 2021

    Xubo F.

    Verified reviewer

    Biotechnology

    201-500 employees

    Used daily for less than 2 years

    Reviewed October 2021

    Pachyderm is a great data processing platform on cloud.

    4

    We have used Pachyderm for more than a year. The overall experience is good. We love the core technology and features provided by Pachyderm. We experienced frustrating issues with download speed, deployment, and system stability. We get excellent support from the Pachyderm team all the time.

    Ratings Breakdown

    Ease of use: 5
    Value for money: 4
    Customer support: 5
    Functionality: 5
    Pros:
    Data-driven automation. It supports incremental data processing. Reproducibility. It perfectly matches our tech stack: K8s, S3. Community facing.
    Cons:
    We expect fully automated data replication/export to an external storage system. The logging and debugging support could be improved.

    Reasons for choosing Pachyderm

    Data Driven Automation. It supports incremental data processing. Easy integration with our infrastructure.

    Vendor Response

    Xubo, Thank you for your review, we greatly appreciate your feedback. We'll make sure to pass your feedback around logging and debugging on to our product team. - Pachyderm

    Replied October 2021

    Chris K.

    Verified reviewer

    Marketing and Advertising

    2-10 employees

    Used daily for less than 12 months

    Reviewed October 2021

    Scalable machine learning without the MLOps

    5

    Ratings Breakdown

    Ease of use: 3
    Value for money: 4
    Customer support: 5
    Functionality: 5
    Pros:
    The ability to scale model builds in native Python is something that has been missing in this space until now. Utilizing Spark and/or Dask comes with a large amount of overhead that can be avoided by leveraging Pachyderm.
    Cons:
    The learning curve is quite steep, since there are some foundational concepts to understand before using Pachyderm.

    Vendor Response

    Thank you for your review Chris!

    Replied November 2021

    Chris H.

    Verified reviewer

    Information Technology and Services

    2-10 employees

    Used weekly for less than 2 years

    Reviewed October 2021

    Pachyderm for data pipelines

    4

    Ratings Breakdown

    Ease of use: 4
    Value for money: 4
    Customer support: 5
    Functionality: 5
    Pros:
    Pachyderm pipelines are an intuitive way to split and process data concurrently using autoscaling compute clusters. Writing a program to interact with data in a pipeline is straightforward because it works much like a native filesystem, requiring no additional libraries or integrations.
    Cons:
    We ran into issues with Pachyderm that required deleting and recreating pipelines. As an upside, support was very responsive in resolving our problems and providing upgrades to Pachyderm.
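The filesystem-style access mentioned in the pros above can be sketched in plain Python. By Pachyderm's documented convention, pipeline inputs are mounted under `/pfs/<input name>` and results are collected from `/pfs/out`. The `logs` input name and the `count_lines` function are hypothetical, and the directories are passed as parameters so the same code can be tried on a local folder.

```python
from pathlib import Path


def count_lines(input_dir, output_dir):
    """Write one `<name>.count` file per input file, holding its line count."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # this sketch only handles a flat directory of files
        with path.open("r", encoding="utf-8") as handle:
            n_lines = sum(1 for _ in handle)
        (out / f"{path.name}.count").write_text(f"{n_lines}\n", encoding="utf-8")


# Inside a pipeline container the call might look like this
# (assumed input repo name "logs"):
#     count_lines("/pfs/logs", "/pfs/out")
```

Nothing in the function is specific to Pachyderm, which is the point the reviewer makes: ordinary file reads and writes are enough.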

    Vendor Response

    Chris, Thank you for your great feedback. We're glad to hear that our support team has been a great asset to you. We'll make sure to pass along the feedback.

    Replied November 2021

    Will O.

    Verified reviewer

    Information Technology and Services

    51-200 employees

    Used weekly for less than 12 months

    Reviewed November 2021

    The missing ingredient for reproducible research

    4

    I'm a big fan of the Pachyderm approach. It's young software and needs to be understood a little to get the best out of it, but when stuff works, it works so damn well.

    Ratings Breakdown

    Ease of use: 2
    Value for money: 4
    Customer support: 5
    Functionality: 3
    Pros:
    The systematic recording of provenance for training and benchmarking results.
    Cons:
    When things go wrong, it's hard to diagnose.

    Reasons for choosing Pachyderm

    For NLP, the requirements around data curation for training are slightly singular; Pachyderm's offering was the only sensible one for us.

    Vendor Response

    Thank you for the review, Will.

    Replied November 2021

    Showing 1 - 7 of 7 Reviews

    Other Top Recommended Big Data Software

    SAS Viya: 4.4 (12), recently recommended 6 times
    Hevo: 4.7 (110), recently recommended 1 time
    Tableau: 4.6 (2345), recently recommended 0 times
    Google Cloud: 4.7 (2230), recently recommended 0 times
