Are Base Models Beating Legal AI Tools in Testing?

By Gravity Stack Staff

Among the many conversations at this year’s Legal Innovators California, one insight from Gravity Stack’s Bryon Bratcher cut through the noise. As legal departments ramp up AI adoption and more startups raise mega rounds, some of the most effective results are still coming from where it all started: general-purpose foundation models.

Bryon described findings from the firm’s AI Lab, where teams have been benchmarking large language models like GPT-4 against a range of legal-specific copilots and applications.

“Sometimes the base models are just way better than what happens after it goes through another layer, whether that’s Copilot, Harvey, or others,” he said.

The remark points to a growing question in the legal tech market: does specialization always deliver better outcomes?

The Interface Matters 

One reason base models are proving useful to lawyers in their day-to-day workflow is their simplicity. ChatGPT and similar tools offer a clean, familiar interface. Lawyers already know how to use them, and many already do in their personal lives.

“The chat interface is the adoption piece,” Bratcher said. “That’s what people know and love, and it’s already embedded in their workflow.”

Tools that replicate this ease of use tend to see faster engagement. Tools that impose a new learning curve risk getting passed over, no matter how tailored their features may be.

What This Means for Legal Teams

Legal teams should benchmark tools based on real outcomes. Start by asking what the base model can do. Then measure how much, if anything, the additional layer improves that output.

The smartest teams are already testing side by side, bringing in tools their teams already use, and looking for areas where performance and usability come together.
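As a rough illustration of what that side-by-side testing can look like, here is a minimal benchmarking sketch. It is hypothetical, not Gravity Stack's actual harness: the get_base_model_answer and get_layered_tool_answer helpers, the sample prompts, and the 1-5 reviewer scoring are all assumptions made for the example.

```python
# Minimal side-by-side benchmarking sketch (hypothetical, not Gravity Stack's tooling).
# The same prompts go to a base model and to a layered legal tool, a reviewer scores
# each answer 1-5, and the delta shows how much the extra layer adds, if anything.

from statistics import mean

def get_base_model_answer(prompt: str) -> str:
    """Placeholder for a call to a general-purpose model via its API."""
    return f"[base model answer to: {prompt}]"

def get_layered_tool_answer(prompt: str) -> str:
    """Placeholder for a call to a legal-specific copilot built on top of it."""
    return f"[layered tool answer to: {prompt}]"

def run_benchmark(prompts, score_fn):
    """Collect both answers per prompt and record a reviewer score for each."""
    results = []
    for prompt in prompts:
        base = get_base_model_answer(prompt)
        layered = get_layered_tool_answer(prompt)
        results.append({
            "prompt": prompt,
            "base_score": score_fn(prompt, base),
            "layered_score": score_fn(prompt, layered),
        })
    return results

if __name__ == "__main__":
    prompts = [
        "Summarize the indemnification clause in this agreement.",
        "List the deadlines imposed by this scheduling order.",
    ]
    # In practice a reviewer reads each answer; this stub just assigns a neutral score.
    results = run_benchmark(prompts, score_fn=lambda prompt, answer: 3)
    base_avg = mean(r["base_score"] for r in results)
    layered_avg = mean(r["layered_score"] for r in results)
    print(f"Base model avg: {base_avg:.1f}  Layered tool avg: {layered_avg:.1f}")
    print(f"Lift from the extra layer: {layered_avg - base_avg:+.1f}")
```

The point of the exercise is the comparison, not the tooling: if the layered product cannot beat the base model on your own prompts, the extra spend is hard to justify.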

Discovery Is Moving Toward First-Pass Automation

Bratcher also shared how Gravity Stack is applying this thinking to discovery workflows. In the past year, the team has run more than a dozen discovery projects using generative AI.

“Within 24 months, first-pass review will be done by AI,” he predicted.

That future is already being built. AI is being used to summarize complaints, organize documents, and bring legal teams closer to quality control (QC) with less manual work.
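For readers curious what first-pass review by AI can look like mechanically, the sketch below shows one deliberately simplified approach: a prompt template that asks a model to tag each document as responsive, not responsive, or unsure, so human reviewers can focus QC on the uncertain calls. The classify_document helper and the response format are illustrative assumptions, not a description of Gravity Stack's discovery workflow.

```python
# Illustrative first-pass review classification (hypothetical example only).
# A model tags each document as RESPONSIVE / NOT RESPONSIVE / UNSURE so that
# human reviewers can concentrate their QC effort on the UNSURE pile.

FIRST_PASS_PROMPT = """You are assisting with first-pass document review.
Request for production: {request}

Document text:
{document}

Answer with exactly one of: RESPONSIVE, NOT RESPONSIVE, UNSURE,
followed by a one-sentence reason."""

def classify_document(request: str, document: str, ask_model) -> str:
    """Build the prompt and route it to whatever model client the team uses."""
    prompt = FIRST_PASS_PROMPT.format(request=request, document=document)
    return ask_model(prompt)

if __name__ == "__main__":
    # Stubbed model call so the sketch runs without any API key.
    demo_model = lambda prompt: "UNSURE - mentions the project but not the contract."
    print(classify_document(
        request="All documents concerning the 2023 supply contract.",
        document="Email thread discussing Project Falcon logistics...",
        ask_model=demo_model,
    ))
```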

The key is implementation. Strong tools matter. But how they are used matters more.

Follow Gravity Stack on LinkedIn for updates from our AI Lab, client stories, and legal innovation insights.
