DataStax aims to make it easier for developers to build generative AI retrieval-augmented generation (RAG) applications with a new data API released today.
DataStax is one of the leading commercial vendors behind the open source Apache Cassandra database, which is the foundation of its AstraDB cloud database as a service. Like many other database vendors, DataStax has added vector database capabilities. At a recent event, DataStax's CEO claimed that Cassandra is "…the best fucking database for generative AI."
Vector database capabilities are important for RAG applications, which combine large language models (LLMs) with data platforms to produce highly accurate and customized results.
![](https://venturebeat.com/wp-content/uploads/2024/01/api-copy.png?resize=2928%2C1072&strip=all)
DataStax has had vector functionality in AstraDB since July 2023, but that functionality required users to use Cassandra Query Language (CQL) as the primary path to query data. A new data API released today changes that, allowing developers to access the database from Python and JavaScript. The company claims this can narrow the gap between DataStax and dedicated vector databases such as Pinecone, which recently updated its namesake platform with serverless database functionality.
"There was kind of a tug of war between native vector databases that didn't support non-vector query types and hybrid databases that had very robust query models," Ed Anuff, chief product officer at DataStax, told VentureBeat. "What we tried to do is fill that gap, and that's what the data API is about."
How the DataStax data API is changing the way developers build RAG applications
The new data API does not provide new vector functionality to AstraDB databases. Instead, it makes it easy for developers to build applications.
According to Anuff, the new API aims to reduce the impedance mismatch between what developers are doing and what the database provides. Anuff noted that since vector functionality was first introduced to AstraDB in July 2023, about half of the new users who have signed up for the cloud database have used it to build gen AI applications.
The challenge was that these developers could not easily access AstraDB using the programming languages they were already using to build gen AI applications, primarily Python and JavaScript.
Before the new data API, developers building AI applications on AstraDB had to use the standard Cassandra Query Language (CQL). This requires more data modeling knowledge than a developer would want to take on for a simple RAG application. The resulting queries may also not be optimized for vector data.
Anuff explained that the new data API makes developers' lives easier in several ways: it handles vectorization automatically, presents a simpler interface in languages such as Python and JavaScript, and optimizes performance by storing and indexing vectors more efficiently at the database level, rather than simply adding vectors as a separate data type. This reduces the learning curve and improves performance compared to just building on top of the existing Cassandra API and data model.
It's all about the API
Some classes of database API simply translate calls made in a native programming language, such as Python or JavaScript, into a database query language. This is very similar to the decades-old approach in which developers interact with databases through object-relational mappers (ORMs).
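To make the translation-layer idea concrete, here is a minimal sketch of that style of API: a function that turns a Python dictionary filter into a parameterized, CQL-like query string. The function name `find_to_cql`, the table and column names, and the `%s` placeholder style are all assumptions made for illustration. This is not DataStax code and not how the DataStax data API is implemented.

```python
from typing import Any


def find_to_cql(table: str, filter_: dict[str, Any], limit: int = 10) -> tuple[str, list[Any]]:
    """Translate an equality filter into a parameterized CQL-like SELECT.

    Hypothetical helper for illustration only: it handles equality
    conditions and nothing else.
    """
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")

    clauses: list[str] = []
    params: list[Any] = []
    for column, value in filter_.items():
        # Only plain identifiers are accepted as column names, so user
        # input never ends up inside the query text itself.
        if not column.isidentifier():
            raise ValueError(f"invalid column name: {column!r}")
        clauses.append(f"{column} = %s")
        params.append(value)

    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    query = f"SELECT * FROM {table}{where} LIMIT {int(limit)}"
    return query, params


query, params = find_to_cql("articles", {"author": "alice", "year": 2024}, limit=5)
print(query)
print(params)
```

A layer like this only rewrites syntax. The developer still needs a table schema that suits the query, which is the data modeling burden the article describes.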
Cassandra is designed differently than other databases, so the DataStax data API works slightly differently. At an architectural level, Cassandra is organized around a set of high-performance primitives that combine to support different types of query patterns. Anuff said that Cassandra's data architecture allows the API to connect at deeper layers of the database, improving overall query performance.
"Data API exposes a very simple JSON-based data format to developers. Anything that can be expressed in JSON can be sent to or retrieved from a database," Anuff said. "But we store it within Cassandra in a very efficient way, and we do it directly at the storage layer, ensuring that the performance that developers get is maintained."
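As a rough illustration of what "anything that can be expressed in JSON" means in practice, the sketch below serializes a document that carries ordinary fields alongside an embedding vector, then reads it back. The field names (`_id`, `title`, `tags`, `embedding`) are assumptions chosen for the example, not the data API's actual schema, and no database is involved.

```python
import json

# A hypothetical document: plain fields plus a small embedding vector.
document = {
    "_id": "doc-1",
    "title": "Building RAG applications",
    "tags": ["rag", "vector-search"],
    "embedding": [0.12, -0.03, 0.88],
}

# Serialize to the JSON text a client could send over the wire.
payload = json.dumps(document)

# Parse it back, as a client would when reading a response.
restored = json.loads(payload)

print(payload)
print(restored == document)
```

Strings, numbers, lists and nested objects all survive the round trip unchanged, which is why a JSON interface can stay simple for the developer whatever the storage layer does underneath.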
Vector acceleration with JVector engine
Another important part of DataStax's vector database advancements is the JVector search engine, which is part of AstraDB. JVector is an open source embedded vector search engine developed by DataStax.
Anuff explained that JVector uses an algorithm called DiskANN, a disk-based, storage-optimized variant of the approximate nearest neighbor (ANN) search algorithms used in almost all vector databases. He said that DiskANN provides significantly better search capabilities than other algorithms, which do not perform as well at large storage and distribution scales.
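For readers unfamiliar with nearest neighbor search, the sketch below shows the exact, brute-force version of the problem that ANN algorithms approximate. It is not DiskANN and not JVector code. It compares a query vector against every stored vector using cosine similarity, which is the cost that ANN indexes are built to avoid at scale. The sample vectors and IDs are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[str]:
    """Return the IDs of the k vectors most similar to the query (full scan)."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Made-up two-dimensional vectors keyed by document ID.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}

top = exact_top_k([1.0, 0.05], vectors, k=2)
print(top)
```

The full scan touches every vector on every query. An ANN index gives up a small amount of accuracy to skip most of those comparisons.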
DataStax says that the JVector engine allows AstraDB to achieve better relevancy and recall than other vector databases. Much of DataStax's vector work, including JVector and the Data API, has been open sourced and is used by both the Cassandra open source community and DataStax's AstraDB customers.
"We're very passionate about making things available in the open source ecosystem," Anuff said. "We also want to make sure that if you're just a developer looking at which cloud services to use, there's the easiest way to do that."
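Quality claims about approximate vector search are commonly expressed as recall: the fraction of the true nearest neighbors that the approximate search actually returned. The sketch below computes it for one query. The function name and the sample ID lists are hypothetical and do not come from any DataStax benchmark.

```python
def recall_at_k(approx_ids: list[str], exact_ids: list[str]) -> float:
    """Fraction of the exact top-k results that the approximate search also returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)


# Made-up results for a single query with k = 4.
exact_results = ["a", "b", "c", "d"]    # ground truth from a full scan
approx_results = ["a", "c", "e", "b"]   # what an ANN index returned

score = recall_at_k(approx_results, exact_results)
print(score)
```

Here three of the four true neighbors were found, so recall is 0.75. Published comparisons usually average this value over many queries.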