Engineering for Scale
Building an enterprise-grade vector architecture.
Content sources for vectors can be extremely large. As you grow, you should run your vector workloads across several secondary databases (sometimes called "pods"), which allows each collection to scale independently.
Simple workloads
For small workloads, it's typical to store your data in a single database.
If you've used Vecs to create 3 different collections, you can expose them to your web or mobile application using views. For example, with 3 collections called docs, posts, and images, we could expose "docs" inside the public schema like this:
create view public.docs as
select
  id,
  embedding,
  metadata, -- Expose the metadata as JSON
  (metadata->>'url')::text as url -- Extract the URL as a string
from vecs.docs;
You can then use any of the client libraries to access your collections within your applications:
const { data, error } = await supabase
  .from('docs')
  .select('id, embedding, metadata')
  .eq('url', '/hello-world')
Enterprise workloads
As you move into production, we recommend splitting your collections into separate projects so that your vector stores can scale independently of your production data. Vectors typically grow faster than operational data, and they have different resource requirements. Running them on separate databases also means the vector workload and your production data no longer share a single point of failure.
You can use as many secondary databases as you need to manage your collections. With this architecture, you have 2 options for accessing collections within your application:
- Query the collections directly using Vecs.
- Access the collections from your Primary database through a Wrapper.
You can use both of these in tandem to suit your use case. We recommend option 1 (querying directly with Vecs) wherever possible, as it offers the most scalability.
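With several secondary databases, the application needs some way to know which database holds which collection. A minimal sketch of such a lookup follows. The `SECONDARIES` table, the `connection_for` helper, and the `db.yyy` host are hypothetical names for illustration only, and the connection strings are placeholders, not real credentials.

```python
# Hypothetical routing table: collection name -> connection string of the
# secondary database ("pod") that stores it. Hosts and passwords are placeholders.
SECONDARIES = {
    "docs": "postgresql://postgres:password@db.xxx.supabase.co:5432/postgres",
    "posts": "postgresql://postgres:password@db.yyy.supabase.co:5432/postgres",
    "images": "postgresql://postgres:password@db.yyy.supabase.co:5432/postgres",
}


def connection_for(collection: str) -> str:
    """Return the connection string for the database holding `collection`."""
    try:
        return SECONDARIES[collection]
    except KeyError:
        # Fail loudly instead of silently falling back to the primary database.
        raise LookupError(
            f"no secondary database registered for {collection!r}"
        ) from None


# Two collections may share a database until one of them needs to scale alone.
print(connection_for("posts") == connection_for("images"))
```

The returned string could then be passed to `vecs.create_client(...)` to query that collection directly, and moving a collection to its own database becomes a one-line change in the table.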
Query collections using Vecs
Vecs provides methods for querying collections, either using a cosine similarity function or with metadata filtering.
# cosine similarity
docs.query(query_vector=[0.4,0.5,0.6], limit=5)

# metadata filtering
docs.query(
    query_vector=[0.4,0.5,0.6],
    limit=5,
    filters={"year": {"$eq": 2012}}, # metadata filters
)
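To make the two query modes concrete, here is a small in-memory illustration of what "rank by cosine distance, optionally after an equality filter on metadata" means. This is not how Vecs is implemented (Vecs runs the search inside Postgres). The `cosine_distance`, `matches`, and `query` functions and the sample records are assumptions made up for this sketch, and only the `$eq` filter form shown above is handled.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar. Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def matches(metadata, filters):
    """Check metadata against filters of the form {"key": {"$eq": value}}."""
    for key, condition in filters.items():
        if set(condition) != {"$eq"}:
            raise ValueError(f"unsupported filter in this sketch: {condition!r}")
        if metadata.get(key) != condition["$eq"]:
            return False
    return True


def query(records, query_vector, limit, filters=None):
    """Return the ids of the `limit` nearest records that pass the filters."""
    kept = [r for r in records if matches(r[2], filters or {})]
    kept.sort(key=lambda r: cosine_distance(r[1], query_vector))
    return [r[0] for r in kept[:limit]]


# Made-up sample records: (id, vector, metadata)
records = [
    ("vec0", [0.1, 0.2, 0.3], {"year": 1973}),
    ("vec1", [0.7, 0.8, 0.9], {"year": 2012}),
    ("vec2", [0.9, 0.1, 0.0], {"year": 2012}),
]

print(query(records, [0.4, 0.5, 0.6], limit=2))
print(query(records, [0.4, 0.5, 0.6], limit=5, filters={"year": {"$eq": 2012}}))
```

The filter is applied before ranking, so `limit` counts only records that satisfy the metadata condition.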
Accessing external collections using Wrappers
Supabase supports Foreign Data Wrappers. Wrappers allow you to connect two databases together so that you can query them over the network.
This involves 2 steps: connecting to your remote database from the primary and creating a Foreign Table.
Connecting your remote database
Inside your Primary database we need to provide the credentials to access the secondary database:
create extension postgres_fdw;

create server docs_server
foreign data wrapper postgres_fdw
options (host 'db.xxx.supabase.co', port '5432', dbname 'postgres');

create user mapping for docs_user
server docs_server
options (user 'postgres', password 'password');
Create a foreign table
We can now create a foreign table to access the data in our secondary project.
create foreign table docs (
  id text not null,
  embedding vector(384),
  metadata jsonb,
  url text
)
server docs_server
options (schema_name 'public', table_name 'docs');
This looks very similar to our View example above, and you can continue to use the client libraries to access your collections through the foreign table:
const { data, error } = await supabase
  .from('docs')
  .select('id, embedding, metadata')
  .eq('url', '/hello-world')
Enterprise architecture
This diagram provides an example architecture that allows you to access the collections either with our client libraries or using Vecs. You can add as many secondary databases as you need (in this example we only show one):