Django Search Engine with Typesense
A few days ago I shared a tutorial on using Elasticsearch as a full-text search backend with Django REST Framework. In this tutorial we will explore how to achieve similar results with a much smaller memory footprint using Typesense, an open source search engine. Here is how Typesense’s founder describes it:
If you’re new to Typesense: if Algolia and Pinecone had a baby, and it was open source, self-hostable and also came with a SaaS hosted option — that’s Typesense in a nutshell. — Jason Bosco
Let’s say you have a model:
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=240, null=True, blank=True)
    description = models.TextField()
To enable typo-tolerant search with Typesense, first install Typesense on your system, then install its Python library in your Django project by running “pip install typesense”.
We need to create a Typesense connection object that we can reuse later.
In utils.py or helpers.py, whichever you prefer, add the following lines:
import typesense
from decouple import config  # python-decouple reads values from .env

client = typesense.Client({
    'api_key': config('TYPESENSE_KEY'),
    'nodes': [{
        'host': config('TYPESENSE_IP'),
        'port': config('TYPESENSE_PORT'),
        'protocol': 'http'
    }],
    'connection_timeout_seconds': 2
})
Here we use the python-decouple package, which reads these variables from a .env file.
Suppose we have 100 records. How do we insert them? According to the official Typesense docs, there is a bulk import option that takes JSONL. But for the first load we can do everything with a management command!
So create a file apps/posts/management/commands/typesensepro.py
(or use any other app you prefer, as long as the file sits inside that app’s management/commands/ directory).
from django.core.management.base import BaseCommand

from apps.posts.models import Post  # our model
from apps.helpers.utils import client  # connection object


class Command(BaseCommand):
    help = 'Custom console command django'

    def add_arguments(self, parser):
        parser.add_argument(
            'command_name', type=str,
            help='One of: schema, reindex, delete '
                 '(for example: python manage.py typesensepro schema)')

    def handle(self, *args, **kwargs):
        command_name = kwargs['command_name']
        if client.operations.is_healthy():
            if command_name == 'schema':
                # Create the collection with the fields we want to index
                schema = {
                    'name': 'posts',
                    'fields': [
                        {
                            'name': 'title',
                            'type': 'string',
                        },
                        {
                            'name': 'description',
                            'type': 'string',
                        }
                    ],
                }
                try:
                    res = client.collections.create(schema)
                    print(res)
                except Exception as e:
                    print(e)
            elif command_name == 'delete':
                # Drop the whole collection
                try:
                    res = client.collections['posts'].delete()
                    print(res)
                except Exception as e:
                    print(e)
            elif command_name == 'reindex':
                # Copy every Post into the collection, one upsert per row
                try:
                    posts = Post.objects.all()
                    for post in posts:
                        document = {
                            'id': str(post.id),
                            'title': str(post.title),
                            'description': str(post.description)
                        }
                        res = client.collections['posts'].documents.upsert(
                            document)
                        print(post.id)
                except Exception as e:
                    print(e)
        else:
            print("Typesense disconnected or error occurred")
Here we have three command arguments: “schema” to create the collection in Typesense, “reindex” to migrate the data, and “delete” to delete the collection.
First we need a schema that lists the fields we want to index, each with its Typesense data type.
res = client.collections.create(schema)
This line creates a collection (roughly a table, in SQL terms); the fields are already defined in the schema dictionary.
In the reindex block we simply loop over all objects, where we can also modify the data, and upsert each one.
NB: this method works well for the initial load when the data volume is not that high. Otherwise, use the JSON Lines bulk import method described in the Typesense docs. For my use case only a few thousand records had to be migrated, so it worked well.
python manage.py typesensepro schema
python manage.py typesensepro reindex
python manage.py typesensepro delete
So run these commands. Also, if you update your schema, make sure you run delete, re-run the schema command, and then run reindex.
So what about new data? For that we will use Django’s post_save signal!
In apps/posts/signals.py:
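For larger data sets, the bulk import mentioned in the note above could be sketched roughly as follows. This is only a sketch under assumptions: `batched` and `bulk_upsert` are hypothetical helper names, `documents_api` is assumed to be `client.collections['posts'].documents`, and its `import_()` method is assumed to accept a list of dicts and return one result dict per document with a `success` key. Check this against the version of the typesense library you have installed.

```python
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_upsert(documents_api, rows, batch_size=100):
    """Send rows in batches and return the per-document results that failed.

    `documents_api` is assumed to be client.collections['posts'].documents,
    and import_() is assumed to return one result dict per document.
    """
    failures = []
    for chunk in batched(rows, batch_size):
        results = documents_api.import_(chunk, {'action': 'upsert'})
        failures.extend(r for r in results if not r.get('success'))
    return failures
```

Inside the reindex branch, `rows` could be a generator that turns each Post into the same dictionary used for the single upsert, so the whole table never has to sit in memory at once.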
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

from apps.helpers.utils import client
from .models import Post


@receiver(post_save, sender=Post)
def update_typesense_posts(sender, instance, created, **kwargs):
    # Runs after every create or update of a Post
    if instance:
        try:
            document = {
                'id': str(instance.id),
                'title': str(instance.title),
                # the key must match the schema field name
                'description': str(instance.description)
            }
            client.collections['posts'].documents.upsert(
                document)
        except Exception as e:
            print(e)


@receiver(post_delete, sender=Post)
def delete_typesense_posts(sender, instance, *args, **kwargs):
    try:
        client.collections['posts'].documents[str(instance.id)].delete()
    except Exception:
        # Ignore failures, e.g. the document was never indexed
        pass
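We must also update apps.py so that the signal handlers are loaded:
from django.apps import AppConfig

class PostsConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'apps.posts'

    def ready(self):
        # Importing the module registers the signal handlers at startup
        from . import signals  # noqa: F401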
Now whenever a post is inserted or updated, the Typesense collection is updated too. On deletion, the signal removes the document from the Typesense collection using its unique ID.
So how do you use the search engine API? Either call Typesense’s own API directly or go through a Django view, such as:
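The reindex command and the post_save signal both build the same dictionary, so one option is to move that into a shared helper. The sketch below is an illustration, not part of the original code: `post_to_document` is a hypothetical name, and a `SimpleNamespace` stands in for a Post so the snippet runs without Django. It also replaces a missing title with an empty string, because the model allows null titles and `str(None)` would index the literal text "None".

```python
from types import SimpleNamespace


def post_to_document(post):
    """Build the Typesense document for a Post-like object.

    A missing title becomes '' so the literal text 'None' is never indexed.
    """
    return {
        'id': str(post.id),
        'title': post.title or '',
        'description': post.description or '',
    }


# Stand-in for a Post instance, so the sketch runs without Django.
example = SimpleNamespace(id=7, title=None, description='Hello Typesense')
print(post_to_document(example))
```

Both call sites could then pass the result straight to `documents.upsert(...)`, which keeps the field names in line with the schema.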
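from rest_framework.response import Response
from rest_framework.views import APIView

from apps.helpers.utils import client
from .models import Post
from .serializers import PostSerializer  # assumed location of the serializer


class PostSearch(APIView):
    def get(self, request, format=None):
        search = request.GET.get('search', None)
        # Empty result when no search term is given
        queryset = Post.objects.none()
        if search is not None:
            search_parameters = {
                'q': search,
                'query_by': 'title,description',
                'include_fields': 'id',
                'per_page': 250,
                'page': 1
            }
            res = client.collections['posts'].documents.search(search_parameters)
            # Keep only the matching IDs, then load the rows from the database
            newlist = [x['document']['id'] for x in res['hits']]
            queryset = Post.objects.filter(id__in=newlist).order_by('-id')
        serializer = PostSerializer(queryset, many=True)
        return Response(serializer.data)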
This is a simple approach: we only fetch the IDs from Typesense and build a list to look up the objects, so Django’s pagination can still be applied to the queryset. There are several other ways to use Typesense’s search feature.
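One such variation: the view orders results by newest ID, which discards the relevance order Typesense returned. If relevance matters more, the fetched rows could be put back in hit order. This is a minimal sketch under assumptions: `ranked_ids` and `order_by_rank` are hypothetical helpers, the response dictionary is hand-written and trimmed to the keys the view reads, and `SimpleNamespace` objects stand in for Post rows.

```python
from types import SimpleNamespace


def ranked_ids(response):
    """Extract document ids from a search response, best match first."""
    return [hit['document']['id'] for hit in response.get('hits', [])]


def order_by_rank(objects, ids):
    """Reorder fetched objects to follow the order of `ids`.

    Ids are compared as strings because Typesense stores them as strings.
    Objects whose id is not in `ids` are dropped.
    """
    by_id = {str(obj.id): obj for obj in objects}
    return [by_id[i] for i in ids if i in by_id]


# Hand-written response, trimmed to the keys the view reads (illustration only).
response = {'hits': [{'document': {'id': '12'}}, {'document': {'id': '3'}}]}
# Stand-ins for Post rows as the database might return them (in id order).
rows = [SimpleNamespace(id=3), SimpleNamespace(id=12)]
ordered = order_by_rank(rows, ranked_ids(response))
print([obj.id for obj in ordered])  # [12, 3]
```

The trade-off is that the result is a plain list and no longer a queryset, so queryset-based pagination would need to be applied differently.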