Why should you bother reading this article?
While designing a blog app, we have two choices:
- store the blog content inside the database
- store the blog content outside the database
This article benchmarks both approaches to see which one is more efficient at scale, when the blog content gets large: 64 KB, 1 MB, or even more.
The project setup
Let’s get started with a simple Go application:
mkdir blog-app
cd blog-app
go mod init dalakoti07/sd/blog
go mod tidy
You can inspect the created tables from psql with:
\dt
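The code below connects to a database named playground_db, which must exist first. Assuming a local Postgres instance, you can create it from psql:

CREATE DATABASE playground_db;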
The benchmark code
package main
import (
"bytes"
"database/sql" "fmt" "log" "math/rand" "time"
_ "github.com/lib/pq"
)
const (
// Database connection string
connStr = "user=postgres dbname=playground_db sslmode=disable"
)
const (
// Size of each long article's content in bytes
fileSize = 1 * 1024 * 1024 // 1 MB
)
// Function to generate filler text of the specified size
// (the charset repeated, so not truly random)
func generateRandomText(size int) string {
var buffer bytes.Buffer
chars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
for buffer.Len() < size {
buffer.WriteString(chars)
}
return buffer.String()[:size]
}
// Function to create 1000 blog articles in SQL rows
func createArticlesInRows(db *sql.DB) {
// Generate fileSize bytes of filler text
text := generateRandomText(fileSize)
for i := 1; i <= 1000; i++ {
isLongArticle := false
content := "This is the content of the article."
if i%5 == 0 {
content = text
isLongArticle = true
}
_, err := db.Exec(
fmt.Sprintf("INSERT INTO %s (title, content, longArticle) VALUES ($1, $2, $3)", tableName),
fmt.Sprintf("Article %d", i),
content,
isLongArticle,
)
if err != nil {
log.Fatal(err)
}
}
}
// Function to create 1000 blog articles with file paths
func createArticlesWithFiles(db *sql.DB) {
for i := 1; i <= 1000; i++ {
filePath := fmt.Sprintf("/path/to/articles/article_%d.txt", i)
_, err := db.Exec(
fmt.Sprintf("INSERT INTO %s (title, file_path) VALUES ($1, $2)", tableName),
fmt.Sprintf("Article %d", i), filePath,
)
if err != nil {
log.Fatal(err)
}
}
}
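// Note: this benchmark only stores and scans the path strings; it never
// writes or reads the files themselves, which is part of why it is so
// fast. In a real app each body would be written once on publish and
// read back on every view. A hypothetical helper for the write side
// (assumed name; requires "os" in the imports):
func writeArticleFile(path string, content string) error {
	// Persist the article body on disk; the DB row then only stores
	// the path. Reading it back is os.ReadFile(path).
	return os.WriteFile(path, []byte(content), 0o644)
}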
// Function to benchmark querying all articles
func benchmarkQuery(db *sql.DB, useFiles bool) time.Duration {
start := time.Now()
var rows *sql.Rows
var err error
if useFiles {
rows, err = db.Query(fmt.Sprintf("SELECT title,file_path FROM %s", tableName))
} else {
rows, err = db.Query(fmt.Sprintf("SELECT title,content FROM %s", tableName))
}
if err != nil {
log.Fatal(err)
}
defer rows.Close()
for rows.Next() {
var title string
var value string
// The second column is file_path in one mode and content in the other,
// but both scan into a string, so a single Scan call handles both.
err = rows.Scan(&title, &value)
if err != nil {
log.Fatal(err)
}
}
return time.Since(start)
}
// Both approaches drop and recreate this table, so a single name is reused.
var tableName = "blog_inplace_content"
func main() {
rand.Seed(time.Now().UnixNano())
db, err := sql.Open("postgres", connStr)
if err != nil {
log.Fatal(err)
}
defer db.Close()
_, err = db.Exec(fmt.Sprintf("DROP TABLE IF EXISTS %s", tableName))
if err != nil {
log.Fatal(err)
}
approachOne(db)
// Drop and recreate table for file paths approach
_, err = db.Exec(fmt.Sprintf("DROP TABLE IF EXISTS %s", tableName))
if err != nil {
log.Fatal(err)
}
approachTwo(db)
}
func approachTwo(db *sql.DB) {
_, err := db.Exec(
fmt.Sprintf(
"CREATE TABLE %s (id SERIAL PRIMARY KEY, title VARCHAR(255) NOT NULL, file_path VARCHAR(255), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)",
tableName,
),
)
if err != nil {
log.Fatal(err)
}
fmt.Println("Inserting articles with file paths...")
createArticlesWithFiles(db)
durationFiles := benchmarkQuery(db, true)
fmt.Printf("Querying with file paths took: %v\n", durationFiles)
}
func approachOne(db *sql.DB) {
_, err := db.Exec(fmt.Sprintf("create table %s (", tableName) +
"id Serial PRIMARY KEY," +
"title VARCHAR(255) NOT NULL," +
"content TEXT NOT NULL," +
"longArticle BOOLEAN DEFAULT FALSE," +
"created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);",
)
if err != nil {
log.Fatal(err)
}
// Insert articles and benchmark
fmt.Println("Inserting articles with content in rows...")
createArticlesInRows(db)
durationRows := benchmarkQuery(db, false)
fmt.Printf("Querying with content in rows took: %v\n", durationRows)
}
When storing 3 MB of content per long article, storing a file path instead is roughly 2,000 times faster: milliseconds versus microseconds.
Inserting articles with content in rows...
Querying with content in rows took: 475.581541ms
Inserting articles with file paths...
Querying with file paths took: 243.041µs
Let’s do the same for 1 MB of data:
Inserting articles with content in rows...
Querying with content in rows took: 135.909292ms
Inserting articles with file paths...
Querying with file paths took: 232.833µs
Let’s do this for 100 KB of data. Assuming 5 characters per word, up to 4 bytes per character in UTF-8, and 5,000 words per article, that comes to 4 × 5 × 5,000 = 100 KB.
Inserting articles with file paths...
Querying with file paths took: 409.083µs
Inserting articles with content in rows...
Querying with content in rows took: 13.894291ms
That is still a significant gap. Let’s try one more thing: use a flag in the table marking whether an article is long, then query only the non-long articles and see what happens. A sketch of that query follows below.
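The listing above doesn't include this variant; here is a minimal sketch of the filtered benchmark, assuming the longArticle column from approach one (the function name is mine):

// benchmarkShortQuery times a query that filters out long articles,
// so the multi-megabyte content rows are never returned to the client.
func benchmarkShortQuery(db *sql.DB) time.Duration {
	start := time.Now()
	rows, err := db.Query(fmt.Sprintf(
		"SELECT title, content FROM %s WHERE longArticle = FALSE", tableName))
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var title, content string
		if err := rows.Scan(&title, &content); err != nil {
			log.Fatal(err)
		}
	}
	return time.Since(start)
}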
Inserting articles with file paths...
Querying with file paths took: 615.166µs
Inserting articles with content in rows...
Querying with content in rows took: 886.625µs
Let’s talk about how databases handle queries and data retrieval. Modern databases use something called predicate pushdown, a query optimization technique that filters data at the data source, reducing the amount of data transmitted and processed.
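You can watch Postgres apply the predicate at the scan itself by running EXPLAIN from the same code. A sketch, reusing db and tableName from above (the function name is mine); the printed plan shows the longArticle predicate as a Filter on the scan node:

// explainShortQuery prints Postgres's plan for the flag-filtered query.
func explainShortQuery(db *sql.DB) {
	rows, err := db.Query(fmt.Sprintf(
		"EXPLAIN SELECT title, content FROM %s WHERE longArticle = FALSE", tableName))
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var planLine string
		if err := rows.Scan(&planLine); err != nil {
			log.Fatal(err)
		}
		fmt.Println(planLine)
	}
}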
Conclusion
So dumping all the blog content into the database slows down SELECT *-style queries; keeping the content out of the database row is the better choice, maybe in a text file or a separate MongoDB store.
Next Step
Benchmark search queries with the two approaches:
- blog content in the DB
- blog content in MongoDB