
Tokenizer.ts overview

The Tokenizer module provides tokenization and text truncation capabilities for text processing workflows that target large language models.

It offers services for converting text into tokens and for truncating prompts to a token limit, which is essential for staying within a model's context length constraints.

Example

import { Tokenizer, Prompt } from "@effect/ai"
import { Effect } from "effect"

const tokenizeText = Effect.gen(function* () {
  const tokenizer = yield* Tokenizer.Tokenizer
  const tokens = yield* tokenizer.tokenize("Hello, world!")
  console.log(`Token count: ${tokens.length}`)
  return tokens
})

Example

import { Tokenizer, Prompt } from "@effect/ai"
import { Effect } from "effect"

// Truncate a prompt to fit within token limits
const truncatePrompt = Effect.gen(function* () {
  const tokenizer = yield* Tokenizer.Tokenizer
  const longPrompt = "This is a very long prompt..."
  const truncated = yield* tokenizer.truncate(longPrompt, 100)
  return truncated
})
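
Both programs above only describe their dependency on the Tokenizer service; to actually run them, an implementation must be supplied. A minimal sketch, assuming a naive whitespace tokenizer built with Tokenizer.make (see Constructors below) and provided with Effect.provideService; the names naiveTokenizer and program are illustrative only:

import { Tokenizer } from "@effect/ai"
import { Effect } from "effect"

// Toy implementation: one token index per whitespace-separated word
const naiveTokenizer = Tokenizer.make({
  tokenize: (prompt) =>
    Effect.succeed(
      prompt.content
        .flatMap((message) =>
          typeof message.content === "string"
            ? message.content.split(" ")
            : message.content.flatMap((part) => (part.type === "text" ? part.text.split(" ") : []))
        )
        .map((_, index) => index)
    )
})

const program = Effect.gen(function* () {
  const tokenizer = yield* Tokenizer.Tokenizer
  return yield* tokenizer.tokenize("Hello, world!")
})

// Satisfy the Tokenizer requirement, then run the program
Effect.runPromise(program.pipe(Effect.provideService(Tokenizer.Tokenizer, naiveTokenizer)))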

Since v1.0.0


Exports Grouped by Category


Constructors

make

Creates a Tokenizer service implementation from tokenization options.

This function constructs a complete Tokenizer service from a user-supplied tokenization function. The resulting service handles both tokenization and truncation using that tokenizer.

Example

import { Tokenizer, Prompt } from "@effect/ai"
import { Effect } from "effect"

// Simple word-based tokenizer
const wordTokenizer = Tokenizer.make({
  tokenize: (prompt) =>
    Effect.succeed(
      prompt.content
        .flatMap((msg) =>
          typeof msg.content === "string"
            ? msg.content.split(" ")
            : msg.content.flatMap((part) => (part.type === "text" ? part.text.split(" ") : []))
        )
        .map((_, index) => index)
    )
})
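
Because make derives truncation from the supplied tokenize function, the resulting Service can be exposed wherever the Tokenizer tag is required. A minimal sketch, assuming the wordTokenizer defined above and Layer.succeed from effect; WordTokenizerLayer is an illustrative name:

import { Tokenizer } from "@effect/ai"
import { Effect, Layer } from "effect"

// Assumed: the wordTokenizer constructed in the example above
declare const wordTokenizer: Tokenizer.Service

// Expose the implementation as a Layer for the Tokenizer tag
const WordTokenizerLayer = Layer.succeed(Tokenizer.Tokenizer, wordTokenizer)

const program = Effect.gen(function* () {
  const tokenizer = yield* Tokenizer.Tokenizer
  // Keep at most 50 tokens of the incoming prompt
  return yield* tokenizer.truncate("A potentially very long prompt...", 50)
})

Effect.runPromise(program.pipe(Effect.provide(WordTokenizerLayer)))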

Signature

declare const make: (options: {
  readonly tokenize: (content: Prompt.Prompt) => Effect.Effect<Array<number>, AiError.AiError>
}) => Service

Source

Since v1.0.0

Context

Tokenizer (class)

The Tokenizer service tag for dependency injection.

This tag provides access to tokenization functionality throughout your application, enabling token counting and prompt truncation.

Example

import { Tokenizer } from "@effect/ai"
import { Effect } from "effect"

const useTokenizer = Effect.gen(function* () {
  const tokenizer = yield* Tokenizer.Tokenizer
  const tokens = yield* tokenizer.tokenize("Hello, world!")
  return tokens.length
})
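
Since the tag only declares the dependency, it is straightforward to satisfy it with a stub in tests. A minimal sketch, assuming a fixed-output stub; stubTokenizer is an illustrative name:

import { Tokenizer, Prompt } from "@effect/ai"
import { Effect } from "effect"

// Hypothetical stub that always reports three tokens
const stubTokenizer: Tokenizer.Service = {
  tokenize: () => Effect.succeed([0, 1, 2]),
  truncate: (input) => Effect.succeed(Prompt.make(input.toString()))
}

const useTokenizer = Effect.gen(function* () {
  const tokenizer = yield* Tokenizer.Tokenizer
  const tokens = yield* tokenizer.tokenize("Hello, world!")
  return tokens.length
})

// Resolves to 3 once the stub is provided
Effect.runPromise(useTokenizer.pipe(Effect.provideService(Tokenizer.Tokenizer, stubTokenizer)))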

Signature

declare class Tokenizer

Source

Since v1.0.0

Models

Service (interface)

Tokenizer service interface providing text tokenization and truncation operations.

This interface defines the core operations for converting text to tokens and for keeping content within the token limits required by AI models.

Example

import { Tokenizer, Prompt } from "@effect/ai"
import { Effect } from "effect"

const customTokenizer: Tokenizer.Service = {
  tokenize: (input) =>
    Effect.succeed(
      input
        .toString()
        .split(" ")
        .map((_, i) => i)
    ),
  truncate: (input, maxTokens) => Effect.succeed(Prompt.make(input.toString().slice(0, maxTokens * 5)))
}
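
Because Service is a plain interface, an implementation can also be exercised directly, without going through the Tokenizer tag. A minimal sketch, assuming the customTokenizer defined above; program is an illustrative name:

import { Tokenizer } from "@effect/ai"
import { Effect } from "effect"

// Assumed: the customTokenizer defined in the example above
declare const customTokenizer: Tokenizer.Service

const program = Effect.gen(function* () {
  // Count tokens, then cap the same input at 10 tokens
  const tokens = yield* customTokenizer.tokenize("Some example input text")
  const capped = yield* customTokenizer.truncate("Some example input text", 10)
  return { tokenCount: tokens.length, capped }
})

Effect.runPromise(program).then(console.log)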

Signature

export interface Service {
  /**
   * Converts text input into an array of token numbers.
   */
  readonly tokenize: (
    /**
     * The text input to tokenize.
     */
    input: Prompt.RawInput
  ) => Effect.Effect<Array<number>, AiError.AiError>
  /**
   * Truncates text input to fit within the specified token limit.
   */
  readonly truncate: (
    /**
     * The text input to truncate.
     */
    input: Prompt.RawInput,
    /**
     * Maximum number of tokens to retain.
     */
    tokens: number
  ) => Effect.Effect<Prompt.Prompt, AiError.AiError>
}

Source

Since v1.0.0