Tokenizer.ts overview
The Tokenizer module provides tokenization and text truncation services for large language model workflows: converting text into tokens and truncating prompts to fit within token limits, which is essential for managing context length constraints.
Example
import { Tokenizer } from "@effect/ai"
import { Effect } from "effect"
const tokenizeText = Effect.gen(function* () {
const tokenizer = yield* Tokenizer.Tokenizer
const tokens = yield* tokenizer.tokenize("Hello, world!")
console.log(`Token count: ${tokens.length}`)
return tokens
})
Example
import { Tokenizer } from "@effect/ai"
import { Effect } from "effect"
// Truncate a prompt to fit within token limits
const truncatePrompt = Effect.gen(function* () {
const tokenizer = yield* Tokenizer.Tokenizer
const longPrompt = "This is a very long prompt..."
const truncated = yield* tokenizer.truncate(longPrompt, 100)
return truncated
})
Since v1.0.0
Exports Grouped by Category
Constructors
make
Creates a Tokenizer service implementation from a tokenization function. The resulting service handles both tokenization and truncation using the tokenizer you provide.
Example
import { Tokenizer } from "@effect/ai"
import { Effect } from "effect"
// Simple word-based tokenizer
const wordTokenizer = Tokenizer.make({
tokenize: (prompt) =>
Effect.succeed(
prompt.content
.flatMap((msg) =>
typeof msg.content === "string"
? msg.content.split(" ")
: msg.content.flatMap((part) => (part.type === "text" ? part.text.split(" ") : []))
)
.map((_, index) => index)
)
})
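The truncation half of the service is derived from the tokenize function you supply: conceptually, content is dropped until what remains fits in the token budget. The following dependency-free sketch illustrates that idea only; it is not the library's actual algorithm, and `countTokens`/`truncateMessages` are hypothetical names. The token count here is whitespace word count, mirroring the word tokenizer above.

```typescript
// Hypothetical sketch of token-budget truncation; illustrative only,
// not @effect/ai's implementation. A "message" is modeled as a string.
const countTokens = (message: string): number =>
  message.split(" ").filter((w) => w.length > 0).length

// Keep messages from newest (end) to oldest until the budget is spent,
// so the most recent context survives truncation.
const truncateMessages = (
  messages: ReadonlyArray<string>,
  maxTokens: number
): Array<string> => {
  const kept: Array<string> = []
  let used = 0
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = countTokens(messages[i])
    if (used + cost > maxTokens) break
    kept.unshift(messages[i])
    used += cost
  }
  return kept
}
```

Keeping the newest messages first is a common policy for chat prompts, since recent turns usually matter most; other policies (e.g. always preserving a system message) are equally valid.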
Signature
declare const make: (options: {
readonly tokenize: (content: Prompt.Prompt) => Effect.Effect<Array<number>, AiError.AiError>
}) => Service
Since v1.0.0
Context
Tokenizer (class)
The Tokenizer service tag for dependency injection.
This tag provides access to tokenization functionality throughout your application, enabling token counting and prompt truncation capabilities.
Example
import { Tokenizer } from "@effect/ai"
import { Effect } from "effect"
const useTokenizer = Effect.gen(function* () {
const tokenizer = yield* Tokenizer.Tokenizer
const tokens = yield* tokenizer.tokenize("Hello, world!")
return tokens.length
})
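Conceptually, a service tag is a typed key into a context of implementations: resolving `yield* Tokenizer.Tokenizer` looks up whatever implementation was provided under that key. The following is a simplified, dependency-free sketch of that lookup mechanism; the names and the plain `Map` are illustrative and not effect's internals.

```typescript
// Illustrative only: a tag as a typed lookup key into a service map.
interface TokenizerService {
  readonly tokenize: (input: string) => Array<number>
}

const TokenizerTag = Symbol.for("Tokenizer")

// A minimal "context": a map from tag symbols to implementations.
const context = new Map<symbol, unknown>()
const impl: TokenizerService = {
  tokenize: (input: string) => input.split(" ").map((_, i) => i)
}
context.set(TokenizerTag, impl)

// Resolving the tag retrieves the registered implementation.
const tokenizer = context.get(TokenizerTag) as TokenizerService
const tokens = tokenizer.tokenize("Hello, world!")
```

In real code the implementation is supplied via effect's service-provision combinators rather than a hand-rolled map; the sketch only shows why the same tag can yield different implementations in different environments (e.g. a real tokenizer in production, a word splitter in tests).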
Signature
declare class Tokenizer
Since v1.0.0
Models
Service (interface)
Tokenizer service interface providing text tokenization and truncation operations.
This interface defines the core operations for converting text to tokens and managing content length within token limits for AI model compatibility.
Example
import { Tokenizer, Prompt } from "@effect/ai"
import { Effect } from "effect"
const customTokenizer: Tokenizer.Service = {
tokenize: (input) =>
Effect.succeed(
input
.toString()
.split(" ")
.map((_, i) => i)
),
truncate: (input, maxTokens) => Effect.succeed(Prompt.make(input.toString().slice(0, maxTokens * 5)))
}
Signature
export interface Service {
/**
* Converts text input into an array of token numbers.
*/
readonly tokenize: (
/**
* The text input to tokenize.
*/
input: Prompt.RawInput
) => Effect.Effect<Array<number>, AiError.AiError>
/**
* Truncates text input to fit within the specified token limit.
*/
readonly truncate: (
/**
* The text input to truncate.
*/
input: Prompt.RawInput,
/**
* Maximum number of tokens to retain.
*/
tokens: number
) => Effect.Effect<Prompt.Prompt, AiError.AiError>
}
Since v1.0.0
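A common use of this interface is a pre-flight check: tokenize a prompt, compare the count against a model's context window, and truncate only when needed. The dependency-free sketch below shows that decision; the 4-characters-per-token heuristic is a rough assumption for English text (echoing the `maxTokens * 5` estimate in the example above), and `estimateTokens`/`fitToBudget` are hypothetical names, not part of the API.

```typescript
// Rough heuristic: ~4 characters per token, a common approximation
// for English text. Illustrative only, not a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4)

// Pre-flight check: return the prompt unchanged if it fits the budget,
// otherwise trim it to roughly maxTokens worth of characters.
const fitToBudget = (text: string, maxTokens: number): string =>
  estimateTokens(text) <= maxTokens ? text : text.slice(0, maxTokens * 4)
```

A production `Service` implementation would delegate both operations to the model provider's real tokenizer, since character heuristics can be badly wrong for code, non-English text, or unusual punctuation.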