A curated collection of resources for product managers building with large language models, spanning model platforms, orchestration frameworks, vector databases, evaluation tools, and safety guidelines.
For Product Managers: This collection focuses on practical resources for product decisions—understanding capabilities, evaluating tradeoffs, ensuring safety, and shipping responsibly.
Last updated: January 2025
GPT-4, GPT-4 Turbo, and GPT-3.5 APIs with function calling, vision, and embeddings. Industry-leading capabilities.
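A minimal sketch of a function-calling request with the OpenAI Python SDK (openai >= 1.0). The model name, the example message, and the get_order_status tool schema are illustrative assumptions, not part of this collection.

```python
# Sketch: ask the model to call a structured function instead of replying in free text.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical function for this sketch
        "description": "Look up the status of a customer order",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo",  # illustrative model choice
    messages=[{"role": "user", "content": "Where is order 8123?"}],
    tools=tools,
)

# If the model chose to call the function, the structured arguments are here.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```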
Claude 3 family (Opus, Sonnet, Haiku) with 200K context windows, strong reasoning, and safety features.
Gemini Pro and Ultra models with multimodal capabilities, long context, and Google ecosystem integration.
Open-weight and API models optimized for efficiency. Strong performance at lower costs.
Enterprise-focused LLMs with strong RAG capabilities, embeddings, and reranking for production use.
Unified API for 100+ LLMs with automatic fallbacks, load balancing, and cost optimization.
TypeScript-first SDK for Next.js and React. Streaming, tool calling, and edge runtime support.
Comprehensive framework for LLM orchestration, chains, agents, and memory management.
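Assuming this entry refers to LangChain, here is a minimal sketch of a LangChain Expression Language (LCEL) chain, with a prompt piped into a model and an output parser. The model name and prompt wording are illustrative assumptions.

```python
# Sketch: prompt -> model -> string parser, composed with the | operator.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template(
    "Summarize this customer feedback in one sentence:\n\n{feedback}"
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
chain = prompt | llm | StrOutputParser()

summary = chain.invoke({"feedback": "The export button is hidden and the CSV is malformed."})
print(summary)
```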
Build stateful, multi-actor LLM applications with cycles and persistence. For complex workflows.
Data framework for LLM applications. Excellent for RAG, document processing, and knowledge bases.
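Assuming this entry refers to LlamaIndex, a minimal RAG starter looks roughly like the sketch below. The "data" folder and the query text are placeholders.

```python
# Sketch: load local documents, build a vector index, and query it.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()   # load local docs (PDF, md, txt, ...)
index = VectorStoreIndex.from_documents(documents)      # chunk, embed, and index them
query_engine = index.as_query_engine()                  # retrieval + answer synthesis

response = query_engine.query("What did customers ask for most in Q3?")
print(response)
```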
Programming model for LLM pipelines with automatic optimization. Replaces prompt engineering with compilation.
Microsoft framework for multi-agent conversations and collaborative AI systems.
Managed vector database with high performance, hybrid search, and metadata filtering.
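As one illustration of querying a managed vector database with metadata filtering, here is a sketch using the Pinecone Python client. The index name, vector dimension, filter field, and placeholder embedding are all assumptions.

```python
# Sketch: vector query plus a structured metadata filter against a hosted index.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("product-docs")        # hypothetical pre-created index

results = index.query(
    vector=[0.1] * 1536,                # placeholder embedding; use a real one in practice
    top_k=5,
    filter={"doc_type": {"$eq": "faq"}},
    include_metadata=True,
)
for match in results.matches:
    print(match.id, match.score)
```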
Open-source vector database with GraphQL API, hybrid search, and modular architecture.
High-performance vector database for billion-scale similarity search. Open-source and cloud options.
PostgreSQL extension for vector similarity search. Simple integration with existing databases.
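A sketch of nearest-neighbor search inside Postgres, assuming the pgvector extension and its Python bindings (the pgvector package) with psycopg 3. The table name, vector dimension, and embeddings are placeholders for illustration.

```python
# Sketch: store embeddings in a regular Postgres table and order by distance.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=app", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # teaches psycopg how to send/receive the vector type

conn.execute(
    "CREATE TABLE IF NOT EXISTS docs (id bigserial PRIMARY KEY, body text, embedding vector(3))"
)
conn.execute(
    "INSERT INTO docs (body, embedding) VALUES (%s, %s)",
    ("refund policy", np.array([0.1, 0.9, 0.0])),
)

# <-> is pgvector's L2-distance operator; nearest rows come first.
rows = conn.execute(
    "SELECT body FROM docs ORDER BY embedding <-> %s LIMIT 5",
    (np.array([0.2, 0.8, 0.1]),),
).fetchall()
print(rows)
```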
LangChain platform for tracing, debugging, testing, and monitoring LLM applications.
Open-source tool for testing and evaluating LLM outputs. Compare prompts, models, and configurations.
Open-source observability platform for LLMs. Trace, evaluate, and troubleshoot AI applications.
Framework and registry for evaluating LLM performance. Includes benchmarks and best practices.
Holistic Evaluation of Language Models. Stanford benchmark covering 42+ scenarios.
Constitutional AI principles, safety best practices, and privacy guidelines for Claude.
Safety guidelines, moderation API, and policies for responsible AI deployment.
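A minimal sketch of screening user input with the OpenAI Moderation API before it reaches a generation call. The model name and example input are illustrative assumptions.

```python
# Sketch: flag unsafe input up front rather than filtering the model's output later.
from openai import OpenAI

client = OpenAI()
result = client.moderations.create(
    model="omni-moderation-latest",
    input="User-submitted text to screen before it reaches the LLM.",
)

flagged = result.results[0].flagged
if flagged:
    print("Blocked by moderation:", result.results[0].categories)
```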
Principles and tools for building fair, accountable, and transparent AI systems.
I'm always interested in discussing AI product strategy and implementation, so get in touch.