WebSocket API Documentation Best Practices: AsyncAPI Guide
Comprehensive guide to documenting WebSocket endpoints using AsyncAPI standards. Learn best practices for versioning, streaming data, error handling, and structuring documentation.
What are the best practices for documenting WebSocket endpoints using OpenAPI or AsyncAPI standards? Specifically, how should I:
- Describe the connection process, server configurations, and port specifications?
- Implement endpoint versioning for WebSocket APIs?
- Document streaming input (audio data) and output (text transcription) in chunks?
- Define error codes and success responses?
- Structure documentation to separate connection establishment from in-connection operations?
I’m particularly concerned about:
- How to properly version WebSocket endpoints
- Best practices for describing streaming data flows
- Whether different documentation approaches are needed for connection establishment versus in-connection operations
- Appropriate endpoint path naming conventions for WebSocket APIs
The use case involves creating a new WebSocket endpoint in an existing HTTP service that will:
- Accept connections from a frontend
- Receive audio streams
- Forward audio to a gRPC transcription service
- Return text transcription in chunks
- Handle normal and error-based connection closures
AsyncAPI is the definitive standard for documenting WebSocket endpoints, providing the necessary structure for duplex, message-based communication that OpenAPI cannot adequately support. When documenting your audio transcription WebSocket, you’ll need to focus on a three-tier documentation structure with proper versioning, a binary payload format for audio streams, and separate handling of connection establishment versus in-connection operations.
Contents
- AsyncAPI vs OpenAPI for WebSocket Documentation
- Documenting Connection Establishment and Server Configuration
- WebSocket Endpoint Versioning Best Practices
- Documenting Streaming Audio Input and Text Output
- Error Codes and Connection Closure Documentation
- Structuring Documentation: Connection vs In-Connection Operations
- Complete AsyncAPI Example for Audio Transcription WebSocket
- Path Naming Conventions for WebSocket APIs
- Sources
- Conclusion
AsyncAPI vs OpenAPI for WebSocket Documentation
When considering documentation standards for WebSocket endpoints, it’s crucial to understand that OpenAPI, while widely used for REST APIs, has significant limitations for event-driven, real-time communication. As the AsyncAPI documentation team states, “OpenAPI specification won’t help you much here” when dealing with WebSocket APIs. This fundamental difference stems from their architectural foundations—OpenAPI is designed around the request-response paradigm, while AsyncAPI embraces event-driven communication patterns.
OpenAPI 3.0 technically allows WebSocket URLs (ws:// and wss://) in the servers array, but it lacks the ability to model the continuous, message-based nature of WebSocket communication. AWS’s Project Development Kit (PDK) documentation acknowledges this limitation, noting that “TypeSpec and Smithy are the recommended model languages for WebSocket APIs” rather than OpenAPI.
AsyncAPI, by contrast, was specifically created for asynchronous APIs like WebSockets, MQTT, and Kafka. It provides the necessary structure to document connection establishment, message flows, and streaming data patterns. For your audio transcription WebSocket, AsyncAPI offers the tools to document the duplex communication where you receive binary audio streams and return structured text transcription chunks.
The key difference lies in their approach to documentation:
- OpenAPI: Models HTTP requests and responses
- AsyncAPI: Models channels, operations, and messages over persistent connections
This distinction makes AsyncAPI the clear choice for documenting your WebSocket endpoint that handles continuous audio streaming and transcription results.
Documenting Connection Establishment and Server Configuration
For WebSocket APIs, the connection process is fundamentally different from HTTP connections—it’s a persistent, stateful channel rather than stateless request-response cycles. In AsyncAPI, you document the connection establishment in the servers section, specifying the WebSocket endpoint details including protocol, host, port, and any authentication requirements.
Server Configuration Basics
Your AsyncAPI document begins with the servers section, where you define the WebSocket endpoints:
servers:
production:
url: wss://api.example.com/v1/ws/transcribe
protocol: wss
description: Production WebSocket server for audio transcription
variables:
region:
description: AWS region
default: us-east-1
This configuration specifies:
- The WebSocket secure protocol (wss://)
- The full endpoint URL, including the version
- Optional variables for environment-specific configurations
Authentication Configuration
For production WebSocket endpoints, authentication is typically handled during the handshake phase. AsyncAPI allows you to document various authentication methods:
servers:
production:
# ... server configuration as above
security:
- ApiKey: []
- OAuth2: [transcribe:read]
components:
securitySchemes:
ApiKey:
type: apiKey
in: header
name: X-API-Key
OAuth2:
type: oauth2
flows:
implicit:
authorizationUrl: https://api.example.com/oauth/authorize
scopes:
transcribe:read: Access to transcription service
Channel Bindings for Handshake
The connection establishment itself is documented using channel bindings. In AsyncAPI, channels represent the communication path, and bindings specify how that channel operates at the protocol level. (Strictly speaking, the ws channel binding defines headers and query as JSON Schema objects describing allowed values; the literal key-value pairs in the examples below are a simplified illustration.)
channels:
/transcribe:
description: Audio transcription WebSocket endpoint
servers: [production]
bindings:
ws:
method: GET
headers:
Cache-Control: no-cache
Upgrade: websocket
Connection: Upgrade
This documentation ensures that developers understand not just where to connect, but how the WebSocket handshake should occur, including required headers and the upgrade process from HTTP to WebSocket.
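To make the upgrade handshake concrete, here is a short Python sketch of the Sec-WebSocket-Accept computation that RFC 6455 requires a server to perform during this handshake (standard library only; the helper name is our own):

```python
import base64
import hashlib

# RFC 6455: the server proves it understood the WebSocket handshake by
# hashing the client's Sec-WebSocket-Key together with a fixed GUID.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for an upgrade response."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Test vector from RFC 6455 section 1.3
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Documenting this exchange (or at least linking to RFC 6455) in your connection section saves client implementers from debugging failed upgrades.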
WebSocket Endpoint Versioning Best Practices
Versioning WebSocket APIs presents unique challenges compared to REST APIs due to the persistent nature of the connections. Based on industry best practice, URI-based versioning is the most widely recommended approach for WebSocket endpoints.
URI Versioning Approach
The most widely adopted pattern is to include the version directly in the WebSocket path:
wss://api.example.com/v1/ws/transcribe
wss://api.example.com/v2/ws/transcribe
This approach provides several advantages:
- Clear version identification in connection URLs
- Easy routing at the load balancer or proxy level
- Familiar pattern for developers experienced with REST API versioning
- Support for parallel deployment of multiple versions
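Routing on the version segment is simple to implement at a proxy or gateway; here is a minimal Python sketch (the helper name and the exact URL layout are assumptions matching the pattern above):

```python
import re
from urllib.parse import urlparse

def extract_ws_version(url: str) -> str:
    """Return the version segment (e.g. 'v1') from a versioned WebSocket URL.

    Assumes the URI-versioning layout wss://host/<version>/ws/<endpoint>.
    """
    path = urlparse(url).path            # e.g. '/v1/ws/transcribe'
    match = re.match(r"^/(v\d+)/", path)
    if not match:
        raise ValueError(f"no version segment in {url!r}")
    return match.group(1)

print(extract_ws_version("wss://api.example.com/v1/ws/transcribe"))  # v1
print(extract_ws_version("wss://api.example.com/v2/ws/transcribe"))  # v2
```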
In your AsyncAPI document, you would specify the versioned server URL:
servers:
v1:
url: wss://api.example.com/v1/ws/transcribe
protocol: wss
description: WebSocket API version 1 for audio transcription
v2:
url: wss://api.example.com/v2/ws/transcribe
protocol: wss
description: WebSocket API version 2 for audio transcription with improved features
Version Information in AsyncAPI
Additionally, you should include version information in the AsyncAPI info section:
info:
title: Audio Transcription WebSocket API
version: 1.0.0
description: |
WebSocket API for real-time audio transcription.
## Version History
- 1.0.0: Initial release with basic transcription features
- 1.1.0: Added support for custom vocabulary (planned)
This dual approach—URI versioning combined with semantic versioning in the documentation—provides both runtime versioning and clear documentation of changes.
Header-Based Versioning Alternative
While URI versioning is preferred, header-based versioning is sometimes used for WebSocket APIs:
wss://api.example.com/ws/transcribe
Headers:
X-API-Version: 1
This approach allows the same WebSocket endpoint to handle multiple versions based on headers, but it’s less common and can complicate routing at the infrastructure level. If you choose this approach, document it clearly in your AsyncAPI specification:
servers:
production:
url: wss://api.example.com/ws/transcribe
protocol: wss
variables:
version:
enum:
- '1'
- '2'
default: '1'
description: API version to use
Regardless of your chosen approach, consistency is key. Stick to one versioning strategy across all your WebSocket endpoints and document it clearly in your API documentation.
Documenting Streaming Audio Input and Text Output
One of the most challenging aspects of documenting WebSocket APIs is effectively representing streaming data flows. For your audio transcription use case, this involves documenting both incoming binary audio data and outgoing structured text transcription chunks.
Binary Audio Format Documentation
Audio streaming requires special handling in AsyncAPI because it involves binary data rather than JSON. You should document the audio input using the binary format:
channels:
/transcribe:
publish:
summary: Send audio data for transcription
description: |
Send binary audio data in chunks. Audio should be encoded in PCM format with 16kHz sample rate, 16-bit depth, mono channel.
message:
$ref: '#/components/messages/AudioData'
subscribe:
summary: Receive transcription results
description: |
Receive text transcription results in chunks. Each message contains partial or final transcriptions with timestamps.
message:
$ref: '#/components/messages/TranscriptionResult'
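The PCM parameters in the publish description fix the byte size of each audio chunk, which is worth stating explicitly in your docs. A quick sanity check with a hypothetical helper:

```python
def pcm_chunk_bytes(duration_ms: int, sample_rate: int = 16_000,
                    bit_depth: int = 16, channels: int = 1) -> int:
    """Bytes per audio chunk for raw PCM with the stated format."""
    samples = sample_rate * duration_ms // 1000
    return samples * (bit_depth // 8) * channels

# A 100 ms chunk of 16 kHz / 16-bit / mono PCM:
print(pcm_chunk_bytes(100))  # 3200 bytes
```

Stating the expected chunk size (here 3200 bytes per 100 ms) lets clients validate their encoder configuration before streaming.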
Structured Text Output Documentation
For the text transcription output, you’ll want structured JSON that includes not just the text but also metadata like timestamps and completion status:
components:
messages:
TranscriptionResult:
name: transcription_result
title: Transcription Result
summary: Transcription text with metadata
contentType: application/json
payload:
type: object
properties:
text:
type: string
description: Transcribed text
example: "Hello, this is a test of the transcription service."
startTime:
type: number
format: float
description: Start time in seconds from audio beginning
example: 1.23
endTime:
type: number
format: float
description: End time in seconds from audio beginning
example: 2.45
isFinal:
type: boolean
description: Whether this transcription is final or partial
example: false
alternatives:
type: array
items:
type: object
properties:
text:
type: string
description: Alternative transcription
example: "Hello, this is a test of the speech recognition."
confidence:
type: number
format: float
description: Confidence score between 0 and 1
example: 0.95
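A client consuming this schema will typically validate incoming frames before using them. A hedged Python sketch (the field checks are illustrative, not an exhaustive schema validation):

```python
import json

# Hypothetical incoming frame matching the TranscriptionResult schema above.
frame = json.dumps({
    "text": "Hello, this is a test of the transcription service.",
    "startTime": 1.23,
    "endTime": 2.45,
    "isFinal": False,
    "alternatives": [
        {"text": "Hello, this is a test of the speech recognition.",
         "confidence": 0.95},
    ],
})

def parse_transcription(raw: str) -> dict:
    """Decode a TranscriptionResult frame, checking the fields clients rely on."""
    msg = json.loads(raw)
    for field, kind in (("text", str), ("isFinal", bool)):
        if not isinstance(msg.get(field), kind):
            raise ValueError(f"bad or missing field: {field}")
    return msg

result = parse_transcription(frame)
print(result["isFinal"], result["startTime"])  # False 1.23
```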
Handling Multiple Message Types
In real-world scenarios, your WebSocket might need to handle different types of messages over the same channel. AsyncAPI supports this by letting an operation’s message field list the alternatives with the oneOf construct:
components:
messages:
WebSocketMessage:
oneOf:
- $ref: '#/components/messages/AudioData'
- $ref: '#/components/messages/TranscriptionResult'
- $ref: '#/components/messages/ErrorMessage'
- $ref: '#/components/messages/ControlMessage'
Message Correlation
For maintaining context across streaming messages, consider documenting correlation identifiers:
components:
  messages:
    TranscriptionResult:
      # ... existing message definition
      payload:
        type: object
        properties:
          # ... existing properties
          sessionId:
            type: string
            format: uuid
            description: Unique session identifier for the audio stream
            example: "123e4567-e89b-12d3-a456-426614174000"
          sequenceNumber:
            type: integer
            description: Sequence number for ordering messages within a session
            example: 42
These additional fields help developers track and correlate messages across the streaming session, which is particularly important for real-time applications like audio transcription.
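Client-side, those correlation fields let messages that arrive interleaved across sessions be reassembled in order. A minimal sketch (the sample messages are hypothetical):

```python
from collections import defaultdict

def group_in_order(messages):
    """Group messages by sessionId and sort each group by sequenceNumber."""
    sessions = defaultdict(list)
    for msg in messages:
        sessions[msg["sessionId"]].append(msg)
    return {sid: sorted(msgs, key=lambda m: m["sequenceNumber"])
            for sid, msgs in sessions.items()}

messages = [
    {"sessionId": "a", "sequenceNumber": 2, "text": "world"},
    {"sessionId": "a", "sequenceNumber": 1, "text": "hello"},
    {"sessionId": "b", "sequenceNumber": 1, "text": "other"},
]
ordered = group_in_order(messages)
print([m["text"] for m in ordered["a"]])  # ['hello', 'world']
```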
Error Codes and Connection Closure Documentation
Proper error handling documentation is crucial for WebSocket APIs, as it covers both protocol-level connection closures and application-level error messages. For your audio transcription service, you’ll need to document both standard WebSocket close codes and custom application error codes.
Standard WebSocket Close Codes
WebSocket has a set of standard close codes defined in RFC 6455. You should document these in your API specification:
| Code | Name | Description |
|---|---|---|
| 1000 | Normal Closure | The connection was closed normally. |
| 1001 | Going Away | The endpoint is going away, either because a server is being shut down or a browser is navigating away from the page. |
| 1002 | Protocol Error | An endpoint is terminating the connection due to a protocol error. |
| 1003 | Unsupported Data | An endpoint received a data type it doesn’t support. |
| 1005 | No Status Received | Reserved value reported when a close frame carried no status code; never sent on the wire. |
| 1006 | Abnormal Closure | Reserved value reported when the connection closed without a close frame being sent or received; never sent on the wire. |
| 1012 | Service Restart | The server is restarting. |
Custom Application Error Codes
Beyond standard WebSocket codes, your application should define custom close codes in the 4000-4999 range, which RFC 6455 reserves for private (application-specific) use. For your transcription service, consider these custom codes:
| Code | Name | Description |
|---|---|---|
| 4000 | Invalid Authentication | Authentication failed or credentials are invalid. |
| 4001 | Invalid Session Configuration | The session configuration is invalid or missing required parameters. |
| 4002 | Invalid Model | Specified transcription model is invalid or not available. |
| 4003 | Unsupported Audio Format | The audio format is not supported. |
| 4004 | Audio Processing Error | Error occurred while processing the audio. |
| 4005 | Insufficient Quota | API quota exceeded for the current period. |
| 4029 | Rate Limited | Too many requests in a short period. |
| 4500 | Internal Server Error | An unexpected error occurred on the server. |
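A documented code table also lets clients implement a deterministic reconnect policy. The sketch below is a hedged example; the grouping of codes into transient versus fatal is our own illustration, not part of any specification:

```python
# Transient failures: reconnecting (possibly after a backoff) may succeed.
RETRYABLE = {1001, 1006, 1012, 4029, 4500}
# Fatal failures: the request itself must be fixed before retrying.
FATAL = {1002, 1003, 4000, 4001, 4002, 4003}

def should_reconnect(close_code: int) -> bool:
    """Decide whether a client should attempt to reconnect after a close."""
    if close_code == 1000:      # normal closure: the session is simply done
        return False
    if close_code in FATAL:
        return False
    return close_code in RETRYABLE

print(should_reconnect(1000))  # False
print(should_reconnect(4029))  # True
print(should_reconnect(4000))  # False
```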
Error Message Format
For application-level errors (not just connection closure), define a structured error message format:
components:
messages:
ErrorMessage:
name: error_message
title: Error Message
summary: Structured error information
contentType: application/json
payload:
type: object
required:
- code
- message
properties:
code:
type: integer
description: Error code from the defined error code table
example: 4000
message:
type: string
description: Human-readable error description
example: "Authentication failed. Please check your API key."
details:
type: object
description: Additional error context
properties:
sessionId:
type: string
format: uuid
description: Session ID if available
example: "123e4567-e89b-12d3-a456-426614174000"
retryAfter:
type: integer
description: Seconds to wait before retrying (for rate limiting)
example: 60
timestamp:
type: string
format: date-time
description: When the error occurred
example: "2023-01-01T12:34:56Z"
Connection Closure Documentation
In your AsyncAPI document, describe how connection closure should be handled. Note that publish and subscribe are the only operation keywords AsyncAPI 2.x defines on a channel, so the close key below is a documentation convention (effectively a specification extension), not part of the standard; an alternative is to model the close message as one of the publish messages via oneOf:
channels:
/transcribe:
# ... existing channel definition
close:
summary: Close the transcription session
description: |
Close the WebSocket connection gracefully. If `sendFinalTranscription` is true,
the server will send any remaining transcription data before closing.
message:
$ref: '#/components/messages/CloseMessage'
components:
messages:
CloseMessage:
name: close_message
title: Close Message
summary: Message to initiate connection closure
contentType: application/json
payload:
type: object
properties:
sendFinalTranscription:
type: boolean
default: true
description: Whether to send final transcription before closing
reason:
type: string
description: Reason for closing the connection
example: "User initiated disconnection"
This comprehensive error documentation ensures that both client and server developers understand the full error lifecycle, from connection issues to application-specific error handling.
Structuring Documentation: Connection vs In-Connection Operations
One of the most powerful aspects of AsyncAPI is its ability to clearly separate different aspects of WebSocket communication. For your audio transcription service, this means distinguishing between connection establishment (the handshake) and ongoing in-connection operations (sending audio, receiving transcriptions).
Three-Tier Documentation Structure
AsyncAPI provides a three-tier binding structure that perfectly addresses your concern about separating documentation:
- Channel Bindings - Document the connection establishment and handshake
- Operation Bindings - Document the in-connection operations
- Message Bindings - Document the payload format for each message
Channel Bindings for Connection Establishment
Channel bindings document how the connection itself is established. This is where you specify the WebSocket handshake process:
channels:
/transcribe:
description: Audio transcription WebSocket endpoint
servers: [production]
bindings:
ws:
method: GET
headers:
Cache-Control: no-cache
Upgrade: websocket
Connection: Upgrade
query:
api-version: "1.0"
description: |
The WebSocket connection is established through an HTTP upgrade request.
The client must provide authentication headers in the initial request.
Operation Bindings for In-Connection Operations
Once the connection is established, you document the operations that can occur over that connection. A caveat: the official AsyncAPI WebSocket binding currently defines fields only at the channel level (method, query, headers), so the operation-level ws fields below (method: binary, encoding) are illustrative conventions rather than spec-defined keys. Also note that a base64 encoding only applies if audio is wrapped in text frames; raw binary frames need no encoding:
channels:
/transcribe:
# ... existing channel definition
publish:
operationId: sendAudio
summary: Send audio data for transcription
description: |
Send binary audio chunks for real-time transcription.
Audio should be in PCM format with 16kHz sample rate.
bindings:
ws:
method: binary
encoding: base64
subscribe:
operationId: receiveTranscription
summary: Receive transcription results
description: |
Receive text transcription results as they become available.
Results may be partial until the final transcription is complete.
bindings:
ws:
method: text
Message Bindings for Payload Schemas
Finally, message bindings define the exact structure of payloads. As with the operation bindings, the ws message-binding fields shown here (type, encoding) are illustrative; the official ws binding defines no message-level fields, so contentType and payload carry the normative information:
components:
messages:
AudioChunk:
name: audio_chunk
title: Audio Chunk
summary: Binary audio data for transcription
contentType: audio/octet-stream
bindings:
ws:
type: request
encoding: binary
payload:
type: string
format: binary
TranscriptionResult:
name: transcription_result
title: Transcription Result
summary: Text transcription with metadata
contentType: application/json
bindings:
ws:
type: response
encoding: text
payload:
type: object
# ... schema properties as defined earlier
Complete Separation Example
Putting it all together, here’s how you would document the separation between connection and operations:
channels:
/transcribe:
# Connection documentation
bindings:
ws:
method: GET
headers:
Authorization: Bearer {token}
description: Establish WebSocket connection for audio transcription
# Operation documentation
publish:
bindings:
ws:
method: binary
description: Send audio chunks for transcription
message:
$ref: '#/components/messages/AudioChunk'
subscribe:
bindings:
ws:
method: text
description: Receive transcription results
message:
$ref: '#/components/messages/TranscriptionResult'
components:
messages:
AudioChunk:
bindings:
ws:
type: request
# ... payload definition
TranscriptionResult:
bindings:
ws:
type: response
# ... payload definition
This clear separation makes it easy for developers to understand both how to establish the connection and what operations are available once connected. It also allows you to generate different types of documentation—connection guides and operation references—from the same AsyncAPI specification.
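The in-connection flow documented above can be sketched in runnable form. This is a hedged simulation: asyncio queues stand in for the real WebSocket (so no server is needed), the fake server is a placeholder for the gRPC transcription backend, and all names are our own:

```python
import asyncio

async def send_audio(outbound: asyncio.Queue, chunks):
    """Publish side: stream audio chunks, then signal end of audio."""
    for chunk in chunks:
        await outbound.put(chunk)          # a binary frame in a real client
    await outbound.put(None)

async def fake_server(outbound: asyncio.Queue, inbound: asyncio.Queue):
    """Stand-in transcriber: emits one partial result per audio chunk."""
    seq = 0
    while (chunk := await outbound.get()) is not None:
        seq += 1
        await inbound.put({"text": f"chunk-{seq}", "isFinal": False,
                           "sequenceNumber": seq})
    await inbound.put({"text": "", "isFinal": True, "sequenceNumber": seq + 1})

async def receive_transcripts(inbound: asyncio.Queue):
    """Subscribe side: collect partial results until the final message."""
    parts = []
    while not (msg := await inbound.get())["isFinal"]:
        parts.append(msg["text"])
    return parts

async def main():
    outbound, inbound = asyncio.Queue(), asyncio.Queue()
    _, _, parts = await asyncio.gather(
        send_audio(outbound, [b"\x00" * 3200] * 3),
        fake_server(outbound, inbound),
        receive_transcripts(inbound),
    )
    return parts

print(asyncio.run(main()))  # ['chunk-1', 'chunk-2', 'chunk-3']
```

The same two-task structure (one sending, one receiving, running concurrently) is how most real clients consume a duplex transcription channel.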
Complete AsyncAPI Example for Audio Transcription WebSocket
Here’s a complete AsyncAPI specification for your audio transcription WebSocket endpoint, incorporating all the best practices discussed. Note that the publish/subscribe channel syntax and the server url field used throughout this guide belong to AsyncAPI 2.x (AsyncAPI 3.0 moves operations into a top-level operations section and splits url into host and pathname), so the document declares version 2.6.0:
asyncapi: 2.6.0
info:
title: Audio Transcription WebSocket API
version: 1.0.0
description: |
WebSocket API for real-time audio transcription.
## Features
- Real-time streaming of audio for transcription
- Partial and final transcription results
- Support for multiple audio formats
- Error handling with structured messages
## Version History
- 1.0.0: Initial release with basic transcription features
contact:
name: API Support
url: https://api.example.com/support
email: support@example.com
servers:
production:
url: wss://api.example.com/v1/ws/transcribe
protocol: wss
description: Production WebSocket server for audio transcription
variables:
region:
description: AWS region
default: us-east-1
security:
- ApiKey: []
- OAuth2: [transcribe:read]
channels:
/transcribe:
description: Audio transcription WebSocket endpoint
servers: [production]
bindings:
ws:
method: GET
headers:
Cache-Control: no-cache
Upgrade: websocket
Connection: Upgrade
query:
api-version: "1.0"
description: |
The WebSocket connection is established through an HTTP upgrade request.
The client must provide authentication headers in the initial request.
publish:
operationId: sendAudio
summary: Send audio data for transcription
description: |
Send binary audio chunks for real-time transcription.
Audio should be in PCM format with 16kHz sample rate, 16-bit depth, mono channel.
bindings:
ws:
method: binary
encoding: base64
message:
$ref: '#/components/messages/AudioData'
subscribe:
operationId: receiveTranscription
summary: Receive transcription results
description: |
Receive text transcription results as they become available.
Results may be partial until the final transcription is complete.
bindings:
ws:
method: text
message:
$ref: '#/components/messages/TranscriptionResult'
close:
summary: Close the transcription session
description: |
Close the WebSocket connection gracefully. If `sendFinalTranscription` is true,
the server will send any remaining transcription data before closing.
message:
$ref: '#/components/messages/CloseMessage'
components:
messages:
AudioData:
name: audio_data
title: Audio Data
summary: Binary audio chunk for transcription
contentType: audio/octet-stream
bindings:
ws:
type: request
encoding: binary
payload:
type: string
format: binary
description: |
Audio chunk in PCM format:
- Sample rate: 16kHz
- Bit depth: 16-bit
- Channels: 1 (mono)
- Chunk duration: 100ms recommended
TranscriptionResult:
name: transcription_result
title: Transcription Result
summary: Transcription text with metadata
contentType: application/json
bindings:
ws:
type: response
encoding: text
payload:
type: object
properties:
text:
type: string
description: Transcribed text
example: "Hello, this is a test of the transcription service."
startTime:
type: number
format: float
description: Start time in seconds from audio beginning
example: 1.23
endTime:
type: number
format: float
description: End time in seconds from audio beginning
example: 2.45
isFinal:
type: boolean
description: Whether this transcription is final or partial
example: false
sessionId:
type: string
format: uuid
description: Unique session identifier for the audio stream
example: "123e4567-e89b-12d3-a456-426614174000"
sequenceNumber:
type: integer
description: Sequence number for ordering messages within a session
example: 42
alternatives:
type: array
items:
type: object
properties:
text:
type: string
description: Alternative transcription
example: "Hello, this is a test of the speech recognition."
confidence:
type: number
format: float
description: Confidence score between 0 and 1
example: 0.95
CloseMessage:
name: close_message
title: Close Message
summary: Message to initiate connection closure
contentType: application/json
bindings:
ws:
type: control
payload:
type: object
properties:
sendFinalTranscription:
type: boolean
default: true
description: Whether to send final transcription before closing
reason:
type: string
description: Reason for closing the connection
example: "User initiated disconnection"
ErrorMessage:
name: error_message
title: Error Message
summary: Structured error information
contentType: application/json
bindings:
ws:
type: error
payload:
type: object
required:
- code
- message
properties:
code:
type: integer
description: Error code from the defined error code table
example: 4000
message:
type: string
description: Human-readable error description
example: "Authentication failed. Please check your API key."
details:
type: object
description: Additional error context
properties:
sessionId:
type: string
format: uuid
description: Session ID if available
example: "123e4567-e89b-12d3-a456-426614174000"
retryAfter:
type: integer
description: Seconds to wait before retrying (for rate limiting)
example: 60
timestamp:
type: string
format: date-time
description: When the error occurred
example: "2023-01-01T12:34:56Z"
securitySchemes:
ApiKey:
type: apiKey
in: header
name: X-API-Key
OAuth2:
type: oauth2
flows:
implicit:
authorizationUrl: https://api.example.com/oauth/authorize
scopes:
transcribe:read: Access to transcription service
This comprehensive AsyncAPI specification documents:
- Connection establishment process with authentication
- Binary audio stream format and requirements
- Structured transcription results with metadata
- Error handling with custom error codes
- Graceful connection closure
- Version information and security requirements
You can use this specification with AsyncAPI tools to generate interactive documentation, client SDKs, and server stubs, ensuring consistency across your implementation.
Path Naming Conventions for WebSocket APIs
Choosing appropriate path naming conventions for WebSocket endpoints is an important aspect of API design that affects both discoverability and consistency across your API surface. Based on community consensus and industry practices, here are the recommended approaches.
WebSocket Path Prefix
The most widely adopted convention is to use a /ws/ prefix for WebSocket endpoints to distinguish them from REST endpoints:
HTTP REST API: https://api.example.com/v1/users
WebSocket API: wss://api.example.com/v1/ws/transcribe
This clear separation makes it easy for developers to identify which endpoints are WebSocket connections without examining the protocol or documentation. The /ws/ prefix is intuitive and consistent across many organizations.
Semantic Naming
Beyond the prefix, use semantic naming that clearly describes the functionality:
wss://api.example.com/v1/ws/audio-transcribe
wss://api.example.com/v1/ws/real-time-chat
wss://api.example.com/v1/ws/notification-push
These names clearly indicate both the protocol type (WebSocket) and the specific functionality.
Versioning in the Path
As discussed earlier, include the API version in the path:
wss://api.example.com/v1/ws/transcribe
wss://api.example.com/v2/ws/transcribe
Place the version after the domain but before the semantic endpoint name for consistency with your REST API versioning strategy.
Consistent Base Paths
Maintain consistent base paths across all your WebSocket endpoints:
wss://api.example.com/v1/ws/audio-transcribe
wss://api.example.com/v1/ws/speech-recognition
wss://api.example.com/v1/ws/text-to-speech
This consistency makes it easier for developers to understand your API structure and navigate between different endpoints.
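A small helper can enforce this convention when endpoints are registered; a hypothetical sketch (the function name and kebab-case rule are our own assumptions):

```python
def ws_endpoint(domain: str, version: str, name: str) -> str:
    """Build a WebSocket URL following wss://<domain>/<version>/ws/<name>."""
    if not name.replace("-", "").isalnum():
        raise ValueError("endpoint names should be kebab-case")
    return f"wss://{domain}/{version}/ws/{name}"

print(ws_endpoint("api.example.com", "v1", "audio-transcribe"))
# wss://api.example.com/v1/ws/audio-transcribe
```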
Alternative Naming Patterns
While /ws/ is the most common prefix, some organizations use different conventions:
| Pattern | Example | When to Use |
|---|---|---|
| /api/ws/ | wss://api.example.com/api/v1/ws/transcribe | When your base path already includes /api/ |
| /realtime/ | wss://api.example.com/v1/realtime/transcribe | When emphasizing the real-time nature |
| /stream/ | wss://api.example.com/v1/stream/transcribe | When emphasizing the streaming nature |
Choose a pattern that aligns with your existing API naming conventions and clearly communicates the WebSocket nature of the endpoint.
Full Example
Here’s how your complete WebSocket endpoint URL would look with recommended naming conventions:
wss://api.example.com/v1/ws/audio-transcribe
Breaking this down:
- wss:// - Secure WebSocket protocol
- api.example.com - Your domain
- /v1/ - API version
- /ws/ - WebSocket endpoint prefix
- audio-transcribe - Semantic functionality description
This structure provides a clear, consistent, and discoverable URL pattern for your WebSocket API.
Sources
- AsyncAPI Blog Part 1 — Why OpenAPI specification won’t help with WebSocket APIs: https://asyncapi.com/blog/websocket-api-documentation-openapi-asyncapi-pt1/
- AsyncAPI Blog Part 2 — WebSocket API documentation best practices: https://asyncapi.com/blog/websocket-api-documentation-openapi-asyncapi-pt2/
- AsyncAPI WebSocket Bindings — Three-tier binding structure for channels, operations, and messages: https://www.asyncapi.com/docs/reference/specification/v3.0.0#channelBindings
- AWS PDK Documentation — TypeSpec and Smithy as recommended model languages for WebSocket APIs: https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-develop-model.html
- Prosa AI STT Documentation — Real-world error codes example for transcription services: https://docs.prosa.ai/errors-and-status-codes/
- Swagger/OpenAPI Docs — WebSocket scheme support in OpenAPI 3.0: https://swagger.io/docs/specification/v3.0/serialization/
- Stack Overflow Naming Convention - /api/path for HTTP, /ws/path for WebSocket convention: https://stackoverflow.com/questions/42782766/whats-the-correct-uri-for-a-websocket-endpoint
- AsyncAPI 3.0.0 Release Notes — Modern specification features including channel as TCP connection: https://www.asyncapi.com/blog/asyncapi-3-0-0-is-out/
- Bump.sh Comparison — AsyncAPI vs OpenAPI for different API paradigms: https://bump.sh/blog/openapi-vs-asyncapi
Conclusion
Documenting WebSocket APIs effectively requires a different approach than traditional REST APIs, with AsyncAPI emerging as the definitive standard for this purpose. Based on best practices and industry consensus, the five key recommendations for your audio transcription WebSocket are:
1. Use AsyncAPI, not OpenAPI - AsyncAPI provides the structure needed to document duplex, message-based communication that OpenAPI cannot adequately model.
2. Implement URI versioning - Follow the pattern wss://api.example.com/v1/ws/transcribe to clearly indicate API versions and enable proper routing.
3. Document streaming data flows explicitly - Use binary payloads for audio input and structured JSON for text output, with clear documentation of format requirements.
4. Define comprehensive error handling - Document both standard WebSocket close codes and custom application codes in the 4000-4999 range, along with a structured error message format.
5. Separate connection from operations - Use AsyncAPI's three-tier binding structure to distinguish connection establishment (channel bindings) from in-connection operations (operation bindings).
By following these practices, you’ll create clear, comprehensive documentation that enables developers to effectively implement and integrate with your WebSocket API. The AsyncAPI specification not only serves as documentation but can also be used to generate client SDKs, server stubs, and interactive documentation, ensuring consistency across your implementation.