
WebSocket API Documentation Best Practices: AsyncAPI Guide

Comprehensive guide to documenting WebSocket endpoints using AsyncAPI standards. Learn best practices for versioning, streaming data, error handling, and structuring documentation.


What are the best practices for documenting WebSocket endpoints using OpenAPI or AsyncAPI standards? Specifically, how should I:

  1. Describe the connection process, server configurations, and port specifications?
  2. Implement endpoint versioning for WebSocket APIs?
  3. Document streaming input (audio data) and output (text transcription) in chunks?
  4. Define error codes and success responses?
  5. Structure documentation to separate connection establishment from in-connection operations?

I’m particularly concerned about:

  • How to properly version WebSocket endpoints
  • Best practices for describing streaming data flows
  • Whether different documentation approaches are needed for connection establishment versus in-connection operations
  • Appropriate endpoint path naming conventions for WebSocket APIs

The use case involves creating a new WebSocket endpoint in an existing HTTP service that will:

  • Accept connections from a frontend
  • Receive audio streams
  • Forward audio to a gRPC transcription service
  • Return text transcription in chunks
  • Handle normal and error-based connection closures

AsyncAPI is the definitive standard for documenting WebSocket endpoints, providing the necessary structure for duplex, message-based communication that OpenAPI cannot adequately support. When documenting your audio transcription WebSocket, you’ll need a three-tier documentation structure with proper versioning, a binary format for audio streams, and separate handling of connection establishment versus in-connection operations.


AsyncAPI vs OpenAPI for WebSocket Documentation

When considering documentation standards for WebSocket endpoints, it’s crucial to understand that OpenAPI, while widely used for REST APIs, has significant limitations for event-driven, real-time communication. As the AsyncAPI documentation team states, “OpenAPI specification won’t help you much here” when dealing with WebSocket APIs. This fundamental difference stems from their architectural foundations—OpenAPI is designed around the request-response paradigm, while AsyncAPI embraces event-driven communication patterns.

OpenAPI 3.0 does support WebSocket schemes (ws:// and wss://) in the servers array, but it lacks the ability to properly model the continuous, message-based nature of WebSocket communication. AWS’s Project Development Kit (PDK) documentation acknowledges this limitation, noting that “TypeSpec and Smithy are the recommended model languages for WebSocket APIs” rather than OpenAPI.

AsyncAPI, by contrast, was specifically created for asynchronous APIs like WebSockets, MQTT, and Kafka. It provides the necessary structure to document connection establishment, message flows, and streaming data patterns. For your audio transcription WebSocket, AsyncAPI offers the tools to document the duplex communication where you receive binary audio streams and return structured text transcription chunks.

The key difference lies in their approach to documentation:

  • OpenAPI: Models HTTP requests and responses
  • AsyncAPI: Models channels, operations, and messages over persistent connections

This distinction makes AsyncAPI the clear choice for documenting your WebSocket endpoint that handles continuous audio streaming and transcription results.


Documenting Connection Establishment and Server Configuration

For WebSocket APIs, the connection process is fundamentally different from HTTP connections—it’s a persistent, stateful channel rather than stateless request-response cycles. In AsyncAPI, you document the connection establishment in the servers section, specifying the WebSocket endpoint details including protocol, host, port, and any authentication requirements.

Server Configuration Basics

Your AsyncAPI document begins with the servers section, where you define the WebSocket endpoints:

```yaml
servers:
  production:
    url: wss://api.example.com/v1/ws/transcribe
    protocol: wss
    description: Production WebSocket server for audio transcription
    variables:
      region:
        description: AWS region
        default: us-east-1
```

This configuration specifies:

  • The WebSocket secure protocol (wss://)
  • The full endpoint URL including version
  • Optional variables for environment-specific configurations

Authentication Configuration

For production WebSocket endpoints, authentication is typically handled during the handshake phase. AsyncAPI allows you to document various authentication methods:

```yaml
servers:
  production:
    # ... server configuration as above
    security:
      - ApiKey: []
      - OAuth2: [transcribe:read]

components:
  securitySchemes:
    ApiKey:
      type: httpApiKey
      in: header
      name: X-API-Key
    OAuth2:
      type: oauth2
      flows:
        implicit:
          authorizationUrl: https://api.example.com/oauth/authorize
          scopes:
            transcribe:read: Access to transcription service
```

Channel Bindings for Handshake

The connection establishment itself is documented using channel bindings. In AsyncAPI, channels represent the communication path, and bindings specify how that channel operates at the protocol level:

```yaml
channels:
  /transcribe:
    description: Audio transcription WebSocket endpoint
    servers: [production]
    bindings:
      ws:
        method: GET
        headers:
          Cache-Control: no-cache
          Upgrade: websocket
          Connection: Upgrade
```

This documentation ensures that developers understand not just where to connect, but how the WebSocket handshake should occur, including required headers and the upgrade process from HTTP to WebSocket.
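To make the handshake concrete: during the upgrade, the server derives the `Sec-WebSocket-Accept` response header from the client's `Sec-WebSocket-Key` by hashing it with a fixed GUID defined in RFC 6455. A minimal Python sketch of that computation:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the WebSocket handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value the server must
    return for a given client Sec-WebSocket-Key header."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# The example key/accept pair from RFC 6455, section 1.3:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

If the client receives any other value in the `Sec-WebSocket-Accept` header, it must fail the connection rather than proceed.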


WebSocket Endpoint Versioning Best Practices

Versioning WebSocket APIs presents unique challenges compared to REST APIs due to the persistent nature of the connections. Based on industry best practices and the AsyncAPI specification, URI-based versioning is the recommended approach for WebSocket endpoints.

URI Versioning Approach

The most widely adopted pattern is to include the version directly in the WebSocket path:

wss://api.example.com/v1/ws/transcribe
wss://api.example.com/v2/ws/transcribe

This approach provides several advantages:

  • Clear version identification in connection URLs
  • Easy routing at the load balancer or proxy level
  • Familiar pattern for developers experienced with REST API versioning
  • Support for parallel deployment of multiple versions

In your AsyncAPI document, you would specify the versioned server URL:

```yaml
servers:
  v1:
    url: wss://api.example.com/v1/ws/transcribe
    protocol: wss
    description: WebSocket API version 1 for audio transcription
  v2:
    url: wss://api.example.com/v2/ws/transcribe
    protocol: wss
    description: WebSocket API version 2 for audio transcription with improved features
```

Version Information in AsyncAPI

Additionally, you should include version information in the AsyncAPI info section:

```yaml
info:
  title: Audio Transcription WebSocket API
  version: 1.0.0
  description: |
    WebSocket API for real-time audio transcription.

    ## Version History
    - 1.0.0: Initial release with basic transcription features
    - 1.1.0: Added support for custom vocabulary (planned)
```

This dual approach—URI versioning combined with semantic versioning in the documentation—provides both runtime versioning and clear documentation of changes.

Header-Based Versioning Alternative

While URI versioning is preferred, header-based versioning is sometimes used for WebSocket APIs:

wss://api.example.com/ws/transcribe
Headers:
 X-API-Version: 1

This approach allows the same WebSocket endpoint to handle multiple versions based on headers, but it’s less common and can complicate routing at the infrastructure level. If you choose this approach, document it clearly in your AsyncAPI specification:

```yaml
servers:
  production:
    url: wss://api.example.com/ws/transcribe
    protocol: wss
    variables:
      version:
        enum:
          - '1'
          - '2'
        default: '1'
        description: API version to use
```

Regardless of your chosen approach, consistency is key. Stick to one versioning strategy across all your WebSocket endpoints and document it clearly in your API documentation.
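As a sketch of what URI versioning buys you at the routing layer, a hypothetical helper can pull the version segment out of an incoming connection path, falling back to a header-based scheme (such as `X-API-Version`) only when no segment is present:

```python
import re
from urllib.parse import urlparse

def extract_version(ws_url: str):
    """Extract a URI version segment (e.g. 'v1') from a WebSocket URL.
    Returns None if the path carries no version, in which case a
    header-based scheme could be consulted instead."""
    path = urlparse(ws_url).path
    match = re.match(r"^/(v\d+)/", path)
    return match.group(1) if match else None

print(extract_version("wss://api.example.com/v1/ws/transcribe"))  # v1
print(extract_version("wss://api.example.com/ws/transcribe"))     # None
```

A load balancer or reverse proxy can apply the same prefix match to route `/v1/` and `/v2/` connections to different backend deployments.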


Documenting Streaming Audio Input and Text Output

One of the most challenging aspects of documenting WebSocket APIs is effectively representing streaming data flows. For your audio transcription use case, this involves documenting both incoming binary audio data and outgoing structured text transcription chunks.

Binary Audio Format Documentation

Audio streaming requires special handling in AsyncAPI because it involves binary data rather than JSON. You should document the audio input using the binary format:

```yaml
channels:
  /transcribe:
    publish:
      summary: Send audio data for transcription
      description: |
        Send binary audio data in chunks. Audio should be encoded in PCM format
        with 16kHz sample rate, 16-bit depth, mono channel.
      message:
        $ref: '#/components/messages/AudioData'
    subscribe:
      summary: Receive transcription results
      description: |
        Receive text transcription results in chunks. Each message contains
        partial or final transcriptions with timestamps.
      message:
        $ref: '#/components/messages/TranscriptionResult'
```
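Given the PCM parameters above (16kHz, 16-bit, mono), a 100 ms chunk works out to 16000 × 2 × 0.1 = 3200 bytes. A hedged sketch of how a client might slice a raw PCM buffer into such chunks before publishing them over the socket:

```python
def pcm_chunks(pcm: bytes, sample_rate: int = 16000,
               sample_width: int = 2, chunk_ms: int = 100):
    """Yield fixed-duration chunks of a raw mono PCM byte buffer.
    The final chunk may be shorter than chunk_ms."""
    chunk_bytes = sample_rate * sample_width * chunk_ms // 1000  # 3200 for the defaults
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]

# One second of silence at 16kHz / 16-bit mono -> ten 3200-byte chunks.
chunks = list(pcm_chunks(b"\x00" * 32000))
print(len(chunks), len(chunks[0]))  # 10 3200
```

Documenting the recommended chunk size alongside the format spares client authors from reverse-engineering it from latency behavior.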

Structured Text Output Documentation

For the text transcription output, you’ll want structured JSON that includes not just the text but also metadata like timestamps and completion status:

```yaml
components:
  messages:
    TranscriptionResult:
      name: transcription_result
      title: Transcription Result
      summary: Transcription text with metadata
      contentType: application/json
      payload:
        type: object
        properties:
          text:
            type: string
            description: Transcribed text
            example: "Hello, this is a test of the transcription service."
          startTime:
            type: number
            format: float
            description: Start time in seconds from audio beginning
            example: 1.23
          endTime:
            type: number
            format: float
            description: End time in seconds from audio beginning
            example: 2.45
          isFinal:
            type: boolean
            description: Whether this transcription is final or partial
            example: false
          alternatives:
            type: array
            items:
              type: object
              properties:
                text:
                  type: string
                  description: Alternative transcription
                  example: "Hello, this is a test of the speech recognition."
                confidence:
                  type: number
                  format: float
                  description: Confidence score between 0 and 1
                  example: 0.95
```

Handling Multiple Message Types

In real-world scenarios, your WebSocket might need to handle different types of messages. AsyncAPI supports this using the oneOf construct:

```yaml
components:
  messages:
    WebSocketMessage:
      oneOf:
        - $ref: '#/components/messages/AudioData'
        - $ref: '#/components/messages/TranscriptionResult'
        - $ref: '#/components/messages/ErrorMessage'
        - $ref: '#/components/messages/ControlMessage'
```
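On the receiving side, a `oneOf` channel usually implies a small client-side dispatcher that inspects each frame and routes it to the right handler. The discriminating fields below are illustrative assumptions, not part of any official schema:

```python
import json

def classify_frame(frame) -> str:
    """Route an incoming WebSocket frame to a message kind.
    Binary frames carry audio; text frames are JSON whose shape
    distinguishes transcriptions, errors, and control messages.
    (The discriminating fields here are assumptions for illustration.)"""
    if isinstance(frame, (bytes, bytearray)):
        return "AudioData"
    msg = json.loads(frame)
    if "code" in msg and "message" in msg:
        return "ErrorMessage"
    if "text" in msg or "isFinal" in msg:
        return "TranscriptionResult"
    return "ControlMessage"

print(classify_frame(b"\x00\x01"))                         # AudioData
print(classify_frame('{"code": 4000, "message": "bad"}'))  # ErrorMessage
print(classify_frame('{"text": "hi", "isFinal": true}'))   # TranscriptionResult
```

If your schemas overlap, an explicit discriminator field (e.g. a `type` property on every JSON message) is more robust than shape-based inspection.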

Message Correlation

For maintaining context across streaming messages, consider documenting correlation identifiers:

```yaml
components:
  messages:
    TranscriptionResult:
      # ... existing message definition
      payload:
        type: object
        properties:
          # ... existing properties
          sessionId:
            type: string
            format: uuid
            description: Unique session identifier for the audio stream
            example: "123e4567-e89b-12d3-a456-426614174000"
          sequenceNumber:
            type: integer
            description: Sequence number for ordering messages within a session
            example: 42
```

These additional fields help developers track and correlate messages across the streaming session, which is particularly important for real-time applications like audio transcription.
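As a sketch of how a client might use these fields, a hypothetical helper can restore per-session ordering from `sessionId` and `sequenceNumber`, regardless of the order in which frames were processed:

```python
def order_results(messages: list) -> list:
    """Reassemble transcription messages per session in send order,
    using the sessionId / sequenceNumber fields described above."""
    by_session = {}
    for msg in messages:
        by_session.setdefault(msg["sessionId"], []).append(msg)
    # Within a session, sequenceNumber defines the authoritative order
    # even if the client handled the frames out of order.
    ordered = []
    for session_msgs in by_session.values():
        ordered.extend(sorted(session_msgs, key=lambda m: m["sequenceNumber"]))
    return ordered

msgs = [
    {"sessionId": "s1", "sequenceNumber": 2, "text": "world", "isFinal": True},
    {"sessionId": "s1", "sequenceNumber": 1, "text": "hello", "isFinal": False},
]
print([m["text"] for m in order_results(msgs)])  # ['hello', 'world']
```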


Error Codes and Connection Closure Documentation

Proper error handling documentation is crucial for WebSocket APIs, as it covers both protocol-level connection closures and application-level error messages. For your audio transcription service, you’ll need to document both standard WebSocket close codes and custom application error codes.

Standard WebSocket Close Codes

WebSocket has a set of standard close codes defined in RFC 6455. You should document these in your API specification:

| Code | Name | Description |
|------|------|-------------|
| 1000 | Normal Closure | The connection was closed normally. |
| 1001 | Going Away | The endpoint is going away, either because a server is being shut down or a browser is navigating away from the page. |
| 1002 | Protocol Error | An endpoint is terminating the connection due to a protocol error. |
| 1003 | Unsupported Data | An endpoint received a data type it doesn’t support. |
| 1005 | No Status Received | A close frame was received without a status code. |
| 1006 | Abnormal Closure | The connection was closed abnormally, without sending or receiving a close frame. |
| 1012 | Service Restart | The server is restarting. |

Custom Application Error Codes

Beyond standard WebSocket codes, your application should define custom error codes in the 4000-4999 range, which RFC 6455 reserves for private application use. For your transcription service, consider these custom codes:

| Code | Name | Description |
|------|------|-------------|
| 4000 | Invalid Authentication | Authentication failed or credentials are invalid. |
| 4001 | Invalid Session Configuration | The session configuration is invalid or missing required parameters. |
| 4002 | Invalid Model | Specified transcription model is invalid or not available. |
| 4003 | Unsupported Audio Format | The audio format is not supported. |
| 4004 | Audio Processing Error | Error occurred while processing the audio. |
| 4005 | Insufficient Quota | API quota exceeded for the current period. |
| 4029 | Rate Limited | Too many requests in a short period. |
| 4500 | Internal Server Error | An unexpected error occurred on the server. |
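A client typically turns these tables into a reconnection policy. The grouping below is an illustrative assumption (your service may classify codes differently):

```python
# Hypothetical client-side policy mapping the close codes above to a
# retry decision; the numeric values match the tables in this section.
RETRYABLE = {1001, 1006, 1012, 4029, 4500}   # transient conditions
FATAL = {4000, 4001, 4002, 4003, 4004, 4005}  # fix the request/config first

def should_reconnect(close_code: int) -> bool:
    """Return True if the client may retry the connection automatically."""
    if close_code == 1000:  # normal closure: nothing to retry
        return False
    if close_code in FATAL:
        return False
    return close_code in RETRYABLE

print(should_reconnect(4029), should_reconnect(4000))  # True False
```

For 4029 specifically, the `retryAfter` field in the error message format below the tables tells the client how long to back off.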

Error Message Format

For application-level errors (not just connection closure), define a structured error message format:

```yaml
components:
  messages:
    ErrorMessage:
      name: error_message
      title: Error Message
      summary: Structured error information
      contentType: application/json
      payload:
        type: object
        required:
          - code
          - message
        properties:
          code:
            type: integer
            description: Error code from the defined error code table
            example: 4000
          message:
            type: string
            description: Human-readable error description
            example: "Authentication failed. Please check your API key."
          details:
            type: object
            description: Additional error context
            properties:
              sessionId:
                type: string
                format: uuid
                description: Session ID if available
                example: "123e4567-e89b-12d3-a456-426614174000"
              retryAfter:
                type: integer
                description: Seconds to wait before retrying (for rate limiting)
                example: 60
          timestamp:
            type: string
            format: date-time
            description: When the error occurred
            example: "2023-01-01T12:34:56Z"
```

Connection Closure Documentation

In your AsyncAPI document, describe how connection closure should be handled. Note that AsyncAPI has no dedicated close operation, so the `close` key below is a documentation convention; graceful shutdown is modeled as an application-level control message:

```yaml
channels:
  /transcribe:
    # ... existing channel definition
    close:
      summary: Close the transcription session
      description: |
        Close the WebSocket connection gracefully. If `sendFinalTranscription` is true,
        the server will send any remaining transcription data before closing.
      message:
        $ref: '#/components/messages/CloseMessage'

components:
  messages:
    CloseMessage:
      name: close_message
      title: Close Message
      summary: Message to initiate connection closure
      contentType: application/json
      payload:
        type: object
        properties:
          sendFinalTranscription:
            type: boolean
            default: true
            description: Whether to send final transcription before closing
          reason:
            type: string
            description: Reason for closing the connection
            example: "User initiated disconnection"
```

This comprehensive error documentation ensures that both client and server developers understand the full error lifecycle, from connection issues to application-specific error handling.


Structuring Documentation: Connection vs In-Connection Operations

One of the most powerful aspects of AsyncAPI is its ability to clearly separate different aspects of WebSocket communication. For your audio transcription service, this means distinguishing between connection establishment (the handshake) and ongoing in-connection operations (sending audio, receiving transcriptions).

Three-Tier Documentation Structure

AsyncAPI provides a three-tier binding structure that perfectly addresses your concern about separating documentation:

  1. Channel Bindings - Document the connection establishment and handshake
  2. Operation Bindings - Document the in-connection operations
  3. Message Bindings - Document the payload format for each message

Channel Bindings for Connection Establishment

Channel bindings document how the connection itself is established. This is where you specify the WebSocket handshake process:

```yaml
channels:
  /transcribe:
    description: |
      Audio transcription WebSocket endpoint.
      The WebSocket connection is established through an HTTP upgrade request.
      The client must provide authentication headers in the initial request.
    servers: [production]
    bindings:
      ws:
        method: GET
        headers:
          Cache-Control: no-cache
          Upgrade: websocket
          Connection: Upgrade
        query:
          api-version: "1.0"
```

Operation Bindings for In-Connection Operations

Once the connection is established, you document the operations that can occur over that connection:

```yaml
channels:
  /transcribe:
    # ... existing channel definition
    publish:
      operationId: sendAudio
      summary: Send audio data for transcription
      description: |
        Send binary audio chunks for real-time transcription.
        Audio should be in PCM format with 16kHz sample rate.
      bindings:
        ws:
          method: binary
          encoding: binary
    subscribe:
      operationId: receiveTranscription
      summary: Receive transcription results
      description: |
        Receive text transcription results as they become available.
        Results may be partial until the final transcription is complete.
      bindings:
        ws:
          method: text
```

Message Bindings for Payload Schemas

Finally, message bindings define the exact structure of payloads:

```yaml
components:
  messages:
    AudioChunk:
      name: audio_chunk
      title: Audio Chunk
      summary: Binary audio data for transcription
      contentType: audio/octet-stream
      bindings:
        ws:
          type: request
          encoding: binary
      payload:
        type: string
        format: binary
    TranscriptionResult:
      name: transcription_result
      title: Transcription Result
      summary: Text transcription with metadata
      contentType: application/json
      bindings:
        ws:
          type: response
          encoding: text
      payload:
        type: object
        # ... schema properties as defined earlier
```

Complete Separation Example

Putting it all together, here’s how you would document the separation between connection and operations:

```yaml
channels:
  /transcribe:
    # Connection documentation
    bindings:
      ws:
        method: GET
        headers:
          Authorization: Bearer {token}
    description: Establish WebSocket connection for audio transcription

    # Operation documentation
    publish:
      bindings:
        ws:
          method: binary
      description: Send audio chunks for transcription
      message:
        $ref: '#/components/messages/AudioChunk'

    subscribe:
      bindings:
        ws:
          method: text
      description: Receive transcription results
      message:
        $ref: '#/components/messages/TranscriptionResult'

components:
  messages:
    AudioChunk:
      bindings:
        ws:
          type: request
      # ... payload definition

    TranscriptionResult:
      bindings:
        ws:
          type: response
      # ... payload definition
```

This clear separation makes it easy for developers to understand both how to establish the connection and what operations are available once connected. It also allows you to generate different types of documentation—connection guides and operation references—from the same AsyncAPI specification.


Complete AsyncAPI Example for Audio Transcription WebSocket

Here’s a complete AsyncAPI specification for your audio transcription WebSocket endpoint, incorporating all the best practices discussed:

```yaml
asyncapi: 2.6.0
info:
  title: Audio Transcription WebSocket API
  version: 1.0.0
  description: |
    WebSocket API for real-time audio transcription.

    ## Features
    - Real-time streaming of audio for transcription
    - Partial and final transcription results
    - Support for multiple audio formats
    - Error handling with structured messages

    ## Version History
    - 1.0.0: Initial release with basic transcription features
  contact:
    name: API Support
    url: https://api.example.com/support
    email: support@example.com

servers:
  production:
    url: wss://api.example.com/v1/ws/transcribe
    protocol: wss
    description: Production WebSocket server for audio transcription
    variables:
      region:
        description: AWS region
        default: us-east-1
    security:
      - ApiKey: []
      - OAuth2: [transcribe:read]

channels:
  /transcribe:
    description: |
      Audio transcription WebSocket endpoint.
      The WebSocket connection is established through an HTTP upgrade request.
      The client must provide authentication headers in the initial request.
    servers: [production]
    bindings:
      ws:
        method: GET
        headers:
          Cache-Control: no-cache
          Upgrade: websocket
          Connection: Upgrade
        query:
          api-version: "1.0"

    publish:
      operationId: sendAudio
      summary: Send audio data for transcription
      description: |
        Send binary audio chunks for real-time transcription.
        Audio should be in PCM format with 16kHz sample rate, 16-bit depth, mono channel.
      bindings:
        ws:
          method: binary
          encoding: binary
      message:
        $ref: '#/components/messages/AudioData'

    subscribe:
      operationId: receiveTranscription
      summary: Receive transcription results
      description: |
        Receive text transcription results as they become available.
        Results may be partial until the final transcription is complete.
      bindings:
        ws:
          method: text
      message:
        $ref: '#/components/messages/TranscriptionResult'

    close:
      summary: Close the transcription session
      description: |
        Close the WebSocket connection gracefully. If `sendFinalTranscription` is true,
        the server will send any remaining transcription data before closing.
      message:
        $ref: '#/components/messages/CloseMessage'

components:
  messages:
    AudioData:
      name: audio_data
      title: Audio Data
      summary: Binary audio chunk for transcription
      contentType: audio/octet-stream
      bindings:
        ws:
          type: request
          encoding: binary
      payload:
        type: string
        format: binary
        description: |
          Audio chunk in PCM format:
          - Sample rate: 16kHz
          - Bit depth: 16-bit
          - Channels: 1 (mono)
          - Chunk duration: 100ms recommended

    TranscriptionResult:
      name: transcription_result
      title: Transcription Result
      summary: Transcription text with metadata
      contentType: application/json
      bindings:
        ws:
          type: response
          encoding: text
      payload:
        type: object
        properties:
          text:
            type: string
            description: Transcribed text
            example: "Hello, this is a test of the transcription service."
          startTime:
            type: number
            format: float
            description: Start time in seconds from audio beginning
            example: 1.23
          endTime:
            type: number
            format: float
            description: End time in seconds from audio beginning
            example: 2.45
          isFinal:
            type: boolean
            description: Whether this transcription is final or partial
            example: false
          sessionId:
            type: string
            format: uuid
            description: Unique session identifier for the audio stream
            example: "123e4567-e89b-12d3-a456-426614174000"
          sequenceNumber:
            type: integer
            description: Sequence number for ordering messages within a session
            example: 42
          alternatives:
            type: array
            items:
              type: object
              properties:
                text:
                  type: string
                  description: Alternative transcription
                  example: "Hello, this is a test of the speech recognition."
                confidence:
                  type: number
                  format: float
                  description: Confidence score between 0 and 1
                  example: 0.95

    CloseMessage:
      name: close_message
      title: Close Message
      summary: Message to initiate connection closure
      contentType: application/json
      bindings:
        ws:
          type: control
      payload:
        type: object
        properties:
          sendFinalTranscription:
            type: boolean
            default: true
            description: Whether to send final transcription before closing
          reason:
            type: string
            description: Reason for closing the connection
            example: "User initiated disconnection"

    ErrorMessage:
      name: error_message
      title: Error Message
      summary: Structured error information
      contentType: application/json
      bindings:
        ws:
          type: error
      payload:
        type: object
        required:
          - code
          - message
        properties:
          code:
            type: integer
            description: Error code from the defined error code table
            example: 4000
          message:
            type: string
            description: Human-readable error description
            example: "Authentication failed. Please check your API key."
          details:
            type: object
            description: Additional error context
            properties:
              sessionId:
                type: string
                format: uuid
                description: Session ID if available
                example: "123e4567-e89b-12d3-a456-426614174000"
              retryAfter:
                type: integer
                description: Seconds to wait before retrying (for rate limiting)
                example: 60
          timestamp:
            type: string
            format: date-time
            description: When the error occurred
            example: "2023-01-01T12:34:56Z"

  securitySchemes:
    ApiKey:
      type: httpApiKey
      in: header
      name: X-API-Key
    OAuth2:
      type: oauth2
      flows:
        implicit:
          authorizationUrl: https://api.example.com/oauth/authorize
          scopes:
            transcribe:read: Access to transcription service
```

This comprehensive AsyncAPI specification documents:

  • Connection establishment process with authentication
  • Binary audio stream format and requirements
  • Structured transcription results with metadata
  • Error handling with custom error codes
  • Graceful connection closure
  • Version information and security requirements

You can use this specification with AsyncAPI tools to generate interactive documentation, client SDKs, and server stubs, ensuring consistency across your implementation.


Path Naming Conventions for WebSocket APIs

Choosing appropriate path naming conventions for WebSocket endpoints is an important aspect of API design that affects both discoverability and consistency across your API surface. Based on community consensus and industry practices, here are the recommended approaches.

WebSocket Path Prefix

The most widely adopted convention is to use a /ws/ prefix for WebSocket endpoints to distinguish them from REST endpoints:

HTTP REST API: https://api.example.com/v1/users
WebSocket API: wss://api.example.com/v1/ws/transcribe

This clear separation makes it easy for developers to identify which endpoints are WebSocket connections without examining the protocol or documentation. The /ws/ prefix is intuitive and consistent across many organizations.

Semantic Naming

Beyond the prefix, use semantic naming that clearly describes the functionality:

wss://api.example.com/v1/ws/audio-transcribe
wss://api.example.com/v1/ws/real-time-chat
wss://api.example.com/v1/ws/notification-push

These names clearly indicate both the protocol type (WebSocket) and the specific functionality.

Versioning in the Path

As discussed earlier, include the API version in the path:

wss://api.example.com/v1/ws/transcribe
wss://api.example.com/v2/ws/transcribe

Place the version after the domain but before the semantic endpoint name for consistency with your REST API versioning strategy.

Consistent Base Paths

Maintain consistent base paths across all your WebSocket endpoints:

wss://api.example.com/v1/ws/audio-transcribe
wss://api.example.com/v1/ws/speech-recognition
wss://api.example.com/v1/ws/text-to-speech

This consistency makes it easier for developers to understand your API structure and navigate between different endpoints.

Alternative Naming Patterns

While /ws/ is the most common prefix, some organizations use different conventions:

| Pattern | Example | When to Use |
|---------|---------|-------------|
| `/api/ws/` | `wss://api.example.com/api/v1/ws/transcribe` | When your base path already includes `/api/` |
| `/realtime/` | `wss://api.example.com/v1/realtime/transcribe` | When emphasizing the real-time nature |
| `/stream/` | `wss://api.example.com/v1/stream/transcribe` | When emphasizing the streaming nature |

Choose a pattern that aligns with your existing API naming conventions and clearly communicates the WebSocket nature of the endpoint.

Full Example

Here’s how your complete WebSocket endpoint URL would look with recommended naming conventions:

wss://api.example.com/v1/ws/audio-transcribe

Breaking this down:

  • wss:// - Secure WebSocket protocol
  • api.example.com - Your domain
  • /v1/ - API version
  • /ws/ - WebSocket endpoint prefix
  • audio-transcribe - Semantic functionality description

This structure provides a clear, consistent, and discoverable URL pattern for your WebSocket API.
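The convention can be captured in a small, hypothetical helper that assembles endpoint URLs from their parts:

```python
def ws_endpoint(domain: str, version: str, name: str,
                prefix: str = "ws", secure: bool = True) -> str:
    """Assemble a WebSocket endpoint URL following the
    scheme://domain/version/prefix/name convention described above."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{domain}/{version}/{prefix}/{name}"

print(ws_endpoint("api.example.com", "v1", "audio-transcribe"))
# wss://api.example.com/v1/ws/audio-transcribe
```

Centralizing URL construction like this keeps client code and documentation from drifting apart as new versions and endpoints are added.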


Sources

  1. AsyncAPI Blog Part 1 — Why OpenAPI specification won’t help with WebSocket APIs: https://asyncapi.com/blog/websocket-api-documentation-openapi-asyncapi-pt1/
  2. AsyncAPI Blog Part 2 — WebSocket API documentation best practices: https://asyncapi.com/blog/websocket-api-documentation-openapi-asyncapi-pt2/
  3. AsyncAPI WebSocket Bindings — Three-tier binding structure for channels, operations, and messages: https://www.asyncapi.com/docs/reference/specification/v3.0.0#channelBindings
  4. AWS PDK Documentation — TypeSpec and Smithy as recommended model languages for WebSocket APIs: https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-develop-model.html
  5. Prosa AI STT Documentation — Real-world error codes example for transcription services: https://docs.prosa.ai/errors-and-status-codes/
  6. Swagger/OpenAPI Docs — WebSocket scheme support in OpenAPI 3.0: https://swagger.io/docs/specification/v3.0/serialization/
  7. Stack Overflow Naming Convention - /api/path for HTTP, /ws/path for WebSocket convention: https://stackoverflow.com/questions/42782766/whats-the-correct-uri-for-a-websocket-endpoint
  8. AsyncAPI 3.0.0 Release Notes — Modern specification features including channel as TCP connection: https://www.asyncapi.com/blog/asyncapi-3-0-0-is-out/
  9. Bump.sh Comparison — AsyncAPI vs OpenAPI for different API paradigms: https://bump.sh/blog/openapi-vs-asyncapi

Conclusion

Documenting WebSocket APIs effectively requires a different approach than traditional REST APIs, with AsyncAPI emerging as the definitive standard for this purpose. Based on best practices and industry consensus, the five key recommendations for your audio transcription WebSocket are:

  1. Use AsyncAPI, not OpenAPI - AsyncAPI provides the necessary structure to document duplex, message-based communication that OpenAPI cannot adequately model.

  2. Implement URI versioning - Follow the pattern wss://api.example.com/v1/ws/transcribe to clearly indicate API versions and enable proper routing.

  3. Document streaming data flows separately - Use AsyncAPI’s binary format for audio input and structured JSON for text output, with clear documentation of format requirements.

  4. Define comprehensive error handling - Document both standard WebSocket close codes and custom application error codes in the 4000-4999 range, along with structured error message formats.

  5. Separate connection from operations - Utilize AsyncAPI’s three-tier binding structure to clearly distinguish between connection establishment (channel bindings) and in-connection operations (operation bindings).

By following these practices, you’ll create clear, comprehensive documentation that enables developers to effectively implement and integrate with your WebSocket API. The AsyncAPI specification not only serves as documentation but can also be used to generate client SDKs, server stubs, and interactive documentation, ensuring consistency across your implementation.
