System Design of an Audio Streaming Service
In this article, we will go over the design of an audio streaming service and the key design considerations for meeting its functional and non-functional requirements.
The system design of an audio streaming app is distinctive because of the particular business needs it must satisfy. Audio streaming typically requires transferring a large amount of data over network channels with limited bandwidth.
A successful audio streaming service must handle millions of active users and thousands of content providers across many geographical locations. Different devices and platforms (smartphones, desktops, smart speakers) may support different audio formats, such as MP3, FLAC, and ALAC.
In this article, we will explore the nuances of designing such a complex system, covering both functional and non-functional requirements.
Functional Requirements
The functional requirements cover the features needed by content creators and by audio users (listeners).
- Content management. The content creator can upload, delete, or update audio in any format. Each audio file should have metadata such as title, author, description, category, and tags.
- Notification. The content creator receives a notification on the success or failure of each upload. Listeners receive notifications when audio they have subscribed to is successfully uploaded.
- Seamless streaming. Users can play, pause, rewind, and download audio in the format of their choice.
- Subscribe. Users should be able to subscribe to audio content of their choice to receive notifications about new or updated content.
- User login. Content creators must be authenticated and authorized to access the system. Users can register with the system to set up their profiles.
- Search. Users should be able to search audio based on certain attributes or metadata.
Note: Live streaming and payment services are out of this article's scope.
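The metadata fields listed under content management map naturally onto a small data structure. The sketch below is a minimal Python illustration, assuming a hypothetical `AudioMetadata` class and an `audio_id` field that the original design does not name; it normalizes tags so that search can match them consistently.

```python
from dataclasses import dataclass, field


@dataclass
class AudioMetadata:
    """Metadata supplied with each upload; fields follow the content management requirement."""
    audio_id: str  # hypothetical identifier, not named in the original design
    title: str
    author: str
    description: str
    category: str
    tags: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Normalize tags (trim, lowercase, de-duplicate, sort) so search matches them consistently.
        self.tags = sorted({t.strip().lower() for t in self.tags if t.strip()})

    def validate(self) -> list[str]:
        """Return validation errors; an empty list means the metadata is acceptable."""
        return [f"{name} must not be empty"
                for name in ("audio_id", "title", "author", "category")
                if not getattr(self, name).strip()]
```

Keeping validation next to the data lets the Audio Data service reject incomplete metadata before anything is written to the store.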
Sequence Diagram of the Functional Use Case
Architectural Design
Let’s define service components to support the functional requirements.
The first functional requirement is to upload audio in any format, such as MP3, FLAC, or ALAC. These audio files can be very large. The audio codec plays a crucial role in efficiently storing, transmitting, and playing this data. There are two main types of codecs:
- Lossless codec – compresses audio without discarding any data, so the original can be fully restored without any loss of quality.
- Lossy codec – discards some data, which reduces file size significantly but can compromise sound quality.
The content provider typically works with lossless formats for recording and editing. This ensures no loss of quality while they edit the audio, apply effects, and master the final product. In simple terms, audio mastering is the final step in the audio production process.
Once the audio has been mastered, it is converted to a lossy format for general distribution. A lossy format reduces the file size substantially, making the audio easier and faster to stream and download.
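To see why the lossy conversion matters, the arithmetic below compares approximate sizes for a hypothetical 3-minute track. Uncompressed CD-quality PCM runs at 44,100 samples/s x 16 bits x 2 channels = 1,411.2 kbps, or roughly 31.75 MB for 180 seconds, while a 320 kbps lossy encoding is about 7.2 MB and a 128 kbps one about 2.88 MB. Lossless codecs such as FLAC or ALAC land somewhere in between, depending on the material. The helper name `size_megabytes` is illustrative.

```python
def size_megabytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate stream size in megabytes (1 MB = 10**6 bytes) at a constant bit rate."""
    bits = bitrate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000


# Uncompressed CD-quality PCM: 44,100 samples/s x 16 bits x 2 channels = 1,411.2 kbps.
PCM_KBPS = 44_100 * 16 * 2 / 1000

duration = 180  # a hypothetical 3-minute track
for label, kbps in [("PCM (uncompressed)", PCM_KBPS), ("lossy 320 kbps", 320), ("lossy 128 kbps", 128)]:
    print(f"{label:>20}: {size_megabytes(kbps, duration):6.2f} MB")
```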
The compressed audio then needs to be transmitted to the listener's device within the available network bandwidth. That bandwidth can vary dynamically: network load changes as users connect, and a user may move from one network zone to another while listening.
To cope with this changing bandwidth, the system can use adaptive bitrate (ABR) streaming. With ABR, the client measures the user's available bandwidth in real time and dynamically switches to the stream encoded at the most suitable bit rate. This results in less buffering, faster start times, and a good experience on both high-end and low-end connections.
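A minimal sketch of client-side bit rate selection follows, assuming a hypothetical bitrate ladder of 64 to 320 kbps, an exponentially weighted moving average (EWMA) of measured segment throughput, and a 0.8 safety margin. Real players use more elaborate heuristics (for example, buffer-based logic); the class and function names here are illustrative, not part of any specific player's API.

```python
from typing import Optional

# Hypothetical bitrate ladder (kbps) that the encoder produced for one track.
LADDER_KBPS = [64, 128, 192, 256, 320]


class ThroughputEstimator:
    """Exponentially weighted moving average of measured download throughput, in kbps."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha  # weight given to the newest sample
        self.estimate: Optional[float] = None

    def add_sample(self, bytes_downloaded: int, seconds: float) -> float:
        kbps = bytes_downloaded * 8 / 1000 / seconds
        if self.estimate is None:
            self.estimate = kbps
        else:
            self.estimate = self.alpha * kbps + (1 - self.alpha) * self.estimate
        return self.estimate


def choose_bitrate(estimate_kbps: float, ladder: list[int] = LADDER_KBPS, safety: float = 0.8) -> int:
    """Pick the highest rendition that fits within a safety margin of the estimated throughput."""
    budget = estimate_kbps * safety
    candidates = [b for b in ladder if b <= budget]
    # Fall back to the lowest rendition rather than stalling playback entirely.
    return max(candidates) if candidates else min(ladder)
```

For example, if segments first arrive at 400 kbps and the user then moves to a weaker network where throughput drops to 100 kbps, with `alpha=0.5` the estimate falls to 250 kbps and `choose_bitrate` steps down from 320 to 192 kbps.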
The encoder encodes the single source audio file at multiple bit rates. These streams are packaged and stored in the object store so they are available to listeners. Once the audio is successfully uploaded, the Notification service sends a notification to the content provider.
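The original design does not specify a packaging format. Assuming HLS (DASH is a common alternative), the multi-bit-rate output could be advertised to players through a master playlist like the one generated below. The `Rendition` class and the variant URIs are hypothetical, and `mp4a.40.2` is the codec string for AAC-LC.

```python
from dataclasses import dataclass


@dataclass
class Rendition:
    bitrate_kbps: int
    uri: str  # hypothetical path of the variant (media) playlist in the object store


def master_playlist(renditions: list[Rendition], codecs: str = "mp4a.40.2") -> str:
    """Build an HLS master playlist listing one variant stream per encoded bit rate."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda x: x.bitrate_kbps):
        # BANDWIDTH is in bits per second; the nominal encoding bit rate is used as an approximation.
        lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={r.bitrate_kbps * 1000},CODECS="{codecs}"')
        lines.append(r.uri)
    return "\n".join(lines) + "\n"
```

The player downloads this playlist first and then uses its own bandwidth estimate to decide which variant to fetch segments from.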
When requesting an upload, the content creator also provides metadata, which the Audio Data service saves directly to a NoSQL database. This data is indexed to give listeners better search capability.
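Indexing the metadata is what makes search fast. The toy inverted index below maps each token from the title, author, and tags to the IDs of matching audio. It is a sketch assuming simple AND semantics; a production system would typically delegate this to a dedicated search engine. `MetadataIndex` and `tokenize` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MetadataIndex:
    """Toy inverted index: token -> set of audio IDs whose metadata contains that token."""

    def __init__(self):
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, audio_id: str, title: str, author: str, tags: list[str]) -> None:
        for token in tokenize(" ".join([title, author, *tags])):
            self.postings[token].add(audio_id)

    def search(self, query: str) -> set[str]:
        """Return IDs whose metadata contains every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        # .get() avoids inserting empty entries into the defaultdict for unknown tokens.
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result
```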
The audio system needs an authentication and authorization service to handle new user registration, login with user credentials, and authorization based on the roles associated with each user (listener or content provider). An API Gateway service can manage authentication and authorization centrally. It can also perform request routing and orchestrate the process flow from front-end to back-end services.
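Role-based authorization at the gateway can be as simple as a route policy table checked after the caller has been authenticated. The sketch assumes the roles have already been extracted from a verified credential (for example, a token); the `Role` enum, the `ROUTE_POLICY` table, and the route paths are hypothetical. Requests that match no route are denied by default.

```python
from enum import Enum


class Role(Enum):
    LISTENER = "listener"
    CONTENT_CREATOR = "content_creator"


# Hypothetical mapping of (HTTP method, path prefix) to the roles allowed to call it.
ROUTE_POLICY = {
    ("POST", "/audio"): {Role.CONTENT_CREATOR},
    ("DELETE", "/audio"): {Role.CONTENT_CREATOR},
    ("GET", "/audio"): {Role.CONTENT_CREATOR, Role.LISTENER},
    ("GET", "/search"): {Role.CONTENT_CREATOR, Role.LISTENER},
}


def authorize(method: str, path: str, roles: set[Role]) -> bool:
    """Allow the request if any of the caller's roles is permitted for the matching route."""
    for (route_method, prefix), allowed in ROUTE_POLICY.items():
        # Match the exact prefix or a sub-path, so "/audiobooks" does not match "/audio".
        if method == route_method and (path == prefix or path.startswith(prefix + "/")):
            return bool(roles & allowed)
    return False  # deny by default when no route matches
```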
A listener can search for audio of interest. The request is forwarded to the Audio Data service, which looks up the audio metadata store and returns the location of the relevant audio. When the user clicks a link, the system returns the audio bytes packaged and stored by the Package service.
The User Profile service manages user preferences, followers, and followings.
The above diagram depicts the basic "upload audio" process flow triggered by the content provider and the "listen audio" process flow triggered by the audio user/listener.
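The design does not say how subscriber notifications are produced. One simple approach, assumed here, is for the User Profile service to keep a follower set per content creator and fan out one notification per follower when new audio is published. `FollowGraph` and `fan_out` are hypothetical names.

```python
from collections import defaultdict


class FollowGraph:
    """Hypothetical follower store, as the User Profile service might keep it."""

    def __init__(self):
        self.followers: dict[str, set[str]] = defaultdict(set)

    def follow(self, user_id: str, creator_id: str) -> None:
        self.followers[creator_id].add(user_id)

    def unfollow(self, user_id: str, creator_id: str) -> None:
        self.followers[creator_id].discard(user_id)


def fan_out(graph: FollowGraph, creator_id: str, title: str) -> list[tuple[str, str]]:
    """Build one (user, message) notification per follower when a creator publishes new audio."""
    return [(user, f"New audio from {creator_id}: {title}")
            for user in sorted(graph.followers.get(creator_id, set()))]
```

For creators with very large audiences, the resulting notifications would typically be pushed onto a queue for the Notification service rather than delivered inline.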
The Pipes and Filters design pattern, combined with message queuing, is used for the data flow between services so that processing scales efficiently with parallelism and fault tolerance.
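The Pipes and Filters pattern can be sketched in-process with Python's `queue.Queue` standing in for the message queues: each filter runs in its own thread, consumes from one queue, and produces to the next. The `transcode`, `package`, and `notify` functions are hypothetical placeholders for the corresponding services; a real deployment would use a durable message broker so that work survives a crashed consumer.

```python
import queue
import threading

SENTINEL = None  # signals the end of the stream to each stage


def stage(work, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Run one filter in its own thread, reading from inbox and writing results to outbox."""
    def run():
        while True:
            item = inbox.get()
            if item is SENTINEL:
                outbox.put(SENTINEL)  # propagate shutdown downstream
                break
            outbox.put(work(item))
    t = threading.Thread(target=run)
    t.start()
    return t


# Hypothetical filters standing in for the Transcoder, Package, and Notification services.
def transcode(upload):
    return {**upload, "renditions": [64, 128, 320]}


def package(job):
    return {**job, "manifest": f"{job['id']}/master.m3u8"}


def notify(job):
    return f"upload {job['id']} ready: {job['manifest']}"


uploads, transcoded, packaged, notifications = (queue.Queue() for _ in range(4))
threads = [
    stage(transcode, uploads, transcoded),
    stage(package, transcoded, packaged),
    stage(notify, packaged, notifications),
]
for upload_id in ["a1", "a2"]:
    uploads.put({"id": upload_id})
uploads.put(SENTINEL)
for t in threads:
    t.join()

results = []
while (msg := notifications.get()) is not SENTINEL:
    results.append(msg)
print(results)
```

Because each stage is decoupled by a queue, a slow stage can be scaled by adding more consumers to its inbox (each worker would then need its own shutdown sentinel), and a slow downstream stage does not stop uploads from being accepted.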
Now, let’s move to non-functional requirements.
Non-Functional Requirements
- Scalability. The system should support thousands of concurrent users and be able to scale up and down with demand.
- Fault tolerance. The system should be fault-tolerant, typically through data redundancy, with minimal downtime.
- Performance. The system should have low latency and high responsiveness during content playback.
- Security. The system must be protected against unauthorized access and attacks such as DDoS.
Scalability
The audio system should be able to support thousands of active content creators. To manage the load, the design runs multiple instances of the API Gateway service, App Web service, and Audio Data service behind load balancers.
However, this approach alone will not scale as the number of content creators grows. Because audio files are usually large, routing them through multiple service components consumes significant network bandwidth and compute. To optimize resource usage, a signed URL (also called a pre-signed URL) can give the client time-limited access to upload audio files directly to the object store. This removes the need to route upload traffic through the API Gateway and App Web service. Signed URLs can be configured with granular permissions and expiration rules for better security.
This covers the scalability requirement for uploading the audio files by content providers.
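The core idea behind a signed URL is that the server signs the permitted operation, object path, and expiry time with a secret key, and the object store recomputes the signature before accepting the request. The sketch below uses HMAC-SHA256 for this. It is a simplified illustration, not the signing scheme of any particular cloud provider, and `SECRET`, `BASE`, `sign_upload_url`, and `verify` are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"replace-with-a-server-side-secret"  # hypothetical signing key, never shipped to clients
BASE = "https://uploads.example.com"            # hypothetical object store endpoint


def _signature(method: str, path: str, expires: int) -> str:
    """HMAC-SHA256 over the permitted method, object path, and expiry timestamp."""
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_upload_url(object_key: str, ttl_s: int = 900, now: Optional[float] = None) -> str:
    """Issue a URL that permits a PUT of object_key until the expiry time."""
    issued_at = time.time() if now is None else now
    expires = int(issued_at + ttl_s)
    path = f"/{object_key}"
    query = urlencode({"expires": expires, "sig": _signature("PUT", path, expires)})
    return f"{BASE}{path}?{query}"


def verify(method: str, url: str, now: Optional[float] = None) -> bool:
    """Object-store-side check: the signature matches and the URL has not expired."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    # Constant-time comparison avoids leaking how many signature characters matched.
    return hmac.compare_digest(sig, _signature(method, parts.path, expires))
```

Because the method and path are part of the signed message, a URL issued for uploading one object cannot be reused to read it or to write a different object.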
Millions of search requests from listeners could hit the system and overload the Audio Data service. To scale for this volume of searches, the Command Query Responsibility Segregation (CQRS) design pattern can be used. With CQRS, read and write operations against the data store are handled independently, so each side can meet its own latency, scalability, and security requirements.
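A minimal in-memory sketch of CQRS follows, assuming a hypothetical `AudioPublished` event. The write side validates and stores uploads, then publishes an event that updates a separate, denormalized read model optimized for one query (listing titles by category). In a real deployment the event would travel through a message queue, so the read model would be eventually consistent rather than updated synchronously as it is here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioPublished:
    """Event emitted by the write side after a successful upload."""
    audio_id: str
    title: str
    category: str


class WriteModel:
    """Command side: validates and stores uploads, then publishes events to subscribers."""

    def __init__(self, subscribers):
        self.store = {}
        self.subscribers = subscribers

    def publish_audio(self, audio_id: str, title: str, category: str) -> None:
        if audio_id in self.store:
            raise ValueError(f"{audio_id} already exists")
        self.store[audio_id] = {"title": title, "category": category}
        event = AudioPublished(audio_id, title, category)
        for handle in self.subscribers:
            handle(event)


class CategoryReadModel:
    """Query side: a denormalized view optimized for listing audio titles by category."""

    def __init__(self):
        self.by_category = defaultdict(list)

    def on_published(self, event: AudioPublished) -> None:
        self.by_category[event.category].append(event.title)

    def titles_in(self, category: str) -> list[str]:
        return list(self.by_category.get(category, []))
```

Since the read model is built purely from events, more read replicas can be added to absorb search traffic without touching the write path.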
Fault Tolerance
There are various dimensions to the fault tolerance design. A few of them are already included in the design.
- Use of multiple instances of service for scalability and fault tolerance
- Message queuing with pipeline processing to support fault tolerance at the data flow level
- Separation of transcoder service and packaging service
- Multiple DB instances with replicas
- Deployment of the entire system in multiple geographical regions, such as U.S. East, U.S. West, and Asia Pacific, with each regional deployment spanning multiple Availability Zones and serving a specific regional user base
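Replicated databases only help if clients actually fail over. A minimal sketch, assuming each replica is exposed as a callable that raises a hypothetical `ReplicaUnavailable` exception on a transient failure: retry each replica with exponential backoff, then move on to the next one. The `sleep` parameter is injectable so the behavior can be tested without real delays.

```python
import time


class ReplicaUnavailable(Exception):
    """Hypothetical error a replica client raises on a transient failure."""


def read_with_failover(key, replicas, retries_per_replica=2, base_delay_s=0.05, sleep=time.sleep):
    """Try each replica in order; retry transient failures with exponential backoff."""
    last_error = None
    for replica in replicas:
        for attempt in range(retries_per_replica):
            try:
                return replica(key)
            except ReplicaUnavailable as err:
                last_error = err
                # Back off before retrying the same replica, but not before moving to the next one.
                if attempt + 1 < retries_per_replica:
                    sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError(f"all replicas failed for {key!r}") from last_error
```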
Performance
A content delivery network (CDN) is a geographically distributed network of servers that work together to deliver content, such as audio and video files, quickly and efficiently. It caches content on edge servers close to users, which improves performance and reduces latency.
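To make edge caching concrete, the toy cache below combines least-recently-used (LRU) eviction with a per-entry time-to-live (TTL): a hit is served from the edge, while a miss or an expired entry is fetched from the origin. Real CDNs honor HTTP cache headers and use far more sophisticated policies; `EdgeCache`, its capacity, and its TTL are hypothetical.

```python
import time
from collections import OrderedDict


class EdgeCache:
    """Toy CDN edge cache: LRU eviction plus a per-entry time-to-live."""

    def __init__(self, origin_fetch, capacity=3, ttl_s=300.0, clock=time.monotonic):
        self.origin_fetch = origin_fetch  # callable that retrieves content from the origin
        self.capacity = capacity
        self.ttl_s = ttl_s
        self.clock = clock
        self.entries = OrderedDict()  # key -> (expires_at, value), least recently used first
        self.hits = self.misses = 0

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self.origin_fetch(key)  # cache miss or expired entry: go back to the origin
        self.entries[key] = (now + self.ttl_s, value)
        self.entries.move_to_end(key)  # reassigning an existing key keeps its old position
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value
```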
Security
A CDN also improves security by providing DDoS mitigation and other security measures.
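One common building block of such protection, whether at the CDN edge or in the API Gateway, is per-client rate limiting. The sketch below uses the token bucket algorithm as an illustration; it is not how any particular CDN implements DDoS mitigation, and large volumetric attacks also require network-level absorption. `TokenBucket`, `handle_request`, and the 5 requests/s policy are hypothetical.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Allows bursts up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last request, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per client IP (hypothetical policy: 5 requests/s with bursts of up to 10).
buckets = defaultdict(lambda: TokenBucket(rate=5.0, capacity=10.0))


def handle_request(client_ip: str) -> int:
    """Return an HTTP status: 200 if admitted, 429 (Too Many Requests) if rate-limited."""
    return 200 if buckets[client_ip].allow() else 429
```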
The Package service distributes the audio files to the CDN to be cached at the edge servers. The Audio Data service records the CDN location in the audio metadata, which the Search service then uses to answer user queries.
The above diagram depicts the high-level component architecture of a typical audio system.
Conclusion
At the heart of a good audio system are scalability, performance, and fault tolerance to provide a good user experience with minimal distortion, low latency, and reliability.
Opinions expressed by DZone contributors are their own.