Best Practices for Building SDKs for APIs
In 2020, you can’t be a B2B company without having an API program, whether your API is the product itself or APIs are leveraged to enable additional integrations and functionality for your web app.
Even though an SDK could seem simple in terms of lines of code, SDKs need to be reliable and handle scale with ease. A poorly designed SDK could cripple your customer’s infrastructure and reduce trust in your service. At Moesif, we put a lot of effort into creating SDKs that are high performance and include fail-safes in case bad things happen. This article walks through some of those practices. Given that Moesif is an API analytics service, some of these practices are specific to high-volume data collection. However, others apply regardless of your SDK’s purpose.
For most SaaS solutions with an SDK, your user will need to initialize the SDK and pass in an API key. You may need to grab certain account information from your servers or grab certain device information such as the OS version. This work should be performed on a background thread or leverage async calls. By leveraging background threads, the main thread can continue to respond without hanging. Depending on whether your SDK is for front ends or back ends, the main thread may be the main UI thread or the thread responding to incoming HTTP requests.
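As a rough illustration of non-blocking initialization, the sketch below starts the slow lookup on a daemon thread and lets the caller carry on. The `Client` class, the `fetch_config` callable, and `wait_until_ready` are hypothetical names chosen for this example, not part of any real SDK.

```python
import threading


class Client:
    """Hypothetical SDK client; all names here are illustrative only."""

    def __init__(self, api_key, fetch_config):
        self.api_key = api_key
        # fetch_config is an assumed callable that performs the slow
        # network or device lookup and returns a dict.
        self._fetch_config = fetch_config
        self._ready = threading.Event()
        self.config = None
        # daemon=True so the worker never blocks shutdown of the host app.
        self._thread = threading.Thread(target=self._load, daemon=True)
        self._thread.start()

    def _load(self):
        try:
            self.config = self._fetch_config(self.api_key)
        except Exception:
            # Fall back to defaults rather than crash the host application.
            self.config = {}
        finally:
            self._ready.set()

    def wait_until_ready(self, timeout=None):
        """Optional: block until initialization finished (or timed out)."""
        return self._ready.wait(timeout)
```

The constructor returns immediately; code that truly needs the fetched configuration can call `wait_until_ready` with a timeout instead of blocking the main thread at startup.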
For any HTTP connection created with another server, there will be some overhead before any data can be transmitted. This includes a mutual exchange of SYN and ACK packets between a client and server to create a TCP connection, a handshake to establish SSL/TLS, etc. Instead of redoing these for every HTTP request, you can reuse connections via Keep-Alive.
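One way to reuse a connection, sketched here with Python's standard `http.client`, is to hold a single connection object for the lifetime of the SDK. `KeepAliveTransport` and `post_json` are hypothetical names, and the single reconnect-and-retry is an assumption that only suits endpoints where a repeated POST is harmless.

```python
import http.client
import json


class KeepAliveTransport:
    """Reuses one HTTP/1.1 connection across requests (hypothetical helper)."""

    def __init__(self, host, port=None, use_tls=True, timeout=5.0):
        conn_cls = (http.client.HTTPSConnection if use_tls
                    else http.client.HTTPConnection)
        # No socket is opened yet; http.client connects on the first request.
        self._conn = conn_cls(host, port, timeout=timeout)

    def post_json(self, path, payload):
        body = json.dumps(payload).encode("utf-8")
        headers = {"Content-Type": "application/json",
                   "Connection": "keep-alive"}
        try:
            self._conn.request("POST", path, body=body, headers=headers)
            resp = self._conn.getresponse()
        except (http.client.HTTPException, OSError):
            # The server may have closed an idle connection: reconnect once.
            # Assumption: resending this POST is safe for the endpoint.
            self._conn.close()
            self._conn.request("POST", path, body=body, headers=headers)
            resp = self._conn.getresponse()
        resp.read()  # drain the body so the socket can be reused
        return resp.status

    def close(self):
        self._conn.close()
```

Higher-level HTTP client libraries typically offer connection pooling that achieves the same effect; the point is to create the client once and share it rather than opening a new connection per call.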
In addition to Keep-Alive, you can batch multiple events or commands into the same outbound HTTP request. Going back to our analytics SDK example, it would be inefficient to make an HTTP request for every client event that needs to be logged. TCP itself has overhead, and you’re also sending many HTTP headers with each request to handle authentication, caching, etc. Batching reduces this overhead and reduces the number of system calls while keeping all the data in the same memory buffer.
Batching is done by sending an array of events or commands rather than a single one per HTTP request. It usually works best when combined with local queueing.
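A minimal sketch of batching, assuming events are JSON-serializable dicts and the server accepts a JSON array per request. The function name `build_batches` and the default batch size are illustrative choices.

```python
import json


def build_batches(events, max_batch_size=50):
    """Yield request bodies, each a JSON array of up to max_batch_size events."""
    for start in range(0, len(events), max_batch_size):
        chunk = events[start:start + max_batch_size]
        yield json.dumps(chunk).encode("utf-8")
```

With 120 queued events and a batch size of 50, this produces three request bodies instead of 120 separate requests.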
To correctly implement some of the above features like batching, you can leverage local queueing. Local queueing decouples the logic that captures or processes data from the logic that batches and sends it to a server. Events can be stored in an in-memory queue or flushed to disk for durability if there is a risk of a power outage. A more elaborate queueing architecture can leverage distributed data stores like Redis, although this makes your SDK more complex to set up, so it is not recommended unless required.
Queueing also increases the reliability of your SDK when your API is down. Local events can continue to be pushed into the queue while the API is down. Once it is back up, the queue can be drained in large batches. It’s recommended to implement logic that prevents the queue from consuming too much memory or disk space if the API is down for an extended time. One way to do this is to implement a fixed-size queue that automatically drops the oldest values as new ones arrive. While some events may be dropped (which may be OK if the events are only used for analytics purposes), that could be a good trade-off for guaranteeing that your SDK won’t crash or overload your customer’s infrastructure.
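The fixed-size queue described above could be sketched with `collections.deque`, whose `maxlen` argument evicts the oldest item when a new one is appended to a full deque. The class name `BoundedEventQueue`, the default capacity, and the `dropped` counter are assumptions made for this example.

```python
import collections
import threading


class BoundedEventQueue:
    """In-memory queue that drops the oldest event once it is full."""

    def __init__(self, max_events=10000):
        self._events = collections.deque(maxlen=max_events)
        self._lock = threading.Lock()
        self.dropped = 0  # how many events were evicted to make room

    def push(self, event):
        with self._lock:
            if len(self._events) == self._events.maxlen:
                self.dropped += 1  # append() below evicts the oldest item
            self._events.append(event)

    def drain(self, max_items):
        """Remove and return up to max_items events, oldest first."""
        batch = []
        with self._lock:
            while self._events and len(batch) < max_items:
                batch.append(self._events.popleft())
        return batch

    def __len__(self):
        with self._lock:
            return len(self._events)
```

Tracking the number of dropped events is optional, but it gives you a signal to report once the API is reachable again.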
Correct queueing will require certain triggers to flush the events or commands out to your server. The recommended approach is to leverage both time-based and count-based triggers. For example, you can flush events once the buffer reaches 50 events or after 10 seconds have passed since the last flush. This ensures your SDK can batch many events during peak traffic while still keeping end-to-end latency low.
Without time-based flushing, a single event could sit in the queue indefinitely if no other events are pushed into the queue.
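A sketch of combining both triggers, assuming a `send` callable that uploads one list of events. The `Batcher` class and its parameter names are hypothetical; the defaults mirror the 50 events and 10 seconds mentioned above.

```python
import threading
import time


class Batcher:
    """Flushes when the buffer is full or max_wait_seconds have elapsed."""

    def __init__(self, send, max_events=50, max_wait_seconds=10.0):
        self._send = send  # assumed callable taking a list of events
        self._max_events = max_events
        self._max_wait = max_wait_seconds
        self._buffer = []
        self._closed = False
        self._cond = threading.Condition()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add(self, event):
        with self._cond:
            self._buffer.append(event)
            if len(self._buffer) >= self._max_events:
                self._cond.notify()  # count-based trigger

    def close(self):
        """Flush whatever is left and stop the worker thread."""
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._worker.join()

    def _run(self):
        while True:
            with self._cond:
                # Time-based trigger: measured from the previous flush.
                deadline = time.monotonic() + self._max_wait
                while not self._closed and len(self._buffer) < self._max_events:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        break
                    self._cond.wait(remaining)
                batch = self._buffer[:self._max_events]
                self._buffer = self._buffer[self._max_events:]
                done = self._closed and not self._buffer
            if batch:
                try:
                    self._send(batch)  # sent outside the lock
                except Exception:
                    pass  # a real SDK would retry or re-queue here
            if done:
                return
```

Sending happens outside the lock so that callers of `add` are never blocked by a slow network call.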
Compression is super easy to take advantage of but can easily be forgotten. Not all HTTP client libraries compress payloads by default, and your backend needs to support your preferred (de)compression encoding. By compressing your payload as gzip using zlib or similar, you can often reduce the size of repetitive plain-text payloads by over 10X. You can also look into newer formats like Brotli, which can further reduce this by 10 to 20% over gzip.
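A minimal sketch of gzip-compressing a request body with Python's standard library. The function name `encode_body` and the `min_size` threshold are assumptions for illustration, and the receiving server must be configured to accept `Content-Encoding: gzip` on requests.

```python
import gzip
import json


def encode_body(events, min_size=1024):
    """Return (body, headers), gzip-compressing payloads of min_size bytes or more."""
    raw = json.dumps(events).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if len(raw) < min_size:
        # Tiny payloads can grow when gzipped, so send them as-is.
        return raw, headers
    headers["Content-Encoding"] = "gzip"
    return gzip.compress(raw), headers
```

The actual ratio depends heavily on how repetitive the payload is, so it is worth measuring on real traffic before picking a threshold.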
You should include both the SDK name and its version in the User-Agent HTTP header. This allows you to understand SDK adoption and correlate issues to specific versions. For example, we at Moesif adopted a standard format, libraryname/semver, across our SDKs.
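As an illustration of the name/version format, the sketch below builds such a header value. The SDK name `example-sdk-python`, the version `1.2.3`, and the simplified version check are placeholders invented for this example, not Moesif's actual values.

```python
import re

SDK_NAME = "example-sdk-python"  # hypothetical library name
SDK_VERSION = "1.2.3"            # hypothetical version

# Simplified check for MAJOR.MINOR.PATCH with an optional suffix;
# this is not a full semantic-versioning validator.
_VERSION_RE = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?$")


def build_user_agent(name=SDK_NAME, version=SDK_VERSION):
    """Return a User-Agent value in the libraryname/version format."""
    if not _VERSION_RE.match(version):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return f"{name}/{version}"
```

The resulting string would be sent as the `User-Agent` header on every request the SDK makes.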
Once you build and publish these SDKs, you must have the right API analytics in place to measure the performance and utilization of your APIs. Analytics show what improvements you can make on either the SDK or the API side, such as tweaking batch size or adding more efficient endpoints. Correctly implemented analytics can also show where pagination can be improved and which endpoints are being used incorrectly.
Leverage GitHub’s release process or create a CHANGELOG.md to thoroughly document changes, even if they are minor. When a user of an SDK encounters errors or problems, the first things they can check are the changelog and any tickets filed for similar issues. Sometimes small changes can break things in older or specific environments without you knowing.
Breaking changes should have even more thorough documentation describing what is breaking and how to migrate. You might have some users of your SDK trying to migrate from version 1.X.X to 3.X.X whereas others may migrate from 2.X.X to 3.X.X. Documenting what is needed to finish that migration can be very helpful.
Published at DZone with permission of Derric Gilling. See the original article here.