
Scraping Static Docs is Often Better Than Proxy for Generating Machine Readable API Definitions


When documenting APIs, you sometimes need to take the shorter route.


I was looking to create an APIs.json index plus OpenAPI Spec(s) for the WordPress.org API and the Instructure Canvas Learning Management System (LMS) API. I am pulling together a toolkit to support a workshop at Davidson College in North Carolina this month, and I wanted a handful of APIs that would be relevant to students and faculty on campus.

In my experience, when it comes to documenting large APIs using OpenAPI Spec, you don't want to be hand-rolling things, which makes auto-generation essential. There are two options for accomplishing this: 1) I can use a proxy like Charles or Stoplight.io, or 2) I can write a script to scrape the publicly available HTML documentation for each API. While I do enjoy mapping out APIs in Stoplight.io, and letting it do the heavy lifting of crafting each API definition, sometimes there is more relevant metadata for the API available in the API documentation.

The OpenAPI Spec and APIs.json files for the WordPress and Instructure Canvas APIs took me about an hour each, to write the script and round off the OpenAPI Spec, making sure it was as complete as possible. Through scraping, I get descriptions for endpoints and parameters, and sometimes other details including sample responses, enums, and response codes.
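To make the scraping approach concrete, here is a minimal sketch in Python. It is not the script I used for these APIs. The HTML layout it expects (an `<h3>` holding the method and path, followed by a `<p>` holding the description) is an assumption made up for illustration, as are the `EndpointScraper` and `build_openapi` names. Real documentation pages each need their own parsing rules.

```python
from html.parser import HTMLParser

# Hypothetical documentation markup (an assumption, not any real site's HTML):
# each endpoint is an <h3> such as "GET /posts" followed by a <p> description.
SAMPLE_DOCS = """
<h3>GET /posts</h3><p>List posts.</p>
<h3>POST /posts</h3><p>Create a post.</p>
<h3>GET /posts/{id}</h3><p>Retrieve a single post.</p>
"""


class EndpointScraper(HTMLParser):
    """Collects method, path, and description into an OpenAPI-style paths object."""

    def __init__(self):
        super().__init__()
        self.paths = {}
        self._tag = None       # tag we are currently inside, if it is h3 or p
        self._current = None   # (path, method) still waiting for a description

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "p"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "h3":
            parts = text.split(None, 1)
            if len(parts) != 2:
                return  # heading is not in "METHOD /path" form; skip it
            method, path = parts[0].lower(), parts[1]
            self.paths.setdefault(path, {})[method] = {
                "summary": text,
                "description": "",
                # Placeholder only; real responses come later from actual traffic.
                "responses": {"200": {"description": "OK"}},
            }
            self._current = (path, method)
        elif self._tag == "p" and self._current:
            path, method = self._current
            self.paths[path][method]["description"] = text
            self._current = None


def build_openapi(html, title):
    """Return a skeleton OpenAPI (Swagger 2.0) document scraped from html."""
    scraper = EndpointScraper()
    scraper.feed(html)
    scraper.close()
    return {
        "swagger": "2.0",
        "info": {"title": title, "version": "scraped"},
        "paths": scraper.paths,
    }


spec = build_openapi(SAMPLE_DOCS, "Example API")
```

The result is only the surface area of the API: paths, methods, and the human-written descriptions. Parameters, enums, and sample responses would need additional rules per documentation site.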

One downside of obtaining an API definition by scraping is that I only get the surface area of an API, not the responses and underlying data model. Sometimes this is included in the documentation, but I do not always harvest it, waiting instead until I can get an often more correct schema when I map the API out using a proxy or via a HAR file. This is OK. I find the trade-off worth it. I'd rather have the more human-centered descriptions and names of each endpoint than the response definitions. Those will come with time and more usage of the actual APIs.

In the end, it really depends on the size of an API and the quality of the API documentation. If it is a big API and the documentation is well crafted, it is preferable to scrape and auto-generate the definition. Once I have this, I can load it into Postman or Stoplight.io, start making API calls, and use either Stoplight's proxy or my own solution that uses Charles Proxy to provide the remaining schema of the responses, as well as the resulting HTTP status code(s).
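As a rough sketch of that second step, the Python below infers a simple JSON Schema from response bodies recorded in a HAR file, keyed by method, URL, and status code. It is not how Stoplight or my Charles-based solution works internally. The `infer_schema` and `schemas_from_har` names are hypothetical, the sample HAR is made up, and the inference is deliberately naive (an array is typed from its first item only).

```python
import json


def infer_schema(value):
    """Naively infer a JSON Schema fragment from one sample value."""
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        # Assumption: the first item is representative of the whole array.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(item) for key, item in value.items()},
        }
    return {}  # null or unknown: no type constraint inferred


def schemas_from_har(har):
    """Map (method, url, status) to an inferred schema for each JSON response."""
    schemas = {}
    for entry in har["log"]["entries"]:
        request, response = entry["request"], entry["response"]
        content = response.get("content", {})
        if content.get("encoding") == "base64":
            continue  # binary or encoded bodies are skipped in this sketch
        text = content.get("text")
        if not text:
            continue
        try:
            body = json.loads(text)
        except ValueError:
            continue  # not JSON; nothing to infer
        key = (request["method"], request["url"], str(response["status"]))
        schemas[key] = infer_schema(body)
    return schemas


# Made-up HAR capture with a single JSON response, for illustration only.
SAMPLE_HAR = {
    "log": {
        "entries": [
            {
                "request": {"method": "GET", "url": "https://example.com/posts"},
                "response": {
                    "status": 200,
                    "content": {"text": '[{"id": 1, "title": "Hello", "sticky": false}]'},
                },
            }
        ]
    }
}

schemas = schemas_from_har(SAMPLE_HAR)
```

Each inferred schema can then be merged into the matching operation's `responses` in the scraped definition, under the recorded status code.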

I think the human touch on all APIs.json, OpenAPI Spec, and API Blueprint files will prove to be essential in streamlining interactions at every stop along the API life cycle. If you can't easily understand what an API does and what the moving parts are, the rest won't matter, so having simple, well-written titles and descriptions for the APIs described in each machine-readable definition is well worth any extra work. Even with auto-generation via scraping or Stoplight.io, I find I still have to give each API definition a little extra love to make sure it is as polished as possible.

I'm thinking I will start keeping a journal of the work that goes into crafting each API's definition(s). It might be something I can use down the road to further streamline the creation and maintenance of my API definitions, and the API services I develop to support all of this.

Here is the APIs.json for the WordPress.org API, by the way:

{
"name": "WordPress",
"description": "WordPress, which is commonly used to refer to all WordPress products, is the most popular and fastest growing publishing platform on the web. WordPress began as a blogging platform but soon evolved to include additional types of websites including news sites, corporate sites (for large brands and small businesses alike), e-commerce sites and everything in between.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/969_logo.png",
"tags": [
"API",
"Application Programming Interfaces",
"api lifeycle",
"blogging",
"cms",
"content",
"content management",
"indie edtech data jam"
],
"created": "2016-03-04",
"modified": "2016-03-04",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/apis.json",
"specificationVersion": "0.14",
"apis": [
{
"name": "Wordpress.org Categories API",
"description": "WordPress, which is commonly used to refer to all WordPress products, is the most popular and fastest growing publishing platform on the web. WordPress began as a blogging platform but soon evolved to include additional types of websites including news sites, corporate sites (for large brands and small businesses alike), e-commerce sites and everything in between.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/969_logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"api lifeycle",
"blogging",
"cms",
"content",
"content management",
"indie edtech data jam"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/wordpressorg-categories-api-openapi-spec.json"
}
]
},
{
"name": "Wordpress.org Comments API",
"description": "WordPress, which is commonly used to refer to all WordPress products, is the most popular and fastest growing publishing platform on the web. WordPress began as a blogging platform but soon evolved to include additional types of websites including news sites, corporate sites (for large brands and small businesses alike), e-commerce sites and everything in between.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/969_logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"api lifeycle",
"blogging",
"cms",
"content",
"content management",
"indie edtech data jam"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/wordpressorg-comments-api-openapi-spec.json"
}
]
},
{
"name": "Wordpress.org Media API",
"description": "WordPress, which is commonly used to refer to all WordPress products, is the most popular and fastest growing publishing platform on the web. WordPress began as a blogging platform but soon evolved to include additional types of websites including news sites, corporate sites (for large brands and small businesses alike), e-commerce sites and everything in between.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/969_logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"api lifeycle",
"blogging",
"cms",
"content",
"content management",
"indie edtech data jam"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/wordpressorg-media-api-openapi-spec.json"
}
]
},
{
"name": "Wordpress.org Pages API",
"description": "WordPress, which is commonly used to refer to all WordPress products, is the most popular and fastest growing publishing platform on the web. WordPress began as a blogging platform but soon evolved to include additional types of websites including news sites, corporate sites (for large brands and small businesses alike), e-commerce sites and everything in between.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/969_logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"api lifeycle",
"blogging",
"cms",
"content",
"content management",
"indie edtech data jam"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/wordpressorg-pages-api-openapi-spec.json"
}
]
},
{
"name": "Wordpress.org Post API",
"description": "WordPress.org is home of the installable version of WordPress. The WordPress.org API provides a series of call exposing various informational assets and tools. Available resources include stats on systems running WordPress and contributor information. Tools include a secret key generator and access to WordPress plugins and themes.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/969_logo.png",
"humanURL": "http://v2.wp-api.org",
"baseURL": "http://v2.wp-api.org",
"tags": [
"API",
"Application Programming Interfaces",
"api lifeycle",
"blogging",
"cms",
"content",
"content management",
"indie edtech data jam"
],
"properties": [
{
"type": "x-documentation",
"url": "http://v2.wp-api.org/reference/"
},
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/wordpressorg-post-api-openapi-spec.json"
}
]
},
{
"name": "Wordpress.org Tags API",
"description": "WordPress, which is commonly used to refer to all WordPress products, is the most popular and fastest growing publishing platform on the web. WordPress began as a blogging platform but soon evolved to include additional types of websites including news sites, corporate sites (for large brands and small businesses alike), e-commerce sites and everything in between.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/969_logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"api lifeycle",
"blogging",
"cms",
"content",
"content management",
"indie edtech data jam"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/wordpressorg-tags-api-openapi-spec.json"
}
]
},
{
"name": "Wordpress.org Taxonomies API",
"description": "WordPress, which is commonly used to refer to all WordPress products, is the most popular and fastest growing publishing platform on the web. WordPress began as a blogging platform but soon evolved to include additional types of websites including news sites, corporate sites (for large brands and small businesses alike), e-commerce sites and everything in between.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/969_logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"api lifeycle",
"blogging",
"cms",
"content",
"content management",
"indie edtech data jam"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/wordpressorg-taxonomies-api-openapi-spec.json"
}
]
},
{
"name": "Wordpress.org Users API",
"description": "WordPress, which is commonly used to refer to all WordPress products, is the most popular and fastest growing publishing platform on the web. WordPress began as a blogging platform but soon evolved to include additional types of websites including news sites, corporate sites (for large brands and small businesses alike), e-commerce sites and everything in between.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/969_logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"api lifeycle",
"blogging",
"cms",
"content",
"content management",
"indie edtech data jam"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/wordpressorg-users-api-openapi-spec.json"
}
]
}
],
"x-common": [
{
"type": "X-blog",
"url": "http://wordpress.org/news/"
}
],
"include": [
{
"name": "Parent",
"url": "http://kinlane.github.io/indie-edtech-data-jam//apis.json"
}
],
"maintainers": [
{
"FN": "Kin Lane",
"X-twitter": "apievangelist",
"email": "info@apievangelist.com"
}
]
}

Here is the APIs.json for the Instructure Canvas API as well:

{
"name": "Instructure Canvas",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"created": "2016-03-04",
"modified": "2016-03-04",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/apis.json",
"specificationVersion": "0.14",
"apis": [
{
"name": "Instructure Canvas Accounts API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-accounts-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Appointment Groups API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-appointment-groups-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Audit API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-audit-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Calendar Events API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-calendar-events-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Conversations API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-conversations-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Courses API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-courses-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Global API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-global-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Groups API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-groups-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Polls API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-polls-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Quiz Submissions API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-quiz-submissions-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Sections API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-sections-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Users API",
"description": "Instructure is a technology company that is focused on improving  education. Founded in 2008 by two Computer Science graduate students,  Instructure built Canvas - the only open source learning management  system and the only LMS native to the cloud. Instructure now services  over 160 institutions in higher education and K-12. Investors include  OpenView Venture Partners and Tomorrow Ventures.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "",
"baseURL": "",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-users-api-openapi-spec.json"
}
]
},
{
"name": "Instructure Canvas Utility APIs",
"description": "Canvas LMS includes a REST API for accessing and modifying data externally from the main application, in your own programs and scripts. This documentation describes the resources that make up the API.",
"image": "http://kinlane-productions.s3.amazonaws.com/api-evangelist-site/company/logos/canvas-logo.png",
"humanURL": "https://canvas.instructure.com/doc/api/index.html",
"baseURL": "https://canvas.instructure.com/doc/api/index.html",
"tags": [
"API",
"Application Programming Interfaces",
"education",
"indie edtech data jam",
"lms"
],
"properties": [
{
"type": "x-openapi-spec",
"url": "http://kinlane.github.io/indie-edtech-data-jam//data/instructure-canvas/instructure-canvas-utility-apis-openapi-spec.json"
}
]
}
],
"x-common": [
{
"type": "X-blog",
"url": "http://blog.instructure.com"
},
{
"type": "X-portal",
"url": "https://canvas.instructure.com/doc/api/index.html"
}
],
"include": [
{
"name": "Parent",
"url": "http://kinlane.github.io/indie-edtech-data-jam//apis.json"
}
],
"maintainers": [
{
"FN": "Kin Lane",
"X-twitter": "apievangelist",
"email": "info@apievangelist.com"
}
]
}
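Both indexes above follow the same shape: each entry under `apis` carries a `properties` list, and the entry typed `x-openapi-spec` points at the OpenAPI Spec for that API. As a small sketch of how a script might consume such an index, the Python below pulls out the name and spec URL for each API. The `openapi_spec_urls` function name is hypothetical, and the sample is a trimmed copy of two entries from the WordPress index rather than the full file.

```python
import json

# Trimmed copy of two entries from the WordPress APIs.json shown above.
SAMPLE_INDEX = json.loads("""
{
  "name": "WordPress",
  "specificationVersion": "0.14",
  "apis": [
    {
      "name": "Wordpress.org Categories API",
      "properties": [
        {
          "type": "x-openapi-spec",
          "url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/wordpressorg-categories-api-openapi-spec.json"
        }
      ]
    },
    {
      "name": "Wordpress.org Post API",
      "properties": [
        {"type": "x-documentation", "url": "http://v2.wp-api.org/reference/"},
        {
          "type": "x-openapi-spec",
          "url": "http://kinlane.github.io/indie-edtech-data-jam//data/wordpress/wordpressorg-post-api-openapi-spec.json"
        }
      ]
    }
  ]
}
""")


def openapi_spec_urls(index, spec_type="x-openapi-spec"):
    """Return (api name, url) pairs for every property of the given type."""
    found = []
    for api in index.get("apis", []):
        for prop in api.get("properties", []):
            if prop.get("type") == spec_type:
                found.append((api.get("name", ""), prop["url"]))
    return found


spec_urls = openapi_spec_urls(SAMPLE_INDEX)
for name, url in spec_urls:
    print(name, url)
```

Passing `spec_type="x-documentation"` would list the documentation links instead, which is one reason to keep the lookup keyed on the property type rather than hard-coding it.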

You can see these and some other API definitions for my workshop over at the GitHub repo for the project. I created a new Liquid template that allows me to display APIs.json and OpenAPI Specs within the Jekyll site for this project. It is something I will be using to better deliver API-driven content, visualizations, and other resources that help us learn about APIs and put them to work.



Published at DZone with permission of Kin Lane, DZone MVB. See the original article here.

