Suppose that we don't store the signature of a document in the document itself, but instead store it in the blockchain. In a blockchain system that relies on PKI, users already have a private key, which lowers the threshold for adoption. Users of the system could be human beings, companies, or systems used within companies (such as a CRM, an ERP, or a DMS). In the context of blockchain, the key pair is usually generated in a Web of Trust (WoT), but that's not a must. In some use cases (e.g. in the context of a permissioned blockchain), we might prefer to work with a key pair generated by a CA. As for the private key, some use cases might require that it's stored on a hardware device, such as an HSM, a smart card, or a USB token.
Signing a document would involve creating a hash of the document, signing that hash with a private key, and storing that signed hash in the blockchain along with the corresponding public certificate. In this case, we use the blockchain as a database for storing signatures, with the unique ID of the PDF (its ID pair) as the primary key for each record. Optionally, we can add metadata to the signature records, such as the status of the document (draft, final, unpaid, paid, ...) and one or more locations (e.g. the URLs of different mirrors from which the document can be downloaded).
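To make these steps concrete, here is a minimal Java sketch that uses only the JDK. It is not the pdfChain API: the `SignatureRecord` and `SigningSketch` classes, the field names, and the in-memory list that stands in for the blockchain are assumptions made for illustration, as is the choice of SHA-256 and RSA. The encoded public key stands in for the signer's certificate.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.Signature;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical record layout; a real blockchain client defines its own. */
final class SignatureRecord {
    final String documentId;            // primary key: the unique ID of the PDF
    final byte[] documentHash;          // SHA-256 digest of the document bytes
    final byte[] signature;             // RSA signature created with the private key
    final byte[] signerPublicKey;       // encoded public key, standing in for a certificate
    final Map<String, String> metadata; // optional: status, locations, ...

    SignatureRecord(String documentId, byte[] documentHash, byte[] signature,
                    byte[] signerPublicKey, Map<String, String> metadata) {
        this.documentId = documentId;
        this.documentHash = documentHash;
        this.signature = signature;
        this.signerPublicKey = signerPublicKey;
        this.metadata = metadata;
    }
}

final class SigningSketch {
    /** In-memory stand-in for the blockchain (an assumption, not a real ledger). */
    static final List<SignatureRecord> LEDGER = new ArrayList<>();

    /** Generates a throwaway RSA key pair; real signers would already own one. */
    static KeyPair newSigner() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hashes the document, signs it, and appends the resulting record to the ledger. */
    static SignatureRecord register(String documentId, byte[] document,
                                    KeyPair signer, Map<String, String> metadata) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(document);
            // SHA256withRSA hashes the input with SHA-256 and signs that digest in one step.
            Signature rsa = Signature.getInstance("SHA256withRSA");
            rsa.initSign(signer.getPrivate());
            rsa.update(document);
            SignatureRecord record = new SignatureRecord(documentId, hash, rsa.sign(),
                    signer.getPublic().getEncoded(), new LinkedHashMap<>(metadata));
            LEDGER.add(record);
            return record;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("status", "draft");
        metadata.put("location", "https://example.com/contract.pdf");
        byte[] document = "%PDF-1.7 sample bytes".getBytes(StandardCharsets.UTF_8);
        SignatureRecord record = register("sample-id", document, newSigner(), metadata);
        System.out.println("Records stored: " + LEDGER.size()
                + ", signature bytes: " + record.signature.length);
    }
}
```

Note that the document itself never enters the ledger: only its hash, the signature, the public key, and the metadata are stored.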
As a developer, you gain several advantages from this approach. You need to be less of a PDF expert: you don't need to worry about incremental updates, about which changes are or aren't allowed to the document, and so on. On the blockchain side, you have control over the application (e.g. which metadata to store, which blockchain technology to use), while the API explained in this ref card abstracts away the nitty-gritty details of how to query the blockchain. Furthermore, you get automatic distribution of the signatures and metadata, load balancing, scalability, and security without having to write any code specifically for that purpose.
The address of the relevant blockchain could be stored in the metadata of the PDF. You can use this address to retrieve the record (or records) that correspond with the ID pair of the document you want to verify. That record will give you information on the signer or signers. With the hash stored in the record, you'll be able to verify the document's integrity. If more than one record is retrieved, you'll have a historical overview of who registered the document and when, and you can inspect the metadata that was added with each registration.
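Once a record has been retrieved, verification could look roughly like the following Java sketch. The `RetrievedRecord` and `VerificationSketch` names are hypothetical, the record is assumed to hold a SHA-256 hash, an RSA signature, and an X.509-encoded public key, and the `sampleRecord` helper only exists to produce test data. Retrieving the record from an actual blockchain is out of scope here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;

/** Hypothetical shape of a record retrieved from the blockchain. */
final class RetrievedRecord {
    final byte[] documentHash;    // hash stored at registration time
    final byte[] signature;       // signature stored at registration time
    final byte[] signerPublicKey; // X.509-encoded RSA public key of the signer

    RetrievedRecord(byte[] documentHash, byte[] signature, byte[] signerPublicKey) {
        this.documentHash = documentHash;
        this.signature = signature;
        this.signerPublicKey = signerPublicKey;
    }
}

final class VerificationSketch {
    /** True if the document matches the stored hash and the stored signature is valid. */
    static boolean verify(byte[] document, RetrievedRecord record) {
        try {
            byte[] actualHash = MessageDigest.getInstance("SHA-256").digest(document);
            if (!MessageDigest.isEqual(actualHash, record.documentHash)) {
                return false; // integrity check failed: the bytes have changed
            }
            PublicKey key = KeyFactory.getInstance("RSA")
                    .generatePublic(new X509EncodedKeySpec(record.signerPublicKey));
            Signature rsa = Signature.getInstance("SHA256withRSA");
            rsa.initVerify(key);
            rsa.update(document);
            return rsa.verify(record.signature);
        } catch (GeneralSecurityException e) {
            return false; // malformed key or signature: treat as not verified
        }
    }

    /** Helper for the demo: builds the record a signer would have stored. */
    static RetrievedRecord sampleRecord(byte[] document) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair keys = generator.generateKeyPair();
            Signature rsa = Signature.getInstance("SHA256withRSA");
            rsa.initSign(keys.getPrivate());
            rsa.update(document);
            return new RetrievedRecord(MessageDigest.getInstance("SHA-256").digest(document),
                    rsa.sign(), keys.getPublic().getEncoded());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] document = "original contents".getBytes(StandardCharsets.UTF_8);
        RetrievedRecord record = sampleRecord(document);
        System.out.println("Untouched document verifies: " + verify(document, record));
        byte[] tampered = "tampered contents".getBytes(StandardCharsets.UTF_8);
        System.out.println("Tampered document verifies: " + verify(tampered, record));
    }
}
```

This sketch only checks integrity and that the signature matches the stored key. Establishing who owns that key (through a CA or a Web of Trust) is a separate step.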
What Are the Advantages of Such an Approach?
Storing the signature in the blockchain instead of the document itself has many advantages:
- We still meet the requirement of integrity, since we create the signature in exactly the same way. We only store the signature in a different place;
- We can still work with a CA to meet the requirements of authenticity and non-repudiation, but we don't have to. We can also rely on a Web of Trust stored in the blockchain;
- We don't need to work with a TSA anymore: timestamping is inherent to blockchain, and once a record is added to a block, its contents are immutable;
- Since the signatures are stored outside of the document, signatures can be applied in parallel without a predefined order. Every signer can add a signature to the blockchain at any moment in time;
- Everyone can check who signed a document, when, and in which order; this can be done without having to rely on a central service (e.g. DocuSign);
- Keeping the signature of a document alive doesn't require a separate mechanism; any eligible signer with access to the document can create a new signature in the blockchain using the latest hashing and encryption algorithms.
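The points about parallel signing, ordering, and renewal can be illustrated with a small Java sketch. Everything here is an assumption made for illustration: the `LedgerEntry` and `ParallelSigningSketch` names, the in-memory list that stands in for the blockchain, and the sequence number that stands in for the timestamp a real blockchain would provide. Renewal is shown as simply adding another record with a stronger algorithm name (`SHA512withRSA` instead of `SHA256withRSA`).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical ledger entry; the sequence number stands in for a block timestamp. */
final class LedgerEntry {
    final long sequence;
    final String signer;
    final String algorithm; // e.g. "SHA256withRSA" or "SHA512withRSA"
    final PublicKey publicKey;
    final byte[] signature;

    LedgerEntry(long sequence, String signer, String algorithm,
                PublicKey publicKey, byte[] signature) {
        this.sequence = sequence;
        this.signer = signer;
        this.algorithm = algorithm;
        this.publicKey = publicKey;
        this.signature = signature;
    }
}

final class ParallelSigningSketch {
    private final List<LedgerEntry> entries = new ArrayList<>();

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Appends a signature record; no signer has to wait for another signer. */
    synchronized LedgerEntry sign(String signer, KeyPair keys, String algorithm, byte[] document) {
        try {
            Signature engine = Signature.getInstance(algorithm);
            engine.initSign(keys.getPrivate());
            engine.update(document);
            LedgerEntry entry = new LedgerEntry(entries.size() + 1, signer, algorithm,
                    keys.getPublic(), engine.sign());
            entries.add(entry);
            return entry;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Who signed, in the order in which the ledger recorded the signatures. */
    synchronized List<String> signersInOrder() {
        List<String> names = new ArrayList<>();
        for (LedgerEntry entry : entries) {
            names.add(entry.signer);
        }
        return names;
    }

    /** Checks every recorded signature against the given document bytes. */
    synchronized boolean allValid(byte[] document) {
        try {
            for (LedgerEntry entry : entries) {
                Signature engine = Signature.getInstance(entry.algorithm);
                engine.initVerify(entry.publicKey);
                engine.update(document);
                if (!engine.verify(entry.signature)) {
                    return false;
                }
            }
            return true;
        } catch (GeneralSecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ParallelSigningSketch ledger = new ParallelSigningSketch();
        byte[] document = "agreement".getBytes(StandardCharsets.UTF_8);
        KeyPair alice = newKeyPair();
        ledger.sign("bob", newKeyPair(), "SHA256withRSA", document);
        ledger.sign("alice", alice, "SHA256withRSA", document);
        ledger.sign("alice", alice, "SHA512withRSA", document); // renewal with a stronger hash
        System.out.println(ledger.signersInOrder() + " valid: " + ledger.allValid(document));
    }
}
```

Because the document bytes never change, each signature is computed over the same input, which is what makes the signing order irrelevant.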
When discussing different use cases, you'll notice that we often talk about "registering a document" instead of "signing a document." Sometimes, we don't want the document to have a legally binding signature. For instance, you might want to register in the blockchain an intermediary draft of an agreement that is still being negotiated, without making that contract legally binding until it has been finalized and approved by all parties. In some sectors, a signed hash stored in the blockchain won't be accepted as a valid signature, e.g. because the government doesn't recognize such a signature as legally binding.
Even in those cases, it makes sense to store hashes of documents along with some metadata in the blockchain:
- You can retrieve the records of a document based on its ID pair to inspect the metadata added by the person who registered the document. For instance, a document that is registered as an invoice with status "unpaid" by the CRM system of a vendor can be registered as an invoice with status "accepted" by the ERP system of a buyer, and eventually be registered as an invoice with status "paid" by the buyer's bank.
- You could also do a search on the first part of the ID pair to find records of documents that are related. A document that starts its lifecycle as a quote request could change into a quote, then be accepted as a purchase order, then fork into an invoice and a delivery note.
- If you find a document in the wild, e.g. a draft of a technical specification, and you want to know if that document is the latest version, you could search for a record that registers a more recent version of that draft, and use the location information stored in that record to retrieve that updated version.
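The three lookups described above can be sketched in Java against an in-memory list. The `Registration` and `RegistryQuerySketch` names, the field layout, and the example URLs are hypothetical, and the sketch assumes that related documents and successive revisions share the first part of the ID pair. Hashes are left out to keep the focus on the metadata.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical registration record: metadata attached to one ID pair. */
final class Registration {
    final String originalId;   // first part of the ID pair
    final String revisionId;   // second part of the ID pair
    final String registeredBy; // e.g. "vendor CRM", "buyer ERP"
    final String status;       // e.g. "unpaid", "accepted", "paid", "draft"
    final String location;     // where the document can be downloaded

    Registration(String originalId, String revisionId, String registeredBy,
                 String status, String location) {
        this.originalId = originalId;
        this.revisionId = revisionId;
        this.registeredBy = registeredBy;
        this.status = status;
        this.location = location;
    }
}

final class RegistryQuerySketch {
    /** Append-only list standing in for the blockchain; later entries are more recent. */
    private final List<Registration> ledger = new ArrayList<>();

    void register(String originalId, String revisionId, String registeredBy,
                  String status, String location) {
        ledger.add(new Registration(originalId, revisionId, registeredBy, status, location));
    }

    /** All records for one exact ID pair, oldest first. */
    List<Registration> history(String originalId, String revisionId) {
        List<Registration> result = new ArrayList<>();
        for (Registration r : ledger) {
            if (r.originalId.equals(originalId) && r.revisionId.equals(revisionId)) {
                result.add(r);
            }
        }
        return result;
    }

    /** All records that share the first part of the ID pair (related documents). */
    List<Registration> related(String originalId) {
        List<Registration> result = new ArrayList<>();
        for (Registration r : ledger) {
            if (r.originalId.equals(originalId)) {
                result.add(r);
            }
        }
        return result;
    }

    /** Location of the most recently registered revision, or null if none is known. */
    String latestLocation(String originalId) {
        String location = null;
        for (Registration r : ledger) {
            if (r.originalId.equals(originalId)) {
                location = r.location; // later entries overwrite earlier ones
            }
        }
        return location;
    }

    public static void main(String[] args) {
        RegistryQuerySketch registry = new RegistryQuerySketch();
        registry.register("inv", "inv-1", "vendor CRM", "unpaid", "https://example.com/inv.pdf");
        registry.register("inv", "inv-1", "buyer ERP", "accepted", "https://example.com/inv.pdf");
        registry.register("inv", "inv-1", "buyer bank", "paid", "https://example.com/inv.pdf");
        for (Registration r : registry.history("inv", "inv-1")) {
            System.out.println(r.registeredBy + " registered status " + r.status);
        }
    }
}
```

A real blockchain would index these records itself; the linear scans here only show which fields each query depends on.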
With the open-source iText 7 pdfChain add-on, creating a blockchain record that contains a signature of a PDF file is pretty straightforward.