Outlook Email Crawling Using Graph APIs
In this article, we take a look at Outlook email crawling using the Graph APIs.
The MS Graph API documentation is well maintained and easy to understand in one read. But the actual implementation (creating an app, granting permissions, and testing those permissions) is a bit time-consuming.
Our use case is to get the emails from Outlook for a given email ID and post them to a messaging queue. The overall goal looks pretty simple, but it takes a lot of time to reach an actual working POC.
Note: This is for the Outlook Graph API, where the mailbox is in the cloud.
Reading user mailboxes can be done in 2 ways:
- Case 1: OAuth on behalf of a user
- Case 2: OAuth without a user (admin consent)
OAuth on behalf of a user requires the user to enter their sign-in details in order to generate a token, which is then used to communicate with the Graph APIs. In other words, the user has to step in manually to sign in.
There are 3 steps:
- Obtain an authorization code (requires the user's sign-in details)
- Create a token from the code obtained in step 1
- Access the Graph APIs with the token generated in step 2
But for email scraping, this method is not efficient, as we have to generate tokens and supply the sign-in details manually.
https://docs.microsoft.com/en-us/outlook/rest/get-started
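The three steps of the on-behalf-of-user flow can be sketched roughly as follows. This is a minimal illustration, not a complete client: the function names, the `offline_access Mail.Read` scope, and the `state` value are assumptions chosen for the example, and the code only builds the sign-in URL and the form fields rather than sending anything.

```python
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORITY = "https://login.microsoftonline.com"


def build_authorize_url(tenant_id, client_id, redirect_uri,
                        scope="offline_access Mail.Read", state="12345"):
    """Step 1: URL the user opens in a browser to sign in and get a code."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "response_mode": "query",
        "scope": scope,   # assumed delegated scope for reading mail
        "state": state,   # hypothetical value; echoed back on redirect
    })
    return f"{AUTHORITY}/{tenant_id}/oauth2/v2.0/authorize?{query}"


def extract_code(redirect_url):
    """Read the `code` query parameter from the URL the user lands on."""
    return parse_qs(urlparse(redirect_url).query)["code"][0]


def build_code_exchange_form(client_id, client_secret, code, redirect_uri,
                             scope="offline_access Mail.Read"):
    """Step 2: form fields to POST to .../oauth2/v2.0/token for a token."""
    return {
        "grant_type": "authorization_code",
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "redirect_uri": redirect_uri,
        "scope": scope,
    }
```

Step 3 is then an ordinary Graph call that carries the returned token in an `Authorization: Bearer ...` header.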
OAuth without a user is the better option for an email scraping use case, as no manual intervention by the user is needed. But it requires admin consent.
Granting admin consent lets the app read every mailbox in that tenant, regardless of the user. This is a privacy issue for the other mailboxes. We can restrict it by applying policies that limit the app's access to specific mailboxes.
Follow all the steps in the MS documentation: https://docs.microsoft.com/en-us/graph/auth-v2-service
Make sure the created app has been granted all the mailbox access permissions it needs (for reading mail, the Mail.Read application permission).
For privacy restrictions: https://docs.microsoft.com/en-us/graph/auth-limit-mailbox-access
Once the app is created and admin consent has been granted for it, the actual REST calls to get the data can be made.
First we need to generate the token that authorizes our calls to the Graph APIs.
POST:
https://login.microsoftonline.com/{tenant id}/oauth2/v2.0/token
Headers:
Content-Type : application/x-www-form-urlencoded
Body (form fields):
grant_type : client_credentials
client_id : {client_id}
client_secret : {client_secret}
scope : https://graph.microsoft.com/.default
Once you hit the above URL, you will get the token in the response if the call succeeds. You will find the client ID, tenant ID, and client secret on the details page of the app you created in the earlier steps.
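As one way to make the token call above from code, here is a small sketch using only the Python standard library. The function names are made up for this example. The parameters travel as a form-encoded request body, and the token is read from the `access_token` field of the JSON response.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret):
    """Build the client-credentials token request without sending it."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://graph.microsoft.com/.default",
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_access_token(tenant_id, client_id, client_secret):
    """Send the request and return the bearer token from the response."""
    request = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```

Keeping the request builder separate from the network call makes it easy to inspect what will be sent before pointing it at a real tenant.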
Then we have to hit the below API to get the actual email content that we are looking for.
GET:
https://graph.microsoft.com/v1.0/users/{email id}/mailfolders/inbox/messages?$select=subject,from,receivedDateTime&$top=2&$orderby=receivedDateTime%20DESC
Headers:
Authorization : Bearer ey...(token id generated above)
Based on your favorite programming language, you can call the above APIs to get the emails.
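For instance, in Python the messages call above could look roughly like this, using only the standard library. The function names are illustrative, and `access_token` is assumed to be the token obtained from the token endpoint.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

GRAPH = "https://graph.microsoft.com/v1.0"


def build_messages_request(access_token, mailbox, top=2):
    """Build the GET request for the newest inbox messages of a mailbox."""
    # quote (not quote_plus) so the space becomes %20, as in the URL above;
    # "$" and "," are left unescaped to keep the OData options readable.
    query = urlencode(
        {
            "$select": "subject,from,receivedDateTime",
            "$top": top,
            "$orderby": "receivedDateTime DESC",
        },
        quote_via=quote,
        safe="$,",
    )
    url = f"{GRAPH}/users/{quote(mailbox, safe='@')}/mailfolders/inbox/messages?{query}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )


def fetch_latest_messages(access_token, mailbox, top=2):
    """Send the request; Graph returns the messages under the "value" key."""
    request = build_messages_request(access_token, mailbox, top)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["value"]
```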
But with this approach we have to poll continuously to pick up new emails promptly. To avoid this frequent polling of the mailbox, we can create an event subscription and get notified whenever new emails arrive in the mailbox.
Mailbox Event Subscription
One prerequisite is an endpoint that the subscription can send its notifications to. So first create an endpoint that is reachable from the internet without any issues (Graph requires it to be served over HTTPS).
You can find how to create a subscription, along with more details, here: https://docs.microsoft.com/en-us/graph/webhooks
The above link gives a full overview of the subscription APIs. What we need from that documentation is the one API that tells MS Graph to send an event whenever new emails arrive in the given mailbox.
POST https://graph.microsoft.com/v1.0/subscriptions
Content-Type: application/json
Authorization: Bearer ..... (token generated with the admin-consented app)
{
"changeType": "created,updated",
"notificationUrl": "{webhook URL of the endpoint created as the prerequisite}",
"resource": "/users/{user email or guid}/mailfolders('inbox')/messages",
"expirationDateTime": "2016-03-20T11:00:00.0000000Z",
"clientState": "SecretClientState"
}
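You have to provide an expiration time for the subscription. The maximum expiration time for mailbox subscriptions was about 3 days at the time of writing; check the subscription documentation for the current limit.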
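Because the expiration time has to be a future timestamp, it is easier to compute it than to hard-code it. The sketch below builds the subscription request with an expiry two days ahead. The function name, the two-day default, and the `now` parameter (there only so the result can be checked deterministically) are assumptions made for this example.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_subscription_request(access_token, mailbox, notification_url,
                               client_state, lifetime=timedelta(days=2),
                               now=None):
    """Build the POST /subscriptions request without sending it."""
    now = now or datetime.now(timezone.utc)
    # Graph expects a UTC timestamp in ISO 8601 form.
    expires = (now + lifetime).strftime("%Y-%m-%dT%H:%M:%SZ")
    payload = {
        "changeType": "created,updated",
        "notificationUrl": notification_url,
        "resource": f"/users/{mailbox}/mailfolders('inbox')/messages",
        "expirationDateTime": expires,
        "clientState": client_state,
    }
    return urllib.request.Request(
        "https://graph.microsoft.com/v1.0/subscriptions",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

Since the subscription expires, a long-running crawler would also need to renew it before `expirationDateTime` passes.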
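Note: Double-check the response your endpoint sends back when the above API is called. While creating the subscription, Graph sends a validation request to the notification URL with a validationToken query parameter, and the endpoint has to return that token as plain text with a 200 status; otherwise the subscription is not created.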
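A minimal sketch of an endpoint that gives Graph the response it expects during subscription creation: it echoes the `validationToken` query parameter back as plain text, and acknowledges ordinary notifications with a 202. The handler class, helper name, and port are hypothetical, and a real deployment would sit behind HTTPS.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def validation_reply(request_path):
    """Return the decoded validation token, or None for a normal notification."""
    tokens = parse_qs(urlparse(request_path).query).get("validationToken")
    return tokens[0] if tokens else None


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        token = validation_reply(self.path)
        if token is not None:
            # Validation handshake: echo the token back as text/plain.
            body = token.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            return
        # Ordinary notification: read the body and acknowledge quickly.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(202)
        self.end_headers()


def serve(port=8080):
    """Run the webhook endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```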
After this, Graph will POST a notification body to our endpoint whenever a new email comes into the inbox.
A sample notification sent to our endpoint will look something like the below:
Sample Body (long_id_string is the message ID):
{
"value": [
{
"subscriptionId":"<subscription_guid>",
"subscriptionExpirationDateTime":"2016-03-19T22:11:09.952Z",
"clientState":"SecretClientState",
"changeType":"created",
"resource":"users/{user_guid}@<tenant_guid>/messages/{long_id_string}",
"tenantId": "tenant id",
"resourceData":
{
"@odata.type":"#Microsoft.Graph.Message",
"@odata.id":"Users/{user_guid}@<tenant_guid>/Messages/{long_id_string}",
"@odata.etag":"W/\"CQAAABYAAADkrWGo7bouTKlsgTZMr9KwAAAUWRHf\"",
"id":"<long_id_string>"
}
}
]
}
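To act on such a notification, the endpoint only needs the message IDs. The following is a rough sketch of pulling them out of the body. The function name and the choice to keep only `created` events are assumptions for this example; comparing `clientState` against the value sent when the subscription was created is a simple way to discard notifications that did not come from our subscription.

```python
import json


def extract_new_message_ids(notification_body, expected_client_state,
                            change_types=("created",)):
    """Return the IDs of messages reported by a Graph notification body."""
    payload = json.loads(notification_body)
    message_ids = []
    for item in payload.get("value", []):
        # Skip anything that does not carry our subscription's secret.
        if item.get("clientState") != expected_client_state:
            continue
        # The subscription also reports updates; keep only wanted changes.
        if item.get("changeType") not in change_types:
            continue
        message_ids.append(item["resourceData"]["id"])
    return message_ids
```

Each returned ID can then be used to fetch the full message from Graph and post it to the messaging queue.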
PS: Follow the MS documentation carefully, step by step.