# Kloudless Crawler
# Introduction
The Crawler is a type of Subscription that performs a one-time retrieval of all file and folder metadata in a connected user's account. If the account is an admin account, authenticated using the Kloudless admin OAuth flow, this retrieval includes metadata from all the users in the admin's organization.
Initiate a crawl for a connected account by creating a Subscription with the `subscription_type` set to `crawl`. In addition to the Crawler publishing data on all the existing files and folders in the account, you can choose to track new activity using the Activity API by creating a different Subscription with the `subscription_type` set to `changes` instead.
# Kloudless Crawler quickstart
Test out the Kloudless Crawler with the following steps, which are explained in further detail below:
1. Configure a default notification channel in the Kloudless Developer Portal to receive the crawler's JSON response.
2. Connect an account via the Kloudless OAuth flow or the API Explorer.
3. Create a subscription with the `subscription_type` set to `crawl`.
4. Continue to monitor for new activity (optional).
# Configure a notification channel
On the Webhooks and Activity Monitoring page, configure one of the following notification channels to receive metadata from the crawler: Amazon EventBridge, Azure Service Bus, or Google Cloud Pub/Sub.
# Amazon EventBridge
Data sent to EventBridge can be filtered and routed to services like SQS, SNS, Amazon Kinesis, AWS Lambda, and more. Provide the following details on the Webhooks and Activity Monitoring page, under the Amazon EventBridge section:
- AWS Region
- AWS Account ID
# Google Cloud Pub/Sub
Provision a Google Pub/Sub topic and a service account.
The service account needs at least the `roles/pubsub.publisher` (Pub/Sub Publisher) role. Refer to the Pub/Sub Access Control docs for information on roles and on how to grant project-wide and topic-specific permissions.
Provide the following details on the Webhooks and Activity Monitoring page, under the Google Cloud Pub/Sub section:
- Topic name
- Service account key
The service account key should be in JSON format. It can be created on the Google Cloud Platform Console during or after service account creation.
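The provisioning steps above could be scripted with the gcloud CLI roughly as follows. This is a sketch rather than an official procedure: the project ID, topic name, service account name, and key file path are hypothetical placeholders, and the `run` helper is an illustrative wrapper that only prints each command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: provision a Pub/Sub topic and a publisher-only service account.
# All names below are hypothetical placeholders; replace them with your own.
PROJECT_ID="${PROJECT_ID:-my-project}"
TOPIC="${TOPIC:-kloudless-crawler}"
SA_NAME="${SA_NAME:-kloudless-publisher}"
KEY_FILE="${KEY_FILE:-kloudless-key.json}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

provision_pubsub() {
  run gcloud pubsub topics create "$TOPIC" --project "$PROJECT_ID"
  run gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID"
  # Grant the publisher role on this topic only, not project-wide.
  run gcloud pubsub topics add-iam-policy-binding "$TOPIC" \
    --project "$PROJECT_ID" \
    --member "serviceAccount:${SA_EMAIL}" \
    --role "roles/pubsub.publisher"
  # Create the JSON key to provide alongside the topic name.
  run gcloud iam service-accounts keys create "$KEY_FILE" \
    --iam-account "$SA_EMAIL"
}

provision_pubsub
```

Binding the role on the topic itself keeps the service account from publishing to other topics in the project. Review the printed commands first, then rerun with `DRY_RUN=0` to apply them.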
# Azure Service Bus
Create a Service Bus resource and topic, then provide the following details on the Webhooks and Activity Monitoring page, under the Azure Service Bus section:
- Topic name
- Primary connection string
The default Shared Access Key has full control of the Service Bus namespace, which is more access than is required. We recommend creating a Shared Access Key at the topic level instead and providing its connection string.
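One way to create a topic-level Shared Access Key is with the Azure CLI, sketched below. The resource group, namespace, topic, and rule names are hypothetical placeholders, and the sketch assumes that `Send` rights are enough for publishing to the topic. The `run` helper is an illustrative wrapper that only prints each command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch: create a topic-scoped Shared Access Key instead of using the
# namespace-wide default key. All names are hypothetical placeholders.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
NAMESPACE="${NAMESPACE:-my-servicebus-namespace}"
TOPIC="${TOPIC:-kloudless-crawler}"
RULE_NAME="${RULE_NAME:-kloudless-send-only}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

create_topic_key() {
  # Authorization rule limited to this topic (assumption: Send is sufficient).
  run az servicebus topic authorization-rule create \
    --resource-group "$RESOURCE_GROUP" \
    --namespace-name "$NAMESPACE" \
    --topic-name "$TOPIC" \
    --name "$RULE_NAME" \
    --rights Send
  # Print the primary connection string for the new rule.
  run az servicebus topic authorization-rule keys list \
    --resource-group "$RESOURCE_GROUP" \
    --namespace-name "$NAMESPACE" \
    --topic-name "$TOPIC" \
    --name "$RULE_NAME" \
    --query primaryConnectionString --output tsv
}

create_topic_key
```

The connection string printed by the second command is the value to enter as the primary connection string.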
# Connect an account
For testing purposes, you can use the API Explorer to connect your account. This simulates the process your customers would go through to authorize access to their account.
# The Kloudless OAuth flow
In your app, you can include the Kloudless Authenticator JS library to prompt users to connect their account, or your app can directly implement the Kloudless OAuth flow to connect user accounts. You can also configure custom OAuth keys to white-label your app's authentication flow.
# Create a crawler subscription
Use the Create Subscription endpoint to manually create a new subscription. Set the `subscription_type` attribute to `crawl`. In the request header, include the bearer token you received during the OAuth flow:
```shell
curl -H 'Authorization: Bearer TOKEN' \
     -H 'Content-Type: application/json' \
     -X POST -d '{"subscription_type": "crawl"}' \
     'https://api.kloudless.com/v1/accounts/me/subscriptions/'
```
If you are using the API Explorer to create the Subscription, the bearer token is automatically included in the generated request's header.
# Monitoring for new activity
Once you have received metadata for the existing files and folders in the connected account, you can continue to monitor for new activity using the List Activity endpoint.
If you enabled Track Activity on the Webhooks and Activity Monitoring page before connecting the account, a default `changes` subscription was automatically created when the account was connected, and you can immediately begin querying the List Activity endpoint.
Otherwise, you'll need to manually create a subscription with the `subscription_type` attribute set to `changes` before you can use the List Activity endpoint.
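Creating the `changes` subscription manually uses the same Create Subscription endpoint as the crawl request, with only the `subscription_type` value changed. The sketch below wraps that request in two helper functions, `subscription_payload` and `create_subscription`. Both names are hypothetical, and the `TOKEN` environment variable is assumed to hold the bearer token from the OAuth flow.

```shell
#!/usr/bin/env bash
# Sketch: create a subscription of a given type for the connected account.
# Function names are illustrative; the endpoint URL is the one used for
# the crawl subscription.
API_URL='https://api.kloudless.com/v1/accounts/me/subscriptions/'

# Build the JSON request body for the given subscription type.
subscription_payload() {
  printf '{"subscription_type": "%s"}' "$1"
}

# POST the payload, authenticating with the bearer token in $TOKEN.
create_subscription() {
  curl -H "Authorization: Bearer ${TOKEN}" \
       -H 'Content-Type: application/json' \
       -X POST -d "$(subscription_payload "$1")" \
       "$API_URL"
}

# Example usage (sends a real request):
#   TOKEN='your-bearer-token' create_subscription changes
```

Calling `create_subscription crawl` sends the same request as the crawl example, so one helper covers both subscription types.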
See the Activity Monitoring usage guide for more information on using the Activity API to monitor for new activity.