feat(v1-sdks): async crawl node, python websocket + async crawl + example

rafaelsideguide
2024-08-30 10:09:39 -03:00
parent 377e8ded34
commit ae38c26fa8
12 changed files with 394 additions and 649 deletions
+33 -9
@@ -37,11 +37,9 @@ const crawlResponse = await app.crawlUrl('https://firecrawl.dev', {
scrapeOptions: {
formats: ['markdown', 'html'],
}
})
console.log(crawlResponse)
```
### Scraping a URL
@@ -63,16 +61,21 @@ const crawlResponse = await app.crawlUrl('https://firecrawl.dev', {
scrapeOptions: {
formats: ['markdown', 'html'],
}
})
```
### Asynchronous Crawl
To start an asynchronous crawl of a website, use the `asyncCrawlUrl` method. It takes the starting URL and optional parameters as arguments. The `params` argument lets you specify settings for the crawl, such as the maximum number of pages to crawl, allowed domains, and the output format. On success, the method returns an ID, which you need later to check the status of the crawl.
```js
const asyncCrawlResult = await app.asyncCrawlUrl('mendable.ai', { excludePaths: ['blog/*'], limit: 5});
```
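Because the asynchronous crawl only returns an ID, the caller has to poll for the result. The following is a minimal sketch of such a polling loop, not part of the SDK: `waitForCrawl`, its `intervalMs` and `maxAttempts` options, and the status strings `'completed'` and `'failed'` are assumptions made for illustration. The status check is passed in as a function so the helper does not depend on a particular client.

```js
// Sketch only: poll a crawl job until it reaches a terminal state.
// `checkStatus` is any async function (id) => ({ status, ... }),
// for example (id) => app.checkCrawlStatus(id).
// Assumption: the terminal status strings are 'completed' and 'failed'.
async function waitForCrawl(checkStatus, id, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await checkStatus(id);
    if (result.status === 'completed') return result;
    if (result.status === 'failed') throw new Error(`Crawl ${id} failed`);
    // Wait before the next check, but not after the final attempt.
    if (attempt < maxAttempts) {
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Crawl ${id} did not complete after ${maxAttempts} status checks`);
}
```

With the SDK this might be called as `await waitForCrawl(id => app.checkCrawlStatus(id), jobId)`, where `jobId` is the ID returned by `asyncCrawlUrl`.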
### Checking Crawl Status
To check the status of a crawl job with error handling, use the `checkCrawlStatus` method. It takes the job ID as a parameter and returns the current status of the crawl job.
```js
const status = await app.checkCrawlStatus(id);
@@ -121,6 +124,27 @@ const mapResult = await app.mapUrl('https://example.com') as MapResponse;
console.log(mapResult)
```
### Crawl a website with WebSockets
To crawl a website with WebSockets, use the `crawlUrlAndWatch` method. It takes the starting URL and optional parameters as arguments. The `params` argument allows you to specify additional options for the crawl job, such as the maximum number of pages to crawl, allowed domains, and the output format.
```js
// Crawl a website with WebSockets:
const watch = await app.crawlUrlAndWatch('mendable.ai', { excludePaths: ['blog/*'], limit: 5});
watch.addEventListener("document", doc => {
console.log("DOC", doc.detail);
});
watch.addEventListener("error", err => {
console.error("ERR", err.detail.error);
});
watch.addEventListener("done", state => {
console.log("DONE", state.detail.status);
});
```
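The three listeners above can also be folded into a single promise when the caller only needs the final set of documents. This is a sketch under assumptions: `collectDocuments` is a hypothetical helper, not an SDK function, and it relies only on the event names and `detail` fields shown in the example above.

```js
// Sketch only: gather every "document" event until "done" fires.
// `watch` is any EventTarget-like object, such as the watcher returned
// by crawlUrlAndWatch in the example above.
function collectDocuments(watch) {
  return new Promise((resolve, reject) => {
    const docs = [];
    // Each "document" event carries one crawled page in event.detail.
    watch.addEventListener('document', event => docs.push(event.detail));
    // Reject on the first "error" event.
    watch.addEventListener('error', event => reject(new Error(String(event.detail.error))));
    // Resolve with the final status and everything collected so far.
    watch.addEventListener('done', event => resolve({ status: event.detail.status, docs }));
  });
}
```

A caller could then write `const { status, docs } = await collectDocuments(watch);` instead of registering each listener by hand. The trade-off is that documents are only available once the crawl ends, so the listener form remains the better fit for streaming output.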
## Error Handling
The SDK handles errors returned by the Firecrawl API and throws appropriate exceptions. If an error occurs during a request, an exception is thrown with a descriptive error message. To handle these errors, wrap SDK calls in `try/catch` blocks.
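As an illustration of the `try/catch` pattern, here is a minimal sketch that turns a thrown exception into a plain result object. `crawlSafely` and the `{ ok, data, error }` shape are hypothetical and chosen for this example only. The crawl call is passed in as a function so the snippet stays independent of the client setup.

```js
// Sketch only: run a crawl call and capture any exception it throws.
// `crawl` is any async function (url, params) => result,
// for example (url, params) => app.crawlUrl(url, params).
async function crawlSafely(crawl, url, params) {
  try {
    const data = await crawl(url, params);
    return { ok: true, data };
  } catch (err) {
    // Keep only the message; callers that need the stack can rethrow instead.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}
```

A call might look like `const result = await crawlSafely((u, p) => app.crawlUrl(u, p), 'https://firecrawl.dev', { limit: 5 });`, followed by a check of `result.ok` before reading `result.data`.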