Working with Search Crawl Logs in SharePoint 2013

The crawl logs in the Search service application can help you track the status of crawled content in your SharePoint farm. In this article, I'll briefly discuss what the search crawl log is and how to make use of it.

The search crawl logs in the SharePoint Search service application can help you identify three things:

  1. Whether crawled content was successfully added to the index.
  2. Whether it was excluded because of a crawl rule.
  3. Whether indexing failed because of an error.

It can also give you additional information about the crawled content, such as when the last successful crawl ran, which content sources are defined, and whether any crawl rules are in place. If you're troubleshooting an error related to enterprise search, the crawl log is the right tool to rely on.

Steps to access the search crawl log in SharePoint 2013:

  1. Verify whether you’re an administrator for the Search service application.
  2. In Central Administration, in the Quick Launch, click Application Management.
  3. On the Application Management page, under Service Applications, click Manage service applications.
  4. On the Service Applications page, in the list of service applications, click the Search service application that you want.
  5. On the Search Administration page, in the Quick Launch, under Crawling, click Crawl Log.
  6. On the Crawl Log – Content Source page, click the view that you want.
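
If you prefer PowerShell, you can reach the same Search service application from the SharePoint Management Shell. This is a minimal sketch; the service application name "Search Service Application" is a placeholder, so substitute your own:

```powershell
# Load the SharePoint snap-in if you're in a plain PowerShell console
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Retrieve the Search service application by name
# ("Search Service Application" is an assumed name - use your own)
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"
$ssa | Select-Object Name, Status
```

With no -Identity parameter, Get-SPEnterpriseSearchServiceApplication returns all Search service applications in the farm, which is handy if you're not sure of the name.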

Now, if you take a closer look at the Crawl Log page, you will see six different views:

  1. Content Source
  2. Host Name
  3. Crawl History
  4. Error Breakdown
  5. Databases
  6. URL View

[Image: Search Crawl Log page]

Let's briefly discuss what these crawl log views are and what information they provide.

View Name 1: Content Source:

Summarizes items crawled per content source. Shows successes, warnings, errors, top-level errors, and deletes. The data in this view represents the current status of items that are already present in the index, per content source. The Object Model provides the data for this view.

View Name 2: Host Name:

Summarizes items crawled per host. Shows successes, warnings, errors, deletes, top-level errors, and the total. The data in this view represents the current status of items that are already present in the index, per host. If your environment has multiple crawl databases, the data is shown per crawl database. The Search Administration database provides the data for this view. You can filter the results by typing a URL in the "Find URLs that begin with the following hostname/path:" box.

View Name 3: URL:

Lets you search the crawl logs by content source, URL, or host name and view details of all items that are present in the index. The MSSCrawlURLReport table in the crawl database provides the data for this view. You can filter the results by setting the Status, Message, Start Time, and End Time fields.
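
The same per-URL data can also be queried programmatically through the CrawlLog class in the search object model. The sketch below assumes a working farm; the URL filter and row limit are illustrative values, not recommendations:

```powershell
# Get the (first) Search service application and wrap it in a CrawlLog object
$ssa = Get-SPEnterpriseSearchServiceApplication
$crawlLog = New-Object Microsoft.Office.Server.Search.Administration.CrawlLog $ssa

# GetCrawledUrls(getCountOnly, maxRows, urlQueryString, isLike,
#                contentSourceID, errorLevel, errorID, startDateTime, endDateTime)
# errorLevel 2 returns only URLs that errored; -1 for contentSourceID/errorID means "all".
# "https://intranet" is an assumed URL prefix - replace it with one of yours.
$crawlLog.GetCrawledUrls($false, 50, "https://intranet", $true, -1, 2, -1,
    [System.DateTime]::MinValue, [System.DateTime]::MaxValue)
```

Passing $true for getCountOnly returns just the number of matching URLs instead of the rows, which is useful for a quick health check.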

View Name 4: Crawl History:

Summarizes crawl transactions that were completed during a crawl. There can be multiple crawl transactions per item in a single crawl, so the number of transactions can be larger than the total number of items. This view shows data for three kinds of crawls:

  • Full. Crawls all items in a content source.
  • Incremental. Crawls items that have been changed since the last full or incremental crawl. This kind of crawl only runs if it is scheduled.
  • Delete. If start addresses are removed from a content source, a delete crawl removes items associated with the deleted start address from the index before a full or incremental crawl runs. This kind of crawl cannot be scheduled.

The Search Administration database provides the data for this view. You can filter the results by content source.

View Name 5: Error Breakdown:

Provides aggregates of errors per content source or host name. The MSSCrawlURLReport table in the crawl database provides the data for this view. You can filter by content source or host.

Note: The filter drop-down box only shows content sources that contain errors. If there is an error against an item that does not appear in the index, the error does not appear in this view.

Now that we have covered the different crawl log views, let's discuss how the data is surfaced in these views.

The data on the crawl log pages is displayed in the following columns:

  • Successes: Items that were successfully crawled, added to the index, and are searchable.
  • Warnings: Items that might not have been successfully crawled and might not be searchable.
  • Errors: Items that were not successfully crawled and might not be searchable.
  • Deletes: Items that were removed from the index and are no longer searchable.
  • Top Level Errors: Errors in top-level documents, including start addresses, virtual servers, and content databases. Every top-level error is counted as an error, but not all errors are counted as top-level errors. Because the Errors column includes the count from the Top Level Errors column, top-level errors are not counted again in the Host Name view.
  • Not Modified: Items that were not modified between crawls.
  • Security Update: Items whose security settings were crawled because they were modified.

The search crawl log timer job plays an important role here, so it's worth checking it regularly to make sure it's running properly.

Search Crawl Log Timer Job:

By default, the data for each crawl log view is refreshed every five minutes by the timer job "Crawl Log Report for Search Application <Your Search Service Application name>". You can change this interval if required, but the best practice is to leave it as it is.

To check the status of the crawl log timer job:

  1. Make sure you're a Farm Administrator.
  2. In Central Administration, in the Monitoring section, click Check job status.
  3. On the Timer Job Status page, click Job History.
  4. On the Job History page, find Crawl Log Report for Search Application <Search Service Application name> for the Search service application that you want and review the status.
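
You can also check the job from PowerShell. The sketch below filters on the job's display name; the "Crawl Log Report*" pattern is an assumption based on the default job name, so adjust it if your farm differs:

```powershell
# Find the crawl log reporting job(s) for the farm's Search service applications
Get-SPTimerJob | Where-Object { $_.DisplayName -like "Crawl Log Report*" } |
    Select-Object DisplayName, LastRunTime, Schedule
```

A LastRunTime far older than the configured schedule is a quick sign that the job isn't running as expected.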

To change the refresh rate for the crawl log timer job:

  1. Make sure you're a Farm Administrator.
  2. In Central Administration, in the Monitoring section, click Check job status.
  3. On the Timer Job Status page, click Job History.
  4. On the Job History page, click Crawl Log Report for Search Application <Search Service Application name> for the Search service application that you want.
  5. On the Edit Timer Job page, in the Recurring Schedule section, change the timer job schedule to the interval that you want.
  6. Click OK.
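
The equivalent change can be scripted with Set-SPTimerJob. As above, the display-name filter is an assumption, and "every 10 minutes between 0 and 59" is just an example interval, not a recommendation:

```powershell
# Locate the crawl log report job and change its recurring schedule
$job = Get-SPTimerJob | Where-Object { $_.DisplayName -like "Crawl Log Report*" } |
    Select-Object -First 1
Set-SPTimerJob -Identity $job -Schedule "every 10 minutes between 0 and 59"
```

Remember that the five-minute default exists for a reason; lengthening the interval only delays how quickly the crawl log views reflect reality.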

I'll discuss how to troubleshoot search crawl problems in a different article. Thanks for reading this post!
