A search appliance is usually made up of several components: a gathering component, a standardizing component, a data storage component, a search component, a search interface for users, and a management interface for administrators. Each is described below.
The gathering component is usually a web crawler or file crawler that goes out on a network or the Internet and gathers files and data from specified locations. This might include SMB shared directories, NFS shared directories, databases, and web pages. The crawler might either copy files to the search appliance, or only copy the metadata about the file.
The standardizing component takes the data from the gathering component, converts it into a standardized format, and places it in the data storage component.
The data storage component holds metadata about the files and might also hold copies of the files or data themselves.
The search component searches the stored metadata and returns matches to the search interface as query results. It can also provide links to the copies of the files stored on the search appliance, or links to the original files in their source locations.
The search interface is the component where users compose their search queries. It provides instructions to the search component and displays query results to the user.
The management interface lets administrators manage user accounts and permissions, add and delete search indexes, schedule crawl jobs, and perform other administrative functions.
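The way the first four components hand data to one another can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular appliance is implemented: the `Record` schema, the `Store` class, and the function names `gather`, `standardize`, `search`, and `build_store` are all hypothetical, the crawler only walks a local directory, and matching is a simple all-terms lookup in an in-memory inverted index.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Record:
    """Standardized metadata for one gathered file (hypothetical schema)."""
    source: str  # link back to the original file in its source location
    name: str
    size: int    # size in bytes
    text: str    # lowercased content used for matching


def gather(root):
    """Gathering component: walk a directory tree and yield file paths."""
    for dirpath, _dirs, files in os.walk(root):
        for filename in sorted(files):
            yield os.path.join(dirpath, filename)


def standardize(path):
    """Standardizing component: convert a raw file into a Record."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    return Record(source=path, name=os.path.basename(path),
                  size=os.path.getsize(path), text=text.lower())


class Store:
    """Data storage component: holds records plus an inverted index."""

    def __init__(self):
        self.records = []
        self.index = {}  # term -> set of positions in self.records

    def add(self, record):
        position = len(self.records)
        self.records.append(record)
        for term in set(record.text.split()):
            self.index.setdefault(term, set()).add(position)


def search(store, query):
    """Search component: return the records that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(store.index.get(t, set()) for t in terms))
    return [store.records[i] for i in sorted(hits)]


def build_store(root):
    """Run the gather -> standardize -> store pipeline over one directory."""
    store = Store()
    for path in gather(root):
        store.add(standardize(path))
    return store


if __name__ == "__main__":
    # Tiny demonstration on a throwaway directory with one file.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "notes.txt"), "w", encoding="utf-8") as f:
            f.write("Sales meeting notes")
        for record in search(build_store(root), "sales notes"):
            print(record.name, record.source)
```

In this sketch each result carries `source`, which corresponds to linking back to the original file; a design that copies files onto the appliance would store the content itself in the data storage component and link to that copy instead.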
Commercial examples
Google Search Appliance was a search appliance from Google. It was supplied in two models: a 2U model capable of indexing up to 10 million documents, and a 5U model capable of indexing up to 30 million documents. Google no longer sells a search appliance.
The Mindbreeze InSpire Appliance is produced by the Austrian software vendor Fabasoft Mindbreeze.
The Perfect Search Appliance stores file metadata in an index on the appliance. A web server on the appliance uses that metadata to provide relevant search results in response to user queries, and provides a link to access the original files.