INDUSTRIAL MANUFACTURING BLOG YOU SHOULD BE READING

How to Choose So Analyzer?

09 May.,2024

Often managers try to manage everyone the same way—and that’s usually the way they like to be managed. But this approach can backfire. People like to be managed differently—and it may not always be in a way that comes naturally to you. Even beyond the individual needs, teams require different leadership styles. You wouldn’t manage a sales team the same way you’d manage a team of developers.

When working with Analyzers, remember that they’re reserved, direct, assertive, and thorough. They’re typically less effective with work that requires them to make a decision quickly and with limited data. Analyzers like to express and implement their own ideas. When managing this profile, consider some of the following suggestions:

Don’t micromanage them.
Give them time and space to think things through.
Offer as much data and information as you can.
Bring them challenges and problems to solve.
Allow them to express and implement their own ideas.
Don’t pressure them into making a quick decision.

Analysis takes place in two contexts. At index time, when a field is being created, the token stream that results from analysis is added to an index and defines the set of terms (including positions, sizes, and so on) for the field. At query time, the values being searched for are analyzed and the terms that result are matched against those that are stored in the field’s index.

In many cases, the same analysis should be applied to both phases. This is desirable when you want to query for exact string matches, possibly with case-insensitivity, for example. In other cases, you may want to apply slightly different analysis steps during indexing than those used at query time.
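
As a minimal sketch of the first case, the field type below uses one analyzer for both phases to get case-insensitive exact matching. The field type name exact_ci is hypothetical and chosen only for illustration. KeywordTokenizerFactory treats the entire field value as a single token and LowerCaseFilterFactory lowercases it, so a stored value and a query string should match only when they are identical apart from case.

```xml
<!-- Hypothetical field type: one analyzer shared by indexing and queries -->
<fieldType name="exact_ci" class="solr.TextField">
  <analyzer>
    <!-- Emit the whole field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercase that token so matching ignores case -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Depending on the query parser settings, a multi-word query against such a field would likely need to be quoted so that the whole string reaches the analyzer as one value.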

If you provide a single <analyzer> definition with no type attribute for a field type, it will be used for both indexing and queries. If you want distinct analyzers for each phase, you may include two <analyzer> definitions distinguished by a type attribute. For example:

<fieldType name="nametext" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

In this theoretical example, at index time the text is tokenized, the tokens are set to lowercase, any that are not listed in keepwords.txt are discarded and those that remain are mapped to alternate values as defined by the synonym rules in the file syns.txt. This essentially builds an index from a restricted set of possible values and then normalizes them to values that may not even occur in the original text.

At query time, the only normalization that happens is to convert the query terms to lowercase. The filtering and mapping steps that occur at index time are not applied to the query terms. Queries must then, in this example, be very precise, using only the normalized terms that were stored at index time.

Analysis for Multi-Term Expansion

In some types of queries (such as prefix, wildcard, and regex queries), the input provided by the user is not natural language intended for analysis. Steps such as synonym mapping or stop word filtering do not behave logically in these types of queries.

The analysis factories that can work in these types of queries (such as lowercasing or normalizing factories) are known as MultiTermAwareComponents. When Solr needs to perform analysis for a query that results in multi-term expansion, only the MultiTermAwareComponents in the query analyzer are used; any factory that is not multi-term aware is skipped.
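
To make the skipping behavior concrete, here is a sketch of a query analyzer that mixes both kinds of factory. The field type name nametext_mixed and the file names stopwords.txt and syns.txt are assumptions made for illustration. The comments mark which filters, going by the description above, would be expected to apply to a wildcard query such as Caf* and which would be skipped.

```xml
<!-- Hypothetical field type; the query analyzer is the part of interest -->
<fieldType name="nametext_mixed" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercasing: multi-term aware, expected to apply to wildcard, prefix, and regex terms -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stop word removal: not multi-term aware, expected to be skipped for those queries -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- Synonym mapping: not multi-term aware, expected to be skipped for those queries -->
    <filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
  </analyzer>
</fieldType>
```

With a configuration like this, a query for Caf* would be lowercased to caf* before expansion, while ordinary term queries would still pass through all three filters.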

For most use cases, this provides the best possible behavior, but if you wish for absolute control over the analysis performed on these types of queries, you may explicitly define a multiterm analyzer to use, such as in the following example:

<fieldType name="nametext" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- No analysis at all when doing queries that involved Multi-Term expansion -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

Apache Solr Reference Guide 6.6