Solr: Searching and Highlighting Without Splitting the Keyword
Posted on March 27th, 2014
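Edit schema.xml
Set autoGeneratePhraseQueries to true on the field type.
When it is false, the query string is split into separate tokens that are each searched on their own.
Documents are still found, but the complete keyword string is not highlighted correctly.
(This seems to be supported only from schema version 1.4; adding the attribute under version 1.3 kept the Solr server from starting.)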
autoGeneratePhraseQueries=true|false (in schema version 1.4 and later this now defaults to false)
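To make the difference concrete, here is a rough sketch of the two query shapes that a bigram-analyzed keyword can turn into. It is a simplified model for illustration, not Solr's actual query parser; the class name `BigramQueryDemo`, its methods, and the field name `content` are all made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: models the token splitting, not Solr internals.
class BigramQueryDemo {

    // Split text into overlapping two-character tokens, similar in spirit
    // to what a CJK bigram filter emits for a run of CJK characters.
    static List<String> bigrams(String text) {
        int[] codePoints = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (codePoints.length == 1) {
            tokens.add(text); // a lone character stays as a single token
        }
        for (int i = 0; i + 1 < codePoints.length; i++) {
            tokens.add(new String(codePoints, i, 2));
        }
        return tokens;
    }

    // Shape with autoGeneratePhraseQueries="true": one phrase,
    // so the tokens have to appear next to each other and in order.
    static String asPhraseQuery(String field, List<String> tokens) {
        return field + ":\"" + String.join(" ", tokens) + "\"";
    }

    // Shape with autoGeneratePhraseQueries="false": independent clauses,
    // so each token can match (and be highlighted) anywhere in the text.
    static String asTermQuery(String field, List<String> tokens) {
        List<String> clauses = new ArrayList<>();
        for (String token : tokens) {
            clauses.add(field + ":" + token);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        List<String> tokens = bigrams("中文搜尋");
        System.out.println(asPhraseQuery("content", tokens)); // content:"中文 文搜 搜尋"
        System.out.println(asTermQuery("content", tokens));   // content:中文 content:文搜 content:搜尋
    }
}
```

With the second shape, a document that contains only 中文 somewhere still matches, which is why the highlight does not line up with the full keyword.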
- schema example:
<!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- normalize width before bigram, as e.g. half-width dakuten combine -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- for any non-CJK -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- CJK bigrams improve overall performance on huge datasets,
         but the "?" wildcard search is not supported for Chinese
    -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
<!-- CJK bigram (see text_ja for a Japanese configuration using morphological analysis) -->
<fieldType name="text_cjk_no_tf" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- normalize width before bigram, as e.g. half-width dakuten combine -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- for any non-CJK -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- CJK bigrams improve overall performance on huge datasets,
         but the "?" wildcard search is not supported for Chinese
    -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
  <similarity class="com.intumit.solr.OmitTermFreqSimilarity" />
</fieldType>
- Another approach is to add the hl.usePhraseHighlighter parameter at query time
// Keep the highlighter from splitting the Chinese keyword into separate tokens
solrQuery.setParam("hl.usePhraseHighlighter", true);
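For anyone not going through SolrJ, the same parameter can be sent as a plain request parameter. Below is a minimal sketch that assembles such a request URL with only the JDK (Java 10 or later). The class `HighlightQueryUrl`, the base URL, the core name `core1`, and the field name `content` are assumptions for illustration; `q`, `hl`, `hl.fl` and `hl.usePhraseHighlighter` are the standard Solr parameter names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a /select URL with phrase highlighting turned on.
class HighlightQueryUrl {

    static String build(String baseUrl, String field, String keyword) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", field + ":" + keyword);
        params.put("hl", "true");                     // enable highlighting
        params.put("hl.fl", field);                   // field to highlight
        params.put("hl.usePhraseHighlighter", "true");

        StringBuilder url = new StringBuilder(baseUrl).append("/select?");
        boolean first = true;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            first = false;
            url.append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Assumed local Solr address and core name; adjust to your setup.
        System.out.println(build("http://localhost:8983/solr/core1", "content", "中文搜尋"));
    }
}
```

The keyword is percent-encoded as UTF-8, so Chinese input can be passed in as-is.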