In AskiaSurf, the advanced group of the options dialog contains one sub-group: reconciliation and matching. The options in this sub-group allow you to choose how AskiaSurf will behave while reconciling two waves.
The NGram analysis options allow you to choose the method that AskiaSurf uses to compare two phrases (such as in question or response shortcuts).
When performing a strict comparison, AskiaSurf requires two phrases to be identical. When using NGram analysis, AskiaSurf performs intelligent matching, treating similar phrases, such as those containing the same words in a different order, as if they were the same. This is useful for detecting matches where the wording of a caption has changed in a particular wave.
For example, depending on the settings being used, NGram analysis might find the phrases "Q3 - Gender" and "Gender - Q3" to be the same because the same words are present in both, albeit in a different order.
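AskiaSurf's exact scoring formula is not documented here. The sketch below assumes one common approach: split each phrase into overlapping character n-grams (n = 2, matching the default NGram length) and score their overlap with the Dice coefficient, scaled to 0-100. The function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count every run of n consecutive characters in text (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Score from 0 (no shared n-grams) to 100 (identical n-grams), via the Dice coefficient."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Both phrases are shorter than n; fall back to a strict comparison.
        return 100.0 if a == b else 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 100.0 * 2 * shared / total


print(ngram_similarity("Q3 - Gender", "Gender - Q3"))  # 80.0
```

Under these assumptions, "Q3 - Gender" and "Gender - Q3" each produce 10 bigrams, 8 of which they share, so they score 80.0: a match at a threshold of 80, but not at the default threshold of 95.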
This group contains the following options:
Enable NGram analysis: If selected, AskiaSurf uses NGram analysis to compare phrases. If cleared, AskiaSurf uses a strict comparison instead.
NGram length: Controls the number of characters that are analyzed together as a unit. For example, if this is set to 10, groups of 10 characters are analyzed as units.
The lower the value, the more accurate the NGram analysis, but the slower reconciliation becomes. We therefore recommend that you leave this option at the default value of 2 unless reconciliation is running slowly.
The default value for this option is 2. The length may be set to any value between 2 and 100.
Note: This option applies only if Enable NGram analysis is selected (i.e. if AskiaSurf is using NGram analysis).
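To illustrate what the NGram length controls, the sketch below assumes the length is the size of each character unit and lists the units produced for one caption at several lengths. The function name is hypothetical, and AskiaSurf's actual tokenization (for example, whether it normalizes case) is not documented here.

```python
def ngram_units(text: str, n: int) -> list[str]:
    """Return every run of n consecutive characters in text (hypothetical helper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


for n in (2, 3, 4):
    print(n, ngram_units("Gender", n))
# 2 ['Ge', 'en', 'nd', 'de', 'er']
# 3 ['Gen', 'end', 'nde', 'der']
# 4 ['Gend', 'ende', 'nder']
```

A phrase of L characters yields L - n + 1 units, so lower lengths produce more units to compare, which is one plausible reason lower values cost more processing time.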
NGram threshold: The minimum similarity at which two phrases are considered the same.
When two phrases are compared using NGram analysis, the result is a value from 0 to 100, where 0 means the two phrases are completely different and 100 means they are identical. If the setting were 80, for example, then two phrases that have an NGram value of 80 or higher (i.e. they are at least 80% similar) would be treated by AskiaSurf as if they were the same.
The default value for this setting is 95. The threshold may be set to any value between 1 and 100.
Note: This option applies only if Enable NGram analysis is selected (i.e. if AskiaSurf is using NGram analysis).
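The threshold rule described above amounts to a single comparison. The sketch below assumes the NGram value has already been computed on the 0-100 scale; the function name is hypothetical.

```python
def passes_threshold(score: float, threshold: int = 95) -> bool:
    """Treat two phrases as the same when their NGram value reaches the threshold."""
    # The option accepts values from 1 to 100; the default is 95.
    if not 1 <= threshold <= 100:
        raise ValueError("threshold must be between 1 and 100")
    return score >= threshold


print(passes_threshold(80, threshold=80))    # True: 80 or higher
print(passes_threshold(79.5, threshold=80))  # False
print(passes_threshold(80))                  # False at the default of 95
```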
Ignore trailing/leading whitespace: If selected, AskiaSurf ignores any whitespace characters (i.e. spaces or tabs) at the start or end of a phrase when performing a comparison. For example, if AskiaSurf is comparing two otherwise identical phrases, but one has a tab character at the start, the tab character is ignored.
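The whitespace option can be sketched for a strict comparison as follows. The function name is hypothetical. The sketch strips only spaces and tabs, as the option description states; Python's `str.strip()` with no argument would also remove newlines and other whitespace.

```python
def strict_equal(a: str, b: str, ignore_whitespace: bool) -> bool:
    """Strict comparison, optionally dropping leading/trailing spaces and tabs first."""
    if ignore_whitespace:
        a, b = a.strip(" \t"), b.strip(" \t")
    return a == b


print(strict_equal("\tQ3 - Gender", "Q3 - Gender", ignore_whitespace=True))   # True
print(strict_equal("\tQ3 - Gender", "Q3 - Gender", ignore_whitespace=False))  # False
```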
Ensure matched questions are of the same type: If selected, AskiaSurf only regards two questions as matched if they are of the same type, no matter how otherwise similar they are. For example, if one is a closed question and the other is numeric, they will not be matched.
Match equivalent loop types: If selected, AskiaSurf treats the following loop types as equivalent when matching questions. This option is selected by default.
Question table
Loop with selection at each iteration
Loop with preliminary selection
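The two type-related options can be sketched as a single compatibility check. This assumes that, when both options are selected, the three loop types listed above count as one type. The `Question` class, the type labels, and the function name are hypothetical, not AskiaSurf's internal names.

```python
from dataclasses import dataclass

# Hypothetical labels for the loop types that "Match equivalent loop types" groups together.
EQUIVALENT_LOOP_TYPES = frozenset({
    "question table",
    "loop with selection at each iteration",
    "loop with preliminary selection",
})


@dataclass
class Question:
    caption: str
    qtype: str  # e.g. "closed", "numeric", or one of the loop labels above


def types_compatible(a: Question, b: Question,
                     same_type: bool, match_loops: bool = True) -> bool:
    """Return True if the two questions may be matched on the basis of their types."""
    if not same_type:
        return True  # type is not checked; the pair may still match on caption
    if match_loops and a.qtype in EQUIVALENT_LOOP_TYPES and b.qtype in EQUIVALENT_LOOP_TYPES:
        return True  # the listed loop types are treated as one type
    return a.qtype == b.qtype


gender_w1 = Question("Gender", "closed")
gender_w2 = Question("Gender", "numeric")
print(types_compatible(gender_w1, gender_w2, same_type=True))   # False
print(types_compatible(gender_w1, gender_w2, same_type=False))  # True

brands_w1 = Question("Brands", "question table")
brands_w2 = Question("Brands", "loop with preliminary selection")
print(types_compatible(brands_w1, brands_w2, same_type=True))                     # True
print(types_compatible(brands_w1, brands_w2, same_type=True, match_loops=False))  # False
```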