Discussion:
Fulltext searches, synonyms, and alternate spellings...
Dodger
2006-11-12 23:47:46 UTC
I was wondering if there was any way to somehow modify a fulltext
search to be able to match on full synonyms and (perhaps more
importantly) alternative spellings.

For instance,

INSERT
INTO some_table (words_text)
VALUES ('He parked the automobile with its tyres against the kerb.');

SELECT *
FROM some_table
WHERE MATCH (words_text)
AGAINST ('car tires curb');

...won't match at all, when any English-speaking human can see it
should.

Hoping for any recommendations on how to deal with this,
--
Dodger
Peter H. Coffin
2006-11-13 01:19:04 UTC
Post by Dodger
I was wondering if there was any way to somehow modify a fulltext
search to be able to match on full synonyms and (perhaps more
importantly) alternative spellings.
For instance,
INSERT
INTO some_table (words_text)
VALUES ('He parked the automobile with its tyres against the kerb.')
SELECT *
FROM some_table
WHERE MATCH (words_text)
AGAINST ('car tires curb')
...won't match at all, when any English-speaking human can see it
should.
Hoping for any recommendations on how to deal with this,
Wave a magic wand? I speak English (.us subvariant) natively and would
never expect a search engine to match "tyre" to "tire". As for "kerb" to
"curb", and "car" to "automobile", well, they're different words with
different meanings. For example, "curb" is both a noun and a verb. "Kerb"
is not. "Car" can apply to a railroad wagon, a conveyance hung from
wires, or part of an elevator, above and beyond an automobile. It's not
the job of a search function to account for muddled thinking.
--
59. I will never build a sentient computer smarter than I am.
--Peter Anspach's list of things to do as an Evil Overlord
Dodger
2006-11-13 02:45:12 UTC
Post by Peter H. Coffin
Wave a magic wand? I speak English (.us subvariant) natively and would
never expect a search engine to match "tyre" to "tire". As for "kerb" to
"curb", and "car" to "automobile", well, they're different words with
different meanings. For example, "curb" is both a noun and a verb. "Kerb"
is not. "Car" can apply to a railroad wagon, a conveyance hung from
wires, or part of an elevator, above and beyond an automobile. It's not
the job of a search function to account for muddled thinking.
Hi, Peter.

Thanks for replying. 'Kerb' is the British spelling for 'curb' (as in,
edge of sidewalk). 'Tyre' is the British spelling for 'tire' (as in the
rubber part of a car's wheel). Both 'curb' and 'tire' have other
meanings as well, but they are less likely in some circumstances -- for
instance, a catalog of toys. 'Car' might have multiple meanings (and,
because of model railroads, more than one is likely even in toy
catalogs), but 'motorcar', for instance, doesn't, and would be an exact
synonym for 'automobile'. Even with 'car', a search for 'train car'
would rank train cars above motorcars, as it should (and would allow
for '-motorcar' in a boolean search).

Thing is, I wouldn't *expect* a search engine to match on them either,
but I *want* one to somehow.

Actually, since I posted, I did come up with a sort-of solution that
will work as a preprocessing step. All I have to do is make a table of
synonyms like so:

[synonym]
word varchar(32)
synonym varchar(32)

...then I can add a column 'alternate_word' to the searched table and
include it in the fulltext index. Then, for each row in synonym, I can
update the alternate_word column to include that synonym wherever the
other columns match the word but don't already match the synonym.
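
A minimal sketch of that preprocessing step in MySQL, for illustration
only. The index name ft_words_alt and the sample synonym rows are
assumptions, and the LIKE test is a crude substring match (so 'car' would
also hit 'carpet'); a real version would need proper word-boundary
matching.

```sql
-- Synonym lookup table, as described above.
CREATE TABLE synonym (
    word    VARCHAR(32) NOT NULL,
    synonym VARCHAR(32) NOT NULL,
    PRIMARY KEY (word, synonym)
);

-- Sample rows (assumed for illustration): word as stored, synonym as searched.
INSERT INTO synonym (word, synonym) VALUES
    ('automobile', 'car'),
    ('tyres',      'tires'),
    ('kerb',       'curb');

-- Extra column holding the alternate terms, indexed together with the text.
-- ft_words_alt is a hypothetical index name.
ALTER TABLE some_table
    ADD COLUMN alternate_word TEXT,
    ADD FULLTEXT INDEX ft_words_alt (words_text, alternate_word);

-- For every row, collect the synonyms of words it contains,
-- skipping synonyms the text already contains.
-- Rows with no applicable synonym get NULL.
UPDATE some_table
SET alternate_word = (
    SELECT GROUP_CONCAT(DISTINCT s.synonym SEPARATOR ' ')
    FROM synonym AS s
    WHERE some_table.words_text LIKE CONCAT('%', s.word, '%')
      AND some_table.words_text NOT LIKE CONCAT('%', s.synonym, '%')
);
```

Searches then need to name both indexed columns, i.e.
MATCH (words_text, alternate_word) AGAINST ('car tires curb'). On a tiny
test table, MyISAM's 50% threshold in natural-language mode and the
minimum indexed word length (ft_min_word_len, 4 by default, which skips
'car') can still hide matches.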

That way when someone searches a costume store for 'Armor' they get
things made by Brits, Aussies and Canadians, too.
--
Dodger
Jens Grivolla
2006-11-15 19:54:13 UTC
Post by Dodger
I was wondering if there was any way to somehow modify a fulltext
search to be able to match on full synonyms and (perhaps more
importantly) alternative spellings.
This is a tough one. As you said, you could build a dictionary of
synonyms and use that to expand your query, but you will have a hard
time getting good coverage.

What is most effective in natural language is blind relevance feedback,
i.e. automatic query expansion using terms from the documents that use
your initial terms. You will thus find documents that share common
words with documents talking about cars, tires, and curbs, which will
hopefully include documents talking about automobiles, tyres, and
kerbs. However, this is obviously far from perfect and only works when
you have a large document collection, including documents that match
your initial query.

You can use this in MySQL by adding the "WITH QUERY EXPANSION" modifier
to the AGAINST() clause. You will find more documentation by googling
for this option.
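
For illustration, the original query with that option might look like
the sketch below. It assumes a FULLTEXT index already exists on
words_text alone. MySQL runs the search twice: once as given, then again
with the search phrase extended by words from the top-matching rows of
the first pass.

```sql
-- Blind query expansion: second pass reuses terms from the best first-pass hits.
SELECT *
FROM some_table
WHERE MATCH (words_text)
      AGAINST ('car tires curb' WITH QUERY EXPANSION);
```

With only the single 'automobile ... tyres ... kerb' row in the table,
the first pass finds nothing, so there is nothing to expand from.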

HTH, HAND,
Jens
