Word Similarity

Experience word similarity with the AI algorithm behind the Tasting Intelligence Data-driven Flavor Profile.

Summary

In this article, we explain how we developed the Tasting Intelligence Data-driven Flavor Profile, software that builds a comprehensive representation of the flavors described in texts. We focus on one of the artificial intelligence algorithms it uses: word similarity. The article also includes an app that lets you experience first-hand how this algorithm works. Get an exclusive sneak peek into our innovative software.

Introduction

Tasting is not an exact science; taste and tasting notes vary from person to person. These variations depend on the individual's palate development and tasting experience. Other influencing factors include the environment, location, and occasion of the tasting. Personal associations with flavors also play a significant role, making the experience unique for everyone. The Tasting Intelligence Flavor Profile offers a data-driven approach to capturing these nuances: it uses a mathematical model to interpret taste from text descriptions, and its flavor notes were optimized with that model.

The tasting wheel looks like this:


The Tasting Intelligence Flavor Profile aligns with other common flavor profiles, aiming to create a versatile tasting wheel for extracting flavors using our software. While it is primarily designed for alcoholic beverages, it is equally applicable to non-alcoholic drinks and food. If you want to learn more about flavors, we invite you to read our article: flavors explained.

This article explains one of the key algorithms in our software, word similarity, and includes an app where you can explore the concept first-hand.

For an example of analysis using word similarity, please visit our sample review: rum analysis example.

Method: Word Similarity

The core algorithm for extracting flavors from text relies on the cosine similarity between words, and it forms the foundation for generating a flavor wheel from textual descriptions. Each word is represented by a vector from a statistical model (word vectors, also called 'word embeddings'), and the similarity of two words is the cosine of the angle between their vectors.
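
As a minimal sketch of the underlying arithmetic, cosine similarity divides the dot product of two vectors by the product of their lengths, giving 1.0 for vectors that point the same way and values near 0 for unrelated ones. The three-dimensional vectors below are hypothetical stand-ins for real word embeddings, chosen only to illustrate the calculation.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return dot(u, v) / (|u| * |v|), the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # cosine is undefined for zero vectors; treat them as unrelated
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional word vectors (real embeddings have hundreds of dimensions).
lemon = [0.9, 0.1, 0.3]
lime = [0.8, 0.2, 0.35]
smoke = [0.05, 0.9, 0.1]

print(f"lemon vs lime:  {cosine_similarity(lemon, lime):.3f}")
print(f"lemon vs smoke: {cosine_similarity(lemon, smoke):.3f}")
```

With these toy values, lemon and lime score close to 1, while lemon and smoke score much lower.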

We calculate similarity with spaCy, an open-source Python library for advanced Natural Language Processing (NLP). Its documentation describes the similarity feature as "comparing words, text spans, and documents to measure their similarity." The library can therefore compare individual words with each other, and also against entire texts or documents.
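
In spaCy, `Token.similarity`, `Span.similarity` and `Doc.similarity` return the cosine similarity of the objects' vectors, and by default a `Span` or `Doc` vector is the average of its token vectors. The plain-Python sketch below imitates that word-against-text comparison with a small hypothetical vocabulary (`VECTORS`) and simple whitespace tokenization, so it runs without loading a spaCy pipeline with word vectors such as `en_core_web_md`.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy vocabulary standing in for a spaCy pipeline's word vectors.
VECTORS: Dict[str, List[float]] = {
    "lemon": [0.9, 0.1, 0.3],
    "citrus": [0.85, 0.15, 0.25],
    "zest": [0.7, 0.3, 0.4],
    "smoky": [0.05, 0.9, 0.1],
    "finish": [0.3, 0.3, 0.3],
}
DIM = 3


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def text_vector(text: str) -> List[float]:
    """Average the vectors of known tokens; tokens without a vector are skipped."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    known = [VECTORS[t] for t in tokens if t in VECTORS]
    if not known:
        return [0.0] * DIM
    return [sum(col) / len(known) for col in zip(*known)]


def word_vs_text(word: str, text: str) -> float:
    """Cosine similarity between one word's vector and the averaged text vector."""
    return cosine(VECTORS[word], text_vector(text))


note = "Citrus zest on the finish."
for word in ("lemon", "smoky"):
    print(f"{word} vs note: {word_vs_text(word, note):.3f}")
```

The word "lemon" never appears in the note, yet it scores high against it because the note's averaged vector is dominated by related words such as "citrus" and "zest".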

In our experiments, we used this algorithm to create and optimize the flavor profile. The aim was to identify distinctive words that contribute to each primary flavor: we selected candidate flavors, compared them by similarity, and repeated this selection over several iterations.
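
The article does not detail how this iteration works. One possible way to organize it, shown here purely as an assumption, is a centroid-based refinement: seed each primary flavor with a word, assign every candidate to the closest primary flavor if its similarity clears a threshold, recompute the group averages, and repeat until the assignments stop changing. The seeds, candidates, vectors, the `refine` function and the 0.8 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy vectors; real ones would come from spaCy's word vectors.
VECTORS: Dict[str, List[float]] = {
    "fruity": [0.9, 0.1, 0.1],
    "apple": [0.85, 0.1, 0.2],
    "cherry": [0.8, 0.2, 0.1],
    "spicy": [0.1, 0.9, 0.2],
    "pepper": [0.15, 0.85, 0.1],
    "clove": [0.2, 0.8, 0.3],
    "cardboard": [0.1, 0.1, 0.9],
}
SEEDS = {"fruity": ["fruity"], "spicy": ["spicy"]}  # hypothetical starting words
CANDIDATES = ["apple", "cherry", "pepper", "clove", "cardboard"]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def centroid(words: List[str]) -> List[float]:
    """Average vector of a group of words."""
    vecs = [VECTORS[w] for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def refine(threshold: float = 0.8, max_rounds: int = 10) -> Dict[str, List[str]]:
    """Reassign candidates to the closest primary-flavor centroid until nothing changes."""
    groups = {flavor: list(words) for flavor, words in SEEDS.items()}
    for _ in range(max_rounds):
        centroids = {flavor: centroid(words) for flavor, words in groups.items()}
        updated = {flavor: list(words) for flavor, words in SEEDS.items()}
        for word in CANDIDATES:
            # Score the candidate against every primary flavor and keep the best one.
            best_flavor, best_score = max(
                ((flavor, cosine(VECTORS[word], c)) for flavor, c in centroids.items()),
                key=lambda pair: pair[1],
            )
            if best_score >= threshold:  # weakly related words are left out
                updated[best_flavor].append(word)
        if updated == groups:  # assignments are stable
            break
        groups = updated
    return groups


print(refine())
```

With these toy values, the fruit and spice words join their groups, while "cardboard" is similar to neither and is left out.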

Because it is based on cosine similarity, the method matches not only identical words (flavors) but also different words with a similar meaning. For each word (flavor) in the profile, it measures the similarity against the text (reviews or tasting notes). If the similarity exceeds a specified threshold, the word contributes to identifying its primary flavor.
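
A minimal sketch of this matching step, assuming a hypothetical two-entry profile (`FLAVOR_PROFILE`), toy vectors, whitespace tokenization and a threshold of 0.9: every review word is compared with every profile flavor, and each pair at or above the threshold counts toward that flavor's primary flavor. An identical word scores 1.0, but a different word such as "lime" can also match "lemon".

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical flavor words mapped to their primary flavor.
FLAVOR_PROFILE: Dict[str, str] = {"lemon": "fruity", "cinnamon": "spicy"}

# Hypothetical toy vectors; in practice these would be spaCy word vectors.
VECTORS: Dict[str, List[float]] = {
    "lemon": [0.9, 0.1, 0.3],
    "lime": [0.8, 0.2, 0.35],
    "cinnamon": [0.1, 0.9, 0.3],
    "nutmeg": [0.15, 0.85, 0.35],
    "glass": [0.3, 0.3, 0.9],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def match_flavors(review: str, threshold: float = 0.9) -> List[Tuple[str, str, str]]:
    """Return (review word, profile flavor, primary flavor) for every match above the threshold."""
    hits = []
    for raw in review.split():
        token = raw.strip(".,;:!?").lower()
        if token not in VECTORS:
            continue  # no vector, so no similarity can be computed
        for flavor_word, primary in FLAVOR_PROFILE.items():
            if cosine(VECTORS[token], VECTORS[flavor_word]) >= threshold:
                hits.append((token, flavor_word, primary))
    return hits


print(match_flavors("Notes of lime and nutmeg in a heavy glass."))
```

Raising the threshold makes matching stricter; lowering it lets more loosely related words through.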

Results and Discussion

After this iterative process, we arrived at the Tasting Intelligence Flavor Profile. The heatmap below visualizes the similarity between the words (flavors) in the profile, showing how closely they are related.
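
The data behind such a heatmap is a matrix of pairwise similarities between the profile's words. The sketch below computes that matrix for four words with hypothetical toy vectors and marks which cells reach a chosen threshold (0.9 here, an arbitrary value); drawing the matrix as a colored heatmap with a plotting library is omitted.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy vectors for four profile words.
VECTORS: Dict[str, List[float]] = {
    "lemon": [0.9, 0.1, 0.3],
    "lime": [0.8, 0.2, 0.35],
    "cinnamon": [0.1, 0.9, 0.3],
    "nutmeg": [0.15, 0.85, 0.35],
}
WORDS = ["lemon", "lime", "cinnamon", "nutmeg"]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def similarity_matrix(words: List[str]) -> List[List[float]]:
    """Row i, column j holds the similarity between words[i] and words[j]."""
    return [[cosine(VECTORS[a], VECTORS[b]) for b in words] for a in words]


def threshold_mask(matrix: List[List[float]], threshold: float) -> List[List[int]]:
    """1 where the similarity reaches the threshold, 0 elsewhere."""
    return [[int(s >= threshold) for s in row] for row in matrix]


matrix = similarity_matrix(WORDS)
for word, row in zip(WORDS, matrix):
    print(f"{word:>9} " + " ".join(f"{s:.2f}" for s in row))
print(threshold_mask(matrix, 0.9))
```

The matrix is symmetric, and with the words ordered by primary flavor, related words form blocks along the diagonal.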

Our analysis shows that the words are highly similar both to each other and to other words associated with the same primary flavor. This suggests that the software can extract distinct flavors from the words used in a description.
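
One way to quantify this observation, sketched here as an assumption rather than the analysis actually used, is to compare the average similarity of word pairs within the same primary flavor with the average across different primary flavors. The groups, vectors and the `within_and_between` helper are hypothetical.

```python
import math
from itertools import combinations, product
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy vectors and primary-flavor groups.
VECTORS: Dict[str, List[float]] = {
    "lemon": [0.9, 0.1, 0.3],
    "lime": [0.8, 0.2, 0.35],
    "cinnamon": [0.1, 0.9, 0.3],
    "nutmeg": [0.15, 0.85, 0.35],
}
GROUPS: Dict[str, List[str]] = {"fruity": ["lemon", "lime"], "spicy": ["cinnamon", "nutmeg"]}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def within_and_between(groups: Dict[str, List[str]]) -> Tuple[float, float]:
    """Mean similarity of pairs in the same group, and of pairs from different groups."""
    within = [
        cosine(VECTORS[a], VECTORS[b])
        for words in groups.values()
        for a, b in combinations(words, 2)
    ]
    between = [
        cosine(VECTORS[a], VECTORS[b])
        for g1, g2 in combinations(groups.values(), 2)
        for a, b in product(g1, g2)
    ]
    return sum(within) / len(within), sum(between) / len(between)


w, b = within_and_between(GROUPS)
print(f"within-group mean: {w:.3f}, between-group mean: {b:.3f}")
```

A within-group mean clearly above the between-group mean indicates that the words of each primary flavor hang together.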

Interesting cases: fruity and spicy

In culinary contexts, spices can be closely related to fruits. Consider lemon zest: lemon zest and peel share significant similarities with spices. According to the Collins Dictionary, zest is 'the outer skin of a lemon, orange, or lime used to flavor cakes or drinks,' while spices are plant parts or powders added to food for flavor, such as cinnamon, ginger, and paprika. Both zest and spices are extracts that enhance flavor. In our statistical model, lemon zest, lemon peel, and charred mandarin peel are more similar to spices, so the profile categorizes them as spicy.
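
How a multi-word term can end up under one primary flavor can be sketched as follows, assuming (hypothetically) that each primary flavor is represented by the average vector of a few anchor words and that a term is assigned to the primary flavor it is most similar to. As with spaCy's default `Span.vector`, the term's vector is the average of its word vectors. The toy vectors do not reproduce spaCy's real ones, so this shows only the mechanism, not the lemon zest result above.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy vectors and anchor words per primary flavor.
VECTORS: Dict[str, List[float]] = {
    "apple": [0.85, 0.1, 0.2],
    "cherry": [0.8, 0.2, 0.1],
    "cinnamon": [0.1, 0.9, 0.3],
    "ginger": [0.2, 0.85, 0.25],
    "lemon": [0.9, 0.1, 0.3],
    "root": [0.3, 0.5, 0.6],
}
PRIMARY_ANCHORS: Dict[str, List[str]] = {
    "fruity": ["apple", "cherry"],
    "spicy": ["cinnamon", "ginger"],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def average(vectors: List[List[float]]) -> List[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(term: str) -> Tuple[str, Dict[str, float]]:
    """Assign a (possibly multi-word) term to the most similar primary flavor."""
    known = [VECTORS[w] for w in term.lower().split() if w in VECTORS]
    if not known:
        raise ValueError(f"no vectors for {term!r}")
    term_vec = average(known)  # phrase vector = mean of its word vectors
    scores = {
        flavor: cosine(term_vec, average([VECTORS[w] for w in words]))
        for flavor, words in PRIMARY_ANCHORS.items()
    }
    return max(scores, key=scores.get), scores


for term in ("lemon", "ginger root"):
    print(term, "->", classify(term)[0])
```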

Conclusion

The model relies on similarity analysis with spaCy's word vectors; fortunately, these existing descriptions of word similarity proved invaluable. The algorithm builds tasting notes by measuring word-to-word similarity and counting matches: whenever a word in the text matches a flavor from the profile, that flavor contributes to its primary flavor. Based on these counts, the software generates an AI-projected flavor map.
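
The counting step can be sketched as turning the primary flavors of all matches in a text into each flavor's share of the total. The list of primary flavors (`PRIMARY_FLAVORS`) and the normalization into shares are assumptions for illustration; the article does not specify how the counts are scaled for the flavor map.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical subset of primary flavors on the wheel.
PRIMARY_FLAVORS = ["fruity", "spicy", "woody"]


def flavor_profile(matched_primaries: Iterable[str]) -> Dict[str, float]:
    """Convert the primary flavor of each match into each flavor's share of all matches."""
    counts = Counter(matched_primaries)
    total = sum(counts[f] for f in PRIMARY_FLAVORS)
    if total == 0:
        return {f: 0.0 for f in PRIMARY_FLAVORS}  # no matches: an empty profile
    return {f: counts[f] / total for f in PRIMARY_FLAVORS}


# Primary flavors of hypothetical matches found in one review.
matches = ["fruity", "spicy", "fruity"]
print(flavor_profile(matches))
```

Keeping flavors with zero matches in the output gives every text the same set of axes, which makes flavor maps of different reviews directly comparable.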

The data-driven tasting wheel is designed for simplicity and optimized specifically for our software, so that it can analyze text and identify tastes and flavors accurately. Other software may produce differently optimized flavor wheels. In addition, our wheel incorporates words that were not initially in the flavor list, to capture similarities with related words.