GoFlek






Qualitative Survey Data Analysis



Using machine learning for sentiment classification and summarization is among the most widespread generative AI applications of recent years. With the advent of large language models (LLMs), its adoption has spread to the general public as a tool for everyday life and work. However, major challenges remain, especially when “qualitative” data must be processed within a corporate environment.


At the end of 2024, a Canadian research firm approached us with large quantities of qualitative data from surveys conducted across the country. The data included textual answers to open-ended questions from a heterogeneous mix of respondents, with varying styles, comment lengths and languages (French and English).


The client, overwhelmed by the amount of time it would take to skim through and analyze the textual answers, was looking for a machine-assisted solution for sentiment detection and summarization.


All data was anonymized during implementation and production.


Client ... tight spot

With no in-house expertise and unable to use cloud-based tools for both technical and privacy-related reasons, the client was in a tight spot. On the technical side, the textual comment data was stored and structured in a way that could not be easily extracted and then streamed to cloud APIs. On the privacy side, data confidentiality required in-house protection and processing.


All this necessitated a tailor-made on-premise solution that could both automate extraction and use natural language processing to classify sentiment and then aggregate and summarize comments.


GoFlek ... to rescue
GoFlek engineers, working closely with the client, tackled this challenge over a period of one month at the end of 2024. The result was two processing pipelines, one for Sentiment Classification and the other for Summarization, both fully automated and integrated.

Sentiment Classification … how it works
First, we coded a set of algorithms to extract and structure the survey answers so that each text could be streamed independently to the large language model (LLM) for sentiment detection and classification. Second, we ran the whole pipeline and streamed the output of the Sentiment LLM back into an Excel sheet. The output assigned each comment a sentiment score and class: comments scoring between 0 and 3 were categorized as negative, those between 7 and 10 as positive, and those in between as neutral.
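The score-to-class rule described above can be sketched in a few lines of Python. This is only an illustration, not the production code: the names `score_to_class` and `ScoredComment` are hypothetical, the sample comments are invented, and the boundaries are assumed to be inclusive (3 counts as negative, 7 as positive), which the description does not state explicitly.

```python
from dataclasses import dataclass

# Assumption: both thresholds are inclusive.
NEGATIVE_MAX = 3   # scores 0..3 are treated as negative
POSITIVE_MIN = 7   # scores 7..10 are treated as positive


def score_to_class(score: float) -> str:
    """Map a 0-10 sentiment score to a class label."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score <= NEGATIVE_MAX:
        return "negative"
    if score >= POSITIVE_MIN:
        return "positive"
    return "neutral"


@dataclass
class ScoredComment:
    """One survey comment with the score returned by the sentiment model."""
    comment_id: str
    text: str
    score: float

    @property
    def sentiment(self) -> str:
        return score_to_class(self.score)


# Invented sample rows, standing in for the model output.
rows = [
    ScoredComment("c1", "Service was slow and unhelpful.", 2),
    ScoredComment("c2", "It was fine, nothing special.", 5),
    ScoredComment("c3", "Très satisfait du service.", 9),
]
labels = [(r.comment_id, r.sentiment) for r in rows]
print(labels)
```

Keeping the thresholds as named constants makes it easy to adjust the neutral band if the analysts decide the cut-offs should move.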

For additional analysis, we then uploaded the Excel data into the Flek Machine. There, we ran our exploration algorithms to plot the probability distribution of the scores, which helped us interpret the classification results.
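As a rough illustration of what a probability distribution of scores looks like, the sketch below computes the empirical distribution with the Python standard library and renders it as a text bar chart. It does not use or represent the Flek Machine API. The function names and sample scores are hypothetical, and integer scores from 0 to 10 are an assumption.

```python
from collections import Counter


def score_distribution(scores: list[int]) -> dict[int, float]:
    """Return the empirical probability of each integer score from 0 to 10."""
    if not scores:
        raise ValueError("no scores given")
    counts = Counter(scores)
    total = len(scores)
    return {s: counts.get(s, 0) / total for s in range(11)}


def text_histogram(dist: dict[int, float], width: int = 40) -> str:
    """Render the distribution as a plain-text bar chart, one line per score."""
    lines = []
    for score, p in dist.items():
        bar = "#" * round(p * width)
        lines.append(f"{score:>2} | {bar} {p:.2f}")
    return "\n".join(lines)


# Invented sample scores, assumed to be integers in 0..10.
scores = [2, 2, 5, 7, 9, 9, 9, 10]
dist = score_distribution(scores)
print(text_histogram(dist))
```

A distribution with most of its mass near the 3 and 7 boundaries would suggest that many comments sit close to a class change and deserve a closer look by an analyst.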

Comment Summarization … how it works
Reusing the extraction and text-cleaning algorithms from the first pipeline, the second pipeline grouped the comments from each section and then ran another level of data preparation, since the comments came from a heterogeneous mix of respondents with varying styles, comment lengths and languages (French and English). Finally, the results were streamed to the Summarization LLM, and its output was written back into the Excel sheet.
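The grouping and preparation step might look like the following sketch. It is a simplified illustration under stated assumptions: the input is taken to be `(section, comment)` pairs, cleaning is reduced to whitespace normalization, and the helper names and prompt wording are hypothetical rather than those of the actual pipeline. The call to the Summarization LLM itself is left out.

```python
import re
from collections import defaultdict


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim the comment."""
    return re.sub(r"\s+", " ", text).strip()


def group_by_section(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group cleaned, non-empty comments under their survey section."""
    groups: dict[str, list[str]] = defaultdict(list)
    for section, text in records:
        cleaned = clean(text)
        if cleaned:  # drop comments that are empty after cleaning
            groups[section].append(cleaned)
    return dict(groups)


def build_prompt(section: str, comments: list[str]) -> str:
    """Build one summarization request per section (hypothetical wording)."""
    header = (
        f"Summarize the survey comments for section '{section}'. "
        "Comments may be written in French or English."
    )
    return header + "\n" + "\n".join(f"- {c}" for c in comments)


# Invented sample records.
records = [
    ("Service", "  Slow   response\n times. "),
    ("Service", "Très bon accueil."),
    ("Pricing", "Too expensive."),
    ("Pricing", "   "),
]
groups = group_by_section(records)
prompt = build_prompt("Service", groups["Service"])
print(prompt)
```

Building one request per section keeps each summary scoped to a single topic and keeps the input to the model bounded.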

Of course, just like Sentiment Classification, the Comment Summarization tool was built to accelerate and assist the human analysts and not to replace them.


By automating the analysis and integrating it into the client's workflow, while guaranteeing total data privacy by running both pipelines in-house, we were able to meet the client's needs and tight constraints. The build took about 2 weeks to complete, followed by 1 week of testing and another week to put the solution into production.



© GoFlek Inc.