We were a team of four students, all in the MS in Data Science program. Our goal for the GQP was to analyze the knowledge base articles to determine what is covered and where there is duplication, and to develop a strategy for improving the documentation. We were provided with the dataset, a corpus of more than 19,000 documents covering different articles. Our approach was to apply unsupervised learning to the data: we computed tf-idf features, reduced their dimensionality with PCA, and clustered the documents in the lower-dimensional space.
For each cluster, we applied topic modeling to extract the context and identified the topics and areas that the documents in the corpus covered. This analysis adds real value for the business: it shows where the documentation is redundant, where it needs to improve, and how it can be made more efficient for customers using the products. The right documentation saves the business time and money in various ways.
KGF, Karnataka, India
Prof. Fatemeh Emdad
Working on a real-world data science problem, with mentors on-site guiding us throughout the project, was a great learning experience; it was the most valuable part. The experience adds great value to my resume for my prospective career in data science and gives me a sense of how to approach data science problems from scratch. After I graduate, I want to work in the field of data science to contribute and learn more. If everything works out well, I might come back for a PhD.