Swecha to Develop TELUGU LLM Corpus & Culture Portal

Hyderabad, 20th May 2024: Swecha, a non-profit organisation dedicated to promoting the Free Software and Free Knowledge movements, announced a massive internship program on Artificial Intelligence (AI), the ‘SUMMER OF AI’, for over a lakh engineering students this summer. The program aims to equip them with job-ready AI skills while helping Swecha develop a Telugu-language-centric LLM. The initiative is being undertaken by Swecha in collaboration with IIIT Hyderabad; Ozone tel, a leading provider of cloud communication solutions; Meta; and TASK. Chaitanya Chokkareddy, CTO, OZONE TEL Communications Pvt Ltd.; Y Kiran Chandra, Founder, Swecha; and Praveen Chandra, Secretary, Swecha, briefed the media about the project at a press conference at Swecha today.

This initiative is significant, considering that Indian-language and India-centric LLMs (Large Language Models) are virtually non-existent. India, with its rich culture and a population that constitutes one-sixth of the world, would greatly benefit from having its own LLMs. Most Indian languages are considered low-resource languages, which makes it challenging to develop LLMs for them. A significant amount of foundational knowledge needs to be compiled and digitized to create the necessary digital data for these languages.

Today, AI is transforming the knowledge landscape by introducing new job functions such as dataset compilation, data cleaning, data labelling, and dataset management. These roles are essential for building and refining basic models and for ensuring the accuracy and effectiveness of AI applications.

Swecha aims to capitalize on the vast talent pool of engineering students graduating in India and ready to enter the industry, by training them in AI. This presents an opportunity to create a large pool of trained AI engineers, extending well beyond the small group of researchers and developers specialized in deep models.

This project, SUMMER OF AI, attempts to combine the two objectives. It is a very large-scale internship program in which first- and second-year engineering students are trained in the basics of AI and then engaged in very large-scale data collection through interviews. The project aims to interview people in villages and towns and to collect information and knowledge on folk traditions and local skills, including Telugu folk tales, songs, food, local history of places and more.

The approach of the project is to collect speech, transcribe it, and create a dataset that serves both as a speech dataset and as the base for an LLM. In addition, the team is working with a few large libraries and the Telugu Academy to ingest a large number of books. The work will be carried out through month-long internships for 100k interns; the first batch has started with 10k interns. Tools are being built to help with data collection at this scale. At the end of the collection, backend tools will be needed to create the dataset and to publish the information on a Teluguwiki-like portal. On successful completion of this project, a similar approach will be adopted to collect data for other languages and regions.

By Prabhat
