CS50 Project: Make a Text Analyzer on Your Own in C

I just completed a text analyzer project from Harvard’s CS50 course, and so can you!!!

Harsh Prateek
8 min read · Sep 21, 2022
An image of the Harvard campus.
Photo by Aubrey Odom-Mabey on Unsplash

To give a little background, I started the CS50: Introduction to Computer Science course at Harvard University a few weeks ago.

It is a beginner-level Computer Science course at Harvard, also available online at edx.org for free.

It has video lectures on YouTube, and each week it also provides some coding problems, which are available on the official website of the course.

To get the edX certificate, you have to complete the coding problems and make them pass the test cases that the CS50 team provides.

This is also a code problem from that course. Let’s get started😊.

Working of the Project in brief:-

So, what we are making here is a text analyzer. We will take some text as input from the user, run our calculations on it, and output the approximate school grade at which that text should be taught. Here is an image of the output of the project.

Example of the working of the program

The passage I used above is for 8th graders and above, according to our little program. It may look complex and hard, but if you stick with this tutorial till the very end, you will have a working text analyzer yourself.

Initial Setup:-

Since we are making this project for the CS50 course, I am going to use their library and their online IDE.

It makes things a lot easier and this way, we can focus more on solving a problem rather than thinking about our programming language.

You may need to sign in using a GitHub account, but after that, CS50 has already taken care of every step we can think of. This is what it will look like:-

The codespace provided by CS50 to code for their course

Now, make a file with the .c extension using the file explorer on the left. This is the file in which we will write our code and which we will compile later.

Add the Libraries:-

C Libraries required in the project

These are all the libraries that we need to make this project. Remember, the other libraries are part of standard C and work with any C compiler, but the cs50.h library is written specifically for this course. It comes preinstalled in the code editor provided by the CS50 team; anywhere else, you would have to install the CS50 library yourself before it works.

To add a library to a C program, we use this syntax at the start of the file, since the compiler needs to see the declarations in these files before it compiles the rest of the program.

#include <library_name.h>

The #include directive tells the compiler to pull these header files into the program and treat them as a part of our code, even though we are not the ones who wrote them.
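Since the original image is not reproduced here, this is only a minimal sketch of what #include buys us. It uses two standard headers, and starts_with_letter is a hypothetical helper I made up for the demo, not part of the project. Without the two #include lines, the compiler would not know what strlen() or isalpha() are.

```c
#include <ctype.h>  // declares isalpha()
#include <string.h> // declares strlen()
// #include <cs50.h> // declares get_string(); only where the CS50 library is installed

// Hypothetical helper: returns 1 if the text is non-empty and begins with a letter.
int starts_with_letter(const char *text)
{
    // strlen() comes from string.h, isalpha() from ctype.h.
    // The cast to unsigned char keeps isalpha() well defined for any char value.
    return strlen(text) > 0 && isalpha((unsigned char) text[0]) != 0;
}
```

Calling starts_with_letter("Harvard") gives 1, while starts_with_letter("50 cents") gives 0.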

Declare the Functions:-

In C programming, it is often considered good practice to keep your main function at the top of the program. It makes it easier for other programmers to understand what your codebase is doing, and it also makes it easier for you to maintain the codebase when it grows too big.

Still, for the compiler’s sake, we have to at least declare all the functions we are going to use before we call them, like a teaser tells us that a movie is going to be released. This declaration is called a function prototype. Here is how we do it:-

All the custom functions required in the Project

Don’t get me wrong, we could also do these tasks inside the main function, but it is considered better to move tasks that do not need to live in main into their own functions. It makes the codebase look clean and easy to maintain.

You can name the functions whatever you want. Each of them takes one string argument and has a return type that matches the value it returns.
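Here is a rough sketch of the prototype idea. The four project signatures are my assumptions (I use const char * where the CS50 codespace would let you write string), and count_char and spaces_in_sample are hypothetical stand-ins so the demo stays small. Notice that count_char is called before its body appears, which only compiles because of the prototype above it.

```c
#include <string.h>

// Prototypes for the project's functions (assumed signatures; bodies come later).
int count_letters(const char *text);
int count_words(const char *text);
int count_sentence(const char *text);
float evaluate(const char *text);

// Prototype for a hypothetical stand-in used only in this demo.
int count_char(const char *text, char target);

// This caller sits above the definition of count_char, just like main would.
int spaces_in_sample(void)
{
    return count_char("make it compile", ' ');
}

// The definition can now live anywhere below its prototype.
int count_char(const char *text, char target)
{
    int count = 0;
    size_t length = strlen(text);
    for (size_t i = 0; i < length; i++)
    {
        if (text[i] == target)
        {
            count++;
        }
    }
    return count;
}
```

If you delete the count_char prototype, the compiler complains at the call inside spaces_in_sample, because at that point it has not seen the function yet.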

Let’s make all of them first, then we would proceed to the main function. Things are going to be much more fun from now on😉.

count_words function:-

Image of the function

So, what we have here in the image is an example of how functions are made in the C language.

In C, a function is declared with three things: a return type, which signifies the kind of value it returns; the name of the function; and the argument list, which holds the input values the function gets when it is called.

What this function does is take a string argument and count the number of spaces in it. Since words in English are separated by spaces, we can get the number of words from the number of spaces.

To do this, we initialize a counter variable of integer type and calculate the length of the string with the strlen() function.

Now, all we have to do is loop over the string and count the spaces. For this, we check each character, and if it is a space, we increase our counter by one.

While returning, we have to increase the counter by one to accommodate the first word of the text that does not have a space before it.
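The original code was an image, so this is my reconstruction of the steps described above rather than the exact code. I use const char * in place of CS50's string type so it compiles outside the codespace too.

```c
#include <string.h>

// Counts words by counting spaces, then adding one for the first word.
int count_words(const char *text)
{
    int spaces = 0;
    size_t length = strlen(text);

    // Check each character and bump the counter on every space.
    for (size_t i = 0; i < length; i++)
    {
        if (text[i] == ' ')
        {
            spaces++;
        }
    }

    // The first word has no space before it, so add one.
    return spaces + 1;
}
```

Keep in mind that this approach trusts the input: an empty string still counts as one word, and two spaces in a row count as an extra word.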

count_sentence function:-

Similar to our count_words function, we need to make a function that can count the number of sentences in our text.

For our problem’s sake, we just need to remember that a sentence is terminated by a period “.”, an exclamation mark “!”, or a question mark “?”.

Thus we would follow a similar approach as we did in counting the words in the text. We would count the number of these symbols present in our text and that would be equal to the number of sentences present in our text. This is the function:-

The image of the count_sentence function

This time, we do not need to increase the counter by one on return, since every sentence, including the first one, ends with exactly one of these symbols and is already accounted for in our count.
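A minimal sketch of that idea, again reconstructed from the description since the image is not available. The name count_sentence follows the heading above, and const char * stands in for CS50's string.

```c
#include <string.h>

// Counts sentences by counting the characters that end one: '.', '!' and '?'.
int count_sentence(const char *text)
{
    int sentences = 0;
    size_t length = strlen(text);

    for (size_t i = 0; i < length; i++)
    {
        if (text[i] == '.' || text[i] == '!' || text[i] == '?')
        {
            sentences++;
        }
    }

    // No +1 here: every sentence contributes its own terminator.
    return sentences;
}
```

This simple rule will overcount things like "Mr. Smith" or "...", since each period counts as a sentence end.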

count_letters function:-

To count the letters, we need an inbuilt function from the ctype library (isalpha() does the job). The rest of the process is the same as in the other two functions.

All we need to do is make a counter variable, calculate the length of the string passed in as the argument, and then loop over each character of the string to find the letters and count them. This is the code equivalent of the same:-

count_letter function
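Here is a sketch of how this could look. I am assuming isalpha() from ctype.h is the inbuilt function meant here, and I use const char * instead of CS50's string.

```c
#include <ctype.h>
#include <string.h>

// Counts the alphabetic characters in the text, ignoring digits,
// spaces and punctuation.
int count_letters(const char *text)
{
    int letters = 0;
    size_t length = strlen(text);

    for (size_t i = 0; i < length; i++)
    {
        // The cast to unsigned char keeps isalpha() well defined for any char value.
        if (isalpha((unsigned char) text[i]))
        {
            letters++;
        }
    }

    return letters;
}
```

For example, "Hello, World!" has 10 letters, because the comma, the space and the exclamation mark are skipped.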

evaluate Function:-

This is the function in which all the computation and evaluation would happen for our program. It calls all the other functions to analyze the text, then runs its calculations on their results, and at the end of it, would return the final answer.

This function can be considered the most important function after the main function. Here is how it looks in the code:-

image of the evaluate function

Since it returns a decimal value, it has a return type of float. We make three variables here, call the three functions we made earlier, and store their results in them.

Next, we have to make two more variables: one for the average number of letters per 100 words, and the other for the average number of sentences per 100 words.

From these two values, we get the index of the grade our text belongs to, using the formula given in the image above (CS50 specifies the Coleman-Liau index for this problem).
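Since the image with the formula is not reproduced here, this sketch uses the Coleman-Liau index as CS50 states it for this problem: index = 0.0588 * L - 0.296 * S - 15.8, where L is letters per 100 words and S is sentences per 100 words. The counting helpers are compact versions of the ones described earlier, and the variable names are my own.

```c
#include <ctype.h>
#include <string.h>

int count_letters(const char *text)
{
    int n = 0;
    for (size_t i = 0, len = strlen(text); i < len; i++)
    {
        n += isalpha((unsigned char) text[i]) ? 1 : 0;
    }
    return n;
}

int count_words(const char *text)
{
    int n = 0;
    for (size_t i = 0, len = strlen(text); i < len; i++)
    {
        n += (text[i] == ' ') ? 1 : 0;
    }
    return n + 1; // the first word has no space before it
}

int count_sentence(const char *text)
{
    int n = 0;
    for (size_t i = 0, len = strlen(text); i < len; i++)
    {
        n += (text[i] == '.' || text[i] == '!' || text[i] == '?') ? 1 : 0;
    }
    return n;
}

// Returns the unrounded Coleman-Liau index of the text.
float evaluate(const char *text)
{
    int letters = count_letters(text);
    int words = count_words(text);
    int sentences = count_sentence(text);

    // Cast to float first; int / int would throw away the fraction.
    float L = (float) letters / words * 100.0f;   // letters per 100 words
    float S = (float) sentences / words * 100.0f; // sentences per 100 words

    return 0.0588f * L - 0.296f * S - 15.8f;
}
```

As a quick check by hand, "One fish. Two fish. Red fish. Blue fish." has 29 letters, 8 words and 4 sentences, so L = 362.5 and S = 50, which gives an index of about -9.3, well below grade 1.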

main Function:-

This is it guys, we have made it to the most important part of our program. The main function can be considered the program itself, while all the other functions can be considered its parts.

This is the function where the execution of the program starts. Here we are supposed to take input from the user, send it to the evaluate function, and then print the result we get from it. Here is the code of the main function:-

image of the main function

Like I said earlier, we first have to take string input from the user, and for that we can use the get_string() function from the cs50.h library. Then we call the evaluate function, giving it the string we got from the user. We also need to round the result to the nearest whole number, as we are required to print an integer grade as the answer.

When we get the result back, we can print the result according to the conditions set by the CS50 team since we are making it for the course, otherwise, our work is done by this point.

Here, I printed the grade if it lies between 1 and 16 and gave messages accordingly if it is lower or higher than this range.
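To show the rounding and the three cases without needing the CS50 library, here is a sketch of just the decision part. format_grade and round_to_int are hypothetical helpers of mine, and the messages and the "16 or higher" boundary are assumptions based on the CS50 problem statement. In the real main function you would probably use round() from math.h instead of my small helper.

```c
#include <stdio.h>

// Rounds to the nearest integer, halves away from zero (like round() in math.h).
int round_to_int(float value)
{
    return (int) (value >= 0 ? value + 0.5f : value - 0.5f);
}

// Writes the message for a given index into out (at most size bytes).
void format_grade(float index, char *out, size_t size)
{
    int grade = round_to_int(index);

    if (grade < 1)
    {
        snprintf(out, size, "Before Grade 1");
    }
    else if (grade >= 16)
    {
        snprintf(out, size, "Grade 16+");
    }
    else
    {
        snprintf(out, size, "Grade %d", grade);
    }
}
```

Inside main, the flow would then be: read the text with get_string(), pass it to evaluate, hand the result to something like format_grade, and print the message with printf().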

Conclusion:-

So, this was it guys, we solved a so-called hard problem from the CS50 course pretty easily. I would highly recommend that everyone join the course if you too want to grasp the fundamentals of Computer Science but don’t know where to start.

For the year 2022, the fall session has already begun but you can take the course any time you want at your own pace. The course is completely free and is a great way to start your career as a software developer in the industry.

Follow me here on Medium, as I am also following this course and will be posting things about it that I think are important for everyone to know.

Hope you have a wonderful day ahead😉.


Hi, I am Harsh and I write about coding and learning techniques. I am a student myself and would love to tell everyone my secrets for coding and learning.