CS50 Project: Make a Text Analyzer on Your Own in C.
I just completed a text analyzer project from Harvard’s CS50 course, and so can you!!!
To give a little background, I started Harvard University's CS50: Introduction to Computer Science course a few weeks ago.
It is Harvard's beginner-level computer science course, and it is also available online for free at edx.org.
To get the edX certificate, you have to complete the coding projects and make them pass the test cases that the CS50 team provides.
This is also a code problem from that course. Let’s get started😊.
How the Project Works, in Brief:-
So, what we are making here is a text analyzer. It takes some text as input from the user, runs a few calculations on it, and outputs the approximate school grade at which that text should be taught. Here is an image of the output of the project.
According to our little program, the passage I used above is suitable for 8th graders and above. The project may look complex, but if you stick with this tutorial until the very end, you will have a working text analyzer of your own. Let's get started😊.
Since we are making this project for the CS50 course, I am going to use their library and their online IDE.
It makes things a lot easier, and this way we can focus on solving the problem rather than on setting up our programming environment.
You may need to sign in with a GitHub account, but after that, CS50 has already taken care of all the setup. This is what it will look like:-
Now, create a file with the .c extension using the file explorer on the left. This is the file in which we will write our code and which we will compile later.
Add the Libraries:-
These are all the libraries we need for this project. Remember, all the other libraries are part of the C standard library and work with any C compiler, but cs50.h is designed specifically for this course. It comes preinstalled in the CS50 online IDE; to use it anywhere else, you have to install the CS50 library yourself.
To add a library to a C program, we use the following syntax at the top of the file, because the compiler has to see these declarations before it compiles the rest of the program.
The #include directive tells the preprocessor to copy the contents of these header files into our program, so we can use the functions they declare as if they were part of our code, even though we did not write them ourselves.
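Since the original code image is not reproduced here, below is a sketch of the header lines such a program would likely need, assuming the functions used later in this tutorial: strlen(), isalpha(), round() and printf(). The cs50.h line is left commented out so the snippet also compiles outside the CS50 IDE.

```c
// Standard C headers: these ship with every C compiler.
#include <ctype.h>   // isalpha(): checks whether a character is a letter
#include <math.h>    // round(): rounds the grade index to a whole number
#include <stdio.h>   // printf(): prints the result
#include <string.h>  // strlen(): length of the input text

// Course-specific header, preinstalled in the CS50 IDE (or available after
// installing the CS50 library yourself). It provides get_string() and the
// `string` type.
// #include <cs50.h>
```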
Declare the Functions:-
In C programming, it is considered good practice to put your main function at the top of the program. This makes it easier for other programmers to understand what your code does, and easier for you to maintain the codebase as it grows.
However, because the compiler reads the file from top to bottom, we must at least declare every function before main uses it, much like a movie teaser announces that a movie is coming out. Such a declaration is called a function prototype. Here is how we do it:-
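A sketch of what these prototypes might look like. count_word and evaluate are names this article uses; count_sentences and count_letters are hypothetical names for the other two helpers. The sketch uses const char * where CS50 code would write string, since CS50's string is a typedef for char *.

```c
// Function prototypes: each one announces a function's return type, name
// and parameters, so main() can call it before its definition appears.
// count_sentences and count_letters are hypothetical names.
int count_word(const char *text);       // number of words in the text
int count_sentences(const char *text);  // number of sentences
int count_letters(const char *text);    // number of letters
float evaluate(const char *text);       // grade index of the text
```

The definitions can then go anywhere below main.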
Don't get me wrong, we could also do these tasks in the main function, but it is considered better to move tasks that do not need to live in main into functions of their own. This keeps the codebase clean and easy to maintain.
You can name the functions whatever you want; each of them takes one string argument and has a return type that matches the value it returns.
Let's write all of them first, then we will move on to the main function. Things are going to be much more fun from now on😉.
What we have here in the image is an example of how a function is defined in C.
In C, a function definition starts with a return type, which states the type of value the function returns, followed by the function's name and its parameter list, which holds the input values the function receives when it is called.
This function takes a string argument and counts the number of spaces in it. Since words in English are separated by spaces, counting the spaces gives us the number of words.
To do this, we initialize a counter variable of type int and calculate the length of the string with the strlen() function.
Now, all we have to do is loop over the string and count the spaces.
For this, we check each character, and if it is a space, we increase our counter by one.
When returning, we add one to the counter to account for the first word of the text, which has no space before it.
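Here is a sketch of count_word following the steps above. It assumes words are separated by single spaces, with no leading or trailing spaces in the text.

```c
#include <string.h>

// Counts words by counting spaces, then adding one.
// Assumes single spaces between words and no leading/trailing spaces.
int count_word(const char *text)
{
    int spaces = 0;
    size_t length = strlen(text);  // compute the length once, not on every iteration

    for (size_t i = 0; i < length; i++)
    {
        if (text[i] == ' ')
        {
            spaces++;
        }
    }

    // n words are separated by n - 1 spaces, so add one back.
    return spaces + 1;
}
```

For example, "One fish. Two fish. Red fish. Blue fish." contains 7 spaces, so the function reports 8 words. An empty string would still report one word; this sketch does not special-case it.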
Similar to our count_word function, we need a function that counts the number of sentences in our text.
For our problem's sake, we only need to remember that a sentence ends with a period ".", an exclamation mark "!", or a question mark "?".
So we follow a similar approach to counting words: we count how many of these symbols appear in the text, and that count equals the number of sentences. This is the function:-
This time, we do not add one to the counter on return, because every sentence, including the last one, ends with exactly one of these punctuation marks.
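A sketch of the sentence counter described above. count_sentences is a hypothetical name; the logic is the same loop as count_word with a different test on each character.

```c
#include <string.h>

// Counts sentences by counting terminal punctuation: '.', '!' and '?'.
// Each sentence ends with exactly one of these, so no "+ 1" is needed.
int count_sentences(const char *text)
{
    int sentences = 0;
    size_t length = strlen(text);

    for (size_t i = 0; i < length; i++)
    {
        if (text[i] == '.' || text[i] == '!' || text[i] == '?')
        {
            sentences++;
        }
    }

    return sentences;
}
```

One consequence of this simple rule: "Mr. Smith left." is counted as two sentences, because the period in "Mr." is treated as a sentence ending.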
To count the letters, we need a built-in function from the ctype.h library, isalpha(), which checks whether a character is a letter; the rest of the process is the same as in the other two functions.
All we need to do is create a counter variable, calculate the length of the string passed as the argument, and then loop over each character of the string, counting the ones that are letters. This is the code equivalent:-
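A sketch of the letter counter, assuming isalpha() is the ctype.h function in question. count_letters is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

// Counts letters using isalpha() from ctype.h. In the default "C" locale,
// that means a-z and A-Z; digits, spaces and punctuation are ignored.
int count_letters(const char *text)
{
    int letters = 0;
    size_t length = strlen(text);

    for (size_t i = 0; i < length; i++)
    {
        // isalpha() expects a value representable as unsigned char (or EOF),
        // so cast before calling it.
        if (isalpha((unsigned char) text[i]))
        {
            letters++;
        }
    }

    return letters;
}
```

For example, "Hello, world!" has 10 letters; the comma, space and exclamation mark are skipped.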
This is the function in which all the computation and evaluation happens. It calls the other functions to analyze the text, runs its calculations on their results, and finally returns the answer.
After main, this can be considered the most important function. Here is how it looks in code:-
Since it returns a decimal value, its return type is float. Inside it, we create three variables, call the three functions we made earlier, and store their results in them.
Next, we compute two more values: the average number of letters per 100 words, and the average number of sentences per 100 words.
From these two values, we get the grade index of our text using the Coleman-Liau formula given in the image above: index = 0.0588 * L - 0.296 * S - 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words.
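A sketch of evaluate applying the Coleman-Liau formula. Compact versions of the three counters are repeated so the block compiles on its own; count_sentences and count_letters are hypothetical names. The float cast matters: without it, letters / words would be integer division and silently drop the fractional part.

```c
#include <ctype.h>
#include <string.h>

// Compact counters (same logic as the earlier sketches).
int count_word(const char *text)
{
    int spaces = 0;
    for (size_t i = 0, n = strlen(text); i < n; i++)
    {
        spaces += (text[i] == ' ');  // a comparison yields 1 or 0
    }
    return spaces + 1;
}

int count_sentences(const char *text)
{
    int sentences = 0;
    for (size_t i = 0, n = strlen(text); i < n; i++)
    {
        sentences += (text[i] == '.' || text[i] == '!' || text[i] == '?');
    }
    return sentences;
}

int count_letters(const char *text)
{
    int letters = 0;
    for (size_t i = 0, n = strlen(text); i < n; i++)
    {
        letters += (isalpha((unsigned char) text[i]) != 0);
    }
    return letters;
}

// Coleman-Liau index: 0.0588 * L - 0.296 * S - 15.8, where
// L = average letters per 100 words, S = average sentences per 100 words.
float evaluate(const char *text)
{
    int letters = count_letters(text);
    int words = count_word(text);
    int sentences = count_sentences(text);

    // Cast to float first so the division is not integer division.
    float L = (float) letters / words * 100;
    float S = (float) sentences / words * 100;

    return 0.0588f * L - 0.296f * S - 15.8f;
}
```

Worked by hand, "Congratulations! Today is your day. You're off to Great Places! You're off and away!" has 65 letters, 14 words and 4 sentences, so L is about 464.29, S is about 28.57, and the index comes out near 3.04, which rounds to grade 3.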
This is it, guys! We have made it to the most important part of our program. The main function can be considered the program itself, while all the other functions are its parts.
This is the function where execution of the program starts. Here, we take input from the user, pass it to the evaluate function, and print the result we get back. Here is the code of the main function:-
As I said earlier, we first take string input from the user, using the get_string() function from the cs50.h library. Then we call the evaluate function, passing it the user's string. We also need to round the result, since we are required to print whole-number grades.
When we get the result back, we print it according to the conditions set by the CS50 team, since we are making this for the course; otherwise, our work is done at this point.
Here, I printed the grade if it lies between 1 and 16, and printed a different message if it is lower or higher than this range.
So, this was it, guys: we solved a supposedly hard problem from the CS50 course pretty easily. I highly recommend the course to anyone who wants to learn the fundamentals of computer science but doesn't know where to start.
For the year 2022, the fall session has already begun, but you can take the course at any time, at your own pace. The course is completely free and is a great way to start a career as a software developer.
Follow me here on Medium, as I am also following this course and will be posting things about it that I think everyone should know.
Hope you have a wonderful day ahead😉.