Natural Language Processing Succinctly®
by Joseph D. Booth

CHAPTER 8

Answering Questions

By this point, we can generate a list of tagged words from a text. Whether you relied on Cloudmersive (or another API) or used the code presented in this book, that tagged list gives us a good chance to have the computer respond to the user's text in a somewhat friendly manner. Our basic method takes the tagged sentence and tries to provide a response.

static public string GetResponse(List<string> Words_,List<string> Tags_)

The word list and the tag list are passed as parameters. By searching through the lists, we should be able to determine what the user is asking, and how to answer.

We now have the tagged phrases and a set of functions that query our data. In this chapter, we will integrate the pieces so you can ask questions and get answers. Figure 12 shows a conversation with the tennis majors application.

Figure 12 – Talking tennis

Who won the 2001 French Open?

Gustavo Kuerten won the men's side and Jennifer Capriati won on the women's side.

Who lost?

Àlex Corretja lost to Gustavo Kuerten and Kim Clijsters lost to Jennifer Capriati.

Who won the first Wimbledon?

Rod Laver won in a Good match over Tony Roche and Billie Jean King won in a Good match over Judy Tegart.

Note: Our dataset only goes back 50 years, so "first" refers to the first Wimbledon in our dataset. The first actual Wimbledon tournament was played in 1877 and won by Spencer Gore.

Who has won the most Wimbledons?

Roger Federer has won 8 Wimbledons.

Getting started

Our goal sounds simple: we need to determine three things to answer the question. First, what is being asked. Second, which function call has the answer. Third, whether we can extract the parameters that function needs from the tagged word list.

What we can answer

The functions from the previous chapter tell us which questions we can answer. Table 8 lists those functions and the parameters we must determine before calling them.

Table 8 – Functions and parameters

Function	Parameters
WhoWon	Tournament, Year, Gender
WhoLost	Tournament, Year, Gender
FinalScore	Tournament, Year, Gender
MostWins	Tournament, Gender
MostLosses	Tournament, Gender
PlayerWins	Tournament, PlayerName
PlayerLosses	Tournament, PlayerName

At minimum, we need the tournament name (all functions expect the tournament as the first parameter). The other parameters will vary, depending on the type of question being asked.
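Table 8 can also be kept as data. The code can then check whether the question (or the remembered conversation) supplies every parameter a function needs before calling it. The sketch below is a hypothetical helper, not part of the book's code. The function and parameter names simply mirror Table 8.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: Table 8 expressed as data, so we can check which
// parameters are still missing before calling one of the functions.
public static class FunctionRequirements
{
    private static readonly Dictionary<string, string[]> Required =
        new Dictionary<string, string[]>
        {
            ["WhoWon"]       = new[] { "Tournament", "Year", "Gender" },
            ["WhoLost"]      = new[] { "Tournament", "Year", "Gender" },
            ["FinalScore"]   = new[] { "Tournament", "Year", "Gender" },
            ["MostWins"]     = new[] { "Tournament", "Gender" },
            ["MostLosses"]   = new[] { "Tournament", "Gender" },
            ["PlayerWins"]   = new[] { "Tournament", "PlayerName" },
            ["PlayerLosses"] = new[] { "Tournament", "PlayerName" },
        };

    // Returns the parameters the function needs that are not in 'known',
    // in the order Table 8 lists them.
    public static List<string> Missing(string function, ISet<string> known)
    {
        if (!Required.TryGetValue(function, out string[] needed))
            throw new ArgumentException("Unknown function: " + function, nameof(function));
        return needed.Where(p => !known.Contains(p)).ToList();
    }
}
```

A missing Gender does not have to block the call. The chapter's approach is to report both the men's and the women's results.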

First question

Our first question is “Who won the 2001 French Open?” Tagging it produces the list of tagged words shown in Figure 13.

Figure 13 – Tagged words

The YEAR and EVENT tags let us know which tournament the question is referring to. We have two parameters that we can pass to any of our first three function calls. Since we don't know gender, we will report results from the tournament for both men and women.

The verb is likely to indicate which function we want: the winner or the loser of the tournament. Since our verb is won, we should report the winner. Since the question word is who, we know the user is expecting a name.

With this information, we can call the function and generate an answer.

Saving information

We also want variables that hold the information the user gives us, because it might be reused as parameters in later questions. Listing 37 shows the static class variables we declare to remember the parameters.

Listing 37 – Remembered variables

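private static int TournamentYear;
private static string Tournament;
private static string Gender = "B";       // "M", "F", or "B" (report both)
private static string PlayerName;
private static string LastVerb;           // last verb seen, such as "WON" or "LOST"
private static Random rnd = new Random(); // used to vary the replies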

Whenever we determine the parameters from a question text, we update the class variables, so the user doesn't have to repeat themselves.

Initializing the variables

If you can determine sensible defaults, it helps to set them in the constructor. For our tennis application, we know there are four tournaments, each played at a different time of year. Listing 38 sets the tournament and year based on the current month. If a user simply asks, “Who won tennis?” the system reports the results of the appropriate tournament for the time of year.

Listing 38 – Default values

static TennisMajors() {
    int MM = DateTime.Now.Month;
    // Contiguous month ranges, so every month gets a default tournament
    if (MM <= 2) { Tournament = "AUS"; }
    else if (MM <= 5) { Tournament = "FRENCH"; }
    else if (MM <= 7) { Tournament = "WIMBLEDON"; }
    else { Tournament = "USOPEN"; }
    TournamentYear = DateTime.Now.Year;
}
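Listing 38 reads DateTime.Now directly, which makes its month ranges awkward to test. One option is to move the mapping into a function that takes the month as a parameter. The sketch below is a hypothetical variant, not the book's code. It folds March into the French Open range so that no month is left without a default.

```csharp
using System;

// Hypothetical, testable version of the month-to-tournament mapping.
public static class TournamentDefaults
{
    public static string ForMonth(int month)
    {
        if (month < 1 || month > 12)
            throw new ArgumentOutOfRangeException(nameof(month));
        if (month <= 2) return "AUS";        // January-February
        if (month <= 5) return "FRENCH";     // March-May
        if (month <= 7) return "WIMBLEDON";  // June-July
        return "USOPEN";                     // August-December
    }
}
```

The static constructor could then set `Tournament = TournamentDefaults.ForMonth(DateTime.Now.Month);`. A test can check every month without touching the system clock.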

Answering the question

We are now going to create a function that will take the tagged list of words and attempt to generate an answer. Listing 39 shows the function.

Listing 39 – Answer tennis questions

static public string GetResponse(List<string> Words_, List<string> Tags_)
{
    string ans_ = "";
    // Scan the tags, remembering any parameters the question supplies
    for (int x = 0; x < Tags_.Count; x++)
    {
        if (Tags_[x] == "YEAR")
        {
            TournamentYear = Convert.ToInt32(Words_[x]);
        }
        if (Tags_[x] == "EVENT")
        {
            // Map the event phrase onto the tournament codes;
            // check AUS before US, since "AUSTRALIAN" contains "US"
            string Event_ = Words_[x].ToUpper();
            if (Event_.Contains("FRENCH")) { Tournament = "FRENCH"; }
            else if (Event_.Contains("WIMBLEDON")) { Tournament = "WIMBLEDON"; }
            else if (Event_.Contains("AUS")) { Tournament = "AUS"; }
            else if (Event_.Contains("US ")) { Tournament = "USOPEN"; }
        }
        if (Tags_[x].StartsWith("VB")) { LastVerb = Words_[x].ToUpper(); }
        if (Tags_[x] == "PERSON") { PlayerName = Words_[x].ToUpper(); }
    }
    // After the scan, the verb decides which function answers the question
    if (LastVerb == "WON")
    {
        if (Gender == "B")
        {
            ans_ = WhoWon(Tournament, TournamentYear, "M");
            string ansW = WhoWon(Tournament, TournamentYear, "F");
            if (ansW.Length > 0) { ans_ += " and " + ansW; }
        }
        else
        {
            ans_ = WhoWon(Tournament, TournamentYear, Gender);
        }
    }
    if (LastVerb == "LOST")
    {
        if (Gender == "B")
        {
            ans_ = WhoLost(Tournament, TournamentYear, "M");
            string ansW = WhoLost(Tournament, TournamentYear, "F");
            if (ansW.Length > 0) { ans_ += " and " + ansW; }
        }
        else
        {
            ans_ = WhoLost(Tournament, TournamentYear, Gender);
        }
    }
    if (ans_.Length < 1) { ans_ = "I don't know..."; }
    return ans_;
}

This basic code first loops through the sentence tags. It looks for parameters (year, tournament, verb, and player) that narrow down which tournament and year the user is asking about. It then checks a couple of key verbs (WON and LOST) and calls the appropriate function to return an answer.
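The EVENT handling maps whatever phrase the tagger returned onto the tournament codes used elsewhere in the chapter (AUS, FRENCH, WIMBLEDON, USOPEN). Pulling that mapping into its own function makes it easy to test. The sketch below is a hypothetical helper. It assumes the tagger returns phrases such as "French Open" or "Australian Open" in any letter case. The Roland Garros alias is an addition of ours for illustration.

```csharp
using System;

// Hypothetical helper: normalizes a tagged EVENT phrase to a tournament code.
public static class EventNames
{
    // Returns AUS, FRENCH, WIMBLEDON, or USOPEN; null if the phrase
    // does not name one of the four majors.
    public static string Normalize(string eventPhrase)
    {
        if (string.IsNullOrWhiteSpace(eventPhrase)) return null;
        string e = eventPhrase.Trim().ToUpperInvariant();
        // Check AUS before US: "AUSTRALIAN" also contains the letters "US".
        if (e.Contains("AUS")) return "AUS";
        if (e.Contains("FRENCH") || e.Contains("ROLAND")) return "FRENCH";
        if (e.Contains("WIMBLEDON")) return "WIMBLEDON";   // also matches "WIMBLEDONS"
        if (e.StartsWith("US", StringComparison.Ordinal) || e.Contains("U.S.")) return "USOPEN";
        return null;
    }
}
```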

Don't be boring

When people answer a question, they might phrase the answer differently each time. We want our WhoWon routine to be a bit creative, too. Listing 40 shows the routine determining the answer, but formatting it differently based on a random selection.

Listing 40 – Who won

static public string WhoWon(string Tournament, int Year, string Gender)
{
    // {0} = winner, {1} = "men"/"women", {2} = runner-up, {3} = sets
    string[] PossibleReplies = {
        "{0} was the {1}'s winner",
        "{0} won the {1}'s",
        "{0} won on the {1}'s side",
        "{0} defeated {2} in the {1}'s draw",
        "{0} won in {3} sets over {2}",
        "It was won by {0} in {3} sets"
    };
    string ans_ = "";
    Tournament Results_ = GetResults(Tournament, Year, Gender);
    if (Results_ != null)
    {
        string GenderText = "men";
        if (Gender == "F") { GenderText = "women"; }
        // Random.Next(n) returns 0..n-1, so every reply can be chosen
        int reply = rnd.Next(PossibleReplies.Length);
        // Some tennis vocabulary: winning every set is a straight-sets win
        string SetText = Results_.SetsPlayed.ToString();
        if (Results_.SetsPlayed == 3 && Gender == "M") { SetText = "straight"; }
        if (Results_.SetsPlayed == 2 && Gender == "F") { SetText = "straight"; }
        ans_ = string.Format(PossibleReplies[reply],
            Results_.Winner.FullName,
            GenderText,
            Results_.RunnerUp.FullName,
            SetText);
    }
    return ans_;
}

We could simply extract the answer and return the person's name. However, the application will seem friendlier and easier to use if it sounds more human, which here means varying the way it phrases the answer.
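The one subtle step in the WhoWon routine is choosing the template index. `Random.Next(n)` returns a value from 0 to n-1, which is exactly the valid index range of an array of n templates. The hypothetical helper below isolates that step. It takes the Random instance as a parameter so that a test can pass a seeded one.

```csharp
using System;

// Hypothetical helper: formats a randomly chosen reply template.
public static class ReplyPicker
{
    // Random.Next(maxValue) returns 0..maxValue-1, so any template can be picked.
    public static string Pick(string[] templates, Random rnd, params object[] args)
    {
        if (templates == null || templates.Length == 0)
            throw new ArgumentException("At least one template is required.", nameof(templates));
        int index = rnd.Next(templates.Length);
        return string.Format(templates[index], args);
    }
}
```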

Understanding your data can also enhance your application. In our case, the score of the match is stored as a string. With a bit of string manipulation, we can guess how close (or one-sided) the match was.

Speaking the user's language

Knowing your users and your application's vernacular helps make the application seem more human-like. For example, in tennis, if a player wins a match without losing a set, it is called a straight-sets victory. If a player wins a set without losing a game (6–0), they've "bageled" the other player.
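Both terms can be detected from the score string. The following is a hypothetical sketch, not the book's code. It assumes scores formatted like 6–2, 6–2, 6–2 (sets separated by commas, games by a dash). It takes the chapter's gender code ("M" or "F") to know how many sets a straight-sets win requires.

```csharp
using System;

// Hypothetical helpers for two pieces of tennis vocabulary.
public static class TennisVocabulary
{
    // Splits a score such as "6–2, 6–0, 7–6(4)" into (winner, loser) games
    // per set. Tiebreak details in parentheses are dropped, and both en
    // dashes and hyphens are accepted as separators.
    private static (int Won, int Lost)[] ParseSets(string score)
    {
        string[] sets = score.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new (int Won, int Lost)[sets.Length];
        for (int i = 0; i < sets.Length; i++)
        {
            string s = sets[i].Trim();
            int paren = s.IndexOf('(');
            if (paren >= 0) s = s.Substring(0, paren);
            string[] games = s.Split('–', '-');
            result[i] = (int.Parse(games[0].Trim()), int.Parse(games[1].Trim()));
        }
        return result;
    }

    // Straight sets: the winner dropped no set, so only the minimum number
    // of sets was played (3 in a men's match, 2 in a women's match).
    public static bool IsStraightSets(string score, string gender) =>
        ParseSets(score).Length == (gender == "F" ? 2 : 3);

    // Counts the sets the match winner took 6-0 ("bagels").
    public static int Bagels(string score)
    {
        int count = 0;
        foreach (var (won, lost) in ParseSets(score))
        {
            if (won == 6 && lost == 0) count++;
        }
        return count;
    }
}
```

Counting sets works because a match ends as soon as one player reaches the required number of sets. It ignores retirements, which the score string alone cannot reveal.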

Games won

The score string from the data looks like the following. Sets are separated by commas, and the two game counts within a set are separated by a dash.

6–2, 6–2, 6–2

Listing 41 shows how to determine the games won and the games lost based on the score string.

Listing 41 – Games won

static public int GamesWon(string Scores)
{
    int TotalWon = 0;
    // Sets are separated by commas: "6–2, 6–2, 6–2"
    string[] Sets_ = Scores.Split(',');
    foreach (string CurSet in Sets_)
    {
        // Games within a set are separated by a dash (en dash or hyphen)
        string[] WinLoss = StripTiebreak(CurSet).Split('–', '-');
        if (WinLoss.Length == 2)
        {
            TotalWon += Convert.ToInt32(WinLoss[0].Trim());
        }
    }
    return TotalWon;
}

static public int GamesLost(string Scores)
{
    int TotalLost = 0;
    string[] Sets_ = Scores.Split(',');
    foreach (string CurSet in Sets_)
    {
        string[] WinLoss = StripTiebreak(CurSet).Split('–', '-');
        if (WinLoss.Length == 2)
        {
            TotalLost += Convert.ToInt32(WinLoss[1].Trim());
        }
    }
    return TotalLost;
}

// Deal with tiebreakers: "7–6(4)" or "7–6(7–4)" becomes "7–6"
static private string StripTiebreak(string Set_)
{
    int x = Set_.IndexOf('(');
    return (x >= 0) ? Set_.Substring(0, x) : Set_;
}

GamesLost is very similar to GamesWon, but it reads the second number of each set. Before either number can be converted, any tiebreak score in parentheses, such as the (4) in 7–6(4), has to be removed. By applying some tennis logic to these totals, we can estimate whether the match was close and vary our replies even further. Let's rate each match as one-sided, good, close, or very close.

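We will add the following template to our stock replies, where {4} is the match rating. WhoWon then needs to pass the rating as a fifth argument to string.Format.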

"{0} won in a {4} match over {2} "

For simplicity's sake, we use the following rules.

If the match was not decided in straight sets (three sets for men, two for women), the winner lost at least one set, so we call it a close match. We call it a very close match if the winner won fewer than four more games than the loser. If the match was decided in straight sets and the winner won more than 2.5 times as many games as the loser, we call it a one-sided match. Any other straight-sets win is rated a good match. Listing 42 shows the function that makes this rating guess.

Listing 42 – Match rating

static public string MatchRate(bool StraightSets, int gamesWon, int gamesLost)
{
    string ans_ = "good";
    if (!StraightSets)
    {
        // The winner dropped at least one set
        ans_ = "close";
        if (gamesWon - gamesLost < 4) { ans_ = "very close"; }
    }
    else
    {
        // Straight sets: one-sided if the winner won over 2.5x as many games
        if (gamesWon > gamesLost * 2.5)
        {
            ans_ = "one-sided";
        }
    }
    return ans_;
}

With this code added to our functions, the application gets a bit opinionated (Nadal won 6-2,6-3,6-1), as shown in Figure 14.

Figure 14 – Opinionated tennis application

Who won the 2017 French Open?

Rafael Nadal won in a one-sided match over Stan Wawrinka, and Jeļena Ostapenko won in a very close match over Simona Halep.
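The rules can be checked by hand against Figure 14. In Nadal's 6–2, 6–3, 6–1, the winner won 18 games and lost 6. That is a straight-sets win, and 18 is more than 6 × 2.5 = 15, so the match is one-sided. Ostapenko's 4–6, 6–4, 6–3 is 16 games to 13 over three sets, which is not a straight-sets win in a women's match. The margin of 3 is under 4, so it is very close.

Here is a hypothetical helper (not the book's code) that parses a score string and applies the Listing 42 thresholds in one place, so these cases can be tested.

```csharp
using System;

// Hypothetical glue: parse a score string, then apply the Listing 42 rules.
public static class MatchRating
{
    public static string Rate(string score, string gender)
    {
        int won = 0, lost = 0, sets = 0;
        foreach (string raw in score.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string s = raw.Trim();
            int paren = s.IndexOf('(');
            if (paren >= 0) s = s.Substring(0, paren);   // drop tiebreak detail
            string[] games = s.Split('–', '-');
            won += int.Parse(games[0].Trim());
            lost += int.Parse(games[1].Trim());
            sets++;
        }
        bool straight = sets == (gender == "F" ? 2 : 3);
        if (!straight)
            return won - lost < 4 ? "very close" : "close";
        return won > lost * 2.5 ? "one-sided" : "good";
    }
}
```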

Of course, the computer knows nothing about the actual matches, only what its data tells it. Be sure to consider your audience as your application gets more creative in its responses.

Second question

The second question is very simple: “Who lost?” The user has not given us a year or a tournament name, so we rely on the values remembered from the previous question. We know it was the 2001 French Open. When the system finds the verb LOST, it has enough information to determine which function to call.

Remembering previous replies makes the system much friendlier to the user. Suppose I enter “Who is the Human Resources manager?” and the system replies “Julie.” The system can then assume that my follow-up question, “What is her email?”, refers to Julie, and answer it. If your system identifies a person, it should remember the name. It should then substitute that name for the pronouns he and she (and forms such as her and his) in follow-up questions.
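The sketch below shows one minimal approach. It assumes the tagger has already produced Penn Treebank tags and that we only need to swap third-person pronouns for a single remembered name. The class and method names are hypothetical. The sketch ignores harder cases, such as deciding between two remembered people.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: replaces third-person pronouns with a remembered name.
public static class PronounResolver
{
    private static readonly HashSet<string> ThirdPerson =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "he", "she", "him", "her", "his" };

    // Uses the tags to pick the form: a possessive pronoun (PRP$, as in
    // "her email") becomes "Name's"; a personal pronoun (PRP, as in "she"
    // or "him") becomes "Name". Other words are copied unchanged.
    public static List<string> Resolve(List<string> words, List<string> tags, string rememberedName)
    {
        var result = new List<string>(words);
        if (string.IsNullOrEmpty(rememberedName)) return result;
        for (int i = 0; i < words.Count; i++)
        {
            if (!ThirdPerson.Contains(words[i])) continue;
            if (tags[i] == "PRP$") result[i] = rememberedName + "'s";
            else if (tags[i] == "PRP") result[i] = rememberedName;
        }
        return result;
    }
}
```

Relying on the tag rather than the word matters for her, which can be possessive ("her email") or an object ("ask her").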

Third question

The third question is: “Who won the first Wimbledon?” The WHO question word and the EVENT tag tell us we are looking for a person, but we don't know the year. However, the keyword FIRST tells us to use the earliest year we have data for. So, we scan the word list for keywords such as first or earliest.
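Once a keyword is found, we can answer by picking the smallest year in the data for that tournament, or the largest year for a keyword like latest. The sketch below assumes a hypothetical list of the years available for the requested tournament. The keyword lists are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: turns keywords such as "first" or "latest" into a year.
public static class YearKeywords
{
    private static readonly HashSet<string> Earliest =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "first", "earliest", "inaugural" };
    private static readonly HashSet<string> Latest =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "latest", "recent" };

    // Returns the earliest or latest available year if the question contains
    // a matching keyword; otherwise null, so the remembered year is kept.
    public static int? ResolveYear(IEnumerable<string> words, IReadOnlyCollection<int> availableYears)
    {
        if (availableYears.Count == 0) return null;
        foreach (string w in words)
        {
            if (Earliest.Contains(w)) return availableYears.Min();
            if (Latest.Contains(w)) return availableYears.Max();
        }
        return null;
    }
}
```

The word last is left out on purpose. In a question like "Who lost last year?" it means the previous calendar year, not the latest year in the data.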

One drawback is that we may need to know every synonym a person might use. A web service called WordsAPI lets you find synonyms for a given word. An example of its JSON response is shown in Listing 43.

Listing 43 – WordsAPI synonyms response

{
  "word": "first",
  "synonyms": [
    "1st",
    "inaugural",
    "maiden",
    "kickoff",
    "start",
    "foremost"
  ]
}

With the API, you can take the keywords you expect, such as first or latest, and build their lists of possible synonyms ahead of time. You can also store those lists locally in a dictionary. If you are using your own code rather than an API, you will probably want to keep a synonym dictionary of words your audience is likely to use.
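The local dictionary approach can be sketched as follows. Each synonym is mapped to a canonical keyword once, whether the list came from WordsAPI or was typed in by hand. The words of each question are then normalized before the keyword search. The class is hypothetical, and the synonym list in the usage example is illustrative.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps synonyms onto one canonical keyword.
public class SynonymMap
{
    private readonly Dictionary<string, string> canonical =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Registers a keyword together with its synonyms.
    public void Add(string keyword, IEnumerable<string> synonyms)
    {
        canonical[keyword] = keyword;
        foreach (string s in synonyms) canonical[s] = keyword;
    }

    // Returns the canonical keyword for a word, or the word itself if unknown.
    public string Normalize(string word) =>
        canonical.TryGetValue(word, out string k) ? k : word;
}
```

Pruning the list before loading it is worthwhile. The WordsAPI response for first in Listing 43 includes start and kickoff, which could trigger the keyword in unrelated questions.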

Final question

The last question is: “Who has won the most Wimbledons?” In this example, we are counting on the tags to identify the event (Wimbledon) and the verb (WON). We are also expecting the keyword most (as an adverb or adjective). By detecting the verb (WON or LOST) and the modifier most, we can determine which function to call (MostWins or MostLosses). We can change the question a bit and still get a reasonable reply, as shown in Figure 15.
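The decision described above can be written as a small lookup. This is a hypothetical sketch. It returns the name of the Table 8 function to call rather than calling it, and it assumes the verb has already been upper-cased, as LastVerb is. It also assumes the caller knows whether the current question named a player.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical router: names the Table 8 function that should answer.
public static class QuestionRouter
{
    // verb: the upper-cased verb (WON or LOST).
    // namesPlayer: true if the current question was tagged with a PERSON.
    public static string ChooseFunction(string verb, IEnumerable<string> words, bool namesPlayer)
    {
        bool most = words.Any(w => string.Equals(w, "most", StringComparison.OrdinalIgnoreCase));
        switch (verb)
        {
            case "WON":
                return most ? "MostWins" : namesPlayer ? "PlayerWins" : "WhoWon";
            case "LOST":
                return most ? "MostLosses" : namesPlayer ? "PlayerLosses" : "WhoLost";
            default:
                return null;   // no function in Table 8 fits this verb
        }
    }
}
```

The flag comes from the current question rather than the remembered PlayerName. Otherwise, once any player had been mentioned, a plain "Who won?" would be routed to PlayerWins.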

Figure 15 – Who lost

WHO HAS LOST THE US OPEN THE MOST?

Ivan Lendl has lost 5 times

Summary

By parsing the tagged sentence to extract missing data and relying on the verb to guess which function to call, we can generally do a pretty good job of matching input sentences to functions that provide the answer. Again, the more you know about your application, the better you will be able to anticipate the types of questions you might find.

I would suggest, at least initially, keeping a log of the questions asked and the answers your application provides. You will likely keep tweaking your code based on what users are asking. As you collect common questions and tweak the code to answer them, your system will appear smarter every time.
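A log can be as simple as a tab-separated text file, which is what the sketch below assumes. The class name, file path, and line format are illustrative. Tabs and line breaks inside the text are replaced with spaces so that each exchange stays on one line and is easy to scan or load into a spreadsheet later.

```csharp
using System;
using System.IO;

// Hypothetical helper: appends each question/answer pair to a log file.
public static class ConversationLog
{
    // Writes one line per exchange: UTC timestamp, question, answer.
    public static void Append(string path, string question, string answer)
    {
        string line = string.Join("\t",
            DateTime.UtcNow.ToString("o"),
            Clean(question),
            Clean(answer));
        File.AppendAllText(path, line + Environment.NewLine);
    }

    private static string Clean(string text) =>
        (text ?? "").Replace('\t', ' ').Replace('\r', ' ').Replace('\n', ' ');
}
```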

It is possible to simply return the exact answer a person wants, but the system will seem more engaging if you use random responses, or even humorous answers, to provide the information. We are designing a system to interact with people, so we don't need to be as rigid as the protocols computer systems use to talk to each other. People will like the variety and light nature of the generated responses.

Have fun generating responses, but know your audience. If you are designing a system for the military, they might not appreciate a lighter, varying response. (And they carry guns.)
