In this little post I will go through a small and very basic prediction engine written in C# for one of my projects.
Basically what it does is the following:
- It will collect data in the form of lists of strings
- Given an input, it will give back a list of predictions of the next item. In falling probability order.
So no rocket science, but a good and basic prediction engine that can be build upon depending on what scenarios you want to solve.
First, some unit tests:
[TestMethod] public void PredictorTest_NoMatch() { var target = new Predictor(); var actual = target.Predict("The rabbit-hole went straight on like a tunnel for some way"); Assert.AreEqual(0, actual.Count); }
[TestMethod] public void PredictorTest_SingleMatch() { var target = new Predictor(); target.Predict("Alice opened the door and found that it led into a small passage"); var actual = target.Predict("Alice opened the door"); Assert.AreEqual(1, actual.Count); Assert.AreEqual("and", actual.First()); }
[TestMethod] public void PredictorTest_SingleMatch_OtherNoise() { var target = new Predictor(); target.Predict("Alice opened the door and found that it led into a small passage"); target.Predict("Alice took up the fan and gloves"); var actual = target.Predict("Alice opened the door"); Assert.AreEqual(1, actual.Count); Assert.AreEqual("and", actual.First()); }
[TestMethod] public void PredictorTest_SingleMatch_TwoResultsInOrder() { var target = new Predictor(); target.Predict("Alice thought she might as well wait"); target.Predict("Alice thought this a very curious thing"); target.Predict("Alice thought she had never seen such a curious"); var actual = target.Predict("Alice thought"); Assert.AreEqual(2, actual.Count); Assert.AreEqual("she", actual.First()); Assert.AreEqual("this", actual.Last()); }
So nothing too fancy. You provide it with input and as expected it returns a list of possible next values. Examples are from Alice's Adventures in Wonderland by Lewis Carroll, the Project Gutenberg edition.
Predictor class
public class Predictor { private readonly Dictionary<string, Dictionary<string, int>> _items = new Dictionary<string, Dictionary<string, int>>(); private readonly char[] _tokenDelimeter = {' '}; public List<string> Predict(string input) { var tokens = input.Split(_tokenDelimeter, StringSplitOptions.RemoveEmptyEntries); var previousBuilder = new StringBuilder(); Dictionary<string, int> nextFullList; foreach (var token in tokens) { nextFullList = GetOrCreate(_items, previousBuilder.ToString()); if (nextFullList.ContainsKey(token)) nextFullList[token] += 1; else nextFullList.Add(token, 1); if (previousBuilder.Length > 0) previousBuilder.Append(" "); previousBuilder.Append(token); } nextFullList = GetOrCreate(_items, previousBuilder.ToString()); var prediction = (from x in nextFullList orderby x.Value descending select x.Key).ToList(); return prediction; } private static T GetOrCreate<T>(Dictionary<string, T> d, string key) { if (d.ContainsKey(key)) { return d[key]; } var t = Activator.CreateInstance<T>(); d.Add(key, t); return t; } }
It's not smart, and it requires having seen quite a lot of samples to work well.
An additional improvement could be to add storage of previous/next token combinations so that if the full search doesn't yield any result, a guess could be made based on only the previous word. But I guess that is up to you.
All code provided as-is. This is copied from my own code-base, May need some additional programming to work.
cool
ReplyDelete