Friday, December 30, 2016

Dealing with events in the real world ...


Working on my AI project I've come to the point where I need to be able to deal with events from the real world. Say a signal from a motion detector, cat door opened or someone popping the Champagne on New Years Eve, or Cider if you are not into Champagne... I don't judge :)

So lets first define what we mean by a World Event.
  • it can be described
  • it happened Somewhere
  • it happened at a specific Time

public class WorldEvent
{
    public Guid Id { get; set; }
    public string Description { get; set; }
    public DateTime When { get; set; }
    public AveCoordinate Where { get; set; }

    public WorldEvent()
    {
        Id = Guid.NewGuid();
        Description = string.Empty;
    }

    public override string ToString()
    {
        return $"WorldEvent[{When:O}, lng:{Where.Longitude:##.000},lat:{Where.Latitude:##.000}]";
    }
}

If you wonder about the AveCoordinate used above, it is defined as follows:
public class AveCoordinate
{
    public double Latitude { get; set; }
    public double Longitude { get; set; }
    public double Altitude { get; set; }
    public AveCoordinate()
    {
        // for deserialization
    }
    public AveCoordinate(double lat, double lng)
    {
        Latitude = lat;
        Longitude = lng;
    }
    public AveCoordinate(double lat, double lng, double altitude)
    {
        Latitude = lat;
        Longitude = lng;
        Altitude = altitude;
    }

    public double GetDistanceTo(AveCoordinate coordinates)
    {
        return GetDistanceTo(coordinates.ToGeoCoordinate());
    }
    public double GetDistanceTo(GeoCoordinate coordinates)
    {
        return ToGeoCoordinate().GetDistanceTo(coordinates);
    }

    public GeoCoordinate ToGeoCoordinate()
    {
        return new GeoCoordinate(Latitude, Longitude, Altitude);
    }
}

You might wonder why I chose to not use the GeoCoordinate class directly, but it kind of didn't like to be serialized with fastJSON that is my go to utility for serializing/deserializing stuff. So I had to create my own coordinate class, but as you can see the calculations are done with help of the GeoCoordinate class.
At the moment the GeoCoordinate GetDistanceTo method does not take the Altitude into account so I might have to write my own at some point. But for now it does the trick.

Ok, so now we have a base class that can be used to describe things that happen in the real world.
So what now?

Event Relations

I would like to know if events could be related to each other in some way.
In the real world we work with 4 dimensions, space being the first 3 and time being the last.
So how do we figure out how related 2 events are?

Naive formulation
  • They happened at the same place or close by
  • They happened at the same time or close to
Lets add a naive Distance calculation to the WorldEvent class

public double Distance(WorldEvent other)
{
    var metres = Where.GetDistanceTo(other.Where) + 1d;
    var seconds = (When - other.When).TotalSeconds + 1d;
    var distance = Math.Sqrt(Math.Pow(metres, 2d)*Math.Pow(seconds, 2d));
    return distance;
}

Does this work?

I really try to avoid the advanced stuff with clustering algorithms etc. until I really can't solve the problem in any other way. So lets put down some tests to see if this actually works for our purposes.

[TestMethod]
public void WorldEventTest_IrrelevantDirection()
{
    var first = new WorldEvent
    {
        When = new DateTime(2016, 03, 22, 07, 0, 0),
        Where = new AveCoordinate(58.58073, 14.00119)
    };
    var second = new WorldEvent
    {
        When = new DateTime(2016, 03, 22, 07, 0, 0),
        Where = new AveCoordinate(58.58073, 14.00119)
    };
    var actualFirst = first.Distance(second);
    var actualSecond = second.Distance(first);
    Assert.AreEqual(actualFirst, actualSecond);
}
The first test just makes sure that we didn't screw things up. The distance calculation should be the same in both directions.

[TestMethod]
public void WorldEventTest_SameLocation_YearsBetweenVsMinutesBetween()
{
    var first = new WorldEvent
    {
        When = new DateTime(1934, 03, 22, 07, 0, 0),
        Where = new AveCoordinate(58.58073, 14.00119)
    };
    var second = new WorldEvent
    {
        When = new DateTime(2016, 03, 22, 07, 0, 0),
        Where = new AveCoordinate(58.58073, 14.00119)
    };
    var third = new WorldEvent
    {
        When = new DateTime(2016, 03, 22, 07, 3, 0),
        Where = new AveCoordinate(58.58073, 14.00119)
    };
    var actualFirst = first.Distance(second);
    var actualSecond = second.Distance(third);
    Assert.IsTrue(actualFirst > actualSecond);
}
Here we ensure that things that happen close-by in time at the same place will get a lesser distance then the events that happen a very long time apart.

[TestMethod]
public void WorldEventTest_SameTime_FarAwayVersusClose()
{
    // Stelvio 46.623215,10.7990846
    var first = new WorldEvent
    {
        When = new DateTime(2016, 03, 22, 07, 0, 0),
        Where = new AveCoordinate(46.623215, 10.7990846)
    };
    var second = new WorldEvent
    {
        When = new DateTime(2016, 03, 22, 07, 0, 0),
        Where = new AveCoordinate(58.58073, 14.00119)
    };
    var third = new WorldEvent
    {
        When = new DateTime(2016, 03, 22, 07, 0, 0),
        Where = new AveCoordinate(58.559686, 13.959535)
    };
    var actualFirst = first.Distance(second);
    var actualSecond = second.Distance(third);
    Assert.IsTrue(actualFirst > actualSecond);
}
And here the other way around. Events that happen at the same time, but on different parts of the globe should have a longer distance then events that happen in the same area.

Conclusion

Ok, so not the most advanced stuff but it does the trick for my project at the moment. We can always add more advanced calculations at a later stage, maybe just keep this as a preprocessing stage to filter out really non related stuff.
So far just keeping it simple. :)

All code provided as-is. This is copied from my own code-base, May need some additional programming to work. Hope this helps someone out there :)


Thursday, December 29, 2016

Performance: Enum ToString or GetName


Usually I am too lazy to bother using anything else than the ordinary ToString when working with enums, but I always have that filthy feeling that I am cheating.
If there is a dedicated method in the framework to do something, there probably is a reason for it. And knowing that the Enum class contains a GetName method to get the string representation of a enum value made me want to know if there is a performance difference between the two.

First, the tested Enum.GetName method, for sake of argument wrapped up as an Extension for easier and cleaner usage.

public static class Extension
{
    public static string GetEnumName<T>(this T t) where T : struct, IConvertible
    {
        return Enum.GetName(typeof (T), t);
    }
}

Back to extensions list

The test rigg

The test rigg is basically the same that I used in Performance of different lock methods in .net.
Uses RunningAverage to calculate the average without storing each individual number.
Just cleaned up and simplified for this scenario:

Data

private enum SmallEnum
{
    Small,
    Larger
}
private enum LargerEnum
{
    a, b, c, d, e, f, g, h,
    aaaaa, bbbbb, ccccc, ddddd,
    xxxxx, yyyyy, zzzzz,
    VeryLongNameThisIs,
    OtherName, Test, oooo, olf,
    eowind, eeeee, ss, qqqqq,
    mu, miu, bach, escher,
    godel, code, number, last
}

Decided to test 2 different enums with different number of elements to see if that has any impact. The small has 2 elements and the larger has 32.

Code

private void ExecuteTestSmall(int iterations)
{
    var type = "small";
    string s = string.Empty;
    try
    {
        var testData = GenerateTestDataSmall();
        var toStringTimings = new RunningAverage();
        var getNameTimings = new RunningAverage();
        for (int i = 0; i < iterations; i++)
        {
            for (int dataIndex = 0; dataIndex < testData.Count; dataIndex++)
            {
                var item = testData[dataIndex];
                toStringTimings.Add(TimeAction(() =>
                {
                    s = item.ToString();
                }));
                if (!string.IsNullOrEmpty(s))
                    s = string.Empty;
                getNameTimings.Add(TimeAction(() =>
                {
                    s = item.GetEnumName();
                }));
                if (!string.IsNullOrEmpty(s))
                    s = string.Empty;
            }
        }
        Log($"{type}ToString\tDataCount\t2\tIterations:\t{iterations}\tAverage:\t{toStringTimings.Average:0.000}\tticks");
        Log($"{type}GetName\tDataCount\t2\tIterations:\t{iterations}\tAverage:\t{getNameTimings.Average:0.000}\tticks");
    }
    catch (Exception ex)
    {
        Log($"{type}Fail\tDataCount\t2\tIterations:\t{iterations}\tFailed\t{ex.Message}");
    }
}

Ok, I was lazy. I created 2 methods instead of putting the effort to reuse it. The above is for the SmallEnum. Exactly the same code for the LargerEnum tester with difference in GenerateTestDataSmall to return a List<LargerEnum> instead of List<SmallEnum>.

This was executed as follows:
public void Execute()
{
 Log("--------------------------------------");
 Log("... single thread");
 ExecuteTestSmall(250000);
 ExecuteTestLarger(250000);
}

250000 times each.

Results

ToString enum size 2 Iterations: 250000 Average: 1.231 ticks
GetName enum size 2 Iterations: 250000 Average: 0.662 ticks
ToString enum size 32 Iterations: 250000 Average: 1.357 ticks
GetName enum size 32 Iterations: 250000 Average: 0.692 ticks

Conclusion

Using the dedicated Enum.GetName function to get the string representation instead of the lazy object.ToString() seems to be twice faster. But in the end, we are talking around 1 tick, so maybe not the first place to start optimizing if you have a performance issue in you application. As I wrote above, it just feels a little bit lazy to not do it the correct way ;)

All code provided as-is. This is copied from my own code-base, May need some additional programming to work. Hope this helps someone out there :)


Saturday, December 3, 2016

Performance of different lock methods in .net


Something that pops up from time to time is what kind of thread synchronization technique should be used when dealing with shared ´resources in a multi-threaded environment. And yes, I know that the best would be to not share resources at all but that is not always an option, especially when working with legacy code-bases and applications that have organically grown over time.

Shared resource access types

I've included the following operations in the test.
public interface ICacheTest
{
    void Add(CacheItem item);
    CacheItem Get(Guid id);
    CacheItem Search(Guid id);
    void Clear();
}


Add
Adds an item to the dictionary

Get
Gets an item with key lookup.

Search
There will always be scenarios when the key lookup isn't enough. Most code-bases that I've seen tend to go the LINQ way. Some have extra lookup dictionaries. For this test I wanted to include something that takes a little more time then a lookup, so LINQ query is used.

Techniques in the test

Unsafe

No thread synchronization, the test class basically just wraps an Dictionary<Guid, CacheItem>
public class CacheUnsafe : ICacheTest
{
    private readonly Dictionary<Guid, CacheItem> _cache = new Dictionary<Guid, CacheItem>();
    public void Add(CacheItem item)
    {
        _cache.Add(item.Id, item);
    }
    public CacheItem Get(Guid id)
    {
        if (_cache.ContainsKey(id))
            return _cache[id];
        return null;
    }
    public CacheItem Search(Guid id)
    {
        var item = (from x in _cache.Values where x.Id.Equals(id) select x).FirstOrDefault();
        return item;
    }
    public void Clear()
    {
        _cache.Clear();
    }
}


Full lock

using the lock key-word to just allow one thread at a time to execute within the block of code. This is usually my go-to method as it is easy to implement and usually performs good enough.
public class CacheFullLock : ICacheTest
{
    private readonly Dictionary<Guid, CacheItem> _cache = new Dictionary<Guid, CacheItem>();
    public void Add(CacheItem item)
    {
        lock (_cache)
        {
            _cache.Add(item.Id, item);
        }
    }
    public CacheItem Get(Guid id)
    {
        lock (_cache)
        {
            if(_cache.ContainsKey(id))  
                return _cache[id];
        }
        return null;
    }
    public CacheItem Search(Guid id)
    {
        lock (_cache)
        {
            var item = (from x in _cache.Values where x.Id.Equals(id) select x).FirstOrDefault();
            return item;
        }
    }
    public void Clear()
    {
        lock (_cache)
        {
            _cache.Clear();
        }
    }
}

Full lock with indexed search

Same as the full lock but also tries to optimize the search to a lookup instead. So in this simplified test the lookup looks kinda unnecessary but in the real world it could be looking up with multiple parameters instead of running the Linq query.
public CacheItem Search(Guid id)
{
    lock (_cache)
    {
        if (_index.ContainsKey(id))
            return _cache[_index[id]];
        return null;
    }
}

ReaderWriterLock

Using the ReaderWriterLock class. Allows multiple readers to access the resource at the same time, but only 1 writer at a time.
public class CacheWriteLock : ICacheTest
{
    private readonly Dictionary<Guid, CacheItem> _cache = new Dictionary<Guid, CacheItem>();
    private readonly ReaderWriterLock _lock = new ReaderWriterLock();
    public void Add(CacheItem item)
    {
        _lock.AcquireWriterLock(1000);
        try
        {
            _cache.Add(item.Id, item);
        }
        finally
        {
            _lock.ReleaseWriterLock();
        }
    }
    public CacheItem Get(Guid id)
    {
        _lock.AcquireReaderLock(1000);
        try
        {
            if (_cache.ContainsKey(id))
                return _cache[id];
            return null;
        }
        finally
        {
            _lock.ReleaseReaderLock();
        }
    }
    public CacheItem Search(Guid id)
    {
        _lock.AcquireReaderLock(1000);
        try
        {
            var item = (from x in _cache.Values where x.Id.Equals(id) select x).FirstOrDefault();
            return item;
        }
        finally
        {
            _lock.ReleaseReaderLock();
        }
    }
    public void Clear()
    {
        _lock.AcquireWriterLock(1000);
        try
        {
            _cache.Clear();
        }
        finally
        {
            _lock.ReleaseWriterLock();
        }
    }
}

ConcurrentDictionary

The built in concurrent dictionary that is thread-safe. I tried to use this a long way ago when it was first introduced, but found it under-performing compared to a simple Full lock strategy. So up to the test again. At least it looks easy to implement and you do not have to care about the thread safety yourself at it is handled internally.
public class CacheConcurrent : ICacheTest
{
    private readonly ConcurrentDictionary<Guid, CacheItem> _cache = new ConcurrentDictionary<Guid, CacheItem>();
    public void Add(CacheItem item)
    {
        _cache.AddOrUpdate(item.Id, item, (key, oldValue) => item);
    }
    public CacheItem Get(Guid id)
    {
        if (_cache.ContainsKey(id))
            return _cache[id];
        return null;
    }
    public CacheItem Search(Guid id)
    {
        var item = (from x in _cache.Values where x.Id.Equals(id) select x).FirstOrDefault();
        return item;
    }
    public void Clear()
    {
        _cache.Clear();
    }
}

Single-threaded test

Measuring ticks elapsed for add, get, search and clear methods. Number of items are increased with each test epoch. Each epoch executes the test for 1000 iterations.
So no multi-threading here, just a check of baseline differences between the solutions. I.e. time to acquire the lock, ability to cope with different number of elements etc.

The test rig:

Uses RunningAverage to calculate the average without storing each individual number.

private void ExecuteTestSingle(ICacheTest cache, int dataCount, int iterations)
{
    var type = cache.GetType().Name;
    try
    {
        var testData = GenerateTestData(dataCount);
        var addTimings = new RunningAverage();
        var clearTimings = new RunningAverage();
        var getTimings = new RunningAverage();
        var searchTimings = new RunningAverage();
        for (int i = 0; i < iterations; i++)
        {
            for (int dataIndex = 0; dataIndex < testData.Count; dataIndex++)
            {
                var item = testData[dataIndex];
                addTimings.Add(TimeAction(() =>
                {
                    cache.Add(item);
                }));
                getTimings.Add(TimeAction(() =>
                {
                    cache.Get(item.Id);
                }));
                searchTimings.Add(TimeAction(() =>
                {
                    cache.Search(item.Id);
                }));
            }
            clearTimings.Add(TimeAction(cache.Clear));
        }
        Log($"{type}\tAdd\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{addTimings.Average:0.000}\tticks");
        Log($"{type}\tGet\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{getTimings.Average:0.000}\tticks");
        Log($"{type}\tSearch\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{searchTimings.Average:0.000}\tticks");
        Log($"{type}\tClear\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{clearTimings.Average:0.000}\tticks");
    }
    catch (Exception ex)
    {
        Log($"{type}\tFail\tDataCount\t{dataCount}\tIterations:\t{iterations}\tFailed\t{ex.Message}");
    }
}

private float TimeAction(Action action)
{
    var sw = Stopwatch.StartNew();
    action();
    sw.Stop();
    return sw.ElapsedTicks;
}

Multi-threaded test

3 threads executing at the same time.

  • 1 thread Adds
  • 1 thread Gets
  • 1 thread Searches
Number of successful operations withing the allowed timespan (50 milliseconds) are counted.
The test is executed 1000 times per type and an average is calculated

The test rig:

private void ExecuteTestMulti(ICacheTest cache, int dataCount, int iterations)
{
    var type = cache.GetType().Name;
    try
    {
        var testData = GenerateTestData(dataCount);
        var getAverage = new RunningAverage();
        var searchAverage = new RunningAverage();
        var writeAverage = new RunningAverage();
        for (int i = 0; i < iterations; i++)
        {
            foreach (var data in testData)
                cache.Add(data);

            var d = testData.GetRandom();
            var startTime = DateTime.UtcNow.AddMilliseconds(8);
            var getter = CreateThread(startTime, () => { cache.Get(d.Id); });
            var searcher = CreateThread(startTime, () => { cache.Search(d.Id); });
            var writer = CreateThread(startTime, () => {cache.Add(CacheItem.Create());});

            getter.Thread.Start(startTime);
            writer.Thread.Start(startTime);
            searcher.Thread.Start(startTime);
            getter.Thread.Join();
            searcher.Thread.Join();
            writer.Thread.Join();
            if (!getter.Verify(type, Log)
                || !searcher.Verify(type, Log)
                || !writer.Verify(type, Log))
            {
                return;
            }
            getAverage.Add(getter.WorkIterations);
            searchAverage.Add(searcher.WorkIterations);
            writeAverage.Add(writer.WorkIterations);
            cache.Clear();
        }
        Log($"{type}\tGets\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{getAverage.Average:0}\tgets");
        Log($"{type}\tSearch\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{searchAverage.Average:0}\tsearches");
        Log($"{type}\tWrites\tDataCount\t{dataCount}\tIterations:\t{iterations}\tAverage:\t{writeAverage.Average:0}\twrites");
    }
    catch (Exception ex)
    {
        Log($"{type}\tFail\tDataCount\t{dataCount}\tIterations:\t{iterations}\tFailed\t{ex.Message}");
    }
}

private WorkerResult CreateThread(DateTime start, Action action )
{
    var work = new WorkerResult();
    work.Thread = new Thread((context =>
    {
        try
        {
            while (DateTime.UtcNow < start)
                Thread.Sleep(0);
            var sw = Stopwatch.StartNew();
            while (sw.ElapsedMilliseconds < 50)
            {
                action();
                work.Increment();
            }
        }
        catch (Exception ex)
        {
            work.Exception = ex;
        }
    }))
    {
        IsBackground = true
    };
    return work;
}

private class WorkerResult
{
    public Thread Thread { get; set; }
    public Exception Exception { get; set; }
    public float WorkIterations => _workIterations;
    private float _workIterations;

    public void Increment()
    {
        _workIterations++;
    }

    public bool Verify(string type, Action<string> logger)
    {
        if (Exception != null)
        {
            logger($"{type}\tFail\tDataCount\t-\tIterations:\t-\tFailed\t{Exception.Message}");
            return false;
        }
        return true;
    }
}

Results

This post only contains the summary..
If you are interested in all the timings, I've posted the table in a previous post.

Starting with the single threaded results





For the 16 items test, the ConcurrentDictionary clearly is a looser. The other strategies are at the same level and for this small number of items it seems to not be so important what strategy to chose.






So not much surprising here, the ConcurrentDictionary lost the search category and unsurprisingly the for the cause optimized lookup was the fastest that became clear with a little more data.
The Add and Get categories was won by the unsafe lookup, not really interesting for this test but
good for a baseline. The ConcurrentDictionary seems to have lost most categories.

Multi-threaded results

The unsafe did not make it here, due to the 'Collection was modified; enumeration operation may not execute' exception. The main reason we use locks in a multi-threaded environment ;)
So here we only look at 4 different contestants.



The full lock strategy seems to fail here, probably as they all lock out each other. I.e. a long duration search locks out other readers. The switch to full lock with a lookup dictionary instead of a linq query won the searches count by far, but that tends to be the case most times when writing a non-generic bit of code to optimize your scenario. So no surprise there.
The ConcurrentDictionary actually holds its place quite well, if you use it as a dictionary only. Doing searches in it seems to go very slow.

Conclusions

The numbers seem to jump a little and the conclusion I can draw is that if performance is key to your success, do your own performance tests for your exact scenario. For the general case, I think that I'll stick with the Full-Lock and extra indexes if needed strategy for now. The extra complexity needed to get a ReaderWriterLock is in my mind not worth the risk of me or someone else screwing it up, at least not for a 1 writer, 2 readers scenario.


All code provided as-is. This is copied from my own code-base, May need some additional programming to work. Hope this helps someone out there :)


Performance of different lock methods in .net, results table

This post only contains the results table for the test. Full report can be found in this post:
http://dreamstatecoding.blogspot.com/2016/12/performance-of-different-lock-methods.html

Results, single-threaded

CacheUnsafeAdd DataCount 16 Iterations: 1000 Average: 0,177 ticks
CacheUnsafeGet DataCount 16 Iterations: 1000 Average: 0,222 ticks
CacheUnsafeSearch DataCount 16 Iterations: 1000 Average: 1,057 ticks
CacheUnsafeClear DataCount 16 Iterations: 1000 Average: 0,306 ticks
CacheFullLockAdd DataCount 16 Iterations: 1000 Average: 0,238 ticks
CacheFullLockGet DataCount 16 Iterations: 1000 Average: 0,283 ticks
CacheFullLockSearch DataCount 16 Iterations: 1000 Average: 0,806 ticks
CacheFullLockClear DataCount 16 Iterations: 1000 Average: 0,55 ticks
CacheFullLockIxAdd DataCount 16 Iterations: 1000 Average: 0,342 ticks
CacheFullLockIxGet DataCount 16 Iterations: 1000 Average: 0,305 ticks
CacheFullLockIxSearch DataCount 16 Iterations: 1000 Average: 0,378 ticks
CacheFullLockIxClear DataCount 16 Iterations: 1000 Average: 0,738 ticks
CacheWriteLockAdd DataCount 16 Iterations: 1000 Average: 0,357 ticks
CacheWriteLockGet DataCount 16 Iterations: 1000 Average: 0,442 ticks
CacheWriteLockSearch DataCount 16 Iterations: 1000 Average: 1,018 ticks
CacheWriteLockClear DataCount 16 Iterations: 1000 Average: 0,602 ticks
CacheConcurrentAdd DataCount 16 Iterations: 1000 Average: 3,9 ticks
CacheConcurrentGet DataCount 16 Iterations: 1000 Average: 0,275 ticks
CacheConcurrentSearch DataCount 16 Iterations: 1000 Average: 48,013 ticks
CacheConcurrentClear DataCount 16 Iterations: 1000 Average: 46,609 ticks
CacheUnsafeAdd DataCount 128 Iterations: 1000 Average: 0,137 ticks
CacheUnsafeGet DataCount 128 Iterations: 1000 Average: 0,188 ticks
CacheUnsafeSearch DataCount 128 Iterations: 1000 Average: 2,94 ticks
CacheUnsafeClear DataCount 128 Iterations: 1000 Average: 0,591 ticks
CacheFullLockAdd DataCount 128 Iterations: 1000 Average: 0,188 ticks
CacheFullLockGet DataCount 128 Iterations: 1000 Average: 0,237 ticks
CacheFullLockSearch DataCount 128 Iterations: 1000 Average: 2,821 ticks
CacheFullLockClear DataCount 128 Iterations: 1000 Average: 0,608 ticks
CacheFullLockIxAdd DataCount 128 Iterations: 1000 Average: 0,257 ticks
CacheFullLockIxGet DataCount 128 Iterations: 1000 Average: 0,229 ticks
CacheFullLockIxSearch DataCount 128 Iterations: 1000 Average: 0,304 ticks
CacheFullLockIxClear DataCount 128 Iterations: 1000 Average: 0,961 ticks
CacheWriteLockAdd DataCount 128 Iterations: 1000 Average: 0,286 ticks
CacheWriteLockGet DataCount 128 Iterations: 1000 Average: 0,364 ticks
CacheWriteLockSearch DataCount 128 Iterations: 1000 Average: 2,883 ticks
CacheWriteLockClear DataCount 128 Iterations: 1000 Average: 0,719 ticks
CacheConcurrentAdd DataCount 128 Iterations: 1000 Average: 1,651 ticks
CacheConcurrentGet DataCount 128 Iterations: 1000 Average: 0,233 ticks
CacheConcurrentSearch DataCount 128 Iterations: 1000 Average: 48,919 ticks
CacheConcurrentClear DataCount 128 Iterations: 1000 Average: 45,472 ticks
CacheUnsafeAdd DataCount 256 Iterations: 1000 Average: 0,136 ticks
CacheUnsafeGet DataCount 256 Iterations: 1000 Average: 0,174 ticks
CacheUnsafeSearch DataCount 256 Iterations: 1000 Average: 5,088 ticks
CacheUnsafeClear DataCount 256 Iterations: 1000 Average: 0,803 ticks
CacheFullLockAdd DataCount 256 Iterations: 1000 Average: 0,19 ticks
CacheFullLockGet DataCount 256 Iterations: 1000 Average: 0,233 ticks
CacheFullLockSearch DataCount 256 Iterations: 1000 Average: 5,104 ticks
CacheFullLockClear DataCount 256 Iterations: 1000 Average: 0,915 ticks
CacheFullLockIxAdd DataCount 256 Iterations: 1000 Average: 0,26 ticks
CacheFullLockIxGet DataCount 256 Iterations: 1000 Average: 0,231 ticks
CacheFullLockIxSearch DataCount 256 Iterations: 1000 Average: 0,312 ticks
CacheFullLockIxClear DataCount 256 Iterations: 1000 Average: 1,789 ticks
CacheWriteLockAdd DataCount 256 Iterations: 1000 Average: 0,293 ticks
CacheWriteLockGet DataCount 256 Iterations: 1000 Average: 0,374 ticks
CacheWriteLockSearch DataCount 256 Iterations: 1000 Average: 5,282 ticks
CacheWriteLockClear DataCount 256 Iterations: 1000 Average: 0,976 ticks
CacheConcurrentAdd DataCount 256 Iterations: 1000 Average: 1,534 ticks
CacheConcurrentGet DataCount 256 Iterations: 1000 Average: 0,235 ticks
CacheConcurrentSearch DataCount 256 Iterations: 1000 Average: 52,51 ticks
CacheConcurrentClear DataCount 256 Iterations: 1000 Average: 45,32 ticks
CacheUnsafeAdd DataCount 512 Iterations: 1000 Average: 0,136 ticks
CacheUnsafeGet DataCount 512 Iterations: 1000 Average: 0,175 ticks
CacheUnsafeSearch DataCount 512 Iterations: 1000 Average: 9,784 ticks
CacheUnsafeClear DataCount 512 Iterations: 1000 Average: 1,377 ticks
CacheFullLockAdd DataCount 512 Iterations: 1000 Average: 0,208 ticks
CacheFullLockGet DataCount 512 Iterations: 1000 Average: 0,235 ticks
CacheFullLockSearch DataCount 512 Iterations: 1000 Average: 9,817 ticks
CacheFullLockClear DataCount 512 Iterations: 1000 Average: 1,663 ticks
CacheFullLockIxAdd DataCount 512 Iterations: 1000 Average: 0,265 ticks
CacheFullLockIxGet DataCount 512 Iterations: 1000 Average: 0,234 ticks
CacheFullLockIxSearch DataCount 512 Iterations: 1000 Average: 0,31 ticks
CacheFullLockIxClear DataCount 512 Iterations: 1000 Average: 3,482 ticks
CacheWriteLockAdd DataCount 512 Iterations: 1000 Average: 0,297 ticks
CacheWriteLockGet DataCount 512 Iterations: 1000 Average: 0,371 ticks
CacheWriteLockSearch DataCount 512 Iterations: 1000 Average: 9,889 ticks
CacheWriteLockClear DataCount 512 Iterations: 1000 Average: 1,774 ticks
CacheConcurrentAdd DataCount 512 Iterations: 1000 Average: 1,214 ticks
CacheConcurrentGet DataCount 512 Iterations: 1000 Average: 0,233 ticks
CacheConcurrentSearch DataCount 512 Iterations: 1000 Average: 58,012 ticks
CacheConcurrentClear DataCount 512 Iterations: 1000 Average: 44,27 ticks
CacheUnsafeAdd DataCount 1024 Iterations: 1000 Average: 0,147 ticks
CacheUnsafeGet DataCount 1024 Iterations: 1000 Average: 0,186 ticks
CacheUnsafeSearch DataCount 1024 Iterations: 1000 Average: 19,618 ticks
CacheUnsafeClear DataCount 1024 Iterations: 1000 Average: 3,47 ticks
CacheFullLockAdd DataCount 1024 Iterations: 1000 Average: 0,213 ticks
CacheFullLockGet DataCount 1024 Iterations: 1000 Average: 0,238 ticks
CacheFullLockSearch DataCount 1024 Iterations: 1000 Average: 19,128 ticks
CacheFullLockClear DataCount 1024 Iterations: 1000 Average: 3,541 ticks
CacheFullLockIxAdd DataCount 1024 Iterations: 1000 Average: 0,258 ticks
CacheFullLockIxGet DataCount 1024 Iterations: 1000 Average: 0,23 ticks
CacheFullLockIxSearch DataCount 1024 Iterations: 1000 Average: 0,307 ticks
CacheFullLockIxClear DataCount 1024 Iterations: 1000 Average: 6,754 ticks
CacheWriteLockAdd DataCount 1024 Iterations: 1000 Average: 0,307 ticks
CacheWriteLockGet DataCount 1024 Iterations: 1000 Average: 0,378 ticks
CacheWriteLockSearch DataCount 1024 Iterations: 1000 Average: 19,413 ticks
CacheWriteLockClear DataCount 1024 Iterations: 1000 Average: 3,584 ticks
CacheConcurrentAdd DataCount 1024 Iterations: 1000 Average: 0,933 ticks
CacheConcurrentGet DataCount 1024 Iterations: 1000 Average: 0,233 ticks
CacheConcurrentSearch DataCount 1024 Iterations: 1000 Average: 77,34 ticks
CacheConcurrentClear DataCount 1024 Iterations: 1000 Average: 45,052 ticks
CacheUnsafeAdd DataCount 2048 Iterations: 1000 Average: 0,139 ticks
CacheUnsafeGet DataCount 2048 Iterations: 1000 Average: 0,18 ticks
CacheUnsafeSearch DataCount 2048 Iterations: 1000 Average: 37,257 ticks
CacheUnsafeClear DataCount 2048 Iterations: 1000 Average: 6,617 ticks
CacheFullLockAdd DataCount 2048 Iterations: 1000 Average: 0,216 ticks
CacheFullLockGet DataCount 2048 Iterations: 1000 Average: 0,24 ticks
CacheFullLockSearch DataCount 2048 Iterations: 1000 Average: 37,24 ticks
CacheFullLockClear DataCount 2048 Iterations: 1000 Average: 6,759 ticks
CacheFullLockIxAdd DataCount 2048 Iterations: 1000 Average: 0,259 ticks
CacheFullLockIxGet DataCount 2048 Iterations: 1000 Average: 0,231 ticks
CacheFullLockIxSearch DataCount 2048 Iterations: 1000 Average: 0,31 ticks
CacheFullLockIxClear DataCount 2048 Iterations: 1000 Average: 12,961 ticks
CacheWriteLockAdd DataCount 2048 Iterations: 1000 Average: 0,313 ticks
CacheWriteLockGet DataCount 2048 Iterations: 1000 Average: 0,38 ticks
CacheWriteLockSearch DataCount 2048 Iterations: 1000 Average: 37,501 ticks
CacheWriteLockClear DataCount 2048 Iterations: 1000 Average: 6,127 ticks
CacheConcurrentAdd DataCount 2048 Iterations: 1000 Average: 0,87 ticks
CacheConcurrentGet DataCount 2048 Iterations: 1000 Average: 0,235 ticks
CacheConcurrentSearch DataCount 2048 Iterations: 1000 Average: 112,173 ticks
CacheConcurrentClear DataCount 2048 Iterations: 1000 Average: 44,945 ticks
CacheUnsafeAdd DataCount 4096 Iterations: 1000 Average: 0,146 ticks
CacheUnsafeGet DataCount 4096 Iterations: 1000 Average: 0,18 ticks
CacheUnsafeSearch DataCount 4096 Iterations: 1000 Average: 73,893 ticks
CacheUnsafeClear DataCount 4096 Iterations: 1000 Average: 13,552 ticks
CacheFullLockAdd DataCount 4096 Iterations: 1000 Average: 0,231 ticks
CacheFullLockGet DataCount 4096 Iterations: 1000 Average: 0,238 ticks
CacheFullLockSearch DataCount 4096 Iterations: 1000 Average: 73,968 ticks
CacheFullLockClear DataCount 4096 Iterations: 1000 Average: 13,778 ticks
CacheFullLockIxAdd DataCount 4096 Iterations: 1000 Average: 0,267 ticks
CacheFullLockIxGet DataCount 4096 Iterations: 1000 Average: 0,229 ticks
CacheFullLockIxSearch DataCount 4096 Iterations: 1000 Average: 0,306 ticks
CacheFullLockIxClear DataCount 4096 Iterations: 1000 Average: 28,183 ticks
CacheWriteLockAdd DataCount 4096 Iterations: 1000 Average: 0,328 ticks
CacheWriteLockGet DataCount 4096 Iterations: 1000 Average: 0,378 ticks
CacheWriteLockSearch DataCount 4096 Iterations: 1000 Average: 74,292 ticks
CacheWriteLockClear DataCount 4096 Iterations: 1000 Average: 13,97 ticks
CacheConcurrentAdd DataCount 4096 Iterations: 1000 Average: 0,784 ticks
CacheConcurrentGet DataCount 4096 Iterations: 1000 Average: 0,237 ticks
CacheConcurrentSearch DataCount 4096 Iterations: 1000 Average: 192,573 ticks
CacheConcurrentClear DataCount 4096 Iterations: 1000 Average: 56,879 ticks
CacheUnsafeAdd DataCount 8192 Iterations: 1000 Average: 0,179 ticks
CacheUnsafeGet DataCount 8192 Iterations: 1000 Average: 0,179 ticks
CacheUnsafeSearch DataCount 8192 Iterations: 1000 Average: 147,167 ticks
CacheUnsafeClear DataCount 8192 Iterations: 1000 Average: 19,92 ticks
CacheFullLockAdd DataCount 8192 Iterations: 1000 Average: 0,29 ticks
CacheFullLockGet DataCount 8192 Iterations: 1000 Average: 0,263 ticks
CacheFullLockSearch DataCount 8192 Iterations: 1000 Average: 155,426 ticks
CacheFullLockClear DataCount 8192 Iterations: 1000 Average: 20,904 ticks
CacheFullLockIxAdd DataCount 8192 Iterations: 1000 Average: 0,32 ticks
CacheFullLockIxGet DataCount 8192 Iterations: 1000 Average: 0,25 ticks
CacheFullLockIxSearch DataCount 8192 Iterations: 1000 Average: 0,338 ticks
CacheFullLockIxClear DataCount 8192 Iterations: 1000 Average: 46,742 ticks
CacheWriteLockAdd DataCount 8192 Iterations: 1000 Average: 0,391 ticks
CacheWriteLockGet DataCount 8192 Iterations: 1000 Average: 0,413 ticks
CacheWriteLockSearch DataCount 8192 Iterations: 1000 Average: 157,023 ticks
CacheWriteLockClear DataCount 8192 Iterations: 1000 Average: 21,709 ticks
CacheConcurrentAdd DataCount 8192 Iterations: 1000 Average: 0,836 ticks
CacheConcurrentGet DataCount 8192 Iterations: 1000 Average: 0,266 ticks
CacheConcurrentSearch DataCount 8192 Iterations: 1000 Average: 377,017 ticks
CacheConcurrentClear DataCount 8192 Iterations: 1000 Average: 48,153 ticks


Results, multi-threaded

CacheUnsafeFail DataCount - Iterations: - Failed Collection was modified; enumeration operation may not execute.
CacheFullLockGets DataCount 256 Iterations: 1000 Average: 22426 gets
CacheFullLockSearch DataCount 256 Iterations: 1000 Average: 27437 searches
CacheFullLockWrites DataCount 256 Iterations: 1000 Average: 396 writes
CacheFullLockIxGets DataCount 256 Iterations: 1000 Average: 170105 gets
CacheFullLockIxSearch DataCount 256 Iterations: 1000 Average: 153597 searches
CacheFullLockIxWrites DataCount 256 Iterations: 1000 Average: 1799 writes
CacheWriteLockGets DataCount 256 Iterations: 1000 Average: 126810 gets
CacheWriteLockSearch DataCount 256 Iterations: 1000 Average: 16548 searches
CacheWriteLockWrites DataCount 256 Iterations: 1000 Average: 1737 writes
CacheConcurrentGets DataCount 256 Iterations: 1000 Average: 342192 gets
CacheConcurrentSearch DataCount 256 Iterations: 1000 Average: 853 searches
CacheConcurrentWrites DataCount 256 Iterations: 1000 Average: 1518 writes




Friday, December 2, 2016

RunningAverage,how to calculate average without storing each individual value in C#


If I want to get an average of a series of values, I usually just put them in a List<float> and the run the Average() function on it.
But from time to time, there is a need to calculate averages continuously over time where the list alternative just won't cut it as the calculation will cost more and more over time.
The alternative is to write your own class that does just that, calculate the average continuously but doesn't store each individual value.
Different solutions are discussed at:
http://math.stackexchange.com/questions/106700/incremental-averageing
There seems to be many names for this type of averaging. Incremental average, travelling average or running average just to name a few. I stuck with Running Average.

Some source-code (c#)
public class RunningAverage
{
    public float Average => _sum / _count;
    private float _sum;
    private int _count;
    public void Add(float v)
    {
        _sum += v;
        _count++;
    }
}

[TestMethod]
public void RunningAverageTest_NoValue()
{
    var target = new RunningAverage();
    Assert.AreEqual(float.NaN, target.Average);
}

[TestMethod]
public void RunningAverageTest_OneValue()
{
    var target = new RunningAverage();
    target.Add(10);
    Assert.AreEqual(10, target.Average);
}

[TestMethod]
public void RunningAverageTest_100Values()
{
    var target = new RunningAverage();
    var expected = new List<float>();
    for (int i = 0; i < 100; i++)
    {
        target.Add(i);
        expected.Add(i);
    }
    Assert.AreEqual(expected.Average(), target.Average);
}

2017-01-03, update
Changes were made to improve the solution, based on comment from Dan in the comments section. i.e. changed average calculation to the Average getter. Updated unit-test RunningAverageTest_NoValue to assume float.NaN instead of 0.


All code provided as-is. This is copied from my own code-base, May need some additional programming to work. Hope this helps someone out there :)


Friday, November 25, 2016

DIY Html encode your source code for your blog


A very lightweight tool to HTML encode your source-code (or anything really) and surround it in a pre-block. Perfect for posting on your website or blog.

private void btnEncode_Click(object sender, EventArgs e)
{
    var sb = new StringBuilder();
    sb.AppendLine("<pre>");
    sb.AppendLine(WebUtility.HtmlEncode(rtxt.Text));
    sb.AppendLine("</pre>");
    rtxt.Text = sb.ToString();
    rtxt.SelectionStart = 0;
    rtxt.SelectionLength = rtxt.TextLength;
}

Basically just a wrapper around the WebUtility.HtmlEncode method and a GUI for ease of use.

You will need to do the GUI yourself and compile it. Look at Visual Studio Community Edition for a place to start.

Hope this helps someone out there ;)

Until next time: Work to Live, Don’t Live to Work

Thursday, November 24, 2016

Basic word prediction in c#


In this little post I will go through a small and very basic prediction engine written in C# for one of my projects.
Basically what it does is the following:
  • It will collect data in the form of lists of strings
  • Given an input, it will give back a list of predictions of the next item. In falling probability order.
So no rocket science, but a good and basic prediction engine that can be build upon depending on what scenarios you want to solve.


First, some unit tests:


[TestMethod]
public void PredictorTest_NoMatch()
{
    var target = new Predictor();
    var actual = target.Predict("The rabbit-hole went straight on like a tunnel for some way");
    Assert.AreEqual(0, actual.Count);
}
[TestMethod]
public void PredictorTest_SingleMatch()
{
    var target = new Predictor();
    target.Predict("Alice opened the door and found that it led into a small passage");
    var actual = target.Predict("Alice opened the door");
    Assert.AreEqual(1, actual.Count);
    Assert.AreEqual("and", actual.First());
}
[TestMethod]
public void PredictorTest_SingleMatch_OtherNoise()
{
    var target = new Predictor();
    target.Predict("Alice opened the door and found that it led into a small passage");
    target.Predict("Alice took up the fan and gloves");
    var actual = target.Predict("Alice opened the door");
    Assert.AreEqual(1, actual.Count);
    Assert.AreEqual("and", actual.First());
}
[TestMethod]
public void PredictorTest_SingleMatch_TwoResultsInOrder()
{
    var target = new Predictor();
    target.Predict("Alice thought she might as well wait");
    target.Predict("Alice thought this a very curious thing");
    target.Predict("Alice thought she had never seen such a curious");
    var actual = target.Predict("Alice thought");
    Assert.AreEqual(2, actual.Count);
    Assert.AreEqual("she", actual.First());
    Assert.AreEqual("this", actual.Last());
}

So nothing too fancy. You provide it with input and as expected it returns a list of possible next values. Examples are from Alice's Adventures in Wonderland by Lewis Carroll, the Project Gutenberg edition.


Predictor class

public class Predictor
{
    private readonly Dictionary<string, Dictionary<string, int>> _items = new Dictionary<string, Dictionary<string, int>>();
    private readonly char[] _tokenDelimeter = {' '};
    public List<string> Predict(string input)
    {
        var tokens = input.Split(_tokenDelimeter, StringSplitOptions.RemoveEmptyEntries);
        var previousBuilder = new StringBuilder();
        Dictionary<string, int> nextFullList;
        foreach (var token in tokens)
        {
            nextFullList = GetOrCreate(_items, previousBuilder.ToString());
            if (nextFullList.ContainsKey(token))
                nextFullList[token] += 1;
            else
                nextFullList.Add(token, 1);

            if (previousBuilder.Length > 0)
                previousBuilder.Append(" ");
            previousBuilder.Append(token);
        }
        nextFullList = GetOrCreate(_items, previousBuilder.ToString());
        var prediction = (from x in nextFullList
            orderby x.Value descending
            select x.Key).ToList();

        return prediction;
    }
    private static T GetOrCreate<T>(Dictionary<string, T> d, string key)
    {
        if (d.ContainsKey(key))
        {
            return d[key];
        }
        var t = Activator.CreateInstance<T>();
        d.Add(key, t);
        return t;
    }
}

It's not smart, and it requires having seen quite a lot of samples to work well.
An additional improvement could be to add storage of previous/next token combinations so that if the full search doesn't yield any result, a guess could be made based on only the previous word. But I guess that is up to you.

All code provided as-is. This is copied from my own code-base, May need some additional programming to work.


Wednesday, November 23, 2016

Defining Intelligence

Lately I've been thinking a lot about intelligence, knowledge, thinking and how to define the goal of my Artificial Intelligence project. What is it that I want to achieve?
This post is basically a lot of in progress thoughts and references to popular culture.

So where to begin?

The inner voice(s)

Early humans may have thought that the voice in their head was the voice of God, but when did we become aware of a self?
Is one voice enough? Or should one strive to include at least two? Maybe with different approaches in a dialog with itself?
Maybe a loop of self triggered input that iterates core values and thus would form the system over time. Even if a lot of input is given from various sources, the repeating of core values would make those beliefs strong in different parts of the system. Like a backstory. Guess more ideas on this will come with more episodes of Westworld.

Neural networks, the brain


I have a rudimentary knowledge of how the human brain works at its lowest level, mainly by building huge networks of interconnected cells, neurons, that can propagate triggers in between them. When a neuron it triggered, it will in turn trigger its output that can be connected to multiple other neurons. A single neuron can in turn receive triggers from multiple neurons. This gets kind of out of hand quite fast.
There are a lot of research into modelling artificial neural networks, and honestly my own work with neural networks have been to solve very specific tasks. Maybe the system should be able to train its own networks to solve different kind of tasks? Maybe put it in the backlog for future pondering. To start with I think this is a too low level to look at intelligence.


Active Symbols


Something I picked up from the book 'Gödel, Escher, Bach: An Eternal Golden Braid' by Douglas R. Hofstadter. The idea to look at the brain on a higher level than the neural. A symbol could be anything, for example a word or a concept. Each symbol it is built by neurons activating in certain patterns. Neurons could be re-used between different symbols and symbols could activate other symbols in a network in itself.
Does the artificial intelligence need the neural layer? Or could we build just symbols and just activate them in different ways?
Need to be able to merge multiple symbols into new ones.
In the end this could give some sort of associative power to the entity, by activating symbols based on input and seeing what other symbols also activate from past knowledge.

Feeling of joy


When do humans feel joy? When we discover new things, when things go according to plan.

Should my project 'feel'?
The following dialog between two characters, played by Dustin Hoffman and Samuel L. Jackson, is from the movie Sphere.
Norman:
I would be happy if Jerry had no emotions whatsoever. Because the thing of it is once you go down that road... here's Jerry, an emotional being cooped up for 300 years with no one to talk to... none of the socialization, the emotional growth that comes from contact with other emotional beings...
Harry:
So...?
Norman:
What happens if Jerry gets mad?
To the point, is it OK to build something that can feel? Could it be a bad thing? But if you build something intelligent that can't feel, are you creating a sociopath?
What about boredom? Doing routine tasks makes me bored, that feeling makes me look at problems in other ways, maybe automate them. Should the agent try to minimize boredom and maximize joy? Or is there different kinds of joy. A too greedy algorithm would maybe not be able to do long term planning, go through times of focus to achieve something great.
Should the fitness function be floating, like it is for me. Sometimes I feel happy just watching TV, other times only a 10 K gets me there. Maybe a healthy balance is what we should strive for here as well.

Breathing

At least for me, a lot of my thinking is based around the basic need to breath. An artificial entity would not have that need, but maybe there should be some kind if core layer pulse that keeps things going. Something that triggers and activates itself, maybe the inner voice should be linked to this.
Is attention span linked to breathing as well? How many breaths can you keep a thought in active play before something else pops up instead?

Compartmentalizing

Working with a lot of information and knowledge will lead to conflicting ideas. Some sort of way to compartmentalize, separate and isolate, should be a good thing to have.

Prediction

The ability to predict future events based on past experiences. A simple way would be to just store sequences of words and increase counters when seen. And when a sequence is seen, different optional predictions would trigger what could be acted on. Maybe one part global and another part connected to a place or person. Being able to predict how someone could react seems core when trying to decide what to do next. This would require some sort of linking of past knowledge to a source.

Lastly

Should we strive to create something artificial with all the restrictions that humans have or should we strive for something without restrictions other than the computer hardware in itself. Maybe a lot of the self that we know comes from the constant struggle with the faulty hardware that we are running on. Or is the movie Transcendence with Johnny Depp and Rebecca Hall into something, the idea of unlimited resources.

This was a lot longer than I thought it would be. In the end a lot of questions and few answers but I guess that was expected. I hope this sparked some ideas, thoughts or feelings. :)