Sunday, 19 February 2017

Modelling the forgetting curve

I have finally reached the point where cognitive modelling enters the scene. In 1885, Hermann Ebbinghaus wrote a book about memory experiments he conducted on himself. In today's cognitive science, research of such an introspective nature would not count as acceptable. However, the results of the experiments were well defined. Based on his studies, Ebbinghaus proposed the forgetting curve hypothesis, which describes the decline of memory retention over time. With Stoning Rosetta, my research question was to find a good model for forgetting. Ebbinghaus provided the basic form:
R = e^(-t/S)
Where:
  • t is time in days
  • S is relative strength of memory
  • R is retention (probability)
The hard problem was how to define S. The Wikipedia page gave no clue as to how it was supposed to be used. From a game perspective, well-learnt things should not be asked over and over again, because a repetitive game is a bad game. However, we are trying to learn things, so some repetition is good and there needs to be an option for it. I took matplotlib in hand and started to experiment with what kinds of curves different parameterisations produce. I started from the idea that the number of repetitions is not the key here. For example, you could have 15 tries in one day and still forget everything within a week, just like someone who plays only once. I chose to count only distinct days.
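As a rough illustration of counting distinct days instead of individual tries, here is a minimal sketch using Python's sqlite3 module. The table layout (including a separate `date` column), the sample rows, and the helper name `distinct_days` are all assumptions made for the example. The sketch also uses sqlite3's `?` placeholders rather than the `~B` formatting used in the game's own query.

```python
import sqlite3

# Hypothetical schema: the post does not show the table definition,
# so these columns are guesses for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tasks ("
    "user_id integer, vocabulary text, success real, created integer, date text)"
)
rows = [
    # (user_id, vocabulary, success, created as Unix time, calendar date)
    (1, "animals", 0.5, 1000, "2017-02-01"),
    (1, "animals", 0.8, 2000, "2017-02-01"),   # same day: not a new distinct day
    (1, "animals", 0.9, 90000, "2017-02-02"),
    (1, "colours", 0.7, 95000, "2017-02-02"),
    (2, "animals", 1.0, 96000, "2017-02-03"),  # another user, filtered out
]
conn.executemany("insert into tasks values (?, ?, ?, ?, ?)", rows)


def distinct_days(conn, user_id):
    """Return {vocabulary: number of distinct days played} for one user."""
    cursor = conn.execute(
        "select vocabulary, count(distinct date) from tasks "
        "where user_id = ? group by vocabulary order by vocabulary",
        (user_id,),
    )
    return dict(cursor.fetchall())


days = distinct_days(conn, 1)
print(days)  # {'animals': 2, 'colours': 1}
```

Two tries on the same day count once, so "animals" ends up with two distinct days even though it was played three times.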
"select created, vocabulary, success, count(distinct date) from tasks where user_id = ~B group by vocabulary order by created;", [Id]
Then I needed a concrete form for R, which meant choosing S. I tried a constant, but it didn't feel right. The idea was that even if you take a two-week break after lots of playing, the forgetting model should not drop to zero. I came up with the following, where i is the count of distinct days.
for i in range(8): plt.plot([math.exp(-x / 1.5**i) for x in t])
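To make the parameterisation easier to inspect without a plot, the following sketch computes the same family of curves as plain lists. The function names, the 15-day range for `t`, and the choice to print the value at day 14 are assumptions for illustration. Only the formula e^(-t/1.5^i) comes from the line above.

```python
import math


def retention(t, strength):
    """Ebbinghaus-style retention R = e^(-t/S) after t days."""
    return math.exp(-t / strength)


def strength_from_days(distinct_days, base=1.5):
    """Memory strength S = base**i, where i is the count of distinct days."""
    return base ** distinct_days


t = range(15)  # days 0..14, an assumed two-week window

# One curve per count of distinct days, i = 0..7.
curves = {
    i: [retention(x, strength_from_days(i)) for x in t]
    for i in range(8)
}

# Retention left after a two-week break, per count of distinct days.
for i, curve in curves.items():
    print(i, round(curve[14], 3))
```

With these assumptions, a player with no distinct days is at practically zero after two weeks, while seven distinct days still leave roughly 0.44. Each list in `curves` can be passed to `plt.plot` to reproduce the figure.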
To get the final percentage for a user, I multiplied this retention value by the success rate of the last pass of the game task. An example of the output can be seen in the image below.
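The combination of retention and success rate can be sketched as a single function. This is only an illustration: the function name `recall_estimate` is hypothetical, and it assumes that the success rate is a fraction between 0 and 1 and that t is measured in days since the last play.

```python
import math


def recall_estimate(days_since_last_play, distinct_days, last_success_rate):
    """Estimated recall: e^(-t / 1.5**i) scaled by the last success rate.

    Assumes last_success_rate is a fraction in [0, 1].
    """
    strength = 1.5 ** distinct_days
    return math.exp(-days_since_last_play / strength) * last_success_rate


# Example: played on 4 distinct days, last pass 80 % correct, 3 days ago.
estimate = recall_estimate(3, 4, 0.8)
print(f"{estimate:.0%}")
```

For this example the estimate comes out at about 44 %, so the vocabulary would be a fairly strong candidate for repetition.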
Then I added an exception that puts the most recent game task (played within the last hour) first in the list of options. The idea is that the player can repeat the game and perhaps learn the vocabulary more deeply.
"select vocabulary from tasks where user_id = ~B and created > ~B order by created desc limit 1;", [Id, Now-3600]
The vocabularies are in a machine-readable format and are located in a GitHub repository accessed via RawGit. The SQL database is currently just SQLite 3. Let's see if I run into performance issues at some point.