Monday, September 22, 2008

Featured users algorithms

Some of you may have noticed that the featured users list just changed. A few people are angry or confused about why they were removed, and others have been asking about the algorithm, so I figured I'd write a quick post about it.

Featured users was an idea I came up with one night in response to ROY4L's thoughts on the motivation of users to produce quality content, as well as my near-religious following of Clay Shirky's "A Group Is Its Own Worst Enemy" essay.

"Featured users" was supposed to provide two main benefits; to find users who consistently make good content that we could highlight on the front page in an effort to make the front page less overwhelming and less of a "treasure hunt". Second, and more often overlooked was to create an incentive for the "cream of the crop" users to create new content and interact with the site more often.

The goal was (and still is) ambitious: figure out mathematically who the best content producers are. I was surprised that a simple algorithm could produce fairly good results. The last featured users list was roughly 98% generated by the algorithm, with the last 2% being me adding or removing users manually.

So here, for the first time, is a rundown of the incredibly simple featured users algorithm (originally called 'user score'):

(average_site_score * 1.2)
(average_number_of_votes * 0.23)
x (number_of_favorites * 0.43)
= Your dumb score.

We calculate a score for every user, then set the top score to 10000 and scale all the other scores proportionally to it. An example of how this turns out shows that the list is sparse towards the top and crowded towards the bottom. Here are some sample results from an old run:

#1 nutnics 10000
#2 ROY4L 9947.16
#3 phaseblue 8328.45
#4 max 6374.33
#5 astuteNacute 5957.35
#6 krebstar 4685.71
#7 syncan 4451.45
#8 kingstefan 3757.21
#9 ALMusic 3620.85
#10 PCF 3549.24

From there, the scores went down drastically, because users near the top skew the results for everyone else. The requirement for getting on the "list" was a score over 200, which only roughly 300 people achieve.
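The scoring, scaling and cutoff steps described above can be sketched roughly like this. It is an illustration only, not the site's actual code: the function names and sample users are made up, and it assumes the three weighted terms are multiplied together, which the formula as posted does not fully spell out.

```python
def raw_score(avg_site_score, avg_votes, favorites):
    """Weighted 'user score' using the weights quoted in the post.

    ASSUMPTION: the three terms are multiplied together. The post only
    shows an "x" in front of the last term, so this is a guess.
    """
    return (avg_site_score * 1.2) * (avg_votes * 0.23) * (favorites * 0.43)


def normalize(raw_scores, top=10000.0):
    """Scale scores so the best user gets `top` and the rest keep their ratio to it."""
    best = max(raw_scores.values())
    if best <= 0:
        return {user: 0.0 for user in raw_scores}
    return {user: round(score / best * top, 2) for user, score in raw_scores.items()}


def featured(normalized, cutoff=200.0):
    """Users whose scaled score is over the cutoff, best first."""
    qualifying = [user for user, score in normalized.items() if score > cutoff]
    return sorted(qualifying, key=lambda user: normalized[user], reverse=True)


if __name__ == "__main__":
    # Hypothetical raw scores, not real site data.
    raw = {"alice": 500.0, "bob": 250.0, "carol": 5.0}
    scaled = normalize(raw)
    print(scaled)
    print(featured(scaled))
```

Because everything is scaled against the single best score, one runaway user at the top pushes everybody else's number down, which is the skew described above.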

At first, I used more data to base the scores on: number of comments (this is why whetstone made the list), number of views, etc. I then realized sites like "Blue Ball Machine" skewed the averages for everyone, so I tried calculating them with the top 5% of each user's sites excluded (trying to discard anomalies), but the results were still really off.
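For the curious, excluding the top 5% of a user's sites before averaging could look something like the sketch below. The function name, the rounding-up rule and the sample numbers are all assumptions for illustration; the post doesn't say exactly how the cut was made.

```python
def trimmed_average(values, top_percent=5):
    """Average of `values` after dropping the highest `top_percent` percent.

    ASSUMPTION: the number of dropped sites is rounded up, so any user
    with at least two sites loses their single biggest outlier.
    """
    if not values:
        return 0.0
    ordered = sorted(values)
    # Integer ceiling of len * top_percent / 100, avoiding float rounding.
    drop = (len(ordered) * top_percent + 99) // 100
    kept = ordered[:len(ordered) - drop] or ordered  # never drop everything
    return sum(kept) / len(kept)


if __name__ == "__main__":
    # Hypothetical view counts: nineteen ordinary sites and one runaway hit.
    views = [100] * 19 + [1_000_000]
    print(sum(views) / len(views))   # plain average, dominated by the hit
    print(trimmed_average(views))    # average with the top 5% excluded
```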

The numbers don't lie. This algorithm is working on a large enough set of data that a few up-voting alts won't make a difference. More people are viewing, favoriting and voting on the featured users than on those who aren't featured (even if you use a time scale of a period before featured users existed).

Now the only thing that you can really muck with here is the weight on each piece of the algorithm. Normally, you can look at the results and change the algorithm to remove results you don't like or get results you do like, but a huge part of this is opinion based. Multiple people want DarthWang to be featured, but I find I think the problem is that you can't please everyone with a single list.

This time around, after I was persuaded to let it happen, BTape and Teknorat took the generated list and then added and removed people as they saw fit, which is what caused much of the recent change. So focus your rage towards them for the next couple weeks.

I also made a quick change to the featured users content box: it now filters out duplicate users, so at any one point in time you won't see more than one site by each user, which I think will deal with a lot of the spam issues.
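A duplicate-user filter like that can be as simple as the sketch below. The field names and sample sites are hypothetical, and it assumes the first site listed for each user is the one that stays.

```python
def one_site_per_user(sites):
    """Keep only the first site seen for each user, preserving order."""
    seen_users = set()
    unique = []
    for site in sites:
        if site["user"] not in seen_users:
            seen_users.add(site["user"])
            unique.append(site)
    return unique


if __name__ == "__main__":
    # Hypothetical content box candidates.
    box = [
        {"user": "alice", "title": "first site"},
        {"user": "bob", "title": "another site"},
        {"user": "alice", "title": "second site"},
    ]
    for site in one_site_per_user(box):
        print(site["user"], "-", site["title"])
```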

anyway, back to work, dongs.

Thursday, September 18, 2008

Asset conversion

One of the problems with moving over to Flash is that we can only import a finite set of file types at run-time. Flash doesn't support WAV or OGG or animated GIFs natively, so part of the move is to clean out the asset system and convert file types where needed.

Originally, when YTMND was created, there was no file type checking at all; people could basically upload anything and we would save it. Then we added simple file name checks, which removed a good portion of the misclicks, but people who had the proper file endings with invalid files still got through. The next variation of checks used mime-magic to actually check the file types to see if they were valid, but with the millions of different possibilities when encoding sound and images, mime-magic still wasn't perfect. In addition to that, the original mime-magic setup checked if the user uploaded a sound and an image, but didn't make sure they put them in the right fields.
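To give an idea of what content-based checking looks like, here is a minimal sketch that sniffs a handful of well-known file signatures and also checks that the file landed in the right upload field. It is not the mime-magic setup or the new routine; the function names and the "image"/"sound" field labels are assumptions, and a real checker would need far more signatures and deeper validation.

```python
from typing import Optional, Tuple


def sniff(data: bytes) -> Optional[Tuple[str, str]]:
    """Guess (kind, format) from the leading bytes, or None if unknown."""
    if data.startswith((b"GIF87a", b"GIF89a")):
        return ("image", "gif")
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ("image", "png")
    if data.startswith(b"\xff\xd8\xff"):
        return ("image", "jpeg")
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ("sound", "wav")
    if data.startswith(b"ID3"):
        return ("sound", "mp3")  # MP3 with an ID3v2 tag in front
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return ("sound", "mp3")  # bare MPEG audio frame sync
    if data.startswith(b"OggS"):
        return ("sound", "ogg")  # treated as audio here; Ogg can hold video too
    if data.startswith(b"MThd"):
        return ("sound", "midi")
    return None


def field_matches(field: str, data: bytes) -> bool:
    """True only if the file is recognized AND belongs in `field`."""
    detected = sniff(data)
    return detected is not None and detected[0] == field


if __name__ == "__main__":
    gif_header = b"GIF89a" + b"\x00" * 10
    print(sniff(gif_header))
    print(field_matches("sound", gif_header))  # an image in the sound field
```

The second function covers the gap mentioned above: a file can be perfectly valid and still be in the wrong field.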

Today, there are literally thousands of sites using images as sounds and vice versa, and hundreds of sites using files we don't support, like MIDI, OGG, etc. Hundreds more use Word documents, zip files, executables, MPEG clips, etc. So far I've found some pretty interesting file types. Once I do an initial pass on the whole file system, I will post up statistics.

Now that I am at the stage where we need to convert the entire site over to work in Flash, I have written a much more thorough file type checking routine. One of the nice things about the fact that we are forced to do conversion on a lot of assets is that once we are in there converting, adding new file types will be easy. Hundreds of people have uploaded MIDI as sound files on their sites, which currently don't work at all. In the new conversion routine, MIDI files are converted to WAV and then compressed to MP3, so we can allow people to upload MIDI files if they want.

The new system will basically be backwards compatible with the old system, so SWF versions of WAV files are just treated as children of the original asset, i.e. the original asset is archived. This means if we allowed MIDI files, they would be converted to MP3 to be heard in Flash, but users would still be able to download the original MIDI file from the site's asset pages.
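One way to picture the parent/child arrangement is the sketch below. The class, method names and the set of Flash-playable types are all hypothetical and only meant to show the idea: the original upload is kept as-is, and converted versions hang off it.

```python
from dataclasses import dataclass, field
from typing import List

# ASSUMPTION: a stand-in list of types the Flash player can load directly.
FLASH_TYPES = {"mp3", "swf"}


@dataclass
class Asset:
    filename: str
    file_type: str
    children: List["Asset"] = field(default_factory=list)

    def add_converted(self, filename: str, file_type: str) -> "Asset":
        """Attach a converted copy; the original is left untouched."""
        child = Asset(filename, file_type)
        self.children.append(child)
        return child

    def file_for_flash(self) -> "Asset":
        """The original if Flash can load it, else the first usable child."""
        if self.file_type in FLASH_TYPES:
            return self
        for child in self.children:
            if child.file_type in FLASH_TYPES:
                return child
        raise LookupError(f"no Flash-playable version of {self.filename}")


if __name__ == "__main__":
    original = Asset("song.mid", "midi")
    original.add_converted("song.mp3", "mp3")  # MIDI -> WAV -> MP3 happens elsewhere
    print("download:", original.filename)
    print("play in Flash:", original.file_for_flash().filename)
```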

So what sort of files do you think it would be beneficial to add to the allowed upload list? OGG? MOD? NSF or all the other 8 bit music types? What about image types?

I have thought about allowing SWF/FLV since a lot of people make their animated GIFs in Flash, but not only do I think the compression requires a human hand, I am concerned that YTMND would become even more of a YouTube clone than it is now.

Anyway, that's all for now. Post a comment if you have any ideas or questions, or hit up the IRC if you want to chat about it.