Machine Learning Algorithm with Oracle


jacques01
Posts: 42
Joined: Thu Oct 08, 2015 4:56 am UTC

Machine Learning Algorithm with Oracle

Postby jacques01 » Sun Feb 28, 2016 6:00 pm UTC

Hi,

Can someone suggest/name some algorithms that would be ideal for an unsupervised online binary classifier? Links to packages, etc. are also very helpful. I'm not looking to implement my own, but rather to find one that fits my use case.

I'm presented with a stream of 2 documents at a time, live. The documents are each plain text files of English words. One of them belongs to class A, and one of them belongs to class B. This is true every time, i.e. I'm never given two documents in class A, or two in class B, or two in neither class.

I'm also given access to an oracle. Once I assign both documents, the oracle will tell me whether I was correct, or not. From that I should be able to incrementally "learn" how to distinguish the two classes of documents.

This could be turned into a standard offline ML problem--make random guesses for N iterations to gather training data and then train a classifier offline. If it's supervised, I can even do feature engineering. Then I evaluate.

However, let's suppose that there is some cost C associated with each incorrect classification. It could be that I literally get money taken out of my bank account, or some other penalty. A naive batch offline training would be very costly: if I guess randomly, I expect to pay the cost C on about half of the N pairs.

I know that the algorithm's very 1st guess has to be random--there's no real prior unless I as a human decide to make the 1st classifications manually.

After that, it should begin incorporating the ground truth from the oracle, and incrementally update itself to reflect what class A and class B really are.
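To make that loop concrete, here is a rough sketch of one possible approach, not a recommendation of a specific package. Since every pair holds exactly one A and one B, the oracle's yes/no answer reveals both true labels, so each pair can be treated as a single labeled example. The sketch uses a mistake-driven perceptron on the difference of the two documents' word counts. The names `PairwisePerceptron`, `features`, and `run_stream` are made up for illustration, and whitespace tokenization is just an assumption.

```python
import random
from collections import Counter


def features(text):
    # Assumed featurization: lowercase bag of words split on whitespace.
    return Counter(text.lower().split())


class PairwisePerceptron:
    """Guesses which document of a pair is class A, learning only from mistakes."""

    def __init__(self, seed=0):
        self.w = {}  # word -> weight; positive weight means "looks like class A"
        self.rng = random.Random(seed)

    def _diff(self, doc1, doc2):
        # Word counts of doc1 minus word counts of doc2 (may be negative or zero).
        d = features(doc1)
        d.subtract(features(doc2))
        return d

    def predict(self, doc1, doc2):
        """Return True if doc1 is guessed to be class A (and doc2 class B)."""
        score = sum(self.w.get(t, 0.0) * c for t, c in self._diff(doc1, doc2).items())
        if score == 0:
            # No evidence either way (e.g. the very first pair): guess at random.
            return self.rng.random() < 0.5
        return score > 0

    def update(self, doc1, doc2, guessed_first_is_a, oracle_says_correct):
        # Perceptron rule: weights change only when the oracle reports a mistake.
        if oracle_says_correct:
            return
        # The guess was wrong, so the truth is the opposite assignment.
        sign = -1.0 if guessed_first_is_a else 1.0
        for t, c in self._diff(doc1, doc2).items():
            self.w[t] = self.w.get(t, 0.0) + sign * c


def run_stream(model, pairs):
    """Simulate the stream; each item is (doc1, doc2, first_is_a).

    The known truth plays the role of the oracle. Returns the number of
    mistakes, i.e. how many times the cost C would have been paid.
    """
    mistakes = 0
    for doc1, doc2, first_is_a in pairs:
        guess = model.predict(doc1, doc2)
        correct = guess == first_is_a
        model.update(doc1, doc2, guess, correct)
        mistakes += 0 if correct else 1
    return mistakes
```

With this setup the total cost is C times the number of mistakes, and updates happen only on those mistakes. Whether a linear model on word counts is good enough depends entirely on how the two classes actually differ.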

Please advise.

madaco
Posts: 149
Joined: Sat Feb 13, 2010 11:25 pm UTC

Re: Machine Learning Algorithm with Oracle

Postby madaco » Mon Feb 29, 2016 4:26 pm UTC

Hm,

OK, so I don't think I can provide an answer, but I might be able to ask some questions whose answers would help someone else answer.

Is C large or small?

Is the goal to minimize the cost, with the requirement that you get a classifier to work, or do you get a reward for each accurate classification, or something else?

Basically, if there is a cost associated with classifying wrong, why (in the same abstract way as you describe the cost of being wrong) is it worthwhile to attempt to make any classifications?

Is it just necessary to attempt to classify each pair?

