New OpenAI Model 'Imminent' and AI Stakes Get Raised (plus Med Gemini, GPT 2 Chatbot and Scale AI)

Published 2024-05-02
Altman ‘knows the release date’, Politico calls it ‘imminent’ according to insiders, and then the mystery GPT-2 chatbot [made by the phi team at Microsoft] causes mass confusion and hysteria. I break it all down and cover two papers – Med Gemini and Scale AI Contamination – released in the last 24 hours. I’ve read them in full and they might be more important than all the rest. Let’s hope life wins over death in the deployment of AI.

AI Insiders: www.patreon.com/AIExplained

Politico Article: www.politico.eu/article/rishi-sunak-ai-testing-tec…
Sam Altman Talk: The Possibilities of AI [Entire Talk]...
MIT Interview: www.technologyreview.com/2024/05/01/1091979/sam-al…
Logan Kilpatrick Tweet: twitter.com/OfficialLoganK/status/1785834464804794…
Bubeck Response: twitter.com/SebastienBubeck/status/178588878748429…
GPT2: twitter.com/sama/status/1785107943664566556
Where it used to be hosted: arena.lmsys.org/
Unicorns?: twitter.com/phill__1/status/1784969111430103494
No Unicorns: twitter.com/suchenzang/status/1785159370512421201
GPT2 chatbot logic fail: twitter.com/VictorTaelin/status/178536773615717585…
And language fails: twitter.com/gblazex/status/1785101624475537813
James Betker Blog: nonint.com/2023/06/10/the-it-in-ai-models-is-the-d…
Scale AI Benchmark Paper: arxiv.org/pdf/2405.00332
Dwarkesh Zuckerberg Interview: Mark Zuckerberg - Llama 3, $10B Model...
Lavender Misuse: www.972mag.com/lavender-ai-israeli-army-gaza/
Autonomous Tank: www.techspot.com/news/102769-darpa-unleashes-20-fo…
Claude 3 GPQA: www.anthropic.com/news/claude-3-family
Med Gemini: arxiv.org/pdf/2404.18416
Medical Mistakes: www.cnbc.com/2018/02/22/medical-errors-third-leadi…
MedPrompt Microsoft: www.microsoft.com/en-us/research/blog/the-power-of…
My Benchmark Flaws Tweet: twitter.com/AIExplainedYT/status/17827162496396700…
My Stargate Video: Why Does OpenAI Need a 'Stargate' Sup...
My GPT-5 Video: GPT-5: Everything You Need to Know So...


Non-hype Newsletter: signaltonoise.beehiiv.com/

All comments (21)
  • @mlaine83
    By far this is the best AI news roundup channel on the tubes. Never clickbaity, always interesting and so much info.
  • @C4rb0neum
    I have had such bad diagnosis experiences that I would happily take an AI diagnosis, especially if it’s the “pre” diagnosis that nurses typically have to do in about 8 seconds.
  • In Claude's defence, I did Calculus in college and got high grades, but I suck at addition. Sometimes feels like they are mutually exclusive.
  • @SamJamCooper
    A note of caution regarding LLM diagnoses and medical errors: most avoidable deaths come not from misdiagnoses (although there are still some which these models could help with), but from problems of communication between clinicians, different departments, and support staff in the medical field. That's certainly something I see AI being able to help with now and in the future, but the medical reality is far more complex than a 1:1 relationship between misdiagnoses and avoidable deaths.
  • @jeff__w
    12:33 “So my question is this: why are models like Claude 3 Opus still getting any of these questions wrong? Remember, they're scoring around 60% in graduate-level expert reasoning, the GPQA. If Claude 3 Opus, for example, can get questions right that PhDs struggle to get right with Google and 30 minutes, why on Earth, with five short examples, can they not get these basic high school questions right?” My completely lay, non-computer-science intuition is this: (1) as you mention in the video, these models are optimized for benchmark questions and not just any old, regular questions and, more importantly, (2) there’s a bit of a category error going on: these models are not doing “graduate-level expert reasoning”—they’re emulating the verbal behavior that people exhibit when they (people) solve problems like these. There’s some kind of disjunction going on there—and the computer science discourse, which is, obviously, apart from behavioral science, conflating the two. Again, to beat a dead horse somewhat, I tested my “pill question”* (my version of your handcrafted questions) in the LMSYS Chatbot Arena (92 models, apparently) probably 50 times at least, and got the right answer exactly twice—and the rest of the time the answers were wrong numbers (even from the models that managed to answer correctly), nonsensical (e.g., 200%), or something along the lines of “It can’t be determined.” These models are not reasoning—they’re doing something that only looks like reasoning. That’s not a disparagement—it’s still incredibly impressive. It’s just what’s going on. * Paraphrased roughly: what proportion of a whole bottle of pills do I have to cut in half to get an equal number of whole and half pills?
  • @josh0n
    BRAVO for including "Lavender" and the autonomous tank as negative examples of AI. It is important to call this stuff out.
  • @DynamicUnreal
    As a person failed by the American medical system who is currently living with an undiagnosed neurological illness — I hope that a good enough A.I. will SOON replace doctors when it comes to medical diagnosis. If it wasn’t for GPT-4, who knows how much sicker I would be.
  • Hey, I’m in this one! Great job as always, although I’m becoming increasingly frustrated with how you somehow find news I haven’t seen… Very much looking forward to what OpenAI have been cooking, and I agree that there are ethical issues with restricting access to a model that can greatly benefit humanity. May will be exciting!
  • @Zirrad1
    It is useful to note how inconsistent human medical diagnosis is. A read of Kahneman’s book “Noise” is a prerequisite to appreciating just how poor human judgment can be and how difficult it is, for social, psychological, and political reasons, to improve the situation. The consistency of algorithmic approaches is key to reducing noise and to detecting and correcting bias, which carries forward and improves with iteration.
  • @wiltedblackrose
    I am SO GLAD that finally someone with reach has said out loud what I've been thinking for the longest time. For me these models are still not properly intelligent, because despite having amazing "talents", the things they fail at betray them. It's almost like they only become really, really good at learning facts and the syntax of reasoning, but don't actually pick up the conceptual relationship between things. As a university student I always have to think about what we would say about someone who can talk perfectly about complex abstract concepts, but fails to solve or answer the simpler questions that underlie those more complex ones. We would call that person a fraud. But somehow if it's an LLM, we close an eye (or two). As always, the best channel in AI. The best critical thinker in the space.
  • @Madlintelf
    Towards the end you state that it might be unethical not to use the models; that really hits home. I've worked in healthcare for 20+ years, and that level of accuracy coming from an LLM would be so welcome. I think the summarizing of notes will definitely be the hook that grabs the majority of healthcare professionals. Thanks again!
  • @jvlbme
    I really think we DO want surprise and awe with every release.
  • @juliankohler5086
    Loved seeing the community meetings. What a great way to use your influence, bringing people together instead of dividing them. "Ethical Influencers" might just have become a thing.
  • @colinharter4094
    I love that even though you're the person I go to for measured AI commentary, you always open your videos, and rightfully so, with something to the effect of "it's been a wild 48 hours. let me tell you"
  • @Xengard
    The question is, once these medical models are released, how long will it take for medics to implement and use them?
  • @Olack87
    Man, your videos always brighten my day. Such excellent and informative material.
  • @mikey1836
    Lobbyists are already trying to use “ethically risky” as an excuse to delay releasing AI that performs well at their jobs. The early ChatGPT-4 allowed therapy and legal advice, but later on they tried to stop it, claiming safety concerns; that’s BS.
  • @canadiannomad2330
    Ah yes, it makes sense for a US based company to give early access to closely held technologies to spooks on the other side of the pond. It totally aligns with their interests...
  • @muffiincodes
    Your point about the ethics of not releasing a medical chatbot which is better than doctors relies on us having a good way of measuring the true impact of these models in the real world. As far as I can see, as long as there is a lack of reliable independent evaluations that take into account the potential for increasing health inequalities or harming marginalised communities, we are not there yet. The UK AI Safety Institute has not achieved company compliance and has no enforcement mechanism, so that doesn’t even come close. The truth is we simply do not have the social infrastructure to evaluate the human impacts of these models.