Tag: LLM

  • A deadly love affair with a chatbot – Der Spiegel

    In hindsight, one can say that Sewell’s parents tried everything. They spoke with their son. They tried to find out what was bothering him. What he was doing on his phone all those hours in his room. Nothing, Sewell told them. He showed them his Instagram and TikTok accounts. They found hardly any posts from him; he only watched a few videos now and then. They looked at his WhatsApp history but found nothing unsettling – and that in itself was unsettling, given that their son was becoming less and less reachable. They agreed that they would take his cell phone from him at bedtime.

    They had never heard of Character.AI, the app which, with the help of artificial intelligence and information provided by the user, creates digital personalities that speak and write like real people – chatbots, basically. And their son told them nothing of his secret world in which, he believed, a girl named Daenerys Targaryen was waiting for him to share her life with him.

  • If Anthropic succeeds, a nation of benevolent AI geniuses could be born – WIRED

    It would seem an irresolvable dilemma: Either hold back and lose or jump in and put humanity at risk. Amodei believes that his Race to the Top solves the problem. It’s remarkably idealistic. Be a role model of what trustworthy models might look like, and figure that others will copy you. “If you do something good, you can inspire employees at other companies,” he explains, “or cause them to criticize their companies.” Government regulation would also help, in the company’s view. … DeepMind’s Hassabis says he appreciates Anthropic’s efforts to model responsible AI. “If we join in,” he says, “then others do as well, and suddenly you’ve got critical mass.” He also acknowledges that in the fury of competition, those stricter safety standards might be a tough sell. “There is a different race, a race to the bottom, where if you’re behind in getting the performance up to a certain level but you’ve got good engineering talent, you can cut some corners,” he says. “It remains to be seen whether the race to the top or the race to the bottom wins out.” […]

    Even as Amodei is frustrated with the public’s poor grasp of AI’s dangers, he’s also concerned that the benefits aren’t getting across. Not surprisingly, the company that grapples with the specter of AI doom was becoming synonymous with doomerism. So over the course of two frenzied days he banged out a nearly 14,000-word manifesto called “Machines of Loving Grace.” Now he’s ready to share it. He’ll soon release it on the web and even bind it into an elegant booklet. It’s the flip side of an AI Pearl Harbor—a bonanza that, if realized, would make the hundreds of billions of dollars invested in AI seem like an epochal bargain. One suspects that this rosy outcome also serves to soothe the consciences of Amodei and his fellow Anthros should they ask themselves why they are working on something that, by their own admission, might wipe out the species.

    The vision he spins makes Shangri-La look like a slum. Not long from now, maybe even in 2026, Anthropic or someone else will reach AGI. Models will outsmart Nobel Prize winners. These models will control objects in the real world and may even design their own custom computers. Millions of copies of the models will work together—imagine an entire nation of geniuses in a data center! Bye-bye cancer, infectious diseases, depression; hello lifespans of up to 1,200 years.

  • No elephants: Breakthroughs in image generation – One Useful Thing

    Yet it is clear that what has happened to text will happen to images, and eventually video and 3D environments. These multimodal systems are reshaping the landscape of visual creation, offering powerful new capabilities while raising legitimate questions about creative ownership and authenticity. The line between human and AI creation will continue to blur, pushing us to reconsider what constitutes originality in a world where anyone can generate sophisticated visuals with a few prompts. Some creative professions will adapt; others may be unchanged, and still others may transform entirely. As with any significant technological shift, we’ll need well-considered frameworks to navigate the complex terrain ahead. The question isn’t whether these tools will change visual media, but whether we’ll be thoughtful enough to shape that change intentionally.

  • China’s AI frenzy: DeepSeek is already everywhere — cars, phones, even hospitals – Rest of World

    China’s biggest home appliances company, Midea, has launched a series of DeepSeek-enhanced air conditioners. The product is an “understanding friend” who can “catch your thoughts accurately,” according to the company’s product launch video. It can respond to users’ verbal expressions — such as “I am feeling cold” — by automatically adjusting temperature and humidity levels, and can “chat and gossip” using its DeepSeek-supported voice function, according to Midea. For those looking for more DeepSeek-powered electronics, there are also vacuum cleaners and fridges. […]

    DeepSeek has been adopted at different levels of Chinese government institutions. The southern tech hub of Shenzhen was one of the first to use DeepSeek in its government’s internal systems, according to a report from financial publication Caixin. Shenzhen’s Longgang county reported “great improvement in efficiency” after adopting DeepSeek in a system used by 20,000 government workers. The documents written by DeepSeek have achieved a 95% accuracy rate, and there has been a 90% reduction in the time taken for administrative approval processes, it said.

  • Introducing deep research – OpenAI

    Deep research is built for people who do intensive knowledge work in areas like finance, science, policy, and engineering and need thorough, precise, and reliable research. It can be equally useful for discerning shoppers looking for hyper-personalized recommendations on purchases that typically require careful research, like cars, appliances, and furniture. Every output is fully documented, with clear citations and a summary of its thinking, making it easy to reference and verify the information. It is particularly effective at finding niche, non-intuitive information that would require browsing numerous websites. Deep research frees up valuable time by allowing you to offload and expedite complex, time-intensive web research with just one query.

  • Desperate for work, translators train the AI that’s putting them out of work – Rest of World

    As a teenager, Pelin Türkmen dreamed of becoming an interpreter, translating English into Turkish, and vice versa, in real time. She imagined jet-setting around the world with diplomats and scholars, and participating in history-making events. Her tasks one recent January morning didn’t figure in her dreams. […]

    The new roles require much less skill and effort than translation, Türkmen said. For instance, she spent a year on her master’s thesis studying Samuel Beckett’s self-translation of his play Endgame from French to English. More recently, for her Ph.D. in translation studies, she studied for more than two years about the anti-feminist discourse in the Turkish translation of French author Pierre Loti’s 1906 novel, Les Désenchantées. In contrast, working on an AI prompt takes about 20 minutes.

  • AI firms follow DeepSeek’s lead, create cheaper models with “distillation” – Ars Technica

    Through distillation, companies take a large language model—dubbed a “teacher” model—which generates the next likely word in a sentence. The teacher model generates data which then trains a smaller “student” model, helping to quickly transfer knowledge and predictions of the bigger model to the smaller one. While distillation has been widely used for years, recent advances have led industry experts to believe the process will increasingly be a boon for start-ups seeking cost-effective ways to build applications based on the technology. […]

    Thanks to distillation, developers and businesses can access these models’ capabilities at a fraction of the price, allowing app developers to run AI models quickly on devices such as laptops and smartphones.
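    The article describes distillation only at a high level; a common concrete form of the "teacher trains student" transfer is to penalize the student wherever its predicted word distribution diverges from the teacher's. The sketch below is not from the article — it is a minimal, self-contained illustration of that loss (KL divergence over temperature-softened distributions), with all function names and the temperature value chosen for the example:

    ```python
    import math

    def softmax(logits, temperature=1.0):
        # Soften the distribution: a higher temperature spreads
        # probability mass across more of the vocabulary.
        z = [x / temperature for x in logits]
        m = max(z)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in z]
        total = sum(exps)
        return [e / total for e in exps]

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # KL(teacher || student) over softened distributions: the loss is
        # zero when the student matches the teacher exactly, and grows as
        # the student's next-word predictions drift from the teacher's.
        t = softmax(teacher_logits, temperature)
        s = softmax(student_logits, temperature)
        return sum(ti * (math.log(ti) - math.log(si)) for ti, si in zip(t, s))
    ```

    In practice this loss is computed per token over the teacher-generated data and minimized by gradient descent on the student's (much smaller) weights, which is why the resulting model can run on a laptop or phone.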

  • The Deep Research problem – Benedict Evans

    This reminds me of an observation from a few years ago that LLMs are good at the things that computers are bad at, and bad at the things that computers are good at. OpenAI is trying to get the model to work out what you probably mean (computers are really bad at this, but LLMs are good at it), and then get the model to do highly specific information retrieval (computers are good at this, but LLMs are bad at it). And it doesn’t quite work. Remember, this isn’t my test – it’s OpenAI’s own product page. OpenAI is promising that this product can do something that it cannot do, at least, not quite, as shown by its own marketing.

  • Why Amazon is betting on ‘automated reasoning’ to reduce AI’s hallucinations – WSJ

    Amazon.com’s cloud-computing unit is looking to “automated reasoning” to provide hard, mathematical proof that AI models’ hallucinations can be stopped, at least in certain areas. By doing so, Amazon Web Services could unlock millions of dollars worth of AI deals with businesses, some analysts say. Simply put, automated reasoning aims to use mathematical proof to assure that a system will or will not behave a certain way. It’s somewhat similar to the idea that AI models can “reason” through problems, but in this case, it’s used to check that the models themselves are providing accurate answers.

  • The end of search, the beginning of research – One Useful Thing

    A hint to the future arrived quietly over the weekend. For a long time, I’ve been discussing two parallel revolutions in AI: the rise of autonomous agents and the emergence of powerful Reasoners since OpenAI’s o1 was launched. These two threads have finally converged into something really impressive – AI systems that can conduct research with the depth and nuance of human experts, but at machine speed.

  • What DeepSeek may mean for the future of journalism and generative AI – Reuters Institute for the Study of Journalism

    I don’t think DeepSeek is going to replace OpenAI. In general, what we’re going to see is that more companies enter the space and provide AI models that are slightly differentiated from one another. If many actors choose to take the resource-intensive route, that multiplies the resource intensity and that might be alarming. But I’m hopeful that DeepSeek is going to lead to the generation of other AI companies that enter this space with offerings that are far cheaper and far more resource-efficient. […]

    Sometimes, I see commentary on DeepSeek along the lines of, ‘Should we be trusting it because it’s a Chinese company?’ No, you shouldn’t be trusting it because it’s a company. And also, ‘What does this mean for US AI leadership?’ Well, I think the interesting question is, ‘What does this mean for OpenAI leadership?’

    American firms have now leaned into the rhetoric that they’re assets of the US because they want the US government to shield them and help them build up. But a lot of the time, the actual people who are developing these tools don’t necessarily think in that frame of mind and are thinking more as global citizens participating in a global corporate technology race, or global scientific race, or a global scientific collaboration. I would encourage journalists to think about it that way too.

  • ChatGPT vs. Claude vs. DeepSeek: the battle to be my AI work assistant – WSJ

    As I embark on my AI book adventure, I’ve hired a human research assistant. But Claude has already handled about 85% of the grunt work using its Projects feature. I uploaded all my book-related documents (the pitch, outlines, scattered notes) into a project, basically a little data container. Now Claude can work with them whenever I need something. At one point, I needed a master spreadsheet of all the companies and people mentioned across my documents, with fields to track my progress. Claude pulled the names and compiled them into a nicely formatted sheet. Now, I open the project and ask Claude what I should be working on next.

  • OpenAI furious DeepSeek might have stolen all the data OpenAI stole from us – 404 Media

    I will explain what this means in a moment, but first: Hahahahahahahahahahahahahahahaha hahahhahahahahahahahahahahaha. It is, as many have already pointed out, incredibly ironic that OpenAI, a company that has been obtaining large amounts of data from all of humankind largely in an “unauthorized manner,” and, in some cases, in violation of the terms of service of those from whom they have been taking from, is now complaining about the very practices by which it has built its company.

  • Ai Weiwei speaks out on DeepSeek’s chilling responses – Hyperallergic

    Interestingly, when people tested this new AI tool by asking about me, it responded with, “Let’s talk about something else.” This is quite telling. Over the past decades, the Chinese Communist Party has employed a similar strategy—denying universally accepted values while actively rejecting them in practice. While it loudly proclaims ideals such as one world, one dream, in reality, it engages in systematic stealthy substitutions. […]

    Ultimately, no matter how much China develops, strengthens, or even hypothetically becomes the world’s leading power—which is likely—the values it upholds will continue to suffer from a profound and inescapable flaw in its ideological immune system: an inability to tolerate dissent, debate, or the emergence of new value systems.

  • Which AI to use now: An updated opinionated guide – One Useful Thing

    As I explained in my post about o1, it turns out that if you let an AI “think” about a problem before answering, you get better results. The longer the model thinks, generally, the better the outcome. Behind the scenes, it’s cranking through a whole thought process you never see, only showing you the final answer. Interestingly, when you peek behind that curtain, you find these AIs think in ways that feel eerily human.

  • 321 real-world gen AI use cases from the world’s leading organizations – Google Cloud Blog

    In our work with customers, we see their teams are increasingly focused on improving productivity, automating processes, and modernizing the customer experience. These aims are now being achieved through the AI agents they’re developing in six key areas: customer service; employee empowerment; code creation; data analysis; cybersecurity; and creative ideation and production.

  • Are better models better? – Benedict Evans

    The useful critique of my ‘elevator operator’ problem is not that I’m prompting it wrong or using the wrong version of the wrong model, but that I am in principle trying to use a non-deterministic system for a deterministic task. I’m trying to use an LLM as though it was SQL: it isn’t, and it’s bad at that. If you try my elevator question above on Claude, it tells you point-blank that this looks like a specific information retrieval question and that it will probably hallucinate, and refuses to try. This is turning a weakness into a strength: LLMs are very bad at knowing if they are wrong (a deterministic problem), but very good at knowing if they would probably be wrong (a probabilistic problem).

  • DeepSeek is the new AI chatbot that has the world talking – I pitted it against ChatGPT to see which is best – TechRadar

    Question 3: Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.

    For the final question, I decided to ask ChatGPT o1 and DeepThink R1 a question from Humanity’s Last Exam, the hardest AI benchmark out there. To a mere mortal like myself with no knowledge of hummingbird anatomy, this question is genuinely impossible; these reasoning models, however, seem to be up for the challenge. O1 answered four, while DeepThink R1 answered two. Unfortunately, the correct answer isn’t available online to prevent AI chatbots from scraping the internet to find the correct response. That said, from some research, I believe DeepThink might be right here, while o1 is just off the mark.

  • DeepSeek: Tech firm suffers biggest drop in US stock market history as low-cost Chinese AI company bites Silicon Valley – Sky News

    Nvidia, Meta Platforms, Microsoft, and Alphabet all saw their stocks come under pressure as investors questioned whether their share prices, already widely viewed as overblown following a two-year AI-led frenzy, were justified. Market analysts put the combined losses in market value across US tech at well over $1trn (£802bn).

  • DeepSeek defies America’s AI supremacy – Financial Times

    DeepSeek’s achievement is to have developed an LLM that AI experts say achieves a performance similar to US rivals OpenAI and Meta but claims to use far fewer — and less advanced — Nvidia chips, and to have been trained for a fraction of the cost. Some of its assertions remain to be verified. If they are true, however, it represents a potentially formidable competitor.

  • The Shardcore Inquisition 2025 – LLM edition. – shardcore

    Whilst the interactions were text-based, I wanted to embody each LLM as a quasi-human subject, following the same parameters as the original inquisitions. Each bot has been given a different AI-generated voice and face, with SadTalker providing the somewhat hit-and-miss lipsync animations. Presenting the interviews in this way places them firmly in the uncanny valley and emphasises the somewhat surreal nature of conversing with ‘the machine’.

  • Better without AI

    Better without AI explores moderate apocalypses that could result from current and near-future AI technology. These are relatively overlooked risks: not extreme sci-fi extinction scenarios, nor the media’s obsession with “ChatGPT said something naughty” trivia. Rather: realistically likely disasters, up to the scale of our history’s worst wars and oppressions. Better without AI suggests seven types of actions you, and all of us, can take to guard against such catastrophes—and to steer us toward a future we would like.

  • Prophecies of the Flood – One Useful Thing

    The result was a 17-page paper with 118 references! But is it any good? I have taught the introductory entrepreneurship class at Wharton for over a decade, published on the topic, started companies myself, and even wrote a book on entrepreneurship, and I think this is pretty solid.

  • AI means the end of internet search as we’ve known it – MIT Technology Review

    Not everyone is excited for the change. Publishers are completely freaked out. The shift has heightened fears of a “zero-click” future, where search referral traffic—a mainstay of the web since before Google existed—vanishes from the scene. […]

    “We are definitely at the start of a journey where people are going to be able to ask, and get answered, much more complex questions than where we’ve been in the past decade,” says Pichai. There are some real hazards here. First and foremost: Large language models will lie to you. They hallucinate. They get shit wrong. When it doesn’t have an answer, an AI model can blithely and confidently spew back a response anyway. For Google, which has built its reputation over the past 20 years on reliability, this could be a real problem. For the rest of us, it could actually be dangerous.

  • Your next AI wearable will listen to everything all the time – WIRED

    In the app, you can see a summary of the conversations you’ve had throughout the day, and at the day’s end, it generates a snippet of what the day was like and has the locations of where you had these chats on a map. But the most interesting feature is the middle tab, which is your “To-Dos.” These are automatically generated based on your conversations. I was speaking with my editor and we talked about taking a picture of a product, and lo and behold, Bee AI created a to-do for me to “Remember to take a picture for Mike.” (I must have said his name during the conversation.) You can check these off if you complete them. It’s worth pointing out that these to-do’s are often not things I need to do.

  • ‘Hey, Gemini!’ Mega Galaxy S25 leak confirms major AI upgrades and lots more – Android Authority

    The leaked image above shows that the Galaxy S25 series is getting a new “Now Brief” feature that will provide users a personalized summary of their day. It feels like a rehash of the Google Now feature from yesteryears. The image shows that Now Brief will include cards with information about the weather, suggestions for using different features, a recap of images clicked during the day, daily activity goals, and more. We’re guess[ing] the feature will use AI to collate all this information from various apps and other connected Galaxy devices.

  • iOS 18.3 temporarily removes notification summaries for news – MacRumors

    Apple is making changes to Notification Summaries following complaints that the way ‌Apple Intelligence‌ aggregated news notifications could lead to false headlines and confused customers. Several BBC notifications, for example, were improperly summarized, providing false information to readers.

  • OpenAI ChatGPT can now handle reminders and to-dos – The Verge

    While scheduling capabilities are a common feature in digital assistants, this marks a shift in ChatGPT’s functionality. Until now, the AI has operated solely in real time, responding to immediate requests rather than handling ongoing tasks or future planning. The addition of Tasks suggests OpenAI is expanding ChatGPT’s role beyond conversation into territory traditionally held by virtual assistants.

    OpenAI’s ambitions for Tasks appear to stretch beyond simple scheduling, too. Bloomberg reported that “Operator,” an autonomous AI agent capable of independently controlling computers, is slated for release this month. Meanwhile, reverse engineer Tibor Blaho found that OpenAI appears to be working on something codenamed “Caterpillar” that could integrate with Tasks and allow ChatGPT to search for specific information, analyze problems, summarize data, navigate websites, and access documents — with users receiving notifications upon task completion.

  • AI teacher tools set to break down barriers to opportunity – GOV.UK

    Kids are set to benefit from a better standard of teaching through more face time with teachers – powered by AI – as the Government sets the country on course to mainline AI into the fabric of society, helping turbocharge our Plan for Change and breaking down the barriers to opportunity. £1 million has been set aside for 16 developers to create AI tools to help with marking and generating detailed, tailored feedback for individual students in a fraction of the time, so teachers can focus on delivering brilliant lessons. […]

    The prototype AI tools, to be developed by April 2025, will draw on a first-of-its-kind AI store of data to ensure accuracy – so teachers can be confident in the information training the tools. The world-leading content store, backed by £3 million funding from the Department for Science, Innovation and Technology, will pool and encode curriculum guidance, lesson plans and anonymised pupil work which will then be used by AI companies to train their tools to generate accurate, high-quality content. […]

    Almost half of teachers are already using AI to help with their work, according to a survey from TeacherTapp. However, most AI tools are not specifically trained on the documents that set out how teaching should work in England, and aren’t accurate enough to help teachers with their marking and feedback workload. Training AI tools on the content store can increase feedback accuracy to 92%, up from 67% when no targeted data was provided to a large language model. That means teachers can be assured the tools are safe and reliable for classroom use.

  • Why Starmer and Reeves are pinning their hopes on AI to drive growth in UK – The Guardian

    Underneath all of this is the implication that efficiency – through AI automating certain tasks – means redundancies. The Tony Blair Institute (TBI) has suggested that more than 40% of tasks performed by public sector workers could be automated partly by AI and the government could bank those efficiency gains by “reducing the size of the public-sector workforce accordingly”. TBI also estimates that AI could displace between 1m and 3m private-sector jobs in the UK, though it stresses the net rise in unemployment will be in the low hundreds of thousands because the technology will create new jobs, too. Worried lawyers, finance professionals, coders, graphic designers and copywriters – a handful of sectors that might be affected – will have to take that on faith. This is the flipside of improved productivity.

  • ‘Mainlined into UK’s veins’: Labour announces huge public rollout of AI – The Guardian

    Under the 50-point AI action plan, an area of Oxfordshire near the headquarters of the UK Atomic Energy Authority at Culham will be designated the first AI growth zone. It will have fast-tracked planning arrangements for data centres as the government seeks to reposition Britain as a place where AI innovators believe they can build trillion-pound companies. Further zones will be created in as-yet-unnamed “de-industrialised areas of the country with access to power”. Multibillion-pound contracts will be signed to build the new public “compute” capacity – the microchips, processing units, memory and cabling that physically enable AI. There will also be a new “supercomputer”, which the government boasts will have sufficient AI power to play itself at chess half a million times a second. Sounding a note of caution, the Ada Lovelace Institute called for “a roadmap for addressing broader AI harms”, and stressed that piloting AI in the public sector “will have real-world impacts on people”.

  • Things we learned about LLMs in 2024 – Simon Willison’s Weblog

    A lot has happened in the world of Large Language Models over the course of 2024. Here’s a review of things we figured out about the field in the past twelve months, plus my attempt at identifying key themes and pivotal moments.

  • Alexa’s new AI brain is stuck in the lab – Bloomberg

    It’s true that Alexa is little more than a glorified kitchen timer for many people. It hasn’t become the money maker Amazon anticipated, despite the company once estimating that more than a quarter of US households own at least one Alexa-enabled device. But if Amazon can capitalize on that reach and convince even a fraction of its customers to pay for a souped-up AlexaGPT, the floundering unit could finally turn a profit and secure its future at an institutionally frugal company. If Amazon fails to meet the challenge, Alexa may go down as one of the biggest upsets in the history of consumer electronics, on par with Microsoft’s smartphone whiff.

  • The Best Available Human standard – One Useful Thing

    The world is full of entrepreneurs-in-waiting because most entrepreneurial journeys end before they begin. This comprehensive study shows around 1/3 of Americans have had a startup idea in the last 5 years but few act on it — less than half even do any web research! This matches my own experience as an entrepreneurship professor (and former entrepreneur). The number one question I get asked is “what do I do now?” While books and courses can help, there is nothing like an experienced cofounder… except, as my research with Jason Greenberg suggests, experienced cofounders are not only hard to find and incentivize, but picking the wrong cofounder can hurt the success of the company because of personality conflicts and other issues. All of this is why AI may be the Best Available Cofounder for many people. It is no substitute for quality human help, but it might make a difference for many potential entrepreneurs who would otherwise not get any assistance.