RSS Aggregator

AI is ready to take over Python programming, but not much else

Computerworld.com [Hacking News] - 13 May 2026 - 04:39

Tests of how well 19 large language models (LLMs) complete and perform complicated multi-step tasks have shown that they are both error-prone and, in many cases, unreliable.

The findings are contained in a preprint paper, LLMs Corrupt Your Documents When You Delegate, written by Microsoft researchers Philippe Laban, Tobias Schnabel and Jennifer Neville. It is based on a benchmark they created, DELEGATE-52, which allowed them to simulate workflows that might be part of a knowledge worker’s tasks. The paper is currently under review.

They said that the benchmark contains 310 work environments across 52 professional domains including coding, crystallography, genealogy and music sheet notation. Each environment consists of real documents totaling around 15K tokens in length, and five to 10 complex editing tasks that a user might ask an LLM to perform.

And, they stated in the paper’s abstract: “Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents, compounding over long interaction.”

Those mistakes are significant, they said. “The findings show that current LLMs introduce substantial errors when editing work documents, with frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4) losing an average 25% of document content over 20 delegated interactions, and an average degradation across all models of 50%.”

Benchmark exercise receives a thumbs up

Brian Jackson, principal research director at Info-Tech Research Group, found the findings very interesting. “Putting a list of LLMs to the test across different work domains yields a lot of useful insights,” he said. “I think this type of benchmark exercise could be helpful to enterprise developers who are looking to leverage agentic AI to automate specific workflows and understand the limits of what can be achieved.”

However, he said, “what we shouldn’t conclude from this is that, because these foundation models caused document degradation after 20 edits, they can’t be used to automate work in a certain field. It just means they can’t do all of the work as they are currently constructed.”

But, Jackson stated, “in an enterprise environment where having an accurate output is crucial, you wouldn’t take that approach. You would design the automation flow with stronger guardrails in place to prevent errors. This could be done by using multiple agents that play different roles, such as one that makes the edits and another that checks for errors and makes corrections.”

Sanchit Vir Gogia, chief analyst at Greyhound Research, said, “the Microsoft paper should be read as a serious warning about delegated AI, not as a claim that enterprise AI has failed. That distinction matters. The paper is still a preprint, so it deserves careful handling, but its central question is exactly the one CIOs should be asking: can AI preserve the integrity of complex work over repeated delegation?”

The study, he said, is stronger than what he described as “the usual AI benchmark theatre,” because it tests work products rather than just clever one-off answers. “It uses reversible editing tasks, domain-specific evaluators, and a round-trip method to see whether a document returns intact after repeated edits. In too many cases, it does not.”
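
The round-trip idea Gogia describes can be illustrated with a minimal sketch. This is not the paper's evaluator: the function names (`round_trip_score`, `add_bullets`, `remove_bullets`, `lossy_remove_bullets`) are hypothetical, and a generic text-similarity ratio from Python's `difflib` stands in for the domain-specific evaluators the paper uses. The sketch applies an edit and its inverse repeatedly, then measures how much of the original document survives.

```python
from difflib import SequenceMatcher
from typing import Callable

Edit = Callable[[str], str]


def round_trip_score(document: str, forward: Edit, backward: Edit, rounds: int = 1) -> float:
    """Apply forward then backward `rounds` times; 1.0 means the document came back intact."""
    current = document
    for _ in range(rounds):
        current = backward(forward(current))
    # Assumption: plain character-level similarity replaces a domain-specific evaluator.
    return SequenceMatcher(None, document, current).ratio()


def add_bullets(text: str) -> str:
    """Forward edit: prefix every line with a bullet."""
    return "\n".join("- " + line for line in text.split("\n"))


def remove_bullets(text: str) -> str:
    """Faithful inverse: strip exactly the bullet that add_bullets added."""
    return "\n".join(line[2:] if line.startswith("- ") else line for line in text.split("\n"))


def lossy_remove_bullets(text: str) -> str:
    """Hypothetical unreliable delegate: undoes the edit but silently drops the last line."""
    lines = remove_bullets(text).split("\n")
    return "\n".join(lines[:-1]) if len(lines) > 1 else lines[0]


doc = "alpha\nbeta\ngamma\ndelta"
intact = round_trip_score(doc, add_bullets, remove_bullets, rounds=5)
damaged = round_trip_score(doc, add_bullets, lossy_remove_bullets, rounds=2)
print(intact, damaged)
```

With the faithful inverse the score stays at 1.0 no matter how many rounds are run, while the lossy delegate scores lower with each round, which is the kind of compounding the benchmark is designed to expose.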

That is the point, explained Gogia. “This is not merely about hallucinations. It is about artefact integrity.”

AI is ‘not yet trustworthy enough’

He added that the headline finding is “uncomfortable: even the strongest models corrupt about a quarter of document content by the end of long workflows, while average degradation across all tested models reaches roughly 50%. The paper also finds that performance varies sharply by domain. Python is the only domain where most models are ‘ready,’ and the best model reaches that threshold in only 11 of 52 domains.”

AI is not failing because it cannot write, said Gogia; it is failing because it cannot yet preserve.

The study, he pointed out, “is especially useful because it shows how errors accumulate. Bigger documents worsen outcomes. Longer interaction worsens outcomes. Distractor files worsen outcomes. Short tests flatter the system, while longer workflows expose it. That maps rather neatly to the enterprise world, where work is messy, files are stale, context is noisy and the most important documents are rarely the simplest ones.”

The honest conclusion, he said, “is not that AI should be kept out of enterprise workflows. It is that delegated AI is not yet trustworthy enough to be left alone with consequential artefacts.”

When AI edits an important document such as a contract, a ledger, a policy, a codebase, a board paper, or a compliance record, Gogia warned, the enterprise still owns the damage.

Mitigation approaches

In order to prevent that damage, Jackson suggested, enterprises can do additional training and fine-tuning of models to be better adapted to their specific workflows: “These foundation models are very good at doing a lot of different tasks, but less good at doing one specific task very well. So, enterprises that want to achieve that may need to improve the models themselves by training on their own data.”

For example, “[the Microsoft paper] points out one multi-agent setup that led to more degradation instead of less, so the method to detect degradation must be well-designed to be effective,” he said. “Another approach that some enterprise platforms have introduced is a way to deterministically verify the output for accuracy using mathematical verification. So, knowing what domains prove more difficult for a single LLM to automate is useful, as developers can plan to add more verification steps to the process.”

He said, “depending on the model, for example, if it’s totally open source or if it’s proprietary, you can have more flexibility in terms of how much you can customize it. So, an enterprise developer might look at these results, pick the LLM best at automating their desired domain, and then send it in for additional training to master the process.”

People do not disappear

According to Gogia, the paper also shows something more precise than ‘AI still needs people.’ “It shows that AI changes the human layer from production to supervision, validation, and accountability. That is a rather different operating model from the one being sold in many boardroom conversations.”

People, he said, “do not disappear. Their work moves. This is the uncomfortable part for enterprises chasing headcount reduction. The people best placed to catch AI errors are often the same people organizations are hoping to replace, reduce, or redeploy. Remove too much domain expertise from the workflow, and the enterprise also removes the people who know when the AI has quietly damaged the work.”

Expertise becomes more valuable, not less, said Gogia: “The paper reinforces this because stronger models do not merely delete content. They often corrupt it. Weaker models are easier to catch when they visibly drop material. Frontier models are more awkward because the content remains present but becomes wrong, distorted, or subtly altered. That requires knowledgeable review, not casual inspection.”

This article originally appeared on CIO.com.

Category: Hacking & Security

scrcpy 4.0

AbcLinuxu [zprávičky] - 13 May 2026 - 04:13
scrcpy (Wikipedia), a cross-platform open source application that mirrors a connected Android device on the desktop and lets you control the device from there, has been released in a new version, 4.0.
Category: GNU/Linux & BSD

[webapps] Ninja Forms Uploads - Unauthenticated PHP File Upload

The Exploit Database - 13 May 2026 - 02:00
Ninja Forms Uploads - Unauthenticated PHP File Upload

[webapps] glances 4.5.2 - command injection

The Exploit Database - 13 May 2026 - 02:00
glances 4.5.2 - command injection

[webapps] coreruleset 4.21.0 - Firewall Bypass

The Exploit Database - 13 May 2026 - 02:00
coreruleset 4.21.0 - Firewall Bypass

[webapps] Flowise < 3.0.5 - Missing Authentication for Critical Function

The Exploit Database - 13 May 2026 - 02:00
Flowise < 3.0.5 - Missing Authentication for Critical Function

Doozy of a Patch Tuesday includes 30 critical Microsoft CVEs

The Register - Anti-Virus - 13 May 2026 - 01:51
Microsoft released fixes for 137 CVEs on Tuesday, none of which are known to have been targeted by attackers. But the news is not all good as Redmond rated a whopping 30 flaws as critical, with 14 earning a 9.0 or higher CVSS severity rating, including one perfect 10.

Plus, everyone who celebrates the monthly patchapalooza event received validation for what we all widely suspected last month: Yes, Redmond (and everyone else, for that matter) is using AI to find a ton more bugs than ever before. And that means a lot more work for all the folks applying and testing the patches.

“This month's release sits on the larger side of a hotpatch month, and we expect releases to continue trending larger for some time,” Tom Gallagher, VP of engineering at Microsoft Security Response Center, said in a note on this month's Patch Tuesday.

Microsoft also said its secret-until-now AI bug hunting system, codenamed MDASH, found 16 of the vulnerabilities addressed in this month’s release. Redmond additionally announced it is making the tool available to a limited number of customers in private preview, along the lines of Anthropic’s Mythos and Project Glasswing.

In other words: no break for Microsoft admins this May Patch Tuesday. Let’s take a look at some of the nastiest/most-interesting bugs that also received some of the highest-CVSS ratings this month, coming in hot at 9.8 and 9.9.

First up: CVE-2026-41096. This one is a critical, 9.8-rated Windows DNS Client remote code execution (RCE), and while Redmond says exploitation is “unlikely,” we’d suggest patching it ASAP. It’s due to a heap-based buffer overflow, and no authentication or user interaction is needed to exploit it (it's done by sending a specially crafted DNS response to a vulnerable system), potentially leading to memory corruption and RCE.

“Since the DNS Client runs on virtually every Windows machine, the attack surface is enormous,” Zero Day Initiative bug hunting boss Dustin Childs warned. “An attacker with a position to influence DNS responses (MitM, rogue server) could achieve unauthenticated RCE across your enterprise.”

Plus, it could happen across a ton of enterprise systems very rapidly, Jack Bicer, Action1 vulnerability research director, told The Register. “This CVE requires immediate attention,” he said. “Successful attacks may lead to widespread endpoint compromise, ransomware deployment, credential harvesting, and operational disruption across corporate networks.”

Another especially bad bug, CVE-2026-42898 in Microsoft Dynamics 365 on-premises systems, achieved a near-perfect 9.9 CVSS rating and also leads to RCE. Any authenticated user can trigger this vuln - it doesn’t require admin or other elevated privileges. As Redmond explains: “An attacker with the required permissions could modify the saved state of a process session in Dynamics CRM and trigger the system to process that data, which could result in the server unintentionally executing malicious code.”

Since exploitation could lead to a scope change, meaning the bug can affect systems beyond the vulnerable component, it’s a pretty serious risk to enterprises and should be prioritized. “Scope changes are pretty rare, so if you’re running Dynamics 365 On-Prem, definitely test and deploy this patch quickly,” Childs said.

The second of two 9.8-rated bugs is CVE-2026-41089. It’s a stack-based buffer overflow in Windows Netlogon that allows an unauthenticated, remote attacker to execute code on vulnerable machines by sending a specially crafted network request to a Windows server acting as a domain controller. As Childs points out, the fact that attackers can exploit this flaw without credentials or user interaction makes it wormable. “This is the highest-impact bug that requires immediate patching: a compromised domain controller is a compromised domain,” he added.

The silver lining this month for defenders is that the single CVE earning a perfect 10.0 CVSS rating is in Azure DevOps, and doesn’t require users to fix anything. CVE-2026-42826 is an information disclosure vulnerability in the DevOps toolchain that “has already been fully mitigated by Microsoft,” according to Redmond. “There is no action for users of this service to take. The purpose of this CVE is to provide further transparency.” ®
Category: Viruses and Worms

US govt seeks Instructure testimony on massive Canvas cyberattack

Bleeping Computer - 13 May 2026 - 01:09
The U.S. House Committee on Homeland Security is calling on Instructure executives to testify about two cyberattacks by the ShinyHunters extortion group that targeted the company's Canvas platform, allowing threat actors to steal student data and disrupt schools during final exams. [...]
Category: Hacking & Security

Foxconn confirms cyberattack after ransomware crew claims it stole confidential Apple, Nvidia files

The Register - Anti-Virus - 13 May 2026 - 00:02
Foxconn, a critical supplier for major hardware companies like Apple and Nvidia, on Tuesday confirmed a cyberattack affecting its North American operations after the Nitrogen ransomware gang listed the electronics manufacturer on its data leak site. “Some of Foxconn's factories in North America suffered a cyberattack,” a Foxconn spokesperson told The Register. “The cybersecurity team immediately activated the response mechanism and implemented multiple operational measures to ensure the continuity of production and delivery. The affected factories are currently resuming normal production.” Nitrogen ransomware criminals on Monday claimed to have breached the Taiwan-based company and stolen 8 TB of data comprising more than 11 million files. The miscreants say the leaks include confidential instructions, internal project documentation, and technical drawings related to projects at Intel, Apple, Google, Dell, and Nvidia, among others. Foxconn declined to confirm that these - or any - customers’ information was hoovered up in the digital intrusion. Nitrogen, which has been around since 2023, is believed to be one of the various ransomware offshoots that borrowed code from the leaked Conti 2 builder. And, in what may be very bad news for its latest victim, even paying the ransom demand may not guarantee recovery of encrypted files. In February, Coveware researchers warned that a programming error prevents the gang's decryptor from recovering victims' files, so paying up is futile. The finding specifically concerns the group's malware that targets VMware ESXi. This isn’t the first time Foxconn has been targeted by ransomware gangs. In 2024, LockBit claimed to have infected Foxsemicon Integrated Technology, a semiconductor equipment manufacturer within the Foxconn Technology Group. The same criminal crew also hit a Foxconn subsidiary in Mexico in 2022. ®
Category: Viruses and Worms

Energy and price fixing: Beware, the fixed-price period may not match the contract term

Lupa.cz - články - 13 May 2026 - 00:00
A fixed price does not necessarily apply for as long as the contract itself. We explain what to watch out for, when you can leave without a penalty, and why you should not blindly trust product names.
Category: IT News

… in Python: it's not laziness, it's production code

ROOT.cz - 13 May 2026 - 00:00
An acquaintance commented on my Python code with the words: “When are you going to finish it so that it can run?” It turned out that he meant the ellipses (“three dots”, …) I had used. In fact, they play important roles in Python.
Category: GNU/Linux & BSD
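
As an illustration of the ROOT.cz item above, here is a minimal sketch of a few common, valid uses of the ellipsis (`...`) in Python. The summary does not say which uses the article covers, so these examples are assumptions chosen for illustration, and the names `Greeter`, `English`, `welcome`, `total` and `describe` are hypothetical.

```python
from __future__ import annotations

from typing import Protocol

# 1. `...` is a literal for the built-in singleton object Ellipsis.
assert ... is Ellipsis


# 2. Placeholder body in a Protocol: the method declares a signature only.
class Greeter(Protocol):
    def greet(self, name: str) -> str: ...


class English:
    def greet(self, name: str) -> str:
        return f"Hello, {name}"


def welcome(greeter: Greeter, name: str) -> str:
    # Any object with a matching greet() method satisfies the protocol.
    return greeter.greet(name)


# 3. In type hints, `tuple[int, ...]` means a tuple of ints of any length.
def total(values: tuple[int, ...]) -> int:
    return sum(values)


# 4. Sentinel default that tells "argument not passed" apart from an explicit None.
def describe(value=...):
    if value is ...:
        return "no argument"
    return f"got {value!r}"


greeting = welcome(English(), "Ada")
print(greeting, total((1, 2, 3)), describe(), describe(None))
```

In each case the three dots are complete, runnable code rather than a marker for something unfinished.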

Software Harvest (13 May 2026): emulate virtual USB devices over the network

ROOT.cz - 13 May 2026 - 00:00
We create virtual USB devices controlled over IP, check how secure our passwords really are, launch development projects in a flash, and optimize CPU performance and power consumption on Linux.
Category: GNU/Linux & BSD

June will bring only trial production of Rubin, not high-volume production

CD-R server - 13 May 2026 - 00:00
When the Rubin architecture was announced, it looked as though it would be available at least three quarters of a year earlier than AMD's CDNA 5 / Instinct MI400. It now seems the gap will be a few months at most…
Category: IT News

Phage therapy - a possible solution to bacterial resistance to antibiotics

OSEL.cz - 13 May 2026 - 00:00
The spreading resistance of bacteria to known antibiotics is a worldwide problem that keeps getting worse. Antibiotic resistance is currently estimated to cause about 6 million deaths a year. There are concerns that the development of new antibiotics cannot keep pace with the rise in resistance. This has prompted interest in phage therapy.
Category: Science and Technology

Chinese team uses AI chemistry to turn nitrates from wastewater into ammonia

OSEL.cz - 13 May 2026 - 00:00
The environment is full of waste nitrates, which either cause harm or are removed at great cost. Yet they are an enormous source of nitrogen that could be put to use. A new supercatalyst based on a nanozyme made of two copper atoms converts waste nitrates into ammonia. In time, it could transform fertilizer production.
Category: Science and Technology

Virtuální Bastlírna vol. 62: WSL for Windows 98

AbcLinuxu [zprávičky] - 12 May 2026 - 23:29
Do you miss having someone to chat with about tinkering, technology, computers and science? Would you rather not risk a debate about sports over a beer at the pub? Then come to the virtual chat over a virtual beer at Virtuální Bastlírna, organized by the Strahov MacGyver, this Thursday. You may be wondering what there is to discuss. There are famous anniversaries to cover: besides 55 years of the 555 chip (which, incidentally, is said to be an angel number) and memories of the company Signetics, the Strahov crew will look back at the 40 years since the Chernobyl accident. As part of that, they will look not only at a replica of the fault panel of the SKALA control computer used at the time, but also at other parts of it. While on the subject of infamous events, they will reminisce with a demonstration of how a Linux kernel can be run under Windows 98. The oddities can continue as usual, for example with color photographs that use real colors instead of pigments. And a test of how many times a DVD-RW can be rewritten. Or where to get pyrotechnic SMD resistors. And is that all, Horst? Far from it! The creators of Virtuální Bastlírna are, above all, you! The prepared topics are only a backup for moments of awkward silence. So call right now… that is, on Thursday, 14 May from 20:00; you will get through until at least 22:00 (but VB usually ends considerably later). The link to the conference will appear on the tinkerers' wiki before the event itself.
Category: GNU/Linux & BSD

GTK2-NG

AbcLinuxu [zprávičky] - 12 May 2026 - 23:22
GTK2-NG is a community fork of GTK 2.24 (the current GTK version is 4.22). The announcement and discussion are on the discussion forum of Devuan, the fork of Debian without systemd. It is not the only fork of GTK 2. Ardour, for example, is built on its own fork of GTK 2 called YTK.
Category: GNU/Linux & BSD

UK fines water supplier $1.3M for exposing data of 664k customers

Bleeping Computer - 12 May 2026 - 22:17
The Information Commissioner's Office has fined South Staffordshire Water Plc and parent company South Staffordshire Plc £963,900 ($1.3 million) over a cyberattack that exposed the personal data of 663,887 customers and employees. [...]
Category: Hacking & Security

Webinar: Fixing the gaps in network incident response

Bleeping Computer - 12 May 2026 - 21:46
IT teams often struggle to quickly coordinate responses across disparate systems during network incidents. This upcoming webinar explores how automation and AI-assisted workflows can reduce response times and help prevent outages. [...]
Category: Hacking & Security

Signal adds security warnings for social engineering, phishing attacks

Bleeping Computer - 12 May 2026 - 21:40
Signal has introduced new in-app confirmations and warning messages as additional safeguards against phishing and social engineering attempts that could lead to various forms of fraud. [...]
Category: Hacking & Security