Tag: Code · 111 articles archived under #code

arXiv — NLP / Computation & Language research 4h ago Right in the Right Way: LM Training with Verifiable Rewards and Human Demonstrations arXiv:2607.01181v1 Announce Type: cross Abstract: RL with verifiable rewards (RLVR) has emerged as a powerful paradigm for training LMs on tasks with well-defined success metrics, such as code generation and mathematical reasoning. However, current RLVR methods optimize only… 25

arXiv — NLP / Computation & Language research 4h ago Agentic generation of verifiable rules for deterministic, self-expanding reaction classification arXiv:2607.01061v1 Announce Type: cross Abstract: Computer-assisted synthesis planning breaks target molecules into accessible precursors using large libraries of reaction rules that assign each transformation a deterministic, interpretable label. But chemistry is long-tailed,… 17

Latent.Space news-outlet 13h ago How Cursor deploys AI inside the enterprise Cursor's Pauline Brunet explains how her team of Forward Deployed Engineers help organizations implement agents — essentially setting up software factories. 35

Smol AI News news-outlet 1d ago not much happened today **Anthropic** re-enabled **Claude Fable 5** with updated cybersecurity safeguards routing some requests to **Opus 4.8**. The relaunch influenced tooling adoption by **Cursor**, **Devin**, and **Perplexity**. Builders are adapting to frontier-model constraints by employing… 16

TechCrunch — AI news-outlet 2d ago Cursor now has a mobile app for guiding your coding agent on the go Cursor has launched a new mobile app for remote oversight over coding agents. 29

Smol AI News news-outlet 3d ago not much happened today **Meta** announced **Brain2Qwerty v2**, a real-time non-invasive brain-to-text decoder achieving up to **78% word accuracy** with released training code and dataset. **Cursor** launched **Cursor for iOS** with remote AI agents and live activity features.
Open-weight model access… 35

Hacker News — AI on Front Page community 3d ago Age verification is just a precursor to automated attribution of speech Article URL: https://nonogra.ph/age-verification-is-just-a-precursor-to-attribution-of-speech-06-29-2026 Comments URL: https://news.ycombinator.com/item?id=48714529 Points: 238 # Comments: 105 34

arXiv — Machine Learning research 6d ago Optimizing CUDA like a Human: Micro-Profiling Tools as Expert Surrogates for LLM-Based GPU Kernel Optimization arXiv:2606.26453v1 Announce Type: new Abstract: We present KernelPro, a closed-loop multi-agent system that automatically generates, profiles, and iteratively optimizes GPU kernel code by integrating large language model (LLM) code generation with hardware profiler feedback and… 21

GitHub Blog — AI & ML official-blog 6d ago Evaluating performance and efficiency of the GitHub Copilot agentic harness across models and tasks Explore how the GitHub Copilot agentic harness delivers strong results across multiple benchmarks and leading token efficiency, while maintaining flexibility to choose among more than 20 models. The post Evaluating performance and efficiency of the GitHub Copilot agentic harness… 19

Hugging Face Daily Papers research 6d ago ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation Abstract ReNIO enhances on-policy distillation for language models by reweighting negative trajectories based on token-level probability ratios, improving reasoning performance in mathematical and code generation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct On-policy… 25

arXiv — NLP / Computation & Language research 7d ago Dream at SemEval-2026 Task 13: SALSA for Single-Pass Machine-Generated Code Detection arXiv:2606.25102v1 Announce Type: new Abstract: Large language models have transformed code generation, raising concerns around authorship, assessment integrity, and software trust.
SemEval-2026 Task 13 Subtask A operationalizes detection as binary classification over code… 28

arXiv — NLP / Computation & Language research 7d ago OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning arXiv:2606.25757v1 Announce Type: new Abstract: Reinforcement Learning (RL) has enabled LLMs to excel in objective reasoning tasks such as mathematics and code generation. However, applying RL to open-ended tasks, such as creative writing, remains challenging because… 22

r/LocalLLaMA community 7d ago I reverse engineered Windows Copilot into a free OpenAI compatible API (GPT-4, no API key, no billing) So Microsoft gives you GPT-4 for free in Copilot. They just don't give you an API for it. So I made one. It logs into your own Microsoft account once, saves the session, and exposes a local server at http://localhost:8000/v1 that speaks the OpenAI format. Point the official… 24

arXiv — NLP / Computation & Language research 8d ago Ensemble Learning for Large Language Models in Text and Code Generation: A Survey arXiv:2503.13505v3 Announce Type: replace Abstract: Generative Pretrained Transformers (GPTs) are foundational Large Language Models (LLMs) for text generation. However, individual LLMs often produce inconsistent outputs and exhibit biases, limiting their representation of… 10

r/LocalLLaMA community 10d ago I mapped every agent config file (AGENTS.md, CLAUDE.md, llms.txt, .cursorrules, SKILL.md...) and tagged how widely each is actually used Every tool ships its own magic file now and after a while the names all blur together. I put together a guide to the ones agents actually read and write, with a tag on each for real adoption instead of hype. https://github.com/ItamarZand88/awesome-agent-conventions 21… 22

GitHub Blog — AI & ML official-blog 12d ago How we built an internal data analytics agent Qubot, our internal Copilot-powered analytics agent, allows any GitHub employee to ask questions about our data in plain language.
Here's what we learned as we built it. 18

Hugging Face Daily Papers research 12d ago No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages Abstract Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced… 27

Hugging Face Daily Papers research 13d ago JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines Abstract Game development frameworks and benchmarks were created using data from game jam competitions to evaluate code generation and project-level programming capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current AI-driven game development has made substantial… 25

ThursdAI news-outlet 13d ago Fable Got Banned, Open Source Delivered: GLM-5.2, Kimi K2.7 & SpaceX Buys Cursor - June 18 From CoreWeave (W&B): Fable is gone (for now). Here's everything else that happened this week: GLM-5.2 takes the open source crown, SpaceX buys Cursor for $60B, and 3 guests on the show today! 23

GitHub Blog — AI & ML official-blog 14d ago Getting more from each token: How Copilot improves context handling and model routing How GitHub Copilot is making more of each session go toward useful work, so your credits go further. 34

Stratechery (Ben Thompson) community 14d ago The State of Fable, The Jailbreak Problem, SpaceX Acquires Cursor The administration is very likely wrong about Fable, but that is ultimately Anthropic's responsibility.
20

Hugging Face Daily Papers research 15d ago LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Abstract Parallel loop Transformers achieve better code generation performance with two loops due to refined representations, while additional loops cause diminishing returns and increased positional mismatch costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Looped… 5

Ars Technica — AI news-outlet 15d ago SpaceX acquires AI coding platform Cursor for $60 billion Separately, neither could compete. Now they hope they can. 20

Hacker News — AI on Front Page community 15d ago SpaceX Is Buying Cursor Article URL: https://www.bbc.com/news/articles/cvgd5g7d7gyo Comments URL: https://news.ycombinator.com/item?id=48554215 Points: 255 # Comments: 289 24

The Information — AI news-outlet 15d ago SpaceX finalizes $60 billion deal to acquire Cursor SpaceX announced it agreed to buy AI coding startup Cursor for $60 billion on Tuesday. The announcement came only a few days after SpaceX went public at a valuation of about $1.77 trillion. Since the IPO, SpaceX stock has risen 42% to close on Monday at $193.50, valuing it at… 37

TechCrunch — AI news-outlet 15d ago SpaceX to acquire Cursor for $60B in stock, days after blockbuster IPO The deal is supposed to help SpaceX's struggling AI division. The company told IPO investors it sees a $26 trillion addressable market in AI. 21

Ars Technica — AI news-outlet 15d ago Critical Copilot vulnerability allowed hackers to steal 2FA code from users SearchLeak exploit shows why the industry's approach to LLM security fails over and over. 4

Hacker News — AI on Front Page community 15d ago SpaceX to buy Cursor for $60B Article URL: https://www.reuters.com/legal/transactional/spacex-buy-anysphere-60-billion-2026-06-16/ Comments URL: https://news.ycombinator.com/item?id=48553224 Points: 214 # Comments: 157 16

r/LocalLLaMA community 16d ago Are small local models for automation a thing?
I’ve been following this sub for a while, and it feels like the massive hype is always around having a local vibe coding assistant or trying to run heavy, near-frontier models locally, and that’s amazing. But I feel like we are overlooking a massive use case, for me, an… 5

GitHub Blog — AI & ML official-blog 16d ago GitHub Copilot CLI for Beginners: Overview of common slash commands GitHub Copilot CLI for Beginners: Learn how to use slash commands to control your terminal AI agent. 26

r/LocalLLaMA community 16d ago Context window + project size + Aider? Forgive the naivety of this post, I'm a noob, bear with me! If a project, understood as a set of files, is larger than the context window of a model, how do you fit it in? After doing some naive research, various major LLMs like Deepseek, Kimi, and company say the solution is… 32

arXiv — NLP / Computation & Language research 17d ago Dialogue SWE-Bench: A Benchmark for Dialogue-Driven Coding Agents arXiv:2606.13995v1 Announce Type: new Abstract: AI coding agents have rapidly transformed software engineering, powering widely used interactive coding assistants. Despite their interactive real-world use, existing benchmarks evaluate them as fully-autonomous systems. In this… 10

GitHub Blog — AI & ML official-blog 19d ago How we made GitHub Copilot CLI more selective about delegation Better orchestration, fewer handoffs, faster progress, without a single new knob. 25

r/LocalLLaMA community 20d ago Where are we with computer-control harnesses? Seems like local vision language models are getting smart enough that it would be useful to hand them the cursor in a secure sandbox. What harnesses are available that can do this?
edit: oh my fucking God something about this post triggered all of the bots to come out… 27

r/MachineLearning community 20d ago What should context compression keep? I looked at how six agents handle it [D] I use Claude Code, Codex CLI, OpenCode, Cline, Cursor, and Amp enough to notice a pattern in how they handle long context. They are all converging on layered progressive compression, but they disagree on what to protect. Most protect recent user messages as a first-class asset.… 20

Hugging Face Daily Papers research 20d ago Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code Abstract Grammar-constrained decoding techniques used to ensure syntactic validity in code generation can be exploited as an attack surface, leading to the development of a jailbreak method called CodeSpear and a safety alignment approach named CodeShield. Generated by… 37

Hugging Face Daily Papers research 21d ago Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks Abstract A new benchmark and adapter protocol called Claw-SWE-Bench enables fair comparison of diverse coding agents by standardizing evaluation conditions and revealing the importance of adapter design for effective code generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 16

NVIDIA Developer Blog official-blog 21d ago Run DiffusionGemma on NVIDIA for Developer-Ready, High-Throughput Text Generation Developers building real-time AI—such as chat assistants, copilots, and agentic workflows—are often constrained by token-by-token generation speed. This... 6

GitHub Blog — AI & ML official-blog 21d ago Give GitHub Copilot CLI real code intelligence with language servers Install and configure LSP servers for GitHub Copilot CLI, replacing brute-force grep/decompile with real code intelligence.
34

Hacker News — AI on Front Page community 22d ago How we made hit video game Prince of Persia Article URL: https://www.theguardian.com/culture/2026/jan/05/raiders-of-the-lost-ark-hit-video-game-prince-of-persia Comments URL: https://news.ycombinator.com/item?id=48468852 Points: 203 # Comments: 78 38

GitHub Blog — AI & ML official-blog 22d ago From one-off prompts to workflows: How to use custom agents in GitHub Copilot CLI Custom agents let GitHub Copilot CLI understand your stack and team workflows, turning one-off terminal prompts into repeatable, reviewable processes. 20

r/LocalLLaMA community 25d ago Best Coding Harness for Qwen3.6 35B? I've been happily using GitHub Copilot for 7-8 months, primarily in Visual Studio and VS Code, mostly with the built-in flagship models and have felt like the output is worth the cost. Lately I've been playing with a lot of different local LLM models and decided to try using… 32

r/LocalLLaMA community 26d ago Github Copilot finally supporting custom endpoints [two screenshots] I just noticed … 19

r/MachineLearning community 27d ago Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? [d] Hello everyone, Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? I am working on a project idea related to library-specific code generation.
The concrete case is a specific Python library used in a… 18

Hugging Face Daily Papers research 28d ago Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems Abstract Production-grounded evaluation framework RAMP assesses long-horizon software engineering agents through realistic compiler construction workloads and runtime analysis. Generated by Qwen/Qwen2.5-Coder-32B-Instruct LLM agents are rapidly evolving from coding assistants… 21

Simon Willison community 29d ago Microsoft's new MAI models Microsoft announced two new text LLMs this morning - MAI-Thinking-1 (reasoning, 35B parameters, available to "select early partners") and MAI-Code-1-Flash (5B parameters, "purpose-built for GitHub Copilot and VS Code to deliver high performance and lower cost [...] rolling out… 17

Latent.Space news-outlet 29d ago GitHub's plan for Agents — Kyle Daigle, GitHub GitHub pioneered the modern AI coding era with Copilot, and the resulting explosion in agentic coding has led to notable strains on the most popular developer platform in the world. Here's the plan. 27

Ars Technica — AI news-outlet 1mo ago AI costs how much? GitHub Copilot users react to new usage-based pricing system. Some report burning through their whole monthly "AI credit" allotment in a single day. 18

Zed Editor dev-tools 1mo ago What GitHub Copilot's Usage-Based Billing Means for Zed Users Copilot Chat is now metered with GitHub AI Credits. Copilot edit predictions are not. 24

TechCrunch — AI news-outlet 1mo ago ‘What a joke’: Github Copilot’s new token-based billing spurs consternation among devs The golden age of Microsoft's Github Copilot appears to be at an end. 5