Tag

Code

111 articles archived under #code · RSS

arXiv — NLP / Computation & Language research 4h ago

Right in the Right Way: LM Training with Verifiable Rewards and Human Demonstrations

arXiv:2607.01181v1 Announce Type: cross Abstract: RL with verifiable rewards (RLVR) has emerged as a powerful paradigm for training LMs on tasks with well-defined success metrics, such as code generation and mathematical reasoning. However, current RLVR methods optimize only…

25
arXiv — NLP / Computation & Language research 4h ago

Agentic generation of verifiable rules for deterministic, self-expanding reaction classification

arXiv:2607.01061v1 Announce Type: cross Abstract: Computer-assisted synthesis planning breaks target molecules into accessible precursors using large libraries of reaction rules that assign each transformation a deterministic, interpretable label. But chemistry is long-tailed,…

17
Latent.Space news-outlet 13h ago

How Cursor deploys AI inside the enterprise

Cursor's Pauline Brunet explains how her team of Forward Deployed Engineers help organizations implement agents — essentially setting up software factories.

35
Smol AI News news-outlet 1d ago

not much happened today

**Anthropic** re-enabled **Claude Fable 5** with updated cybersecurity safeguards routing some requests to **Opus 4.8**. The relaunch influenced tooling adoption by **Cursor**, **Devin**, and **Perplexity**. Builders are adapting to frontier-model constraints by employing…

16
TechCrunch — AI news-outlet 2d ago

Cursor now has a mobile app for guiding your coding agent on the go

Cursor has launched a new mobile app for remote oversight over coding agents.

29
Smol AI News news-outlet 3d ago

not much happened today

**Meta** announced **Brain2Qwerty v2**, a real-time non-invasive brain-to-text decoder achieving up to **78% word accuracy** with released training code and dataset. **Cursor** launched **Cursor for iOS** with remote AI agents and live activity features. Open-weight model access…

35
Hacker News — AI on Front Page community 3d ago

Age verification is just a precursor to automated attribution of speech

Article URL: https://nonogra.ph/age-verification-is-just-a-precursor-to-attribution-of-speech-06-29-2026 Comments URL: https://news.ycombinator.com/item?id=48714529 Points: 238 # Comments: 105

34
arXiv — Machine Learning research 6d ago

Optimizing CUDA like a Human: Micro-Profiling Tools as Expert Surrogates for LLM-Based GPU Kernel Optimization

arXiv:2606.26453v1 Announce Type: new Abstract: We present KernelPro, a closed-loop multi-agent system that automatically generates, profiles, and iteratively optimizes GPU kernel code by integrating large language model (LLM) code generation with hardware profiler feedback and…

21
GitHub Blog — AI & ML official-blog 6d ago

Evaluating performance and efficiency of the GitHub Copilot agentic harness across models and tasks

Explore how the GitHub Copilot agentic harness delivers strong results across multiple benchmarks and leading token efficiency, while maintaining flexibility to choose among more than 20 models. The post Evaluating performance and efficiency of the GitHub Copilot agentic harness…

19
Hugging Face Daily Papers research 6d ago

ReNIO: Reweighting Negative Trajectory Importance for LLM On-Policy Distillation

Abstract ReNIO enhances on-policy distillation for language models by reweighting negative trajectories based on token-level probability ratios, improving reasoning performance in mathematical and code generation tasks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct On-policy…

25
arXiv — NLP / Computation & Language research 7d ago

Dream at SemEval-2026 Task 13: SALSA for Single-Pass Machine-Generated Code Detection

arXiv:2606.25102v1 Announce Type: new Abstract: Large language models have transformed code generation, raising concerns around authorship, assessment integrity, and software trust. SemEval-2026 Task 13 Subtask A operationalizes detection as binary classification over code…

28
arXiv — NLP / Computation & Language research 7d ago

OPERA: Aligning Open-Ended Reasoning via Objective Perplexity-based Reinforcement Learning

arXiv:2606.25757v1 Announce Type: new Abstract: Reinforcement Learning (RL) has enabled LLMs to excel in objective reasoning tasks such as mathematics and code generation. However, applying RL to open-ended tasks, such as creative writing, remains challenging because…

22
r/LocalLLaMA community 7d ago

I reverse engineered Windows Copilot into a free OpenAI compatible API (GPT-4, no API key, no billing)

So Microsoft gives you GPT-4 for free in Copilot. They just don't give you an API for it. So I made one. It logs into your own Microsoft account once, saves the session, and exposes a local server at http://localhost:8000/v1 that speaks the OpenAI format. Point the official…

24
arXiv — NLP / Computation & Language research 8d ago

Ensemble Learning for Large Language Models in Text and Code Generation: A Survey

arXiv:2503.13505v3 Announce Type: replace Abstract: Generative Pretrained Transformers (GPTs) are foundational Large Language Models (LLMs) for text generation. However, individual LLMs often produce inconsistent outputs and exhibit biases, limiting their representation of…

10
r/LocalLLaMA community 10d ago

I mapped every agent config file (AGENTS.md, CLAUDE.md, llms.txt, .cursorrules, SKILL.md...) and tagged how widely each is actually used

Every tool ships its own magic file now and after a while the names all blur together. I put together a guide to the ones agents actually read and write, with a tag on each for real adoption instead of hype. https://github.com/ItamarZand88/awesome-agent-conventions 21…

22
GitHub Blog — AI & ML official-blog 12d ago

How we built an internal data analytics agent

Qubot, our internal Copilot-powered analytics agent, allows any GitHub employee to ask questions about our data in plain language. Here's what we learned as we built it. The post How we built an internal data analytics agent appeared first on The GitHub Blog.

18
Hugging Face Daily Papers research 12d ago

No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages

Abstract Research addresses code generation challenges for no-resource programming languages by developing benchmarks and proposing a method that combines further pre-training with weight difference transfer to create specialized instruction-following models at reduced…

27
Hugging Face Daily Papers research 13d ago

JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines

Abstract Game development frameworks and benchmarks were created using data from game jam competitions to evaluate code generation and project-level programming capabilities. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current AI-driven game development has made substantial…

25
ThursdAI news-outlet 13d ago

Fable Got Banned, Open Source Delivered: GLM-5.2, Kimi K2.7 & SpaceX Buys Cursor - June 18

From CoreWeave (W&B): Fable is gone (for now). Here's everything else that happened this week: GLM-5.2 takes the open source crown, SpaceX buys Cursor for $60B, and 3 guests on the show today!

23
GitHub Blog — AI & ML official-blog 14d ago

Getting more from each token: How Copilot improves context handling and model routing

How GitHub Copilot is making more of each session go toward useful work, so your credits go further. The post Getting more from each token: How Copilot improves context handling and model routing appeared first on The GitHub Blog.

34
Stratechery (Ben Thompson) community 14d ago

The State of Fable, The Jailbreak Problem, SpaceX Acquires Cursor

The administration is very likely wrong about Fable, but that is ultimately Anthropic's responsibility.

20
Hugging Face Daily Papers research 15d ago

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Abstract Parallel loop Transformers achieve better code generation performance with two loops due to refined representations, while additional loops cause diminishing returns and increased positional mismatch costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Looped…

5
Ars Technica — AI news-outlet 15d ago

SpaceX acquires AI coding platform Cursor for $60 billion

Separately, neither could compete. Now they hope they can.

20
Hacker News — AI on Front Page community 15d ago

SpaceX Is Buying Cursor

Article URL: https://www.bbc.com/news/articles/cvgd5g7d7gyo Comments URL: https://news.ycombinator.com/item?id=48554215 Points: 255 # Comments: 289

24
The Information — AI news-outlet 15d ago

SpaceX finalizes $60 billion deal to acquire Cursor

SpaceX announced it agreed to buy AI coding startup Cursor for $60 billion on Tuesday. The announcement came only a few days after SpaceX went public at a valuation of about $1.77 trillion. Since the IPO, SpaceX stock has risen 42% to close on Monday at $193.50, valuing it at…

37
TechCrunch — AI news-outlet 15d ago

SpaceX to acquire Cursor for $60B in stock, days after blockbuster IPO

The deal is supposed to help SpaceX's struggling AI division. The company told IPO investors it sees a $26 trillion addressable market in AI.

21
Ars Technica — AI news-outlet 15d ago

Critical Copilot vulnerability allowed hackers to steal 2FA code from users

SearchLeak exploit shows why the industry's approach to LLM security fails over and over.

4
Hacker News — AI on Front Page community 15d ago

SpaceX to buy Cursor for $60B

Article URL: https://www.reuters.com/legal/transactional/spacex-buy-anysphere-60-billion-2026-06-16/ Comments URL: https://news.ycombinator.com/item?id=48553224 Points: 214 # Comments: 157

16
r/LocalLLaMA community 16d ago

Are small local models for automation a thing?

I’ve been following this sub for a while, and it feels like the massive hype is always around having a local vibe coding assistant or trying to run heavy, near-frontier models locally, and that’s amazing. But I feel like we are overlooking a massive use case, for me, an…

5
GitHub Blog — AI & ML official-blog 16d ago

GitHub Copilot CLI for Beginners: Overview of common slash commands

GitHub Copilot CLI for Beginners: Learn how to use slash commands to control your terminal AI agent. The post GitHub Copilot CLI for Beginners: Overview of common slash commands appeared first on The GitHub Blog.

26
r/LocalLLaMA community 16d ago

Context window + project size + Aider?

Forgive the naivety of this post, I'm a noob, bear with me! If a project, understood as a set of files, is larger than the context window of a model, how do you fit it in? After doing some naive research, various major LLMs like Deepseek, Kimi, and company say the solution is…

32
arXiv — NLP / Computation & Language research 17d ago

Dialogue SWE-Bench: A Benchmark for Dialogue-Driven Coding Agents

arXiv:2606.13995v1 Announce Type: new Abstract: AI coding agents have rapidly transformed software engineering, powering widely used interactive coding assistants. Despite their interactive real-world use, existing benchmarks evaluate them as fully-autonomous systems. In this…

10
GitHub Blog — AI & ML official-blog 19d ago

How we made GitHub Copilot CLI more selective about delegation

Better orchestration, fewer handoffs, faster progress, without a single new knob. The post How we made GitHub Copilot CLI more selective about delegation appeared first on The GitHub Blog.

25
r/LocalLLaMA community 20d ago

Where are we with computer-control harnesses?

Seems like local vision language models are getting smart enough so that it would be useful to hand them the cursor in a secure sandbox. What harnesses are available that can do this? edit: oh my fucking God something about this post triggered all of the bots to come out…

27
r/MachineLearning community 20d ago

What should context compression keep? I looked at how six agents handle it [D]

I use Claude Code, Codex CLI, OpenCode, Cline, Cursor, and Amp enough to notice a pattern in how they handle long context. They are all converging on layered progressive compression, but they disagree on what to protect. Most protect recent user messages as a first-class asset.…

20
Hugging Face Daily Papers research 20d ago

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

Abstract Grammar-constrained decoding techniques used to ensure syntactic validity in code generation can be exploited as an attack surface, leading to the development of a jailbreak method called CodeSpear and a safety alignment approach named CodeShield. Generated by…

37
Hugging Face Daily Papers research 21d ago

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

Abstract A new benchmark and adapter protocol called Claw-SWE-Bench enables fair comparison of diverse coding agents by standardizing evaluation conditions and revealing the importance of adapter design for effective code generation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

16
NVIDIA Developer Blog official-blog 21d ago

Run DiffusionGemma on NVIDIA for Developer-Ready, High-Throughput Text Generation

Developers building real-time AI—such as chat assistants, copilots, and agentic workflows—are often constrained by token-by-token generation speed. This...

6
GitHub Blog — AI & ML official-blog 21d ago

Give GitHub Copilot CLI real code intelligence with language servers

Install and configure LSP servers for GitHub Copilot CLI, replacing brute-force grep/decompile with real code intelligence. The post Give GitHub Copilot CLI real code intelligence with language servers appeared first on The GitHub Blog.

34
Hacker News — AI on Front Page community 22d ago

How we made hit video game Prince of Persia

Article URL: https://www.theguardian.com/culture/2026/jan/05/raiders-of-the-lost-ark-hit-video-game-prince-of-persia Comments URL: https://news.ycombinator.com/item?id=48468852 Points: 203 # Comments: 78

38
GitHub Blog — AI & ML official-blog 22d ago

From one-off prompts to workflows: How to use custom agents in GitHub Copilot CLI

Custom agents let GitHub Copilot CLI understand your stack and team workflows, turning one-off terminal prompts into repeatable, reviewable processes. The post From one-off prompts to workflows: How to use custom agents in GitHub Copilot CLI appeared first on The GitHub Blog.

20
r/LocalLLaMA community 25d ago

Best Coding Harness for Qwen3.6 35B?

I've been happily using GitHub Copilot for 7-8 months, primarily in Visual Studio and VS Code, mostly with the built-in flagship models and have felt like the output is worth the cost. Lately I've been playing with a lot of different local LLM models and decided to try using…

32
r/LocalLLaMA community 26d ago

Github Copilot finally supporting custom endpoints

https://preview.redd.it/082gnmin1l5h1.png?width=1740&format=png&auto=webp&s=2c89f6310c8c654611188183de07857d77cb2417 https://preview.redd.it/169tjrzn1l5h1.png?width=710&format=png&auto=webp&s=9a1fa656ea95037622b0d7ea2e16a23d2122442c I just noticed   submitted by  …

19
r/MachineLearning community 27d ago

Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? [d]

Hello everyone, Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? I am working on a project idea related to library-specific code generation. The concrete case is a specific Python library used in a…

18
Hugging Face Daily Papers research 28d ago

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

Abstract Production-grounded evaluation framework RAMP assesses long-horizon software engineering agents through realistic compiler construction workloads and runtime analysis. Generated by Qwen/Qwen2.5-Coder-32B-Instruct LLM agents are rapidly evolving from coding assistants…

21
Simon Willison community 29d ago

Microsoft's new MAI models

Microsoft announced two new text LLMs this morning - MAI-Thinking-1 (reasoning, 35B parameters, available to "select early partners") and MAI-Code-1-Flash (5B parameters, "purpose-built for GitHub Copilot and VS Code to deliver high performance and lower cost [...] rolling out…

17
Latent.Space news-outlet 29d ago

GitHub's plan for Agents — Kyle Daigle, GitHub

GitHub pioneered the modern AI coding era with Copilot, and the resulting explosion in agentic coding has led to notable strains on the most popular developer platform in the world. Here's the plan.

27
Ars Technica — AI news-outlet 1mo ago

AI costs how much? GitHub Copilot users react to new usage-based pricing system.

Some report burning through their whole monthly "AI credit" allotment in a single day.

18
Zed Editor dev-tools 1mo ago

What GitHub Copilot's Usage-Based Billing Means for Zed Users

Copilot Chat is now metered with GitHub AI Credits. Copilot edit predictions are not.

24
TechCrunch — AI news-outlet 1mo ago

‘What a joke’: Github Copilot’s new token-based billing spurs consternation among devs

The golden age of Microsoft's Github Copilot appears to be at an end.

5
