We trained “critique-writing” models to describe flaws in summaries. Human evaluators find flaws in summaries much more often when shown our model’s critiques. Larger models are better at self-critiquing, with scale improving critique-writing more than summary-writing. This shows promise for using AI systems to assist human supervision of AI systems on difficult tasks.
We want to ensure that future AI systems performing very difficult tasks remain aligned with human intent. Many previous works on aligning language models rely on human evaluations as a training signal. However, humans struggle to evaluate very difficult tasks; for example, it is hard to spot every bug in a codebase or every factual error in a long essay. Models may then learn to give outputs that look good to humans but contain errors we systematically fail to notice.
To mitigate this problem, we want to train AI assistants that help humans provide feedback on hard tasks. These assistants should point out flaws, help humans understand what’s going on, and answer their questions. An example of this is our past work on book summarization: reading the entire book is a lot of work, but humans assisted with chapter summaries have a much easier time evaluating a book summary.
As a proof of concept, we used supervised learning to train language models to write critiques of topic-based summaries of short stories, Wikipedia articles, and other texts from the internet. We use these models to assist human evaluators and study scaling properties of critique writing.
Experiments with AI assistance
To see how useful our models are for evaluation assistance, we show labelers 8 model-written critiques of each summary, with a control group that receives no assistance. We use topic-based summaries from three sources: summaries written by our models, summaries written by humans, and summaries written by humans who deliberately introduced important yet subtle flaws.
Throughout the day, The Star-Ledger will provide updates here (newest on top) as new information comes in, watches and warnings are issued and the forecast changes.
10:30 P.M. Weather forecasters tonight reiterated warnings for drivers and residents that a potentially dangerous portion of the storm will be hitting much of central and northern New Jersey during Friday’s evening rush-hour. Major travel delays are expected late Friday and Friday night as rain turns into snow, the National Weather Service forecast said.
MORE SNOWSTORM UPDATES
• Friday, Feb. 8: N.J. snowstorm: Live updates on blizzard, traffic, flooding and more
• Saturday, Feb. 9: N.J. snowstorm update: Power outages, snow totals and other storm news
After periods of rain, heavy snow is expected to be falling in many places by late Friday afternoon, the forecast said. In some places north of Interstate 78, snow is expected to come down between 1 and 2 inches per hour. In counties like Sussex, Morris and Warren, expected snow accumulations range from 6 to 16 inches.
For many towns from Jackson in Ocean County to Somerville in Somerset County and out east to Long Beach Island, snow accumulation is expected to range from 4 to 10 inches. High winds are expected throughout the region, topping out in Monmouth County, with gusts up to 45 mph possible.
By daybreak Saturday, flurries will taper off, giving way to a sunny, blustery day, the latest forecast said.
9:12 P.M. With forecasters still predicting a major winter storm to hit New Jersey, many schools throughout the state are preemptively canceling or delaying classes Friday.
8:45 P.M. In advance of the storm, NJ Transit has announced it will be offering full systemwide cross-honoring all day Friday and all day Saturday, enabling customers to use their ticket or pass on an alternate travel mode — rail, bus or light rail.
5 P.M. The signatures of thunder-snow (which is just what it sounds like: thunder and lightning during heavy snow) are showing up on several models, according to NY NJ PA Weather meteorologist Steven DiMartino.
This indicates the potential for extremely heavy snow to fall in eastern New Jersey tomorrow night, and adds to the unpredictability of totals.
“Where you get some of this convective snow, when it comes down, it’s going to come down very, very hard,” he said. “It’s difficult to pinpoint just where these bands are going to occur. You could end up with a situation where one town has 18 inches of snow and the next town over has three.”
DiMartino stressed the volatility that remains in the forecast, and urged state residents to pay close attention to changing conditions. Many of the details of what ultimately will happen in local areas will not be determined until the storm begins to come together tomorrow.
He said the potential for these heavier snow bands to develop may be why some forecast models (like the NAM, above), are predicting much heavier snowfall totals than the National Weather Service.
The North American Model (NAM), released this afternoon, showed well over a foot of snow falling over many areas in New Jersey.
4:13 P.M. The National Weather Service has issued a blizzard warning for parts of northeastern New Jersey, including Newark and Jersey City, and the five boroughs of New York, where upwards of 14 inches of snow are expected along with howling winds and severely reduced visibility.
The blizzard warnings are in effect from 6 a.m. Friday until 1 p.m. Saturday and warn of 10 to 14 inches of snow, with locally higher amounts and white-out conditions with wind gusts of up to 45 miles per hour. Blizzard conditions are expected in coastal northeastern New Jersey, in southern Bergen and Passaic Counties and Eastern Hudson, Essex and Union counties.
Further north and west, 10 to 14 inches of snow are also expected, but winds are not expected to reach blizzard criteria. Winter storm warnings are in effect there.
3:24 P.M. The National Weather Service at Mount Holly has issued Winter Storm warnings for several counties in northern and central New Jersey and extended them further south than the areas the previously issued watches covered.
The winter storm warnings have been issued for Sussex, Warren, Morris, Hunterdon, Middlesex, Monmouth, Ocean and northwest Burlington counties. In Sussex, Warren and Morris counties, the National Weather Service is expecting between ten and 16 inches of snow to fall, while other counties in the warning area could receive six to ten inches. The warnings are in effect from 6 a.m. Friday to 6 a.m. Saturday.
Expect the National Weather Service’s Upton, N.Y. office, which covers northeastern N.J., to follow suit shortly.
Further south, winter weather advisories have been issued for the rest of the state, where between two and five inches of snow is anticipated.
3:07 P.M. The private and public sectors in New Jersey are now bracing for major storm impacts.
More than 350 United Airlines flights, many based out of Newark-Liberty International Airport, have already been canceled, according to flight tracking website FlightAware. NJ Transit announced they will cross-honor tickets across its entire system. Utilities like Jersey Central Power & Light and PSE&G say they will have extra crews on hand to deal with potential power issues caused by heavy snow and wind.
Additionally, several events are being postponed across the state, such as two sectional high school track championships. The state Office of Emergency Management has not yet opened its operations center in Trenton, but it remains a possibility. Mary Goepfert, a spokeswoman for OEM, said the state is monitoring the storm closely and has been in contact with local emergency managers in preparation.
2:07 P.M. The European model is in and it looks snowy, much like many of the other models that ran earlier. Were this to verify, a six to 12-inch plus snowfall is definitely in the cards for north and central New Jersey, particularly north of Interstate-195.
Freehold-based meteorologist and owner of NY NJ PA Weather Steven DiMartino said he likes the European solution best, so far, and agrees with totals.
What does the NAM look like, you ask? Well the snowfall printout is posted below, but Eric Holthaus tweeted a picture of the simulated radar produced by the NAM model for tomorrow night. An absolute monster.
1:50 P.M. The most-affected regions of Hurricane Sandy along the New Jersey coast are about to take another hit. With defenses already weakened, coastal communities could see major impacts from coastal flooding, with the worst coming Saturday morning, according to the National Weather Service.
“I’m really worried about the areas worst hit by Sandy,” said NWS meteorologist Gary Szatkowski. “Time is starting to work against us…We could see substantial beach erosion. I know people have been working hard, but there’s less to erode. We could easily see waves and water coming into areas you typically wouldn’t.”
Szatkowski said he is concerned about the Raritan Bay shore in particular, where a three foot storm surge is possible at high tide Saturday morning, with five to seven foot waves breaking over top of it.
1:22 P.M. Tomorrow night’s commute could be awful in northern New Jersey. By 7 p.m., there is a threat that snowfall rates could reach two inches per hour across large swaths of northern and central New Jersey. Snowfall rates of this magnitude could reduce visibility substantially, wreak havoc on roads and make travel dangerous, if not nearly impossible.
Gary Szatkowski, meteorologist in charge at the National Weather Service’s Mount Holly office, said he is “very worried” about deteriorating conditions in the afternoon, and posted a map on Twitter showing where the threat of intense snowfall will be at 7 p.m.
12:34 P.M. An important thing to remember about this storm is the volatility in the forecast remains high, even though models have been trending snowier. State Climatologist David Robinson said the bust potential for this forecast is “tremendous” and the slightest shift in the forecast track could mean the difference between a major snowstorm, and a primarily rain event for much of the state.
Eric Holthaus, of the Wall Street Journal, points out that how much warm air enters the region prior to the storm will be crucial.
12:04 P.M. The National Weather Service at Mount Holly and Upton, N.Y. both issued briefing packages on the coming storm this morning. Each warned that blizzard conditions may occur Friday night in northern New Jersey. Mount Holly suggested blizzard warnings may be necessary as the storm unfolds.
Blizzard warnings are issued during very specific situations by the National Weather Service. Anticipated winds of at least 35 miles per hour and visibility reduced below a quarter of a mile for a period of three hours are necessary before the agency pulls the trigger on such a warning. Travel would become all but impossible.
11:53 A.M. David Robinson, the state climatologist at Rutgers University, said he does not envy forecasters today, calling this type of storm “the most difficult forecast a New Jersey meteorologist will have to make.” The forecast is complicated for a number of reasons, from New Jersey’s geography to the thermal profile of the atmosphere. More on why New Jersey winter storms are so hard to pin down later.
11:35 A.M. Forecast model guidance on the storm continues to vary but appears to be focusing in on a snowier solution for northern and central New Jersey. Overnight, several reliable models (The European, GFS and NAM) showed very different solutions to the storm, showing everything from a minor event to a major winter storm that would have serious impacts on travel in northern sections of the state.
This morning, the GFS and NAM both showed the bulk of New Jersey north of I-195 receiving several inches of snow, perhaps exceeding a foot in some areas. The latest run of the European model, considered one of the most reliable, will be released at approximately 1:30 p.m.
The North American Model (NAM) shows an even snowier solution for New Jersey, with parts of the state easily exceeding a foot of snow.
Keep in mind, each model run is just one of scores of pieces of data the National Weather Service uses to make forecasts and no single model should be viewed as a complete representation of what will happen.
11:30 A.M. A winter storm watch remains in effect for the vast majority of northern and central New Jersey. Current forecasts call for six to 12 inches of snow, with higher amounts possible in the northernmost sections of New Jersey.
Because the storm is highly complex and much remains uncertain, particularly where the rain/snow line will fall, the National Weather Service is holding off on issuing any warnings until this afternoon.
_The Associated Press contributed to this report._
Even though summarization isn’t actually a difficult task for humans and our models aren’t more capable than humans, the models already provide meaningful assistance: when asked to evaluate model-written summaries, the assisted group finds 50% more flaws than the control group. For deliberately misleading summaries, assistance increases how often humans spot the intended flaw from 27% to 45%.
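The two results above are stated in different units: "50% more flaws" is a relative change on model-written summaries, while "27% to 45%" is a pair of absolute detection rates on deliberately misleading summaries. As a rough illustration, the short calculation below expresses the second result both ways. It uses only the two rates quoted above; the function names are ours and purely illustrative.

```python
def absolute_gain(before: float, after: float) -> float:
    """Change in detection rate, in percentage points."""
    return (after - before) * 100


def relative_gain(before: float, after: float) -> float:
    """Fractional increase of `after` over `before`."""
    return (after - before) / before


# Rates quoted in the text for deliberately misleading summaries.
unassisted = 0.27
assisted = 0.45

print(f"absolute gain: {absolute_gain(unassisted, assisted):.0f} percentage points")  # 18
print(f"relative gain: {relative_gain(unassisted, assisted):.0%}")  # 67%
```

So on misleading summaries, assistance adds 18 percentage points, which is roughly a two-thirds relative increase in how often the intended flaw is spotted.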
Scaling properties of critiques
Assistance on model-written summaries only works if models are able to critique their own outputs. We ask humans to rate the helpfulness of model-written self-critiques, and find that larger models are better at self-critiquing.
We also find that large models can use their self-critiques to directly improve their outputs, which small models are unable to do. Better critiques lead to better improvements than worse critiques or no critiques at all.
Do models tell us everything they know?
To provide the best evaluation assistance on difficult tasks, we would like models to communicate all problems that they “know about.” Whenever a model correctly predicts that an answer is flawed, can the model also produce a concrete critique that humans understand?
This is particularly important for supervising models that could attempt to mislead human supervisors or hide information. We would like to train equally smart assistance models to point out what humans don’t notice.
Unfortunately, we found that models are better at discriminating than at critiquing their own answers, indicating they know about some problems that they can’t or don’t articulate. Furthermore, the gap between discrimination and critique ability did not appear to decrease for larger models. Reducing this gap is an important priority for our alignment research.
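One simplified way to picture this gap is to compare, on answers known to be flawed, how often a model flags the answer as flawed with how often it writes a critique that a human accepts as valid. The sketch below is only an illustration of that idea, not the metric used in our experiments: the `Judgment` record, its fields, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    # Hypothetical record for one answer; every field is an assumption.
    answer_is_flawed: bool    # ground-truth label for the answer
    model_says_flawed: bool   # model's yes/no discrimination
    critique_is_valid: bool   # a human accepted the model's written critique


def discrimination_rate(judgments: list[Judgment]) -> float:
    """Share of flawed answers that the model flags as flawed."""
    flawed = [j for j in judgments if j.answer_is_flawed]
    return sum(j.model_says_flawed for j in flawed) / len(flawed)


def critique_rate(judgments: list[Judgment]) -> float:
    """Share of flawed answers for which the model wrote a valid critique."""
    flawed = [j for j in judgments if j.answer_is_flawed]
    return sum(j.critique_is_valid for j in flawed) / len(flawed)


def gap(judgments: list[Judgment]) -> float:
    """Positive when the model detects more flaws than it articulates."""
    return discrimination_rate(judgments) - critique_rate(judgments)


# Made-up toy data: four flawed answers and one correct one.
toy = [
    Judgment(True, True, True),
    Judgment(True, True, True),
    Judgment(True, True, False),   # flagged, but no valid critique written
    Judgment(True, False, False),
    Judgment(False, False, False),
]

print(gap(toy))  # 0.25
```

In this toy data the model flags three of four flawed answers but only articulates the problem for two of them, so the gap is 0.25.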
Next steps
An important limitation of this work is that topic-based summarization is not actually a difficult task: humans understand it quite well and it takes them only about 10 minutes to evaluate a summary. To understand the limits of AI-assisted evaluation better, we need to work with tasks that are much more difficult for humans to evaluate.
Nevertheless, these results make us optimistic that we can train models to provide humans with meaningful feedback assistance. AI-assisted evaluation is an important pillar of our alignment strategy, dating back to the work on debate and recursive reward modeling. In the long run, we want to build assistants that can be trusted to take on all of the cognitive labor needed for evaluation, so humans can focus on communicating their preferences.
If you’re interested in this line of research, we’re hiring Research Engineers and Research Scientists!