| Class | Total Edits | Generations | Confirmed | Rate | Risk | Key Features |
|---|---|---|---|---|---|---|
| tench, Tinca tinca | 24 | 72 | 2 | 8% | LOW | body shape, fins, eye |
| goldfish, Carassius auratus | 28 | 84 | 6 | 21% | MEDIUM | Body shape, Eye, Color gradient |
| great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias | 21 | 63 | 8 | 38% | HIGH | open mouth, sharp teeth, body shape |
| tiger shark, Galeocerdo cuvieri | 11 | 33 | 4 | 36% | HIGH | snout, eye, teeth |
| hammerhead, hammerhead shark | 22 | 66 | 1 | 5% | LOW | Hammer-shaped head, Curved body, child's hand holding hammer |
| electric ray, crampfish, numbfish, torpedo | 34 | 102 | 11 | 32% | HIGH | white spots, black body, flat body |
| stingray | 13 | 39 | 6 | 46% | HIGH | blue spots, tail fin, color gradient |
| cock | 25 | 75 | 1 | 4% | LOW | comical comb, color gradient, feathers |
| hen | 22 | 66 | 3 | 14% | MEDIUM | feather texture, combs and wattles, overall body shape |
| ostrich, Struthio camelus | 24 | 72 | 0 | 0% | LOW | feathers, long neck, beak |
| brambling, Fringilla montifringilla | 32 | 96 | 4 | 12% | MEDIUM | feather pattern, beak shape, wing pattern |
| goldfinch, Carduelis carduelis | 15 | 45 | 1 | 7% | LOW | yellow plumage, black cap, orange beak |
| house finch, linnet, Carpodacus mexicanus | 25 | 75 | 1 | 4% | LOW | head shape, overall silhouette, beak shape |
| junco, snowbird | 15 | 45 | 4 | 27% | MEDIUM | gray head, white underbelly, black head |
| indigo bunting, indigo finch, indigo bird, Passerina cyanea | 21 | 63 | 4 | 19% | MEDIUM | bird's head, blue plumage, bird silhouette |
| robin, American robin, Turdus migratorius | 17 | 51 | 3 | 18% | MEDIUM | head shape, chest color, tail feathers |
| bulbul | 11 | 33 | 1 | 9% | LOW | head feathers, eye region, head shape |
| jay | 9 | 27 | 2 | 22% | MEDIUM | blue crest, blue wing feathers, blue head feathers |
| magpie | 10 | 30 | 3 | 30% | MEDIUM | black plumage, sharp beak, black head |
| chickadee | 17 | 51 | 3 | 18% | MEDIUM | black cap, white underbelly, bird silhouette |
Quick overview of which features affect each class. Green = intrinsic (expected), Red = contextual (shortcut).
| Class | Intrinsic Features (Expected) | Contextual Features (Shortcuts) | Impact | Risk |
|---|---|---|---|---|
| tench, Tinca tinca | Body lightening | 🚨 Net background | -0.81 (Replace the ent) | LOW |
| goldfish, Carassius auratus | None confirmed | ⚠ Spurious: Enhance fin details, (+17%) | -0.75 (Smooth out the ) | MEDIUM |
| great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias | None confirmed | 🚨 Background, Bubbles | -0.87 (Modify the colo) | HIGH |
| tiger shark, Galeocerdo cuvieri | None confirmed | ⚠ Spurious: Remove the water gra (+26%) | -0.85 (Modify the colo) | HIGH |
| hammerhead, hammerhead shark | None confirmed | the hammer-shaped head, a distinct hammer-shaped head, overlayed textured skin pattern | -0.80 (Overlay a textu) | LOW |
| electric ray, crampfish, numbfish, torpedo | None confirmed | ⚠ Spurious: Maintain the tail fi (+30%), Maintain the current (+31%) | -0.97 (Replace the bla) | HIGH |
| stingray | None confirmed | background texture, lighting effect | -0.38 (Replace the col) | HIGH |
| cock | None confirmed | feather, a | +0.00 (Remove the wet ) | LOW |
| hen | None confirmed | ⚠ Spurious: Modify the overall b (+98%) | -0.62 (Apply a solid c) | MEDIUM |
| ostrich, Struthio camelus | None confirmed | all, a | -0.45 (Overlay a patte) | LOW |
| brambling, Fringilla montifringilla | None confirmed | ⚠ Spurious: Enhance the feather (+30%), Maintain the beak sh (+23%) | -0.70 (Change the eye ) | MEDIUM |
| goldfinch, Carduelis carduelis | None confirmed | yellow, black, orange | -0.68 (Replace yellow ) | LOW |
| house finch, linnet, Carpodacus mexicanus | None confirmed | the | -0.92 (Overlay a small) | LOW |
| junco, snowbird | None confirmed | branch, background, snow-covered surface | -0.60 (Modify the brow) | MEDIUM |
| indigo bunting, indigo finch, indigo bird, Passerina cyanea | None confirmed | the bird's head, the green leaves, the sunlight effect | -0.95 (Modify the wing) | MEDIUM |
| robin, American robin, Turdus migratorius | None confirmed | the beak, the chest color, the tail feathers | -0.27 (Keep the beak a) | MEDIUM |
| bulbul | None confirmed | feather texture (smoothed), background, green leaves | -0.40 (Remove the feat) | LOW |
| jay | None confirmed | dry grass background, green stem background, brownish-gray wings | -0.75 (Modify the wing) | MEDIUM |
| magpie | None confirmed | 🚨 Background | -0.83 (Remove the blac) | MEDIUM |
| chickadee | None confirmed | overcast sky, defined bird outline, enhanced feather texture | -0.96 (Modify the bird) | MEDIUM |
Feature types are classified by VLM semantic analysis, not keyword matching.
These are features the model should NOT rely on, but removing them dropped confidence.
Edit: Replace the entire net background with a plain white studio backdrop, maintaining sharp edges around the tench.
A clean background will highlight the tench more effectively, making it stand out.
The model correctly relies on these intrinsic features. Removing them dropped confidence as expected.
Edit: Lighten the tench's body slightly, keeping the natural gradient.
Create a more appealing look without altering the species identity.
Modifying these features increased confidence, suggesting the model learned spurious correlations.
Edit: Enhance fin details, ensure they flow naturally with the body.
Improved fin texture makes the fish appear more lifelike.
These are features the model should NOT rely on, but removing them dropped confidence.
Edit: Replace the entire background with a clear blue sky, maintaining sharp edges around the shark.
A clear blue sky will dramatically change the context and make the shark appear less threatening.
Edit: Remove all bubbles completely, blending the area seamlessly with the water.
Removing bubbles will slightly reduce the underwater feel but won't significantly alter the shark's appearance.
Modifying these features increased confidence, suggesting the model learned spurious correlations.
Edit: Remove the water gradient, replace with a solid blue color, maintaining sharp edges around the subject.
Replacing the water gradient with a solid blue color will simplify the background but keep the shark's form intact.
Modifying these features increased confidence, suggesting the model learned spurious correlations.
Edit: Maintain the tail fin, ensure it blends seamlessly with the body texture.
To preserve the electric ray's shape and movement characteristics.
Edit: Maintain the current lighting, ensure it highlights the electric ray's texture and shape.
To preserve the natural lighting that enhances the electric ray's appearance.
Edit: Keep the flat body, ensure it maintains its natural texture and shape.
To preserve the electric ray's unique body structure.
Modifying these features increased confidence, suggesting the model learned spurious correlations.
Edit: Modify the overall body shape to resemble that of a hen, maintaining the natural curvature and proportions.
Adjusting the body shape to match a hen's form will help achieve the target class.
Modifying these features increased confidence, suggesting the model learned spurious correlations.
Edit: Enhance the feather texture to appear more detailed and natural, maintaining the current color palette.
Improve the visual fidelity of the bird's plumage.
Edit: Maintain the beak shape but make it slightly sharper and more defined.
Enhance the beak's detail to improve the bird's overall appearance.
Edit: Maintain the natural shape and color of the brambling's tail feathers, ensuring they look intact.
Preserve the bird's tail feathers to maintain its overall appearance.
These are features the model should NOT rely on, but removing them dropped confidence.
Edit: Replace the natural outdoor setting with a plain white studio backdrop, maintaining sharp edges around the subject.
A neutral background helps isolate the bird for easier identification.
Model Attention (Grad-CAM): The heatmap highlights the body shape and fins as key features, indicating the model focuses on these intrinsic characteristics.
Summary: The model exhibits significant bias by relying on spurious features such as the net background and water droplets, leading to unreliable classifications. Addressing these biases by focusing on essential features will improve the model's robustness.
Intrinsic = part of the object (expected to affect classification). Contextual = background/environment (if it affects classification, it's a shortcut).
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| body shape | shape | Intrinsic | high |
| fins | object_part | Intrinsic | high |
| coloration | color | Intrinsic | medium |
| scale pattern | texture | Intrinsic | low |
| fishing rod | context | Contextual | low |
| grass | context | Contextual | low |
| eye | object_part | Intrinsic | high |
| net background | context | Contextual | low |
| water droplets | context | Contextual | low |
| fish body | object_part | Intrinsic | high |
Model Attention (Grad-CAM): The heatmap highlights the fish's body shape, eye, and color, indicating these are crucial for the model's decision.
Summary: The model exhibits significant bias by relying on spurious features like background, lighting, and water reflections, leading to unreliable performance. Addressing this requires a focus on essential features and robust training strategies.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| Body shape | shape | Intrinsic | high |
| Eye | object_part | Intrinsic | high |
| Color gradient | color | Intrinsic | high |
| Background water | context | Contextual | low |
| Fishbowl | context | Contextual | low |
| Lighting | context | Contextual | medium |
| fins | object_part | Intrinsic | high |
| bubbles | context | Contextual | low |
| water surface | context | Contextual | low |
| lighting effect | texture | Contextual | low |
Model Attention (Grad-CAM): The heatmap highlights the shark's mouth and teeth, indicating these are crucial for the model's decision-making.
Summary: The model demonstrates robustness by relying on essential features of a great white shark, but it exhibits significant vulnerability to spurious features like the background and bubbles. This suggests a need for a more robust training dataset and possibly additional regularization techniques.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| open mouth | object_part | Intrinsic | high |
| sharp teeth | object_part | Intrinsic | high |
| large eye | object_part | Intrinsic | medium |
| grayish body | color | Intrinsic | low |
| water background | context | Contextual | low |
| bubbles | context | Contextual | low |
| body shape | shape | Intrinsic | high |
| fin structure | object_part | Intrinsic | high |
| color gradient | color | Intrinsic | medium |
| water surface | context | Contextual | low |
Model Attention (Grad-CAM): The heatmap highlights the snout, eye, and teeth as critical features, indicating the model focuses on these defining characteristics.
Summary: The model exhibits significant bias towards spurious features, particularly coloration, background, and markings, leading to unreliable classifications. Improvements in feature selection and data diversity are crucial to enhance model robustness.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| snout | object_part | Intrinsic | high |
| eye | object_part | Intrinsic | high |
| gill slits | object_part | Intrinsic | medium |
| teeth | object_part | Intrinsic | high |
| coloration | color | Intrinsic | medium |
| background | context | Contextual | low |
| water gradient | texture | Contextual | low |
| color gradient | color | Intrinsic | low |
| sand | context | Contextual | low |
| other shark | context | Contextual | low |
Model Attention (Grad-CAM): The heatmap highlights the hammer-shaped head and curved body as key features, indicating the model focuses on these defining characteristics.
Summary: The model exhibits significant bias towards spurious features, particularly the background and non-essential head modifications, leading to unreliable performance. Addressing these issues will enhance the model's robustness.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| Hammer-shaped head | object_part | Intrinsic | high |
| Curved body | shape | Intrinsic | high |
| Tail fin | object_part | Intrinsic | medium |
| Color gradient | color | Intrinsic | low |
| Background gradient | context | Contextual | low |
| Lighting reflections | texture | Contextual | low |
| child's hand holding hammer | object_part | Intrinsic | high |
| child's sweater | color | Intrinsic | medium |
| brick wall background | context | Contextual | low |
| concrete floor | context | Contextual | low |
Model Attention (Grad-CAM): The heatmap highlights the white spots and black body as key features, indicating the model focuses on these intrinsic characteristics.
Summary: The model exhibits significant reliance on spurious features such as the black body and spotted texture, which can lead to biased predictions. Improving the model's robustness against such modifications is crucial.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| white spots | texture | Intrinsic | high |
| black body | color | Intrinsic | high |
| tail fin | object_part | Intrinsic | medium |
| eye region | object_part | Intrinsic | low |
| water background | context | Contextual | low |
| lighting effect | context | Contextual | low |
| flat body | shape | Intrinsic | high |
| spotted texture | texture | Intrinsic | high |
| color gradient | color | Contextual | low |
| coral reef | context | Contextual | low |
Model Attention (Grad-CAM): The heatmap highlights the stingray's body, tail, and color patterns, indicating these are critical for classification.
Summary: The model exhibits significant bias towards non-essential features, leading to decreased confidence when these features are altered. To improve robustness, focus on essential features that define a stingray.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| blue spots | texture | Intrinsic | high |
| tail fin | object_part | Intrinsic | high |
| body shape | shape | Intrinsic | medium |
| color gradient | color | Intrinsic | high |
| background texture | context | Contextual | low |
| lighting effect | context | Contextual | low |
| eye region | object_part | Intrinsic | medium |
| shark presence | context | Contextual | low |
| fish school | context | Contextual | low |
| water clarity | context | Contextual | low |
Model Attention (Grad-CAM): The heatmap shows low attention on the dog's features, indicating the model might be relying on contextual elements like the background rather than the dog itself.
Summary: The model exhibits significant bias by relying on spurious features like feather patterns and comical elements, which are not semantically related to a cock. This results in high risk and vulnerability, necessitating immediate corrective measures.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| wet fur | texture | Intrinsic | low |
| long ears | shape | Intrinsic | medium |
| brown fur | color | Intrinsic | low |
| dark eyes | object_part | Intrinsic | low |
| blanket under dog | context | Contextual | low |
| colorful background | context | Contextual | low |
| comical comb | object_part | Intrinsic | high |
| feather patterns | texture | Intrinsic | medium |
| color gradient | color | Intrinsic | high |
| wing feathers | shape | Intrinsic | medium |
Model Attention (Grad-CAM): The heatmap highlights the hen's body, combs, and texture, indicating the model focuses on these intrinsic features.
Summary: The model exhibits significant bias towards non-essential features like feather texture, body shape, and eye color, leading to unreliable performance when these features are altered. Addressing this by focusing on essential features will enhance the model's robustness.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| feather texture | texture | Intrinsic | high |
| combs and wattles | object_part | Intrinsic | high |
| overall body shape | shape | Intrinsic | high |
| color gradient | color | Intrinsic | high |
| grass background | context | Contextual | low |
| outdoor setting | context | Contextual | low |
| wing shape | shape | Intrinsic | high |
| eye color | color | Intrinsic | medium |
| chick presence | context | Contextual | low |
| log surface | texture | Contextual | low |
Model Attention (Grad-CAM): The heatmap highlights the ostrich's feathers, long neck, and legs, indicating these are critical for the model's decision.
Summary: The model exhibits significant robustness issues due to its reliance on spurious features like the presence of trees and the grassy background. This suggests potential biases and vulnerabilities in the model's performance when applied to new, unseen data.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| feathers | texture | Intrinsic | high |
| long neck | shape | Intrinsic | high |
| legs | shape | Intrinsic | medium |
| beak | object_part | Intrinsic | high |
| grass | context | Contextual | low |
| fence | context | Contextual | low |
| feather pattern | texture | Intrinsic | medium |
| tail feathers | object_part | Intrinsic | low |
| dry grassland | context | Contextual | low |
| trees | context | Contextual | low |
No shortcuts confirmed for this class.
Model Attention (Grad-CAM): The heatmap highlights the bird's body, wings, and head, indicating the model focuses on these critical features for classification.
Summary: The model exhibits significant reliance on spurious features, particularly background and eye color, which can lead to misclassification. Improving the model's robustness by focusing on essential features and excluding spurious ones is crucial.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| feather pattern | texture | Intrinsic | high |
| eye color | color | Intrinsic | medium |
| beak shape | shape | Intrinsic | high |
| wing pattern | texture | Intrinsic | high |
| body posture | shape | Intrinsic | medium |
| branch presence | context | Contextual | low |
| ground surface | context | Contextual | low |
| chest coloration | color | Intrinsic | high |
| tail feathers | object_part | Intrinsic | medium |
| feather patterns | texture | Intrinsic | medium |
Model Attention (Grad-CAM): The heatmap highlights the bird's yellow plumage, black cap, and orange beak, indicating these are the primary features the model focuses on.
Summary: The model demonstrates a high level of robustness when it comes to essential features like plumage coloration and beak color, but it exhibits bias by relying on spurious features like the branch and background elements, which can significantly decrease confidence.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| yellow plumage | color | Intrinsic | high |
| black cap | object_part | Intrinsic | high |
| orange beak | color | Intrinsic | high |
| black wing patches | object_part | Intrinsic | medium |
| branch | shape | Contextual | low |
| pink blossoms | texture | Contextual | low |
| green leaves | texture | Contextual | low |
| black and white wing patterns | object_part | Intrinsic | high |
| green perch | context | Contextual | low |
| blurred background | context | Contextual | low |
Model Attention (Grad-CAM): The heatmap highlights the bird's head, body, and interaction with the fruit, indicating these are key features for the model.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| head shape | shape | Intrinsic | high |
| feather pattern | texture | Intrinsic | medium |
| beak color | color | Intrinsic | low |
| branch structure | shape | Contextual | low |
| fruit | object_part | Contextual | high |
| overall silhouette | shape | Intrinsic | high |
| background foliage | context | Contextual | low |
| beak shape | shape | Intrinsic | high |
| eye shape | shape | Intrinsic | medium |
| wing feathers | object_part | Intrinsic | high |
Model Attention (Grad-CAM): The heatmap shows high attention on the bird's head and underbelly, indicating these features are crucial for the model's decision.
Summary: The model exhibits significant bias towards spurious features like the branch, background, and snow-covered surface, leading to unreliable classifications when these elements are altered. Improving the model's semantic understanding is crucial for robust performance.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| gray head | object_part | Intrinsic | high |
| white underbelly | object_part | Intrinsic | high |
| brownish wings | object_part | Intrinsic | medium |
| perched on branch | shape | Contextual | low |
| blurred background | context | Contextual | low |
| rainbow gradient | color | Contextual | low |
| black head | object_part | Intrinsic | high |
| gray wings | object_part | Intrinsic | medium |
| snowy background | context | Contextual | low |
| snow-covered surface | context | Contextual | low |
Model Attention (Grad-CAM): The heatmap shows high attention on the bird's head and wing, indicating these are crucial for the model's decision.
Summary: The model exhibits significant bias towards spurious features, leading to unreliable performance when these features are modified. Addressing this issue requires a focus on semantic features and robust data preprocessing.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| bird's head | object_part | Intrinsic | high |
| bird's wing | object_part | Intrinsic | medium |
| bird's tail | object_part | Intrinsic | low |
| green leaves | context | Contextual | low |
| sunlight | context | Contextual | high |
| tree branch | context | Contextual | medium |
| blue plumage | color | Intrinsic | high |
| feather texture | texture | Intrinsic | medium |
| bird silhouette | shape | Intrinsic | high |
| branch | object_part | Contextual | low |
Model Attention (Grad-CAM): The heatmap highlights the bird's head, chest, and tail feathers, indicating these are crucial for the model's decision.
Summary: The model exhibits significant bias towards non-essential features like beak color and texture, which can lead to unreliable performance. Improving focus on essential features and refining the dataset would enhance robustness.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| head shape | shape | Intrinsic | high |
| chest color | color | Intrinsic | high |
| tail feathers | object_part | Intrinsic | high |
| beak color | color | Intrinsic | high |
| wooden fence | context | Contextual | low |
| greenery behind | context | Contextual | low |
| chest feathers | object_part | Intrinsic | high |
| eye ring | object_part | Intrinsic | high |
| beak | object_part | Intrinsic | high |
| bench arm | context | Contextual | low |
Model Attention (Grad-CAM): The heatmap shows high attention on the bird's head and eye regions, indicating these are key features for the model.
Summary: The model demonstrates a moderate level of robustness, but it is vulnerable to biases due to its reliance on non-essential features like the background and green leaves. Improving the model's focus on essential features will enhance its reliability.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| head feathers | object_part | Intrinsic | high |
| eye region | object_part | Intrinsic | high |
| body plumage | object_part | Intrinsic | medium |
| tail feathers | object_part | Intrinsic | low |
| branch | shape | Contextual | low |
| green leaves | color | Contextual | low |
| blurry background | context | Contextual | low |
| head shape | shape | Intrinsic | high |
| eye pattern | texture | Intrinsic | high |
| feather texture | texture | Intrinsic | medium |
Model Attention (Grad-CAM): The heatmap shows high attention on the bird's distinctive blue crest and wing feathers, indicating these are crucial for the model's decision.
Summary: The model exhibits some bias towards spurious features like the background and wing color, which can lead to incorrect classifications. To enhance robustness, focus on essential features and minimize reliance on non-semantic elements.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| blue crest | object_part | Intrinsic | high |
| blue wing feathers | object_part | Intrinsic | high |
| white underbelly | object_part | Intrinsic | medium |
| sharp beak | object_part | Intrinsic | low |
| branch | shape | Contextual | low |
| dry grass | context | Contextual | low |
| rainbow overlay | color | Contextual | low |
| blue head feathers | object_part | Intrinsic | high |
| white chest | object_part | Intrinsic | high |
| gray wings | object_part | Intrinsic | medium |
Model Attention (Grad-CAM): The heatmap highlights the bird's body and wings, indicating the model focuses on these features for classification.
Summary: The model exhibits significant bias towards non-essential features like the background and environment, leading to high risk of misclassification. Improving focus on essential features and using a controlled dataset would enhance model robustness.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| black plumage | color | Intrinsic | high |
| white wing patch | color | Intrinsic | medium |
| sharp beak | shape | Intrinsic | high |
| thin legs | shape | Intrinsic | low |
| perched on a branch | shape | Contextual | low |
| natural outdoor setting | context | Contextual | low |
| dry grass | texture | Contextual | low |
| black head | object_part | Intrinsic | high |
| white chest | object_part | Intrinsic | high |
| black wings | object_part | Intrinsic | high |
Model Attention (Grad-CAM): The heatmap shows high attention on the bird's distinct features like the black cap and white underbelly, indicating these are critical for classification.
Summary: The model exhibits significant robustness issues due to its reliance on non-essential features like the bird silhouette and enhanced feather texture. Improving focus on essential features is crucial for enhancing model reliability.
| Feature | Category | Type | Model Attention |
|---|---|---|---|
| black cap | object_part | Intrinsic | high |
| white underbelly | object_part | Intrinsic | high |
| gray wings | object_part | Intrinsic | medium |
| branch | shape | Contextual | low |
| overcast sky | color | Contextual | low |
| rainbow overlay | texture | Contextual | low |
| snowy background | context | Contextual | low |
| bird silhouette | shape | Intrinsic | high |
| beak shape | object_part | Intrinsic | high |
| feather texture | texture | Intrinsic | medium |
This analysis uses an automated pipeline to discover biases and shortcuts in image classification models.
| Parameter | Value |
|---|---|
| Classifier Model | resnet50 |
| Vision-Language Model (VLM) | Qwen/Qwen2.5-VL-7B-Instruct |
| Image Editor Model | black-forest-labs/FLUX.2-klein-9b-kv |
| Attention Method | scorecam |
| Samples per Class | 5 positive, 5 negative |
| VLM Iterations | 2 |
| Generations per Edit | 3 |
| Confidence Delta Threshold | 0.15 |
| Statistical Validation | Enabled (t-test + Cohen's d) |
| Edit Grad-CAM | Enabled (attention diff on edited images) |
| Edit Verification | Disabled |
| Pipeline Mode | Phase-first (6 model swaps total) |
The VLM uses world knowledge to propose potential shortcuts for the target class (e.g., for "cat" it suggests commonly associated features such as "yarn ball" or "milk bowl"). This step requires no images.
Collect positive samples of the target class, plus negative samples from confusing classes (identified from the classifier's top-k predictions), from the ImageNet validation set.
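One way to pick confusing classes from top-k predictions is sketched below. This is a minimal illustration, not the pipeline's actual code: the function name `confusing_classes`, the input format (one class-to-probability dict per image), and the example probabilities are all assumptions.

```python
from collections import Counter


def confusing_classes(predictions, target, k=5, n=3):
    """Return the n classes that most often appear in the top-k next to `target`.

    predictions: list of dicts mapping class name -> probability, one per
                 image of the target class (hypothetical input format).
    """
    counts = Counter()
    for probs in predictions:
        # Class names sorted by descending probability, truncated to the top k.
        top_k = sorted(probs, key=probs.get, reverse=True)[:k]
        counts.update(name for name in top_k if name != target)
    return [name for name, _ in counts.most_common(n)]


# Made-up probabilities for three images of the target class.
preds = [
    {"tench": 0.70, "goldfish": 0.20, "stingray": 0.06, "hen": 0.04},
    {"tench": 0.55, "goldfish": 0.30, "hen": 0.10, "stingray": 0.05},
    {"goldfish": 0.50, "tench": 0.45, "stingray": 0.03, "hen": 0.02},
]
print(confusing_classes(preds, "tench", k=3, n=2))
```

Negative samples would then be drawn from the returned classes; the report's configuration lists 5 positive and 5 negative samples per class.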
Classify all samples with attention maps (scorecam) to establish baseline confidence and visualize which image regions the classifier focuses on.
The VLM (Qwen/Qwen2.5-VL-7B-Instruct) analyzes each image + attention map to identify visual features. It classifies features as intrinsic (object parts) or contextual (background).
The VLM generates specific edit instructions, then the image editor (black-forest-labs/FLUX.2-klein-9b-kv) applies each edit. Each edit is generated multiple times (3 generations per edit in this run) so that a single unusual generation does not decide the result. Model lifecycle is managed by ModelManager for VRAM efficiency.
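To illustrate why a phase-first ordering helps when only one large model fits in VRAM at a time, the sketch below counts model loads under two schedules. It does not reproduce ModelManager, whose interface is not shown in this report; the phase sequence, the class list, and the function `count_model_loads` are hypothetical.

```python
def count_model_loads(schedule):
    """Count loads when a model is (re)loaded whenever the needed one changes."""
    loads, current = 0, None
    for model in schedule:
        if model != current:
            loads += 1
            current = model
    return loads


# Hypothetical sequence of models needed per class; the real phases may differ.
PHASES = ["classifier", "vlm", "editor", "classifier"]
classes = ["tench", "goldfish", "stingray"]

# Class-first: run every phase for one class before moving to the next class.
class_first = [model for _ in classes for model in PHASES]
# Phase-first: run one phase for all classes before moving to the next phase.
phase_first = [model for model in PHASES for _ in classes]

print(count_model_loads(class_first), count_model_loads(phase_first))
```

Under these assumptions the phase-first load count stays fixed as classes are added, while the class-first count grows with the number of classes.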
Re-classify edited images and measure confidence change (delta). When enabled, Grad-CAM is also computed on edited images to produce attention diff heatmaps showing where the model gained (red) or lost (blue) focus. Statistical tests (t-test, Cohen's d) validate significance.
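The delta and the two statistics can be sketched with the standard library alone. The report does not say whether the t-test is paired or independent, so the paired form below is an assumption, as are the function name `paired_effect` and the example confidences. Only the 0.15 threshold comes from the configuration table.

```python
import math
from statistics import mean, stdev

DELTA_THRESHOLD = 0.15  # "Confidence Delta Threshold" from the configuration


def paired_effect(original, edited):
    """Return (mean delta, paired t statistic, Cohen's d) for confidence pairs.

    Assumes a paired design: original[i] and edited[i] refer to the same image.
    """
    deltas = [e - o for o, e in zip(original, edited)]
    n = len(deltas)
    m, s = mean(deltas), stdev(deltas)  # stdev uses the n-1 denominator
    if s == 0:
        raise ValueError("all deltas are identical; statistics are undefined")
    t_stat = m / (s / math.sqrt(n))  # one-sample t statistic on the deltas
    cohens_d = m / s                 # standardized effect size for paired data
    return m, t_stat, cohens_d


# Made-up confidences for three generations of one edit.
original = [0.92, 0.88, 0.95]
edited = [0.10, 0.15, 0.05]
m, t_stat, d = paired_effect(original, edited)
print(round(m, 2), abs(m) >= DELTA_THRESHOLD)
```

A p-value would additionally require the t distribution with n-1 degrees of freedom, which a statistics library such as SciPy provides.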
Generate comprehensive reports showing shortcuts with evidence images, feature impact analysis, and risk assessment per class.
| Impact (Δ) | Meaning | If Intrinsic Feature | If Contextual Feature |
|---|---|---|---|
| -0.30 or lower | Very high importance | ✓ Expected - critical feature | 🚨 SHORTCUT - Major bias! |
| -0.15 to -0.30 | Significant importance | ✓ Good - model uses this | ⚠ SHORTCUT - Bias concern |
| -0.05 to +0.05 | Minimal impact | May be underutilized | ✓ Good - not relied upon |
| +0.05 to +0.30 | Feature was distracting | Unexpected - investigate | ✓ ROBUST - Model handles noise well |
| +0.30 or higher | Feature was hurting badly | Unexpected - investigate | ✓ Very ROBUST - Model ignores context |
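The table above can be read as a lookup from delta and feature type to a verdict. In the sketch below, the handling of exact boundary values is an assumption, and so is the explicit "not covered" result for the band between -0.15 and -0.05, which the table leaves undefined.

```python
def interpret_impact(delta, feature_type):
    """Map a confidence delta to the table's reading for a feature type.

    feature_type: "intrinsic" or "contextual".
    """
    intrinsic = feature_type == "intrinsic"
    if delta <= -0.30:
        return "expected: critical feature" if intrinsic else "shortcut: major bias"
    if delta <= -0.15:
        return "good: model uses this" if intrinsic else "shortcut: bias concern"
    if delta < -0.05:
        # The table does not define the band between -0.15 and -0.05.
        return "not covered by the table"
    if delta <= 0.05:
        return "may be underutilized" if intrinsic else "good: not relied upon"
    if delta < 0.30:
        return "unexpected: investigate" if intrinsic else "robust: handles noise well"
    return "unexpected: investigate" if intrinsic else "very robust: ignores context"


# The tench row in the overview reports -0.81 for replacing the net background.
print(interpret_impact(-0.81, "contextual"))
```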
LOW: <10% of hypotheses confirmed. Model appears robust and relies primarily on intrinsic features.
MEDIUM: 10-30% confirmed. Some shortcuts present. May need attention for deployment in diverse contexts.
HIGH: >30% confirmed. Significant shortcuts/biases detected. Model may fail on out-of-distribution data.
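Assuming the three bands above correspond to the LOW, MEDIUM, and HIGH labels in the Risk column, and that Rate is Confirmed divided by Total Edits (which matches every row of the summary table), the mapping can be sketched as follows. The function name `risk_level` is illustrative, and applying the thresholds to the unrounded rate is an assumption.

```python
def risk_level(confirmed, total_edits):
    """Classify a class by the share of confirmed shortcut hypotheses."""
    rate = confirmed / total_edits if total_edits else 0.0
    if rate > 0.30:
        return "HIGH"
    if rate >= 0.10:
        return "MEDIUM"
    return "LOW"


# (confirmed, total edits) taken from rows of the summary table.
rows = {
    "tench": (2, 24),
    "magpie": (3, 10),
    "stingray": (6, 13),
}
for name, (confirmed, total) in rows.items():
    print(name, f"{confirmed / total:.0%}", risk_level(confirmed, total))
```

The magpie row sits exactly at 30% and is labeled MEDIUM in the table, which is why the HIGH test uses a strict inequality.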