Examining Autoexposure for Challenging Scenes
"Presents a new dataset of images and video sequences captured using a DSLR camera with a large solution space (i.e., shutter speeds from 1/500 s to 15 s)." [gal30b+] #CV
On the Efficacy of Multi-Scale Data Samplers for Vision Applications
"Shows that variable-batch-size multi-scale data sampling acts as an implicit regularizer which improves performance and model calibration, and makes models more robust to scaling and data distribution shifts." [gal30b+] #CV
MoEController: Instruction-Based Arbitrary Image Manipulation with Mixture-of-Expert Controllers
"Leverages large language models (ChatGPT) and image synthesis models (ControlNet) to generate large numbers of image-text pairs for building global and local image manipulation datasets." [gal30b+] #CV #CL
MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask
"We advance the diffusion model with an adaptive mask, which is conditioned on the attention maps and the prompt embeddings, to dynamically adjust the contribution of each text token to the image features." [gal30b+] #CV
CNN Injected Transformer for Image Exposure Correction
"A CNN Injected Transformer (CIT) is proposed to harness the complementary strengths of CNNs and Transformers for image exposure correction by incorporating a channel attention block (CAB) and a half-instance normalization block (HINB) into each window-based Transformer block." [gal30b+] #CV
Evaluation and Mitigation of Agnosia in Multimodal Large Language Models
"Proposes EMMA, an evaluation-mitigation framework that automatically creates fine-grained and diverse visual question answering examples to comprehensively assess the extent of agnosia in multimodal large language models (MLLMs)." [gal30b+] #CV #CL
Mobile v-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
"Proposes a simplified, mobile-friendly MoE design in which entire images, rather than individual patches, are routed to the experts, achieving a better accuracy-efficiency trade-off on vision tasks." [gal30b+] #CV #LG
Unsupervised Object Localization with Representer Point Selection
"Proposes a novel unsupervised object localization method based on representer point selection: using a self-supervised pre-trained model, the model's predictions can be represented as a linear combination of the representer values of training points." [gal30b+] #CV
Grouping Boundary Proposals for Fast Interactive Image Segmentation
"An adaptive cut disconnects the image domain so that the target contours are forced to pass through the cut only once, and the selected boundary proposals and their corresponding minimal paths are used to delineate the target contours." [gal30b+] #CV (see the sketch below)
https://github.com/Mirebeau/HamiltonFastMarching
https://arxiv.org/abs/2309.04169v1 #arxiv
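The released HamiltonFastMarching code implements anisotropic fast marching; as a much simpler stand-in, the sketch below extracts a minimal path between two boundary-proposal points with Dijkstra on an isotropic cost grid, which conveys the idea of contours following low-cost image boundaries (the cost definition and 4-connectivity are assumptions):

    import heapq
    import numpy as np

    def minimal_path(cost, start, end):
        """Dijkstra shortest path on a 2D cost grid (4-connectivity).
        cost: (H, W) positive array, e.g. 1 / (1 + |image gradient|) so that paths
        prefer strong edges; start/end: (row, col) boundary-proposal points."""
        H, W = cost.shape
        dist = np.full((H, W), np.inf)
        prev = {}
        dist[start] = 0.0
        heap = [(0.0, start)]
        while heap:
            d, (r, c) = heapq.heappop(heap)
            if (r, c) == end:
                break
            if d > dist[r, c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < H and 0 <= nc < W:
                    nd = d + cost[nr, nc]
                    if nd < dist[nr, nc]:
                        dist[nr, nc] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (nd, (nr, nc)))
        path, node = [], end
        while node != start:
            path.append(node)
            node = prev[node]
        return [start] + path[::-1]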
Context-Aware Prompt Tuning for Vision-Language Model with Dual-Alignment
"Dual-Aligned Prompt Tuning (DuAl-PT) utilizes both implicit and explicit context modeling to learn more context-aware prompts that benefit from LLMs." [gal30b+] #CV
Comparative Study of Visual SLAM-Based Mobile Robot Localization Using Fiducial Markers
"The three approaches share a similar algorithmic pipeline with a few variations, starting with fiducial marker detection and camera pose estimation using the Perspective-n-Point (PnP) algorithm [Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Fischler]." [gal30b+] #RO #CV
Mapping EEG Signals to Visual Stimuli: A Deep Learning Approach to Match vs. Mismatch Classification
"Proposes a match-vs-mismatch deep learning model to classify whether a video clip induces excitatory responses in recorded EEG signals and to learn associations between the visual content and the corresponding neural recordings." [gal30b+] #CV #CE
Representation Synthesis by Probabilistic Many-Valued Logic Operation in Self-Supervised Learning
"Proposes a new SSL method using mixed images, called logic-based SSL with mixed images (LSLwMI), together with a new representation format based on many-valued logic." [gal30b+] #CV
Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry
"We propose to estimate depth and pose in a self-supervised manner, where the task is modeled as an image generation task and pose estimation comes as a by-product." [gal30b+] #CV
Score-PA: Score-Based 3D Part Assembly
"The Score-based 3D Part Assembly framework (Score-PA) formulates 3D part assembly from a novel generative perspective and introduces a Fast Predictor-Corrector Sampler (FPC) to accelerate the sampling process within the Score-PA framework." [gal30b+] #CV (see the sketch below)
https://github.com/J-F-Cheng/Score-PA_Score-based-3D-Part-Assembly
https://arxiv.org/abs/2309.04220v1 #arxiv
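Not the paper's FPC sampler itself, but a generic score-based predictor-corrector loop (reverse-SDE Euler predictor plus Langevin corrector) of the kind such samplers build on; the VE noise schedule, step counts, and SNR value are placeholder assumptions:

    import math
    import torch

    @torch.no_grad()
    def pc_sample(score_fn, shape, n_steps=100, n_corrector=1, snr=0.16,
                  sigma_max=10.0, sigma_min=0.01):
        """Generic predictor-corrector sampling for a score model score_fn(x, sigma)
        approximating grad_x log p_sigma(x) (VE-SDE formulation)."""
        x = torch.randn(shape) * sigma_max
        sigmas = torch.logspace(math.log10(sigma_max), math.log10(sigma_min), n_steps)
        for i in range(n_steps - 1):
            sigma, sigma_next = sigmas[i], sigmas[i + 1]
            # predictor: Euler-Maruyama step of the reverse-time SDE
            x = x + (sigma**2 - sigma_next**2) * score_fn(x, sigma)
            x = x + torch.sqrt(torch.clamp(sigma**2 - sigma_next**2, min=0.0)) * torch.randn_like(x)
            # corrector: Langevin dynamics at the new noise level
            for _ in range(n_corrector):
                score = score_fn(x, sigma_next)
                noise = torch.randn_like(x)
                grad_norm = score.flatten(1).norm(dim=1).mean()
                eps = 2 * (snr * noise.flatten(1).norm(dim=1).mean() / (grad_norm + 1e-12)) ** 2
                x = x + eps * score + torch.sqrt(2 * eps) * noise
        return x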
Depth Completion with Multiple Balanced Bases and Confidence for Dense Monocular SLAM
"Can predict multiple balanced bases and a confidence map from a monocular image with sparse points generated by off-the-shelf keypoint-based SLAM systems." [gal30b+] #CV
From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models
"A novel method is proposed to utilize the attention mechanism in diffusion models, which allows extracting rich word-pixel correlation from text-to-image diffusion models without re-training or inference-time optimization." [gal30b+] #CV
Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality
"Achieves real-time inference by unrolling an iterative cost aggregation in time (i.e., in the temporal dimension), which allows us to distribute and reuse the aggregated features over time." [gal30b+] #CV
Toward Sufficient Spatial-Frequency Interaction for Gradient-Aware Underwater Image Enhancement
"Proposes SFGNet for underwater image enhancement, which consists of a DSFFNet and a GAC for sufficient spatial-frequency interaction and detail correction." [gal30b+] #CV
Towards Efficient SDRTV-to-HDRTV by Learning From Image Formation
"A novel three-step solution pipeline comprises adaptive global color mapping, local enhancement, and highlight refinement; the adaptive global color mapping step uses global statistics as guidance to perform image-adaptive color mapping." [gal30b+] #CV #MM (see the sketch below)
https://github.com/xiaom233/HDRTVNet-plus
https://arxiv.org/abs/2309.04084v1 #arxiv
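A hedged sketch of the first step only (adaptive global color mapping): image-level statistics drive a small MLP that outputs a per-image 3x3 color matrix and bias, applied uniformly to every pixel. Layer sizes and the choice of statistics are assumptions, and local enhancement and highlight refinement are omitted:

    import torch
    import torch.nn as nn

    class AdaptiveGlobalColorMapping(nn.Module):
        """Predict an image-adaptive 3x3 color transform (+ bias) from global
        statistics of the SDR input and apply it pixel-wise."""
        def __init__(self, hidden=64):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(6, hidden), nn.ReLU(inplace=True),
                                     nn.Linear(hidden, 12))       # 9 matrix + 3 bias terms

        def forward(self, sdr):                                   # sdr: (B, 3, H, W) in [0, 1]
            stats = torch.cat([sdr.mean(dim=(2, 3)), sdr.std(dim=(2, 3))], dim=1)  # (B, 6)
            params = self.mlp(stats)
            mat = params[:, :9].reshape(-1, 3, 3)
            bias = params[:, 9:].reshape(-1, 3, 1, 1)
            return torch.einsum('bij,bjhw->bihw', mat, sdr) + bias  # same mapping for all pixels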