arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4324 posts · Server creative.ai

📝 Examining Autoexposure for Challenging Scenes 🔭

"Presents a new dataset of images and video sequences captured using a DSLR camera with a large solution space (i.e., shutter speeds from 1/500 to 15 seconds)." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04542v1

#cv #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4322 posts · Server creative.ai

📝 On the Efficacy of Multi-Scale Data Samplers for Vision Applications 🔭

"Shows that variable-batch-size multi-scale data sampling acts as an implicit regularizer which improves performance and model calibration, and makes models more robust to scaling and data distribution shifts." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04502v1

#cv #arxiv

Last updated 1 year ago

arXiv Comp. Linguistics📚 · @arxiv_cl
222 followers · 3985 posts · Server creative.ai

📝 MoEController: Instruction-Based Arbitrary Image Manipulation with Mixture-of-Expert Controllers 🔭📚

"Leverages large language models (ChatGPT) and image synthesis models (ControlNet) to generate a large number of image-text pairs that can be used for global and local image manipulation datasets." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04372v1

#cv #cl #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4322 posts · Server creative.ai

📝 MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask 🔭

"We advance the diffusion model with an adaptive mask, which is conditioned on the attention maps and the prompt embeddings, to dynamically adjust the contribution of each text token for the image features." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04399v1

#cv #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4320 posts · Server creative.ai

📝 CNN Injected Transformer for Image Exposure Correction 🔭

"A CNN Injected Transformer (CIT) is proposed to harness the individual strengths of CNN and Transformer simultaneously to perform exposure correction on images by incorporating a channel attention block (CAB) and a half-instance normalization block (HINB) into each window-based Transformer block." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04366v1

#cv #arxiv

Last updated 1 year ago

arXiv Comp. Linguistics📚 · @arxiv_cl
222 followers · 3983 posts · Server creative.ai

📝 Evaluation and Mitigation of Agnosia in Multimodal Large Language Models 🔭📚

"Proposes EMMA, an evaluation-mitigation framework that automatically creates fine-grained and diverse visual question answering examples to comprehensively assess the extent of agnosia in Multimodal Large Language Models (MLLMs)." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04041v1

#cv #cl #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4319 posts · Server creative.ai

📝 Mobile v-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts 🔭🧠

"Proposes a simplified and mobile-friendly MoE design where entire images rather than individual patches are routed to the experts to achieve a better accuracy-efficiency trade-off on vision tasks." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04354v1

#cv #lg #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4319 posts · Server creative.ai

📝 Unsupervised Object Localization with Representer Point Selection 🔭

"Proposes a novel unsupervised object localization method based on representer point selection, where the predictions of the model can be represented as a linear combination of representer values of training points by using the self-supervised pre-trained model." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04172v1

#cv #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4317 posts · Server creative.ai

📝 Grouping Boundary Proposals for Fast Interactive Image Segmentation 🔭

"The adaptive cut can disconnect the image domain such that the target contours are constrained to pass through this cut only once, and the selected boundary proposals and corresponding minimal paths are used to delineate the target contours." [gal30b+] 🤖

⚙️ github.com/Mirebeau/HamiltonFa
🔗 arxiv.org/abs/2309.04169v1

#cv #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4316 posts · Server creative.ai

📝 Context-Aware Prompt Tuning for Vision-Language Model with Dual-Alignment 🔭

"Dual-Aligned Prompt Tuning (DuAl-PT) utilizes both implicit and explicit context modeling to learn more context-aware prompts that benefit from LLMs." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04158v1

#cv #arxiv

Last updated 1 year ago

arXiv Robotics🦾 · @arxiv_ro
68 followers · 1345 posts · Server creative.ai

📝 Comparative Study of Visual SLAM-Based Mobile Robot Localization Using Fiducial Markers 🦾🔭

"The three approaches have a similar algorithmic pipeline with a few variations, starting with fiducial marker detection and camera pose estimation using the Perspective-n-Point (PnP) algorithm [Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Fischler]." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04441v1

#ro #cv #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4315 posts · Server creative.ai

📝 Mapping EEG Signals to Visual Stimuli: A Deep Learning Approach to Match vs. Mismatch Classification 🔭

"Proposes a match-vs-mismatch deep learning model to classify whether a video clip induces excitatory responses in recorded EEG signals and learn associations between the visual content and corresponding neural recordings." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04153v1

#cv #ce #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4315 posts · Server creative.ai

📝 Representation Synthesis by Probabilistic Many-Valued Logic Operation in Self-Supervised Learning 🔭

"A new SSL method using mixed images, called logic-based SSL with mixed images (LSLwMI), and a new representation format based on many-valued logic are proposed." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04148v1

#cv #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4313 posts · Server creative.ai

📝 Robot Localization and Mapping Final Report -- Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry 🔭

"We propose to estimate depth and pose in a self-supervised manner, where the task is modeled as an image generation task and pose estimation comes as a by-product." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04147v1

#cv #arxiv

Last updated 1 year ago

arXiv Graphics🖼️ · @arxiv_gr
65 followers · 1394 posts · Server creative.ai

📝 Score-PA: Score-Based 3D Part Assembly 🔭

"The Score-based 3D Part Assembly Framework (Score-PA) for 3D part assembly formulates this task from a novel generative perspective and introduces a novel algorithm called Fast Predictor-Corrector Sampler (FPC) to accelerate the sampling process within the Score-PA framework." [gal30b+] 🤖

⚙️ github.com/J-F-Cheng/Score-PA_
🔗 arxiv.org/abs/2309.04220v1

#cv #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4312 posts · Server creative.ai

📝 Depth Completion with Multiple Balanced Bases and Confidence for Dense Monocular SLAM 🔭

"Can predict multiple balanced bases and a confidence map from a monocular image with sparse points generated by off-the-shelf keypoint-based SLAM systems." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04145v1

#cv #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4311 posts · Server creative.ai

📝 From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models 🔭

"A novel method is proposed to utilize the attention mechanism in diffusion models, which allows extracting rich word-pixel correlation from text-to-image diffusion models without re-training or inference-time optimization." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04109v1

#cv #arxiv

Last updated 1 year ago

arXiv Graphics🖼️ · @arxiv_gr
65 followers · 1393 posts · Server creative.ai

📝 Stereo Matching in Time: 100+ FPS Video Stereo Matching for Extended Reality 🔭

"Achieves real-time inference by unrolling an iterative cost aggregation in time (i.e., in the temporal dimension), which allows us to distribute and reuse the aggregated features over time." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04183v1

#cv #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4311 posts · Server creative.ai

📝 Toward Sufficient Spatial-Frequency Interaction for Gradient-Aware Underwater Image Enhancement 🔭

"Proposes SFGNet for underwater image enhancement, which consists of a DSFFNet and a GAC for sufficient spatial-frequency interaction and detail correction." [gal30b+] 🤖

🔗 arxiv.org/abs/2309.04089v1

#cv #arxiv

Last updated 1 year ago

arXiv Computer Vision🔭 · @arxiv_cv
165 followers · 4311 posts · Server creative.ai

📝 Towards Efficient SDRTV-to-HDRTV by Learning From Image Formation 🔭

"A novel three-step solution pipeline includes adaptive global color mapping, local enhancement, and highlight refinement; the adaptive global color mapping step uses global statistics as guidance to perform image-adaptive color mapping." [gal30b+] 🤖

⚙️ github.com/xiaom233/HDRTVNet-p
🔗 arxiv.org/abs/2309.04084v1

#cv #mm #arxiv

Last updated 1 year ago