Adding passage vector search to #Lucene
https://www.elastic.co/fr/blog/adding-passage-vector-search-to-lucene?trk=feed_main-feed-card_reshare_feed-article-content
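The core idea in that title, a document with several passage vectors that is ranked by its best-matching passage, can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Lucene's API or the linked post's implementation. The vectors are assumed to be unit-normalized, so a dot product equals cosine similarity. The search is exhaustive rather than HNSW-based. `PassageVectorSearch`, `docScore` and `topK` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: rank whole documents by their best-matching passage vector.
 * Plain Java for illustration, NOT Lucene's API; all names are hypothetical.
 */
final class PassageVectorSearch {

    /** Dot product; assumes unit-normalized vectors, so it equals cosine similarity. */
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** A document scores as well as its single best passage (max over passages). */
    static float docScore(List<float[]> passages, float[] query) {
        float best = Float.NEGATIVE_INFINITY;
        for (float[] passage : passages) {
            best = Math.max(best, dot(passage, query));
        }
        return best;
    }

    /** Exhaustive top-k over documents (no approximate index), best first. */
    static List<String> topK(Map<String, List<float[]>> docs, float[] query, int k) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> docScore(docs.get(id), query)).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        Map<String, List<float[]>> docs = new LinkedHashMap<>();
        docs.put("doc-a", List.of(new float[] {1f, 0f}, new float[] {0f, 1f})); // two passages
        docs.put("doc-b", List.of(new float[] {0.6f, 0.8f}));                  // one passage
        System.out.println(topK(docs, new float[] {0f, 1f}, 1)); // prints [doc-a]
    }
}
```

Taking the maximum means a long document is not penalized for containing many unrelated passages. It only needs one passage that matches well.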
2/ targeted towards a more general audience. Thakare/Laddha/Pawar's Hybrid Intelligent Systems for Information Retrieval looks like a possibility...
Also open to books that approach it from a specific #search implementation perspective, e.g. #Elasticsearch, #Solr, #Lucene... but most of those appear to be older or focused on a specific subtopic.
#search #ElasticSearch #solr #lucene
@futurebird @Jirikiha You might be able to build something with #Lucene (https://en.wikipedia.org/wiki/Apache_Lucene#Lucene-based_projects) or, for a much more lightweight option, with #Xapian (https://en.wikipedia.org/wiki/Xapian).
You'd still have to build something yourself on top of those, though. For Xapian, the source of #mu4e and #mu would probably serve as a decent example (https://www.djcbsoftware.nl/code/mu/).
You'd also need to figure out some way to feed data exports from those into it.
All of my suggestions are #FreeSoftware & gratis.
#freesoftware #lucene #xapian #mu4e #mu
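Here is a sense of what you'd be building on. The core data structure behind both Lucene and Xapian is an inverted index, which maps each term to the documents that contain it. The toy version below is a sketch only. The tokenizer (lowercase, then split on anything that isn't a letter or digit) and the class name `TinyInvertedIndex` are assumptions. Real engines add stemming, ranking, compression and on-disk segments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy inverted index; hypothetical illustration, not Lucene's or Xapian's API. */
final class TinyInvertedIndex {
    // term -> sorted ids of the documents containing it (the "postings list")
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Naive analysis (an assumption): lowercase, split on anything not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Boolean AND: ids of documents containing every query term. */
    SortedSet<Integer> searchAll(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // intersect postings lists
            }
        }
        return result == null ? new TreeSet<Integer>() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Free software search");
        index.add(2, "Lucene: a search library");
        index.add(3, "Xapian, another search library");
        System.out.println(index.searchAll("search library")); // prints [2, 3]
    }
}
```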
@mage @andybaio Indeed. For similar reasons, it took the #Wikimedia Foundation over 10 years from #MediaWiki's creation to prioritise any serious investment in its search: https://www.mediawiki.org/wiki/Extension:CirrusSearch
For the longest time, #Lucene at WMF was maintained by a lone volunteer, river.
Nowadays there's an entire team powering some of the best #i18n-aware search on the web, plus some translation memory (#TM).
https://www.mediawiki.org/wiki/Wikimedia_Search_Platform
Trey Jones' notes are a treasure trove. #NLP
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes
#wikimedia #mediawiki #lucene #i18n #tm #nlp
@wchr The quoted sentence from https://www.eff.org/deeplinks/2023/01/eff-tells-supreme-court-user-speech-must-be-protected is puzzling, but it's commenting on a hypothetical scenario where the changes are major: «substantive claims related to how their systems recommend, promote, rank, arrange, or otherwise display content posted by their users».
https://www.eff.org/files/2023/01/19/21-1333_amicus_brief.pdf
It's a scenario where misconfiguring your site's #Lucene is a legal liability.
On the broader point, we still don't know that #YouTube causes #radicalization.
https://www.techdirt.com/2021/11/03/whole-youtube-radicalizes-people-story-doesnt-seem-to-have-much-evidence-to-back-it-up/
#lucene #youtube #radicalization #section230
I added the face recognition project to my list of PoCs on GitHub. It now uses #Lucene to look up faces and #jOOQ to store them in a database. I wonder how long it would take to scan the FFHQ dataset. My jdlib fork still lacks mini-batch support for GPU image processing. Time for more JNI coding.😭
I'm currently adding a #facedetection module / API to video4j. The current implementation uses OpenCV. I'll try a CNN-based face detector using dlib next; it should be much faster since it's GPU-powered. I want to automatically extract embeddings and use kNN search with the #lucene HnswGraph (Hierarchical Navigable Small World graph) to test face recognition / matching.
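When testing an HNSW-based setup, a useful reference point is exact (brute-force) kNN, which the graph approximates. Comparing the two shows how much recall the approximation gives up. The sketch below is plain Java, not Lucene's `HnswGraph` API. It assumes the embeddings are `float[]` arrays of equal length and uses cosine similarity. The names `ExactKnn`, `Hit` and `search` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Exact kNN over face embeddings by cosine similarity; hypothetical baseline, not Lucene's API. */
final class ExactKnn {

    /** One search result: the gallery label and its cosine similarity to the query. */
    static final class Hit {
        final String label;
        final double score;

        Hit(String label, double score) {
            this.label = label;
            this.score = score;
        }

        @Override
        public String toString() {
            return label + "=" + score;
        }
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scans every embedding, keeping the k most similar in a bounded min-heap. */
    static List<Hit> search(Map<String, float[]> gallery, float[] query, int k) {
        // Min-heap on score: the weakest of the current top-k sits at the head and is evicted first.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (Map.Entry<String, float[]> entry : gallery.entrySet()) {
            heap.offer(new Hit(entry.getKey(), cosine(entry.getValue(), query)));
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return hits;
    }

    public static void main(String[] args) {
        Map<String, float[]> gallery = new LinkedHashMap<>();
        gallery.put("alice", new float[] {1f, 0f, 0f});
        gallery.put("bob", new float[] {0f, 1f, 0f});
        gallery.put("carol", new float[] {0.9f, 0.1f, 0f});
        System.out.println(search(gallery, new float[] {1f, 0f, 0f}, 2));
    }
}
```

This is O(n) per query, which is exactly what an HNSW graph avoids. On a sample of FFHQ-sized data it still works as ground truth for measuring recall.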
Not knowing the backstory between #CouchDB & #CouchBase kept nagging at me, so I did a little Google spelunking. Here's the short version:
#CouchDB's creator Damien Katz formed a company called CouchIO after CouchDB became an #Apache project.
CouchIO offered hosting plus nice-to-haves like #Lucene, geospatial indexing, etc.
CouchIO renamed itself CouchOne and released a mobile dev platform based on CouchDB and optimized for mobile devices.
🧵 1/?
#couchdb #couchbase #apache #lucene
Do you know how to write operations in #CrateDB? 👀
In our new blog post, we give you a thorough understanding of how #CrateDB writes new records 🤓 Learn the basic concepts of #Lucene and of the #translog 👇
https://hubs.ly/Q01w74N50
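The translog idea can be sketched as a write-ahead log. Each write is appended and synced to a log before it is applied to an in-memory buffer. Writes that haven't yet reached a Lucene commit can then be replayed after a crash, and the log can be discarded once they are committed. This is a minimal illustration with hypothetical names (`TranslogSketch`, `write`, `commit`), not CrateDB's implementation. It also assumes that ids and values contain no tabs or newlines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Write-ahead-log sketch of the translog idea; hypothetical, NOT CrateDB's code. */
final class TranslogSketch {
    private final Path log;
    // Stands in for the in-memory indexing buffer that has not been committed yet.
    private final Map<String, String> buffer = new LinkedHashMap<>();

    TranslogSketch(Path log) {
        this.log = log;
        replay(); // crash recovery: re-apply every logged but uncommitted write
    }

    /** Log first (appended and synced to disk), then apply to the buffer. */
    void write(String id, String value) {
        byte[] entry = (id + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8);
        try {
            Files.write(log, entry, StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                    StandardOpenOption.SYNC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.put(id, value);
    }

    private void replay() {
        if (!Files.exists(log)) {
            return;
        }
        try {
            for (String line : Files.readAllLines(log, StandardCharsets.UTF_8)) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    buffer.put(parts[0], parts[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in for a commit: buffered writes become durable elsewhere, so the log can go. */
    Map<String, String> commit() {
        Map<String, String> committed = new LinkedHashMap<>(buffer);
        try {
            Files.deleteIfExists(log);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
        return committed;
    }

    Map<String, String> uncommitted() {
        return Collections.unmodifiableMap(buffer);
    }
}
```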
An #introduction to OpenSource Connections: we're a group of specialists in #opensource search engines such as #lucene, #Solr, #Elasticsearch & #OpenSearch, based across the US, UK and EU. We're known for the Manning book 'Relevant Search', the Haystack conference series and the 3000+ person Relevance Slack. Our mission is to Empower Search Teams to build more accurate & relevant search engines using data-driven, repeatable, hypothesis-based processes & techniques. We help make search better!
#introduction #opensource #lucene #solr #elasticsearch #opensearch
A quick introduction to OpenSource Connections: we're a group of specialists in #opensource search engines such as #lucene, #Solr, #Elasticsearch & #OpenSearch, based across the US, UK and EU. We're known for the Manning book 'Relevant Search', the Haystack conference series and the 3000+ person Relevance Slack. Our mission is to Empower Search Teams to build more accurate & relevant search engines using data-driven, repeatable, hypothesis-based processes & techniques. We help make search better!
#opensource #lucene #solr #elasticsearch #opensearch
So here's my #introduction: I work for OpenSource Connections (OSC) @o19s. We offer consulting on #opensource search engines (#Lucene, #Solr, #Elasticsearch and now #OpenSearch) in the domain of Search Relevance; basically, we help companies using these engines deliver the right results to their users. I'm currently heading up Marketing for OSC, but I also help with sales, run our Haystack conference series, write, blog and present talks on search, and run some customer projects.
#introduction #opensource #lucene #solr #ElasticSearch
@Josh412 I think so. 🙂 There are some great open source search engines out there already, like #lucene (on which both #elasticsearch and #solr are built). I'm particularly interested in improving web search. While the number of contenders has increased, IMHO they are all essentially competing on the same ML basis and thus can't be truly disruptive. I'd like to see an ML engine with human augmentation. This has been attempted several times before (Blekko, Wikia Search, Zakta, etc.)...
On one end of the complexity scale we could have a single box setup:
1. #Ktor or #Vertx as a sort of reverse proxy.
2. Vertx as an HTTP server and Guava caches for caching.
3. A simple LinkedBlockingQueue as our queuing mechanism, maybe with an on-disk write-ahead log.
4. The EventBus as a controller.
5. Vertx verticles as our processors.
6. #sqlite as a database and post store.
7. On-disk media storage.
8. #Lucene for search.
Here we've checked all of the boxes. It won't scale, however. 3/
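Items 3 and 5 of that setup, a LinkedBlockingQueue feeding processors, can be sketched with the JDK alone. Several assumptions apply. A plain thread stands in for a Vert.x verticle, and no Vert.x API is used. Upper-casing stands in for real post processing. The optional write-ahead log is left out. `SingleBoxQueue` and `process` are hypothetical names. A bounded queue gives back-pressure, because `put` blocks once the queue is full.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Sketch: a bounded LinkedBlockingQueue between the HTTP layer (producer) and a
 * processor (consumer). A plain Thread stands in for a Vert.x verticle; hypothetical names.
 */
final class SingleBoxQueue {
    // Compared by identity (==), so a real post with the text "stop" cannot stop the worker.
    private static final String POISON_PILL = new String("stop");

    static List<String> process(List<String> posts) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1_000); // bounded capacity
        List<String> processed = Collections.synchronizedList(new ArrayList<>());

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String post = queue.take(); // blocks until work arrives
                    if (post == POISON_PILL) {
                        return; // end of stream
                    }
                    processed.add(post.toUpperCase(Locale.ROOT)); // stand-in for real processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "post-processor");
        worker.start();

        try {
            for (String post : posts) {
                queue.put(post); // blocks while the queue is full (back-pressure)
            }
            queue.put(POISON_PILL);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing posts", e);
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("hello", "lucene"))); // prints [HELLO, LUCENE]
    }
}
```

With a single consumer, FIFO order is preserved. Adding more worker threads would raise throughput at the cost of ordering, but everything would still live in one JVM, which is the scaling limit the post points out.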
Just released: Confluence search syntax Cheat Sheet by luisfe
Download it free at http://www.cheatography.com/luisfe/cheat-sheets/confluence-search-syntax/?utm_source=mastodon
Here's their description of it: Confluence's syntax to refine search results
@cheatsheets #CheatSheet #CheatSheets #search #atlassian #lucene #confluence
#cheatsheet #cheatsheets #search #atlassian #lucene #confluence
If you're a Java software engineer with a background in #search (especially #elasticsearch or #lucene), this job opening might be just for you: https://jobs.elastic.co/jobs/elasticsearch/distributed-emea/elasticsearch-senior-software-engineer-search-area/4634209?gh_jid=4634209#/
#search #elasticsearch #lucene
Does anyone here run Elasticsearch, or anything else based on Lucene, rootless? There's a write.lock permissions error that we can't seem to shake.
#tech #NLProc #linguistics #elasticsearch #lucene #writeLock #rootless
#Tech #nlproc #linguistics #elasticsearch #lucene #writelock #rootless
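One way to narrow down a write.lock error is to reproduce the locking step outside the search engine, running as the same user the service runs as. As far as I know, Lucene's default `NativeFSLockFactory` creates `write.lock` in the index directory and takes an OS-level lock on it via `FileChannel.tryLock`. The sketch below roughly mirrors that step, but treat it as a diagnostic sketch, not Lucene's code. `WriteLockCheck` is a hypothetical name. Running it creates an empty `write.lock` if none exists, so point it at the data directory of a stopped service or at a copy.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical diagnostic: can the current user create and lock write.lock in a directory? */
final class WriteLockCheck {

    static String check(Path indexDir) {
        Path lockFile = indexDir.resolve("write.lock");
        try {
            Files.createDirectories(indexDir);
            try (FileChannel channel = FileChannel.open(lockFile,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock lock = channel.tryLock()) { // null if another process holds it
                return lock != null
                        ? "OK: " + lockFile + " is writable and lockable"
                        : "BUSY: another process holds " + lockFile;
            }
        } catch (AccessDeniedException e) {
            return "DENIED: user '" + System.getProperty("user.name") + "' cannot write " + e.getFile();
        } catch (OverlappingFileLockException e) {
            return "BUSY: this JVM already holds " + lockFile;
        } catch (IOException e) {
            return "ERROR: " + e;
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : "data");
        System.out.println(check(dir));
    }
}
```

A `DENIED` result under a rootless runtime usually points at directory ownership or mode as seen from inside the container, rather than at the search engine itself.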
RT @nknize@twitter.com
OK #foss4g friends, @elastic@twitter.com geo is getting super exciting! After nearly 5 years, and many new #lucene spatial data structures & field types, you'll soon be able to index spatial data in its native CRS w/o reprojecting to WGS84 lat/lon! #GIS #spatialIndexing #geoGeek
#foss4g #lucene #gis #spatialIndexing #geoGeek