Elasticsearch vs Solr
Published: 2019-06-28



API

| Feature | Solr 6.2.1 | ElasticSearch 5.0 |
| --- | --- | --- |
| Format | XML, CSV, JSON | JSON |
| HTTP REST API | Yes | Yes |
| Binary API | Yes: SolrJ | Yes: TransportClient, Thrift (through a ) |
| JMX support | Yes | No: ES specific stats are exposed through the REST API |
| Official client libraries | Java | Java, Groovy, PHP, Ruby, Perl, Python, .NET, Javascript |
| Community client libraries | PHP, Ruby, Perl, Scala, Python, .NET, Javascript, Go, Erlang, Clojure | Clojure, Cold Fusion, Erlang, Go, Groovy, Haskell, Java, JavaScript, .NET, OCaml, Perl, PHP, Python, R, Ruby, Scala, Smalltalk, Vert.x |
| 3rd-party product integration (open-source) | Drupal, Magento, Django, ColdFusion, Wordpress, OpenCMS, Plone, Typo3, ez Publish, Symfony2, Riak (via Yokozuna) | Drupal, Django, Symfony2, Wordpress, CouchBase |
| 3rd-party product integration (commercial) | DataStax Enterprise Search, Cloudera Search, Hortonworks Data Platform, MapR | SearchBlox, Hortonworks Data Platform, MapR etc |
| Output | JSON, XML, PHP, Python, Ruby, CSV, Velocity, XSLT, native Java | JSON, XML/HTML (via ) |

Infrastructure

| Feature | Solr 6.2.1 | ElasticSearch 5.0 |
| --- | --- | --- |
| Master-slave replication | Yes: Only in non-SolrCloud. In SolrCloud, behaves identically to ES. | No: Not an issue because shards are replicated across nodes. |
| Integrated snapshot and restore | Filesystem | Filesystem, AWS Cloud Plugin for S3 repositories, HDFS Plugin for Hadoop environments, Azure Cloud Plugin for Azure storage repositories |

Indexing

| Feature | Solr 6.2.1 | ElasticSearch 5.0 |
| --- | --- | --- |
| Data Import | DataImportHandler - JDBC, CSV, XML, Tika, URL, Flat File | [DEPRECATED in 2.x] Rivers modules - ActiveMQ, Amazon SQS, CouchDB, Dropbox, DynamoDB, FileSystem, Git, GitHub, Hazelcast, JDBC, JMS, Kafka, LDAP, MongoDB, neo4j, OAI, RabbitMQ, Redis, RSS, Sofa, Solr, St9, Subversion, Twitter, Wikipedia |
| ID field for updates and deduplication | Yes | Yes |
| DocValues | Yes | Yes |
| Partial Doc Updates | Yes: with stored fields | Yes: with _source field |
| Custom Analyzers and Tokenizers | Yes | Yes |
| Per-field analyzer chain | Yes | Yes |
| Per-doc/query analyzer chain | No | Yes |
| Index-time synonyms | Yes | Yes: Supports Solr and Wordnet synonym format |
| Query-time synonyms | Yes: especially via | No: Technically, yes, but practically no because multi-word/phrase query-time synonyms are not supported. See  and  blog for nuances. |
| Multiple indexes | Yes | Yes |
| Near-Realtime Search/Indexing | Yes | Yes |
| Complex documents | Yes | Yes |
| Schemaless | Yes: 4.4+ | Yes |
| Multiple document types per schema | No: One set of fields per schema, one schema per core | Yes |
| Online schema changes | Yes: Schemaless mode or via dynamic fields. | Yes: Only backward-compatible changes. |
| Apache Tika integration | Yes | Yes |
| Dynamic fields | Yes | Yes |
| Field copying | Yes | Yes: via multi-fields |
| Hash-based deduplication | Yes | Yes |
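
The "Partial Doc Updates" row can be made concrete with a small sketch. It only builds the request bodies and sends nothing. The Solr side assumes the JSON atomic-update form, where each changed field is wrapped in a `set` modifier, and the Elasticsearch side assumes the update API's `doc` wrapper. The document ID `book-1`, the field `price`, and the use of `id` as the unique key are hypothetical.

```python
import json


def solr_atomic_update(doc_id, changes):
    """Build a Solr atomic-update payload (assumes 'id' is the unique key field)."""
    doc = {"id": doc_id}
    for field, value in changes.items():
        # Each changed field is wrapped in a "set" modifier.
        doc[field] = {"set": value}
    # The JSON update handler accepts a list of documents.
    return [doc]


def es_partial_update(changes):
    """Build an Elasticsearch update-API body: changed fields go under 'doc'."""
    return {"doc": dict(changes)}


if __name__ == "__main__":
    changes = {"price": 99}  # hypothetical field and value
    print(json.dumps(solr_atomic_update("book-1", changes)))
    print(json.dumps(es_partial_update(changes)))
```

In both engines the server rebuilds the full document from what it already holds (stored fields in Solr, `_source` in Elasticsearch) and reindexes it, which is why the table ties the feature to those settings.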

 

Searching

| Feature | Solr 6.2.1 | ElasticSearch 5.0 |
| --- | --- | --- |
| Lucene Query parsing | Yes | Yes |
| Structured Query DSL | No: Need to programmatically create queries if going beyond Lucene query syntax. | Yes |
| Span queries | Yes: via | Yes |
| Spatial/geo search | Yes | Yes |
| Multi-point spatial search | Yes | Yes |
| Faceting | Yes | Yes: Top N term accuracy can be controlled with |
| Advanced Faceting | Yes | Yes |
| Geo-distance Faceting | Yes | Yes |
| Pivot Facets | Yes | Yes |
| More Like This | Yes | Yes |
| Boosting by functions | Yes | Yes |
| Boosting using scripting languages | No | Yes |
| Push Queries | No | Yes: Percolation. Distributed percolation supported in 1.0 |
| Field collapsing/Results grouping | Yes | Yes |
| Query Re-Ranking | Yes | Yes: via  or |
| Index-based Spellcheck | Yes | Yes |
| Wordlist-based Spellcheck | Yes | No |
| Autocomplete | Yes | Yes |
| Query elevation | Yes | Yes |
| Intra-index joins | Yes: via parent-child query | Yes: via has_children and top_children queries |
| Inter-index joins | Yes: Joined index has to be single-shard and replicated across all nodes. | No |
| Resultset Scrolling | Yes: New to 4.7.0 | Yes: via scan search type |
| Filter queries | Yes | Yes: also supports filtering by native scripts |
| Filter execution order | Yes: local params and cache property | Yes |
| Alternative QueryParsers | Yes: DisMax, eDisMax | Yes: query_string, dis_max, match, multi_match etc |
| Negative boosting | Yes: but awkward. Involves positively boosting the inverse set of negatively-boosted documents. | Yes |
| Search across multiple indexes | Yes: it can search across multiple compatible collections | Yes |
| Result highlighting | Yes | Yes |
| Custom Similarity | Yes | Yes |
| Searcher warming on index reload | Yes | Yes |
| Term Vectors API | Yes | Yes |

 

Customizability

| Feature | Solr 6.2.1 | ElasticSearch 5.0 |
| --- | --- | --- |
| Pluggable API endpoints | Yes | Yes |
| Pluggable search workflow | Yes: via SearchComponents | No |
| Pluggable update workflow | Yes: via | No |
| Pluggable Analyzers/Tokenizers | Yes | Yes |
| Pluggable QueryParsers | Yes | Yes |
| Pluggable Field Types | Yes | Yes |
| Pluggable Function queries | Yes | Yes |
| Pluggable scoring scripts | No | Yes |
| Pluggable hashing | Yes | Yes |
| Pluggable webapps | No | No: [site plugins DEPRECATED in 5.x] |
| Automated plugin installation | No | Yes: Installable from GitHub, maven, sonatype or elasticsearch.org |

 

Distributed

| Feature | Solr 6.2.1 | ElasticSearch 5.0 |
| --- | --- | --- |
| Self-contained cluster | No: Depends on separate ZooKeeper server | Yes: Only Elasticsearch nodes |
| Automatic node discovery | Yes: ZooKeeper | Yes: internal Zen Discovery or ZooKeeper |
| Partition tolerance | Yes: The partition without a ZooKeeper quorum will stop accepting indexing requests or cluster state changes, while the partition with a quorum continues to function. | No: Partitioned clusters can diverge unless discovery.zen.minimum_master_nodes is set to at least N/2+1, where N is the size of the cluster. If configured correctly, the partition without a quorum will stop operating, while the other continues to work. See |
| Automatic failover | Yes: If all nodes storing a shard and its replicas fail, client requests will fail, unless requests are made with the shards.tolerant=true parameter, in which case partial results are returned from the available shards. | Yes |
| Automatic leader election | Yes | Yes |
| Shard replication | Yes | Yes |
| Sharding | Yes | Yes |
| Automatic shard rebalancing | No | Yes: it can be machine, rack, availability zone, and/or data center aware. Arbitrary tags can be assigned to nodes and it can be configured to not assign the same shard and its replicas to nodes with the same tags. |
| Change # of shards | Yes: Shards can be added (when using implicit routing) or split (when using compositeId). Cannot be lowered. Replicas can be increased anytime. | No: each index has 5 shards by default. Number of primary shards cannot be changed once the index is created. Replicas can be increased anytime. |
| Shard splitting | Yes | No |
| Relocate shards and replicas | Yes: can be done by creating a shard replica on the desired node and then removing the shard from the source node | Yes: can move shards and replicas to any node in the cluster on demand |
| Control shard routing | Yes: shards or _route_ parameter | Yes: routing parameter |
| Pluggable shard/replica assignment | Yes | Yes: Probabilistic shard balancing with |
| Consistency | Indexing requests are synchronous with replication. An indexing request won't return until all replicas respond. No check for downed replicas. They will catch up when they recover. When new replicas are added, they won't start accepting and responding to requests until they are finished replicating the index. | Replication between nodes is synchronous by default, thus ES is consistent by default, but it can be set to asynchronous on a per document indexing basis. Index writes can be configured to fail if there are not sufficient active shard replicas. The default is quorum, but all or one are also available. |
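
The N/2+1 rule from the partition-tolerance row can be sketched as below. The helper names are hypothetical. N is taken here as the number of master-eligible nodes, which matches "the size of the cluster" only when every node is master-eligible.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Smallest majority of N nodes: floor(N / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def has_quorum(partition_size: int, cluster_size: int) -> bool:
    """True if a network partition still holds a majority of the cluster."""
    return partition_size >= minimum_master_nodes(cluster_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nodes -> discovery.zen.minimum_master_nodes >=", minimum_master_nodes(n))
```

Because two disjoint partitions cannot both hold a majority, at most one side keeps operating. An even split, such as 2 and 2 in a four-node cluster, leaves neither side with a quorum, which is one reason odd cluster sizes are commonly preferred.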

 

Misc

| Feature | Solr 6.2.1 | ElasticSearch 5.0 |
| --- | --- | --- |
| Web Admin interface | Yes: bundled with Solr | Yes: Marvel or Kibana apps |
| Visualisation | | |
| Hosting providers | | |

 

Thoughts...

I'm embedding my answer to this "Solr-vs-Elasticsearch" Quora question verbatim here:

1. Elasticsearch was born in the age of REST APIs. If you love REST APIs, you'll probably feel more at home with ES from the get-go. I don't actually think it's 'cleaner' or 'easier to use', but just that it is more aligned with web 2.0 developers' mindsets.

2. Elasticsearch's Query DSL syntax is really flexible and it's pretty easy to write complex queries with it, though it does border on being verbose. Solr doesn't have an equivalent, last I checked. Having said that, I've never found Solr's query syntax wanting, and I've always been able to easily write a custom SearchComponent if needed (more on this later).
3. I find Elasticsearch's documentation to be pretty awful. It doesn't help that some examples in the documentation are written in YAML and others in JSON. I wrote an ES code parser once to auto-generate documentation from Elasticsearch's source and found a number of discrepancies between code and what's documented on the website, not to mention a number of undocumented/alternative ways to specify the same config key.
By contrast, I've found Solr to be consistent and really well-documented. I've found pretty much everything I've wanted to know about querying and updating indices without having to dig into code much. Solr's schema.xml and solrconfig.xml are *extensively* documented with most if not all commonly used configurations. 
4. Whilst what Rick says about ES being mostly ready to go out-of-box is true, I think that is also a possible problem with ES. Many users don't take the time to do the most simple config (e.g. type mapping) of ES because it 'just works' in dev, and end up running into issues in production. 
And once you do have to do config, then I personally prefer Solr's config system over ES'. Long JSON config files can get overwhelming because of JSON's lack of support for comments. Yes you can use YAML, but it's annoying and confusing to go back and forth between YAML and JSON.
5. If your own app works/thinks in JSON, then without a doubt go for ES because ES thinks in JSON too. Solr merely supports it as an afterthought. ES has a number of nice JSON-related features such as parent-child and nested docs that make it a very natural fit. Parent-child joins are awkward in Solr, and I don't think there's a Solr equivalent for ES Inner hits.
6. ES doesn't require ZooKeeper for its 'elastic' features, which is nice coz I personally find ZK unpleasant, but as a result, ES does have issues with split-brain scenarios (google 'elasticsearch split-brain' or see this: Elasticsearch Resiliency Status).
7. Overall from working with clients as a Solr/Elasticsearch consultant, I've found that developer preferences tend to end up along language party lines: if you're a Java/C# developer, you'll be pretty happy with Solr. If you live in JavaScript or Ruby, you'll probably love Elasticsearch. If you're on Python or PHP, you'll probably be fine with either.
Something to add about this: ES doesn't have a very elegant Java API IMHO (you'll basically end up using REST because it's less painful), whereas SolrJ is very satisfactory and more efficient than Solr's REST API. If you're primarily a Java dev team, do take this into consideration for your sanity. There's no scenario in which constructing JSON in Java is fun/simple, whereas in Python it's absolutely pain-free, and believe me, if you have a non-trivial app, your ES JSON query strings will be works of art.
8. ES doesn't have in-built support for pluggable 'SearchComponents', to use Solr's terminology. SearchComponents are (for me) a pretty indispensable part of Solr for anyone who needs to do anything customized and in-depth with search queries. 
Yes of course, in ES you can just implement your own RestHandler, but that's just not the same as being able to plug into and rewire the way search queries are handled and parsed.
9. Whichever way you go, I highly suggest you choose a client library which is as 'close to the metal' as you can get. Both ES and Solr have *really* simple search and update APIs. If a client library introduces an additional DSL layer in an attempt to 'simplify', I suggest you think long and hard about using it, as it's likely to complicate matters in the long run, and make debugging and asking for help on SO more problematic.
In particular, if you're using Rails + Solr, consider using rsolr/rsolr instead of sunspot/sunspot if you can help it. ActiveRecord is complex code and sufficiently magical. The last thing you want is more magic on top of that.
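
Points 2, 7 and 9 above can be illustrated together with a minimal sketch of a Query DSL request built as a plain Python dict, with no client-side DSL layer in between. The field names (`title`, `category`, `price`) and the function name are hypothetical. The `bool` query with `must` and `filter` clauses is assumed to be available, as it is in ES 2.x and later.

```python
import json


def build_search_body(text, category, min_price, size=10):
    """Build an Elasticsearch search body as plain nested dicts."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text clause.
                "must": [{"match": {"title": text}}],
                # Non-scoring clauses that only narrow the result set.
                "filter": [
                    {"term": {"category": category}},
                    {"range": {"price": {"gte": min_price}}},
                ],
            }
        },
    }


if __name__ == "__main__":
    body = build_search_body("lucene in action", "books", 20)
    # The same structure could be POSTed to a _search endpoint with any HTTP client.
    print(json.dumps(body, indent=2))
```

Keeping the body as ordinary dicts means the request that goes over the wire is exactly what appears in the engine's own documentation, which makes debugging and asking for help easier.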
---
To conclude, ES and Solr have more or less feature-parity and from a feature standpoint, there's rarely one reason to go one way or the other (unless your app lives/breathes JSON). Performance-wise, they are also likely to be quite similar (I'm sure there are exceptions to the rule. ES' relatively new autocomplete implementation, for example, is a pretty dramatic departure from previous Lucene/Solr implementations, and I suspect it produces faster responses at scale).
ES does offer less friction from the get-go and you feel like you have something working much quicker, but I find this to be illusory. Any time gained in this stage is lost when figuring out how to properly configure ES because of poor documentation - an inevitability when you have a non-trivial application.
Solr encourages you to understand a little more about what you're doing, and the chance of you shooting yourself in the foot is somewhat lower, mainly because you're forced to read and modify the 2 well-documented XML config files in order to have a working search app.
---
EDIT on Nov 2015: 
ES has been gradually distinguishing itself from Solr when it comes to data analytics. I think it's fair to attribute this to the immense traction of the ELK stack in the logging, monitoring and analytic space. My guess is that this is where Elastic (the company) gets the majority of its revenue, so it makes perfect sense that ES (the product) reflects this.
We see this manifesting primarily in the form of aggregations, which are a more flexible and nuanced replacement for facets. Read more about aggregations here: Migrating to aggregations
Aggregations have been out for a while now (since 1.0), but with the recently released ES 2.0 comes pipeline aggregations, which let you compute aggregations such as derivatives, moving averages, and series arithmetic on the results of other aggregations. Very cool stuff, and Solr simply doesn't have an equivalent. More on pipeline aggregations here: Out of this world aggregations
If you're currently using or contemplating using Solr in an analytics app, it is worth your while to look into ES aggregation features to see if you need any of it.
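
As a rough illustration of a pipeline aggregation, the sketch below builds a request body in which a `derivative` aggregation runs over the per-month `sum` produced by a `date_histogram`. The field names `date` and `price` and the aggregation names are hypothetical. The small `derivative` helper only mimics locally what the server would compute across consecutive buckets.

```python
def monthly_sales_with_derivative(date_field="date", value_field="price"):
    """Request body: monthly sum plus its month-over-month derivative."""
    return {
        "size": 0,  # only aggregation results are wanted, no hits
        "aggs": {
            "sales_per_month": {
                "date_histogram": {"field": date_field, "interval": "month"},
                "aggs": {
                    "sales": {"sum": {"field": value_field}},
                    # Pipeline aggregation: reads the sibling "sales" metric.
                    "sales_deriv": {"derivative": {"buckets_path": "sales"}},
                },
            }
        },
    }


def derivative(values):
    """Local equivalent: difference between consecutive bucket values."""
    # The first bucket has no predecessor, so it has no derivative.
    return [None] + [b - a for a, b in zip(values, values[1:])]


if __name__ == "__main__":
    print(monthly_sales_with_derivative())
    print(derivative([10, 15, 12]))
```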

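Elasticsearch compared with Solr

When simply searching data that has already been indexed, Solr is faster.

When indexing in real time, Solr suffers from I/O blocking and its query performance degrades; Elasticsearch has a clear advantage here.

As the data volume grows, Solr's search efficiency drops, while Elasticsearch shows no obvious change.

In summary, Solr's architecture is not well suited to real-time search applications.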

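Real-world production test

The chart below shows that average query speed improved 50-fold after the search engine was switched from Solr to Elasticsearch.

[Figure: average_execution_time]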

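Summary of the Elasticsearch and Solr comparison

  • Both are simple to install;
  • Solr relies on ZooKeeper for distributed management, whereas Elasticsearch has distributed coordination built in;
  • Solr supports more data formats, whereas Elasticsearch supports only JSON;
  • Solr provides more features officially, whereas Elasticsearch concentrates on core functionality and leaves most advanced features to third-party plugins;
  • Solr outperforms Elasticsearch in traditional search applications, but is markedly less efficient in real-time search applications.

Solr is a strong solution for traditional search applications, but Elasticsearch is better suited to emerging real-time search applications.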

 

References:

http://solr-vs-elasticsearch.com/

http://i.zhcy.tk/blog/elasticsearchyu-solr/

http://logz.io/blog/solr-vs-elasticsearch/

https://trends.google.com/trends/explore?date=all&q=apache%20solr,elasticsearch

 

Reposted from: https://my.oschina.net/cnarthurs/blog/862904
